Aquarium: Cassiopea and Alewife Languages

(Prerelease Version of 20191119)

David A. Holland
Harvard University
dholland@eecs.harvard.edu

Jingmei Hu
Harvard University
jingmei_hu@g.harvard.edu

Ming Kawaguchi
Harvard University
ming@seas.harvard.edu

Eric Lu
Harvard University
ericl01@g.harvard.edu

Stephen Chong
Harvard University
chong@seas.harvard.edu

Margo I. Seltzer
University of British Columbia
seltzer@cs.ubc.ca

Abstract
This technical report describes two of the domain specific languages used in the Aquarium kernel code synthesis project. It presents the language cores in terms of abstract syntax. Cassiopea is a machine description language for describing the semantics of processor instruction sets. Alewife is a specification language that can be used to write machine-independent specifications for assembly-level instruction blocks. An Alewife specification can be used to verify and synthesize code for any machine described in Cassiopea, given a machine-specific translation for abstractions used in the specification. This article does not include an introduction to either the Aquarium system or the use of the languages. In addition to this version of the article being a draft, the Aquarium project and the languages are work in progress. This article cannot currently be considered either final or complete.

1 Introduction
The Aquarium project’s goal is to synthesize the machine-dependent parts of an operating system. This has the potential to greatly reduce the amount of work needed to port an operating system to a new machine architecture. It also potentially reduces the depth of knowledge needed to do that work – currently an OS port requires deep expert knowledge of both the OS and the new machine architecture – and with luck the overall amount of time involved as well.

This document does not discuss either code synthesis or OS porting in any detail. It is intended as a supplement to other project publications, which should in general be read first. While this document includes discussions of and rationales for language features (as well as absence of language features) these discussions assume familiarity with the surrounding context.

The two languages described in this document are Cassiopea, which is a register-transfer-list-style machine description language used for writing down the semantics of processor instruction sets, and Alewife, which is a specification or modeling language used for writing down machine-independent specifications for assembly language code blocks to be synthesized.

Cassiopea also includes material for writing machine-dependent specifications (this is the target for the Alewife translator), a concept of mapping modules used by the Alewife translator, and a simple representation of basic blocks in assembly language. These are described in this document.

This document does not, however, describe any of the other languages or file formats used in the Aquarium project or any of the Aquarium tools.

The document is structured with six parts:

- The Cassiopea language and abstract syntax (Sec. 3)
- Cassiopea types (Section 4)
- Cassiopea semantics (Section 5)
- The Alewife language and abstract syntax (Section 6)
- Alewife types and semantics (Section 7)
- Alewife translation to Cassiopea (Section 8)

2 Notation
In the abstract syntax, type judgments, and semantics judgments we use italics for metavariables (e.g. $v$) and also for words corresponding to types in the abstract syntax (e.g. declaration). We use typewriter font for words that correspond to language keywords. The notation $\overline{a_i}$ means “a sequence one or more $a$, each to be referred to elsewhere as $a_i$.” If there are no references outside the overbar, the $i$ subscript may be left off. Epsilon (\$) appearing in syntax represents an empty production. The notation "\$" represents a string literal with arbitrary contents.

Bitvectors (aka machine integers) may be any width greater than zero. Bitvector constants are represented as $0bC$, which can be thought of as an explicit sequence of zeros and ones. The number of bits in a bitvector constant (that is, the number of digits) gives its type. Thus, $0b00$ and $0b0000$ are different. In the concrete syntax, bitvector constants whose size is a multiple of 4 can also be written in the form $0x\times$C. These are desugared in the parser and not shown further in this document.

The Cassiopea and Alewife syntax should be considered disjoint. (Some elements are the same in each, but these are specified separately regardless.) They use the same metavariables as well, which should not be mixed; any language
construct in a judgment should be all Cassiopea or all Alewife. In a few places mixing is needed, in which case the translation defined in Section 8 is applied to allow inserting Alewife fragments into Cassiopea terms. The Alewife rules in Section 7 do use the same environments as the Cassiopea rules. These should be construed as holding only Cassiopea elements. Further details can be found in Section 7.

We have attempted as much as possible to use standard notation in as many places as possible.

3 Cassiopea Overview

This section covers the abstract syntax for Cassiopea.

As mentioned above, Cassiopea is a register-transfer-list style language: it models instructions as non-Turing-complete procedures that update a machine state. (This gives it an executable semantics, which is covered in Section 5.)

Because of our operating domain (which is similar to a compiler, but requires access to all the dusty corners of an architecture compiler can normally ignore) we model the machine from the assembly-language programmer’s perspective. In particular, we do not treat memory as a huge block of address space but handle it in small chunks passed in from somewhere else. We also need to model control registers as well as general-purpose registers, and machine state like whether interrupts are enabled.

Furthermore, we need to handle assembler labels: these are like pointers, but they are not themselves pointers; they have addresses, but those addresses are not resolved until after programs are compiled and linked and must be treated as abstract.

Notation. We use the following metavariables:

\[
\begin{align*}
\tau &::= \tau_{\text{base}} \mid \tau_{\text{mem}} \mid \tau_{\text{func}} \\
\tau_{\text{base}} &::= () \mid \text{int} \mid \text{bool} \mid \text{string} \mid x \mid C \text{ bit} \\
\tau_{\text{reg}} &::= C \text{ reg} \\
\tau_{\text{regs}} &::= C \text{ reg set} \\
\tau_{\text{label}} &::= C \text{ label} \\
\tau_{\text{mem}} &::= C_1 \text{ bit } C_2 \text{ len } C_3 \text{ ref} \\
\tau_{\text{func}} &::= \tau_{\text{base}, i} \rightarrow \tau_{\text{base}}
\end{align*}
\]

\(x, y, z\) Program variables (binders)
\(r\) Registers (abstract)
\(C\) Integer constants (written in decimal)
\(\text{0b} C\) Bitvector constants (written in binary)
\(\tau\) Types
\(v\) Values
\(e\) Expressions
\(S\) Statements
\(l, j\) Rule-level integers

(Other constructions are referred to with longer names.)

A number of constructions are lists written out in longhand (with a null case and a cons case) – these are written out in longhand so that typing and semantic judgments can be applied explicitly to each case, in order to, for example, thread environment updates through correctly.

Identifiers and variables. Identifiers are divided syntactically into eight categories.

\(x_{\text{mem}}\) are identifiers bound to memory regions, which are second-class; \(x_{\text{label}}\) are identifiers that are assembler labels. \(x_{\text{func}}\) and \(x_{\text{proc}}\) are identifiers bound to functions and procedures, respectively, which are also second-class. \(x_{\text{op}}\) are identifiers bound to instructions ("operations"), which are akin to procedures but distinguished from them.

\(x_\tau\) are identifiers for type aliases, which are bound to base types in declarations.

\(x_{\text{module}}\) are the names of “modules”, which are used to select among many possible groups of mapping elements.

Other identifiers \(x\) are used for other things, and should be assumed to not range over the above elements.

Note that all identifiers live in the same namespace and rebinding or shadowing them is not allowed.

All these identifiers can be thought of as variables, in the sense that they are names that stand for other things. All of them are immutable once defined, including the ordinary variables \(x\) that contain plain values.

Types. Types are divided syntactically into base types (integers, booleans, strings, bitvectors, etc.) and others (memory regions and functions). User functions may handle only base types. Furthermore, memory regions and functions are intended to be second-class for reasons discussed below, and are excluded in various places in the syntax and the typing rules. We use index typing to capture the bit width of values.

Registers. Registers are represented in the specification with the metavariable \(r\), which stands for the underlying abstract identity of a register. Declaring a register, e.g. with \(\text{letstate} x : C \text{ reg}\), allocates a fresh register \(r\) and binds the variable \(x\) to it. A subsequent declaration of the form \(\text{let} y : C \text{ reg} = x\) creates another variable \(y\) that refers to the same underlying register. One might think of registers as numbered internally. (This is different from another possible world, where the initial declaration creates a register named \(x\), and then the second declaration creates a variable that ranges over registers, and the identifiers \(x\) and \(y\) are different kinds of things. That is, however, not how Cassiopea works.)
Registers, or at least some registers, have associated with them a text form, which is declared separately and is the form an assembler for the machine expects to see. This is used when our internal representation of a program is extracted to an assembly source file. It is referred to by attaching the suffix .txt to the/a register variable. Note that because on some machines some registers may not be directly addressable by the assembler (they might be subfields of some larger addressable unit, or nonaddressable internal state), not all registers necessarily have a text form. And because of the way registers are referenced, we cannot readily check statically whether a given reference with .txt is valid or not, or at least without introducing additional machinery, so if not the failure happens at extraction time.

The type of a register is $C:\text{reg}$, which is a register that holds a $C$-bit bitvector. The bitvector value in question can be updated by assigning a new value; this is a statement ($e_1 := e_2$) and can only happen in places where statements are allowed. The construction $^*e$ reads a register.

The reader will note that the semantics rules for machines and declarations do not provide initial values for registers. Instead, executions are defined in terms of some initial register state (and also some memory state), which is required to have the right registers to match the machine definition. This allows reasoning about the execution of programs and program fragments in terms of many or all possible initial states. (These issues are discussed further below.)

**Memory.** A memory region has the type $C_1:\text{bit} \times C_2:\text{len} \times C_3:\text{ref}$. This refers to a memory region that has $C_2$ cells, each of which stores a bitvector of width $C_1$. This memory region is addressed with pointers of width $C_3$.

Note that we assume byte-addressed machines, and for the purposes of both this specification and our implementation, we assume bytes are 8 bits wide. (This restriction could be relaxed if we wanted to model various historic machines.) Thus a memory region of type $32:\text{bit} \times 4:\text{len} \times 32:\text{ref}$ has 4 32-bit values in it, which can be addressed at byte offsets 0, 4, 8, and 12. These values can of course be changed, like values in registers.

Memory regions are named with identifiers. These names, and memory regions themselves, are not first class; variables are not allowed to range over them.

Also note that memory regions are a property of programs (and thus are declared in specifications) and not a property of the machine as a whole.

**Pointers.** A pointer literal has the form $(x_{\text{mem}}, C)$, in which $x_{\text{mem}}$ is the region name and $C$ is the offset. Because memory regions are second-class, $x_{\text{mem}}$ must be specifically one of the available declared memory regions.

Pointer literals exist in the abstract syntax, but are not allowed in the concrete syntax except in specifications. The only way to get a pointer value is to look up a label (discussed immediately below) or have it provided in a register as part of the initial machine state.

A pointer literal is treated as a bitvector of the same width, so one can appear in a register or in memory. However, we enforce a restriction (not captured in the semantics rules so far) that no value in the initial machine state, whether in a register or in memory, is a pointer unless required to be so by the precondition part of the specification. All other values are restricted to be plain bitvector values.

Addition and subtraction are allowed on pointers (this changes the offset) but other bitvector operations (e.g. multiply) are disallowed and fail. Similarly, attempting to fetch from or store to a plain bitvector that is not a pointer fails.

Note however that we do not statically distinguish pointers and plain bitvectors. (We could have used flow-sensitive

### (Cassiopea Values)

<table>
<thead>
<tr>
<th>Syntax</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>$\tilde{v} ::= v</td>
<td>\text{fail}$</td>
</tr>
</tbody>
</table>

### (Cassiopea Operators)

<table>
<thead>
<tr>
<th>Syntax</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>$\text{unop} ::= \text{neg}</td>
<td>\text{not}$</td>
</tr>
</tbody>
</table>

### (Cassiopea Expressions)

<table>
<thead>
<tr>
<th>Syntax</th>
<th>Description</th>
</tr>
</thead>
<tbody>
<tr>
<td>$e ::= \text{true}</td>
<td>\text{false}</td>
</tr>
</tbody>
</table>

**Note:** The operators $\text{unop}$ and $\text{binop}$ are defined in terms of the underlying bitvector operations. The operators $\text{unop}$ and $\text{binop}$ are defined for bitvectors, and the underlying bitvector operations are defined for bitvectors of arbitrary width. Thus, the operators $\text{unop}$ and $\text{binop}$ are defined for bitvectors of arbitrary width.
typing to reason about when registers and memory cells contain pointers and when they do not; but this adds substantial complexity and for our problem domain does not buy much in return.) Instead, we step to failure at runtime. This can be seen in the semantics rules.

Fetching from a pointer takes the form $\text{fetch}(e, C)$. Storing to a pointer takes the form $\text{store}(e_1, C) \leftarrow e_2$. The extra constant $C$ specifies the width of the cell pointed to. (This is not an offset.) Because we do not check pointers statically, we do not know the memory region being pointed to and cannot look up its cell size; thus we need the width explicitly for typing. It is checked at runtime.

Labels. As mentioned above, by “label” we mean an assembler label or linker symbol. These have addresses (or depending on how one looks at them, they are addresses) and those addresses are constants, but the constants are not known at assembly time, so we must model them abstractly.

When one declares a memory region, one may attach a label to it, which is an additional identifier. This identifier is created as a variable of type $C$ \textit{label}. The value is a pointer to the first entry in the region, and a single type subsumption rule allows this value to be accessed and placed in an ordinary register or variable of suitable bitvector type.

The intended mechanism is that for each machine the preferred instruction on that machine for loading assembler symbols into a register can be taken an operand of type $C$ \textit{label}, and then value can be just assigned to the destination register. The type restriction on the operand is sufficient to synthesize programs that use labels correctly.

Register sets. Register sets are second-class elements intended to exist only as literals and only as the result of lowering machine-independent specifications that cannot directly talk about specific registers.

Currently they do not exist in the implementation and so everything about them is a bit fuzzy.

Register sets are not allowed to be operands to instructions to avoid state explosions when synthesizing. This restriction is currently not captured in the abstract syntax or typing rules.

Functions and procedures. Functions, defined with \textit{def}, are pure functions whose bodies are expressions. They produce values. They can access registers and memory, and can fail, but cannot update anything. Procedures, defined with \textit{proc}, are on the other hand impure and their bodies are statements. They do not produce values, but they may update the machine state.

They are otherwise similar, and are intended to be used to abstract out common chunks of functionality shared among multiple instructions in machine descriptions. Functions can also be used for state hiding in specifications.

Functions and procedures are second-class; they may be only called by their own name and may not be bound to variables or passed around. Furthermore, they are only allowed to handle base types: higher-order functions are explicitly not supported.

Operations. Operations (defined with \textit{defop}) are essentially instructions, and we refer to these interchangeably. An operation takes zero or more operands and does some transform on the machine state defined by one or more statements.

Operands are currently defined as expressions, but are restricted as follows:
- They may be values, but not string values, and not fail.
- They may be variables of register type.
- They may be variables of label type.

This restriction affects what the synthesizer tries to generate; a broader set of expressions may be accepted for verification or concrete execution and simply evaluated in place.
There is an important distinction between "operations" and "instructions". Operations are the units in which Cassiopea thinks about machine operations, and the units in which Cassiopea generates programs and code fragments, but they do not necessarily need to be single instructions. The text output to the assembler is arbitrary and can be computed on the fly based on the operand value. On some platforms the assembler defines so-called "synthetic instructions" that are potentially multiple real instructions. This facility takes that a step further by allowing the writer of the machine description to define their own synthetics.

**Other Constructs.** $e[C]$ and $e[C_1, C_2]$ extract a single bit and a slice, respectively, from a bitvector. The offsets are constants; a shift can be used beforehand if variable offsets are needed. The width of the slice must be constant for static typing.

**Machines, Mappings, Specs, and Programs.**

A *machine* is a complete description of a machine architecture; it includes declarations (including types, constants, registers, functions and procedures) and also instructions. This is typically a standalone file, or possibly several via include.

A (single) mapping is a collection of declarations used to instantiate elements in Alewife translations. These are placed into a *module*, with multiple modules per file, so that the mappings associated with multiple related Alewife specifications can be kept together. The module name is selected by using the Alewife block name.

---

A *spec* is a precondition and postcondition, which are boolean expressions, along with optional permission to destroy additional registers. (The latter is a *frame.*) Cassiopea specs are produced by compiling Alewife specs. Note that a module can also include frame declarations; these are added to any provided in the Alewife spec. A code block is permitted to destroy any register that is either explicitly listed in the frame declarations or mentioned in the postcondition. (This restriction is currently not adequately captured in the semantics rules.)

A *program* is a sequence of instruction invocations. There are no labels bound in these because currently we only not support a single basic block at a time.

**Built-in functions.** Here is a partial list of the built-in functions in Cassiopea:

- `empty (int C -> C reg set)` produces an empty register set of the requested bit size.
- `hex (int|C bit -> string)` prints numbers in hexadecimal.
- `bin (int|C bit -> string)` prints numbers in binary.
- `dec (int|C bit -> string)` prints numbers in decimal.
- `lbl (C label -> string)` prints labels (it returns the label identifier as a string).
- `format (string -> string ... -> string)` formats strings. The first argument is a format string; the remainder of the arguments are substituted into the format string where a dollar sign appears followed by the argument number (1-based). (A literal dollar sign can be inserted by using $$.) The number of additional arguments expected is deduced from the contents of the format string.
- `bv_to_len (C_1 -> C_2 bit -> C_1 bit)` returns a new bitvector of size $C_1$ with the same value up to the ability of the new size to represent that value.
- `bv_to_uint (C bit -> int)` converts a bitvector to unsigned int.
- `uint_to_bv_1 (int C_1 -> int C_2 -> C_1 bit)` converts an unsigned int $C_1$ into a bitvector of size $C_1$.
- `isptr (C bit -> bool)` tests at runtime if a bitvector value is a pointer or not.

Note that some of these functions have their own typing rules, some of which are polymorphic in bitvector size. We have not complicated the typing rules presented by including all of these as special cases.

**Concrete Syntax.** As noted earlier we do not describe the concrete syntax here; however, it does not stray very far from the abstract syntax. The operator precedence and most of the operator spellings are taken from C (to avoid violating the principle of least surprise) but most of the rest of the concrete syntax is ML-style.

There are also a few things desugared in the parser and not shown in the abstract syntax. As already mentioned,
environments. The type system uses two environments: $\Delta$ maps type alias names to the types they represent, and $\Gamma$ maps variables to the types assigned to them. Recall from the syntax that only base types may have alias names, so alias names can be treated as base types.

well-formedness. Since types include alias names, we need to check that a proposed alias name is actually a type name. At the same time we insist that the widths of bitvectors be greater than zero. The judgment for this has the form $\Delta \vdash_\text{wf} \tau$.

There is an intended invariant that only well-formed types may be entered into the variable typing environment $\Gamma$, so that types taken out of it do not need to be checked for well-formedness again.

In a typing environment comprised of $\Delta$ mapping user-defined type names (type aliases) to types and $\Gamma$ mapping program binders (variables) to types, we say that a type is well formed when all type names are well-formed and all indices are of type $\text{int}$.

expressions. Expressions produce values that have types. Because types appear explicitly in some expressions (e.g. let), we need both environments, so the form of an expression typing judgment is $\Delta, \Gamma \vdash e : \tau$. This means that we conclude $e$ has type $\tau$.

Note that the .txt form is restricted to registers; it is specifically for extracting the assembly text form of a register.

We have not written out a separate rule for each unary and binary operator. The types of operators are as follows.

Note that the bitvector operators are polymorphic in bit size.

<table>
<thead>
<tr>
<th>Operator</th>
<th>Type Signature</th>
</tr>
</thead>
<tbody>
<tr>
<td>$-$</td>
<td>$\text{int} \rightarrow \text{int}$</td>
</tr>
<tr>
<td>$\neg$</td>
<td>$\text{bool} \rightarrow \text{bool}$</td>
</tr>
<tr>
<td>$\Rightarrow$</td>
<td>$\forall C, C\text{ bit} \rightarrow C\text{ bit}$</td>
</tr>
<tr>
<td>$\text{int} \rightarrow \text{int}$</td>
<td></td>
</tr>
<tr>
<td>$\text{int} \rightarrow \text{int}$</td>
<td></td>
</tr>
<tr>
<td>$\text{bool} \rightarrow \text{bool}$</td>
<td></td>
</tr>
<tr>
<td>$\forall C, C\text{ bit} \rightarrow C\text{ bit}$</td>
<td></td>
</tr>
<tr>
<td>$\forall C, C\text{ bit} \rightarrow C\text{ bit}$</td>
<td></td>
</tr>
<tr>
<td>$\forall C, C\text{ bit} \rightarrow C\text{ bit}$</td>
<td></td>
</tr>
<tr>
<td>$\forall C, C\text{ bit} \rightarrow C\text{ bit}$</td>
<td></td>
</tr>
<tr>
<td>$\forall C, C\text{ bit} \rightarrow C\text{ bit}$</td>
<td></td>
</tr>
<tr>
<td>$\forall C, C\text{ bit} \rightarrow C\text{ bit}$</td>
<td></td>
</tr>
<tr>
<td>$\forall C, C\text{ bit} \rightarrow C\text{ bit}$</td>
<td></td>
</tr>
<tr>
<td>$\forall C, C\text{ bit} \rightarrow C\text{ bit}$</td>
<td></td>
</tr>
<tr>
<td>$\forall C, C\text{ bit} \rightarrow C\text{ bit}$</td>
<td></td>
</tr>
<tr>
<td>$\forall C, C\text{ bit} \rightarrow C\text{ bit}$</td>
<td></td>
</tr>
<tr>
<td>$\forall C, C\text{ bit} \rightarrow C\text{ bit}$</td>
<td></td>
</tr>
<tr>
<td>$\forall C, C\text{ bit} \rightarrow C\text{ bit}$</td>
<td></td>
</tr>
<tr>
<td>$\forall C, C\text{ bit} \rightarrow C\text{ bit}$</td>
<td></td>
</tr>
<tr>
<td>$\forall C, C\text{ bit} \rightarrow C\text{ bit}$</td>
<td></td>
</tr>
</tbody>
</table>

4 Cassiopea Static Typing

This section describes the Cassiopea type system.
and keeping the set of possible regions fixed simplifies a number of things.

**Statements.** Statements do not produce values. We still need both environments, though, so the form of a typing judgment for a statement is \( \Delta, \Gamma \vdash S \). This means that \( S \) is well typed.

**Declarations.** Declarations update the environment. The form of a typing judgment for a declaration is \( \Delta, \Gamma \vdash \text{decl} \gg \Delta', \Gamma' \), and a judgment for a list of declarations has the same form. This means that the declaration (or list) is well typed and produces the new environment on the right. One can think of the \( \gg \) as the tail and head of an arrow with the element transforming the environments labeling the body.

We impose an additional syntactic restriction on declarations found in a machine description (as opposed to the additional declarations that may appear in a spec): they may not use the expression forms that refer to machine state (registers or memory). This is because when defining the machine there is no specific machine state to refer to; any references would need to be quantified. (And that in turn is not allowed to avoid feeding quantifiers to the solver.)

**Machines.** A machine is some declarations followed by some defops, so the typing rule is just sequencing, but there’s a wrinkle: the initial environment for the machine is not an input. \( \Delta_{\text{built-in}} \) is the (fixed) environment describing the
machine which has the form
one that says that a frame (modifies list) is well typed, which
applies an additional list of declarations to a machine,
and programs are only valid relative to a given machine.

The form of a typing judgment for a machine is
⊢ Δ, Γ ⊢ [\[
\Delta, \Gamma \vdash S_1 \quad \Delta, \Gamma \vdash S_2 \\
\Delta, \Gamma \vdash S_1; S_2
\]]

(Statement Typing)

⊢ \Delta, \Gamma \vdash e : \tau_{base} \quad x \notin \Delta, \Gamma \\
\Delta, \Gamma \vdash \Gamma[x \mapsto \tau_{base}] \vdash S
\]

⊢ \Delta, \Gamma \vdash for x \in (C_1 \ldots C_2) do S

(Declaration Typing)

⊢ \Delta, \Gamma \vdash \Gamma \vdash e : bool \\
\Delta, \Gamma \vdash \Gamma \vdash e = e_2
\]

⊢ \Delta, \Gamma \vdash \Gamma \vdash C \mapsto \Gamma \vdash e_1 : C \mapsto \Gamma \vdash e_2 : C \mapsto \Gamma \vdash store(e_1, C_2) \leftarrow e_2
\]

⊢ \Delta, \Gamma \vdash \Gamma \vdash assert(e)
\]

⊢ \Delta, \Gamma \vdash \Gamma \vdash skip
\]

⊢ \Delta, \Gamma \vdash crash
\]

built-in type aliases. (Currently there are none.) \( \Gamma_{\text{builtin}} \) is the
environment describing the types of built-in variables. This
notionally includes the built-in functions. (But as mentioned
earlier some of them actually have their own typing rules.)

The form of a typing judgment for a machine is \( \vdash \) machine \( \triangleright \Delta, \Gamma \). This means that the machine description
is well typed and provides the environment on the right for
use of other constructs that depend on the machine. (Specs
and programs are only valid relative to a given machine.)

Specs. For specifications we need two helper rules, one
that applies an additional list of declarations to a machine,
which has the same form as the judgment on a machine, and
one that says that a frame (modifies list) is well typed, which
has the form \( \Delta, \Gamma \vdash \text{frame} \). This lets us write the real rule,
which has the form \( \text{machine} \vdash \text{spec} \) and means that the spec
is well typed under the machine description.

Programs. A program is a sequence of calls to instructions.
We need judgments of the form \( \Delta, \Gamma \vdash \text{inst} \) for a single
instruction and also \( \Delta, \Gamma \vdash \text{insts} \) for the sequence.
There are two cases for a single instruction because of a
minor glitch in formulation: because the overbar notation
means "one or more", there are two cases in the syntax for
This section defines the semantics of Cassiopea.

**Environment.** The execution environment $\Lambda$ maps Cassiopea variables $x$ to values $v$. However, we take advantage of/abuse the polymorphism and dynamic typing of paper rules to also store the following in the same environment:

- $x_{\text{label}}$ (label names) map to values; specifically each label maps to a pointer that points to the base (offset 0) of the region associated with the label.
- $x_{\text{func}}$ (function names) map to pairs $\{x, e\}$, which give the list of argument names and the body for functions.
- $x_{\text{proc}}$ (procedure names) map to pairs $\{x, S\}$, which give the list of argument names and the body for procedures.
- $x_{\text{op}}$ (operation/instruction names) map to triples $\{x, e, S\}$, which give the list of argument names, the expression for the text form, and the body for operations.
- $r, \text{txt}$ (the form for the text version of a register) maps to a value.

instructions, one for zero operands and one for one or more operands; we need typing rules for both cases. Meanwhile the type entered into $\Gamma$ for a zero-operand instruction is unit to unit, not $\epsilon$ to unit, to avoid needing an additional form for types just for this case. (Notice that a one-operand instruction may not have type unit to unit because unit is not allowed as an instruction operand, so the type is not ambiguous.)

---

(More Declaration Typing)

\[
\begin{align*}
\Delta, \Gamma \vdash & \ e \triangleright \Delta, \Gamma \\
\Delta, \Gamma \vdash & \ defop \triangleright \Delta', \Gamma' \quad \Delta', \Gamma' \vdash \ defops \triangleright \Delta'', \Gamma'' \\
\Delta, \Gamma \vdash & \ defop; \ defops \triangleright \Delta'', \Gamma''
\end{align*}
\]

\[
\begin{align*}
x_{\text{op}} \notin & \ \Delta, \Gamma \\
\Delta, \Gamma \vdash & \ e : \text{string} \\
\Delta, \Gamma \vdash & \ S \quad \Gamma' = \Gamma[x_{\text{op}} \mapsto () \mapsto ()] \\
\Delta, \Gamma \vdash & \ defop \ x_{\text{op}} \{\text{txt} = e, \ sem = S\} \triangleright \Delta, \Gamma'
\end{align*}
\]

(Instruction Typing)

\[
\begin{align*}
\Delta, \Gamma \vdash & \ x_{\text{op}} : (\tau_{\text{base}} \mapsto \tau_{\text{base}}) \\
\Delta, \Gamma \vdash & \ x_{\text{op}} \ e_i \triangleright \tau_{\text{base}} \\
\Delta, \Gamma \vdash & \ e \\
\Delta, \Gamma \vdash & \ inst \quad \Delta, \Gamma \vdash \ insts \\
\Delta, \Gamma \vdash & \ machine \triangleright \Delta, \Gamma \\
\Delta, \Gamma \vdash & \ program
\end{align*}
\]

These rules let us write a judgment for a program, which has the form $machine \triangleright program$ and means that the program is well typed relative to the machine.

**Soundness.** Note that even though we do not check certain things statically, the type system remains sound: we include the necessary checks and failure states in the semantics so that evaluation does not get stuck.

We have a partial but largely complete mechanized proof of soundness for an old version of Cassiopea. We intend to update and release it when and if time permits.

### 5 Cassiopea Semantics

This section defines the semantics of Cassiopea.

- $\Delta, \Gamma \vdash \ x_{\text{reg}} : \ C_i \ \text{reg}$
- $\Delta, \Gamma \vdash \ reg-modify : x_{\text{reg}}$
- $\Delta, \Gamma \vdash \ mem-modify : (x_{\text{mem}}, e_i)$
- $\Delta, \Gamma \vdash \ frame \quad \Delta, \Gamma \vdash \ pre : \ \text{bool}$
- $\Delta, \Gamma \vdash \ post : \ \text{bool}$

(machine $\triangleright \Delta, \Gamma$)

(machine $\triangleright \Delta, \Gamma$)

(machine $\triangleright \Delta, \Gamma$)

(machine $\triangleright \Delta, \Gamma$)
Since identifiers are not allowed to overlap in well-typed programs, and register identities are not strings at all, this usage creates no conflicts.

Note that \( x_{\text{mem}} \), \( x_r \), and \( x_{\text{module}} \) do not appear in \( \Lambda \) as these require no translation at runtime.

**Machine state.** In addition to the execution environment we also need a representation of machine state. We define two stores, one for registers and one for memory. The register store \( \rho \) maps registers \( r \) to values \( v \). The memory store \( \sigma \) is more complicated: it maps pairs \( (x_{\text{mem}}, C) \) (that is, pointer literals) to pairs \( (v, C_j) \), where \( v \) is the value stored at that location and \( C_j \) is the bit width.

The bit widths of memory regions are invariant, both across the region (this is the only way they can be declared) and also over time. They are used to check the access widths appearing in fetch and store operations. Also note that new entries cannot be created in either the register store or the memory store, as real hardware does not permit such actions. The values stored in registers and memory regions are restricted by the typing rules to bitvectors (whether constants or pointers) of the appropriate width.

Notice that stepping through the declarations does not initialize the machine state. We want to reason about executions over ranges of possible starting machine states; so instead we provide a judgment that uses the typing environments to restricts the stores to forms consistent with the declarations. This is discussed further below.

**Expressions.** We describe expressions with a big-step operational semantics. The form of an expression semantic judgment is: \( \Lambda \vdash (\rho, \sigma, e) \Downarrow v \), which means that given the environment \( \Lambda \) and the machine state \( \rho, \sigma \), the expression \( e \) evaluates to the value \( v \). Expressions may refer to the machine state, but not modify it.

Expressions can fail; in addition to the explicit failure cases seen, some of the operators and built-in functions can fail. For example, as mentioned earlier, attempting bitvector arithmetic other than addition and subtraction on pointers will fail. Furthermore, division by zero fails.

Note that we currently do not statically check (in the typing rules) that the .txt form is present for every register, or that it is defined for registers on which it is used. Thus we have an explicit failure rule for when no matching declaration has been seen. We also have failure rules for bad fetch operations: if the length annotation is wrong, if the pointer is not in the machine state (this covers both unaligned accesses and out of bounds accesses), or if the value used is not a pointer. Similarly, we have failure rules for when bit indexing/slicing a pointer. (We do not, conversely, need explicit failure checks or rules for the bit indexes in the bit extraction/slicing constructs as they are statically checked.)

Also note that we include in the semantics the obvious failure propagation rules for when subexpressions fail. We do not show these explicitly as they are not particularly interesting or informative.
Statements. Unlike expressions, statements can change machine state. Thus, the form of a machine state semantics judgment (also large step) is $\Lambda \vdash (\rho, \sigma, S) \Downarrow (\rho', \sigma', S')$. This means that the statement $S$ evaluates to the irreducible statement $S'$ (which must be either skip or crash) and in the course of doing so changes the machine state from $\rho, \sigma$ to $\rho', \sigma'$.
As with expressions, statements can fail. Explicit failure rules are shown for bad stores (corresponding to the cases for bad fetches) and for a failed assertions. We also similarly include, but do not show, the obvious failure propagation rules for cases where sub-statements, or expressions within statements, fail.

**Declarations.** The semantics for declarations have judgments of the form \( \Gamma, (\rho, \sigma) \vdash \text{decl} \triangleright \Gamma' \). This means that the given declaration updates \( \Gamma \) as shown.

As stated above, we do not *initialize* the machine state while handling declarations; this instead allows us to work with arbitrary (or universally quantified) machine states afterwards. However, because the let-binding declaration evaluates an expression, it potentially needs access to a machine state. Consequently we write the rules so they accept a machine state as input, but do not update it. In the case of machine descriptions, where there is no machine state, we pass empty environments; let declarations in machine

As stated above, we do not *initialize* the machine state while handling declarations; this instead allows us to work with arbitrary (or universally quantified) machine states afterwards. However, because the let-binding declaration evaluates an expression, it potentially needs access to a machine state. Consequently we write the rules so they accept a machine state as input, but do not update it. In the case of machine descriptions, where there is no machine state, we pass empty environments; let declarations in machine descriptions are not allowed to reference machine state. In the case of the additional declarations that accompany a specification, we pass in the initial machine state; this allows values from the initial machine state to be globally bound so they can be referred to in the postcondition.

We give first the rules for a list of declarations (which chain in the obvious way), then the rules for the various declarations, then the rules for a list of operation definitions (which chain in the same way) and a rule for a single operation definition.

Note that several of the declarations do not update \( \Lambda \), and nothing is placed in \( \Lambda \) for memory regions. (And for registers, only the mapping of the identifier to its underlying register \( r \) is entered; nothing for \( r \) is inserted.)

**Machines.** Like with the typing rules, the semantics rule for a whole machine description bakes in the initial environment and gives a judgment of the form \( \vdash \text{machine} \triangleright \Gamma' \).

We also include a comparable form that includes additional declarations, as it will be used below.

**Programs.** Instructions update the machine state, and we chose to represent programs as lists of instructions (rather than having dummy instruction forms for skip and sequence) so the form of the judgments is slightly different: \( \Gamma, (\rho, \sigma, \text{inst}) \vdash (\rho', \sigma') \) means that the instruction executes and updates the machine state \( \rho, \sigma \) to \( \rho', \sigma' \). The judgments for lists of programs has the same form.

There are two versions of the judgment for instructions because instructions with no arguments are declared as taking unit, but invoked with empty operands (not with unit) to correspond to the way assembly languages normally work.

We include a final judgment of the form \( \vdash \text{machine} \triangleright (\rho, \sigma, \text{program}) \rightarrow (\rho', \sigma') \) that puts the machine on the left-hand side of the turnstile. It means that under a given machine the program maps \( \rho, \sigma \) to \( \rho', \sigma' \).

There is a limitation in the way we have formulated programs and the rules for programs, which is that there is no

<table>
<thead>
<tr>
<th>(Declaration Semantics)</th>
</tr>
</thead>
<tbody>
<tr>
<td>( \Lambda, (\rho, \sigma) \vdash e \triangleright \Lambda )</td>
</tr>
<tr>
<td>( \Lambda, (\rho, \sigma) \vdash \text{decl} \triangleright \Lambda' \quad \Lambda', (\rho, \sigma) \vdash \text{decls} \triangleright \Lambda'' )</td>
</tr>
<tr>
<td>( \Lambda, (\rho, \sigma) \vdash \text{type} x \rightarrow \tau_{\text{base}} \triangleright \Lambda )</td>
</tr>
<tr>
<td>( \Lambda, (\rho, \sigma) \vdash \text{let} x : \tau_{\text{base}} = e \triangleright \Lambda[x \mapsto v] )</td>
</tr>
<tr>
<td>( \Lambda \vdash (\rho, \sigma, e) \parallel v )</td>
</tr>
<tr>
<td>( \Lambda, (\rho, \sigma) \vdash \text{let} x.\text{txt} = e \triangleright \Lambda[r.\text{txt} \mapsto v] )</td>
</tr>
<tr>
<td>( \Lambda' = \Lambda[x_{\text{func}} \mapsto {x_i, e}] )</td>
</tr>
<tr>
<td>( \Lambda, (\rho, \sigma) \vdash \text{def} x_{\text{func}} x_i : \tau_{\text{base}} \rightarrow \tau_{\text{base}} = e \triangleright \Lambda' )</td>
</tr>
<tr>
<td>( \Lambda' = \Lambda[x_{\text{func}} \mapsto {x_i, S}] )</td>
</tr>
<tr>
<td>( \Lambda, (\rho, \sigma) \vdash \text{proc} x_{\text{func}} x_i : \tau_{\text{base}} \rightarrow () = S \triangleright \Lambda' )</td>
</tr>
<tr>
<td>( \Lambda' = \Lambda[x \mapsto r] \quad r \text{ fresh} )</td>
</tr>
<tr>
<td>( \Lambda, (\rho, \sigma) \vdash \text{letstate} x : \tau_{\text{reg}} \triangleright \Lambda' )</td>
</tr>
<tr>
<td>( \Lambda' = \Lambda[x_{\text{label}} \mapsto (x_{\text{mem}}, 0)] )</td>
</tr>
<tr>
<td>( \Lambda, (\rho, \sigma) \vdash \text{letstate} x_{\text{mem}} : \tau_{\text{mem}} \text{ with } x_{\text{label}} \triangleright \Lambda' )</td>
</tr>
<tr>
<td>( \Lambda, (\rho, \sigma) \vdash e \triangleright \Lambda )</td>
</tr>
<tr>
<td>( \Lambda, (\rho, \sigma) \vdash \text{defop} \triangleright \Lambda' \quad \Lambda', (\rho, \sigma) \vdash \text{defops} \triangleright \Lambda'' )</td>
</tr>
<tr>
<td>( \Lambda, (\rho, \sigma) \vdash \text{defop; defops} \triangleright \Lambda' )</td>
</tr>
<tr>
<td>( \Lambda' = \Lambda[x_{\text{op}} \mapsto {\ldots, e, S}] )</td>
</tr>
<tr>
<td>( \Lambda, (\rho, \sigma) \vdash \text{defop x_{op} {txt = e, sem = S}} \triangleright \Lambda' )</td>
</tr>
<tr>
<td>( \Lambda' = \Lambda[x_{\text{op}} \mapsto {x_i, e, S}] )</td>
</tr>
<tr>
<td>( \Lambda, (\rho, \sigma) \vdash \text{defop x_{op} x_{i} : \tau_{\text{base}}, {\text{txt} = e, \text{sem} = S}} \triangleright \Lambda' )</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>(Machine Semantics)</th>
</tr>
</thead>
<tbody>
<tr>
<td>( \Lambda_{\text{builtin}}(({}), ({})) \vdash \text{decls} \triangleright \Lambda \quad \Lambda \vdash \text{defops} \triangleright \Lambda' )</td>
</tr>
<tr>
<td>( \vdash \text{decls; defops} \triangleright \Lambda' )</td>
</tr>
</tbody>
</table>
easy way to represent failure. (Failure in this might represent triggering an exception and stopping execution, which we
do not model, or invoking “unpredictable” or “undefined”
behavior in the processor and transitioning to an arbitrary
unknown machine state.)

The intended behavior is that a program that fails during
execution (that is, the body of one of its instructions steps to
i
crash) enters a state where no postcondition can evaluate
to true. We have decided for the moment that working
this explicitly into the formalism would result in a lot of
complication and obscuration without providing any useful
information.

Specifications. For specifications we need three judg-
ments: the first two state what the reg-modify and
mem-modify clauses mean, respectively (they are properties
on initial and final register and memory states), and the last
one says what it means for a program to satisfy a specifi-
cation.

Note that the reg-modify and mem-modify rule as shown
is slightly misleading, because the register and pointer list
actually written in the input file is implicitly augmented with
all registers and pointers mentioned in the postcondition
before it gets to this point.

Machine state validity. As discussed above we do not
initialize the machine state while processing declaration.
Instead we treat the starting machine state as an input (e.g.
in the final judgment about programs) or quantify it universal-
ly (as in the specification judgment). In order to do this we must,
however, have a predicate to reject machine states that do
not match the machine description.

The validity judgment has the form \( \Delta, \Gamma, \Delta \vdash \rho \), and corre-
spondingly for \( \sigma \) (except without \( \Lambda \)) and then for \( \rho, \sigma \) (both
stores at once). It means that the given stores match the
given environments.

We will use this with the typing environments that come
from both the machine description and the additional decla-
rations arising from a specification. (Recall that memory
regions are normally defined by specifications and not by
machines.)

In the case of registers we need access to \( \Lambda \) in order to
handle the names of registers. We do not use the \( \Lambda \) generated
from the additional declarations in a specification; this avoids
circularity. We can get away with this because specifications
are not allowed to define new registers.

For memory regions we need to enumerate the valid off-
sets for the region (note the literal 8 that hardwires 8-bit
bytes) and check the cell width.
6 Alewife Overview

This section describes Alewife, our specification language for writing abstracted machine-independent specifications of low-level code.

Alewife specifications are abstractions of machine-level Cassiopea specifications; we say that Cassiopea constructs are lifted into Alewife and Alewife constructs are lowered into Cassiopea.

Alewife is only for specifications, so there are no statements, no updates, and no notion of instructions or programs.

**Notation.** We use the following metavariables:

- \( x, y, z \) Program variables (binders)
- \( r \) Registers (abstract)
- \( C \) Integer constants (written in decimal)
- \( \texttt{0b}C \) Bitvector constants (written in binary)
- \( N \) Symbolic integer constants
- \( \tau \) Types
- \( v \) Values
- \( e \) Expressions
- \( i, j \) Rule-level integers

(Other constructions are referred to with longer names.)

As noted previously, Alewife types and expressions should be considered distinct from Cassiopea ones (even where they correspond directly). We use the same letters in the hopes that this will cause less confusion (even in the definition of the translation) than picking an entirely different set of letters for Alewife.

**Identifiers and variables.** In Alewife there are six syntactic categories of identifiers:

- Like in Cassiopea, \( x_{\text{mem}} \) name memory regions and \( x_{\text{label}} \) are assembler labels. \( x_{\text{func}} \) name functions, and \( x_{\tau} \) are type aliases.
- \( x_{\text{block}} \) are the names given to blocks of code to be synthesized, which are used to match up with the module names in Cassiopea mappings.

Other \( x \) are ordinary variables that range over other things, and may be presumed to not range over the above reserved categories.

All variables are immutable, in the sense that they do not change once bound.

**Symbolic Constants.** In some places in Alewife symbolic constants \( N \) are permitted to occur in places where only integer constants are allowed in the corresponding Cassiopea constructions. In particular, the bit sizes associated with types (and the lengths of memory regions, which are functionally similar) may be given as symbolic values \( x \) instead of integer constants. These must be bound to integer constants either directly in the Alewife spec, in the Cassiopea mapping, or by the Cassiopea machine description. This allows the concrete sizes of bitvectors to vary depending on the machine architecture.

Note: for compatibility with older versions of Alewife the symbolic constant \texttt{word} is recognized and replaced with \texttt{wordsize}. New specifications should not use \texttt{word} as a symbolic constant, as it is properly expected to be a type name.

**Types.** Like in Cassiopea, Alewife types are divided syntactically into base types and others. The chief difference from Cassiopea is that bit widths (and the lengths of memory regions) can be symbolic constants. However, an additional difference is that pointers (\texttt{ptr}) are distinguished from plain bitvectors (\texttt{vec}). This is reasonably possible in Alewife because it need not reason about the progression of values through machine registers (only before and after states)... but also currently pointless as we do not define a static type-check for Alewife.

Strings and unit are also absent, as they are not needed for specifications.

As syntactic sugar and for compatibility with an earlier version of Alewife, if no size is given \texttt{wordsize} is implicitly inserted. This applies to all the base types but not \( r_{\text{mem}} \), which worked differently then.

**Values and expressions.** The values in Alewife correspond directly to the values in Cassiopea. Likewise for operators and most expressions. Note that the width argument of fetch can be a symbolic size.

The notable change is the addition of bounded quantification. \texttt{forall} and \texttt{exists} are allowed to quantify over finite sets of whatever type; these are expanded to logical and/or expressions (respectively) in Cassiopea. This is useful when the set is abstract at the Alewife level and provided by the

\[
(N := C \mid x)
\]

\[
(\text{Alewife Types})
\]

\[
\begin{align*}
\tau & := \\
& \tau_{\text{base}} \mid \tau_{\text{mem}} \mid \tau_{\text{func}} \\
\tau_{\text{base}} & := \text{int} \mid \text{bool} \mid x_{\tau} \\
& \mid N \text{ vec} \mid N \text{ ptr} \mid N \text{ reg} \mid N \text{ label} \mid \tau_{\text{regs}} \\
\tau_{\text{regs}} & := N \text{ reg set} \\
\tau_{\text{mem}} & := N_{1} \text{ bit} N_{2} \text{ len} N_{3} \text{ ref} \\
\tau_{\text{func}} & := \tau_{\text{base}}, \quad \rightarrow \tau_{\text{base}}
\end{align*}
\]

\[
(\text{Alewife Values})
\]

\[
\begin{align*}
v & := \text{true} \mid \text{false} \mid C \mid \texttt{0b}C \mid (x_{\text{mem}}, C) \\
& \mid \text{fail}
\end{align*}
\]
mapping definitions or the machine descriptions. (For example, the set of callee-save registers might often appear here.)

Declarations. Alewife declarations come in two forms: require and provide. The second form declares elements in the ordinary way.

The first form declares an element that must be provided by the Cassiopea mappings (or the machine description). (So the type is given, but not the value. This functions as a form of import, and allows an Alewife file to be checked on its own separately from any particular machine description or Cassiopea mappings. However, we do not currently define or implement such a check.)

Note that it is possible to require functions that implicitly depend on machine state, or that depend on machine state on some machines and not others. Such functions can also depend on constants or other elements that are not visible in the Alewife specification at all.

The region declarations declare memory regions, like the memory-typed let state declarations in Cassiopea. (These are implicitly always provide, because for memory regions the corresponding require declaration would be entirely equivalent but require cutpaste in the Cassiopea mapping.)

Note that the parameters of the region can be symbolic constants if abstraction is needed.

Block-lets. While Alewife expressions include let-bindings, the scope of those let-bindings is conventional: it lasts until the end of the expression. In order to refer to values taken from the initial state (that is, the machine state of which the precondition must be true) we need a way to bind these values so their scope extends to the postcondition. The block-lets serve this purpose in Alewife, much like the additional declarations seen in Cassiopea specs can. These are found within a block (because a block corresponds to a synthesis problem, it is meaningful to associate before and after machine states with it) and the scope is the entire block.
Frames. Frame declarations in Alewife are exactly the same as in Cassiopea. Because Alewife files are machine-independent, the registers mentioned must be abstract and concretized via the Cassiopea mappings.

Blocks and specs. As just noted, a block is a single synthesis problem, and a full spec is that plus a preamble of declarations. (It used to be possible to place multiple related blocks together in a single Alewife specification, which would produce one Cassiopea specification for each block. This was removed because it caused unspecified problems in the implementation, as well as complications generating output because Cassiopea only allows one spec at a time. Common declarations can now be shared with include.)

As noted elsewhere, the block name is used to look up the Cassiopea mapping module to apply.

7 Alewife Typing and Semantics

We do not provide (or implement) a full typechecking pass for Alewife. Instead, when we lower to Cassiopea, we allow the Cassiopea typechecker to reject invalid results. (Invalid results might be caused by invalid Alewife input, or by bad/mismatched Cassiopea mapping definitions.)

The rules provided here are for doing scans over the declarations sufficient to make the translation to Cassiopea work and no more.

Environments. We retain the Cassiopea typing environments, $\Delta, \Gamma$. We add an additional environment $\Sigma$, which maps identifiers to integer constants. This is a projection of the Cassiopea execution environment $\Lambda$: it holds mappings only for variables defined as integer constants and excludes everything else. We include a separate set of rules for extracting these integer constants without doing a full Cassiopea execution. (Among other things, this avoids involving machine state or the machine state stores.)

Translation. The translation (lowering) from Alewife to Cassiopea, defined in the next section, appears cross-recursively in the rules in this section.

Because $\Delta, \Gamma$ are Cassiopea environments, they map identifiers to Cassiopea types, not Alewife ones. This means Alewife types must be lowered on the fly in order to update them correctly.

Integer constant extraction. The integer constant extraction rules do a simple pass over Cassiopea declarations to extract the variables defined as integer constants. These populate a substitution environment $\Sigma$ that we use for lowering Alewife types containing symbolic constants.

These rules are judgments of the form $\Sigma \vdash\Sigma'$ or $\Sigma \vdash\Sigma'$, plus one of the form $\vdash machine \vdash\Sigma$ for a whole machine description.

Typing. The declaration typing rules are intended to accumulate types for all the declarations in an Alewife spec. They are applied concurrently with the Cassiopea declaration rules to the Alewife specification, the Cassiopea machine description, and the Cassiopea mapping. (How this is made to happen is described below.)

The declaration typing rules have judgments of the form $\Delta, \Gamma, \Sigma \vdash decl \vdash\Delta', \Gamma', \Sigma'$ and $\Delta, \Gamma, \Sigma \vdash decls \vdash\Delta', \Gamma', \Sigma'$. These mean that the declaration or declarations update the type environment (and integer constant environment) as shown.

Note that there is a special-case rule for provide value for when the value is an integer constant; this enters the constant into $\Sigma$. The integer constants are in turn used when lowering the types of memory regions, which can be seen in the last two rules.

Block-lets. The rules for block-lets are effectively the same as the rules for declarations. The ways in which block-lets are special mostly do not apply here. Note however that even though we pass through $\Sigma$ (for consistency of the form of the rules) there is no rule for loading integer constants.
(Alewife Declaration Typing)

\[
\Delta, \Gamma, \Sigma \vdash decl \triangleright \Delta', \Gamma', \Sigma' \quad \Delta', \Gamma', \Sigma \vdash decls \triangleright \Delta'', \Gamma'', \Sigma''
\]

\[
\Delta \vdash_{\text{wf}} x_r \quad \Delta, \Gamma, \Sigma \vdash \text{require type } x_r \triangleright \Delta', \Gamma', \Sigma
\]

\[
\Delta, \Gamma, \Sigma \vdash \text{AC}[\tau_{\text{base}}] = \tau \quad \Gamma(x) = \tau
\]

\[
\Delta, \Gamma, \Sigma \vdash \text{require value } x: \tau_{\text{base}} \triangleright \Delta', \Gamma', \Sigma
\]

\[
\Delta, \Gamma, \Sigma \vdash \text{AC}[\tau_{\text{func}}] = \tau \quad \Gamma(x_{\text{func}}) = \tau
\]

\[
\Delta, \Gamma, \Sigma \vdash \text{require func } x_{\text{func}}: \tau_{\text{func}} \triangleright \Delta', \Gamma', \Sigma
\]

\[
\Delta, \Gamma, \Sigma \vdash \text{let } x: \tau_{\text{base}} = e \triangleright \Delta', \Gamma', \Sigma
\]

\[
\Delta, \Gamma, \Sigma \vdash \text{AC}[\tau_{\text{spec new}}] = \tau 
\]

( Alewife Block Typing )

\[
\Delta, \Gamma, \Sigma \vdash e \triangleright \Delta', \Gamma', \Sigma
\]

\[
\Delta, \Gamma, \Sigma \vdash \text{block-let} \triangleright \Delta', \Gamma', \Sigma'
\]

\[
\Delta', \Gamma', \Sigma \vdash \text{block-lets} \triangleright \Delta'', \Gamma'', \Sigma''
\]

\[
\Delta, \Gamma, \Sigma \vdash \text{AC}[\tau_{\text{base}}] = \tau 
\]

\[
\Delta, \Gamma, \Sigma \vdash \text{ac}[\tau_{\text{func}}] = e' \quad \Delta, \Gamma \vdash e': \tau \quad \Gamma' = \Gamma[x \mapsto \tau]
\]

\[
\Delta, \Gamma, \Sigma \vdash \text{let } x: \tau_{\text{base}} = e \triangleright \Delta', \Gamma', \Sigma
\]

( Alewife Spec Semantics )

\[
\text{module } = \text{module } x_{\text{module}} \{ \text{decls}_{\text{map}}, \text{frame}_{\text{map}} \}
\]

\[
\text{spec} = \text{decls}_{\text{spec}}; \text{block } x_{\text{block}} \{ \text{block-lets}; \text{frame}_{\text{spec}}; \text{pre}; \text{post} \}
\]

\[
\text{x}_{\text{module}} = x_{\text{block}} \quad \text{frame} = \text{frame}_{\text{spec}} \cup \text{frame}_{\text{map}}
\]

\[
\text{spec}_{\text{new}} = \text{decls}_{\text{spec}}; \text{block-lets}; \text{frame}; \text{pre}; \text{post}
\]

\[
+ \text{machine} \triangleright \Delta_0, \Gamma_0
\]

\[
+ \text{machine} \triangleright \Sigma_0 \land (\Delta_0 \subseteq \Delta) \land (\Gamma_0 \subseteq \Gamma) \land (\Sigma_0 \subseteq \Sigma)
\]

\[
\Delta, \Gamma \vdash \text{decls}_{\text{map}} \triangleright \Delta, \Gamma \quad \Sigma \vdash \text{decls}_{\text{map}} \triangleright \Sigma
\]

\[
\Delta, \Gamma, \Sigma \vdash \text{block-lets} \triangleright \Delta, \Gamma, \Sigma
\]

\[
\Delta, \Gamma, \Sigma \vdash \text{AC}[\text{spec}_{\text{new}}] = \Omega
\]

\[
\text{machine}, \text{module}, \text{spec} \triangleright \Omega
\]

\[
\text{into } \Sigma \text{ from block-lets. Integer constants used in types and}
\text{ defined in the Alewife spec should be defined with provide}
\text{ value; block-lets are intended to provide access to machine}
\text{ state.}
\]

\textbf{Specs.} The spec-level rule is enormous and difficult to read, so the next few paragraphs walk through it.

The conclusion is that a given machine, mapping module, and Alewife specification produce a final translation output \( \Omega \).

The first two premises expand the mapping module and Alewife specification as we’ll need to work with the components. Then we require that these match (by name), combine the frame declarations, and generate a new spec updated with the additional frame declarations that we’ll feed to the translation.

The next two premises generate initial environments: the Cassiopea typing environments induced by the machine description, and its integer constants.

We then nondeterministically/magically (by the power of paper rules) choose a final set of environments \( \Gamma, \Delta, \Sigma \) that represents a fixpoint of collecting all the declarations. These contain at least everything in the initial environments, but also contain everything needed such that going through, respectively, the Cassiopea declarations from the mapping
file (both types and constants); the Alewife declarations; and the Alewife block-lets does not add anything additional.

The final environments can then be used to run the translation on the entire spec and get the final output.

Such an evaluation strategy for declarations is required because the Alewife declarations rely on the Cassiopea mapping file (most notably for resolving symbolic constants) but the Cassiopea mapping file is in turn also specifically allowed to refer to objects declared by the Alewife spec, such as memory regions. In the implementation this circularity is resolved by lifting both the Cassiopea and Alewife declarations (and block-lets) into a common representation and topologically sorting them based on identifier references. (Genuinely circular references among identifiers are prohibited.) From this point they can be handled in order in a more conventional way.

**Complete output.** Note that the complete output file includes the declarations from the Cassiopea mapping module (declsmap) as well as the translated Alewife spec \( \Omega \). Apart from symbolic constants we do not substitute the definitions of the mapping elements, as that would make a rather large mess, especially with functions; instead we include the definitions and let the translation refer to them. In fact, because of the declaration ordering issues, in the implementation the complete output the mapping declarations and translated Alewife declarations can be arbitrarily interleaved.

Note furthermore that it would not be sufficient to include only the mapping declarations explicitly imported with `require` declarations, as those may refer freely to other things declared in the mapping module that the Alewife spec itself may have no cognizance of whatsoever.

### Alewife – Cassiopea Type Translation

\[
\begin{align*}
\Delta, \Gamma, \Sigma & \vdash \text{AC}[N] = \begin{cases} 
C & N = C \\
\Sigma(x) & N = x \land x \in \Sigma \\
\bot & N = x \land x \notin \Sigma
\end{cases} \\
\Delta, \Gamma, \Sigma & \vdash \text{AC}[x] = \begin{cases} 
\Delta(x_r) & x_r \in \Delta \\
\bot & x_r \notin \Delta
\end{cases} \\
\Delta, \Gamma, \Sigma & \vdash \text{AC}[\text{int}] = \text{int} \\
\Delta, \Gamma, \Sigma & \vdash \text{AC}[\text{bool}] = \text{bool} \\
\Delta, \Gamma, \Sigma & \vdash \text{AC}[N \text{ vec}] = \text{AC}[N] \text{ bit} \\
\Delta, \Gamma, \Sigma & \vdash \text{AC}[N \text{ ptr}] = \text{AC}[N] \text{ bit} \\
\Delta, \Gamma, \Sigma & \vdash \text{AC}[N \text{ reg}] = \text{AC}[N] \text{ reg} \\
\Delta, \Gamma, \Sigma & \vdash \text{AC}[N \text{ reg set}] = \text{AC}[N] \text{ reg set} \\
\Delta, \Gamma, \Sigma & \vdash \text{AC}[N \text{ bit}] = \text{AC}[N] \text{ bit} \\
\Delta, \Gamma, \Sigma & \vdash \text{AC}[N \text{ len}] = \text{AC}[N] \text{ len} \\
\Delta, \Gamma, \Sigma & \vdash \text{AC}[N \text{ ref}] = \text{AC}[N] \text{ ref} \\
\end{align*}
\]

\[
\begin{align*}
\text{AC}[\tau_{\text{base}} \rightarrow \tau_{\text{base}}] & = \text{AC}[\tau_{\text{base}}] \rightarrow \text{AC}[\tau_{\text{base}}]
\end{align*}
\]

### 8 Lowering Alewife

The semantics of an Alewife specification depend on material taken from a Cassiopea mapping and machine description. This does not preclude defining a semantics for Alewife in terms of that material, or even some abstracted concept of what any such Cassiopea material might be. However, doing so is messy (as can be seen from the material in the previous section, which does not even attempt to handle expression evaluation) and not necessarily very rewarding or illuminating.

So instead, as mentioned already, we write only enough typing rules to prepare material for writing a translation (lowering) to Cassiopea, and apply the Cassiopea typing (and, implicitly, semantics) to the lowered material. That material goes into the Cassiopea typing environments \( \Delta, \Gamma \), and as discussed in the previous section, we also maintain an additional environment \( \Sigma \) of integer constants used for substituting symbolic constants in types.

This section defines the translation. \( \text{AC}[a] \) defines the Cassiopea lowering of an Alewife element \( a \). We make the translation polymorphic over the various kinds of element; that is, \( \text{AC}[\tau] \) is the translation of a type, \( \text{AC}[e] \) is the translation of an expression, etc. Some of the translation rules rely on the environments; these are written \( \Delta, \Gamma, \Sigma \vdash \text{AC}[a] \).

Some of the translation rules produce \( \bot \). If these are reached, the translation fails; this can happen if the Alewife spec was malformed and also potentially if the mapping module failed to declare elements that were expected of it, or declared them in an incompatible or inconsistent way. The rules in the previous section exclude some of these cases, but we are not (yet) prepared to argue that they rule out all translation-time failures.
Alewife - Cassiopea Expression Translation

\[ \Delta, \Gamma, \Sigma \vdash AC[\llbracket x \rrbracket] = \begin{cases} \text{true} & x \in \Gamma \\ \bot & x \notin \Gamma \end{cases} \]

\[ AC[\llbracket \text{true} \rrbracket] = \text{true} \]
\[ AC[\llbracket \text{false} \rrbracket] = \text{false} \]
\[ AC[\llbracket C \rrbracket] = C \]
\[ AC[\llbracket 0bC \rrbracket] = 0bC \]
\[ AC[\llbracket \text{fail} \rrbracket] = \text{fail} \]

\[ \Delta, \Gamma, \Sigma \vdash AC[\llbracket x_{\text{func}}(\overline{e}) \rrbracket] = \begin{cases} x_{\text{func}}(AC[\llbracket e \rrbracket]) & x_{\text{func}} \in \Gamma \\ \bot & x_{\text{func}} \notin \Gamma \end{cases} \]

\[ AC[\llbracket \text{unop } e \rrbracket] = \text{unop } AC[\llbracket e \rrbracket] \]
\[ AC[\llbracket e_1 \text{ binop } e_2 \rrbracket] = AC[\llbracket e_1 \rrbracket \text{ binop } AC[\llbracket e_2 \rrbracket] \]
\[ AC[\llbracket e[C] \rrbracket] = AC[\llbracket e \rrbracket][C] \]
\[ AC[\llbracket e[C_1, C_2] \rrbracket] = AC[\llbracket e \rrbracket][C_1, C_2] \]
\[ AC[\llbracket \text{let } x : \tau_{\text{base}} = e_1 \text{ in } e_2 \rrbracket] = AC[\llbracket e_1 \rrbracket] \text{ in } AC[\llbracket e_2 \rrbracket] \]
\[ AC[\llbracket \text{if } e_1 \text{ then } e_2 \text{ else } e_3 \rrbracket] = \text{if } AC[\llbracket e_1 \rrbracket] \text{ then } AC[\llbracket e_2 \rrbracket] \text{ else } AC[\llbracket e_3 \rrbracket] \]

\[ \Delta, \Gamma, \Sigma \vdash AC[\llbracket x_{\text{mem}}, e \rrbracket] = \begin{cases} (x_{\text{mem}}, AC[\llbracket e \rrbracket]) & x_{\text{mem}} \in \Gamma \\ \bot & x_{\text{mem}} \notin \Gamma \end{cases} \]

\[ \Delta, \Gamma, \Sigma \vdash AC[\llbracket x_{\text{label}} \rrbracket] = \begin{cases} x_{\text{label}} & x_{\text{label}} \in \Gamma \\ \bot & x_{\text{label}} \notin \Gamma \end{cases} \]

\[ AC[\llbracket e \rrbracket] = \text{"}AC[\llbracket e \rrbracket]\text{"} \]
\[ AC[\llbracket \text{fetch}(e, N) \rrbracket] = \text{fetch}(AC[\llbracket e \rrbracket], AC[\llbracket N \rrbracket]) \]
\[ AC[\llbracket x_1, \ldots, x_k \rrbracket] = \{ AC[\llbracket x_1 \rrbracket], \ldots, AC[\llbracket x_k \rrbracket] \} \]
\[ AC[\llbracket ||e|| \rrbracket] = ||AC[\llbracket e \rrbracket||] \]
\[ AC[\llbracket e_1 \in e_2 \rrbracket] = AC[\llbracket e_1 \rrbracket] \in AC[\llbracket e_2 \rrbracket] \]

Notice that the translations for require declarations are empty; as mentioned in the previous section, this is because the declarations from the mapping module are output along with the translated Alewife specification.

9 Conclusion

In this technical report, we described two domain specific languages involved in the Aquarium kernel synthesis project. First, we presented a machine modeling language named Cassiopea that can be used to describe the semantics of many different processor ISAs at the assembly language level. Then, we presented a specification language named Alewife that allows stating abstract specifications for blocks of assembly code, such that these abstract specifications can be lowered to concrete specifications and used for synthesis and verification against Cassiopea machine descriptions.

We note that this is work in progress, and does not yet present a final or complete view of either the Aquarium system or the languages presented.

References

Alewife – Cassiopea Block Translation

\[
\begin{align*}
\mathcal{AC} \llbracket \text{let } x : \tau_{\text{base}} = e \rrbracket & = \text{let } x : \mathcal{AC} \llbracket \tau_{\text{base}} \rrbracket = \mathcal{AC} \llbracket e \rrbracket \\
\mathcal{AC} \llbracket \text{reg-modify } x_{\text{reg}} \rrbracket & = \text{reg-modify } \mathcal{AC} \llbracket x_{\text{reg}} \rrbracket \\
\mathcal{AC} \llbracket \text{mem-modify } (x_{\text{mem}}, e_i) \rrbracket & = \text{mem-modify } (\mathcal{AC} \llbracket x_{\text{mem}} \rrbracket, \mathcal{AC} \llbracket e_i \rrbracket)
\end{align*}
\]

Alewife – Cassiopea Declaration Translation

\[
\begin{align*}
\Delta, \Gamma, \Sigma \vdash \mathcal{AC} \llbracket \text{require type } x_{\tau} \rrbracket & = \begin{cases} 
\epsilon & x_{\tau} \in \Delta \\
\bot & x_{\tau} \notin \Delta 
\end{cases} \\
\Delta, \Gamma, \Sigma \vdash \mathcal{AC} \llbracket \text{require value } x : \tau_{\text{base}} \rrbracket & = \begin{cases} 
\epsilon & x \in \Gamma \\
\bot & x \notin \Gamma 
\end{cases} \\
\Delta, \Gamma, \Sigma \vdash \mathcal{AC} \llbracket \text{require func } x_{\text{func}} : \tau_{\text{func}} \rrbracket & = \begin{cases} 
\epsilon & x_{\text{func}} \in \Gamma \\
\bot & x_{\text{func}} \notin \Sigma 
\end{cases}
\end{align*}
\]

\[
\begin{align*}
\mathcal{AC} \llbracket \text{provide type } x_{\tau} = \tau \rrbracket & = \text{type } x_{\tau} = \mathcal{AC} \llbracket \tau \rrbracket \\
\mathcal{AC} \llbracket \text{provide value } x : \tau_{\text{base}} = e \rrbracket & = \text{let } x : \mathcal{AC} \llbracket \tau_{\text{base}} \rrbracket = \mathcal{AC} \llbracket e \rrbracket \\
\mathcal{AC} \llbracket \text{provide func } x_{\text{func}} : x_{\tau_{\text{base}}} \rightarrow \tau_{\text{base}} = e \rrbracket & = \text{def } x_{\text{func}} \llbracket x_{\text{func}} : \mathcal{AC} \llbracket \tau_{\text{base}} \rrbracket \rightarrow \mathcal{AC} \llbracket \tau_{\text{base}} \rrbracket = \mathcal{AC} \llbracket e \rrbracket \\
\mathcal{AC} \llbracket \text{region } x_{\text{mem}} : \tau_{\text{mem}} \rrbracket & = \text{letstate } x_{\text{mem}} : \mathcal{AC} \llbracket \tau_{\text{mem}} \rrbracket \\
\mathcal{AC} \llbracket \text{region } x_{\text{mem}} : \tau_{\text{mem}} \text{ with } x_{\text{label}} \rrbracket & = \text{letstate } x_{\text{mem}} : \mathcal{AC} \llbracket \tau_{\text{mem}} \rrbracket \text{ with } x_{\text{label}}
\end{align*}
\]