Specifying the Caltech asynchronous microprocessor

R.J.R. Back a,l, A.J. Martin b, K. Sere c,l,*

a Åbo Akademi University, Department of Computer Science, FIN-20520 Turku, Finland
b California Institute of Technology, Department of Computer Science, Pasadena CA 91125, USA
c University of Kuopio, Department of Computer Science and Applied Mathematics, FIN-70211 Kuopio, Finland

Abstract

The action systems framework for modelling parallel programs is used to formally specify a microprocessor. First the microprocessor is specified as a sequential program. The sequential specification is then decomposed and refined into a concurrent program using correctness-preserving program transformations. Previously this microprocessor has been specified at Caltech, where an asynchronous circuit for the microprocessor was derived from the specification. We propose a specification strategy that is based on the idea of spatial decomposition of the program variable space.

1. Introduction

An action system is a parallel or distributed program where parallel activity is described in terms of events, so-called actions. The actions are atomic: if an action is chosen for execution, it is executed to completion without any interference from the other actions in the system. Several actions can be executed in parallel, as long as the actions do not share any variables. Atomicity guarantees that a parallel execution of an action system gives the same results as a sequential and non-deterministic execution.

A recent extension of the action system framework, adding procedure declarations to action systems [6], gives us a very general mechanism for synchronized communication between action systems. When an action in one action system calls a procedure in another action system, the effect is that of a remote procedure call. The calling action and the procedure body involved in the call are each executed as a single atomic entity.

The use of action systems permits the design of the logical behaviour of a system to be separated from the issue of how the system is to be implemented. The decision whether the action system is to be executed in a sequential or parallel fashion can

* Corresponding author.
1 Partially supported by the Academy of Finland.
be postponed to a later stage, when the logical behaviour of the action system has been designed. The construction of the program is thus done within a single unifying framework.

The action systems formalism was proposed by Back and Kurki-Suonio [3]. Later similar event-based formalisms have been put forward by several other researchers, see for example the work of Chandy and Misra [8], who describe their UNITY framework and Francez [10], who develops his IP-language.

The refinement calculus is a formalization of the stepwise refinement method of program construction. It was originally proposed by Back [1] and has been later studied and extended by several researchers, see [13, 14] among others.

Originally, the refinement calculus was designed as a framework for systematic derivation of sequential programs only. Back and Sere [5, 15] extended the refinement calculus to the design of action systems and hence it was possible to handle parallel algorithms within the calculus. Back [2] made yet another extension to the calculus showing how reactive programs could be derived in a stepwise manner within it relying heavily on the work done in data refinement. In both cases parallel and concurrent activity is modelled within a purely sequential framework. In [6] Back and Sere show how action systems with remote procedure calls can be derived within the refinement calculus for reactive systems. We will here show how this extension of the refinement calculus/action system framework is applied to a non-trivial case study, the formal derivation of an asynchronous microprocessor.

The initial specification of the microprocessor will be given as a sequential program that has the syntactic form of an action system. Our goal is to isolate the different functional components of the microprocessor, like instruction memory, data memory, ALU, registers, etc., into action systems of their own. The component action systems are joined together in a parallel composition, where they interact with each other using shared variables and remote procedure calls. The parallel composition of action systems is derived from the sequential specification using correctness-preserving program transformations within the refinement calculus.

The derivation is based on the novel idea of spatial decomposition of the program variables. At each step we identify one functional component of the microprocessor and gather the program variables and their associated code relevant to this component into a separate module, i.e., an action system. The approach is well supported by the refinement calculus. Back and Sere [7] show how this idea is reflected in a specification language based on action systems and the refinement calculus.

Martin [12] has developed a methodology for designing asynchronous VLSI circuits that is based on methods familiar from parallel program design. Using this method he has specified the same microprocessor within the CSP-framework, but without a completely formal calculus. A delay-insensitive, asynchronous circuit for the microprocessor was derived from the concurrent program that is more or less equivalent to the parallel composition of action systems that we derive here.

Our purpose here is to demonstrate that, in addition with software design, action systems and the refinement calculus provide us with a uniform framework for formal
VLSI circuit design. In this paper we concentrate on the initial steps of circuit design focusing on a high level specification of the microprocessor as a collection of parallel processes. In an accompanying paper [11] we develop these ideas close to the architectural level by e.g. taking into account the delay-insensitive features of the target circuit.

A somewhat related method and formalism is developed in [16], but the emphasis is put on the verification of and formal models for delay-insensitive circuits.

1.1. Overview of the paper

In Section 2, we describe the action systems formalism. In Section 3, we describe how action systems are composed into parallel systems. We also briefly describe the refinement calculus. In Section 4, we give an initial specification for the microprocessor as a sequential program. In Section 5, this specification is stepwise turned into a parallel composition of action systems, where each action system represents one functional component of the target microprocessor. Finally in Section 6, we conclude with some remarks on the proposed method.

2. Action systems

An action system is a statement of the form

$$\mathcal{A} :: \text{var } v; \text{proc } w \cdot \left[ \begin{array}{l}
\text{var } x := x_0; \text{proc } p_1 = P_1; \ldots; p_n = P_n; \text{do } A_1[] \ldots A_m \text{ od } \end{array} \right] : z$$

The identifiers $x$ are the variables declared in $\mathcal{A}$ and initialized to $x_0$, $p_1, \ldots, p_n$ are the procedure headers, and $P_i$ is the procedure body of $p_i$, $i = 1, \ldots, n$. Within the loop, $A_1, \ldots, A_m$ are the actions of $\mathcal{A}$. Finally, $z, v$ and $w$ are pairwise disjoint lists of identifiers. The list $z$ is the import list, indicating which variables and procedures are referenced, but not declared in $\mathcal{A}$. The lists $v$ and $w$ are the export lists, indicating which variables and procedures declared in $\mathcal{A}$ are accessible from other action systems. Procedure bodies and actions can be arbitrary statements, and may contain procedure calls.

Both procedure bodies and actions will in general be guarded commands, i.e., statements of the form

$$A = g \rightarrow S,$$

where $g$ is a boolean condition, the guard, and $S$ is a program statement, the body. The guard of $A$ will be denoted by $gA$ and the body will be denoted by $SA$.

The local variables (procedures) of $\mathcal{A}$ are those variables $x_i$ (procedures $p_i$) that are not listed in the export list. The global variables (procedures) of $\mathcal{A}$ are the variables (procedures) listed in the import and export lists. The local and global variables
(procedures) are assumed to be disjoint. Hence, \( x \cap z = \emptyset \), where \( x \) denotes the list of variables declared in \( A \) (no redeclaration of variables is thus permitted). The state variables of \( A \) consist of the local variables and the global variables.

A statement or an action is said to be local to an action system if it only refers to local variables of the action system. The procedures and actions are allowed to refer to all the state variables of an action system. Furthermore, each procedure and action may have local variables of its own.

We consider two different parameter passing mechanisms for procedures, call-by-value and call-by-result. Call-by-value is denoted with \( p(f) \), where \( f \) stands for the formal parameters, and call-by-result with \( p(\text{var}f) \). For simplicity, we will here assume that the procedures are not recursive.

### 3. Composing and refining action systems

Consider two action systems,

\[
A :: \text{var } v; \text{proc } r \bullet \\
\begin{align*}
& [ [ \text{var } x := x; \text{proc } p = P; \text{do } A_1 \quad \ldots \quad A_m \text{ od } ] ] : z
\end{align*}
\]

\[
B :: \text{var } w; \text{proc } s \bullet \\
\begin{align*}
& [ [ \text{var } y := y; \text{proc } q = Q; \text{do } B_1 \quad \ldots \quad B_k \text{ od } ] ] : u
\end{align*}
\]

where \( x \cap y = \emptyset, v \cap w = \emptyset \) and \( r \cap s = \emptyset \). Furthermore, the lists of local procedures declared in the two action systems are required to be disjoint.

The parallel composition \( A \| B \) of \( A \) and \( B \) is the action system \( C \)

\[
C :: \text{var } b; \text{proc } c \bullet \\
\begin{align*}
& [ [ \text{var } x, y := x, y; \text{proc } p = P; q = Q; \\
& \quad \text{do } A_1 \quad \ldots \quad A_m ] \quad B_1 \quad \ldots \quad B_k \text{ od } ] ] : a
\end{align*}
\]

where \( a = z \cup u - (v \cup r \cup w \cup s), b = v \cup w, c = r \cup s \). Thus, parallel composition will combine the state spaces of the two constituent action systems, merging the global variables and global procedures and keeping the local variables distinct. The imported identifiers denote those global variables and/or procedures that are not declared in either \( A \) or \( B \). The exported identifiers are the variables and/or procedures declared global in \( A \) or \( B \). The procedure declarations and the actions in the parallel composition consists of the procedure declarations and actions in the original systems.

Parallel composition is a way of associating a meaning to procedures that are called in an action system but which are not declared there, i.e., they are part of the import list. The meaning can be given by a procedure declared in another action system,
provided the procedure has been declared global, i.e., it is included in the action systems export list.

The behaviour of a parallel composition of action systems is dependent on how the individual action systems, the reactive components, interact with each other via the shared global variables and remote procedure calls. We have for instance that a reactive component does not terminate by itself: termination is a global property of the composed action system. More on these topics can be found in [2].

3.1. Hiding and revealing

Let \( \text{var } v_1, v_2; \text{proc } v_3, v_4 \bullet \mathcal{A} : z \) be an action system of the form above, where \( z \) denotes the import list and \( v_1, v_2, v_3, v_4 \) denote the export lists. We can hide some of the exported global variables (\( v_2 \)) and procedure names (\( v_4 \)) by removing them from the export list, \( \mathcal{A}' = \text{var } v_1; \text{proc } v_3 \bullet \mathcal{A} : z \). Hiding the variables \( v_2 \) and procedure names \( v_4 \) makes them inaccessible from other actions outside \( \mathcal{A}' \) in a parallel composition. Hiding thus has an effect only on the variables and procedures in the export list. The opposite operation, revealing, is also useful.

In connection with the parallel composition below we will use the following convention. Let \( \text{var } a_1; \text{proc } a_2 \bullet \mathcal{A} : a_3 \) and \( \text{var } b_1; \text{proc } b_2 \bullet \mathcal{B} : b_3 \) be two action systems. Then their parallel composition is the action system

\[
\text{var } a_1 \cup b_1; \text{proc } a_2 \cup b_2 \bullet \mathcal{A} \parallel \mathcal{B} : c
\]

where \( c = a_3 \cup b_3 - (a_1 \cup a_2 \cup b_1 \cup b_2) \) according to the definition above. Hence, the parallel composition exports all the variables and procedures exported by either \( \mathcal{A} \) or \( \mathcal{B} \). Sometimes there is no need to export all these identifiers, i.e., when they are exclusively accessed by the two component action systems \( \mathcal{A} \) and \( \mathcal{B} \). This effect is achieved with the following construct that turns out to be extremely useful later:

\[
\text{var } v; \text{proc } p \bullet \left[ \left[ \mathcal{A} \right| \parallel \mathcal{B} \right] : c
\]

Here the identifiers \( v \) and \( p \) are as follows: \( v \subseteq a_1 \cup b_1 \) and \( p \subseteq a_2 \cup b_2 \).

3.2. Decomposing action systems

Given an action system

\[
\mathcal{C} :: \text{var } u; \text{proc } s \bullet \left[ \left[ \text{var } v := v_0; \text{do } C_1 [ \ldots [ C_n \text{ od } ] \right] : z
\]

we can decompose it into smaller action systems by parallel composition. This means that we split the variables, actions and procedures of \( \mathcal{C} \) into disjoint sets so that

\[
\mathcal{C} = \text{var } u; \text{proc } s \bullet \left[ \left[ \text{var } w := w_0; \text{proc } r = R; \mathcal{A} \parallel \mathcal{B} \right] : z
\]
where

\[ \mathcal{A} ::= \text{var } a_2; \text{proc } a_3 \bullet \\
[\text{[ var } x := x0; \text{proc } p = P; \text{do } A_1 \ldots A_m \text{ od } ]} : a_1 \]

\[ \mathcal{B} ::= \text{var } b_2; \text{proc } b_3 \bullet \\
[\text{[ var } y := y0; \text{proc } q = Q; \text{do } B_1 \ldots B_k \text{ od } ]} : b_1 \]

The reactive components \( \mathcal{A} \) and \( \mathcal{B} \) interact with each other via the global variables and procedures included in the lists \( a_2, a_3, b_2, b_3 \).

In the process of decomposing the action system \( C \) into parallel reactive components, it may also be necessary to introduce some new procedures \( r \), to handle situations where an action affects program variables in both \( x \) and \( y \). As these variables are local in the decomposed action system, no procedure or action can access both. Hence, one needs to introduce auxiliary procedures that have access to the local variables, and in terms of which the original procedure/action can be expressed.

### 3.3. Refining action systems

Most of the steps we will carry out within the microprocessor derivation are purely syntactic decomposition steps. There are, however, a couple of steps where a higher level action system is refined into another action system. These steps are formally carried out within the refinement calculus, where we consider action systems as ordinary statements, i.e., as initialized iteration statements.

The refinement calculus is based on the following definition. Let \( S \) and \( S' \) be two statements. Then \( S \) is correctly refined by \( S' \), denoted \( S \sqsubseteq S' \), if for any postcondition \( Q \)

\[ \text{wp}(S, Q) \Rightarrow \text{wp}(S', Q). \]

Here \( \text{wp} \) is the weakest precondition predicate transformer of Dijkstra [9].

We will not go into details of this calculus here. The interested reader should consult [1, 5, 15, 2, 6].

### 4. Initial specification of the Caltech microprocessor

The microprocessor we want to specify has a conventional 16-bit-word instruction set of load-store type. The processor uses two separate memories for instructions and data. There are three types of instructions: ALU, memory and program-counter (pc). The ALU instructions operate on the 16 registers. The memory instructions involve a register and a data word. Some instructions use the following word as offset.

The initial action system is a sequential non-terminating loop. The variable \( i \) holds the instruction under execution. It is of record type containing several fields. Each instruction has an \( \text{op} \) field for the \( \text{opcode} \), the other fields depend on the instruction. The two memories are represented by the globally visible arrays \( \text{imem} \) and \( \text{dmem} \).
\( M_0 :: \text{var } \text{imem}, \text{dmem} \cdot \)
\[
|| \text{var } i \in \text{record}, \text{pc, offset, imem}[\text{ilow..ihigh}], \text{dmem}[\text{dlow..dhigh}],
\text{reg}[0..15], f \]
\( \text{pc} := \text{pc}0; \)
do \text{true } \rightarrow 
i, \text{pc} := \text{imem}[\text{pc}], \text{pc} + 1;
\text{if } \text{offset}(\text{i.op}) \rightarrow \text{offset}, \text{pc} := \text{imem}[\text{pc}], \text{pc} + 1
\text{if } \text{alu}(\text{i.op}) \rightarrow < \text{reg}[i.z], f > := \text{alu}(\text{reg}[i.x], \text{reg}[i.y], \text{i.op}, f)
\text{ld}(\text{i.op}) \rightarrow \text{reg}[i.z] := \text{dmem}[\text{reg}[i.x] + \text{reg}[i.y]]
\text{st}(\text{i.op}) \rightarrow \text{dmem}[\text{reg}[i.x] + \text{reg}[i.y]] := \text{reg}[i.z]
\text{ldx}(\text{i.op}) \rightarrow \text{reg}[i.z] := \text{dmem}[\text{offset} + \text{reg}[i.y]]
\text{stx}(\text{i.op}) \rightarrow \text{dmem}[\text{offset} + \text{reg}[i.y]] := \text{reg}[i.z]
\text{lda}(\text{i.op}) \rightarrow \text{reg}[i.z] := \text{offset} + \text{reg}[i.y]
\text{stpc}(\text{i.op}) \rightarrow \text{reg}[i.z] := \text{pc} + \text{reg}[i.y]
\text{jmp}(\text{i.op}) \rightarrow \text{pc} := \text{reg}[i.y]
\text{brch}(\text{i.op}) \rightarrow 
\text{if } \text{cond}(f, i.cc) \rightarrow \text{pc} := \text{pc} + \text{offset} \text{if } \neg \text{cond}(f, i.cc) \rightarrow \text{skip} \text{fi}
\text{fi}
\text{od}
\]]<>

Fig. 1. Initial specification of the microprocessor.

The index to \text{imem} is the program-counter variable \text{pc}. The registers are described as the array \text{reg}[0..15]. The action system \(M_0\) in Fig. 1 describes the processor. Here the statement \(< \text{reg}[i.z], f > := \text{alu}(\text{reg}[i.x], \text{reg}[i.y], \text{i.op}, f)\) denotes a simultaneous assignment of values to a pair of variables \text{reg}[i.z] and \(f\).

5. Decomposition into parallel action systems

Let us decompose the action system \(M_0\) into a parallel composition of action systems so that each system models one functional component of the microprocessor. At each step one component is identified by its program variables. These variables and the associated code is gathered into a module of its own. Furthermore, we make a decision on how the variables of the module should be accessed, exporting the variables and accessing them as shared variables, or making them local and accessing them via global procedures.

The components in the order of their introduction and their associated variables are as follows:

(1) Instruction memory: \text{imem}[\text{ilow..ihigh}]

(2) Program counter and offset: \text{pc, offset}
Register array: \(\text{reg}[0..15]\)
Arithmetic-logical unit: \(f\)
Data memory: \(\text{dmem}[\text{dlow}..\text{dhigh}]\)
Instruction execution: \(i\)

The main lines of the derivation will follow the presentation of Martin [12] rather closely. We describe the first three steps more carefully, the other steps follow a similar pattern.

5.1. Instruction memory

We start by making the instruction memory an action system of its own. We assign the variable \(\text{imem}\) to become local to this action system and hence, references to \(\text{imem}\) must be done via a procedure call. There are two such references in the specification, one that writes \(\text{imem}[\text{pc}]\) into the variable \(i\) and the other that writes \(\text{imem}[\text{pc}]\) to \(\text{offset}\).

Let us create a procedure \(\text{IMEM}\) that reads the instruction denoted by \(\text{pc}, \text{imem}[\text{pc}]\), into a variable \(k\) (\(k\) will be later instantiated to \(i\) and \(\text{offset}\) respective):

\[
k, \text{pc} := \text{imem}[\text{pc}], \text{pc} + 1
\]

\[
= k := \text{imem}[\text{pc}]; \text{pc} := \text{pc} + 1
\]

\[
\leq \{\text{introducing local variables}\}
\]

\[
\begin{cases}
    \text{proc }\text{IMEM}(\text{var }j) = (j := \text{imem}[\text{pc}]); \text{IMEM}(j); \text{pc} := \text{pc} + 1
\end{cases}
\]

Hence, the action system \(M_0\) is refined by the following action system:

\[
M_1 :: \text{var }\text{imem}, \text{dmem} \cdot
\]

\[
\begin{cases}
    \text{proc }\text{IMEM}(\text{var }j) = (j := \text{imem}[\text{pc}]); \\
    \text{pc} := \text{pc} + 1;
\end{cases}
\]

\[
\begin{cases}
    \text{do }\text{true } \rightarrow \\
    \text{IMEM}(i); \text{pc} := \text{pc} + 1;
\end{cases}
\]

\[
\begin{cases}
    \text{if }\text{offset}(\text{i.op}) \rightarrow \text{IMEM}(\text{offset}); \text{pc} := \text{pc} + 1
\end{cases}
\]

\[
\begin{cases}
    \text{fi };
\end{cases}
\]

\[
\begin{cases}
    \text{if }\ldots \text{as before} \ldots \text{fi}
\end{cases}
\]

\[
\text{od}
\]

\[
| : < >
\]
5.1.1. Separate FETCH and IMEM

The next step is to separate the instruction memory into an action system of its own. We therefore decompose the initial specification of the microprocessor as follows:

\[ M_1 = \texttt{var imem, dmem} \cdot [ M_2 \parallel \mathcal{I} ] : < > \]

where

\[ M_2 :: \texttt{var pc, dmem} \cdot \]

\[
[[ \texttt{var i \in record}, pc, offset, dmem[dlow..dhigh], reg[0..15], f ] \\
 pc := pc0; \\
 \texttt{do true } \rightarrow \\
 \texttt{IMEM(i), pc := pc + 1;} \\
 \texttt{if offset(i.o p) } \rightarrow \texttt{IMEM(offset), pc := pc + 1} \\
 \texttt{[=offset(i.o p) } \rightarrow \texttt{skip} \\
 \texttt{fi ;} \\
 \texttt{if ...as before... fi} \\
 \texttt{od} \\
]] : \texttt{IMEM} \]

\[ \mathcal{I} :: \texttt{var imem; proc IMEM} \cdot \]

\[
[[ \texttt{var imem[ilow..ihigh]; proc IMEM(var j) = (j := imem[pc])} ]] : pc \]

The instruction memory \textit{imem} is now located in the module \mathcal{I}. The program counter, \textit{pc}, is a shared variable between the two component modules. It is located in the module \mathcal{M}_2. Also the procedure \texttt{IMEM} has become global as it is accessed from \mathcal{M}_2. The instruction memory \textit{imem} is not directly accessed in \mathcal{M}_2. However, both \textit{pc} and \texttt{IMEM} are local to the parallel composition. Therefore this exports only the two memories, \textit{imem} and \textit{dmem}.

5.2. Program counter

Our next step is to isolate the program counter and offset administration from the rest of the processor. The program counter \textit{pc} is referenced at instruction and offset fetch and during the execution of the \textit{stpc}, \textit{jmp}, and \textit{brch} instructions.

Let us start by refining the \textit{pc} updates at instruction fetch time as follows:

\[ \texttt{IMEM}(k); \texttt{pc} := \texttt{pc} + 1 \]

\[ \texttt{\{introducing a local variable\}} \]

\[ [[ \texttt{var y; IMEM}(k); y := \texttt{pc} + 1; \text{pc} := y ]] \]

\[ \texttt{\{commuting statements\}} \]

\[ [[ \texttt{var y}; y := \texttt{pc} + 1; \text{IMEM}(k); \text{pc} := y ]] \]
\[ \{ \text{introducing procedures} \} \]
\[ \llbracket \text{var } y; \text{proc } PCI1 = (y := pc + 1); \text{proc } PCI2 = (pc := y) \rrbracket; \]
\[ PCI1; \text{IMEM}(k); PCI2 \]

where \( y \) is a fresh variable. We have refined the \( pc \) access into a separate read-access and a write-access. This will allow us a parallel \( pc \) update and instruction fetch as will become clear below. The \( pc \) update at the \( brch \) instruction can be treated similarly:

\[ pc := pc + \text{offset} \]
\[ \llbracket \text{introducing a local variable} \} \]
\[ \llbracket \text{var } z; z := pc + \text{offset}; pc := z \rrbracket \]
\[ = \{ \text{introducing procedures} \} \]
\[ \llbracket \text{var } z; \text{proc } PCA1 = (z := pc + \text{offset}); \text{proc } PCA2 = (pc := z) \rrbracket \]
\[ PCA1; PCA2 \]

The other two \( pc \) accesses are read-accesses and hence, correspond to procedure calls. Now it is a straightforward task to make the \( pc \) accesses via procedures:

\[ \mathcal{P} ::= \text{var } pc; \text{proc } PCI1, PCI2, PCA1, PCA2, PCST, PCJMP \]
\[ \llbracket \text{var } y, z \]
\[ \llbracket \text{proc } PCI1 = (y := pc + 1); \]
\[ \text{proc } PCI2 = (pc := y); \]
\[ \text{proc } PCA1 = (z := pc + \text{offset}); \]
\[ \text{proc } PCA2 = (pc := z); \]
\[ \text{proc } PCST(\text{var } o) = (o := pc); \]
\[ \text{proc } PCJMP(o) = (pc := o); \]
\[ \rrbracket; \]
\[ pc := pc0; \]
\[ \rrbracket : \text{offset} \]

Every access to \( pc \) is now done via these procedures. For instance, the instruction fetch and the subsequent \( pc \)-update \( \text{IMEM}(i); pc := pc + 1 \) is transformed to

\[ PCI1; \text{IMEM}(i); PCI2. \]

Furthermore, the \( pc \) update \( pc := pc + \text{offset} \) in the branch instruction \( brch \) is transformed to a pair of procedure calls

\[ PCA1; PCA2. \]

The offset is read during the execution of the \( ldx, stx \), and \( lda \) instructions. Furthermore, it is referenced at instruction fetch and during the program counter update at \( brch \) execution.
The offset value in the load and store instructions is received via a call to a procedure \texttt{XOFF} as follows:

\[
\text{reg}[i\ z] := \text{dmem}[\text{offset + reg}[i\ y]]
\]

\[
\leq \{\text{introducing a local variable}\}
\]

\[
\text{\texttt{[}} \begin{array}{l}
\text{\texttt{var}} \text{ off := offset; reg}[i\ z] := \text{dmem}[\text{off + reg}[i\ y]]
\end{array} \text{\texttt{]}}
\]

\[
= \{\text{introducing a procedure}\}
\]

\[
\text{\texttt{[}} \begin{array}{l}
\text{\texttt{proc}} \texttt{XOFF(var o)} = (o := \text{offset})
\end{array} \text{\texttt{]}}
\]

\[
\text{\texttt{[}} \begin{array}{l}
\text{\texttt{var}} \text{ off; XOFF(off); reg}[i\ z] := \text{dmem}[\text{off + reg}[i\ y]]
\end{array} \text{\texttt{]}}
\]

The module for offset administration is defined next:

\[
\mathcal{X} :: \text{var offset}; \text{proc XOFF} \bullet
\]

\[
\begin{array}{l}
\text{\texttt{[}} \begin{array}{l}
\text{\texttt{var}} \text{ offset; proc XOFF(var o)} = (o := \text{offset})
\end{array} \text{\texttt{]}} : < >
\end{array}
\]

We now have that

\[
\mathcal{M}_2 = \text{var dmem, pc} \bullet [\mathcal{M}_3 \mid \mathcal{P} \mid \mathcal{X}] : \text{IMEM}
\]

where \(\mathcal{M}_3\) is given in Fig. 2.

\[
\mathcal{M}_3 :: \text{var dmem} \bullet
\]

\[
\begin{array}{l}
\text{\texttt{[}} \begin{array}{l}
\text{\texttt{var}} \text{ i} \in \text{record, dmem[dlow..dhigh], reg[0..15]}, f
\end{array} \text{\texttt{]} do}
\end{array}
\]

\[
\begin{array}{l}
\begin{array}{l}
\text{\texttt{true}} \rightarrow
\end{array} \\
\begin{array}{l}
\text{PCI1; IMEM(i); PCI2};
\end{array} \\
\text{if offset(i.op) \rightarrow PCI1; IMEM(offset); PCI2}
\end{array}
\]

\[
\begin{array}{l}
\begin{array}{l}
\text{\texttt{[}} \begin{array}{l}
\text{\texttt{\sim offset(i.op) \rightarrow skip}}
\end{array} \text{\texttt{]}}
\end{array}
\end{array}
\]

\[
\begin{array}{l}
\begin{array}{l}
\text{\texttt{fi;}}
\end{array} \\
\begin{array}{l}
\text{if alu(i.op) \rightarrow < reg}[i\ z], f := \text{alu}(\text{reg}[i\ z], \text{reg}[i\ y], i\ .op, f)
\end{array}
\end{array}
\]

\[
\begin{array}{l}
\begin{array}{l}
\text{\texttt{[}} \begin{array}{l}
\text{\texttt{ldr(i.op) \rightarrow reg}[i\ z] := dmem[reg}[i\ z] + \text{reg}[i\ y]]
\end{array} \text{\texttt{]}}
\end{array}
\end{array}
\]

\[
\begin{array}{l}
\begin{array}{l}
\text{\texttt{[}} \begin{array}{l}
\text{\texttt{st(i.op) \rightarrow dmem[reg}[i\ z] + \text{reg}[i\ y]] := \text{reg}[i\ z]}}
\end{array} \text{\texttt{]}}
\end{array} \\
\text{\texttt{[}} \begin{array}{l}
\text{\texttt{ldx(i.op) \rightarrow}}
\end{array} \\
\begin{array}{l}
\text{\texttt{[}} \begin{array}{l}
\text{\texttt{[}} \begin{array}{l}
\text{\texttt{var}} \text{ off; XOFF(off); \text{reg}[i\ z] := dmem[off + \text{reg}[i\ y]]}}
\end{array} \text{\texttt{]}}
\end{array}
\end{array}
\]

\[
\begin{array}{l}
\begin{array}{l}
\text{\texttt{[}} \begin{array}{l}
\text{\texttt{stx(i.op) \rightarrow}}
\end{array} \\
\begin{array}{l}
\text{\texttt{[}} \begin{array}{l}
\text{\texttt{var}} \text{ off; XOFF(off); \text{dmem[off + \text{reg}[i\ y]] := \text{reg}[i\ z]}}
\end{array} \text{\texttt{]}}
\end{array} \\
\text{\texttt{[}} \begin{array}{l}
\text{\texttt{lda(i.op) \rightarrow}}
\end{array} \\
\begin{array}{l}
\text{\texttt{[}} \begin{array}{l}
\text{\texttt{var}} \text{ off; XOFF(off); \text{reg}[i\ z] := off + \text{reg}[i\ y]}}
\end{array} \text{\texttt{]}}
\end{array}
\end{array}
\]

\[
\begin{array}{l}
\begin{array}{l}
\text{\texttt{[}} \begin{array}{l}
\text{\texttt{stpc(i.op) \rightarrow \text{\texttt{[}} \begin{array}{l}
\text{\texttt{var}} \text{ r; PCST(r); \text{reg}[i\ z] := r + \text{reg}[i\ y]}\end{array} \text{\texttt{]}}}}
\end{array} \text{\texttt{]}}
\end{array}
\end{array}
\]

\[
\begin{array}{l}
\begin{array}{l}
\text{\texttt{\texttt{jmp(i.op) \rightarrow \text{\texttt{[}} \begin{array}{l}
\text{\texttt{var}} \text{ y := \text{\texttt{reg}}[i\ y]; PCJMP(y)]}}
\end{array} \text{\texttt{]}}}}
\end{array}
\end{array}
\]

\[
\begin{array}{l}
\begin{array}{l}
\text{\texttt{\texttt{brch(i.op) \rightarrow}}}
\end{array} \\
\begin{array}{l}
\text{\texttt{if cond(f, i.cc) \rightarrow PCA1; PCA2 \mid \sim cond(f, i.cc) \rightarrow skip}}
\end{array}
\end{array}
\]

\[
\text{\texttt{fi}}
\]

\[
\begin{array}{l}
\begin{array}{l}
\text{\texttt{od}}
\end{array}
\end{array}
\]

\[
\begin{array}{l}
\begin{array}{l}
\text{\texttt{[}} \begin{array}{l}
\text{\texttt{IMEM, PCI1, PCI2, PCA1, PCA2, PCST, PCJMP, XOFF, offset}}
\end{array} \text{\texttt{]}}
\end{array} \\
\end{array}
\]

Fig. 2. The action system \(\mathcal{M}_3\).
Observe that offset is shared between $\mathcal{M}_3$, $\mathcal{P}$ and $\mathcal{F}$, whereas pc is shared between $\mathcal{M}_3$, $\mathcal{P}$ and $\mathcal{F}$. Therefore offset is a hidden variable. The variable pc is revealed from the parallel composition, because it is needed in the module $\mathcal{F}$.

5.3. Other components

Let us now briefly consider the registers, arithmetic-logical unit, data memory and the instruction execution. These modules are specified using similar argumentation as above.

5.3.1. Registers

First we isolate the register array from the rest of the program. The 16 registers are accessed through four buses in [12]. The buses are used by the ALU and the memory unit to concurrently access the registers. With this in mind we decompose our system further.

We define three procedures $\text{REGRX}$, $\text{REGRY}$, and $\text{REGRZ}$ to read the value stored in a register, corresponding to the $x$, $y$ and $z$ fields of an instruction $i$. Furthermore, the ALU and the memory unit will use different procedures (buses), $\text{REGWA}$ and $\text{REGWM}$ respectively, to write on the registers. The instruction under consideration is kept in a shared variable $j$ which is imported to the register modules from $\mathcal{M}_3$.

Let $\mathcal{R}$ be the action system

$$\mathcal{R} ::= \text{proc } \text{REGRX, REGRY, REGRZ, REGWA, REGWM} \cdot$$

$$[[ \text{var } \text{reg}[0..15]$$

$$\quad \text{proc } \text{REGRX}(\text{var } v) = (v := \text{req}[j.x])$$

$$\quad \text{proc } \text{REGRY}(\text{var } v) = (v := \text{req}[j.y])$$

$$\quad \text{proc } \text{REGRZ}(\text{var } v) = (v := \text{req}[j.z])$$

$$\quad \text{proc } \text{REGWA}(v) = (\text{reg}[j.z] := v)$$

$$\quad \text{proc } \text{REGWM}(v) = (\text{reg}[j.z] := v)$$

$$]] : j$$

This module represents the register array and also the four buses as will become clear later when we derive modules for the memory unit, the ALU and the instruction execution.

We now have that

$$\mathcal{M}_3 = \text{var } \text{dmem} \cdot [[ \mathcal{M}_4 \parallel \mathcal{R} ]] :$$

$$\text{IMEM, PC11, PC12, PCA1, PCA2, PCST, PCJMP, XOFF, offset}$$

where $\mathcal{M}_4$ is derived from $\mathcal{M}_3$ by the following changes. Each read-access to $\text{req}[i.x]$ in $\mathcal{M}_3$ is replaced with a call to $\text{REGRX}$ in $\mathcal{M}_4$, every read-access to $\text{req}[i.y]$ is replaced with $\text{REGRY}$ and every read-access to $\text{req}[i.z]$ is replaced with $\text{REGRZ}$. A write-access to $\text{reg}[i.z]$ is replaced with a call to $\text{REGWA}$ when the ALU writes this
register and with a call to REGWM when the access is made from the memory unit as will be seen below. The register array \textit{req}[0..15] is missing from \textit{M_4}.

The register module \textit{R} and the action system \textit{M_4} communicate via a shared variable \(j\) which is exported from \textit{M_4}. The variable \(j\) is assigned the value \(i\) immediately prior the execution of the instruction kept in \(i\). This effect is achieved in \textit{M_4} by refining \(M_3\) as follows:

\[
\begin{align*}
\text{if } & \text{offset}(i.o p) \rightarrow \ldots \text{as before...} \quad \text{fi ;} \\
\text{if } & \text{alu}(i.o p) \rightarrow \ldots \text{as before...} \quad \text{fi} \\
\end{align*}
\]

\(\{\text{introducing a local variable}\}\)

\[
\begin{align*}
\text{if } & \text{offset}(i.o p) \rightarrow \ldots \text{as before...} \quad \text{fi ;} \\
\text{if } & \text{alu}(i.o p) \rightarrow \ldots \text{as before...} \quad \text{fi} \\
\end{align*}
\]

\(j := i;\)

\[
\begin{align*}
\quad & \text{if } \text{alu}(i.o p) \rightarrow \ldots \text{as before...} \quad \text{fi}
\end{align*}
\]

The variable \(j\) is needed further on when generating a module for the instruction execution. In the full paper \cite{4} we refine the registers further in order to allow more parallelism. In the final configuration each of the 16 registers constitutes a module of its own.

5.3.2. Arithmetic-logical unit

Our following task is to isolate the arithmetic-logical unit. This unit is accessed in the \textit{alu} instruction execution. As mentioned above, it has its own bus to access the register array, modelled by the procedure \textit{REGWA}. Hence, this piece of code is refined as follows:

\[
\begin{align*}
< \text{reg}[i.z], f > := & \text{alu}(\text{reg}[i.x], \text{reg}[i.y], i.o p, f) \\
\end{align*}
\]

\(\{\text{introducing register procedures, from above}\}\)

\[
\begin{align*}
|| & \text{var } x, y; \text{REGRX}(x), \text{REGRY}(y); \\
|| & \text{var } v; < v, f > := \text{alu}(x, y, i.o p, f); \text{REGWA}(v) \} | ]
\end{align*}
\]

from where it is a straightforward task to generate the module for the arithmetic-logical unit:

\[
\begin{align*}
\mathcal{A} :: \text{proc } & \text{ALU, ALUF} \quad \bullet \\
|| & \text{var } f \\
\text{proc } & \text{ALU}(u, w, o p) = \\
& (|| \text{var } v; < v, f > := \text{alu}(u, w, o p, f); \text{REGWA}(v) \} | ]); \\
\text{proc } & \text{ALUF}(\text{var } e) = (e := f) \\
|| : & \text{REGWA}
\end{align*}
\]

The \textit{ALUF} procedure is used during the \textit{brch} execution to read the value of the flag \(f\).
We now have that
\[ M_4 = \text{var } \text{dmem} \bullet [ [ M_5 || A ] ] : \]
\[ \text{IMEM, PC1, PC2, PCA1, PCA2, PCST, PJMP, XOFF, offset} \]

where for instance the above \textit{ALU} reference in \( M_4 \) is transformed to a call to the \textit{ALU} unit as follows:

\[
< \text{reg}[i.z], f > := \text{aluf}(\text{reg}[i.x], \text{reg}[i.y], i.o p, f) \\
\leq \{ \text{introducing procedures from above} \} \\
[ [ \text{var } x, y; \text{REGRX}(x); \text{REGRY}(y); \text{ALU}(x, y, i.o p) ] ] .
\]

5.3.3. Data memory

Let us now consider the data memory, \textit{dmem}. It also has its own bus to access the registers, modelled by the procedures \textit{REGRZ} and \textit{REGWM}. The data memory is read during the execution of the \textit{ld} and \textit{ldx} instructions and it is written during the store instructions \textit{st} and \textit{stx}. In the final implementation, the execution of the \textit{ld} \textit{a} instruction is also carried out via the data memory unit. Let us look at the \textit{ld} \textit{d} and \textit{st} instructions more carefully.

We have for the load instruction that

\[
\text{reg}[i.z] := \text{dmem}[\text{reg}[i.x] + \text{reg}[i.y]] \\
\leq \{ \text{introducing register procedures, from above} \} \\
[ [ \text{var } x, y; \text{REGRX}(x); \text{REGRY}(y); \text{REGWM}(v) ] ]
\]

and for the store instruction that

\[
\text{dmem}[\text{reg}[i.x] + \text{reg}[i.y]] := \text{reg}[i.z] \\
\leq \{ \text{introducing register procedures, from above} \} \\
[ [ \text{var } x, y; \text{REGRX}(x); \text{REGRY}(y); \text{REGWM}(v) ] ]
\]

We now define
\[
\mathcal{G} :: \text{var } \text{dmem}; \text{proc } \text{MADD}, \text{MSTO}, \text{MLDA} \cdot \\
[ [ \text{var } \text{dmem}[\text{dlow..dhigh}] \\
\text{proc } \text{MADD}(u, w) = ([ [ \text{var } v; v := \text{dmem}[u + w]; \text{REGWM}(v) ] ]) ; \\
\text{proc } \text{MSTO}(u, w) = \\
\text{(return)}; [ [ \text{var } v; \text{REGRZ}(v); \text{dmem}[u + w] := v ] ]) ; \\
\text{proc } \text{MLDA}(u, w) = ([ [ \text{var } ma; ma := u + w; \text{REGWM}(ma) ] ]) ; \\
] ] : \text{REGWM, REGRZ}
\]
The memory is now represented by the following module:

```plaintext
var dmem; proc MADD, MSTO, MLDA \oplus \mathcal{D} : REGWM, REGRZ
```

which exports the three memory access procedures \textit{MADD}, \textit{MSTO}, and \textit{MLDA} and imports the bus, i.e., the procedures \textit{REGWM} and \textit{REGRZ}.

The memory unit \(\mathcal{D}\) is now removed from the rest of the code in module \(\mathcal{M}_5\). Therefore, we have that

```plaintext
\mathcal{M}_5 = \text{var } dmem \cdot [\mathcal{M}_6 \parallel \mathcal{D}] : \\
IMEM, PC1, PC2, PCA1, PCA2, PCST, PCJMP, XOFF, offset, \\
REGWM, REGRZ
```

In \(\mathcal{M}_6\) we have replaced the direct memory accesses with the appropriate procedure calls, for instance the above refined load and store instructions are transformed to

```plaintext
\text{reg}[i.z] := dmem[\text{reg}[i.x] + \text{reg}[i.y]] \\
\leq \{\text{introducing procedures from above}\} \\
[\text{var } x, y; \text{REGRX}(x); \text{REGRY}(y); \text{MADD}(x, y)]
```

and

```plaintext
dmem[\text{reg}[i.x] + \text{reg}[i.y]] := \text{reg}[i.z] \\
\leq \{\text{introducing procedures from above}\} \\
[\text{var } x, y; \text{REGRX}(x); \text{REGRY}(y); \text{MSTO}(x, y)]
```

in \(\mathcal{M}_6\) respectively.

A slightly more optimized version of the memory unit is derived in the full paper [4].

5.3.4. Instruction execution

We next isolate the instruction execution into a separate module. The code for this module, \(\mathcal{E}\), is given in Fig. 3.

The instruction under execution is in this module represented by the variable \(j\), which is shared with the register array. The module uses two buses, modelled by the procedures \textit{REGRX} and \textit{REGRY} respective, for additional communication with the registers. We have replaced all the register, \textit{pc}, \textit{offset}, memory, and \textit{ALU} references with appropriate procedure calls. Observe that the \textit{pc} update during \textit{stpc} execution is carried out via a call to \textit{ALU}.

This gives us the system

```plaintext
\mathcal{M}_6 = \text{var } j \cdot [\mathcal{M}_7 \parallel \mathcal{E}] : \\
IMEM, PC1, PC2, PCA1, PCA2, PCST, PCJMP, XOFF, \\
REGRX, REGRY, ALU, ALUF, MADD, MSTO, MLDA
```
\[ E :: \text{var } j; \text{proc EXEC } \bullet \]

\[
\begin{aligned}
| & | \text{var } j \in \text{record} \]
\text{proc EXEC}(k) = \\
(j := k; \\
\text{if } \text{alu}(k.\text{op}) \rightarrow \text{return}; \\
| | | \text{var } x, y; \text{REGRX}(x); \text{REGRY}(y); \text{ALU}(x, y, k.\text{op}) | | \\
| | | \text{ld}(k.\text{op}) \rightarrow \text{return}; \\
| | | | \text{var } x, y; \text{REGRX}(x); \text{REGRY}(y); \text{MADD}(x, y) | | \\
| | | | \text{st}(k.\text{op}) \rightarrow \text{return}; \\
| | | | | \text{var } x, y; \text{REGRX}(x); \text{REGRY}(y); \text{MSTO}(x, y) | | \\
| | | | | \text{ldx}(k.\text{op}) \rightarrow \\
| | | | | | \text{var } x, y; \text{XOFF}(x); \text{REGRY}(y); \text{MADD}(y, x) | | \\
| | | | | | \text{stx}(k.\text{op}) \rightarrow \\
| | | | | | | \text{var } x, y; \text{XOFF}(x); \text{REGRY}(y); \text{MSTO}(y, x) | | \\
| | | | | | | \text{lda}(k.\text{op}) \rightarrow \\
| | | | | | | | \text{var } x, y; \text{XOFF}(x); \text{REGRY}(y); \text{MLDA}(y, x) | | \\
| | | | | | | | \text{stpc}(k.\text{op}) \rightarrow \\
| | | | | | | | | \text{var } r, y; \text{PCST}(r); \text{REGRY}(y); \text{ALU}(r, y, add) | | \\
| | | | | | | | | \text{jmp}(k.\text{op}) \rightarrow | | | | \text{var } y; \text{REGRY}(y); \text{PCJMP}(y) | | \\
| | | | | | | | | | \text{brch}(k.\text{op}) \rightarrow | | | | | | \text{var } ff; \text{ALUF}(ff); \\
| | | | | | | | | | | \text{if } \text{cond}(ff, k.\text{cc}) \rightarrow \text{PCA1}; \text{PCA2} \\
| | | | | | | | | | | | \text{if } \neg \text{cond}(ff, k.\text{cc}) \rightarrow \text{skip} \\
| | | | | | | | | | | | fi | | \\
| | | | | | fi | | \\
| | | | | | :\text{PCA1}, \text{PCA2}, \text{PCST}, \text{PCJMP}, \text{XOFF}, \text{REGRX}, \text{REGRY}, \\
| | | | | \text{ALU}, \text{ALUF}, \text{MADD}, \text{MSTO}, \text{MLDA} \\
\end{aligned}
\]

Fig. 3. Instruction execution.

where the execution of an instruction in the variable \( i \) in \( \mathcal{M}_7 \) is now initiated via a procedure call

\( \text{EXEC}(i) \).

5.4. Refine the fetch-and-execute cycle

Finally, we refine the fetch and execute cycle to make parallel instruction fetch and execution possible. In our framework, as mentioned earlier, we have to create independent actions in order to make parallel activity possible. This calls for atomicity refinement.

5.4.1. Create FETCH

Let us first collect all the transformations above, and see what is left of the action system \( \mathcal{M}_3 \), i.e., the system \( \mathcal{M}_7 \) above. The procedures EXEC and its related ALU together with the instruction and data memories, register array, and \( p \) and offset
administration were all isolated into action systems of their own, separate from $\mathcal{M}_3$, leaving only the appropriate procedure calls behind. The result is the system $\mathcal{M}_7$ that will be from here on called $\mathcal{F}_0$ where

$$\mathcal{F}_0 :: [[\text{var } i \in \text{record}$$

$$\begin{aligned}
do & \text{true} \\
& \text{PCI1;IMEM}(i);\text{PCI2}; \\
& \text{if } \text{offset}(i.\text{op}) \rightarrow \text{PCI1;IMEM}(\text{offset});\text{PCI2} \\
& \text{[} \text{offset}(i.\text{op}) \rightarrow \text{skip} \\
& \text{fi} ; \\
& \text{EXEC}(i) \\
\od
\end{aligned}$$

]] : \text{offset,IMEM,PCI1,PCI2,EXEC}

In this system there is only one atomic action and no parallelism is possible. Hence, we split the action into two distinct parts so that $\mathcal{F}_0 \leq \mathcal{F}_1$ where

$$\mathcal{F}_1 :: [[\text{var } i \in \text{record}$$

$$\begin{aligned}
do & \text{<PCI1;IMEM}(i);\text{PCI2}; \\
& \text{if } \text{offset}(i.\text{op}) \rightarrow \text{PCI1;IMEM}(\text{offset});\text{PCI2} \\
& \text{[} \text{offset}(i.\text{op}) \rightarrow \text{skip} \\
& \text{fi} >; \\
& \text{< EXEC}(i) > \\
\od
\end{aligned}$$

]] : \text{offset,IMEM,PCI1,PCI2,EXEC}.

We have used sequential notation for $\mathcal{F}_1$ by denoting the atomicity of actions explicitly with brackets.

We have that $\mathcal{F}_1$ and $\mathcal{F}$ share no variables. They communicate through the global procedure EXEC only. When looking into the specification of the procedure EXEC we notice, that when we are executing an ALU, load or store instruction, the control returns to $\mathcal{F}_1$ immediately after the call of EXEC due to the \texttt{return} statements. At this point the next instruction is fetched from the instruction memory. Hence, the execution of these three instructions in $\mathcal{F}$ can proceed in parallel with the fetch of the next instruction in $\mathcal{F}_1$.

6. Concluding remarks

We have created the action system $\mathcal{M}_8$

$$\mathcal{M}_8 :: \text{var imem,dmem} \cup [ [ I \| \mathcal{F}_1 \| \mathcal{P} \| I \| \mathcal{F} \| A \| D \| R ] ] : < >$$
that is by construction a correct refinement of the initial high level microprocessor specification \( \mathcal{M}_0 \), i.e.,

\[
\mathcal{M}_0 \subseteq \mathcal{M}_9.
\]

At Caltech, a delay-insensitive circuit is derived from a concurrent program that is essentially the same as our resulting action system [12]. The advantage of our method is that it is based on a formal calculus for reasoning about programs, the refinement calculus.

The main method used throughout our derivation was the spatial decomposition of an action system into a parallel composition of action systems that mainly communicate via (remote) procedure calls. Hence, most of the steps we carried out are correct by construction. Only a couple of steps required more tedious proofs, i.e., those where the atomicity of the system was refined.

When we compare our system to that in [12] there are a couple of notions that are implicit in an action system. The bullet operator used in [12] corresponds to an action in the sense that when an action is chosen for execution, it is jointly executed to completion by the involved modules without interference from other actions. The point of termination for an action need not coincide for every module involved in it as long as atomicity is guaranteed. The probes in [12] are here modelled by the interplay between the caller and the callee while making procedure calls.

References


