Jeremy Siek, 4 years ago
commit d4a21c5971
1 file changed, with 311 additions and 315 deletions
book.tex  +311 -315

@@ -1397,53 +1397,38 @@ criteria in the following diagram.
  \node (o)  at (4, -2) {$n$};
 
  \path[->] (p1) edge [above] node {\footnotesize compile} (p2);
- \path[->] (p1) edge [left]  node {\footnotesize interp-\LangVar{}} (o);
- \path[->] (p2) edge [right] node {\footnotesize interp-x86} (o);
+ \path[->] (p1) edge [left]  node {\footnotesize\code{interp-Rvar}} (o);
+ \path[->] (p2) edge [right] node {\footnotesize\code{interp-x86int}} (o);
 \end{tikzpicture}
 \]
-In the next section we introduce enough of the x86 assembly
-language to compile \LangVar{}.
+In the next section we introduce the \LangXASTInt{} subset of x86 that
+suffices for compiling \LangVar{}.
 
 \section{The \LangXASTInt{} Assembly Language}
 \label{sec:x86}
 \index{x86}
 
-Figure~\ref{fig:x86-0-concrete} defines the concrete syntax for the subset of
-the x86 assembly language needed for this chapter, which we call \LangXASTInt{}.
+Figure~\ref{fig:x86-0-concrete} defines the concrete syntax for
+\LangXASTInt{}.  We use the AT\&T syntax expected by the GNU
+assembler.
 %
-An x86 program begins with a \code{main} label followed by a sequence
-of instructions.  In the grammar, ellipses such as $\ldots$ are used to
-indicate a sequence of items, e.g., $\Instr \ldots$ is a sequence of
-instructions.\index{instruction}
+A program begins with a \code{main} label followed by a sequence of
+instructions. The \key{globl} directive says that the \key{main}
+procedure is externally visible, which is necessary so that the
+operating system can call it.  In the grammar, ellipses such as
+$\ldots$ are used to indicate a sequence of items, e.g., $\Instr
+\ldots$ is a sequence of instructions.\index{instruction}
 %
-An x86 program is stored in the computer's memory and the computer has
-a \emph{program counter} (PC)\index{program counter}\index{PC}
-that points to the address of the next
-instruction to be executed. For most instructions, once the
-instruction is executed, the program counter is incremented to point
-to the immediately following instruction in memory. Most x86
+An x86 program is stored in the computer's memory.  For our purposes,
+the computer's memory is a mapping of 64-bit addresses to 64-bit
+values.  The computer has a \emph{program counter} (PC)\index{program
+  counter}\index{PC} stored in the \code{rip} register that points to
+the address of the next instruction to be executed.  For most
+instructions, the program counter is incremented after the instruction
+is executed, so it points to the next instruction in memory.  Most x86
 instructions take two operands, where each operand is either an
-integer constant (called \emph{immediate value}\index{immediate value}),
-a \emph{register}\index{register}, or a memory location.
-A register is a special kind of variable. Each
-one holds a 64-bit value; there are 16 registers in the computer and
-their names are given in Figure~\ref{fig:x86-0-concrete}. The computer's memory
-as a mapping of 64-bit addresses to 64-bit values%
-\footnote{This simple story suffices for describing how sequential
-  programs access memory but is not sufficient for multi-threaded
-  programs. However, multi-threaded execution is beyond the scope of
-  this book.}.
-%
-We use the AT\&T syntax expected by the GNU assembler, which comes
-with the \key{gcc} compiler that we use for compiling assembly code to
-machine code.
-%
-Appendix~\ref{sec:x86-quick-reference} is a quick-reference for all of
-the x86 instructions used in this book.
-
-
-% to do: finish treatment of imulq
-% it's needed for vector's in Rany/Rdyn
+integer constant (called \emph{immediate value}\index{immediate
+  value}), a \emph{register}\index{register}, or a memory location.
 
 \newcommand{\allregisters}{\key{rsp} \mid \key{rbp} \mid \key{rax} \mid \key{rbx} \mid \key{rcx}
               \mid \key{rdx} \mid \key{rsi} \mid \key{rdi} \mid \\
@@ -1474,17 +1459,21 @@ the x86 instructions used in this book.
 \label{fig:x86-0-concrete}
 \end{figure}
 
+A register is a special kind of variable. Each one holds a 64-bit
+value; there are 16 general-purpose registers in the computer and
+their names are given in Figure~\ref{fig:x86-0-concrete}.  A register
+is written with a \key{\%} followed by the register name, such as
+\key{\%rax}.
+
 An immediate value is written using the notation \key{\$}$n$ where $n$
 is an integer.
 %
-A register is written with a \key{\%} followed by the register name,
-such as \key{\%rax}.
 %
 An access to memory is specified using the syntax $n(\key{\%}r)$,
 which obtains the address stored in register $r$ and then adds $n$
-bytes to the address. The resulting address is used to either load or
-store to memory depending on whether it occurs as a source or
-destination argument of an instruction.
+bytes to the address. The resulting address is used to load or store
+to memory depending on whether it occurs as a source or destination
+argument of an instruction.
 
 An arithmetic instruction such as $\key{addq}\,s\key{,}\,d$ reads from the
 source $s$ and destination $d$, applies the arithmetic operation, then
@@ -1497,31 +1486,28 @@ The $\key{callq}\,\itm{label}$ instruction jumps to the procedure
 specified by the label and $\key{retq}$ returns from a procedure to
 its caller. 
 %
-We discuss procedure calls in more detail later in this
-chapter and in Chapter~\ref{ch:functions}. The
-$\key{jmp}\,\itm{label}$ instruction updates the program counter to
-the address of the instruction after the specified label.
-
-Figure~\ref{fig:p0-x86} depicts an x86 program that is equivalent
-to \code{(+ 10 32)}. The \key{globl} directive says that the
-\key{main} procedure is externally visible, which is necessary so
-that the operating system can call it. The label \key{main:}
-indicates the beginning of the \key{main} procedure which is where
-the operating system starts executing this program.  The instruction
-\lstinline{movq $10, %rax} puts $10$ into register \key{rax}. The
-following instruction \lstinline{addq $32, %rax} adds $32$ to the
-$10$ in \key{rax} and puts the result, $42$, back into
-  \key{rax}.
+We discuss procedure calls in more detail later in this chapter and in
+Chapter~\ref{ch:functions}. The instruction $\key{jmp}\,\itm{label}$
+updates the program counter to the address of the instruction after
+the specified label.
+
+Appendix~\ref{sec:x86-quick-reference} contains a quick reference for
+all of the x86 instructions used in this book.
+
+Figure~\ref{fig:p0-x86} depicts an x86 program that is equivalent to
+\code{(+ 10 32)}. The instruction \lstinline{movq $10, %rax}
+puts $10$ into register \key{rax} and then \lstinline{addq $32, %rax}
+adds $32$ to the $10$ in \key{rax} and
+puts the result, $42$, back into \key{rax}.
 %
 The last instruction, \key{retq}, finishes the \key{main} function by
 returning the integer in \key{rax} to the operating system. The
 operating system interprets this integer as the program's exit
 code. By convention, an exit code of 0 indicates that a program
 completed successfully, and all other exit codes indicate various
-errors.  Nevertheless, we return the result of the program as the exit
-code.
+errors. Nevertheless, in this book we return the result of the program
+as the exit code.
 
-%\begin{wrapfigure}{r}{2.25in}
 \begin{figure}[tbp]
 \begin{lstlisting}
 	.globl main
@@ -1532,14 +1518,13 @@ main:
 \end{lstlisting}
 \caption{An x86 program equivalent to \code{(+ 10 32)}.}
 \label{fig:p0-x86}
-%\end{wrapfigure}
 \end{figure}
 
-Unfortunately, x86 varies in a couple ways depending on what operating
-system it is assembled in. The code examples shown here are correct on
-Linux and most Unix-like platforms, but when assembled on Mac OS X,
-labels like \key{main} must be prefixed with an underscore, as in
-\key{\_main}.
+The x86 assembly language varies in a couple of ways depending on the
+operating system it is assembled on. The code examples shown here are
+correct on Linux and most Unix-like platforms, but when assembled on
+Mac OS X, labels like \key{main} must be prefixed with an underscore,
+as in \key{\_main}.
 
 We exhibit the use of memory for storing intermediate results in the
 next example.  Figure~\ref{fig:p1-x86} lists an x86 program that is
@@ -1560,9 +1545,10 @@ jumping to the procedure.  The register \key{rbp} is the \emph{base
   pointer}\index{base pointer} and is used to access variables that
 are stored in the frame of the current procedure call.  The base
 pointer of the caller is pushed onto the stack after the return
-address. In Figure~\ref{fig:frame} we number the variables from $1$ to
-$n$. Variable $1$ is stored at address $-8\key{(\%rbp)}$, variable $2$
-at $-16\key{(\%rbp)}$, etc.
+address and then the base pointer is set to the location of the old
+base pointer. In Figure~\ref{fig:frame} we number the variables from
+$1$ to $n$. Variable $1$ is stored at address $-8\key{(\%rbp)}$,
+variable $2$ at $-16\key{(\%rbp)}$, etc.
 
 \begin{figure}[tbp]
 \begin{lstlisting}
@@ -1584,7 +1570,7 @@ conclusion:
 	popq	%rbp
 	retq
 \end{lstlisting}
-\caption{An x86 program equivalent to \code{(+ 10 32)}.}
+\caption{An x86 program equivalent to \code{(+ 52 (- 10))}.}
 \label{fig:p1-x86}
 \end{figure}
 
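As an illustration of the two programs above, the following minimal Racket sketch steps through their arithmetic instructions and tracks the contents of registers and memory. It is a hypothetical toy, not the interp-x86int interpreter from the support code: the S-expression operand encoding ((imm n) for $n, (reg r) for %r, (deref r n) for n(%r)), the function name run, and the arbitrary starting address 1000 in rbp are assumptions made only for this example. Following the text, memory is a map from addresses to values (a mutable hash here), although Racket's unbounded integers do not model 64-bit overflow. The prelude, conclusion, and retq are left out, and the final value of rax stands in for the exit code.

```racket
#lang racket
;; Hypothetical toy interpreter for a few x86 instructions.
;; Operands: (imm n) is $n, (reg r) is %r, (deref r n) is n(%r).

(define (run instrs)
  ;; rbp starts at an arbitrary address; other registers default to 0.
  (define regs (make-hash (list (cons 'rbp 1000))))
  ;; Memory: a map from addresses to values.
  (define mem (make-hash))
  ;; n(%r) denotes the address stored in register r plus n bytes.
  (define (address r n) (+ (hash-ref regs r 0) n))
  ;; Reading an operand: a constant, a register, or a memory load.
  (define (read-arg a)
    (match a
      [`(imm ,n) n]
      [`(reg ,r) (hash-ref regs r 0)]
      [`(deref ,r ,n) (hash-ref mem (address r n) 0)]))
  ;; Writing an operand: a register update or a memory store.
  (define (write-arg! a v)
    (match a
      [`(reg ,r) (hash-set! regs r v)]
      [`(deref ,r ,n) (hash-set! mem (address r n) v)]))
  (for ([instr (in-list instrs)])
    (match instr
      ;; AT&T order: source first, destination second.
      [`(movq ,s ,d) (write-arg! d (read-arg s))]
      [`(addq ,s ,d) (write-arg! d (+ (read-arg d) (read-arg s)))]
      [`(subq ,s ,d) (write-arg! d (- (read-arg d) (read-arg s)))]
      [`(negq ,d) (write-arg! d (- (read-arg d)))]))
  ;; The value left in rax plays the role of the exit code.
  (hash-ref regs 'rax 0))

;; Body of the program for (+ 10 32).
(run '((movq (imm 10) (reg rax))
       (addq (imm 32) (reg rax))))          ; => 42

;; Instructions under the start label of the program for (+ 52 (- 10)).
(run '((movq (imm 10) (deref rbp -8))
       (negq (deref rbp -8))
       (movq (deref rbp -8) (reg rax))
       (addq (imm 52) (reg rax))))           ; => 42
```

Both calls evaluate to 42. The addq, subq, and negq clauses read their destination operand before writing it, mirroring the description of arithmetic instructions that read both source and destination.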
@@ -1616,62 +1602,52 @@ alignment (because the \code{callq} pushed the return address).  The
 first three instructions are the typical \emph{prelude}\index{prelude}
 for a procedure.  The instruction \code{pushq \%rbp} saves the base
 pointer for the caller onto the stack and subtracts $8$ from the stack
-pointer. At this point the stack pointer is back to being 16-byte
-aligned. The second instruction \code{movq \%rsp, \%rbp} changes the
+pointer. The second instruction \code{movq \%rsp, \%rbp} changes the
 base pointer so that it points to the location of the old base
 pointer. The instruction \code{subq \$16, \%rsp} moves the stack
 pointer down to make enough room for storing variables.  This program
-needs one variable ($8$ bytes) but we round up to 16 bytes to maintain
-the 16-byte alignment of the \code{rsp}. With the \code{rsp} aligned,
-we are ready to make calls to other functions. The last instruction of
-the prelude is \code{jmp start}, which transfers control to the
-instructions that were generated from the Racket expression \code{(+
-  10 32)}.
-
-The four instructions under the label \code{start} carry out the work
-of computing \code{(+ 52 (- 10)))}.
-%
-The first instruction \code{movq \$10, -8(\%rbp)} stores $10$ in
-variable $1$.
+needs one variable ($8$ bytes) but we round up to 16 bytes so that
+\code{rsp} is 16-byte aligned and we're ready to make calls to other
+functions. The last instruction of the prelude is \code{jmp start},
+which transfers control to the instructions that were generated from
+the Racket expression \code{(+ 52 (- 10))}.
+
+The first instruction under the \code{start} label is
+\code{movq \$10, -8(\%rbp)}, which stores $10$ in variable $1$.
 %
 The instruction \code{negq -8(\%rbp)} changes variable $1$ to $-10$.
 %
-The following instruction moves the $-10$ from variable $1$ into the
+The next instruction moves the $-10$ from variable $1$ into the
 \code{rax} register.  Finally, \code{addq \$52, \%rax} adds $52$ to
 the value in \code{rax}, updating its contents to $42$.
 
 The three instructions under the label \code{conclusion} are the
 typical \emph{conclusion}\index{conclusion} of a procedure.  The first
-two instructions are necessary to get the state of the machine back to
-where it was at the beginning of the procedure.  The instruction
-\key{addq \$16, \%rsp} moves the stack pointer back to point at the
-old base pointer. The amount added here needs to match the amount that
-was subtracted in the prelude of the procedure. Then \key{popq \%rbp}
-returns the old base pointer to \key{rbp} and adds $8$ to the stack
-pointer.  The last instruction, \key{retq}, jumps back to the
-procedure that called this one and adds 8 to the stack pointer, which
-returns the stack pointer to where it was prior to the procedure call.
+two instructions restore the \code{rsp} and \code{rbp} registers to
+the state they were in at the beginning of the procedure.  The
+instruction \key{addq \$16, \%rsp} moves the stack pointer back to
+point at the old base pointer. Then \key{popq \%rbp} returns the old
+base pointer to \key{rbp} and adds $8$ to the stack pointer.  The last
+instruction, \key{retq}, jumps back to the procedure that called this
+one and adds $8$ to the stack pointer.
 
 The compiler needs a convenient representation for manipulating x86
 programs, so we define an abstract syntax for x86 in
 Figure~\ref{fig:x86-0-ast}. We refer to this language as
-\LangXASTInt{} with a subscript $0$ because later we introduce
-extended versions of this assembly language. The main difference
-compared to the concrete syntax of x86
-(Figure~\ref{fig:x86-0-concrete}) is that it does not allow labeled
-instructions to appear anywhere, but instead organizes instructions
-into a group called a \emph{block}\index{block}\index{basic block} and
-associates a label with every block, which is why the \key{X86Program}
-struct includes an alist mapping labels to blocks. The reason for
-using blocks and a control-flow graph becomes apparent in
-Chapter~\ref{ch:bool-types} when we introduce conditional
-branching. The \code{Block} structure includes an $\itm{info}$ field
-that is not needed for this chapter, but will become useful in
-Chapter~\ref{ch:register-allocation-r1}.  For now, the $\itm{info}$
-field should just contain an empty list. Also, regarding the abstract
-syntax for \code{callq}, the \code{Callq} struct includes an integer
-for representing the arity of the function, i.e., the number of
-arguments, which is helpful to know during register allocation
+\LangXASTInt{}. The main difference compared to the concrete syntax of
+\LangXInt{} (Figure~\ref{fig:x86-0-concrete}) is that labels are not
+allowed in front of every instruction. Instead, instructions are
+grouped into \emph{blocks}\index{block}\index{basic block} with a
+label associated with every block, which is why the \key{X86Program}
+struct includes an alist mapping labels to blocks. The reason for this
+organization becomes apparent in Chapter~\ref{ch:bool-types} when we
+introduce conditional branching. The \code{Block} structure includes
+an $\itm{info}$ field that is not needed for this chapter, but becomes
+useful in Chapter~\ref{ch:register-allocation-r1}.  For now, the
+$\itm{info}$ field should contain an empty list. Also, regarding the
+abstract syntax for \code{callq}, the \code{Callq} struct includes an
+integer for representing the arity of the function, i.e., the number
+of arguments, which is helpful to know during register allocation
 (Chapter~\ref{ch:register-allocation-r1}).
 
 \begin{figure}[tp]
@@ -1683,10 +1659,10 @@ arguments, which is helpful to know during register allocation
 \Reg &::=& \allregisters{} \\
 \Arg &::=&  \IMM{\Int} \mid \REG{\Reg}
    \mid \DEREF{\Reg}{\Int} \\
-\Instr &::=& \BININSTR{\code{'addq}}{\Arg}{\Arg} 
-       \mid \BININSTR{\code{'subq}}{\Arg}{\Arg} \\
-       &\mid& \BININSTR{\code{'movq}}{\Arg}{\Arg}
-       \mid \UNIINSTR{\code{'negq}}{\Arg}\\
+\Instr &::=& \BININSTR{\code{addq}}{\Arg}{\Arg} 
+       \mid \BININSTR{\code{subq}}{\Arg}{\Arg} \\
+       &\mid& \BININSTR{\code{movq}}{\Arg}{\Arg}
+       \mid \UNIINSTR{\code{negq}}{\Arg}\\
        &\mid& \CALLQ{\itm{label}}{\itm{int}} \mid \RETQ{} 
        \mid \PUSHQ{\Arg} \mid \POPQ{\Arg} \mid \JMP{\itm{label}} \\
 \Block &::= & \BLOCK{\itm{info}}{\LP\Instr\ldots\RP} \\
@@ -1724,8 +1700,8 @@ and x86 assembly? Here are some of the most important ones:
   \LangVar{} the order of evaluation is a left-to-right depth-first
   traversal of the abstract syntax tree.
 
-\item[(d)] An \LangVar{} program can have any number of variables whereas
-  x86 has 16 registers and the procedure calls stack.
+\item[(d)] A program in \LangVar{} can have any number of variables
+  whereas x86 has 16 registers and the procedure call stack.
 
 \item[(e)] Variables in \LangVar{} can shadow other variables with the
   same name. In x86, registers have unique names and memory locations
@@ -1737,103 +1713,86 @@ the problem into several steps, dealing with the above differences one
 at a time.  Each of these steps is called a \emph{pass} of the
 compiler.\index{pass}\index{compiler pass}
 %
-This terminology comes from each step passing over the AST of the
-program.
+This terminology comes from the way each step passes over the AST of
+the program.
 %
 We begin by sketching how we might implement each pass, and give them
 names.  We then figure out an ordering of the passes and the
-input/output language for each pass. The very first pass has \LangVar{} as
-its input language and the last pass has x86 as its output
-language. In between we can choose whichever language is most
-convenient for expressing the output of each pass, whether that be
-\LangVar{}, x86, or new \emph{intermediate languages} of our own design.
-Finally, to implement each pass we write one recursive function per
-non-terminal in the grammar of the input language of the pass.
-\index{intermediate language}
+input/output language for each pass. The very first pass has
+\LangVar{} as its input language and the last pass has \LangXInt{} as
+its output language. In between we can choose whichever language is
+most convenient for expressing the output of each pass, whether that
+be \LangVar{}, \LangXInt{}, or new \emph{intermediate languages} of
+our own design.  Finally, to implement each pass we write one
+recursive function per non-terminal in the grammar of the input
+language of the pass.  \index{intermediate language}
 
 \begin{description}
-\item[Pass \key{select-instructions}] handles the difference between
-  \LangVar{} operations and x86 instructions we convert each \LangVar{}
-  operation to a short sequence of instructions that accomplishes the
-  same task.
-
-\item[Pass \key{remove-complex-opera*}] ensures that each
-  subexpression (i.e. operator and operand, and hence the name
-  \key{opera*}) is an \emph{atomic} expression (a variable or
-  integer), we introduce temporary variables to hold the results
-  of subexpressions.\index{atomic expression}
+\item[\key{select-instructions}] handles the difference between
+  \LangVar{} operations and x86 instructions. This pass converts each
+  \LangVar{} operation to a short sequence of instructions that
+  accomplishes the same task.
+
+\item[\key{remove-complex-opera*}] ensures that each subexpression of
+  a primitive operation is a variable or integer, that is, an
+  \emph{atomic} expression. We refer to non-atomic expressions as
+  \emph{complex}.  This pass introduces temporary variables to hold
+  the results of complex subexpressions.\index{atomic
+    expression}\index{complex expression}%
+  \footnote{The subexpressions of an operation are often called
+    operators and operands, which explains the presence of
+    \code{opera*} in the name of this pass.}
   
-\item[Pass \key{explicate-control}] makes the execution order of the
-  program explicit, we convert from the abstract syntax tree
-  representation into a control-flow graph in which each node contains
-  a sequence of statements and the edges between nodes say which nodes
-  contain jumps to other nodes.
+\item[\key{explicate-control}] makes the execution order of the
+  program explicit. It converts the abstract syntax tree representation
+  into a control-flow graph in which each node contains a sequence of
+  statements and the edges between nodes say which nodes contain jumps
+  to other nodes.
 
-\item[Pass \key{assign-homes}] assigns the variables in \LangVar{} to
+\item[\key{assign-homes}] replaces the variables in \LangVar{} with
   registers or stack locations in x86.
 
-\item[Pass \key{uniquify}] deals with the shadowing of variables by
+\item[\key{uniquify}] deals with the shadowing of variables by
   renaming every variable to a unique name.
 \end{description}
 
 The next question is: in what order should we apply these passes? This
 question can be challenging because it is difficult to know ahead of
-time which orders will be better (easier to implement, produce more
+time which orderings will be better (easier to implement, produce more
 efficient code, etc.) so oftentimes trial-and-error is
 involved. Nevertheless, we can try to plan ahead and make educated
 choices regarding the ordering.
 
-%% Let us consider the ordering of \key{uniquify} and
-%% \key{remove-complex-opera*}. The assignment of subexpressions to
-%% temporary variables involves introducing new variables and moving
-%% subexpressions, which might change the shadowing of variables and
-%% inadvertently change the behavior of the program.  But if we apply
-%% \key{uniquify} first, this will not be an issue. Of course, this means
-%% that in \key{remove-complex-opera*}, we need to ensure that the
-%% temporary variables that it creates are unique.
-
 What should be the ordering of \key{explicate-control} with respect to
 \key{uniquify}? The \key{uniquify} pass should come first because
 \key{explicate-control} changes all the \key{let}-bound variables to
 become local variables whose scope is the entire program, which would
 confuse variables with the same name.
 %
-Likewise, we place \key{remove-complex-opera*} before
-\key{explicate-control} because \key{explicate-control} removes the
-\key{let} form, but it is convenient to use \key{let} in the output of
-\key{remove-complex-opera*}.
+We place \key{remove-complex-opera*} before \key{explicate-control}
+because the latter removes the \key{let} form, but it is convenient to
+use \key{let} in the output of \key{remove-complex-opera*}.
 %
-The ordering of \key{uniquify} and \key{remove-complex-opera*} does
-not matter, so we arbitrarily choose \key{uniquify} to come first.
-
-%% Regarding \key{assign-homes}, it is helpful to place
-%% \key{explicate-control} first because \key{explicate-control} changes
-%% \key{let}-bound variables into program-scope variables.  This means
-%% that the \key{assign-homes} pass can read off the variables from the
-%% $\itm{info}$ of the \key{Program} AST node instead of traversing the
-%% entire program in search of \key{let}-bound variables.
+The ordering of \key{uniquify} with respect to
+\key{remove-complex-opera*} does not matter, so we arbitrarily choose
+\key{uniquify} to come first.
 
 Last, we consider \key{select-instructions} and \key{assign-homes}.
-These two passes are intertwined, creating a Gordian Knot. To do a
-good job of assigning homes, it is helpful to have already determined
-which instructions will be used, because x86 instructions have
-restrictions about which of their arguments can be registers versus
-stack locations. One might want to give preferential treatment to
-variables that occur in register-argument positions. On the other
-hand, it may turn out to be impossible to make sure that all such
-variables are assigned to registers, and then one must redo the
-selection of instructions. A sophisticated solution to this problem is
-to iteratively repeat the two passes until a good solution is found.
-To reduce implementation complexity, we recommend a simpler approach
-in which \key{select-instructions} comes first, followed by the
-\key{assign-homes}, then a third pass named \key{patch-instructions}
-that uses a reserved register to patch-up outstanding problems
-regarding instructions with too many memory accesses.
-
-%% The disadvantage of this approach is some programs may not execute
-%% as efficiently as they would if we used the iterative approach and
-%% used all of the registers for variables.
-
+These two passes are intertwined. In Chapter~\ref{ch:functions} we
+learn that, in x86, registers are used for passing arguments to
+functions and it is preferable to assign parameters to their
+corresponding registers, which argues for selecting instructions
+before assigning homes. On the other hand, by selecting instructions
+first we may run into a dead end in \key{assign-homes}. Recall that
+only one argument of an x86 instruction may be a memory access, but
+\key{assign-homes} might fail to assign either argument to a
+register.
+%
+A sophisticated approach is to iteratively repeat the two passes until
+a solution is found. However, to reduce implementation complexity we
+recommend a simpler approach in which \key{select-instructions} comes
+first, followed by \key{assign-homes}, then a third pass named
+\key{patch-instructions} that uses a reserved register to fix
+outstanding problems.
 
 \begin{figure}[tbp]
 \begin{tikzpicture}[baseline=(current  bounding  box.center)]
@@ -1862,27 +1821,27 @@ regarding instructions with too many memory accesses.
 \end{figure}
 
 Figure~\ref{fig:Rvar-passes} presents the ordering of the compiler
-passes in the form of a graph. Each pass is an edge and the
-input/output language of each pass is a node in the graph.  The output
-of \key{uniquify} and \key{remove-complex-opera*} are programs that
-are still in the \LangVar{} language, through the output of the later
-is a subset of \LangVar{}, a language we name \LangVarANF{} and
-describe in Section~\ref{sec:remove-complex-opera-Rvar}.
-%
-The output of \key{explicate-control} is in an intermediate language
-\LangCVar{} designed to make the order of evaluation explicit in its
-syntax, which we introduce in the next section. The
-\key{select-instruction} pass translates from \LangCVar{} to a variant
-of x86. The \key{assign-homes} and \key{patch-instructions} passes
-input and output variants of x86 assembly. The last pass in
-Figure~\ref{fig:Rvar-passes} is \key{print-x86}, which converts from the
-abstract syntax of \LangXASTInt{} to the concrete syntax.
-
-In the next sections we discuss the \LangCVar{} language and the
-\LangXVar{} and \LangXInt{} dialects of x86.  The
-remainder of this chapter gives hints regarding the implementation of
-each of the compiler passes in Figure~\ref{fig:Rvar-passes}.
+passes and identifies the input and output language of each pass.  The
+last pass, \key{print-x86}, converts from the abstract syntax of
+\LangXASTInt{} to the concrete syntax.  In the following two sections
+we discuss the \LangCVar{} intermediate language and the \LangXVar{}
+dialect of x86. The remainder of this chapter gives hints regarding
+the implementation of each of the compiler passes in
+Figure~\ref{fig:Rvar-passes}.
+
+%% The output of \key{uniquify} and \key{remove-complex-opera*}
+%% are programs that are still in the \LangVar{} language, though the
+%% output of the later is a subset of \LangVar{} named \LangVarANF{}
+%% (Section~\ref{sec:remove-complex-opera-Rvar}).
+%% %
+%% The output of \key{explicate-control} is in an intermediate language
+%% \LangCVar{} designed to make the order of evaluation explicit in its
+%% syntax, which we introduce in the next section. The
+%% \key{select-instruction} pass translates from \LangCVar{} to
+%% \LangXVar{}. The \key{assign-homes} and
 
+%% \key{patch-instructions}
+%% passes input and output variants of x86 assembly.
 
 \subsection{The \LangCVar{} Intermediate Language}
 
@@ -1893,28 +1852,24 @@ abstract syntax for \LangCVar{} is defined in Figure~\ref{fig:c0-syntax}.
 (The concrete syntax for \LangCVar{} is in the Appendix,
 Figure~\ref{fig:c0-concrete-syntax}.)
 %
-The \LangCVar{} language supports the same operators as \LangVar{} but the
-arguments of operators are restricted to atomic expressions (variables
-and integers), thanks to the \key{remove-complex-opera*} pass. Instead
-of \key{Let} expressions, \LangCVar{} has assignment statements which can be
-executed in sequence using the \key{Seq} form. A sequence of
-statements always ends with \key{Return}, a guarantee that is baked
-into the grammar rules for the \itm{tail} non-terminal. The naming of
-this non-terminal comes from the term \emph{tail position}\index{tail position},
-which refers to an expression that is the last one to execute within a
-function. (An expression in tail position may contain subexpressions,
-and those may or may not be in tail position depending on the kind of
-expression.)
-
-A \LangCVar{} program consists of a control-flow graph (represented as an
-alist mapping labels to tails). This is more general than
-necessary for the present chapter, as we do not yet need to introduce
-\key{goto} for jumping to labels, but it saves us from having to
-change the syntax of the program construct in
+The \LangCVar{} language supports the same operators as \LangVar{} but
+the arguments of operators are restricted to atomic
+expressions. Instead of \key{let} expressions, \LangCVar{} has
+assignment statements which can be executed in sequence using the
+\key{Seq} form. A sequence of statements always ends with
+\key{Return}, a guarantee that is baked into the grammar rules for
+\itm{tail}. The naming of this non-terminal comes from the term
+\emph{tail position}\index{tail position}, which refers to an
+expression that is the last one to execute within a function.
+
+A \LangCVar{} program consists of a control-flow graph represented as
+an alist mapping labels to tails. This is more general than necessary
+for the present chapter, as we do not yet introduce \key{goto} for
+jumping to labels, but it saves us from having to change the syntax in
 Chapter~\ref{ch:bool-types}.  For now there will be just one label,
 \key{start}, and the whole program is its tail.
 %
-The $\itm{info}$ field of the \key{Program} form, after the
+The $\itm{info}$ field of the \key{CProgram} form, after the
 \key{explicate-control} pass, contains a mapping from the symbol
 \key{locals} to a list of variables, that is, a list of all the
 variables used in the program. At the start of the program, these
@@ -1940,24 +1895,29 @@ assignment.
 \label{fig:c0-syntax}
 \end{figure}
 
+The definitional interpreter for \LangCVar{} is in the support code
+for this book, in the file \code{interp-Cvar.rkt}. The support code is
+in a \code{github} repository at the following URL:
+\begin{center}\footnotesize
+  \url{https://github.com/IUCompilerCourse/public-student-support-code}
+\end{center}
 
-\subsection{The dialects of x86}
 
-The \LangXVar{} language, pronounced ``pseudo x86'', is the output of
-the pass \key{select-instructions}. It extends \LangXASTInt{} with an
-unbounded number of program-scope variables and it does not have
-special-case rules regarding instruction arguments. The
-x86$^{\dagger}$ language, the output of \key{print-x86}, is the
-concrete syntax for x86.
+\subsection{The \LangXVar{} dialect}
+
+The \LangXVar{} language, which we call ``pseudo x86'', is the output
+of the pass \key{select-instructions}. It extends \LangXASTInt{} with
+an unbounded number of program-scope variables and removes the
+restrictions regarding instruction arguments.
 
 
 \section{Uniquify Variables}
-\label{sec:uniquify-s0}
+\label{sec:uniquify-Rvar}
 
-The \code{uniquify} pass compiles \LangVar{} programs into \LangVar{} programs
-in which every \key{let} uses a unique variable name. For example, the
-\code{uniquify} pass should translate the program on the left into the
-program on the right. \\
+The \code{uniquify} pass compiles \LangVar{} programs into \LangVar{}
+programs in which every \key{let} binds a unique variable name. For
+example, the \code{uniquify} pass should translate the program on the
+left into the program on the right. \\
 \begin{tabular}{lll}
 \begin{minipage}{0.4\textwidth}
 \begin{lstlisting}
@@ -2000,72 +1960,71 @@ $\Rightarrow$
 \end{tabular}
 
 We recommend implementing \code{uniquify} by creating a structurally
-recursive function function named \code{uniquify-exp} that mostly just
-copies the input program. However, when encountering a \key{let}, it
-should generate a unique name for the variable (the Racket function
-\code{gensym} is handy for this) and associate the old name with the
-new unique name in an alist. The \code{uniquify-exp} function will
-need to access this alist when it gets to a variable reference, so we
-add another parameter to \code{uniquify-exp} for the alist.
+recursive function named \code{uniquify-exp} that mostly just copies
+an expression. However, when encountering a \key{let}, it should
+generate a unique name for the variable and associate the old name
+with the new name in an alist.\footnote{The Racket function
+  \code{gensym} is handy for generating unique variable names.} The
+\code{uniquify-exp} function needs to access this alist when it gets
+to a variable reference, so we add a parameter to \code{uniquify-exp}
+for the alist.
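
The alist bookkeeping described above can be sketched as follows,
assuming that the environment maps each old variable name to its new
name. The helpers \code{extend-env} and \code{lookup-env} are
hypothetical names used only for this illustration; they are not part
of the support code.
\begin{lstlisting}
#lang racket
;; Hypothetical helpers: env is an alist from old names to new names.

;; Add a binding from old name x to new-x at the front of the alist.
(define (extend-env env x new-x)
  (cons (cons x new-x) env))

;; Return the new name for x; assq returns the first (innermost) match.
(define (lookup-env env x)
  (match (assq x env)
    [(cons _ new-x) new-x]
    [#f (error 'lookup-env "unbound variable ~a" x)]))

;; Two nested lets that both bind x get distinct fresh names.
(define x-outer (gensym 'x))
(define x-inner (gensym 'x))
(define env (extend-env (extend-env '() 'x x-outer) 'x x-inner))
(lookup-env env 'x) ; returns x-inner
\end{lstlisting}
Because \code{extend-env} puts new bindings at the front of the alist,
a variable reference inside an inner \key{let} finds the inner binding
first. This matches the shadowing behavior of \LangVar{}.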
 
 The skeleton of the \code{uniquify-exp} function is shown in
-Figure~\ref{fig:uniquify-s0}.  The function is curried so that it is
+Figure~\ref{fig:uniquify-Rvar}.  The function is curried so that it is
 convenient to partially apply it to an alist and then apply it to
 different expressions, as in the last clause for primitive operations
-in Figure~\ref{fig:uniquify-s0}.  The
+in Figure~\ref{fig:uniquify-Rvar}.  The
+%
 \href{https://docs.racket-lang.org/reference/for.html#%28form._%28%28lib._racket%2Fprivate%2Fbase..rkt%29._for%2Flist%29%29}{\key{for/list}}
-  form is useful for applying a function to each element of a list to
-  produce a new list.  \index{for/list}
+%
+form of Racket is useful for transforming each element of a list to
+produce a new list.\index{for/list}
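
As a small illustration, \code{(for/list ([e es]) (f e))} produces the
same list as \code{(map f es)}. The function \code{negate-all} below is
a hypothetical example and is not part of the support code.
\begin{lstlisting}
#lang racket
;; for/list evaluates its body once per element of ns and collects
;; the results, in order, into a new list.
(define (negate-all ns)
  (for/list ([n ns])
    (- n)))

(negate-all '(1 2 3)) ; => '(-1 -2 -3)
(map - '(1 2 3))      ; the same list, computed with map
\end{lstlisting}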
 
 \begin{exercise}
 \normalfont % I don't like the italics for exercises. -Jeremy
 
-Complete the \code{uniquify} pass by filling in the blanks, that is,
-implement the clauses for variables and for the \key{let} form.
+Complete the \code{uniquify} pass by filling in the blanks in
+Figure~\ref{fig:uniquify-Rvar}, that is, implement the clauses for
+variables and for the \key{let} form in the file \code{compiler.rkt}
+in the support code.
 \end{exercise}
 
 \begin{figure}[tbp]
 \begin{lstlisting}
-   (define (uniquify-exp env)
-     (lambda (e)
-       (match e
-         [(Var x) ___]
-         [(Int n) (Int n)]
-         [(Let x e body) ___]
-         [(Prim op es)
-          (Prim op (for/list ([e es]) ((uniquify-exp env) e)))]
-         )))
+(define (uniquify-exp env)
+  (lambda (e)
+    (match e
+      [(Var x) ___]
+      [(Int n) (Int n)]
+      [(Let x e body) ___]
+      [(Prim op es)
+       (Prim op (for/list ([e es]) ((uniquify-exp env) e)))])))
 
-   (define (uniquify p)
-     (match p
-       [(Program '() e) (Program '() ((uniquify-exp '()) e))]))
+(define (uniquify p)
+  (match p
+    [(Program '() e) (Program '() ((uniquify-exp '()) e))]))
 \end{lstlisting}
 \caption{Skeleton for the \key{uniquify} pass.}
-\label{fig:uniquify-s0}
+\label{fig:uniquify-Rvar}
 \end{figure}
 
 \begin{exercise}
 \normalfont % I don't like the italics for exercises. -Jeremy
 
-Test your \key{uniquify} pass by creating five example \LangVar{} programs.
-Check whether the output programs produce the same result as the input
-programs. The \LangVar{} programs should be designed to test the most
-interesting parts of the \key{uniquify} pass, that is, the programs
-should include \key{let} forms, variables, and variables that
-overshadow each other.  The five programs should be in a subdirectory
-named \key{tests} and they should have the same file name except for a
-different integer at the end of the name, followed by the ending
-\key{.rkt}. Use the \key{interp-tests} function
+Create five \LangVar{} programs that exercise the most interesting parts
+of the \key{uniquify} pass; that is, the programs should include
+\key{let} forms, variables, and variables that shadow each other.
+The five programs should be placed in the subdirectory named
+\key{tests} and the file names should start with \code{var\_test\_}
+followed by a unique integer and end with the file extension
+\key{.rkt}. Run the \key{run-tests.rkt} script in the support code to
+check whether the output programs produce the same result as the input
+programs. The script uses the \key{interp-tests} function
 (Appendix~\ref{appendix:utilities}) from \key{utilities.rkt} to test
-your \key{uniquify} pass on the example programs.  See the
-\key{run-tests.rkt} script in the support code for an example of how
-to use \key{interp-tests}. The support code is in a \code{github}
-repository at the following URL:
-\begin{center}\footnotesize
-  \url{https://github.com/IUCompilerCourse/public-student-support-code}
-\end{center}
+your \key{uniquify} pass on the example programs.
 \end{exercise}
 
+
 \section{Remove Complex Operands}
 \label{sec:remove-complex-opera-Rvar}
 
@@ -2117,7 +2076,7 @@ R^{\dagger}_1  &::=& \PROGRAM{\code{'()}}{\Exp}
 \end{figure}
 
 Figure~\ref{fig:r1-anf-syntax} presents the grammar for the output of
-this pass, the language \LangVarANF{}. The main difference is that
+this pass, the language \LangVarANF{}. The only difference is that
 operator arguments are required to be atomic expressions.  In the
 literature, this is called \emph{administrative normal form}, or ANF
 for short~\citep{Danvy:1991fk,Flanagan:1993cg}.  \index{administrative
@@ -2161,9 +2120,9 @@ tmp.1
 \end{minipage}
 \end{tabular}
 
-Take special care of programs such as the next one that \key{let}-bind
-variables with integers or other variables. You should leave them
-unchanged, as shown in to the program on the right \\
+Take special care of programs such as the following one that binds a
+variable to an atomic expression. You should leave such variable
+bindings unchanged, as shown in the program on the right. \\
 \begin{tabular}{lll}
 \begin{minipage}{0.4\textwidth}
 % s0_20.rkt
@@ -2185,7 +2144,7 @@ $\Rightarrow$
 \end{minipage}
 \end{tabular} \\
 A careless implementation of \key{rco-exp} and \key{rco-atom} might
-produce the following output.\\
+produce the following output with unnecessary temporary variables.\\
 \begin{minipage}{0.4\textwidth}
 \begin{lstlisting}
 (let ([tmp.1 42])
@@ -2197,13 +2156,13 @@ produce the following output.\\
 \end{minipage}
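
One way to avoid such unnecessary temporaries is to check whether an
expression is already atomic before introducing a new variable. The
following simplified sketch rests on two assumptions. First, it defines
its own \code{Int}, \code{Var}, and \code{Prim} structs as stand-ins
for those in \code{utilities.rkt}. Second, the helper
\code{rco-atom-sketch} is hypothetical. Unlike a complete
\key{rco-atom}, it does not process the subexpressions of a complex
expression.
\begin{lstlisting}
#lang racket
;; Stand-ins for the AST structs provided by utilities.rkt (assumption).
(struct Int (value) #:transparent)
(struct Var (name) #:transparent)
(struct Prim (op args) #:transparent)

;; An atomic expression is an integer literal or a variable.
(define (atm? e)
  (match e
    [(or (Int _) (Var _)) #t]
    [_ #f]))

;; Returns two values: an atomic expression and a list of
;; (temporary . expression) bindings to evaluate before it.
;; Simplification: the subexpressions of e are not processed.
(define (rco-atom-sketch e)
  (if (atm? e)
      (values e '())                  ; already atomic: no temporary
      (let ([tmp (gensym 'tmp)])      ; complex: bind it to a fresh name
        (values (Var tmp) (list (cons tmp e))))))
\end{lstlisting}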
 
 \begin{exercise}
-\normalfont Implement the \code{remove-complex-opera*} pass.
-Test the new pass on all of the example programs that you created to test the
-\key{uniquify} pass and create three new example programs that are
-designed to exercise the interesting code in the
-\code{remove-complex-opera*} pass. Use the \key{interp-tests} function
-(Appendix~\ref{appendix:utilities}) from \key{utilities.rkt} to test
-your passes on the example programs.
+  \normalfont Implement the \code{remove-complex-opera*} pass in
+  \code{compiler.rkt}.  Create three new \LangVar{} programs that are
+  designed to exercise the interesting code in the
+  \code{remove-complex-opera*} pass (following the same file name
+  guidelines as before). In the \code{run-tests.rkt} script,
+  uncomment the line for this pass in the list of \code{passes} and
+  then run the script to test your compiler.
 \end{exercise}
 
 
@@ -2227,7 +2186,7 @@ sequence of assignment statements. For example, consider the following
 The output of the previous pass and of \code{explicate-control} is
 shown below. Recall that the right-hand-side of a \key{let} executes
 before its body, so the order of evaluation for this program is to
-assign \code{20} to \code{x.1}, assign \code{22} to \code{x.2}, assign
+assign \code{20} to \code{x.1}, \code{22} to \code{x.2}, and
 \code{(+ x.1 x.2)} to \code{y}, then return \code{y}. Indeed, the
 output of \code{explicate-control} makes this ordering explicit.\\
 \begin{tabular}{lll}
@@ -2243,7 +2202,7 @@ output of \code{explicate-control} makes this ordering explicit.\\
 $\Rightarrow$
 &
 \begin{minipage}{0.4\textwidth}
-\begin{lstlisting}
+\begin{lstlisting}[language=C]
 start:
   x.1 = 20;
   x.2 = 22;
@@ -2253,30 +2212,67 @@ start:
 \end{minipage}
 \end{tabular}
 
+\begin{figure}[tbp]
+\begin{lstlisting}
+(define (explicate-tail e)
+  (match e
+    [(Var x) ___]
+    [(Int n) (Return (Int n))]
+    [(Let x rhs body) ___]
+    [(Prim op es) ___]
+    [else (error "explicate-tail unhandled case" e)]))
+
+(define (explicate-assign e x cont)
+  (match e
+    [(Var y) ___]
+    [(Int n) (Seq (Assign (Var x) (Int n)) cont)]
+    [(Let y rhs body) ___]
+    [(Prim op es) ___]
+    [else (error "explicate-assign unhandled case" e)]))
+
+(define (explicate-control p)
+  (match p
+    [(Program info body) ___]))
+\end{lstlisting}
+\caption{Skeleton for the \key{explicate-control} pass.}
+\label{fig:explicate-control-Rvar}
+\end{figure}
+
+The organization of this pass depends on the notion of tail position
+that we have alluded to earlier. Formally, \emph{tail
+  position}\index{tail position} in the context of \LangVar{} is
+defined recursively by the following two rules.
+\begin{enumerate}
+\item In $\PROGRAM{\code{'()}}{e}$, expression $e$ is in tail position.
+\item If $\LET{x}{e_1}{e_2}$ is in tail position, then so is $e_2$.
+\end{enumerate}
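
Rule 2 can be read as a recursive function. The following sketch
defines its own stand-in structs for \code{Int}, \code{Var}, and
\code{Let}. The function \code{tail-expr} is a hypothetical helper and
is not part of the support code. Given an expression in tail position,
it follows the bodies of nested \key{let} forms to the innermost
expression that is also in tail position.
\begin{lstlisting}
#lang racket
;; Stand-ins for the AST structs provided by utilities.rkt (assumption).
(struct Int (value) #:transparent)
(struct Var (name) #:transparent)
(struct Let (var rhs body) #:transparent)

;; e is assumed to be in tail position. By rule 2 the body of a let in
;; tail position is also in tail position, so recur into the body only.
(define (tail-expr e)
  (match e
    [(Let _ _ body) (tail-expr body)]
    [_ e]))

;; By rule 1 the program body is in tail position, so the final y is too.
(tail-expr (Let 'x (Int 20) (Let 'y (Int 22) (Var 'y)))) ; => (Var 'y)
\end{lstlisting}
Note that \code{tail-expr} never descends into the right-hand side of a
\key{let}, because the rules above do not place it in tail position.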
+
 We recommend implementing \code{explicate-control} using two mutually
-recursive functions: \code{explicate-tail} and
-\code{explicate-assign}.  The first function should be applied to
-expressions in tail position whereas the second should be applied to
-expressions that occur on the right-hand-side of a \key{let}.
+recursive functions, \code{explicate-tail} and
+\code{explicate-assign}, as suggested in the skeleton code in
+Figure~\ref{fig:explicate-control-Rvar}.  The \code{explicate-tail}
+function should be applied to expressions in tail position whereas
+\code{explicate-assign} should be applied to expressions that occur on
+the right-hand-side of a \key{let}.
 %
-The \code{explicate-tail} function takes an \LangVar{} expression as input
-and produces a \LangCVar{} $\Tail$ (see Figure~\ref{fig:c0-syntax}).
+The \code{explicate-tail} function takes an \Exp{} in \LangVar{} as
+input and produces a \Tail{} in \LangCVar{} (see
+Figure~\ref{fig:c0-syntax}).
 %
-The \code{explicate-assign} function takes an \LangVar{} expression, the
-variable that it is to be assigned to, and \LangCVar{} code (a $\Tail$) that
-should come after the assignment (e.g., the code generated for the
-body of the \key{let}) and returns a $\Tail$. The
-\code{explicate-assign} function is in accumulator-passing style in
-that its third parameter is some \LangCVar{} code that it adds to and
-returns. The reader might be tempted to instead organize
-\code{explicate-assign} in a more direct fashion, without the third
-parameter and perhaps using \code{append} to combine statements. We
-warn against that alternative because the accumulator-passing style is
-key to how we generate high-quality code for conditional expressions
-in Chapter~\ref{ch:bool-types}.
-
-The top-level \code{explicate-control} function should invoke
-\code{explicate-tail} on the body of the \key{Program} AST node.
+The \code{explicate-assign} function takes an \Exp{} in \LangVar{},
+the variable that it is to be assigned to, and a \Tail{} in
+\LangCVar{} for the code that will come after the assignment.  The
+\code{explicate-assign} function returns a $\Tail$ in \LangCVar{}.
+
+The \code{explicate-assign} function is in accumulator-passing style
+in that the \code{cont} parameter is used for accumulating the
+output. The reader might be tempted to instead organize
+\code{explicate-assign} in a more direct fashion, without the
+\code{cont} parameter and perhaps using \code{append} to combine
+statements. We warn against that alternative because the
+accumulator-passing style is key to how we generate high-quality code
+for conditional expressions in Chapter~\ref{ch:bool-types}.
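
To illustrate accumulator-passing style, the following sketch builds a
\LangCVar{} $\Tail$ from a list of assignments. It wraps each
\key{Seq} around the code that comes after it and never uses
\code{append}. The sketch defines its own stand-in structs, and the
helper \code{assignments->tail} is hypothetical. It is not the
complete \code{explicate-assign}.
\begin{lstlisting}
#lang racket
;; Stand-ins for the AST structs provided by utilities.rkt (assumption).
(struct Int (value) #:transparent)
(struct Var (name) #:transparent)
(struct Prim (op args) #:transparent)
(struct Assign (lhs rhs) #:transparent)
(struct Seq (stmt tail) #:transparent)
(struct Return (arg) #:transparent)

;; binds is a list of (variable . expression) pairs in evaluation order.
;; foldr handles the last binding first, so each Assign is wrapped
;; around cont, the tail that runs after it.
(define (assignments->tail binds final)
  (foldr (lambda (b cont)
           (Seq (Assign (Var (car b)) (cdr b)) cont))
         final
         binds))

;; Corresponds to: x.1 = 20; x.2 = 22; y = (+ x.1 x.2); return y;
(define example-tail
  (assignments->tail
   (list (cons 'x.1 (Int 20))
         (cons 'x.2 (Int 22))
         (cons 'y (Prim '+ (list (Var 'x.1) (Var 'x.2)))))
   (Return (Var 'y))))
example-tail
\end{lstlisting}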
+
 
 \section{Select Instructions}
 \label{sec:select-r1}