4 years ago · 03ab5610f7
--- a/book.tex
+++ b/book.tex
@@ -103,11 +103,10 @@ showstringspaces=false
 
				   language=[Objective]Caml,
			
 
				   basicstyle=\ttfamily\small\color{blue},
			
 
				   columns=flexible,
			
 
				-  escapechar={},
			
 
				+  escapechar=~,
			
 
				   showstringspaces=false
			
 
				 }
			
 
				 
			
 
				-
			
 
				 \newtheorem{theorem}{Theorem}
			
 
				 \newtheorem{lemma}[theorem]{Lemma}
			
 
				 \newtheorem{corollary}[theorem]{Corollary}
			
@@ -887,13 +886,15 @@ Appendix~\ref{appendix:utilities} for more details.
 
				     | SNum of Int64.t
			
 
				             (* 64-bit integers *)
			
 
				     | SSym of string
			
 
				-            (* non-digit character sequence delimited by white space *)
			
 
				+            (* character sequence starting with non-digit,
			
 
				+               delimited by white space *)
			
 
				     | SString of string
			
 
				             (* arbitrary character sequence delimited by double quotes *)
			
 
				   \end{lstlisting}
			
 
				   The generic S-expression parser handles (nestable) comments delimited by
			
 
				-  curly braces (\code{\{} and \code{\}}).  Symbols can contain any
			
 
				-  non-digit, non-whitespace characters except parentheses, curly braces, and
			
 
				+  curly braces (\code{\{} and \code{\}}).  Symbols must start with a non-digit
			
 
				+  character and can contain any
			
 
				+  non-whitespace characters except parentheses, curly braces, and
			
 
				   the back tick (\code{\`}); this last exclusion is handy when we want to
			
 
				   generate internal names during compilation and be sure they don't clash
			
 
				   with a user-defined symbol.
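As a rough illustration of the symbol rule above, here is a small OCaml predicate. It is a sketch only and not the support code's lexer. The name \code{valid\_symbol} is hypothetical, and the sketch treats only space, tab, newline, and carriage return as white space.
\begin{lstlisting}[style=ocaml]
(* Hypothetical helper, not part of the support code: checks the
   symbol rule stated above. *)
let is_digit c = c >= '0' && c <= '9'

(* Only these four characters count as white space in this sketch. *)
let is_space c = c = ' ' || c = '\t' || c = '\n' || c = '\r'

(* Characters a symbol may never contain. *)
let is_excluded c =
  match c with
  | '(' | ')' | '{' | '}' | '`' -> true
  | _ -> false

let valid_symbol (s : string) : bool =
  let n = String.length s in
  (* every character from index i onward is allowed *)
  let rec rest_ok i =
    i >= n
    || (not (is_space s.[i]) && not (is_excluded s.[i]) && rest_ok (i + 1))
  in
  n > 0 && not (is_digit s.[0]) && rest_ok 0
\end{lstlisting}
For example, \code{valid\_symbol "x.1"} and \code{valid\_symbol "+"} hold, while \code{valid\_symbol "1x"} does not.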
			
@@ -1641,19 +1642,30 @@ We hope to give enough hints that the well-prepared reader, together
 
				 with a few friends, can implement a compiler from \LangVar{} to x86 in
			
 
				 a couple weeks.  To give the reader a feeling for the scale of this
			
 
				 first compiler, the instructor solution for the \LangVar{} compiler is
			
 
				-approximately 500 lines of code.
			
 
				+approximately 500 lines of code. \ocaml{For the OCaml-based course,
			
 
				+  several pieces of the compiler will be provided for you, leaving enough
			
 
				+  work for a week-long assignment. The instructor solution for
			
 
				+  the tasks left to you is under 200 lines of code.
			
 
				+  However, in exchange for writing less code,
			
 
				+  you will need to \emph{read} more existing code.}
			
 
				 
			
 
				 \section{The \LangVar{} Language}
			
 
				 \label{sec:s0}
			
 
				 \index{variable}
			
 
				 
			
 
				 The \LangVar{} language extends the \LangInt{} language with variable
			
 
				-definitions.  The concrete syntax of the \LangVar{} language is defined by
			
 
				+definitions. The concrete syntax of the \LangVar{} language is defined by
			
 
				 the grammar in Figure~\ref{fig:r1-concrete-syntax} and the abstract
			
 
				-syntax is defined in Figure~\ref{fig:r1-syntax}.  The non-terminal
			
 
				-\Var{} may be any Racket identifier. As in \LangInt{}, \key{read} is a
			
 
				+syntax is defined in Figure~\ref{fig:r1-syntax}.  \ocaml{For the OCaml
			
 
				+  version, we don't feel the need to match the syntax of Racket exactly,
			
 
				+  so we can simplify the concrete syntax of \key{let} bindings.}  The non-terminal
			
 
				+\Var{} may be any Racket identifier. \ocaml{For OCaml, it can be any S-expression symbol.}
			
 
				+As in \LangInt{}, \key{read} is a
			
 
				 nullary operator, \key{-} is a unary operator, and \key{+} is a binary
			
 
				-operator.  Similar to \LangInt{}, the abstract syntax of \LangVar{} includes the
			
 
				+operator.  \ocaml{We also add \key{-} as a binary subtraction operator in
			
 
				+  the concrete syntax, but not in the abstract syntax: 
			
 
				+  we will ``de-sugar'' subtraction into a combination
			
 
				+  of addition and negation.} Similar to \LangInt{}, the abstract syntax of \LangVar{} includes the
			
 
				 \key{Program} struct to mark the top of the program.
			
 
				 %% The $\itm{info}$
			
 
				 %% field of the \key{Program} structure contains an \emph{association
			
@@ -1675,7 +1687,20 @@ exhibit several compilation techniques.
 
				 \]
			
 
				 \end{minipage}
			
 
				 }
			
 
				-\caption{The concrete syntax of \LangVar{}.}
			
 
				+\begin{ocamlx}
			
 
				+\fbox{
			
 
				+\begin{minipage}{0.96\textwidth}
			
 
				+\[
			
 
				+\begin{array}{rcl}
			
 
				+  \Exp &::=& \Int \mid \CREAD{} \mid \CNEG{\Exp} \mid \CADD{\Exp}{\Exp} \mid \CSUB{\Exp}{\Exp}\\
			
 
				+       &\mid& \Var \mid \code{(let $\Var$ $\Exp$ $\Exp$)}\\
			
 
				+  \LangVar{} &::=& \Exp
			
 
				+\end{array}
			
 
				+\]
			
 
				+\end{minipage}
			
 
				+}
			
 
				+\end{ocamlx}
			
 
				+\caption{The concrete syntax of \LangVar{} \ocaml{in OCaml}.}
			
 
				 \label{fig:r1-concrete-syntax}
			
 
				 \end{figure}
			
 
				 
			
@@ -1693,6 +1718,19 @@ exhibit several compilation techniques.
 
				 \]
			
 
				 \end{minipage}
			
 
				 }
			
 
				+\begin{lstlisting}[style=ocaml,frame=single]
			
 
				+type primop = 
			
 
				+   Read
			
 
				+ | Neg
			
 
				+ | Add
			
 
				+type var = string
			
 
				+type exp = 
			
 
				+   Int of int64  
			
 
				+ | Prim of primop * exp list
			
 
				+ | Var of var
			
 
				+ | Let of var * exp * exp
			
 
				+type 'info program = Program of 'info * exp
			
 
				+\end{lstlisting}
			
 
				 \caption{The abstract syntax of \LangVar{}.}
			
 
				 \label{fig:r1-syntax}
			
 
				 \end{figure}
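The following OCaml sketch makes the link between the concrete and abstract syntax concrete. It translates S-expressions into the abstract syntax of Figure~\ref{fig:r1-syntax} and desugars binary subtraction \code{(- e1 e2)} into \code{(+ e1 (- e2))}. The \code{sexp} type is an assumption: it repeats the \code{SNum}, \code{SSym}, and \code{SString} cases shown earlier and adds a hypothetical \code{SList} case for parenthesized lists. The parser in the support code may be organized differently, and it would report errors more gracefully than \code{failwith}.
\begin{lstlisting}[style=ocaml]
type primop = Read | Neg | Add
type var = string
type exp =
    Int of int64
  | Prim of primop * exp list
  | Var of var
  | Let of var * exp * exp

(* Assumed S-expression type; SList is hypothetical here. *)
type sexp =
    SList of sexp list
  | SNum of int64
  | SSym of string
  | SString of string

(* Translate concrete syntax into abstract syntax. Binary subtraction
   has no AST node, so (- e1 e2) becomes (+ e1 (- e2)). *)
let rec parse_exp (s : sexp) : exp =
  match s with
  | SNum n -> Int n
  | SList [SSym "read"] -> Prim (Read, [])
  | SList [SSym "-"; e] -> Prim (Neg, [parse_exp e])
  | SList [SSym "-"; e1; e2] ->
      Prim (Add, [parse_exp e1; Prim (Neg, [parse_exp e2])])
  | SList [SSym "+"; e1; e2] -> Prim (Add, [parse_exp e1; parse_exp e2])
  | SList [SSym "let"; SSym x; e1; e2] -> Let (x, parse_exp e1, parse_exp e2)
  | SSym x -> Var x
  | _ -> failwith "parse_exp: not an RVar expression"
\end{lstlisting}
For instance, the S-expression for \code{(let x (+ 12 20) (+ 10 x))} becomes \code{Let ("x", Prim (Add, [Int 12L; Int 20L]), Prim (Add, [Int 10L; Var "x"]))}.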
			
@@ -1705,11 +1743,17 @@ Figure~\ref{fig:r1-syntax}.  The concrete syntax for \key{let} is
 
				 \begin{lstlisting}
			
 
				 (let ([|$\itm{var}$| |$\itm{exp}$|]) |$\itm{exp}$|)
			
 
				 \end{lstlisting}
			
 
				+\begin{lstlisting}[style=ocaml]
			
 
				+(let ~$\itm{var}$~ ~$\itm{exp}$~ ~$\itm{exp}$~)
			
 
				+\end{lstlisting}
			
 
				 For example, the following program initializes \code{x} to $32$ and then
			
 
				 evaluates the body \code{(+ 10 x)}, producing $42$.
			
 
				 \begin{lstlisting}
			
 
				 (let ([x (+ 12 20)]) (+ 10 x))
			
 
				 \end{lstlisting}
			
 
				+\begin{lstlisting}[style=ocaml]
			
 
				+(let x (+ 12 20) (+ 10 x))
			
 
				+\end{lstlisting}
			
 
				 When there are multiple \key{let}'s for the same variable, the closest
			
 
				 enclosing \key{let} is used. That is, variable definitions overshadow
			
 
				 prior definitions. Consider the following program with two \key{let}'s
			
@@ -1717,6 +1761,9 @@ that define variables named \code{x}. Can you figure out the result?
 
				 \begin{lstlisting}
			
 
				 (let ([x 32]) (+ (let ([x 10]) x) x))
			
 
				 \end{lstlisting}
			
 
				+\begin{lstlisting}[style=ocaml]
			
 
				+(let x 32 (+ (let x 10 x) x))
			
 
				+\end{lstlisting}
			
 
				 For the purposes of depicting which variable uses correspond to which
			
 
				 definitions, the following shows the \code{x}'s annotated with
			
 
				 subscripts to distinguish them. Double check that your answer for the
			
@@ -1725,6 +1772,9 @@ program.
 
				 \begin{lstlisting}
			
 
				 (let ([x|$_1$| 32]) (+ (let ([x|$_2$| 10]) x|$_2$|) x|$_1$|))
			
 
				 \end{lstlisting}
			
 
				+\begin{lstlisting}[style=ocaml]
			
 
				+(let x~$_1$~ 32 (+ (let x~$_2$~ 10 x~$_2$~) x~$_1$~))
			
 
				+\end{lstlisting}
			
 
				 The initializing expression is always evaluated before the body of the
			
 
				 \key{let}, so in the following, the \key{read} for \code{x} is
			
 
				 performed before the \key{read} for \code{y}. Given the input
			
@@ -1732,10 +1782,23 @@ $52$ then $10$, the following produces $42$ (not $-42$).
 
				 \begin{lstlisting}
			
 
				 (let ([x (read)]) (let ([y (read)]) (+ x (- y))))
			
 
				 \end{lstlisting}
			
 
				+\begin{lstlisting}[style=ocaml]
			
 
				+(let x (read) (let y (read) (+ x (- y))))
			
 
				+\end{lstlisting}
			
 
				 
			
 
				 \subsection{Extensible Interpreters via Method Overriding}
			
 
				 \label{sec:extensible-interp}
			
 
				 
			
 
				+\begin{ocamlx}
			
 
				+  We are not going to bother with making our OCaml interpreters
			
 
				+  extensible, although there are several mechanisms in OCaml that
			
 
				+  we could use to achieve this. The languages involved here just
			
 
				+  don't seem big enough to warrant the added complexity.
			
 
				+  We will, however, break out the definition and interpretation of
			
 
				+  primops into a separate module, so that it can easily be shared among
			
 
				+  different languages.
			
 
				+\end{ocamlx}
			
 
				+
			
 
				 To prepare for discussing the interpreter for \LangVar{}, we need to
			
 
				 explain why we choose to implement the interpreter using
			
 
				 object-oriented programming, that is, as a collection of methods
			
@@ -1885,10 +1948,16 @@ extensible way.
 
				 \end{wrapfigure}
			
 
				 
			
 
				 Having justified the use of classes and methods to implement
			
 
				-interpreters, we turn to the definitional interpreter for \LangVar{}
			
 
				-in Figure~\ref{fig:interp-Rvar}. It is similar to the interpreter for
			
 
				+interpreters \ocaml{(or not)}, we turn to the definitional interpreter for \LangVar{}
			
 
				+in Figure~\ref{fig:interp-Rvar} \ocaml{(Figure~\ref{fig:interp-Rvar-ocaml})}.
			
 
				+It is similar to the interpreter for
			
 
				 \LangInt{} but adds two new \key{match} cases for variables and
			
 
				-\key{let}.  For \key{let} we need a way to communicate the value bound
			
 
				+\key{let}. \ocaml{Also, the code for performing primops has been split out
			
 
				+  into a separate function. We rely on the fact that
			
 
				+  \code{List.map} processes list elements from left to right to
			
 
				+  enforce the intended order of evaluation of primop subexpressions.}
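Relying on \code{List.map} has one subtlety. OCaml leaves the evaluation order of function and constructor arguments unspecified, and current compilers typically evaluate them right to left. A hand-written map of the form \code{f x :: map f xs} may therefore call \code{f} on the \emph{last} element first. The standard library's \code{List.map} is implemented so that it binds each result with \code{let} before recursing, which is what makes it go left to right. If you prefer not to depend on that implementation detail, a sketch like the following makes the order explicit. The name \code{map\_ltr} is hypothetical.
\begin{lstlisting}[style=ocaml]
(* The explicit let forces f x to be evaluated before the recursive
   call, so f sees the elements strictly from left to right. *)
let rec map_ltr (f : 'a -> 'b) (xs : 'a list) : 'b list =
  match xs with
  | [] -> []
  | x :: rest ->
      let y = f x in
      y :: map_ltr f rest

(* Example: record the order in which elements are visited. *)
let visited = ref []
let doubled = map_ltr (fun x -> visited := x :: !visited; 2 * x) [1; 2; 3]
(* doubled = [2; 4; 6] and List.rev !visited = [1; 2; 3] *)
\end{lstlisting}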
			
 
				+
			
 
				+For \key{let} we need a way to communicate the value bound
			
 
				 to a variable to all the uses of the variable. To accomplish this, we
			
 
				 maintain a mapping from variables to values. Throughout the compiler
			
 
				 we often need to map variables to information about them. We refer to
			
@@ -1899,7 +1968,7 @@ these mappings as
 
				 %
			
 
				 For simplicity, we use an association list (alist) to represent the
			
 
				 environment. The sidebar to the right gives a brief introduction to
			
 
				-alists and the \code{racket/dict} package.  The \code{interp-exp}
			
 
				+alists and the \code{racket/dict} package. The \code{interp-exp}
			
 
				 function takes the current environment, \code{env}, as an extra
			
 
				 parameter.  When the interpreter encounters a variable, it finds the
			
 
				 corresponding value using the \code{dict-ref} function.  When the
			
@@ -1908,6 +1977,51 @@ expression, extends the environment with the result value bound to the
 
				 variable, using \code{dict-set}, then evaluates the body of the
			
 
				 \key{Let}.
			
 
				 
			
 
				+\begin{ocamlx}
			
 
				+  In OCaml, we thread environments in the same way, but
			
 
				+  it is convenient to represent environments using
			
 
				+  the \code{Map} library module, which provides efficient
			
 
				+  mappings from keys to values (using balanced binary trees,
			
 
				+  although that is an implementation detail we don't need to
			
 
				+  know about). \code{Map.Make} is an example of a module that
			
 
				+  is \emph{parameterized} by another module (one matching the signature \code{Map.OrderedType}); this
			
 
				+  is sometimes called a \emph{functor}.  Here we use \code{Map.Make}
			
 
				+  to \emph{apply} the functor, thereby defining a module \code{Env} that provides operations
			
 
				+  specialized to \code{string} keys (suitable for variables).
			
 
				+  The type of environments is written \code{'a Env.t}; it is
			
 
				+  parametric in the type \code{'a} of values stored in the map.
			
 
				+  Here we will be using \LangVar{}
			
 
				+  values, i.e., \code{int64}s, so the type is \code{int64 Env.t}.
			
 
				+  \code{Env.empty} represents an empty environment.
			
 
				+  \code{Env.find $x$ $env$} returns the value associated with
			
 
				+  variable $x$ in $env$ (raising the exception \code{Not\_found} if $x$ is not bound).
			
 
				+  \code{Env.add $x$ $v$ $env$} produces a new environment
			
 
				+  that is the same as $env$ except that variable $x$ is associated to
			
 
				+  value $v$. Note that these operations are \emph{pure}; that is, they
			
 
				+  do not mutate any environment.
			
 
				+\end{ocamlx}
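A few lines of OCaml illustrate these operations and their purity. This sketch uses \code{Map.Make (String)}. That works because the standard \code{String} module provides the required type \code{t} and \code{compare} function, so it plays the same role as the \code{StringKey} module in the interpreter figure.
\begin{lstlisting}[style=ocaml]
module Env = Map.Make (String)

let env1 : int64 Env.t = Env.add "x" 32L Env.empty
let env2 : int64 Env.t = Env.add "x" 10L env1   (* shadows x *)

let x_in_env2 = Env.find "x" env2   (* 10L *)
let x_in_env1 = Env.find "x" env1   (* still 32L: env1 was not mutated *)

(* Env.find raises Not_found for an unbound variable; Env.find_opt
   returns an option instead. *)
let y_lookup = Env.find_opt "y" env2   (* None *)
\end{lstlisting}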
			
 
				+
			
 
				+\begin{ocamlx}
			
 
				+  The OCaml code for \LangVar{} ASTs, concrete parsing and printing (for debug purposes),
			
 
				+  and interpretation is in file \texttt{RVar.ml}, which also imports
			
 
				+  from file \texttt{Primops.ml}.  These files also contain code for
			
 
				+  static checking of \LangVar{} programs. The checker makes sure that
			
 
				+  (i) every use of a variable is in the scope of a corresponding \code{let} binding;
			
 
				+  and (ii) each primop is applied to the correct number of arguments.
			
 
				+
			
 
				+  Note that if a source program fails the checker for reason (i), this is a static user error
			
 
				+  that should be reported as such. (Violations of (ii) in user programs
			
 
				+  should be caught by the parser; parse errors are always reported as user errors.)
			
 
				+  Your compiler should stop trying to process a file as soon as it reports a static user
			
 
				+  error! (That's what the provided test driver will do.)
			
 
				+
			
 
				+  However, if a program initially passes
			
 
				+  the checker but is subsequently transformed by the compiler and then
			
 
				+  fails a re-check, this indicates that the problem is the compiler's fault.
			
 
				+  In this case, the compiler itself should halt with a suitable error message.
			
 
				+  The checker has a boolean flag to distinguish these cases.
			
 
				+\end{ocamlx}
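The following sketch shows one way such a checker might use a boolean flag to decide whom to blame. The exceptions \code{User\_error} and \code{Compiler\_bug}, the \code{arity} table, and the function \code{check} are all hypothetical. Consult \texttt{RVar.ml} for the actual interface.
\begin{lstlisting}[style=ocaml]
type primop = Read | Neg | Add
type exp =
    Int of int64
  | Prim of primop * exp list
  | Var of string
  | Let of string * exp * exp

exception User_error of string
exception Compiler_bug of string

module VarSet = Set.Make (String)

let arity (op : primop) : int =
  match op with Read -> 0 | Neg -> 1 | Add -> 2

(* [user] is true when checking a source program, false when
   re-checking the compiler's own output. *)
let check (user : bool) (e : exp) : unit =
  let fail msg =
    if user then raise (User_error msg) else raise (Compiler_bug msg) in
  let rec go bound e =
    match e with
    | Int _ -> ()
    | Var x ->
        if not (VarSet.mem x bound) then fail ("unbound variable " ^ x)
    | Prim (op, args) ->
        if List.length args <> arity op then fail "primop arity mismatch";
        List.iter (go bound) args
    | Let (x, e1, e2) ->
        go bound e1;                   (* x is not in scope in e1 *)
        go (VarSet.add x bound) e2     (* but it is in scope in e2 *)
  in
  go VarSet.empty e
\end{lstlisting}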
			
 
				+
			
 
				 \begin{figure}[tp]
			
 
				 \begin{lstlisting}
			
 
				 (define interp-Rvar-class
			
@@ -1940,6 +2054,31 @@ variable, using \code{dict-set}, then evaluates the body of the
 
				 \caption{Interpreter for the \LangVar{} language.}
			
 
				 \label{fig:interp-Rvar}
			
 
				 \end{figure}
			
 
				+\begin{figure}[tp]
			
 
				+\begin{lstlisting}[style=ocaml]
			
 
				+type value = int64
			
 
				+  
			
 
				+let interp_primop (op:primop) (args: value list) : value = 
			
 
				+  match op,args with
			
 
				+    Read,[] -> Int64.of_string (String.trim (read_line ()))
			
 
				+  | Neg,[v] -> Int64.neg v
			
 
				+  | Add,[v1;v2] -> Int64.add v1 v2
			
 
				+  | _,_ -> assert false (* arity mismatch *)
			
 
				+
			
 
				+module StringKey = struct type t = string let compare = String.compare end
			
 
				+module Env = Map.Make(StringKey)
			
 
				+
			
 
				+let rec interp_exp (env:value Env.t) = function
			
 
				+    Int n -> n
			
 
				+  | Prim(op,args) -> interp_primop op (List.map (interp_exp env) args)
			
 
				+  | Var x -> Env.find x env
			
 
				+  | Let (x,e1,e2) -> interp_exp (Env.add x (interp_exp env e1) env) e2
			
 
				+
			
 
				+let interp_program (Program(_,e)) = interp_exp Env.empty e
			
 
				+\end{lstlisting}
			
 
				+\caption{\ocaml{OCaml interpreter for the \LangVar{} language.}}
			
 
				+\label{fig:interp-Rvar-ocaml}
			
 
				+\end{figure}
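To check the interpreter in Figure~\ref{fig:interp-Rvar-ocaml}, the following self-contained copy runs the shadowing example \code{(let x 32 (+ (let x 10 x) x))}, which should produce 42. To keep the sketch runnable on its own, it repeats the AST types. As an assumption, it implements \code{Read} with \code{read\_line} and \code{Int64.of\_string`; the support code may read integers differently.
\begin{lstlisting}[style=ocaml]
type primop = Read | Neg | Add
type exp =
    Int of int64
  | Prim of primop * exp list
  | Var of string
  | Let of string * exp * exp
type 'info program = Program of 'info * exp
type value = int64

module Env = Map.Make (String)

let interp_primop (op : primop) (args : value list) : value =
  match op, args with
  | Read, [] -> Int64.of_string (String.trim (read_line ()))
  | Neg, [v] -> Int64.neg v
  | Add, [v1; v2] -> Int64.add v1 v2
  | _, _ -> assert false (* arity mismatch *)

let rec interp_exp (env : value Env.t) (e : exp) : value =
  match e with
  | Int n -> n
  | Prim (op, args) -> interp_primop op (List.map (interp_exp env) args)
  | Var x -> Env.find x env
  | Let (x, e1, e2) -> interp_exp (Env.add x (interp_exp env e1) env) e2

let interp_program (Program (_, e)) = interp_exp Env.empty e

(* (let x 32 (+ (let x 10 x) x)) *)
let shadowing =
  Program ((), Let ("x", Int 32L,
                    Prim (Add, [Let ("x", Int 10L, Var "x"); Var "x"])))

let result = interp_program shadowing   (* 42L *)
\end{lstlisting}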
			
 
				 
			
 
				 The goal for this chapter is to implement a compiler that translates
			
 
				 any program $P_1$ written in the \LangVar{} language into an x86 assembly
			
@@ -2002,7 +2141,8 @@ integer constant (called \emph{immediate value}\index{immediate
 
				 \Arg &::=&  \key{\$}\Int \mid \key{\%}\Reg \mid \Int\key{(}\key{\%}\Reg\key{)}\\
			
 
				 \Instr &::=& \key{addq} \; \Arg\key{,} \Arg \mid
			
 
				       \key{subq} \; \Arg\key{,} \Arg \mid
			
 
				-      \key{negq} \; \Arg \mid \key{movq} \; \Arg\key{,} \Arg \mid \\
			
 
				+      \key{negq} \; \Arg \mid \\
			
 
				+  &&  \key{movq} \; \Arg\key{,} \Arg \mid \ocaml{\key{movabsq} \; \Arg\key{,} \Arg \mid} \\
			
 
				   &&  \key{callq} \; \mathit{label} \mid
			
 
				       \key{pushq}\;\Arg \mid \key{popq}\;\Arg \mid \key{retq} \mid \key{jmp}\,\itm{label} \\
			
 
				   && \itm{label}\key{:}\; \Instr \\
			
@@ -2062,8 +2202,10 @@ returning the integer in \key{rax} to the operating system. The
 
				 operating system interprets this integer as the program's exit
			
 
				 code. By convention, an exit code of 0 indicates that a program
			
 
				 completed successfully, and all other exit codes indicate various
			
 
				-errors. Nevertheless, in this book we return the result of the program
			
 
				-as the exit code.
			
 
				+errors. \ocaml{Also, exit codes are unsigned bytes, so they cannot accurately represent
			
 
				+arbitrary \code{int64}s.} Nevertheless, in this book we return the result of the program
			
 
				+as the exit code. \ocaml{(Incidentally, if you run a program at the Unix shell
			
 
				+  prompt, you can retrieve its exit code by typing \texttt{echo \$?} as the very next command.)}
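To see what the shell will actually report, note that on Linux and other POSIX systems the parent process sees only the low 8 bits of the status passed to \code{exit}. The hypothetical helper below computes that byte from an \code{int64} result. This can be handy when comparing a compiled program's exit code against the interpreter's answer.
\begin{lstlisting}[style=ocaml]
(* Hypothetical helper: the exit status a POSIX shell reports (via $?)
   for a program whose main returns v. Only the low 8 bits survive. *)
let exit_status_of (v : int64) : int =
  Int64.to_int (Int64.logand v 0xFFL)
\end{lstlisting}
For example, \code{exit\_status\_of 42L} is 42, \code{exit\_status\_of 300L} is 44, and \code{exit\_status\_of (-1L)} is 255.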
			
 
				 
			
 
				 \begin{figure}[tbp]
			
 
				 \begin{lstlisting}
			
@@ -2081,7 +2223,8 @@ The x86 assembly language varies in a couple ways depending on what
 
				 operating system it is assembled in. The code examples shown here are
			
 
				 correct on Linux and most Unix-like platforms, but when assembled on
			
 
				 Mac OS X, labels like \key{main} must be prefixed with an underscore,
			
 
				-as in \key{\_main}.
			
 
				+as in \key{\_main}. \ocaml{There is a utility function \code{get\_ostype}
			
 
				+in the \texttt{utils.ml} module of the support materials for determining which operating system you are on.}
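Here is a minimal sketch of how a code generator might apply the underscore rule when printing labels. We do not want to guess the exact result type of \code{get\_ostype}, so the sketch takes a plain boolean instead. The names \code{global\_label} and \code{globl\_directive} are hypothetical.
\begin{lstlisting}[style=ocaml]
(* Hypothetical: prefix an externally visible label for the target
   platform. is_macos would be derived from the support code's
   get_ostype, whose exact result type is not assumed here. *)
let global_label (is_macos : bool) (name : string) : string =
  if is_macos then "_" ^ name else name

(* e.g. the directive that exports main *)
let globl_directive (is_macos : bool) : string =
  "\t.globl " ^ global_label is_macos "main"
\end{lstlisting}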
			
 
				 
			
 
				 We exhibit the use of memory for storing intermediate results in the
			
 
				 next example.  Figure~\ref{fig:p1-x86} lists an x86 program that is
			
@@ -2201,12 +2344,23 @@ organization becomes apparent in Chapter~\ref{ch:Rif} when we
 
				 introduce conditional branching. The \code{Block} structure includes
			
 
				 an $\itm{info}$ field that is not needed for this chapter, but becomes
			
 
				 useful in Chapter~\ref{ch:register-allocation-Rvar}.  For now, the
			
 
				-$\itm{info}$ field should contain an empty list. Also, regarding the
			
 
				+$\itm{info}$ field should contain an empty list. \ocaml{The \code{'binfo}
			
 
				+  type parameter should be instantiated with \code{unit}.}
			
 
				+Also, regarding the
			
 
				 abstract syntax for \code{callq}, the \code{Callq} struct includes an
			
 
				 integer for representing the arity of the function, i.e., the number
			
 
				 of arguments, which is helpful to know during register allocation
			
 
				 (Chapter~\ref{ch:register-allocation-Rvar}).
			
 
				 
			
 
				+\begin{ocamlx}
			
 
				+  The OCaml code for \LangXInt{} AST, printing, and checking is
			
 
				+  in file \texttt{X86Int.ml}. Printing is used to produce \texttt{.s} files that
			
 
				+  can be input to the system assembler; it can also be useful for debugging.
			
 
				+  File \texttt{utils.ml} contains functions for invoking the assembler and linker and
			
 
				+  running the resulting executables from inside OCaml; these are invoked
			
 
				+  from the test drivers also defined in that file.
			
 
				+\end{ocamlx}    
			
 
				+
			
 
				 \begin{figure}[tp]
			
 
				 \fbox{
			
 
				 \begin{minipage}{0.98\textwidth}
			
@@ -2218,8 +2372,9 @@ of arguments, which is helpful to know during register allocation
 
				    \mid \DEREF{\Reg}{\Int} \\
			
 
				 \Instr &::=& \BININSTR{\code{addq}}{\Arg}{\Arg} 
			
 
				        \mid \BININSTR{\code{subq}}{\Arg}{\Arg} \\
			
 
				+       &\mid& \UNIINSTR{\code{negq}}{\Arg}\\
			
 
				        &\mid& \BININSTR{\code{movq}}{\Arg}{\Arg}
			
 
				-       \mid \UNIINSTR{\code{negq}}{\Arg}\\
			
 
				+       \ocaml{\mid \BININSTR{\code{movabsq}}{\Arg}{\Arg}} \\
			
 
				        &\mid& \CALLQ{\itm{label}}{\itm{int}} \mid \RETQ{} 
			
 
				        \mid \PUSHQ{\Arg} \mid \POPQ{\Arg} \mid \JMP{\itm{label}} \\
			
 
				 \Block &::= & \BLOCK{\itm{info}}{\LP\Instr\ldots\RP} \\
			
@@ -2228,10 +2383,34 @@ of arguments, which is helpful to know during register allocation
 
				 \]
			
 
				 \end{minipage}
			
 
				 }
			
 
				-\caption{The abstract syntax of \LangXInt{} assembly.}
			
 
				+\begin{lstlisting}[style=ocaml,frame=single]
			
 
				+type reg =
			
 
				+    RSP | RBP | RAX | RBX | RCX | RDX | RSI | RDI
			
 
				+  | R8  | R9  | R10 | R11 | R12 | R13 | R14 | R15
			
 
				+
			
 
				+type label = string
			
 
				+
			
 
				+type arg =
			
 
				+    Imm of int64  (* in most cases must actually be an int32 *)
			
 
				+  | Reg of reg
			
 
				+  | Deref of reg*int32
			
 
				+  | Var of string (* a pseudo-argument for ~$\LangXVar{}$~ *)
			
 
				+
			
 
				+type instr =
			
 
				+    Addq of arg*arg | Subq of arg*arg | Negq of arg 
			
 
				+  | Movq of arg*arg | Movabsq of arg*arg | Callq of label*int 
			
 
				+  | Retq | Pushq of arg | Popq of arg | Jmp of label
			
 
				+
			
 
				+type 'binfo block = Block of 'binfo * instr list
			
 
				+
			
 
				+type ('pinfo,'binfo) program =
			
 
				+    Program of 'pinfo * (label * 'binfo block) list 
			
 
				+\end{lstlisting}
			
 
				+\caption{The abstract syntax of \LangXInt{} \ocaml{and \LangXVar{}} assembly.}
			
 
				 \label{fig:x86-int-ast}
			
 
				 \end{figure}
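Here is a small printer sketch showing how these constructors correspond to the concrete AT\&T syntax given earlier. It is not the printer in \texttt{X86Int.ml}. It renders each instruction as a string, omits assembler directives such as \code{.globl}, and rejects \code{Var} pseudo-arguments, which have no concrete x86 form. The example builds a program that moves 10 into \code{\%rax}, adds 32, and returns, with \code{unit} for both info parameters.
\begin{lstlisting}[style=ocaml]
type reg =
    RSP | RBP | RAX | RBX | RCX | RDX | RSI | RDI
  | R8  | R9  | R10 | R11 | R12 | R13 | R14 | R15
type label = string
type arg =
    Imm of int64
  | Reg of reg
  | Deref of reg * int32
  | Var of string
type instr =
    Addq of arg * arg | Subq of arg * arg | Negq of arg
  | Movq of arg * arg | Movabsq of arg * arg | Callq of label * int
  | Retq | Pushq of arg | Popq of arg | Jmp of label
type 'binfo block = Block of 'binfo * instr list
type ('pinfo, 'binfo) program =
    Program of 'pinfo * (label * 'binfo block) list

let string_of_reg (r : reg) : string =
  match r with
  | RSP -> "%rsp" | RBP -> "%rbp" | RAX -> "%rax" | RBX -> "%rbx"
  | RCX -> "%rcx" | RDX -> "%rdx" | RSI -> "%rsi" | RDI -> "%rdi"
  | R8 -> "%r8"   | R9 -> "%r9"   | R10 -> "%r10" | R11 -> "%r11"
  | R12 -> "%r12" | R13 -> "%r13" | R14 -> "%r14" | R15 -> "%r15"

let string_of_arg (a : arg) : string =
  match a with
  | Imm n -> "$" ^ Int64.to_string n
  | Reg r -> string_of_reg r
  | Deref (r, off) -> Int32.to_string off ^ "(" ^ string_of_reg r ^ ")"
  | Var x -> failwith ("string_of_arg: pseudo-variable " ^ x)

let string_of_instr (i : instr) : string =
  let bin op a b = op ^ " " ^ string_of_arg a ^ ", " ^ string_of_arg b in
  let un op a = op ^ " " ^ string_of_arg a in
  match i with
  | Addq (a, b) -> bin "addq" a b
  | Subq (a, b) -> bin "subq" a b
  | Negq a -> un "negq" a
  | Movq (a, b) -> bin "movq" a b
  | Movabsq (a, b) -> bin "movabsq" a b
  | Callq (l, _) -> "callq " ^ l
  | Retq -> "retq"
  | Pushq a -> un "pushq" a
  | Popq a -> un "popq" a
  | Jmp l -> "jmp " ^ l

(* Render a program as label lines followed by indented instructions. *)
let string_of_program (Program (_, blocks) : ('p, 'b) program) : string =
  let block (l, Block (_, instrs)) =
    l ^ ":\n"
    ^ String.concat "" (List.map (fun i -> "\t" ^ string_of_instr i ^ "\n") instrs)
  in
  String.concat "" (List.map block blocks)

(* movq $10, %rax; addq $32, %rax; retq, with unit info fields *)
let example : (unit, unit) program =
  Program ((), [ ("main", Block ((), [ Movq (Imm 10L, Reg RAX);
                                       Addq (Imm 32L, Reg RAX);
                                       Retq ])) ])
\end{lstlisting}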
			
 
				 
			
 
				+
			
 
				 \section{Planning the trip to x86 via the \LangCVar{} language}
			
 
				 \label{sec:plan-s0-x86}
			
 
				 
			
@@ -2246,7 +2425,8 @@ and x86 assembly? Here are some of the most important ones:
 
				   arithmetic operations take two arguments and produce a new value.
			
 
				   An x86 instruction may have at most one memory-accessing argument.
			
 
				   Furthermore, some instructions place special restrictions on their
			
 
				-  arguments.
			
 
				+  arguments. \ocaml{For example, immediate operands are usually restricted
			
 
				+    to fit in 32 bits (except for the \code{movabsq} instruction).}
			
 
				 
			
 
				 \item[(b)] An argument of an \LangVar{} operator can be a deeply-nested
			
 
				   expression, whereas x86 instructions restrict their arguments to be
			
@@ -2327,7 +2507,7 @@ become local variables whose scope is the entire program, which would
 
				 confuse variables with the same name.
			
 
				 %
			
 
				 We place \key{remove-complex-opera*} before \key{explicate-control}
			
 
				-because the later removes the \key{let} form, but it is convenient to
			
 
				+because the latter removes the \key{let} form, but it is convenient to
			
 
				 use \key{let} in the output of \key{remove-complex-opera*}.
			
 
				 %
			
 
				 The ordering of \key{uniquify} with respect to
			
@@ -2407,7 +2587,10 @@ language~\citep{Kernighan:1988nx} in that it has separate syntactic
 
				 categories for expressions and statements, so we name it \LangCVar{}.  The
			
 
				 abstract syntax for \LangCVar{} is defined in Figure~\ref{fig:c0-syntax}.
			
 
				 (The concrete syntax for \LangCVar{} is in the Appendix,
			
 
				-Figure~\ref{fig:c0-concrete-syntax}.)
			
 
				+Figure~\ref{fig:c0-concrete-syntax}. \ocaml{This appendix is not quite accurate
			
 
				+  for the OCaml version, but the details of the concrete syntax of
			
 
				+  an IR like this don't matter much, since it will normally be used
			
 
				+  only to dump out information when debugging; it won't be parsed.})
			
 
				 %
			
 
				 The \LangCVar{} language supports the same operators as \LangVar{} but
			
 
				 the arguments of operators are restricted to atomic
			
@@ -2420,19 +2603,23 @@ assignment statements which can be executed in sequence using the
 
				 expression that is the last one to execute within a function.
			
 
				 
			
 
				 A \LangCVar{} program consists of a control-flow graph represented as
			
 
				-an alist mapping labels to tails. This is more general than necessary
			
 
				+an alist mapping labels to tails \ocaml{(that is, a list of \code{(label*tail)} pairs)}.
			
 
				+This is more general than necessary
			
 
				 for the present chapter, as we do not yet introduce \key{goto} for
			
 
				 jumping to labels, but it saves us from having to change the syntax in
			
 
				 Chapter~\ref{ch:Rif}.  For now there will be just one label,
			
 
				-\key{start}, and the whole program is its tail.
			
 
				+\key{start}, and the whole program \ocaml{body} is its tail.
			
 
				 %
			
 
				 The $\itm{info}$ field of the \key{CProgram} form, after the
			
 
				 \key{explicate-control} pass, contains a mapping from the symbol
			
 
				 \key{locals} to a list of variables, that is, a list of all the
			
 
				-variables used in the program. At the start of the program, these
			
 
				+variables used in the program. \ocaml{It is represented as a \code{unit Env.t},
			
 
				+a kind of degenerate map that effectively acts like a set.}
			
 
				+At the start of the program, these
			
 
				 variables are uninitialized; they become initialized on their first
			
 
				 assignment.
			
 
				 
			
 
				+
			
 
				 \begin{figure}[tbp]
			
 
				 \fbox{
			
 
				 \begin{minipage}{0.96\textwidth}
			
@@ -2448,12 +2635,38 @@ assignment.
 
				 \]
			
 
				 \end{minipage}
			
 
				 }
			
 
				+\begin{lstlisting}[style=ocaml,frame=single]
			
 
				+type var = string
			
 
				+
			
 
				+type label = string
			
 
				+
			
 
				+type atm = 
			
 
				+    Int of int64
			
 
				+  | Var of var
			
 
				+
			
 
				+type exp =
			
 
				+    Atom of atm
			
 
				+  | Prim of primop * atm list
			
 
				+
			
 
				+type stmt =
			
 
				+    Assign of var * exp
			
 
				+
			
 
				+type tail =
			
 
				+    Return of exp
			
 
				+  | Seq of stmt*tail
			
 
				+
			
 
				+type 'pinfo program = Program of 'pinfo * (label*tail) list
			
 
				+\end{lstlisting}
			
 
				 \caption{The abstract syntax of the \LangCVar{} intermediate language.}
			
 
				 \label{fig:c0-syntax}
			
 
				 \end{figure}
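For example, the \LangVar{} program \code{(let x (+ 12 20) (+ 10 x))} corresponds to a \LangCVar{} program with a single \code{start} label whose tail assigns \code{x} and then returns. The sketch below builds that program and computes its \code{locals} as a \code{unit Env.t}. The helper \code{locals\_of\_tail} is hypothetical. It only illustrates the representation and does not prescribe how \code{explicate-control} must be structured.
\begin{lstlisting}[style=ocaml]
type primop = Read | Neg | Add
type atm = Int of int64 | Var of string
type exp = Atom of atm | Prim of primop * atm list
type stmt = Assign of string * exp
type tail = Return of exp | Seq of stmt * tail
type 'pinfo program = Program of 'pinfo * (string * tail) list

module Env = Map.Make (String)

(* Collect every assigned variable; unit Env.t acts as a set. *)
let rec locals_of_tail (acc : unit Env.t) (t : tail) : unit Env.t =
  match t with
  | Return _ -> acc
  | Seq (Assign (x, _), rest) -> locals_of_tail (Env.add x () acc) rest

(* start: x = (+ 12 20); return (+ 10 x); *)
let start_tail : tail =
  Seq (Assign ("x", Prim (Add, [Int 12L; Int 20L])),
       Return (Prim (Add, [Int 10L; Var "x"])))

let prog : unit Env.t program =
  Program (locals_of_tail Env.empty start_tail, [ ("start", start_tail) ])
\end{lstlisting}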
			
 
				 
			
 
				 The definitional interpreter for \LangCVar{} is in the support code,
			
 
				 in the file \code{interp-Cvar.rkt}.
			
 
				+\begin{ocamlx}
			
 
				+  The OCaml code for \LangCVar{} AST, checking, printing (for debug purposes),
			
 
				+  and interpretation is in file \texttt{CVar.ml}. 
			
 
				+\end{ocamlx}
			
 
				 
			
 
				 \subsection{The \LangXVar{} dialect}
			
 
				 
			
@@ -2461,7 +2674,23 @@ The \LangXVar{} language is the output of the pass
 
				 \key{select-instructions}. It extends \LangXInt{} with an unbounded
			
 
				 number of program-scope variables and removes the restrictions
			
 
				 regarding instruction arguments.
			
 
				-
			
 
				+\begin{ocamlx}
			
 
				+For simplicity, we treat \LangXInt{} and \LangXVar{} as the same
			
 
				+  language, defined in \texttt{X86Int.ml}. In particular, we allow \code{Var}
			
 
				+  as one of the possible forms for an instruction argument (\code{arg}).
			
 
				+  We provide two different check routines.
			
 
				+  \begin{itemize}
			
 
				+    \item \code{CheckLabels.check\_program}
			
 
				+      just checks that all label
			
 
				+      declarations are unique and that all jump targets are defined; this
			
 
				+      is suitable for checking the code produced from the \key{select-instructions}
			
 
				+      pass, which will use \code{Var} arguments freely.
			
 
				+    \item 
			
 
				+      \code{CheckArgs.check\_program} checks that all arguments are legal for the
			
 
				+      actual x86-64 machine (in particular, that they are not \code{Var} arguments);
			
 
				+      this is suitable for checking the output of the \key{patch-instr} pass.
			
 
				+  \end{itemize}
			
 
				+\end{ocamlx}
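Here is a sketch of a per-instruction legality test that gives a flavor of what the argument check involves. It covers four restrictions mentioned in this chapter:
\begin{itemize}
  \item no \code{Var} pseudo-arguments;
  \item at most one memory operand;
  \item no immediate destination;
  \item immediates that fit in 32 bits, except as the source of \code{movabsq}.
\end{itemize}
It uses a cut-down copy of the \code{reg}, \code{arg}, and \code{instr} types, and all of its names are hypothetical. The real \code{CheckArgs} module may check more, or check differently.
\begin{lstlisting}[style=ocaml]
(* Cut-down copies of the X86Int types, for this sketch only. *)
type reg = RSP | RBP | RAX
type arg = Imm of int64 | Reg of reg | Deref of reg * int32 | Var of string
type instr =
    Addq of arg * arg | Subq of arg * arg | Negq of arg
  | Movq of arg * arg | Movabsq of arg * arg | Retq

let fits_in_int32 (n : int64) : bool =
  Int64.compare n (Int64.of_int32 Int32.min_int) >= 0
  && Int64.compare n (Int64.of_int32 Int32.max_int) <= 0

let is_mem = function Deref _ -> true | _ -> false
let is_imm = function Imm _ -> true | _ -> false

(* An argument is legal on its own if it is not a pseudo-variable and
   any immediate fits in 32 bits. *)
let legal_arg = function
  | Var _ -> false
  | Imm n -> fits_in_int32 n
  | Reg _ | Deref _ -> true

let legal_instr (i : instr) : bool =
  match i with
  | Addq (src, dst) | Subq (src, dst) | Movq (src, dst) ->
      legal_arg src && legal_arg dst
      && not (is_imm dst)                  (* cannot write to an immediate *)
      && not (is_mem src && is_mem dst)    (* at most one memory operand *)
  | Negq a -> legal_arg a && not (is_imm a)
  | Movabsq (Imm _, Reg _) -> true         (* the form allowed here: imm64 into a register *)
  | Movabsq _ -> false
  | Retq -> true
\end{lstlisting}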
			
 
				 
			
 
				 \section{Uniquify Variables}
			
 
				 \label{sec:uniquify-Rvar}
			
@@ -2488,6 +2717,24 @@ $\Rightarrow$
 
				 \end{minipage}
			
 
				 \end{tabular} \\
			
 
				 %
			
 
				+\begin{tabular}{lll}
			
 
				+\begin{minipage}{0.4\textwidth}
			
 
				+\begin{lstlisting}[style=ocaml]
			
 
				+(let x 32
			
 
				+  (+ (let x 10 x) x))
			
 
				+\end{lstlisting}
			
 
				+\end{minipage}
			
 
				+&
			
 
				+\ocaml{$\Rightarrow$}
			
 
				+&
			
 
				+\begin{minipage}{0.4\textwidth}
			
 
				+\begin{lstlisting}[style=ocaml]
			
 
				+(let x.1 32
			
 
				+  (+ (let x.2 10 x.2) x.1))
			
 
				+\end{lstlisting}
			
 
				+\end{minipage}
			
 
				+\end{tabular} \\
			
 
				+%
			
 
				 The following is another example translation, this time of a program
			
 
				 with a \key{let} nested inside the initializing expression of another
			
 
				 \key{let}.\\
			
@@ -2510,20 +2757,21 @@ $\Rightarrow$
 
				 \end{lstlisting}
			
 
				 \end{minipage}
			
 
				 \end{tabular}
			
 
				-
			
 
				+\ocaml{By now you can transliterate examples like this one into OCaml syntax yourself.}\par
			
 
				 We recommend implementing \code{uniquify} by creating a structurally
			
 
				 recursive function named \code{uniquify-exp} that mostly just copies
			
 
				 an expression. However, when encountering a \key{let}, it should
			
 
				 generate a unique name for the variable and associate the old name
			
 
				-with the new name in an alist.\footnote{The Racket function
			
 
				-  \code{gensym} is handy for generating unique variable names.} The
			
 
				-\code{uniquify-exp} function needs to access this alist when it gets
			
 
				+with the new name in an alist \ocaml{(in OCaml, an \key{Env})}.\footnote{The Racket function
			
 
				+\code{gensym} is handy for generating unique variable names. \ocaml{There is a similar
			
 
				+function defined in \texttt{utils.ml}.}} The
			
 
				+\code{uniquify-exp} function needs to access this alist \ocaml{(\key{Env})} when it gets
			
 
				 to a variable reference, so we add a parameter to \code{uniquify-exp}
			
 
				-for the alist.
			
 
				+for the alist \ocaml{(\key{Env})}.
			
 
				 
			
 
				 The skeleton of the \code{uniquify-exp} function is shown in
			
 
				 Figure~\ref{fig:uniquify-Rvar}.  The function is curried so that it is
			
 
				-convenient to partially apply it to an alist and then apply it to
			
 
				+convenient to partially apply it to an alist \ocaml{(\key{Env})} and then apply it to
			
 
				 different expressions, as in the last case for primitive operations in
			
 
				 Figure~\ref{fig:uniquify-Rvar}.  The
			
 
				 %
			
@@ -2531,6 +2779,19 @@ Figure~\ref{fig:uniquify-Rvar}.  The
 
				 %
			
 
				 form of Racket is useful for transforming each element of a list to
			
 
				 produce a new list.\index{for/list}
			
 
				+\ocaml{The \code{List.map} function is similar.}
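For readers who want to compare against their own approach, here is one possible shape for \code{uniquify\_exp} in OCaml. The counter-based \code{gensym} is a hypothetical stand-in for the helper in \texttt{utils.ml}, whose name and naming scheme may differ. The \code{Uniquify} module in \code{Chapter2.ml} remains the reference. The explicit \code{let}s in the \code{Let} case fix the order in which fresh names are generated, since OCaml does not specify the evaluation order of constructor arguments.
\begin{lstlisting}[style=ocaml]
type primop = Read | Neg | Add
type exp =
    Int of int64
  | Prim of primop * exp list
  | Var of string
  | Let of string * exp * exp

module Env = Map.Make (String)

(* Hypothetical stand-in for the gensym-like helper in utils.ml *)
let counter = ref 0
let gensym (base : string) : string =
  incr counter;
  base ^ "." ^ string_of_int !counter

let rec uniquify_exp (env : string Env.t) (e : exp) : exp =
  match e with
  | Int _ -> e
  | Var x -> Var (Env.find x env)   (* the checker guarantees x is bound *)
  | Let (x, e1, e2) ->
      let fresh = gensym x in
      let e1u = uniquify_exp env e1 in                 (* old scope *)
      let e2u = uniquify_exp (Env.add x fresh env) e2 in (* x renamed *)
      Let (fresh, e1u, e2u)
  | Prim (op, args) -> Prim (op, List.map (uniquify_exp env) args)

let uniquify (e : exp) : exp = uniquify_exp Env.empty e
\end{lstlisting}
Applied to \code{(let x 32 (+ (let x 10 x) x))} with a fresh counter, this sketch yields the renaming \code{(let x.1 32 (+ (let x.2 10 x.2) x.1))} shown earlier.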
			
 
				+
			
 
				+\ocaml{In addition to writing the \code{uniquify} transformation, it is worthwhile 
			
 
				+  to write a \emph{checker} to make sure that the result obeys any invariants we
			
 
				+  expect to hold.  (Sometimes these invariants are baked into the abstract syntax
			
 
				+  of the target, but that's not the case here.) Our checker should re-traverse the
			
 
				+  result AST and make sure that no identifier is bound more than once.  It should also
			
 
				+  re-run the \LangVar{} checker defined in module \code{RVar} to make sure that
			
 
				+  all variable uses are in the scope of a binding (something we might easily have
			
 
				+  messed up) and that we have not accidentally introduced a primop arity error (much
			
 
				+  less likely, but still possible).
			
 
				+}
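The first part of that checker makes sure that no identifier is bound twice, and might look like the following sketch. The exception \code{Duplicate\_binding} and the function \code{check\_unique\_binders} are hypothetical. A full checker would also re-run the \code{RVar} checker as described above.
\begin{lstlisting}[style=ocaml]
type primop = Read | Neg | Add
type exp =
    Int of int64
  | Prim of primop * exp list
  | Var of string
  | Let of string * exp * exp

exception Duplicate_binding of string

(* Raise Duplicate_binding x if some variable x is bound by more than
   one let anywhere in e. *)
let check_unique_binders (e : exp) : unit =
  let seen = Hashtbl.create 16 in
  let rec go e =
    match e with
    | Int _ | Var _ -> ()
    | Prim (_, args) -> List.iter go args
    | Let (x, e1, e2) ->
        if Hashtbl.mem seen x then raise (Duplicate_binding x);
        Hashtbl.add seen x ();
        go e1;
        go e2
  in
  go e
\end{lstlisting}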
			
 
				+
			
 
				 
			
 
				 \begin{exercise}
			
 
				 \normalfont % I don't like the italics for exercises. -Jeremy
			
@@ -2538,7 +2799,8 @@ produce a new list.\index{for/list}
 Complete the \code{uniquify} pass by filling in the blanks in
 Figure~\ref{fig:uniquify-Rvar}, that is, implement the cases for
 variables and for the \key{let} form in the file \code{compiler.rkt}
-in the support code.
+in the support code. \ocaml{This exercise has already been done for you in the
+  \code{Uniquify} module of file \code{Chapter2.ml}.}
 \end{exercise}
 
 \begin{figure}[tbp]
@@ -2569,12 +2831,14 @@ parts of the \key{uniquify} pass, that is, the programs should include
 The five programs should be placed in the subdirectory named
 \key{tests} and the file names should start with \code{var\_test\_}
 followed by a unique integer and end with the file extension
-\key{.rkt}.
+\key{.rkt}. \ocaml{OCaml: use extension \key{.r}.}
 %
-The \key{run-tests.rkt} script in the support code checks whether the
+The \key{run-tests.rkt} script in the support code \ocaml{(\key{test\_files}
+  function in \code{Chapter2.ml})} checks whether the
 output programs produce the same result as the input programs.  The
 script uses the \key{interp-tests} function
-(Appendix~\ref{appendix:utilities}) from \key{utilities.rkt} to test
+(Appendix~\ref{appendix:utilities}) from \key{utilities.rkt} \ocaml{(\key{test\_files}
+  function from \code{utils.ml})} to test
 your \key{uniquify} pass on the example programs.  The \code{passes}
 parameter of \key{interp-tests} is a list that should have one entry
 for each pass in your compiler.  For now, define \code{passes} to
@@ -2585,7 +2849,7 @@ contain just one entry for \code{uniquify} as follows.
 \end{lstlisting}
 Run the \key{run-tests.rkt} script in the support code to check
 whether the output programs produce the same result as the input
-programs.
+programs. \ocaml{XXXXXXX}
 \end{exercise}
 
 
@@ -2619,7 +2883,11 @@ $\Rightarrow$
 \end{minipage}
 \end{tabular}
 
-
+\begin{ocamlx}
+We suggest generating temporary names that begin with a back tick (\verb'`'):
+S-expression symbols may not contain a back tick, so such names can never
+conflict with existing user-defined names.
+\end{ocamlx}
 \begin{figure}[tp]
 \centering
 \fbox{
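The back-tick convention for temporaries can be sketched as a small name generator. The names temp_counter, fresh_temp, and is_generated are hypothetical; the sketch relies only on the S-expression parser rule that symbols never contain a back tick.

```ocaml
(* Hypothetical counter shared by all passes that need temporaries. *)
let temp_counter = ref 0

(* Produces names such as "`tmp.1"; the leading back tick guarantees that
   the name can never be a symbol read by the S-expression parser. *)
let fresh_temp (base : string) : string =
  incr temp_counter;
  Printf.sprintf "`%s.%d" base !temp_counter

(* A parsed symbol never contains a back tick, so any name that does
   contain one must have been generated by the compiler. *)
let is_generated (name : string) : bool = String.contains name '`'
```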
			
@@ -2628,13 +2896,13 @@ $\Rightarrow$
 \begin{array}{rcl}
 \Atm &::=& \INT{\Int} \mid \VAR{\Var} \\
 \Exp &::=& \Atm \mid \READ{} \\
      &\mid& \NEG{\Atm} \mid \ADD{\Atm}{\Atm}  \\
      &\mid&  \LET{\Var}{\Exp}{\Exp} \\
 R^{\dagger}_1  &::=& \PROGRAM{\code{'()}}{\Exp}
 \end{array}
 \]
 \end{minipage}
 }
 \caption{\LangVarANF{} is \LangVar{} in administrative normal form (ANF).}
 \label{fig:r1-anf-syntax}
 \end{figure}
@@ -2647,6 +2915,11 @@ and variables are atomic. In the literature, restricting arguments to
 be atomic expressions is called \emph{administrative normal form}, or
 ANF for short~\citep{Danvy:1991fk,Flanagan:1993cg}.
 \index{administrative normal form} \index{ANF}
+\ocaml{Actually, ANF
+  as defined in~\citep{Flanagan:1993cg}
+  refers to a more restricted form in which the defining expressions of
+  \code{let}s cannot themselves contain \code{let}s. This essentially
+  corresponds to the \LangCVar{} language.}
 
 We recommend implementing this pass with two mutually recursive
 functions, \code{rco-atom} and \code{rco-exp}. The idea is to apply
@@ -2654,7 +2927,7 @@ functions, \code{rco-atom} and \code{rco-exp}. The idea is to apply
 apply \code{rco-exp} to subexpressions that do not.  Both functions
 take an \LangVar{} expression as input.  The \code{rco-exp} function
 returns an expression.  The \code{rco-atom} function returns two
-things: an atomic expression and alist mapping temporary variables to
+things: an atomic expression and an alist \ocaml{(i.e., a list of pairs)} mapping temporary variables to
 complex subexpressions. You can return multiple things from a function
 using Racket's \key{values} form and you can receive multiple things
 from a function call using the \key{define-values} form. If you are
@@ -2664,7 +2937,9 @@ Also, the
   form is useful for applying a function to each element of a list, in
   the case where the function returns multiple values.
   \index{for/lists}
-
+  \ocaml{OCaml: You can return multiple things from a function using a tuple
+    and binding the return value to a tuple pattern. Again, the \code{List.map}
+    function is handy.}
 Returning to the example program \code{(+ 52 (- 10))}, the
 subexpression \code{(- 10)} should be processed using the
 \code{rco-atom} function because it is an argument of the \code{+} and
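The following OCaml sketch shows the mutually recursive rco_atom and rco_exp functions described above, with rco_atom returning a tuple of an atom and an alist of temporaries, as the OCaml note suggests. The AST, the fresh helper, and wrap_lets are illustrative assumptions rather than the support code's definitions.

```ocaml
(* Hypothetical R_var abstract syntax for this sketch. *)
type exp =
  | Int of int
  | Var of string
  | Prim of string * exp list
  | Let of string * exp * exp

(* Hypothetical fresh-name generator using the back-tick convention. *)
let counter = ref 0

let fresh () : string =
  incr counter;
  Printf.sprintf "`tmp.%d" !counter

(* Wrap [body] in Lets for the (temp, rhs) bindings, first binding outermost. *)
let wrap_lets (binds : (string * exp) list) (body : exp) : exp =
  List.fold_right (fun (x, rhs) acc -> Let (x, rhs, acc)) binds body

(* Returns an atomic expression together with an alist binding each new
   temporary to the (already simplified) complex expression it names. *)
let rec rco_atom (e : exp) : exp * (string * exp) list =
  match e with
  | Int _ | Var _ -> (e, [])
  | Prim _ | Let _ ->
      let tmp = fresh () in
      (Var tmp, [ (tmp, rco_exp e) ])

(* Returns an expression whose primitive-operation arguments are atomic. *)
and rco_exp (e : exp) : exp =
  match e with
  | Int _ | Var _ -> e
  | Let (x, rhs, body) -> Let (x, rco_exp rhs, rco_exp body)
  | Prim (op, args) ->
      (* List.map over the tuple-returning rco_atom, then split the pairs. *)
      let atoms, binds = List.split (List.map rco_atom args) in
      wrap_lets (List.concat binds) (Prim (op, atoms))
```

On the running example (+ 52 (- 10)), this sketch binds (- 10) to a single temporary and leaves the atomic argument 52 in place.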
			
@@ -2723,10 +2998,15 @@ produce the following output with unnecessary temporary variables.\\
 \end{lstlisting}
 \end{minipage}
 
+
 \begin{exercise}\normalfont
 %
 Implement the \code{remove-complex-opera*} function in
-\code{compiler.rkt}.
+\code{compiler.rkt}. \ocaml{Fill in the \code{RemoveComplexOperations} submodule in \code{Chapter2.ml}.
+  Be sure to include a checker that re-traverses the target AST to make sure that
+  all primop arguments are indeed now atomic, and that we haven't broken any of the
+  other invariants we expect to hold of \LangInt{} programs at this point.
+}
 %
 Create three new \LangInt{} programs that exercise the interesting
 code in the \code{remove-complex-opera*} pass (Following the same file
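The checker requested in this exercise could include a test like the one sketched here, which walks a hypothetical exp type and reports whether every primitive-operation argument is atomic. It covers only that one invariant; the other checks mentioned in the exercise would be separate.

```ocaml
(* Hypothetical R_var abstract syntax for this sketch. *)
type exp =
  | Int of int
  | Var of string
  | Prim of string * exp list
  | Let of string * exp * exp

let is_atom (e : exp) : bool =
  match e with
  | Int _ | Var _ -> true
  | Prim _ | Let _ -> false

(* The ANF invariant on the output of remove-complex-operands: every
   argument of a primitive operation is an integer or a variable. *)
let rec args_atomic (e : exp) : bool =
  match e with
  | Int _ | Var _ -> true
  | Prim (_, args) -> List.for_all is_atom args
  | Let (_, rhs, body) -> args_atomic rhs && args_atomic body
```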
			
@@ -2744,6 +3024,7 @@ intermeidate programs, place the following before the call to
 \begin{lstlisting}
 (debug-level 1)  
 \end{lstlisting}
+\ocaml{XXXXX}
 \end{exercise}
 
 
@@ -2792,7 +3073,7 @@ start:
 \end{lstlisting}
 \end{minipage}
 \end{tabular}
-
+%
 \begin{figure}[tbp]
 \begin{lstlisting}
 (define (explicate-tail e)
@@ -2853,11 +3134,22 @@ output. The reader might be tempted to instead organize
 statements. We warn against that alternative because the
 accumulator-passing style is key to how we generate high-quality code
 for conditional expressions in Chapter~\ref{ch:Rif}.
+\begin{ocamlx}
+  Don't take this advice too seriously. Organize things in the cleanest way you
+  can find; it will always be possible to adjust your approach in later chapters.
+\end{ocamlx}
 
 \begin{exercise}\normalfont
 %
 Implement the \code{explicate-control} function in
-\code{compiler.rkt}.  Create three new \LangInt{} programs that
+\code{compiler.rkt}.  \ocaml{Fill in the \code{ExplicateControl} submodule
+  of \code{Chapter2.ml} by implementing the \code{do\_program} function.
+  The checking field of this pass should invoke \code{CVar.check\_program},
+  which checks that the target code is properly bound (and also fills in
+  some information about the set of bound variables in the \code{'pinfo}
+  field of the program that will be useful in a later pass).}
+%
+Create three new \LangInt{} programs that
 exercise the code in \code{explicate-control}.
 %
 In the \code{run-tests.rkt} script, add the following entry to the
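The accumulator-passing structure discussed above can be sketched in OCaml: explicate_assign receives the already-translated continuation cont and prepends an assignment to it, and a let nested on a right-hand side is flattened by recurring on its pieces. The source and target types below are minimal stand-ins, not the RVar and CVar modules.

```ocaml
(* Source: hypothetical R_var in ANF (primitive arguments are atomic). *)
type exp =
  | Int of int
  | Var of string
  | Prim of string * exp list
  | Let of string * exp * exp

(* Target: a hypothetical C_var-like language. *)
type catm = CInt of int | CVar of string
type cexp = Atm of catm | CPrim of string * catm list
type stmt = Assign of string * cexp
type tail = Return of cexp | Seq of stmt * tail

let to_atm (e : exp) : catm =
  match e with
  | Int n -> CInt n
  | Var x -> CVar x
  | Prim _ | Let _ -> failwith "to_atm: expected an atomic expression"

let to_cexp (e : exp) : cexp =
  match e with
  | Int _ | Var _ -> Atm (to_atm e)
  | Prim (op, args) -> CPrim (op, List.map to_atm args)
  | Let _ -> failwith "to_cexp: unexpected let"

(* Translates an expression in tail position. *)
let rec explicate_tail (e : exp) : tail =
  match e with
  | Let (x, rhs, body) -> explicate_assign rhs x (explicate_tail body)
  | _ -> Return (to_cexp e)

(* Translates the assignment x = e; [cont] (the accumulator) is the
   already-translated code that runs afterwards. *)
and explicate_assign (e : exp) (x : string) (cont : tail) : tail =
  match e with
  | Let (y, rhs, body) -> explicate_assign rhs y (explicate_assign body x cont)
  | _ -> Seq (Assign (x, to_cexp e), cont)
```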
			
@@ -2865,6 +3157,7 @@ list of \code{passes} and then run the script to test your compiler.
 \begin{lstlisting}
 (list "explicate control" explicate-control interp-Cvar type-check-Cvar)  
 \end{lstlisting}
+\ocaml{XXXXX}
 \end{exercise}
 
 \section{Select Instructions}
@@ -2875,8 +3168,9 @@ In the \code{select-instructions} pass we begin the work of
 translating from \LangCVar{} to \LangXVar{}. The target language of
 this pass is a variant of x86 that still uses variables, so we add an
 AST node of the form $\VAR{\itm{var}}$ to the \Arg{} non-terminal of
-the \LangXInt{} abstract syntax (Figure~\ref{fig:x86-int-ast}).  We
-recommend implementing the \code{select-instructions} with
+the \LangXInt{} abstract syntax (Figure~\ref{fig:x86-int-ast}). \ocaml{Recall that
+  we use the same module to define \LangXInt{} and \LangXVar{}.}
+We recommend implementing the \code{select-instructions} pass with
 three auxiliary functions, one for each of the non-terminals of
 \LangCVar{}: $\Atm$, $\Stmt$, and $\Tail$.
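A sketch of the three auxiliary functions for Atm, Stmt, and Tail, plus a helper select_exp for right-hand sides, is given below. The pseudo-x86 types, the use of callq read_int for read, and the jump to a label named conclusion are assumptions of this sketch, not the X86Int module's interface.

```ocaml
(* Hypothetical C_var-like input types. *)
type catm = CInt of int | CVar of string
type cexp = Atm of catm | CPrim of string * catm list
type stmt = Assign of string * cexp
type tail = Return of cexp | Seq of stmt * tail

(* Hypothetical pseudo-x86 output: XVar is a not-yet-homed variable. *)
type arg = Imm of int | Reg of string | XVar of string

type instr =
  | Movq of arg * arg
  | Addq of arg * arg
  | Negq of arg
  | Callq of string
  | Jmp of string

(* Atm: atoms become immediates or variables. *)
let select_atm (a : catm) : arg =
  match a with CInt n -> Imm n | CVar x -> XVar x

(* Helper: instructions that leave the value of [e] in [dst]. *)
let select_exp (e : cexp) (dst : arg) : instr list =
  match e with
  | Atm a -> [ Movq (select_atm a, dst) ]
  | CPrim ("read", []) -> [ Callq "read_int"; Movq (Reg "rax", dst) ]
  | CPrim ("-", [ a ]) -> [ Movq (select_atm a, dst); Negq dst ]
  | CPrim ("+", [ a1; a2 ]) ->
      let a1 = select_atm a1 and a2 = select_atm a2 in
      if a2 = dst then [ Addq (a1, dst) ]       (* dst := a1 + dst *)
      else if a1 = dst then [ Addq (a2, dst) ]  (* dst := dst + a2 *)
      else [ Movq (a1, dst); Addq (a2, dst) ]
  | CPrim (op, _) -> failwith ("select_exp: unsupported primitive " ^ op)

(* Stmt: an assignment stores into the variable. *)
let select_stmt (s : stmt) : instr list =
  match s with Assign (x, e) -> select_exp e (XVar x)

(* Tail: the return value goes in %rax, then jump to the conclusion. *)
let rec select_tail (t : tail) : instr list =
  match t with
  | Return e -> select_exp e (Reg "rax") @ [ Jmp "conclusion" ]
  | Seq (s, rest) -> select_stmt s @ select_tail rest
```

The special cases for addition avoid overwriting an operand when the destination also appears on the right-hand side; after uniquify this should not happen within a single assignment, so they serve only as a safeguard.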
			
 
				 
			
@@ -2975,6 +3269,7 @@ list of \code{passes} and then run the script to test your compiler.
 \begin{lstlisting}
 (list "instruction selection" select-instructions interp-pseudo-x86-0)
 \end{lstlisting}
+\ocaml{XXXXXX}
 \end{exercise}
 
 
@@ -3037,6 +3332,7 @@ with stack locations.  As an aside, the \code{locals-types} entry is
 computed by \code{type-check-Cvar} in the support code, which installs
 it in the $\itm{info}$ field of the \code{CProgram} node, which should
 be propagated to the \code{X86Program} node.
+\ocaml{XXXXX}
 
 In the process of assigning variables to stack locations, it is
 convenient for you to compute and store the size of the frame (in
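The frame-size bookkeeping mentioned above can be sketched as follows. The list of locals is passed in explicitly; in the OCaml support code it would presumably come from the bound-variable information recorded in the 'pinfo field, but that interface is not shown. Rounding the frame size up to a multiple of 16 bytes is an assumption based on the x86-64 requirement that the stack stay 16-byte aligned at calls.

```ocaml
(* Hypothetical pseudo-x86 types: XVar is a variable not yet assigned a
   home; Deref (r, off) is the memory location off(%r). *)
type arg = Imm of int | Reg of string | Deref of string * int | XVar of string

type instr =
  | Movq of arg * arg
  | Addq of arg * arg
  | Negq of arg
  | Jmp of string

(* Place the i-th local (counting from 0) at -8*(i+1)(%rbp) and return the
   rewritten instructions with the frame size in bytes, rounded up to a
   multiple of 16. *)
let assign_homes (locals : string list) (instrs : instr list) : instr list * int =
  let homes = List.mapi (fun i x -> (x, Deref ("rbp", -8 * (i + 1)))) locals in
  (* Raises Not_found for a variable missing from [locals]. *)
  let home a = match a with XVar x -> List.assoc x homes | _ -> a in
  let assign i =
    match i with
    | Movq (a, b) -> Movq (home a, home b)
    | Addq (a, b) -> Addq (home a, home b)
    | Negq a -> Negq (home a)
    | Jmp _ -> i
  in
  let bytes = 8 * List.length locals in
  let frame_size = (bytes + 15) / 16 * 16 in
  (List.map assign instrs, frame_size)
```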
			
@@ -3057,6 +3353,7 @@ list of \code{passes} and then run the script to test your compiler.
 \begin{lstlisting}
 (list "assign homes" assign-homes interp-x86-0)
 \end{lstlisting}
+\ocaml{XXXX}
 \end{exercise}
 
 
@@ -3066,7 +3363,10 @@ list of \code{passes} and then run the script to test your compiler.
 The \code{patch-instructions} pass compiles from \LangXVar{} to
 \LangXInt{} by making sure that each instruction adheres to the
 restriction that at most one argument of an instruction may be a
-memory reference.
+memory reference. \ocaml{It also ensures that no immediate operand
+  to an ordinary instruction lies outside the signed 32-bit range, by introducing \code{movabsq}
+  instructions as needed. \code{movabsq} is the sole instruction that
+  allows a 64-bit immediate source operand; its destination must be a register.}
 
 We return to the following example.
 % var_test_20.rkt
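The two rules in the OCaml note on patch-instructions can be sketched as below: a memory-to-memory instruction is split through %rax, and an immediate outside the signed 32-bit range is first loaded with movabsq. The types and helper names are hypothetical, and using %r11 as the scratch register for large immediates is an assumption of this sketch rather than something stated in the text.

```ocaml
(* Hypothetical x86 types; immediates are int64 so that 64-bit constants
   can be represented. *)
type arg = Imm of int64 | Reg of string | Deref of string * int

type instr =
  | Movq of arg * arg
  | Addq of arg * arg
  | Subq of arg * arg
  | Negq of arg
  | Movabsq of arg * arg (* 64-bit immediate into a register *)

let is_mem (a : arg) : bool = match a with Deref _ -> true | _ -> false

(* Ordinary instructions take immediates sign-extended from 32 bits, so
   anything outside the int32 range needs movabsq. *)
let is_big_imm (a : arg) : bool =
  match a with
  | Imm n -> n < Int64.of_int32 Int32.min_int || n > Int64.of_int32 Int32.max_int
  | _ -> false

(* Patch a binary instruction built by [mk]. %rax is the scratch register
   for memory-to-memory operands, as in the text; %r11 for large immediates
   is an assumption (it must not otherwise be in use), chosen so that an
   instruction whose destination is %rax is not clobbered. *)
let patch_binary (mk : arg -> arg -> instr) (src : arg) (dst : arg) : instr list =
  if is_big_imm src then [ Movabsq (src, Reg "r11"); mk (Reg "r11") dst ]
  else if is_mem src && is_mem dst then
    [ Movq (src, Reg "rax"); mk (Reg "rax") dst ]
  else [ mk src dst ]

let patch_instr (i : instr) : instr list =
  match i with
  | Movq (src, (Reg _ as dst)) when is_big_imm src -> [ Movabsq (src, dst) ]
  | Movq (src, dst) -> patch_binary (fun s d -> Movq (s, d)) src dst
  | Addq (src, dst) -> patch_binary (fun s d -> Addq (s, d)) src dst
  | Subq (src, dst) -> patch_binary (fun s d -> Subq (s, d)) src dst
  | Negq _ | Movabsq _ -> [ i ]

let patch_instructions (instrs : instr list) : instr list =
  List.concat (List.map patch_instr instrs)
```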
			
@@ -3098,7 +3398,9 @@ from \key{rax} to the destination location, as follows.
 
 \begin{exercise}
 \normalfont Implement the \key{patch-instructions} pass in
-\code{compiler.rkt}. Create three new example programs that are
+\code{compiler.rkt}. \ocaml{This task has been done for you in the \code{PatchInstructions} submodule
+of \code{Chapter2.ml}.}
+Create three new example programs that are
 designed to exercise all of the interesting cases in this pass.
 %
 In the \code{run-tests.rkt} script, add the following entry to the
@@ -3116,7 +3418,8 @@ The last step of the compiler from \LangVar{} to x86 is to convert the
 \LangXInt{} AST (defined in Figure~\ref{fig:x86-int-ast}) to the
 string representation (defined in
 Figure~\ref{fig:x86-int-concrete}). The Racket \key{format} and
-\key{string-append} functions are useful in this regard. The main work
+\key{string-append} functions are useful in this regard. \ocaml{The \code{Printf}
+  library is useful here.} The main work
 that this step needs to perform is to create the \key{main} function
 and the standard instructions for its prelude and conclusion, as shown
 in Figure~\ref{fig:p1-x86} of Section~\ref{sec:x86}. You will need to
@@ -3128,10 +3431,14 @@ When running on Mac OS X, you compiler should prefix an underscore to
 labels like \key{main}. The Racket call \code{(system-type 'os)} is
 useful for determining which operating system the compiler is running
 on. It returns \code{'macosx}, \code{'unix}, or \code{'windows}.
+\ocaml{There is a similar utility function \code{get\_ostype}
+provided in the \code{utils.ml} module.}
 
 \begin{exercise}\normalfont
 %
 Implement the \key{print-x86} pass in \code{compiler.rkt}.
+\ocaml{This task has been done for you; the relevant printing
+  code is in module \code{X86Int}.}
 %
 In the \code{run-tests.rkt} script, add the following entry to the
 list of \code{passes} and then run the script to test your compiler.
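The Printf-based printing and the Mac OS X underscore rule can be sketched like this. The os type stands in for whatever get_ostype returns, which is not shown here, and only the .globl directive, the main label, and the body are printed; the prelude and conclusion are omitted.

```ocaml
(* Hypothetical x86 types for this sketch. *)
type arg = Imm of int64 | Reg of string | Deref of string * int

type instr =
  | Movq of arg * arg
  | Callq of string
  | Jmp of string

(* Hypothetical OS flag standing in for the result of get_ostype. *)
type os = MacOS | Linux

(* Global symbols such as main, and C functions such as read_int, get a
   leading underscore on Mac OS X. *)
let global_label (os : os) (name : string) : string =
  match os with MacOS -> "_" ^ name | Linux -> name

let string_of_arg (a : arg) : string =
  match a with
  | Imm n -> Printf.sprintf "$%Ld" n
  | Reg r -> "%" ^ r
  | Deref (r, off) -> Printf.sprintf "%d(%%%s)" off r

let string_of_instr (os : os) (i : instr) : string =
  match i with
  | Movq (s, d) ->
      Printf.sprintf "\tmovq\t%s, %s" (string_of_arg s) (string_of_arg d)
  | Callq f -> Printf.sprintf "\tcallq\t%s" (global_label os f)
  | Jmp l -> Printf.sprintf "\tjmp\t%s" l

(* Emits the .globl directive, the main label, and the body only. *)
let string_of_program (os : os) (body : instr list) : string =
  let main = global_label os "main" in
  String.concat "\n"
    (Printf.sprintf "\t.globl %s" main :: (main ^ ":")
     :: List.map (string_of_instr os) body)
  ^ "\n"
```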
			
@@ -3144,6 +3451,8 @@ Uncomment the call to the \key{compiler-tests} function
 compiler by executing the generated x86 code. Compile the provided
 \key{runtime.c} file to \key{runtime.o} using \key{gcc}. Run the
 script to test your compiler.
+\ocaml{XXXXX}
+
 \end{exercise}