Andrew Tolmach 4 years ago
parent
commit
03ab5610f7
1 changed file with 362 additions and 53 deletions
book.tex

@@ -103,11 +103,10 @@ showstringspaces=false
   language=[Objective]Caml,
   basicstyle=\ttfamily\small\color{blue},
   columns=flexible,
-  escapechar={},
+  escapechar=~,
   showstringspaces=false
 }
 
-
 \newtheorem{theorem}{Theorem}
 \newtheorem{lemma}[theorem]{Lemma}
 \newtheorem{corollary}[theorem]{Corollary}
@@ -887,13 +886,15 @@ Appendix~\ref{appendix:utilities} for more details.
     | SNum of Int64.t
             (* 64-bit integers *)
     | SSym of string
-            (* non-digit character sequence delimited by white space *)
+            (* character sequence starting with non-digit,
+               delimited by white space *)
     | SString of string
             (* arbitrary character sequence delimited by double quotes *)
   \end{lstlisting}
   The generic S-expression parser handles (nestable) comments delimited by
-  curly braces (\code{\{} and \code{\}}).  Symbols can contain any
-  non-digit, non-whitespace characters except parentheses, curly braces, and
+  curly braces (\code{\{} and \code{\}}).  Symbols must start with a non-digit
+  character and can contain any
+  non-whitespace characters except parentheses, curly braces, and
   the back tick (\code{\`}); this last exclusion is handy when we want to
   generate internal names during compilation and be sure they don't clash
   with a user-defined symbol.
@@ -1641,19 +1642,30 @@ We hope to give enough hints that the well-prepared reader, together
 with a few friends, can implement a compiler from \LangVar{} to x86 in
 a couple weeks.  To give the reader a feeling for the scale of this
 first compiler, the instructor solution for the \LangVar{} compiler is
-approximately 500 lines of code.
+approximately 500 lines of code. \ocaml{For the OCaml-based course,
+  several pieces of the compiler will be provided for you, leaving enough
+  work for a week-long assignment. The instructor solution for
+  the tasks left to you is under 200 lines of code.
+  However, in return for not writing so much code,
+  you will need to \emph{read} more existing code.}
 
 \section{The \LangVar{} Language}
 \label{sec:s0}
 \index{variable}
 
 The \LangVar{} language extends the \LangInt{} language with variable
-definitions.  The concrete syntax of the \LangVar{} language is defined by
+definitions. The concrete syntax of the \LangVar{} language is defined by
 the grammar in Figure~\ref{fig:r1-concrete-syntax} and the abstract
-syntax is defined in Figure~\ref{fig:r1-syntax}.  The non-terminal
-\Var{} may be any Racket identifier. As in \LangInt{}, \key{read} is a
+syntax is defined in Figure~\ref{fig:r1-syntax}.  \ocaml{For the OCaml
+  version, we don't feel the need to match the syntax of Racket exactly,
+  so we can simplify the concrete syntax of \key{let} bindings.}   The non-terminal
+\Var{} may be any Racket identifier. \ocaml{For OCaml, it can be any S-expression symbol.}
+As in \LangInt{}, \key{read} is a
 nullary operator, \key{-} is a unary operator, and \key{+} is a binary
-operator.  Similar to \LangInt{}, the abstract syntax of \LangVar{} includes the
+operator.  \ocaml{We also add \key{-} as a binary subtraction operator in
+  the concrete syntax, but not in the abstract syntax: 
+  we will ``de-sugar'' subtraction into a combination
+  of addition and negation.} Similar to \LangInt{}, the abstract syntax of \LangVar{} includes the
 \key{Program} struct to mark the top of the program.
 %% The $\itm{info}$
 %% field of the \key{Program} structure contains an \emph{association
@@ -1675,7 +1687,20 @@ exhibit several compilation techniques.
 \]
 \end{minipage}
 }
-\caption{The concrete syntax of \LangVar{}.}
+\begin{ocamlx}
+\fbox{
+\begin{minipage}{0.96\textwidth}
+\[
+\begin{array}{rcl}
+  \Exp &::=& \Int \mid \CREAD{} \mid \CNEG{\Exp} \mid \CADD{\Exp}{\Exp} \mid \CSUB{\Exp}{\Exp}\\
+       &\mid& \Var \mid \code{(let $\Var$ $\Exp$ $\Exp$)}\\
+  \LangVar{} &::=& \Exp
+\end{array}
+\]
+\end{minipage}
+}
+\end{ocamlx}
+\caption{The concrete syntax of \LangVar{} \ocaml{in OCaml}.}
 \label{fig:r1-concrete-syntax}
 \end{figure}
 
@@ -1693,6 +1718,19 @@ exhibit several compilation techniques.
 \]
 \end{minipage}
 }
+\begin{lstlisting}[style=ocaml,frame=single]
+type primop = 
+   Read
+ | Neg
+ | Add
+type var = string
+type exp = 
+   Int of int64  
+ | Prim of primop * exp list
+ | Var of var
+ | Let of var * exp * exp
+type 'info program = Program of 'info * exp
+\end{lstlisting}
 \caption{The abstract syntax of \LangVar{}.}
 \label{fig:r1-syntax}
 \end{figure}
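+
+\begin{ocamlx}
+  As a minimal sketch of the de-sugaring of binary subtraction described
+  earlier, assuming the AST types of Figure~\ref{fig:r1-syntax}, the parser
+  might build an addition of a negation. The helper name
+  \code{desugar\_sub} is hypothetical and is not taken from the support code.
+\end{ocamlx}
+\begin{lstlisting}[style=ocaml]
+type primop = Read | Neg | Add
+type var = string
+type exp =
+    Int of int64
+  | Prim of primop * exp list
+  | Var of var
+  | Let of var * exp * exp
+
+(* hypothetical helper: concrete (- e1 e2) becomes (+ e1 (- e2)) *)
+let desugar_sub (e1 : exp) (e2 : exp) : exp =
+  Prim (Add, [e1; Prim (Neg, [e2])])
+\end{lstlisting}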
@@ -1705,11 +1743,17 @@ Figure~\ref{fig:r1-syntax}.  The concrete syntax for \key{let} is
 \begin{lstlisting}
 (let ([|$\itm{var}$| |$\itm{exp}$|]) |$\itm{exp}$|)
 \end{lstlisting}
+\begin{lstlisting}[style=ocaml]
+(let ~$\itm{var}$~ ~$\itm{exp}$~ ~$\itm{exp}$~)
+\end{lstlisting}
 For example, the following program initializes \code{x} to $32$ and then
 evaluates the body \code{(+ 10 x)}, producing $42$.
 \begin{lstlisting}
 (let ([x (+ 12 20)]) (+ 10 x))
 \end{lstlisting}
+\begin{lstlisting}[style=ocaml]
+(let x (+ 12 20) (+ 10 x))
+\end{lstlisting}
 When there are multiple \key{let}'s for the same variable, the closest
 enclosing \key{let} is used. That is, variable definitions overshadow
 prior definitions. Consider the following program with two \key{let}'s
@@ -1717,6 +1761,9 @@ that define variables named \code{x}. Can you figure out the result?
 \begin{lstlisting}
 (let ([x 32]) (+ (let ([x 10]) x) x))
 \end{lstlisting}
+\begin{lstlisting}[style=ocaml]
+(let x 32 (+ (let x 10 x) x))
+\end{lstlisting}
 For the purposes of depicting which variable uses correspond to which
 definitions, the following shows the \code{x}'s annotated with
 subscripts to distinguish them. Double check that your answer for the
@@ -1725,6 +1772,9 @@ program.
 \begin{lstlisting}
 (let ([x|$_1$| 32]) (+ (let ([x|$_2$| 10]) x|$_2$|) x|$_1$|))
 \end{lstlisting}
+\begin{lstlisting}[style=ocaml]
+(let x~$_1$~ 32 (+ (let x~$_2$~ 10 x~$_2$~) x~$_1$~))
+\end{lstlisting}
 The initializing expression is always evaluated before the body of the
 \key{let}, so in the following, the \key{read} for \code{x} is
 performed before the \key{read} for \code{y}. Given the input
@@ -1732,10 +1782,23 @@ $52$ then $10$, the following produces $42$ (not $-42$).
 \begin{lstlisting}
 (let ([x (read)]) (let ([y (read)]) (+ x (- y))))
 \end{lstlisting}
+\begin{lstlisting}[style=ocaml]
+(let x (read) (let y (read) (+ x (- y))))
+\end{lstlisting}
 
 \subsection{Extensible Interpreters via Method Overriding}
 \label{sec:extensible-interp}
 
+\begin{ocamlx}
+  We are not going to bother with making our OCaml interpreters
+  extensible, although there are several mechanisms in OCaml that
+  we could use to achieve this. The languages involved here just
+  don't seem big enough to warrant the added complexity.
+  We will, however, break out the definition and interpretation of
+  primops into a separate module, so that this can be easily shared among
+  different languages.
+\end{ocamlx}
+
 To prepare for discussing the interpreter for \LangVar{}, we need to
 explain why we choose to implement the interpreter using
 object-oriented programming, that is, as a collection of methods
@@ -1885,10 +1948,16 @@ extensible way.
 \end{wrapfigure}
 
 Having justified the use of classes and methods to implement
-interpreters, we turn to the definitional interpreter for \LangVar{}
-in Figure~\ref{fig:interp-Rvar}. It is similar to the interpreter for
+interpreters \ocaml{(or not)}, we turn to the definitional interpreter for \LangVar{}
+in Figure~\ref{fig:interp-Rvar} \ocaml{(Figure~\ref{fig:interp-Rvar-ocaml})}.
+It is similar to the interpreter for
 \LangInt{} but adds two new \key{match} cases for variables and
-\key{let}.  For \key{let} we need a way to communicate the value bound
+\key{let}. \ocaml{Also, the code for performing primops has been split out
+  into a separate function. We rely on the fact that
+  \code{List.map} processes list elements from left to right to
+  enforce the intended order of evaluation of primop subexpressions.}
+
+For \key{let} we need a way to communicate the value bound
 to a variable to all the uses of the variable. To accomplish this, we
 maintain a mapping from variables to values. Throughout the compiler
 we often need to map variables to information about them. We refer to
@@ -1899,7 +1968,7 @@ these mappings as
 %
 For simplicity, we use an association list (alist) to represent the
 environment. The sidebar to the right gives a brief introduction to
-alists and the \code{racket/dict} package.  The \code{interp-exp}
+alists and the \code{racket/dict} package. The \code{interp-exp}
 function takes the current environment, \code{env}, as an extra
 parameter.  When the interpreter encounters a variable, it finds the
 corresponding value using the \code{dict-ref} function.  When the
@@ -1908,6 +1977,51 @@ expression, extends the environment with the result value bound to the
 variable, using \code{dict-set}, then evaluates the body of the
 \key{Let}.
 
+\begin{ocamlx}
+  In OCaml, we thread environments in the same way, but
+  it is convenient to represent environments using
+  the \code{Map} library module, which provides efficient
+  mappings from keys to values (using balanced binary trees,
+  although that is an implementation detail we don't need to
+  know about). \code{Map.Make} is an example of a module that
+  is \emph{parameterized} by another module (here, one that supplies the
+  key type and its ordering); this is called a \emph{functor}.  Here we
+  \emph{apply} the functor \code{Map.Make}, thereby defining a module \code{Env} that provides operations
+  specialized to \code{string} keys (suitable for variables).
+  The type of environments is written \code{'a Env.t}; it is
+  parametric in the type \code{'a} of values stored in the map.
+  Here we will be using \LangVar{}
+  values, i.e. \code{int64}s, so the type is \code{int64 Env.t}.  
+  \code{Env.empty} represents an empty environment.
+  \code{Env.find $x$ $env$} returns the value associated with
+  variable $x$ in $env$ (throwing an exception if $x$ is not found). 
+  \code{Env.add $x$ $v$ $env$} produces a new environment
+  that is the same as $env$ except that variable $x$ is associated to
+  value $v$. Note that these operations are \emph{pure}; that is, they
+  do not mutate any environment.
+\end{ocamlx}
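+
+\begin{ocamlx}
+  The following small sketch illustrates these operations and their purity.
+  The modules \code{StringKey} and \code{Env} are defined as in
+  Figure~\ref{fig:interp-Rvar-ocaml}; the names \code{env0}, \code{env1},
+  and \code{env2} are purely illustrative.
+\end{ocamlx}
+\begin{lstlisting}[style=ocaml]
+module StringKey = struct type t = string let compare = String.compare end
+module Env = Map.Make(StringKey)
+
+let env0 : int64 Env.t = Env.empty
+let env1 = Env.add "x" 32L env0
+let env2 = Env.add "x" 10L env1  (* new binding for x; env1 is unchanged *)
+
+let () =
+  assert (Env.find "x" env1 = 32L);
+  assert (Env.find "x" env2 = 10L);
+  (* Env.find raises Not_found for a missing key; find_opt returns None *)
+  assert (Env.find_opt "y" env2 = None)
+\end{lstlisting}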
+
+\begin{ocamlx}
+  The OCaml code for \LangVar{} ASTs, concrete parsing and printing (for debug purposes),
+  and interpretation is in file \texttt{RVar.ml}, which also imports
+  from file \texttt{Primops.ml}.  These files also contain code for
+  static checking of \LangVar{} programs. The checker makes sure that
+  (i) every use of a variable is in the scope of a corresponding \code{let} binding;
+  and (ii) each primop is applied to the correct number of arguments.
+
+  Note that if a source program fails the checker for reason (i), this is a static user error
+  that should be reported as such. (Violations of (ii) in user programs
+  should be caught by the parser; parse errors are always reported as user errors.)
+  Your compiler should stop trying to process a file as soon as it reports a static user
+  error! (That's what the provided test driver will do.)
+
+  However, if a program initially passes
+  the checker but is subsequently transformed by the compiler and then
+  fails a re-check, this indicates that the problem is the compiler's fault.
+  In this case, the compiler itself should halt with a suitable error message.
+  The checker has a boolean flag to distinguish these cases.
+\end{ocamlx}
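+
+\begin{ocamlx}
+  To make the two checks and the role of the boolean flag concrete, here is
+  a rough sketch of how such a checker could be organized. It is not the
+  code in \texttt{RVar.ml}: the exception names, the function
+  \code{check\_exp}, and the parameter \code{internal} are all assumptions
+  made for illustration.
+\end{ocamlx}
+\begin{lstlisting}[style=ocaml]
+module StringSet = Set.Make(String)
+
+type primop = Read | Neg | Add
+type exp =
+    Int of int64
+  | Prim of primop * exp list
+  | Var of string
+  | Let of string * exp * exp
+
+exception User_error of string      (* hypothetical *)
+exception Internal_error of string  (* hypothetical *)
+
+let arity = function Read -> 0 | Neg -> 1 | Add -> 2
+
+(* internal = true: the program came out of a compiler pass,
+   so any failure is the compiler's fault, not the user's *)
+let check_exp (internal : bool) (e : exp) : unit =
+  let fail msg =
+    if internal then raise (Internal_error msg) else raise (User_error msg) in
+  let rec go bound = function
+      Int _ -> ()
+    | Var x ->
+        if not (StringSet.mem x bound) then fail ("unbound variable " ^ x)
+    | Prim (op, args) ->
+        if List.length args <> arity op then fail "primop arity mismatch";
+        List.iter (go bound) args
+    | Let (x, e1, e2) ->
+        go bound e1;
+        go (StringSet.add x bound) e2
+  in
+  go StringSet.empty e
+\end{lstlisting}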
+
 \begin{figure}[tp]
 \begin{lstlisting}
 (define interp-Rvar-class
@@ -1940,6 +2054,31 @@ variable, using \code{dict-set}, then evaluates the body of the
 \caption{Interpreter for the \LangVar{} language.}
 \label{fig:interp-Rvar}
 \end{figure}
+\begin{figure}[tp]
+\begin{lstlisting}[style=ocaml]
+type value = int64
+  
+let interp_primop (op:primop) (args: value list) : value = 
+  match op,args with
+    Read,[] -> read_int()
+  | Neg,[v] -> Int64.neg v
+  | Add,[v1;v2] -> Int64.add v1 v2
+  | _,_ -> assert false (* arity mismatch *)
+
+module StringKey = struct type t = string let compare = String.compare end
+module Env = Map.Make(StringKey)
+
+let rec interp_exp (env:value Env.t) = function
+    Int n -> n
+  | Prim(op,args) -> interp_primop op (List.map (interp_exp env) args)
+  | Var x -> Env.find x env
+  | Let (x,e1,e2) -> interp_exp (Env.add x (interp_exp env e1) env) e2
+
+let interp_program (Program(_,e)) = interp_exp Env.empty e
+\end{lstlisting}
+\caption{\ocaml{OCaml interpreter for the \LangVar{} language.}}
+\label{fig:interp-Rvar-ocaml}
+\end{figure}
 
 The goal for this chapter is to implement a compiler that translates
 any program $P_1$ written in the \LangVar{} language into an x86 assembly
@@ -2002,7 +2141,8 @@ integer constant (called \emph{immediate value}\index{immediate
 \Arg &::=&  \key{\$}\Int \mid \key{\%}\Reg \mid \Int\key{(}\key{\%}\Reg\key{)}\\
 \Instr &::=& \key{addq} \; \Arg\key{,} \Arg \mid
       \key{subq} \; \Arg\key{,} \Arg \mid
-      \key{negq} \; \Arg \mid \key{movq} \; \Arg\key{,} \Arg \mid \\
+      \key{negq} \; \Arg \mid \\
+  &&  \key{movq} \; \Arg\key{,} \Arg \mid \ocaml{\key{movabsq} \; \Arg\key{,} \Arg \mid} \\
   &&  \key{callq} \; \mathit{label} \mid
       \key{pushq}\;\Arg \mid \key{popq}\;\Arg \mid \key{retq} \mid \key{jmp}\,\itm{label} \\
   && \itm{label}\key{:}\; \Instr \\
@@ -2062,8 +2202,10 @@ returning the integer in \key{rax} to the operating system. The
 operating system interprets this integer as the program's exit
 code. By convention, an exit code of 0 indicates that a program
 completed successfully, and all other exit codes indicate various
-errors. Nevertheless, in this book we return the result of the program
-as the exit code.
+errors. \ocaml{Also, exit codes are unsigned bytes, so they cannot accurately represent
+arbitrary \code{int64}s.} Nevertheless, in this book we return the result of the program
+as the exit code. \ocaml{(Incidentally, if you run a program at the Unix shell
+  prompt, you can retrieve its exit code by typing \texttt{echo \$?} as the very next command.)}
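+
+\begin{ocamlx}
+  As a small illustration of why exit codes cannot represent arbitrary
+  \code{int64}s, the sketch below computes the status that the shell would
+  report on a typical Unix system, where only the low 8 bits of the
+  returned value survive. The helper \code{exit\_status\_of} is hypothetical
+  and not part of the support code.
+\end{ocamlx}
+\begin{lstlisting}[style=ocaml]
+(* hypothetical helper: keep only the low 8 bits of the program result *)
+let exit_status_of (v : int64) : int =
+  Int64.to_int (Int64.logand v 0xFFL)
+
+let () =
+  assert (exit_status_of 42L = 42);
+  assert (exit_status_of 256L = 0);
+  assert (exit_status_of (-1L) = 255)
+\end{lstlisting}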
 
 \begin{figure}[tbp]
 \begin{lstlisting}
@@ -2081,7 +2223,8 @@ The x86 assembly language varies in a couple ways depending on what
 operating system it is assembled in. The code examples shown here are
 correct on Linux and most Unix-like platforms, but when assembled on
 Mac OS X, labels like \key{main} must be prefixed with an underscore,
-as in \key{\_main}.
+as in \key{\_main}. \ocaml{A utility function \code{get\_ostype}
+is provided in the \texttt{utils.ml} module of the support materials.}
 
 We exhibit the use of memory for storing intermediate results in the
 next example.  Figure~\ref{fig:p1-x86} lists an x86 program that is
@@ -2201,12 +2344,23 @@ organization becomes apparent in Chapter~\ref{ch:Rif} when we
 introduce conditional branching. The \code{Block} structure includes
 an $\itm{info}$ field that is not needed for this chapter, but becomes
 useful in Chapter~\ref{ch:register-allocation-Rvar}.  For now, the
-$\itm{info}$ field should contain an empty list. Also, regarding the
+$\itm{info}$ field should contain an empty list. \ocaml{The \code{'binfo}
+  type parameter should be instantiated with \code{unit}.}
+Also, regarding the
 abstract syntax for \code{callq}, the \code{Callq} struct includes an
 integer for representing the arity of the function, i.e., the number
 of arguments, which is helpful to know during register allocation
 (Chapter~\ref{ch:register-allocation-Rvar}).
 
+\begin{ocamlx}
+  The OCaml code for \LangXInt{} AST, printing, and checking is
+  in file \texttt{X86Int.ml}. Printing is used to produce \texttt{.s} files that
+  can be input to the system assembler; it can also be useful for debugging.
+  File \texttt{utils.ml} contains functions for invoking the assembler and linker and
+  running the resulting executables from inside OCaml; these are invoked
+  from the test drivers also defined in that file.
+\end{ocamlx}    
+
 \begin{figure}[tp]
 \fbox{
 \begin{minipage}{0.98\textwidth}
@@ -2218,8 +2372,9 @@ of arguments, which is helpful to know during register allocation
    \mid \DEREF{\Reg}{\Int} \\
 \Instr &::=& \BININSTR{\code{addq}}{\Arg}{\Arg} 
        \mid \BININSTR{\code{subq}}{\Arg}{\Arg} \\
+       &\mid& \UNIINSTR{\code{negq}}{\Arg}\\
        &\mid& \BININSTR{\code{movq}}{\Arg}{\Arg}
-       \mid \UNIINSTR{\code{negq}}{\Arg}\\
+       \ocaml{\mid \BININSTR{\code{movabsq}}{\Arg}{\Arg}} \\
        &\mid& \CALLQ{\itm{label}}{\itm{int}} \mid \RETQ{} 
        \mid \PUSHQ{\Arg} \mid \POPQ{\Arg} \mid \JMP{\itm{label}} \\
 \Block &::= & \BLOCK{\itm{info}}{\LP\Instr\ldots\RP} \\
@@ -2228,10 +2383,34 @@ of arguments, which is helpful to know during register allocation
 \]
 \end{minipage}
 }
-\caption{The abstract syntax of \LangXInt{} assembly.}
+\begin{lstlisting}[style=ocaml,frame=single]
+type reg =
+    RSP | RBP | RAX | RBX | RCX | RDX | RSI | RDI
+  | R8  | R9  | R10 | R11 | R12 | R13 | R14 | R15
+
+type label = string
+
+type arg =
+    Imm of int64  (* in most cases must actually be an int32 *)
+  | Reg of reg
+  | Deref of reg*int32
+  | Var of string (* a pseudo-argument for ~$\LangXVar{}$~ *)
+
+type instr =
+    Addq of arg*arg | Subq of arg*arg | Negq of arg 
+  | Movq of arg*arg | Movabsq of arg*arg | Callq of label*int 
+  | Retq | Pushq of arg | Popq of arg | Jmp of label
+
+type 'binfo block = Block of 'binfo * instr list
+
+type ('pinfo,'binfo) program =
+    Program of 'pinfo * (label * 'binfo block) list 
+\end{lstlisting}
+\caption{The abstract syntax of \LangXInt{} \ocaml{and \LangXVar{}} assembly.}
 \label{fig:x86-int-ast}
 \end{figure}
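+
+\begin{ocamlx}
+  The comment on \code{Imm} above notes that most immediates must fit in
+  32 bits. As a sketch under that assumption, a pass could choose between
+  \code{Movq} and \code{Movabsq} when loading a constant into a register.
+  The types are cut down from Figure~\ref{fig:x86-int-ast}, and the helpers
+  \code{fits\_in\_int32} and \code{load\_imm} are hypothetical names.
+\end{ocamlx}
+\begin{lstlisting}[style=ocaml]
+type reg = RAX | RBX
+type arg = Imm of int64 | Reg of reg
+type instr = Movq of arg * arg | Movabsq of arg * arg
+
+(* true when n lies in the signed 32-bit range *)
+let fits_in_int32 (n : int64) : bool =
+  Int64.compare n (Int64.of_int32 Int32.min_int) >= 0
+  && Int64.compare n (Int64.of_int32 Int32.max_int) <= 0
+
+(* hypothetical helper: load constant n into register r *)
+let load_imm (n : int64) (r : reg) : instr =
+  if fits_in_int32 n then Movq (Imm n, Reg r)
+  else Movabsq (Imm n, Reg r)
+\end{lstlisting}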
 
+
 \section{Planning the trip to x86 via the \LangCVar{} language}
 \label{sec:plan-s0-x86}
 
@@ -2246,7 +2425,8 @@ and x86 assembly? Here are some of the most important ones:
   arithmetic operations take two arguments and produce a new value.
   An x86 instruction may have at most one memory-accessing argument.
   Furthermore, some instructions place special restrictions on their
-  arguments.
+  arguments. \ocaml{For example, immediate operands are usually restricted
+    to fit in 32 bits (except for the \code{movabsq} instruction).}
 
 \item[(b)] An argument of an \LangVar{} operator can be a deeply-nested
   expression, whereas x86 instructions restrict their arguments to be
@@ -2327,7 +2507,7 @@ become local variables whose scope is the entire program, which would
 confuse variables with the same name.
 %
 We place \key{remove-complex-opera*} before \key{explicate-control}
-because the later removes the \key{let} form, but it is convenient to
+because the latter removes the \key{let} form, but it is convenient to
 use \key{let} in the output of \key{remove-complex-opera*}.
 %
 The ordering of \key{uniquify} with respect to
@@ -2407,7 +2587,10 @@ language~\citep{Kernighan:1988nx} in that it has separate syntactic
 categories for expressions and statements, so we name it \LangCVar{}.  The
 abstract syntax for \LangCVar{} is defined in Figure~\ref{fig:c0-syntax}.
 (The concrete syntax for \LangCVar{} is in the Appendix,
-Figure~\ref{fig:c0-concrete-syntax}.)
+Figure~\ref{fig:c0-concrete-syntax}.) \ocaml{(This appendix is not quite accurate
+  for the OCaml version, but the details of the concrete syntax of
+  an IR like this don't matter much, since it will normally be used
+  only to dump out information when debugging; it won't be parsed.)}
 %
 The \LangCVar{} language supports the same operators as \LangVar{} but
 the arguments of operators are restricted to atomic
@@ -2420,19 +2603,23 @@ assignment statements which can be executed in sequence using the
 expression that is the last one to execute within a function.
 
 A \LangCVar{} program consists of a control-flow graph represented as
-an alist mapping labels to tails. This is more general than necessary
+an alist mapping labels to tails \ocaml{(that is, a list of \code{(label*tail)} pairs)}.
+This is more general than necessary
 for the present chapter, as we do not yet introduce \key{goto} for
 jumping to labels, but it saves us from having to change the syntax in
 Chapter~\ref{ch:Rif}.  For now there will be just one label,
-\key{start}, and the whole program is its tail.
+\key{start}, and the whole program \ocaml{body} is its tail.
 %
 The $\itm{info}$ field of the \key{CProgram} form, after the
 \key{explicate-control} pass, contains a mapping from the symbol
 \key{locals} to a list of variables, that is, a list of all the
-variables used in the program. At the start of the program, these
+variables used in the program. \ocaml{It is represented as a \code{unit Env.t},
+a kind of degenerate map that effectively acts like a set.}
+At the start of the program, these
 variables are uninitialized; they become initialized on their first
 assignment.
 
+
 \begin{figure}[tbp]
 \fbox{
 \begin{minipage}{0.96\textwidth}
@@ -2448,12 +2635,38 @@ assignment.
 \]
 \end{minipage}
 }
+\begin{lstlisting}[style=ocaml,frame=single]
+type var = string
+
+type label = string
+
+type atm = 
+    Int of int64
+  | Var of var
+
+type exp =
+    Atom of atm
+  | Prim of primop * atm list
+
+type stmt =
+    Assign of var * exp
+
+type tail =
+    Return of exp
+  | Seq of stmt*tail
+
+type 'pinfo program = Program of 'pinfo * (label*tail) list
+\end{lstlisting}
 \caption{The abstract syntax of the \LangCVar{} intermediate language.}
 \label{fig:c0-syntax}
 \end{figure}
 
 The definitional interpreter for \LangCVar{} is in the support code,
 in the file \code{interp-Cvar.rkt}.
+\begin{ocamlx}
+  The OCaml code for \LangCVar{} AST, checking, printing (for debug purposes),
+  and interpretation is in file \texttt{CVar.ml}. 
+\end{ocamlx}
 
 \subsection{The \LangXVar{} dialect}
 
@@ -2461,7 +2674,23 @@ The \LangXVar{} language is the output of the pass
 \key{select-instructions}. It extends \LangXInt{} with an unbounded
 number of program-scope variables and removes the restrictions
 regarding instruction arguments.
-
+\begin{ocamlx}
+For simplicity, we treat \LangXInt{}  and \LangXVar{} as the same
+  language, defined in \texttt{X86Int.ml}. In particular, we allow \code{Var}
+  as one of the possible forms for an instruction argument (\code{arg}).
+  We provide two different check routines.
+  \begin{itemize}
+    \item \code{CheckLabels.check\_program}
+      just checks that all label
+      declarations are unique and that all jump targets are defined; this
+      is suitable for checking the code produced from the \key{select-instructions}
+      pass, which will use \code{Var} arguments freely.
+    \item 
+      \code{CheckArgs.check\_program} checks that all arguments are legal for the
+      actual X86-64 machine (in particular, that they are not \code{Var} arguments);
+      this is suitable for checking the output of the \key{patch-instr} pass.
+  \end{itemize}
+\end{ocamlx}
 
 \section{Uniquify Variables}
 \label{sec:uniquify-Rvar}
@@ -2488,6 +2717,24 @@ $\Rightarrow$
 \end{minipage}
 \end{tabular} \\
 %
+\begin{tabular}{lll}
+\begin{minipage}{0.4\textwidth}
+\begin{lstlisting}[style=ocaml]
+(let x 32
+  (+ (let x 10 x) x))
+\end{lstlisting}
+\end{minipage}
+&
+\ocaml{$\Rightarrow$}
+&
+\begin{minipage}{0.4\textwidth}
+\begin{lstlisting}[style=ocaml]
+(let x.1 32
+  (+ (let x.2 10 x.2) x.1))
+\end{lstlisting}
+\end{minipage}
+\end{tabular} \\
+%
 The following is another example translation, this time of a program
 with a \key{let} nested inside the initializing expression of another
 \key{let}.\\
@@ -2510,20 +2757,21 @@ $\Rightarrow$
 \end{lstlisting}
 \end{minipage}
 \end{tabular}
-
+\ocaml{You can transliterate examples like this for yourself by now...}
 We recommend implementing \code{uniquify} by creating a structurally
 recursive function named \code{uniquify-exp} that mostly just copies
 an expression. However, when encountering a \key{let}, it should
 generate a unique name for the variable and associate the old name
-with the new name in an alist.\footnote{The Racket function
-  \code{gensym} is handy for generating unique variable names.} The
-\code{uniquify-exp} function needs to access this alist when it gets
+with the new name in an alist \ocaml{(OCaml: \key{Env})}.\footnote{The Racket function
+\code{gensym} is handy for generating unique variable names. \ocaml{There is a similar
+function defined in \texttt{utils.ml}.}} The
+\code{uniquify-exp} function needs to access this alist \ocaml{(\key{Env})} when it gets
 to a variable reference, so we add a parameter to \code{uniquify-exp}
-for the alist.
+for the alist \ocaml{(\key{Env})}.
 
 The skeleton of the \code{uniquify-exp} function is shown in
 Figure~\ref{fig:uniquify-Rvar}.  The function is curried so that it is
-convenient to partially apply it to an alist and then apply it to
+convenient to partially apply it to an alist \ocaml{(\key{Env})} and then apply it to
 different expressions, as in the last case for primitive operations in
 Figure~\ref{fig:uniquify-Rvar}.  The
 %
@@ -2531,6 +2779,19 @@ Figure~\ref{fig:uniquify-Rvar}.  The
 %
 form of Racket is useful for transforming each element of a list to
 produce a new list.\index{for/list}
+\ocaml{The \code{List.map} function is similar.}
+
+\ocaml{In addition to writing the \code{uniquify} transformation, it is worthwhile 
+  to write a \emph{checker} to make sure that the result obeys any invariants we
+  expect to hold.  (Sometimes these invariants are baked into the abstract syntax
+  of the target, but that's not the case here.) Our checker should re-traverse the
+  result AST and make sure that no identifier is bound more than once.  It should also
+  re-run the \LangVar{} checker defined in module \code{RVar} to make sure that
+  all variable uses are in the scope of a binding (something we might easily have
+  messed up) and that we have not accidentally introduced a primop arity error (much
+  less likely, but still possible).
+}
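+
+\begin{ocamlx}
+  One possible shape for the ``no identifier is bound more than once''
+  part of such a checker is sketched below. It is an illustration only,
+  not the checker in \code{Chapter2.ml}; the function names
+  \code{bound\_once} and \code{unique\_bindings} are assumptions.
+\end{ocamlx}
+\begin{lstlisting}[style=ocaml]
+module StringSet = Set.Make(String)
+
+type primop = Read | Neg | Add
+type exp =
+    Int of int64
+  | Prim of primop * exp list
+  | Var of string
+  | Let of string * exp * exp
+
+(* collect every let-bound name, failing if one is seen twice *)
+let rec bound_once (seen : StringSet.t) (e : exp) : StringSet.t =
+  match e with
+    Int _ | Var _ -> seen
+  | Prim (_, args) -> List.fold_left bound_once seen args
+  | Let (x, e1, e2) ->
+      if StringSet.mem x seen then failwith ("duplicate binding of " ^ x);
+      bound_once (bound_once (StringSet.add x seen) e1) e2
+
+let unique_bindings (e : exp) : bool =
+  match bound_once StringSet.empty e with
+    _ -> true
+  | exception Failure _ -> false
+\end{lstlisting}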
+
 
 \begin{exercise}
 \normalfont % I don't like the italics for exercises. -Jeremy
@@ -2538,7 +2799,8 @@ produce a new list.\index{for/list}
 Complete the \code{uniquify} pass by filling in the blanks in
 Figure~\ref{fig:uniquify-Rvar}, that is, implement the cases for
 variables and for the \key{let} form in the file \code{compiler.rkt}
-in the support code.
+in the support code. \ocaml{This exercise is done for you, in the
+  \code{Uniquify} module of file \code{Chapter2.ml}.}
 \end{exercise}
 
 \begin{figure}[tbp]
@@ -2569,12 +2831,14 @@ parts of the \key{uniquify} pass, that is, the programs should include
 The five programs should be placed in the subdirectory named
 \key{tests} and the file names should start with \code{var\_test\_}
 followed by a unique integer and end with the file extension
-\key{.rkt}.
+\key{.rkt}. \ocaml{OCaml: use extension \key{.r}.}
 %
-The \key{run-tests.rkt} script in the support code checks whether the
+The \key{run-tests.rkt} script in the support code \ocaml{(\key{test\_files}
+  function in \code{Chapter2.ml})} checks whether the
 output programs produce the same result as the input programs.  The
 script uses the \key{interp-tests} function
-(Appendix~\ref{appendix:utilities}) from \key{utilities.rkt} to test
+(Appendix~\ref{appendix:utilities}) from \key{utilities.rkt} \ocaml{(\key{test\_files}
+  function from \code{utils.ml})} to test
 your \key{uniquify} pass on the example programs.  The \code{passes}
 parameter of \key{interp-tests} is a list that should have one entry
 for each pass in your compiler.  For now, define \code{passes} to
@@ -2585,7 +2849,7 @@ contain just one entry for \code{uniquify} as follows.
 \end{lstlisting}
 Run the \key{run-tests.rkt} script in the support code to check
 whether the output programs produce the same result as the input
-programs.
+programs. \ocaml{XXXXXXX}  
 \end{exercise}
 
 
@@ -2619,7 +2883,11 @@ $\Rightarrow$
 \end{minipage}
 \end{tabular}
 
-
+\begin{ocamlx}
+We suggest generating temporary names that begin with a back-tick (\verb'`')
+since these are illegal as S-expression symbols, and so cannot conflict with existing
+user-defined names.
+\end{ocamlx}
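+
+\begin{ocamlx}
+  A fresh-name generator along these lines could look as follows. This is
+  only a sketch: the support code has its own \code{gensym}-like function
+  in \texttt{utils.ml}, and the names \code{counter} and \code{fresh\_tmp}
+  as well as the \code{tmp.} prefix are assumptions made here.
+\end{ocamlx}
+\begin{lstlisting}[style=ocaml]
+(* hypothetical generator of temporaries that begin with a back-tick *)
+let counter = ref 0
+
+let fresh_tmp () : string =
+  incr counter;
+  "`tmp." ^ string_of_int !counter
+\end{lstlisting}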
 \begin{figure}[tp]
 \centering
 \fbox{
@@ -2628,13 +2896,13 @@ $\Rightarrow$
 \begin{array}{rcl}
 \Atm &::=& \INT{\Int} \mid \VAR{\Var} \\
 \Exp &::=& \Atm \mid \READ{} \\
-     &\mid& \NEG{\Atm} \mid \ADD{\Atm}{\Atm}  \\
+     &\mid& \NEG{\Atm} \mid \ADD{\Atm}{\Atm}  \\
      &\mid&  \LET{\Var}{\Exp}{\Exp} \\
 R^{\dagger}_1  &::=& \PROGRAM{\code{'()}}{\Exp}
 \end{array}
 \]
 \end{minipage}
-}
+}
 \caption{\LangVarANF{} is \LangVar{} in administrative normal form (ANF).}
 \label{fig:r1-anf-syntax}
 \end{figure}
@@ -2647,6 +2915,11 @@ and variables are atomic. In the literature, restricting arguments to
 be atomic expressions is called \emph{administrative normal form}, or
 ANF for short~\citep{Danvy:1991fk,Flanagan:1993cg}.
 \index{administrative normal form} \index{ANF}
+\ocaml{Actually, ANF
+  as defined in~\citep{Flanagan:1993cg}
+  refers to a more restricted form in which the defining expressions of
+  \code{let}s cannot themselves contain \code{let}s. This essentially
+  corresponds to the \LangCVar{} language.}
 
 We recommend implementing this pass with two mutually recursive
 functions, \code{rco-atom} and \code{rco-exp}. The idea is to apply
@@ -2654,7 +2927,7 @@ functions, \code{rco-atom} and \code{rco-exp}. The idea is to apply
 apply \code{rco-exp} to subexpressions that do not.  Both functions
 take an \LangVar{} expression as input.  The \code{rco-exp} function
 returns an expression.  The \code{rco-atom} function returns two
-things: an atomic expression and alist mapping temporary variables to
+things: an atomic expression and an alist \ocaml{(i.e., a list of pairs)} mapping temporary variables to
 complex subexpressions. You can return multiple things from a function
 using Racket's \key{values} form and you can receive multiple things
 from a function call using the \key{define-values} form. If you are
@@ -2664,7 +2937,9 @@ Also, the
   form is useful for applying a function to each element of a list, in
   the case where the function returns multiple values.
   \index{for/lists}
-
+  \ocaml{OCaml: You can return multiple things from a function using a tuple
+    and binding the return value to a tuple pattern. Again, the \code{List.map}
+    function is handy.}
 Returning to the example program \code{(+ 52 (- 10))}, the
 subexpression \code{(- 10)} should be processed using the
 \code{rco-atom} function because it is an argument of the \code{+} and
@@ -2723,10 +2998,15 @@ produce the following output with unnecessary temporary variables.\\
 \end{lstlisting}
 \end{minipage}
 
+
 \begin{exercise}\normalfont
 %
 Implement the \code{remove-complex-opera*} function in
-\code{compiler.rkt}.
+\code{compiler.rkt}. \ocaml{Fill in the \code{RemoveComplexOperations} submodule in \code{Chapter2.ml}.
+  Be sure to include a checker that re-traverses the target AST to make sure that
+  all primop arguments are indeed now atomic, and that we haven't broken any of the
+  other invariants we expect to hold of \LangInt{} programs at this point.
+}
 %
 Create three new \LangInt{} programs that exercise the interesting
 code in the \code{remove-complex-opera*} pass (Following the same file
@@ -2744,6 +3024,7 @@ intermeidate programs, place the following before the call to
 \begin{lstlisting}
 (debug-level 1)  
 \end{lstlisting}
+\ocaml{XXXXX}
 \end{exercise}
 
 
@@ -2792,7 +3073,7 @@ start:
 \end{lstlisting}
 \end{minipage}
 \end{tabular}
-
+%
 \begin{figure}[tbp]
 \begin{lstlisting}
 (define (explicate-tail e)
@@ -2853,11 +3134,22 @@ output. The reader might be tempted to instead organize
 statements. We warn against that alternative because the
 accumulator-passing style is key to how we generate high-quality code
 for conditional expressions in Chapter~\ref{ch:Rif}.
+\begin{ocamlx}
+  Don't take this advice too seriously. Organize things in the cleanest way you
+  can find; it will always be possible to adjust your approach in later chapters.
+\end{ocamlx}
 
 \begin{exercise}\normalfont
 %
 Implement the \code{explicate-control} function in
-\code{compiler.rkt}.  Create three new \LangInt{} programs that
+\code{compiler.rkt}.  \ocaml{Fill in the \code{ExplicateControl} submodule
+  of \code{Chapter2.ml} by implementing the \code{do\_program} function.
+  The checking field of this pass should invoke \code{CVar.check\_program},
+  which checks that the target code is properly bound (and also fills in
+  some information about the set of bound variables in the \code{'pinfo}
+  field of the program that will be useful in a later pass).}
+%
+Create three new \LangInt{} programs that
 exercise the code in \code{explicate-control}.
 %
 In the \code{run-tests.rkt} script, add the following entry to the
@@ -2865,6 +3157,7 @@ list of \code{passes} and then run the script to test your compiler.
 \begin{lstlisting}
 (list "explicate control" explicate-control interp-Cvar type-check-Cvar)  
 \end{lstlisting}
+\ocaml{XXXXX}
 \end{exercise}
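For readers working in OCaml, the accumulator-passing organization discussed before the exercise can be sketched as follows: \code{explicate\_assign} takes the code that should run after the assignment as an extra argument and builds the output in front of it. All types and names here are assumptions for illustration; they are not the definitions used by \code{CVar} or \code{Chapter2.ml}, and other organizations are equally acceptable.

```ocaml
(* Hypothetical source and target ASTs, for illustration only. *)
type exp =
  | Int of int
  | Var of string
  | Prim of string * exp list
  | Let of string * exp * exp

type stmt = Assign of string * exp
type tail = Return of exp | Seq of stmt * tail

(* Translates an expression in tail position. *)
let rec explicate_tail (e : exp) : tail =
  match e with
  | Let (x, rhs, body) -> explicate_assign rhs x (explicate_tail body)
  | _ -> Return e

(* Emits code assigning e to x, followed by the accumulated code cont. *)
and explicate_assign (e : exp) (x : string) (cont : tail) : tail =
  match e with
  | Let (y, rhs, body) ->
    explicate_assign rhs y (explicate_assign body x cont)
  | _ -> Seq (Assign (x, e), cont)
```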
 
 \section{Select Instructions}
@@ -2875,8 +3168,9 @@ In the \code{select-instructions} pass we begin the work of
 translating from \LangCVar{} to \LangXVar{}. The target language of
 this pass is a variant of x86 that still uses variables, so we add an
 AST node of the form $\VAR{\itm{var}}$ to the \Arg{} non-terminal of
-the \LangXInt{} abstract syntax (Figure~\ref{fig:x86-int-ast}).  We
-recommend implementing the \code{select-instructions} with
+the \LangXInt{} abstract syntax (Figure~\ref{fig:x86-int-ast}). \ocaml{Recall that
+  we use the same module to define \LangXInt{} and \LangXVar{}.}
+We recommend implementing the \code{select-instructions} pass with
 three auxiliary functions, one for each of the non-terminals of
 \LangCVar{}: $\Atm$, $\Stmt$, and $\Tail$.
 
@@ -2975,6 +3269,7 @@ list of \code{passes} and then run the script to test your compiler.
 \begin{lstlisting}
 (list "instruction selection" select-instructions interp-pseudo-x86-0)
 \end{lstlisting}
+\ocaml{XXXXXX}
 \end{exercise}
 
 
@@ -3037,6 +3332,7 @@ with stack locations.  As an aside, the \code{locals-types} entry is
 computed by \code{type-check-Cvar} in the support code, which installs
 it in the $\itm{info}$ field of the \code{CProgram} node, which should
 be propagated to the \code{X86Program} node.
+\ocaml{XXXXX}
 
 In the process of assigning variables to stack locations, it is
 convenient for you to compute and store the size of the frame (in
@@ -3057,6 +3353,7 @@ list of \code{passes} and then run the script to test your compiler.
 \begin{lstlisting}
 (list "assign homes" assign-homes interp-x86-0)
 \end{lstlisting}
+\ocaml{XXXX}
 \end{exercise}
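One possible way to compute stack locations and the frame size is sketched below. It assumes that every variable occupies 8 bytes, that homes are negative offsets from \code{rbp}, and that the frame size is rounded up to a multiple of 16 bytes, in line with the usual x86-64 stack alignment convention. The function name and its result type are hypothetical.

```ocaml
(* Maps each variable to an offset from rbp and returns the frame size.
   Assumes 8-byte slots and a frame size rounded up to a multiple of 16. *)
let assign_stack_homes (vars : string list) : (string * int) list * int =
  (* The i-th variable (counting from 0) lives at offset -8 * (i + 1). *)
  let homes = List.mapi (fun i x -> (x, -8 * (i + 1))) vars in
  let raw_size = 8 * List.length vars in
  let frame_size = (raw_size + 15) / 16 * 16 in
  (homes, frame_size)
```

The result is a tuple, so a caller can bind both parts at once, for example with \code{let (homes, size) = assign\_stack\_homes vars}.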
 
 
@@ -3066,7 +3363,10 @@ list of \code{passes} and then run the script to test your compiler.
 The \code{patch-instructions} pass compiles from \LangXVar{} to
 \LangXInt{} by making sure that each instruction adheres to the
 restriction that at most one argument of an instruction may be a
-memory reference.
+memory reference. \ocaml{It also ensures that no immediate operand
+  to an ordinary instruction exceeds 32 bits, by introducing \code{movabsq}
+  instructions as needed. \code{movabsq} is the sole instruction that
+  allows a 64-bit immediate source operand; its destination must be a register.}
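The two restrictions can be illustrated with the following sketch, which patches a single instruction using \code{rax} as the scratch register. The instruction and argument types are assumptions for this example and are much smaller than the real \LangXInt{} definitions; the provided \code{PatchInstructions} submodule may be organized differently. The sketch also ignores the corner case in which the destination is itself \code{rax}.

```ocaml
(* Hypothetical miniature x86 AST, for illustration only. *)
type arg =
  | Imm of int64
  | Reg of string
  | Deref of string * int            (* register and byte offset *)

type instr =
  | Movq of arg * arg
  | Addq of arg * arg
  | Movabsq of int64 * string        (* 64-bit immediate, register destination *)

(* True when n is representable as a signed 32-bit immediate. *)
let fits_in_32_bits (n : int64) : bool =
  Int64.compare n (-2147483648L) >= 0 && Int64.compare n 2147483647L <= 0

let rax = Reg "rax"

(* Rewrites one instruction into an equivalent legal sequence. *)
let patch_instr (i : instr) : instr list =
  match i with
  (* A large immediate moved into a register needs only movabsq. *)
  | Movq (Imm n, Reg r) when not (fits_in_32_bits n) -> [ Movabsq (n, r) ]
  (* Otherwise load the large immediate into rax first. *)
  | Movq (Imm n, dst) when not (fits_in_32_bits n) ->
    [ Movabsq (n, "rax"); Movq (rax, dst) ]
  | Addq (Imm n, dst) when not (fits_in_32_bits n) ->
    [ Movabsq (n, "rax"); Addq (rax, dst) ]
  (* At most one argument may be a memory reference. *)
  | Movq ((Deref _ as src), (Deref _ as dst)) ->
    [ Movq (src, rax); Movq (rax, dst) ]
  | Addq ((Deref _ as src), (Deref _ as dst)) ->
    [ Movq (src, rax); Addq (rax, dst) ]
  | _ -> [ i ]
```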
 
 We return to the following example.
 % var_test_20.rkt
@@ -3098,7 +3398,9 @@ from \key{rax} to the destination location, as follows.
 
 \begin{exercise}
 \normalfont Implement the \key{patch-instructions} pass in
-\code{compiler.rkt}. Create three new example programs that are
+\code{compiler.rkt}. \ocaml{This task has been done for you, in the \code{PatchInstructions} submodule
+of \code{Chapter2}.}
+Create three new example programs that are
 designed to exercise all of the interesting cases in this pass.
 %
 In the \code{run-tests.rkt} script, add the following entry to the
@@ -3116,7 +3418,8 @@ The last step of the compiler from \LangVar{} to x86 is to convert the
 \LangXInt{} AST (defined in Figure~\ref{fig:x86-int-ast}) to the
 string representation (defined in
 Figure~\ref{fig:x86-int-concrete}). The Racket \key{format} and
-\key{string-append} functions are useful in this regard. The main work
+\key{string-append} functions are useful in this regard. \ocaml{The \code{Printf}
+  library is useful here.} The main work
 that this step needs to perform is to create the \key{main} function
 and the standard instructions for its prelude and conclusion, as shown
 in Figure~\ref{fig:p1-x86} of Section~\ref{sec:x86}. You will need to
@@ -3128,10 +3431,14 @@ When running on Mac OS X, your compiler should prefix an underscore to
 labels like \key{main}. The Racket call \code{(system-type 'os)} is
 useful for determining which operating system the compiler is running
 on. It returns \code{'macosx}, \code{'unix}, or \code{'windows}.
+\ocaml{There is a similar utility function \code{get\_ostype}
+provided in the \texttt{utils.ml} module.}
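As a small illustration of how \code{Printf} can be combined with the operating-system check, the sketch below formats the header of the \key{main} function. The \code{on\_macos} flag is an assumption that stands in for whatever \code{get\_ostype} reports, since its exact return type is not shown here; the function names are likewise hypothetical, and the provided printing code in \code{X86Int} may differ.

```ocaml
(* Prefixes an underscore to a label when targeting Mac OS X.
   The on_macos flag is a stand-in for the result of an OS check. *)
let label ~on_macos (name : string) : string =
  if on_macos then "_" ^ name else name

(* Formats the directive and label that introduce the main function. *)
let main_header ~on_macos : string =
  let l = label ~on_macos "main" in
  Printf.sprintf "\t.globl %s\n%s:\n" l l
```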
 
 \begin{exercise}\normalfont
 %
 Implement the \key{print-x86} pass in \code{compiler.rkt}.
+\ocaml{This task has been done for you; the relevant printing
+  code is in module \code{X86Int}.}
 %
 In the \code{run-tests.rkt} script, add the following entry to the
 list of \code{passes} and then run the script to test your compiler.
@@ -3144,6 +3451,8 @@ Uncomment the call to the \key{compiler-tests} function
 compiler by executing the generated x86 code. Compile the provided
 \key{runtime.c} file to \key{runtime.o} using \key{gcc}. Run the
 script to test your compiler.
+\ocaml{XXXXX}
+
 \end{exercise}