Andrew Tolmach, 4 years ago
parent
commit
6b694bb6a6
1 changed file with 146 additions and 12 deletions
  1. 146 12
      book.tex


@@ -5493,7 +5493,7 @@ in Figure~\ref{fig:Rif-syntax}. The \LangIf{} language includes all of
 \code{\#f}, and the conditional \code{if} expression. We expand the
 \code{\#f}, and the conditional \code{if} expression. We expand the
 operators to include
 operators to include
 \begin{enumerate}
 \begin{enumerate}
-\item subtraction on integers,
+\item subtraction on integers \ocaml{the OCaml version already had this},
 \item the logical operators \key{and}, \key{or} and \key{not},
 \item the logical operators \key{and}, \key{or} and \key{not},
 \item the \key{eq?} operation for comparing two integers or two Booleans, and
 \item the \key{eq?} operation for comparing two integers or two Booleans, and
 \item the \key{<}, \key{<=}, \key{>}, and \key{>=} operations for
 \item the \key{<}, \key{<=}, \key{>}, and \key{>=} operations for
@@ -5525,7 +5525,26 @@ Section~\ref{sec:type-check-Rif}.
 \]
 \]
 \end{minipage}
 \end{minipage}
 }
 }
-\caption{The concrete syntax of \LangIf{}, extending \LangVar{}
+\begin{ocamlx}
+\fbox{
+\begin{minipage}{0.96\textwidth}
+\[
+\begin{array}{rcl}
+  \itm{bool} &::=& \key{\#t} \mid \key{\#f} \\  
+  \itm{cmp} &::= & \key{=} \mid \key{<} \mid \key{<=} \mid \key{>} \mid \key{>=} \\
+  \Exp &::=& \gray{ \Int \mid \CREAD{} \mid \CNEG{\Exp} \mid \CADD{\Exp}{\Exp}   \mid \CSUB{\Exp}{\Exp}} \\
+     &\mid&  \gray{ \Var \mid \code{(let $\Var$ $\Exp$ $\Exp$)}}\\
+     &\mid& \itm{bool}
+      \mid (\key{and}\;\Exp\;\Exp) \mid (\key{or}\;\Exp\;\Exp)
+      \mid (\key{not}\;\Exp) \\
+      &\mid& (\itm{cmp}\;\Exp\;\Exp) \mid \CIF{\Exp}{\Exp}{\Exp} \\
+  \LangIf{} &::=& \Exp
+\end{array}
+\]
+\end{minipage}
+}
+\end{ocamlx}
+\caption{The concrete syntax of \LangIf{} \ocaml{for OCaml version}, extending \LangVar{} 
   (Figure~\ref{fig:r1-concrete-syntax}) with Booleans and conditionals.}
   (Figure~\ref{fig:r1-concrete-syntax}) with Booleans and conditionals.}
 \label{fig:Rif-concrete-syntax}
 \label{fig:Rif-concrete-syntax}
 \end{figure}
 \end{figure}
@@ -5548,13 +5567,29 @@ Section~\ref{sec:type-check-Rif}.
 \]
 \]
 \end{minipage}
 \end{minipage}
 }
 }
+\begin{minipage}{0.96\textwidth}
+\begin{lstlisting}[style=ocaml,frame=single]
+type cmp = Eq | Lt | Le | Gt | Ge 
+type primop =  Read | Neg | Add | Sub | And | Or | Not | Cmp of cmp
+type var = string
+type exp = 
+   Int of int64  
+ | Bool of bool
+ | Prim of primop * exp list
+ | Var of var
+ | Let of var * exp * exp
+ | If of exp * exp * exp 
+type 'info program = Program of 'info * exp
+\end{lstlisting}
+\end{minipage}
 \caption{The abstract syntax of \LangIf{}.}
 \caption{The abstract syntax of \LangIf{}.}
 \label{fig:Rif-syntax}
 \label{fig:Rif-syntax}
 \end{figure}
 \end{figure}
 
 
 Figure~\ref{fig:interp-Rif} defines the interpreter for \LangIf{},
 Figure~\ref{fig:interp-Rif} defines the interpreter for \LangIf{},
 which inherits from the interpreter for \LangVar{}
 which inherits from the interpreter for \LangVar{}
-(Figure~\ref{fig:interp-Rvar}). The literals \code{\#t} and \code{\#f}
+(Figure~\ref{fig:interp-Rvar}). \ocaml{The OCaml interpreter
+  can be found in \code{RIf.ml}.} The literals \code{\#t} and \code{\#f}
 evaluate to the corresponding Boolean values. The conditional
 evaluate to the corresponding Boolean values. The conditional
 expression $(\key{if}\, \itm{cnd}\,\itm{thn}\,\itm{els})$ evaluates
 expression $(\key{if}\, \itm{cnd}\,\itm{thn}\,\itm{els})$ evaluates
 \itm{cnd} and then either evaluates \itm{thn} or \itm{els} depending
 \itm{cnd} and then either evaluates \itm{thn} or \itm{els} depending
@@ -5562,7 +5597,13 @@ on whether \itm{cnd} produced \code{\#t} or \code{\#f}. The logical
 operations \code{not} and \code{and} behave as you might expect, but
 operations \code{not} and \code{and} behave as you might expect, but
 note that the \code{and} operation is short-circuiting. That is, given
 note that the \code{and} operation is short-circuiting. That is, given
 the expression $(\key{and}\,e_1\,e_2)$, the expression $e_2$ is not
 the expression $(\key{and}\,e_1\,e_2)$, the expression $e_2$ is not
-evaluated if $e_1$ evaluates to \code{\#f}.
+evaluated if $e_1$ evaluates to \code{\#f}. \ocaml{Note also that
+  the \code{or} operation is \emph{not} short-circuiting; that is,
+  both operands are always evaluated. Having \code{and} and
+  \code{or} behave differently with respect to short-circuiting
+  would be bizarre in a production language, but here it gives
+  us an opportunity to compare the implementation of the two
+  styles of operators.}
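
The difference between the two styles can be sketched in a few lines of OCaml. This is only an illustration under stated assumptions, not the interpreter in \code{RIf.ml}: the \code{value} type, the \code{apply} helper, and the representation of the input stream as an \code{int64 list ref} are all hypothetical. The point is that \code{And} needs its own case that looks at the first operand before touching the second, whereas \code{Or} can go through the generic path that evaluates every operand first.

```ocaml
(* Abstract syntax as in the figure for the OCaml version. *)
type cmp = Eq | Lt | Le | Gt | Ge
type primop = Read | Neg | Add | Sub | And | Or | Not | Cmp of cmp
type var = string
type exp =
    Int of int64
  | Bool of bool
  | Prim of primop * exp list
  | Var of var
  | Let of var * exp * exp
  | If of exp * exp * exp

(* Hypothetical value type; the real RIf.ml may differ. *)
type value = IntV of int64 | BoolV of bool

(* Strict primitives: all operands have already been evaluated. *)
let apply (inputs : int64 list ref) (op : primop) (vals : value list) : value =
  match op, vals with
  | Read, [] ->
      (match !inputs with
       | n :: rest -> inputs := rest; IntV n
       | [] -> failwith "read: no more input")
  | Neg, [IntV a] -> IntV (Int64.neg a)
  | Add, [IntV a; IntV b] -> IntV (Int64.add a b)
  | Sub, [IntV a; IntV b] -> IntV (Int64.sub a b)
  | Not, [BoolV b] -> BoolV (not b)
  | Or, [BoolV a; BoolV b] -> BoolV (a || b)   (* both operands already evaluated *)
  | Cmp Eq, [a; b] -> BoolV (a = b)
  | Cmp c, [IntV a; IntV b] ->
      let r = Int64.compare a b in
      BoolV (match c with
             | Eq -> r = 0 | Lt -> r < 0 | Le -> r <= 0
             | Gt -> r > 0 | Ge -> r >= 0)
  | _ -> failwith "ill-typed primitive application"

let rec interp (inputs : int64 list ref) (env : (var * value) list) (e : exp) : value =
  match e with
  | Int n -> IntV n
  | Bool b -> BoolV b
  | Var x -> List.assoc x env
  | Let (x, e1, body) ->
      let v = interp inputs env e1 in
      interp inputs ((x, v) :: env) body
  | If (c, t, f) ->
      (match interp inputs env c with
       | BoolV true -> interp inputs env t
       | BoolV false -> interp inputs env f
       | IntV _ -> failwith "if: expected a Boolean")
  | Prim (And, [e1; e2]) ->
      (* Short-circuiting: e2 is evaluated only when e1 is true. *)
      (match interp inputs env e1 with
       | BoolV false -> BoolV false
       | BoolV true -> interp inputs env e2
       | IntV _ -> failwith "and: expected a Boolean")
  | Prim (op, args) ->
      (* Strict: evaluate every operand, left to right, then apply. *)
      let vals =
        List.rev (List.fold_left (fun acc a -> interp inputs env a :: acc) [] args)
      in
      apply inputs op vals
```

With an operand that performs a \code{read}, the two behaviors become observable: \code{(and \#f (eq? (read) 0))} consumes no input, while \code{(or \#t (eq? (read) 0))} consumes one number even though the result is already determined.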
 
 
 With the increase in the number of primitive operations, the
 With the increase in the number of primitive operations, the
 interpreter would become repetitive without some care.  We refactor
 interpreter would become repetitive without some care.  We refactor
@@ -5678,7 +5719,7 @@ class, shown in Figure~\ref{fig:type-check-Rvar}. The type checker for
 \LangIf{} is shown in Figure~\ref{fig:type-check-Rif} and it inherits
 \LangIf{} is shown in Figure~\ref{fig:type-check-Rif} and it inherits
 from the type checker for \LangVar{}. These type checkers are in the
 from the type checker for \LangVar{}. These type checkers are in the
 files \code{type-check-Rvar.rkt} and \code{type-check-Rif.rkt} of the
 files \code{type-check-Rvar.rkt} and \code{type-check-Rif.rkt} of the
-support code.
+support code. \ocaml{A single unified checker is in \code{RIf.ml}.}
 %
 %
 Each type checker is a structurally recursive function over the AST.
 Each type checker is a structurally recursive function over the AST.
 Given an input expression \code{e}, the type checker either signals an
 Given an input expression \code{e}, the type checker either signals an
@@ -5876,6 +5917,30 @@ to x86.
 \]
 \]
 \end{minipage}
 \end{minipage}
 }
 }
+\begin{minipage}{0.96\textwidth}
+
+\begin{lstlisting}[style=ocaml,frame=single]
+type cmp = Eq | Lt 
+type primop =  Read | Neg | Add | Not | Cmp of cmp
+type var = string
+type label = string
+type atm =
+    Int of int64
+  | Bool of bool
+  | Var of var 
+type exp =
+    Atom of atm 
+  | Prim of primop * atm list
+type stmt =
+    Assign of var * exp
+type tail =
+    Return of exp
+  | Seq of stmt*tail
+  | Goto of label
+  | IfStmt of cmp * atm * atm * label * label 
+type 'info program = Program of 'info * (label*tail) list
+\end{lstlisting}
+\end{minipage}
 \caption{The abstract syntax of \LangCIf{}, an extension of \LangCVar{}
 \caption{The abstract syntax of \LangCIf{}, an extension of \LangCVar{}
   (Figure~\ref{fig:c0-syntax}).}
   (Figure~\ref{fig:c0-syntax}).}
 \label{fig:c1-syntax}
 \label{fig:c1-syntax}
@@ -5889,7 +5954,8 @@ operations, and the \key{if} expression, we need to delve further into
 the x86 language. Figures~\ref{fig:x86-1-concrete} and \ref{fig:x86-1}
 the x86 language. Figures~\ref{fig:x86-1-concrete} and \ref{fig:x86-1}
 define the concrete and abstract syntax for the \LangXIf{} subset
 define the concrete and abstract syntax for the \LangXIf{} subset
 of x86, which includes instructions for logical operations,
 of x86, which includes instructions for logical operations,
-comparisons, and conditional jumps.
+comparisons, and conditional jumps. \ocaml{The OCaml concrete
+  syntax is in \code{X86If.ml}.}
 
 
 One challenge is that x86 does not provide an instruction that
 One challenge is that x86 does not provide an instruction that
 directly implements logical negation (\code{not} in \LangIf{} and
 directly implements logical negation (\code{not} in \LangIf{} and
@@ -5932,6 +5998,7 @@ the first argument:
 \Instr &::=& \gray{ \key{addq} \; \Arg\key{,} \Arg \mid
 \Instr &::=& \gray{ \key{addq} \; \Arg\key{,} \Arg \mid
       \key{subq} \; \Arg\key{,} \Arg \mid
       \key{subq} \; \Arg\key{,} \Arg \mid
       \key{negq} \; \Arg \mid \key{movq} \; \Arg\key{,} \Arg \mid } \\
       \key{negq} \; \Arg \mid \key{movq} \; \Arg\key{,} \Arg \mid } \\
+  && \ocaml{\key{movabsq} \; \Arg\key{,} \Arg \mid} \\
   &&  \gray{ \key{callq} \; \itm{label} \mid
   &&  \gray{ \key{callq} \; \itm{label} \mid
       \key{pushq}\;\Arg \mid \key{popq}\;\Arg \mid \key{retq} \mid \key{jmp}\,\itm{label} } \\
       \key{pushq}\;\Arg \mid \key{popq}\;\Arg \mid \key{retq} \mid \key{jmp}\,\itm{label} } \\
   && \gray{ \itm{label}\key{:}\; \Instr }
   && \gray{ \itm{label}\key{:}\; \Instr }
@@ -5951,6 +6018,8 @@ the first argument:
 \label{fig:x86-1-concrete}
 \label{fig:x86-1-concrete}
 \end{figure}
 \end{figure}
 
 
+
+
 \begin{figure}[tp]
 \begin{figure}[tp]
 \fbox{
 \fbox{
 \begin{minipage}{0.98\textwidth}
 \begin{minipage}{0.98\textwidth}
@@ -6019,6 +6088,17 @@ the conditional jump instruction relies on the EFLAGS register, it is
 common for it to be immediately preceded by a \key{cmpq} instruction
 common for it to be immediately preceded by a \key{cmpq} instruction
 to set the EFLAGS register.
 to set the EFLAGS register.
 
 
+\begin{ocamlx}
+  The EFLAGS register is affected not just by \code{cmpq}, but by almost
+  all the arithmetic and logical instructions.  Clever coders can sometimes
+  figure out how to combine a test with an otherwise useful operation.  But we
+  will always rely on \code{cmpq} to set EFLAGS.  Moreover, we will always
+  place the \code{cmpq} immediately before the 
+  \code{set} or $\key{j}\itm{cc}$ instruction that relies on EFLAGS.
+  The interpreter provided for {\tt X86If} code assumes this, and
+  will fail if it tries to execute an isolated instance of one
+  of these instructions.
+\end{ocamlx}
 
 
 \section{Shrink the \LangIf{} Language}
 \section{Shrink the \LangIf{} Language}
 \label{sec:shrink-Rif}
 \label{sec:shrink-Rif}
@@ -6036,11 +6116,16 @@ and logical negation.
 \LP\key{let}~\LP\LS\key{tmp.1}~e_1\RS\RP~\LP\key{not}\;\LP\key{<}\;e_2\;\key{tmp.1})\RP\RP
 \LP\key{let}~\LP\LS\key{tmp.1}~e_1\RS\RP~\LP\key{not}\;\LP\key{<}\;e_2\;\key{tmp.1})\RP\RP
 \]
 \]
 The \key{let} is needed in the above translation to ensure that
 The \key{let} is needed in the above translation to ensure that
-expression $e_1$ is evaluated before $e_2$.
+expression $e_1$ is evaluated before $e_2$. \ocaml{However, such a \code{let}
+  should be inserted only if $e_1$ is not already a variable or integer.}
 
 
 By performing these translations in the front-end of the compiler, the
 By performing these translations in the front-end of the compiler, the
 later passes of the compiler do not need to deal with these operators,
 later passes of the compiler do not need to deal with these operators,
-making the passes shorter.
+making the passes shorter. \ocaml{On the other hand, unlike the
+  syntactic desugaring we performed in the parser in an earlier chapter,
+  we wait to perform this shrinking pass until after typechecking; that way,
+  any type error messages will be in terms of the original program.
+}
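
As a rough illustration of these translations, here is a sketch of a shrinking function over the OCaml abstract syntax. It is not the \code{Shrink} submodule itself. The \code{fresh} name generator and the \code{with\_first} helper are hypothetical, and the encoding chosen for the non-short-circuiting \code{or} is just one possibility that keeps both operands evaluated in order. The \code{let} is introduced only when the first operand is not already a variable or integer.

```ocaml
type cmp = Eq | Lt | Le | Gt | Ge
type primop = Read | Neg | Add | Sub | And | Or | Not | Cmp of cmp
type var = string
type exp =
    Int of int64
  | Bool of bool
  | Prim of primop * exp list
  | Var of var
  | Let of var * exp * exp
  | If of exp * exp * exp

(* Hypothetical fresh-name generator; the support code may provide its own. *)
let counter = ref 0
let fresh () = incr counter; Printf.sprintf "tmp.%d" !counter

(* Make sure e1 is evaluated first: bind it to a temporary unless it is
   already a variable or integer, then build the rest from its name. *)
let with_first (e1 : exp) (k : exp -> exp) : exp =
  match e1 with
  | Var _ | Int _ -> k e1
  | _ -> let t = fresh () in Let (t, e1, k (Var t))

let rec shrink (e : exp) : exp =
  match e with
  | Int _ | Bool _ | Var _ -> e
  | Let (x, e1, body) -> Let (x, shrink e1, shrink body)
  | If (c, t, f) -> If (shrink c, shrink t, shrink f)
  | Prim (Sub, [e1; e2]) ->
      (* e1 - e2  ==>  e1 + (- e2) *)
      Prim (Add, [shrink e1; Prim (Neg, [shrink e2])])
  | Prim (And, [e1; e2]) ->
      (* short-circuiting: e2 runs only if e1 is true *)
      If (shrink e1, shrink e2, Bool false)
  | Prim (Or, [e1; e2]) ->
      (* ASSUMPTION: one possible strict encoding; e1 then e2 always run *)
      let e2' = shrink e2 in
      with_first (shrink e1) (fun v1 -> If (e2', Bool true, v1))
  | Prim (Cmp Le, [e1; e2]) ->
      (* e1 <= e2  ==>  not (e2 < e1), with e1 evaluated first *)
      let e2' = shrink e2 in
      with_first (shrink e1) (fun v1 -> Prim (Not, [Prim (Cmp Lt, [e2'; v1])]))
  | Prim (Cmp Gt, [e1; e2]) ->
      (* e1 > e2  ==>  e2 < e1, with e1 evaluated first *)
      let e2' = shrink e2 in
      with_first (shrink e1) (fun v1 -> Prim (Cmp Lt, [e2'; v1]))
  | Prim (Cmp Ge, [e1; e2]) ->
      (* e1 >= e2  ==>  not (e1 < e2); operand order is unchanged *)
      Prim (Not, [Prim (Cmp Lt, [shrink e1; shrink e2])])
  | Prim (op, args) -> Prim (op, List.map shrink args)
```

For example, applying \code{shrink} to \code{(<= (read) (read))} yields a \code{let} that binds the first \code{read} to a temporary, whereas \code{(> x 3)} becomes \code{(< 3 x)} with no \code{let} at all.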
 
 
 %% On the other hand, sometimes
 %% On the other hand, sometimes
 %% these translations make it more difficult to generate the most
 %% these translations make it more difficult to generate the most
@@ -6054,8 +6139,14 @@ Implement the pass \code{shrink} to remove subtraction, \key{and},
 \key{or}, \key{<=}, \key{>}, and \key{>=} from the language by
 \key{or}, \key{<=}, \key{>}, and \key{>=} from the language by
 translating them to other constructs in \LangIf{}.
 translating them to other constructs in \LangIf{}.
 %
 %
+\ocaml{Put your solution in the \code{Shrink} submodule of {\tt Chapter4.ml}.}
+%
 Create six test programs that involve these operators.
 Create six test programs that involve these operators.
 %
 %
+\ocaml{Make sure to include tests that confirm you have not altered
+  the order of evaluation of sub-expressions of these operators.
+  (Hint: use \code{read}s.)}
+%
 In the \code{run-tests.rkt} script, add the following entry for
 In the \code{run-tests.rkt} script, add the following entry for
 \code{shrink} to the list of passes (it should be the only pass at
 \code{shrink} to the list of passes (it should be the only pass at
 this point).
 this point).
@@ -6066,6 +6157,10 @@ This instructs \code{interp-tests} to run the interpreter
 \code{interp-Rif} and the type checker \code{type-check-Rif} on the
 \code{interp-Rif} and the type checker \code{type-check-Rif} on the
 output of \code{shrink}.
 output of \code{shrink}.
 %
 %
+\ocaml{You should consider writing an additional checking pass that
+  makes sure all the forbidden operators have really been removed,
+  in addition to invoking the standard \code{RIf} checker.}
+%
 Run the script to test your compiler on all the test programs.
 Run the script to test your compiler on all the test programs.
 
 
 \end{exercise}
 \end{exercise}
@@ -6077,6 +6172,8 @@ Add cases to \code{uniquify-exp} to handle Boolean constants and
 \code{if} expressions.
 \code{if} expressions.
 
 
 \begin{exercise}\normalfont
 \begin{exercise}\normalfont
+  \ocaml{This exercise has been done for you, in submodule \code{Uniquify}
+    of \code{Chapter4.ml}.}
 Update the \code{uniquify-exp} for \LangIf{} and add the following
 Update the \code{uniquify-exp} for \LangIf{} and add the following
 entry to the list of \code{passes} in the \code{run-tests.rkt} script.
 entry to the list of \code{passes} in the \code{run-tests.rkt} script.
 \begin{lstlisting}
 \begin{lstlisting}
@@ -6126,6 +6223,8 @@ R^{\dagger}_2  &::=& \PROGRAM{\code{()}}{\Exp}
 
 
 
 
 \begin{exercise}\normalfont
 \begin{exercise}\normalfont
+  \ocaml{This exercise has been done for you, in submodule \code{RemoveComplexOperands}
+    of \code{Chapter4.ml}.}
 %
 %
 Add cases for Boolean constants and \code{if} to the \code{rco-atom}
 Add cases for Boolean constants and \code{if} to the \code{rco-atom}
 and \code{rco-exp} functions in \code{compiler.rkt}.
 and \code{rco-exp} functions in \code{compiler.rkt}.
@@ -6223,6 +6322,11 @@ the following code.
 \end{center}
 \end{center}
 Unfortunately, this approach duplicates the two branches from the
 Unfortunately, this approach duplicates the two branches from the
 outer \code{if} and a compiler must never duplicate code!
 outer \code{if} and a compiler must never duplicate code!
+\ocaml{That may be a bit too strong. Sometimes duplicating
+  small amounts of code may actually produce a program that runs faster.
+  But it is fair to say that a compiler should never duplicate
+  an \emph{unbounded} amount of code, as might happen with
+  the transformation here.}
 
 
 We need a way to perform the above transformation but without
 We need a way to perform the above transformation but without
 duplicating code. That is, we need a way for different parts of a
 duplicating code. That is, we need a way for different parts of a
@@ -6308,7 +6412,7 @@ block39:
 \end{tabular} 
 \end{tabular} 
 
 
 \caption{Translation from \LangIf{} to \LangCIf{}
 \caption{Translation from \LangIf{} to \LangCIf{}
-  via the \code{explicate-control}.}
+  via the \code{explicate-control}. \ocaml{Note that the RCO pass does \emph{not} pull out the conditions from the \code{if} expressions.}}
 \label{fig:explicate-control-s1-38}
 \label{fig:explicate-control-s1-38}
 \end{figure}
 \end{figure}
 
 
@@ -6385,6 +6489,18 @@ for Boolean constants, the blocks \code{thn} and \code{els} may not
 get used at all and we don't want to prematurely add them to the
 get used at all and we don't want to prematurely add them to the
 control-flow graph if they end up being discarded.
 control-flow graph if they end up being discarded.
 
 
+\ocaml{But this happens only rarely (when an \code{if}
+  tests a literal Boolean value). Moreover, it is easy to forestall
+  this by performing a partial-evaluation style pass
+  prior to \code{explicate-control}, or, alternatively, to
+  clean up any generated but unused blocks after the fact. So I suggest ignoring
+  the whole lazy evaluation story in the remainder of this section.
+  Instead, design \code{explicate\_pred} to take as arguments
+  two \emph{labels} representing where to transfer control when
+  the test expression is true or false.  It is the responsibility of
+  the \emph{caller} of \code{explicate\_pred} to construct appropriate
+  blocks and pass their labels. }
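
The label-passing design might look roughly like the following sketch. It is not the skeleton in \code{Chapter4.ml}. The module names \code{R} and \code{C}, the global \code{blocks} table, and the helpers \code{add\_block} and \code{atom} are assumptions made for illustration, and only some cases are covered (in particular, \code{let} in predicate position is left out because it needs \code{explicate\_assign}). Note how the nested \code{if} case plays the role of the caller: it builds a block for each branch and passes the resulting labels down, so the two outer targets are referenced by label and never copied.

```ocaml
module R = struct
  type cmp = Eq | Lt | Le | Gt | Ge
  type primop = Read | Neg | Add | Sub | And | Or | Not | Cmp of cmp
  type exp =
      Int of int64
    | Bool of bool
    | Prim of primop * exp list
    | Var of string
    | Let of string * exp * exp
    | If of exp * exp * exp
end

module C = struct
  type cmp = Eq | Lt
  type primop = Read | Neg | Add | Not | Cmp of cmp
  type label = string
  type atm = Int of int64 | Bool of bool | Var of string
  type exp = Atom of atm | Prim of primop * atm list
  type stmt = Assign of string * exp
  type tail =
      Return of exp
    | Seq of stmt * tail
    | Goto of label
    | IfStmt of cmp * atm * atm * label * label
end

(* Hypothetical block table and label generator. *)
let blocks : (C.label * C.tail) list ref = ref []
let next = ref 0
let add_block (t : C.tail) : C.label =
  incr next;
  let l = Printf.sprintf "block%d" !next in
  blocks := (l, t) :: !blocks;
  l

(* After remove-complex-operands, comparison operands are atoms. *)
let atom (e : R.exp) : C.atm =
  match e with
  | R.Int n -> C.Int n
  | R.Bool b -> C.Bool b
  | R.Var x -> C.Var x
  | _ -> failwith "expected an atom"

let rec explicate_pred (e : R.exp) (l_true : C.label) (l_false : C.label) : C.tail =
  match e with
  | R.Bool b -> C.Goto (if b then l_true else l_false)
  | R.Var x -> C.IfStmt (C.Eq, C.Var x, C.Bool true, l_true, l_false)
  | R.Prim (R.Not, [e1]) ->
      (* negation just swaps the two targets *)
      explicate_pred e1 l_false l_true
  | R.Prim (R.Cmp R.Eq, [a1; a2]) ->
      C.IfStmt (C.Eq, atom a1, atom a2, l_true, l_false)
  | R.Prim (R.Cmp R.Lt, [a1; a2]) ->
      C.IfStmt (C.Lt, atom a1, atom a2, l_true, l_false)
  | R.If (c, t, f) ->
      (* each branch becomes its own block that jumps to the outer targets *)
      let l_t = add_block (explicate_pred t l_true l_false) in
      let l_f = add_block (explicate_pred f l_true l_false) in
      explicate_pred c l_t l_f
  | _ -> failwith "explicate_pred: case not covered in this sketch"
```

A literal Boolean test turns into a plain \code{Goto}, so the block for the branch not taken is simply never referenced and can be cleaned up afterwards.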
+  
 The solution to this conundrum is to use \emph{lazy
 The solution to this conundrum is to use \emph{lazy
   evaluation}\index{lazy evaluation}\citep{Friedman:1976aa} to delay
   evaluation}\index{lazy evaluation}\citep{Friedman:1976aa} to delay
 adding the blocks to the control-flow graph until the points where we
 adding the blocks to the control-flow graph until the points where we
@@ -6542,6 +6658,10 @@ Boolean constants and \key{if} to the \code{explicate-tail} and
 \code{explicate-assign}. Implement the auxiliary function
 \code{explicate-assign}. Implement the auxiliary function
 \code{explicate-pred} for predicate contexts.
 \code{explicate-pred} for predicate contexts.
 %
 %
+\ocaml{Put your code in the \code{ExplicateControl} submodule of
+  \code{Chapter4.ml}. It is recommended that you base your code
+  on the skeleton already in that file.}
+%
 Create test cases that exercise all of the new cases in the code for
 Create test cases that exercise all of the new cases in the code for
 this pass.
 this pass.
 %
 %
@@ -6649,6 +6769,9 @@ jmp |$\ell_2$|
 Expand your \code{select-instructions} pass to handle the new features
 Expand your \code{select-instructions} pass to handle the new features
 of the \LangIf{} language.
 of the \LangIf{} language.
 %
 %
+\ocaml{Place your solution in the \code{SelectInstructions} submodule of
+  \code{Chapter4.ml}.}
+%
 Add the following entry to the list of \code{passes} in
 Add the following entry to the list of \code{passes} in
 \code{run-tests.rkt}
 \code{run-tests.rkt}
 \begin{lstlisting}
 \begin{lstlisting}
@@ -6693,6 +6816,9 @@ before computing a topological order.
 Use the \code{tsort} and \code{transpose} functions of the Racket
 Use the \code{tsort} and \code{transpose} functions of the Racket
 \code{graph} package to accomplish this.
 \code{graph} package to accomplish this.
 %
 %
+\ocaml{Use the \code{topsort} and \code{transpose} functions of the
+  provided \code{Digraph} functor.}
+%
 As an aside, a topological ordering is only guaranteed to exist if the
 As an aside, a topological ordering is only guaranteed to exist if the
 graph does not contain any cycles. That is indeed the case for the
 graph does not contain any cycles. That is indeed the case for the
 control-flow graphs that we generate from \LangIf{} programs.
 control-flow graphs that we generate from \LangIf{} programs.
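
To make the ordering concrete, here is a small self-contained sketch that does not use the provided \code{Digraph} functor, whose interface is not shown here. The adjacency-list type \code{cfg} and both functions are stand-ins written for illustration, and the sort assumes the graph is acyclic and that every label has an entry in the list. Transposing the control-flow graph and then sorting it topologically puts each block after all of its successors, which is the order in which liveness information is available.

```ocaml
(* A control-flow graph as an adjacency list: label -> successor labels.
   ASSUMPTION: every label appears as a key, possibly with no successors. *)
type cfg = (string * string list) list

(* Reverse every edge: v's new successors are its old predecessors. *)
let transpose (g : cfg) : cfg =
  List.map
    (fun (v, _) ->
       (v,
        List.filter_map
          (fun (u, succs) -> if List.mem v succs then Some u else None)
          g))
    g

(* Depth-first topological sort; only meaningful on acyclic graphs. *)
let topsort (g : cfg) : string list =
  let visited = Hashtbl.create 16 in
  let order = ref [] in
  let rec visit v =
    if not (Hashtbl.mem visited v) then begin
      Hashtbl.add visited v ();
      List.iter visit (try List.assoc v g with Not_found -> []);
      (* v is placed in front of everything reachable from it *)
      order := v :: !order
    end
  in
  List.iter (fun (v, _) -> visit v) g;
  !order

(* A diamond-shaped graph such as an if expression produces. *)
let example : cfg =
  [ ("start", ["b1"; "b2"]); ("b1", ["end"]); ("b2", ["end"]); ("end", []) ]

let liveness_order = topsort (transpose example)
```

On the diamond-shaped example, \code{liveness\_order} begins with \code{end} and finishes with \code{start}, so the live-before set of each jump target is known by the time the jump is analyzed.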
@@ -6705,7 +6831,9 @@ control-flow graph. Do not use the \code{directed-graph} of the
 each pair of vertices, but a control-flow graph may have multiple
 each pair of vertices, but a control-flow graph may have multiple
 edges between a pair of vertices. The \code{multigraph.rkt} file in
 edges between a pair of vertices. The \code{multigraph.rkt} file in
 the support code implements a graph representation that allows
 the support code implements a graph representation that allows
-multiple edges between a pair of vertices.
+multiple edges between a pair of vertices. \ocaml{There is no
+  need for a multigraph for our purposes in this chapter. Just
+use the plain directed graphs in \code{digraph.ml}.}
 
 
 The next question is how to analyze jump instructions.  Recall that in
 The next question is how to analyze jump instructions.  Recall that in
 Section~\ref{sec:liveness-analysis-Rvar} we maintain an alist named
 Section~\ref{sec:liveness-analysis-Rvar} we maintain an alist named
@@ -6738,7 +6866,9 @@ new kinds of arguments and instructions in \LangXIfVar{}.
 \begin{exercise}\normalfont
 \begin{exercise}\normalfont
 Update the \code{uncover-live} pass and implement the
 Update the \code{uncover-live} pass and implement the
 \code{uncover-live-CFG} auxiliary function to apply liveness analysis
 \code{uncover-live-CFG} auxiliary function to apply liveness analysis
-to the control-flow graph.  Add the following entry to the list of
+to the control-flow graph.
+\ocaml{This is in the \code{UncoverLive} submodule of \code{Chapter4.ml}.}
+Add the following entry to the list of
 \code{passes} in the \code{run-tests.rkt} script.
 \code{passes} in the \code{run-tests.rkt} script.
 \begin{lstlisting}
 \begin{lstlisting}
 (list "uncover-live" uncover-live interp-pseudo-x86-1)
 (list "uncover-live" uncover-live interp-pseudo-x86-1)
@@ -6761,6 +6891,8 @@ similar to the \key{movq} instruction. See rule number 1 in
 Section~\ref{sec:build-interference}.
 Section~\ref{sec:build-interference}.
 
 
 \begin{exercise}\normalfont
 \begin{exercise}\normalfont
+  \ocaml{This exercise has been done for you, in submodule \code{BuildInterference}
+    of \code{Chapter4.ml}.}
 Update the \code{build-interference} pass for \LangXIfVar{} and add the
 Update the \code{build-interference} pass for \LangXIfVar{} and add the
 following entries to the list of \code{passes} in the
 following entries to the list of \code{passes} in the
 \code{run-tests.rkt} script.
 \code{run-tests.rkt} script.
@@ -6786,6 +6918,8 @@ The second argument of the \key{movzbq} must be a register.
 There are no special restrictions on the jump instructions.
 There are no special restrictions on the jump instructions.
 
 
 \begin{exercise}\normalfont
 \begin{exercise}\normalfont
+  \ocaml{This exercise has been done for you, in submodule \code{PatchInstructions}
+    of \code{Chapter4.ml}.}
 %
 %
 Update \code{patch-instructions} pass for \LangXIfVar{}.
 Update \code{patch-instructions} pass for \LangXIfVar{}.
 %  
 %  
@@ -6912,7 +7046,7 @@ conclusion:
 \end{lstlisting}
 \end{lstlisting}
 \end{minipage}
 \end{minipage}
 \end{tabular}
 \end{tabular}
-\caption{Example compilation of an \key{if} expression to x86.}
+\caption{Example compilation of an \key{if} expression to x86. \ocaml{(For some reason, all the callee-save registers are being saved, even though they are not used.)}}
 \label{fig:if-example-x86}
 \label{fig:if-example-x86}
 \end{figure}
 \end{figure}