polishing in ch. 4

Andrew Tolmach 4 years ago
parent
commit
6b694bb6a6
1 file changed, with 146 additions and 12 deletions

+ 146 - 12
book.tex

@@ -5493,7 +5493,7 @@ in Figure~\ref{fig:Rif-syntax}. The \LangIf{} language includes all of
 \code{\#f}, and the conditional \code{if} expression. We expand the
 operators to include
 \begin{enumerate}
-\item subtraction on integers,
+\item subtraction on integers \ocaml{OCaml version already had this},
 \item the logical operators \key{and}, \key{or} and \key{not},
 \item the \key{eq?} operation for comparing two integers or two Booleans, and
 \item the \key{<}, \key{<=}, \key{>}, and \key{>=} operations for
@@ -5525,7 +5525,26 @@ Section~\ref{sec:type-check-Rif}.
 \]
 \end{minipage}
 }
-\caption{The concrete syntax of \LangIf{}, extending \LangVar{}
+\begin{ocamlx}
+\fbox{
+\begin{minipage}{0.96\textwidth}
+\[
+\begin{array}{rcl}
+  \itm{bool} &::=& \key{\#t} \mid \key{\#f} \\  
+  \itm{cmp} &::= & \key{=} \mid \key{<} \mid \key{<=} \mid \key{>} \mid \key{>=} \\
+  \Exp &::=& \gray{ \Int \mid \CREAD{} \mid \CNEG{\Exp} \mid \CADD{\Exp}{\Exp}   \mid \CSUB{\Exp}{\Exp}} \\
+     &\mid&  \gray{ \Var \mid \code{(let $\Var$ $\Exp$ $\Exp$)}}\\
+     &\mid& \itm{bool}
+      \mid (\key{and}\;\Exp\;\Exp) \mid (\key{or}\;\Exp\;\Exp)
+      \mid (\key{not}\;\Exp) \\
+      &\mid& (\itm{cmp}\;\Exp\;\Exp) \mid \CIF{\Exp}{\Exp}{\Exp} \\
+  \LangIf{} &::=& \Exp
+\end{array}
+\]
+\end{minipage}
+}
+\end{ocamlx}
+\caption{The concrete syntax of \LangIf{} \ocaml{for the OCaml version}, extending \LangVar{}
   (Figure~\ref{fig:r1-concrete-syntax}) with Booleans and conditionals.}
 \label{fig:Rif-concrete-syntax}
 \end{figure}
@@ -5548,13 +5567,29 @@ Section~\ref{sec:type-check-Rif}.
 \]
 \end{minipage}
 }
+\begin{minipage}{0.96\textwidth}
+\begin{lstlisting}[style=ocaml,frame=single]
+type cmp = Eq | Lt | Le | Gt | Ge 
+type primop =  Read | Neg | Add | Sub | And | Or | Not | Cmp of cmp
+type var = string
+type exp = 
+   Int of int64  
+ | Bool of bool
+ | Prim of primop * exp list
+ | Var of var
+ | Let of var * exp * exp
+ | If of exp * exp * exp 
+type 'info program = Program of 'info * exp
+\end{lstlisting}
+\end{minipage}
 \caption{The abstract syntax of \LangIf{}.}
 \label{fig:Rif-syntax}
 \end{figure}
 
 Figure~\ref{fig:interp-Rif} defines the interpreter for \LangIf{},
 which inherits from the interpreter for \LangVar{}
-(Figure~\ref{fig:interp-Rvar}). The literals \code{\#t} and \code{\#f}
+(Figure~\ref{fig:interp-Rvar}). \ocaml{The OCaml interpreter
+  can be found in \code{RIf.ml}.} The literals \code{\#t} and \code{\#f}
 evaluate to the corresponding Boolean values. The conditional
 expression $(\key{if}\, \itm{cnd}\,\itm{thn}\,\itm{els})$ evaluates
 \itm{cnd} and then either evaluates \itm{thn} or \itm{els} depending
@@ -5562,7 +5597,13 @@ on whether \itm{cnd} produced \code{\#t} or \code{\#f}. The logical
 operations \code{not} and \code{and} behave as you might expect, but
 note that the \code{and} operation is short-circuiting. That is, given
 the expression $(\key{and}\,e_1\,e_2)$, the expression $e_2$ is not
-evaluated if $e_1$ evaluates to \code{\#f}.
+evaluated if $e_1$ evaluates to \code{\#f}. \ocaml{Note also that
+  the \code{or} operation is \emph{not} short-circuiting; that is,
+  both operands are always evaluated. Having \code{and} and
+  \code{or} behave differently with respect to short-circuiting
+  would be bizarre in a production language, but here it gives
+  us an opportunity to compare the implementation of the two
+  styles of operators.}
 
 With the increase in the number of primitive operations, the
 interpreter would become repetitive without some care.  We refactor
@@ -5678,7 +5719,7 @@ class, shown in Figure~\ref{fig:type-check-Rvar}. The type checker for
 \LangIf{} is shown in Figure~\ref{fig:type-check-Rif} and it inherits
 from the type checker for \LangVar{}. These type checkers are in the
 files \code{type-check-Rvar.rkt} and \code{type-check-Rif.rkt} of the
-support code.
+support code. \ocaml{A single unified checker is in \code{RIf.ml}.}
 %
 Each type checker is a structurally recursive function over the AST.
 Given an input expression \code{e}, the type checker either signals an
@@ -5876,6 +5917,30 @@ to x86.
 \]
 \end{minipage}
 }
+\begin{minipage}{0.96\textwidth}
+
+\begin{lstlisting}[style=ocaml,frame=single]
+type cmp = Eq | Lt 
+type primop =  Read | Neg | Add | Not | Cmp of cmp
+type var = string
+type label = string
+type atm =
+    Int of int64
+  | Bool of bool
+  | Var of var 
+type exp =
+    Atom of atm 
+  | Prim of primop * atm list
+type stmt =
+    Assign of var * exp
+type tail =
+    Return of exp
+  | Seq of stmt*tail
+  | Goto of label
+  | IfStmt of cmp * atm * atm * label * label 
+type 'info program = Program of 'info * (label*tail) list
+\end{lstlisting}
+\end{minipage}
 \caption{The abstract syntax of \LangCIf{}, an extension of \LangCVar{}
   (Figure~\ref{fig:c0-syntax}).}
 \label{fig:c1-syntax}
@@ -5889,7 +5954,8 @@ operations, and the \key{if} expression, we need to delve further into
 the x86 language. Figures~\ref{fig:x86-1-concrete} and \ref{fig:x86-1}
 define the concrete and abstract syntax for the \LangXIf{} subset
 of x86, which includes instructions for logical operations,
-comparisons, and conditional jumps.
+comparisons, and conditional jumps. \ocaml{The OCaml concrete
+  syntax is in \code{X86If.ml}.}
 
 One challenge is that x86 does not provide an instruction that
 directly implements logical negation (\code{not} in \LangIf{} and
@@ -5932,6 +5998,7 @@ the first argument:
 \Instr &::=& \gray{ \key{addq} \; \Arg\key{,} \Arg \mid
       \key{subq} \; \Arg\key{,} \Arg \mid
       \key{negq} \; \Arg \mid \key{movq} \; \Arg\key{,} \Arg \mid } \\
+  && \ocaml{\key{movabsq} \; \Arg\key{,} \Arg \mid} \\
   &&  \gray{ \key{callq} \; \itm{label} \mid
       \key{pushq}\;\Arg \mid \key{popq}\;\Arg \mid \key{retq} \mid \key{jmp}\,\itm{label} } \\
   && \gray{ \itm{label}\key{:}\; \Instr }
@@ -5951,6 +6018,8 @@ the first argument:
 \label{fig:x86-1-concrete}
 \end{figure}
 
+
+
 \begin{figure}[tp]
 \fbox{
 \begin{minipage}{0.98\textwidth}
@@ -6019,6 +6088,17 @@ the conditional jump instruction relies on the EFLAGS register, it is
 common for it to be immediately preceded by a \key{cmpq} instruction
 to set the EFLAGS register.
 
+\begin{ocamlx}
+  The EFLAGS register is affected not just by \code{cmpq}, but by almost
+  all the arithmetic and logical instructions.  Clever coders can sometimes
+  figure out how to combine a test with an otherwise useful operation.  But we
+  will always rely on \code{cmpq} to set EFLAGS.  Moreover, we will always
+  place the \code{cmpq} immediately before the 
+  \code{set} or $\key{j}\itm{cc}$ instruction that relies on EFLAGS.
+  The interpreter provided for {\tt X86If} code assumes this, and
+  will fail if it tries to execute an isolated instance of one
+  of these instructions.
+\end{ocamlx}
 
 \section{Shrink the \LangIf{} Language}
 \label{sec:shrink-Rif}
@@ -6036,11 +6116,16 @@ and logical negation.
 \LP\key{let}~\LP\LS\key{tmp.1}~e_1\RS\RP~\LP\key{not}\;\LP\key{<}\;e_2\;\key{tmp.1})\RP\RP
 \]
 The \key{let} is needed in the above translation to ensure that
-expression $e_1$ is evaluated before $e_2$.
+expression $e_1$ is evaluated before $e_2$. \ocaml{However, such a \code{let}
+  should be inserted only if $e_1$ is not already a variable or integer.}
 
 By performing these translations in the front-end of the compiler, the
 later passes of the compiler do not need to deal with these operators,
-making the passes shorter.
+making the passes shorter. \ocaml{On the other hand, unlike the
+  syntactic desugaring we performed in the parser in an earlier chapter,
+  we wait to perform this shrinking pass until after typechecking; that way,
+  any type error messages will be in terms of the original program.
+}
 
 %% On the other hand, sometimes
 %% these translations make it more difficult to generate the most
@@ -6054,8 +6139,14 @@ Implement the pass \code{shrink} to remove subtraction, \key{and},
 \key{or}, \key{<=}, \key{>}, and \key{>=} from the language by
 translating them to other constructs in \LangIf{}.
 %
+\ocaml{Put your solution in the \code{Shrink} submodule of {\tt Chapter4.ml}.}
+%
 Create six test programs that involve these operators.
 %
+\ocaml{Make sure to include tests that confirm you have not altered
+  the order of evaluation of sub-expressions of these operators.
+  (Hint: use \code{read}s.)}
+%
 In the \code{run-tests.rkt} script, add the following entry for
 \code{shrink} to the list of passes (it should be the only pass at
 this point).
@@ -6066,6 +6157,10 @@ This instructs \code{interp-tests} to run the interpreter
 \code{interp-Rif} and the type checker \code{type-check-Rif} on the
 output of \code{shrink}.
 %
+\ocaml{You should consider writing an additional checking pass that
+  makes sure all the forbidden operators have really been removed,
+  in addition to invoking the standard \code{RIf} checker.}
+%
 Run the script to test your compiler on all the test programs.
 
 \end{exercise}
@@ -6077,6 +6172,8 @@ Add cases to \code{uniquify-exp} to handle Boolean constants and
 \code{if} expressions.
 
 \begin{exercise}\normalfont
+  \ocaml{This exercise has been done for you, in submodule \code{Uniquify}
+    of \code{Chapter4.ml}.}
 Update the \code{uniquify-exp} for \LangIf{} and add the following
 entry to the list of \code{passes} in the \code{run-tests.rkt} script.
 \begin{lstlisting}
@@ -6126,6 +6223,8 @@ R^{\dagger}_2  &::=& \PROGRAM{\code{()}}{\Exp}
 
 
 \begin{exercise}\normalfont
+  \ocaml{This exercise has been done for you, in submodule \code{RemoveComplexOperands}
+    of \code{Chapter4.ml}.}
 %
 Add cases for Boolean constants and \code{if} to the \code{rco-atom}
 and \code{rco-exp} functions in \code{compiler.rkt}.
@@ -6223,6 +6322,11 @@ the following code.
 \end{center}
 Unfortunately, this approach duplicates the two branches from the
 outer \code{if} and a compiler must never duplicate code!
+\ocaml{That may be a bit too strong. Sometimes duplicating
+  small amounts of code may actually produce a program that runs faster.
+  But it is fair to say that a compiler should never duplicate
+  an \emph{unbounded} amount of code, as might happen with
+  the transformation here.}
 
 We need a way to perform the above transformation but without
 duplicating code. That is, we need a way for different parts of a
@@ -6308,7 +6412,7 @@ block39:
 \end{tabular} 
 
 \caption{Translation from \LangIf{} to \LangCIf{}
-  via the \code{explicate-control}.}
+  via the \code{explicate-control}. \ocaml{Note that the RCO pass does \emph{not} pull out the conditions from the \code{if} expressions.}}
 \label{fig:explicate-control-s1-38}
 \end{figure}
 
@@ -6385,6 +6489,18 @@ for Boolean constants, the blocks \code{thn} and \code{els} may not
 get used at all and we don't want to prematurely add them to the
 control-flow graph if they end up being discarded.
 
+\ocaml{But this happens only rarely (when an \code{if}
+  tests a literal Boolean value). Moreover, it is easy to prevent
+  this from happening by performing a partial-evaluation style pass
+  prior to \code{explicate-control}, or, alternatively, to 
+  clean up any generated but unused blocks after the fact. So I suggest ignoring
+  the whole lazy evaluation story in the remainder of this section.
+  Instead, design \code{explicate\_pred} to take as arguments
+  two \emph{labels} representing where to transfer control when
+  the test expression is true or false.  It is the responsibility of
+  the \emph{caller} of \code{explicate\_pred} to construct appropriate
+  blocks and pass their labels. }
+  
 The solution to this conundrum is to use \emph{lazy
   evaluation}\index{lazy evaluation}\citep{Friedman:1976aa} to delay
 adding the blocks to the control-flow graph until the points where we
@@ -6542,6 +6658,10 @@ Boolean constants and \key{if} to the \code{explicate-tail} and
 \code{explicate-assign}. Implement the auxiliary function
 \code{explicate-pred} for predicate contexts.
 %
+\ocaml{Put your code in the \code{ExplicateControl} submodule of
+  \code{Chapter4.ml}. It is recommended that you base your code
+  on the skeleton already in that file.}
+%
 Create test cases that exercise all of the new cases in the code for
 this pass.
 %
@@ -6649,6 +6769,9 @@ jmp |$\ell_2$|
 Expand your \code{select-instructions} pass to handle the new features
 of the \LangIf{} language.
 %
+\ocaml{Place your solution in the \code{SelectInstructions} submodule of
+  \code{Chapter4.ml}.}
+%
 Add the following entry to the list of \code{passes} in
 \code{run-tests.rkt}
 \begin{lstlisting}
@@ -6693,6 +6816,9 @@ before computing a topological order.
 Use the \code{tsort} and \code{transpose} functions of the Racket
 \code{graph} package to accomplish this.
 %
+\ocaml{Use the \code{topsort} and \code{transpose} functions of the
+  provided \code{Digraph} functor.}
+%
 As an aside, a topological ordering is only guaranteed to exist if the
 graph does not contain any cycles. That is indeed the case for the
 control-flow graphs that we generate from \LangIf{} programs.
@@ -6705,7 +6831,9 @@ control-flow graph. Do not use the \code{directed-graph} of the
 each pair of vertices, but a control-flow graph may have multiple
 edges between a pair of vertices. The \code{multigraph.rkt} file in
 the support code implements a graph representation that allows
-multiple edges between a pair of vertices.
+multiple edges between a pair of vertices. \ocaml{There is no
+  need for a multigraph for our purposes in this chapter. Just
+use the plain directed graphs in \code{digraph.ml}.}
 
 The next question is how to analyze jump instructions.  Recall that in
 Section~\ref{sec:liveness-analysis-Rvar} we maintain an alist named
@@ -6738,7 +6866,9 @@ new kinds of arguments and instructions in \LangXIfVar{}.
 \begin{exercise}\normalfont
 Update the \code{uncover-live} pass and implement the
 \code{uncover-live-CFG} auxiliary function to apply liveness analysis
-to the control-flow graph.  Add the following entry to the list of
+to the control-flow graph.
+\ocaml{This is in the \code{UncoverLive} submodule of \code{Chapter4.ml}.}
+Add the following entry to the list of
 \code{passes} in the \code{run-tests.rkt} script.
 \begin{lstlisting}
 (list "uncover-live" uncover-live interp-pseudo-x86-1)
@@ -6761,6 +6891,8 @@ similar to the \key{movq} instruction. See rule number 1 in
 Section~\ref{sec:build-interference}.
 
 \begin{exercise}\normalfont
+  \ocaml{This exercise has been done for you, in submodule \code{BuildInterference}
+    of \code{Chapter4.ml}.}
 Update the \code{build-interference} pass for \LangXIfVar{} and add the
 following entries to the list of \code{passes} in the
 \code{run-tests.rkt} script.
@@ -6786,6 +6918,8 @@ The second argument of the \key{movzbq} must be a register.
 There are no special restrictions on the jump instructions.
 
 \begin{exercise}\normalfont
+  \ocaml{This exercise has been done for you, in submodule \code{PatchInstructions}
+    of \code{Chapter4.ml}.}
 %
 Update \code{patch-instructions} pass for \LangXIfVar{}.
 %  
@@ -6912,7 +7046,7 @@ conclusion:
 \end{lstlisting}
 \end{minipage}
 \end{tabular}
-\caption{Example compilation of an \key{if} expression to x86.}
+\caption{Example compilation of an \key{if} expression to x86. \ocaml{(For some reason, all the callee-save registers are being saved, even though they are not used.)}}
 \label{fig:if-example-x86}
 \end{figure}