9 years ago · 18d91fce83
--- a/book.tex
+++ b/book.tex
@@ -374,7 +374,7 @@ called an ``alternative''.
 
				 
			
 
				 \begin{figure}[tbp]
			
 
				 \fbox{
			
 
				-\begin{minipage}{\textwidth}
			
 
				+\begin{minipage}{0.96\textwidth}
			
 
				 \[
			
 
				 R_0 ::= \Int \mid ({\tt \key{read}}) \mid (\key{-} \; R_0) \mid
			
 
				    (\key{+} \; R_0 \; R_0) 
			
@@ -772,25 +772,29 @@ some fun and creativity.
 
				 The $R_1$ language extends the $R_0$ language
			
 
				 (Figure~\ref{fig:r0-syntax}) with variable definitions.  The syntax of
			
 
				 the $R_1$ language is defined by the grammar in
			
 
				-Figure~\ref{fig:r1-syntax}. In addition to variable definitions, the
			
 
				-$R_1$ language includes the \key{program} form to mark the top of the
			
 
				-program, which is helpful in some of the compiler passes.  The $R_1$
			
 
				-language is rich enough to exhibit several compilation techniques but
			
 
				-simple enough so that the reader can implement a compiler for it in a
			
 
				-couple weeks of part-time work.  To give the reader a feeling for the
			
 
				-scale of this first compiler, the instructor solution for the $R_1$
			
 
				-compiler consists of 6 recursive functions and a few small helper
			
 
				-functions that together span 256 lines of code.
			
 
				+Figure~\ref{fig:r1-syntax}. As in $R_0$, \key{read} is a nullary
			
 
				+operator, \key{-} is a unary operator, and \key{+} is a binary
			
 
				+operator. In addition to variable definitions, the $R_1$ language
			
 
				+includes the \key{program} form to mark the top of the program, which
			
 
				+is helpful in some of the compiler passes.  The $R_1$ language is rich
			
 
				+enough to exhibit several compilation techniques but simple enough so
			
 
				+that the reader can implement a compiler for it in a week of part-time
			
 
				+work.  To give the reader a feeling for the scale of this first
			
 
				+compiler, the instructor solution for the $R_1$ compiler consists of 6
			
 
				+recursive functions and a few small helper functions that together
			
 
				+span 256 lines of code.
			
 
				 
			
 
				 \begin{figure}[btp]
			
 
				 \centering
			
 
				 \fbox{
			
 
				-\begin{minipage}{\textwidth}
			
 
				-\begin{align*}
			
 
				-\Exp &::= \Int \mid ({\tt \key{read}}) \mid (\key{-} \; \Exp) \mid
			
 
				-   (\key{+} \; \Exp \; \Exp)  \mid  \Var \mid \LET{\Var}{\Exp}{\Exp} \\
			
 
				-R_1 &::= (\key{program} \; () \; \Exp)
			
 
				-\end{align*}
			
 
				+\begin{minipage}{0.96\textwidth}
			
 
				+\[
			
 
				+\begin{array}{rcl}
			
 
				+\Op  &::=& \key{read} \mid \key{-} \mid \key{+} \\
			
 
				+\Exp &::=& \Int \mid (\Op \; \Exp^{*})  \mid  \Var \mid \LET{\Var}{\Exp}{\Exp} \\
			
 
				+R_1  &::=& (\key{program} \; \Exp)
			
 
				+\end{array}
			
 
				+\]
			
 
				 \end{minipage}
			
 
				 }
			
 
				 \caption{The syntax of the $R_1$ language. 
			
@@ -803,7 +807,7 @@ and initializes the variable with the value of an expression.  So the
 
				 following program initializes \code{x} to \code{32} and then evaluates
			
 
				 the body \code{(+ 10 x)}, producing \code{42}.
			
 
				 \begin{lstlisting}
			
 
				-   (program ()
			
 
				+   (program
			
 
				       (let ([x (+ 12 20)]) (+ 10 x)))
			
 
				 \end{lstlisting}
			
 
				 When there are multiple \key{let}'s for the same variable, the closest
			
@@ -811,7 +815,7 @@ enclosing \key{let} is used. That is, variable definitions overshadow
 
				 prior definitions. Consider the following program with two \key{let}'s
			
 
				 that define variables named \code{x}. Can you figure out the result?
			
 
				 \begin{lstlisting}
			
 
				-   (program ()
			
 
				+   (program
			
 
				       (let ([x 32]) (+ (let ([x 10]) x) x)))
			
 
				 \end{lstlisting}
			
 
				 For the purposes of showing which variable uses correspond to which
			
@@ -819,7 +823,7 @@ definitions, the following shows the \code{x}'s annotated with subscripts
 
				 to distinguish them. Double check that your answer for the above is
			
 
				 the same as your answer for this annotated version of the program.
			
 
				 \begin{lstlisting}
			
 
				-   (program ()
			
 
				+   (program
			
 
				       (let ([x|$_1$| 32]) (+ (let ([x|$_2$| 10]) x|$_2$|) x|$_1$|)))
			
 
				 \end{lstlisting}
			
 
				 The initializing expression is always evaluated before the body of the
			
@@ -828,7 +832,7 @@ performed before the \key{read} for \code{y}. Given the input
 
				 \code{52} then \code{10}, the following produces \code{42} (and not
			
 
				 \code{-42}).
			
 
				 \begin{lstlisting}
			
 
				-   (program ()
			
 
				+   (program
			
 
				      (let ([x (read)]) (let ([y (read)]) (- x y))))
			
 
				 \end{lstlisting}
			
 
				 
			
@@ -865,7 +869,7 @@ to the variable, then evaluates the body of the \key{let}.
 
				         (fx- 0 (interp-R1 env e))]
			
 
				        [`(+ ,e1 ,e2)
			
 
				         (fx+ (interp-R1 env e1) (interp-R1 env e2))]
			
 
				-       [`(program () ,e) (interp-R1 '() e)]
			
 
				+       [`(program ,e) (interp-R1 '() e)]
			
 
				        ))
			
 
				 \end{lstlisting}
			
 
				 \caption{Interpreter for the $R_1$ language.}
			
@@ -1287,7 +1291,7 @@ translate the program on the left into the program on the right. \\
 
				 \begin{tabular}{lll}
			
 
				 \begin{minipage}{0.4\textwidth}
			
 
				 \begin{lstlisting}
			
 
				- (program ()
			
 
				+ (program
			
 
				    (let ([x 32])
			
 
				      (+ (let ([x 10]) x) x)))
			
 
				 \end{lstlisting}
			
@@ -1297,7 +1301,7 @@ $\Rightarrow$
 
				 &
			
 
				 \begin{minipage}{0.4\textwidth}
			
 
				 \begin{lstlisting}
			
 
				-(program ()
			
 
				+(program
			
 
				   (let ([x.1 32])
			
 
				     (+ (let ([x.2 10]) x.2) x.1)))
			
 
				 \end{lstlisting}
			
@@ -1310,7 +1314,7 @@ with a \key{let} nested inside the initializing expression of another
 
				 \begin{tabular}{lll}
			
 
				 \begin{minipage}{0.4\textwidth}
			
 
				 \begin{lstlisting}
			
 
				-(program ()
			
 
				+(program
			
 
				   (let ([x (let ([x 4])
			
 
				              (+ x 1))])
			
 
				     (+ x 2)))
			
@@ -1321,7 +1325,7 @@ $\Rightarrow$
 
				 &
			
 
				 \begin{minipage}{0.4\textwidth}
			
 
				 \begin{lstlisting}
			
 
				-(program ()
			
 
				+(program
			
 
				   (let ([x.2 (let ([x.1 4])
			
 
				                (+ x.1 1))])
			
 
				     (+ x.2 2)))
			
@@ -1366,8 +1370,8 @@ implement the clauses for variables and for the \key{let} construct.
 
				            [(? symbol?) ___]
			
 
				            [(? integer?) e]
			
 
				            [`(let ([,x ,e]) ,body) ___]
			
 
				-           [`(program ,info ,e)
			
 
				-            `(program ,info ,((uniquify alist) e))]
			
 
				+           [`(program ,e)
			
 
				+            `(program ,((uniquify alist) e))]
			
 
				            [`(,op ,es ...)
			
 
				             `(,op ,@(map (uniquify alist) es))]
			
 
				            ))))
			
@@ -1401,33 +1405,68 @@ your \key{uniquify} pass on the example programs.
 
				 \section{Flatten Expressions}
			
 
				 \label{sec:flatten-s0}
			
 
				 
			
 
				-The \key{flatten} pass will transform $R_1$ programs into $C_0$
			
 
				-programs. In particular, the purpose of the \key{flatten} pass is to
			
 
				-get rid of nested expressions, such as the $\UNIOP{-}{10}$ in the
			
 
				-following program.
			
 
				+The \code{flatten} pass will transform $R_1$ programs into $C_0$
			
 
				+programs. In particular, the purpose of the \code{flatten} pass is to
			
 
				+get rid of nested expressions, such as the \code{(- 10)} in the
			
 
				+program below. This can be accomplished by introducing a new variable,
			
 
				+assigning the nested expression to the new variable, and then using
			
 
				+the new variable in place of the nested expression, as shown in the
			
 
				+output of \code{flatten} on the right.\\
			
 
				+\begin{tabular}{lll}
			
 
				+\begin{minipage}{0.4\textwidth}
			
 
				 \begin{lstlisting}
			
 
				-   (program ()
			
 
				-     (+ 52 (- 10)))
			
 
				+ (program
			
 
				+   (+ 52 (- 10)))
			
 
				 \end{lstlisting}
			
 
				-This can be accomplished by introducing a new variable, assigning the
			
 
				-nested expression to the new variable, and then using the new variable
			
 
				-in place of the nested expressions. For example, the above program is
			
 
				-translated to the following one.
			
 
				+\end{minipage}
			
 
				+&
			
 
				+$\Rightarrow$
			
 
				+&
			
 
				+\begin{minipage}{0.4\textwidth}
			
 
				 \begin{lstlisting}
			
 
				-   (program (tmp.1 tmp.2)
			
 
				-     (assign tmp.1 (- 10))
			
 
				-     (assign tmp.2 (+ 52 tmp.1))
			
 
				-     (return tmp.2))
			
 
				+(program (tmp.1 tmp.2)
			
 
				+  (assign tmp.1 (- 10))
			
 
				+  (assign tmp.2 (+ 52 tmp.1))
			
 
				+  (return tmp.2))
			
 
				 \end{lstlisting}
			
 
				+\end{minipage}
			
 
				+\end{tabular}
			
 
				+
			
 
				+The clause of \code{flatten} for \key{let} is straightforward to
			
 
				+implement as it just requires the generation of an assignment
			
 
				+statement for the \key{let}-bound variable. The following shows the
			
 
				+result of \code{flatten} for a \key{let}. \\
			
 
				+\begin{tabular}{lll}
			
 
				+\begin{minipage}{0.4\textwidth}
			
 
				+\begin{lstlisting}
			
 
				+ (program
			
 
				+   (let ([x (+ (- 10) 11)])
			
 
				+     (+ x 41)))
			
 
				+\end{lstlisting}
			
 
				+\end{minipage}
			
 
				+&
			
 
				+$\Rightarrow$
			
 
				+&
			
 
				+\begin{minipage}{0.4\textwidth}
			
 
				+\begin{lstlisting}
			
 
				+(program (tmp.1 x tmp.2)
			
 
				+  (assign tmp.1 (- 10))
			
 
				+  (assign x (+ tmp.1 11))
			
 
				+  (assign tmp.2 (+ x 41))
			
 
				+  (return tmp.2))
			
 
				+\end{lstlisting}
			
 
				+\end{minipage}
			
 
				+\end{tabular}
			
 
				 
			
 
				 We recommend implementing \key{flatten} as a structurally recursive
			
 
				 function that returns two things, 1) the newly flattened expression,
			
 
				 and 2) a list of assignment statements, one for each of the new
			
 
				-variables introduced while flattening the expression. You can return
			
 
				-multiple things from a function using the \key{values} form and you
			
 
				-can receive multiple things from a function call using the
			
 
				-\key{define-values} form. If you are not familiar with these
			
 
				-constructs, the Racket documentation will be of help.
			
 
				+variables introduced while flattening the expression.  The newly
			
 
				+flattened expression should be a leaf node. You can return multiple
			
 
				+things from a function using the \key{values} form and you can receive
			
 
				+multiple things from a function call using the \key{define-values}
			
 
				+form. If you are not familiar with these constructs, the Racket
			
 
				+documentation will be of help.
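As a rough illustration of the recommendation above, the following Racket sketch returns a leaf together with a list of assignments, using \code{values} and \code{define-values}. It is only a sketch under stated assumptions: the helper names \code{fresh-tmp}, \code{flatten-exp}, \code{flatten-rhs}, and \code{flatten-program} are hypothetical and are not taken from \code{utilities.rkt}, the input is assumed to have already been through \code{uniquify}, and the variable list of the output \code{program} is assumed to be the left-hand sides of the generated assignments, as in the two example outputs shown earlier.

```racket
#lang racket

;; Hypothetical name generator; the counter is reset for each program.
(define counter 0)
(define (fresh-tmp)
  (set! counter (add1 counter))
  (string->symbol (format "tmp.~a" counter)))

;; flatten-exp returns two values: a leaf (an integer or a variable) and
;; the list of assignments that must run before the leaf is used.
(define (flatten-exp e)
  (match e
    [(? symbol?) (values e '())]
    [(? integer?) (values e '())]
    [`(let ([,x ,rhs]) ,body)
     (define-values (new-rhs rhs-assigns) (flatten-rhs rhs))
     (define-values (leaf body-assigns) (flatten-exp body))
     (values leaf
             (append rhs-assigns `((assign ,x ,new-rhs)) body-assigns))]
    [`(,op ,es ...)
     (define-values (new-e assigns) (flatten-rhs e))
     (define tmp (fresh-tmp))
     (values tmp (append assigns `((assign ,tmp ,new-e))))]))

;; flatten-rhs flattens the arguments of an operation but keeps the
;; operation itself, so it can appear on the right of an assignment.
(define (flatten-rhs e)
  (match e
    [`(let ,_ ,_) (flatten-exp e)]
    [`(,op ,es ...)
     (define-values (leaves assign-lists)
       (for/lists (ls as) ([arg es]) (flatten-exp arg)))
     (values `(,op ,@leaves) (append* assign-lists))]
    [_ (flatten-exp e)]))

;; The program clause collects the assigned variables and adds a return.
(define (flatten-program p)
  (set! counter 0)
  (match p
    [`(program ,e)
     (define-values (leaf assigns) (flatten-exp e))
     (define vars (map second assigns)) ; left-hand side of each assignment
     `(program ,vars ,@assigns (return ,leaf))]))
```

Under these assumptions, \code{(flatten-program '(program (+ 52 (- 10))))} yields the same \code{program} form as the first example output above. The split into \code{flatten-exp} and \code{flatten-rhs} is one possible design choice; it lets the \key{let} clause assign the operation directly to the \key{let}-bound variable instead of introducing an extra temporary.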
			
 
				 
			
 
				 The clause of \key{flatten} for the \key{program} node needs to
			
 
				 recursively flatten the body of the program and also compute the list
			
@@ -1492,61 +1531,90 @@ of the form $\VAR{\itm{var}}$ to the x86 abstract syntax.  The
 
				 \key{select-instructions} pass deals with the differing format of
			
 
				 arithmetic operations. For example, in $C_0$ an addition operation
			
 
				 could take the following form:
			
 
				-\[
			
 
				-\ASSIGN{x}{ \BINOP{+}{10}{32} }
			
 
				-\]
			
 
				+\begin{lstlisting}
			
 
				+   (assign x (+ 10 32))
			
 
				+\end{lstlisting}
			
 
				 To translate to x86, we need to express this addition using the
			
 
				 \key{addq} instruction that does an in-place update. So we first move
			
 
				-$10$ to $x$ then perform the \key{addq}.
			
 
				-\[
			
 
				-(\key{mov}\,\INT{10}\, \VAR{x})\; (\key{addq} \;\INT{32}\; \VAR{x})
			
 
				-\]
			
 
				+\code{10} to \code{x} then perform the \key{addq}.
			
 
				+\begin{lstlisting}
			
 
				+  (movq (int 10) (var x))
			
 
				+  (addq (int 32) (var x))
			
 
				+\end{lstlisting}
			
 
				 
			
 
				 There are some cases that require special care to avoid generating
			
 
				 needlessly complicated code. If one of the arguments is the same as
			
 
				 the left-hand side of the assignment, then there is no need for the
			
 
				-extra move instruction.  For example, the following
			
 
				-\[
			
 
				-\ASSIGN{x}{ \BINOP{+}{10}{x} }
			
 
				-\quad\text{should translate to}\quad
			
 
				-(\key{addq} \; \INT{10}\; \VAR{x})
			
 
				-\]
			
 
				+extra move instruction.  For example, the following assignment
			
 
				+statement can be translated into a single \key{addq} instruction.\\
			
 
				+\begin{tabular}{lll}
			
 
				+\begin{minipage}{0.4\textwidth}
			
 
				+\begin{lstlisting}
			
 
				+ (assign x (+ 10 x))
			
 
				+\end{lstlisting}
			
 
				+\end{minipage}
			
 
				+&
			
 
				+$\Rightarrow$
			
 
				+&
			
 
				+\begin{minipage}{0.4\textwidth}
			
 
				+\begin{lstlisting}
			
 
				+(addq (int 10) (var x))
			
 
				+\end{lstlisting}
			
 
				+\end{minipage}
			
 
				+\end{tabular} \\
			
 
				 
			
 
				 Regarding the \RETURN{e} statement of $C_0$, we recommend treating it
			
 
				 as an assignment to the \key{rax} register and letting the procedure
			
 
				 conclusion handle the transfer of control back to the calling
			
 
				 procedure.
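The cases discussed in this section can be sketched in Racket as a function from one $C_0$ statement to a list of x86 instructions. This is a minimal sketch, not the instructor solution: the names \code{select-arg} and \code{select-stmt} are hypothetical, and the \key{read} operation is deliberately left out because its translation is not covered in the text above.

```racket
#lang racket

;; Convert a C0 argument (an integer or a variable) to an x86 operand.
(define (select-arg a)
  (match a
    [(? integer?) `(int ,a)]
    [(? symbol?) `(var ,a)]))

;; select-stmt : C0 statement -> list of x86 instructions
(define (select-stmt s)
  (match s
    ;; x = a + x: x already holds one operand, so one addq suffices.
    [`(assign ,x (+ ,a ,b)) #:when (eq? b x)
     `((addq ,(select-arg a) (var ,x)))]
    ;; x = x + b: same idea, since addition is commutative.
    [`(assign ,x (+ ,a ,b)) #:when (eq? a x)
     `((addq ,(select-arg b) (var ,x)))]
    ;; General addition: move the first operand into x, then add in place.
    [`(assign ,x (+ ,a ,b))
     `((movq ,(select-arg a) (var ,x))
       (addq ,(select-arg b) (var ,x)))]
    ;; Negation: move the operand into x, then negate in place.
    [`(assign ,x (- ,a))
     `((movq ,(select-arg a) (var ,x))
       (negq (var ,x)))]
    ;; Plain copy of an integer or a variable.
    [`(assign ,x ,a)
     `((movq ,(select-arg a) (var ,x)))]
    ;; return is treated as an assignment to the rax register.
    [`(return ,a)
     `((movq ,(select-arg a) (reg rax)))]))
```

The order of the clauses matters here: the two special cases must come before the general addition clause, and the plain-copy clause must come after the clauses for operations.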
			
 
				 
			
 
				+\begin{exercise}
			
 
				+\normalfont
			
 
				+Implement the \key{select-instructions} pass and test it on all of the
			
 
				+example programs that you created for the previous passes and create
			
 
				+three new example programs that are designed to exercise all of the
			
 
				+interesting code in this pass. Use the \key{interp-tests} function
			
 
				+(Appendix~\ref{appendix:utilities}) from \key{utilities.rkt} to test
			
 
				+your passes on the example programs.
			
 
				+\end{exercise}
			
 
				+
			
 
				 \section{Assign Homes}
			
 
				 \label{sec:assign-s0}
			
 
				 
			
 
				 As discussed in Section~\ref{sec:plan-s0-x86}, the
			
 
				 \key{assign-homes} pass places all of the variables on the stack.
			
 
				-Consider again the example $R_1$ program $\BINOP{+}{52}{ \UNIOP{-}{10} }$,
			
 
				+Consider again the example $R_1$ program \code{(+ 52 (- 10))},
			
 
				 which after \key{select-instructions} looks like the following.
			
 
				-\[
			
 
				-\begin{array}{l}
			
 
				-(\key{movq}\;\INT{10}\; \VAR{x})\\
			
 
				-(\key{negq}\; \VAR{x})\\
			
 
				-(\key{movq}\; \INT{52}\; \REG{\itm{rax}})\\
			
 
				-(\key{addq}\; \VAR{x} \REG{\itm{rax}})
			
 
				-\end{array}
			
 
				-\]
			
 
				-The one and only variable $x$ is assigned to stack location
			
 
				-\key{-8(\%rbp)}, so the \key{assign-homes} pass translates the
			
 
				+\begin{lstlisting}
			
 
				+   (movq (int 10) (var x))
			
 
				+   (negq (var x))
			
 
				+   (movq (int 52) (reg rax))
			
 
				+   (addq (var x) (reg rax))
			
 
				+\end{lstlisting}
			
 
				+The one and only variable \code{x} is assigned to stack location
			
 
				+\code{-8(\%rbp)}, so the \code{assign-homes} pass translates the
			
 
				 above to
			
 
				-\[
			
 
				-\begin{array}{l}
			
 
				-(\key{movq}\;\INT{10}\; \STACKLOC{{-}8})\\
			
 
				-(\key{negq}\; \STACKLOC{{-}8})\\
			
 
				-(\key{movq}\; \INT{52}\; \REG{\itm{rax}})\\
			
 
				-(\key{addq}\; \STACKLOC{{-}8}\; \REG{\itm{rax}})
			
 
				-\end{array}
			
 
				-\]
			
 
				+\begin{lstlisting}
			
 
				+   (movq (int 10) (stack -8))
			
 
				+   (negq (stack -8))
			
 
				+   (movq (int 52) (reg rax))
			
 
				+   (addq (stack -8) (reg rax))
			
 
				+\end{lstlisting}
			
 
				 
			
 
				 In the process of assigning stack locations to variables, it is
			
 
				-convenient to compute and store the size of the frame which will be
			
 
				-needed later to generate the procedure conclusion.
			
 
				+convenient to compute and store the size of the frame in the
			
 
				+$\itm{info}$ field of the \key{program} node, which will be needed
			
 
				+later to generate the procedure conclusion. Some operating systems
			
 
				+place restrictions on the frame size. For example, Mac OS X requires
			
 
				+the frame size to be a multiple of 16 bytes.
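One way to picture this pass is the following Racket sketch, which assigns consecutive 8-byte stack slots to the variables and rounds the frame size up to a multiple of 16. The names \code{make-homes}, \code{assign-arg}, \code{assign-instr}, and \code{frame-size} are hypothetical, and assigning slots in list order is an assumption made for illustration only.

```racket
#lang racket

;; Map each variable to a stack offset: -8, -16, -24, ...
(define (make-homes vars)
  (for/hash ([x vars] [i (in-naturals 1)])
    (values x (* -8 i))))

;; Replace a (var x) operand by its stack location; leave others alone.
(define (assign-arg homes a)
  (match a
    [`(var ,x) `(stack ,(hash-ref homes x))]
    [_ a]))

;; Rewrite every operand of a single instruction.
(define (assign-instr homes instr)
  (match instr
    [`(,op ,args ...)
     `(,op ,@(map (lambda (a) (assign-arg homes a)) args))]))

;; Frame size in bytes: 8 per variable, rounded up to a multiple of 16.
(define (frame-size vars)
  (* 16 (quotient (+ (* 8 (length vars)) 15) 16)))
```

With the single variable \code{x}, mapping \code{assign-instr} over the four instructions above gives the \code{(stack -8)} version shown in the text, and \code{(frame-size '(x))} is 16.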
			
 
				+
			
 
				+\begin{exercise}
			
 
				+\normalfont
			
 
				+Implement the \key{assign-homes} pass and test it on all of the
			
 
				+example programs that you created for the previous passes. Use
			
 
				+the \key{interp-tests} function (Appendix~\ref{appendix:utilities})
			
 
				+from \key{utilities.rkt} to test your passes on the example programs.
			
 
				+\end{exercise}
			
 
				 
			
 
				 \section{Patch Instructions}
			
 
				 \label{sec:patch-s0}
			
@@ -1557,32 +1625,38 @@ references. For most instructions, the rule is that at most one
 
				 argument may be a memory reference.
			
 
				 
			
 
				 Consider again the following example.
			
 
				-\[
			
 
				-\LET{a}{42}{ \LET{b}{a}{ b }}
			
 
				-\]
			
 
				+\begin{lstlisting}
			
 
				+   (let ([a 42])
			
 
				+     (let ([b a])
			
 
				+       b))
			
 
				+\end{lstlisting}
			
 
				 After the \key{assign-homes} pass, the above has been translated to
			
 
				-\[
			
 
				-\begin{array}{l}
			
 
				-(\key{movq} \;\INT{42}\; \STACKLOC{{-}8})\\
			
 
				-(\key{movq}\;\STACKLOC{{-}8}\; \STACKLOC{{-}16})\\
			
 
				-(\key{movq}\;\STACKLOC{{-}16}\; \REG{\itm{rax}})
			
 
				-\end{array}
			
 
				-\]
			
 
				+\begin{lstlisting}
			
 
				+   (movq (int 42) (stack -8))
			
 
				+   (movq (stack -8) (stack -16))
			
 
				+   (movq (stack -16) (reg rax))
			
 
				+\end{lstlisting}
			
 
				 The second \key{movq} instruction is problematic because both arguments
			
 
				 are stack locations. We suggest fixing this problem by moving from the
			
 
				 source to \key{rax} and then from \key{rax} to the destination, as
			
 
				 follows.
			
 
				-\[
			
 
				-\begin{array}{l}
			
 
				-(\key{movq} \;\INT{42}\; \STACKLOC{{-}8})\\
			
 
				-(\key{movq}\;\STACKLOC{{-}8}\; \REG{\itm{rax}})\\
			
 
				-(\key{movq}\;\REG{\itm{rax}}\; \STACKLOC{{-}16})\\
			
 
				-(\key{movq}\;\STACKLOC{{-}16}\; \REG{\itm{rax}})
			
 
				-\end{array}
			
 
				-\]
			
 
				+\begin{lstlisting}
			
 
				+   (movq (int 42) (stack -8))
			
 
				+   (movq (stack -8) (reg rax))
			
 
				+   (movq (reg rax) (stack -16))
			
 
				+   (movq (stack -16) (reg rax))
			
 
				+\end{lstlisting}
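The suggested fix can be sketched in Racket as follows. The names \code{stack-loc?}, \code{patch-instr}, and \code{patch-instrs} are hypothetical, and the sketch assumes that \key{rax} is free to be used as a scratch register at this point.

```racket
#lang racket

;; Is the operand a stack location such as (stack -8)?
(define (stack-loc? a)
  (match a
    [`(stack ,_) #t]
    [_ #f]))

;; patch-instr : instruction -> list of instructions
(define (patch-instr instr)
  (match instr
    ;; Both operands in memory: go through rax.
    [`(,op ,src ,dst)
     #:when (and (stack-loc? src) (stack-loc? dst))
     `((movq ,src (reg rax))
       (,op (reg rax) ,dst))]
    ;; Everything else is already legal and is kept as is.
    [_ (list instr)]))

;; Patch a whole instruction sequence.
(define (patch-instrs instrs)
  (append-map patch-instr instrs))
```

Applied to the three \key{movq} instructions above, \code{patch-instrs} produces the four-instruction sequence shown in the text.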
			
 
				+
			
 
				+\begin{exercise}
			
 
				+\normalfont
			
 
				+Implement the \key{patch-instructions} pass and test it on all of the
			
 
				+example programs that you created for the previous passes and create
			
 
				+three new example programs that are designed to exercise all of the
			
 
				+interesting code in this pass. Use the \key{interp-tests} function
			
 
				+(Appendix~\ref{appendix:utilities}) from \key{utilities.rkt} to test
			
 
				+your passes on the example programs.
			
 
				+\end{exercise}
			
 
				 
			
 
				-%% The \key{imulq} instruction is a special case because the destination
			
 
				-%% argument must be a register.
			
 
				 
			
 
				 \section{Print x86-64}
			
 
				 \label{sec:print-x86}
			
@@ -1593,10 +1667,19 @@ representation (defined in Figure~\ref{fig:x86-a}). The Racket
 
				 \key{format} and \key{string-append} functions are useful in this
			
 
				 regard. The main work that this step needs to perform is to create the
			
 
				 \key{\_main} function and the standard instructions for its prelude
			
 
				-and conclusion, as described in Section~\ref{sec:x86-64}. You need to
			
 
				-know the number of stack-allocated variables, which is convenient to
			
 
				-compute in the \key{assign-homes} pass (Section~\ref{sec:assign-s0})
			
 
				-and then store in the $\itm{info}$ field of the \key{program}.
			
 
				+and conclusion, as shown in Figure~\ref{fig:p1-x86} of
			
 
				+Section~\ref{sec:x86-64}. You need to know the number of
			
 
				+stack-allocated variables, which we suggest that you compute in
			
 
				+the \key{assign-homes} pass (Section~\ref{sec:assign-s0}) and store in
			
 
				+the $\itm{info}$ field of the \key{program} node.
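For the instruction-level part of this pass, a sketch along the following lines may help. It uses \key{format} and \key{string-append} to print operands in AT\&T syntax. The names \code{print-arg} and \code{print-instr} are hypothetical, the tab-based layout is only a stylistic assumption, and the prelude and conclusion of \key{\_main} are not shown.

```racket
#lang racket

;; Print one operand in AT&T syntax.
(define (print-arg a)
  (match a
    [`(int ,n) (format "$~a" n)]
    [`(reg ,r) (format "%~a" r)]
    [`(stack ,offset) (format "~a(%rbp)" offset)]))

;; Print one instruction on its own line, operands separated by commas.
(define (print-instr instr)
  (match instr
    [`(,op ,args ...)
     (string-append "\t" (symbol->string op) "\t"
                    (string-join (map print-arg args) ", ")
                    "\n")]))
```

For example, \code{(print-instr '(movq (int 10) (stack -8)))} produces a line containing \code{movq}, then \code{\$10, -8(\%rbp)}.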
			
 
				+
			
 
				+\begin{exercise}
			
 
				+\normalfont Implement the \key{print-x86} pass and test it on all of
			
 
				+the example programs that you created for the previous passes. Use the
			
 
				+\key{compiler-tests} function (Appendix~\ref{appendix:utilities}) from
			
 
				+\key{utilities.rkt} to test your complete compiler on the example
			
 
				+programs.
			
 
				+\end{exercise}
			
 
				 
			
 
				 %% \section{Testing with Interpreters}