Jeremy Siek 9 years ago
parent
commit
18d91fce83
1 changed files with 187 additions and 104 deletions
book.tex

@@ -374,7 +374,7 @@ called an ``alternative''.
 
 
 \begin{figure}[tbp]
 \fbox{
-\begin{minipage}{\textwidth}
+\begin{minipage}{0.96\textwidth}
 \[
 R_0 ::= \Int \mid ({\tt \key{read}}) \mid (\key{-} \; R_0) \mid
    (\key{+} \; R_0 \; R_0) 
@@ -772,25 +772,29 @@ some fun and creativity.
 The $R_1$ language extends the $R_0$ language
 (Figure~\ref{fig:r0-syntax}) with variable definitions.  The syntax of
 the $R_1$ language is defined by the grammar in
-Figure~\ref{fig:r1-syntax}. In addition to variable definitions, the
-$R_1$ language includes the \key{program} form to mark the top of the
-program, which is helpful in some of the compiler passes.  The $R_1$
-language is rich enough to exhibit several compilation techniques but
-simple enough so that the reader can implement a compiler for it in a
-couple weeks of part-time work.  To give the reader a feeling for the
-scale of this first compiler, the instructor solution for the $R_1$
-compiler consists of 6 recursive functions and a few small helper
-functions that together span 256 lines of code.
+Figure~\ref{fig:r1-syntax}. As in $R_0$, \key{read} is a nullary
+operator, \key{-} is a unary operator, and \key{+} is a binary
+operator. In addition to variable definitions, the $R_1$ language
+includes the \key{program} form to mark the top of the program, which
+is helpful in some of the compiler passes.  The $R_1$ language is rich
+enough to exhibit several compilation techniques but simple enough so
+that the reader can implement a compiler for it in a week of part-time
+work.  To give the reader a feeling for the scale of this first
+compiler, the instructor solution for the $R_1$ compiler consists of 6
+recursive functions and a few small helper functions that together
+span 256 lines of code.
 
 
 \begin{figure}[btp]
 \centering
 \fbox{
-\begin{minipage}{\textwidth}
-\begin{align*}
-\Exp &::= \Int \mid ({\tt \key{read}}) \mid (\key{-} \; \Exp) \mid
-   (\key{+} \; \Exp \; \Exp)  \mid  \Var \mid \LET{\Var}{\Exp}{\Exp} \\
-R_1 &::= (\key{program} \; () \; \Exp)
-\end{align*}
+\begin{minipage}{0.96\textwidth}
+\[
+\begin{array}{rcl}
+\Op  &::=& \key{read} \mid \key{-} \mid \key{+} \\
+\Exp &::=& \Int \mid (\Op \; \Exp^{*})  \mid  \Var \mid \LET{\Var}{\Exp}{\Exp} \\
+R_1  &::=& (\key{program} \; \Exp)
+\end{array}
+\]
 \end{minipage}
 }
 \caption{The syntax of the $R_1$ language. 
@@ -803,7 +807,7 @@ and initializes the variable with the value of an expression.  So the
 following program initializes \code{x} to \code{32} and then evaluates
 the body \code{(+ 10 x)}, producing \code{42}.
 \begin{lstlisting}
-   (program ()
+   (program
      (let ([x (+ 12 20)]) (+ 10 x)))
 \end{lstlisting}
 When there are multiple \key{let}'s for the same variable, the closest
@@ -811,7 +815,7 @@ enclosing \key{let} is used. That is, variable definitions overshadow
 prior definitions. Consider the following program with two \key{let}'s
 that define variables named \code{x}. Can you figure out the result?
 \begin{lstlisting}
-   (program ()
+   (program
      (let ([x 32]) (+ (let ([x 10]) x) x)))
 \end{lstlisting}
 For the purposes of showing which variable uses correspond to which
@@ -819,7 +823,7 @@ definitions, the following shows the \code{x}'s annotated with subscripts
 to distinguish them. Double check that your answer for the above is
 the same as your answer for this annotated version of the program.
 \begin{lstlisting}
-   (program ()
+   (program
      (let ([x|$_1$| 32]) (+ (let ([x|$_2$| 10]) x|$_2$|) x|$_1$|)))
 \end{lstlisting}
 The initializing expression is always evaluated before the body of the
@@ -828,7 +832,7 @@ performed before the \key{read} for \code{y}. Given the input
 \code{52} then \code{10}, the following produces \code{42} (and not
 \code{-42}).
 \begin{lstlisting}
-   (program ()
+   (program
     (let ([x (read)]) (let ([y (read)]) (- x y))))
 \end{lstlisting}
 
 
@@ -865,7 +869,7 @@ to the variable, then evaluates the body of the \key{let}.
         (fx- 0 (interp-R1 env e))]
        [`(+ ,e1 ,e2)
         (fx+ (interp-R1 env e1) (interp-R1 env e2))]
-       [`(program () ,e) (interp-R1 '() e)]
+       [`(program ,e) (interp-R1 '() e)]
        ))
 \end{lstlisting}
 \caption{Interpreter for the $R_1$ language.}
@@ -1287,7 +1291,7 @@ translate the program on the left into the program on the right. \\
 \begin{tabular}{lll}
 \begin{minipage}{0.4\textwidth}
 \begin{lstlisting}
- (program ()
+ (program
    (let ([x 32])
      (+ (let ([x 10]) x) x)))
 \end{lstlisting}
@@ -1297,7 +1301,7 @@ $\Rightarrow$
 &
 \begin{minipage}{0.4\textwidth}
 \begin{lstlisting}
-(program ()
+(program
   (let ([x.1 32])
     (+ (let ([x.2 10]) x.2) x.1)))
 \end{lstlisting}
@@ -1310,7 +1314,7 @@ with a \key{let} nested inside the initializing expression of another
 \begin{tabular}{lll}
 \begin{minipage}{0.4\textwidth}
 \begin{lstlisting}
-(program ()
+(program
   (let ([x (let ([x 4])
              (+ x 1))])
     (+ x 2)))
@@ -1321,7 +1325,7 @@ $\Rightarrow$
 &
 \begin{minipage}{0.4\textwidth}
 \begin{lstlisting}
-(program ()
+(program
   (let ([x.2 (let ([x.1 4])
                (+ x.1 1))])
     (+ x.2 2)))
@@ -1366,8 +1370,8 @@ implement the clauses for variables and for the \key{let} construct.
            [(? symbol?) ___]
            [(? integer?) e]
            [`(let ([,x ,e]) ,body) ___]
-           [`(program ,info ,e)
-            `(program ,info ,((uniquify alist) e))]
+           [`(program ,e)
+            `(program ,((uniquify alist) e))]
            [`(,op ,es ...)
             `(,op ,@(map (uniquify alist) es))]
            ))))
@@ -1401,33 +1405,68 @@ your \key{uniquify} pass on the example programs.
 \section{Flatten Expressions}
 \label{sec:flatten-s0}
 
 
-The \key{flatten} pass will transform $R_1$ programs into $C_0$
-programs. In particular, the purpose of the \key{flatten} pass is to
-get rid of nested expressions, such as the $\UNIOP{-}{10}$ in the
-following program.
+The \code{flatten} pass will transform $R_1$ programs into $C_0$
+programs. In particular, the purpose of the \code{flatten} pass is to
+get rid of nested expressions, such as the \code{(- 10)} in the program
+below. This can be accomplished by introducing a new variable,
+assigning the nested expression to the new variable, and then using
+the new variable in place of the nested expression, as shown in the
+output of \code{flatten} on the right.\\
+\begin{tabular}{lll}
+\begin{minipage}{0.4\textwidth}
 \begin{lstlisting}
-   (program ()
-     (+ 52 (- 10)))
+ (program
+   (+ 52 (- 10)))
 \end{lstlisting}
-This can be accomplished by introducing a new variable, assigning the
-nested expression to the new variable, and then using the new variable
-in place of the nested expressions. For example, the above program is
-translated to the following one.
+\end{minipage}
+&
+$\Rightarrow$
+&
+\begin{minipage}{0.4\textwidth}
 \begin{lstlisting}
-   (program (tmp.1 tmp.2)
-     (assign tmp.1 (- 10))
-     (assign tmp.2 (+ 52 tmp.1))
-     (return tmp.2))
+(program (tmp.1 tmp.2)
+  (assign tmp.1 (- 10))
+  (assign tmp.2 (+ 52 tmp.1))
+  (return tmp.2))
 \end{lstlisting}
+\end{minipage}
+\end{tabular}
+
+The clause of \code{flatten} for \key{let} is straightforward to
+implement as it just requires the generation of an assignment
+statement for the \key{let}-bound variable. The following shows the
+result of \code{flatten} for a \key{let}. \\
+\begin{tabular}{lll}
+\begin{minipage}{0.4\textwidth}
+\begin{lstlisting}
+ (program
+   (let ([x (+ (- 10) 11)])
+     (+ x 41)))
+\end{lstlisting}
+\end{minipage}
+&
+$\Rightarrow$
+&
+\begin{minipage}{0.4\textwidth}
+\begin{lstlisting}
+(program (tmp.1 x tmp.2)
+  (assign tmp.1 (- 10))
+  (assign x (+ tmp.1 11))
+  (assign tmp.2 (+ x 41))
+  (return tmp.2))
+\end{lstlisting}
+\end{minipage}
+\end{tabular}
 
 
 We recommend implementing \key{flatten} as a structurally recursive
 function that returns two things, 1) the newly flattened expression,
 and 2) a list of assignment statements, one for each of the new
-variables introduced while flattening the expression. You can return
-multiple things from a function using the \key{values} form and you
-can receive multiple things from a function call using the
-\key{define-values} form. If you are not familiar with these
-constructs, the Racket documentation will be of help.
+variables introduced while flattening the expression.  The newly
+flattened expression should be a leaf node. You can return multiple
+things from a function using the \key{values} form and you can receive
+multiple things from a function call using the \key{define-values}
+form. If you are not familiar with these constructs, the Racket
+documentation will be of help.
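
The \key{values} and \key{define-values} idiom can be sketched on a
deliberately tiny fragment. The following is only an illustration and
not the book's solution: the name \code{flatten-neg} is hypothetical,
it handles just integers and unary negation, and it assumes that
Racket's \code{gensym} is an acceptable source of fresh temporaries.
\begin{lstlisting}
#lang racket
;; Hypothetical helper, for illustration only: flattens integers and
;; unary negation. Returns two values: an atomic expression and the
;; list of assignments needed to compute it.
(define (flatten-neg e)
  (match e
    [(? integer?) (values e '())]
    [`(- ,e1)
     ;; Receive both results of the recursive call.
     (define-values (atom stmts) (flatten-neg e1))
     (define tmp (gensym 'tmp)) ; assumed way to get a fresh name
     (values tmp
             (append stmts `((assign ,tmp (- ,atom)))))]))
\end{lstlisting}
For instance, \code{(flatten-neg '(- (- 10)))} returns a fresh
temporary together with two assignment statements, one per negation.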
 
 
 The clause of \key{flatten} for the \key{program} node needs to
 recursively flatten the body of the program and also compute the list
@@ -1492,61 +1531,90 @@ of the form $\VAR{\itm{var}}$ to the x86 abstract syntax.  The
 \key{select-instructions} pass deals with the differing format of
 arithmetic operations. For example, in $C_0$ an addition operation
 could take the following form:
-\[
-\ASSIGN{x}{ \BINOP{+}{10}{32} }
-\]
+\begin{lstlisting}
+   (assign x (+ 10 32))
+\end{lstlisting}
 To translate to x86, we need to express this addition using the
 \key{addq} instruction that does an in-place update. So we first move
-$10$ to $x$ then perform the \key{addq}.
-\[
-(\key{mov}\,\INT{10}\, \VAR{x})\; (\key{addq} \;\INT{32}\; \VAR{x})
-\]
+\code{10} to \code{x} then perform the \key{addq}.
+\begin{lstlisting}
+  (movq (int 10) (var x))
+  (addq (int 32) (var x))
+\end{lstlisting}
 
 
 There are some cases that require special care to avoid generating
 needlessly complicated code. If one of the arguments is the same as
 the left-hand side of the assignment, then there is no need for the
-extra move instruction.  For example, the following
-\[
-\ASSIGN{x}{ \BINOP{+}{10}{x} }
-\quad\text{should translate to}\quad
-(\key{addq} \; \INT{10}\; \VAR{x})
-\]
+extra move instruction.  For example, the following assignment
+statement can be translated into a single \key{addq} instruction.\\
+\begin{tabular}{lll}
+\begin{minipage}{0.4\textwidth}
+\begin{lstlisting}
+ (assign x (+ 10 x))
+\end{lstlisting}
+\end{minipage}
+&
+$\Rightarrow$
+&
+\begin{minipage}{0.4\textwidth}
+\begin{lstlisting}
+(addq (int 10) (var x))
+\end{lstlisting}
+\end{minipage}
+\end{tabular} \\
 
 
 Regarding the \RETURN{e} statement of $C_0$, we recommend treating it
 as an assignment to the \key{rax} register and letting the procedure
 conclusion handle the transfer of control back to the calling
 procedure.
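
The two addition cases described above can be sketched as follows.
This is a minimal sketch under stated assumptions, not the instructor
solution: the names \code{select-assign} and \code{arg} are
hypothetical, only \key{+} is handled, and the arguments are assumed
to be integers or variables, as they are after \code{flatten}.
\begin{lstlisting}
#lang racket
;; Hypothetical helper: translate (assign lhs rhs) for an addition.
(define (select-assign lhs rhs)
  ;; Wrap a C0 argument in the x86 abstract syntax.
  (define (arg a) (if (integer? a) `(int ,a) `(var ,a)))
  (match rhs
    [`(+ ,a ,b)
     (cond
       ;; One operand is already the destination: a single addq.
       [(equal? b lhs) `((addq ,(arg a) (var ,lhs)))]
       [(equal? a lhs) `((addq ,(arg b) (var ,lhs)))]
       ;; General case: move the first operand, then add the second.
       [else `((movq ,(arg a) (var ,lhs))
               (addq ,(arg b) (var ,lhs)))])]))
\end{lstlisting}
With these definitions, \code{(select-assign 'x '(+ 10 32))} yields
the \key{movq} and \key{addq} pair, whereas
\code{(select-assign 'x '(+ 10 x))} yields the single \key{addq}
instruction.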
 
 
+\begin{exercise}
+\normalfont
+Implement the \key{select-instructions} pass and test it on all of the
+example programs that you created for the previous passes and create
+three new example programs that are designed to exercise all of the
+interesting code in this pass. Use the \key{interp-tests} function
+(Appendix~\ref{appendix:utilities}) from \key{utilities.rkt} to test
+your passes on the example programs.
+\end{exercise}
+
 \section{Assign Homes}
 \label{sec:assign-s0}
 
 As discussed in Section~\ref{sec:plan-s0-x86}, the
 \key{assign-homes} pass places all of the variables on the stack.
-Consider again the example $R_1$ program $\BINOP{+}{52}{ \UNIOP{-}{10} }$,
+Consider again the example $R_1$ program \code{(+ 52 (- 10))},
 which after \key{select-instructions} looks like the following.
-\[
-\begin{array}{l}
-(\key{movq}\;\INT{10}\; \VAR{x})\\
-(\key{negq}\; \VAR{x})\\
-(\key{movq}\; \INT{52}\; \REG{\itm{rax}})\\
-(\key{addq}\; \VAR{x} \REG{\itm{rax}})
-\end{array}
-\]
-The one and only variable $x$ is assigned to stack location
-\key{-8(\%rbp)}, so the \key{assign-homes} pass translates the
+\begin{lstlisting}
+   (movq (int 10) (var x))
+   (negq (var x))
+   (movq (int 52) (reg rax))
+   (addq (var x) (reg rax))
+\end{lstlisting}
+The one and only variable \code{x} is assigned to stack location
+\code{-8(\%rbp)}, so the \code{assign-homes} pass translates the
 above to
-\[
-\begin{array}{l}
-(\key{movq}\;\INT{10}\; \STACKLOC{{-}8})\\
-(\key{negq}\; \STACKLOC{{-}8})\\
-(\key{movq}\; \INT{52}\; \REG{\itm{rax}})\\
-(\key{addq}\; \STACKLOC{{-}8}\; \REG{\itm{rax}})
-\end{array}
-\]
+\begin{lstlisting}
+   (movq (int 10) (stack -8))
+   (negq (stack -8))
+   (movq (int 52) (reg rax))
+   (addq (stack -8) (reg rax))
+\end{lstlisting}
 
 
 In the process of assigning stack locations to variables, it is
-convenient to compute and store the size of the frame which will be
-needed later to generate the procedure conclusion.
+convenient to compute the size of the frame and store it in the
+$\itm{info}$ field of the \key{program} node, because it is needed
+later to generate the procedure conclusion. Some operating systems
+place restrictions on the frame size. For example, Mac OS X requires
+the frame size to be a multiple of 16 bytes.
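
As a sketch of this bookkeeping, the following assumes that every
variable occupies 8 bytes (consistent with the offsets \code{-8} and
\code{-16} used in this chapter) and that the frame size is always
rounded up to a multiple of 16. The names \code{assign-stack-homes}
and \code{frame-size} are hypothetical.
\begin{lstlisting}
#lang racket
;; Hypothetical helpers; each variable is assumed to take 8 bytes.
(define (assign-stack-homes vars)
  ;; Pair the i-th variable (counting from 1) with offset -8*i.
  (for/list ([x vars] [i (in-naturals 1)])
    (cons x `(stack ,(* -8 i)))))

(define (frame-size num-vars)
  ;; Round 8*num-vars up to the next multiple of 16.
  (* 16 (quotient (+ (* 8 num-vars) 15) 16)))
\end{lstlisting}
Under these assumptions a program with one variable and a program with
two variables both get a 16-byte frame, and three variables need 32
bytes.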
+
+\begin{exercise}
+\normalfont
+Implement the \key{assign-homes} pass and test it on all of the
+example programs that you created for the previous passes. Use
+the \key{interp-tests} function (Appendix~\ref{appendix:utilities})
+from \key{utilities.rkt} to test your passes on the example programs.
+\end{exercise}
 
 
 \section{Patch Instructions}
 \label{sec:patch-s0}
@@ -1557,32 +1625,38 @@ references. For most instructions, the rule is that at most one
 argument may be a memory reference.
 
 Consider again the following example.
-\[
-\LET{a}{42}{ \LET{b}{a}{ b }}
-\]
+\begin{lstlisting}
+   (let ([a 42])
+     (let ([b a])
+       b))
+\end{lstlisting}
 After the \key{assign-homes} pass, the above has been translated to
-\[
-\begin{array}{l}
-(\key{movq} \;\INT{42}\; \STACKLOC{{-}8})\\
-(\key{movq}\;\STACKLOC{{-}8}\; \STACKLOC{{-}16})\\
-(\key{movq}\;\STACKLOC{{-}16}\; \REG{\itm{rax}})
-\end{array}
-\]
+\begin{lstlisting}
+   (movq (int 42) (stack -8))
+   (movq (stack -8) (stack -16))
+   (movq (stack -16) (reg rax))
+\end{lstlisting}
 The second \key{movq} instruction is problematic because both arguments
 are stack locations. We suggest fixing this problem by moving from the
 source to \key{rax} and then from \key{rax} to the destination, as
 follows.
-\[
-\begin{array}{l}
-(\key{movq} \;\INT{42}\; \STACKLOC{{-}8})\\
-(\key{movq}\;\STACKLOC{{-}8}\; \REG{\itm{rax}})\\
-(\key{movq}\;\REG{\itm{rax}}\; \STACKLOC{{-}16})\\
-(\key{movq}\;\STACKLOC{{-}16}\; \REG{\itm{rax}})
-\end{array}
-\]
+\begin{lstlisting}
+   (movq (int 42) (stack -8))
+   (movq (stack -8) (reg rax))
+   (movq (reg rax) (stack -16))
+   (movq (stack -16) (reg rax))
+\end{lstlisting}
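
The suggested fix can be sketched as a function from one instruction
to a list of instructions. This is a sketch only: the names
\code{stack-ref?} and \code{patch-instr} are hypothetical, and it
assumes that \key{rax} is free to be overwritten at this point.
\begin{lstlisting}
#lang racket
;; Hypothetical helpers, for illustration only.
(define (stack-ref? a)
  (match a [`(stack ,_) #t] [_ #f]))

(define (patch-instr instr)
  (match instr
    ;; Two memory operands: route the source through rax.
    [`(,op ,src ,dst)
     #:when (and (stack-ref? src) (stack-ref? dst))
     `((movq ,src (reg rax))
       (,op (reg rax) ,dst))]
    ;; Anything else is left unchanged.
    [_ (list instr)]))
\end{lstlisting}
Applying \code{(append-map patch-instr instrs)} to the three
instructions produced by \key{assign-homes} above gives the four
instructions of the patched listing.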
+
+\begin{exercise}
+\normalfont
+Implement the \key{patch-instructions} pass and test it on all of the
+example programs that you created for the previous passes and create
+three new example programs that are designed to exercise all of the
+interesting code in this pass. Use the \key{interp-tests} function
+(Appendix~\ref{appendix:utilities}) from \key{utilities.rkt} to test
+your passes on the example programs.
+\end{exercise}
 
 
-%% The \key{imulq} instruction is a special case because the destination
-%% argument must be a register.
 
 
 \section{Print x86-64}
 \label{sec:print-x86}
@@ -1593,10 +1667,19 @@ representation (defined in Figure~\ref{fig:x86-a}). The Racket
 \key{format} and \key{string-append} functions are useful in this
 regard. The main work that this step needs to perform is to create the
 \key{\_main} function and the standard instructions for its prelude
-and conclusion, as described in Section~\ref{sec:x86-64}. You need to
-know the number of stack-allocated variables, which is convenient to
-compute in the \key{assign-homes} pass (Section~\ref{sec:assign-s0})
-and then store in the $\itm{info}$ field of the \key{program}.
+and conclusion, as shown in Figure~\ref{fig:p1-x86} of
+Section~\ref{sec:x86-64}. You need to know the number of
+stack-allocated variables, which we suggest that you compute in
+the \key{assign-homes} pass (Section~\ref{sec:assign-s0}) and store in
+the $\itm{info}$ field of the \key{program} node.
+
+\begin{exercise}
+\normalfont Implement the \key{print-x86} pass and test it on all of
+the example programs that you created for the previous passes. Use the
+\key{compiler-tests} function (Appendix~\ref{appendix:utilities}) from
+\key{utilities.rkt} to test your complete compiler on the example
+programs.
+\end{exercise}
 
 
 %% \section{Testing with Interpreters}