Jeremy Siek 9 years ago
parent
commit
d1a8441caf
1 changed files with 110 additions and 41 deletions
book.tex

@@ -1175,15 +1175,20 @@ an intermediate language named $C_0$, roughly half-way between $S_0$
 and x86, to provide a rest stop along the way. We name the language
 $C_0$ because it is vaguely similar to the $C$
 language~\citep{Kernighan:1988nx}. The differences \#4 and \#1,
-regarding variables and nested expressions, are handled by the passes
-\textsf{uniquify} and \textsf{flatten} that bring us to $C_0$.
+regarding variables and nested expressions, will be handled by two
+steps, \key{uniquify} and \key{flatten}, which bring us to
+$C_0$.
 \[\large
 \xymatrix@=50pt{
-  S_0 \ar@/^/[r]^-{\textsf{uniquify}} & 
-  S_0 \ar@/^/[r]^-{\textsf{flatten}} &
+  S_0 \ar@/^/[r]^-{\key{uniquify}} & 
+  S_0 \ar@/^/[r]^-{\key{flatten}} &
   C_0 
 }
 \]
+Each of these steps in the compiler is implemented by a function,
+typically a structurally recursive function that translates an input
+AST into an output AST. We refer to such a function as a \emph{pass}
+because it makes a pass over the AST.
 
 
 The syntax for $C_0$ is defined in Figure~\ref{fig:c0-syntax}.  The
 $C_0$ language supports the same operators as $S_0$ but the arguments
@@ -1215,21 +1220,21 @@ To get from $C_0$ to x86-64 assembly requires three more steps, which
 we discuss below.
 \[\large
 \xymatrix@=50pt{
-  C_0 \ar@/^/[r]^-{\textsf{select\_instr.}}
-  & \text{x86}^{*} \ar@/^/[r]^-{\textsf{assign\_homes}} 
-  & \text{x86}^{*} \ar@/^/[r]^-{\textsf{patch\_instr.}}
+  C_0 \ar@/^/[r]^-{\key{select\_instr.}}
+  & \text{x86}^{*} \ar@/^/[r]^-{\key{assign\_homes}} 
+  & \text{x86}^{*} \ar@/^/[r]^-{\key{patch\_instr.}}
   & \text{x86}
 }
 \]
 We handle difference \#1, concerning the format of arithmetic
-instructions, in the \textsf{select\_instructions} pass.  The result
+instructions, in the \key{select\_instructions} pass.  This pass
 produces programs consisting of x86-64 instructions that
 use variables.
 %
 As there are only 16 registers, we cannot always map variables to
-registers (difference \#3). Fortunately, the stack can grow quite, so
-we can map variables to locations on the stack. This is handled in the
-\textsf{assign\_homes} pass. The topic of
+registers (difference \#3). Fortunately, the stack can grow quite
+large, so we can map variables to locations on the stack. This is
+handled in the \key{assign\_homes} pass. The topic of
 Chapter~\ref{ch:register-allocation} is implementing a smarter
 approach in which we make a best effort to map variables to registers,
 resorting to the stack only when necessary.
@@ -1238,15 +1243,15 @@ The final pass in our journey to x86 handles an idiosyncrasy of x86
 assembly. Many x86 instructions have two arguments but only one of the
 arguments may be a memory reference. Because we are mapping variables
 to stack locations, many of our generated instructions will violate
-this restriction. The purpose of the \textsf{patch\_instructions} pass
-is to fix this problem by replacing every bad instruction with a short
-sequence of instructions that use the \key{rax} register.
+this restriction. The purpose of the \key{patch\_instructions} pass
+is to fix this problem by replacing every violating instruction with a
+short sequence of instructions that use the \key{rax} register.
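
The rewriting described above can be sketched as follows. This is only
an illustration under stated assumptions: instructions are assumed to be
S-expressions of the form \key{(op src dst)} with operands \key{(int n)},
\key{(reg r)}, or \key{(stack n)}, and the names \key{stack-loc?} and
\key{patch-instr} are hypothetical, not definitions from this book.

```racket
#lang racket
;; Assumed (hypothetical) instruction shape: (op src dst), where each
;; operand is (int n), (reg r), or (stack offset).

;; stack-loc? : any -> boolean
;; True when the operand is a stack location, i.e. a memory reference.
(define (stack-loc? a)
  (match a
    [`(stack ,_) #t]
    [_ #f]))

;; patch-instr : instr -> (listof instr)
;; An instruction whose two operands are both memory references is
;; replaced by two instructions that go through rax. Every other
;; instruction is returned unchanged, wrapped in a one-element list.
(define (patch-instr instr)
  (match instr
    [`(,op ,(? stack-loc? src) ,(? stack-loc? dst))
     (list `(mov ,src (reg rax))
           `(,op (reg rax) ,dst))]
    [_ (list instr)]))

(patch-instr '(add (stack -8) (stack -16)))
;; => '((mov (stack -8) (reg rax)) (add (reg rax) (stack -16)))
```

Returning a list from every case makes it easy to splice the results
together, for example with \key{append-map} over the instruction list.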
 
 
 \section{Uniquify Variables}
 \label{sec:uniquify-s0}
 
 The purpose of this pass is to make sure that each \key{let} uses a
-unique variable name. For example, the \textsf{uniquify} pass could
+unique variable name. For example, the \key{uniquify} pass could
 translate
 \[
 \LET{x}{32}{ \BINOP{+}{ \LET{x}{10}{x} }{ x } }
@@ -1256,21 +1261,71 @@ to
 \LET{x.1}{32}{ \BINOP{+}{ \LET{x.2}{10}{x.2} }{ x.1 } }
 \]
 
-We recommend implementing \textsf{uniquify} as a recursive function
-that mostly just copies the input program. However, when encountering
-a \key{let}, it should generate a unique name for the variable (the
+We recommend implementing \key{uniquify} as a recursive function that
+mostly just copies the input program. However, when encountering a
+\key{let}, it should generate a unique name for the variable (the
 Racket function \key{gensym} is handy for this) and associate the old
 name with the new unique name in an association list. The
-\textsf{uniquify} function will need to access this association list
-when it gets to a variable reference, so we add another paramter to
-\textsf{uniquify} for the association list.
+\key{uniquify} function will need to access this association list when
+it gets to a variable reference, so we add another parameter to
+\key{uniquify} for the association list. It is quite common for a
+compiler pass to need a map to store extra information about
+variables. Such maps are often called \emph{symbol tables}.
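
To make the symbol-table idea concrete, here is a minimal sketch of an
association list used as a map from old names to new names. The helper
names \key{extend} and \key{lookup} are hypothetical and are not part of
the book's code. The sketch shows only the map operations, not the
\key{uniquify} pass itself.

```racket
#lang racket
;; Illustrative helpers (hypothetical names, not part of the book's code).

;; extend : alist symbol symbol -> alist
;; Adds a mapping from the old name to the new name. Newer entries go
;; at the front, so they shadow older entries for the same name.
(define (extend alist old new)
  (cons (cons old new) alist))

;; lookup : alist symbol -> symbol
;; Returns the most recently added new name for x.
(define (lookup alist x)
  (cond
    [(assq x alist) => cdr]
    [else (error 'lookup "unbound variable: ~a" x)]))

(define outer (extend '() 'x 'x.1))    ; entering the outer let
(define inner (extend outer 'x 'x.2))  ; entering the inner let

(lookup outer 'x)      ; => 'x.1
(lookup inner 'x)      ; => 'x.2
(symbol? (gensym 'x))  ; => #t, and gensym yields a fresh symbol on each call
```

Because \key{extend} builds a new list instead of mutating the old one,
the table for the outer \key{let} is still intact once the inner
\key{let} has been processed.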
+
+The skeleton of the \key{uniquify} function is shown in
+Figure~\ref{fig:uniquify-s0}.  The function is curried so that it is
+convenient to partially apply it to an association list and then apply
+it to different expressions, as in the last clause for primitive
+operations in Figure~\ref{fig:uniquify-s0}.
+
+\begin{exercise}
+Complete the \key{uniquify} pass by filling in the blanks, that is,
+implement the clauses for variables and for the \key{let} construct.
+\end{exercise}
+
+\begin{figure}[tbp]
+\begin{lstlisting}
+   (define uniquify
+     (lambda (alist)
+       (lambda (e)
+         (match e
+           [(? symbol?) ___]
+           [(? integer?) e]
+           [`(let ([,x ,e]) ,body) ___]
+           [`(program ,info ,e)
+            `(program ,info ,((uniquify alist) e))]
+           [`(,op ,es ...)
+            `(,op ,@(map (uniquify alist) es))]
+           ))))
+\end{lstlisting}
+\caption{Skeleton for the \key{uniquify} pass.}
+\label{fig:uniquify-s0}
+\end{figure}
+
+\begin{exercise}
+Test your \key{uniquify} pass by creating three example $S_0$ programs
+and checking whether the output programs produce the same result as
+the input programs. The $S_0$ programs should be designed to test the
+most interesting parts of the \key{uniquify} pass, that is, the
+programs should include \key{let} constructs, variables, and variables
+that overshadow each other.
+
+[to do: explain the test-compiler function and interpret-S0.]
+
+%% You can use the interpreter \key{interpret-S0} defined in the
+%% \key{interp.rkt} file. The entire sequence of tests should be a short
+%% Racket program so you can re-run all the tests by running the Racket
+%% program. We refer to this as the \emph{regression test} program.
+\end{exercise}
+
 
 
 \section{Flatten Expressions}
 \label{sec:flatten-s0}
 
-The purpose of the \textsf{flatten} pass is to get rid of nested
-expressions, such as the $\UNIOP{-}{10}$ in the following program,
-without changing the behavior of the program.
+The \key{flatten} pass will transform $S_0$ programs into $C_0$
+programs. In particular, the purpose of the \key{flatten} pass is to
+get rid of nested expressions, such as the $\UNIOP{-}{10}$ in the
+following program.
 \[
 \BINOP{+}{52}{ \UNIOP{-}{10} }
 \]
@@ -1281,14 +1336,19 @@ translated to the following one.
 \[
 \begin{array}{l}
 \ASSIGN{ \itm{x} }{ \UNIOP{-}{10} } \\
-\RETURN{ \BINOP{+}{52}{ \itm{x} } }
+\ASSIGN{ \itm{y} }{ \BINOP{+}{52}{ \itm{x} } } \\
+\RETURN{ \itm{y} }
 \end{array}
 \]
 
-We recommend implementing \textsf{flatten} as a recursive function
-that returns two things, 1) the newly flattened expression, and 2) a
-list of assignment statements, one for each of the new variables
-introduced while flattening the expression.
+We recommend implementing \key{flatten} as a structurally recursive
+function that returns two things, 1) the newly flattened expression,
+and 2) a list of assignment statements, one for each of the new
+variables introduced while flattening the expression. You can return
+multiple things from a function using the \key{values} form and you
+can receive multiple things from a function call using the
+\key{define-values} form. If you are not familiar with these
+constructs, the Racket documentation will be of help.
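
As a hedged illustration of \key{values} and \key{define-values} in this
setting, the sketch below flattens only integers, variables, and
primitive operations. It is not the complete \key{flatten} pass: the
\key{let} and \key{program} forms are omitted. The function name
\key{flatten-exp} and the statement shape \key{(assign var exp)} are
assumptions made for the example.

```racket
#lang racket
;; Sketch only: covers integers, variables, and primitive operations.
;; The statement shape (assign var exp) is an assumption.

;; flatten-exp : exp -> (values atom (listof stmt))
;; Returns a simple expression (an integer or a variable) together with
;; the assignments that must run before it can be used.
(define (flatten-exp e)
  (match e
    [(? integer?) (values e '())]
    [(? symbol?) (values e '())]
    [`(,op ,es ...)
     ;; Flatten every argument, collecting atoms and statements in order.
     (define-values (atoms stmts)
       (for/fold ([acc-atoms '()] [acc-stmts '()]) ([arg es])
         (define-values (a ss) (flatten-exp arg))
         (values (append acc-atoms (list a))
                 (append acc-stmts ss))))
     ;; Name the result of this operation with a fresh variable.
     (define tmp (gensym 'tmp))
     (values tmp
             (append stmts (list `(assign ,tmp (,op ,@atoms)))))]))

(define-values (result stmts) (flatten-exp '(+ 52 (- 10))))
;; stmts has the shape ((assign t1 (- 10)) (assign t2 (+ 52 t1)))
;; and result is t2, where t1 and t2 are fresh names from gensym.
```

The statements are kept in evaluation order, so the assignments for an
argument always precede the assignment that uses its result.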
 
 
 Take special care for programs such as the following that initialize
 variables with integers or other variables.
@@ -1302,7 +1362,7 @@ This program should be translated to
 \RETURN{b}
 \]
 and not the following, which could result from a naive implementation
-of \textsf{flatten}.
+of \key{flatten}.
 \[
 \ASSIGN{x.1}{42}\;
 \ASSIGN{a}{x.1}\;
@@ -1311,13 +1371,22 @@ of \textsf{flatten}.
 \RETURN{b}
 \]
 
+\begin{exercise}
+Implement the \key{flatten} pass and test it on all of the example
+programs that you created to test the \key{uniquify} pass and create
+three new example programs that are designed to exercise all of the
+interesting code in the \key{flatten} pass. 
+[to do: add to the test-compiler stuff]
+\end{exercise}
+
+
 \section{Select Instructions}
 \label{sec:select-s0}
 
-In the \textsf{select\_instructions} pass we begin the work of
+In the \key{select\_instructions} pass we begin the work of
 translating from $C_0$ to x86. The target language of this pass is a
 pseudo-x86 language that still uses variables, so we add an AST node
-of the form $\VAR{\itm{var}}$.  The \textsf{select\_instructions} pass
+of the form $\VAR{\itm{var}}$.  The \key{select\_instructions} pass
 deals with the differing format of arithmetic operations. For example,
 in $C_0$ an addition operation could take the following form:
 \[
@@ -1349,9 +1418,9 @@ procedure.
 \label{sec:assign-s0}
 
 As discussed in Section~\ref{sec:plan-s0-x86}, the
-\textsf{assign\_homes} pass places all of the variables on the stack.
+\key{assign\_homes} pass places all of the variables on the stack.
 Consider again the example $S_0$ program $\BINOP{+}{52}{ \UNIOP{-}{10} }$,
-which after \textsf{select\_instructions} looks like the following.
+which after \key{select\_instructions} looks like the following.
 \[
 \begin{array}{l}
 (\key{mov}\;\INT{10}\; \VAR{x})\\
@@ -1361,7 +1430,7 @@ which after \textsf{select\_instructions} looks like the following.
 \end{array}
 \]
 The one and only variable $x$ is assigned to stack location
-\key{-8(\%rbp)}, so the \textsf{assign\_homes} pass translates the
+\key{-8(\%rbp)}, so the \key{assign\_homes} pass translates the
 above to
 \[
 \begin{array}{l}
@@ -1388,7 +1457,7 @@ Consider again the following example.
 \[
 \LET{a}{42}{ \LET{b}{a}{ b }}
 \]
-After \textsf{assign\_homes} pass, the above has been translated to
+After the \key{assign\_homes} pass, the above has been translated to
 \[
 \begin{array}{l}
 (\key{mov} \;\INT{42}\; \STACKLOC{{-}8})\\
@@ -1823,11 +1892,11 @@ shown in Figure~\ref{fig:reg-alloc-passes}.
 \begin{figure}[tbp]
 \[
 \xymatrix{
-  C_0 \ar@/^/[r]^-{\textsf{select\_instr.}}
-    & \text{x86}^{*} \ar[d]^-{\textsf{uncover\_live}} \\
-    & \text{x86}^{*} \ar[d]^-{\textsf{build\_interference}} \\
-    & \text{x86}^{*} \ar[d]_-{\textsf{allocate\_register}} \\
-    & \text{x86}^{*} \ar@/^/[r]^-{\textsf{patch\_instr.}} 
+  C_0 \ar@/^/[r]^-{\key{select\_instr.}}
+    & \text{x86}^{*} \ar[d]^-{\key{uncover\_live}} \\
+    & \text{x86}^{*} \ar[d]^-{\key{build\_interference}} \\
+    & \text{x86}^{*} \ar[d]_-{\key{allocate\_register}} \\
+    & \text{x86}^{*} \ar@/^/[r]^-{\key{patch\_instr.}} 
     & \text{x86} 
 }
 \]