6 years ago · 829dfa1766
--- a/book.tex
+++ b/book.tex
@@ -337,9 +337,12 @@ by the following Racket expression:
 
															 \begin{center}
														
 
															 \texttt{`(+ (read) (- 8))}
														
 
															 \end{center}
														
 
															-The result is a list whose first element is the symbol \code{`+},
														
 
															-second element is a list (containing just one symbol), and third
														
 
															-element is another list (containing a symbol and a number).
														
 
															+When using S-expressions to represent ASTs, the convention is to
														
 
															+represent each AST node as a list and to put the operation symbol at
														
 
															+the front of the list. The rest of the list contains the children.  So
														
 
															+in the above case, the root AST node has operation \code{`+} and its
														
 
															+two children are \code{`(read)} and \code{`(- 8)}, just as in the
														
 
															+diagram \eqref{eq:arith-prog}.
														
 
															 To build larger S-expressions one often needs to splice together
														
 
															 several smaller S-expressions. Racket provides the comma operator to
														
@@ -640,11 +643,12 @@ $R_0$ program.
 
															 \end{minipage}
														
 
															 \end{center}
														
 
															-Indeed, the structural recursion follows the grammar itself.  We can generally
														
 
															-expect to write a recursive function to handle each non-terminal in the
														
 
															-grammar.\footnote{If you read the book \emph{How to Design Programs} 
														
 
															-  \url{http://www.ccs.neu.edu/home/matthias/HtDP2e/}, this principle of
														
 
															-  structuring code according to the data definition is probably quite familiar.}
														
 
															+Indeed, the structural recursion follows the grammar itself.  We can
														
 
															+generally expect to write a recursive function to handle each
														
 
															+non-terminal in the grammar.\footnote{This principle of structuring
														
 
															+  code according to the data definition is advocated in the book
														
 
															+  \emph{How to Design Programs}
														
 
															+  \url{http://www.ccs.neu.edu/home/matthias/HtDP2e/}.}
														
 
															 You may be tempted to write the program with just one function, like this:
														
 
															 \begin{center}
														
@@ -686,49 +690,27 @@ regard~\citep{reynolds72:_def_interp}. Here we warm up by writing an
 
															 interpreter for the $R_0$ language, which serves as a second example
														
 
															 of structural recursion. The \texttt{interp-R0} function is defined in
														
 
															 Figure~\ref{fig:interp-R0}. The body of the function is a match on the
														
 
															-input program \texttt{p} and then a call to the \lstinline{exp} helper
														
 
															-function, which in turn has one match clause per grammar rule for
														
 
															-$R_0$ expressions.
														
 
															-
														
 
															-The \lstinline{exp} function is naturally recursive: clauses for internal AST
														
 
															-nodes make recursive calls on each child node.  Note that the recursive cases
														
 
															-for negation and addition are a place where we could have made use of the
														
 
															-\key{app} feature of Racket's \key{match} to apply a function and bind the
														
 
															-result.  The two recursive cases of \lstinline{interp-R0} would become:
														
 
															-
														
 
															-\begin{minipage}{0.5\textwidth}
														
 
															-\begin{lstlisting}
														
 
															-     [`(- ,(app exp v))  (fx- 0 v)]
														
 
															-     [`(+ ,(app exp v1) ,(app exp v2)) (fx+ v1 v2)]))
														
 
															-\end{lstlisting}
														
 
															-\end{minipage}
														
 
															-
														
 
															-Here we use \lstinline{(app exp v)} to recursively apply \texttt{exp} to the
														
 
															-child node and bind the \emph{result value} to variable \texttt{v}.  The
														
 
															-difference between this version and the code in Figure~\ref{fig:interp-R0} is
														
 
															-mainly stylistic, although if side effects are involved the order of evaluation
														
 
															-can become important.  Further, when we write functions with multiple return
														
 
															-values, the \key{app} form can be convenient for binding the resulting values.
														
 
															+input program \texttt{p} and then a call to the \lstinline{interp-exp}
														
 
															+helper function, which in turn has one match clause per grammar rule
														
 
															+for $R_0$ expressions.
														
 
															 \begin{figure}[tbp]
														
 
															 \begin{lstlisting}
														
 
															+   (define (interp-exp e)
														
 
															+     (match e
														
 
															+       [(? fixnum?) e]
														
 
															+       [`(read)
														
 
															+        (let ([r (read)])
														
 
															+          (cond [(fixnum? r) r]
														
 
															+                [else (error 'interp-R0 "input not an integer" r)]))]
														
 
															+       [`(- ,e1)     (fx- 0 (interp-exp e1))]
														
 
															+       [`(+ ,e1 ,e2) (fx+ (interp-exp e1) (interp-exp e2))]))
														
 
															+
														
 
															    (define (interp-R0 p)
														
 
															-     (define (exp ex)
														
 
															-       (match ex
														
 
															-         [(? fixnum?) ex]
														
 
															-         [`(read)
														
 
															-          (let ([r (read)])
														
 
															-            (cond [(fixnum? r) r]
														
 
															-                  [else (error 'interp-R0 "input not an integer" r)]))]
														
 
															-         [`(- ,e)        (fx- 0 (exp e))]
														
 
															-         [`(+ ,e1 ,e2) (fx+ (exp e1) (exp e2))]))
														
 
															      (match p
														
 
															-       [`(program ,e) (exp e)]))
														
 
															+       [`(program ,e) (interp-exp e)]))
														
 
															 \end{lstlisting}
														
 
															-\caption{Interpreter for the $R_0$ language.
														
 
															-  \rn{Having two functions here for prog/exp wouldn't take much more space.
														
 
															-    I'll change that once I get further.. but I also need to know what the story
														
 
															-   is for running this code?}}
														
 
															+\caption{Interpreter for the $R_0$ language.}
														
 
															 \label{fig:interp-R0}
														
 
															 \end{figure}
														
@@ -748,25 +730,38 @@ each other, in this case nesting several additions and negations.
 
															 \end{lstlisting}
														
 
															 What is the result of the above program?
														
 
															-\noindent
														
 
															-If we interpret the AST \eqref{eq:arith-prog} and give it the input
														
 
															-\texttt{50}
														
 
															+As mentioned previously, the $R_0$ language does not support
														
 
															+arbitrarily large integers, but only $63$-bit integers, so we
														
 
															+interpret the arithmetic operations of $R_0$ using fixnum arithmetic.
														
 
															+What happens when we run the following program?
														
 
															 \begin{lstlisting}
														
 
															-   (interp-R0 ast1.1)
														
 
															+   (define large 999999999999999999)
														
 
															+   (interp-R0 `(program (+ (+ (+ ,large ,large) (+ ,large ,large))
														
 
															+                           (+ (+ ,large ,large) (+ ,large ,large)))))
														
 
															 \end{lstlisting}
														
 
															-we get the answer to life, the universe, and everything:
														
 
															+It produces an error:
														
 
															 \begin{lstlisting}
														
 
															-   42
														
 
															+   fx+: result is not a fixnum
														
 
															 \end{lstlisting}
														
 
															+We shall use the convention that if the interpreter for a language
														
 
															+produces an error when run on a program, then the meaning of the
														
 
															+program is unspecified. The compiler for the language is under no
														
 
															+obligation regarding such a program; it can produce an executable that does
														
 
															+anything.
														
 
															+\noindent
														
 
															 Moving on, the \key{read} operation prompts the user of the program
														
 
															-for an integer. Given an input of \key{10}, the following program
														
 
															-produces \key{42}.
														
 
															+for an integer. If we interpret the AST \eqref{eq:arith-prog} and give
														
 
															+it the input \texttt{50}
														
 
															+\begin{lstlisting}
														
 
															+   (interp-R0 ast1.1)
														
 
															+\end{lstlisting}
														
 
															+we get the answer to life, the universe, and everything:
														
 
															 \begin{lstlisting}
														
 
															-   (+ (read) 32)
														
 
															+   42
														
 
															 \end{lstlisting}
														
 
															-We include the \key{read} operation in $R_1$ so a clever student
														
 
															-cannot implement a compiler for $R_1$ simply by running the
														
 
															+We include the \key{read} operation in $R_0$ so a clever student
														
 
															+cannot implement a compiler for $R_0$ simply by running the
														
 
															 interpreter at compilation time to obtain the output and then
														
 
															 generating the trivial code to return the output.  (A clever student
														
 
															 did this in a previous version of the course.)
														
@@ -845,6 +840,17 @@ partially evaluating the children nodes.
 
															 \label{fig:pe-arith}
														
 
															 \end{figure}
														
 
															+Note that in the recursive cases in \code{pe-arith} for negation and
														
 
															+addition, we have made use of the \key{app} feature of Racket's
														
 
															+\key{match} to apply a function and bind the result.  Here we use
														
 
															+\lstinline{(app pe-arith r1)} to recursively apply \texttt{pe-arith}
														
 
															+to the child node and bind the \emph{result value} to variable
														
 
															+\texttt{r1}.  The choice of whether to use \key{app} is mainly
														
 
															+stylistic, although if side effects are involved the change in order
														
 
															+of evaluation may be an issue.  Further, when we write functions with
														
 
															+multiple return values, the \key{app} form can be convenient for
														
 
															+binding the resulting values.
														
 
															+
														
 
															 Our code for \texttt{pe-neg} and \texttt{pe-add} implements the simple
														
 
															 idea of checking whether the inputs are integers and if they are, to
														
 
															 go ahead and perform the arithmetic.  Otherwise, we use quasiquote to
														
@@ -1004,26 +1010,27 @@ to the variable, then evaluates the body of the \key{let}.
 
															 \begin{figure}[tbp]
														
 
															 \begin{lstlisting}
														
 
															+   (define (interp-exp env)
														
 
															+     (lambda (e)
														
 
															+       (match e
														
 
															+         [(? symbol?) (lookup e env)]
														
 
															+         [`(let ([,x ,(app (interp-exp env) v)]) ,body)
														
 
															+          (define new-env (cons (cons x v) env))
														
 
															+          ((interp-exp new-env) body)]
														
 
															+         [(? fixnum?) e]
														
 
															+         [`(read)
														
 
															+          (define r (read))
														
 
															+          (cond [(fixnum? r) r]
														
 
															+                [else (error 'interp-R1 "expected an integer" r)])]
														
 
															+         [`(- ,(app (interp-exp env) v))
														
 
															+          (fx- 0 v)]
														
 
															+         [`(+ ,(app (interp-exp env) v1) ,(app (interp-exp env) v2))
														
 
															+           (fx+ v1 v2)])))
														
 
															+           
														
 
															    (define (interp-R1 env)
														
 
															-     (define (exp env)
														
 
															-       (lambda (e)
														
 
															-         (match e
														
 
															-           [(? symbol?) (lookup e env)]
														
 
															-           [`(let ([,x ,(app (exp env) v)]) ,body)
														
 
															-            (define new-env (cons (cons x v) env))
														
 
															-            ((exp new-env) body)]
														
 
															-           [(? fixnum?) e]
														
 
															-           [`(read)
														
 
															-            (define r (read))
														
 
															-            (cond [(fixnum? r) r]
														
 
															-                  [else (error 'interp-R1 "expected an integer" r)])]
														
 
															-           [`(- ,(app (exp env) v))
														
 
															-            (fx- 0 v)]
														
 
															-           [`(+ ,(app (exp env) v1) ,(app (exp env) v2))
														
 
															-             (fx+ v1 v2)])))
														
 
															      (lambda (p)
														
 
															        (match p
														
 
															-         [`(program ,e) ((exp '()) e)])))
														
 
															+         [`(program ,e) ((interp-exp '()) e)])))
														
 
															 \end{lstlisting}
														
 
															 \caption{Interpreter for the $R_1$ language.}
														
 
															 \label{fig:interp-R1}
														
@@ -1397,7 +1404,7 @@ $C_0$.
 
															 Each of these steps in the compiler is implemented by a function,
														
 
															 typically a structurally recursive function that translates an input
														
 
															 AST into an output AST. We refer to such a function as a \emph{pass}
														
 
															-because it makes a pass over, i.e. it traverses the entire AST.
														
 
															+because it makes a pass over, i.e. it traverses, the entire AST.
														
 
															 The syntax for $C_0$ is defined in Figure~\ref{fig:c0-syntax}.  The
														
 
															 $C_0$ language supports the same operators as $R_1$ but the arguments
														
@@ -1428,23 +1435,24 @@ C_0 & ::= & (\key{program}\;(\Var^{*})\;\Stmt^{+})
 
															 \label{fig:c0-syntax}
														
 
															 \end{figure}
														
 
															-To get from $C_0$ to x86 assembly it remains for us to handle
														
 
															+To get from $C_0$ to x86 assembly, it remains for us to handle
														
 
															 difference \#1 (the format of instructions) and difference \#3
														
 
															-(variables versus registers). These two differences are intertwined,
														
 
															-creating a bit of a Gordian Knot. To handle difference \#3, we need to
														
 
															-map some variables to registers (there are only 16 registers) and the
														
 
															-remaining variables to locations on the stack (which is unbounded). To
														
 
															-make good decisions regarding this mapping, we need the program to be
														
 
															-close to its final form (in x86 assembly) so we know exactly when
														
 
															-which variables are used. After all, variables that are used in
														
 
															-disjoint parts of the program can be assigned to the same register.
														
 
															-However, our choice of x86 instructions depends on whether the
														
 
															-variables are mapped to registers or stack locations, so we have a
														
 
															-circular dependency. We cut this knot by doing an optimistic selection
														
 
															-of instructions in the \key{select-instructions} pass, followed by the
														
 
															-\key{assign-homes} pass to map variables to registers or stack
														
 
															-locations, and conclude by finalizing the instruction selection in the
														
 
															-\key{patch-instructions} pass.
														
 
															+(variables versus stack locations and registers). These two
														
 
															+differences are intertwined, creating a bit of a Gordian Knot. To
														
 
															+handle difference \#3, we need to map some variables to registers
														
 
															+(there are only 16 registers) and the remaining variables to locations
														
 
															+on the stack (which is unbounded). To make good decisions regarding
														
 
															+this mapping, we need the program to be close to its final form (in
														
 
															+x86 assembly) so we know exactly when which variables are used. After
														
 
															+all, variables that are used at different time periods during program
														
 
															+execution can be assigned to the same register.  However, our choice
														
 
															+of x86 instructions depends on whether the variables are mapped to
														
 
															+registers or stack locations, so we have a circular dependency. We cut
														
 
															+this knot by doing an optimistic selection of instructions in the
														
 
															+\key{select-instructions} pass, followed by the \key{assign-homes}
														
 
															+pass to map variables to registers or stack locations, and conclude by
														
 
															+finalizing the instruction selection in the \key{patch-instructions}
														
 
															+pass.
														
 
															 \[
														
 
															 \begin{tikzpicture}[baseline=(current  bounding  box.center)]
														
 
															 \node (1) at (0,0)  {\large $C_0$};