6 years ago · 829dfa1766
--- a/book.tex
+++ b/book.tex
@@ -337,9 +337,12 @@ by the following Racket expression:
 
				 \begin{center}
			
 
				 \texttt{`(+ (read) (- 8))}
			
 
				 \end{center}
			
 
				-The result is a list whose first element is the symbol \code{`+},
			
 
				-second element is a list (containing just one symbol), and third
			
 
				-element is another list (containing a symbol and a number).
			
 
				+When using S-expressions to represent ASTs, the convention is to
			
 
				+represent each AST node as a list and to put the operation symbol at
			
 
				+the front of the list. The rest of the list contains the children.  So
			
 
				+in the above case, the root AST node has operation \code{`+} and its
			
 
				+two children are \code{`(read)} and \code{`(- 8)}, just as in the
			
 
				+diagram \eqref{eq:arith-prog}.
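				+
				+As a small illustration of this convention, the operation symbol and
				+the children of the root node can be recovered with ordinary list
				+operations.  This is only a sketch; the names \code{ast}, \code{op},
				+and \code{children} are ours and are not used elsewhere in the text.
				+\begin{lstlisting}
				+   #lang racket
				+   ;; The S-expression from above.  Since nothing is spliced in, a
				+   ;; plain quote builds the same list as the backquote.
				+   (define ast '(+ (read) (- 8)))
				+   ;; By convention the operation symbol sits at the front ...
				+   (define op (first ast))          ; the symbol +
				+   ;; ... and the rest of the list contains the children.
				+   (define children (rest ast))     ; the list ((read) (- 8))
				+\end{lstlisting}
				+The same decomposition can also be expressed as a pattern, for
				+example \code{`(,op ,child1 ,child2)}.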
			
 
				 
			
 
				 To build larger S-expressions one often needs to splice together
			
 
				 several smaller S-expressions. Racket provides the comma operator to
			
@@ -640,11 +643,12 @@ $R_0$ program.
 
				 \end{minipage}
			
 
				 \end{center}
			
 
				 
			
 
				-Indeed, the structural recursion follows the grammar itself.  We can generally
			
 
				-expect to write a recursive function to handle each non-terminal in the
			
 
				-grammar.\footnote{If you read the book \emph{How to Design Programs} 
			
 
				-  \url{http://www.ccs.neu.edu/home/matthias/HtDP2e/}, this principle of
			
 
				-  structuring code according to the data definition is probably quite familiar.}
			
 
				+Indeed, the structural recursion follows the grammar itself.  We can
			
 
				+generally expect to write a recursive function to handle each
			
 
				+non-terminal in the grammar.\footnote{This principle of structuring
			
 
				+  code according to the data definition is advocated in the book
			
 
				+  \emph{How to Design Programs}
			
 
				+  \url{http://www.ccs.neu.edu/home/matthias/HtDP2e/}.}
			
 
				 
			
 
				 You may be tempted to write the program with just one function, like this:
			
 
				 \begin{center}
			
@@ -686,49 +690,27 @@ regard~\citep{reynolds72:_def_interp}. Here we warm up by writing an
 
				 interpreter for the $R_0$ language, which serves as a second example
			
 
				 of structural recursion. The \texttt{interp-R0} function is defined in
			
 
				 Figure~\ref{fig:interp-R0}. The body of the function is a match on the
			
 
				-input program \texttt{p} and then a call to the \lstinline{exp} helper
			
 
				-function, which in turn has one match clause per grammar rule for
			
 
				-$R_0$ expressions.
			
 
				-
			
 
				-The \lstinline{exp} function is naturally recursive: clauses for internal AST
			
 
				-nodes make recursive calls on each child node.  Note that the recursive cases
			
 
				-for negation and addition are a place where we could have made use of the
			
 
				-\key{app} feature of Racket's \key{match} to apply a function and bind the
			
 
				-result.  The two recursive cases of \lstinline{interp-R0} would become:
			
 
				-
			
 
				-\begin{minipage}{0.5\textwidth}
			
 
				-\begin{lstlisting}
			
 
				-     [`(- ,(app exp v))  (fx- 0 v)]
			
 
				-     [`(+ ,(app exp v1) ,(app exp v2)) (fx+ v1 v2)]))
			
 
				-\end{lstlisting}
			
 
				-\end{minipage}
			
 
				-
			
 
				-Here we use \lstinline{(app exp v)} to recursively apply \texttt{exp} to the
			
 
				-child node and bind the \emph{result value} to variable \texttt{v}.  The
			
 
				-difference between this version and the code in Figure~\ref{fig:interp-R0} is
			
 
				-mainly stylistic, although if side effects are involved the order of evaluation
			
 
				-can become important.  Further, when we write functions with multiple return
			
 
				-values, the \key{app} form can be convenient for binding the resulting values.
			
 
				+input program \texttt{p} and then a call to the \lstinline{interp-exp}
			
 
				+helper function, which in turn has one match clause per grammar rule
			
 
				+for $R_0$ expressions.
			
 
				 
			
 
				 \begin{figure}[tbp]
			
 
				 \begin{lstlisting}
			
 
				+   (define (interp-exp e)
			
 
				+     (match e
			
 
				+       [(? fixnum?) e]
			
 
				+       [`(read)
			
 
				+        (let ([r (read)])
			
 
				+          (cond [(fixnum? r) r]
			
 
				+                [else (error 'interp-R0 "input not an integer" r)]))]
			
 
				+       [`(- ,e1)     (fx- 0 (interp-exp e1))]
			
 
				+       [`(+ ,e1 ,e2) (fx+ (interp-exp e1) (interp-exp e2))]))
			
 
				+
			
 
				    (define (interp-R0 p)
			
 
				-     (define (exp ex)
			
 
				-       (match ex
			
 
				-         [(? fixnum?) ex]
			
 
				-         [`(read)
			
 
				-          (let ([r (read)])
			
 
				-            (cond [(fixnum? r) r]
			
 
				-                  [else (error 'interp-R0 "input not an integer" r)]))]
			
 
				-         [`(- ,e)        (fx- 0 (exp e))]
			
 
				-         [`(+ ,e1 ,e2) (fx+ (exp e1) (exp e2))]))
			
 
				      (match p
			
 
				-       [`(program ,e) (exp e)]))
			
 
				+       [`(program ,e) (interp-exp e)]))
			
 
				 \end{lstlisting}
			
 
				-\caption{Interpreter for the $R_0$ language.
			
 
				-  \rn{Having two functions here for prog/exp wouldn't take much more space.
			
 
				-    I'll change that once I get further.. but I also need to know what the story
			
 
				-   is for running this code?}}
			
 
				+\caption{Interpreter for the $R_0$ language.}
			
 
				 \label{fig:interp-R0}
			
 
				 \end{figure}
			
 
				 
			
@@ -748,25 +730,38 @@ each other, in this case nesting several additions and negations.
 
				 \end{lstlisting}
			
 
				 What is the result of the above program?
			
 
				 
			
 
				-\noindent
			
 
				-If we interpret the AST \eqref{eq:arith-prog} and give it the input
			
 
				-\texttt{50}
			
 
				+As mentioned previously, the $R_0$ language does not support
			
 
				+arbitrarily large integers, but only $63$-bit integers, so we
			
 
				+interpret the arithmetic operations of $R_0$ using fixnum arithmetic.
			
 
				+What happens when we run the following program?
			
 
				 \begin{lstlisting}
			
 
				-   (interp-R0 ast1.1)
			
 
				+   (define large 999999999999999999)
			
 
				+   (interp-R0 `(program (+ (+ (+ ,large ,large) (+ ,large ,large))
			
 
				+                           (+ (+ ,large ,large) (+ ,large ,large)))))
			
 
				 \end{lstlisting}
			
 
				-we get the answer to life, the universe, and everything:
			
 
				+It produces an error:
			
 
				 \begin{lstlisting}
			
 
				-   42
			
 
				+   fx+: result is not a fixnum
			
 
				 \end{lstlisting}
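				+To see why the sum leaves the fixnum range, it can help to compare
				+against Racket's ordinary arbitrary-precision arithmetic.  The
				+following is a sketch that assumes a 64-bit Racket installation; the
				+name \code{exact-sum} is ours.
				+\begin{lstlisting}
				+   #lang racket
				+   (define large 999999999999999999)
				+   ;; The program above adds eight copies of large.  Ordinary
				+   ;; multiplication never overflows; it produces a bignum instead.
				+   (define exact-sum (* 8 large))
				+   (fixnum? large)      ; #t: each operand is itself a fixnum
				+   (fixnum? exact-sum)  ; #f: the exact sum does not fit in a fixnum
				+\end{lstlisting}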
			
 
				+We shall use the convention that if the interpreter for a language
			
 
				+produces an error when run on a program, then the meaning of the
			
 
				+program is unspecified. The compiler for the language is under no
			
 
				+obligations regarding such a program; it can produce an executable that does
			
 
				+anything.
			
 
				 
			
 
				+\noindent
			
 
				 Moving on, the \key{read} operation prompts the user of the program
			
 
				-for an integer. Given an input of \key{10}, the following program
			
 
				-produces \key{42}.
			
 
				+for an integer. If we interpret the AST \eqref{eq:arith-prog} and give
			
 
				+it the input \texttt{50}
			
 
				+\begin{lstlisting}
			
 
				+   (interp-R0 ast1.1)
			
 
				+\end{lstlisting}
			
 
				+we get the answer to life, the universe, and everything:
			
 
				 \begin{lstlisting}
			
 
				-   (+ (read) 32)
			
 
				+   42
			
 
				 \end{lstlisting}
			
 
				-We include the \key{read} operation in $R_1$ so a clever student
			
 
				-cannot implement a compiler for $R_1$ simply by running the
			
 
				+We include the \key{read} operation in $R_0$ so a clever student
			
 
				+cannot implement a compiler for $R_0$ simply by running the
			
 
				 interpreter at compilation time to obtain the output and then
			
 
				 generating the trivial code to return the output.  (A clever student
			
 
				 did this in a previous version of the course.)
			
@@ -845,6 +840,17 @@ partially evaluating the children nodes.
 
				 \label{fig:pe-arith}
			
 
				 \end{figure}
			
 
				 
			
 
				+Note that in the recursive cases in \code{pe-arith} for negation and
			
 
				+addition, we have made use of the \key{app} feature of Racket's
			
 
				+\key{match} to apply a function and bind the result.  Here we use
			
 
				+\lstinline{(app pe-arith r1)} to recursively apply \texttt{pe-arith}
			
 
				+to the child node and bind the \emph{result value} to variable
			
 
				+\texttt{r1}.  The choice of whether to use \key{app} is mainly
			
 
				+stylistic, although if side effects are involved the change in order
			
 
				+of evaluation may be an issue.  Further, when we write functions with
			
 
				+multiple return values, the \key{app} form can be convenient for
			
 
				+binding the resulting values.
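				+
				+For comparison, here is a small self-contained sketch of the two
				+styles.  The functions \code{eval-plain} and \code{eval-app} are
				+hypothetical; they handle only integer literals, negation, and
				+addition, and they use ordinary arithmetic rather than fixnum
				+operations.
				+\begin{lstlisting}
				+   #lang racket
				+   ;; Style 1: bind the child ASTs, then recur in the clause body.
				+   (define (eval-plain e)
				+     (match e
				+       [(? integer?) e]
				+       [`(- ,e1)     (- 0 (eval-plain e1))]
				+       [`(+ ,e1 ,e2) (+ (eval-plain e1) (eval-plain e2))]))
				+
				+   ;; Style 2: use app to recur during matching, binding the results.
				+   (define (eval-app e)
				+     (match e
				+       [(? integer?) e]
				+       [`(- ,(app eval-app v))  (- 0 v)]
				+       [`(+ ,(app eval-app v1) ,(app eval-app v2)) (+ v1 v2)]))
				+
				+   (eval-plain '(+ 10 (- 8)))   ; 2
				+   (eval-app   '(+ 10 (- 8)))   ; 2
				+\end{lstlisting}
				+In the second style the recursive calls happen while the pattern is
				+being matched, before the body of the clause runs.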
			
 
				+
			
 
				 Our code for \texttt{pe-neg} and \texttt{pe-add} implements the simple
			
 
				 idea of checking whether the inputs are integers and if they are, to
			
 
				 go ahead and perform the arithmetic.  Otherwise, we use quasiquote to
			
@@ -1004,26 +1010,27 @@ to the variable, then evaluates the body of the \key{let}.
 
				 
			
 
				 \begin{figure}[tbp]
			
 
				 \begin{lstlisting}
			
 
				+   (define (interp-exp env)
			
 
				+     (lambda (e)
			
 
				+       (match e
			
 
				+         [(? symbol?) (lookup e env)]
			
 
				+         [`(let ([,x ,(app (interp-exp env) v)]) ,body)
			
 
				+          (define new-env (cons (cons x v) env))
			
 
				+          ((interp-exp new-env) body)]
			
 
				+         [(? fixnum?) e]
			
 
				+         [`(read)
			
 
				+          (define r (read))
			
 
				+          (cond [(fixnum? r) r]
			
 
				+                [else (error 'interp-R1 "expected an integer" r)])]
			
 
				+         [`(- ,(app (interp-exp env) v))
			
 
				+          (fx- 0 v)]
			
 
				+         [`(+ ,(app (interp-exp env) v1) ,(app (interp-exp env) v2))
			
 
				+           (fx+ v1 v2)])))
			
 
				+           
			
 
				    (define (interp-R1 env)
			
 
				-     (define (exp env)
			
 
				-       (lambda (e)
			
 
				-         (match e
			
 
				-           [(? symbol?) (lookup e env)]
			
 
				-           [`(let ([,x ,(app (exp env) v)]) ,body)
			
 
				-            (define new-env (cons (cons x v) env))
			
 
				-            ((exp new-env) body)]
			
 
				-           [(? fixnum?) e]
			
 
				-           [`(read)
			
 
				-            (define r (read))
			
 
				-            (cond [(fixnum? r) r]
			
 
				-                  [else (error 'interp-R1 "expected an integer" r)])]
			
 
				-           [`(- ,(app (exp env) v))
			
 
				-            (fx- 0 v)]
			
 
				-           [`(+ ,(app (exp env) v1) ,(app (exp env) v2))
			
 
				-             (fx+ v1 v2)])))
			
 
				      (lambda (p)
			
 
				        (match p
			
 
				-         [`(program ,e) ((exp '()) e)])))
			
 
				+         [`(program ,e) ((interp-exp '()) e)])))
			
 
				 \end{lstlisting}
			
 
				 \caption{Interpreter for the $R_1$ language.}
			
 
				 \label{fig:interp-R1}
			
@@ -1397,7 +1404,7 @@ $C_0$.
 
				 Each of these steps in the compiler is implemented by a function,
			
 
				 typically a structurally recursive function that translates an input
			
 
				 AST into an output AST. We refer to such a function as a \emph{pass}
			
 
				-because it makes a pass over, i.e. it traverses the entire AST.
			
 
				+because it makes a pass over, i.e., it traverses, the entire AST.
			
 
				 
			
 
				 The syntax for $C_0$ is defined in Figure~\ref{fig:c0-syntax}.  The
			
 
				 $C_0$ language supports the same operators as $R_1$ but the arguments
			
@@ -1428,23 +1435,24 @@ C_0 & ::= & (\key{program}\;(\Var^{*})\;\Stmt^{+})
 
				 \label{fig:c0-syntax}
			
 
				 \end{figure}
			
 
				 
			
 
				-To get from $C_0$ to x86 assembly it remains for us to handle
			
 
				+To get from $C_0$ to x86 assembly, it remains for us to handle
			
 
				 difference \#1 (the format of instructions) and difference \#3
			
 
				-(variables versus registers). These two differences are intertwined,
			
 
				-creating a bit of a Gordian Knot. To handle difference \#3, we need to
			
 
				-map some variables to registers (there are only 16 registers) and the
			
 
				-remaining variables to locations on the stack (which is unbounded). To
			
 
				-make good decisions regarding this mapping, we need the program to be
			
 
				-close to its final form (in x86 assembly) so we know exactly when
			
 
				-which variables are used. After all, variables that are used in
			
 
				-disjoint parts of the program can be assigned to the same register.
			
 
				-However, our choice of x86 instructions depends on whether the
			
 
				-variables are mapped to registers or stack locations, so we have a
			
 
				-circular dependency. We cut this knot by doing an optimistic selection
			
 
				-of instructions in the \key{select-instructions} pass, followed by the
			
 
				-\key{assign-homes} pass to map variables to registers or stack
			
 
				-locations, and conclude by finalizing the instruction selection in the
			
 
				-\key{patch-instructions} pass.
			
 
				+(variables versus stack locations and registers). These two
			
 
				+differences are intertwined, creating a bit of a Gordian Knot. To
			
 
				+handle difference \#3, we need to map some variables to registers
			
 
				+(there are only 16 registers) and the remaining variables to locations
			
 
				+on the stack (which is unbounded). To make good decisions regarding
			
 
				+this mapping, we need the program to be close to its final form (in
			
 
				+x86 assembly) so we know exactly which variables are used when. After
			
 
				+all, variables that are used at different time periods during program
			
 
				+execution can be assigned to the same register.  However, our choice
			
 
				+of x86 instructions depends on whether the variables are mapped to
			
 
				+registers or stack locations, so we have a circular dependency. We cut
			
 
				+this knot by doing an optimistic selection of instructions in the
			
 
				+\key{select-instructions} pass, followed by the \key{assign-homes}
			
 
				+pass to map variables to registers or stack locations, and conclude by
			
 
				+finalizing the instruction selection in the \key{patch-instructions}
			
 
				+pass.
			
 
				 \[
			
 
				 \begin{tikzpicture}[baseline=(current  bounding  box.center)]
			
 
				 \node (1) at (0,0)  {\large $C_0$};