Jeremy Siek 6 years ago
parent
commit
829dfa1766
1 changed files with 98 additions and 90 deletions
book.tex

@@ -337,9 +337,12 @@ by the following Racket expression:
 \begin{center}
 \texttt{`(+ (read) (- 8))}
 \end{center}
-The result is a list whose first element is the symbol \code{`+},
-second element is a list (containing just one symbol), and third
-element is another list (containing a symbol and a number).
+When using S-expressions to represent ASTs, the convention is to
+represent each AST node as a list and to put the operation symbol at
+the front of the list. The rest of the list contains the children.  So
+in the above case, the root AST node has operation \code{`+} and its
+two children are \code{`(read)} and \code{`(- 8)}, just as in the
+diagram \eqref{eq:arith-prog}.
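As a small illustration of this convention, the operation and the
children of the root node can be taken apart with ordinary list
functions. This is only a sketch: the names \code{ast}, \code{op}, and
\code{children} are chosen here for the example and are not part of
the book's code.
\begin{lstlisting}
#lang racket
;; The AST for the program above. Because the list is quoted,
;; (read) is data here, not a call to the read function.
(define ast `(+ (read) (- 8)))

(define op (first ast))       ; the operation symbol: '+
(define children (rest ast))  ; the child nodes: '((read) (- 8))
\end{lstlisting}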
 
 
 To build larger S-expressions one often needs to splice together
 several smaller S-expressions. Racket provides the comma operator to
@@ -640,11 +643,12 @@ $R_0$ program.
 \end{minipage}
 \end{center}
 
-Indeed, the structural recursion follows the grammar itself.  We can generally
-expect to write a recursive function to handle each non-terminal in the
-grammar.\footnote{If you read the book \emph{How to Design Programs} 
-  \url{http://www.ccs.neu.edu/home/matthias/HtDP2e/}, this principle of
-  structuring code according to the data definition is probably quite familiar.}
+Indeed, the structural recursion follows the grammar itself.  We can
+generally expect to write a recursive function to handle each
+non-terminal in the grammar.\footnote{This principle of structuring
+  code according to the data definition is advocated in the book
+  \emph{How to Design Programs}
+  \url{http://www.ccs.neu.edu/home/matthias/HtDP2e/}.}
 
 
 You may be tempted to write the program with just one function, like this:
 \begin{center}
@@ -686,49 +690,27 @@ regard~\citep{reynolds72:_def_interp}. Here we warm up by writing an
 interpreter for the $R_0$ language, which serves as a second example
 of structural recursion. The \texttt{interp-R0} function is defined in
 Figure~\ref{fig:interp-R0}. The body of the function is a match on the
-input program \texttt{p} and then a call to the \lstinline{exp} helper
-function, which in turn has one match clause per grammar rule for
-$R_0$ expressions.
-
-The \lstinline{exp} function is naturally recursive: clauses for internal AST
-nodes make recursive calls on each child node.  Note that the recursive cases
-for negation and addition are a place where we could have made use of the
-\key{app} feature of Racket's \key{match} to apply a function and bind the
-result.  The two recursive cases of \lstinline{interp-R0} would become:
-
-\begin{minipage}{0.5\textwidth}
-\begin{lstlisting}
-     [`(- ,(app exp v))  (fx- 0 v)]
-     [`(+ ,(app exp v1) ,(app exp v2)) (fx+ v1 v2)]))
-\end{lstlisting}
-\end{minipage}
-
-Here we use \lstinline{(app exp v)} to recursively apply \texttt{exp} to the
-child node and bind the \emph{result value} to variable \texttt{v}.  The
-difference between this version and the code in Figure~\ref{fig:interp-R0} is
-mainly stylistic, although if side effects are involved the order of evaluation
-can become important.  Further, when we write functions with multiple return
-values, the \key{app} form can be convenient for binding the resulting values.
+input program \texttt{p} and then a call to the \lstinline{interp-exp}
+helper function, which in turn has one match clause per grammar rule
+for $R_0$ expressions.
 
 
 \begin{figure}[tbp]
 \begin{lstlisting}
+   (define (interp-exp e)
+     (match e
+       [(? fixnum?) e]
+       [`(read)
+        (let ([r (read)])
+          (cond [(fixnum? r) r]
+                [else (error 'interp-R0 "input not an integer" r)]))]
+       [`(- ,e1)     (fx- 0 (interp-exp e1))]
+       [`(+ ,e1 ,e2) (fx+ (interp-exp e1) (interp-exp e2))]))
+
    (define (interp-R0 p)
-     (define (exp ex)
-       (match ex
-         [(? fixnum?) ex]
-         [`(read)
-          (let ([r (read)])
-            (cond [(fixnum? r) r]
-                  [else (error 'interp-R0 "input not an integer" r)]))]
-         [`(- ,e)        (fx- 0 (exp e))]
-         [`(+ ,e1 ,e2) (fx+ (exp e1) (exp e2))]))
      (match p
-       [`(program ,e) (exp e)]))
+       [`(program ,e) (interp-exp e)]))
 \end{lstlisting}
-\caption{Interpreter for the $R_0$ language.
-  \rn{Having two functions here for prog/exp wouldn't take much more space.
-    I'll change that once I get further.. but I also need to know what the story
-   is for running this code?}}
+\caption{Interpreter for the $R_0$ language.}
 \label{fig:interp-R0}
 \end{figure}
 
@@ -748,25 +730,38 @@ each other, in this case nesting several additions and negations.
 \end{lstlisting}
 What is the result of the above program?
 
-\noindent
-If we interpret the AST \eqref{eq:arith-prog} and give it the input
-\texttt{50}
+As mentioned previously, the $R_0$ language does not support
+arbitrarily large integers, but only $63$-bit integers, so we
+interpret the arithmetic operations of $R_0$ using fixnum arithmetic.
+What happens when we run the following program?
 \begin{lstlisting}
-   (interp-R0 ast1.1)
+   (define large 999999999999999999)
+   (interp-R0 `(program (+ (+ (+ ,large ,large) (+ ,large ,large))
+                           (+ (+ ,large ,large) (+ ,large ,large)))))
 \end{lstlisting}
-we get the answer to life, the universe, and everything:
+It produces an error:
 \begin{lstlisting}
-   42
+   fx+: result is not a fixnum
 \end{lstlisting}
+We shall use the convention that if the interpreter for a language
+produces an error when run on a program, then the meaning of the
+program is unspecified. The compiler for the language is under no
+obligation regarding such a program; it can produce an executable
+that does anything.
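The overflow can also be observed directly, without going through the
interpreter. The following is a minimal sketch, assuming Racket is
running in its default safe mode, where \code{fx+} checks that its
result is a fixnum. The helper name \code{try-sum} is hypothetical and
not part of the book's code.
\begin{lstlisting}
#lang racket
(require racket/fixnum)

(define large 999999999999999999)

;; Adds the numbers in xs with fx+. If any fixnum operation raises
;; an exception (for example, on overflow), returns the symbol 'error.
(define (try-sum xs)
  (with-handlers ([exn:fail? (lambda (e) 'error)])
    (for/fold ([acc 0]) ([x xs])
      (fx+ acc x))))

(try-sum (list 1 2 3))         ; small inputs stay within range
(try-sum (make-list 8 large))  ; eight copies of large overflow
\end{lstlisting}
Catching the exception here is only for demonstration; under the
convention above, a compiler need not reproduce this behavior.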
 
 
+\noindent
 Moving on, the \key{read} operation prompts the user of the program
-for an integer. Given an input of \key{10}, the following program
-produces \key{42}.
+for an integer. If we interpret the AST \eqref{eq:arith-prog} and give
+it the input \texttt{50}
+\begin{lstlisting}
+   (interp-R0 ast1.1)
+\end{lstlisting}
+we get the answer to life, the universe, and everything:
 \begin{lstlisting}
-   (+ (read) 32)
+   42
 \end{lstlisting}
-We include the \key{read} operation in $R_1$ so a clever student
-cannot implement a compiler for $R_1$ simply by running the
+We include the \key{read} operation in $R_0$ so a clever student
+cannot implement a compiler for $R_0$ simply by running the
 interpreter at compilation time to obtain the output and then
 generating the trivial code to return the output.  (A clever student
 did this in a previous version of the course.)
@@ -845,6 +840,17 @@ partially evaluating the children nodes.
 \label{fig:pe-arith}
 \end{figure}
 
+Note that in the recursive cases in \code{pe-arith} for negation and
+addition, we have made use of the \key{app} feature of Racket's
+\key{match} to apply a function and bind the result.  Here we use
+\lstinline{(app pe-arith r1)} to recursively apply \texttt{pe-arith}
+to the child node and bind the \emph{result value} to variable
+\texttt{r1}.  The choice of whether to use \key{app} is mainly
+stylistic, although if side effects are involved the change in order
+of evaluation may be an issue.  Further, when we write functions with
+multiple return values, the \key{app} form can be convenient for
+binding the resulting values.
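To see the \key{app} pattern in isolation, consider the following
small evaluator. It is a sketch written for this note, not the book's
\code{pe-arith}; the name \code{eval-arith} is hypothetical. Each
\lstinline{(app eval-arith v)} pattern calls \code{eval-arith} on the
matched child and binds the result to \code{v}.
\begin{lstlisting}
#lang racket
(require racket/fixnum)

;; Evaluates an arithmetic S-expression built from fixnums,
;; unary negation, and binary addition.
(define (eval-arith e)
  (match e
    [(? fixnum?) e]
    ;; v is the value of the child, not the child AST itself
    [`(- ,(app eval-arith v)) (fx- 0 v)]
    [`(+ ,(app eval-arith v1) ,(app eval-arith v2)) (fx+ v1 v2)]))

(eval-arith '(+ 10 (- (+ 5 3))))  ; 10 + -(5 + 3) = 2
\end{lstlisting}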
+
 Our code for \texttt{pe-neg} and \texttt{pe-add} implements the simple
 idea of checking whether the inputs are integers and if they are, to
 go ahead and perform the arithmetic.  Otherwise, we use quasiquote to
@@ -1004,26 +1010,27 @@ to the variable, then evaluates the body of the \key{let}.
 
 
 \begin{figure}[tbp]
 \begin{lstlisting}
+   (define (interp-exp env)
+     (lambda (e)
+       (match e
+         [(? symbol?) (lookup e env)]
+         [`(let ([,x ,(app (interp-exp env) v)]) ,body)
+          (define new-env (cons (cons x v) env))
+          ((interp-exp new-env) body)]
+         [(? fixnum?) e]
+         [`(read)
+          (define r (read))
+          (cond [(fixnum? r) r]
+                [else (error 'interp-R1 "expected an integer" r)])]
+         [`(- ,(app (interp-exp env) v))
+          (fx- 0 v)]
+         [`(+ ,(app (interp-exp env) v1) ,(app (interp-exp env) v2))
+           (fx+ v1 v2)])))
+           
    (define (interp-R1 env)
-     (define (exp env)
-       (lambda (e)
-         (match e
-           [(? symbol?) (lookup e env)]
-           [`(let ([,x ,(app (exp env) v)]) ,body)
-            (define new-env (cons (cons x v) env))
-            ((exp new-env) body)]
-           [(? fixnum?) e]
-           [`(read)
-            (define r (read))
-            (cond [(fixnum? r) r]
-                  [else (error 'interp-R1 "expected an integer" r)])]
-           [`(- ,(app (exp env) v))
-            (fx- 0 v)]
-           [`(+ ,(app (exp env) v1) ,(app (exp env) v2))
-             (fx+ v1 v2)])))
      (lambda (p)
        (match p
-         [`(program ,e) ((exp '()) e)])))
+         [`(program ,e) ((interp-exp '()) e)])))
 \end{lstlisting}
 \caption{Interpreter for the $R_1$ language.}
 \label{fig:interp-R1}
@@ -1397,7 +1404,7 @@ $C_0$.
 Each of these steps in the compiler is implemented by a function,
 typically a structurally recursive function that translates an input
 AST into an output AST. We refer to such a function as a \emph{pass}
-because it makes a pass over, i.e. it traverses the entire AST.
+because it makes a pass over, i.e. it traverses, the entire AST.
 
 
 The syntax for $C_0$ is defined in Figure~\ref{fig:c0-syntax}.  The
 $C_0$ language supports the same operators as $R_1$ but the arguments
@@ -1428,23 +1435,24 @@ C_0 & ::= & (\key{program}\;(\Var^{*})\;\Stmt^{+})
 \label{fig:c0-syntax}
 \end{figure}
 
-To get from $C_0$ to x86 assembly it remains for us to handle
+To get from $C_0$ to x86 assembly, it remains for us to handle
 difference \#1 (the format of instructions) and difference \#3
-(variables versus registers). These two differences are intertwined,
-creating a bit of a Gordian Knot. To handle difference \#3, we need to
-map some variables to registers (there are only 16 registers) and the
-remaining variables to locations on the stack (which is unbounded). To
-make good decisions regarding this mapping, we need the program to be
-close to its final form (in x86 assembly) so we know exactly when
-which variables are used. After all, variables that are used in
-disjoint parts of the program can be assigned to the same register.
-However, our choice of x86 instructions depends on whether the
-variables are mapped to registers or stack locations, so we have a
-circular dependency. We cut this knot by doing an optimistic selection
-of instructions in the \key{select-instructions} pass, followed by the
-\key{assign-homes} pass to map variables to registers or stack
-locations, and conclude by finalizing the instruction selection in the
-\key{patch-instructions} pass.
+(variables versus stack locations and registers). These two
+differences are intertwined, creating a bit of a Gordian Knot. To
+handle difference \#3, we need to map some variables to registers
+(there are only 16 registers) and the remaining variables to locations
+on the stack (which is unbounded). To make good decisions regarding
+this mapping, we need the program to be close to its final form (in
+x86 assembly) so we know exactly which variables are used when. After
+all, variables that are used during different periods of program
+execution can be assigned to the same register.  However, our choice
+of x86 instructions depends on whether the variables are mapped to
+registers or stack locations, so we have a circular dependency. We cut
+this knot by doing an optimistic selection of instructions in the
+\key{select-instructions} pass, followed by the \key{assign-homes}
+pass to map variables to registers or stack locations, and conclude by
+finalizing the instruction selection in the \key{patch-instructions}
+pass.
 \[
 \begin{tikzpicture}[baseline=(current  bounding  box.center)]
 \node (1) at (0,0)  {\large $C_0$};