6 years ago · 9d523a447e
--- a/book.tex
+++ b/book.tex
@@ -261,14 +261,16 @@ the compilers memory, rather than programs as they are stored on disk, in
 
				 ASTs can be represented in many different ways, depending on the programming
			
 
				 language used to write the compiler.
			
 
				 %
			
 
				-Because this book uses Racket (\url{http://racket-lang.org}), a descendant of
			
 
				-Scheme, we use S-expressions to represent programs (Section~\ref{sec:ast})
			
 
				-and pattern matching to inspect individual nodes in an AST
			
 
				-(Section~\ref{sec:pattern-matching}).  We use recursion to construct
			
 
				-and deconstruct entire ASTs (Section~\ref{sec:recursion}).
			
 
				-This chapter provides an introduction to these ideas.
			
 
				-
			
 
				-\section{Abstract Syntax Trees}
			
 
				+Because this book uses Racket (\url{http://racket-lang.org}), a
			
 
				+descendant of Lisp, we use S-expressions to represent programs
			
 
				+(Section~\ref{sec:ast}), grammars to define programming languages
			
 
				+(Section~\ref{sec:grammar}), and pattern matching to inspect
			
 
				+individual nodes in an AST (Section~\ref{sec:pattern-matching}).  We
			
 
				+use recursion to construct and deconstruct entire ASTs
			
 
				+(Section~\ref{sec:recursion}).  This chapter provides a brief
			
 
				+introduction to these ideas.
			
 
				+
			
 
				+\section{Abstract Syntax Trees and S-expressions}
			
 
				 \label{sec:ast}
			
 
				 
			
 
				 The primary data structure that is commonly used for representing
			
@@ -311,11 +313,15 @@ Recall that an \emph{symbolic expression} (S-expression) is either
 
				 \item a pair of two S-expressions, written $(e_1 \key{.} e_2)$,
			
 
				     where $e_1$ and $e_2$ are each an S-expression.
			
 
				 \end{enumerate}
			
 
				-An \emph{atom} can be a symbol, such as \code{'hello}, a number, the null
			
 
				-value \code{'()}, etc. It is quite common to use S-expressions
			
 
				+An \emph{atom} can be a symbol, such as \code{`hello}, a number, the null
			
 
				+value \code{'()}, etc.
			
 
				+We can create an S-expression in Racket simply by writing a backquote
			
 
				+(called a quasiquote in Racket)
			
 
				+followed by the textual representation of the S-expression.
			
 
				+It is quite common to use S-expressions
			
 
				 to represent a list, such as $a, b, c$, in the following way:
			
 
				 \begin{lstlisting}
			
 
				-    '(a . (b . (c . ())))
			
 
				+    `(a . (b . (c . ())))
			
 
				 \end{lstlisting}
			
 
				 Each element of the list is in the first slot of a pair, and the
			
 
				 second slot is either the rest of the list or the null value, to mark
			
@@ -323,21 +329,42 @@ the end of the list. Such lists are so common that Racket provides
 
				 special notation for them that removes the need for the periods
			
 
				 and so many parentheses:
			
 
				 \begin{lstlisting}
			
 
				-    '(a b c)
			
 
				+    `(a b c)
			
 
				 \end{lstlisting}
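				+
				+As a minimal sketch, the two notations can be compared directly in
				+Racket. The names \code{dotted} and \code{short} are hypothetical and
				+serve only as illustration.
				+\begin{lstlisting}
				+#lang racket
				+;; dotted and short are hypothetical names used only in this sketch.
				+(define dotted `(a . (b . (c . ()))))
				+(define short  `(a b c))
				+(equal? dotted short)  ; #t: both denote the same list
				+(car short)            ; 'a, the first slot of the outer pair
				+(cdr short)            ; '(b c), the rest of the list
				+\end{lstlisting}
				+The shorthand changes only how the list is written, not the chain of
				+pairs that is built.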
			
 
				-Thus, the S-expression of \eqref{eq:arith-prog} is a list whose first
			
 
				-element is the symbol \code{'+}, whose second element is a list
			
 
				-(containing just one element, the symbol \code{read}), and whose third
			
 
				-element is another list (containing two atoms).
			
 
				+For another example,
			
 
				+an S-expression to represent the AST \eqref{eq:arith-prog} is created
			
 
				+by the following Racket expression:
			
 
				+\begin{center}
			
 
				+\texttt{`(+ (read) (- 8))}
			
 
				+\end{center}
			
 
				+The result is a list whose first element is the symbol \code{`+},
			
 
				+whose second element is a list (containing just one symbol), and whose
			
 
				+third element is another list (containing a symbol and a number).
			
 
				+
			
 
				+To build larger S-expressions one often needs to splice together
			
 
				+several smaller S-expressions. Racket provides the comma operator to
			
 
				+splice an S-expression into a larger one. For example, instead of
			
 
				+creating the S-expression for AST \eqref{eq:arith-prog} all at once,
			
 
				+we could have first created an S-expression for AST
			
 
				+\eqref{eq:arith-neg8} and then spliced that into the addition
			
 
				+S-expression.
			
 
				+\begin{lstlisting}
			
 
				+   (define ast1.4 `(- 8))
			
 
				+   (define ast1.1 `(+ (read) ,ast1.4))
			
 
				+\end{lstlisting}
			
 
				+In general, the Racket expression that follows the comma (splice)
			
 
				+can be any expression that computes an S-expression.
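				+
				+The following sketch illustrates this point under one assumption:
				+\code{make-neg} is a hypothetical helper introduced here, not a
				+function defined elsewhere in the book.
				+\begin{lstlisting}
				+#lang racket
				+(define ast1.4 `(- 8))
				+(define ast1.1 `(+ (read) ,ast1.4))
				+(equal? ast1.1 `(+ (read) (- 8)))   ; #t
				+
				+;; make-neg is a hypothetical helper that builds a negation node.
				+(define (make-neg n) `(- ,n))
				+;; A function call may follow the comma; its result is inserted.
				+(equal? `(+ (read) ,(make-neg 8)) ast1.1)   ; #t
				+\end{lstlisting}
				+Note that \code{(read)} inside the backquote is plain data: it is not
				+evaluated, because no comma precedes it.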
			
 
				+
			
 
				 
			
 
				-When deciding how to compile the above program, we need to know that
			
 
				-the root node operation is addition and that it has two children:
			
 
				-\texttt{read} and a negation. The abstract syntax tree data structure
			
 
				-directly supports these queries and hence is a good choice. In this
			
 
				-book, we will often write down the textual representation of a program
			
 
				-even when we really have in mind the AST because the textual
			
 
				-representation is more concise.  We recommend that, in your mind, you
			
 
				-always interpret programs as abstract syntax trees.
			
 
				+When deciding how to compile program \eqref{eq:arith-prog}, we need to
			
 
				+know that the operation associated with the root node is addition and
			
 
				+that it has two children: \texttt{read} and a negation. The AST data
			
 
				+structure directly supports these queries, as we shall see in
			
 
				+Section~\ref{sec:pattern-matching}, and hence is a good choice for use
			
 
				+in compilers. In this book, we will often write down the S-expression
			
 
				+representation of a program even when we really have in mind the AST
			
 
				+because the S-expression is more concise.  We recommend that, in your
			
 
				+mind, you always think of programs as abstract syntax trees.
			
 
				 
			
 
				 \section{Grammars}
			
 
				 \label{sec:grammar}
			
@@ -431,8 +458,8 @@ language with a grammar, we implicitly mean for the language to be the
 
				 smallest set of programs that are justified by the rules. That is, the
			
 
				 language only includes those programs that the rules allow.
			
 
				 
			
 
				-The last grammar for $R_0$ states that there is a \key{program} node
			
 
				-to mark the top of the whole program:
			
 
				+The last grammar rule for $R_0$ states that there is a \key{program}
			
 
				+node to mark the top of the whole program:
			
 
				 \[
			
 
				   R_0 ::= (\key{program} \; \Exp)
			
 
				 \]
			
@@ -467,34 +494,8 @@ R_0  &::=& (\key{program} \; \Exp)
 
				 \label{fig:r0-syntax}
			
 
				 \end{figure}
			
 
				 
			
 
				-\section{S-Expressions}
			
 
				-\label{sec:s-expr}
			
 
				 
			
 
				-Racket, as a descendant of Lisp, has
			
 
				-convenient support for creating and manipulating abstract syntax trees
			
 
				-with its \emph{symbolic expression} feature, or S-expression for
			
 
				-short. We can create an S-expression simply by writing a backquote
			
 
				-followed by the textual representation of the AST. (Technically
			
 
				-speaking, this is called a \emph{quasiquote} in Racket.)  For example,
			
 
				-an S-expression to represent the AST \eqref{eq:arith-prog} is created
			
 
				-by the following Racket expression:
			
 
				-\begin{center}
			
 
				-\texttt{`(+ (read) (- 8))}
			
 
				-\end{center}
			
 
				 
			
 
				-To build larger S-expressions one often needs to splice together
			
 
				-several smaller S-expressions. Racket provides the comma operator to
			
 
				-splice an S-expression into a larger one. For example, instead of
			
 
				-creating the S-expression for AST \eqref{eq:arith-prog} all at once,
			
 
				-we could have first created an S-expression for AST
			
 
				-\eqref{eq:arith-neg8} and then spliced that into the addition
			
 
				-S-expression.
			
 
				-\begin{lstlisting}
			
 
				-   (define ast1.4 `(- 8))
			
 
				-   (define ast1.1 `(+ (read) ,ast1.4))
			
 
				-\end{lstlisting}
			
 
				-In general, the Racket expression that follows the comma (splice)
			
 
				-can be any expression that computes an S-expression.
			
 
				 
			
 
				 \section{Pattern Matching}
			
 
				 \label{sec:pattern-matching}
			
@@ -529,7 +530,7 @@ The \texttt{match} form takes AST \eqref{eq:arith-prog} and binds its
 
				 parts to the three variables \texttt{op}, \texttt{child1}, and
			
 
				 \texttt{child2}. In general, a match clause consists of a
			
 
				 \emph{pattern} and a \emph{body}. The pattern is a quoted S-expression
			
 
				-that may contain pattern-variables (preceded by a comma).
			
 
				+that may contain pattern-variables (each one preceded by a comma).
			
 
				 %
			
 
				 The pattern is not the same thing as a quasiquote expression used to
			
 
				 \emph{construct} ASTs; however, the similarity is intentional: constructing and
			
@@ -565,7 +566,8 @@ S-expression to see if it is a machine-representable integer.
 
				 \end{minipage}
			
 
				 \vrule
			
 
				 \begin{minipage}{0.25\textwidth}
			
 
				-\begin{lstlisting}
			
 
				+  \begin{lstlisting}
			
 
				+    
			
 
				 
			
 
				 
			
 
				 
			
@@ -583,31 +585,34 @@ S-expression to see if it is a machine-representable integer.
 
				 \section{Recursion}
			
 
				 \label{sec:recursion}
			
 
				 
			
 
				-Programs are inherently recursive in that an $R_0$ $\Exp$ AST is made
			
 
				-up of smaller expressions. Thus, the natural way to process in
			
 
				+Programs are inherently recursive in that an $R_0$ expression ($\Exp$)
			
 
				+is made up of smaller expressions. Thus, the natural way to process an
			
 
				 entire program is with a recursive function.  As a first example of
			
 
				-such a function, we define \texttt{R0?} below, which takes an
			
 
				+such a function, we define \texttt{exp?} below, which takes an
			
 
				 arbitrary S-expression, {\tt sexp}, and determines whether or not {\tt
			
 
				-  sexp} is in {\tt arith}. Note that each match clause corresponds to
			
 
				-one grammar rule for $R_0$ and the body of each clause makes a
			
 
				+  sexp} is an $R_0$ expression. Note that each match clause
			
 
				+corresponds to one grammar rule and the body of each clause makes a
			
 
				 recursive call for each child node. This pattern of recursive function
			
 
				 is so common that it has a name, \emph{structural recursion}.  In
			
 
				 general, when a recursive function is defined using a sequence of
			
 
				 match clauses that correspond to a grammar, and each clause body makes
			
 
				 a recursive call on each child node, then we say the function is
			
 
				-defined by structural recursion.
			
 
				+defined by structural recursion. Below we also define a second
			
 
				+function, named \code{R0?}, which determines whether an S-expression is an
			
 
				+$R_0$ program.
			
 
				 %
			
 
				 \begin{center}
			
 
				 \begin{minipage}{0.7\textwidth}
			
 
				 \begin{lstlisting}
			
 
				+(define (exp? sexp)
			
 
				+  (match sexp
			
 
				+    [(? fixnum?) #t]
			
 
				+    [`(read) #t]
			
 
				+    [`(- ,e) (exp? e)]
			
 
				+    [`(+ ,e1 ,e2)
			
 
				+     (and (exp? e1) (exp? e2))]))  
			
 
				+
			
 
				 (define (R0? sexp)
			
 
				-  (define (exp? ex)
			
 
				-    (match ex
			
 
				-      [(? fixnum?) #t]
			
 
				-      [`(read) #t]
			
 
				-      [`(- ,e) (exp? e)]
			
 
				-      [`(+ ,e1 ,e2)
			
 
				-       (and (exp? e1) (exp? e2))]))  
			
 
				   (match sexp
			
 
				     [`(program ,e) (exp? e)]    
			
 
				     [else #f]))
			
@@ -637,11 +642,11 @@ defined by structural recursion.
 
				 
			
 
				 Indeed, the structural recursion follows the grammar itself.  We can generally
			
 
				 expect to write a recursive function to handle each non-terminal in the
			
 
				-grammar\footnote{If you took the \emph{How to Design Programs} course
			
 
				+grammar.\footnote{If you read the book \emph{How to Design Programs} 
			
 
				   \url{http://www.ccs.neu.edu/home/matthias/HtDP2e/}, this principle of
			
 
				   structuring code according to the data definition is probably quite familiar.}
			
 
				 
			
 
				-You may be tempted to write the program like this:
			
 
				+You may be tempted to write the program with just one function, like this:
			
 
				 \begin{center}
			
 
				 \begin{minipage}{0.5\textwidth}
			
 
				 \begin{lstlisting}
			
@@ -659,7 +664,7 @@ You may be tempted to write the program like this:
 
				 %
			
 
				 Sometimes such a trick will save a few lines of code, especially when it comes
			
 
				 to the {\tt program} wrapper.  Yet this style is generally \emph{not}
			
 
				-recommended, because it can get you into trouble.
			
 
				+recommended because it can get you into trouble.
			
 
				 %
			
 
				 For instance, the above function is subtly wrong:
			
 
				 \lstinline{(R0? `(program (program 3)))} will return true, when it
			
@@ -677,13 +682,13 @@ defined in the report by \cite{SPERBER:2009aa}. The Racket language is
 
				 defined in its reference manual~\citep{plt-tr}. In this book we use an
			
 
				 interpreter to define the meaning of each language that we consider,
			
 
				 following Reynolds's advice in this
			
 
				-regard~\citep{reynolds72:_def_interp}. Here we will warm up by writing
			
 
				-an interpreter for the $R_0$ language, which will also serve as a
			
 
				-second example of structural recursion. The \texttt{interp-R0}
			
 
				-function is defined in Figure~\ref{fig:interp-R0}. The body of the
			
 
				-function is a match on the input program \texttt{p} and
			
 
				-then a call to the \lstinline{exp} helper function, which in turn has 
			
 
				-one match clause per grammar rule for $R_0$ expressions.
			
 
				+regard~\citep{reynolds72:_def_interp}. Here we warm up by writing an
			
 
				+interpreter for the $R_0$ language, which serves as a second example
			
 
				+of structural recursion. The \texttt{interp-R0} function is defined in
			
 
				+Figure~\ref{fig:interp-R0}. The body of the function is a match on the
			
 
				+input program \texttt{p} and then a call to the \lstinline{exp} helper
			
 
				+function, which in turn has one match clause per grammar rule for
			
 
				+$R_0$ expressions.
			
 
				 
			
 
				 The \lstinline{exp} function is naturally recursive: clauses for internal AST
			
 
				 nodes make recursive calls on each child node.  Note that the recursive cases
			
@@ -727,8 +732,8 @@ values, the \key{app} form can be convenient for binding the resulting values.
 
				 \label{fig:interp-R0}
			
 
				 \end{figure}
			
 
				 
			
 
				-Let us consider the result of interpreting some example $R_0$
			
 
				-programs. The following program simply adds two integers.
			
 
				+Let us consider the result of interpreting a few $R_0$ programs. The
			
 
				+following program simply adds two integers.
			
 
				 \begin{lstlisting}
			
 
				    (+ 10 32)
			
 
				 \end{lstlisting}
			
@@ -760,11 +765,11 @@ produces \key{42}.
 
				 \begin{lstlisting}
			
 
				    (+ (read) 32)
			
 
				 \end{lstlisting}
			
 
				-We include the \key{read} operation in $R_1$ so that a compiler for
			
 
				-$R_1$ cannot be implemented simply by running the interpreter at
			
 
				-compilation time to obtain the output and then generating the trivial
			
 
				-code to return the output.
			
 
				-(A clever did this in a previous version of the course.)
			
 
				+We include the \key{read} operation in $R_1$ so that a clever student
			
 
				+cannot implement a compiler for $R_1$ simply by running the
			
 
				+interpreter at compilation time to obtain the output and then
			
 
				+generating the trivial code to return the output.  (A clever student
			
 
				+did this in a previous version of the course.)
			
 
				 
			
 
				 The job of a compiler is to translate a program in one language into a
			
 
				 program in another language so that the output program behaves the