updated chapter 1 prelims

Jeremy Siek, 4 years ago
Parent
Current commit
97317caa66
2 files changed, with 230 insertions and 174 deletions
  1. 223 170
      book.tex
  2. 7 4
      defs.tex

+ 223 - 170
book.tex

@@ -78,6 +78,7 @@
 language=Lisp,
 basicstyle=\ttfamily\small,
 morekeywords={seq,assign,program,block,define,lambda,match},
+deletekeywords={read},
 escapechar=|,
 columns=flexible,
 moredelim=[is][\color{red}]{~}{~}
@@ -302,8 +303,7 @@ typically stored in text files on disk, as \emph{concrete syntax}.
 ASTs can be represented in many different ways, depending on the programming
 language used to write the compiler.
 %
-Because this book uses Racket (\url{http://racket-lang.org}), a
-descendant of Lisp, we can use S-expressions to conveniently represent
+We use Racket's \code{struct} feature to conveniently represent
 ASTs (Section~\ref{sec:ast}). We use grammars to define the abstract
 syntax of programming languages (Section~\ref{sec:grammar}) and
 pattern matching to inspect individual nodes in an AST
@@ -311,15 +311,14 @@ pattern matching to inspect individual nodes in an AST
 and deconstruct entire ASTs (Section~\ref{sec:recursion}).  This
 chapter provides a brief introduction to these ideas.
 
-\section{Abstract Syntax Trees and S-expressions}
+\section{Abstract Syntax Trees and Racket Structures}
 \label{sec:ast}
 
 The primary data structure that is commonly used for representing
 programs is the \emph{abstract syntax tree} (AST). When considering
 some part of a program, a compiler needs to ask what kind of thing it
 is and what sub-parts it contains. For example, the program on the
-left, represented by an S-expression, corresponds to the AST on the
-right.
+left corresponds to the AST on the right.
 \begin{center}
 \begin{minipage}{0.4\textwidth}
 \begin{lstlisting}
@@ -349,64 +348,98 @@ node except for the root has a \emph{parent} (the node it is the child
 of). If a node has no children, it is a \emph{leaf} node.  Otherwise
 it is an \emph{internal} node.
 
-Recall that an \emph{symbolic expression} (S-expression) is either
-\begin{enumerate}
-\item an atom, or
-\item a pair of two S-expressions, written $(e_1 \key{.} e_2)$,
-    where $e_1$ and $e_2$ are each an S-expression.
-\end{enumerate}
-An \emph{atom} can be a symbol, such as \code{`hello}, a number, the
-null value \code{'()}, etc.  We can create an S-expression in Racket
-simply by writing a backquote (called a quasi-quote in Racket)
-followed by the textual representation of the S-expression.  It is
-quite common to use S-expressions to represent a list, such as $a, b
-,c$ in the following way:
-\begin{lstlisting}
-`(a . (b . (c . ())))
-\end{lstlisting}
-Each element of the list is in the first slot of a pair, and the
-second slot is either the rest of the list or the null value, to mark
-the end of the list. Such lists are so common that Racket provides
-special notation for them that removes the need for the periods
-and so many parenthesis:
-\begin{lstlisting}
-`(a b c)
-\end{lstlisting}
-The following expression creates an S-expression that represents AST
-\eqref{eq:arith-prog}.
-\begin{lstlisting}
-`(+ (read) (- 8))
-\end{lstlisting}
-When using S-expressions to represent ASTs, the convention is to
-represent each AST node as a list and to put the operation symbol at
-the front of the list. The rest of the list contains the children.  So
-in the above case, the root AST node has operation \code{`+} and its
-two children are \code{`(read)} and \code{`(- 8)}, just as in the
-diagram \eqref{eq:arith-prog}.
-
-To build larger S-expressions one often needs to splice together
-several smaller S-expressions. Racket provides the comma operator to
-splice an S-expression into a larger one. For example, instead of
-creating the S-expression for AST \eqref{eq:arith-prog} all at once,
-we could have first created an S-expression for AST
-\eqref{eq:arith-neg8} and then spliced that into the addition
-S-expression.
+%% Recall that an \emph{symbolic expression} (S-expression) is either
+%% \begin{enumerate}
+%% \item an atom, or
+%% \item a pair of two S-expressions, written $(e_1 \key{.} e_2)$,
+%%     where $e_1$ and $e_2$ are each an S-expression.
+%% \end{enumerate}
+%% An \emph{atom} can be a symbol, such as \code{`hello}, a number, the
+%% null value \code{'()}, etc.  We can create an S-expression in Racket
+%% simply by writing a backquote (called a quasi-quote in Racket)
+%% followed by the textual representation of the S-expression.  It is
+%% quite common to use S-expressions to represent a list, such as $a, b
+%% ,c$ in the following way:
+%% \begin{lstlisting}
+%% `(a . (b . (c . ())))
+%% \end{lstlisting}
+%% Each element of the list is in the first slot of a pair, and the
+%% second slot is either the rest of the list or the null value, to mark
+%% the end of the list. Such lists are so common that Racket provides
+%% special notation for them that removes the need for the periods
+%% and so many parenthesis:
+%% \begin{lstlisting}
+%% `(a b c)
+%% \end{lstlisting}
+%% The following expression creates an S-expression that represents AST
+%% \eqref{eq:arith-prog}.
+%% \begin{lstlisting}
+%% `(+ (read) (- 8))
+%% \end{lstlisting}
+%% When using S-expressions to represent ASTs, the convention is to
+%% represent each AST node as a list and to put the operation symbol at
+%% the front of the list. The rest of the list contains the children.  So
+%% in the above case, the root AST node has operation \code{`+} and its
+%% two children are \code{`(read)} and \code{`(- 8)}, just as in the
+%% diagram \eqref{eq:arith-prog}.
+
+%% To build larger S-expressions one often needs to splice together
+%% several smaller S-expressions. Racket provides the comma operator to
+%% splice an S-expression into a larger one. For example, instead of
+%% creating the S-expression for AST \eqref{eq:arith-prog} all at once,
+%% we could have first created an S-expression for AST
+%% \eqref{eq:arith-neg8} and then spliced that into the addition
+%% S-expression.
+%% \begin{lstlisting}
+%% (define ast1.4 `(- 8))
+%% (define ast1.1 `(+ (read) ,ast1.4))
+%% \end{lstlisting}
+%% In general, the Racket expression that follows the comma (splice)
+%% can be any expression that produces an S-expression.
+
+We define a Racket \code{struct} for each kind of node. For this
+chapter we require just two kinds of nodes: one for integer constants
+and one for primitive operations. The following is the \code{struct}
+definition for integer constants.
+\begin{lstlisting}
+(struct Int (value))
+\end{lstlisting}
+An integer node includes just one thing: the integer value.
+To create an AST node for the integer $8$, we write \code{(Int 8)}.
+\begin{lstlisting}
+(define eight (Int 8))
+\end{lstlisting}
+The following is the \code{struct} definition for primitive operations.
+\begin{lstlisting}
+(struct Prim (op arg*))
+\end{lstlisting}
+A primitive operation node includes an operator symbol \code{op}
+and a list of children \code{arg*}. For example, to create
+an AST that negates the number $8$, we write \code{(Prim '- (list eight))}.
+\begin{lstlisting}
+(define neg-eight (Prim '- (list eight)))
+\end{lstlisting}
+Primitive operations may have zero or more children. The \code{read}
+operator has zero children:
+\begin{lstlisting}
+(define rd (Prim 'read '()))
+\end{lstlisting}
+whereas the addition operator has two children:
 \begin{lstlisting}
-(define ast1.4 `(- 8))
-(define ast1.1 `(+ (read) ,ast1.4))
+(define ast1.1 (Prim '+ (list rd neg-eight)))
 \end{lstlisting}
-In general, the Racket expression that follows the comma (splice)
-can be any expression that produces an S-expression.
 
 When deciding how to compile program \eqref{eq:arith-prog}, we need to
 know that the operation associated with the root node is addition and
 that it has two children: \texttt{read} and a negation. The AST data
 structure directly supports these queries, as we shall see in
 Section~\ref{sec:pattern-matching}, and hence is a good choice for use
-in compilers. In this book, we often write down the S-expression
-representation of a program even when we really have in mind the AST
-because the S-expression is more concise.  We recommend that, in your
-mind, you always think of programs as abstract syntax trees.
+in compilers.
+
+In this book, we often write down the concrete syntax of a program
+even when we really have in mind the AST because the concrete syntax
+is more concise.  We recommend that, in your mind, you always think of
+programs as abstract syntax trees.
 
 \section{Grammars}
 \label{sec:grammar}
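
A minimal sketch of how the struct-based AST from the hunk above can be built and queried. The \code{\#:transparent} option is an assumption added for this sketch (it makes nodes print readably and compare structurally); the definitions in the commit omit it. The accessor and predicate names are the ones Racket's \code{struct} form generates automatically.

```racket
#lang racket
;; Node definitions as in the hunk above.
;; #:transparent is an addition for this sketch only.
(struct Int (value) #:transparent)
(struct Prim (op arg*) #:transparent)

;; Build the AST for (+ (read) (- 8)) bottom-up.
(define eight (Int 8))
(define neg-eight (Prim '- (list eight)))
(define rd (Prim 'read '()))
(define ast1.1 (Prim '+ (list rd neg-eight)))

;; struct generates a predicate and one accessor per field.
(Prim? ast1.1)                              ; #t
(Prim-op ast1.1)                            ; '+
(length (Prim-arg* ast1.1))                 ; 2
(Int-value (first (Prim-arg* neg-eight)))   ; 8
```

The accessors answer the two questions a compiler asks of a node (what kind it is and what its children are), although the chapter goes on to use \code{match} for this instead.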
@@ -415,13 +448,17 @@ A programming language can be thought of as a \emph{set} of programs.
 The set is typically infinite (one can always create larger and larger
 programs), so one cannot simply describe a language by listing all of
 the programs in the language. Instead we write down a set of rules, a
-\emph{grammar}, for building programs. We shall write our rules in a
-variant of Backus-Naur Form (BNF)~\citep{Backus:1960aa,Knuth:1964aa}.
-As an example, we describe a small language, named $R_0$, that
-consists of integers and arithmetic operations. The first grammar rule
-says that any integer is an expression:
+\emph{grammar}, for building programs. Grammars are often used to
+define the concrete syntax of a language, but they can also be used to
+describe the abstract syntax. We shall write our rules in a variant of
+Backus-Naur Form (BNF)~\citep{Backus:1960aa,Knuth:1964aa}.  As an
+example, we describe a small language, named $R_0$, that consists of
+integers and arithmetic operations.
+
+The first grammar rule says that given any integer $n$, an integer
+node $\INT{n}$ is an expression:
 \begin{equation}
-\Exp ::= \Int  \label{eq:arith-int}
+\Exp ::= \INT{n}  \label{eq:arith-int}
 \end{equation}
 %
 Each rule has a left-hand-side and a right-hand-side. The way to read
@@ -432,40 +469,39 @@ according to the left-hand-side.
 A name such as $\Exp$ that is
 defined by the grammar rules is a \emph{non-terminal}.
 %
-The name $\Int$ is a also a non-terminal, however, we do not define
-$\Int$ because the reader already knows what an integer is.
-%
-Further, we make the simplifying design decision that all of the languages in
-this book only handle machine-representable integers.  On most modern machines
-this corresponds to integers represented with 64-bits, i.e., the in range
-$-2^{63}$ to $2^{63}-1$.
-%
-However, we restrict this range further to match the Racket \texttt{fixnum}
-datatype, which allows 63-bit integers on a 64-bit machine.
+%% The name $\Int$ is a also a non-terminal, however, we do not define
+%% $\Int$ because the reader already knows what an integer is.
+
+We make the simplifying design decision that all of the languages in
+this book only handle machine-representable integers.  On most modern
+machines this corresponds to integers represented with 64 bits, i.e.,
+in the range $-2^{63}$ to $2^{63}-1$.  We restrict this range further
+to match the Racket \texttt{fixnum} datatype, which allows 63-bit
+integers on a 64-bit machine.
 
 The second grammar rule is the \texttt{read} operation that receives
 an input integer from the user of the program.
 \begin{equation}
-  \Exp ::= (\key{read}) \label{eq:arith-read}
+  \Exp ::= \READ{} \label{eq:arith-read}
 \end{equation}
 
 The third rule says that, given an $\Exp$ node, you can build another
 $\Exp$ node by negating it.
 \begin{equation}
-  \Exp ::= (\key{-} \; \Exp)  \label{eq:arith-neg}
+  \Exp ::= \NEG{\Exp}  \label{eq:arith-neg}
 \end{equation}
 Symbols in typewriter font such as \key{-} and \key{read} are
 \emph{terminal} symbols and must literally appear in the program for
 the rule to be applicable.
 
 We can apply the rules to build ASTs in the $R_0$
-language. For example, by rule \eqref{eq:arith-int}, \texttt{8} is an
+language. For example, by rule \eqref{eq:arith-int}, \texttt{(Int 8)} is an
 $\Exp$, then by rule \eqref{eq:arith-neg}, the following AST is
 an $\Exp$.
 \begin{center}
-\begin{minipage}{0.25\textwidth}
+\begin{minipage}{0.4\textwidth}
 \begin{lstlisting}
-(- 8)
+(Prim '- (list (Int 8)))
 \end{lstlisting}
 \end{minipage}
 \begin{minipage}{0.25\textwidth}
@@ -483,26 +519,37 @@ an $\Exp$.
 
 The next grammar rule defines addition expressions:
 \begin{equation}
-  \Exp ::= (\key{+} \; \Exp \; \Exp) \label{eq:arith-add}
+  \Exp ::= \ADD{\Exp}{\Exp} \label{eq:arith-add}
 \end{equation}
-We can now see that the AST \eqref{eq:arith-prog} is an $\Exp$ in
-$R_0$.  We know that \lstinline{(read)} is an $\Exp$ by rule
-\eqref{eq:arith-read} and we have shown that \texttt{(- 8)} is an
-$\Exp$, so we can apply rule \eqref{eq:arith-add} to show that
-\texttt{(+ (read) (- 8))} is an $\Exp$ in the $R_0$ language.
+We can now justify that the AST \eqref{eq:arith-prog} is an $\Exp$ in
+$R_0$.  We know that \lstinline{(Prim 'read '())} is an $\Exp$ by rule
+\eqref{eq:arith-read} and we have already shown that \code{(Prim '-
+  (list (Int 8)))} is an $\Exp$, so we apply rule \eqref{eq:arith-add}
+to show that
+\begin{lstlisting}
+(Prim '+ (list (Prim 'read '()) (Prim '- (list (Int 8)))))
+\end{lstlisting}
+is an $\Exp$ in the $R_0$ language.
 
 If you have an AST for which the above rules do not apply, then the
-AST is not in $R_0$. For example, the AST \texttt{(- (read) (+ 8))} is
-not in $R_0$ because there are no rules for \key{+} with only one
-argument, nor for \key{-} with two arguments.  Whenever we define a
-language with a grammar, we mean for the language to only include
-those programs that are justified by the rules.
+AST is not in $R_0$. For example, the program \code{(- (read) (+ 8))}
+is not in $R_0$ because there are no rules for \code{+} with only one
+argument, nor for \key{-} with two arguments. Whenever we define a
+language with a grammar, the language only includes those programs
+that are justified by the rules.
 
-The last grammar rule for $R_0$ states that there is a \key{program}
+The last grammar rule for $R_0$ states that there is a \code{Program}
 node to mark the top of the whole program:
 \[
-  R_0 ::= (\key{program} \; \Exp)
+  R_0 ::= \PROGRAM{\code{'()}}{\Exp}
 \]
+The \code{Program} structure is defined as follows
+\begin{lstlisting}
+(struct Program (info body))
+\end{lstlisting}
+where \code{body} is an expression. In later chapters, the \code{info}
+part will be used to store auxiliary information but for now it is
+just the empty list.
 
 The \code{read-program} function provided in \code{utilities.rkt}
 reads programs in from a file (the sequence of characters in the
@@ -523,9 +570,9 @@ called an {\em alternative}.
 \begin{minipage}{0.96\textwidth}
 \[
 \begin{array}{rcl}
-\Exp &::=& \Int \mid ({\tt \key{read}}) \mid (\key{-} \; \Exp) \mid
-   (\key{+} \; \Exp \; \Exp)  \\
-R_0  &::=& (\key{program} \; \Exp)
+\Exp &::=& \INT{n} \mid \READ{} \mid \NEG{\Exp} \\
+     &\mid&  \ADD{\Exp}{\Exp}  \\
+R_0  &::=& \code{(Program} \; \code{'()}\; \Exp \code{)}
 \end{array}
 \]
 \end{minipage}
@@ -542,16 +589,14 @@ R_0  &::=& (\key{program} \; \Exp)
 
 As mentioned above, compilers often need to access the children of an
 AST node. Racket provides the \texttt{match} form to access the parts
-of an S-expression. Consider the following example and the output on
-the right.
+of a structure. Consider the following example and the output on the
+right.
 \begin{center}
 \begin{minipage}{0.5\textwidth}
 \begin{lstlisting}
 (match ast1.1
-  [`(,op ,child1 ,child2)
-    (print op) (newline)
-    (print child1) (newline)
-    (print child2)])
+  [(Prim op (list child1 child2))
+    (print op)])
 \end{lstlisting}
 \end{minipage}
 \vrule
@@ -560,8 +605,6 @@ the right.
 
 
   '+
-   '(read)
-   '(- 8)
 \end{lstlisting}
 \end{minipage}
 \end{center}
@@ -581,25 +624,22 @@ clause may contain any Racket code whatsoever.
 A \code{match} form may contain several clauses, as in the following
 function \code{leaf?} that recognizes when an $R_0$ node is
 a leaf. The \code{match} proceeds through the clauses in order,
-checking whether the pattern can match the input S-expression. The
+checking whether the pattern can match the input AST. The
 body of the first clause that matches is executed. The output of
-\code{leaf?} for several S-expressions is shown on the right. In the
-below \code{match}, we see another form of pattern: the
-pattern \code{(? fixnum?)} applies the predicate \code{fixnum?} to the input
-S-expression to see if it is a machine-representable integer.
+\code{leaf?} for several ASTs is shown on the right.
 \begin{center}
-\begin{minipage}{0.5\textwidth}
+\begin{minipage}{0.6\textwidth}
 \begin{lstlisting}
 (define (leaf? arith)
   (match arith
-    [(? fixnum?) #t]
-    [`(read) #t]
-    [`(- ,c1) #f]
-    [`(+ ,c1 ,c2) #f]))
+    [(Int n) #t]
+    [(Prim 'read '()) #t]
+    [(Prim '- (list c1)) #f]
+    [(Prim '+ (list c1 c2)) #f]))
 
-(leaf? `(read))
-(leaf? `(- 8))
-(leaf? `(+ (read) (- 8)))
+(leaf? (Prim 'read '()))
+(leaf? (Prim '- (list (Int 8))))
+(leaf? (Int 8))
 \end{lstlisting}
 \end{minipage}
 \vrule
@@ -611,10 +651,10 @@ S-expression to see if it is a machine-representable integer.
 
 
 
-
+    
   #t
   #f
-   #f
+   #t
 \end{lstlisting}
 \end{minipage}
 \end{center}
@@ -626,14 +666,13 @@ match against, then we make sure that 1) we have one clause for each
 alternative of that non-terminal and 2) that the pattern in each
 clause corresponds to the right-hand side of the matching grammar
 rule. For the \code{match} in the \code{leaf?} function, we refer to
-the grammar for $R\_0$ in Figure~\ref{fig:r0-syntax}. The $\Exp$
+the grammar for $R_0$ in Figure~\ref{fig:r0-syntax}. The $\Exp$
 non-terminal has 4 alternatives, so the \code{match} has 4 clauses.
 The pattern in each clause corresponds to the right-hand side of a
-grammar rule. For example, the pattern \code{`(+ ,c1 ,c2)} corresponds
-to the right-hand side $(\key{+} \; \Exp \; \Exp)$. When translating
+grammar rule. For example, the pattern \code{(Prim '+ (list c1 c2))}
+corresponds to the right-hand side $\ADD{\Exp}{\Exp}$. When translating
 from grammars to patterns, replace non-terminals such as $\Exp$ with
-pattern variables (a comma followed by a variable name of your
-choice).
+pattern variables (e.g. \code{c1} and \code{c2}).
 
 
 \section{Recursion}
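
A minimal sketch of the "one clause per grammar alternative" recipe applied to the new struct patterns. The function \code{count-nodes} is a hypothetical helper introduced here for illustration; it is not part of the book's code. The \code{\#:transparent} option is likewise an assumption of this sketch.

```racket
#lang racket
(struct Int (value) #:transparent)
(struct Prim (op arg*) #:transparent)

;; count-nodes: hypothetical example, one match clause per Exp alternative.
;; Pattern variables (n, e1, e2) stand where the grammar has non-terminals.
(define (count-nodes e)
  (match e
    [(Int n) 1]
    [(Prim 'read '()) 1]
    [(Prim '- (list e1)) (+ 1 (count-nodes e1))]
    [(Prim '+ (list e1 e2)) (+ 1 (count-nodes e1) (count-nodes e2))]))

;; The AST for (+ (read) (- 8)) has four nodes: +, read, -, and 8.
(count-nodes (Prim '+ (list (Prim 'read '())
                            (Prim '- (list (Int 8))))))   ; 4
```

Because there is no catch-all clause, an AST outside the grammar (for example a unary \code{+}) makes \code{match} raise an error rather than silently returning a count.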
@@ -662,22 +701,24 @@ one recursive function to handle each non-terminal in the grammar.
 \begin{center}
 \begin{minipage}{0.7\textwidth}
 \begin{lstlisting}
-(define (exp? sexp)
-  (match sexp
-    [(? fixnum?) #t]
-    [`(read) #t]
-    [`(- ,e) (exp? e)]
-    [`(+ ,e1 ,e2)
+(define (exp? ast)
+  (match ast
+    [(Int n) #t]
+    [(Prim 'read '()) #t]
+    [(Prim '- (list e)) (exp? e)]
+    [(Prim '+ (list e1 e2))
       (and (exp? e1) (exp? e2))]
     [else #f]))
 
-(define (R0? sexp)
-  (match sexp
-    [`(program ,e) (exp? e)]
+(define (R0? ast)
+  (match ast
+    [(Program '() e) (exp? e)]
     [else #f]))
 
-(R0? `(program (+ (read) (- 8))))
-(R0? `(program (- (read) (+ 8))))
+(R0? (Program '() ast1.1))
+(R0? (Program '()
+       (Prim '- (list (Prim 'read '())
+                      (Prim '+ (list (Int 8)))))))
 \end{lstlisting}
 \end{minipage}
 \vrule
@@ -696,7 +737,6 @@ one recursive function to handle each non-terminal in the grammar.
 
 
 
-
   #t
   #f
 \end{lstlisting}
@@ -708,13 +748,13 @@ You may be tempted to merge the two functions into one, like this:
 \begin{center}
 \begin{minipage}{0.5\textwidth}
 \begin{lstlisting}
-(define (R0? sexp)
-  (match sexp
-    [(? fixnum?) #t]
-    [`(read) #t]
-    [`(- ,e) (R0? e)]
-    [`(+ ,e1 ,e2) (and (R0? e1) (R0? e2))]
-    [`(program ,e) (R0? e)]
+(define (R0? ast)
+  (match ast
+    [(Int n) #t]
+    [(Prim 'read '()) #t]
+    [(Prim '- (list e)) (R0? e)]
+    [(Prim '+ (list e1 e2)) (and (R0? e1) (R0? e2))]
+    [(Program '() e) (R0? e)]
     [else #f]))
 \end{lstlisting}
 \end{minipage}
@@ -725,7 +765,7 @@ to the {\tt program} wrapper.  Yet this style is generally \emph{not}
 recommended because it can get you into trouble.
 %
 For instance, the above function is subtly wrong:
-\lstinline{(R0? `(program (program 3)))} will return true, when it
+\lstinline{(R0? (Program '() (Program '() (Int 3))))} will return true, when it
 should return false.
 
 %% NOTE FIXME - must check for consistency on this issue throughout.
@@ -754,18 +794,24 @@ clause per grammar rule for $R_0$ expressions.
 \begin{lstlisting}
 (define (interp-exp e)
   (match e
-    [(? fixnum?) e]
-    [`(read)
-     (let ([r (read)])
-       (cond [(fixnum? r) r]
-             [else (error 'interp-R0 "input not an integer" r)]))]
-    [`(- ,e1)     (fx- 0 (interp-exp e1))]
-    [`(+ ,e1 ,e2) (fx+ (interp-exp e1) (interp-exp e2))]
-    ))
+    [(Int n) n]
+    [(Prim 'read '())
+     (define r (read))
+     (cond [(fixnum? r) r]
+           [else (error 'interp-R0 "expected an integer, got ~a" r)])]
+    [(Prim '- (list e))
+     (define v (interp-exp e))
+     (fx- 0 v)]
+    [(Prim '+ (list e1 e2))
+     (define v1 (interp-exp e1))
+     (define v2 (interp-exp e2))
+     (fx+ v1 v2)]
+    ))
 
 (define (interp-R0 p)
   (match p
-    [`(program ,e) (interp-exp e)]))
+    [(Program '() e) (interp-exp e)]
+    ))
 \end{lstlisting}
 \caption{Interpreter for the $R_0$ language.}
 \label{fig:interp-R0}
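
A minimal, self-contained sketch of running the interpreter from the hunk above without typing at a prompt. It assumes the input can be supplied through \code{with-input-from-string} (from \code{racket/port}, included in \code{\#lang racket}), and it condenses the clause bodies slightly; the \code{\textasciitilde a} in the error message is an adjustment made for this sketch so that the extra argument is formatted.

```racket
#lang racket
(require racket/fixnum) ; provides fx+ and fx-

(struct Int (value))
(struct Prim (op arg*))
(struct Program (info body))

(define (interp-exp e)
  (match e
    [(Int n) n]
    [(Prim 'read '())
     (define r (read))
     (cond [(fixnum? r) r]
           [else (error 'interp-R0 "expected an integer, got ~a" r)])]
    [(Prim '- (list e1)) (fx- 0 (interp-exp e1))]
    [(Prim '+ (list e1 e2)) (fx+ (interp-exp e1) (interp-exp e2))]))

(define (interp-R0 p)
  (match p
    [(Program '() e) (interp-exp e)]))

(define ast1.1
  (Prim '+ (list (Prim 'read '()) (Prim '- (list (Int 8))))))

;; Simulate the user typing 50; (read) consumes it from the string port.
(with-input-from-string "50"
  (lambda () (interp-R0 (Program '() ast1.1))))   ; 50 + (0 - 8) = 42
```

Note that \code{interp-R0} expects a \code{Program} node, so a bare expression such as \code{ast1.1} has to be wrapped before it is interpreted.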
@@ -776,8 +822,11 @@ following program adds two integers.
 \begin{lstlisting}
 (+ 10 32)
 \end{lstlisting}
-The result is \key{42}.  (We wrote the above program in concrete syntax,
-whereas the parsed abstract syntax is \lstinline{(program (+ 10 32))}.)
+The result is \key{42}.  We wrote the above program in concrete syntax,
+whereas the parsed abstract syntax is:
+\begin{lstlisting}
+(Program '() (Prim '+ (list (Int 10) (Int 32))))
+\end{lstlisting}
 
 The next example demonstrates that expressions may be nested within
 each other, in this case nesting several additions and negations.
@@ -789,11 +838,11 @@ What is the result of the above program?
 As mentioned previously, the $R_0$ language does not support
 arbitrarily large integers, but only $63$-bit integers, so we
 interpret the arithmetic operations of $R_0$ using fixnum arithmetic
-in Racket.  What happens when we run the following program?
+in Racket.
+Suppose $n = 999999999999999999$, which indeed fits in $63$ bits.
+What happens when we run the following program in our interpreter?
 \begin{lstlisting}
-(define large 999999999999999999)
-(interp-R0 `(program (+ (+ (+ ,large ,large) (+ ,large ,large))
-                        (+ (+ ,large ,large) (+ ,large ,large)))))
+(+ (+ (+ |$n$| |$n$|) (+ |$n$| |$n$|)) (+ (+ |$n$| |$n$|) (+ |$n$| |$n$|)))
 \end{lstlisting}
 It produces an error:
 \begin{lstlisting}
@@ -816,7 +865,8 @@ program \eqref{eq:arith-prog} performs a \key{read} and then subtracts
 (interp-R0 (Program '() ast1.1))
 \end{lstlisting}
 and input the integer \code{50}, we get the answer to life, the
-universe, and everything: \code{42}.
+universe, and everything: \code{42}!\footnote{\emph{The Hitchhiker's
+    Guide to the Galaxy} by Douglas Adams.}
 
 We include the \key{read} operation in $R_0$ so a clever student
 cannot implement a compiler for $R_0$ that simply runs the interpreter
@@ -876,24 +926,26 @@ functions is the output of partially evaluating the children.
 \begin{figure}[tbp]
 \begin{lstlisting}
 (define (pe-neg r)
-  (cond [(fixnum? r) (fx- 0 r)]
-        [else `(- ,r)]))
+  (match r
+    [(Int n) (Int (fx- 0 n))]
+    [else (Prim '- (list r))]))
 
 (define (pe-add r1 r2)
-  (cond [(and (fixnum? r1) (fixnum? r2)) (fx+ r1 r2)]
-        [else `(+ ,r1 ,r2)]))
+  (match* (r1 r2)
+    [((Int n1) (Int n2)) (Int (fx+ n1 n2))]
+    [(_ _) (Prim '+ (list r1 r2))]))
 
 (define (pe-exp e)
   (match e
-    [(? fixnum?) e]
-    [`(read) `(read)]
-    [`(- ,e1) (pe-neg (pe-exp e1))]
-    [`(+ ,e1 ,e2) (pe-add (pe-exp e1) (pe-exp e2))]
+    [(Int n) (Int n)]
+    [(Prim 'read '()) (Prim 'read '())]
+    [(Prim '- (list e1)) (pe-neg (pe-exp e1))]
+    [(Prim '+ (list e1 e2)) (pe-add (pe-exp e1) (pe-exp e2))]
     ))
 
 (define (pe-R0 p)
   (match p
-    [`(program ,e) `(program ,(pe-exp e))]
+    [(Program info e) (Program info (pe-exp e))]
     ))
 \end{lstlisting}
 \caption{A partial evaluator for $R_0$ expressions.}
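
A minimal sketch of what the struct-based partial evaluator above produces on a program that still contains a \code{read}. The \code{\#:transparent} option is an assumption added here so that the result can be compared with \code{equal?}; the helper definitions repeat the ones in the figure.

```racket
#lang racket
(require racket/fixnum) ; provides fx+ and fx-

;; #:transparent is an addition for this sketch (structural equal?).
(struct Int (value) #:transparent)
(struct Prim (op arg*) #:transparent)

(define (pe-neg r)
  (match r
    [(Int n) (Int (fx- 0 n))]
    [else (Prim '- (list r))]))

(define (pe-add r1 r2)
  (match* (r1 r2)
    [((Int n1) (Int n2)) (Int (fx+ n1 n2))]
    [(_ _) (Prim '+ (list r1 r2))]))

(define (pe-exp e)
  (match e
    [(Int n) (Int n)]
    [(Prim 'read '()) (Prim 'read '())]
    [(Prim '- (list e1)) (pe-neg (pe-exp e1))]
    [(Prim '+ (list e1 e2)) (pe-add (pe-exp e1) (pe-exp e2))]))

;; (+ (read) (- (+ 5 3))): the constant subtree folds to -8,
;; while the read has to stay in the residual program.
(define input
  (Prim '+ (list (Prim 'read '())
                 (Prim '- (list (Prim '+ (list (Int 5) (Int 3))))))))

(equal? (pe-exp input)
        (Prim '+ (list (Prim 'read '()) (Int -8))))   ; #t
```

The residual program still performs one \code{read} and one addition at run time; only the arithmetic on known constants has been evaluated ahead of time.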
@@ -911,16 +963,17 @@ test whether it produces programs that get the same result as the
 input programs. That is, we can test whether it satisfies Diagram
 \eqref{eq:compile-correct}. The following code runs the partial
 evaluator on several examples and tests the output program.  The
-\texttt{assert} function is defined in Appendix~\ref{appendix:utilities}.\\
+\texttt{parse-program} and \texttt{assert} functions are defined in
+Appendix~\ref{appendix:utilities}.\\
 \begin{minipage}{1.0\textwidth}
 \begin{lstlisting}
 (define (test-pe p)
   (assert "testing pe-R0"
      (equal? (interp-R0 p) (interp-R0 (pe-R0 p)))))
 
-(test-pe `(+ (read) (- (+ 5 3))))
-(test-pe `(+ 1 (+ (read) 1)))
-(test-pe `(- (+ (read) (- 5))))
+(test-pe (parse-program `(program () (+ 10 (- (+ 5 3))))))
+(test-pe (parse-program `(program () (+ 1 (+ 3 1)))))
+(test-pe (parse-program `(program () (- (+ 3 (- 5))))))
 \end{lstlisting}
 \end{minipage}
 
@@ -7778,7 +7831,7 @@ registers.
 %%  LocalWords:  Sarkar lcl Matz aa representable Chez Ph Dan's nano
 %%  LocalWords:  fk bh Siek plt uq Felleisen Bor Yuh ASTs AST Naur eq
 %%  LocalWords:  BNF fixnum datatype arith prog backquote quasiquote
-%%  LocalWords:  ast sexp Reynold's reynolds interp cond fx evaluator
+%%  LocalWords:  ast Reynold's reynolds interp cond fx evaluator
 %%  LocalWords:  quasiquotes pe nullary unary rcl env lookup gcc rax
 %%  LocalWords:  addq movq callq rsp rbp rbx rcx rdx rsi rdi subq nx
 %%  LocalWords:  negq pushq popq retq globl Kernighan uniquify lll ve

+ 7 - 4
defs.tex

@@ -17,7 +17,10 @@
 \newcommand{\Op}{\itm{op}}
 \newcommand{\key}[1]{\texttt{#1}}
 \newcommand{\code}[1]{\texttt{#1}}
-\newcommand{\READ}{(\key{read})}
+\newcommand{\READ}{\key{(Prim}\;\code{'read}\;\key{'())}}
+\newcommand{\NEG}[1]{\key{(Prim}\;\code{'-}\;\code{(list}\;#1\;\code{))}}
+\newcommand{\PROGRAM}[2]{\code{(Program}\;#1\;#2\code{)}}
+\newcommand{\ADD}[2]{\key{(Prim}\;\code{'+}\;\code{(list}\;#1\;#2\code{))}}
 \newcommand{\UNIOP}[2]{(\key{#1}~#2)}
 \newcommand{\BINOP}[3]{(\key{#1}~#2~#3)}
 \newcommand{\LET}[3]{(\key{let}~([#1\;#2])~#3)}
@@ -25,9 +28,9 @@
 \newcommand{\ASSIGN}[2]{(\key{assign}~#1\;#2)}
 \newcommand{\RETURN}[1]{(\key{return}~#1)}
 
-\newcommand{\INT}[1]{(\key{int}\;#1)}
-\newcommand{\REG}[1]{(\key{reg}\;#1)}
-\newcommand{\VAR}[1]{(\key{var}\;#1)}
+\newcommand{\INT}[1]{\key{(Int}\;#1\key{)}}
+\newcommand{\REG}[1]{\key{(Reg}\;#1\key{)}}
+\newcommand{\VAR}[1]{\key{(Var}\;#1\key{)}}
 \newcommand{\STACKLOC}[1]{(\key{stack}\;#1)}
 
 \newcommand{\IF}[3]{(\key{if}\,#1\;#2\;#3)}