4 years ago · 946a5e87b0
--- a/book.tex
+++ b/book.tex
@@ -409,6 +409,9 @@ To create a AST node for the integer $8$, we write \code{(Int 8)}.
 
															 \begin{lstlisting}
														
 
															 (define eight (Int 8))
														
 
															 \end{lstlisting}
														
 
															+We say that the value created by \code{(Int 8)} is an
														
 
															+\emph{instance} of the \code{Int} structure.
														
 
															+
														
 
															 The following is the \code{struct} definition for primitives operations.
														
 
															 \begin{lstlisting}
														
 
															 (struct Prim (op arg*))
														
@@ -429,7 +432,21 @@ whereas the addition operator has two children:
 
															 (define ast1.1 (Prim '+ (list rd neg-eight)))
														
 
															 \end{lstlisting}
														
 
															-When deciding how to compile program \eqref{eq:arith-prog}, we need to
														
 
															+We have made a design choice regarding the \code{Prim} structure.
														
 
															+Instead of using one structure for many different operations
														
 
															+(\code{read}, \code{+}, and \code{-}), we could have defined a
														
 
															+structure for each operation, as follows.
														
 
															+\begin{lstlisting}
														
 
															+(struct Read ())
														
 
															+(struct Add (left right))
														
 
															+(struct Neg (value))
														
 
															+\end{lstlisting}
														
 
															+We chose a single structure because in many parts of the compiler
															+the code for the different primitive operators is the same. With a
															+single structure we write that code once instead of once per
															+operator.
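															+
															+As a sketch of this benefit (the function \code{count-nodes} is a
															+hypothetical helper, and the field name of \code{Int} is an
															+assumption), the following counts the nodes of an AST. A single
															+\code{Prim} clause covers \code{read}, \code{-}, and \code{+}.
															+\begin{lstlisting}
															+#lang racket
															+(struct Int (value))      ; field name assumed for illustration
															+(struct Prim (op arg*))
															+
															+;; count-nodes is a hypothetical helper, not part of the compiler.
															+(define (count-nodes e)
															+  (match e
															+    [(Int n) 1]
															+    ;; one clause handles every primitive, whatever its arity
															+    [(Prim op args) (+ 1 (apply + (map count-nodes args)))]))
															+
															+(define eight (Int 8))
															+(define neg-eight (Prim '- (list eight)))
															+(define rd (Prim 'read '()))
															+(define ast1.1 (Prim '+ (list rd neg-eight)))
															+(count-nodes ast1.1)  ; 4
															+\end{lstlisting}
															+With one structure per operation, \code{count-nodes} would need a
															+separate clause for each of \code{Read}, \code{Add}, and \code{Neg}.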
														
 
															+
														
 
															+When compiling a program such as \eqref{eq:arith-prog}, we need to
														
 
															 know that the operation associated with the root node is addition and
														
 
															 that it has two children: \texttt{read} and a negation. The AST data
														
 
															 structure directly supports these queries, as we shall see in
														
@@ -455,10 +472,10 @@ Backus-Naur Form (BNF)~\citep{Backus:1960aa,Knuth:1964aa}.  As an
 
															 example, we describe a small language, named $R_0$, that consists of
														
 
															 integers and arithmetic operations.
														
 
															-The first grammar rule says that given any integer $n$, an integer
														
 
															-node $\INT{n}$ is an expression:
														
 
															+The first grammar rule says that an instance of the \code{Int}
														
 
															+structure is an expression:
														
 
															 \begin{equation}
														
 
															-\Exp ::= \INT{n}  \label{eq:arith-int}
														
 
															+\Exp ::= \INT{\Int}  \label{eq:arith-int}
														
 
															 \end{equation}
														
 
															 %
														
 
															 Each rule has a left-hand-side and a right-hand-side. The way to read
														
@@ -469,15 +486,17 @@ according to the left-hand-side.
 
															 A name such as $\Exp$ that is
														
 
															 defined by the grammar rules is a \emph{non-terminal}.
														
 
															 %
														
 
															-%% The name $\Int$ is a also a non-terminal, however, we do not define
														
 
															-%% $\Int$ because the reader already knows what an integer is.
														
 
															-
														
 
															-We make the simplifying design decision that all of the languages in
														
 
															-this book only handle machine-representable integers.  On most modern
														
 
															+The name $\Int$ is also a non-terminal, but instead of defining it
														
 
															+with a grammar rule, we define it with the following explanation.  We
														
 
															+make the simplifying design decision that all of the languages in this
														
 
															+book only handle machine-representable integers.  On most modern
														
 
															 machines this corresponds to integers represented with 64-bits, i.e.,
														
 
															 the in range $-2^{63}$ to $2^{63}-1$.  We restrict this range further
														
 
															 to match the Racket \texttt{fixnum} datatype, which allows 63-bit
														
 
															-integers on a 64-bit machine.
														
 
															+integers on a 64-bit machine. So an $\Int$ is a sequence of decimal digits
														
 
															+($0$ to $9$), possibly starting with $-$ (for negative integers), such
														
 
															+that the sequence represents an integer in the range $-2^{62}$
														
 
															+to $2^{62}-1$.
														
 
															 The second grammar rule is the \texttt{read} operation that receives
														
 
															 an input integer from the user of the program.
														
@@ -570,14 +589,14 @@ called an {\em alternative}.
 
															 \begin{minipage}{0.96\textwidth}
														
 
															 \[
														
 
															 \begin{array}{rcl}
														
 
															-\Exp &::=& \INT{n} \mid \READ{} \mid \NEG{\Exp} \\
														
 
															+\Exp &::=& \INT{\Int} \mid \READ{} \mid \NEG{\Exp} \\
														
 
															      &\mid&  \ADD{\Exp}{\Exp}  \\
														
 
															-R_0  &::=& \code{(Program} \; \code{'()}\; \Exp \code{)}
														
 
															+R_0  &::=& \PROGRAM{\code{'()}}{\Exp}
														
 
															 \end{array}
														
 
															 \]
														
 
															 \end{minipage}
														
 
															 }
														
 
															-\caption{The syntax of $R_0$, a language of integer arithmetic.}
														
 
															+\caption{The abstract syntax of $R_0$, a language of integer arithmetic.}
														
 
															 \label{fig:r0-syntax}
														
 
															 \end{figure}
														
@@ -987,18 +1006,18 @@ integer arithmetic and local variable binding, which we name $R_1$, to
 
															 x86-64 assembly code~\citep{Intel:2015aa}.  Henceforth we shall refer
														
 
															 to x86-64 simply as x86.  The chapter begins with a description of the
														
 
															 $R_1$ language (Section~\ref{sec:s0}) followed by a description of x86
														
 
															-(Section~\ref{sec:x86}). The x86 assembly language is quite large, so
														
 
															-we discuss only what is needed for compiling $R_1$. We introduce more
														
 
															-of x86 in later chapters. Once we have introduced $R_1$ and x86, we
														
 
															+(Section~\ref{sec:x86}). The x86 assembly language is large, so we
														
 
															+discuss only what is needed for compiling $R_1$. We introduce more of
														
 
															+x86 in later chapters. Once we have introduced $R_1$ and x86, we
														
 
															 reflect on their differences and come up with a plan to break down the
														
 
															 translation from $R_1$ to x86 into a handful of steps
														
 
															 (Section~\ref{sec:plan-s0-x86}).  The rest of the sections in this
														
 
															 chapter give detailed hints regarding each step
														
 
															 (Sections~\ref{sec:uniquify-s0} through \ref{sec:patch-s0}).  We hope
														
 
															-to give enough hints that the well-prepared reader, together with some
														
 
															-friends, can implement a compiler from $R_1$ to x86 in a couple weeks
														
 
															-while at the same time leaving room for some fun and creativity.  To
														
 
															-give the reader a feeling for the scale of this first compiler, the
														
 
															+to give enough hints that the well-prepared reader, together with a
														
 
															+few friends, can implement a compiler from $R_1$ to x86 in a couple
														
 
															+weeks while at the same time leaving room for some fun and creativity.
														
 
															+To give the reader a feeling for the scale of this first compiler, the
														
 
															 instructor solution for the $R_1$ compiler is less than 500 lines of
														
 
															 code.
														
@@ -1011,12 +1030,12 @@ the $R_1$ language is defined by the grammar in
 
															 Figure~\ref{fig:r1-syntax}.  The non-terminal \Var{} may be any Racket
														
 
															 identifier. As in $R_0$, \key{read} is a nullary operator, \key{-} is
														
 
															 a unary operator, and \key{+} is a binary operator.  Similar to $R_0$,
														
 
															-the $R_1$ language includes the \key{program} construct to mark the
														
 
															-top of the program, which is helpful in some of the compiler passes.
														
 
															-The $\itm{info}$ field of the \key{program} construct contains an
														
 
															-association list that is used to communicate auxiliary data from one
														
 
															-compiler pass the next. Despite the simplicity of the $R_1$ language,
														
 
															-it is rich enough to exhibit several compilation techniques.
														
 
															+the $R_1$ language includes the \key{Program} struct to mark the top
														
 
															+of the program. The $\itm{info}$ field of the \key{Program} struct
														
 
															+contains an \emph{association list} (a list of key-value pairs) that
														
 
															+is used to communicate auxiliary data from one compiler pass to the
														
 
															+next. Despite the simplicity of the $R_1$ language, it is rich enough
														
 
															+to exhibit several compilation techniques.
														
 
															 \begin{figure}[btp]
														
 
															 \centering
														
@@ -1024,50 +1043,52 @@ it is rich enough to exhibit several compilation techniques.
 
															 \begin{minipage}{0.96\textwidth}
														
 
															 \[
														
 
															 \begin{array}{rcl}
														
 
															-\Exp &::=& \Int \mid (\key{read}) \mid (\key{-}\;\Exp) \mid (\key{+} \; \Exp\;\Exp)  \\
														
 
															-     &\mid&  \Var \mid \LET{\Var}{\Exp}{\Exp} \\
														
 
															-R_1  &::=& (\key{program} \;\itm{info}\; \Exp)
														
 
															+\Exp &::=& \INT{\Int} \mid \READ{} \mid \NEG{\Exp} \\
														
 
															+     &\mid& \ADD{\Exp}{\Exp}  
														
 
															+     \mid  \VAR{\Var} \mid \LET{\Var}{\Exp}{\Exp} \\
														
 
															+R_1  &::=& \PROGRAM{\code{'()}}{\Exp}
														
 
															 \end{array}
														
 
															 \]
														
 
															 \end{minipage}
														
 
															 }
														
 
															-\caption{The syntax of $R_1$, a language of integers and variables.}
														
 
															+\caption{The abstract syntax of $R_1$, a language of integers and variables.}
														
 
															 \label{fig:r1-syntax}
														
 
															 \end{figure}
														
 
															 Let us dive further into the syntax and semantics of the $R_1$
														
 
															-language.  The \key{let} construct defines a variable for use within
														
 
															-its body and initializes the variable with the value of an expression.
														
 
															-So the following program initializes \code{x} to \code{32} and then
														
 
															-evaluates the body \code{(+ 10 x)}, producing \code{42}.
														
 
															+language.  The \key{Let} feature defines a variable for use within its
														
 
															+body and initializes the variable with the value of an expression.
														
 
															+The abstract syntax for \key{Let} is defined in Figure~\ref{fig:r1-syntax}.
														
 
															+The concrete syntax for \key{Let} is
														
 
															 \begin{lstlisting}
														
 
															-(program ()
														
 
															-   (let ([x (+ 12 20)]) (+ 10 x)))
														
 
															+(let ([|$\itm{var}$| |$\itm{exp}$|]) |$\itm{exp}$|)
														
 
															+\end{lstlisting}
														
 
															+For example, the following program initializes \code{x} to $32$ and then
														
 
															+evaluates the body \code{(+ 10 x)}, producing $42$.
														
 
															+\begin{lstlisting}
														
 
															+(let ([x (+ 12 20)]) (+ 10 x))
														
 
															 \end{lstlisting}
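															+In abstract syntax, the same program is built from structure
															+instances. The following is a sketch: the field names in the
															+\code{struct} definitions are assumptions made for illustration, and
															+\code{let-example} is a hypothetical name.
															+\begin{lstlisting}
															+#lang racket
															+;; Field names are assumed; only the constructor names and
															+;; arities follow the grammar.
															+(struct Int (value))
															+(struct Var (name))
															+(struct Prim (op arg*))
															+(struct Let (var rhs body))
															+(struct Program (info body))
															+
															+;; AST for (let ([x (+ 12 20)]) (+ 10 x))
															+(define let-example
															+  (Program '()
															+    (Let 'x
															+         (Prim '+ (list (Int 12) (Int 20)))
															+         (Prim '+ (list (Int 10) (Var 'x))))))
															+\end{lstlisting}
															+Note that the variable appears as the bare symbol \code{'x} in the
															+\code{Let} but is wrapped in \code{Var} where it is used.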
														
 
															 When there are multiple \key{let}'s for the same variable, the closest
														
 
															 enclosing \key{let} is used. That is, variable definitions overshadow
														
 
															 prior definitions. Consider the following program with two \key{let}'s
														
 
															 that define variables named \code{x}. Can you figure out the result?
														
 
															 \begin{lstlisting}
														
 
															-(program ()
														
 
															-   (let ([x 32]) (+ (let ([x 10]) x) x)))
														
 
															+(let ([x 32]) (+ (let ([x 10]) x) x))
														
 
															 \end{lstlisting}
														
 
															-For the purposes of showing which variable uses correspond to which
														
 
															-definitions, the following shows the \code{x}'s annotated with subscripts
														
 
															-to distinguish them. Double check that your answer for the above is
														
 
															-the same as your answer for this annotated version of the program.
														
 
															+For the purposes of depicting which variable uses correspond to which
														
 
															+definitions, the following shows the \code{x}'s annotated with
														
 
															+subscripts to distinguish them. Double check that your answer for the
														
 
															+above is the same as your answer for this annotated version of the
														
 
															+program.
														
 
															 \begin{lstlisting}
														
 
															-(program ()
														
 
															-   (let ([x|$_1$| 32]) (+ (let ([x|$_2$| 10]) x|$_2$|) x|$_1$|)))
														
 
															+(let ([x|$_1$| 32]) (+ (let ([x|$_2$| 10]) x|$_2$|) x|$_1$|))
														
 
															 \end{lstlisting}
														
 
															 The initializing expression is always evaluated before the body of the
														
 
															 \key{let}, so in the following, the \key{read} for \code{x} is
														
 
															 performed before the \key{read} for \code{y}. Given the input
														
 
															-\code{52} then \code{10}, the following produces \code{42} (and not
														
 
															-\code{-42}).
														
 
															+$52$ then $10$, the following produces $42$ (not $-42$).
														
 
															 \begin{lstlisting}
														
 
															-(program ()
														
 
															-  (let ([x (read)]) (let ([y (read)]) (+ x (- y)))))
														
 
															+(let ([x (read)]) (let ([y (read)]) (+ x (- y))))
														
 
															 \end{lstlisting}
														
 
															 Figure~\ref{fig:interp-R1} shows the definitional interpreter for the
														
@@ -1081,37 +1102,37 @@ environment. The \code{interp-R1} function takes the current
 
															 environment, \code{env}, as an extra parameter.  When the interpreter
														
 
															 encounters a variable, it finds the corresponding value using the
														
 
															 \code{lookup} function (Appendix~\ref{appendix:utilities}).  When the
														
 
															-interpreter encounters a \key{let}, it evaluates the initializing
														
 
															+interpreter encounters a \key{Let}, it evaluates the initializing
														
 
															 expression, extends the environment with the result value bound to the
														
 
															-variable, then evaluates the body of the \key{let}.
														
 
															+variable, then evaluates the body of the \key{Let}.
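															+
															+To illustrate, the following sketch shows how an association list can
															+serve as the environment. The function \code{lookup-sketch} is a
															+hypothetical stand-in for \code{lookup}, whose actual definition is in
															+the appendix.
															+\begin{lstlisting}
															+#lang racket
															+;; lookup-sketch: hypothetical stand-in for the book's lookup.
															+;; Returns the value of the first pair in env whose key is x.
															+(define (lookup-sketch x env)
															+  (cond [(assq x env) => cdr]
															+        [else (error 'lookup-sketch "unbound variable: ~a" x)]))
															+
															+;; Environment inside (let ([x 32]) (let ([x 10]) ...)):
															+;; the inner binding is consed onto the front.
															+(define env (cons (cons 'x 10) (list (cons 'x 32))))
															+(lookup-sketch 'x env)  ; 10
															+\end{lstlisting}
															+Because new bindings are added at the front of the list, the closest
															+enclosing \key{Let} shadows earlier definitions of the same variable.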
														
 
															 \begin{figure}[tbp]
														
 
															 \begin{lstlisting}
														
 
															 (define (interp-exp env)
														
 
															   (lambda (e)
														
 
															     (match e
														
 
															-      [(? fixnum?) e]
														
 
															-      [`(read)
														
 
															+      [(Int n) n]
														
 
															+      [(Prim 'read '())
														
 
															        (define r (read))
														
 
															        (cond [(fixnum? r) r]
														
 
															              [else (error 'interp-R1 "expected an integer" r)])]
														
 
															-      [`(- ,e)
														
 
															+      [(Prim '- (list e))
														
 
															        (define v ((interp-exp env) e))
														
 
															        (fx- 0 v)]
														
 
															-      [`(+ ,e1 ,e2)
														
 
															+      [(Prim '+ (list e1 e2))
														
 
															        (define v1 ((interp-exp env) e1))
														
 
															        (define v2 ((interp-exp env) e2))
														
 
															        (fx+ v1 v2)]
														
 
															-      [(? symbol?) (lookup e env)]
														
 
															-      [`(let ([,x ,e]) ,body)
														
 
															+      [(Var x) (lookup x env)]
														
 
															+      [(Let x e body)
														
 
															        (define new-env (cons (cons x ((interp-exp env) e)) env))
														
 
															        ((interp-exp new-env) body)]
														
 
															       )))
														
 
															-   (define (interp-R1 env)
														
 
															-     (lambda (p)
														
 
															-       (match p
														
 
															-         [`(program ,info ,e) ((interp-exp '()) e)])))
														
 
															+(define (interp-R1 p)
														
 
															+  (match p
														
 
															+    [(Program info e) ((interp-exp '()) e)]
														
 
															+    ))
														
 
															 \end{lstlisting}
														
 
															 \caption{Interpreter for the $R_1$ language.}
														
 
															 \label{fig:interp-R1}
														
--- a/defs.tex
+++ b/defs.tex
@@ -17,20 +17,21 @@
 
															 \newcommand{\Op}{\itm{op}}
														
 
															 \newcommand{\key}[1]{\texttt{#1}}
														
 
															 \newcommand{\code}[1]{\texttt{#1}}
														
 
															+
														
 
															+\newcommand{\INT}[1]{\key{(Int}\;#1\key{)}}
														
 
															 \newcommand{\READ}{\key{(Prim}\;\code{'read}\;\key{'())}}
														
 
															 \newcommand{\NEG}[1]{\key{(Prim}\;\code{'-}\;\code{(list}\;#1\;\code{))}}
														
 
															 \newcommand{\PROGRAM}[2]{\code{(Program}\;#1\;#2\code{)}}
														
 
															 \newcommand{\ADD}[2]{\key{(Prim}\;\code{'+}\;\code{(list}\;#1\;#2\code{))}}
														
 
															 \newcommand{\UNIOP}[2]{(\key{#1}~#2)}
														
 
															 \newcommand{\BINOP}[3]{(\key{#1}~#2~#3)}
														
 
															-\newcommand{\LET}[3]{(\key{let}~([#1\;#2])~#3)}
														
 
															+\newcommand{\VAR}[1]{\key{(Var}\;#1\key{)}}
														
 
															+\newcommand{\LET}[3]{\key{(Let}~#1~#2~#3\key{)}}
														
 
															 \newcommand{\ASSIGN}[2]{(\key{assign}~#1\;#2)}
														
 
															 \newcommand{\RETURN}[1]{(\key{return}~#1)}
														
 
															-\newcommand{\INT}[1]{\key{(Int}\;#1\key{)}}
														
 
															 \newcommand{\REG}[1]{\key{(Reg}\;#1\key{)}}
														
 
															-\newcommand{\VAR}[1]{\key{(Var}\;#1\key{)}}
														
 
															 \newcommand{\STACKLOC}[1]{(\key{stack}\;#1)}
														
 
															 \newcommand{\IF}[3]{(\key{if}\,#1\;#2\;#3)}