4 years ago · 946a5e87b0
--- a/book.tex
+++ b/book.tex
@@ -409,6 +409,9 @@ To create an AST node for the integer $8$, we write \code{(Int 8)}.
 
				 \begin{lstlisting}
			
 
				 (define eight (Int 8))
			
 
				 \end{lstlisting}
			
 
				+We say that the value created by \code{(Int 8)} is an
			
 
				+\emph{instance} of the \code{Int} structure.
			
 
				+
			
 
				 The following is the \code{struct} definition for primitive operations.
			
 
				 \begin{lstlisting}
			
 
				 (struct Prim (op arg*))
			
@@ -429,7 +432,21 @@ whereas the addition operator has two children:
 
				 (define ast1.1 (Prim '+ (list rd neg-eight)))
			
 
				 \end{lstlisting}
			
 
				 
			
 
				-When deciding how to compile program \eqref{eq:arith-prog}, we need to
			
 
				+We have made a design choice regarding the \code{Prim} structure.
			
 
				+Instead of using one structure for many different operations
			
 
				+(\code{read}, \code{+}, and \code{-}), we could have defined a
			
 
				+structure for each operation, as follows.
			
 
				+\begin{lstlisting}
			
 
				+(struct Read ())
			
 
				+(struct Add (left right))
			
 
				+(struct Neg (value))
			
 
				+\end{lstlisting}
			
 
				+The reason we choose to use just one structure is that in many parts
			
 
				+of the compiler, the code for the different primitive operators is the
			
 
				+same, so we might as well just write that code once, which is enabled
			
 
				+by using a single structure.
			
 
				+
			
 
				+When compiling a program such as \eqref{eq:arith-prog}, we need to
			
 
				 know that the operation associated with the root node is addition and
			
 
				 that it has two children: \texttt{read} and a negation. The AST data
			
 
				 structure directly supports these queries, as we shall see in
			
@@ -455,10 +472,10 @@ Backus-Naur Form (BNF)~\citep{Backus:1960aa,Knuth:1964aa}.  As an
 
				 example, we describe a small language, named $R_0$, that consists of
			
 
				 integers and arithmetic operations.
			
 
				 
			
 
				-The first grammar rule says that given any integer $n$, an integer
			
 
				-node $\INT{n}$ is an expression:
			
 
				+The first grammar rule says that an instance of the \code{Int}
			
 
				+structure is an expression:
			
 
				 \begin{equation}
			
 
				-\Exp ::= \INT{n}  \label{eq:arith-int}
			
 
				+\Exp ::= \INT{\Int}  \label{eq:arith-int}
			
 
				 \end{equation}
			
 
				 %
			
 
				 Each rule has a left-hand-side and a right-hand-side. The way to read
			
@@ -469,15 +486,17 @@ according to the left-hand-side.
 
				 A name such as $\Exp$ that is
			
 
				 defined by the grammar rules is a \emph{non-terminal}.
			
 
				 %
			
 
				-%% The name $\Int$ is a also a non-terminal, however, we do not define
			
 
				-%% $\Int$ because the reader already knows what an integer is.
			
 
				-
			
 
				-We make the simplifying design decision that all of the languages in
			
 
				-this book only handle machine-representable integers.  On most modern
			
 
				+The name $\Int$ is also a non-terminal, but instead of defining it
			
 
				+with a grammar rule, we define it with the following explanation.  We
			
 
				+make the simplifying design decision that all of the languages in this
			
 
				+book only handle machine-representable integers.  On most modern
			
 
				 machines this corresponds to integers represented with 64 bits, i.e.,
			
 
				 in the range $-2^{63}$ to $2^{63}-1$.  We restrict this range further
			
 
				 to match the Racket \texttt{fixnum} datatype, which allows 63-bit
			
 
				-integers on a 64-bit machine.
			
 
				+integers on a 64-bit machine. So an $\Int$ is a sequence of decimal digits
			
 
				+($0$ to $9$), possibly starting with $-$ (for negative integers), such
			
 
				+that the sequence represents an integer in the range $-2^{62}$
			
 
				+to $2^{62}-1$.
			
 
				 
			
 
				 The second grammar rule is the \texttt{read} operation that receives
			
 
				 an input integer from the user of the program.
			
@@ -570,14 +589,14 @@ called an {\em alternative}.
 
				 \begin{minipage}{0.96\textwidth}
			
 
				 \[
			
 
				 \begin{array}{rcl}
			
 
				-\Exp &::=& \INT{n} \mid \READ{} \mid \NEG{\Exp} \\
			
 
				+\Exp &::=& \INT{\Int} \mid \READ{} \mid \NEG{\Exp} \\
			
 
				      &\mid&  \ADD{\Exp}{\Exp}  \\
			
 
				-R_0  &::=& \code{(Program} \; \code{'()}\; \Exp \code{)}
			
 
				+R_0  &::=& \PROGRAM{\code{'()}}{\Exp}
			
 
				 \end{array}
			
 
				 \]
			
 
				 \end{minipage}
			
 
				 }
			
 
				-\caption{The syntax of $R_0$, a language of integer arithmetic.}
			
 
				+\caption{The abstract syntax of $R_0$, a language of integer arithmetic.}
			
 
				 \label{fig:r0-syntax}
			
 
				 \end{figure}
			
 
				 
			
@@ -987,18 +1006,18 @@ integer arithmetic and local variable binding, which we name $R_1$, to
 
				 x86-64 assembly code~\citep{Intel:2015aa}.  Henceforth we shall refer
			
 
				 to x86-64 simply as x86.  The chapter begins with a description of the
			
 
				 $R_1$ language (Section~\ref{sec:s0}) followed by a description of x86
			
 
				-(Section~\ref{sec:x86}). The x86 assembly language is quite large, so
			
 
				-we discuss only what is needed for compiling $R_1$. We introduce more
			
 
				-of x86 in later chapters. Once we have introduced $R_1$ and x86, we
			
 
				+(Section~\ref{sec:x86}). The x86 assembly language is large, so we
			
 
				+discuss only what is needed for compiling $R_1$. We introduce more of
			
 
				+x86 in later chapters. Once we have introduced $R_1$ and x86, we
			
 
				 reflect on their differences and come up with a plan to break down the
			
 
				 translation from $R_1$ to x86 into a handful of steps
			
 
				 (Section~\ref{sec:plan-s0-x86}).  The rest of the sections in this
			
 
				 chapter give detailed hints regarding each step
			
 
				 (Sections~\ref{sec:uniquify-s0} through \ref{sec:patch-s0}).  We hope
			
 
				-to give enough hints that the well-prepared reader, together with some
			
 
				-friends, can implement a compiler from $R_1$ to x86 in a couple weeks
			
 
				-while at the same time leaving room for some fun and creativity.  To
			
 
				-give the reader a feeling for the scale of this first compiler, the
			
 
				+to give enough hints that the well-prepared reader, together with a
			
 
				+few friends, can implement a compiler from $R_1$ to x86 in a couple of
			
 
				+weeks while at the same time leaving room for some fun and creativity.
			
 
				+To give the reader a feeling for the scale of this first compiler, the
			
 
				 instructor solution for the $R_1$ compiler is less than 500 lines of
			
 
				 code.
			
 
				 
			
@@ -1011,12 +1030,12 @@ the $R_1$ language is defined by the grammar in
 
				 Figure~\ref{fig:r1-syntax}.  The non-terminal \Var{} may be any Racket
			
 
				 identifier. As in $R_0$, \key{read} is a nullary operator, \key{-} is
			
 
				 a unary operator, and \key{+} is a binary operator.  Similar to $R_0$,
			
 
				-the $R_1$ language includes the \key{program} construct to mark the
			
 
				-top of the program, which is helpful in some of the compiler passes.
			
 
				-The $\itm{info}$ field of the \key{program} construct contains an
			
 
				-association list that is used to communicate auxiliary data from one
			
 
				-compiler pass the next. Despite the simplicity of the $R_1$ language,
			
 
				-it is rich enough to exhibit several compilation techniques.
			
 
				+the $R_1$ language includes the \key{Program} struct to mark the top
			
 
				+of the program. The $\itm{info}$ field of the \key{Program} struct
			
 
				+contains an \emph{association list} (a list of key-value pairs) that
			
 
				+is used to communicate auxiliary data from one compiler pass to the
			
 
				+next. Despite the simplicity of the $R_1$ language, it is rich enough
			
 
				+to exhibit several compilation techniques.
			
 
				 
			
 
				 \begin{figure}[btp]
			
 
				 \centering
			
@@ -1024,50 +1043,52 @@ it is rich enough to exhibit several compilation techniques.
 
				 \begin{minipage}{0.96\textwidth}
			
 
				 \[
			
 
				 \begin{array}{rcl}
			
 
				-\Exp &::=& \Int \mid (\key{read}) \mid (\key{-}\;\Exp) \mid (\key{+} \; \Exp\;\Exp)  \\
			
 
				-     &\mid&  \Var \mid \LET{\Var}{\Exp}{\Exp} \\
			
 
				-R_1  &::=& (\key{program} \;\itm{info}\; \Exp)
			
 
				+\Exp &::=& \INT{\Int} \mid \READ{} \mid \NEG{\Exp} \\
			
 
				+     &\mid& \ADD{\Exp}{\Exp}  
			
 
				+     \mid  \VAR{\Var} \mid \LET{\Var}{\Exp}{\Exp} \\
			
 
				+R_1  &::=& \PROGRAM{\code{'()}}{\Exp}
			
 
				 \end{array}
			
 
				 \]
			
 
				 \end{minipage}
			
 
				 }
			
 
				-\caption{The syntax of $R_1$, a language of integers and variables.}
			
 
				+\caption{The abstract syntax of $R_1$, a language of integers and variables.}
			
 
				 \label{fig:r1-syntax}
			
 
				 \end{figure}
			
 
				 
			
 
				 Let us dive further into the syntax and semantics of the $R_1$
			
 
				-language.  The \key{let} construct defines a variable for use within
			
 
				-its body and initializes the variable with the value of an expression.
			
 
				-So the following program initializes \code{x} to \code{32} and then
			
 
				-evaluates the body \code{(+ 10 x)}, producing \code{42}.
			
 
				+language.  The \key{Let} feature defines a variable for use within its
			
 
				+body and initializes the variable with the value of an expression.
			
 
				+The abstract syntax for \key{Let} is defined in Figure~\ref{fig:r1-syntax}.
			
 
				+The concrete syntax for \key{Let} is
			
 
				 \begin{lstlisting}
			
 
				-(program ()
			
 
				-   (let ([x (+ 12 20)]) (+ 10 x)))
			
 
				+(let ([|$\itm{var}$| |$\itm{exp}$|]) |$\itm{exp}$|)
			
 
				+\end{lstlisting}
			
 
				+For example, the following program initializes \code{x} to $32$ and then
			
 
				+evaluates the body \code{(+ 10 x)}, producing $42$.
			
 
				+\begin{lstlisting}
			
 
				+(let ([x (+ 12 20)]) (+ 10 x))
			
 
				 \end{lstlisting}
			
 
				 When there are multiple \key{let}'s for the same variable, the closest
			
 
				 enclosing \key{let} is used. That is, variable definitions overshadow
			
 
				 prior definitions. Consider the following program with two \key{let}'s
			
 
				 that define variables named \code{x}. Can you figure out the result?
			
 
				 \begin{lstlisting}
			
 
				-(program ()
			
 
				-   (let ([x 32]) (+ (let ([x 10]) x) x)))
			
 
				+(let ([x 32]) (+ (let ([x 10]) x) x))
			
 
				 \end{lstlisting}
			
 
				-For the purposes of showing which variable uses correspond to which
			
 
				-definitions, the following shows the \code{x}'s annotated with subscripts
			
 
				-to distinguish them. Double check that your answer for the above is
			
 
				-the same as your answer for this annotated version of the program.
			
 
				+For the purposes of depicting which variable uses correspond to which
			
 
				+definitions, the following shows the \code{x}'s annotated with
			
 
				+subscripts to distinguish them. Double check that your answer for the
			
 
				+above is the same as your answer for this annotated version of the
			
 
				+program.
			
 
				 \begin{lstlisting}
			
 
				-(program ()
			
 
				-   (let ([x|$_1$| 32]) (+ (let ([x|$_2$| 10]) x|$_2$|) x|$_1$|)))
			
 
				+(let ([x|$_1$| 32]) (+ (let ([x|$_2$| 10]) x|$_2$|) x|$_1$|))
			
 
				 \end{lstlisting}
			
 
				 The initializing expression is always evaluated before the body of the
			
 
				 \key{let}, so in the following, the \key{read} for \code{x} is
			
 
				 performed before the \key{read} for \code{y}. Given the input
			
 
				-\code{52} then \code{10}, the following produces \code{42} (and not
			
 
				-\code{-42}).
			
 
				+$52$ then $10$, the following produces $42$ (not $-42$).
			
 
				 \begin{lstlisting}
			
 
				-(program ()
			
 
				-  (let ([x (read)]) (let ([y (read)]) (+ x (- y)))))
			
 
				+(let ([x (read)]) (let ([y (read)]) (+ x (- y))))
			
 
				 \end{lstlisting}
			
 
				 
			
 
				 Figure~\ref{fig:interp-R1} shows the definitional interpreter for the
			
@@ -1081,37 +1102,37 @@ environment. The \code{interp-R1} function takes the current
 
				 environment, \code{env}, as an extra parameter.  When the interpreter
			
 
				 encounters a variable, it finds the corresponding value using the
			
 
				 \code{lookup} function (Appendix~\ref{appendix:utilities}).  When the
			
 
				-interpreter encounters a \key{let}, it evaluates the initializing
			
 
				+interpreter encounters a \key{Let}, it evaluates the initializing
			
 
				 expression, extends the environment with the result value bound to the
			
 
				-variable, then evaluates the body of the \key{let}.
			
 
				+variable, then evaluates the body of the \key{Let}.
			
 
				 
			
 
				 \begin{figure}[tbp]
			
 
				 \begin{lstlisting}
			
 
				 (define (interp-exp env)
			
 
				   (lambda (e)
			
 
				     (match e
			
 
				-      [(? fixnum?) e]
			
 
				-      [`(read)
			
 
				+      [(Int n) n]
			
 
				+      [(Prim 'read '())
			
 
				        (define r (read))
			
 
				        (cond [(fixnum? r) r]
			
 
				              [else (error 'interp-R1 "expected an integer" r)])]
			
 
				-      [`(- ,e)
			
 
				+      [(Prim '- (list e))
			
 
				        (define v ((interp-exp env) e))
			
 
				        (fx- 0 v)]
			
 
				-      [`(+ ,e1 ,e2)
			
 
				+      [(Prim '+ (list e1 e2))
			
 
				        (define v1 ((interp-exp env) e1))
			
 
				        (define v2 ((interp-exp env) e2))
			
 
				        (fx+ v1 v2)]
			
 
				-      [(? symbol?) (lookup e env)]
			
 
				-      [`(let ([,x ,e]) ,body)
			
 
				+      [(Var x) (lookup x env)]
			
 
				+      [(Let x e body)
			
 
				        (define new-env (cons (cons x ((interp-exp env) e)) env))
			
 
				        ((interp-exp new-env) body)]
			
 
				       )))
			
 
				 
			
 
				-   (define (interp-R1 env)
			
 
				-     (lambda (p)
			
 
				-       (match p
			
 
				-         [`(program ,info ,e) ((interp-exp '()) e)])))
			
 
				+(define (interp-R1 p)
			
 
				+  (match p
			
 
				+    [(Program info e) ((interp-exp '()) e)]
			
 
				+    ))
			
 
				 \end{lstlisting}
			
 
				 \caption{Interpreter for the $R_1$ language.}
			
 
				 \label{fig:interp-R1}
			
--- a/defs.tex
+++ b/defs.tex
@@ -17,20 +17,21 @@
 
				 \newcommand{\Op}{\itm{op}}
			
 
				 \newcommand{\key}[1]{\texttt{#1}}
			
 
				 \newcommand{\code}[1]{\texttt{#1}}
			
 
				+
			
 
				+\newcommand{\INT}[1]{\key{(Int}\;#1\key{)}}
			
 
				 \newcommand{\READ}{\key{(Prim}\;\code{'read}\;\key{'())}}
			
 
				 \newcommand{\NEG}[1]{\key{(Prim}\;\code{'-}\;\code{(list}\;#1\;\code{))}}
			
 
				 \newcommand{\PROGRAM}[2]{\code{(Program}\;#1\;#2\code{)}}
			
 
				 \newcommand{\ADD}[2]{\key{(Prim}\;\code{'+}\;\code{(list}\;#1\;#2\code{))}}
			
 
				 \newcommand{\UNIOP}[2]{(\key{#1}~#2)}
			
 
				 \newcommand{\BINOP}[3]{(\key{#1}~#2~#3)}
			
 
				-\newcommand{\LET}[3]{(\key{let}~([#1\;#2])~#3)}
			
 
				+\newcommand{\VAR}[1]{\key{(Var}\;#1\key{)}}
			
 
				+\newcommand{\LET}[3]{\key{(Let}~#1~#2~#3\key{)}}
			
 
				 
			
 
				 \newcommand{\ASSIGN}[2]{(\key{assign}~#1\;#2)}
			
 
				 \newcommand{\RETURN}[1]{(\key{return}~#1)}
			
 
				 
			
 
				-\newcommand{\INT}[1]{\key{(Int}\;#1\key{)}}
			
 
				 \newcommand{\REG}[1]{\key{(Reg}\;#1\key{)}}
			
 
				-\newcommand{\VAR}[1]{\key{(Var}\;#1\key{)}}
			
 
				 \newcommand{\STACKLOC}[1]{(\key{stack}\;#1)}
			
 
				 
			
 
				 \newcommand{\IF}[3]{(\key{if}\,#1\;#2\;#3)}