Jeremy Siek 4 years ago
parent
commit
946a5e87b0
2 changed files with 85 additions and 63 deletions
  1. 81 60
      book.tex
  2. 4 3
      defs.tex

+ 81 - 60
book.tex

@@ -409,6 +409,9 @@ To create a AST node for the integer $8$, we write \code{(Int 8)}.
 \begin{lstlisting}
 (define eight (Int 8))
 \end{lstlisting}
+We say that the value created by \code{(Int 8)} is an
+\emph{instance} of the \code{Int} structure.
+
 The following is the \code{struct} definition for primitive operations.
 \begin{lstlisting}
 (struct Prim (op arg*))
@@ -429,7 +432,21 @@ whereas the addition operator has two children:
 (define ast1.1 (Prim '+ (list rd neg-eight)))
 \end{lstlisting}
 
-When deciding how to compile program \eqref{eq:arith-prog}, we need to
+We have made a design choice regarding the \code{Prim} structure.
+Rather than using one structure for many different operations
+(\code{read}, \code{+}, and \code{-}), we could have defined a
+structure for each operation, as follows.
+\begin{lstlisting}
+(struct Read ())
+(struct Add (left right))
+(struct Neg (value))
+\end{lstlisting}
+We chose to use just one structure because, in many parts of the
+compiler, the code for the different primitive operators is the
+same. Using a single structure lets us write that code once instead
+of repeating it for each operator.
+
+When compiling a program such as \eqref{eq:arith-prog}, we need to
 know that the operation associated with the root node is addition and
 that it has two children: \texttt{read} and a negation. The AST data
 structure directly supports these queries, as we shall see in
@@ -455,10 +472,10 @@ Backus-Naur Form (BNF)~\citep{Backus:1960aa,Knuth:1964aa}.  As an
 example, we describe a small language, named $R_0$, that consists of
 integers and arithmetic operations.
 
-The first grammar rule says that given any integer $n$, an integer
-node $\INT{n}$ is an expression:
+The first grammar rule says that an instance of the \code{Int}
+structure is an expression:
 \begin{equation}
-\Exp ::= \INT{n}  \label{eq:arith-int}
+\Exp ::= \INT{\Int}  \label{eq:arith-int}
 \end{equation}
 %
 Each rule has a left-hand-side and a right-hand-side. The way to read
@@ -469,15 +486,17 @@ according to the left-hand-side.
 A name such as $\Exp$ that is
 defined by the grammar rules is a \emph{non-terminal}.
 %
-%% The name $\Int$ is a also a non-terminal, however, we do not define
-%% $\Int$ because the reader already knows what an integer is.
-
-We make the simplifying design decision that all of the languages in
-this book only handle machine-representable integers.  On most modern
+The name $\Int$ is also a non-terminal, but instead of defining it
+with a grammar rule, we define it with the following explanation.  We
+make the simplifying design decision that all of the languages in this
+book only handle machine-representable integers.  On most modern
 machines this corresponds to integers represented with 64 bits, i.e.,
 integers in the range $-2^{63}$ to $2^{63}-1$.  We restrict this range further
 to match the Racket \texttt{fixnum} datatype, which allows 63-bit
-integers on a 64-bit machine.
+integers on a 64-bit machine. So an $\Int$ is a sequence of decimal
+digits ($0$ to $9$), possibly starting with $-$ (for negative
+integers), such that the sequence represents an integer in the range
+$-2^{62}$ to $2^{62}-1$.
 
 The second grammar rule is the \texttt{read} operation that receives
 an input integer from the user of the program.
@@ -570,14 +589,14 @@ called an {\em alternative}.
 \begin{minipage}{0.96\textwidth}
 \[
 \begin{array}{rcl}
-\Exp &::=& \INT{n} \mid \READ{} \mid \NEG{\Exp} \\
+\Exp &::=& \INT{\Int} \mid \READ{} \mid \NEG{\Exp} \\
      &\mid&  \ADD{\Exp}{\Exp}  \\
-R_0  &::=& \code{(Program} \; \code{'()}\; \Exp \code{)}
+R_0  &::=& \PROGRAM{\code{'()}}{\Exp}
 \end{array}
 \]
 \end{minipage}
 }
-\caption{The syntax of $R_0$, a language of integer arithmetic.}
+\caption{The abstract syntax of $R_0$, a language of integer arithmetic.}
 \label{fig:r0-syntax}
 \end{figure}
 
@@ -987,18 +1006,18 @@ integer arithmetic and local variable binding, which we name $R_1$, to
 x86-64 assembly code~\citep{Intel:2015aa}.  Henceforth we shall refer
 to x86-64 simply as x86.  The chapter begins with a description of the
 $R_1$ language (Section~\ref{sec:s0}) followed by a description of x86
-(Section~\ref{sec:x86}). The x86 assembly language is quite large, so
-we discuss only what is needed for compiling $R_1$. We introduce more
-of x86 in later chapters. Once we have introduced $R_1$ and x86, we
+(Section~\ref{sec:x86}). The x86 assembly language is large, so we
+discuss only what is needed for compiling $R_1$. We introduce more of
+x86 in later chapters. Once we have introduced $R_1$ and x86, we
 reflect on their differences and come up with a plan to break down the
 translation from $R_1$ to x86 into a handful of steps
 (Section~\ref{sec:plan-s0-x86}).  The rest of the sections in this
 chapter give detailed hints regarding each step
 (Sections~\ref{sec:uniquify-s0} through \ref{sec:patch-s0}).  We hope
-to give enough hints that the well-prepared reader, together with some
-friends, can implement a compiler from $R_1$ to x86 in a couple weeks
-while at the same time leaving room for some fun and creativity.  To
-give the reader a feeling for the scale of this first compiler, the
+to give enough hints that the well-prepared reader, together with a
+few friends, can implement a compiler from $R_1$ to x86 in a couple
+weeks while at the same time leaving room for some fun and creativity.
+To give the reader a feeling for the scale of this first compiler, the
 instructor solution for the $R_1$ compiler is less than 500 lines of
 code.
 
@@ -1011,12 +1030,12 @@ the $R_1$ language is defined by the grammar in
 Figure~\ref{fig:r1-syntax}.  The non-terminal \Var{} may be any Racket
 identifier. As in $R_0$, \key{read} is a nullary operator, \key{-} is
 a unary operator, and \key{+} is a binary operator.  Similar to $R_0$,
-the $R_1$ language includes the \key{program} construct to mark the
-top of the program, which is helpful in some of the compiler passes.
-The $\itm{info}$ field of the \key{program} construct contains an
-association list that is used to communicate auxiliary data from one
-compiler pass the next. Despite the simplicity of the $R_1$ language,
-it is rich enough to exhibit several compilation techniques.
+the $R_1$ language includes the \key{Program} struct to mark the top
+of the program. The $\itm{info}$ field of the \key{Program} struct
+contains an \emph{association list} (a list of key-value pairs) that
+is used to communicate auxiliary data from one compiler pass to the
+next. Despite the simplicity of the $R_1$ language, it is rich enough
+to exhibit several compilation techniques.
 
 \begin{figure}[btp]
 \centering
@@ -1024,50 +1043,52 @@ it is rich enough to exhibit several compilation techniques.
 \begin{minipage}{0.96\textwidth}
 \[
 \begin{array}{rcl}
-\Exp &::=& \Int \mid (\key{read}) \mid (\key{-}\;\Exp) \mid (\key{+} \; \Exp\;\Exp)  \\
-     &\mid&  \Var \mid \LET{\Var}{\Exp}{\Exp} \\
-R_1  &::=& (\key{program} \;\itm{info}\; \Exp)
+\Exp &::=& \INT{\Int} \mid \READ{} \mid \NEG{\Exp} \\
+     &\mid& \ADD{\Exp}{\Exp}  
+     \mid  \VAR{\Var} \mid \LET{\Var}{\Exp}{\Exp} \\
+R_1  &::=& \PROGRAM{\code{'()}}{\Exp}
 \end{array}
 \]
 \end{minipage}
 }
-\caption{The syntax of $R_1$, a language of integers and variables.}
+\caption{The abstract syntax of $R_1$, a language of integers and variables.}
 \label{fig:r1-syntax}
 \end{figure}
 
 Let us dive further into the syntax and semantics of the $R_1$
-language.  The \key{let} construct defines a variable for use within
-its body and initializes the variable with the value of an expression.
-So the following program initializes \code{x} to \code{32} and then
-evaluates the body \code{(+ 10 x)}, producing \code{42}.
+language.  The \key{Let} feature defines a variable for use within its
+body and initializes the variable with the value of an expression.
+The abstract syntax for \key{Let} is defined in Figure~\ref{fig:r1-syntax}.
+The concrete syntax for \key{Let} is
 \begin{lstlisting}
-(program ()
-   (let ([x (+ 12 20)]) (+ 10 x)))
+(let ([|$\itm{var}$| |$\itm{exp}$|]) |$\itm{exp}$|)
+\end{lstlisting}
+For example, the following program initializes \code{x} to $32$ and then
+evaluates the body \code{(+ 10 x)}, producing $42$.
+\begin{lstlisting}
+(let ([x (+ 12 20)]) (+ 10 x))
 \end{lstlisting}
 When there are multiple \key{let}'s for the same variable, the closest
 enclosing \key{let} is used. That is, variable definitions overshadow
 prior definitions. Consider the following program with two \key{let}'s
 that define variables named \code{x}. Can you figure out the result?
 \begin{lstlisting}
-(program ()
-   (let ([x 32]) (+ (let ([x 10]) x) x)))
+(let ([x 32]) (+ (let ([x 10]) x) x))
 \end{lstlisting}
-For the purposes of showing which variable uses correspond to which
-definitions, the following shows the \code{x}'s annotated with subscripts
-to distinguish them. Double check that your answer for the above is
-the same as your answer for this annotated version of the program.
+For the purposes of depicting which variable uses correspond to which
+definitions, the following shows the \code{x}'s annotated with
+subscripts to distinguish them. Double check that your answer for the
+above is the same as your answer for this annotated version of the
+program.
 \begin{lstlisting}
-(program ()
-   (let ([x|$_1$| 32]) (+ (let ([x|$_2$| 10]) x|$_2$|) x|$_1$|)))
+(let ([x|$_1$| 32]) (+ (let ([x|$_2$| 10]) x|$_2$|) x|$_1$|))
 \end{lstlisting}
 The initializing expression is always evaluated before the body of the
 \key{let}, so in the following, the \key{read} for \code{x} is
 performed before the \key{read} for \code{y}. Given the input
-\code{52} then \code{10}, the following produces \code{42} (and not
-\code{-42}).
+$52$ then $10$, the following produces $42$ (not $-42$).
 \begin{lstlisting}
-(program ()
-  (let ([x (read)]) (let ([y (read)]) (+ x (- y)))))
+(let ([x (read)]) (let ([y (read)]) (+ x (- y))))
 \end{lstlisting}
 
 Figure~\ref{fig:interp-R1} shows the definitional interpreter for the
@@ -1081,37 +1102,37 @@ environment. The \code{interp-R1} function takes the current
 environment, \code{env}, as an extra parameter.  When the interpreter
 encounters a variable, it finds the corresponding value using the
 \code{lookup} function (Appendix~\ref{appendix:utilities}).  When the
-interpreter encounters a \key{let}, it evaluates the initializing
+interpreter encounters a \key{Let}, it evaluates the initializing
 expression, extends the environment with the result value bound to the
-variable, then evaluates the body of the \key{let}.
+variable, then evaluates the body of the \key{Let}.
 
 \begin{figure}[tbp]
 \begin{lstlisting}
 (define (interp-exp env)
   (lambda (e)
     (match e
-      [(? fixnum?) e]
-      [`(read)
+      [(Int n) n]
+      [(Prim 'read '())
        (define r (read))
        (cond [(fixnum? r) r]
              [else (error 'interp-R1 "expected an integer" r)])]
-      [`(- ,e)
+      [(Prim '- (list e))
        (define v ((interp-exp env) e))
        (fx- 0 v)]
-      [`(+ ,e1 ,e2)
+      [(Prim '+ (list e1 e2))
        (define v1 ((interp-exp env) e1))
        (define v2 ((interp-exp env) e2))
        (fx+ v1 v2)]
-      [(? symbol?) (lookup e env)]
-      [`(let ([,x ,e]) ,body)
+      [(Var x) (lookup x env)]
+      [(Let x e body)
        (define new-env (cons (cons x ((interp-exp env) e)) env))
        ((interp-exp new-env) body)]
       )))
 
-   (define (interp-R1 env)
-     (lambda (p)
-       (match p
-         [`(program ,info ,e) ((interp-exp '()) e)])))
+(define (interp-R1 p)
+  (match p
+    [(Program info e) ((interp-exp '()) e)]
+    ))
 \end{lstlisting}
 \caption{Interpreter for the $R_1$ language.}
 \label{fig:interp-R1}
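
To check the struct-based interpreter against the three example programs discussed in this diff, the following is a minimal, self-contained sketch. It makes several assumptions that the diff does not confirm: the field names of the structs are hypothetical, `lookup` is a simple stand-in for the utility referenced from the book's appendix, and the names `p1`, `p2`, and `p3` are introduced here only for illustration.

```racket
#lang racket
(require racket/fixnum)

;; AST structures. The struct names follow the diff; the field names
;; are assumptions made for this sketch.
(struct Int (value))
(struct Prim (op arg*))
(struct Var (name))
(struct Let (var rhs body))
(struct Program (info body))

;; Stand-in for the book's lookup utility (an assumption): find the
;; value bound to x in an association list of (name . value) pairs.
(define (lookup x env)
  (cond [(assq x env) => cdr]
        [else (error 'lookup "unbound variable" x)]))

(define (interp-exp env)
  (lambda (e)
    (match e
      [(Int n) n]
      [(Prim 'read '())
       (define r (read))
       (cond [(fixnum? r) r]
             [else (error 'interp-R1 "expected an integer" r)])]
      [(Prim '- (list e1))
       (define v ((interp-exp env) e1))
       (fx- 0 v)]
      [(Prim '+ (list e1 e2))
       (define v1 ((interp-exp env) e1))
       (define v2 ((interp-exp env) e2))
       (fx+ v1 v2)]
      [(Var x) (lookup x env)]
      [(Let x rhs body)
       ;; Evaluate the initializer first, then the body in the
       ;; environment extended with the new binding.
       (define new-env (cons (cons x ((interp-exp env) rhs)) env))
       ((interp-exp new-env) body)])))

(define (interp-R1 p)
  (match p
    [(Program info e) ((interp-exp '()) e)]))

;; (let ([x (+ 12 20)]) (+ 10 x))
(define p1
  (Program '() (Let 'x (Prim '+ (list (Int 12) (Int 20)))
                    (Prim '+ (list (Int 10) (Var 'x))))))

;; (let ([x 32]) (+ (let ([x 10]) x) x))
(define p2
  (Program '() (Let 'x (Int 32)
                    (Prim '+ (list (Let 'x (Int 10) (Var 'x))
                                   (Var 'x))))))

;; (let ([x (read)]) (let ([y (read)]) (+ x (- y))))
(define p3
  (Program '() (Let 'x (Prim 'read '())
                    (Let 'y (Prim 'read '())
                         (Prim '+ (list (Var 'x)
                                        (Prim '- (list (Var 'y)))))))))

(displayln (interp-R1 p1)) ; prints 42
(displayln (interp-R1 p2)) ; prints 42
;; Supply the inputs 52 then 10 through a string port.
(displayln (with-input-from-string "52 10"
             (lambda () (interp-R1 p3)))) ; prints 42
```

In `p2` the inner `x` is bound to 10 and the outer `x` to 32, so the sum is 42, which matches the subscript-annotated version of the program. In `p3` the first `read` supplies `x`, which is why the result is 42 and not -42.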

+ 4 - 3
defs.tex

@@ -17,20 +17,21 @@
 \newcommand{\Op}{\itm{op}}
 \newcommand{\key}[1]{\texttt{#1}}
 \newcommand{\code}[1]{\texttt{#1}}
+
+\newcommand{\INT}[1]{\key{(Int}\;#1\key{)}}
 \newcommand{\READ}{\key{(Prim}\;\code{'read}\;\key{'())}}
 \newcommand{\NEG}[1]{\key{(Prim}\;\code{'-}\;\code{(list}\;#1\;\code{))}}
 \newcommand{\PROGRAM}[2]{\code{(Program}\;#1\;#2\code{)}}
 \newcommand{\ADD}[2]{\key{(Prim}\;\code{'+}\;\code{(list}\;#1\;#2\code{))}}
 \newcommand{\UNIOP}[2]{(\key{#1}~#2)}
 \newcommand{\BINOP}[3]{(\key{#1}~#2~#3)}
-\newcommand{\LET}[3]{(\key{let}~([#1\;#2])~#3)}
+\newcommand{\VAR}[1]{\key{(Var}\;#1\key{)}}
+\newcommand{\LET}[3]{\key{(Let}~#1~#2~#3\key{)}}
 
 \newcommand{\ASSIGN}[2]{(\key{assign}~#1\;#2)}
 \newcommand{\RETURN}[1]{(\key{return}~#1)}
 
-\newcommand{\INT}[1]{\key{(Int}\;#1\key{)}}
 \newcommand{\REG}[1]{\key{(Reg}\;#1\key{)}}
-\newcommand{\VAR}[1]{\key{(Var}\;#1\key{)}}
 \newcommand{\STACKLOC}[1]{(\key{stack}\;#1)}
 
 \newcommand{\IF}[3]{(\key{if}\,#1\;#2\;#3)}