updates to chapter 2

Jeremy Siek 4 years ago
commit 946a5e87b0
2 changed files with 85 additions and 63 deletions
  1. book.tex (+81, -60)
  2. defs.tex (+4, -3)

+ 81 - 60
book.tex

@@ -409,6 +409,9 @@ To create an AST node for the integer $8$, we write \code{(Int 8)}.
 \begin{lstlisting}
 (define eight (Int 8))
 \end{lstlisting}
+We say that the value created by \code{(Int 8)} is an
+\emph{instance} of the \code{Int} structure.
+
 The following is the \code{struct} definition for primitive operations.
 \begin{lstlisting}
 (struct Prim (op arg*))
@@ -429,7 +432,21 @@ whereas the addition operator has two children:
 (define ast1.1 (Prim '+ (list rd neg-eight)))
 \end{lstlisting}
 
-When deciding how to compile program \eqref{eq:arith-prog}, we need to
+We have made a design choice regarding the \code{Prim} structure.
+Rather than using one structure for many different operations
+(\code{read}, \code{+}, and \code{-}), we could have defined a
+structure for each operation, as follows.
+\begin{lstlisting}
+(struct Read ())
+(struct Add (left right))
+(struct Neg (value))
+\end{lstlisting}
+We chose a single structure because, in many parts of the compiler,
+the code for the different primitive operators is the same. With one
+structure, that code needs to be written only once instead of once
+per operator.
+
+When compiling a program such as \eqref{eq:arith-prog}, we need to
 know that the operation associated with the root node is addition and
 that it has two children: \texttt{read} and a negation. The AST data
 structure directly supports these queries, as we shall see in
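To illustrate the design choice this hunk describes, here is a minimal Racket sketch of a traversal that treats every primitive operator uniformly. The helper name count-nodes and the field name of Int are assumptions made for illustration; only (struct Prim (op arg*)) comes from the text.

```racket
#lang racket
;; Struct definitions mirroring the chapter. The field name `value`
;; is an assumption; Prim's fields (op arg*) come from the text.
(struct Int (value))
(struct Prim (op arg*))

;; Hypothetical helper: count the nodes in an AST. A single Prim
;; clause covers read, -, and +, because every operator keeps its
;; children in the same arg* list.
(define (count-nodes e)
  (match e
    [(Int n) 1]
    [(Prim op arg*) (+ 1 (apply + (map count-nodes arg*)))]))

;; The AST for (+ (read) (- 8)).
(define ast1.1
  (Prim '+ (list (Prim 'read '()) (Prim '- (list (Int 8))))))

(count-nodes ast1.1) ; => 4
```

With one structure per operation (Read, Add, Neg), the same function would need a separate clause for each structure.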
@@ -455,10 +472,10 @@ Backus-Naur Form (BNF)~\citep{Backus:1960aa,Knuth:1964aa}.  As an
 example, we describe a small language, named $R_0$, that consists of
 integers and arithmetic operations.
 
-The first grammar rule says that given any integer $n$, an integer
-node $\INT{n}$ is an expression:
+The first grammar rule says that an instance of the \code{Int}
+structure is an expression:
 \begin{equation}
-\Exp ::= \INT{n}  \label{eq:arith-int}
+\Exp ::= \INT{\Int}  \label{eq:arith-int}
 \end{equation}
 %
 Each rule has a left-hand-side and a right-hand-side. The way to read
@@ -469,15 +486,17 @@ according to the left-hand-side.
 A name such as $\Exp$ that is
 defined by the grammar rules is a \emph{non-terminal}.
 %
-%% The name $\Int$ is a also a non-terminal, however, we do not define
-%% $\Int$ because the reader already knows what an integer is.
-
-We make the simplifying design decision that all of the languages in
-this book only handle machine-representable integers.  On most modern
+The name $\Int$ is also a non-terminal, but instead of defining it
+with a grammar rule, we define it with the following explanation.  We
+make the simplifying design decision that all of the languages in this
+book only handle machine-representable integers.  On most modern
 machines this corresponds to integers represented with 64 bits, i.e.,
 in the range $-2^{63}$ to $2^{63}-1$.  We restrict this range further
 to match the Racket \texttt{fixnum} datatype, which allows 63-bit
-integers on a 64-bit machine.
+integers on a 64-bit machine. So an $\Int$ is a sequence of decimal
+digits ($0$ to $9$), possibly starting with $-$ (for negative
+integers), such that the sequence represents an integer in the range
+$-2^{62}$ to $2^{62}-1$.
 
 The second grammar rule is the \texttt{read} operation that receives
 an input integer from the user of the program.
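The range restriction on $\Int$ can be written as a small predicate. This is a sketch, not code from the book: the name r0-int? is hypothetical, and the bounds are checked explicitly instead of calling fixnum?, because the width of a fixnum depends on the Racket implementation in use.

```racket
#lang racket
;; Hypothetical predicate: is n an exact integer inside the range
;; -2^62 to 2^62 - 1 stated in the text?
(define (r0-int? n)
  (and (exact-integer? n)
       (<= (- (expt 2 62)) n (- (expt 2 62) 1))))

(r0-int? 42)                 ; => #t
(r0-int? (- (expt 2 62) 1))  ; => #t, the largest allowed value
(r0-int? (expt 2 62))        ; => #f, one past the upper bound
(r0-int? (- (expt 2 62)))    ; => #t, the smallest allowed value
```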
@@ -570,14 +589,14 @@ called an {\em alternative}.
 \begin{minipage}{0.96\textwidth}
 \[
 \begin{array}{rcl}
-\Exp &::=& \INT{n} \mid \READ{} \mid \NEG{\Exp} \\
+\Exp &::=& \INT{\Int} \mid \READ{} \mid \NEG{\Exp} \\
      &\mid&  \ADD{\Exp}{\Exp}  \\
-R_0  &::=& \code{(Program} \; \code{'()}\; \Exp \code{)}
+R_0  &::=& \PROGRAM{\code{'()}}{\Exp}
 \end{array}
 \]
 \end{minipage}
 }
-\caption{The syntax of $R_0$, a language of integer arithmetic.}
+\caption{The abstract syntax of $R_0$, a language of integer arithmetic.}
 \label{fig:r0-syntax}
 \end{figure}
 
@@ -987,18 +1006,18 @@ integer arithmetic and local variable binding, which we name $R_1$, to
 x86-64 assembly code~\citep{Intel:2015aa}.  Henceforth we shall refer
 to x86-64 simply as x86.  The chapter begins with a description of the
 $R_1$ language (Section~\ref{sec:s0}) followed by a description of x86
-(Section~\ref{sec:x86}). The x86 assembly language is quite large, so
-we discuss only what is needed for compiling $R_1$. We introduce more
-of x86 in later chapters. Once we have introduced $R_1$ and x86, we
+(Section~\ref{sec:x86}). The x86 assembly language is large, so we
+discuss only what is needed for compiling $R_1$. We introduce more of
+x86 in later chapters. Once we have introduced $R_1$ and x86, we
 reflect on their differences and come up with a plan to break down the
 translation from $R_1$ to x86 into a handful of steps
 (Section~\ref{sec:plan-s0-x86}).  The rest of the sections in this
 chapter give detailed hints regarding each step
 (Sections~\ref{sec:uniquify-s0} through \ref{sec:patch-s0}).  We hope
-to give enough hints that the well-prepared reader, together with some
-friends, can implement a compiler from $R_1$ to x86 in a couple weeks
-while at the same time leaving room for some fun and creativity.  To
-give the reader a feeling for the scale of this first compiler, the
+to give enough hints that the well-prepared reader, together with a
+few friends, can implement a compiler from $R_1$ to x86 in a couple of
+weeks while at the same time leaving room for some fun and creativity.
+To give the reader a feeling for the scale of this first compiler, the
 instructor solution for the $R_1$ compiler is less than 500 lines of
 code.
 
@@ -1011,12 +1030,12 @@ the $R_1$ language is defined by the grammar in
 Figure~\ref{fig:r1-syntax}.  The non-terminal \Var{} may be any Racket
 identifier. As in $R_0$, \key{read} is a nullary operator, \key{-} is
 a unary operator, and \key{+} is a binary operator.  Similar to $R_0$,
-the $R_1$ language includes the \key{program} construct to mark the
-top of the program, which is helpful in some of the compiler passes.
-The $\itm{info}$ field of the \key{program} construct contains an
-association list that is used to communicate auxiliary data from one
-compiler pass the next. Despite the simplicity of the $R_1$ language,
-it is rich enough to exhibit several compilation techniques.
+the $R_1$ language includes the \key{Program} struct to mark the top
+of the program. The $\itm{info}$ field of the \key{Program} struct
+contains an \emph{association list} (a list of key-value pairs) that
+is used to communicate auxiliary data from one compiler pass to the
+next. Despite the simplicity of the $R_1$ language, it is rich enough
+to exhibit several compilation techniques.
 
 \begin{figure}[btp]
 \centering
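As a rough illustration of how an association list in the info field could carry data between passes, consider the sketch below. The pass name record-locals, the key 'locals, and the struct field names are all assumptions made for this example; the text only says that info holds a list of key-value pairs.

```racket
#lang racket
;; Field names are assumptions made for this sketch.
(struct Int (value))
(struct Program (info body))

;; A program whose info field starts out empty.
(define p0 (Program '() (Int 42)))

;; Hypothetical pass: record a list of variable names under the key
;; 'locals so that a later pass can find it.
(define (record-locals p names)
  (match p
    [(Program info body)
     (Program (cons (cons 'locals names) info) body)]))

(define p1 (record-locals p0 '(x y)))

;; A later pass reads the entry back; dict-ref accepts association lists.
(dict-ref (Program-info p1) 'locals) ; => '(x y)
```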
@@ -1024,50 +1043,52 @@ it is rich enough to exhibit several compilation techniques.
 \begin{minipage}{0.96\textwidth}
 \[
 \begin{array}{rcl}
-\Exp &::=& \Int \mid (\key{read}) \mid (\key{-}\;\Exp) \mid (\key{+} \; \Exp\;\Exp)  \\
-     &\mid&  \Var \mid \LET{\Var}{\Exp}{\Exp} \\
-R_1  &::=& (\key{program} \;\itm{info}\; \Exp)
+\Exp &::=& \INT{\Int} \mid \READ{} \mid \NEG{\Exp} \\
+     &\mid& \ADD{\Exp}{\Exp}
+     \mid  \VAR{\Var} \mid \LET{\Var}{\Exp}{\Exp} \\
+R_1  &::=& \PROGRAM{\code{'()}}{\Exp}
 \end{array}
 \]
 \end{minipage}
 }
-\caption{The syntax of $R_1$, a language of integers and variables.}
+\caption{The abstract syntax of $R_1$, a language of integers and variables.}
 \label{fig:r1-syntax}
 \end{figure}
 
 Let us dive further into the syntax and semantics of the $R_1$
-language.  The \key{let} construct defines a variable for use within
-its body and initializes the variable with the value of an expression.
-So the following program initializes \code{x} to \code{32} and then
-evaluates the body \code{(+ 10 x)}, producing \code{42}.
+language.  The \key{Let} feature defines a variable for use within its
+body and initializes the variable with the value of an expression.
+The abstract syntax for \key{Let} is defined in Figure~\ref{fig:r1-syntax}.
+The concrete syntax for \key{Let} is
 \begin{lstlisting}
-(program ()
-   (let ([x (+ 12 20)]) (+ 10 x)))
+(let ([|$\itm{var}$| |$\itm{exp}$|]) |$\itm{exp}$|)
+\end{lstlisting}
+For example, the following program initializes \code{x} to $32$ and then
+evaluates the body \code{(+ 10 x)}, producing $42$.
+\begin{lstlisting}
+(let ([x (+ 12 20)]) (+ 10 x))
 \end{lstlisting}
 When there are multiple \key{let}'s for the same variable, the closest
 enclosing \key{let} is used. That is, variable definitions overshadow
 prior definitions. Consider the following program with two \key{let}'s
 that define variables named \code{x}. Can you figure out the result?
 \begin{lstlisting}
-(program ()
-   (let ([x 32]) (+ (let ([x 10]) x) x)))
+(let ([x 32]) (+ (let ([x 10]) x) x))
 \end{lstlisting}
-For the purposes of showing which variable uses correspond to which
-definitions, the following shows the \code{x}'s annotated with subscripts
-to distinguish them. Double check that your answer for the above is
-the same as your answer for this annotated version of the program.
+For the purposes of depicting which variable uses correspond to which
+definitions, the following shows the \code{x}'s annotated with
+subscripts to distinguish them. Double check that your answer for the
+above is the same as your answer for this annotated version of the
+program.
 \begin{lstlisting}
-(program ()
-   (let ([x|$_1$| 32]) (+ (let ([x|$_2$| 10]) x|$_2$|) x|$_1$|)))
+(let ([x|$_1$| 32]) (+ (let ([x|$_2$| 10]) x|$_2$|) x|$_1$|))
 \end{lstlisting}
 The initializing expression is always evaluated before the body of the
 \key{let}, so in the following, the \key{read} for \code{x} is
 performed before the \key{read} for \code{y}. Given the input
-\code{52} then \code{10}, the following produces \code{42} (and not
-\code{-42}).
+$52$ then $10$, the following produces $42$ (not $-42$).
 \begin{lstlisting}
-(program ()
-  (let ([x (read)]) (let ([y (read)]) (+ x (- y)))))
+(let ([x (read)]) (let ([y (read)]) (+ x (- y))))
 \end{lstlisting}
 
 Figure~\ref{fig:interp-R1} shows the definitional interpreter for the
@@ -1081,37 +1102,37 @@ environment. The \code{interp-R1} function takes the current
 environment, \code{env}, as an extra parameter.  When the interpreter
 encounters a variable, it finds the corresponding value using the
 \code{lookup} function (Appendix~\ref{appendix:utilities}).  When the
-interpreter encounters a \key{let}, it evaluates the initializing
+interpreter encounters a \key{Let}, it evaluates the initializing
 expression, extends the environment with the result value bound to the
-variable, then evaluates the body of the \key{let}.
+variable, then evaluates the body of the \key{Let}.
 
 \begin{figure}[tbp]
 \begin{lstlisting}
 (define (interp-exp env)
   (lambda (e)
     (match e
-      [(? fixnum?) e]
-      [`(read)
+      [(Int n) n]
+      [(Prim 'read '())
        (define r (read))
        (cond [(fixnum? r) r]
              [else (error 'interp-R1 "expected an integer" r)])]
-      [`(- ,e)
+      [(Prim '- (list e))
        (define v ((interp-exp env) e))
        (fx- 0 v)]
-      [`(+ ,e1 ,e2)
+      [(Prim '+ (list e1 e2))
        (define v1 ((interp-exp env) e1))
        (define v2 ((interp-exp env) e2))
        (fx+ v1 v2)]
-      [(? symbol?) (lookup e env)]
-      [`(let ([,x ,e]) ,body)
+      [(Var x) (lookup x env)]
+      [(Let x e body)
        (define new-env (cons (cons x ((interp-exp env) e)) env))
        ((interp-exp new-env) body)]
       )))
 
-   (define (interp-R1 env)
-     (lambda (p)
-       (match p
-         [`(program ,info ,e) ((interp-exp '()) e)])))
+(define (interp-R1 p)
+  (match p
+    [(Program info e) ((interp-exp '()) e)]
+    ))
 \end{lstlisting}
 \caption{Interpreter for the $R_1$ language.}
 \label{fig:interp-R1}
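The struct-based interpreter can be tried on the chapter's example programs with a self-contained sketch such as the one below. Several pieces are assumptions, not the book's code: the struct field names (other than Prim's), the stand-in lookup (the real one lives in the utilities appendix), and the slightly reworded error message.

```racket
#lang racket
(require racket/fixnum)

;; AST structs; field names other than Prim's (op arg*) are assumptions.
(struct Int (value))
(struct Var (name))
(struct Prim (op arg*))
(struct Let (var rhs body))
(struct Program (info body))

;; Stand-in for the book's lookup utility (assumed behavior: return the
;; value bound to x in an association list, or signal an error).
(define (lookup x env)
  (match (assq x env)
    [(cons _ v) v]
    [#f (error 'lookup "unbound variable: ~a" x)]))

(define (interp-exp env)
  (lambda (e)
    (match e
      [(Int n) n]
      [(Prim 'read '())
       (define r (read))
       (cond [(fixnum? r) r]
             [else (error 'interp-R1 "expected an integer, got ~a" r)])]
      [(Prim '- (list e1)) (fx- 0 ((interp-exp env) e1))]
      [(Prim '+ (list e1 e2))
       (define v1 ((interp-exp env) e1))
       (define v2 ((interp-exp env) e2))
       (fx+ v1 v2)]
      [(Var x) (lookup x env)]
      [(Let x rhs body)
       ;; Evaluate the initializer first, then the body in the extended env.
       (define new-env (cons (cons x ((interp-exp env) rhs)) env))
       ((interp-exp new-env) body)])))

(define (interp-R1 p)
  (match p
    [(Program info e) ((interp-exp '()) e)]))

;; Abstract syntax for (let ([x 32]) (+ (let ([x 10]) x) x)).
(define shadow
  (Program '() (Let 'x (Int 32)
                    (Prim '+ (list (Let 'x (Int 10) (Var 'x))
                                   (Var 'x))))))

;; Abstract syntax for (let ([x (read)]) (let ([y (read)]) (+ x (- y)))).
(define reads
  (Program '() (Let 'x (Prim 'read '())
                    (Let 'y (Prim 'read '())
                         (Prim '+ (list (Var 'x)
                                        (Prim '- (list (Var 'y)))))))))

(interp-R1 shadow)                                              ; => 42
(with-input-from-string "52 10" (lambda () (interp-R1 reads)))  ; => 42
```

The second call supplies the inputs 52 and 10 through a string port, so the result 42 confirms that the read for x happens before the read for y.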

+ 4 - 3
defs.tex

@@ -17,20 +17,21 @@
 \newcommand{\Op}{\itm{op}}
 \newcommand{\key}[1]{\texttt{#1}}
 \newcommand{\code}[1]{\texttt{#1}}
+
+\newcommand{\INT}[1]{\key{(Int}\;#1\key{)}}
 \newcommand{\READ}{\key{(Prim}\;\code{'read}\;\key{'())}}
 \newcommand{\NEG}[1]{\key{(Prim}\;\code{'-}\;\code{(list}\;#1\;\code{))}}
 \newcommand{\PROGRAM}[2]{\code{(Program}\;#1\;#2\code{)}}
 \newcommand{\ADD}[2]{\key{(Prim}\;\code{'+}\;\code{(list}\;#1\;#2\code{))}}
 \newcommand{\UNIOP}[2]{(\key{#1}~#2)}
 \newcommand{\BINOP}[3]{(\key{#1}~#2~#3)}
-\newcommand{\LET}[3]{(\key{let}~([#1\;#2])~#3)}
+\newcommand{\VAR}[1]{\key{(Var}\;#1\key{)}}
+\newcommand{\LET}[3]{\key{(Let}~#1~#2~#3\key{)}}
 
 \newcommand{\ASSIGN}[2]{(\key{assign}~#1\;#2)}
 \newcommand{\RETURN}[1]{(\key{return}~#1)}
 
-\newcommand{\INT}[1]{\key{(Int}\;#1\key{)}}
 \newcommand{\REG}[1]{\key{(Reg}\;#1\key{)}}
-\newcommand{\VAR}[1]{\key{(Var}\;#1\key{)}}
 \newcommand{\STACKLOC}[1]{(\key{stack}\;#1)}
 
 \newcommand{\IF}[3]{(\key{if}\,#1\;#2\;#3)}