Jeremy Siek 4 years ago
parent
commit
3f8e49d918
2 changed files with 85 additions and 87 deletions
  1. 71 73
      book.tex
  2. 14 14
      defs.tex

+ 71 - 73
book.tex

@@ -40,7 +40,7 @@
 \usepackage{amsmath}
 \usepackage{amsthm}
 \usepackage{amssymb}
-\usepackage{natbib}
+\usepackage[numbers]{natbib}
 \usepackage{stmaryrd}
 \usepackage{xypic}
 \usepackage{semantic}
@@ -203,26 +203,27 @@ During that time, another graduate student named Abdulaziz Ghuloum
 observed that the front-to-back organization of the course made it
 difficult for students to understand the rationale for the compiler
 design. Ghuloum proposed an incremental approach in which the students
-build the compiler in stages; they start by implementing a complete
-compiler for a very small subset of the input language and in each
-subsequent stage they add a language feature and add or modify passes
-to handle the new feature~\citep{Ghuloum:2006bh}.  In this way, the
-students see how the language features motivate aspects of the
-compiler design.
+start by implementing a complete compiler for a very small subset of
+the language. In each subsequent stage they add a feature to the
+language and then add or modify passes to handle the new
+feature~\citep{Ghuloum:2006bh}.  In this way, the students see how the
+language features motivate aspects of the compiler design.
 
 After graduating from Indiana University in 2005, I went on to teach
 at the University of Colorado. I adapted the nano-pass and incremental
 approaches to compiling a subset of the Python
-language~\citep{Siek:2012ab}.  Python and Scheme are quite different
-on the surface but there is a large overlap in the compiler techniques
-required for the two languages. Thus, I was able to teach much of the
-same content from the Indiana compiler course. I very much enjoyed
-teaching the course organized in this way, and even better, many of
-the students learned a lot and got excited about compilers.
+language~\citep{Siek:2012ab}.
+%% Python and Scheme are quite different
+%% on the surface but there is a large overlap in the compiler techniques
+%% required for the two languages. Thus, I was able to teach much of the
+%% same content from the Indiana compiler course.
+I very much enjoyed teaching the course organized in this way, and
+even better, many of the students learned a lot and got excited about
+compilers.
 
 I returned to Indiana University in 2013.  In my absence the compiler
 course had switched from the front-to-back organization to a
-back-to-front~\cite{Dybvig:2010aa}. While that organization also works
+back-to-front~\citep{Dybvig:2010aa}. While that organization also works
 well, I prefer the incremental approach and started porting and
 adapting the structure of the Colorado course back into the land of
 Scheme. In the meantime Indiana University had moved on from Scheme to
@@ -250,8 +251,8 @@ prepare students for a lifelong career in programming languages.
 
 The book uses the Racket language both for the implementation of the
 compiler and for the language that is compiled, so a student should be
-proficient with Racket (or Scheme) prior to reading this book. There
-are many excellent resources for learning Scheme and
+proficient with Racket or Scheme prior to reading this book. There are
+many excellent resources for learning Scheme and
 Racket~\citep{Dybvig:1987aa,Abelson:1996uq,Friedman:1996aa,Felleisen:2001aa,Felleisen:2013aa,Flatt:2014aa}.
 
 It is helpful but not necessary for the student to have prior exposure
@@ -267,8 +268,8 @@ system (written in C) when it is compiled using the GNU C compiler
 (\code{gcc}) on the Linux and MacOS operating systems. (Minor
 adjustments are needed for MacOS which we note as they arise.)
 %
-When running on the Microsoft Windows operating system, the GNU C
-compiler follows the Microsoft x64 calling
+The GNU C compiler, when running on the Microsoft Windows operating
+system, follows the Microsoft x64 calling
 convention~\citep{Microsoft:2018aa,Microsoft:2020aa}. So the assembly
 code that we generate will \emph{not} work properly with our runtime
 system on Windows. One option to consider for using a Windows computer
@@ -329,7 +330,7 @@ feature to represent ASTs (Section~\ref{sec:ast}). We use grammars to
 define the abstract syntax of programming languages
 (Section~\ref{sec:grammar}) and pattern matching to inspect individual
 nodes in an AST (Section~\ref{sec:pattern-matching}).  We use
-recursive functions to construct and deconstruct entire ASTs
+recursive functions to construct and deconstruct ASTs
 (Section~\ref{sec:recursion}).  This chapter provides a brief
 introduction to these ideas.  \index{struct}
 
@@ -340,7 +341,7 @@ Compilers use abstract syntax trees to represent programs because they
 often need to ask questions like: for a given part of a program, what
 kind of language feature is it? What are its sub-parts? Consider the
 program on the left and its AST on the right. This program is an
-addition and it has two sub-parts, a read operation and a
+addition operation and it has two sub-parts, a read operation and a
 negation. The negation has another sub-part, the integer constant
 \code{8}. By using a tree to represent the program, we can easily
 follow the links to go from one part of a program to its sub-parts.
@@ -480,8 +481,8 @@ by using a single structure.
 When compiling a program such as \eqref{eq:arith-prog}, we need to
 know that the operation associated with the root node is addition and
 we need to be able to access its two children. Racket provides pattern
-matching over structures to support these kinds of queries, as we
-see in Section~\ref{sec:pattern-matching}.
+matching to support these kinds of queries, as we see in
+Section~\ref{sec:pattern-matching}.
 
 In this book, we often write down the concrete syntax of a program
 even when we really have in mind the AST because the concrete syntax
@@ -539,8 +540,8 @@ an input integer from the user of the program.
   \Exp ::= \READ{} \label{eq:arith-read}
 \end{equation}
 
-The third rule says that, given an $\Exp$ node, you can build another
-$\Exp$ node by negating it.
+The third rule says that, given an $\Exp$ node, the negation of that
+node is also an $\Exp$.
 \begin{equation}
   \Exp ::= \NEG{\Exp}  \label{eq:arith-neg}
 \end{equation}
@@ -549,9 +550,10 @@ Symbols in typewriter font such as \key{-} and \key{read} are
 the rule to be applicable.
 \index{terminal}
 
-We can apply these rules to build ASTs in the \LangInt{} language. By rule
-\eqref{eq:arith-int}, \texttt{(Int 8)} is an $\Exp$, then by rule
-\eqref{eq:arith-neg}, the following AST is an $\Exp$.
+We can apply these rules to categorize the ASTs that are in the
+\LangInt{} language. For example, by rule \eqref{eq:arith-int}
+\texttt{(Int 8)} is an $\Exp$, and then by rule \eqref{eq:arith-neg} the
+following AST is an $\Exp$.
 \begin{center}
 \begin{minipage}{0.4\textwidth}
 \begin{lstlisting}
@@ -571,7 +573,7 @@ We can apply these rules to build ASTs in the \LangInt{} language. By rule
 \end{minipage}
 \end{center}
 
-The next grammar rule defines addition expressions:
+The next grammar rule is for addition expressions:
 \begin{equation}
   \Exp ::= \ADD{\Exp}{\Exp} \label{eq:arith-add}
 \end{equation}
@@ -679,16 +681,14 @@ the output on the right. \index{match} \index{pattern matching}
 \end{lstlisting}
 \end{minipage}
 \end{center}
-In the above example, the \texttt{match} form takes the AST
+In the above example, the \texttt{match} form takes an AST
 \eqref{eq:arith-prog} and binds its parts to the three pattern
-variables \texttt{op}, \texttt{child1}, and \texttt{child2}. In
-general, a match clause consists of a \emph{pattern} and a
-\emph{body}.
-\index{pattern}
-Patterns are recursively defined to be either a pattern
-variable, a structure name followed by a pattern for each of the
-structure's arguments, or an S-expression (symbols, lists, etc.).
-(See Chapter 12 of The Racket
+variables \texttt{op}, \texttt{child1}, and \texttt{child2}, and then
+prints out the operator. In general, a match clause consists of a
+\emph{pattern} and a \emph{body}.\index{pattern} Patterns are
+recursively defined to be either a pattern variable, a structure name
+followed by a pattern for each of the structure's arguments, or an
+S-expression (symbols, lists, etc.).  (See Chapter 12 of The Racket
 Guide\footnote{\url{https://docs.racket-lang.org/guide/match.html}}
 and Chapter 9 of The Racket
 Reference\footnote{\url{https://docs.racket-lang.org/reference/match.html}}
@@ -767,7 +767,7 @@ it is defined using a sequence of match clauses that correspond to a
 grammar, and the body of each clause makes a recursive call on each
 child node.\footnote{This principle of structuring code according to
   the data definition is advocated in the book \emph{How to Design
-    Programs}\url{http://www.ccs.neu.edu/home/matthias/HtDP2e/}.}.
+    Programs} \url{http://www.ccs.neu.edu/home/matthias/HtDP2e/}.}
 Below we also define a second function, named \code{Rint?}, that
 determines whether an AST is an \LangInt{} program.  In general we can
 expect to write one recursive function to handle each non-terminal in
@@ -841,7 +841,7 @@ it comes to the \code{Program} wrapper.  Yet this style is generally
 %
 For example, the above function is subtly wrong:
 \lstinline{(Rint? (Program '() (Program '() (Int 3))))}
-would return true, when it should return false.
+returns true when it should return false.
 
 
 \section{Interpreters}
@@ -878,13 +878,11 @@ function, which in turn has one match clause per grammar rule for
     [(Prim '+ (list e1 e2))
      (define v1 (interp-exp e1))
      (define v2 (interp-exp e2))
-     (fx+ v1 v2)]
-    ))
+     (fx+ v1 v2)]))
 
 (define (interp-Rint p)
   (match p
-    [(Program '() e) (interp-exp e)]
-    ))
+    [(Program '() e) (interp-exp e)]))
 \end{lstlisting}
 \caption{Interpreter for the \LangInt{} language.}
 \label{fig:interp-Rint}
@@ -895,8 +893,12 @@ following program adds two integers.
 \begin{lstlisting}
 (+ 10 32)
 \end{lstlisting}
-The result is \key{42}.  We wrote the above program in concrete syntax,
-whereas the parsed abstract syntax is:
+The result is \key{42}, the answer to life, the universe, and
+everything!\footnote{\emph{The Hitchhiker's Guide to the
+    Galaxy} by Douglas Adams.}
+%
+We wrote the above program in concrete syntax whereas the parsed
+abstract syntax is:
 \begin{lstlisting}
 (Program '() (Prim '+ (list (Int 10) (Int 32))))
 \end{lstlisting}
@@ -926,15 +928,15 @@ It produces an error:
 fx+: result is not a fixnum
 \end{lstlisting}
 We establish the convention that if running the definitional
-interpreter on a program produces an error other than
-\code{trapped-error}, then the meaning of that program is
-\emph{unspecified}\index{unspecified behavior}. That means a compiler
-for the language is under no obligations regarding that program; it
-may or may not produce an executable, and if it does, that executable
-can do anything.  On the other hand, if the error is a
-\code{trapped-error}, then the compiled program is also required to
-report that an error occurred. To signal an error, exit with a return
-code of \code{255}.  The interpreters in chapters
+interpreter on a program produces an error then the meaning of that
+program is \emph{unspecified}\index{unspecified behavior}, unless the
+error is a \code{trapped-error}. A compiler for the language is under
+no obligations regarding programs with unspecified behavior; it does
+not have to produce an executable, and if it does, that executable can
+do anything.  On the other hand, if the error is a
+\code{trapped-error}, then the compiler must produce an executable and
+the executable is required to report that an error occurred. To signal an error,
+exit with a return code of \code{255}.  The interpreters in chapters
 \ref{ch:type-dynamic} and \ref{ch:gradual-typing} use
 \code{trapped-error}.
 
@@ -950,9 +952,7 @@ program \eqref{eq:arith-prog} performs a \key{read} and then subtracts
 \begin{lstlisting}
 (interp-Rint (Program '() ast1.1))
 \end{lstlisting}
-and if the input is \code{50}, then we get the answer to life, the
-universe, and everything: \code{42}!\footnote{\emph{The Hitchhiker's
-    Guide to the Galaxy} by Douglas Adams.}
+and if the input is \code{50}, the result is \code{42}.
 
 We include the \key{read} operation in \LangInt{} so a clever student
 cannot implement a compiler for \LangInt{} that simply runs the interpreter
@@ -962,14 +962,14 @@ first instance of this course.)
 
 The job of a compiler is to translate a program in one language into a
 program in another language so that the output program behaves the
-same way as the input program does according to its definitional
-interpreter. This idea is depicted in the following diagram. Suppose
-we have two languages, $\mathcal{L}_1$ and $\mathcal{L}_2$, and an
-interpreter for each language.  Suppose that the compiler translates
-program $P_1$ in language $\mathcal{L}_1$ into program $P_2$ in
-language $\mathcal{L}_2$.  Then interpreting $P_1$ and $P_2$ on their
-respective interpreters with input $i$ should yield the same output
-$o$.
+same way as the input program does. This idea is depicted in the
+following diagram. Suppose we have two languages, $\mathcal{L}_1$ and
+$\mathcal{L}_2$, and a definitional interpreter for each language.
+Given a compiler that translates from language $\mathcal{L}_1$ to
+$\mathcal{L}_2$ and given any program $P_1$ in $\mathcal{L}_1$, the
+compiler must translate it into some program $P_2$ such that
+interpreting $P_1$ and $P_2$ on their respective interpreters with
+the same input $i$ yields the same output $o$.
 \begin{equation} \label{eq:compile-correct}
 \begin{tikzpicture}[baseline=(current  bounding  box.center)]
  \node (p1) at (0,  0) {$P_1$};
@@ -991,7 +991,7 @@ In this section we consider a compiler that translates \LangInt{} programs
 into \LangInt{} programs that may be more efficient, that is, this compiler
 is an optimizer. This optimizer eagerly computes the parts of the
 program that do not depend on any inputs, a process known as
-\emph{partial evaluation}~\cite{Jones:1993uq}.
+\emph{partial evaluation}~\citep{Jones:1993uq}.
 \index{partial evaluation}
 For example, given the following program
 \begin{lstlisting}
@@ -1028,13 +1028,11 @@ functions is the output of partially evaluating the children.
     [(Int n) (Int n)]
     [(Prim 'read '()) (Prim 'read '())]
     [(Prim '- (list e1)) (pe-neg (pe-exp e1))]
-    [(Prim '+ (list e1 e2)) (pe-add (pe-exp e1) (pe-exp e2))]
-    ))
+    [(Prim '+ (list e1 e2)) (pe-add (pe-exp e1) (pe-exp e2))]))
 
 (define (pe-Rint p)
   (match p
-    [(Program '() e) (Program '() (pe-exp e))]
-    ))
+    [(Program '() e) (Program '() (pe-exp e))]))
 \end{lstlisting}
 \caption{A partial evaluator for \LangInt{} expressions.}
 \label{fig:pe-arith}
@@ -1042,13 +1040,13 @@ functions is the output of partially evaluating the children.
 
 The \texttt{pe-neg} and \texttt{pe-add} functions check whether their
 arguments are integers and if they are, perform the appropriate
-arithmetic.  Otherwise, they create an AST node for the operation
-(either negation or addition).
+arithmetic.  Otherwise, they create an AST node for the arithmetic
+operation.
 
 To gain some confidence that the partial evaluator is correct, we can
 test whether it produces programs that get the same result as the
 input programs. That is, we can test whether it satisfies Diagram
-\eqref{eq:compile-correct}. The following code runs the partial
+\ref{eq:compile-correct}. The following code runs the partial
 evaluator on several examples and tests the output program.  The
 \texttt{parse-program} and \texttt{assert} functions are defined in
 Appendix~\ref{appendix:utilities}.\\

+ 14 - 14
defs.tex

@@ -78,18 +78,18 @@
 \newcommand{\INT}[1]{\key{(Int}\;#1\key{)}}
 \newcommand{\BOOL}[1]{\key{(Bool}\;#1\key{)}}
 \newcommand{\PRIM}[2]{\LP\key{Prim}~#1~\LP #2\RP\RP}
-\newcommand{\READ}{\key{(Prim}\;\code{'read}\;\key{'())}}
+\newcommand{\READ}{\key{(Prim}\;\code{read}\;\key{())}}
 \newcommand{\CREAD}{\key{(read)}}
-\newcommand{\NEG}[1]{\key{(Prim}\;\code{'-}\;\code{(}#1\;\code{))}}
+\newcommand{\NEG}[1]{\key{(Prim}\;\code{-}\;\code{(}#1\code{))}}
 \newcommand{\CNEG}[1]{\LP\key{-}~#1\RP}
 \newcommand{\PROGRAM}[2]{\LP\code{Program}\;#1\;#2\RP}
 \newcommand{\CPROGRAM}[2]{\LP\code{CProgram}\;#1\;#2\RP}
 \newcommand{\XPROGRAM}[2]{\LP\code{X86Program}\;#1\;#2\RP}
 \newcommand{\PROGRAMDEFSEXP}[3]{\code{(ProgramDefsExp}~#1~#2~#3\code{)}}
 \newcommand{\PROGRAMDEFS}[2]{\code{(ProgramDefs}~#1~#2\code{)}}
-\newcommand{\ADD}[2]{\key{(Prim}\;\code{'+}\;\code{(}#1\;#2\code{))}}
+\newcommand{\ADD}[2]{\key{(Prim}\;\code{+}\;\code{(}#1\;#2\code{))}}
 \newcommand{\CADD}[2]{\LP\key{+}~#1~#2\RP}
-\newcommand{\SUB}[2]{\key{(Prim}\;\code{'-}\;\code{(}#1\;#2\code{))}}
+\newcommand{\SUB}[2]{\key{(Prim}\;\code{-}\;\code{(}#1\;#2\code{))}}
 \newcommand{\CSUB}[2]{\LP\key{-}~#1~#2\RP}
 \newcommand{\CWHILE}[2]{\LP\key{while}~#1~#2\RP}
 \newcommand{\WHILE}[2]{\LP\key{WhileLoop}~#1~#2\RP}
@@ -97,9 +97,9 @@
 \newcommand{\BEGIN}[2]{\LP\key{Begin}~#1~#2\RP}
 \newcommand{\CSETBANG}[2]{\LP\key{set!}~#1~#2\RP}
 \newcommand{\SETBANG}[2]{\LP\key{SetBang}~#1~#2\RP}
-\newcommand{\AND}[2]{\key{(Prim}\;\code{'and}\;\code{(}#1\;#2\code{))}}
-\newcommand{\OR}[2]{\key{(Prim}\;\code{'or}\;\code{(}#1\;#2\code{))}}
-\newcommand{\NOT}[1]{\key{(Prim}\;\code{'not}\;\code{(}#1\;\code{))}}
+\newcommand{\AND}[2]{\key{(Prim}\;\code{and}\;\code{(}#1\;#2\code{))}}
+\newcommand{\OR}[2]{\key{(Prim}\;\code{or}\;\code{(}#1\;#2\code{))}}
+\newcommand{\NOT}[1]{\key{(Prim}\;\code{not}\;\code{(}#1\;\code{))}}
 \newcommand{\UNIOP}[2]{\key{(Prim}\;#1\;\code{(}#2\code{))}}
 \newcommand{\CUNIOP}[2]{\LP #1\;#2 \RP}
 \newcommand{\BINOP}[3]{\key{(Prim}\;#1\;\code{(}#2\;#3\code{))}}
@@ -110,13 +110,13 @@
 \newcommand{\LET}[3]{\key{(Let}~#1~#2~#3\key{)}}
 \newcommand{\IF}[3]{\key{(If}\,#1\;#2\;#3\key{)}}
 \newcommand{\CAST}[3]{\LP\key{Cast}~#1~#2~#3\RP}
-\newcommand{\VECTOR}[1]{\LP\key{Prim}\;\code{'vector}\;\LP\;#1\RP\RP}
-\newcommand{\VECREF}[2]{\LP\key{Prim}\;\code{'vector-ref}\;\LP\;#1\;#2\RP\RP}
-\newcommand{\VECSET}[3]{\LP\key{Prim}\;\code{'vector-set!}\;\LP\;#1\;#2\;#3\RP\RP}
-\newcommand{\VECLEN}[1]{\LP\key{Prim}\;\code{'vector-length}\;\LP\;#1\RP\RP}
-\newcommand{\ANYVECREF}[2]{\LP\key{Prim}\;\code{'any-vector-ref}\;\LP\;#1\;#2\RP\RP}
-\newcommand{\ANYVECSET}[3]{\LP\key{Prim}\;\code{'any-vector-set!}\;\LP\;#1\;#2\;#3\RP\RP}
-\newcommand{\ANYVECLEN}[1]{\LP\key{Prim}\;\code{'any-vector-length}\;\LP\;#1\RP\RP}
+\newcommand{\VECTOR}[1]{\LP\key{Prim}\;\code{vector}\;\LP\;#1\RP\RP}
+\newcommand{\VECREF}[2]{\LP\key{Prim}\;\code{vector-ref}\;\LP\;#1\;#2\RP\RP}
+\newcommand{\VECSET}[3]{\LP\key{Prim}\;\code{vector-set!}\;\LP\;#1\;#2\;#3\RP\RP}
+\newcommand{\VECLEN}[1]{\LP\key{Prim}\;\code{vector-length}\;\LP\;#1\RP\RP}
+\newcommand{\ANYVECREF}[2]{\LP\key{Prim}\;\code{any-vector-ref}\;\LP\;#1\;#2\RP\RP}
+\newcommand{\ANYVECSET}[3]{\LP\key{Prim}\;\code{any-vector-set!}\;\LP\;#1\;#2\;#3\RP\RP}
+\newcommand{\ANYVECLEN}[1]{\LP\key{Prim}\;\code{any-vector-length}\;\LP\;#1\RP\RP}
 \newcommand{\CLOSURE}[2]{\LP\key{Closure}~#1~#2\RP}
 \newcommand{\ALLOC}[2]{\LP\key{Allocate}~#1~#2\RP}
 \newcommand{\ALLOCCLOS}[3]{\LP\key{AllocateClosure}~#1~#2~#3\RP}