|
@@ -40,7 +40,7 @@
|
|
|
\usepackage{amsmath}
|
|
|
\usepackage{amsthm}
|
|
|
\usepackage{amssymb}
|
|
|
-\usepackage{natbib}
|
|
|
+\usepackage[numbers]{natbib}
|
|
|
\usepackage{stmaryrd}
|
|
|
\usepackage{xypic}
|
|
|
\usepackage{semantic}
|
|
@@ -203,26 +203,27 @@ During that time, another graduate student named Abdulaziz Ghuloum
|
|
|
observed that the front-to-back organization of the course made it
|
|
|
difficult for students to understand the rationale for the compiler
|
|
|
design. Ghuloum proposed an incremental approach in which the students
|
|
|
-build the compiler in stages; they start by implementing a complete
|
|
|
-compiler for a very small subset of the input language and in each
|
|
|
-subsequent stage they add a language feature and add or modify passes
|
|
|
-to handle the new feature~\citep{Ghuloum:2006bh}. In this way, the
|
|
|
-students see how the language features motivate aspects of the
|
|
|
-compiler design.
|
|
|
+start by implementing a complete compiler for a very small subset of
|
|
|
+the language. In each subsequent stage they add a feature to the
|
|
|
+language and then add or modify passes to handle the new
|
|
|
+feature~\citep{Ghuloum:2006bh}. In this way, the students see how the
|
|
|
+language features motivate aspects of the compiler design.
|
|
|
|
|
|
After graduating from Indiana University in 2005, I went on to teach
|
|
|
at the University of Colorado. I adapted the nano-pass and incremental
|
|
|
approaches to compiling a subset of the Python
|
|
|
-language~\citep{Siek:2012ab}. Python and Scheme are quite different
|
|
|
-on the surface but there is a large overlap in the compiler techniques
|
|
|
-required for the two languages. Thus, I was able to teach much of the
|
|
|
-same content from the Indiana compiler course. I very much enjoyed
|
|
|
-teaching the course organized in this way, and even better, many of
|
|
|
-the students learned a lot and got excited about compilers.
|
|
|
+language~\citep{Siek:2012ab}.
|
|
|
+%% Python and Scheme are quite different
|
|
|
+%% on the surface but there is a large overlap in the compiler techniques
|
|
|
+%% required for the two languages. Thus, I was able to teach much of the
|
|
|
+%% same content from the Indiana compiler course.
|
|
|
+I very much enjoyed teaching the course organized in this way, and
|
|
|
+even better, many of the students learned a lot and got excited about
|
|
|
+compilers.
|
|
|
|
|
|
I returned to Indiana University in 2013. In my absence the compiler
|
|
|
course had switched from the front-to-back organization to a
|
|
|
-back-to-front~\cite{Dybvig:2010aa}. While that organization also works
|
|
|
+back-to-front one~\citep{Dybvig:2010aa}. While that organization also works
|
|
|
well, I prefer the incremental approach and started porting and
|
|
|
adapting the structure of the Colorado course back into the land of
|
|
|
Scheme. In the meantime Indiana University had moved on from Scheme to
|
|
@@ -250,8 +251,8 @@ prepare students for a lifelong career in programming languages.
|
|
|
|
|
|
The book uses the Racket language both for the implementation of the
|
|
|
compiler and for the language that is compiled, so a student should be
|
|
|
-proficient with Racket (or Scheme) prior to reading this book. There
|
|
|
-are many excellent resources for learning Scheme and
|
|
|
+proficient with Racket or Scheme prior to reading this book. There are
|
|
|
+many excellent resources for learning Scheme and
|
|
|
Racket~\citep{Dybvig:1987aa,Abelson:1996uq,Friedman:1996aa,Felleisen:2001aa,Felleisen:2013aa,Flatt:2014aa}.
|
|
|
|
|
|
It is helpful but not necessary for the student to have prior exposure
|
|
@@ -267,8 +268,8 @@ system (written in C) when it is compiled using the GNU C compiler
|
|
|
(\code{gcc}) on the Linux and MacOS operating systems. (Minor
|
|
|
adjustments are needed for MacOS, which we note as they arise.)
|
|
|
%
|
|
|
-When running on the Microsoft Windows operating system, the GNU C
|
|
|
-compiler follows the Microsoft x64 calling
|
|
|
+The GNU C compiler, when running on the Microsoft Windows operating
|
|
|
+system, follows the Microsoft x64 calling
|
|
|
convention~\citep{Microsoft:2018aa,Microsoft:2020aa}. So the assembly
|
|
|
code that we generate will \emph{not} work properly with our runtime
|
|
|
system on Windows. One option to consider for using a Windows computer
|
|
@@ -329,7 +330,7 @@ feature to represent ASTs (Section~\ref{sec:ast}). We use grammars to
|
|
|
define the abstract syntax of programming languages
|
|
|
(Section~\ref{sec:grammar}) and pattern matching to inspect individual
|
|
|
nodes in an AST (Section~\ref{sec:pattern-matching}). We use
|
|
|
-recursive functions to construct and deconstruct entire ASTs
|
|
|
+recursive functions to construct and deconstruct ASTs
|
|
|
(Section~\ref{sec:recursion}). This chapter provides a brief
|
|
|
introduction to these ideas. \index{struct}
|
|
|
|
|
@@ -340,7 +341,7 @@ Compilers use abstract syntax trees to represent programs because they
|
|
|
often need to ask questions like: for a given part of a program, what
|
|
|
kind of language feature is it? What are its sub-parts? Consider the
|
|
|
program on the left and its AST on the right. This program is an
|
|
|
-addition and it has two sub-parts, a read operation and a
|
|
|
+addition operation and it has two sub-parts, a read operation and a
|
|
|
negation. The negation has another sub-part, the integer constant
|
|
|
\code{8}. By using a tree to represent the program, we can easily
|
|
|
follow the links to go from one part of a program to its sub-parts.
|
|
@@ -480,8 +481,8 @@ by using a single structure.
|
|
|
When compiling a program such as \eqref{eq:arith-prog}, we need to
|
|
|
know that the operation associated with the root node is addition and
|
|
|
we need to be able to access its two children. Racket provides pattern
|
|
|
-matching over structures to support these kinds of queries, as we
|
|
|
-see in Section~\ref{sec:pattern-matching}.
|
|
|
+matching to support these kinds of queries, as we see in
|
|
|
+Section~\ref{sec:pattern-matching}.
|
|
|
|
|
|
In this book, we often write down the concrete syntax of a program
|
|
|
even when we really have in mind the AST because the concrete syntax
|
|
@@ -539,8 +540,8 @@ an input integer from the user of the program.
|
|
|
\Exp ::= \READ{} \label{eq:arith-read}
|
|
|
\end{equation}
|
|
|
|
|
|
-The third rule says that, given an $\Exp$ node, you can build another
|
|
|
-$\Exp$ node by negating it.
|
|
|
+The third rule says that, given an $\Exp$ node, the negation of that
|
|
|
+node is also an $\Exp$.
|
|
|
\begin{equation}
|
|
|
\Exp ::= \NEG{\Exp} \label{eq:arith-neg}
|
|
|
\end{equation}
|
|
@@ -549,9 +550,10 @@ Symbols in typewriter font such as \key{-} and \key{read} are
|
|
|
the rule to be applicable.
|
|
|
\index{terminal}
|
|
|
|
|
|
-We can apply these rules to build ASTs in the \LangInt{} language. By rule
|
|
|
-\eqref{eq:arith-int}, \texttt{(Int 8)} is an $\Exp$, then by rule
|
|
|
-\eqref{eq:arith-neg}, the following AST is an $\Exp$.
|
|
|
+We can apply these rules to categorize the ASTs that are in the
|
|
|
+\LangInt{} language. For example, by rule \eqref{eq:arith-int}
|
|
|
+\texttt{(Int 8)} is an $\Exp$, and then by rule \eqref{eq:arith-neg} the
|
|
|
+following AST is an $\Exp$.
|
|
|
\begin{center}
|
|
|
\begin{minipage}{0.4\textwidth}
|
|
|
\begin{lstlisting}
|
|
@@ -571,7 +573,7 @@ We can apply these rules to build ASTs in the \LangInt{} language. By rule
|
|
|
\end{minipage}
|
|
|
\end{center}
|
|
|
|
|
|
-The next grammar rule defines addition expressions:
|
|
|
+The next grammar rule is for addition expressions:
|
|
|
\begin{equation}
|
|
|
\Exp ::= \ADD{\Exp}{\Exp} \label{eq:arith-add}
|
|
|
\end{equation}
|
|
@@ -679,16 +681,14 @@ the output on the right. \index{match} \index{pattern matching}
|
|
|
\end{lstlisting}
|
|
|
\end{minipage}
|
|
|
\end{center}
|
|
|
-In the above example, the \texttt{match} form takes the AST
|
|
|
+In the above example, the \texttt{match} form takes an AST
|
|
|
\eqref{eq:arith-prog} and binds its parts to the three pattern
|
|
|
-variables \texttt{op}, \texttt{child1}, and \texttt{child2}. In
|
|
|
-general, a match clause consists of a \emph{pattern} and a
|
|
|
-\emph{body}.
|
|
|
-\index{pattern}
|
|
|
-Patterns are recursively defined to be either a pattern
|
|
|
-variable, a structure name followed by a pattern for each of the
|
|
|
-structure's arguments, or an S-expression (symbols, lists, etc.).
|
|
|
-(See Chapter 12 of The Racket
|
|
|
+variables \texttt{op}, \texttt{child1}, and \texttt{child2}, and then
|
|
|
+prints out the operator. In general, a match clause consists of a
|
|
|
+\emph{pattern} and a \emph{body}.\index{pattern} Patterns are
|
|
|
+recursively defined to be either a pattern variable, a structure name
|
|
|
+followed by a pattern for each of the structure's arguments, or an
|
|
|
+S-expression (symbols, lists, etc.). (See Chapter 12 of The Racket
|
|
|
Guide\footnote{\url{https://docs.racket-lang.org/guide/match.html}}
|
|
|
and Chapter 9 of The Racket
|
|
|
Reference\footnote{\url{https://docs.racket-lang.org/reference/match.html}}
|
|
@@ -767,7 +767,7 @@ it is defined using a sequence of match clauses that correspond to a
|
|
|
grammar, and the body of each clause makes a recursive call on each
|
|
|
child node.\footnote{This principle of structuring code according to
|
|
|
the data definition is advocated in the book \emph{How to Design
|
|
|
- Programs}\url{http://www.ccs.neu.edu/home/matthias/HtDP2e/}.}.
|
|
|
+  Programs}, \url{http://www.ccs.neu.edu/home/matthias/HtDP2e/}.}
|
|
|
Below we also define a second function, named \code{Rint?}, that
|
|
|
determines whether an AST is an \LangInt{} program. In general we can
|
|
|
expect to write one recursive function to handle each non-terminal in
|
|
@@ -841,7 +841,7 @@ it comes to the \code{Program} wrapper. Yet this style is generally
|
|
|
%
|
|
|
For example, the above function is subtly wrong:
|
|
|
\lstinline{(Rint? (Program '() (Program '() (Int 3))))}
|
|
|
-would return true, when it should return false.
|
|
|
+returns true when it should return false.
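
The difference can be checked directly. The following is a minimal,
self-contained sketch rather than the book's own code: the structure
declarations and the names \code{Rint-loose?}, \code{exp?}, and
\code{Rint-strict?} are assumptions made for illustration. It contrasts a
merged predicate with one predicate per non-terminal.
\begin{lstlisting}
#lang racket
;; Assumed AST structures, mirroring the node shapes used in this chapter.
(struct Int (value))
(struct Prim (op args))
(struct Program (info body))

;; Merged style (hypothetical name): the Program clause recurs into the
;; same predicate, so a Program nested inside a Program is accepted.
(define (Rint-loose? ast)
  (match ast
    [(Int n) #t]
    [(Prim 'read '()) #t]
    [(Prim '- (list e)) (Rint-loose? e)]
    [(Prim '+ (list e1 e2)) (and (Rint-loose? e1) (Rint-loose? e2))]
    [(Program '() e) (Rint-loose? e)]
    [_ #f]))

;; One predicate per non-terminal: exp? never accepts a Program node.
(define (exp? ast)
  (match ast
    [(Int n) #t]
    [(Prim 'read '()) #t]
    [(Prim '- (list e)) (exp? e)]
    [(Prim '+ (list e1 e2)) (and (exp? e1) (exp? e2))]
    [_ #f]))

(define (Rint-strict? ast)
  (match ast
    [(Program '() e) (exp? e)]
    [_ #f]))

(define nested (Program '() (Program '() (Int 3))))
(Rint-loose? nested)   ; => #t (the subtle bug)
(Rint-strict? nested)  ; => #f
(Rint-strict? (Program '() (Prim '+ (list (Int 10) (Int 32)))))  ; => #t
\end{lstlisting}
Keeping the \code{Program} clause out of the expression predicate is what
rules out the nested wrapper; the cost is a second, very short function.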
|
|
|
|
|
|
|
|
|
\section{Interpreters}
|
|
@@ -878,13 +878,11 @@ function, which in turn has one match clause per grammar rule for
|
|
|
[(Prim '+ (list e1 e2))
|
|
|
(define v1 (interp-exp e1))
|
|
|
(define v2 (interp-exp e2))
|
|
|
- (fx+ v1 v2)]
|
|
|
- ))
|
|
|
+ (fx+ v1 v2)]))
|
|
|
|
|
|
(define (interp-Rint p)
|
|
|
(match p
|
|
|
- [(Program '() e) (interp-exp e)]
|
|
|
- ))
|
|
|
+ [(Program '() e) (interp-exp e)]))
|
|
|
\end{lstlisting}
|
|
|
\caption{Interpreter for the \LangInt{} language.}
|
|
|
\label{fig:interp-Rint}
|
|
@@ -895,8 +893,12 @@ following program adds two integers.
|
|
|
\begin{lstlisting}
|
|
|
(+ 10 32)
|
|
|
\end{lstlisting}
|
|
|
-The result is \key{42}. We wrote the above program in concrete syntax,
|
|
|
-whereas the parsed abstract syntax is:
|
|
|
+The result is \key{42}, the answer to life, the universe, and
|
|
|
+everything!\footnote{\emph{The Hitchhiker's Guide to the
|
|
|
+  Galaxy} by Douglas Adams.}
|
|
|
+%
|
|
|
+We wrote the above program in concrete syntax whereas the parsed
|
|
|
+abstract syntax is:
|
|
|
\begin{lstlisting}
|
|
|
(Program '() (Prim '+ (list (Int 10) (Int 32))))
|
|
|
\end{lstlisting}
|
|
@@ -926,15 +928,15 @@ It produces an error:
|
|
|
fx+: result is not a fixnum
|
|
|
\end{lstlisting}
|
|
|
We establish the convention that if running the definitional
|
|
|
-interpreter on a program produces an error other than
|
|
|
-\code{trapped-error}, then the meaning of that program is
|
|
|
-\emph{unspecified}\index{unspecified behavior}. That means a compiler
|
|
|
-for the language is under no obligations regarding that program; it
|
|
|
-may or may not produce an executable, and if it does, that executable
|
|
|
-can do anything. On the other hand, if the error is a
|
|
|
-\code{trapped-error}, then the compiled program is also required to
|
|
|
-report that an error occurred. To signal an error, exit with a return
|
|
|
-code of \code{255}. The interpreters in chapters
|
|
|
+interpreter on a program produces an error, then the meaning of that
|
|
|
+program is \emph{unspecified}\index{unspecified behavior}, unless the
|
|
|
+error is a \code{trapped-error}. A compiler for the language is under
|
|
|
+no obligation regarding programs with unspecified behavior; it does
|
|
|
+not have to produce an executable, and if it does, that executable can
|
|
|
+do anything. On the other hand, if the error is a
|
|
|
+\code{trapped-error}, then the compiler must produce an executable and
|
|
|
+that executable is required to report that an error occurred. To signal an error,
|
|
|
+exit with a return code of \code{255}. The interpreters in chapters
|
|
|
\ref{ch:type-dynamic} and \ref{ch:gradual-typing} use
|
|
|
\code{trapped-error}.
|
|
|
|
|
@@ -950,9 +952,7 @@ program \eqref{eq:arith-prog} performs a \key{read} and then subtracts
|
|
|
\begin{lstlisting}
|
|
|
(interp-Rint (Program '() ast1.1))
|
|
|
\end{lstlisting}
|
|
|
-and if the input is \code{50}, then we get the answer to life, the
|
|
|
-universe, and everything: \code{42}!\footnote{\emph{The Hitchhiker's
|
|
|
- Guide to the Galaxy} by Douglas Adams.}
|
|
|
+and if the input is \code{50}, the result is \code{42}.
|
|
|
|
|
|
We include the \key{read} operation in \LangInt{} so a clever student
|
|
|
cannot implement a compiler for \LangInt{} that simply runs the interpreter
|
|
@@ -962,14 +962,14 @@ first instance of this course.)
|
|
|
|
|
|
The job of a compiler is to translate a program in one language into a
|
|
|
program in another language so that the output program behaves the
|
|
|
-same way as the input program does according to its definitional
|
|
|
-interpreter. This idea is depicted in the following diagram. Suppose
|
|
|
-we have two languages, $\mathcal{L}_1$ and $\mathcal{L}_2$, and an
|
|
|
-interpreter for each language. Suppose that the compiler translates
|
|
|
-program $P_1$ in language $\mathcal{L}_1$ into program $P_2$ in
|
|
|
-language $\mathcal{L}_2$. Then interpreting $P_1$ and $P_2$ on their
|
|
|
-respective interpreters with input $i$ should yield the same output
|
|
|
-$o$.
|
|
|
+same way as the input program does. This idea is depicted in the
|
|
|
+following diagram. Suppose we have two languages, $\mathcal{L}_1$ and
|
|
|
+$\mathcal{L}_2$, and a definitional interpreter for each language.
|
|
|
+Given a compiler that translates from language $\mathcal{L}_1$ to
|
|
|
+$\mathcal{L}_2$ and given any program $P_1$ in $\mathcal{L}_1$, the
|
|
|
+compiler must translate it into some program $P_2$ such that
|
|
|
+interpreting $P_1$ and $P_2$ on their respective interpreters with
|
|
|
+the same input $i$ yields the same output $o$.
|
|
|
\begin{equation} \label{eq:compile-correct}
|
|
|
\begin{tikzpicture}[baseline=(current bounding box.center)]
|
|
|
\node (p1) at (0, 0) {$P_1$};
|
|
@@ -991,7 +991,7 @@ In this section we consider a compiler that translates \LangInt{} programs
|
|
|
into \LangInt{} programs that may be more efficient; that is, this compiler
|
|
|
is an optimizer. This optimizer eagerly computes the parts of the
|
|
|
program that do not depend on any inputs, a process known as
|
|
|
-\emph{partial evaluation}~\cite{Jones:1993uq}.
|
|
|
+\emph{partial evaluation}~\citep{Jones:1993uq}.
|
|
|
\index{partial evaluation}
|
|
|
For example, given the following program
|
|
|
\begin{lstlisting}
|
|
@@ -1028,13 +1028,11 @@ functions is the output of partially evaluating the children.
|
|
|
[(Int n) (Int n)]
|
|
|
[(Prim 'read '()) (Prim 'read '())]
|
|
|
[(Prim '- (list e1)) (pe-neg (pe-exp e1))]
|
|
|
- [(Prim '+ (list e1 e2)) (pe-add (pe-exp e1) (pe-exp e2))]
|
|
|
- ))
|
|
|
+ [(Prim '+ (list e1 e2)) (pe-add (pe-exp e1) (pe-exp e2))]))
|
|
|
|
|
|
(define (pe-Rint p)
|
|
|
(match p
|
|
|
- [(Program '() e) (Program '() (pe-exp e))]
|
|
|
- ))
|
|
|
+ [(Program '() e) (Program '() (pe-exp e))]))
|
|
|
\end{lstlisting}
|
|
|
\caption{A partial evaluator for \LangInt{} expressions.}
|
|
|
\label{fig:pe-arith}
|
|
@@ -1042,13 +1040,13 @@ functions is the output of partially evaluating the children.
|
|
|
|
|
|
The \texttt{pe-neg} and \texttt{pe-add} functions check whether their
|
|
|
arguments are integers and if they are, perform the appropriate
|
|
|
-arithmetic. Otherwise, they create an AST node for the operation
|
|
|
-(either negation or addition).
|
|
|
+arithmetic. Otherwise, they create an AST node for the arithmetic
|
|
|
+operation.
|
|
|
|
|
|
To gain some confidence that the partial evaluator is correct, we can
|
|
|
test whether it produces programs that get the same result as the
|
|
|
input programs. That is, we can test whether it satisfies Diagram
|
|
|
-\eqref{eq:compile-correct}. The following code runs the partial
|
|
|
+\ref{eq:compile-correct}. The following code runs the partial
|
|
|
evaluator on several examples and tests the output program. The
|
|
|
\texttt{parse-program} and \texttt{assert} functions are defined in
|
|
|
Appendix~\ref{appendix:utilities}.\\
|