|
@@ -169,45 +169,42 @@ University.
|
|
|
There is a magical moment when a programmer presses the ``run'' button
|
|
|
and the software begins to execute. Somehow a program written in a
|
|
|
high-level language is running on a computer that is only capable of
|
|
|
-shuffling bits. This book reveals the wizardry that makes that
|
|
|
-transformation possible. Beginning with the groundbreaking work of
|
|
|
-Backus and colleagues in the 1950s, computer scientists discovered
|
|
|
-techniques for constructing programs, called \emph{compilers}, that
|
|
|
-automatically translate high-level programs into machine code.
|
|
|
+shuffling bits. This book reveals the wizardry that makes that moment
|
|
|
+possible. Beginning with the groundbreaking work of Backus and
|
|
|
+colleagues in the 1950s, computer scientists discovered techniques for
|
|
|
+constructing programs, called \emph{compilers}, that automatically
|
|
|
+translate high-level programs into machine code.
|
|
|
|
|
|
-This book guides the reader on a journey, constructing their own
|
|
|
+This book guides the reader on the journey of constructing their own
|
|
|
compiler for a small but powerful language. Along the way the reader
|
|
|
learns the essential concepts, algorithms, and data structures that
|
|
|
-underlie modern compilers. They develop a clear understanding of how
|
|
|
+underlie modern compilers. They develop an understanding of how
|
|
|
programs are mapped onto computer hardware, which is helpful when
|
|
|
reasoning about execution time, debugging errors across layers of the
|
|
|
-software stack, and understanding security vulnerabilities in a piece
|
|
|
-of code.
|
|
|
+software stack, and finding security vulnerabilities.
|
|
|
%
|
|
|
For readers interested in a career in compiler construction, this book
|
|
|
-serves as stepping-stone to more advanced topics such as just-in-time
|
|
|
+is a stepping-stone to advanced topics such as just-in-time
|
|
|
compilation, program analysis, and program optimization.
|
|
|
%
|
|
|
-For readers interested in the creation of programming languages, this
|
|
|
-book connects language design choices to their impact on compiler
|
|
|
-organization and the generated code.
|
|
|
+For readers interested in the design of programming languages, this
|
|
|
+book connects language design choices to their impact on the compiler
|
|
|
+and generated code.
|
|
|
|
|
|
-Compilers are typically organized into a pipeline with a handful of
|
|
|
-stages, called passes, that translate a program into lower-level
|
|
|
-abstractions. We take this approach to the extreme by splitting the
|
|
|
+A compiler is typically organized as a pipeline with a handful of
|
|
|
+passes that translate a program into ever lower levels of
|
|
|
+abstraction. We take this approach to the extreme by partitioning our
|
|
|
compiler into a large number of \emph{nanopasses}, each of which
|
|
|
performs a single task. This makes the compiler easier to debug,
|
|
|
because we test the output of each pass, and it makes the compiler
|
|
|
easier to understand, because each pass involves fewer concepts.
|
|
|
|
|
|
-Most books about compiler construction are structured in the same way
|
|
|
-as the compiler, with each chapter describing how to construct one
|
|
|
-pass. The problem with that structure is that it becomes easy to lose
|
|
|
-sight of which features of the input language motivate the design
|
|
|
-choices in a particular pass of the compiler. We instead take an
|
|
|
-\emph{incremental} approach in which we build a complete compiler in
|
|
|
-each chapter, starting with a tiny language and adding new features in
|
|
|
-each subsequent chapter.
|
|
|
+Most books about compiler construction are structured like the
|
|
|
+compiler, with each chapter describing one pass. The problem with that
|
|
|
+structure is that it obscures how language features motivate design
|
|
|
+choices in the compiler. We take an \emph{incremental} approach in
|
|
|
+which we build a complete compiler in each chapter, starting with a
|
|
|
+tiny language and adding new features in subsequent chapters.
|
|
|
|
|
|
Our choice of language features is designed to elicit the fundamental
|
|
|
concepts and algorithms used in compilers for modern programming
|
|
@@ -221,19 +218,21 @@ languages.
|
|
|
Chapter~\ref{ch:register-allocation-Rvar} we apply \emph{graph
|
|
|
coloring} to assign variables to registers.
|
|
|
\item Chapter~\ref{ch:Rif} adds conditional control flow, which
|
|
|
- motivates the need for \emph{control-flow graphs}.
|
|
|
+ motivates an elegant recursive algorithm for mapping expressions to
|
|
|
+ \emph{control-flow graphs}.
|
|
|
\item Chapter~\ref{ch:Rvec} adds heap-allocated tuples, motivating
|
|
|
\emph{garbage collection}.
|
|
|
-\item Chapter~\ref{ch:Rfun} adds functions similar those in the C
|
|
|
- programming language~\citep{Kernighan:1988nx}: first-class values
|
|
|
- without lexical scoping. The reader learns about the procedure call
|
|
|
- stack, \emph{calling conventions}, and their interaction with
|
|
|
- register allocation and garbage collection.
|
|
|
+\item Chapter~\ref{ch:Rfun} adds functions that are first-class values
|
|
|
+ but lack lexical scoping, similar to the C programming
|
|
|
+ language~\citep{Kernighan:1988nx} except that we generate efficient
|
|
|
+ tail calls. The reader learns about the procedure call stack,
|
|
|
+ \emph{calling conventions}, and their interaction with register
|
|
|
+ allocation and garbage collection.
|
|
|
\item Chapter~\ref{ch:Rlam} adds anonymous functions with lexical
|
|
|
scoping, i.e., \emph{lambda abstraction}. The reader learns about
|
|
|
\emph{closure conversion}, in which lambdas are translated into a
|
|
|
combination of functions and tuples.
|
|
|
-\item Chapter~\ref{ch:Rdyn} adds \emph{dynamic typing}. Up until this
|
|
|
+\item Chapter~\ref{ch:Rdyn} adds \emph{dynamic typing}. Prior to this
|
|
|
point the input languages are statically typed. The reader extends
|
|
|
the statically typed language with an \code{Any} type which serves
|
|
|
as a target for compiling the dynamically typed language.
|
|
@@ -250,31 +249,31 @@ languages.
|
|
|
leveraging the \code{Any} type and type casts developed in Chapters
|
|
|
\ref{ch:Rdyn} and \ref{ch:Rgrad}.
|
|
|
\end{itemize}
|
|
|
-Alas, there are many language features that we do not include. Our
|
|
|
-choices are informed by a cost-benefit analysis in which we weigh the
|
|
|
-incidental complexity of a feature against the number of fundamental
|
|
|
+There are many language features that we do not include. Our choices
|
|
|
+weigh the incidental complexity of a feature against the fundamental
|
|
|
concepts that it exposes. For example, we include tuples and not
|
|
|
records because they both elicit the study of heap allocation and
|
|
|
garbage collection but records come with more incidental complexity.
|
|
|
|
|
|
Since 2016 this book has served as the textbook for the compiler
|
|
|
course at Indiana University, a 16-week course for upper-level
|
|
|
-undergraduates and first-year graduate students. Prior to this
|
|
|
-course, students learn to program in both imperative and functional
|
|
|
-languages, study data structures and algorithms, and take discrete
|
|
|
-mathematics.
|
|
|
-%
|
|
|
-The students form groups of 2-4 people and complete one chapter every
|
|
|
-two weeks, starting with Chapter~\ref{ch:Rvar} and finishing with
|
|
|
-Chapter~\ref{ch:Rdyn}. Most chapters include a challenge problem that
|
|
|
-we assign to the graduate students. The last two weeks of the course
|
|
|
-are reserved for a final project in which students design and
|
|
|
-implement an extension to the compiler of their choosing.
|
|
|
-Chapters~\ref{ch:Rwhile}, \ref{ch:Rgrad}, and \ref{ch:Rpoly} can be
|
|
|
-used in support of these projects or can be swapped in to replace some
|
|
|
-of the earlier chapters. For example, a course with an emphasis on
|
|
|
-statically-typed imperative languages would skip Chapter~\ref{ch:Rdyn}
|
|
|
-in favor of
|
|
|
+undergraduates and first-year graduate students.
|
|
|
+%
|
|
|
+Prior to this course, students learn to program in both imperative and
|
|
|
+functional languages, study data structures and algorithms, and take
|
|
|
+discrete mathematics.
|
|
|
+%
|
|
|
+At the beginning of the course, students form groups of 2--4 people.
|
|
|
+The groups complete one chapter every two weeks, starting with
|
|
|
+Chapter~\ref{ch:Rvar} and finishing with Chapter~\ref{ch:Rdyn}. Many
|
|
|
+chapters include a challenge problem that we assign to the graduate
|
|
|
+students. The last two weeks of the course involve a final project in
|
|
|
+which students design and implement a compiler extension of their
|
|
|
+choosing. Chapters~\ref{ch:Rwhile}, \ref{ch:Rgrad}, and
|
|
|
+\ref{ch:Rpoly} can be used in support of these projects or they can
|
|
|
+replace some of the earlier chapters. For example, a course with an
|
|
|
+emphasis on statically typed imperative languages would skip
|
|
|
+Chapter~\ref{ch:Rdyn} in favor of
|
|
|
Chapter~\ref{ch:Rwhile}. Figure~\ref{fig:chapter-dependences} depicts
|
|
|
the dependencies between chapters.
|
|
|
|
|
@@ -285,17 +284,17 @@ University of Massachusetts Lowell.
|
|
|
|
|
|
\begin{figure}[tp]
|
|
|
\begin{tikzpicture}[baseline=(current bounding box.center)]
|
|
|
- \node (C1) at (0,1) {\small Ch.~\ref{ch:trees-recur} Preliminaries};
|
|
|
- \node (C2) at (4,1) {\small Ch.~\ref{ch:Rvar} Variables};
|
|
|
- \node (C3) at (8,1) {\small Ch.~\ref{ch:register-allocation-Rvar} Registers};
|
|
|
+ \node (C1) at (0,1.5) {\small Ch.~\ref{ch:trees-recur} Preliminaries};
|
|
|
+ \node (C2) at (4,1.5) {\small Ch.~\ref{ch:Rvar} Variables};
|
|
|
+ \node (C3) at (8,1.5) {\small Ch.~\ref{ch:register-allocation-Rvar} Registers};
|
|
|
\node (C4) at (0,0) {\small Ch.~\ref{ch:Rif} Control Flow};
|
|
|
\node (C5) at (4,0) {\small Ch.~\ref{ch:Rvec} Tuples};
|
|
|
\node (C6) at (8,0) {\small Ch.~\ref{ch:Rfun} Functions};
|
|
|
- \node (C9) at (0,-1) {\small Ch.~\ref{ch:Rwhile} Loops};
|
|
|
- \node (C8) at (4,-1) {\small Ch.~\ref{ch:Rdyn} Dynamic};
|
|
|
- \node (C7) at (8,-1) {\small Ch.~\ref{ch:Rlam} Lambda};
|
|
|
- \node (C10) at (4,-2) {\small Ch.~\ref{ch:Rgrad} Gradual};
|
|
|
- \node (C11) at (8,-2) {\small Ch.~\ref{ch:Rpoly} Generics};
|
|
|
+ \node (C9) at (0,-1.5) {\small Ch.~\ref{ch:Rwhile} Loops};
|
|
|
+ \node (C8) at (4,-1.5) {\small Ch.~\ref{ch:Rdyn} Dynamic};
|
|
|
+ \node (C7) at (8,-1.5) {\small Ch.~\ref{ch:Rlam} Lambda};
|
|
|
+ \node (C10) at (4,-3) {\small Ch.~\ref{ch:Rgrad} Gradual};
|
|
|
+ \node (C11) at (8,-3) {\small Ch.~\ref{ch:Rpoly} Generics};
|
|
|
|
|
|
\path[->] (C1) edge [above] node {} (C2);
|
|
|
\path[->] (C2) edge [above] node {} (C3);
|
|
@@ -312,12 +311,16 @@ University of Massachusetts Lowell.
|
|
|
\label{fig:chapter-dependences}
|
|
|
\end{figure}
|
|
|
|
|
|
-This book uses the \href{https://racket-lang.org/}{Racket} language
|
|
|
-both for the implementation of the compiler and for the input
|
|
|
-language, so the reader should be proficient with Racket or Scheme
|
|
|
-prior to reading this book. There are many excellent resources for
|
|
|
-learning Scheme and
|
|
|
-Racket~\citep{Dybvig:1987aa,Abelson:1996uq,Friedman:1996aa,Felleisen:2001aa,Felleisen:2013aa,Flatt:2014aa}.
|
|
|
+We use the \href{https://racket-lang.org/}{Racket} language both for
|
|
|
+the implementation of the compiler and for the input language, so the
|
|
|
+reader should be proficient with Racket or Scheme. There are many
|
|
|
+excellent resources for learning Scheme and
|
|
|
+Racket~\citep{Dybvig:1987aa,Abelson:1996uq,Friedman:1996aa,Felleisen:2001aa,Felleisen:2013aa,Flatt:2014aa}. The
|
|
|
+support code for this book is in the \code{github} repository at the
|
|
|
+following URL:
|
|
|
+\begin{center}\small
|
|
|
+ \url{https://github.com/IUCompilerCourse/public-student-support-code}
|
|
|
+\end{center}
|
|
|
|
|
|
The compiler targets x86 assembly language~\citep{Intel:2015aa}, so it
|
|
|
is helpful but not necessary for the reader to have taken a computer
|
|
@@ -325,18 +328,16 @@ systems course~\citep{Bryant:2010aa}. This book introduces the parts
|
|
|
of x86-64 assembly language that are needed.
|
|
|
%
|
|
|
We follow the System V calling
|
|
|
-conventions~\citep{Bryant:2005aa,Matz:2013aa}, which means that the
|
|
|
-assembly code that we generate will work properly with our runtime
|
|
|
-system (written in C) when it is compiled using the GNU C compiler
|
|
|
-(\code{gcc}) on the Linux and MacOS operating systems. (Minor
|
|
|
-adjustments are needed for MacOS which we note as they arise.)
|
|
|
-%
|
|
|
-The GNU C compiler, when running on the Microsoft Windows operating
|
|
|
-system, follows the Microsoft x64 calling
|
|
|
-convention~\citep{Microsoft:2018aa,Microsoft:2020aa}. So the assembly
|
|
|
-code that we generate will \emph{not} work properly with our runtime
|
|
|
-system on Windows. One option to consider for using a Windows computer
|
|
|
-is to run a virtual machine with Linux as the guest operating system.
|
|
|
+conventions~\citep{Bryant:2005aa,Matz:2013aa}, so the assembly code
|
|
|
+that we generate works with the runtime system (written in C) when it
|
|
|
+is compiled using the GNU C compiler (\code{gcc}) on the Linux and MacOS
|
|
|
+operating systems.
|
|
|
+%
|
|
|
+On the Windows operating system, \code{gcc} uses the Microsoft x64
|
|
|
+calling convention~\citep{Microsoft:2018aa,Microsoft:2020aa}. So the
|
|
|
+assembly code that we generate does \emph{not} work with the runtime
|
|
|
+system on Windows. One workaround is to use a virtual machine with
|
|
|
+Linux as the guest operating system.
|
|
|
|
|
|
% TODO: point to support code on github
|
|
|
|
|
@@ -438,13 +439,12 @@ is to run a virtual machine with Linux as the guest operating system.
|
|
|
|
|
|
\section*{Acknowledgments}
|
|
|
|
|
|
-The tradition of compiler writing at Indiana University goes back to
|
|
|
-research and courses on programming languages by Professor Daniel
|
|
|
-Friedman in the 1970's and 1980's. One of his students, Kent Dybvig,
|
|
|
-built Chez Scheme~\citep{Dybvig:2006aa}, a production-quality and
|
|
|
-efficient compiler for Scheme. Throughout the 1990's and 2000's,
|
|
|
-Professor Dybvig taught the compiler course and continued development
|
|
|
-of Chez Scheme.
|
|
|
+The tradition of compiler construction at Indiana University goes back
|
|
|
+to research and courses on programming languages by Daniel Friedman in
|
|
|
+the 1970's and 1980's. One of his students, Kent Dybvig, implemented
|
|
|
+Chez Scheme~\citep{Dybvig:2006aa}, a production-quality, efficient
|
|
|
+compiler for Scheme. Throughout the 1990's and 2000's, Dybvig taught
|
|
|
+the compiler course and continued the development of Chez Scheme.
|
|
|
%
|
|
|
The compiler course evolved to incorporate novel pedagogical ideas
|
|
|
while also including elements of efficient real-world compilers. One
|
|
@@ -452,24 +452,29 @@ of Friedman's ideas was to split the compiler into many small
|
|
|
passes. Another idea, called ``the game'', was to test the code
|
|
|
generated by each pass on interpreters.
|
|
|
|
|
|
-Dybvig, with later help from his students Dipanwita Sarkar and Andrew
|
|
|
-Keep, developed infrastructure to support this approach and evolved
|
|
|
-the course use even smaller
|
|
|
+Dybvig, with help from his students Dipanwita Sarkar and Andrew Keep,
|
|
|
+developed infrastructure to support this approach and evolved the
|
|
|
+course to use even smaller
|
|
|
nanopasses~\citep{Sarkar:2004fk,Keep:2012aa}. Many of the compiler
|
|
|
-design decisions in this book are drawn from the assignment
|
|
|
-descriptions of \citet{Dybvig:2010aa}. A graduate student named
|
|
|
-Abdulaziz Ghuloum observed that the front-to-back organization of the
|
|
|
-course made it difficult for students to understand the rationale for
|
|
|
-the compiler design. Ghuloum proposed the incremental
|
|
|
-approach~\citep{Ghuloum:2006bh}.
|
|
|
+design decisions in this book are inspired by the assignment
|
|
|
+descriptions of \citet{Dybvig:2010aa}. In the mid 2000's a student of
|
|
|
+Dybvig's named Abdulaziz Ghuloum observed that the front-to-back
|
|
|
+organization of the course made it difficult for students to
|
|
|
+understand the rationale for the compiler design. Ghuloum proposed the
|
|
|
+incremental approach~\citep{Ghuloum:2006bh}.
|
|
|
+
|
|
|
+We thank Bor-Yuh Chang, John Clements, Jay McCarthy, Nate Nystrom, and
|
|
|
+Michael Wollowski for teaching courses based on early drafts.
|
|
|
|
|
|
-We thank John Clements, Bor-Yuh Evan Chang, Daniel P. Friedman, Ronald
|
|
|
-Garcia, Abdulaziz Ghuloum, Jay McCarthy, Nate Nystrom, Dipanwita
|
|
|
-Sarkar, Oscar Waddell, and Michael Wollowski.
|
|
|
+We thank Ronald Garcia for being Jeremy's partner when they took the
|
|
|
+compiler course in the early 2000's, especially for finding the bug
|
|
|
+that was sending the garbage collector on a wild goose chase!
|
|
|
+
|
|
|
+%Oscar Waddell ??
|
|
|
|
|
|
\mbox{}\\
|
|
|
\noindent Jeremy G. Siek \\
|
|
|
-Indiana University
|
|
|
+Bloomington, Indiana
|
|
|
%\noindent \url{http://homes.soic.indiana.edu/jsiek} \\
|
|
|
%\noindent Spring 2016
|
|
|
|
|
@@ -491,7 +496,7 @@ perform.\index{concrete syntax}\index{abstract syntax}\index{abstract
|
|
|
from concrete syntax to abstract syntax is a process called
|
|
|
\emph{parsing}~\citep{Aho:1986qf}. We do not cover the theory and
|
|
|
implementation of parsing in this book. A parser is provided in the
|
|
|
-supporting materials for translating from concrete to abstract syntax.
|
|
|
+support code for translating from concrete to abstract syntax.
|
|
|
|
|
|
ASTs can be represented in many different ways inside the compiler,
|
|
|
depending on the programming language used to write the compiler.
|
|
@@ -789,7 +794,7 @@ in Figure~\ref{fig:r0-syntax}. The concrete syntax for \LangInt{} is
|
|
|
defined in Figure~\ref{fig:r0-concrete-syntax}.
|
|
|
|
|
|
The \code{read-program} function provided in \code{utilities.rkt} of
|
|
|
-the support materials reads a program in from a file (the sequence of
|
|
|
+the support code reads a program in from a file (the sequence of
|
|
|
characters in the concrete syntax of Racket) and parses it into an
|
|
|
abstract syntax tree. See the description of \code{read-program} in
|
|
|
Appendix~\ref{appendix:utilities} for more details.
|
|
@@ -2068,13 +2073,8 @@ assignment.
|
|
|
\label{fig:c0-syntax}
|
|
|
\end{figure}
|
|
|
|
|
|
-The definitional interpreter for \LangCVar{} is in the support code
|
|
|
-for this book, in the file \code{interp-Cvar.rkt}. The support code is
|
|
|
-in a \code{github} repository at the following URL:
|
|
|
-\begin{center}\footnotesize
|
|
|
- \url{https://github.com/IUCompilerCourse/public-student-support-code}
|
|
|
-\end{center}
|
|
|
-
|
|
|
+The definitional interpreter for \LangCVar{} is in the support code,
|
|
|
+in the file \code{interp-Cvar.rkt}.
|
|
|
|
|
|
\subsection{The \LangXVar{} dialect}
|
|
|
|