|
@@ -169,56 +169,55 @@ University.
|
|
There is a magical moment when a programmer presses the ``run'' button
|
|
and the software begins to execute. Somehow a program written in a
|
|
high-level language is running on a computer that is only capable of
|
|
-shuffling bits. This book reveals the wizardry that makes that moment
|
|
|
|
|
|
+shuffling bits. Here we reveal the wizardry that makes that moment
|
|
possible. Beginning with the groundbreaking work of Backus and
|
|
colleagues in the 1950s, computer scientists discovered techniques for
|
|
constructing programs, called \emph{compilers}, that automatically
|
|
translate high-level programs into machine code.
|
|
|
|
|
|
-This book guides the reader on the journey of constructing their own
|
|
|
|
-compiler for a small but powerful language. Along the way the reader
|
|
|
|
-learns the essential concepts, algorithms, and data structures that
|
|
|
|
-underlie modern compilers. They develop an understanding of how
|
|
|
|
-programs are mapped onto computer hardware which is helpful when
|
|
|
|
-reasoning about execution time, debugging errors across layers of the
|
|
|
|
-software stack, and finding security vulnerabilities.
|
|
|
|
-%
|
|
|
|
-For readers interested in a career in compiler construction, this book
|
|
|
|
-is a stepping-stone to advanced topics such as just-in-time
|
|
|
|
-compilation, program analysis, and program optimization.
|
|
|
|
-%
|
|
|
|
-For readers interested in the design of programming languages, this
|
|
|
|
-book connects language design choices to their impact on the compiler
|
|
|
|
-and generated code.
|
|
|
|
-
|
|
|
|
-A compiler is typically organized as a pipeline with a handful of
|
|
|
|
-passes that translate a program into ever lower levels of
|
|
|
|
-abstraction. We take this approach to the extreme by partitioning our
|
|
|
|
-compiler into a large number of \emph{nanopasses}, each of which
|
|
|
|
-performs a single task. This makes the compiler easier to debug,
|
|
|
|
-because we test the output of each pass, and it makes the compiler
|
|
|
|
-easier to understand, because each pass involves fewer concepts.
|
|
|
|
-
|
|
|
|
-Most books about compiler construction are structured like the
|
|
|
|
-compiler, with each chapter describing one pass. The problem with that
|
|
|
|
-structure is that it obfuscates how language features motivate design
|
|
|
|
-choices in the compiler. We take an \emph{incremental} approach in
|
|
|
|
-which we build a complete compiler in each chapter, starting with a
|
|
|
|
-tiny language and adding new features in subsequent chapters.
|
|
|
|
|
|
+We take you on the journey of constructing your own compiler for a small
|
|
|
|
+but powerful language. Along the way we explain the essential
|
|
|
|
+concepts, algorithms, and data structures that underlie compilers. We
|
|
|
|
+develop your understanding of how programs are mapped onto computer
|
|
|
|
+hardware, which is helpful when reasoning about properties at the
|
|
|
|
+junction between hardware and software, such as execution time,
|
|
|
|
+software errors, and security vulnerabilities. For those interested
|
|
|
|
+in pursuing compiler construction, our goal is to provide a
|
|
|
|
+stepping-stone to advanced topics such as just-in-time compilation,
|
|
|
|
+program analysis, and program optimization. For those interested in
|
|
|
|
+designing and implementing their own programming languages, we connect
|
|
|
|
+language design choices to their impact on the compiler and its generated
|
|
|
|
+code.
|
|
|
|
+
|
|
|
|
+A compiler is typically organized as a sequence of stages that
|
|
|
|
+progressively translates a program to code that runs on hardware. We
|
|
|
|
+take this approach to the extreme by partitioning our compiler into a
|
|
|
|
+large number of \emph{nanopasses}, each of which performs a single
|
|
|
|
+task. This allows us to test the output of each pass in isolation, and
|
|
|
|
+furthermore, allows us to focus on one task at a time, making the compiler far
|
|
|
|
+easier to understand.
|
|
|
|
+
|
|
|
|
+%% [TODO: easier to understand/debug for those maintaining the compiler,
|
|
|
|
+%% proving correctness]
|
|
|
|
+
|
|
|
|
+The most familiar approach to describing compilers is with one pass
|
|
|
|
+per chapter. The problem with that approach is that it obfuscates how language
|
|
|
|
+features motivate design choices in a compiler. We take an
|
|
|
|
+\emph{incremental} approach in which we build a complete compiler in
|
|
|
|
+each chapter, starting with arithmetic and variables and adding new
|
|
|
|
+features in subsequent chapters.
|
|
|
|
|
|
Our choice of language features is designed to elicit the fundamental
|
|
-concepts and algorithms used in compilers for modern programming
|
|
|
|
-languages.
|
|
|
|
|
|
+concepts and algorithms used in compilers.
|
|
\begin{itemize}
|
|
\begin{itemize}
|
|
-\item We begin with integer arithmetic and local variables. The
|
|
|
|
- reader becomes acquainted with the basic tools of compiler
|
|
|
|
- construction, \emph{abstract syntax trees} and \emph{recursive
|
|
|
|
- functions}, in Chapter~\ref{ch:trees-recur} and applies them to a
|
|
|
|
- language with integers and variables in Chapter~\ref{ch:Rvar}. In
|
|
|
|
- Chapter~\ref{ch:register-allocation-Rvar} we apply \emph{graph
|
|
|
|
- coloring} to assign variables to registers.
|
|
|
|
-\item Chapter~\ref{ch:Rif} adds conditional control-flow, which
|
|
|
|
- motivates an elegant recursive algorithm for mapping expressions to
|
|
|
|
|
|
+\item We begin with integer arithmetic and local variables in
|
|
|
|
+ Chapters~\ref{ch:trees-recur} and \ref{ch:Rvar}, where we introduce
|
|
|
|
+ the fundamental tools of compiler construction: \emph{abstract
|
|
|
|
+ syntax trees} and \emph{recursive functions}.
|
|
|
|
+\item In Chapter~\ref{ch:register-allocation-Rvar} we apply
|
|
|
|
+ \emph{graph coloring} to assign variables to machine registers.
|
|
|
|
+\item Chapter~\ref{ch:Rif} adds \code{if} expressions, which motivates
|
|
|
|
+ an elegant recursive algorithm for mapping expressions to
|
|
\emph{control-flow graphs}.
|
|
\item Chapter~\ref{ch:Rvec} adds heap-allocated tuples, motivating
|
|
\emph{garbage collection}.
|
|
@@ -339,104 +338,6 @@ assembly code that we generate does \emph{not} work with the runtime
|
|
system on Windows. One workaround is to use a virtual machine with
|
|
Linux as the guest operating system.
|
|
|
|
|
|
-% TODO: point to support code on github
|
|
|
|
-
|
|
|
|
-
|
|
|
|
-
|
|
|
|
-%% The tradition of compiler writing at Indiana University goes back to
|
|
|
|
-%% research and courses on programming languages by Professor Daniel
|
|
|
|
-%% Friedman in the 1970's and 1980's. Friedman conducted research on lazy
|
|
|
|
-%% evaluation~\citep{Friedman:1976aa} in the context of
|
|
|
|
-%% Lisp~\citep{McCarthy:1960dz} and then studied
|
|
|
|
-%% continuations~\citep{Felleisen:kx} and
|
|
|
|
-%% macros~\citep{Kohlbecker:1986dk} in the context of the
|
|
|
|
-%% Scheme~\citep{Sussman:1975ab}, a dialect of Lisp. One of the students
|
|
|
|
-%% of those courses, Kent Dybvig, went on to build Chez
|
|
|
|
-%% Scheme~\citep{Dybvig:2006aa}, a production-quality and efficient
|
|
|
|
-%% compiler for Scheme. After completing his Ph.D. at the University of
|
|
|
|
-%% North Carolina, he returned to teach at Indiana University.
|
|
|
|
-%% Throughout the 1990's and 2000's, Professor Dybvig continued
|
|
|
|
-%% development of Chez Scheme and taught the compiler course.
|
|
|
|
-
|
|
|
|
-%% The compiler course evolved to incorporate novel pedagogical ideas
|
|
|
|
-%% while also including elements of effective real-world compilers. One
|
|
|
|
-%% of Friedman's ideas was to split the compiler into many small
|
|
|
|
-%% ``passes'' so that the code for each pass would be easy to understood
|
|
|
|
-%% in isolation. In contrast, most compilers of the time were organized
|
|
|
|
-%% into only a few monolithic passes for reasons of compile-time
|
|
|
|
-%% efficiency. Another idea, called ``the game'', was to test the code
|
|
|
|
-%% generated by each pass on interpreters for each intermediate language,
|
|
|
|
-%% thereby helping to pinpoint errors in individual passes.
|
|
|
|
-%% %
|
|
|
|
-%% Dybvig, with later help from his students Dipanwita Sarkar and Andrew
|
|
|
|
-%% Keep, developed infrastructure to support this approach and evolved
|
|
|
|
-%% the course, first to use smaller micro-passes and then into even
|
|
|
|
-%% smaller nano-passes~\citep{Sarkar:2004fk,Keep:2012aa}. I was a student
|
|
|
|
-%% in this compiler course in the early 2000's as part of my
|
|
|
|
-%% Ph.D. studies at Indiana University. Needless to say, I enjoyed the
|
|
|
|
-%% course immensely!
|
|
|
|
-
|
|
|
|
-%% During that time, another graduate student named Abdulaziz Ghuloum
|
|
|
|
-%% observed that the front-to-back organization of the course made it
|
|
|
|
-%% difficult for students to understand the rationale for the compiler
|
|
|
|
-%% design. Ghuloum proposed an incremental approach in which the students
|
|
|
|
-%% start by implementing a complete compiler for a very small subset of
|
|
|
|
-%% the language. In each subsequent stage they add a feature to the
|
|
|
|
-%% language and then add or modify passes to handle the new
|
|
|
|
-%% feature~\citep{Ghuloum:2006bh}. In this way, the students see how the
|
|
|
|
-%% language features motivate aspects of the compiler design.
|
|
|
|
-
|
|
|
|
-%% After graduating from Indiana University in 2005, I went on to teach
|
|
|
|
-%% at the University of Colorado. I adapted the nano-pass and incremental
|
|
|
|
-%% approaches to compiling a subset of the Python
|
|
|
|
-%% language~\citep{Siek:2012ab}.
|
|
|
|
-%% %% Python and Scheme are quite different
|
|
|
|
-%% %% on the surface but there is a large overlap in the compiler techniques
|
|
|
|
-%% %% required for the two languages. Thus, I was able to teach much of the
|
|
|
|
-%% %% same content from the Indiana compiler course.
|
|
|
|
-%% I very much enjoyed teaching the course organized in this way, and
|
|
|
|
-%% even better, many of the students learned a lot and got excited about
|
|
|
|
-%% compilers.
|
|
|
|
-
|
|
|
|
-%% I returned to Indiana University in 2013. In my absence the compiler
|
|
|
|
-%% course had switched from the front-to-back organization to a
|
|
|
|
-%% back-to-front~\citep{Dybvig:2010aa}. While that organization also works
|
|
|
|
-%% well, I prefer the incremental approach and started porting and
|
|
|
|
-%% adapting the structure of the Colorado course back into the land of
|
|
|
|
-%% Scheme. In the meantime Indiana University had moved on from Scheme to
|
|
|
|
-%% Racket~\citep{plt-tr}, so the course is now about compiling a subset
|
|
|
|
-%% of Racket (and Typed Racket) to the x86 assembly language.
|
|
|
|
-
|
|
|
|
-%% This is the textbook for the incremental version of the compiler
|
|
|
|
-%% course at Indiana University (Spring 2016 - present). With this book
|
|
|
|
-%% I hope to make the Indiana compiler course available to people that
|
|
|
|
-%% have not had the chance to study compilers at Indiana University.
|
|
|
|
-
|
|
|
|
-%% %% I have captured what
|
|
|
|
-%% %% I think are the most important topics from \cite{Dybvig:2010aa} but
|
|
|
|
-%% %% have omitted topics that are less interesting conceptually. I have
|
|
|
|
-%% %% also made simplifications to reduce complexity. In this way, this
|
|
|
|
-%% %% book leans more towards pedagogy than towards the efficiency of the
|
|
|
|
-%% %% generated code. Also, the book differs in places where we I the
|
|
|
|
-%% %% opportunity to make the topics more fun, such as in relating register
|
|
|
|
-%% %% allocation to Sudoku (Chapter~\ref{ch:register-allocation-Rvar}).
|
|
|
|
-
|
|
|
|
-%% \section*{Prerequisites}
|
|
|
|
-
|
|
|
|
-%% The material in this book is challenging but rewarding. It is meant to
|
|
|
|
-%% prepare students for a lifelong career in programming languages.
|
|
|
|
-
|
|
|
|
-%% %\section*{Structure of book}
|
|
|
|
-%% % You might want to add short description about each chapter in this book.
|
|
|
|
-
|
|
|
|
-%% %\section*{About the companion website}
|
|
|
|
-%% %The website\footnote{\url{https://github.com/amberj/latex-book-template}} for %this file contains:
|
|
|
|
-%% %\begin{itemize}
|
|
|
|
-%% % \item A link to (freely downlodable) latest version of this document.
|
|
|
|
-%% % \item Link to download LaTeX source for this document.
|
|
|
|
-%% % \item Miscellaneous material (e.g. suggested readings etc).
|
|
|
|
-%% %\end{itemize}
|
|
|
|
-
|
|
|
|
\section*{Acknowledgments}
|
|
|
|
|
|
The tradition of compiler construction at Indiana University goes back
|
|
@@ -471,13 +372,11 @@ We thank Ronald Garcia for being Jeremy's partner when they took the
|
|
compiler course in the early 2000s and especially for finding the bug
|
|
that sent the garbage collector on a wild goose chase!
|
|
|
|
|
|
-%Oscar Waddell ??
|
|
|
|
-
|
|
|
|
\mbox{}\\
|
|
\noindent Jeremy G. Siek \\
|
|
Bloomington, Indiana
|
|
-%\noindent \url{http://homes.soic.indiana.edu/jsiek} \\
|
|
|
|
-%\noindent Spring 2016
|
|
|
|
|
|
+
|
|
|
|
+%Oscar Waddell ??
|
|
|
|
|
|
|
|
|
|
|
|
|