4 years ago · cc2f8dbfa8
--- a/book.tex
+++ b/book.tex
@@ -169,45 +169,42 @@ University.
 
				 There is a magical moment when a programmer presses the ``run'' button
			
 
				 and the software begins to execute. Somehow a program written in a
			
 
				 high-level language is running on a computer that is only capable of
			
 
				-shuffling bits. This book reveals the wizardry that makes that
			
 
				-transformation possible. Beginning with the groundbreaking work of
			
 
				-Backus and colleagues in the 1950s, computer scientists discovered
			
 
				-techniques for constructing programs, called \emph{compilers}, that
			
 
				-automatically translate high-level programs into machine code.
			
 
				+shuffling bits. This book reveals the wizardry that makes that moment
			
 
				+possible. Beginning with the groundbreaking work of Backus and
			
 
				+colleagues in the 1950s, computer scientists discovered techniques for
			
 
				+constructing programs, called \emph{compilers}, that automatically
			
 
				+translate high-level programs into machine code.
			
 
				 
			
 
				-This book guides the reader on a journey, constructing their own
			
 
				+This book guides the reader on the journey of constructing their own
			
 
				 compiler for a small but powerful language. Along the way the reader
			
 
				 learns the essential concepts, algorithms, and data structures that
			
 
				-underlie modern compilers. They develop a clear understanding of how
			
 
				+underlie modern compilers. They develop an understanding of how
			
 
				 programs are mapped onto computer hardware, which is helpful when
			
 
				 reasoning about execution time, debugging errors across layers of the
			
 
				-software stack, and understanding security vulnerabilities in a piece
			
 
				-of code.
			
 
				+software stack, and finding security vulnerabilities.
			
 
				 %
			
 
				 For readers interested in a career in compiler construction, this book
			
 
				-serves as stepping-stone to more advanced topics such as just-in-time
			
 
				+is a stepping-stone to advanced topics such as just-in-time
			
 
				 compilation, program analysis, and program optimization.
			
 
				 %
			
 
				-For readers interested in the creation of programming languages, this
			
 
				-book connects language design choices to their impact on compiler
			
 
				-organization and the generated code.
			
 
				+For readers interested in the design of programming languages, this
			
 
				+book connects language design choices to their impact on the compiler
			
 
				+and generated code.
			
 
				 
			
 
				-Compilers are typically organized into a pipeline with a handful of
			
 
				-stages, called passes, that translate a program into lower-level
			
 
				-abstractions. We take this approach to the extreme by splitting the
			
 
				+A compiler is typically organized as a pipeline with a handful of
			
 
				+passes that translate a program into ever lower levels of
			
 
				+abstraction. We take this approach to the extreme by partitioning our
			
 
				 compiler into a large number of \emph{nanopasses}, each of which
			
 
				 performs a single task. This makes the compiler easier to debug,
			
 
				 because we test the output of each pass, and it makes the compiler
			
 
				 easier to understand, because each pass involves fewer concepts.
			
 
				 
			
 
				-Most books about compiler construction are structured in the same way
			
 
				-as the compiler, with each chapter describing how to construct one
			
 
				-pass. The problem with that structure is that it becomes easy to lose
			
 
				-sight of which features of the input language motivate the design
			
 
				-choices in a particular pass of the compiler.  We instead take an
			
 
				-\emph{incremental} approach in which we build a complete compiler in
			
 
				-each chapter, starting with a tiny language and adding new features in
			
 
				-each subsequent chapter.
			
 
				+Most books about compiler construction are structured like the
			
 
				+compiler, with each chapter describing one pass. The problem with that
			
 
				+structure is that it obfuscates how language features motivate design
			
 
				+choices in the compiler. We take an \emph{incremental} approach in
			
 
				+which we build a complete compiler in each chapter, starting with a
			
 
				+tiny language and adding new features in subsequent chapters.
			
 
				 
			
 
				 Our choice of language features is designed to elicit the fundamental
			
 
				 concepts and algorithms used in compilers for modern programming
			
@@ -221,19 +218,21 @@ languages.
 
				   Chapter~\ref{ch:register-allocation-Rvar} we apply \emph{graph
			
 
				     coloring} to assign variables to registers.
			
 
				 \item Chapter~\ref{ch:Rif} adds conditional control-flow, which
			
 
				-  motivates the need for \emph{control-flow graphs}.
			
 
				+  motivates an elegant recursive algorithm for mapping expressions to
			
 
				+  \emph{control-flow graphs}.
			
 
				 \item Chapter~\ref{ch:Rvec} adds heap-allocated tuples, motivating
			
 
				   \emph{garbage collection}.
			
 
				-\item Chapter~\ref{ch:Rfun} adds functions similar those in the C
			
 
				-  programming language~\citep{Kernighan:1988nx}: first-class values
			
 
				-  without lexical scoping. The reader learns about the procedure call
			
 
				-  stack, \emph{calling conventions}, and their interaction with
			
 
				-  register allocation and garbage collection.
			
 
				+\item Chapter~\ref{ch:Rfun} adds functions that are first-class values
			
 
				+  but lack lexical scoping, similar to the C programming
			
 
				+  language~\citep{Kernighan:1988nx} except that we generate efficient
			
 
				+  tail calls. The reader learns about the procedure call stack,
			
 
				+  \emph{calling conventions}, and their interaction with register
			
 
				+  allocation and garbage collection.
			
 
				 \item Chapter~\ref{ch:Rlam} adds anonymous functions with lexical
			
 
				   scoping, i.e., \emph{lambda abstraction}. The reader learns about
			
 
				   \emph{closure conversion}, in which lambdas are translated into a
			
 
				   combination of functions and tuples.
			
 
				-\item Chapter~\ref{ch:Rdyn} adds \emph{dynamic typing}. Up until this
			
 
				+\item Chapter~\ref{ch:Rdyn} adds \emph{dynamic typing}. Prior to this
			
 
				   point the input languages are statically typed.  The reader extends
			
 
				   the statically typed language with an \code{Any} type which serves
			
 
				   as a target for compiling the dynamically typed language.
			
@@ -250,31 +249,31 @@ languages.
 
				   leveraging the \code{Any} type and type casts developed in Chapters
			
 
				   \ref{ch:Rdyn} and \ref{ch:Rgrad}.
			
 
				 \end{itemize}
			
 
				-Alas, there are many language features that we do not include. Our
			
 
				-choices are informed by a cost-benefit analysis in which we weigh the
			
 
				-incidental complexity of a feature against the number of fundamental
			
 
				+There are many language features that we do not include. Our choices
			
 
				+weigh the incidental complexity of a feature against the fundamental
			
 
				 concepts that it exposes. For example, we include tuples and not
			
 
				 records because they both elicit the study of heap allocation and
			
 
				 garbage collection but records come with more incidental complexity.
			
 
				 
			
 
				 Since 2016 this book has served as the textbook for the compiler
			
 
				 course at Indiana University, a 16-week course for upper-level
			
 
				-undergraduates and first-year graduate students.  Prior to this
			
 
				-course, students learn to program in both imperative and functional
			
 
				-languages, study data structures and algorithms, and take discrete
			
 
				-mathematics.
			
 
				-%
			
 
				-The students form groups of 2-4 people and complete one chapter every
			
 
				-two weeks, starting with Chapter~\ref{ch:Rvar} and finishing with
			
 
				-Chapter~\ref{ch:Rdyn}. Most chapters include a challenge problem that
			
 
				-we assign to the graduate students. The last two weeks of the course
			
 
				-are reserved for a final project in which students design and
			
 
				-implement an extension to the compiler of their choosing.
			
 
				-Chapters~\ref{ch:Rwhile}, \ref{ch:Rgrad}, and \ref{ch:Rpoly} can be
			
 
				-used in support of these projects or can be swapped in to replace some
			
 
				-of the earlier chapters. For example, a course with an emphasis on
			
 
				-statically-typed imperative languages would skip Chapter~\ref{ch:Rdyn}
			
 
				-in favor of
			
 
				+undergraduates and first-year graduate students.
			
 
				+%
			
 
				+Prior to this course, students learn to program in both imperative and
			
 
				+functional languages, study data structures and algorithms, and take
			
 
				+discrete mathematics.
			
 
				+%
			
 
				+At the beginning of the course, students form groups of 2--4 people.
			
 
				+The groups complete one chapter every two weeks, starting with
			
 
				+Chapter~\ref{ch:Rvar} and finishing with Chapter~\ref{ch:Rdyn}. Many
			
 
				+chapters include a challenge problem that we assign to the graduate
			
 
				+students. The last two weeks of the course involve a final project in
			
 
				+which students design and implement a compiler extension of their
			
 
				+choosing.  Chapters~\ref{ch:Rwhile}, \ref{ch:Rgrad}, and
			
 
				+\ref{ch:Rpoly} can be used in support of these projects or they can
			
 
				+replace some of the earlier chapters. For example, a course with an
			
 
				+emphasis on statically typed imperative languages would skip
			
 
				+Chapter~\ref{ch:Rdyn} in favor of
			
 
				 Chapter~\ref{ch:Rwhile}. Figure~\ref{fig:chapter-dependences} depicts
			
 
				 the dependencies between chapters.
			
 
				 
			
@@ -285,17 +284,17 @@ University of Massachusetts Lowell.
 
				 
			
 
				 \begin{figure}[tp]
			
 
				 \begin{tikzpicture}[baseline=(current  bounding  box.center)]
			
 
				-  \node (C1) at (0,1) {\small Ch.~\ref{ch:trees-recur} Preliminaries};
			
 
				-  \node (C2) at (4,1) {\small Ch.~\ref{ch:Rvar} Variables};
			
 
				-  \node (C3) at (8,1) {\small Ch.~\ref{ch:register-allocation-Rvar} Registers};
			
 
				+  \node (C1) at (0,1.5) {\small Ch.~\ref{ch:trees-recur} Preliminaries};
			
 
				+  \node (C2) at (4,1.5) {\small Ch.~\ref{ch:Rvar} Variables};
			
 
				+  \node (C3) at (8,1.5) {\small Ch.~\ref{ch:register-allocation-Rvar} Registers};
			
 
				   \node (C4) at (0,0) {\small Ch.~\ref{ch:Rif} Control Flow};
			
 
				   \node (C5) at (4,0) {\small Ch.~\ref{ch:Rvec} Tuples};
			
 
				   \node (C6) at (8,0) {\small Ch.~\ref{ch:Rfun} Functions};
			
 
				-  \node (C9) at (0,-1) {\small Ch.~\ref{ch:Rwhile} Loops};
			
 
				-  \node (C8) at (4,-1) {\small Ch.~\ref{ch:Rdyn} Dynamic};
			
 
				-  \node (C7) at (8,-1) {\small Ch.~\ref{ch:Rlam} Lambda};
			
 
				-  \node (C10) at (4,-2) {\small Ch.~\ref{ch:Rgrad} Gradual};
			
 
				-  \node (C11) at (8,-2) {\small Ch.~\ref{ch:Rpoly} Generics};
			
 
				+  \node (C9) at (0,-1.5) {\small Ch.~\ref{ch:Rwhile} Loops};
			
 
				+  \node (C8) at (4,-1.5) {\small Ch.~\ref{ch:Rdyn} Dynamic};
			
 
				+  \node (C7) at (8,-1.5) {\small Ch.~\ref{ch:Rlam} Lambda};
			
 
				+  \node (C10) at (4,-3) {\small Ch.~\ref{ch:Rgrad} Gradual};
			
 
				+  \node (C11) at (8,-3) {\small Ch.~\ref{ch:Rpoly} Generics};
			
 
				 
			
 
				   \path[->] (C1) edge [above] node {} (C2);
			
 
				   \path[->] (C2) edge [above] node {} (C3);
			
@@ -312,12 +311,16 @@ University of Massachusetts Lowell.
 
				   \label{fig:chapter-dependences}
			
 
				 \end{figure}
			
 
				 
			
 
				-This book uses the \href{https://racket-lang.org/}{Racket} language
			
 
				-both for the implementation of the compiler and for the input
			
 
				-language, so the reader should be proficient with Racket or Scheme
			
 
				-prior to reading this book. There are many excellent resources for
			
 
				-learning Scheme and
			
 
				-Racket~\citep{Dybvig:1987aa,Abelson:1996uq,Friedman:1996aa,Felleisen:2001aa,Felleisen:2013aa,Flatt:2014aa}.
			
 
				+We use the \href{https://racket-lang.org/}{Racket} language both for
			
 
				+the implementation of the compiler and for the input language, so the
			
 
				+reader should be proficient with Racket or Scheme. There are many
			
 
				+excellent resources for learning Scheme and
			
 
				+Racket~\citep{Dybvig:1987aa,Abelson:1996uq,Friedman:1996aa,Felleisen:2001aa,Felleisen:2013aa,Flatt:2014aa}. The
			
 
				+support code for this book is in the \code{github} repository at the
			
 
				+following URL:
			
 
				+\begin{center}\small
			
 
				+  \url{https://github.com/IUCompilerCourse/public-student-support-code}
			
 
				+\end{center}
			
 
				 
			
 
				 The compiler targets x86 assembly language~\citep{Intel:2015aa}, so it
			
 
				 is helpful but not necessary for the reader to have taken a computer
			
@@ -325,18 +328,16 @@ systems course~\citep{Bryant:2010aa}. This book introduces the parts
 
				 of x86-64 assembly language that are needed.
			
 
				 %
			
 
				 We follow the System V calling
			
 
				-conventions~\citep{Bryant:2005aa,Matz:2013aa}, which means that the
			
 
				-assembly code that we generate will work properly with our runtime
			
 
				-system (written in C) when it is compiled using the GNU C compiler
			
 
				-(\code{gcc}) on the Linux and MacOS operating systems. (Minor
			
 
				-adjustments are needed for MacOS which we note as they arise.)
			
 
				-%
			
 
				-The GNU C compiler, when running on the Microsoft Windows operating
			
 
				-system, follows the Microsoft x64 calling
			
 
				-convention~\citep{Microsoft:2018aa,Microsoft:2020aa}. So the assembly
			
 
				-code that we generate will \emph{not} work properly with our runtime
			
 
				-system on Windows. One option to consider for using a Windows computer
			
 
				-is to run a virtual machine with Linux as the guest operating system.
			
 
				+conventions~\citep{Bryant:2005aa,Matz:2013aa}, so the assembly code
			
 
				+that we generate works with the runtime system (written in C) when it
			
 
				+is compiled using the GNU C compiler (\code{gcc}) on Linux and MacOS
			
 
				+operating systems.
			
 
				+%
			
 
				+On the Windows operating system, \code{gcc} uses the Microsoft x64
			
 
				+calling convention~\citep{Microsoft:2018aa,Microsoft:2020aa}. So the
			
 
				+assembly code that we generate does \emph{not} work with the runtime
			
 
				+system on Windows. One workaround is to use a virtual machine with
			
 
				+Linux as the guest operating system.
			
 
				 
			
 
				 % TODO: point to support code on github
			
 
				 
			
@@ -438,13 +439,12 @@ is to run a virtual machine with Linux as the guest operating system.
 
				 
			
 
				 \section*{Acknowledgments}
			
 
				 
			
 
				-The tradition of compiler writing at Indiana University goes back to
			
 
				-research and courses on programming languages by Professor Daniel
			
 
				-Friedman in the 1970's and 1980's.  One of his students, Kent Dybvig,
			
 
				-built Chez Scheme~\citep{Dybvig:2006aa}, a production-quality and
			
 
				-efficient compiler for Scheme.  Throughout the 1990's and 2000's,
			
 
				-Professor Dybvig taught the compiler course and continued development
			
 
				-of Chez Scheme.
			
 
				+The tradition of compiler construction at Indiana University goes back
			
 
				+to research and courses on programming languages by Daniel Friedman in
			
 
				+the 1970's and 1980's.  One of his students, Kent Dybvig, implemented
			
 
				+Chez Scheme~\citep{Dybvig:2006aa}, a production-quality, efficient
			
 
				+compiler for Scheme.  Throughout the 1990's and 2000's, Dybvig taught
			
 
				+the compiler course and continued the development of Chez Scheme.
			
 
				 %
			
 
				 The compiler course evolved to incorporate novel pedagogical ideas
			
 
				 while also including elements of efficient real-world compilers.  One
			
@@ -452,24 +452,29 @@ of Friedman's ideas was to split the compiler into many small
 
				 passes. Another idea, called ``the game'', was to test the code
			
 
				 generated by each pass on interpreters.
			
 
				 
			
 
				-Dybvig, with later help from his students Dipanwita Sarkar and Andrew
			
 
				-Keep, developed infrastructure to support this approach and evolved
			
 
				-the course use even smaller
			
 
				+Dybvig, with help from his students Dipanwita Sarkar and Andrew Keep,
			
 
				+developed infrastructure to support this approach and evolved the
			
 
				+course to use even smaller
			
 
				 nanopasses~\citep{Sarkar:2004fk,Keep:2012aa}.  Many of the compiler
			
 
				-design decisions in this book are drawn from the assignment
			
 
				-descriptions of \citet{Dybvig:2010aa}. A graduate student named
			
 
				-Abdulaziz Ghuloum observed that the front-to-back organization of the
			
 
				-course made it difficult for students to understand the rationale for
			
 
				-the compiler design. Ghuloum proposed the incremental
			
 
				-approach~\citep{Ghuloum:2006bh}.
			
 
				+design decisions in this book are inspired by the assignment
			
 
				+descriptions of \citet{Dybvig:2010aa}. In the mid 2000's a student of
			
 
				+Dybvig's named Abdulaziz Ghuloum observed that the front-to-back
			
 
				+organization of the course made it difficult for students to
			
 
				+understand the rationale for the compiler design. Ghuloum proposed the
			
 
				+incremental approach~\citep{Ghuloum:2006bh}.
			
 
				+
			
 
				+We thank Bor-Yuh Chang, John Clements, Jay McCarthy, Nate Nystrom, and
			
 
				+Michael Wollowski for teaching courses based on early drafts.
			
 
				 
			
 
				-We thank John Clements, Bor-Yuh Evan Chang, Daniel P. Friedman, Ronald
			
 
				-Garcia, Abdulaziz Ghuloum, Jay McCarthy, Nate Nystrom, Dipanwita
			
 
				-Sarkar, Oscar Waddell, and Michael Wollowski.
			
 
				+We thank Ronald Garcia for being Jeremy's partner when they took the
			
 
				+compiler course in the early 2000's, especially for finding the bug
			
 
				+that was sending the garbage collector on a wild goose chase!
			
 
				+
			
 
				+%Oscar Waddell ??
			
 
				 
			
 
				 \mbox{}\\
			
 
				 \noindent Jeremy G. Siek \\
			
 
				-Indiana University
			
 
				+Bloomington, Indiana
			
 
				 %\noindent \url{http://homes.soic.indiana.edu/jsiek} \\
			
 
				 %\noindent Spring 2016
			
 
				 
			
@@ -491,7 +496,7 @@ perform.\index{concrete syntax}\index{abstract syntax}\index{abstract
 
				 from concrete syntax to abstract syntax is a process called
			
 
				 \emph{parsing}~\citep{Aho:1986qf}. We do not cover the theory and
			
 
				 implementation of parsing in this book. A parser is provided in the
			
 
				-supporting materials for translating from concrete to abstract syntax.
			
 
				+support code for translating from concrete to abstract syntax.
			
 
				 
			
 
				 ASTs can be represented in many different ways inside the compiler,
			
 
				 depending on the programming language used to write the compiler.
			
@@ -789,7 +794,7 @@ in Figure~\ref{fig:r0-syntax}. The concrete syntax for \LangInt{} is
 
				 defined in Figure~\ref{fig:r0-concrete-syntax}.
			
 
				 
			
 
				 The \code{read-program} function provided in \code{utilities.rkt} of
			
 
				-the support materials reads a program in from a file (the sequence of
			
 
				+the support code reads a program in from a file (the sequence of
			
 
				 characters in the concrete syntax of Racket) and parses it into an
			
 
				 abstract syntax tree. See the description of \code{read-program} in
			
 
				 Appendix~\ref{appendix:utilities} for more details.
			
@@ -2068,13 +2073,8 @@ assignment.
 
				 \label{fig:c0-syntax}
			
 
				 \end{figure}
			
 
				 
			
 
				-The definitional interpreter for \LangCVar{} is in the support code
			
 
				-for this book, in the file \code{interp-Cvar.rkt}. The support code is
			
 
				-in a \code{github} repository at the following URL:
			
 
				-\begin{center}\footnotesize
			
 
				-  \url{https://github.com/IUCompilerCourse/public-student-support-code}
			
 
				-\end{center}
			
 
				-
			
 
				+The definitional interpreter for \LangCVar{} is in the support code,
			
 
				+in the file \code{interp-Cvar.rkt}.
			
 
				 
			
 
				 \subsection{The \LangXVar{} dialect}