revised preface

Jeremy Siek committed 4 years ago
commit 3b20363e71

book.tex: 1 file changed, 313 insertions(+), 141 deletions(-)
@@ -166,100 +166,163 @@ University.
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \chapter*{Preface}
 
-The tradition of compiler writing at Indiana University goes back to
-research and courses on programming languages by Professor Daniel
-Friedman in the 1970's and 1980's. Friedman conducted research on lazy
-evaluation~\citep{Friedman:1976aa} in the context of
-Lisp~\citep{McCarthy:1960dz} and then studied
-continuations~\citep{Felleisen:kx} and
-macros~\citep{Kohlbecker:1986dk} in the context of the
-Scheme~\citep{Sussman:1975ab}, a dialect of Lisp.  One of the students
-of those courses, Kent Dybvig, went on to build Chez
-Scheme~\citep{Dybvig:2006aa}, a production-quality and efficient
-compiler for Scheme. After completing his Ph.D. at the University of
-North Carolina, he returned to teach at Indiana University.
-Throughout the 1990's and 2000's, Professor Dybvig continued
-development of Chez Scheme and taught the compiler course.
+There is a magical moment when a programmer presses the ``run'' button
+and the software begins to execute. Somehow a program written in a
+high-level language is running on a computer that is only capable of
+shuffling bits. This book reveals the wizardry that makes that
+transformation possible. Beginning with the groundbreaking work of
+Backus and colleagues in the 1950s, computer scientists discovered
+techniques for constructing programs, called \emph{compilers}, that
+automatically translate high-level programs into machine code.
+
+This book guides the reader on a journey, constructing their own
+compiler for a small but powerful language. Along the way the reader
+learns the essential concepts, algorithms, and data structures that
+underlie modern compilers. They develop a clear understanding of how
+programs are mapped onto computer hardware, which is helpful when
+reasoning about execution time, debugging errors across layers of the
+software stack, and understanding security vulnerabilities in a piece
+of code.
+%
+For readers interested in a career in compiler construction, this book
+serves as a stepping-stone to more advanced topics such as just-in-time
+compilation, program analysis, and program optimization.
+%
+For readers interested in the creation of programming languages, this
+book connects language design choices to their impact on compiler
+organization and the generated code.
+
+Compilers are typically organized into a pipeline with a handful of
+stages, called passes, that translate a program into lower-level
+abstractions. We take this approach to the extreme by splitting the
+compiler into a large number of \emph{nanopasses}, each of which
+performs a single task. This makes the compiler easier to debug,
+because we test the output of each pass, and it makes the compiler
+easier to understand, because each pass involves fewer concepts.
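+%
+To give a flavor of this organization, the following sketch shows one
+way to drive a nanopass compiler, checking the output of each pass by
+running it on an interpreter for its intermediate language. (The
+\code{run-passes} helper is only illustrative; it is not the interface
+used in this book.)
+\begin{lstlisting}
+;; passes:  a list of functions, one per nanopass
+;; interps: interpreters for each language, source language first
+(define (run-passes passes interps prog)
+  (define expected ((first interps) prog))
+  (for/fold ([p prog]) ([pass passes] [interp (rest interps)])
+    (define next (pass p))
+    (unless (equal? (interp next) expected)
+      (error 'run-passes "a pass changed the program's behavior"))
+    next))
+\end{lstlisting}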
+
+Most books about compiler construction are structured in the same way
+as the compiler, with each chapter describing how to construct one
+pass. The problem with that structure is that it becomes easy to lose
+sight of which features of the input language motivate the design
+choices in a particular pass of the compiler.  We instead take an
+\emph{incremental} approach in which we build a complete compiler in
+each chapter, starting with a tiny language and adding new features in
+each subsequent chapter.
+
+Our choice of language features is designed to elicit the fundamental
+concepts and algorithms used in compilers for modern programming
+languages.
+\begin{itemize}
+\item We begin with integer arithmetic and local variables.  The
+  reader becomes acquainted with the basic tools of compiler
+  construction, \emph{abstract syntax trees} and \emph{recursive
+    functions}, in Chapter~\ref{ch:trees-recur} and applies them to a
+  language with integers and variables in Chapter~\ref{ch:Rvar}. In
+  Chapter~\ref{ch:register-allocation-Rvar} we apply \emph{graph
+    coloring} to assign variables to registers.
+\item Chapter~\ref{ch:Rif} adds conditional control-flow, which
+  motivates the need for \emph{control-flow graphs}.
+\item Chapter~\ref{ch:Rvec} adds heap-allocated tuples, motivating
+  \emph{garbage collection}.
+\item Chapter~\ref{ch:Rfun} adds functions similar to those in the C
+  programming language~\citep{Kernighan:1988nx}: first-class values
+  without lexical scoping. The reader learns about the procedure call
+  stack, \emph{calling conventions}, and their interaction with
+  register allocation and garbage collection.
+\item Chapter~\ref{ch:Rlam} adds anonymous functions with lexical
+  scoping, i.e., \emph{lambda abstraction}. The reader learns about
+  \emph{closure conversion}, in which lambdas are translated into a
+  combination of functions and tuples.
+\item Chapter~\ref{ch:Rdyn} adds \emph{dynamic typing}. Up until this
+  point the input languages are statically typed.  The reader extends
+  the statically typed language with an \code{Any} type which serves
+  as a target for compiling the dynamically typed language.
+\item Chapter~\ref{ch:Rwhile} fleshes out support for imperative
+  programming languages with the addition of loops and mutable
+  variables. These additions elicit the need for \emph{dataflow
+    analysis} in the register allocator.
+\item Chapter~\ref{ch:Rgrad} uses the \code{Any} type of
+  Chapter~\ref{ch:Rdyn} to implement a \emph{gradually typed language}
+  in which different regions of a program may be statically or dynamically
+  typed. The reader implements runtime support for \emph{proxies} that
+  allow values to safely move between regions.
+\item Chapter~\ref{ch:Rpoly} adds \emph{generics} with autoboxing,
+  leveraging the \code{Any} type and type casts developed in Chapters
+  \ref{ch:Rdyn} and \ref{ch:Rgrad}.
+\end{itemize}
+Alas, there are many language features that we do not include. Our
+choices are informed by a cost-benefit analysis in which we weigh the
+incidental complexity of a feature against the number of fundamental
+concepts that it exposes. For example, we include tuples rather than
+records because both elicit the study of heap allocation and garbage
+collection, but records come with more incidental complexity.
+
+Since 2016 this book has served as the textbook for the compiler
+course at Indiana University, a 16-week course for upper-level
+undergraduates and first-year graduate students.  Prior to this
+course, students learn to program in both imperative and functional
+languages, study data structures and algorithms, and take discrete
+mathematics.
+%
+The students form groups of 2--4 people and complete one chapter every
+two weeks, starting with Chapter~\ref{ch:Rvar} and finishing with
+Chapter~\ref{ch:Rdyn}. Most chapters include a challenge problem that
+we assign to the graduate students. The last two weeks of the course
+are reserved for a final project in which students design and
+implement an extension to the compiler of their choosing.
+Chapters~\ref{ch:Rwhile}, \ref{ch:Rgrad}, and \ref{ch:Rpoly} can be
+used in support of these projects or can be swapped in to replace some
+of the earlier chapters. For example, a course with an emphasis on
+statically typed imperative languages would skip Chapter~\ref{ch:Rdyn}
+in favor of
+Chapter~\ref{ch:Rwhile}. Figure~\ref{fig:chapter-dependences} depicts
+the dependencies between chapters.
+
+This book has also been used in compiler courses at California
+Polytechnic State University, Rose-Hulman Institute of Technology, and
+the University of Massachusetts Lowell.
 
-The compiler course evolved to incorporate novel pedagogical ideas
-while also including elements of effective real-world compilers.  One
-of Friedman's ideas was to split the compiler into many small
-``passes'' so that the code for each pass would be easy to understood
-in isolation.  In contrast, most compilers of the time were organized
-into only a few monolithic passes for reasons of compile-time
-efficiency. Another idea, called ``the game'', was to test the code
-generated by each pass on interpreters for each intermediate language,
-thereby helping to pinpoint errors in individual passes.
-%
-Dybvig, with later help from his students Dipanwita Sarkar and Andrew
-Keep, developed infrastructure to support this approach and evolved
-the course, first to use smaller micro-passes and then into even
-smaller nano-passes~\citep{Sarkar:2004fk,Keep:2012aa}. I was a student
-in this compiler course in the early 2000's as part of my
-Ph.D. studies at Indiana University. Needless to say, I enjoyed the
-course immensely!
-
-During that time, another graduate student named Abdulaziz Ghuloum
-observed that the front-to-back organization of the course made it
-difficult for students to understand the rationale for the compiler
-design. Ghuloum proposed an incremental approach in which the students
-start by implementing a complete compiler for a very small subset of
-the language. In each subsequent stage they add a feature to the
-language and then add or modify passes to handle the new
-feature~\citep{Ghuloum:2006bh}.  In this way, the students see how the
-language features motivate aspects of the compiler design.
-
-After graduating from Indiana University in 2005, I went on to teach
-at the University of Colorado. I adapted the nano-pass and incremental
-approaches to compiling a subset of the Python
-language~\citep{Siek:2012ab}.
-%% Python and Scheme are quite different
-%% on the surface but there is a large overlap in the compiler techniques
-%% required for the two languages. Thus, I was able to teach much of the
-%% same content from the Indiana compiler course.
-I very much enjoyed teaching the course organized in this way, and
-even better, many of the students learned a lot and got excited about
-compilers.
-
-I returned to Indiana University in 2013.  In my absence the compiler
-course had switched from the front-to-back organization to a
-back-to-front~\citep{Dybvig:2010aa}. While that organization also works
-well, I prefer the incremental approach and started porting and
-adapting the structure of the Colorado course back into the land of
-Scheme. In the meantime Indiana University had moved on from Scheme to
-Racket~\citep{plt-tr}, so the course is now about compiling a subset
-of Racket (and Typed Racket) to the x86 assembly language.
-
-This is the textbook for the incremental version of the compiler
-course at Indiana University (Spring 2016 - present).  With this book
-I hope to make the Indiana compiler course available to people that
-have not had the chance to study compilers at Indiana University.
-
-%% I have captured what
-%% I think are the most important topics from \cite{Dybvig:2010aa} but
-%% have omitted topics that are less interesting conceptually. I have
-%% also made simplifications to reduce complexity.  In this way, this
-%% book leans more towards pedagogy than towards the efficiency of the
-%% generated code. Also, the book differs in places where we I the
-%% opportunity to make the topics more fun, such as in relating register
-%% allocation to Sudoku (Chapter~\ref{ch:register-allocation-Rvar}).
-
-\section*{Prerequisites}
-
-The material in this book is challenging but rewarding. It is meant to
-prepare students for a lifelong career in programming languages.
-
-The book uses the Racket language both for the implementation of the
-compiler and for the language that is compiled, so a student should be
-proficient with Racket or Scheme prior to reading this book. There are
-many excellent resources for learning Scheme and
+
+\begin{figure}[tp]
+\begin{tikzpicture}[baseline=(current bounding box.center)]
+  \node (C1) at (0,1) {\small Ch.~\ref{ch:trees-recur} Preliminaries};
+  \node (C2) at (4,1) {\small Ch.~\ref{ch:Rvar} Variables};
+  \node (C3) at (8,1) {\small Ch.~\ref{ch:register-allocation-Rvar} Registers};
+  \node (C4) at (0,0) {\small Ch.~\ref{ch:Rif} Control Flow};
+  \node (C5) at (4,0) {\small Ch.~\ref{ch:Rvec} Tuples};
+  \node (C6) at (8,0) {\small Ch.~\ref{ch:Rfun} Functions};
+  \node (C9) at (0,-1) {\small Ch.~\ref{ch:Rwhile} Loops};
+  \node (C8) at (4,-1) {\small Ch.~\ref{ch:Rdyn} Dynamic};
+  \node (C7) at (8,-1) {\small Ch.~\ref{ch:Rlam} Lambda};
+  \node (C10) at (4,-2) {\small Ch.~\ref{ch:Rgrad} Gradual};
+  \node (C11) at (8,-2) {\small Ch.~\ref{ch:Rpoly} Generics};
+
+  \path[->] (C1) edge [above] node {} (C2);
+  \path[->] (C2) edge [above] node {} (C3);
+  \path[->] (C3) edge [above] node {} (C4);
+  \path[->] (C4) edge [above] node {} (C5);
+  \path[->] (C5) edge [above] node {} (C6);
+  \path[->] (C6) edge [above] node {} (C7);
+  \path[->] (C4) edge [above] node {} (C8);
+  \path[->] (C4) edge [above] node {} (C9);
+  \path[->] (C8) edge [above] node {} (C10);
+  \path[->] (C10) edge [above] node {} (C11);
+\end{tikzpicture}
+  \caption{Diagram of chapter dependencies.}
+  \label{fig:chapter-dependences}
+\end{figure}
+
+This book uses the \href{https://racket-lang.org/}{Racket} language
+both for the implementation of the compiler and for the input
+language, so the reader should be proficient with Racket or Scheme
+prior to reading this book. There are many excellent resources for
+learning Scheme and
 Racket~\citep{Dybvig:1987aa,Abelson:1996uq,Friedman:1996aa,Felleisen:2001aa,Felleisen:2013aa,Flatt:2014aa}.
 
-It is helpful but not necessary for the student to have prior exposure
-to the x86 assembly language~\citep{Intel:2015aa}, as one might obtain
-from a computer systems
-course~\citep{Bryant:2010aa}. This book introduces the
-parts of x86-64 assembly language that are needed.
+The compiler targets x86 assembly language~\citep{Intel:2015aa}, so it
+is helpful but not necessary for the reader to have taken a computer
+systems course~\citep{Bryant:2010aa}. This book introduces the parts
+of x86-64 assembly language that are needed.
 %
 We follow the System V calling
 conventions~\citep{Bryant:2005aa,Matz:2013aa}, which means that the
@@ -275,30 +338,139 @@ code that we generate will \emph{not} work properly with our runtime
 system on Windows. One option to consider for using a Windows computer
 is to run a virtual machine with Linux as the guest operating system.
 
-%\section*{Structure of book}
-% You might want to add short description about each chapter in this book.
-
-%\section*{About the companion website}
-%The website\footnote{\url{https://github.com/amberj/latex-book-template}} for %this file contains:
-%\begin{itemize}
-%  \item A link to (freely downlodable) latest version of this document.
-%  \item Link to download LaTeX source for this document.
-%  \item Miscellaneous material (e.g. suggested readings etc).
-%\end{itemize}
+% TODO: point to support code on github
+
+
+
+%% The tradition of compiler writing at Indiana University goes back to
+%% research and courses on programming languages by Professor Daniel
+%% Friedman in the 1970's and 1980's. Friedman conducted research on lazy
+%% evaluation~\citep{Friedman:1976aa} in the context of
+%% Lisp~\citep{McCarthy:1960dz} and then studied
+%% continuations~\citep{Felleisen:kx} and
+%% macros~\citep{Kohlbecker:1986dk} in the context of the
+%% Scheme~\citep{Sussman:1975ab}, a dialect of Lisp.  One of the students
+%% of those courses, Kent Dybvig, went on to build Chez
+%% Scheme~\citep{Dybvig:2006aa}, a production-quality and efficient
+%% compiler for Scheme. After completing his Ph.D. at the University of
+%% North Carolina, he returned to teach at Indiana University.
+%% Throughout the 1990's and 2000's, Professor Dybvig continued
+%% development of Chez Scheme and taught the compiler course.
+
+%% The compiler course evolved to incorporate novel pedagogical ideas
+%% while also including elements of effective real-world compilers.  One
+%% of Friedman's ideas was to split the compiler into many small
+%% ``passes'' so that the code for each pass would be easy to understood
+%% in isolation.  In contrast, most compilers of the time were organized
+%% into only a few monolithic passes for reasons of compile-time
+%% efficiency. Another idea, called ``the game'', was to test the code
+%% generated by each pass on interpreters for each intermediate language,
+%% thereby helping to pinpoint errors in individual passes.
+%% %
+%% Dybvig, with later help from his students Dipanwita Sarkar and Andrew
+%% Keep, developed infrastructure to support this approach and evolved
+%% the course, first to use smaller micro-passes and then into even
+%% smaller nano-passes~\citep{Sarkar:2004fk,Keep:2012aa}. I was a student
+%% in this compiler course in the early 2000's as part of my
+%% Ph.D. studies at Indiana University. Needless to say, I enjoyed the
+%% course immensely!
+
+%% During that time, another graduate student named Abdulaziz Ghuloum
+%% observed that the front-to-back organization of the course made it
+%% difficult for students to understand the rationale for the compiler
+%% design. Ghuloum proposed an incremental approach in which the students
+%% start by implementing a complete compiler for a very small subset of
+%% the language. In each subsequent stage they add a feature to the
+%% language and then add or modify passes to handle the new
+%% feature~\citep{Ghuloum:2006bh}.  In this way, the students see how the
+%% language features motivate aspects of the compiler design.
+
+%% After graduating from Indiana University in 2005, I went on to teach
+%% at the University of Colorado. I adapted the nano-pass and incremental
+%% approaches to compiling a subset of the Python
+%% language~\citep{Siek:2012ab}.
+%% %% Python and Scheme are quite different
+%% %% on the surface but there is a large overlap in the compiler techniques
+%% %% required for the two languages. Thus, I was able to teach much of the
+%% %% same content from the Indiana compiler course.
+%% I very much enjoyed teaching the course organized in this way, and
+%% even better, many of the students learned a lot and got excited about
+%% compilers.
+
+%% I returned to Indiana University in 2013.  In my absence the compiler
+%% course had switched from the front-to-back organization to a
+%% back-to-front~\citep{Dybvig:2010aa}. While that organization also works
+%% well, I prefer the incremental approach and started porting and
+%% adapting the structure of the Colorado course back into the land of
+%% Scheme. In the meantime Indiana University had moved on from Scheme to
+%% Racket~\citep{plt-tr}, so the course is now about compiling a subset
+%% of Racket (and Typed Racket) to the x86 assembly language.
+
+%% This is the textbook for the incremental version of the compiler
+%% course at Indiana University (Spring 2016 - present).  With this book
+%% I hope to make the Indiana compiler course available to people that
+%% have not had the chance to study compilers at Indiana University.
+
+%% %% I have captured what
+%% %% I think are the most important topics from \cite{Dybvig:2010aa} but
+%% %% have omitted topics that are less interesting conceptually. I have
+%% %% also made simplifications to reduce complexity.  In this way, this
+%% %% book leans more towards pedagogy than towards the efficiency of the
+%% %% generated code. Also, the book differs in places where we I the
+%% %% opportunity to make the topics more fun, such as in relating register
+%% %% allocation to Sudoku (Chapter~\ref{ch:register-allocation-Rvar}).
+
+%% \section*{Prerequisites}
+
+%% The material in this book is challenging but rewarding. It is meant to
+%% prepare students for a lifelong career in programming languages.
+
+%% %\section*{Structure of book}
+%% % You might want to add short description about each chapter in this book.
+
+%% %\section*{About the companion website}
+%% %The website\footnote{\url{https://github.com/amberj/latex-book-template}} for %this file contains:
+%% %\begin{itemize}
+%% %  \item A link to (freely downlodable) latest version of this document.
+%% %  \item Link to download LaTeX source for this document.
+%% %  \item Miscellaneous material (e.g. suggested readings etc).
+%% %\end{itemize}
 
 \section*{Acknowledgments}
 
-Many people have contributed to the ideas, techniques, and
-organization of this book and have taught courses based on it.  Many
-of the compiler design decisions in this book are drawn from the
-assignment descriptions of \cite{Dybvig:2010aa}.  We also would like
-to thank John Clements, Bor-Yuh Evan Chang, Daniel P. Friedman, Ronald
+The tradition of compiler writing at Indiana University goes back to
+research and courses on programming languages by Professor Daniel
+Friedman in the 1970's and 1980's.  One of his students, Kent Dybvig,
+built Chez Scheme~\citep{Dybvig:2006aa}, a production-quality and
+efficient compiler for Scheme.  Throughout the 1990's and 2000's,
+Professor Dybvig taught the compiler course and continued development
+of Chez Scheme.
+%
+The compiler course evolved to incorporate novel pedagogical ideas
+while also including elements of efficient real-world compilers.  One
+of Friedman's ideas was to split the compiler into many small
+passes. Another idea, called ``the game'', was to test the code
+generated by each pass on interpreters.
+
+Dybvig, with later help from his students Dipanwita Sarkar and Andrew
+Keep, developed infrastructure to support this approach and evolved
+the course to use even smaller
+nanopasses~\citep{Sarkar:2004fk,Keep:2012aa}.  Many of the compiler
+design decisions in this book are drawn from the assignment
+descriptions of \citet{Dybvig:2010aa}. A graduate student named
+Abdulaziz Ghuloum observed that the front-to-back organization of the
+course made it difficult for students to understand the rationale for
+the compiler design. Ghuloum proposed the incremental
+approach~\citep{Ghuloum:2006bh}.
+
+We thank John Clements, Bor-Yuh Evan Chang, Daniel P. Friedman, Ronald
 Garcia, Abdulaziz Ghuloum, Jay McCarthy, Nate Nystrom, Dipanwita
 Sarkar, Oscar Waddell, and Michael Wollowski.
 
 \mbox{}\\
 \noindent Jeremy G. Siek \\
-\noindent \url{http://homes.soic.indiana.edu/jsiek} \\
+Indiana University
+%\noindent \url{http://homes.soic.indiana.edu/jsiek} \\
 %\noindent Spring 2016
 
 
@@ -937,7 +1109,7 @@ do anything.  On the other hand, if the error is a
 \code{trapped-error}, then the compiler must produce an executable and
 it is required to report that an error occurred. To signal an error,
 exit with a return code of \code{255}.  The interpreters in chapters
-\ref{ch:type-dynamic} and \ref{ch:gradual-typing} use
+\ref{ch:Rdyn} and \ref{ch:Rgrad} use
 \code{trapped-error}.
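 For example, a sketch of such an error handler in Racket (the helper
 name here is ours, not part of the support code):
 \begin{lstlisting}
 ;; print a message, then exit with the code that signals an error
 (define (report-trapped-error msg)
   (displayln msg (current-error-port))
   (exit 255))
 \end{lstlisting}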
 
 %% This convention applies to the languages defined in this
@@ -1066,7 +1238,7 @@ Appendix~\ref{appendix:utilities}.\\
 
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \chapter{Integers and Variables}
-\label{ch:int-exp}
+\label{ch:Rvar}
 
 This chapter is about compiling a subset of Racket to x86-64 assembly
 code~\citep{Intel:2015aa}. The subset, named \LangVar{}, includes
@@ -1488,7 +1660,7 @@ specified by the label and $\key{retq}$ returns from a procedure to
 its caller. 
 %
 We discuss procedure calls in more detail later in this chapter and in
-Chapter~\ref{ch:functions}. The instruction $\key{jmp}\,\itm{label}$
+Chapter~\ref{ch:Rfun}. The instruction $\key{jmp}\,\itm{label}$
 updates the program counter to the address of the instruction after
 the specified label.
 
@@ -1641,7 +1813,7 @@ allowed in front of every instructions. Instead instructions are
 grouped into \emph{blocks}\index{block}\index{basic block} with a
 label associated with every block, which is why the \key{X86Program}
 struct includes an alist mapping labels to blocks. The reason for this
-organization becomes apparent in Chapter~\ref{ch:bool-types} when we
+organization becomes apparent in Chapter~\ref{ch:Rif} when we
 introduce conditional branching. The \code{Block} structure includes
 an $\itm{info}$ field that is not needed for this chapter, but becomes
 useful in Chapter~\ref{ch:register-allocation-Rvar}.  For now, the
@@ -1779,7 +1951,7 @@ The ordering of \key{uniquify} with respect to
 \key{uniquify} to come first.
 
 Last, we consider \key{select-instructions} and \key{assign-homes}.
-These two passes are intertwined. In Chapter~\ref{ch:functions} we
+These two passes are intertwined. In Chapter~\ref{ch:Rfun} we
 learn that, in x86, registers are used for passing arguments to
 functions and it is preferable to assign parameters to their
 corresponding registers. On the other hand, by selecting instructions
@@ -1867,7 +2039,7 @@ A \LangCVar{} program consists of a control-flow graph represented as
 an alist mapping labels to tails. This is more general than necessary
 for the present chapter, as we do not yet introduce \key{goto} for
 jumping to labels, but it saves us from having to change the syntax in
-Chapter~\ref{ch:bool-types}.  For now there will be just one label,
+Chapter~\ref{ch:Rif}.  For now there will be just one label,
 \key{start}, and the whole program is its tail.
 %
 The $\itm{info}$ field of the \key{CProgram} form, after the
@@ -2297,7 +2469,7 @@ output. The reader might be tempted to instead organize
 \code{cont} parameter and perhaps using \code{append} to combine
 statements. We warn against that alternative because the
 accumulator-passing style is key to how we generate high-quality code
-for conditional expressions in Chapter~\ref{ch:bool-types}.
+for conditional expressions in Chapter~\ref{ch:Rif}.
 
 \begin{exercise}\normalfont
 %
@@ -2655,7 +2827,7 @@ all, fast code is useless if it produces incorrect results!
 
 \index{register allocation}
 
-In Chapter~\ref{ch:int-exp} we learned how to store variables on the
+In Chapter~\ref{ch:Rvar} we learned how to store variables on the
 stack. In this chapter we learn how to improve the performance of the
 generated code by placing some variables into registers.  The CPU can
 access a register in a single cycle, whereas accessing the stack can
@@ -2724,7 +2896,7 @@ then model register allocation as a graph coloring problem
 
 If we run out of registers despite these efforts, we place the
 remaining variables on the stack, similar to what we did in
-Chapter~\ref{ch:int-exp}. It is common to use the verb \emph{spill}
+Chapter~\ref{ch:Rvar}. It is common to use the verb \emph{spill}
 for assigning a variable to a stack location. The decision to spill a
 variable is handled as part of the graph coloring process
 (Section~\ref{sec:graph-coloring}).
@@ -2800,7 +2972,7 @@ rdi rsi rdx rcx r8 r9
 \end{lstlisting}
 If there are more than six arguments, then the convention is to use
 space on the frame of the caller for the rest of the
-arguments. However, in Chapter~\ref{ch:functions} we arrange never to
+arguments. However, in Chapter~\ref{ch:Rfun} we arrange never to
 need more than six arguments. For now, the only function we care about
 is \code{read\_int} and it takes zero arguments.
 %
@@ -3407,7 +3579,7 @@ particular, we assign $-1$ to \code{rax} and $-2$ to \code{rsp}.
 %% One might wonder why we include registers at all in the liveness
 %% analysis and interference graph. For example, we never allocate a
 %% variable to \code{rax} and \code{rsp}, so it would be harmless to
-%% leave them out.  As we see in Chapter~\ref{ch:tuples}, when we begin
+%% leave them out.  As we see in Chapter~\ref{ch:Rvec}, when we begin
 %% to use register for passing arguments to functions, it will be
 %% necessary for those registers to appear in the interference graph
 %% because those registers will also be assigned to variables, and we
@@ -3692,7 +3864,7 @@ We recommend creating an auxiliary function named \code{color-graph}
 that takes an interference graph and a list of all the variables in
 the program. This function should return a mapping of variables to
 their colors (represented as natural numbers). By creating this helper
-function, you will be able to reuse it in Chapter~\ref{ch:functions}
+function, you will be able to reuse it in Chapter~\ref{ch:Rfun}
 when we add support for functions.
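 The following sketch conveys the shape of such a helper over an
 adjacency-list representation; it greedily assigns the first available
 color and ignores the saturation-based priority described below, so it
 is illustrative only, not the algorithm of this chapter.
 \begin{lstlisting}
 ;; adjacency: alist mapping each variable to its list of neighbors
 (define (color-graph adjacency vars)
   (for/fold ([coloring '()]) ([v vars])
     (define used
       (for/list ([u (cdr (assq v adjacency))]
                  #:when (assq u coloring))
         (cdr (assq u coloring))))
     (define color
       (for/first ([i (in-naturals)]
                   #:when (not (member i used)))
         i))
     (cons (cons v color) coloring)))
 \end{lstlisting}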
 
 To prioritize the processing of highly saturated nodes inside the
@@ -4270,7 +4442,7 @@ conclusion:
 
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \chapter{Booleans and Control Flow}
-\label{ch:bool-types}
+\label{ch:Rif}
 \index{Boolean}
 \index{control flow}
 \index{conditional expression}
@@ -5543,7 +5715,7 @@ Use the \code{tsort} and \code{transpose} functions of the Racket
 As an aside, a topological ordering is only guaranteed to exist if the
 graph does not contain any cycles. That is indeed the case for the
 control-flow graphs that we generate from \LangIf{} programs.
-However, in Chapter~\ref{ch:loop} we add loops to \LangLoop{} and
+However, in Chapter~\ref{ch:Rwhile} we add loops to \LangLoop{} and
 learn how to handle cycles in the control-flow graph.
 
 You'll need to construct a directed graph to represent the
@@ -5956,7 +6128,7 @@ blocks on several test programs.
 
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \chapter{Tuples and Garbage Collection}
-\label{ch:tuples}
+\label{ch:Rvec}
 \index{tuple}
 \index{vector}
 
@@ -5976,7 +6148,7 @@ no longer needed, which is why we also study \emph{garbage collection}
 
 Section~\ref{sec:r3} introduces the \LangVec{} language including its
 interpreter and type checker. The \LangVec{} language extends the \LangIf{}
-language of Chapter~\ref{ch:bool-types} with vectors and Racket's
+language of Chapter~\ref{ch:Rif} with vectors and Racket's
 \code{void} value. The reason for including the latter is that the
 \code{vector-set!} operation returns a value of type
 \code{Void}\footnote{Racket's \code{Void} type corresponds to what is
@@ -7333,7 +7505,7 @@ from the set.
 
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \chapter{Functions}
-\label{ch:functions}
+\label{ch:Rfun}
 \index{function}
 
 This chapter studies the compilation of functions similar to those
@@ -7341,7 +7513,7 @@ found in the C language. This corresponds to a subset of Typed Racket
 in which only top-level function definitions are allowed. This kind of
 function is an important stepping stone to implementing
 lexically-scoped functions, that is, \key{lambda} abstractions, which
-is the topic of Chapter~\ref{ch:lambdas}.
+is the topic of Chapter~\ref{ch:Rlam}.
 
 \section{The \LangFun{} Language}
 
@@ -8294,7 +8466,7 @@ except the \code{retq} is replaced with \code{jmp *$\itm{arg}$}.
 Regarding function definitions, you will need to generate a prelude
 and conclusion for each one. This code is similar to the prelude and
 conclusion that you generated for the \code{main} function in
-Chapter~\ref{ch:tuples}. To review, the prelude of every function
+Chapter~\ref{ch:Rvec}. To review, the prelude of every function
 should carry out the following steps.
 \begin{enumerate}
 \item Start with \code{.global} and \code{.align} directives followed
@@ -8502,7 +8674,7 @@ mainconclusion:
 
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \chapter{Lexically Scoped Functions}
-\label{ch:lambdas}
+\label{ch:Rlam}
 \index{lambda}
 \index{lexical scoping}
 
@@ -8577,8 +8749,8 @@ applied. An efficient solution to the problem, due to
 free variables together with the function pointer for the lambda's
 code, an arrangement called a \emph{flat closure} (which we shorten to
 just ``closure'').  \index{closure}\index{flat closure} Fortunately,
-we have all the ingredients to make closures, Chapter~\ref{ch:tuples}
-gave us vectors and Chapter~\ref{ch:functions} gave us function
+we have all the ingredients to make closures: Chapter~\ref{ch:Rvec}
+gave us vectors and Chapter~\ref{ch:Rfun} gave us function
 pointers. The function pointer resides at index $0$ and the
 values for the free variables will fill in the rest of the vector.
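 As a sketch in plain Racket (rather than in the intermediate languages
 of this chapter), the closure for \code{(lambda (y) (+ x y))} with
 free variable \code{x} can be built and applied as follows:
 \begin{lstlisting}
 (define (lambda-code clos y)         ; the code of the lambda
   (let ([x (vector-ref clos 1)])    ; free variable from the closure
     (+ x y)))
 (define x 10)
 (define clos (vector lambda-code x)) ; function pointer at index 0
 ((vector-ref clos 0) clos 32)        ; fetch index 0, pass the closure
 \end{lstlisting}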
 
@@ -9214,7 +9386,7 @@ work of \citet{Keep:2012ab}.
 
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \chapter{Dynamic Typing}
-\label{ch:type-dynamic}
+\label{ch:Rdyn}
 \index{dynamic typing}
 
 In this chapter we discuss the compilation of \LangDyn{}, a dynamically
@@ -9317,7 +9489,7 @@ There is no type checker for \LangDyn{} because it is not a statically
 typed language (it's dynamically typed!).
 
 The definitional interpreter for \LangDyn{} is presented in
 Figure~\ref{fig:interp-Rdyn} and its auxiliary functions are defined in
 Figure~\ref{fig:interp-Rdyn-aux}. Consider the match case for
 \code{(Int n)}.  Instead of simply returning the integer \code{n} (as
 in the interpreter for \LangVar{} in Figure~\ref{fig:interp-Rvar}), the
@@ -10284,7 +10456,7 @@ for the compilation of \LangDyn{}.
 
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \chapter{Loops and Assignment}
-\label{ch:loop}
+\label{ch:Rwhile}
 
 % TODO: define R'_8
 
@@ -10893,7 +11065,7 @@ in $A_b \cap F_b$ with \code{AssignedFree} to produce \itm{params'}.
 Let $P$ be the set of parameter names in \itm{params}.  The result is
 $\LAMBDA{\itm{params'}}{T}{\itm{body'}}$, $A_b - P$, and $(F_b \cup
 \mathrm{FV}(\itm{body})) - P$, where $\mathrm{FV}$ computes the free
-variables of an expression (see Chapter~\ref{ch:lambdas}).
+variables of an expression (see Chapter~\ref{ch:Rlam}).
 
 \paragraph{Convert Assignments}
 
@@ -11211,7 +11383,7 @@ for the compilation of \LangLoop{}.
 \section{Challenge: Arrays}
 \label{sec:arrays}
 
-In Chapter~\ref{ch:tuples} we studied tuples, that is, sequences of
+In Chapter~\ref{ch:Rvec} we studied tuples, that is, sequences of
 elements whose length is determined at compile-time and where each
 element of a tuple may have a different type (they are
 heterogeneous). This challenge is also about sequences, but this time
@@ -11435,7 +11607,7 @@ an array:
 \end{itemize}
 
 
-Recall that in Chapter~\ref{ch:type-dynamic}, we use a $3$-bit tag to
+Recall that in Chapter~\ref{ch:Rdyn}, we use a $3$-bit tag to
 differentiate the kinds of values that have been injected into the
 \code{Any} type. We use the bit pattern \code{110} (or $6$ in decimal)
 to indicate that the value is an array.
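 In Racket notation, the tagging can be sketched as follows, assuming
 an 8-byte-aligned address whose low three bits are zero:
 \begin{lstlisting}
 (define (tag-array addr)  (bitwise-ior addr #b110))  ; attach tag 110
 (define (array-tag? val)  (= (bitwise-and val #b111) #b110))
 (define (untag-array val) (bitwise-and val (bitwise-not #b111)))
 \end{lstlisting}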
@@ -11448,7 +11620,7 @@ the passes to handle arrays.
 
 The array-access operators \code{vectorof-ref} and
 \code{vectorof-set!} are similar to the \code{any-vector-ref} and
-\code{any-vector-set!} operators of Chapter~\ref{ch:type-dynamic} in
+\code{any-vector-set!} operators of Chapter~\ref{ch:Rdyn} in
 that the type checker cannot tell whether the index will be in bounds,
 so the bounds check must be performed at run time.  Recall that the
 \code{reveal-casts} pass (Section~\ref{sec:reveal-casts-Rany}) wraps
@@ -11538,7 +11710,7 @@ arrays by laying out each row in the array, one after the next.
 
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \chapter{Gradual Typing}
-\label{ch:gradual-typing}
+\label{ch:Rgrad}
 \index{gradual typing}
 
 This chapter studies a language, \LangGrad{}, in which the programmer
@@ -11630,7 +11802,7 @@ syntax.
 Both the type checker and the interpreter for \LangGrad{} require some
 interesting changes to enable gradual typing, which we discuss in the
 next two sections in the context of the \code{map-vec} example from
-Chapter~\ref{ch:functions}.  In Figure~\ref{fig:gradual-map-vec} we
+Chapter~\ref{ch:Rfun}.  In Figure~\ref{fig:gradual-map-vec} we
 revise the \code{map-vec} example, omitting the type annotations from
 the \code{add1} function.
 
@@ -12706,7 +12878,7 @@ recommend the reader to the online gradual typing bibliography:
 
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \chapter{Parametric Polymorphism}
-\label{ch:parametric-polymorphism}
+\label{ch:Rpoly}
 \index{parametric polymorphism}
 \index{generics}
 
@@ -12752,7 +12924,7 @@ declaration comes before the \code{define}. In the abstract syntax,
 the return type in the \code{Def} is \code{Any}, but that should be
 ignored in favor of the return type in the type declaration.  (The
 \code{Any} comes from using the same parser as in
-Chapter~\ref{ch:type-dynamic}.)  The presence of a type declaration
+Chapter~\ref{ch:Rdyn}.)  The presence of a type declaration
 enables the use of an \code{All} type for a function, thereby making
 it polymorphic. The grammar for types is extended to include
 polymorphic types and type variables.
@@ -13154,7 +13326,7 @@ add just one new pass, \code{erase-types}, to compile \LangInst{} to
 \section{Erase Types}
 \label{sec:erase-types}
 
-We use the \code{Any} type from Chapter~\ref{ch:type-dynamic} to
+We use the \code{Any} type from Chapter~\ref{ch:Rdyn} to
 represent type variables. For example, Figure~\ref{fig:map-vec-erase}
 shows the output of the \code{erase-types} pass on the polymorphic
 \code{map-vec} (Figure~\ref{fig:map-vec-poly}). The occurrences of