4 years ago · 5b436b4c45
--- a/book.tex
+++ b/book.tex
@@ -170,8 +170,8 @@ University.
 
				 \chapter*{Preface}
			
 
				 
			
 
				 The tradition of compiler writing at Indiana University goes back to
			
 
				-research and courses about programming languages by Daniel Friedman in
			
 
				-the 1970's and 1980's. Dan conducted research on lazy
			
 
				+research and courses on programming languages by Professor Daniel
			
 
				+Friedman in the 1970's and 1980's. Friedman conducted research on lazy
			
 
				 evaluation~\citep{Friedman:1976aa} in the context of
			
 
				 Lisp~\citep{McCarthy:1960dz} and then studied
			
 
				 continuations~\citep{Felleisen:kx} and
			
@@ -180,67 +180,67 @@ Scheme~\citep{Sussman:1975ab}, a dialect of Lisp.  One of the students
 
				 of those courses, Kent Dybvig, went on to build Chez
			
 
				 Scheme~\citep{Dybvig:2006aa}, a production-quality and efficient
			
 
				 compiler for Scheme. After completing his Ph.D. at the University of
			
 
				-North Carolina, Kent returned to teach at Indiana University.
			
 
				-Throughout the 1990's and 2000's, Kent continued development of Chez
			
 
				-Scheme and taught the compiler course.
			
 
				+North Carolina, he returned to teach at Indiana University.
			
 
				+Throughout the 1990's and 2000's, Professor Dybvig continued
			
 
				+development of Chez Scheme and taught the compiler course.
			
 
				 
			
 
				 The compiler course evolved to incorporate novel pedagogical ideas
			
 
				 while also including elements of effective real-world compilers.  One
			
 
				-of Dan's ideas was to split the compiler into many small ``passes'' so
			
 
				-that the code for each pass would be easy to understood in isolation.
			
 
				-(In contrast, most compilers of the time were organized into only a
			
 
				-few monolithic passes for reasons of compile-time efficiency.)  Kent,
			
 
				-with later help from his students Dipanwita Sarkar and Andrew Keep,
			
 
				-developed infrastructure to support this approach and evolved the
			
 
				-course, first to use micro-sized passes and then into even smaller
			
 
				-nano passes~\citep{Sarkar:2004fk,Keep:2012aa}. Jeremy Siek was a
			
 
				-student in this compiler course in the early 2000's, as part of his
			
 
				-Ph.D. studies at Indiana University. Needless to say, Jeremy enjoyed
			
 
				-the course immensely!
			
 
				-
			
 
				-During that time, another student named Abdulaziz Ghuloum observed
			
 
				-that the front-to-back organization of the course made it difficult
			
 
				-for students to understand the rationale for the compiler
			
 
				-design. Abdulaziz proposed an incremental approach in which the
			
 
				-students build the compiler in stages; they start by implementing a
			
 
				-complete compiler for a very small subset of the input language and in
			
 
				-each subsequent stage they add a language feature and add or modify
			
 
				-passes to handle the new feature~\citep{Ghuloum:2006bh}.  In this way,
			
 
				-the students see how the language features motivate aspects of the
			
 
				+of Friedman's ideas was to split the compiler into many small
			
 
				+``passes'' so that the code for each pass would be easy to understand
			
 
				+in isolation.  (In contrast, most compilers of the time were organized
			
 
				+into only a few monolithic passes for reasons of compile-time
			
 
				+efficiency.)  Dybvig, with later help from his students Dipanwita
			
 
				+Sarkar and Andrew Keep, developed infrastructure to support this
			
 
				+approach and evolved the course, first to use smaller micro-passes and
			
 
				+then even smaller
			
 
				+nano-passes~\citep{Sarkar:2004fk,Keep:2012aa}. I was a student in this
			
 
				+compiler course in the early 2000's as part of my Ph.D. studies at
			
 
				+Indiana University. Needless to say, I enjoyed the course immensely!
			
 
				+
			
 
				+During that time, another graduate student named Abdulaziz Ghuloum
			
 
				+observed that the front-to-back organization of the course made it
			
 
				+difficult for students to understand the rationale for the compiler
			
 
				+design. Ghuloum proposed an incremental approach in which the students
			
 
				+build the compiler in stages; they start by implementing a complete
			
 
				+compiler for a very small subset of the input language and in each
			
 
				+subsequent stage they add a language feature and add or modify passes
			
 
				+to handle the new feature~\citep{Ghuloum:2006bh}.  In this way, the
			
 
				+students see how the language features motivate aspects of the
			
 
				 compiler design.
			
 
				 
			
 
				-After graduating from Indiana University in 2005, Jeremy went on to
			
 
				-teach at the University of Colorado. He adapted the nano pass and
			
 
				-incremental approaches to compiling a subset of the Python
			
 
				+After graduating from Indiana University in 2005, I went on to teach
			
 
				+at the University of Colorado. I adapted the nano-pass and incremental
			
 
				+approaches to compiling a subset of the Python
			
 
				 language~\citep{Siek:2012ab}.  Python and Scheme are quite different
			
 
				 on the surface but there is a large overlap in the compiler techniques
			
 
				-required for the two languages. Thus, Jeremy was able to teach much of
			
 
				-the same content from the Indiana compiler course. He very much
			
 
				-enjoyed teaching the course organized in this way, and even better,
			
 
				-many of the students learned a lot and got excited about compilers.
			
 
				-
			
 
				-Jeremy returned to teach at Indiana University in 2013.  In his
			
 
				-absence the compiler course had switched from the front-to-back
			
 
				-organization to a back-to-front organization. Seeing how well the
			
 
				-incremental approach worked at Colorado, he started porting and
			
 
				-adapting the structure of the Colorado course back into the land of
			
 
				-Scheme. In the meantime Indiana had moved on from Scheme to Racket, so
			
 
				-the course is now about compiling a subset of Racket (and Typed
			
 
				-Racket) to the x86 assembly language. The compiler is implemented in
			
 
				-Racket 7.1~\citep{plt-tr}.
			
 
				+required for the two languages. Thus, I was able to teach much of the
			
 
				+same content from the Indiana compiler course. I very much enjoyed
			
 
				+teaching the course organized in this way, and even better, many of
			
 
				+the students learned a lot and got excited about compilers.
			
 
				+
			
 
				+I returned to teach at Indiana University in 2013.  In my absence the
			
 
				+compiler course had switched from the front-to-back organization to a
			
 
				+back-to-front organization. Seeing how well the incremental approach
			
 
				+worked at Colorado, I started porting and adapting the structure of
			
 
				+the Colorado course back into the land of Scheme. In the meantime
			
 
				+Indiana University had moved on from Scheme to Racket, so the course
			
 
				+is now about compiling a subset of Racket (and Typed Racket) to the
			
 
				+x86 assembly language. The compiler is implemented in
			
 
				+Racket~\citep{plt-tr}.
			
 
				 
			
 
				 This is the textbook for the incremental version of the compiler
			
 
				 course at Indiana University (Spring 2016 - present) and it is the
			
 
				-first open textbook for an Indiana compiler course.  With this book we
			
 
				+first open textbook for an Indiana compiler course.  With this book I
			
 
				 hope to make the Indiana compiler course available to people that have
			
 
				-not had the chance to study in Bloomington in person.  Many of the
			
 
				-compiler design decisions in this book are drawn from the assignment
			
 
				-descriptions of \cite{Dybvig:2010aa}. We have captured what we think
			
 
				-are the most important topics from \cite{Dybvig:2010aa} but we have
			
 
				-omitted topics that we think are less interesting conceptually and we
			
 
				-have made simplifications to reduce complexity.  In this way, this
			
 
				+not had the chance to study compilers at Indiana University.  Many of
			
 
				+the compiler design decisions in this book are drawn from the
			
 
				+assignment descriptions of \cite{Dybvig:2010aa}. I have captured what
			
 
				+I think are the most important topics from \cite{Dybvig:2010aa} but
			
 
				+have omitted topics that are less interesting conceptually. I have
			
 
				+also made simplifications to reduce complexity.  In this way, this
			
 
				 book leans more towards pedagogy than towards the efficiency of the
			
 
				-generated code. Also, the book differs in places where we saw the
			
 
				+generated code. Also, the book differs in places where I saw the
			
 
				 opportunity to make the topics more fun, such as in relating register
			
 
				 allocation to Sudoku (Chapter~\ref{ch:register-allocation-r1}).
			
 
				 
			
@@ -255,10 +255,22 @@ proficient with Racket (or Scheme) prior to reading this book. There
 
				 are many excellent resources for learning Scheme and
			
 
				 Racket~\citep{Dybvig:1987aa,Abelson:1996uq,Friedman:1996aa,Felleisen:2001aa,Felleisen:2013aa,Flatt:2014aa}. It
			
 
				 is helpful but not necessary for the student to have prior exposure to
			
 
				-the x86 (or x86-64) assembly language~\citep{Intel:2015aa}, as one might
			
 
				-obtain from a computer systems
			
 
				-course~\citep{Bryant:2005aa,Bryant:2010aa}.  This book introduces the
			
 
				+the x86 (or x86-64) assembly language~\citep{Intel:2015aa}, as one
			
 
				+might obtain from a computer systems
			
 
				+course~\citep{Bryant:2005aa,Bryant:2010aa}. This book introduces the
			
 
				 parts of x86-64 assembly language that are needed.
			
 
				+%
			
 
				+We follow the System V calling conventions~\citep{Matz:2013aa}, which
			
 
				+means that the assembly code that we generate will work properly with
			
 
				+our runtime system (written in C) when it is compiled using the GNU C
			
 
				+compiler (\code{gcc}) on Linux or MacOS. (Minor adjustments are needed
			
 
				+for MacOS, which we note as they arise.)
			
 
				+%
			
 
				+The Microsoft Windows operating system uses a different calling
			
 
				+convention~\citep{Microsoft:2018aa}, which is followed by the GNU C
			
 
				+compiler when running on Windows. So the assembly code that we
			
 
				+generate will \emph{not} work on Windows.
			
 
				+
			
 
				 
			
 
				 %\section*{Structure of book}
			
 
				 % You might want to add short description about each chapter in this book.
			
@@ -1353,7 +1365,7 @@ x86_0 &::= & \key{.globl main}\\
 
				 \]
			
 
				 \end{minipage}
			
 
				 }
			
 
				-\caption{The concrete syntax of the x86$_0$ assembly language (AT\&T syntax).}
			
 
				+\caption{The syntax of the x86$_0$ assembly language (AT\&T syntax).}
			
 
				 \label{fig:x86-0-concrete}
			
 
				 \end{figure}
			
 
				 
			
@@ -5937,8 +5949,10 @@ An implementation of the copying collector is provided in the
 
				 interface to the garbage collector that is used by the compiler. The
			
 
				 \code{initialize} function creates the FromSpace, ToSpace, and root
			
 
				 stack and should be called in the prelude of the \code{main}
			
 
				-function. The \code{initialize} function puts the address of the
			
 
				-beginning of the FromSpace into the global variable
			
 
				+function. The arguments of \code{initialize} are the root stack size
			
 
				+and the heap size. Both need to be multiples of $64$, and $16384$ is a
			
 
				+good choice for both.  The \code{initialize} function puts the address
			
 
				+of the beginning of the FromSpace into the global variable
			
 
				 \code{free\_ptr}. The global variable \code{fromspace\_end} points to
			
 
				 the address that is 1-past the last element of the FromSpace. (We use
			
 
				 half-open intervals to represent chunks of
			
@@ -6625,7 +6639,7 @@ main:
 
				 	pushq	%r14
			
 
				 	subq	$0, %rsp
			
 
				 	movq $16384, %rdi
			
 
				-	movq $16, %rsi
			
 
				+	movq $16384, %rsi
			
 
				 	callq initialize
			
 
				 	movq rootstack_begin(%rip), %r15
			
 
				 	movq $0, (%r15)
			
@@ -6917,7 +6931,7 @@ inside each other.
 
				     (\key{vector-ref}\;\Exp\;\Int)} \\
			
 
				   &\mid& \gray{(\key{vector-set!}\;\Exp\;\Int\;\Exp)\mid (\key{void})
			
 
				       \mid \LP\key{has-type}~\Exp~\Type\RP } \\
			
 
				-      &\mid& \LP\Exp \; \Exp \ldots\RP \\
			
 
				+  &\mid& \LP\Exp \; \Exp \ldots\RP \\
			
 
				   \Def &::=& \CDEF{\Var}{\LS\Var \key{:} \Type\RS \ldots}{\Type}{\Exp} \\
			
 
				   R_4 &::=& \Def \ldots \; \Exp
			
 
				 \end{array}
			
@@ -8128,7 +8142,8 @@ syntax for function application.
 
				     &\mid& \gray{ (\key{vector}\;\Exp\ldots) \mid
			
 
				           (\key{vector-ref}\;\Exp\;\Int)} \\
			
 
				     &\mid& \gray{(\key{vector-set!}\;\Exp\;\Int\;\Exp)\mid (\key{void})
			
 
				-     \mid (\Exp \; \Exp\ldots) } \\
			
 
				+    \mid (\Exp \; \Exp\ldots) } \\
			
 
				+    &\mid& \LP \key{procedure-arity}~\Exp\RP \\
			
 
				     &\mid& \CLAMBDA{\LP\LS\Var \key{:} \Type\RS\ldots\RP}{\Type}{\Exp} \\
			
 
				   \Def &::=& \gray{ \CDEF{\Var}{\LS\Var \key{:} \Type\RS\ldots}{\Type}{\Exp} } \\
			
 
				   R_5 &::=& \gray{\Def\ldots \; \Exp}
			
@@ -8148,14 +8163,15 @@ syntax for function application.
 
				     \small
			
 
				 \[
			
 
				 \begin{array}{lcl}
			
 
				+  \itm{op} &::=& \ldots \mid \code{procedure-arity} \\
			
 
				   \Exp &::=& \gray{ \INT{\Int} \VAR{\Var} \mid \LET{\Var}{\Exp}{\Exp} } \\
			
 
				        &\mid& \gray{ \PRIM{\itm{op}}{\Exp\ldots} }\\
			
 
				      &\mid& \gray{ \BOOL{\itm{bool}}
			
 
				       \mid \IF{\Exp}{\Exp}{\Exp} } \\
			
 
				      &\mid& \gray{ \VOID{} \mid \LP\key{HasType}~\Exp~\Type \RP 
			
 
				      \mid \APPLY{\Exp}{\Exp\ldots} }\\
			
 
				-     &\mid& \LAMBDA{\LP[\Var\code{:}\Type]\ldots\RP}{\Type}{\Exp}\\
			
 
				- \Def &::=& \gray{ \FUNDEF{\Var}{\LP[\Var \code{:} \Type]\ldots\RP}{\Type}{\code{'()}}{\Exp} }\\
			
 
				+     &\mid& \LAMBDA{\LP\LS\Var\code{:}\Type\RS\ldots\RP}{\Type}{\Exp}\\
			
 
				+ \Def &::=& \gray{ \FUNDEF{\Var}{\LP\LS\Var \code{:} \Type\RS\ldots\RP}{\Type}{\code{'()}}{\Exp} }\\
			
 
				   R_5 &::=& \gray{ \PROGRAMDEFSEXP{\code{'()}}{\LP\Def\ldots\RP}{\Exp} }
			
 
				 \end{array}
			
 
				 \]
			
@@ -8178,23 +8194,7 @@ values.
 
				 
			
 
				 \begin{figure}[tbp]
			
 
				 \begin{lstlisting}
			
 
				-(define (interp-exp env)
			
 
				-  (lambda (e)
			
 
				-    (define recur (interp-exp env))
			
 
				-    (match e
			
 
				-      ...
			
 
				-      [(Lambda (list `[,xs : ,Ts] ...) rT body)
			
 
				-       `(lambda ,xs ,body ,env)]
			
 
				-      [(Apply fun args)
			
 
				-       (define fun-val ((interp-exp env) fun))
			
 
				-       (define arg-vals (map (interp-exp env) args))
			
 
				-       (match fun-val
			
 
				-	 [`(lambda ,xs ,body ,lam-env)
			
 
				-	  (define new-env (append (map cons xs arg-vals) lam-env))
			
 
				-	  ((interp-exp new-env) body)]
			
 
				-	 [else (error "interp-exp, expected function, not" fun-val)])]
			
 
				-      [else (error 'interp-exp "unrecognized expression")]
			
 
				-      )))
			
 
				+UPDATE ME
			
 
				 \end{lstlisting}
			
 
				 \caption{Interpreter for $R_5$.}
			
 
				 \label{fig:interp-R5}
			
@@ -8215,13 +8215,13 @@ require the body's type to match the declared return type.
 
				 (define (type-check-R5 env)
			
 
				   (lambda (e)
			
 
				     (match e
			
 
				-      [(Lambda (and bnd `([,xs : ,Ts] ...)) rT body)
			
 
				+      [(Lambda (and params `([,xs : ,Ts] ...)) rT body)
			
 
				        (define-values (new-body bodyT) 
			
 
				           ((type-check-exp (append (map cons xs Ts) env)) body))
			
 
				        (define ty `(,@Ts -> ,rT))
			
 
				        (cond
			
 
				          [(equal? rT bodyT)
			
 
				-           (values (HasType (Lambda bnd rT new-body) ty) ty)]
			
 
				+           (values (HasType (Lambda params rT new-body) ty) ty)]
			
 
				          [else
			
 
				            (error "mismatch in return type" bodyT rT)])]
			
 
				       ...
			
@@ -8875,14 +8875,16 @@ an explicit \code{If} expression that uses two new forms,
 
				 \code{tag-of-any}.  The \code{tag-of-any} operation retrieves the type
			
 
				 tag from a tagged value of type \code{Any}.  The \code{ValueOf} form
			
 
				 retrieves the underlying value from a tagged value.  The
			
 
				-\code{ValueOf} form includes the type for the underlying value, which
			
 
				-is needed by the type checker.  Finally, the \code{Exit} form ends the
			
 
				-execution of the program by invoking the operating system's
			
 
				-\code{exit} function. So the translation for \code{Project} is as
			
 
				-follows.
			
 
				+\code{ValueOf} form includes the type for the underlying value, which
			
 
				+is used by the type checker.  Finally, the \code{Exit} form ends the
			
 
				+execution of the program.
			
 
				+%
			
 
				+If the target type of the projection is \code{Boolean} or
			
 
				+\code{Integer}, then \code{Project} can be translated as follows.
			
 
				 %(We have omitted the \code{has-type} AST nodes to make this
			
 
				 %output more readable.)
			
 
				-
			
 
				+\begin{center}
			
 
				+\begin{minipage}{1.0\textwidth}
			
 
				 \begin{lstlisting}
			
 
				 (Project |$e$| |$\FType$|)
			
 
				 |$\Rightarrow$|
			
@@ -8892,20 +8894,36 @@ follows.
 
				       (ValueOf |$\itm{tmp}$| |$\FType$|)
			
 
				       (Exit)))
			
 
				 \end{lstlisting}
			
 
				+\end{minipage}
			
 
				+\end{center}
			
 
				+If the target type of the projection is a vector or function type,
			
 
				+then there is a bit more work to do. For vectors, check that the
			
 
				+length of the vector (use the \code{vector-length} primitive) matches
			
 
				+the length of the vector type. For functions, check that the arity
			
 
				+(\code{procedure-arity}) matches the number of parameters in the
			
 
				+function type.
			
 
				 
			
 
				 Regarding \code{Inject}, we recommend compiling it to a slightly
			
 
				 lower-level primitive operation named \code{make-any}. This operation
			
 
				-takes the tag instead of the type of the injected value.
			
 
				-
			
 
				+takes a tag instead of a type. \\
			
 
				+\begin{center}
			
 
				+\begin{minipage}{1.0\textwidth}
			
 
				 \begin{lstlisting}
			
 
				 (Inject |$e$| |$\FType$|)
			
 
				 |$\Rightarrow$|
			
 
				 (Prim 'make-any (list |$e'$| (Int |$\itm{tagof}(\FType)$|)))
			
 
				 \end{lstlisting}
			
 
				+\end{minipage}
			
 
				+\end{center}
			
 
				 
			
 
				 We recommend translating the type predicates (\code{boolean?}, etc.)
			
 
				 into uses of \code{tag-of-any} and \code{eq?}.
			
 
				 
			
 
				+\section{Closure Conversion for $R_6$}
			
 
				+\label{sec:closure-conversion-R6}
			
 
				+
			
 
				+
			
 
				+
			
 
				 \section{Instruction Selection for $R_6$}
			
 
				 \label{sec:select-r6}