4 years ago · 5b436b4c45
--- a/book.tex
+++ b/book.tex
@@ -170,8 +170,8 @@ University.
 \chapter*{Preface}
 The tradition of compiler writing at Indiana University goes back to
-research and courses about programming languages by Daniel Friedman in
-the 1970's and 1980's. Dan conducted research on lazy
+research and courses on programming languages by Professor Daniel
+Friedman in the 1970's and 1980's. Friedman conducted research on lazy
 evaluation~\citep{Friedman:1976aa} in the context of
 Lisp~\citep{McCarthy:1960dz} and then studied
 continuations~\citep{Felleisen:kx} and
@@ -180,67 +180,67 @@ Scheme~\citep{Sussman:1975ab}, a dialect of Lisp.  One of the students
 of those courses, Kent Dybvig, went on to build Chez
 Scheme~\citep{Dybvig:2006aa}, a production-quality and efficient
 compiler for Scheme. After completing his Ph.D. at the University of
-North Carolina, Kent returned to teach at Indiana University.
-Throughout the 1990's and 2000's, Kent continued development of Chez
-Scheme and taught the compiler course.
+North Carolina, he returned to teach at Indiana University.
+Throughout the 1990's and 2000's, Professor Dybvig continued
+development of Chez Scheme and taught the compiler course.
 The compiler course evolved to incorporate novel pedagogical ideas
 while also including elements of effective real-world compilers.  One
-of Dan's ideas was to split the compiler into many small ``passes'' so
-that the code for each pass would be easy to understood in isolation.
-(In contrast, most compilers of the time were organized into only a
-few monolithic passes for reasons of compile-time efficiency.)  Kent,
-with later help from his students Dipanwita Sarkar and Andrew Keep,
-developed infrastructure to support this approach and evolved the
-course, first to use micro-sized passes and then into even smaller
-nano passes~\citep{Sarkar:2004fk,Keep:2012aa}. Jeremy Siek was a
-student in this compiler course in the early 2000's, as part of his
-Ph.D. studies at Indiana University. Needless to say, Jeremy enjoyed
-the course immensely!
-
-During that time, another student named Abdulaziz Ghuloum observed
-that the front-to-back organization of the course made it difficult
-for students to understand the rationale for the compiler
-design. Abdulaziz proposed an incremental approach in which the
-students build the compiler in stages; they start by implementing a
-complete compiler for a very small subset of the input language and in
-each subsequent stage they add a language feature and add or modify
-passes to handle the new feature~\citep{Ghuloum:2006bh}.  In this way,
-the students see how the language features motivate aspects of the
+of Friedman's ideas was to split the compiler into many small
+``passes'' so that the code for each pass would be easy to understand
+in isolation.  (In contrast, most compilers of the time were organized
+into only a few monolithic passes for reasons of compile-time
+efficiency.)  Dybvig, with later help from his students Dipanwita
+Sarkar and Andrew Keep, developed infrastructure to support this
+approach and evolved the course, first to use smaller micro-passes and
+then to even smaller
+nano-passes~\citep{Sarkar:2004fk,Keep:2012aa}. I was a student in this
+compiler course in the early 2000's as part of my Ph.D. studies at
+Indiana University. Needless to say, I enjoyed the course immensely!
+
+During that time, another graduate student named Abdulaziz Ghuloum
+observed that the front-to-back organization of the course made it
+difficult for students to understand the rationale for the compiler
+design. Ghuloum proposed an incremental approach in which the students
+build the compiler in stages; they start by implementing a complete
+compiler for a very small subset of the input language and in each
+subsequent stage they add a language feature and add or modify passes
+to handle the new feature~\citep{Ghuloum:2006bh}.  In this way, the
+students see how the language features motivate aspects of the
 compiler design.
-After graduating from Indiana University in 2005, Jeremy went on to
-teach at the University of Colorado. He adapted the nano pass and
-incremental approaches to compiling a subset of the Python
+After graduating from Indiana University in 2005, I went on to teach
+at the University of Colorado. I adapted the nano-pass and incremental
+approaches to compiling a subset of the Python
 language~\citep{Siek:2012ab}.  Python and Scheme are quite different
 on the surface but there is a large overlap in the compiler techniques
-required for the two languages. Thus, Jeremy was able to teach much of
-the same content from the Indiana compiler course. He very much
-enjoyed teaching the course organized in this way, and even better,
-many of the students learned a lot and got excited about compilers.
-
-Jeremy returned to teach at Indiana University in 2013.  In his
-absence the compiler course had switched from the front-to-back
-organization to a back-to-front organization. Seeing how well the
-incremental approach worked at Colorado, he started porting and
-adapting the structure of the Colorado course back into the land of
-Scheme. In the meantime Indiana had moved on from Scheme to Racket, so
-the course is now about compiling a subset of Racket (and Typed
-Racket) to the x86 assembly language. The compiler is implemented in
-Racket 7.1~\citep{plt-tr}.
+required for the two languages. Thus, I was able to teach much of the
+same content from the Indiana compiler course. I very much enjoyed
+teaching the course organized in this way, and even better, many of
+the students learned a lot and got excited about compilers.
+
+I returned to teach at Indiana University in 2013.  In my absence the
+compiler course had switched from the front-to-back organization to a
+back-to-front organization. Seeing how well the incremental approach
+worked at Colorado, I started porting and adapting the structure of
+the Colorado course back into the land of Scheme. In the meantime
+Indiana University had moved on from Scheme to Racket, so the course
+is now about compiling a subset of Racket (and Typed Racket) to the
+x86 assembly language. The compiler is implemented in
+Racket~\citep{plt-tr}.
 This is the textbook for the incremental version of the compiler
 course at Indiana University (Spring 2016 - present) and it is the
-first open textbook for an Indiana compiler course.  With this book we
+first open textbook for an Indiana compiler course.  With this book I
 hope to make the Indiana compiler course available to people that have
-not had the chance to study in Bloomington in person.  Many of the
-compiler design decisions in this book are drawn from the assignment
-descriptions of \cite{Dybvig:2010aa}. We have captured what we think
-are the most important topics from \cite{Dybvig:2010aa} but we have
-omitted topics that we think are less interesting conceptually and we
-have made simplifications to reduce complexity.  In this way, this
+not had the chance to study compilers at Indiana University.  Many of
+the compiler design decisions in this book are drawn from the
+assignment descriptions of \cite{Dybvig:2010aa}. I have captured what
+I think are the most important topics from \cite{Dybvig:2010aa} but
+have omitted topics that are less interesting conceptually. I have
+also made simplifications to reduce complexity.  In this way, this
 book leans more towards pedagogy than towards the efficiency of the
-generated code. Also, the book differs in places where we saw the
+generated code. Also, the book differs in places where I saw the
 opportunity to make the topics more fun, such as in relating register
 allocation to Sudoku (Chapter~\ref{ch:register-allocation-r1}).
@@ -255,10 +255,22 @@ proficient with Racket (or Scheme) prior to reading this book. There
 are many excellent resources for learning Scheme and
 Racket~\citep{Dybvig:1987aa,Abelson:1996uq,Friedman:1996aa,Felleisen:2001aa,Felleisen:2013aa,Flatt:2014aa}. It
 is helpful but not necessary for the student to have prior exposure to
-the x86 (or x86-64) assembly language~\citep{Intel:2015aa}, as one might
-obtain from a computer systems
-course~\citep{Bryant:2005aa,Bryant:2010aa}.  This book introduces the
+the x86 (or x86-64) assembly language~\citep{Intel:2015aa}, as one
+might obtain from a computer systems
+course~\citep{Bryant:2005aa,Bryant:2010aa}. This book introduces the
 parts of x86-64 assembly language that are needed.
+%
+We follow the System V calling conventions~\citep{Matz:2013aa}, which
+means that the assembly code that we generate will work properly with
+our runtime system (written in C) when it is compiled using the GNU C
+compiler (\code{gcc}) on Linux or MacOS. (Minor adjustments are needed
+for MacOS, which we note as they arise.)
+%
+The Microsoft Windows operating system uses a different calling
+convention~\citep{Microsoft:2018aa}, which is followed by the GNU C
+compiler when running on Windows. So the assembly code that we
+generate will \emph{not} work on Windows.
+
 %\section*{Structure of book}
 % You might want to add short description about each chapter in this book.
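To see why the choice of calling convention decides whether the generated assembly works, the C sketch below records the order in which each convention passes the first integer arguments in registers. Under System V the order is %rdi, %rsi, %rdx, %rcx, %r8, %r9. Under the Microsoft x64 convention it is %rcx, %rdx, %r8, %r9. The names enum convention, num_arg_registers, arg_register, and is_arg_register are hypothetical helpers introduced only for illustration. They are not part of the book's compiler or runtime.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The two calling conventions discussed above (hypothetical enum). */
enum convention { SYSV_AMD64, WIN64 };

/* System V AMD64 ABI (Linux, macOS): the first six integer or pointer
   arguments are passed in these registers, in this order. */
static const char *const sysv_arg_regs[] = {"%rdi", "%rsi", "%rdx",
                                            "%rcx", "%r8",  "%r9"};

/* Microsoft x64 convention (Windows): only four register arguments,
   in a different order. */
static const char *const win64_arg_regs[] = {"%rcx", "%rdx", "%r8", "%r9"};

/* How many integer arguments travel in registers under conv. */
int num_arg_registers(enum convention conv) {
  return conv == SYSV_AMD64 ? 6 : 4;
}

/* Register holding integer argument i (0-based) under conv, or NULL
   when that argument is passed on the stack instead. */
const char *arg_register(enum convention conv, int i) {
  assert(i >= 0);
  if (i >= num_arg_registers(conv)) return NULL;
  return conv == SYSV_AMD64 ? sysv_arg_regs[i] : win64_arg_regs[i];
}

/* Nonzero if reg carries one of the register arguments under conv. */
int is_arg_register(enum convention conv, const char *reg) {
  for (int i = 0; i < num_arg_registers(conv); i++)
    if (strcmp(arg_register(conv, i), reg) == 0) return 1;
  return 0;
}
```

Consider a two-argument call such as initialize(16384, 16384). Under this table, its arguments go in %rdi and %rsi on Linux and macOS but in %rcx and %rdx on Windows. Assembly that loads %rdi and %rsi before callq therefore hands the callee the wrong values under the Windows convention.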
														
@@ -1353,7 +1365,7 @@ x86_0 &::= & \key{.globl main}\\
 \]
 \end{minipage}
 }
-\caption{The concrete syntax of the x86$_0$ assembly language (AT\&T syntax).}
+\caption{The syntax of the x86$_0$ assembly language (AT\&T syntax).}
 \label{fig:x86-0-concrete}
 \end{figure}
@@ -5937,8 +5949,10 @@ An implementation of the copying collector is provided in the
 interface to the garbage collector that is used by the compiler. The
 \code{initialize} function creates the FromSpace, ToSpace, and root
 stack and should be called in the prelude of the \code{main}
-function. The \code{initialize} function puts the address of the
-beginning of the FromSpace into the global variable
+function. The arguments of \code{initialize} are the root stack size
+and the heap size. Both must be multiples of $64$, and $16384$ is a
+good choice for both.  The \code{initialize} function puts the address
+of the beginning of the FromSpace into the global variable
 \code{free\_ptr}. The global variable \code{fromspace\_end} points to
 the address that is 1-past the last element of the FromSpace. (We use
 half-open intervals to represent chunks of
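The C sketch below shows what an initialize with these two parameters might do. It is a sketch under stated assumptions and not the provided runtime. Only free_ptr, fromspace_end, and rootstack_begin are names that appear in the text. The function name initialize_sketch, the globals fromspace_begin, tospace_begin, tospace_end, and rootstack_end, and the treatment of both sizes as byte counts are all hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Named in the text: free_ptr, fromspace_end, rootstack_begin.
   Assumed for this sketch: the remaining begin/end pointers. */
int64_t *free_ptr;
int64_t *fromspace_begin, *fromspace_end;
int64_t *tospace_begin, *tospace_end;
int64_t *rootstack_begin, *rootstack_end;

/* Hypothetical stand-in for the runtime's initialize; both sizes are
   treated as byte counts. */
void initialize_sketch(uint64_t rootstack_size, uint64_t heap_size) {
  /* The text requires both sizes to be multiples of 64. */
  assert(rootstack_size % 64 == 0 && heap_size % 64 == 0);

  fromspace_begin = malloc(heap_size);
  tospace_begin = malloc(heap_size);
  rootstack_begin = malloc(rootstack_size);
  if (!fromspace_begin || !tospace_begin || !rootstack_begin) {
    fprintf(stderr, "initialize: could not allocate memory\n");
    exit(EXIT_FAILURE);
  }

  /* Half-open intervals: each *_end points one word past its region. */
  fromspace_end = fromspace_begin + heap_size / sizeof(int64_t);
  tospace_end = tospace_begin + heap_size / sizeof(int64_t);
  rootstack_end = rootstack_begin + rootstack_size / sizeof(int64_t);

  /* Allocation begins at the start of the FromSpace. */
  free_ptr = fromspace_begin;
}
```

The prelude of main shown later places 16384 in both %rdi and %rsi, which under the System V convention corresponds to a C call initialize(16384, 16384). With that size, each region in this sketch holds 2048 eight-byte words.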
														
@@ -6625,7 +6639,7 @@ main:
 	pushq	%r14
 	subq	$0, %rsp
 	movq $16384, %rdi
-	movq $16, %rsi
+	movq $16384, %rsi
 	callq initialize
 	movq rootstack_begin(%rip), %r15
 	movq $0, (%r15)
@@ -6917,7 +6931,7 @@ inside each other.
     (\key{vector-ref}\;\Exp\;\Int)} \\
   &\mid& \gray{(\key{vector-set!}\;\Exp\;\Int\;\Exp)\mid (\key{void})
       \mid \LP\key{has-type}~\Exp~\Type\RP } \\
-      &\mid& \LP\Exp \; \Exp \ldots\RP \\
+  &\mid& \LP\Exp \; \Exp \ldots\RP \\
   \Def &::=& \CDEF{\Var}{\LS\Var \key{:} \Type\RS \ldots}{\Type}{\Exp} \\
   R_4 &::=& \Def \ldots \; \Exp
 \end{array}
@@ -8128,7 +8142,8 @@ syntax for function application.
     &\mid& \gray{ (\key{vector}\;\Exp\ldots) \mid
           (\key{vector-ref}\;\Exp\;\Int)} \\
     &\mid& \gray{(\key{vector-set!}\;\Exp\;\Int\;\Exp)\mid (\key{void})
-     \mid (\Exp \; \Exp\ldots) } \\
+    \mid (\Exp \; \Exp\ldots) } \\
+    &\mid& \LP \key{procedure-arity}~\Exp\RP \\
     &\mid& \CLAMBDA{\LP\LS\Var \key{:} \Type\RS\ldots\RP}{\Type}{\Exp} \\
   \Def &::=& \gray{ \CDEF{\Var}{\LS\Var \key{:} \Type\RS\ldots}{\Type}{\Exp} } \\
   R_5 &::=& \gray{\Def\ldots \; \Exp}
@@ -8148,14 +8163,15 @@ syntax for function application.
     \small
 \[
 \begin{array}{lcl}
+  \itm{op} &::=& \ldots \mid \code{procedure-arity} \\
   \Exp &::=& \gray{ \INT{\Int} \VAR{\Var} \mid \LET{\Var}{\Exp}{\Exp} } \\
        &\mid& \gray{ \PRIM{\itm{op}}{\Exp\ldots} }\\
      &\mid& \gray{ \BOOL{\itm{bool}}
       \mid \IF{\Exp}{\Exp}{\Exp} } \\
      &\mid& \gray{ \VOID{} \mid \LP\key{HasType}~\Exp~\Type \RP 
      \mid \APPLY{\Exp}{\Exp\ldots} }\\
-     &\mid& \LAMBDA{\LP[\Var\code{:}\Type]\ldots\RP}{\Type}{\Exp}\\
- \Def &::=& \gray{ \FUNDEF{\Var}{\LP[\Var \code{:} \Type]\ldots\RP}{\Type}{\code{'()}}{\Exp} }\\
+     &\mid& \LAMBDA{\LP\LS\Var\code{:}\Type\RS\ldots\RP}{\Type}{\Exp}\\
+ \Def &::=& \gray{ \FUNDEF{\Var}{\LP\LS\Var \code{:} \Type\RS\ldots\RP}{\Type}{\code{'()}}{\Exp} }\\
   R_5 &::=& \gray{ \PROGRAMDEFSEXP{\code{'()}}{\LP\Def\ldots\RP}{\Exp} }
 \end{array}
 \]
@@ -8178,23 +8194,7 @@ values.
 \begin{figure}[tbp]
 \begin{lstlisting}
-(define (interp-exp env)
-  (lambda (e)
-    (define recur (interp-exp env))
-    (match e
-      ...
-      [(Lambda (list `[,xs : ,Ts] ...) rT body)
-       `(lambda ,xs ,body ,env)]
-      [(Apply fun args)
-       (define fun-val ((interp-exp env) fun))
-       (define arg-vals (map (interp-exp env) args))
-       (match fun-val
-	 [`(lambda ,xs ,body ,lam-env)
-	  (define new-env (append (map cons xs arg-vals) lam-env))
-	  ((interp-exp new-env) body)]
-	 [else (error "interp-exp, expected function, not" fun-val)])]
-      [else (error 'interp-exp "unrecognized expression")]
-      )))
+UPDATE ME
 \end{lstlisting}
 \caption{Interpreter for $R_5$.}
 \label{fig:interp-R5}
@@ -8215,13 +8215,13 @@ require the body's type to match the declared return type.
 (define (type-check-R5 env)
   (lambda (e)
     (match e
-      [(Lambda (and bnd `([,xs : ,Ts] ...)) rT body)
+      [(Lambda (and params `([,xs : ,Ts] ...)) rT body)
        (define-values (new-body bodyT) 
           ((type-check-exp (append (map cons xs Ts) env)) body))
        (define ty `(,@Ts -> ,rT))
        (cond
          [(equal? rT bodyT)
-           (values (HasType (Lambda bnd rT new-body) ty) ty)]
+           (values (HasType (Lambda params rT new-body) ty) ty)]
          [else
           (error "mismatch in return type" bodyT rT)])]
       ...
@@ -8875,14 +8875,16 @@ an explicit \code{If} expression that uses two new forms,
 \code{tag-of-any}.  The \code{tag-of-any} operation retrieves the type
 tag from a tagged value of type \code{Any}.  The \code{ValueOf} form
 retrieves the underlying value from a tagged value.  The
-\code{ValueOf} form includes the type for the underlying value, which
-is needed by the type checker.  Finally, the \code{Exit} form ends the
-execution of the program by invoking the operating system's
-\code{exit} function. So the translation for \code{Project} is as
-follows.
+\code{ValueOf} form includes the type for the underlying value, which
+is used by the type checker.  Finally, the \code{Exit} form ends the
+execution of the program.
+%
+If the target type of the projection is \code{Boolean} or
+\code{Integer}, then \code{Project} can be translated as follows.
 %(We have omitted the \code{has-type} AST nodes to make this
 %output more readable.)
-
+\begin{center}
+\begin{minipage}{1.0\textwidth}
 \begin{lstlisting}
 (Project |$e$| |$\FType$|)
 |$\Rightarrow$|
@@ -8892,20 +8894,36 @@ follows.
       (ValueOf |$\itm{tmp}$| |$\FType$|)
       (Exit)))
 \end{lstlisting}
+\end{minipage}
+\end{center}
+If the target type of the projection is a vector or function type,
+then there is a bit more work to do. For vectors, check that the
+length of the vector (use the \code{vector-length} primitive) matches
+the length of the vector type. For functions, check that the arity of
+the function (use the \code{procedure-arity} primitive) matches the
+number of parameters in the function type.
 Regarding \code{Inject}, we recommend compiling it to a slightly
 lower-level primitive operation named \code{make-any}. This operation
-takes the tag instead of the type of the injected value.
-
+takes a tag instead of a type.
+\begin{center}
+\begin{minipage}{1.0\textwidth}
 \begin{lstlisting}
 (Inject |$e$| |$\FType$|)
 |$\Rightarrow$|
 (Prim 'make-any (list |$e'$| (Int |$\itm{tagof}(\FType)$|)))
 \end{lstlisting}
+\end{minipage}
+\end{center}
 We recommend translating the type predicates (\code{boolean?}, etc.)
 into uses of \code{tag-of-any} and \code{eq?}.
+\section{Closure Conversion for $R_6$}
+\label{sec:closure-conversion-R6}
+
+
+
 \section{Instruction Selection for $R_6$}
 \label{sec:select-r6}
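The translations of Project, Inject, and the type predicates can also be pictured at the machine level. The C sketch below assumes a representation that this diff does not spell out:

- an Any value is a 64-bit word whose low three bits hold the type tag;
- integers and Booleans are shifted left three bits to make room for the tag;
- vector pointers are 8-byte aligned, so the tag can be OR'ed into them.

The tag numbers, the function names (make_any, make_any_pointer, tag_of_any, value_of_any, is_integer, project_integer, project_vector), and the length-in-first-word vector layout are hypothetical. The exit status is also an arbitrary choice.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical 3-bit type tags kept in the low bits of an Any value;
   the specific numbers are an assumption of this sketch. */
enum { TAG_INTEGER = 1, TAG_VECTOR = 2, TAG_PROCEDURE = 3,
       TAG_BOOLEAN = 4, TAG_VOID = 5 };
#define TAG_MASK ((int64_t)7)

/* make-any for Integer and Boolean payloads: shift left three bits
   (through uint64_t, so negative payloads are well defined), add the tag. */
int64_t make_any(int64_t value, int64_t tag) {
  return (int64_t)(((uint64_t)value << 3) | (uint64_t)tag);
}

/* make-any for vectors: pointers are 8-byte aligned, so the low three
   bits are free and the tag can simply be OR'ed in. */
int64_t make_any_pointer(int64_t *ptr, int64_t tag) {
  int64_t bits = (int64_t)(intptr_t)ptr;
  assert((bits & TAG_MASK) == 0);
  return bits | tag;
}

/* tag-of-any: the low three bits. */
int64_t tag_of_any(int64_t any) { return any & TAG_MASK; }

/* ValueOf for Integer and Boolean: shift the payload back down. Right
   shift of a negative int64_t is implementation-defined in C; gcc and
   clang sign-extend. */
int64_t value_of_any(int64_t any) { return any >> 3; }

/* The predicate integer? becomes an eq? test on the tag. */
int is_integer(int64_t any) { return tag_of_any(any) == TAG_INTEGER; }

/* Project to Integer, mirroring
   (If (eq? (tag-of-any tmp) tag) (ValueOf tmp Integer) (Exit)). */
int64_t project_integer(int64_t any) {
  if (is_integer(any)) return value_of_any(any);
  exit(255);
}

/* Project to a vector type with n elements. Here a vector is an array
   whose first word holds its length, a simplification of the real
   tuple header. */
int64_t *project_vector(int64_t any, int64_t n) {
  if (tag_of_any(any) == TAG_VECTOR) {
    int64_t *vec = (int64_t *)(intptr_t)(any & ~TAG_MASK);
    if (vec[0] == n) return vec; /* the vector-length check */
  }
  exit(255);
}
```

The Boolean projection is the same as project_integer with TAG_BOOLEAN. A function projection would compare the procedure's arity against the parameter count of the function type, in the same way that project_vector compares lengths.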