Jeremy Siek, 4 years ago
parent
commit
5b436b4c45
1 file changed, with 107 additions and 89 deletions
+ 107 - 89
book.tex

@@ -170,8 +170,8 @@ University.
 \chapter*{Preface}
 \chapter*{Preface}
 
 
 The tradition of compiler writing at Indiana University goes back to
 The tradition of compiler writing at Indiana University goes back to
-research and courses about programming languages by Daniel Friedman in
-the 1970's and 1980's. Dan conducted research on lazy
+research and courses on programming languages by Professor Daniel
+Friedman in the 1970's and 1980's. Friedman conducted research on lazy
 evaluation~\citep{Friedman:1976aa} in the context of
 evaluation~\citep{Friedman:1976aa} in the context of
 Lisp~\citep{McCarthy:1960dz} and then studied
 Lisp~\citep{McCarthy:1960dz} and then studied
 continuations~\citep{Felleisen:kx} and
 continuations~\citep{Felleisen:kx} and
@@ -180,67 +180,67 @@ Scheme~\citep{Sussman:1975ab}, a dialect of Lisp.  One of the students
 of those courses, Kent Dybvig, went on to build Chez
 of those courses, Kent Dybvig, went on to build Chez
 Scheme~\citep{Dybvig:2006aa}, a production-quality and efficient
 Scheme~\citep{Dybvig:2006aa}, a production-quality and efficient
 compiler for Scheme. After completing his Ph.D. at the University of
 compiler for Scheme. After completing his Ph.D. at the University of
-North Carolina, Kent returned to teach at Indiana University.
-Throughout the 1990's and 2000's, Kent continued development of Chez
-Scheme and taught the compiler course.
+North Carolina, he returned to teach at Indiana University.
+Throughout the 1990's and 2000's, Professor Dybvig continued
+development of Chez Scheme and taught the compiler course.
 
 
 The compiler course evolved to incorporate novel pedagogical ideas
 The compiler course evolved to incorporate novel pedagogical ideas
 while also including elements of effective real-world compilers.  One
 while also including elements of effective real-world compilers.  One
-of Dan's ideas was to split the compiler into many small ``passes'' so
-that the code for each pass would be easy to understood in isolation.
-(In contrast, most compilers of the time were organized into only a
-few monolithic passes for reasons of compile-time efficiency.)  Kent,
-with later help from his students Dipanwita Sarkar and Andrew Keep,
-developed infrastructure to support this approach and evolved the
-course, first to use micro-sized passes and then into even smaller
-nano passes~\citep{Sarkar:2004fk,Keep:2012aa}. Jeremy Siek was a
-student in this compiler course in the early 2000's, as part of his
-Ph.D. studies at Indiana University. Needless to say, Jeremy enjoyed
-the course immensely!
-
-During that time, another student named Abdulaziz Ghuloum observed
-that the front-to-back organization of the course made it difficult
-for students to understand the rationale for the compiler
-design. Abdulaziz proposed an incremental approach in which the
-students build the compiler in stages; they start by implementing a
-complete compiler for a very small subset of the input language and in
-each subsequent stage they add a language feature and add or modify
-passes to handle the new feature~\citep{Ghuloum:2006bh}.  In this way,
-the students see how the language features motivate aspects of the
+of Friedman's ideas was to split the compiler into many small
+``passes'' so that the code for each pass would be easy to understand
+in isolation.  (In contrast, most compilers of the time were organized
+into only a few monolithic passes for reasons of compile-time
+efficiency.)  Dybvig, with later help from his students Dipanwita
+Sarkar and Andrew Keep, developed infrastructure to support this
+approach and evolved the course, first to use smaller micro-passes and
+then into even smaller
+nano-passes~\citep{Sarkar:2004fk,Keep:2012aa}. I was a student in this
+compiler course in the early 2000's as part of my Ph.D. studies at
+Indiana University. Needless to say, I enjoyed the course immensely!
+
+During that time, another graduate student named Abdulaziz Ghuloum
+observed that the front-to-back organization of the course made it
+difficult for students to understand the rationale for the compiler
+design. Ghuloum proposed an incremental approach in which the students
+build the compiler in stages; they start by implementing a complete
+compiler for a very small subset of the input language and in each
+subsequent stage they add a language feature and add or modify passes
+to handle the new feature~\citep{Ghuloum:2006bh}.  In this way, the
+students see how the language features motivate aspects of the
 compiler design.
 compiler design.
 
 
-After graduating from Indiana University in 2005, Jeremy went on to
-teach at the University of Colorado. He adapted the nano pass and
-incremental approaches to compiling a subset of the Python
+After graduating from Indiana University in 2005, I went on to teach
+at the University of Colorado. I adapted the nano-pass and incremental
+approaches to compiling a subset of the Python
 language~\citep{Siek:2012ab}.  Python and Scheme are quite different
 language~\citep{Siek:2012ab}.  Python and Scheme are quite different
 on the surface but there is a large overlap in the compiler techniques
 on the surface but there is a large overlap in the compiler techniques
-required for the two languages. Thus, Jeremy was able to teach much of
-the same content from the Indiana compiler course. He very much
-enjoyed teaching the course organized in this way, and even better,
-many of the students learned a lot and got excited about compilers.
-
-Jeremy returned to teach at Indiana University in 2013.  In his
-absence the compiler course had switched from the front-to-back
-organization to a back-to-front organization. Seeing how well the
-incremental approach worked at Colorado, he started porting and
-adapting the structure of the Colorado course back into the land of
-Scheme. In the meantime Indiana had moved on from Scheme to Racket, so
-the course is now about compiling a subset of Racket (and Typed
-Racket) to the x86 assembly language. The compiler is implemented in
-Racket 7.1~\citep{plt-tr}.
+required for the two languages. Thus, I was able to teach much of the
+same content from the Indiana compiler course. I very much enjoyed
+teaching the course organized in this way, and even better, many of
+the students learned a lot and got excited about compilers.
+
+I returned to teach at Indiana University in 2013.  In my absence the
+compiler course had switched from the front-to-back organization to a
+back-to-front organization. Seeing how well the incremental approach
+worked at Colorado, I started porting and adapting the structure of
+the Colorado course back into the land of Scheme. In the meantime
+Indiana University had moved on from Scheme to Racket, so the course
+is now about compiling a subset of Racket (and Typed Racket) to the
+x86 assembly language. The compiler is implemented in
+Racket~\citep{plt-tr}.
 
 
 This is the textbook for the incremental version of the compiler
 This is the textbook for the incremental version of the compiler
 course at Indiana University (Spring 2016 - present) and it is the
 course at Indiana University (Spring 2016 - present) and it is the
-first open textbook for an Indiana compiler course.  With this book we
+first open textbook for an Indiana compiler course.  With this book I
 hope to make the Indiana compiler course available to people that have
 hope to make the Indiana compiler course available to people that have
-not had the chance to study in Bloomington in person.  Many of the
-compiler design decisions in this book are drawn from the assignment
-descriptions of \cite{Dybvig:2010aa}. We have captured what we think
-are the most important topics from \cite{Dybvig:2010aa} but we have
-omitted topics that we think are less interesting conceptually and we
-have made simplifications to reduce complexity.  In this way, this
+not had the chance to study compilers at Indiana University.  Many of
+the compiler design decisions in this book are drawn from the
+assignment descriptions of \cite{Dybvig:2010aa}. I have captured what
+I think are the most important topics from \cite{Dybvig:2010aa} but
+have omitted topics that are less interesting conceptually. I have
+also made simplifications to reduce complexity.  In this way, this
 book leans more towards pedagogy than towards the efficiency of the
 book leans more towards pedagogy than towards the efficiency of the
-generated code. Also, the book differs in places where we saw the
+generated code. Also, the book differs in places where I saw the
 opportunity to make the topics more fun, such as in relating register
 opportunity to make the topics more fun, such as in relating register
 allocation to Sudoku (Chapter~\ref{ch:register-allocation-r1}).
 allocation to Sudoku (Chapter~\ref{ch:register-allocation-r1}).
 
 
@@ -255,10 +255,22 @@ proficient with Racket (or Scheme) prior to reading this book. There
 are many excellent resources for learning Scheme and
 are many excellent resources for learning Scheme and
 Racket~\citep{Dybvig:1987aa,Abelson:1996uq,Friedman:1996aa,Felleisen:2001aa,Felleisen:2013aa,Flatt:2014aa}. It
 Racket~\citep{Dybvig:1987aa,Abelson:1996uq,Friedman:1996aa,Felleisen:2001aa,Felleisen:2013aa,Flatt:2014aa}. It
 is helpful but not necessary for the student to have prior exposure to
 is helpful but not necessary for the student to have prior exposure to
-the x86 (or x86-64) assembly language~\citep{Intel:2015aa}, as one might
-obtain from a computer systems
-course~\citep{Bryant:2005aa,Bryant:2010aa}.  This book introduces the
+the x86 (or x86-64) assembly language~\citep{Intel:2015aa}, as one
+might obtain from a computer systems
+course~\citep{Bryant:2005aa,Bryant:2010aa}. This book introduces the
 parts of x86-64 assembly language that are needed.
 parts of x86-64 assembly language that are needed.
+%
+We follow the System V calling conventions~\citep{Matz:2013aa}, which
+means that the assembly code that we generate will work properly with
+our runtime system (written in C) when it is compiled using the GNU C
+compiler (\code{gcc}) on Linux or MacOS. (Minor adjustments are needed
+for MacOS, which we note as they arise.)
+%
+The Microsoft Windows operating system uses a different calling
+convention~\citep{Microsoft:2018aa}, which is followed by the GNU C
+compiler when running on Windows. So the assembly code that we
+generate will \emph{not} work on Windows.
+
 
 
 %\section*{Structure of book}
 %\section*{Structure of book}
 % You might want to add short description about each chapter in this book.
 % You might want to add short description about each chapter in this book.
@@ -1353,7 +1365,7 @@ x86_0 &::= & \key{.globl main}\\
 \]
 \]
 \end{minipage}
 \end{minipage}
 }
 }
-\caption{The concrete syntax of the x86$_0$ assembly language (AT\&T syntax).}
+\caption{The syntax of the x86$_0$ assembly language (AT\&T syntax).}
 \label{fig:x86-0-concrete}
 \label{fig:x86-0-concrete}
 \end{figure}
 \end{figure}
 
 
@@ -5937,8 +5949,10 @@ An implementation of the copying collector is provided in the
 interface to the garbage collector that is used by the compiler. The
 interface to the garbage collector that is used by the compiler. The
 \code{initialize} function creates the FromSpace, ToSpace, and root
 \code{initialize} function creates the FromSpace, ToSpace, and root
 stack and should be called in the prelude of the \code{main}
 stack and should be called in the prelude of the \code{main}
-function. The \code{initialize} function puts the address of the
-beginning of the FromSpace into the global variable
+function. The arguments of \code{initialize} are the root stack size
+and the heap size. Both need to be multiples of $64$, and $16384$ is a
+good choice for both.  The \code{initialize} function puts the address
+of the beginning of the FromSpace into the global variable
 \code{free\_ptr}. The global variable \code{fromspace\_end} points to
 \code{free\_ptr}. The global variable \code{fromspace\_end} points to
 the address that is 1-past the last element of the FromSpace. (We use
 the address that is 1-past the last element of the FromSpace. (We use
 half-open intervals to represent chunks of
 half-open intervals to represent chunks of
@@ -6625,7 +6639,7 @@ main:
 	pushq	%r14
 	pushq	%r14
 	subq	$0, %rsp
 	subq	$0, %rsp
 	movq $16384, %rdi
 	movq $16384, %rdi
-	movq $16, %rsi
+	movq $16384, %rsi
 	callq initialize
 	callq initialize
 	movq rootstack_begin(%rip), %r15
 	movq rootstack_begin(%rip), %r15
 	movq $0, (%r15)
 	movq $0, (%r15)
@@ -6917,7 +6931,7 @@ inside each other.
     (\key{vector-ref}\;\Exp\;\Int)} \\
     (\key{vector-ref}\;\Exp\;\Int)} \\
   &\mid& \gray{(\key{vector-set!}\;\Exp\;\Int\;\Exp)\mid (\key{void})
   &\mid& \gray{(\key{vector-set!}\;\Exp\;\Int\;\Exp)\mid (\key{void})
       \mid \LP\key{has-type}~\Exp~\Type\RP } \\
       \mid \LP\key{has-type}~\Exp~\Type\RP } \\
-      &\mid& \LP\Exp \; \Exp \ldots\RP \\
+  &\mid& \LP\Exp \; \Exp \ldots\RP \\
   \Def &::=& \CDEF{\Var}{\LS\Var \key{:} \Type\RS \ldots}{\Type}{\Exp} \\
   \Def &::=& \CDEF{\Var}{\LS\Var \key{:} \Type\RS \ldots}{\Type}{\Exp} \\
   R_4 &::=& \Def \ldots \; \Exp
   R_4 &::=& \Def \ldots \; \Exp
 \end{array}
 \end{array}
@@ -8128,7 +8142,8 @@ syntax for function application.
     &\mid& \gray{ (\key{vector}\;\Exp\ldots) \mid
     &\mid& \gray{ (\key{vector}\;\Exp\ldots) \mid
           (\key{vector-ref}\;\Exp\;\Int)} \\
           (\key{vector-ref}\;\Exp\;\Int)} \\
     &\mid& \gray{(\key{vector-set!}\;\Exp\;\Int\;\Exp)\mid (\key{void})
     &\mid& \gray{(\key{vector-set!}\;\Exp\;\Int\;\Exp)\mid (\key{void})
-     \mid (\Exp \; \Exp\ldots) } \\
+    \mid (\Exp \; \Exp\ldots) } \\
+    &\mid& \LP \key{procedure-arity}~\Exp\RP \\
     &\mid& \CLAMBDA{\LP\LS\Var \key{:} \Type\RS\ldots\RP}{\Type}{\Exp} \\
     &\mid& \CLAMBDA{\LP\LS\Var \key{:} \Type\RS\ldots\RP}{\Type}{\Exp} \\
   \Def &::=& \gray{ \CDEF{\Var}{\LS\Var \key{:} \Type\RS\ldots}{\Type}{\Exp} } \\
   \Def &::=& \gray{ \CDEF{\Var}{\LS\Var \key{:} \Type\RS\ldots}{\Type}{\Exp} } \\
   R_5 &::=& \gray{\Def\ldots \; \Exp}
   R_5 &::=& \gray{\Def\ldots \; \Exp}
@@ -8148,14 +8163,15 @@ syntax for function application.
     \small
     \small
 \[
 \[
 \begin{array}{lcl}
 \begin{array}{lcl}
+  \itm{op} &::=& \ldots \mid \code{procedure-arity} \\
   \Exp &::=& \gray{ \INT{\Int} \mid \VAR{\Var} \mid \LET{\Var}{\Exp}{\Exp} } \\
   \Exp &::=& \gray{ \INT{\Int} \mid \VAR{\Var} \mid \LET{\Var}{\Exp}{\Exp} } \\
        &\mid& \gray{ \PRIM{\itm{op}}{\Exp\ldots} }\\
        &\mid& \gray{ \PRIM{\itm{op}}{\Exp\ldots} }\\
      &\mid& \gray{ \BOOL{\itm{bool}}
      &\mid& \gray{ \BOOL{\itm{bool}}
       \mid \IF{\Exp}{\Exp}{\Exp} } \\
       \mid \IF{\Exp}{\Exp}{\Exp} } \\
      &\mid& \gray{ \VOID{} \mid \LP\key{HasType}~\Exp~\Type \RP 
      &\mid& \gray{ \VOID{} \mid \LP\key{HasType}~\Exp~\Type \RP 
      \mid \APPLY{\Exp}{\Exp\ldots} }\\
      \mid \APPLY{\Exp}{\Exp\ldots} }\\
-     &\mid& \LAMBDA{\LP[\Var\code{:}\Type]\ldots\RP}{\Type}{\Exp}\\
- \Def &::=& \gray{ \FUNDEF{\Var}{\LP[\Var \code{:} \Type]\ldots\RP}{\Type}{\code{'()}}{\Exp} }\\
+     &\mid& \LAMBDA{\LP\LS\Var\code{:}\Type\RS\ldots\RP}{\Type}{\Exp}\\
+ \Def &::=& \gray{ \FUNDEF{\Var}{\LP\LS\Var \code{:} \Type\RS\ldots\RP}{\Type}{\code{'()}}{\Exp} }\\
   R_5 &::=& \gray{ \PROGRAMDEFSEXP{\code{'()}}{\LP\Def\ldots\RP}{\Exp} }
   R_5 &::=& \gray{ \PROGRAMDEFSEXP{\code{'()}}{\LP\Def\ldots\RP}{\Exp} }
 \end{array}
 \end{array}
 \]
 \]
@@ -8178,23 +8194,7 @@ values.
 
 
 \begin{figure}[tbp]
 \begin{figure}[tbp]
 \begin{lstlisting}
 \begin{lstlisting}
-(define (interp-exp env)
-  (lambda (e)
-    (define recur (interp-exp env))
-    (match e
-      ...
-      [(Lambda (list `[,xs : ,Ts] ...) rT body)
-       `(lambda ,xs ,body ,env)]
-      [(Apply fun args)
-       (define fun-val ((interp-exp env) fun))
-       (define arg-vals (map (interp-exp env) args))
-       (match fun-val
-	 [`(lambda ,xs ,body ,lam-env)
-	  (define new-env (append (map cons xs arg-vals) lam-env))
-	  ((interp-exp new-env) body)]
-	 [else (error "interp-exp, expected function, not" fun-val)])]
-      [else (error 'interp-exp "unrecognized expression")]
-      )))
+UPDATE ME
 \end{lstlisting}
 \end{lstlisting}
 \caption{Interpreter for $R_5$.}
 \caption{Interpreter for $R_5$.}
 \label{fig:interp-R5}
 \label{fig:interp-R5}
@@ -8215,13 +8215,13 @@ require the body's type to match the declared return type.
 (define (type-check-R5 env)
 (define (type-check-R5 env)
   (lambda (e)
   (lambda (e)
     (match e
     (match e
-      [(Lambda (and bnd `([,xs : ,Ts] ...)) rT body)
+      [(Lambda (and params `([,xs : ,Ts] ...)) rT body)
        (define-values (new-body bodyT) 
        (define-values (new-body bodyT) 
           ((type-check-exp (append (map cons xs Ts) env)) body))
           ((type-check-exp (append (map cons xs Ts) env)) body))
        (define ty `(,@Ts -> ,rT))
        (define ty `(,@Ts -> ,rT))
        (cond
        (cond
          [(equal? rT bodyT)
          [(equal? rT bodyT)
-           (values (HasType (Lambda bnd rT new-body) ty) ty)]
+           (values (HasType (Lambda params rT new-body) ty) ty)]
          [else
          [else
            (error "mismatch in return type" bodyT rT)])]
            (error "mismatch in return type" bodyT rT)])]
       ...
       ...
@@ -8875,14 +8875,16 @@ an explicit \code{If} expression that uses two new forms,
 \code{tag-of-any}.  The \code{tag-of-any} operation retrieves the type
 \code{tag-of-any}.  The \code{tag-of-any} operation retrieves the type
 tag from a tagged value of type \code{Any}.  The \code{ValueOf} form
 tag from a tagged value of type \code{Any}.  The \code{ValueOf} form
 retrieves the underlying value from a tagged value.  The
 retrieves the underlying value from a tagged value.  The
-\code{ValueOf} form includes the type for the underlying value, which
-is needed by the type checker.  Finally, the \code{Exit} form ends the
-execution of the program by invoking the operating system's
-\code{exit} function. So the translation for \code{Project} is as
-follows.
+\code{ValueOf} form includes the type for the underlying value, which
+is used by the type checker.  Finally, the \code{Exit} form ends the
+execution of the program.
+%
+If the target type of the projection is \code{Boolean} or
+\code{Integer}, then \code{Project} can be translated as follows.
 %(We have omitted the \code{has-type} AST nodes to make this
 %(We have omitted the \code{has-type} AST nodes to make this
 %output more readable.)
 %output more readable.)
-
+\begin{center}
+\begin{minipage}{1.0\textwidth}
 \begin{lstlisting}
 \begin{lstlisting}
 (Project |$e$| |$\FType$|)
 (Project |$e$| |$\FType$|)
 |$\Rightarrow$|
 |$\Rightarrow$|
@@ -8892,20 +8894,36 @@ follows.
       (ValueOf |$\itm{tmp}$| |$\FType$|)
       (ValueOf |$\itm{tmp}$| |$\FType$|)
       (Exit)))
       (Exit)))
 \end{lstlisting}
 \end{lstlisting}
+\end{minipage}
+\end{center}
+If the target type of the projection is a vector or function type,
+then there is a bit more work to do. For vectors, check that the
+length of the vector (obtained with the \code{vector-length}
+primitive) matches the number of element types in the vector type.
+For functions, check that the function's arity (obtained with
+\code{procedure-arity}) matches the number of parameters in the
+function type.
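
As a rough illustration of the vector case, the length check can be
nested inside the tag check so that \code{Exit} is reached when either
test fails. This is only a sketch, not the book's reference solution:
the struct definitions, the helper name \code{project-vector}, the
fixed temporary names \code{tmp} and \code{vec}, and the \code{vec-tag}
parameter are all assumptions made so that the example runs on its
own. A real pass would generate fresh temporaries and obtain the tag
from $\itm{tagof}$.

\begin{lstlisting}
#lang racket
;; Minimal AST structs, assumed here so the sketch is self-contained.
(struct Int (value) #:transparent)
(struct Var (name) #:transparent)
(struct Prim (op args) #:transparent)
(struct Let (var rhs body) #:transparent)
(struct If (cnd thn els) #:transparent)
(struct ValueOf (e type) #:transparent)
(struct Exit () #:transparent)

;; Hypothetical helper: translate a projection of e to a vector type
;; such as '(Vector Integer Boolean). vec-tag is whatever tag the
;; compiler uses for vectors; it is a parameter because its numeric
;; value is not fixed by this sketch.
(define (project-vector e ty vec-tag)
  (define len (length (cdr ty)))   ; number of element types
  (define tmp 'tmp)                ; a real pass would use a fresh name
  (Let tmp e
       (If (Prim 'eq? (list (Prim 'tag-of-any (list (Var tmp)))
                            (Int vec-tag)))
           ;; The tag matches, so unwrap and compare lengths.
           (Let 'vec (ValueOf (Var tmp) ty)
                (If (Prim 'eq?
                          (list (Prim 'vector-length (list (Var 'vec)))
                                (Int len)))
                    (Var 'vec)
                    (Exit)))
           (Exit))))
\end{lstlisting}

The function case would have the same shape, with
\code{procedure-arity} in place of \code{vector-length} and the number
of parameters of the function type in place of \code{len}.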
 
 
 Regarding \code{Inject}, we recommend compiling it to a slightly
 Regarding \code{Inject}, we recommend compiling it to a slightly
 lower-level primitive operation named \code{make-any}. This operation
 lower-level primitive operation named \code{make-any}. This operation
-takes the tag instead of the type of the injected value.
-
+takes a tag instead of a type.
+\begin{center}
+\begin{minipage}{1.0\textwidth}
 \begin{lstlisting}
 \begin{lstlisting}
 (Inject |$e$| |$\FType$|)
 (Inject |$e$| |$\FType$|)
 |$\Rightarrow$|
 |$\Rightarrow$|
 (Prim 'make-any (list |$e'$| (Int |$\itm{tagof}(\FType)$|)))
 (Prim 'make-any (list |$e'$| (Int |$\itm{tagof}(\FType)$|)))
 \end{lstlisting}
 \end{lstlisting}
+\end{minipage}
+\end{center}
 
 
 We recommend translating the type predicates (\code{boolean?}, etc.)
 We recommend translating the type predicates (\code{boolean?}, etc.)
 into uses of \code{tag-of-any} and \code{eq?}.
 into uses of \code{tag-of-any} and \code{eq?}.
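
One possible shape for this translation is sketched below. It is a
hedged example rather than the book's implementation: the struct
definitions, the function name \code{lower-type-predicate}, the
predicate names other than \code{boolean?}, and the numeric tags in
\code{example-tags} are assumptions chosen for illustration, and the
real tags should come from $\itm{tagof}$.

\begin{lstlisting}
#lang racket
;; Minimal AST structs, assumed here so the sketch is self-contained.
(struct Int (value) #:transparent)
(struct Var (name) #:transparent)
(struct Prim (op args) #:transparent)

;; Illustrative tag table. The numeric values are placeholders for
;; this sketch and are not taken from the text above.
(define example-tags
  (hash 'integer? 1 'boolean? 4 'vector? 2 'procedure? 3 'void? 5))

;; Rewrite a unary type predicate into an eq? test on the tag of its
;; argument. Any other expression is returned unchanged.
(define (lower-type-predicate e)
  (match e
    [(Prim op (list arg))
     #:when (hash-has-key? example-tags op)
     (Prim 'eq? (list (Prim 'tag-of-any (list arg))
                      (Int (hash-ref example-tags op))))]
    [_ e]))
\end{lstlisting}

For instance, applying \code{lower-type-predicate} to
\code{(Prim 'boolean? (list (Var 'x)))} yields an \code{eq?} node that
compares \code{(Prim 'tag-of-any (list (Var 'x)))} with the assumed
Boolean tag.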
 
 
+\section{Closure Conversion for $R_6$}
+\label{sec:closure-conversion-R6}
+
+
+
 \section{Instruction Selection for $R_6$}
 \section{Instruction Selection for $R_6$}
 \label{sec:select-r6}
 \label{sec:select-r6}