
windows x64

Jeremy Siek 4 years ago
commit 5b436b4c45
1 changed file with 107 additions and 89 deletions

+ 107 - 89
book.tex

@@ -170,8 +170,8 @@ University.
 \chapter*{Preface}
 
 The tradition of compiler writing at Indiana University goes back to
-research and courses about programming languages by Daniel Friedman in
-the 1970's and 1980's. Dan conducted research on lazy
+research and courses on programming languages by Professor Daniel
+Friedman in the 1970's and 1980's. Friedman conducted research on lazy
 evaluation~\citep{Friedman:1976aa} in the context of
 Lisp~\citep{McCarthy:1960dz} and then studied
 continuations~\citep{Felleisen:kx} and
@@ -180,67 +180,67 @@ Scheme~\citep{Sussman:1975ab}, a dialect of Lisp.  One of the students
 of those courses, Kent Dybvig, went on to build Chez
 Scheme~\citep{Dybvig:2006aa}, a production-quality and efficient
 compiler for Scheme. After completing his Ph.D. at the University of
-North Carolina, Kent returned to teach at Indiana University.
-Throughout the 1990's and 2000's, Kent continued development of Chez
-Scheme and taught the compiler course.
+North Carolina, he returned to teach at Indiana University.
+Throughout the 1990's and 2000's, Professor Dybvig continued
+development of Chez Scheme and taught the compiler course.
 
 The compiler course evolved to incorporate novel pedagogical ideas
 while also including elements of effective real-world compilers.  One
-of Dan's ideas was to split the compiler into many small ``passes'' so
-that the code for each pass would be easy to understood in isolation.
-(In contrast, most compilers of the time were organized into only a
-few monolithic passes for reasons of compile-time efficiency.)  Kent,
-with later help from his students Dipanwita Sarkar and Andrew Keep,
-developed infrastructure to support this approach and evolved the
-course, first to use micro-sized passes and then into even smaller
-nano passes~\citep{Sarkar:2004fk,Keep:2012aa}. Jeremy Siek was a
-student in this compiler course in the early 2000's, as part of his
-Ph.D. studies at Indiana University. Needless to say, Jeremy enjoyed
-the course immensely!
-
-During that time, another student named Abdulaziz Ghuloum observed
-that the front-to-back organization of the course made it difficult
-for students to understand the rationale for the compiler
-design. Abdulaziz proposed an incremental approach in which the
-students build the compiler in stages; they start by implementing a
-complete compiler for a very small subset of the input language and in
-each subsequent stage they add a language feature and add or modify
-passes to handle the new feature~\citep{Ghuloum:2006bh}.  In this way,
-the students see how the language features motivate aspects of the
+of Friedman's ideas was to split the compiler into many small
+``passes'' so that the code for each pass would be easy to understand
+in isolation.  (In contrast, most compilers of the time were organized
+into only a few monolithic passes for reasons of compile-time
+efficiency.)  Dybvig, with later help from his students Dipanwita
+Sarkar and Andrew Keep, developed infrastructure to support this
+approach and evolved the course, first to use smaller micro-passes and
+then even smaller nano-passes~\citep{Sarkar:2004fk,Keep:2012aa}. I was
+a student in this compiler course in the early 2000's as part of my
+Ph.D. studies at Indiana University. Needless to say, I enjoyed the
+course immensely!
+
+During that time, another graduate student named Abdulaziz Ghuloum
+observed that the front-to-back organization of the course made it
+difficult for students to understand the rationale for the compiler
+design. Ghuloum proposed an incremental approach in which the students
+build the compiler in stages; they start by implementing a complete
+compiler for a very small subset of the input language and in each
+subsequent stage they add a language feature and add or modify passes
+to handle the new feature~\citep{Ghuloum:2006bh}.  In this way, the
+students see how the language features motivate aspects of the
 compiler design.
 
-After graduating from Indiana University in 2005, Jeremy went on to
-teach at the University of Colorado. He adapted the nano pass and
-incremental approaches to compiling a subset of the Python
+After graduating from Indiana University in 2005, I went on to teach
+at the University of Colorado. I adapted the nano-pass and incremental
+approaches to compiling a subset of the Python
 language~\citep{Siek:2012ab}.  Python and Scheme are quite different
 on the surface but there is a large overlap in the compiler techniques
-required for the two languages. Thus, Jeremy was able to teach much of
-the same content from the Indiana compiler course. He very much
-enjoyed teaching the course organized in this way, and even better,
-many of the students learned a lot and got excited about compilers.
-
-Jeremy returned to teach at Indiana University in 2013.  In his
-absence the compiler course had switched from the front-to-back
-organization to a back-to-front organization. Seeing how well the
-incremental approach worked at Colorado, he started porting and
-adapting the structure of the Colorado course back into the land of
-Scheme. In the meantime Indiana had moved on from Scheme to Racket, so
-the course is now about compiling a subset of Racket (and Typed
-Racket) to the x86 assembly language. The compiler is implemented in
-Racket 7.1~\citep{plt-tr}.
+required for the two languages. Thus, I was able to teach much of the
+same content from the Indiana compiler course. I very much enjoyed
+teaching the course organized in this way, and even better, many of
+the students learned a lot and got excited about compilers.
+
+I returned to teach at Indiana University in 2013.  In my absence, the
+compiler course had switched from the front-to-back organization to a
+back-to-front organization. Seeing how well the incremental approach
+worked at Colorado, I started porting and adapting the structure of
+the Colorado course back into the land of Scheme. In the meantime
+Indiana University had moved on from Scheme to Racket, so the course
+is now about compiling a subset of Racket (and Typed Racket) to the
+x86 assembly language. The compiler is implemented in
+Racket~\citep{plt-tr}.
 
 This is the textbook for the incremental version of the compiler
 course at Indiana University (Spring 2016--present), and it is the
-first open textbook for an Indiana compiler course.  With this book we
+first open textbook for an Indiana compiler course.  With this book I
 hope to make the Indiana compiler course available to people that have
-not had the chance to study in Bloomington in person.  Many of the
-compiler design decisions in this book are drawn from the assignment
-descriptions of \cite{Dybvig:2010aa}. We have captured what we think
-are the most important topics from \cite{Dybvig:2010aa} but we have
-omitted topics that we think are less interesting conceptually and we
-have made simplifications to reduce complexity.  In this way, this
+not had the chance to study compilers at Indiana University.  Many of
+the compiler design decisions in this book are drawn from the
+assignment descriptions of \cite{Dybvig:2010aa}. I have captured what
+I think are the most important topics from \cite{Dybvig:2010aa}, but
+have omitted topics that are less interesting conceptually. I have
+also made simplifications to reduce complexity.  In this way, this
 book leans more towards pedagogy than towards the efficiency of the
-generated code. Also, the book differs in places where we saw the
+generated code. Also, the book differs in places where I saw the
 opportunity to make the topics more fun, such as in relating register
 allocation to Sudoku (Chapter~\ref{ch:register-allocation-r1}).
 
@@ -255,10 +255,22 @@ proficient with Racket (or Scheme) prior to reading this book. There
 are many excellent resources for learning Scheme and
 Racket~\citep{Dybvig:1987aa,Abelson:1996uq,Friedman:1996aa,Felleisen:2001aa,Felleisen:2013aa,Flatt:2014aa}. It
 is helpful but not necessary for the student to have prior exposure to
-the x86 (or x86-64) assembly language~\citep{Intel:2015aa}, as one might
-obtain from a computer systems
-course~\citep{Bryant:2005aa,Bryant:2010aa}.  This book introduces the
+the x86 (or x86-64) assembly language~\citep{Intel:2015aa}, as one
+might obtain from a computer systems
+course~\citep{Bryant:2005aa,Bryant:2010aa}. This book introduces the
 parts of x86-64 assembly language that are needed.
+%
+We follow the System V calling conventions~\citep{Matz:2013aa}, which
+means that the assembly code that we generate will work properly with
+our runtime system (written in C) when it is compiled using the GNU C
+compiler (\code{gcc}) on Linux or MacOS. (Minor adjustments are needed
+for MacOS, which we note as they arise.)
+%
+The Microsoft Windows operating system uses a different calling
+convention~\citep{Microsoft:2018aa}, which is followed by the GNU C
+compiler when running on Windows. So the assembly code that we
+generate will \emph{not} work on Windows.
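To make the mismatch concrete, the C sketch below tabulates which register carries each integer or pointer argument under the two conventions. The helper names \code{arg\_register} and \code{arg\_index} are hypothetical and are not part of the book's runtime; floating-point arguments, stack-passed arguments, and the 32 bytes of ``shadow space'' that a Windows caller must reserve are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Sketch (hypothetical helpers, not part of the book's runtime):
   which register carries the i-th (0-based) integer or pointer
   argument under each x86-64 calling convention. */
enum abi { SYSV_AMD64, MS_X64 };

static const char *const sysv_arg_regs[] = {"rdi", "rsi", "rdx",
                                            "rcx", "r8",  "r9"};
static const char *const ms_arg_regs[] = {"rcx", "rdx", "r8", "r9"};

/* Returns the register name, or NULL if that argument is passed on
   the stack instead. */
const char *arg_register(enum abi abi, int i) {
  assert(i >= 0); /* argument positions are 0-based */
  if (abi == SYSV_AMD64)
    return i < 6 ? sysv_arg_regs[i] : NULL;
  return i < 4 ? ms_arg_regs[i] : NULL;
}

/* Inverse lookup: which argument position a register carries,
   or -1 if it carries none under the given convention. */
int arg_index(enum abi abi, const char *reg) {
  const char *const *regs = abi == SYSV_AMD64 ? sysv_arg_regs : ms_arg_regs;
  int n = abi == SYSV_AMD64 ? 6 : 4;
  for (int i = 0; i < n; i++)
    if (strcmp(regs[i], reg) == 0)
      return i;
  return -1;
}
```

For example, the prelude of \code{main} that calls \code{initialize} places its two arguments in \code{\%rdi} and \code{\%rsi}, whereas a callee compiled for Windows reads its first two arguments from \code{\%rcx} and \code{\%rdx}; under the Microsoft convention \code{\%rdi} carries no argument at all.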
+
 
 %\section*{Structure of book}
 % You might want to add short description about each chapter in this book.
@@ -1353,7 +1365,7 @@ x86_0 &::= & \key{.globl main}\\
 \]
 \end{minipage}
 }
-\caption{The concrete syntax of the x86$_0$ assembly language (AT\&T syntax).}
+\caption{The syntax of the x86$_0$ assembly language (AT\&T syntax).}
 \label{fig:x86-0-concrete}
 \end{figure}
 
@@ -5937,8 +5949,10 @@ An implementation of the copying collector is provided in the
 interface to the garbage collector that is used by the compiler. The
 \code{initialize} function creates the FromSpace, ToSpace, and root
 stack and should be called in the prelude of the \code{main}
-function. The \code{initialize} function puts the address of the
-beginning of the FromSpace into the global variable
+function. The arguments of \code{initialize} are the root stack size
+and the heap size. Both need to be multiples of $64$, and $16384$ is a
+good choice for both.  The \code{initialize} function puts the address
+of the beginning of the FromSpace into the global variable
 \code{free\_ptr}. The global variable \code{fromspace\_end} points to
 the address that is 1-past the last element of the FromSpace. (We use
 half-open intervals to represent chunks of
@@ -6625,7 +6639,7 @@ main:
 	pushq	%r14
 	subq	$0, %rsp
 	movq $16384, %rdi
-	movq $16, %rsi
+	movq $16384, %rsi
 	callq initialize
 	movq rootstack_begin(%rip), %r15
 	movq $0, (%r15)
@@ -6917,7 +6931,7 @@ inside each other.
     (\key{vector-ref}\;\Exp\;\Int)} \\
   &\mid& \gray{(\key{vector-set!}\;\Exp\;\Int\;\Exp)\mid (\key{void})
       \mid \LP\key{has-type}~\Exp~\Type\RP } \\
-      &\mid& \LP\Exp \; \Exp \ldots\RP \\
+  &\mid& \LP\Exp \; \Exp \ldots\RP \\
   \Def &::=& \CDEF{\Var}{\LS\Var \key{:} \Type\RS \ldots}{\Type}{\Exp} \\
   R_4 &::=& \Def \ldots \; \Exp
 \end{array}
@@ -8128,7 +8142,8 @@ syntax for function application.
     &\mid& \gray{ (\key{vector}\;\Exp\ldots) \mid
           (\key{vector-ref}\;\Exp\;\Int)} \\
     &\mid& \gray{(\key{vector-set!}\;\Exp\;\Int\;\Exp)\mid (\key{void})
-     \mid (\Exp \; \Exp\ldots) } \\
+    \mid (\Exp \; \Exp\ldots) } \\
+    &\mid& \LP \key{procedure-arity}~\Exp\RP \\
     &\mid& \CLAMBDA{\LP\LS\Var \key{:} \Type\RS\ldots\RP}{\Type}{\Exp} \\
   \Def &::=& \gray{ \CDEF{\Var}{\LS\Var \key{:} \Type\RS\ldots}{\Type}{\Exp} } \\
   R_5 &::=& \gray{\Def\ldots \; \Exp}
@@ -8148,14 +8163,15 @@ syntax for function application.
     \small
 \[
 \begin{array}{lcl}
+  \itm{op} &::=& \ldots \mid \code{procedure-arity} \\
   \Exp &::=& \gray{ \INT{\Int} \VAR{\Var} \mid \LET{\Var}{\Exp}{\Exp} } \\
        &\mid& \gray{ \PRIM{\itm{op}}{\Exp\ldots} }\\
      &\mid& \gray{ \BOOL{\itm{bool}}
       \mid \IF{\Exp}{\Exp}{\Exp} } \\
      &\mid& \gray{ \VOID{} \mid \LP\key{HasType}~\Exp~\Type \RP 
      \mid \APPLY{\Exp}{\Exp\ldots} }\\
-     &\mid& \LAMBDA{\LP[\Var\code{:}\Type]\ldots\RP}{\Type}{\Exp}\\
- \Def &::=& \gray{ \FUNDEF{\Var}{\LP[\Var \code{:} \Type]\ldots\RP}{\Type}{\code{'()}}{\Exp} }\\
+     &\mid& \LAMBDA{\LP\LS\Var\code{:}\Type\RS\ldots\RP}{\Type}{\Exp}\\
+ \Def &::=& \gray{ \FUNDEF{\Var}{\LP\LS\Var \code{:} \Type\RS\ldots\RP}{\Type}{\code{'()}}{\Exp} }\\
   R_5 &::=& \gray{ \PROGRAMDEFSEXP{\code{'()}}{\LP\Def\ldots\RP}{\Exp} }
 \end{array}
 \]
@@ -8178,23 +8194,7 @@ values.
 
 \begin{figure}[tbp]
 \begin{lstlisting}
-(define (interp-exp env)
-  (lambda (e)
-    (define recur (interp-exp env))
-    (match e
-      ...
-      [(Lambda (list `[,xs : ,Ts] ...) rT body)
-       `(lambda ,xs ,body ,env)]
-      [(Apply fun args)
-       (define fun-val ((interp-exp env) fun))
-       (define arg-vals (map (interp-exp env) args))
-       (match fun-val
-	 [`(lambda ,xs ,body ,lam-env)
-	  (define new-env (append (map cons xs arg-vals) lam-env))
-	  ((interp-exp new-env) body)]
-	 [else (error "interp-exp, expected function, not" fun-val)])]
-      [else (error 'interp-exp "unrecognized expression")]
-      )))
+UPDATE ME
 \end{lstlisting}
 \caption{Interpreter for $R_5$.}
 \label{fig:interp-R5}
@@ -8215,13 +8215,13 @@ require the body's type to match the declared return type.
 (define (type-check-R5 env)
   (lambda (e)
     (match e
-      [(Lambda (and bnd `([,xs : ,Ts] ...)) rT body)
+      [(Lambda (and params `([,xs : ,Ts] ...)) rT body)
        (define-values (new-body bodyT) 
           ((type-check-exp (append (map cons xs Ts) env)) body))
        (define ty `(,@Ts -> ,rT))
        (cond
          [(equal? rT bodyT)
-           (values (HasType (Lambda bnd rT new-body) ty) ty)]
+           (values (HasType (Lambda params rT new-body) ty) ty)]
          [else
            (error "mismatch in return type" bodyT rT)])]
       ...
@@ -8875,14 +8875,16 @@ an explicit \code{If} expression that uses two new forms,
 \code{tag-of-any}.  The \code{tag-of-any} operation retrieves the type
 tag from a tagged value of type \code{Any}.  The \code{ValueOf} form
 retrieves the underlying value from a tagged value.  The
-\code{ValueOf} form includes the type for the underlying value, which
-is needed by the type checker.  Finally, the \code{Exit} form ends the
-execution of the program by invoking the operating system's
-\code{exit} function. So the translation for \code{Project} is as
-follows.
+\code{ValueOf} form includes the type of the underlying value, which
+is used by the type checker.  Finally, the \code{Exit} form ends the
+execution of the program.
+%
+If the target type of the projection is \code{Boolean} or
+\code{Integer}, then \code{Project} can be translated as follows.
 %(We have omitted the \code{has-type} AST nodes to make this
 %output more readable.)
-
+\begin{center}
+\begin{minipage}{1.0\textwidth}
 \begin{lstlisting}
 (Project |$e$| |$\FType$|)
 |$\Rightarrow$|
@@ -8892,20 +8894,36 @@ follows.
       (ValueOf |$\itm{tmp}$| |$\FType$|)
       (Exit)))
 \end{lstlisting}
+\end{minipage}
+\end{center}
+If the target type of the projection is a vector or function type,
+then there is a bit more work to do. For vectors, check that the
+length of the vector (obtained with the \code{vector-length} primitive)
+matches the number of element types in the vector type. For
+functions, check that the function's arity (obtained with
+\code{procedure-arity}) matches the number of parameters in the
+function type.
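The size check matters because, assuming all vector types share a single tag (and likewise all function types), the tag alone cannot distinguish \code{(Vector Integer)} from \code{(Vector Integer Integer)}. The C sketch below models the combined guard under a hypothetical representation that is not the book's: a value of type \code{Any} is a tag paired with a pointer to a heap object whose first word holds the vector length or the procedure arity. Where the compiled code would branch to \code{(Exit)}, the sketch simply returns \code{false}.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative model only (hypothetical, not the book's representation):
   an Any value pairs a tag with a pointer to a heap object whose first
   word holds the vector length or the procedure arity. */
enum any_tag { TAG_VECTOR, TAG_PROCEDURE };

typedef struct {
  enum any_tag tag;
  const int64_t *obj; /* obj[0] = length (vector) or arity (procedure) */
} any_t;

/* Guard for (Project v T), where T is a vector type with `size` element
   types or a function type with `size` parameter types. The tag is
   checked first, then the size. Returns false where the compiled code
   would branch to (Exit). */
bool projection_ok(any_t v, enum any_tag target, int64_t size) {
  assert(v.obj != NULL);
  return v.tag == target && v.obj[0] == size;
}
```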
 
 Regarding \code{Inject}, we recommend compiling it to a slightly
 lower-level primitive operation named \code{make-any}. This operation
-takes the tag instead of the type of the injected value.
-
+takes a tag instead of a type.
+\begin{center}
+\begin{minipage}{1.0\textwidth}
 \begin{lstlisting}
 (Inject |$e$| |$\FType$|)
 |$\Rightarrow$|
 (Prim 'make-any (list |$e'$| (Int |$\itm{tagof}(\FType)$|)))
 \end{lstlisting}
+\end{minipage}
+\end{center}
 
 We recommend translating the type predicates (\code{boolean?}, etc.)
 into uses of \code{tag-of-any} and \code{eq?}.
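A minimal C model of these operations follows. It assumes, for illustration only, that a value of type \code{Any} is a struct holding a tag and a 64-bit payload, and that \code{TAG\_INTEGER} and \code{TAG\_BOOLEAN} are placeholder numbers standing in for $\itm{tagof}$ (they are not the book's actual tag values). The point is that \code{make-any} needs only a tag, and that a predicate such as \code{integer?} reduces to an \code{eq?} comparison on the result of \code{tag-of-any}.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder tag numbers standing in for tagof(Integer) and
   tagof(Boolean); not the book's actual encoding. */
enum { TAG_INTEGER = 1, TAG_BOOLEAN = 4 };

/* A value of type Any, modelled as a tag plus a 64-bit payload. */
typedef struct {
  int64_t tag;
  int64_t value;
} any_t;

/* Counterpart of make-any: the caller supplies the tag of the static
   type (tagof applied to the type in Inject), not the type itself. */
any_t make_any(int64_t value, int64_t tag) {
  assert(tag >= 0); /* assumption: tags are small non-negative numbers */
  any_t a = {tag, value};
  return a;
}

/* Counterpart of tag-of-any. */
int64_t tag_of_any(any_t a) { return a.tag; }

/* integer? and boolean? become an eq? test on the tag. */
bool is_integer(any_t a) { return tag_of_any(a) == TAG_INTEGER; }
bool is_boolean(any_t a) { return tag_of_any(a) == TAG_BOOLEAN; }
```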
 
+\section{Closure Conversion for $R_6$}
+\label{sec:closure-conversion-R6}
+
+
+
 \section{Instruction Selection for $R_6$}
 \label{sec:select-r6}