progress on functions

Jeremy Siek, 9 years ago
parent
commit
69e0923653
1 file changed, with 94 additions and 54 deletions
book.tex (+94, -54)

@@ -975,11 +975,12 @@ main:
 	addq	$32, %rax
 	retq
 \end{lstlisting}
-\caption{An x86-64 program equivalent to $\BINOP{+}{10}{32}$.}
+\caption{\it An x86-64 program equivalent to $\BINOP{+}{10}{32}$.}
 \label{fig:p0-x86}
 \end{wrapfigure}
-\marginpar{Consider using italics for the texts in these figures.
-  It can get confusing to differentiate them from the main text.}
+%% \marginpar{Consider using italics for the texts in these figures.
+%%   It can get confusing to differentiate them from the main text.}
+%% It looks pretty ugly in italics.-Jeremy
 
 Figure~\ref{fig:p0-x86} depicts an x86-64 program that is equivalent
 to \code{(+ 10 32)}. The \key{globl} directive says that the
@@ -1016,10 +1017,11 @@ main:
 \label{fig:p1-x86}
 \end{wrapfigure}
 
-Unfortunately, correct x86-64 varies in some ways depending on what operating system
-it is assembled in. The code examples shown here are correct on the Unix platform,
-but when assembled on Mac OSX, labels like \key{main} must be prepended by an underscore.
-So the correct output for the above program on Mac would begin with:
+Unfortunately, x86-64 assembly varies in a couple of ways depending on
+the operating system it is assembled on. The code examples shown here
+are correct on the Unix platform, but when assembled on Mac OS X,
+labels like \key{main} must be prefixed with an underscore.  So the
+correct output for the above program on Mac would begin with:
 
 \begin{lstlisting}
 	.globl _main
@@ -1095,11 +1097,15 @@ pointer.
 
 The compiler will need a convenient representation for manipulating
 x86 programs, so we define an abstract syntax for x86 in
-Figure~\ref{fig:x86-ast-a}. The \itm{info} field of the \key{program}
-AST node is for storing auxiliary information that needs to be
-communicated from one step of the compiler to the next. 
-\marginpar{Consider mentioning PseudoX86, since I think that's what
-  you actually are referring to.}
+Figure~\ref{fig:x86-ast-a}. The $\Int$ field of the \key{program} AST
+node is the number of bytes of stack space needed for variables in the
+program. (Some of the intermediate languages will store other
+information in that location for the purposes of communicating
+auxiliary data from one step of the compiler to the next.)
+%% \marginpar{Consider mentioning PseudoX86, since I think that's what
+%%   you actually are referring to.}
+%% Not here. PseudoX86 is the language with variables and
+%% instructions that don't obey the x86 rules. -Jeremy
 
 \begin{figure}[tbp]
 \fbox{
@@ -1116,7 +1122,7 @@ communicated from one step of the compiler to the next.
              (\key{pushq}\;\Arg) \mid 
              (\key{popq}\;\Arg) \mid 
              (\key{retq}) \\
-x86_0 &::= & (\key{program} \;\itm{info} \; \Instr^{+})
+x86_0 &::= & (\key{program} \;\Int \; \Instr^{+})
 \end{array}
 \]
 \end{minipage}
@@ -1415,15 +1421,10 @@ programs should include \key{let} constructs, variables, and variables
 that overshadow each other.  The five programs should be in a
 subdirectory named \key{tests} and they should have the same file name
 except for a different integer at the end of the name, followed by the
-ending \key{.scm}.  Use the \key{interp-tests} function
+ending \key{.rkt}.  Use the \key{interp-tests} function
 (Appendix~\ref{appendix:utilities}) from \key{utilities.rkt} to test
 your \key{uniquify} pass on the example programs.
-\marginpar{Tests should be {\tt .scm} files or {\tt .rkt} files?}
 
-%% You can use the interpreter \key{interpret-S0} defined in the
-%% \key{interp.rkt} file. The entire sequence of tests should be a short
-%% Racket program so you can re-run all the tests by running the Racket
-%% program. We refer to this as the \emph{regression test} program.
 \end{exercise}
 
 
@@ -1728,7 +1729,11 @@ your passes on the example programs.
 
 \section{Print x86-64}
 \label{sec:print-x86}
-\marginpar{The input isn't quite x86-64 right? It's PseudoX86.}
+%\marginpar{The input isn't quite x86-64 right? It's PseudoX86.}
+% No, it really is x86-64 at this point because all the
+% variables should be gone and the patch-instructions pass
+% has made sure that all the instructions follow the
+% x86-64 rules. -Jeremy
 The last step of the compiler from $R_1$ to x86-64 is to convert the
 x86-64 AST (defined in Figure~\ref{fig:x86-ast-a}) to the string
 representation (defined in Figure~\ref{fig:x86-a}). The Racket
@@ -3069,14 +3074,14 @@ the element at index $0$ of the 1-tuple.
 \chapter{Functions}
 \label{ch:functions}
 
-This chapter studies the compilation of functions (aka. procedures) as
-they appear in the C language. The syntax for function definitions and
-function application (aka. function call) is shown in
+This chapter studies the compilation of functions (aka. procedures) at
+the level of abstraction of the C language. The syntax for function
+definitions and function application (aka. function call) is shown in
 Figure~\ref{fig:r4-syntax}, where we define the $R_4$ language.
-Programs now start with zero or more function definitions.  The
+Programs in $R_4$ start with zero or more function definitions.  The
 function names from these definitions are in-scope for the entire
 program, including all other function definitions (so the ordering of
-function definitions does not matter). 
+function definitions does not matter).
 
 Functions are first-class in the sense that a function pointer is data
 and can be stored in memory or passed as a parameter to another
@@ -3086,11 +3091,11 @@ function.  Thus, we introduce a function type, written
 \end{lstlisting}
 for a function whose $n$ parameters have the types $\Type_1$ through
 $\Type_n$ and whose return type is $\Type_r$. The main limitation of
-these functions is that they are not lexically scoped. That is, the
-only external entities that can be referenced from inside a function
-body are other globally-defined functions. The syntax of $R_4$
-prevents functions from being nested inside each other; they can only
-be defined at the top level.
+these functions (with respect to Racket functions) is that they are
+not lexically scoped. That is, the only external entities that can be
+referenced from inside a function body are other globally-defined
+functions. The syntax of $R_4$ prevents functions from being nested
+inside each other; they can only be defined at the top level.
 
 \begin{figure}[tbp]
 \centering
@@ -3111,26 +3116,25 @@ be defined at the top level.
 \label{fig:r4-syntax}
 \end{figure}
 
-The program in Figure~\ref{fig:r4-function-example} shows a
-representative example of definition and using functions in $R_4$.  We
-define a function \code{map} that applies some other function \code{f}
-to both elements of a 2-tuple and returns a new 2-tuple containing the
-results. We also define a function \code{add1} that does what its name
-suggests. The program then applies \code{map} to \code{add1} and
-\code{(vector 0 41)}.  The result is \code{(vector 1 42)}, from which
-we return the \code{42}.
+The program in Figure~\ref{fig:r4-function-example} is a
+representative example of defining and using functions in $R_4$.  We
+define a function \code{map-vec} that applies some other function
+\code{f} to both elements of a vector (a 2-tuple) and returns a new
+vector containing the results. We also define a function \code{add1}
+that does what its name suggests. The program then applies
+\code{map-vec} to \code{add1} and \code{(vector 0 41)}.  The result is
+\code{(vector 1 42)}, from which we return the \code{42}.
 
 \begin{figure}[tbp]
 \begin{lstlisting}
 (program
-  (define (map [f : (Integer -> Integer)]
-                [v : (Vector Integer Integer)])
-    : (Vector Integer Integer)
-      (vector (f (vector-ref v 0)) 
-               (f (vector-ref v 1))))
+  (define (map-vec [f : (Integer -> Integer)]
+                     [v : (Vector Integer Integer)])
+          : (Vector Integer Integer)
+    (vector (f (vector-ref v 0)) (f (vector-ref v 1))))
   (define (add1 [x : Integer]) : Integer
-      (+ x 1))
-  (vector-ref (map add1 (vector 0 41)) 1)
+    (+ x 1))
+  (vector-ref (map-vec add1 (vector 0 41)) 1)
   )
 \end{lstlisting}
 \caption{Example of using functions in $R_4$.}
@@ -3138,7 +3142,7 @@ we return the \code{42}.
 \end{figure}
 
 
-
+\marginpar{\scriptsize to do: interpreter for $R_4$. \\ --Jeremy}
 
 \section{Functions in x86}
 
@@ -3185,12 +3189,22 @@ callee is responsible for saving and restoring some other registers,
 the \emph{callee save registers}, before and after using them. The
 caller save registers are
 \begin{lstlisting}
-rdx rcx rsi rdi r8 r9 r10 r11
+  rax rdx rcx rsi rdi r8 r9 r10 r11
 \end{lstlisting}
 while the callee save registers are
 \begin{lstlisting}
-rsp rbp rbx r12 r13 r14 r15
+  rsp rbp rbx r12 r13 r14 r15
 \end{lstlisting}
+Another way to think about this caller/callee convention is the
+following. The caller should assume that all the caller save registers
+get overwritten with arbitrary values by the callee.  On the other
+hand, the caller can safely assume that all the callee save registers
+contain the same values after the call that they did before the call.
+The callee can freely use any of the caller save registers.  However,
+if the callee wants to use a callee save register, the callee must
+arrange to put the original value back in the register prior to
+returning to the caller, which is usually accomplished by saving and
+restoring the value from the stack.
 
 Recall from Section~\ref{sec:x86-64} that the stack is also used for
 local variables, and that at the beginning of a function we move the
@@ -3211,7 +3225,7 @@ address of the second, and so on. Figure~\ref{fig:call-frames} shows
 the layout of the caller and callee frames. Notice how important it is
 that we correctly compute the maximum number of arguments needed for
 function calls; if that number is too small then the arguments and
-local variables will overlap in memory!
+local variables will smash into each other!
 
 \begin{figure}[tbp]
 \centering
@@ -3238,14 +3252,40 @@ $8n-8$\key{(\%rsp)} & $8n+8$(\key{\%rbp})& argument $n$ \\
 \end{figure}
 
 
+\section{Planning the compilation of functions}
+
+Now that we have a good understanding of functions as they appear in
+$R_4$ and the support for functions in x86-64, we need to plan the
+changes to our compiler. Do we need any new passes, and do we need to
+change any existing passes? Also, do we need to add new kinds of AST
+nodes to any of the intermediate languages?
+
+To begin with, the syntax of $R_4$ is inconvenient for purposes of
+compilation because it conflates the use of function names and local
+variables and it conflates the application of primitive operations and
+the application of functions. This is a problem because we need to
+compile the use of a function name differently than the use of a local
+variable; we need to use \code{leaq} to move the function name to a
+register. Similarly, the application of a function is going to require
+a complex sequence of instructions, unlike the primitive
+operations. Thus, it is a good idea to create a new pass that changes
+references to function names from just a symbol $f$ to
+\code{(function-ref $f$)} and that changes function application from
+\code{($e_0$ $e_1$ $\ldots$ $e_n$)} to the explicitly tagged
+\code{(app $e_0$ $e_1$ $\ldots$ $e_n$)}. A good name for this pass is
+\code{reveal-functions}. Placing this pass after \code{uniquify} is a
+good idea, because it will make sure that there are no local variables
+and functions that share the same name. On the other hand,
+\code{reveal-functions} needs to come before the \code{flatten} pass
+because \code{flatten} will help us compile \code{function-ref}.
+
+Because each \code{function-ref} needs to eventually become a
+\code{leaq} instruction, it needs to become an assignment
+statement. This can be handled easily in the \code{flatten} pass by
+categorizing \code{function-ref} as a complex expression.
 
 
 
-
-% reveal-functions
-%   * differentiate variables and function names
-%   * differentiate primitive operations and function application
-%
 % flatten
 %   * function-ref not simple, why? have to use the leaq instruction
 %       to put the function label in to a register.