first draft of function chapter

Jeremy Siek 9 years ago
parent
commit
0b8ca5bfee
1 changed files with 202 additions and 44 deletions

+ 202 - 44
book.tex

@@ -3145,6 +3145,7 @@ that does what its name suggests. The program then applies
 \marginpar{\scriptsize to do: interpreter for $R_4$. \\ --Jeremy}
 
 \section{Functions in x86}
+\label{sec:fun-x86}
 
 
 The x86 architecture provides a few features to support the
 implementation of functions. We have already seen that x86 provides
@@ -3252,7 +3253,7 @@ $8n-8$\key{(\%rsp)} & $8n+8$(\key{\%rbp})& argument $n$ \\
 \end{figure}
 
 
 
 
-\section{Planning the compilation of functions}
+\section{The compilation of functions}
 
 
 Now that we have a good understanding of functions as they appear in
 $R_4$ and the support for functions in x86-64, we need to plan the
@@ -3269,10 +3270,10 @@ variable; we need to use \code{leaq} to move the function name to a
 register. Similarly, the application of a function is going to require
 a complex sequence of instructions, unlike the primitive
 operations. Thus, it is a good idea to create a new pass that changes
-references to function names from just a symbol $f$ to
-\code{(function-ref $f$)} and that changes function application from
-\code{($e_0$ $e_1$ $\ldots$ $e_n$)} to the explicitly tagged
-\code{(app $e_0$ $e_1$ $\ldots$ $e_n$)}. A good name for this pass is
+function references from just a symbol $f$ to \code{(function-ref
+  $f$)} and that changes function application from \code{($e_0$ $e_1$
+  $\ldots$ $e_n$)} to the explicitly tagged AST \code{(app $e_0$ $e_1$
+  $\ldots$ $e_n$)}. A good name for this pass is
 \code{reveal-functions}. Placing this pass after \code{uniquify} is a
 good idea, because it will make sure that there are no local variables
 and functions that share the same name. On the other hand,
@@ -3280,48 +3281,205 @@ and functions that share the same name. On the other hand,
 because \code{flatten} will help us compile \code{function-ref}.
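
The \code{reveal-functions} pass described above can be sketched roughly as follows. This is written in Python rather than in the language of the compiler itself, purely for illustration. The tuple encoding of the AST, the \code{PRIMITIVES} set, and the shape assumed for \code{let} are all assumptions of this sketch, not definitions from the text.

```python
# Hypothetical sketch of reveal-functions. The tuple encoding of the AST and
# the PRIMITIVES set are assumptions made for illustration only.
PRIMITIVES = {"+", "-", "read", "not", "eq?", "if",
              "vector", "vector-ref", "vector-set!"}


def reveal_functions(e, fun_names):
    """Tag function references and function applications in expression e."""
    if isinstance(e, str):
        # After uniquify, a symbol names a function exactly when it is in
        # fun_names, so it cannot be confused with a local variable.
        return ("function-ref", e) if e in fun_names else e
    if not isinstance(e, tuple):
        return e  # integer or boolean literal
    op, *args = e
    rec = lambda x: reveal_functions(x, fun_names)
    if op == "let":  # assumed shape: ("let", x, rhs, body)
        x, rhs, body = args
        return ("let", x, rec(rhs), rec(body))
    if isinstance(op, str) and op in PRIMITIVES:
        return (op, *map(rec, args))
    # Anything else in operator position is a function application.
    return ("app", rec(op), *map(rec, args))
```

For example, \code{reveal\_functions(("add", 40, 2), \{"add"\})} yields \code{("app", ("function-ref", "add"), 40, 2)}, while a primitive operation such as \code{("+", "x", 1)} is returned unchanged.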
 
 
 Because each \code{function-ref} needs to eventually become an
-\code{leaq} instruction, it needs to become an assignment
-statement. This can be handled easily in the \code{flatten} pass by
-categorizing \code{function-ref} as a complex expression.
+\code{leaq} instruction, it first needs to become an assignment
+statement so there is a left-hand side in which to put the
+result. This can be handled easily in the \code{flatten} pass by
+categorizing \code{function-ref} as a complex expression.  Then, in
+the \code{select-instructions} pass, an assignment of
+\code{function-ref} becomes a \code{leaq} instruction as follows: \\
+\begin{tabular}{lll}
+\begin{minipage}{0.45\textwidth}
+\begin{lstlisting}
+  (assign |$\itm{lhs}$| (function-ref |$f$|))
+\end{lstlisting}
+\end{minipage}
+&
+$\Rightarrow$
+&
+\begin{minipage}{0.4\textwidth}
+\begin{lstlisting}
+(leaq (function-ref |$f$|) |$\itm{lhs}$|)
+\end{lstlisting}
+\end{minipage}
+\end{tabular} 
 
 
+Next we consider compiling function definitions.  The \code{flatten}
+pass should handle function definitions a lot like a \code{program}
+node; after all, the \code{program} node represents the \code{main}
+function. So the \code{flatten} pass, in addition to flattening the
+body of the function into a sequence of statements, should record the
+local variables in the $\Var^{*}$ field as shown below.
+\begin{lstlisting}
+   (define (|$f$| [|\itm{xs}| : |\itm{ts}|]|$^{*}$|) : |\itm{rt}| (|$\Var^{*}$|) |$\Stmt^{+}$|)
+\end{lstlisting}
+In the \code{select-instructions} pass, we need to encode the
+parameter passing in terms of the conventions discussed in
+Section~\ref{sec:fun-x86}. So depending on the length of the parameter
+list \itm{xs}, some of them may be in registers and some of them may
+be on the stack. I recommend generating \code{movq} instructions to
+move the parameters from their registers and stack locations into the
+variables \itm{xs}, and then letting register allocation handle the assignment
+of those variables to homes. After this pass, the \itm{xs} can be
+added to the list of local variables. As mentioned in
+Section~\ref{sec:fun-x86}, we need to find out how far to move the
+stack pointer to ensure we have enough space for stack arguments in
+all the calls inside the body of this function. This pass is a good
+place to do this and store the result in the \itm{maxStack} field of
+the output \code{define} shown below.
+\begin{lstlisting}
+  (define (|$f$|) |\itm{numParams}| (|$\Var^{*}$| |\itm{maxStack}|) |$\Instr^{+}$|)
+\end{lstlisting}
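
As a rough illustration of the parameter handling just described, here is a small Python sketch (not the compiler's own language). It assumes the six argument-passing registers of the System V calling convention and uses a hypothetical \code{("deref", "rbp", offset)} node for the callee's view of a stack argument, with the offset $8n+8$ taken from the table in Section~\ref{sec:fun-x86}. The function name and the tuple encoding are inventions of this sketch.

```python
# Argument-passing registers of the System V x86-64 calling convention.
ARG_REGS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]


def param_moves(params):
    """Generate movq instructions that copy each parameter from its
    register or stack location into the variable of the same name."""
    instrs = []
    for i, x in enumerate(params):
        if i < len(ARG_REGS):
            src = ("reg", ARG_REGS[i])
        else:
            # n is the 1-based position among the stack arguments; the
            # callee finds stack argument n at 8n+8(%rbp).
            n = i - len(ARG_REGS) + 1
            src = ("deref", "rbp", 8 * n + 8)  # hypothetical AST node
        instrs.append(("movq", src, ("var", x)))
    return instrs
```

For a two-parameter function such as \code{add.1}, this produces a move from \code{rdi} into \code{x.1} followed by a move from \code{rsi} into \code{y.1}. A seventh parameter would be read from \code{16(\%rbp)}.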
+
+Next, consider the compilation of function applications, which have
+the following form at the start of \code{select-instructions}.
+\begin{lstlisting}
+  (assign |\itm{lhs}| (app |\itm{fun}| |\itm{args}| |$\ldots$|))
+\end{lstlisting}
+In the mirror image of handling the parameters of function
+definitions, some of the arguments \itm{args} need to be moved to the
+argument passing registers and the rest should be moved to the
+appropriate stack locations, as discussed in
+Section~\ref{sec:fun-x86}. You might want to introduce a new kind of
+AST node for stack arguments, \code{(stack-arg $i$)} where $i$ is the
+index of this argument with respect to the other stack arguments. As
+you generate this code for parameter passing, take note of how many
+stack arguments are needed for purposes of computing the
+\itm{maxStack} discussed above.
+
+Once the instructions for parameter passing have been generated, the
+function call itself can be performed with an indirect function call,
+for which I recommend creating the new instruction
+\code{indirect-callq}. Of course, the return value from the function
+is stored in \code{rax}, so it needs to be moved into the \itm{lhs}.
+\begin{lstlisting}
+  (indirect-callq |\itm{fun}|)
+  (movq (reg rax) |\itm{lhs}|)
+\end{lstlisting}
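
Putting the pieces for function application together, a Python sketch of this case of \code{select-instructions} might look like the following. The helper name \code{select\_app}, the tuple encoding, and the choice of a zero-based index for \code{stack-arg} are assumptions of the sketch. It also returns the number of stack arguments as a count, whereas the text leaves the unit of \itm{maxStack} open.

```python
# Argument-passing registers of the System V x86-64 calling convention.
ARG_REGS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]


def select_app(lhs, fun, args):
    """Translate (assign lhs (app fun args ...)) into instructions.

    Returns the instruction list and the number of stack arguments used,
    so that the caller can take the maximum over all calls in a body.
    """
    instrs = []
    for i, arg in enumerate(args):
        if i < len(ARG_REGS):
            dst = ("reg", ARG_REGS[i])
        else:
            # Assumed: zero-based index among the stack arguments.
            dst = ("stack-arg", i - len(ARG_REGS))
        instrs.append(("movq", arg, dst))
    instrs.append(("indirect-callq", fun))
    # The callee leaves its result in rax.
    instrs.append(("movq", ("reg", "rax"), lhs))
    num_stack_args = max(0, len(args) - len(ARG_REGS))
    return instrs, num_stack_args
```

The value for \itm{maxStack} would then be derived from the maximum of the second result over every call in the function body.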
+
+The rest of the passes need only minor modifications to handle the new
+kinds of AST nodes: \code{function-ref}, \code{indirect-callq}, and
+\code{leaq}. Inside \code{uncover-live}, when computing the $W$ set
+(written variables) for an \code{indirect-callq} instruction, I
+recommend including all the caller save registers, which will have the
+effect of making sure that no variable that is live across the call is
+assigned to a caller save register, so no caller save register actually
+needs to be saved. In \code{patch-instructions}, you should deal with the x86
+idiosyncrasy that the destination argument of \code{leaq} must be a
+register.
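
The two adjustments above can be sketched in Python as follows. The function names, the tuple encoding, the hypothetical \code{("deref", "rbp", offset)} node for a stack location, and the use of \code{rax} as the scratch register in \code{patch\_leaq} are assumptions of this sketch. The set of caller save registers is the one fixed by the System V calling convention.

```python
# Caller save registers of the System V x86-64 calling convention.
CALLER_SAVE = {"rax", "rcx", "rdx", "rsi", "rdi", "r8", "r9", "r10", "r11"}


def write_set(instr):
    """Compute the W set (written locations) for a few instructions."""
    op = instr[0]
    if op == "indirect-callq":
        # Treat the call as writing every caller save register, so any
        # variable live across the call interferes with all of them.
        return set(CALLER_SAVE)
    if op in ("movq", "addq", "leaq"):
        dst = instr[2]
        return {dst[1]} if dst[0] in ("var", "reg") else set()
    return set()


def patch_leaq(instr):
    """Ensure that the destination of leaq is a register."""
    op, src, dst = instr
    if op == "leaq" and dst[0] != "reg":
        scratch = ("reg", "rax")  # assumed scratch register
        return [("leaq", src, scratch), ("movq", scratch, dst)]
    return [instr]
```

With this definition of \code{write\_set}, the interference graph keeps variables that are live after a call out of the caller save registers, which is why no explicit save and restore code is needed around the call.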
+
+For the \code{print-x86} pass, I recommend the following translations:
+\begin{lstlisting}
+  (function-ref |\itm{label}|) |$\Rightarrow$| |\itm{label}|(%rip)
+  (indirect-callq |\itm{arg}|) |$\Rightarrow$| callq *|\itm{arg}|
+  (stack-arg |$i$|) |$\Rightarrow$| |$i$|(%rsp)
+\end{lstlisting}
+For function definitions, the \code{print-x86} pass should add the
+code for saving and restoring the callee save registers, if you
+haven't already done that.
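
A Python sketch of these \code{print-x86} translations follows. The helper names and tuple encoding are assumptions of the sketch. Dots in labels are replaced with underscores so that \code{add.1} prints as \code{add\_1}. For \code{stack-arg}, the sketch assumes that $i$ is a zero-based index and scales it by 8 to obtain a byte offset; if $i$ is already stored as a byte offset, it should be printed unscaled.

```python
def label(name):
    """Turn a uniquified name such as add.1 into an assembler label."""
    return name.replace(".", "_")


def print_arg(a):
    """Print one instruction argument in AT&T syntax."""
    tag = a[0]
    if tag == "function-ref":
        return f"{label(a[1])}(%rip)"
    if tag == "stack-arg":
        # Assumed: a[1] is a zero-based index, so the offset is 8 * index.
        return f"{8 * a[1]}(%rsp)"
    if tag == "reg":
        return f"%{a[1]}"
    if tag == "int":
        return f"${a[1]}"
    raise ValueError(f"unhandled argument: {a!r}")


def print_instr(instr):
    """Print one instruction, handling indirect-callq specially."""
    op, *args = instr
    if op == "indirect-callq":
        return f"\tcallq\t*{print_arg(args[0])}"
    return f"\t{op}\t" + ", ".join(print_arg(a) for a in args)
```

Applied to \code{(leaq (function-ref add.1) (reg rbx))} and \code{(indirect-callq (reg rbx))}, this prints \code{leaq add\_1(\%rip), \%rbx} and \code{callq *\%rbx}.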
 
 
+\section{An Example Translation}
+
+Figure~\ref{fig:add-fun} shows an example translation of a simple
+function in $R_4$ to x86-64. The figure includes the results of the
+\code{flatten} and \code{select-instructions} passes.  Can you see any
+obvious ways to improve the translation?
+
+\begin{figure}[tbp]
+\begin{tabular}{lll}
+\begin{minipage}{0.5\textwidth}
+\begin{lstlisting}
+(program
+ (define (add [x : Integer] 
+                [y : Integer]) 
+    : Integer (+ x y))
+ (add 40 2))
+\end{lstlisting}
+$\Downarrow$
+\begin{lstlisting}
+(program (t.1 t.2)
+  ((define (add.1 [x.1 : Integer] 
+                    [y.1 : Integer])
+     : Integer (t.3)
+     (assign t.3 (+ x.1 y.1))
+     (return t.3)))
+  (assign t.1 (function-ref add.1))
+  (assign t.2 (app t.1 40 2))
+  (return t.2))
+\end{lstlisting}
+$\Downarrow$
+\begin{lstlisting}
+(program ((t.1 t.2) 0)
+  ((define (add.1) 2 ((x.1 y.1 t.3) 0)
+     (movq (reg rdi) (var x.1))
+     (movq (reg rsi) (var y.1))
+     (movq (var x.1) (var t.3))
+     (addq (var y.1) (var t.3))
+     (movq (var t.3) (reg rax))))
+  (leaq (function-ref add.1) (var t.1))
+  (movq (int 40) (reg rdi))
+  (movq (int 2) (reg rsi))
+  (indirect-callq (var t.1))
+  (movq (reg rax) (var t.2))
+  (movq (var t.2) (reg rax)))
+\end{lstlisting}
+\end{minipage}
+&
+\begin{minipage}{0.4\textwidth}
+$\Downarrow$
+\begin{lstlisting}
+	.globl add_1
+add_1:
+	pushq	%rbp
+	movq	%rsp, %rbp
+	pushq	%r15
+	pushq	%r14
+	pushq	%r13
+	pushq	%r12
+	pushq	%rbx
+	subq	$16, %rsp
+	movq	%rdi, %rbx
+	movq	%rsi, %rcx
+	addq	%rcx, %rbx
+	movq	%rbx, %rax
+	addq	$16, %rsp
+	popq	%rbx
+	popq	%r12
+	popq	%r13
+	popq	%r14
+	popq	%r15
+	popq	%rbp
+	retq
+
+	.globl _main
+_main:
+	pushq	%rbp
+	movq	%rsp, %rbp
+	subq	$16, %rsp
+	leaq	add_1(%rip), %rbx
+	movq	$40, %rdi
+	movq	$2, %rsi
+	callq	*%rbx
+	movq	%rax, %rbx
+	movq	%rbx, %rax
+	addq	$16, %rsp
+	popq	%rbp
+	retq
+\end{lstlisting}
+\end{minipage}
+\end{tabular} 
+\caption{Example compilation of a simple function to x86-64.}
+\label{fig:add-fun}
+\end{figure}
+
+
+
+\begin{exercise}\normalfont
+Expand your compiler to handle $R_4$ as outlined in this section.
+Create 5 new programs that use functions, including examples that pass
+functions and return functions from other functions, and test your
+compiler on these new programs and all of your previously created test
+programs.
+\end{exercise}
 
 
-% flatten
-%   * function-ref not simple, why? have to use the leaq instruction
-%       to put the function label in to a register.
-%
-% select-instructions
-%   * function defs. deal with parameters
-%   * (assign lhs (function-ref f)) => (leaq (function-ref f) lhs)
-%   * (assign lhs (app f es ...))
-%     - pass some args in registers, rest on the stack (stack-arg)
-%     - need to keep track of how large the stack needs to grow across
-%       all the function calls in the body of a function
-%     - indirect-callq f; movq rax lhs
-%
-% uncover-live
-%   * free-vars: function-ref, stack-arg
-%   * read-vars: leaq, indirect-callq
-%   * write-vars: leaq, indirect-callq (all caller save!)
-%   * uncover-live: treat functions like the main program.
-%
-% build interferece:
-%   * treat functions like the main function
-%
-% assign-homes
-%   * add cases for: stack, stack-arg, indirect-callq, function-ref
-%
-% allocate-registers
-%   * treat functions like the main function
-%
-% patch-instructions
-%   * add cases for: function defs, indirect-callq, leaq (target must be reg.)
-%
-% print-x86
-%   * function-ref uses rip
-%   * indirect-callq => callq *
-%   * stack-arg  => rsp
-%   * function defs: save and restore callee-save registers
 
 
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \chapter{Lexically Scoped Functions}
 \chapter{Lexically Scoped Functions}