6 years ago · bbda9c4673
--- a/book.tex
+++ b/book.tex
@@ -5396,21 +5396,28 @@ said to be a \emph{tail call}. In the case of a tail call, whatever
 
				 the callee returns will be immediately returned by the caller, so the
			
 
				 call can be optimized into a \code{jmp} instruction---the caller will
			
 
				 jump to the new function, maintaining the same frame and return
			
 
				-address.
			
 
				+address. As with the indirect function call, we write an indirect
			
 
				+jump with a register prefixed with an asterisk.
			
 
				+
			
 
				+\begin{lstlisting}
			
 
				+   jmp *%rax
			
 
				+\end{lstlisting}
			
 
				 
			
 
				 A common use case for this optimization is \emph{tail recursion}: a
			
 
				 function that calls itself in the tail position is essentially a loop,
			
 
				 and if it does not grow the stack on each call it can act like
			
 
				 one. Functional languages like Racket and Scheme typically rely
			
 
				-heavily on recursion, and so they typically guarantee that \emph{all}
			
 
				-tail calls will be optimized in this way.
			
 
				+heavily on function calls, and so they typically guarantee that
			
 
				+\emph{all} tail calls will be optimized in this way, not just
			
 
				+those in which a function calls itself.
			
 
				+
			
 
				+\margincomment{\scriptsize To do: better motivate guaranteed tail calls? -mv}
			
 
				 
			
 
				 If we were to stick to the calling convention used by C compilers like
			
 
				 \code{gcc}, it would be awkward to optimize tail calls that require
			
 
				-stack arguments, so we have simplify the process by imposing an
			
 
				-invariant that no function passes arguments that way. With this
			
 
				-invariant, space-efficient tail calls are straightforward to
			
 
				-implement.
			
 
				+stack arguments, so we simplify the process by imposing an invariant
			
 
				+that no function passes arguments that way. With this invariant,
			
 
				+space-efficient tail calls are straightforward to implement.
			
 
				 
			
 
				 \begin{figure}[tbp]
			
 
				 \centering
			
@@ -5453,12 +5460,33 @@ kinds of AST nodes to any of the intermediate languages?
 
				 First, we need to transform functions to operate on at most five
			
 
				 arguments.  There are a total of six registers for passing arguments
			
 
				 used in the convention previously mentioned, and we will reserve one
			
 
				-for future use with higher-order functions~\ref{ch:lambdas}. A simple
			
 
				-strategy for imposing an argument limit of length $n$ is to take all
			
 
				-arguments $i$ where $i \geq n$ and pack them into a vector, making
			
 
				-that subsequent vector the $n$th argument, and replacing all
			
 
				-occurrances of the $i$th variable in the body with a projection from
			
 
				-the vector. This pass, \code{limit-functions}, can operate directly on $R_4$.
			
 
				+for future use with higher-order functions (as explained in
			
 
				+Chapter~\ref{ch:lambdas}). A simple strategy for imposing an argument
			
 
				+limit of length $n$ is to take all arguments $i$ where $i \geq n$ and
			
 
				+pack them into a vector, making that vector the $n$th
			
 
				+argument.
			
 
				+
			
 
				+\begin{tabular}{lll}
			
 
				+\begin{minipage}{0.2\textwidth}
			
 
				+\begin{lstlisting}
			
 
				+  (|$f$| |$x_1$| |$\ldots$| |$x_n$|) 
			
 
				+\end{lstlisting}
			
 
				+\end{minipage}
			
 
				+&
			
 
				+$\Rightarrow$
			
 
				+&
			
 
				+\begin{minipage}{0.4\textwidth}
			
 
				+\begin{lstlisting}
			
 
				+(|$f$| |$x_1$| |$\ldots$| |$x_5$| (vector |$x_6$| |$\ldots$| |$x_n$|))
			
 
				+\end{lstlisting}
			
 
				+\end{minipage}
			
 
				+\end{tabular}
			
 
				+
			
 
				+Additionally, all occurrences of the $i$th argument (where $i>5$) in
			
 
				+the body must be replaced with a projection from the vector. A pass
			
 
				+that limits function arguments like this (which we will name
			
 
				+\code{limit-functions}) can operate directly on $R_4$.
			
 
				+
			
 
				 
			
 
				 \begin{figure}[tp]
			
 
				 \centering
			
@@ -5489,11 +5517,11 @@ the vector. This pass, \code{limit-functions}, can operate directly on $R_4$.
 
				 \label{fig:f1-syntax}
			
 
				 \end{figure}
			
 
				 
			
 
				-The syntax of $R_4$ is inconvenient for purposes of compilation
			
 
				-because it conflates the use of function names and local variables and
			
 
				-it conflates the application of primitive operations and the
			
 
				-application of functions. This is a problem because we need to compile
			
 
				-the use of a function name differently than the use of a local
			
 
				+Going forward, the syntax of $R_4$ is inconvenient for purposes of
			
 
				+compilation because it conflates the use of function names and local
			
 
				+variables and it conflates the application of primitive operations and
			
 
				+the application of functions. This is a problem because we need to
			
 
				+compile the use of a function name differently than the use of a local
			
 
				 variable; we need to use \code{leaq} to move the function name to a
			
 
				 register. Similarly, the application of a function is going to require
			
 
				 a complex sequence of instructions, unlike the primitive
			
@@ -5503,14 +5531,22 @@ function references from just a symbol $f$ to \code{(function-ref
 
				   $\ldots$ $e_n$)} to the explicitly tagged AST \code{(app $e_0$ $e_1$
			
 
				   $\ldots$ $e_n$)} or \code{(tailcall $e_0$ $e_1$ $\ldots$ $e_n$)}. A
			
 
				 good name for this pass is \code{reveal-functions} and the output
			
 
				-language, $F_1$, is defined in Figure~\ref{fig:f1-syntax}. Placing
			
 
				-this pass after \code{uniquify} is a good idea, because it will make
			
 
				-sure that there are no local variables and functions that share the
			
 
				-same name. On the other hand, \code{reveal-functions} needs to come
			
 
				-before the \code{flatten} pass because \code{flatten} will help us
			
 
				-compile \code{function-ref}.  Figure~\ref{fig:c3-syntax} defines the
			
 
				-syntax for $C_3$, the output of
			
 
				-\key{flatten}.
			
 
				+language, $F_1$, is defined in Figure~\ref{fig:f1-syntax}.
			
 
				+
			
 
				+Distinguishing between calls in tail position and non-tail position
			
 
				+requires the pass to have some notion of context. We recommend the
			
 
				+function take an additional boolean argument which represents whether
			
 
				+the expression it is considering is in tail position. For example,
			
 
				+when handling a conditional expression \code{(if $e_1$ $e_2$ $e_3$)}
			
 
				+in tail position, both $e_2$ and $e_3$ are also in tail position,
			
 
				+while $e_1$ is not.
			
 
				+
			
 
				+Placing this pass after \code{uniquify} is a good idea, because it
			
 
				+will make sure that there are no local variables and functions that
			
 
				+share the same name. On the other hand, \code{reveal-functions} needs
			
 
				+to come before the \code{flatten} pass because \code{flatten} will
			
 
				+help us compile \code{function-ref}.  Figure~\ref{fig:c3-syntax}
			
 
				+defines the syntax for $C_3$, the output of \key{flatten}.
			
 
				 
			
 
				 
			
 
				 \begin{figure}[tp]
			
@@ -5534,6 +5570,7 @@ syntax for $C_3$, the output of
 
				       &\mid& \gray{ (\key{collect} \,\itm{int}) }
			
 
				        \mid \gray{ (\key{allocate} \,\itm{int}) }\\
			
 
				       &\mid& \gray{ (\key{call-live-roots}\,(\Var^{*}) \,\Stmt^{*}) } \\
			
 
				+      &\mid& (\key{tailcall} \,\Arg\,\Arg^{*}) \\
			
 
				   \Def &::=& (\key{define}\; (\itm{label} \; [\Var \key{:} \Type]^{*}) \key{:} \Type \; \Stmt^{+}) \\
			
 
				 C_3 & ::= & (\key{program}\;(\Var^{*})\;(\key{type}\;\textit{type})\;(\key{defines}\,\Def^{*})\;\Stmt^{+})
			
 
				 \end{array}
			
@@ -5568,6 +5605,11 @@ $\Rightarrow$
 
				 \end{minipage}
			
 
				 \end{tabular} \\
			
 
				 %
			
 
				+Note that in the syntax for $C_3$, tail calls are statements, not
			
 
				+expressions. Once we perform a tail call, we do not ever expect it to
			
 
				+return a value to us, and \code{flatten} therefore should handle
			
 
				+\code{app} and \code{tailcall} forms differently.
			
 
				+
			
 
				 The output of \code{select-instructions} is a program in the x86$_3$
			
 
				 language, whose syntax is defined in Figure~\ref{fig:x86-3}.
			
 
				 
			
@@ -5595,7 +5637,8 @@ language, whose syntax is defined in Figure~\ref{fig:x86-3}.
 
				        \mid  (\key{jmp} \; \itm{label})
			
 
				        \mid (\key{j}\itm{cc} \; \itm{label})
			
 
				        \mid (\key{label} \; \itm{label})  } \\
			
 
				-     &\mid& (\key{indirect-callq}\;\Arg ) \mid (\key{leaq}\;\Arg\;\Arg)\\
			
 
				+     &\mid& (\key{indirect-callq}\;\Arg ) \mid (\key{indirect-jmp}\;\Arg) \\
			
 
				+     &\mid& (\key{leaq}\;\Arg\;\Arg)\\
			
 
				 \Def &::= & (\key{define} \; (\itm{label}) \;\itm{int} \;\itm{info}\; \Instr^{+})\\
			
 
				 x86_3 &::= & (\key{program} \;\itm{info} \;(\key{type}\;\itm{type})\;
			
 
				                (\key{defines}\,\Def^{*}) \; \Instr^{+})
			
@@ -5621,38 +5664,41 @@ local variables in the $\Var^{*}$ field as shown below.
 
				 \end{lstlisting}
			
 
				 In the \code{select-instructions} pass, we need to encode the
			
 
				 parameter passing in terms of the conventions discussed in
			
 
				-Section~\ref{sec:fun-x86}. So depending on the length of the parameter
			
 
				-list \itm{xs}, some of them may be in registers and some of them may
			
 
				-be on the stack. I recommend generating \code{movq} instructions to
			
 
				-move the parameters from their registers and stack locations into the
			
 
				-variables \itm{xs}, then let register allocation handle the assignment
			
 
				-of those variables to homes. After this pass, the \itm{xs} can be
			
 
				-added to the list of local variables. As mentioned in
			
 
				-Section~\ref{sec:fun-x86}, we need to find out how far to move the
			
 
				-stack pointer to ensure we have enough space for stack arguments in
			
 
				-all the calls inside the body of this function. This pass is a good
			
 
				-place to do this and store the result in the \itm{maxStack} field of
			
 
				-the output \code{define} shown below.
			
 
				-\begin{lstlisting}
			
 
				-  (define (|$f$|) |\itm{numParams}| (|$\Var^{*}$| |\itm{maxStack}|) |$\Instr^{+}$|)
			
 
				-\end{lstlisting}
			
 
				-
			
 
				-Next, consider the compilation of function applications, which have
			
 
				+Section~\ref{sec:fun-x86}: a \code{movq} instruction for each
			
 
				+parameter should be generated, to move the parameter value from the
			
 
				+appropriate register to the appropriate variable from \itm{xs}.
			
 
				+%% I recommend generating \code{movq} instructions to
			
 
				+%% move the parameters from their registers and stack locations into the
			
 
				+%% variables \itm{xs}, then let register allocation handle the assignment
			
 
				+%% of those variables to homes.
			
 
				+%% After this pass, the \itm{xs} can be
			
 
				+%% added to the list of local variables. As mentioned in
			
 
				+%% Section~\ref{sec:fun-x86}, we need to find out how far to move the
			
 
				+%% stack pointer to ensure we have enough space for stack arguments in
			
 
				+%% all the calls inside the body of this function. This pass is a good
			
 
				+%% place to do this and store the result in the \itm{maxStack} field of
			
 
				+%% the output \code{define} shown below.
			
 
				+%% \begin{lstlisting}
			
 
				+%%   (define (|$f$|) |\itm{numParams}| (|$\Var^{*}$| |\itm{maxStack}|) |$\Instr^{+}$|)
			
 
				+%% \end{lstlisting}
			
 
				+
			
 
				+Next, consider the compilation of non-tail function applications, which have
			
 
				 the following form at the start of \code{select-instructions}.
			
 
				 \begin{lstlisting}
			
 
				   (assign |\itm{lhs}| (app |\itm{fun}| |\itm{args}| |$\ldots$|))
			
 
				 \end{lstlisting}
			
 
				 In the mirror image of handling the parameters of function
			
 
				-definitions, some of the arguments \itm{args} need to be moved to the
			
 
				-argument passing registers and the rest should be moved to the
			
 
				-appropriate stack locations, as discussed in
			
 
				+definitions, the arguments \itm{args} need to be moved to the
			
 
				+argument passing registers, as discussed in
			
 
				 Section~\ref{sec:fun-x86}.
			
 
				+%% and the rest should be moved to the
			
 
				+%% appropriate stack locations, 
			
 
				 %% You might want to introduce a new kind of AST node for stack
			
 
				 %% arguments, \code{(stack-arg $i$)} where $i$ is the index of this
			
 
				 %% argument with respect to the other stack arguments.
			
 
				-As you're generating the code for parameter passing, take note of how
			
 
				-many stack arguments are needed for purposes of computing the
			
 
				-\itm{maxStack} discussed above.
			
 
				+%% As you're generating the code for parameter passing, take note of how
			
 
				+%% many stack arguments are needed for purposes of computing the
			
 
				+%% \itm{maxStack} discussed above.
			
 
				 
			
 
				 Once the instructions for parameter passing have been generated, the
			
 
				 function call itself can be performed with an indirect function call,
			
@@ -5664,6 +5710,15 @@ is stored in \code{rax}, so it needs to be moved into the \itm{lhs}.
 
				   (movq (reg rax) |\itm{lhs}|)
			
 
				 \end{lstlisting}
			
 
				 
			
 
				+Handling function applications in tail positions is only slightly
			
 
				+different. The parameter passing is the same as for non-tail calls,
			
 
				+but the tail call itself cannot use the \code{indirect-callq} form.
			
 
				+Generating, instead, an \code{indirect-jmp} form in \code{select-instructions}
			
 
				+accounts for the fact that we intend to eventually use a \code{jmp}
			
 
				+rather than a \code{callq} for the tail call. Of course, the
			
 
				+\code{movq} from \code{rax} is not necessary after a tail call.
			
 
				+
			
 
				+
			
 
				 The rest of the passes need only minor modifications to handle the new
			
 
				 kinds of AST nodes: \code{function-ref}, \code{indirect-callq}, and
			
 
				 \code{leaq}. Inside \code{uncover-live}, when computing the $W$ set
			
@@ -5672,16 +5727,41 @@ recommend including all the caller-saved registers, which will have
 
				 the effect of making sure that no caller-saved register actually needs
			
 
				 to be saved. In \code{patch-instructions}, you should deal with the
			
 
				 x86 idiosyncrasy that the destination argument of \code{leaq} must be
			
 
				-a register.
			
 
				+a register. Additionally, \code{patch-instructions} should ensure that
			
 
				+the \code{indirect-jmp} argument is \itm{rax}, our reserved
			
 
				+register---this is to make code generation more convenient, because
			
 
				+we will be trampling many registers before the tail call (as explained
			
 
				+below).
			
 
				 
			
 
				-For the \code{print-x86} pass, I recommend the following translations:
			
 
				+For the \code{print-x86} pass, we recommend the following translations:
			
 
				 \begin{lstlisting}
			
 
				   (function-ref |\itm{label}|) |$\Rightarrow$| |\itm{label}|(%rip)
			
 
				   (indirect-callq |\itm{arg}|) |$\Rightarrow$| callq *|\itm{arg}|
			
 
				 \end{lstlisting}
			
 
				-For function definitions, the \code{print-x86} pass should add the
			
 
				-code for saving and restoring the callee-saved registers, if you
			
 
				-haven't already done that.
			
 
				+Handling \code{indirect-jmp} requires a bit more care. A
			
 
				+straightforward translation of \code{indirect-jmp} would be
			
 
				+\code{jmp *$\itm{arg}$}, which is what we will want to do,
			
 
				+but \emph{before} this jump we need to pop the saved registers
			
 
				+and reset the frame pointer. This is why it was convenient to
			
 
				+ensure the \code{jmp} argument was \itm{rax}. A sufficiently
			
 
				+clever compiler could determine that a function body always
			
 
				+ends in a tail call, and thus avoid generating code to restore
			
 
				+registers and return via \code{ret}, but for simplicity we do
			
 
				+not need to do this.
			
 
				+
			
 
				+\margincomment{\footnotesize The reason we can't easily optimize
			
 
				+  this is that the details of the function prologue and epilogue
			
 
				+  are not exposed in the AST, and just emitted as strings in
			
 
				+  \code{print-x86}.}
			
 
				+
			
 
				+As this implies, your \code{print-x86} pass needs to add
			
 
				+the code for saving and restoring callee-saved registers, if
			
 
				+you have not already implemented that. This is necessary when
			
 
				+generating code for function definitions.
			
 
				+
			
 
				+%% For function definitions, the \code{print-x86} pass should add the
			
 
				+%% code for saving and restoring the callee-saved registers, if you
			
 
				+%% haven't already done that.
			
 
				 
			
 
				 \section{An Example Translation}