more prose explaining tail call compilation

Michael Vollmer 6 years ago
commit bbda9c4673

1 changed file with 135 additions and 55 deletions: book.tex

@@ -5396,21 +5396,28 @@ said to be a \emph{tail call}. In the case of a tail call, whatever
 the callee returns will be immediately returned by the caller, so the
 call can be optimized into a \code{jmp} instruction---the caller will
 jump to the new function, maintaining the same frame and return
-address.
+address. As with the indirect function call, we write an indirect
+jump with the register prefixed by an asterisk.
+
+\begin{lstlisting}
+   jmp *%rax
+\end{lstlisting}
 
 
 A common use case for this optimization is \emph{tail recursion}: a
 function that calls itself in the tail position is essentially a loop,
 and if it does not grow the stack on each call it can act like
 one. Functional languages like Racket and Scheme typically rely
-heavily on recursion, and so they typically guarantee that \emph{all}
-tail calls will be optimized in this way.
+heavily on function calls, and so they typically guarantee that
+\emph{all} tail calls will be optimized in this way, not just
+recursive calls from a function to itself.
+
+\margincomment{\scriptsize To do: better motivate guaranteed tail calls? -mv}
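+
+For example, in the following small program (our own example, not one
+from the book's test suite) the recursive call to \code{sum} is in tail
+position, so a compiler that optimizes tail calls can run \code{sum} in
+constant stack space, just like a loop:
+\begin{lstlisting}
+   (program
+     (define (sum [n : Integer] [acc : Integer]) : Integer
+        (if (eq? n 0) acc (sum (+ n -1) (+ acc n))))
+     (sum 5 0))
+\end{lstlisting}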
 
 
 If we were to stick to the calling convention used by C compilers like
 \code{gcc}, it would be awkward to optimize tail calls that require
-stack arguments, so we have simplify the process by imposing an
-invariant that no function passes arguments that way. With this
-invariant, space-efficient tail calls are straightforward to
-implement.
+stack arguments, so we simplify the process by imposing an invariant
+that no function passes arguments that way. With this invariant,
+space-efficient tail calls are straightforward to implement.
 
 
 \begin{figure}[tbp]
 \centering
@@ -5453,12 +5460,33 @@ kinds of AST nodes to any of the intermediate languages?
 First, we need to transform functions to operate on at most five
 arguments.  There are a total of six registers for passing arguments
 used in the convention previously mentioned, and we will reserve one
-for future use with higher-order functions~\ref{ch:lambdas}. A simple
-strategy for imposing an argument limit of length $n$ is to take all
-arguments $i$ where $i \geq n$ and pack them into a vector, making
-that subsequent vector the $n$th argument, and replacing all
-occurrances of the $i$th variable in the body with a projection from
-the vector. This pass, \code{limit-functions}, can operate directly on $R_4$.
+for future use with higher-order functions (as explained in
+Chapter~\ref{ch:lambdas}). A simple strategy for imposing a limit of
+$n$ arguments is to take all arguments $i$ where $i \geq n$ and pack
+them into a vector, making that vector the $n$th argument.
+
+\begin{tabular}{lll}
+\begin{minipage}{0.2\textwidth}
+\begin{lstlisting}
+  (|$f$| |$x_1$| |$\ldots$| |$x_n$|) 
+\end{lstlisting}
+\end{minipage}
+&
+$\Rightarrow$
+&
+\begin{minipage}{0.4\textwidth}
+\begin{lstlisting}
+(|$f$| |$x_1$| |$\ldots$| |$x_5$| (vector |$x_6$| |$\ldots$| |$x_n$|))
+\end{lstlisting}
+\end{minipage}
+\end{tabular}
+
+Additionally, all occurrences of the $i$th argument (where $i>5$) in
+the body must be replaced with a projection from the vector. A pass
+that limits function arguments in this way (which we will name
+\code{limit-functions}) can operate directly on $R_4$.
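+
+For example, a hypothetical seven-parameter function would be
+transformed as follows, with the occurrence of the seventh parameter
+becoming a projection (the names \code{f}, \code{x1}, and \code{vec}
+are ours, just for illustration):
+\begin{lstlisting}
+   (define (f [x1 : Integer] |$\ldots$| [x7 : Integer]) : Integer
+      (+ x1 x7))
+|$\Rightarrow$|
+   (define (f [x1 : Integer] |$\ldots$| [x5 : Integer]
+              [vec : (Vector Integer Integer)]) : Integer
+      (+ x1 (vector-ref vec 1)))
+\end{lstlisting}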
+
 
 
 \begin{figure}[tp]
 \centering
@@ -5489,11 +5517,11 @@ the vector. This pass, \code{limit-functions}, can operate directly on $R_4$.
 \label{fig:f1-syntax}
 \end{figure}
 
 
-The syntax of $R_4$ is inconvenient for purposes of compilation
-because it conflates the use of function names and local variables and
-it conflates the application of primitive operations and the
-application of functions. This is a problem because we need to compile
-the use of a function name differently than the use of a local
+Going forward, the syntax of $R_4$ is inconvenient for purposes of
+compilation because it conflates the use of function names and local
+variables and it conflates the application of primitive operations and
+the application of functions. This is a problem because we need to
+compile the use of a function name differently than the use of a local
 variable; we need to use \code{leaq} to move the function name to a
 register. Similarly, the application of a function is going to require
 a complex sequence of instructions, unlike the primitive
@@ -5503,14 +5531,22 @@ function references from just a symbol $f$ to \code{(function-ref
   $\ldots$ $e_n$)} to the explicitly tagged AST \code{(app $e_0$ $e_1$
   $\ldots$ $e_n$)} or \code{(tailcall $e_0$ $e_1$ $\ldots$ $e_n$)}. A
 good name for this pass is \code{reveal-functions} and the output
-language, $F_1$, is defined in Figure~\ref{fig:f1-syntax}. Placing
-this pass after \code{uniquify} is a good idea, because it will make
-sure that there are no local variables and functions that share the
-same name. On the other hand, \code{reveal-functions} needs to come
-before the \code{flatten} pass because \code{flatten} will help us
-compile \code{function-ref}.  Figure~\ref{fig:c3-syntax} defines the
-syntax for $C_3$, the output of
-\key{flatten}.
+language, $F_1$, is defined in Figure~\ref{fig:f1-syntax}.
+
+Distinguishing between calls in tail position and non-tail position
+requires the pass to have some notion of context. We recommend that
+the function take an additional Boolean argument indicating whether
+the expression it is considering is in tail position. For example,
+when handling a conditional expression \code{(if $e_1$ $e_2$ $e_3$)}
+in tail position, both $e_2$ and $e_3$ are also in tail position,
+while $e_1$ is not.
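+
+The following sketch (using made-up names \code{reveal-exp} and
+\code{tail?}, with the remaining match clauses elided) shows one way
+to thread this flag through an \code{if} expression:
+\begin{lstlisting}
+   (define (reveal-exp e tail?)
+     (match e
+       [`(if ,cnd ,thn ,els)
+        `(if ,(reveal-exp cnd #f)      ; the condition is never in tail position
+             ,(reveal-exp thn tail?)   ; both branches inherit the context
+             ,(reveal-exp els tail?))]
+       ...))
+\end{lstlisting}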
+
+Placing this pass after \code{uniquify} is a good idea, because it
+ensures that no local variable shares a name with a
+function. On the other hand, \code{reveal-functions} needs
+to come before the \code{flatten} pass because \code{flatten} will
+help us compile \code{function-ref}.  Figure~\ref{fig:c3-syntax}
+defines the syntax for $C_3$, the output of \key{flatten}.
 
 
 
 
 \begin{figure}[tp]
@@ -5534,6 +5570,7 @@ syntax for $C_3$, the output of
       &\mid& \gray{ (\key{collect} \,\itm{int}) }
        \mid \gray{ (\key{allocate} \,\itm{int}) }\\
       &\mid& \gray{ (\key{call-live-roots}\,(\Var^{*}) \,\Stmt^{*}) } \\
+      &\mid& (\key{tailcall} \,\Arg\,\Arg^{*}) \\
   \Def &::=& (\key{define}\; (\itm{label} \; [\Var \key{:} \Type]^{*}) \key{:} \Type \; \Stmt^{+}) \\
 C_3 & ::= & (\key{program}\;(\Var^{*})\;(\key{type}\;\textit{type})\;(\key{defines}\,\Def^{*})\;\Stmt^{+})
 \end{array}
@@ -5568,6 +5605,11 @@ $\Rightarrow$
 \end{minipage}
 \end{tabular} \\
 %
+Note that in the syntax for $C_3$, tail calls are statements, not
+expressions. Once we perform a tail call, we never expect it to
+return a value to us, so \code{flatten} must handle \code{app} and
+\code{tailcall} forms differently.
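+
+For example, whereas \code{flatten} turns an application in non-tail
+position into an assignment such as
+\begin{lstlisting}
+   (assign |\itm{lhs}| (app |\itm{fun}| |\itm{args}| |$\ldots$|))
+\end{lstlisting}
+an application in tail position can be flattened directly into a
+statement of the form
+\begin{lstlisting}
+   (tailcall |\itm{fun}| |\itm{args}| |$\ldots$|)
+\end{lstlisting}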
+
 The output of select instructions is a program in the x86$_3$
 language, whose syntax is defined in Figure~\ref{fig:x86-3}.
 
 
@@ -5595,7 +5637,8 @@ language, whose syntax is defined in Figure~\ref{fig:x86-3}.
        \mid  (\key{jmp} \; \itm{label})
        \mid (\key{j}\itm{cc} \; \itm{label})
        \mid (\key{label} \; \itm{label})  } \\
-     &\mid& (\key{indirect-callq}\;\Arg ) \mid (\key{leaq}\;\Arg\;\Arg)\\
+     &\mid& (\key{indirect-callq}\;\Arg ) \mid (\key{indirect-jmp}\;\Arg) \\
+     &\mid& (\key{leaq}\;\Arg\;\Arg)\\
 \Def &::= & (\key{define} \; (\itm{label}) \;\itm{int} \;\itm{info}\; \Instr^{+})\\
 x86_3 &::= & (\key{program} \;\itm{info} \;(\key{type}\;\itm{type})\;
                (\key{defines}\,\Def^{*}) \; \Instr^{+})
@@ -5621,38 +5664,41 @@ local variables in the $\Var^{*}$ field as shown below.
 \end{lstlisting}
 In the \code{select-instructions} pass, we need to encode the
 parameter passing in terms of the conventions discussed in
-Section~\ref{sec:fun-x86}. So depending on the length of the parameter
-list \itm{xs}, some of them may be in registers and some of them may
-be on the stack. I recommend generating \code{movq} instructions to
-move the parameters from their registers and stack locations into the
-variables \itm{xs}, then let register allocation handle the assignment
-of those variables to homes. After this pass, the \itm{xs} can be
-added to the list of local variables. As mentioned in
-Section~\ref{sec:fun-x86}, we need to find out how far to move the
-stack pointer to ensure we have enough space for stack arguments in
-all the calls inside the body of this function. This pass is a good
-place to do this and store the result in the \itm{maxStack} field of
-the output \code{define} shown below.
-\begin{lstlisting}
-  (define (|$f$|) |\itm{numParams}| (|$\Var^{*}$| |\itm{maxStack}|) |$\Instr^{+}$|)
-\end{lstlisting}
-
-Next, consider the compilation of function applications, which have
+Section~\ref{sec:fun-x86}: generate a \code{movq} instruction for
+each parameter, moving its value from the appropriate argument-passing
+register into the corresponding variable in \itm{xs}.
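+For example, for a hypothetical two-parameter function with parameters
+\code{x} and \code{y}, the generated instructions might be
+\begin{lstlisting}
+   (movq (reg rdi) (var x))
+   (movq (reg rsi) (var y))
+\end{lstlisting}
+after which the register allocator assigns \code{x} and \code{y} to homes.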
+%% I recommend generating \code{movq} instructions to
+%% move the parameters from their registers and stack locations into the
+%% variables \itm{xs}, then let register allocation handle the assignment
+%% of those variables to homes.
+%% After this pass, the \itm{xs} can be
+%% added to the list of local variables. As mentioned in
+%% Section~\ref{sec:fun-x86}, we need to find out how far to move the
+%% stack pointer to ensure we have enough space for stack arguments in
+%% all the calls inside the body of this function. This pass is a good
+%% place to do this and store the result in the \itm{maxStack} field of
+%% the output \code{define} shown below.
+%% \begin{lstlisting}
+%%   (define (|$f$|) |\itm{numParams}| (|$\Var^{*}$| |\itm{maxStack}|) |$\Instr^{+}$|)
+%% \end{lstlisting}
+
+Next, consider the compilation of non-tail function applications, which have
 the following form at the start of \code{select-instructions}.
 \begin{lstlisting}
   (assign |\itm{lhs}| (app |\itm{fun}| |\itm{args}| |$\ldots$|))
 \end{lstlisting}
 In the mirror image of handling the parameters of function
-definitions, some of the arguments \itm{args} need to be moved to the
-argument passing registers and the rest should be moved to the
-appropriate stack locations, as discussed in
+definitions, the arguments \itm{args} need to be moved to the
+argument passing registers, as discussed in
 Section~\ref{sec:fun-x86}.
+%% and the rest should be moved to the
+%% appropriate stack locations, 
 %% You might want to introduce a new kind of AST node for stack
 %% arguments, \code{(stack-arg $i$)} where $i$ is the index of this
 %% argument with respect to the other stack arguments.
-As you're generating the code for parameter passing, take note of how
-many stack arguments are needed for purposes of computing the
-\itm{maxStack} discussed above.
+%% As you're generating the code for parameter passing, take note of how
+%% many stack arguments are needed for purposes of computing the
+%% \itm{maxStack} discussed above.
 
 
 Once the instructions for parameter passing have been generated, the
 function call itself can be performed with an indirect function call,
@@ -5664,6 +5710,15 @@ is stored in \code{rax}, so it needs to be moved into the \itm{lhs}.
   (movq (reg rax) |\itm{lhs}|)
 \end{lstlisting}
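+
+Putting these pieces together, an assignment such as
+\code{(assign x (app f.1 y))} (where \code{x}, \code{y}, and \code{f.1}
+are hypothetical variables, and \code{f.1} already holds the function's
+address thanks to an earlier \code{leaq}) might select to something
+like the following.
+\begin{lstlisting}
+   (movq (var y) (reg rdi))
+   (indirect-callq (var f.1))
+   (movq (reg rax) (var x))
+\end{lstlisting}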
 
 
+Handling function applications in tail position is only slightly
+different. The parameter passing is the same as for non-tail calls,
+but the tail call itself cannot use the \code{indirect-callq} form.
+Instead, \code{select-instructions} generates an \code{indirect-jmp}
+form, reflecting the fact that we intend to eventually use a \code{jmp}
+rather than a \code{callq} for the tail call. Of course, the
+\code{movq} from \code{rax} is not necessary after a tail call.
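+
+Continuing the hypothetical example above, the tail call
+\code{(tailcall f.1 y)} might instead select to
+\begin{lstlisting}
+   (movq (var y) (reg rdi))
+   (indirect-jmp (var f.1))
+\end{lstlisting}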
+
+
 The rest of the passes need only minor modifications to handle the new
 kinds of AST nodes: \code{function-ref}, \code{indirect-callq}, and
 \code{leaq}. Inside \code{uncover-live}, when computing the $W$ set
@@ -5672,16 +5727,41 @@ recommend including all the caller-saved registers, which will have
 the effect of making sure that no caller-saved register actually needs
 to be saved. In \code{patch-instructions}, you should deal with the
 x86 idiosyncrasy that the destination argument of \code{leaq} must be
-a register.
+a register. Additionally, \code{patch-instructions} should ensure that
+the \code{indirect-jmp} argument is \itm{rax}, our reserved
+register---this is to make code generation more convenient, because
+we will be trampling many registers before the tail call (as explained
+below).
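+For example, if the argument ended up in some other register, say
+\code{rbx}, then \code{patch-instructions} can insert a move:
+\begin{lstlisting}
+   (indirect-jmp (reg rbx))
+|$\Rightarrow$|
+   (movq (reg rbx) (reg rax))
+   (indirect-jmp (reg rax))
+\end{lstlisting}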
 
 
-For the \code{print-x86} pass, I recommend the following translations:
+For the \code{print-x86} pass, we recommend the following translations:
 \begin{lstlisting}
   (function-ref |\itm{label}|) |$\Rightarrow$| |\itm{label}|(%rip)
   (indirect-callq |\itm{arg}|) |$\Rightarrow$| callq *|\itm{arg}|
 \end{lstlisting}
-For function definitions, the \code{print-x86} pass should add the
-code for saving and restoring the callee-saved registers, if you
-haven't already done that.
+Handling \code{indirect-jmp} requires a bit more care. The
+straightforward translation of \code{indirect-jmp} is
+\code{jmp *$\itm{arg}$}, which is indeed what we want to emit,
+but \emph{before} this jump we need to pop the saved registers
+and reset the frame pointer. This is why it was convenient to
+ensure that the \code{jmp} argument was \itm{rax}. A sufficiently
+clever compiler could determine that a function body always ends
+in a tail call and thus avoid generating the epilogue code that
+restores registers and returns via \code{ret}, but for simplicity
+we do not do this.
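+
+Concretely, the code emitted for an \code{indirect-jmp} whose argument
+is \code{rax} might look something like the sketch below, where the
+exact sequence depends on how much stack space your prologue allocated
+and which callee-saved registers it pushed:
+\begin{lstlisting}
+   addq $|\itm{framesize}|, %rsp    # deallocate the local variables
+   popq %rbx                        # restore the pushed callee-saved registers
+   popq %rbp                        # reset the frame pointer
+   jmp *%rax                        # jump to the function being tail-called
+\end{lstlisting}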
+
+\margincomment{\footnotesize The reason we can't easily optimize
+  this is that the details of the function prologue and epilogue
+  are not exposed in the AST; they are just emitted as strings in
+  \code{print-x86}.}
+
+As this implies, when generating code for function definitions your
+\code{print-x86} pass needs to add the code for saving and restoring
+callee-saved registers, if you have not already implemented that.
+
+%% For function definitions, the \code{print-x86} pass should add the
+%% code for saving and restoring the callee-saved registers, if you
+%% haven't already done that.
 
 
 \section{An Example Translation}