more prose explaining tail call compilation

Michael Vollmer 6 years ago
commit bbda9c4673

1 changed file with 135 additions and 55 deletions: book.tex

@@ -5396,21 +5396,28 @@ said to be a \emph{tail call}. In the case of a tail call, whatever
 the callee returns will be immediately returned by the caller, so the
 call can be optimized into a \code{jmp} instruction---the caller will
 jump to the new function, maintaining the same frame and return
-address.
+address. As with the indirect function call, we write an indirect
+jump with the register prefixed by an asterisk.
+
+\begin{lstlisting}
+   jmp *%rax
+\end{lstlisting}
 
 
 A common use case for this optimization is \emph{tail recursion}: a
 function that calls itself in the tail position is essentially a loop,
 and if it does not grow the stack on each call it can act like
 one. Functional languages like Racket and Scheme typically rely
-heavily on recursion, and so they typically guarantee that \emph{all}
-tail calls will be optimized in this way.
+heavily on function calls, and so they typically guarantee that
+\emph{all} tail calls will be optimized in this way, not just
+recursive calls from a function to itself.
+
+\margincomment{\scriptsize To do: better motivate guaranteed tail calls? -mv}
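+
+For example, in the following small program (our own example, not one
+from the book's test suite) the recursive call to \code{sum} is in tail
+position, so a compiler that optimizes tail calls can run \code{sum} in
+constant stack space, just like a loop:
+\begin{lstlisting}
+   (program
+     (define (sum [n : Integer] [acc : Integer]) : Integer
+        (if (eq? n 0) acc (sum (+ n -1) (+ acc n))))
+     (sum 5 0))
+\end{lstlisting}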
 
 
 If we were to stick to the calling convention used by C compilers like
 \code{gcc}, it would be awkward to optimize tail calls that require
-stack arguments, so we have simplify the process by imposing an
-invariant that no function passes arguments that way. With this
-invariant, space-efficient tail calls are straightforward to
-implement.
+stack arguments, so we simplify the process by imposing an invariant
+that no function passes arguments that way. With this invariant,
+space-efficient tail calls are straightforward to implement.
 
 
 \begin{figure}[tbp]
 \centering
@@ -5453,12 +5460,33 @@ kinds of AST nodes to any of the intermediate languages?
 First, we need to transform functions to operate on at most five
 arguments.  There are a total of six registers for passing arguments
 used in the convention previously mentioned, and we will reserve one
-for future use with higher-order functions~\ref{ch:lambdas}. A simple
-strategy for imposing an argument limit of length $n$ is to take all
-arguments $i$ where $i \geq n$ and pack them into a vector, making
-that subsequent vector the $n$th argument, and replacing all
-occurrances of the $i$th variable in the body with a projection from
-the vector. This pass, \code{limit-functions}, can operate directly on $R_4$.
+for future use with higher-order functions (as explained in
+Chapter~\ref{ch:lambdas}). A simple strategy for imposing a limit of
+$n$ arguments is to take all arguments $i$ where $i \geq n$ and pack
+them into a vector, making that vector the $n$th argument.
+
+\begin{tabular}{lll}
+\begin{minipage}{0.2\textwidth}
+\begin{lstlisting}
+  (|$f$| |$x_1$| |$\ldots$| |$x_n$|) 
+\end{lstlisting}
+\end{minipage}
+&
+$\Rightarrow$
+&
+\begin{minipage}{0.4\textwidth}
+\begin{lstlisting}
+(|$f$| |$x_1$| |$\ldots$| |$x_5$| (vector |$x_6$| |$\ldots$| |$x_n$|))
+\end{lstlisting}
+\end{minipage}
+\end{tabular}
+
+Additionally, all occurrences of the $i$th argument (where $i>5$) in
+the body must be replaced with a projection from the vector. A pass
+that limits function arguments in this way (which we will name
+\code{limit-functions}) can operate directly on $R_4$.
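+
+For example, a hypothetical seven-parameter function would be
+transformed as follows, with the occurrence of the seventh parameter
+becoming a projection (the names \code{f}, \code{x1}, and \code{vec}
+are ours, just for illustration):
+\begin{lstlisting}
+   (define (f [x1 : Integer] |$\ldots$| [x7 : Integer]) : Integer
+      (+ x1 x7))
+|$\Rightarrow$|
+   (define (f [x1 : Integer] |$\ldots$| [x5 : Integer]
+              [vec : (Vector Integer Integer)]) : Integer
+      (+ x1 (vector-ref vec 1)))
+\end{lstlisting}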
+
 
 
 \begin{figure}[tp]
 \centering
@@ -5489,11 +5517,11 @@ the vector. This pass, \code{limit-functions}, can operate directly on $R_4$.
 \label{fig:f1-syntax}
 \end{figure}
 
 
-The syntax of $R_4$ is inconvenient for purposes of compilation
-because it conflates the use of function names and local variables and
-it conflates the application of primitive operations and the
-application of functions. This is a problem because we need to compile
-the use of a function name differently than the use of a local
+Going forward, the syntax of $R_4$ is inconvenient for purposes of
+compilation because it conflates the use of function names and local
+variables and it conflates the application of primitive operations and
+the application of functions. This is a problem because we need to
+compile the use of a function name differently than the use of a local
 variable; we need to use \code{leaq} to move the function name to a
 register. Similarly, the application of a function is going to require
 a complex sequence of instructions, unlike the primitive
@@ -5503,14 +5531,22 @@ function references from just a symbol $f$ to \code{(function-ref
   $\ldots$ $e_n$)} to the explicitly tagged AST \code{(app $e_0$ $e_1$
   $\ldots$ $e_n$)} or \code{(tailcall $e_0$ $e_1$ $\ldots$ $e_n$)}. A
 good name for this pass is \code{reveal-functions} and the output
-language, $F_1$, is defined in Figure~\ref{fig:f1-syntax}. Placing
-this pass after \code{uniquify} is a good idea, because it will make
-sure that there are no local variables and functions that share the
-same name. On the other hand, \code{reveal-functions} needs to come
-before the \code{flatten} pass because \code{flatten} will help us
-compile \code{function-ref}.  Figure~\ref{fig:c3-syntax} defines the
-syntax for $C_3$, the output of
-\key{flatten}.
+language, $F_1$, is defined in Figure~\ref{fig:f1-syntax}.
+
+Distinguishing between calls in tail position and non-tail position
+requires the pass to have some notion of context. We recommend that
+the function take an additional Boolean argument indicating whether
+the expression it is considering is in tail position. For example,
+when handling a conditional expression \code{(if $e_1$ $e_2$ $e_3$)}
+in tail position, both $e_2$ and $e_3$ are also in tail position,
+while $e_1$ is not.
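+
+The following sketch (using made-up names \code{reveal-exp} and
+\code{tail?}, with the remaining match clauses elided) shows one way
+to thread this flag through an \code{if} expression:
+\begin{lstlisting}
+   (define (reveal-exp e tail?)
+     (match e
+       [`(if ,cnd ,thn ,els)
+        `(if ,(reveal-exp cnd #f)      ; the condition is never in tail position
+             ,(reveal-exp thn tail?)   ; both branches inherit the context
+             ,(reveal-exp els tail?))]
+       ...))
+\end{lstlisting}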
+
+Placing this pass after \code{uniquify} is a good idea, because it
+ensures that no local variable shares a name with a
+function. On the other hand, \code{reveal-functions} needs
+to come before the \code{flatten} pass because \code{flatten} will
+help us compile \code{function-ref}.  Figure~\ref{fig:c3-syntax}
+defines the syntax for $C_3$, the output of \key{flatten}.
 
 
 
 
 \begin{figure}[tp]
@@ -5534,6 +5570,7 @@ syntax for $C_3$, the output of
       &\mid& \gray{ (\key{collect} \,\itm{int}) }
        \mid \gray{ (\key{allocate} \,\itm{int}) }\\
       &\mid& \gray{ (\key{call-live-roots}\,(\Var^{*}) \,\Stmt^{*}) } \\
+      &\mid& (\key{tailcall} \,\Arg\,\Arg^{*}) \\
   \Def &::=& (\key{define}\; (\itm{label} \; [\Var \key{:} \Type]^{*}) \key{:} \Type \; \Stmt^{+}) \\
 C_3 & ::= & (\key{program}\;(\Var^{*})\;(\key{type}\;\textit{type})\;(\key{defines}\,\Def^{*})\;\Stmt^{+})
 \end{array}
@@ -5568,6 +5605,11 @@ $\Rightarrow$
 \end{minipage}
 \end{tabular} \\
 %
+Note that in the syntax for $C_3$, tail calls are statements, not
+expressions. Once we perform a tail call, we never expect it to
+return a value to us, so \code{flatten} must handle \code{app} and
+\code{tailcall} forms differently.
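+
+For example, whereas \code{flatten} turns an application in non-tail
+position into an assignment such as
+\begin{lstlisting}
+   (assign |\itm{lhs}| (app |\itm{fun}| |\itm{args}| |$\ldots$|))
+\end{lstlisting}
+an application in tail position can be flattened directly into a
+statement of the form
+\begin{lstlisting}
+   (tailcall |\itm{fun}| |\itm{args}| |$\ldots$|)
+\end{lstlisting}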
+
 The output of select instructions is a program in the x86$_3$
 language, whose syntax is defined in Figure~\ref{fig:x86-3}.
 
 
@@ -5595,7 +5637,8 @@ language, whose syntax is defined in Figure~\ref{fig:x86-3}.
        \mid  (\key{jmp} \; \itm{label})
        \mid (\key{j}\itm{cc} \; \itm{label})
        \mid (\key{label} \; \itm{label})  } \\
-     &\mid& (\key{indirect-callq}\;\Arg ) \mid (\key{leaq}\;\Arg\;\Arg)\\
+     &\mid& (\key{indirect-callq}\;\Arg ) \mid (\key{indirect-jmp}\;\Arg) \\
+     &\mid& (\key{leaq}\;\Arg\;\Arg)\\
 \Def &::= & (\key{define} \; (\itm{label}) \;\itm{int} \;\itm{info}\; \Instr^{+})\\
 x86_3 &::= & (\key{program} \;\itm{info} \;(\key{type}\;\itm{type})\;
                (\key{defines}\,\Def^{*}) \; \Instr^{+})
@@ -5621,38 +5664,41 @@ local variables in the $\Var^{*}$ field as shown below.
 \end{lstlisting}
 In the \code{select-instructions} pass, we need to encode the
 parameter passing in terms of the conventions discussed in
-Section~\ref{sec:fun-x86}. So depending on the length of the parameter
-list \itm{xs}, some of them may be in registers and some of them may
-be on the stack. I recommend generating \code{movq} instructions to
-move the parameters from their registers and stack locations into the
-variables \itm{xs}, then let register allocation handle the assignment
-of those variables to homes. After this pass, the \itm{xs} can be
-added to the list of local variables. As mentioned in
-Section~\ref{sec:fun-x86}, we need to find out how far to move the
-stack pointer to ensure we have enough space for stack arguments in
-all the calls inside the body of this function. This pass is a good
-place to do this and store the result in the \itm{maxStack} field of
-the output \code{define} shown below.
-\begin{lstlisting}
-  (define (|$f$|) |\itm{numParams}| (|$\Var^{*}$| |\itm{maxStack}|) |$\Instr^{+}$|)
-\end{lstlisting}
-
-Next, consider the compilation of function applications, which have
+Section~\ref{sec:fun-x86}: generate a \code{movq} instruction for
+each parameter, moving its value from the appropriate argument-passing
+register into the corresponding variable in \itm{xs}.
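+For example, for a hypothetical two-parameter function with parameters
+\code{x} and \code{y}, the generated instructions might be
+\begin{lstlisting}
+   (movq (reg rdi) (var x))
+   (movq (reg rsi) (var y))
+\end{lstlisting}
+after which the register allocator assigns \code{x} and \code{y} to homes.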
+%% I recommend generating \code{movq} instructions to
+%% move the parameters from their registers and stack locations into the
+%% variables \itm{xs}, then let register allocation handle the assignment
+%% of those variables to homes.
+%% After this pass, the \itm{xs} can be
+%% added to the list of local variables. As mentioned in
+%% Section~\ref{sec:fun-x86}, we need to find out how far to move the
+%% stack pointer to ensure we have enough space for stack arguments in
+%% all the calls inside the body of this function. This pass is a good
+%% place to do this and store the result in the \itm{maxStack} field of
+%% the output \code{define} shown below.
+%% \begin{lstlisting}
+%%   (define (|$f$|) |\itm{numParams}| (|$\Var^{*}$| |\itm{maxStack}|) |$\Instr^{+}$|)
+%% \end{lstlisting}
+
+Next, consider the compilation of non-tail function applications, which have
 the following form at the start of \code{select-instructions}.
 \begin{lstlisting}
   (assign |\itm{lhs}| (app |\itm{fun}| |\itm{args}| |$\ldots$|))
 \end{lstlisting}
 In the mirror image of handling the parameters of function
-definitions, some of the arguments \itm{args} need to be moved to the
-argument passing registers and the rest should be moved to the
-appropriate stack locations, as discussed in
+definitions, the arguments \itm{args} need to be moved to the
+argument passing registers, as discussed in
 Section~\ref{sec:fun-x86}.
+%% and the rest should be moved to the
+%% appropriate stack locations, 
 %% You might want to introduce a new kind of AST node for stack
 %% arguments, \code{(stack-arg $i$)} where $i$ is the index of this
 %% argument with respect to the other stack arguments.
-As you're generating the code for parameter passing, take note of how
-many stack arguments are needed for purposes of computing the
-\itm{maxStack} discussed above.
+%% As you're generating the code for parameter passing, take note of how
+%% many stack arguments are needed for purposes of computing the
+%% \itm{maxStack} discussed above.
 
 
 Once the instructions for parameter passing have been generated, the
 function call itself can be performed with an indirect function call,
@@ -5664,6 +5710,15 @@ is stored in \code{rax}, so it needs to be moved into the \itm{lhs}.
   (movq (reg rax) |\itm{lhs}|)
 \end{lstlisting}
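+
+Putting these pieces together, an assignment such as
+\code{(assign x (app f.1 y))} (where \code{x}, \code{y}, and \code{f.1}
+are hypothetical variables, and \code{f.1} already holds the function's
+address thanks to an earlier \code{leaq}) might select to something
+like the following.
+\begin{lstlisting}
+   (movq (var y) (reg rdi))
+   (indirect-callq (var f.1))
+   (movq (reg rax) (var x))
+\end{lstlisting}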
 
 
+Handling function applications in tail position is only slightly
+different. The parameter passing is the same as for non-tail calls,
+but the tail call itself cannot use the \code{indirect-callq} form.
+Instead, \code{select-instructions} generates an \code{indirect-jmp}
+form, reflecting the fact that we intend to eventually use a \code{jmp}
+rather than a \code{callq} for the tail call. Of course, the
+\code{movq} from \code{rax} is not necessary after a tail call.
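+
+Continuing the hypothetical example above, the tail call
+\code{(tailcall f.1 y)} might instead select to
+\begin{lstlisting}
+   (movq (var y) (reg rdi))
+   (indirect-jmp (var f.1))
+\end{lstlisting}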
+
+
 The rest of the passes need only minor modifications to handle the new
 kinds of AST nodes: \code{function-ref}, \code{indirect-callq}, and
 \code{leaq}. Inside \code{uncover-live}, when computing the $W$ set
@@ -5672,16 +5727,41 @@ recommend including all the caller-saved registers, which will have
 the effect of making sure that no caller-saved register actually needs
 to be saved. In \code{patch-instructions}, you should deal with the
 x86 idiosyncrasy that the destination argument of \code{leaq} must be
-a register.
+a register. Additionally, \code{patch-instructions} should ensure that
+the \code{indirect-jmp} argument is \itm{rax}, our reserved
+register---this is to make code generation more convenient, because
+we will be trampling many registers before the tail call (as explained
+below).
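+For example, if the argument ended up in some other register, say
+\code{rbx}, then \code{patch-instructions} can insert a move:
+\begin{lstlisting}
+   (indirect-jmp (reg rbx))
+|$\Rightarrow$|
+   (movq (reg rbx) (reg rax))
+   (indirect-jmp (reg rax))
+\end{lstlisting}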
 
 
-For the \code{print-x86} pass, I recommend the following translations:
+For the \code{print-x86} pass, we recommend the following translations:
 \begin{lstlisting}
   (function-ref |\itm{label}|) |$\Rightarrow$| |\itm{label}|(%rip)
   (indirect-callq |\itm{arg}|) |$\Rightarrow$| callq *|\itm{arg}|
 \end{lstlisting}
-For function definitions, the \code{print-x86} pass should add the
-code for saving and restoring the callee-saved registers, if you
-haven't already done that.
+Handling \code{indirect-jmp} requires a bit more care. The
+straightforward translation of \code{indirect-jmp} is
+\code{jmp *$\itm{arg}$}, which is indeed what we want to emit,
+but \emph{before} this jump we need to pop the saved registers
+and reset the frame pointer. This is why it was convenient to
+ensure that the \code{jmp} argument was \itm{rax}. A sufficiently
+clever compiler could determine that a function body always ends
+in a tail call and thus avoid generating the epilogue code that
+restores registers and returns via \code{ret}, but for simplicity
+we do not do this.
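+
+Concretely, the code emitted for an \code{indirect-jmp} whose argument
+is \code{rax} might look something like the sketch below, where the
+exact sequence depends on how much stack space your prologue allocated
+and which callee-saved registers it pushed:
+\begin{lstlisting}
+   addq $|\itm{framesize}|, %rsp    # deallocate the local variables
+   popq %rbx                        # restore the pushed callee-saved registers
+   popq %rbp                        # reset the frame pointer
+   jmp *%rax                        # jump to the function being tail-called
+\end{lstlisting}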
+
+\margincomment{\footnotesize The reason we can't easily optimize
+  this is that the details of the function prologue and epilogue
+  are not exposed in the AST; they are just emitted as strings in
+  \code{print-x86}.}
+
+As this implies, when generating code for function definitions your
+\code{print-x86} pass needs to add the code for saving and restoring
+callee-saved registers, if you have not already implemented that.
+
+%% For function definitions, the \code{print-x86} pass should add the
+%% code for saving and restoring the callee-saved registers, if you
+%% haven't already done that.
 
 
 \section{An Example Translation}