start adding tailcalls to ch 6

Michael Vollmer, 6 years ago
commit 14323561f2
1 file changed: 97 insertions, 50 deletions

book.tex (+97 -50)

@@ -5326,44 +5326,52 @@ with an asterisk.
    callq *%rbx
 \end{lstlisting}
 
-The x86 architecture does not directly support passing arguments to
-functions; instead we use a combination of registers and stack
-locations for passing arguments, following the conventions used by
-\code{gcc} as described by \cite{Matz:2013aa}. Up to six arguments may
-be passed in registers, using the registers \code{rdi}, \code{rsi},
+Because the x86 architecture does not have any direct support for
+passing arguments to functions, compiler implementers typically adopt
+a \emph{convention} for how arguments are passed to
+functions. The convention for C compilers such as \code{gcc} (as
+described in \cite{Matz:2013aa}) uses a combination of registers and
+stack locations for passing arguments. Up to six arguments may be
+passed in registers, using the registers \code{rdi}, \code{rsi},
 \code{rdx}, \code{rcx}, \code{r8}, and \code{r9}, in that order.  If
 there are more than six arguments, then the rest must be placed on the
-stack, which we call \emph{stack arguments}, which we discuss in later
-paragraphs. The register \code{rax} is for the return value of the
-function.
-
-Recall from Section~\ref{sec:x86} that the stack is also used for
-local variables and for storing the values of callee-saved registers
-(we shall refer to all of these collectively as ``locals''), and that
-at the beginning of a function we move the stack pointer \code{rsp}
-down to make room for them.
+stack, which we call \emph{stack arguments}. The register \code{rax}
+is for the return value of the function.
+
+We will be using a modification of this convention. For reasons that
+will be explained in subsequent paragraphs, we will not make use of
+stack arguments, and instead restrict functions to passing arguments
+exclusively in registers. To enforce this restriction, functions with
+too many parameters will be transformed to pass the additional arguments in
+a vector.
+
+%% Recall from Section~\ref{sec:x86} that the stack is also used for
+%% local variables and for storing the values of callee-saved registers
+%% (we shall refer to all of these collectively as ``locals''), and that
+%% at the beginning of a function we move the stack pointer \code{rsp}
+%% down to make room for them.
 %% We recommend storing the local variables
 %% first and then the callee-saved registers, so that the local variables
 %% can be accessed using \code{rbp} the same as before the addition of
 %% functions.
-To make additional room for passing arguments, we shall
-move the stack pointer even further down. We count how many stack
-arguments are needed for each function call that occurs inside the
-body of the function and find their maximum. Adding this number to the
-number of locals gives us how much the \code{rsp} should be moved at
-the beginning of the function. In preparation for a function call, we
-offset from \code{rsp} to set up the stack arguments. We put the first
-stack argument in \code{0(\%rsp)}, the second in \code{8(\%rsp)}, and
-so on.
-
-Upon calling the function, the stack arguments are retrieved by the
-callee using the base pointer \code{rbp}. The address \code{16(\%rbp)}
-is the location of the first stack argument, \code{24(\%rbp)} is the
-address of the second, and so on. Figure~\ref{fig:call-frames} shows
-the layout of the caller and callee frames. Notice how important it is
-that we correctly compute the maximum number of arguments needed for
-function calls; if that number is too small then the arguments and
-local variables will smash into each other!
+%% To make additional room for passing arguments, we shall
+%% move the stack pointer even further down. We count how many stack
+%% arguments are needed for each function call that occurs inside the
+%% body of the function and find their maximum. Adding this number to the
+%% number of locals gives us how much the \code{rsp} should be moved at
+%% the beginning of the function. In preparation for a function call, we
+%% offset from \code{rsp} to set up the stack arguments. We put the first
+%% stack argument in \code{0(\%rsp)}, the second in \code{8(\%rsp)}, and
+%% so on.
+
+%% Upon calling the function, the stack arguments are retrieved by the
+%% callee using the base pointer \code{rbp}. The address \code{16(\%rbp)}
+%% is the location of the first stack argument, \code{24(\%rbp)} is the
+%% address of the second, and so on. Figure~\ref{fig:call-frames} shows
+%% the layout of the caller and callee frames. Notice how important it is
+%% that we correctly compute the maximum number of arguments needed for
+%% function calls; if that number is too small then the arguments and
+%% local variables will smash into each other!
 
 
 As discussed in Section~\ref{sec:print-x86-reg-alloc}, an x86 function
 is responsible for following conventions regarding the use of
@@ -5376,6 +5384,33 @@ callee wants to use a callee-saved register, the callee must arrange
 to put the original value back in the register prior to returning to
 the caller.
 
+Figure~\ref{fig:call-frames} shows the layout of the caller and callee
+frames. If we were to use stack arguments, they would be between the
+caller locals and the callee return address. A function call will
+place a new frame onto the stack, growing downward. There are cases,
+however, where we can \emph{replace} the current frame on the stack in
+a function call, rather than add a new frame.
+
+If a call is the last action in a function body, then that call is
+said to be a \emph{tail call}. In the case of a tail call, whatever
+the callee returns will be immediately returned by the caller, so the
+call can be optimized into a \code{jmp} instruction---the caller will
+jump to the new function, maintaining the same frame and return
+address.
+
+A common use case for this optimization is \emph{tail recursion}: a
+function that calls itself in the tail position is essentially a loop,
+and if it does not grow the stack on each call it can act like
+one. Functional languages like Racket and Scheme typically rely
+heavily on recursion, and so they typically guarantee that \emph{all}
+tail calls will be optimized in this way.
+
+If we were to stick to the calling convention used by C compilers like
+\code{gcc}, it would be awkward to optimize tail calls that require
+stack arguments, so we simplify the process by imposing an
+invariant that no function passes arguments that way. With this
+invariant, space-efficient tail calls are straightforward to
+implement.
 
 
 \begin{figure}[tbp]
 \centering
@@ -5386,10 +5421,11 @@ Caller View & Callee View & Contents       & Frame \\ \hline
 -8(\key{\%rbp}) &  & local $1$ \\
 \ldots & & \ldots \\
 $-8k$(\key{\%rbp}) &  & local $k$ \\
- & &  \\
-$8n-8$\key{(\%rsp)} & $8n+8$(\key{\%rbp})& argument $n$ \\
-& \ldots           & \ldots \\
-0\key{(\%rsp)} & 16(\key{\%rbp})  & argument $1$   & \\ \hline
+ %% & &  \\
+%% $8n-8$\key{(\%rsp)} & $8n+8$(\key{\%rbp})& argument $n$ \\
+%% & \ldots           & \ldots \\
+%% 0\key{(\%rsp)} & 16(\key{\%rbp})  & argument $1$   & \\
+\hline
 & 8(\key{\%rbp})   & return address & \multirow{5}{*}{Callee}\\
 & 0(\key{\%rbp})   & old \key{rbp} \\
 & -8(\key{\%rbp})  & local $1$ \\
@@ -5414,6 +5450,16 @@ changes to our compiler, that is, do we need any new passes and/or do
 we need to change any existing passes? Also, do we need to add new
 kinds of AST nodes to any of the intermediate languages?
 
+First, we need to transform functions to operate on at most five
+arguments.  There are a total of six registers for passing arguments
+used in the convention previously mentioned, and we will reserve one
+for future use with higher-order functions (Chapter~\ref{ch:lambdas}). A simple
+strategy for imposing an argument limit of $n$ is to take all
+arguments $i$ where $i \geq n$ and pack them into a vector, making
+that vector the $n$th argument, and replacing all
+occurrences of the $i$th parameter in the body with a projection from
+the vector. This pass, \code{limit-functions}, can operate directly on $R_4$.
+
 \begin{figure}[tp]
 \centering
 \fbox{
@@ -5431,7 +5477,7 @@ kinds of AST nodes to any of the intermediate languages?
   &\mid& \gray{(\key{vector}\;\Exp^{+}) \mid
     (\key{vector-ref}\;\Exp\;\Int)} \\
   &\mid& \gray{(\key{vector-set!}\;\Exp\;\Int\;\Exp)\mid (\key{void})} \\
-      &\mid& (\key{app}\, \Exp \; \Exp^{*}) \\
+      &\mid& (\key{app}\, \Exp \; \Exp^{*}) \mid (\key{tailcall}\, \Exp \; \Exp^{*}) \\
   \Def &::=& (\key{define}\; (\itm{label} \; [\Var \key{:} \Type]^{*}) \key{:} \Type \; \Exp) \\
   F_1 &::=& (\key{program} \; \Def^{*} \; \Exp)
 \end{array}
@@ -5443,11 +5489,11 @@ kinds of AST nodes to any of the intermediate languages?
 \label{fig:f1-syntax}
 \end{figure}
 
-The syntax of $R_4$ is inconvenient for purposes of
-compilation because it conflates the use of function names and local
-variables and it conflates the application of primitive operations and
-the application of functions. This is a problem because we need to
-compile the use of a function name differently than the use of a local
+The syntax of $R_4$ is inconvenient for purposes of compilation
+because it conflates the use of function names and local variables and
+it conflates the application of primitive operations and the
+application of functions. This is a problem because we need to compile
+the use of a function name differently than the use of a local
 variable; we need to use \code{leaq} to move the function name to a
 register. Similarly, the application of a function is going to require
 a complex sequence of instructions, unlike the primitive
@@ -5455,14 +5501,15 @@ operations. Thus, it is a good idea to create a new pass that changes
 function references from just a symbol $f$ to \code{(function-ref
   $f$)} and that changes function application from \code{($e_0$ $e_1$
   $\ldots$ $e_n$)} to the explicitly tagged AST \code{(app $e_0$ $e_1$
-  $\ldots$ $e_n$)}. A good name for this pass is
-\code{reveal-functions} and the output language, $F_1$, is defined in
-Figure~\ref{fig:f1-syntax}. Placing this pass after \code{uniquify} is
-a good idea, because it will make sure that there are no local
-variables and functions that share the same name. On the other hand,
-\code{reveal-functions} needs to come before the \code{flatten} pass
-because \code{flatten} will help us compile \code{function-ref}.
-Figure~\ref{fig:c3-syntax} defines the syntax for $C_3$, the output of
+  $\ldots$ $e_n$)} or \code{(tailcall $e_0$ $e_1$ $\ldots$ $e_n$)}. A
+good name for this pass is \code{reveal-functions} and the output
+language, $F_1$, is defined in Figure~\ref{fig:f1-syntax}. Placing
+this pass after \code{uniquify} is a good idea, because it will make
+sure that there are no local variables and functions that share the
+same name. On the other hand, \code{reveal-functions} needs to come
+before the \code{flatten} pass because \code{flatten} will help us
+compile \code{function-ref}.  Figure~\ref{fig:c3-syntax} defines the
+syntax for $C_3$, the output of
 \key{flatten}.
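
The limit-functions strategy added in this commit (keep the first n-1 parameters, pack the rest into a vector that becomes the nth) can be illustrated with a minimal sketch. This is not the book's implementation: it is written in Python rather than Racket, it represents S-expressions as nested lists, it ignores type annotations, and the names LIMIT, subst, limit_params, limit_args, and rest_vec are hypothetical. It also assumes uniquify has already run, so no local binding shadows a parameter.

```python
LIMIT = 5  # assumed limit: six argument registers, one reserved


def subst(expr, mapping):
    """Replace each variable named in `mapping` by its replacement expression."""
    if isinstance(expr, str):
        return mapping.get(expr, expr)
    if isinstance(expr, list):
        return [subst(e, mapping) for e in expr]
    return expr  # integers and other literals are left alone


def limit_params(params, body, limit=LIMIT):
    """Rewrite a definition so that it takes at most `limit` parameters."""
    if len(params) <= limit:
        return params, body
    kept, packed = params[:limit - 1], params[limit - 1:]
    vec = "rest_vec"  # hypothetical name; a real pass would generate a fresh one
    # Each packed parameter becomes a projection from the vector.
    mapping = {p: ["vector-ref", vec, i] for i, p in enumerate(packed)}
    return kept + [vec], subst(body, mapping)


def limit_args(args, limit=LIMIT):
    """Rewrite a call site to match: pack the extra arguments into a vector."""
    if len(args) <= limit:
        return args
    return args[:limit - 1] + [["vector"] + args[limit - 1:]]
```

Both the definition and every call site have to be rewritten consistently; a function with exactly five parameters is left unchanged, since it already fits in the available registers.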