@@ -5326,44 +5326,52 @@ with an asterisk.
callq *%rbx
\end{lstlisting}

-The x86 architecture does not directly support passing arguments to
-functions; instead we use a combination of registers and stack
-locations for passing arguments, following the conventions used by
-\code{gcc} as described by \cite{Matz:2013aa}. Up to six arguments may
-be passed in registers, using the registers \code{rdi}, \code{rsi},
+Because the x86 architecture has no direct support for passing
+arguments to functions, compiler implementers typically adopt a
+\emph{convention} for how arguments are passed. The convention for C
+compilers such as \code{gcc} (as described by \cite{Matz:2013aa}) uses
+a combination of registers and stack locations for passing arguments.
+Up to six arguments may be passed in registers, using the registers
+\code{rdi}, \code{rsi},
\code{rdx}, \code{rcx}, \code{r8}, and \code{r9}, in that order. If
there are more than six arguments, then the rest must be placed on the
-stack, which we call \emph{stack arguments}, which we discuss in later
-paragraphs. The register \code{rax} is for the return value of the
-function.
-
-Recall from Section~\ref{sec:x86} that the stack is also used for
-local variables and for storing the values of callee-saved registers
-(we shall refer to all of these collectively as ``locals''), and that
-at the beginning of a function we move the stack pointer \code{rsp}
-down to make room for them.
+stack; we call these \emph{stack arguments}. The register \code{rax}
+is for the return value of the function.
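The register assignment described above can be sketched in a few lines of Python. This is only an illustration, not part of the compiler: the helper name \code{argument\_locations} is hypothetical, and the stack offsets assume the caller writes the first stack argument at \code{0(\%rsp)} and each later one 8 bytes higher.

```python
# Registers used for the first six arguments, in order, under the
# convention described above.
ARG_REGISTERS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]


def argument_locations(num_args):
    """Return where each argument goes, from the caller's point of view.

    Hypothetical helper for illustration only. Arguments beyond the sixth
    are assumed to be stack arguments at 0(%rsp), 8(%rsp), and so on.
    """
    locations = []
    for i in range(num_args):
        if i < len(ARG_REGISTERS):
            # The first six arguments travel in registers.
            locations.append("%" + ARG_REGISTERS[i])
        else:
            # The remaining arguments are stack arguments, 8 bytes apart.
            offset = 8 * (i - len(ARG_REGISTERS))
            locations.append(f"{offset}(%rsp)")
    return locations


if __name__ == "__main__":
    for position, where in enumerate(argument_locations(8), start=1):
        print(f"argument {position}: {where}")
```

Calling the helper with eight arguments places the seventh and eighth on the stack, which is exactly the case the modified convention is meant to rule out.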
+
+We will be using a modification of this convention. For reasons that
+will be explained in subsequent paragraphs, we will not make use of
+stack arguments, and instead restrict functions to passing arguments
+exclusively in registers. To enforce this restriction, functions with
+too many arguments will be transformed to pass additional arguments in
+a vector.
+
+%% Recall from Section~\ref{sec:x86} that the stack is also used for
+%% local variables and for storing the values of callee-saved registers
+%% (we shall refer to all of these collectively as ``locals''), and that
+%% at the beginning of a function we move the stack pointer \code{rsp}
+%% down to make room for them.
%% We recommend storing the local variables
%% first and then the callee-saved registers, so that the local variables
%% can be accessed using \code{rbp} the same as before the addition of
%% functions.
-To make additional room for passing arguments, we shall
-move the stack pointer even further down. We count how many stack
-arguments are needed for each function call that occurs inside the
-body of the function and find their maximum. Adding this number to the
-number of locals gives us how much the \code{rsp} should be moved at
-the beginning of the function. In preparation for a function call, we
-offset from \code{rsp} to set up the stack arguments. We put the first
-stack argument in \code{0(\%rsp)}, the second in \code{8(\%rsp)}, and
-so on.
-
-Upon calling the function, the stack arguments are retrieved by the
-callee using the base pointer \code{rbp}. The address \code{16(\%rbp)}
-is the location of the first stack argument, \code{24(\%rbp)} is the
-address of the second, and so on. Figure~\ref{fig:call-frames} shows
-the layout of the caller and callee frames. Notice how important it is
-that we correctly compute the maximum number of arguments needed for
-function calls; if that number is too small then the arguments and
-local variables will smash into each other!
+%% To make additional room for passing arguments, we shall
+%% move the stack pointer even further down. We count how many stack
+%% arguments are needed for each function call that occurs inside the
+%% body of the function and find their maximum. Adding this number to the
+%% number of locals gives us how much the \code{rsp} should be moved at
+%% the beginning of the function. In preparation for a function call, we
+%% offset from \code{rsp} to set up the stack arguments. We put the first
+%% stack argument in \code{0(\%rsp)}, the second in \code{8(\%rsp)}, and
+%% so on.
+
+%% Upon calling the function, the stack arguments are retrieved by the
+%% callee using the base pointer \code{rbp}. The address \code{16(\%rbp)}
+%% is the location of the first stack argument, \code{24(\%rbp)} is the
+%% address of the second, and so on. Figure~\ref{fig:call-frames} shows
+%% the layout of the caller and callee frames. Notice how important it is
+%% that we correctly compute the maximum number of arguments needed for
+%% function calls; if that number is too small then the arguments and
+%% local variables will smash into each other!

As discussed in Section~\ref{sec:print-x86-reg-alloc}, an x86 function
is responsible for following conventions regarding the use of
@@ -5376,6 +5384,33 @@ callee wants to use a callee-saved register, the callee must arrange
to put the original value back in the register prior to returning to
the caller.

+Figure~\ref{fig:call-frames} shows the layout of the caller and callee
+frames. If we were to use stack arguments, they would be between the
+caller locals and the callee return address. A function call will
+place a new frame onto the stack, growing downward. There are cases,
+however, where we can \emph{replace} the current frame on the stack in
+a function call, rather than add a new frame.
+
+If a call is the last action in a function body, then that call is
+said to be a \emph{tail call}. In the case of a tail call, whatever
+the callee returns will be immediately returned by the caller, so the
+call can be optimized into a \code{jmp} instruction: the caller will
+jump to the new function, maintaining the same frame and return
+address.
+
+A common use case for this optimization is \emph{tail recursion}: a
+function that calls itself in the tail position is essentially a loop,
+and if it does not grow the stack on each call it can act like
+one. Functional languages like Racket and Scheme typically rely
+heavily on recursion, and so they guarantee that \emph{all}
+tail calls will be optimized in this way.
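To make the claim that a tail-recursive function is essentially a loop concrete, here is a small Python sketch. The function names are hypothetical and chosen only for this illustration. Python itself does not perform this optimization, so the loop is written by hand to show what a compiler that replaces the tail call with a jump would effectively produce.

```python
def sum_to_recursive(n, acc=0):
    """Sum the integers 1..n with a self call in tail position."""
    if n == 0:
        return acc
    # Tail call: the result of the callee is returned unchanged, so the
    # caller's frame is no longer needed once the call is made.
    return sum_to_recursive(n - 1, acc + n)


def sum_to_loop(n, acc=0):
    """The same computation after turning the tail call into a jump."""
    while n != 0:
        # Rebind the parameters and jump back to the top of the body,
        # reusing the current frame instead of pushing a new one.
        n, acc = n - 1, acc + n
    return acc


if __name__ == "__main__":
    print(sum_to_recursive(100), sum_to_loop(100))
```

The recursive version uses one frame per call and so is limited by the stack depth, whereas the loop version runs in constant stack space for any \code{n}.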
+
+If we were to stick to the calling convention used by C compilers like
+\code{gcc}, it would be awkward to optimize tail calls that require
+stack arguments, so we simplify the process by imposing an
+invariant that no function passes arguments that way. With this
+invariant, space-efficient tail calls are straightforward to
+implement.

\begin{figure}[tbp]
\centering
@@ -5386,10 +5421,11 @@ Caller View & Callee View & Contents & Frame \\ \hline
-8(\key{\%rbp}) & & local $1$ \\
\ldots & & \ldots \\
$-8k$(\key{\%rbp}) & & local $k$ \\
- & & \\
-$8n-8$\key{(\%rsp)} & $8n+8$(\key{\%rbp})& argument $n$ \\
-& \ldots & \ldots \\
-0\key{(\%rsp)} & 16(\key{\%rbp}) & argument $1$ & \\ \hline
+ %% & & \\
+%% $8n-8$\key{(\%rsp)} & $8n+8$(\key{\%rbp})& argument $n$ \\
+%% & \ldots & \ldots \\
+%% 0\key{(\%rsp)} & 16(\key{\%rbp}) & argument $1$ & \\
+\hline
& 8(\key{\%rbp}) & return address & \multirow{5}{*}{Callee}\\
& 0(\key{\%rbp}) & old \key{rbp} \\
& -8(\key{\%rbp}) & local $1$ \\
@@ -5414,6 +5450,16 @@ changes to our compiler, that is, do we need any new passes and/or do
we need to change any existing passes? Also, do we need to add new
kinds of AST nodes to any of the intermediate languages?

+First, we need to transform functions to operate on at most five
+arguments. There are a total of six registers for passing arguments
+used in the convention previously mentioned, and we will reserve one
+for future use with higher-order functions
+(Chapter~\ref{ch:lambdas}). A simple strategy for imposing an argument
+limit of $n$ is to take all arguments $i$ where $i \geq n$, pack them
+into a vector, make that vector the $n$th argument, and replace all
+occurrences of the $i$th variable in the body with a projection from
+the vector. This pass, \code{limit-functions}, can operate directly on $R_4$.
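The packing strategy just described can be sketched as follows. This is a minimal illustration in Python rather than the implementation of \code{limit-functions}: programs are modelled as nested lists, the helper names and the \code{rest-args} parameter name are hypothetical, parameter types are ignored, and we assume that no inner binding shadows a parameter name.

```python
def limit_params(params, body, limit=5, vec_name="rest-args"):
    """Rewrite one definition so that it takes at most `limit` parameters.

    params: list of parameter names (strings).
    body:   expression as nested lists, e.g. ["+", "a", "b"].
    Assumes no inner binding shadows a parameter name.
    """
    if len(params) <= limit:
        return params, body
    kept = params[:limit - 1]      # parameters 1 .. limit-1 stay as they are
    packed = params[limit - 1:]    # parameters limit .. end move into the vector
    index_of = {name: k for k, name in enumerate(packed)}

    def rewrite(expr):
        if isinstance(expr, list):
            return [rewrite(e) for e in expr]
        if isinstance(expr, str) and expr in index_of:
            # Replace the variable with a projection from the vector.
            return ["vector-ref", vec_name, index_of[expr]]
        return expr

    return kept + [vec_name], rewrite(body)


def limit_call(func, args, limit=5):
    """Rewrite one call site to match: pack the extra arguments into a vector."""
    if len(args) <= limit:
        return [func] + args
    return [func] + args[:limit - 1] + [["vector"] + args[limit - 1:]]


if __name__ == "__main__":
    params = ["a", "b", "c", "d", "e", "f", "g"]
    print(limit_params(params, ["+", "a", ["+", "f", "g"]]))
    print(limit_call("h", [1, 2, 3, 4, 5, 6, 7]))
```

With a limit of five, a seven-parameter function keeps its first four parameters and receives the last three in a vector passed as the fifth argument. Call sites have to be rewritten in the same way, so in a full pass both helpers would be applied throughout the program.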
+
\begin{figure}[tp]
\centering
\fbox{
@@ -5431,7 +5477,7 @@ kinds of AST nodes to any of the intermediate languages?
&\mid& \gray{(\key{vector}\;\Exp^{+}) \mid
(\key{vector-ref}\;\Exp\;\Int)} \\
&\mid& \gray{(\key{vector-set!}\;\Exp\;\Int\;\Exp)\mid (\key{void})} \\
- &\mid& (\key{app}\, \Exp \; \Exp^{*}) \\
+ &\mid& (\key{app}\, \Exp \; \Exp^{*}) \mid (\key{tailcall}\, \Exp \; \Exp^{*}) \\
\Def &::=& (\key{define}\; (\itm{label} \; [\Var \key{:} \Type]^{*}) \key{:} \Type \; \Exp) \\
F_1 &::=& (\key{program} \; \Def^{*} \; \Exp)
\end{array}
@@ -5443,11 +5489,11 @@ kinds of AST nodes to any of the intermediate languages?
\label{fig:f1-syntax}
\end{figure}

-The syntax of $R_4$ is inconvenient for purposes of
-compilation because it conflates the use of function names and local
-variables and it conflates the application of primitive operations and
-the application of functions. This is a problem because we need to
-compile the use of a function name differently than the use of a local
+The syntax of $R_4$ is inconvenient for purposes of compilation
+because it conflates the use of function names and local variables and
+it conflates the application of primitive operations and the
+application of functions. This is a problem because we need to compile
+the use of a function name differently than the use of a local
variable; we need to use \code{leaq} to move the function name to a
register. Similarly, the application of a function is going to require
a complex sequence of instructions, unlike the primitive
@@ -5455,14 +5501,15 @@ operations. Thus, it is a good idea to create a new pass that changes
function references from just a symbol $f$ to \code{(function-ref
$f$)} and that changes function application from \code{($e_0$ $e_1$
$\ldots$ $e_n$)} to the explicitly tagged AST \code{(app $e_0$ $e_1$
- $\ldots$ $e_n$)}. A good name for this pass is
-\code{reveal-functions} and the output language, $F_1$, is defined in
-Figure~\ref{fig:f1-syntax}. Placing this pass after \code{uniquify} is
-a good idea, because it will make sure that there are no local
-variables and functions that share the same name. On the other hand,
-\code{reveal-functions} needs to come before the \code{flatten} pass
-because \code{flatten} will help us compile \code{function-ref}.
-Figure~\ref{fig:c3-syntax} defines the syntax for $C_3$, the output of
+ $\ldots$ $e_n$)} or \code{(tailcall $e_0$ $e_1$ $\ldots$ $e_n$)}. A
+good name for this pass is \code{reveal-functions} and the output
+language, $F_1$, is defined in Figure~\ref{fig:f1-syntax}. Placing
+this pass after \code{uniquify} is a good idea, because it will make
+sure that there are no local variables and functions that share the
+same name. On the other hand, \code{reveal-functions} needs to come
+before the \code{flatten} pass because \code{flatten} will help us
+compile \code{function-ref}. Figure~\ref{fig:c3-syntax} defines the
+syntax for $C_3$, the output of
\key{flatten}.