|
@@ -2602,17 +2602,14 @@ all, fast code is useless if it produces incorrect results!
|
|
|
|
|
|
\index{register allocation}
|
|
|
|
|
|
-In Chapter~\ref{ch:int-exp} we placed all variables on the stack to
|
|
|
-make our life easier. However, we can improve the performance of the
|
|
|
-generated code if we instead place some variables into registers. The
|
|
|
-CPU can access a register in a single cycle, whereas accessing the
|
|
|
-stack takes many cycles if the relevant data is in cache or many more
|
|
|
-to access main memory if the data is not in cache.
|
|
|
-Figure~\ref{fig:reg-eg} shows a program with four variables that
|
|
|
-serves as a running example. We show the source program and also the
|
|
|
-output of instruction selection. At that point the program is almost
|
|
|
-x86 assembly but not quite; it still contains variables instead of
|
|
|
-stack locations or registers.
|
|
|
+In Chapter~\ref{ch:int-exp} we learned how to store variables on the
|
|
|
+stack. In this chapter we learn how to improve the performance of the
|
|
|
+generated code by placing some variables into registers. The CPU can
|
|
|
+access a register in a single cycle, whereas accessing the stack can
|
|
|
+take tens to hundreds of cycles. The program in Figure~\ref{fig:reg-eg}
|
|
|
+serves as a running example. The source program is on the left and the
|
|
|
+output of instruction selection is on the right. The program is almost
|
|
|
+in the x86 assembly language, but it still uses variables instead of stack locations or registers.
|
|
|
|
|
|
\begin{figure}
|
|
|
\begin{minipage}{0.45\textwidth}
|
|
@@ -2654,30 +2651,30 @@ start:
|
|
|
\end{figure}
|
|
|
|
|
|
The goal of register allocation is to fit as many variables into
|
|
|
-registers as possible. A program sometimes has more variables than
|
|
|
-registers, so we cannot always map each variable to a different
|
|
|
+registers as possible. Some programs have more variables than
|
|
|
+registers, so we cannot always map each variable to a different
|
|
|
register. Fortunately, it is common for different variables to be
|
|
|
needed during different periods of time during program execution, and
|
|
|
in such cases several variables can be mapped to the same register.
|
|
|
-Consider variables \code{x} and \code{y} in Figure~\ref{fig:reg-eg}.
|
|
|
+Consider variables \code{x} and \code{z} in Figure~\ref{fig:reg-eg}.
|
|
|
After the variable \code{x} is moved to \code{z} it is no longer
|
|
|
-needed. Variable \code{y}, on the other hand, is used only after this
|
|
|
-point, so \code{x} and \code{y} could share the same register. The
|
|
|
+needed. Variable \code{z}, on the other hand, is used only after this
|
|
|
+point, so \code{x} and \code{z} could share the same register. The
|
|
|
topic of Section~\ref{sec:liveness-analysis-r1} is how to compute
|
|
|
where a variable is needed. Once we have that information, we compute
|
|
|
which variables are needed at the same time, i.e., which ones
|
|
|
\emph{interfere} with each other, and represent this relation as an
|
|
|
undirected graph whose vertices are variables and edges indicate when
|
|
|
two variables interfere (Section~\ref{sec:build-interference}). We
|
|
|
-then model register allocation as a graph coloring problem, which we
|
|
|
-discuss in Section~\ref{sec:graph-coloring}.
|
|
|
+then model register allocation as a graph coloring problem
|
|
|
+(Section~\ref{sec:graph-coloring}).
|
|
|
|
|
|
If we run out of registers despite these efforts, we place the
|
|
|
remaining variables on the stack, similar to what we did in
|
|
|
Chapter~\ref{ch:int-exp}. It is common to use the verb \emph{spill}
|
|
|
for assigning a variable to a stack location. The decision to spill a
|
|
|
-variable is handled as part of the graph coloring process described in
|
|
|
-Section~\ref{sec:graph-coloring}.
|
|
|
+variable is handled as part of the graph coloring process
|
|
|
+(Section~\ref{sec:graph-coloring}).
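
The pipeline described above can be illustrated with a small, simplified sketch. It assumes that the sets of live variables at each program point are already known, adds an edge between every pair of variables that are live at the same point, and then colors the graph greedily. The names \code{build\_interference}, \code{greedy\_color}, and \code{spilled}, as well as the variables \code{a}, \code{b}, and \code{c}, are hypothetical and chosen only for illustration. The alphabetical greedy ordering is a stand-in, not the coloring algorithm of Section~\ref{sec:graph-coloring}, and the edge construction is coarser than the one in Section~\ref{sec:build-interference}.

```python
from itertools import combinations


def build_interference(live_sets):
    """Build an undirected interference graph (dict: variable -> set of neighbors).

    Simplifying assumption: two variables interfere whenever they appear
    together in one of the given live sets.
    """
    graph = {}
    for live in live_sets:
        for v in live:
            graph.setdefault(v, set())
        for u, v in combinations(sorted(live), 2):
            graph[u].add(v)
            graph[v].add(u)
    return graph


def greedy_color(graph):
    """Give each variable the lowest color not used by an already-colored neighbor.

    Variables are visited in alphabetical order (an illustrative choice).
    """
    coloring = {}
    for v in sorted(graph):
        taken = {coloring[n] for n in graph[v] if n in coloring}
        color = 0
        while color in taken:
            color += 1
        coloring[v] = color
    return coloring


def spilled(coloring, num_registers):
    """Variables whose color does not correspond to an available register."""
    return sorted(v for v, c in coloring.items() if c >= num_registers)


# Hypothetical live sets: a dies before c becomes live, b overlaps both.
live_sets = [{"a"}, {"a", "b"}, {"b"}, {"b", "c"}, {"c"}]
graph = build_interference(live_sets)
coloring = greedy_color(graph)
```

In this hypothetical input, \code{a} and \code{c} are never live at the same time, so they receive the same color and could share a register. With only one register available, \code{spilled(coloring, 1)} reports that \code{b} would go to the stack.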
|
|
|
|
|
|
We make the simplifying assumption that each variable is assigned to
|
|
|
one location (a register or stack address). A more sophisticated
|
|
@@ -2687,8 +2684,7 @@ in short sequence and then only used again after many other
|
|
|
instructions, it could be more efficient to assign the variable to a
|
|
|
register during the initial sequence and then move it to the stack for
|
|
|
the rest of its lifetime. We refer the interested reader to
|
|
|
-\citet{Cooper:1998ly} and \citet{Cooper:2011aa} for more information
|
|
|
-about that approach.
|
|
|
+\citet{Cooper:2011aa} for more information about that approach.
|
|
|
|
|
|
% discuss prioritizing variables based on how much they are used.
|
|
|
|
|
@@ -2698,18 +2694,19 @@ about that approach.
|
|
|
|
|
|
As we perform register allocation, we need to be aware of the
|
|
|
\emph{calling conventions} \index{calling conventions} that govern how
|
|
|
-functions calls are performed in x86. Function calls require
|
|
|
-coordination between the caller and the callee, which is often
|
|
|
-assembly code written by different programmers or generated by
|
|
|
-different compilers. Here we follow the System V calling conventions
|
|
|
-that are used by the \code{gcc} compiler on Linux and
|
|
|
+function calls are performed in x86.
|
|
|
+%
|
|
|
+Even though \LangVar{} does not include programmer-defined functions,
|
|
|
+our generated code includes a \code{main} function that is called by
|
|
|
+the operating system and our generated code contains calls to the
|
|
|
+\code{read\_int} function.
|
|
|
+
|
|
|
+Function calls require coordination between two pieces of code that
|
|
|
+may be written by different programmers or generated by different
|
|
|
+compilers. Here we follow the System V calling conventions that are
|
|
|
+used by the GNU C compiler on Linux and
|
|
|
MacOS~\citep{Bryant:2005aa,Matz:2013aa}.
|
|
|
%
|
|
|
-Even though \LangVar{} does not include programmer-defined functions, our
|
|
|
-generated code will 1) include a \code{main} function that the
|
|
|
-operating system will call to initiate execution, and 2) make calls to
|
|
|
-the \code{read\_int} function in our runtime system.
|
|
|
-
|
|
|
The calling conventions include rules about how functions share the
|
|
|
use of registers. In particular, the caller is responsible for freeing
|
|
|
up some registers prior to the function call for use by the callee.
|
|
@@ -2737,15 +2734,14 @@ view, the caller view and the callee view:
|
|
|
\item The callee can freely use any of the caller-saved registers.
|
|
|
However, if the callee wants to use a callee-saved register, the
|
|
|
callee must arrange to put the original value back in the register
|
|
|
- prior to returning to the caller, which is usually accomplished by
|
|
|
- saving the value to the stack in the prelude of the function and
|
|
|
- restoring the value in the conclusion of the function.
|
|
|
+ prior to returning to the caller. This can be accomplished by saving
|
|
|
+ the value to the stack in the prelude of the function and restoring
|
|
|
+ the value in the conclusion of the function.
|
|
|
\end{itemize}
|
|
|
|
|
|
In x86, registers are also used for passing arguments to a function
|
|
|
and for the return value. In particular, the first six arguments to a
|
|
|
-function are passed in the following six registers, in the order
|
|
|
-given.
|
|
|
+function are passed in the following six registers, in this order.
|
|
|
\begin{lstlisting}
|
|
|
rdi rsi rdx rcx r8 r9
|
|
|
\end{lstlisting}
|
|
@@ -2811,14 +2807,13 @@ prelude begins with saving the \code{rbp} register to the stack and
|
|
|
setting the \code{rbp} to the current stack pointer. We now know why
|
|
|
it is necessary to save the \code{rbp}: it is a callee-saved register.
|
|
|
The prelude then pushes \code{rbx} to the stack because 1) \code{rbx}
|
|
|
-is also a callee-saved register and 2) \code{rbx} is assigned to a
|
|
|
-variable (\code{x}). There are several more callee-saved registers
|
|
|
-that are not saved in the prelude because they were not used. The
|
|
|
-prelude subtracts 8 bytes from the \code{rsp} to make it 16-byte
|
|
|
-aligned and then jumps to the \code{start} block. Shifting attention
|
|
|
-to the \code{conclusion}, we see that \code{rbx} is restored from the
|
|
|
-stack with a \code{popq} instruction.
|
|
|
-\index{prelude}\index{conclusion}
|
|
|
+is a callee-saved register and 2) \code{rbx} is assigned to a variable
|
|
|
+(\code{x}). The other callee-saved registers are not saved in the
|
|
|
+prelude because they are not used. The prelude subtracts 8 bytes from
|
|
|
+the \code{rsp} to make it 16-byte aligned and then jumps to the
|
|
|
+\code{start} block. Shifting attention to the \code{conclusion}, we
|
|
|
+see that \code{rbx} is restored from the stack with a \code{popq}
|
|
|
+instruction. \index{prelude}\index{conclusion}
|
|
|
|
|
|
\begin{figure}[tp]
|
|
|
\begin{minipage}{0.45\textwidth}
|
|
@@ -2861,6 +2856,7 @@ conclusion:
|
|
|
\label{fig:example-calling-conventions}
|
|
|
\end{figure}
|
|
|
|
|
|
+\clearpage
|
|
|
|
|
|
\section{Liveness Analysis}
|
|
|
\label{sec:liveness-analysis-r1}
|
|
@@ -4081,17 +4077,16 @@ programs to make sure that your move biasing is working properly.
|
|
|
\margincomment{\footnotesize To do: another neat challenge would be to do
|
|
|
live range splitting~\citep{Cooper:1998ly}. \\ --Jeremy}
|
|
|
|
|
|
-\section{Output of the Running Example}
|
|
|
-\label{sec:reg-alloc-output}
|
|
|
-\index{prelude}\index{conclusion}
|
|
|
+%% \subsection{Output of the Running Example}
|
|
|
+%% \label{sec:reg-alloc-output}
|
|
|
|
|
|
Figure~\ref{fig:running-example-x86} shows the x86 code generated for
|
|
|
the running example (Figure~\ref{fig:reg-eg}) with register allocation
|
|
|
and move biasing. To demonstrate both the use of registers and the
|
|
|
stack, we have limited the register allocator to use just two
|
|
|
-registers: \code{rbx} and \code{rcx}. In the prelude of the
|
|
|
-\code{main} function, we push \code{rbx} onto the stack because it is
|
|
|
-a callee-saved register and it was assigned to variable by the
|
|
|
+registers: \code{rbx} and \code{rcx}. In the prelude\index{prelude}
|
|
|
+of the \code{main} function, we push \code{rbx} onto the stack because
|
|
|
+it is a callee-saved register and it was assigned to a variable by the
|
|
|
register allocator. We subtract \code{8} from the \code{rsp} at the
|
|
|
end of the prelude to reserve space for the one spilled variable.
|
|
|
After that subtraction, the \code{rsp} is aligned to 16 bytes.
|
|
@@ -4105,11 +4100,11 @@ stack. The spilled variables must be placed lower on the stack than
|
|
|
the saved callee-saved registers, so in this case \code{w} is placed at
|
|
|
\code{-16(\%rbp)}.
|
|
|
|
|
|
-In the \code{conclusion}, we undo the work that was done in the
|
|
|
-prelude. We move the stack pointer up by \code{8} bytes (the room for
|
|
|
-spilled variables), then we pop the old values of \code{rbx} and
|
|
|
-\code{rbp} (callee-saved registers), and finish with \code{retq} to
|
|
|
-return control to the operating system.
|
|
|
+In the \code{conclusion}\index{conclusion}, we undo the work that was
|
|
|
+done in the prelude. We move the stack pointer up by \code{8} bytes
|
|
|
+(the room for spilled variables), then we pop the old values of
|
|
|
+\code{rbx} and \code{rbp} (callee-saved registers), and finish with
|
|
|
+\code{retq} to return control to the operating system.
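
The stack arithmetic in the two paragraphs above can be checked with a short sketch. It assumes that every saved register and every spilled variable occupies 8 bytes, and that \code{rsp} is 16-byte aligned immediately after \code{pushq \%rbp}, as is the case under the System V conventions. The function names \code{align}, \code{frame\_adjustment}, and \code{spill\_offset} are hypothetical, and the count of callee-saved registers excludes \code{rbp}.

```python
def align(n, alignment=16):
    """Round n up to the nearest multiple of alignment."""
    return (n + alignment - 1) // alignment * alignment


def frame_adjustment(num_callee_saved, num_spilled):
    """Bytes to subtract from rsp after pushing rbp and the callee-saved registers.

    The pushed registers and the spill area together must end on a
    16-byte boundary; the pushes already account for 8 bytes each.
    """
    used = 8 * num_callee_saved + 8 * num_spilled
    return align(used) - 8 * num_callee_saved


def spill_offset(num_callee_saved, spill_index):
    """Offset from rbp of the spilled variable with the given 0-based index.

    Spilled variables sit below the saved callee-saved registers.
    """
    return -8 * (num_callee_saved + spill_index + 1)
```

For the running example, one callee-saved register (\code{rbx}) and one spilled variable (\code{w}) give \code{frame\_adjustment(1, 1) == 8} and \code{spill\_offset(1, 0) == -16}, which agrees with the \code{8} subtracted from \code{rsp} and the location \code{-16(\%rbp)} described above.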
|
|
|
|
|
|
|
|
|
\begin{figure}[tbp]
|