4 years ago · 8ef46f5fd2
--- a/all.bib
+++ b/all.bib
--- a/book.tex
+++ b/book.tex
@@ -2535,9 +2535,9 @@ point, so \code{x} and \code{y} could share the same register. The
 
				 topic of Section~\ref{sec:liveness-analysis-r1} is how to compute
			
 
				 where a variable is needed.  Once we have that information, we compute
			
 
				 which variables are needed at the same time, i.e., which ones
			
 
				-\emph{interfere}, and represent this relation as an undirected graph
			
 
				-whose vertices are variables and edges indicate when two variables
			
 
				-interfere with each other (Section~\ref{sec:build-interference}). We
			
 
				+\emph{interfere} with each other, and represent this relation as an
			
 
				+undirected graph whose vertices are variables and edges indicate when
			
 
				+two variables interfere (Section~\ref{sec:build-interference}). We
			
 
				 then model register allocation as a graph coloring problem, which we
			
 
				 discuss in Section~\ref{sec:graph-coloring}.
			
 
				 
			
@@ -2548,17 +2548,32 @@ for assigning a variable to a stack location. The process of spilling
 
				 variables is handled as part of the graph coloring process described
			
 
				 in Section~\ref{sec:graph-coloring}.
			
 
				 
			
 
				+We make the simplifying assumption that each variable is assigned to
			
 
				+one location (a register or stack address). A more sophisticated
			
 
				+approach is to assign a variable to one or more locations in different
			
 
				+regions of the program.  For example, if a variable is used many times
			
 
				+in short sequence and then only used again after many other
			
 
				+instructions, it could be more efficient to assign the variable to a
			
 
				+register during the initial sequence and then move it to the stack for
			
 
				+the rest of its lifetime. We refer the interested reader to
			
 
				+\citet{Cooper:1998ly} and \citet{Cooper:2011aa} for more information
			
 
				+about this approach.
			
 
				+
			
 
				+% discuss prioritizing variables based on how much they are used.
			
 
				+
			
 
				 \section{Registers and Calling Conventions}
			
 
				 \label{sec:calling-conventions}
			
 
				 
			
 
				 As we perform register allocation, we need to be aware of the
			
 
				 conventions that govern the way in which registers interact with
			
 
				-function calls, such as calls to the \code{read\_int} function. The
			
 
				-convention for x86 is that the caller is responsible for freeing up
			
 
				-some registers, the \emph{caller-saved registers}, prior to the
			
 
				-function call, and the callee is responsible for saving and restoring
			
 
				-some other registers, the \emph{callee-saved registers}, before and
			
 
				-after using them. The caller-saved registers are
			
 
				+function calls, such as calls to the \code{read\_int} function in our
			
 
				+generated code and even the call that the operating system makes to
			
 
				+execute our \code{main} function.  The convention for x86 is that the
			
 
				+caller is responsible for freeing up some registers, the
			
 
				+\emph{caller-saved registers}, prior to the function call, and the
			
 
				+callee is responsible for preserving the values in some other
			
 
				+registers, the \emph{callee-saved registers}. The caller-saved
			
 
				+registers are
			
 
				 \begin{lstlisting}
			
 
				   rax rcx rdx rsi rdi r8 r9 r10 r11
			
 
				 \end{lstlisting}
			
@@ -2566,16 +2581,129 @@ while the callee-saved registers are
 
				 \begin{lstlisting}
			
 
				   rsp rbp rbx r12 r13 r14 r15
			
 
				 \end{lstlisting}
			
 
				-Another way to think about this caller/callee convention is the
			
 
				-following. The caller should assume that all the caller-saved registers
			
 
				-get overwritten with arbitrary values by the callee.  On the other
			
 
				-hand, the caller can safely assume that all the callee-saved registers
			
 
				-contain the same values after the call that they did before the call.
			
 
				-The callee can freely use any of the caller-saved registers.  However,
			
 
				-if the callee wants to use a callee-saved register, the callee must
			
 
				-arrange to put the original value back in the register prior to
			
 
				-returning to the caller, which is usually accomplished by saving and
			
 
				-restoring the value from the stack.
			
 
				+
			
 
				+We can think about this caller/callee convention from two points of
			
 
				+view, the caller view and the callee view:
			
 
				+\begin{itemize}
			
 
				+\item The caller should assume that all the caller-saved registers get
			
 
				+  overwritten with arbitrary values by the callee.  On the other hand,
			
 
				+  the caller can safely assume that all the callee-saved registers
			
 
				+  contain the same values after the call that they did before the
			
 
				+  call.
			
 
				+\item The callee can freely use any of the caller-saved registers.
			
 
				+  However, if the callee wants to use a callee-saved register, the
			
 
				+  callee must arrange to put the original value back in the register
			
 
				+  prior to returning to the caller, which is usually accomplished by
			
 
				+  saving the value to the stack in the prelude of the function and
			
 
				+  restoring the value in the conclusion of the function.
			
 
				+\end{itemize}
			
 
				+
			
 
				+The next question is how these calling conventions impact register
			
 
				+allocation. Consider the $R_1$ program in
			
 
				+Figure~\ref{fig:example-calling-conventions}.  We first analyze this
			
 
				+example from the caller point of view and then from the callee point
			
 
				+of view.
			
 
				+
			
 
				+The program makes two calls to the \code{read} function.  Also, the
			
 
				+variable \code{x} is in-use during the second call to \code{read}, so
			
 
				+we need to make sure that the value in \code{x} does not get
			
 
				+accidentally wiped out by the call to \code{read}.  One obvious
			
 
				+approach is to save all the values in caller-saved registers to the
			
 
				+stack prior to each function call, and restore them after each
			
 
				+call. That way, if the register allocator chooses to assign \code{x}
			
 
				+to a caller-saved register, its value will be preserved across the
			
 
				+call to \code{read}.  However, the disadvantage of this approach is
			
 
				+that saving and restoring to the stack is relatively slow. If \code{x}
			
 
				+is not used many times, it may be better to assign \code{x} to a stack
			
 
				+location in the first place. Or better yet, if we can arrange for
			
 
				+\code{x} to be placed in a callee-saved register, then it won't need
			
 
				+to be saved and restored during function calls.
			
 
				+
			
 
				+The approach that we recommend is to treat variables differently
			
 
				+depending on whether they are in-use during a function call.  If a
			
 
				+variable is in-use during a function call, then we never assign it to
			
 
				+a caller-saved register: we either assign it to a callee-saved
			
 
				+register or we spill it to the stack. If a variable is not in-use
			
 
				+during any function call, then we try the following alternatives in
			
 
				+order: 1) look for an available caller-saved register (to leave room
			
 
				+for other variables in the callee-saved registers), 2) look for a
			
 
				+callee-saved register, and 3) spill the variable to the stack.
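
The preference order described above can be sketched as a small function. This is only an illustration under stated assumptions: the name `choose_location`, its parameters, and the `"stack"` placeholder are hypothetical and not part of the book's compiler, and a real allocator would make this choice inside graph coloring rather than in a standalone helper.

```python
def choose_location(live_across_call, free_caller_saved, free_callee_saved):
    """Pick a home for one variable (hypothetical helper).

    live_across_call:  True if the variable is in use during some function call.
    free_caller_saved: caller-saved registers still available, in preference order.
    free_callee_saved: callee-saved registers still available, in preference order.
    """
    # A variable that is in use during a call must never get a caller-saved register.
    if not live_across_call and free_caller_saved:
        return free_caller_saved[0]
    # Otherwise prefer a callee-saved register.
    if free_callee_saved:
        return free_callee_saved[0]
    # No register is available: spill to the stack.
    return "stack"
```

For instance, a variable that is in use during a call is sent to a callee-saved register even when a caller-saved register is free.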
			
 
				+
			
 
				+It is straightforward to implement this approach in a graph coloring
			
 
				+register allocator. First, we know which variables are in-use during
			
 
				+every function call because we compute that information for every
			
 
				+instruction (Section~\ref{sec:liveness-analysis-r1}). Second, when we
			
 
				+build the interference graph (Section~\ref{sec:build-interference}),
			
 
				+we can place an edge between each of these variables and the
			
 
				+caller-saved registers in the interference graph. This will prevent
			
 
				+the graph coloring algorithm from assigning those variables to
			
 
				+caller-saved registers.
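
As a minimal sketch of the edge-adding step just described, the following Python fragment adds an edge between every variable that is live after a `callq` and every caller-saved register. The tuple encoding of instructions, the dictionary-of-sets graph, and the name `add_call_interference` are assumptions made for illustration; the live-after sets are written out by hand for the `start` block of the example program, with variables not yet replaced by registers.

```python
# Caller-saved registers as listed in the text.
CALLER_SAVED = ["rax", "rcx", "rdx", "rsi", "rdi", "r8", "r9", "r10", "r11"]

def add_call_interference(graph, instructions, live_after):
    """Add variable/register edges for variables live across a call.

    graph:        dict mapping each vertex to the set of its neighbors.
    instructions: list of tuples whose first element is the opcode (assumed encoding).
    live_after:   list of sets of variables live after each instruction.
    """
    for instr, live in zip(instructions, live_after):
        if instr[0] == "callq":
            for var in live:
                for reg in CALLER_SAVED:
                    # The graph is undirected, so record both directions.
                    graph.setdefault(var, set()).add(reg)
                    graph.setdefault(reg, set()).add(var)
    return graph

# The start block of the example, before variables are assigned to registers.
instructions = [
    ("callq", "read_int"),
    ("movq", "rax", "x"),
    ("callq", "read_int"),
    ("movq", "rax", "y"),
    ("addq", "y", "x"),
    ("movq", "x", "rax"),
    ("addq", 42, "rax"),
    ("jmp", "conclusion"),
]
# Variables (not registers) live after each instruction, computed by hand.
live_after = [set(), {"x"}, {"x"}, {"x", "y"}, {"x"}, set(), set(), set()]

graph = add_call_interference({}, instructions, live_after)
```

In this sketch only `x` is live after a call, so `x` becomes adjacent to all nine caller-saved registers while `y` gets no such edges and remains free to be placed in a caller-saved register.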
			
 
				+
			
 
				+Returning to the example in
			
 
				+Figure~\ref{fig:example-calling-conventions}, let us analyze the
			
 
				+generated x86 code on the right-hand side, focusing on the
			
 
				+\code{start} block. Notice that variable \code{x} is assigned to
			
 
				+\code{rbx}, a callee-saved register. Thus, it is already in a safe
			
 
				+place during the second call to \code{read\_int}. Next, notice that
			
 
				+variable \code{y} is assigned to \code{rcx}, a caller-saved register,
			
 
				+because there are no function calls in the remainder of the block.
			
 
				+
			
 
				+Next we analyze the example from the callee point of view, focusing on
			
 
				+the prelude and conclusion of the \code{main} function. As usual the
			
 
				+prelude begins with saving the \code{rbp} register to the stack and
			
 
				+setting the \code{rbp} to the current stack pointer. We now know why
			
 
				+it is necessary to save the \code{rbp}: it is a callee-saved register.
			
 
				+The prelude then pushes \code{rbx} to the stack because 1) \code{rbx}
			
 
				+is also a callee-saved register and 2) \code{rbx} is assigned to a
			
 
				+variable (\code{x}). There are several more callee-saved registers that
			
 
				+are not saved in the prelude because they were not assigned to
			
 
				+variables. The prelude subtracts 8 bytes from the \code{rsp} to make
			
 
				+it 16-byte aligned and then jumps to the \code{start} block. Shifting
			
 
				+attention to the \code{conclusion}, we see that \code{rbx} is restored
			
 
				+from the stack with a \code{popq} instruction.
			
 
				+
			
 
				+\begin{figure}[tp]
			
 
				+\begin{minipage}{0.45\textwidth}
			
 
				+Example $R_1$ program:
			
 
				+%s0_14.rkt
			
 
				+\begin{lstlisting}
			
 
				+(let ([x (read)])
			
 
				+  (let ([y (read)])
			
 
				+    (+ (+ x y) 42)))
			
 
				+\end{lstlisting}
			
 
				+\end{minipage}
			
 
				+\begin{minipage}{0.45\textwidth}
			
 
				+Generated x86 assembly:
			
 
				+\begin{lstlisting}
			
 
				+start:
			
 
				+	callq	read_int
			
 
				+	movq	%rax, %rbx
			
 
				+	callq	read_int
			
 
				+	movq	%rax, %rcx
			
 
				+	addq	%rcx, %rbx
			
 
				+	movq	%rbx, %rax
			
 
				+	addq	$42, %rax
			
 
				+	jmp conclusion
			
 
				+
			
 
				+	.globl main
			
 
				+main:
			
 
				+	pushq	%rbp
			
 
				+	movq	%rsp, %rbp
			
 
				+	pushq	%rbx
			
 
				+	subq	$8, %rsp
			
 
				+	jmp start
			
 
				+conclusion:
			
 
				+	addq	$8, %rsp
			
 
				+	popq	%rbx
			
 
				+	popq	%rbp
			
 
				+	retq
			
 
				+\end{lstlisting}
			
 
				+\end{minipage}
			
 
				+\caption{Example with function calls.}
			
 
				+  \label{fig:example-calling-conventions}
			
 
				+\end{figure}
			
 
				+
			
 
				+
			
 
				 
			
 
				 
			
 
				 \section{Liveness Analysis}
			
@@ -2685,7 +2813,7 @@ Figure~\ref{fig:live-eg} shows the results of live variables analysis
 
				 for the running example program, with the live-before and live-after
			
 
				 sets shown between each instruction to make the figure easy to read.
			
 
				 
			
 
				-\begin{figure}[tbp]
			
 
				+\begin{figure}[tp]
			
 
				 \hspace{20pt}
			
 
				 \begin{minipage}{0.45\textwidth}
			
 
				 \begin{lstlisting}
			
@@ -3278,15 +3406,15 @@ reason for this is that our \code{main} function must adhere to the
 
				 x86 calling conventions that we described in
			
 
				 Section~\ref{sec:calling-conventions}.  Furthermore, if your register
			
 
				 allocator assigned variables to other callee-saved registers
			
 
				-(e.g. rbx, r12, etc.), then those variables must also be saved to the
			
 
				-stack in the prelude and restored in the conclusion.  The simplest
			
 
				-approach is to save and restore all of the callee-saved registers. The
			
 
				-more efficient approach is to keep track of which callee-saved
			
 
				-registers were used and only save and restore them. Either way, make
			
 
				-sure to take this use of stack space into account when you are
			
 
				-calculating the size of the frame and adjusting the \code{rsp} in the
			
 
				-prelude. Also, don't forget that the size of the frame needs to be a
			
 
				-multiple of 16 bytes!
			
 
				+(e.g. \code{rbx}, \code{r12}, etc.), then those variables must also be
			
 
				+saved to the stack in the prelude and restored in the conclusion.  The
			
 
				+simplest approach is to save and restore all of the callee-saved
			
 
				+registers. The more efficient approach is to keep track of which
			
 
				+callee-saved registers were used and only save and restore
			
 
				+them. Either way, make sure to take this use of stack space into
			
 
				+account when you are calculating the size of the frame and adjusting
			
 
				+the \code{rsp} in the prelude. Also, don't forget that the size of the
			
 
				+frame needs to be a multiple of 16 bytes!
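
One way to carry out this calculation is sketched below, assuming that each spilled variable and each pushed callee-saved register occupies 8 bytes, that `rbp` has already been pushed (which leaves `rsp` 16-byte aligned), and that the remaining callee-saved registers are saved with `pushq` before `rsp` is adjusted. The function names `align` and `rsp_adjustment` are hypothetical.

```python
def align(n, alignment=16):
    """Round n up to the nearest multiple of alignment."""
    return (n + alignment - 1) // alignment * alignment

def rsp_adjustment(num_spilled, num_callee_saved_pushed):
    """Bytes to subtract from rsp in the prelude (hypothetical helper).

    num_spilled:             number of variables spilled to the stack.
    num_callee_saved_pushed: callee-saved registers pushed after rbp
                             (rbp itself is not counted).
    """
    spill_bytes = 8 * num_spilled
    pushed_bytes = 8 * num_callee_saved_pushed
    # The pushes plus the adjustment together must be a multiple of 16.
    return align(spill_bytes + pushed_bytes) - pushed_bytes
```

With no spills and one pushed register (`rbx`), this gives 8, which matches the `subq $8, %rsp` in the prelude of Figure~\ref{fig:example-calling-conventions}.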
			
 
				 
			
 
				 
			
 
				 \section{Challenge: Move Biasing}
			
@@ -3566,7 +3694,7 @@ return control to the operating system.
 
				   % s0_28.rkt
			
 
				   % (use-minimal-set-of-registers! #t)
			
 
				   % and only rbx rcx
			
 
				-% tmp 0
			
 
				+% tmp 0 rbx
			
 
				 % z 1  rcx
			
 
				 % y 0  rbx
			
 
				 % w 2  16(%rbp)
			
@@ -3591,6 +3719,7 @@ main:
 
				 	pushq	%rbx
			
 
				 	subq	$8, %rsp
			
 
				 	jmp start
			
 
				+        
			
 
				 conclusion:
			
 
				 	addq	$8, %rsp
			
 
				 	popq	%rbx