4 years ago · d1491b02c4
--- a/book.tex
+++ b/book.tex
@@ -1395,17 +1395,15 @@ procedure call. The memory layout for an individual frame is shown in
 
															 Figure~\ref{fig:frame}.  The register \key{rsp} is called the
														
 
															 \emph{stack pointer} and points to the item at the top of the
														
 
															 stack. The stack grows downward in memory, so we increase the size of
														
 
															-the stack by subtracting from the stack pointer. Some operating
														
 
															-systems require the frame size to be a multiple of 16 bytes. In the
														
 
															-context of a procedure call, the \emph{return address} is the next
														
 
															-instruction after the call instruction on the caller side. During a
														
 
															-function call, the return address is pushed onto the stack.  The
														
 
															-register \key{rbp} is the \emph{base pointer} and is used to access
														
 
															-variables associated with the current procedure call.  The base
														
 
															-pointer of the caller is pushed onto the stack after the return
														
 
															-address. We number the variables from $1$ to $n$. Variable $1$ is
														
 
															-stored at address $-8\key{(\%rbp)}$, variable $2$ at
														
 
															-$-16\key{(\%rbp)}$, etc.
														
 
															+the stack by subtracting from the stack pointer.  In the context of a
														
 
															+procedure call, the \emph{return address} is the next instruction
														
 
															+after the call instruction on the caller side. During a function call,
														
 
															+the return address is pushed onto the stack.  The register \key{rbp}
														
 
															+is the \emph{base pointer} and is used to access variables associated
														
 
															+with the current procedure call.  The base pointer of the caller is
														
 
															+pushed onto the stack after the return address. We number the
														
 
															+variables from $1$ to $n$. Variable $1$ is stored at address
														
 
															+$-8\key{(\%rbp)}$, variable $2$ at $-16\key{(\%rbp)}$, etc.
														
 
															 \begin{figure}[tbp]
														
 
															 \begin{lstlisting}
														
@@ -1448,36 +1446,49 @@ Position & Contents \\ \hline
 
															 \label{fig:frame}
														
 
															 \end{figure}
														
 
															-Getting back to the program in Figure~\ref{fig:p1-x86}, the first
														
 
															-three instructions are the typical \emph{prelude} for a procedure.
														
 
															-The instruction \key{pushq \%rbp} saves the base pointer for the
														
 
															-caller onto the stack and subtracts $8$ from the stack pointer. The
														
 
															-second instruction \key{movq \%rsp, \%rbp} changes the base pointer so
														
 
															-that it points the location of the old base pointer. The instruction
														
 
															-\key{subq \$16, \%rsp} moves the stack pointer down to make enough
														
 
															-room for storing variables.  This program needs one variable ($8$
														
 
															-bytes) but because the frame size is required to be a multiple of 16
														
 
															-bytes, the space for variables is rounded up to 16 bytes.
														
 
															+
														
 
															+Getting back to the program in Figure~\ref{fig:p1-x86}, consider how
														
 
															+control is transferred from the operating system to the \code{main}
														
 
															+function.  The operating system issues a \code{callq main} instruction
														
 
															+which pushes its return address on the stack and then jumps to
														
 
															+\code{main}. In x86-64, the stack pointer \code{rsp} must be divisible
														
 
															+by 16 bytes prior to the execution of any \code{callq} instruction, so
														
 
															+when control arrives at \code{main}, the \code{rsp} is 8 bytes out of
														
 
															+alignment (because the \code{callq} pushed the return address).  The
														
 
															+first three instructions are the typical \emph{prelude} for a
														
 
															+procedure.  The instruction \code{pushq \%rbp} saves the base pointer
														
 
															+for the caller onto the stack and subtracts $8$ from the stack
														
 
															+pointer. At this point the stack pointer is back to being 16-byte
														
 
															+aligned. The second instruction \code{movq \%rsp, \%rbp} changes the
														
 
															+base pointer so that it points to the location of the old base
														
 
															+pointer. The instruction \code{subq \$16, \%rsp} moves the stack
														
 
															+pointer down to make enough room for storing variables.  This program
														
 
															+needs one variable ($8$ bytes) but we round up to 16 bytes to maintain
														
 
															+the 16-byte alignment of the \code{rsp}. With the \code{rsp} aligned,
														
 
															+we are ready to make calls to other functions. The last instruction of
														
 
															+the prelude is \code{jmp start}, which transfers control to the
														
 
															+instructions that were generated from the Racket expression \code{(+
														
 
															+  52 (- 10))}.
														
 
															 The four instructions under the label \code{start} carry out the work
														
 
															 of computing \code{(+ 52 (- 10))}. The first instruction
														
 
															-\key{movq \$10, -8(\%rbp)} stores $10$ in variable $1$. The
														
 
															-instruction \key{negq -8(\%rbp)} changes variable $1$ to $-10$. The
														
 
															-instruction \key{movq \$52, \%rax} places $52$ in the register \key{rax} and
														
 
															-finally \key{addq -8(\%rbp), \%rax} adds the contents of variable $1$ to
														
 
															-\key{rax}, at which point \key{rax} contains $42$.
														
 
															+\code{movq \$10, -8(\%rbp)} stores $10$ in variable $1$. The
														
 
															+instruction \code{negq -8(\%rbp)} changes variable $1$ to $-10$. The
														
 
															+instruction \code{movq \$52, \%rax} places $52$ in the register \code{rax} and
														
 
															+finally \code{addq -8(\%rbp), \%rax} adds the contents of variable $1$ to
														
 
															+\code{rax}, at which point \code{rax} contains $42$.
														
 
															 The three instructions under the label \code{conclusion} are the
														
 
															-typical \emph{finale} of a procedure.  The first two instructions are
														
 
															-necessary to get the state of the machine back to where it was at the
														
 
															-beginning of the procedure.  The instruction \key{addq \$16, \%rsp}
														
 
															-moves the stack pointer back to point at the old base pointer. The
														
 
															-amount added here needs to match the amount that was subtracted in the
														
 
															-prelude of the procedure. Then \key{popq \%rbp} returns the old base
														
 
															-pointer to \key{rbp} and adds $8$ to the stack pointer.  The last
														
 
															-instruction, \key{retq}, jumps back to the procedure that called this
														
 
															-one and adds 8 to the stack pointer, which returns the stack pointer
														
 
															-to where it was prior to the procedure call.
														
 
															+typical \emph{conclusion} of a procedure.  The first two instructions
														
 
															+are necessary to get the state of the machine back to where it was at
														
 
															+the beginning of the procedure.  The instruction \key{addq \$16,
														
 
															+  \%rsp} moves the stack pointer back to point at the old base
														
 
															+pointer. The amount added here needs to match the amount that was
														
 
															+subtracted in the prelude of the procedure. Then \key{popq \%rbp}
														
 
															+returns the old base pointer to \key{rbp} and adds $8$ to the stack
														
 
															+pointer.  The last instruction, \key{retq}, jumps back to the
														
 
															+procedure that called this one and adds 8 to the stack pointer, which
														
 
															+returns the stack pointer to where it was prior to the procedure call.
														
 
															 The compiler needs a convenient representation for manipulating x86
														
 
															 programs, so we define an abstract syntax for x86 in
														
@@ -2549,7 +2560,7 @@ function call, and the callee is responsible for saving and restoring
 
															 some other registers, the \emph{callee-saved registers}, before and
														
 
															 after using them. The caller-saved registers are
														
 
															 \begin{lstlisting}
														
 
															-  rax rdx rcx rsi rdi r8 r9 r10 r11
														
 
															+  rax rcx rdx rsi rdi r8 r9 r10 r11
														
 
															 \end{lstlisting}
														
 
															 while the callee-saved registers are
														
 
															 \begin{lstlisting}
														
@@ -2775,13 +2786,23 @@ move. So we have the following three rules.
 
															   the edge $(d,v)$ for every $v \in L_{\mathsf{after}}(k)$ unless $v =
														
 
															   d$ or $v = s$.
														
 
															 \end{enumerate}
														
 
															-\margincomment{JM: I think you could give examples of each one of these
														
 
															-  using the example program and use those to help explain why these
														
 
															-  rules are correct.\\
														
 
															-  JS: Agreed.}
														
 
															-Working from the top to bottom of Figure~\ref{fig:live-eg}, we obtain
														
 
															-the following interference for each instruction.
														
 
															+Working from the top to bottom of Figure~\ref{fig:live-eg}, apply the
														
 
															+above rules to each instruction. We highlight a few of the
														
 
															+instructions and then refer the reader to
														
 
															+Figure~\ref{fig:interference-results} for all the interference results.
														
 
															+The first instruction is \lstinline{movq $1, v}, so rule 3 applies,
														
 
															+and the live-after set is $\{v\}$. We do not add any interference
														
 
															+edges because the one live variable $v$ is also the destination of
														
 
															+this instruction.
														
 
															+%
														
 
															+The second instruction is \lstinline{movq $42, w}, so rule 3 applies
														
 
															+again, and the live-after set is $\{v,w\}$. So the target $w$ of
														
 
															+\key{movq} interferes with $v$.
														
 
															+%
														
 
															+Next we skip forward to the instruction \lstinline{movq x, y}.
														
 
															+
														
 
															+\begin{figure}[tbp]
														
 
															 \begin{quote}
														
 
															 \begin{tabular}{ll}
														
 
															 \lstinline{movq $1, v}& no interference by rule 3,\\
														
@@ -2798,6 +2819,10 @@ the following interference for each instruction.
 
															   \lstinline{jmp conclusion}& no interference.
														
 
															 \end{tabular}
														
 
															 \end{quote}
														
 
															+\caption{Interference results for the running example.}
														
 
															+\label{fig:interference-results}
														
 
															+\end{figure}
														
 
															+
														
 
															 The resulting interference graph is shown in
														
 
															 Figure~\ref{fig:interfere}.
														
@@ -3251,15 +3276,17 @@ The prelude saved the values in \code{rbp} and \code{rsp} and the
 
															 conclusion returned those values to \code{rbp} and \code{rsp}.  The
														
 
															 reason for this is that our \code{main} function must adhere to the
														
 
															 x86 calling conventions that we described in
														
 
															-Section~\ref{sec:calling-conventions}. In addition, the \code{main}
														
 
															-function needs to restore (in the conclusion) any callee-saved
														
 
															-registers that get used during register allocation. The simplest
														
 
															+Section~\ref{sec:calling-conventions}.  Furthermore, if your register
														
 
															+allocator assigned variables to other callee-saved registers
														
 
															+(e.g., rbx or r12), then those registers must also be saved to the
														
 
															+stack in the prelude and restored in the conclusion.  The simplest
														
 
															 approach is to save and restore all of the callee-saved registers. The
														
 
															 more efficient approach is to keep track of which callee-saved
														
 
															 registers were used and only save and restore them. Either way, make
														
 
															 sure to take this use of stack space into account when you are
														
 
															-calculating the size of the frame. Also, don't forget that the size of
														
 
															-the frame needs to be a multiple of 16 bytes.
														
 
															+calculating the size of the frame and adjusting the \code{rsp} in the
														
 
															+prelude. Also, don't forget that the size of the frame needs to be a
														
 
															+multiple of 16 bytes!
														
 
															 \section{Challenge: Move Biasing}
														
@@ -3313,10 +3340,12 @@ jmp conclusion
 
															 \end{lstlisting}
														
 
															 \end{minipage}
														
 
															-While this allocation is quite good, we could do better. For example,
														
 
															-the variables \key{x} and \key{y} ended up in different registers, but
														
 
															-if they had been placed in the same register, then the move from
														
 
															-\key{x} to \key{y} could be removed.
														
 
															+In the above output code there are two \key{movq} instructions that
														
 
															+can be removed because their source and target are the same.  However,
														
 
															+if we had put \key{t}, \key{v}, \key{x}, and \key{y} into the same
														
 
															+register, we could instead remove three \key{movq} instructions.  We
														
 
															+can accomplish this by taking into account which variables appear in
														
 
															+\key{movq} instructions with which other variables.
														
 
															 We say that two variables $p$ and $q$ are \emph{move related} if they
														
 
															 participate together in a \key{movq} instruction, that is, \key{movq}
														
@@ -3503,6 +3532,76 @@ programs to make sure that your move biasing is working properly.
 
															 \margincomment{\footnotesize To do: another neat challenge would be to do
														
 
															   live range splitting~\citep{Cooper:1998ly}. \\ --Jeremy}
														
 
															+\section{Output of the Running Example}
														
 
															+\label{sec:reg-alloc-output}
														
 
															+
														
 
															+Figure~\ref{fig:running-example-x86} shows the x86 code generated for
														
 
															+the running example (Figure~\ref{fig:reg-eg}) with register allocation
														
 
															+and move biasing. To demonstrate both the use of registers and the
														
 
															+stack, we have limited the register allocator to use just two
														
 
															+registers: \code{rbx} and \code{rcx}.  In the prelude of the
														
 
															+\code{main} function, we push \code{rbx} onto the stack because it is
														
 
															+a callee-saved register and it was assigned to a variable by the
														
 
															+register allocator.  We subtract \code{8} from the \code{rsp} at the
														
 
															+end of the prelude to reserve space for the one spilled variable.
														
 
															+After that subtraction, the \code{rsp} is aligned to 16 bytes.
														
 
															+
														
 
															+Moving on to the \code{start} block, we see how the registers were
														
 
															+allocated. Variables \code{v}, \code{x}, and \code{y} were assigned to
														
 
															+\code{rbx} and variable \code{z} was assigned to \code{rcx}.  Variable
														
 
															+\code{w} was spilled to the stack location \code{-16(\%rbp)}.  Recall
														
 
															+that the prelude saved the callee-saved register \code{rbx} onto the
														
 
															+stack. The spilled variables must be placed lower on the stack than
														
 
															+the saved callee-saved registers, so in this case \code{w} is placed at
														
 
															+\code{-16(\%rbp)}.
														
 
															+
														
 
															+In the \code{conclusion}, we undo the work that was done in the
														
 
															+prelude. We move the stack pointer up by \code{8} bytes (the room for
														
 
															+spilled variables), then we pop the old values of \code{rbx} and
														
 
															+\code{rbp} (callee-saved registers), and finish with \code{retq} to
														
 
															+return control to the operating system.
														
 
															+
														
 
															+  
														
 
															+\begin{figure}[tbp]
														
 
															+  % s0_28.rkt
														
 
															+  % (use-minimal-set-of-registers! #t)
														
 
															+  % and only rbx rcx
														
 
															+% tmp 0
														
 
															+% z 1  rcx
														
 
															+% y 0  rbx
														
 
															+% w 2  -16(%rbp)
														
 
															+% v 0  rbx
														
 
															+% x 0  rbx
														
 
															+\begin{lstlisting}
														
 
															+start:
														
 
															+	movq	$1, %rbx
														
 
															+	movq	$42, -16(%rbp)
														
 
															+	addq	$7, %rbx
														
 
															+	movq	%rbx, %rcx
														
 
															+	addq	-16(%rbp), %rcx
														
 
															+	negq	%rbx
														
 
															+	movq	%rcx, %rax
														
 
															+	addq	%rbx, %rax
														
 
															+	jmp conclusion
														
 
															+
														
 
															+	.globl main
														
 
															+main:
														
 
															+	pushq	%rbp
														
 
															+	movq	%rsp, %rbp
														
 
															+	pushq	%rbx
														
 
															+	subq	$8, %rsp
														
 
															+	jmp start
														
 
															+conclusion:
														
 
															+	addq	$8, %rsp
														
 
															+	popq	%rbx
														
 
															+	popq	%rbp
														
 
															+	retq
														
 
															+\end{lstlisting}
														
 
															+\caption{The x86 output from the running example (Figure~\ref{fig:reg-eg}).}
														
 
															+\label{fig:running-example-x86}
														
 
															+\end{figure}
														
 
															+
														
 
															+
														
 
															 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
														
@@ -3827,7 +3926,7 @@ To implement the new logical operations, the comparison operations,
 
															 and the \key{if} expression, we need to delve further into the x86
														
 
															 language. Figure~\ref{fig:x86-1} defines the abstract syntax for a
														
 
															 larger subset of x86 that includes instructions for logical
														
 
															-operations, comparisons, and jumps.
														
 
															+operations, comparisons, and conditional jumps.
														
 
															 One small challenge is that x86 does not provide an instruction that
														
 
															 directly implements logical negation (\code{not} in $R_2$ and $C_1$).
														
@@ -4216,7 +4315,7 @@ less-than comparison is as follows.
 
															 \[
														
 
															 (\key{<}~e_1~e_2) \quad\Rightarrow\quad
														
 
															 \begin{array}{l}
														
 
															-\key{if}~(\key{<}~e_1~e_2)~\key{then} \\
														
 
															+\key{if}~(\key{<}~e_1~e_2) \\
														
 
															 \qquad\key{goto}~\ell_1\key{;}\\
														
 
															 \key{else}\\
														
 
															 \qquad\key{goto}~\ell_2\key{;}
														
@@ -4233,7 +4332,7 @@ current one, that is, predicate context. So we apply
 
															 \code{explicate-pred} to the ``then'' branch with the two blocks
														
 
															 \GOTO{$\ell_1$} and \GOTO{$\ell_2$} to obtain $B_3$.  Proceed in a
														
 
															 similar way with the ``else'' branch to obtain $B_4$.  Finally, we
														
 
															-apply \code{explicate-pred} to the predicate of hte \code{if} and the
														
 
															+apply \code{explicate-pred} to the predicate of the \code{if} and the
														
 
															 blocks $B_3$ and $B_4$ to obtain the result $B_5$.
														
 
															 \[
														
 
															 (\key{if}\; \itm{cnd}\; \itm{thn}\; \itm{els})
														
@@ -4269,22 +4368,22 @@ approach of encoding them as integers, with true as 1 and false as 0.
 
															 For $\Stmt$, we discuss a couple of cases.  The \code{not} operation can
														
 
															 be implemented in terms of \code{xorq} as we discussed at the
														
 
															 beginning of this section. Given an assignment
														
 
															-$\itm{var}$ \key{=} \key{(not} $\Arg$\key{);},
														
 
															+$\itm{var}$ \key{=} \key{(not} $\Atm$\key{);},
														
 
															 if the left-hand side $\itm{var}$ is
														
 
															-the same as $\Arg$, then just the \code{xorq} suffices.
														
 
															+the same as $\Atm$, then just the \code{xorq} suffices.
														
 
															 \[
														
 
															 \Var~\key{=}~ \key{(not}\; \Var\key{);}
														
 
															 \quad\Rightarrow\quad
														
 
															 \key{xorq}~\key{\$}1\key{,}~\Var
														
 
															 \]
														
 
															 Otherwise, a \key{movq} is needed to adapt to the update-in-place
														
 
															-semantics of x86. Let $\Arg'$ be the result of recursively processing
														
 
															-$\Arg$. Then we have
														
 
															+semantics of x86. Let $\Arg$ be the result of translating $\Atm$ to
														
 
															+x86. Then we have
														
 
															 \[
														
 
															-\Var~\key{=}~ \key{(not}\; \Arg\key{);}
														
 
															+\Var~\key{=}~ \key{(not}\; \Atm\key{);}
														
 
															 \quad\Rightarrow\quad
														
 
															 \begin{array}{l}
														
 
															-\key{movq}~\Arg'\key{,}~\Var\\
														
 
															+\key{movq}~\Arg\key{,}~\Var\\
														
 
															 \key{xorq}~\key{\$}1\key{,}~\Var
														
 
															 \end{array}
														
 
															 \]
														
@@ -4297,7 +4396,7 @@ sequence of three instructions. \\
 
															 \begin{tabular}{lll}
														
 
															 \begin{minipage}{0.4\textwidth}
														
 
															 \begin{lstlisting}
														
 
															-|$\Var$| = (eq? |$\Arg_1$| |$\Arg_2$|);
														
 
															+|$\Var$| = (eq? |$\Atm_1$| |$\Atm_2$|);
														
 
															 \end{lstlisting}
														
 
															 \end{minipage}
														
 
															 &
														
@@ -4305,7 +4404,7 @@ $\Rightarrow$
 
															 &
														
 
															 \begin{minipage}{0.4\textwidth}
														
 
															 \begin{lstlisting}
														
 
															-cmpq |$\Arg'_2$|, |$\Arg'_1$|
														
 
															+cmpq |$\Arg_2$|, |$\Arg_1$|
														
 
															 sete %al
														
 
															 movzbq %al, |$\Var$|
														
 
															 \end{lstlisting}
														
@@ -4324,7 +4423,7 @@ to a regular jump (for ``else'').\\
 
															 \begin{tabular}{lll}
														
 
															 \begin{minipage}{0.4\textwidth}
														
 
															 \begin{lstlisting}
														
 
															-if (eq? |$\Arg_1$| |$\Arg_2$|) then
														
 
															+if (eq? |$\Atm_1$| |$\Atm_2$|)
														
 
															    goto |$\ell_1$|;
														
 
															 else
														
 
															    goto |$\ell_2$|;
														
@@ -4335,7 +4434,7 @@ $\Rightarrow$
 
															 &
														
 
															 \begin{minipage}{0.4\textwidth}
														
 
															 \begin{lstlisting}
														
 
															-cmpq |$\Arg'_2$| |$\Arg'_1$|
														
 
															+cmpq |$\Arg_2$|, |$\Arg_1$|
														
 
															 je |$\ell_1$|
														
 
															 jmp |$\ell_2$|
														
 
															 \end{lstlisting}
														
@@ -4578,7 +4677,7 @@ Figure~\ref{fig:R2-passes} lists all the passes needed for the
 
															 compilation of $R_2$.
														
 
															-\section{Challenge: Optimize Jumps}
														
 
															+\section{Challenge: Optimize and Remove Jumps}
														
 
															 \label{sec:opt-jumps}
														
 
															 Recall that in the example output of \code{explicate-control} in