4 years ago · d1491b02c4
--- a/book.tex
+++ b/book.tex
@@ -1395,17 +1395,15 @@ procedure call. The memory layout for an individual frame is shown in
 
				 Figure~\ref{fig:frame}.  The register \key{rsp} is called the
			
 
				 \emph{stack pointer} and points to the item at the top of the
			
 
				 stack. The stack grows downward in memory, so we increase the size of
			
 
				-the stack by subtracting from the stack pointer. Some operating
			
 
				-systems require the frame size to be a multiple of 16 bytes. In the
			
 
				-context of a procedure call, the \emph{return address} is the next
			
 
				-instruction after the call instruction on the caller side. During a
			
 
				-function call, the return address is pushed onto the stack.  The
			
 
				-register \key{rbp} is the \emph{base pointer} and is used to access
			
 
				-variables associated with the current procedure call.  The base
			
 
				-pointer of the caller is pushed onto the stack after the return
			
 
				-address. We number the variables from $1$ to $n$. Variable $1$ is
			
 
				-stored at address $-8\key{(\%rbp)}$, variable $2$ at
			
 
				-$-16\key{(\%rbp)}$, etc.
			
 
				+the stack by subtracting from the stack pointer.  In the context of a
			
 
				+procedure call, the \emph{return address} is the next instruction
			
 
				+after the call instruction on the caller side. During a function call,
			
 
				+the return address is pushed onto the stack.  The register \key{rbp}
			
 
				+is the \emph{base pointer} and is used to access variables associated
			
 
				+with the current procedure call.  The base pointer of the caller is
			
 
				+pushed onto the stack after the return address. We number the
			
 
				+variables from $1$ to $n$. Variable $1$ is stored at address
			
 
				+$-8\key{(\%rbp)}$, variable $2$ at $-16\key{(\%rbp)}$, etc.
			
 
				 
			
 
				 \begin{figure}[tbp]
			
 
				 \begin{lstlisting}
			
@@ -1448,36 +1446,49 @@ Position & Contents \\ \hline
 
				 \label{fig:frame}
			
 
				 \end{figure}
			
 
				 
			
 
				-Getting back to the program in Figure~\ref{fig:p1-x86}, the first
			
 
				-three instructions are the typical \emph{prelude} for a procedure.
			
 
				-The instruction \key{pushq \%rbp} saves the base pointer for the
			
 
				-caller onto the stack and subtracts $8$ from the stack pointer. The
			
 
				-second instruction \key{movq \%rsp, \%rbp} changes the base pointer so
			
 
				-that it points the location of the old base pointer. The instruction
			
 
				-\key{subq \$16, \%rsp} moves the stack pointer down to make enough
			
 
				-room for storing variables.  This program needs one variable ($8$
			
 
				-bytes) but because the frame size is required to be a multiple of 16
			
 
				-bytes, the space for variables is rounded up to 16 bytes.
			
 
				+
			
 
				+Getting back to the program in Figure~\ref{fig:p1-x86}, consider how
			
 
				+control is transferred from the operating system to the \code{main}
			
 
				+function.  The operating system issues a \code{callq main} instruction
			
 
				+which pushes its return address on the stack and then jumps to
			
 
				+\code{main}. In x86-64, the stack pointer \code{rsp} must be divisible
			
 
				+by 16 prior to the execution of any \code{callq} instruction, so
			
 
				+when control arrives at \code{main}, the \code{rsp} is 8 bytes out of
			
 
				+alignment (because the \code{callq} pushed the return address).  The
			
 
				+first three instructions are the typical \emph{prelude} for a
			
 
				+procedure.  The instruction \code{pushq \%rbp} saves the base pointer
			
 
				+for the caller onto the stack and subtracts $8$ from the stack
			
 
				+pointer. At this point the stack pointer is back to being 16-byte
			
 
				+aligned. The second instruction \code{movq \%rsp, \%rbp} changes the
			
 
				+base pointer so that it points to the location of the old base
			
 
				+pointer. The instruction \code{subq \$16, \%rsp} moves the stack
			
 
				+pointer down to make enough room for storing variables.  This program
			
 
				+needs one variable ($8$ bytes) but we round up to 16 bytes to maintain
			
 
				+the 16-byte alignment of the \code{rsp}. With the \code{rsp} aligned,
			
 
				+we are ready to make calls to other functions. The last instruction of
			
 
				+the prelude is \code{jmp start}, which transfers control to the
			
 
				+instructions that were generated from the Racket expression
				+\code{(+ 52 (- 10))}.
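				+
				+To see the alignment argument concretely, here is a sketch of the
				+arithmetic. Write $a$ for the value of \code{rsp} just before the
				+operating system executes \code{callq main} (the name $a$ is ours,
				+introduced only for this illustration), so that $a \bmod 16 = 0$.
				+\begin{quote}
				+\begin{tabular}{lll}
				+Instruction & \code{rsp} afterward & \code{rsp} $\bmod 16$ \\ \hline
				+\code{callq main} & $a - 8$ & $8$ \\
				+\code{pushq \%rbp} & $a - 16$ & $0$ \\
				+\code{movq \%rsp, \%rbp} & $a - 16$ & $0$ \\
				+\code{subq \$16, \%rsp} & $a - 32$ & $0$
				+\end{tabular}
				+\end{quote}
				+Had we subtracted only $8$ for the one variable, the \code{rsp} would
				+be $a - 24$, which is $8$ bytes out of alignment.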
			
 
				 
			
 
				 The four instructions under the label \code{start} carry out the work
			
 
				 of computing \code{(+ 52 (- 10))}. The first instruction
			
 
				-\key{movq \$10, -8(\%rbp)} stores $10$ in variable $1$. The
			
 
				-instruction \key{negq -8(\%rbp)} changes variable $1$ to $-10$. The
			
 
				-instruction \key{movq \$52, \%rax} places $52$ in the register \key{rax} and
			
 
				-finally \key{addq -8(\%rbp), \%rax} adds the contents of variable $1$ to
			
 
				-\key{rax}, at which point \key{rax} contains $42$.
			
 
				+\code{movq \$10, -8(\%rbp)} stores $10$ in variable $1$. The
			
 
				+instruction \code{negq -8(\%rbp)} changes variable $1$ to $-10$. The
			
 
				+instruction \code{movq \$52, \%rax} places $52$ in the register \code{rax} and
			
 
				+finally \code{addq -8(\%rbp), \%rax} adds the contents of variable $1$ to
			
 
				+\code{rax}, at which point \code{rax} contains $42$.
			
 
				 
			
 
				 The three instructions under the label \code{conclusion} are the
			
 
				-typical \emph{finale} of a procedure.  The first two instructions are
			
 
				-necessary to get the state of the machine back to where it was at the
			
 
				-beginning of the procedure.  The instruction \key{addq \$16, \%rsp}
			
 
				-moves the stack pointer back to point at the old base pointer. The
			
 
				-amount added here needs to match the amount that was subtracted in the
			
 
				-prelude of the procedure. Then \key{popq \%rbp} returns the old base
			
 
				-pointer to \key{rbp} and adds $8$ to the stack pointer.  The last
			
 
				-instruction, \key{retq}, jumps back to the procedure that called this
			
 
				-one and adds 8 to the stack pointer, which returns the stack pointer
			
 
				-to where it was prior to the procedure call.
			
 
				+typical \emph{conclusion} of a procedure.  The first two instructions
			
 
				+are necessary to get the state of the machine back to where it was at
			
 
				+the beginning of the procedure.  The instruction \key{addq \$16,
			
 
				+  \%rsp} moves the stack pointer back to point at the old base
			
 
				+pointer. The amount added here needs to match the amount that was
			
 
				+subtracted in the prelude of the procedure. Then \key{popq \%rbp}
			
 
				+returns the old base pointer to \key{rbp} and adds $8$ to the stack
			
 
				+pointer.  The last instruction, \key{retq}, jumps back to the
			
 
				+procedure that called this one and adds 8 to the stack pointer, which
			
 
				+returns the stack pointer to where it was prior to the procedure call.
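				+
				+As a sketch of why these amounts work out, write $a$ for the value of
				+\code{rsp} just before \code{callq main} (the name $a$ is ours, used
				+only for this illustration). The prelude leaves \code{rsp} at
				+$a - 32$, and the instructions under \code{start} do not change it.
				+\begin{quote}
				+\begin{tabular}{lll}
				+Instruction & \code{rsp} afterward & \code{rsp} points at \\ \hline
				+\code{addq \$16, \%rsp} & $a - 16$ & the old base pointer \\
				+\code{popq \%rbp} & $a - 8$ & the return address \\
				+\code{retq} & $a$ & where it was before the call
				+\end{tabular}
				+\end{quote}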
			
 
				 
			
 
				 The compiler needs a convenient representation for manipulating x86
			
 
				 programs, so we define an abstract syntax for x86 in
			
@@ -2549,7 +2560,7 @@ function call, and the callee is responsible for saving and restoring
 
				 some other registers, the \emph{callee-saved registers}, before and
			
 
				 after using them. The caller-saved registers are
			
 
				 \begin{lstlisting}
			
 
				-  rax rdx rcx rsi rdi r8 r9 r10 r11
			
 
				+  rax rcx rdx rsi rdi r8 r9 r10 r11
			
 
				 \end{lstlisting}
			
 
				 while the callee-saved registers are
			
 
				 \begin{lstlisting}
			
@@ -2775,13 +2786,23 @@ move. So we have the following three rules.
 
				   the edge $(d,v)$ for every $v \in L_{\mathsf{after}}(k)$ unless $v =
			
 
				   d$ or $v = s$.
			
 
				 \end{enumerate}
			
 
				-\margincomment{JM: I think you could give examples of each one of these
			
 
				-  using the example program and use those to help explain why these
			
 
				-  rules are correct.\\
			
 
				-  JS: Agreed.}
			
 
				 
			
 
				-Working from the top to bottom of Figure~\ref{fig:live-eg}, we obtain
			
 
				-the following interference for each instruction.
			
 
				+Working from the top to bottom of Figure~\ref{fig:live-eg}, apply the
			
 
				+above rules to each instruction. We highlight a few of the
			
 
				+instructions and then refer the reader to
			
 
				+Figure~\ref{fig:interference-results} for all the interference results.
			
 
				+The first instruction is \lstinline{movq $1, v}, so rule 3 applies,
			
 
				+and the live-after set is $\{v\}$. We do not add any interference
			
 
				+edges because the one live variable $v$ is also the destination of
			
 
				+this instruction.
			
 
				+%
			
 
				+The second instruction is \lstinline{movq $42, w}, so rule 3 applies
			
 
				+again, and the live-after set is $\{v,w\}$. So the target $w$ of
			
 
				+\key{movq} interferes with $v$.
			
 
				+%
			
 
				+Next we skip forward to the instruction \lstinline{movq x, y}.
			
 
				+
			
 
				+\begin{figure}[tbp]
			
 
				 \begin{quote}
			
 
				 \begin{tabular}{ll}
			
 
				 \lstinline{movq $1, v}& no interference by rule 3,\\
			
@@ -2798,6 +2819,10 @@ the following interference for each instruction.
 
				   \lstinline{jmp conclusion}& no interference.
			
 
				 \end{tabular}
			
 
				 \end{quote}
			
 
				+\caption{Interference results for the running example.}
			
 
				+\label{fig:interference-results}
			
 
				+\end{figure}
			
 
				+
			
 
				 The resulting interference graph is shown in
			
 
				 Figure~\ref{fig:interfere}.
			
 
				 
			
@@ -3251,15 +3276,17 @@ The prelude saved the values in \code{rbp} and \code{rsp} and the
 
				 conclusion returned those values to \code{rbp} and \code{rsp}.  The
			
 
				 reason for this is that our \code{main} function must adhere to the
			
 
				 x86 calling conventions that we described in
			
 
				-Section~\ref{sec:calling-conventions}. In addition, the \code{main}
			
 
				-function needs to restore (in the conclusion) any callee-saved
			
 
				-registers that get used during register allocation. The simplest
			
 
				+Section~\ref{sec:calling-conventions}.  Furthermore, if your register
			
 
				+allocator assigned variables to other callee-saved registers
			
 
				+(e.g., \code{rbx} and \code{r12}), then those registers must also be saved to the
			
 
				+stack in the prelude and restored in the conclusion.  The simplest
			
 
				 approach is to save and restore all of the callee-saved registers. The
			
 
				 more efficient approach is to keep track of which callee-saved
			
 
				 registers were used and only save and restore them. Either way, make
			
 
				 sure to take this use of stack space into account when you are
			
 
				-calculating the size of the frame. Also, don't forget that the size of
			
 
				-the frame needs to be a multiple of 16 bytes.
			
 
				+calculating the size of the frame and adjusting the \code{rsp} in the
			
 
				+prelude. Also, don't forget that the size of the frame needs to be a
			
 
				+multiple of 16 bytes!
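				+
				+One way to compute the adjustment is sketched below. The names $C$,
				+$S$, $A$, and $\mathrm{align}$ are ours and are not defined elsewhere
				+in this chapter. Let $C$ be the number of callee-saved registers
				+(other than \code{rbp}) that the prelude pushes, let $S$ be the number
				+of spilled variables, and let $\mathrm{align}(n)$ round $n$ up to the
				+nearest multiple of $16$. After \code{pushq \%rbp} the \code{rsp} is
				+16-byte aligned and the $C$ pushes have already moved it down by $8C$
				+bytes, so the amount $A$ to subtract from \code{rsp} is
				+\[
				+A = \mathrm{align}(8S + 8C) - 8C
				+\]
				+For example, with $C = 1$ and $S = 1$ we get
				+$A = \mathrm{align}(16) - 8 = 8$, and with $C = 2$ and $S = 3$ we get
				+$A = \mathrm{align}(40) - 16 = 48 - 16 = 32$. In both cases
				+$A \geq 8S$, so there is room for every spilled variable.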
			
 
				 
			
 
				 
			
 
				 \section{Challenge: Move Biasing}
			
@@ -3313,10 +3340,12 @@ jmp conclusion
 
				 \end{lstlisting}
			
 
				 \end{minipage}
			
 
				 
			
 
				-While this allocation is quite good, we could do better. For example,
			
 
				-the variables \key{x} and \key{y} ended up in different registers, but
			
 
				-if they had been placed in the same register, then the move from
			
 
				-\key{x} to \key{y} could be removed.
			
 
				+In the above output code there are two \key{movq} instructions that
			
 
				+can be removed because their source and target are the same.  However,
			
 
				+if we had put \key{t}, \key{v}, \key{x}, and \key{y} into the same
			
 
				+register, we could instead remove three \key{movq} instructions.  We
			
 
				+can accomplish this by taking into account which variables appear in
			
 
				+\key{movq} instructions with which other variables.
			
 
				 
			
 
				 We say that two variables $p$ and $q$ are \emph{move related} if they
			
 
				 participate together in a \key{movq} instruction, that is, \key{movq}
			
@@ -3503,6 +3532,76 @@ programs to make sure that your move biasing is working properly.
 
				 \margincomment{\footnotesize To do: another neat challenge would be to do
			
 
				   live range splitting~\citep{Cooper:1998ly}. \\ --Jeremy}
			
 
				 
			
 
				+\section{Output of the Running Example}
			
 
				+\label{sec:reg-alloc-output}
			
 
				+
			
 
				+Figure~\ref{fig:running-example-x86} shows the x86 code generated for
			
 
				+the running example (Figure~\ref{fig:reg-eg}) with register allocation
			
 
				+and move biasing. To demonstrate both the use of registers and the
			
 
				+stack, we have limited the register allocator to use just two
			
 
				+registers: \code{rbx} and \code{rcx}.  In the prelude of the
			
 
				+\code{main} function, we push \code{rbx} onto the stack because it is
			
 
				+a callee-saved register and it was assigned to a variable by the
			
 
				+register allocator.  We subtract \code{8} from the \code{rsp} at the
			
 
				+end of the prelude to reserve space for the one spilled variable.
			
 
				+After that subtraction, the \code{rsp} is aligned to 16 bytes.
			
 
				+
			
 
				+Moving on to the \code{start} block, we see how the registers were
			
 
				+allocated. Variables \code{v}, \code{x}, and \code{y} were assigned to
			
 
				+\code{rbx} and variable \code{z} was assigned to \code{rcx}.  Variable
			
 
				+\code{w} was spilled to the stack location \code{-16(\%rbp)}.  Recall
			
 
				+that the prelude saved the callee-saved register \code{rbx} onto the
			
 
				+stack. The spilled variables must be placed lower on the stack than
			
 
				+the saved callee-saved registers, so in this case \code{w} is placed at
			
 
				+\code{-16(\%rbp)}.
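				+
				+The resulting frame can be sketched as follows. This layout is our
				+reading of the prelude in Figure~\ref{fig:running-example-x86}, with
				+positions given relative to \code{rbp} after
				+\code{movq \%rsp, \%rbp}.
				+\begin{quote}
				+\begin{tabular}{ll}
				+Position & Contents \\ \hline
				+\code{8(\%rbp)} & return address \\
				+\code{0(\%rbp)} & old \code{rbp} \\
				+\code{-8(\%rbp)} & saved \code{rbx} \\
				+\code{-16(\%rbp)} & spilled variable \code{w}
				+\end{tabular}
				+\end{quote}
				+After \code{subq \$8, \%rsp}, the \code{rsp} holds the same address as
				+\code{-16(\%rbp)}, which is the bottom of the frame.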
			
 
				+
			
 
				+In the \code{conclusion}, we undo the work that was done in the
			
 
				+prelude. We move the stack pointer up by \code{8} bytes (the room for
			
 
				+spilled variables), then we pop the old values of \code{rbx} and
			
 
				+\code{rbp} (callee-saved registers), and finish with \code{retq} to
			
 
				+return control to the operating system.
			
 
				+
			
 
				+  
			
 
				+\begin{figure}[tbp]
			
 
				+  % s0_28.rkt
			
 
				+  % (use-minimal-set-of-registers! #t)
			
 
				+  % and only rbx rcx
			
 
				+% tmp 0
			
 
				+% z 1  rcx
			
 
				+% y 0  rbx
			
 
				+% w 2  -16(%rbp)
			
 
				+% v 0  rbx
			
 
				+% x 0  rbx
			
 
				+\begin{lstlisting}
			
 
				+start:
			
 
				+	movq	$1, %rbx
			
 
				+	movq	$42, -16(%rbp)
			
 
				+	addq	$7, %rbx
			
 
				+	movq	%rbx, %rcx
			
 
				+	addq	-16(%rbp), %rcx
			
 
				+	negq	%rbx
			
 
				+	movq	%rcx, %rax
			
 
				+	addq	%rbx, %rax
			
 
				+	jmp conclusion
			
 
				+
			
 
				+	.globl main
			
 
				+main:
			
 
				+	pushq	%rbp
			
 
				+	movq	%rsp, %rbp
			
 
				+	pushq	%rbx
			
 
				+	subq	$8, %rsp
			
 
				+	jmp start
			
 
				+conclusion:
			
 
				+	addq	$8, %rsp
			
 
				+	popq	%rbx
			
 
				+	popq	%rbp
			
 
				+	retq
			
 
				+\end{lstlisting}
			
 
				+\caption{The x86 output from the running example (Figure~\ref{fig:reg-eg}).}
			
 
				+\label{fig:running-example-x86}
			
 
				+\end{figure}
			
 
				+
			
 
				+
			
 
				 
			
 
				 
			
 
				 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
			
@@ -3827,7 +3926,7 @@ To implement the new logical operations, the comparison operations,
 
				 and the \key{if} expression, we need to delve further into the x86
			
 
				 language. Figure~\ref{fig:x86-1} defines the abstract syntax for a
			
 
				 larger subset of x86 that includes instructions for logical
			
 
				-operations, comparisons, and jumps.
			
 
				+operations, comparisons, and conditional jumps.
			
 
				 
			
 
				 One small challenge is that x86 does not provide an instruction that
			
 
				 directly implements logical negation (\code{not} in $R_2$ and $C_1$).
			
@@ -4216,7 +4315,7 @@ less-than comparison is as follows.
 
				 \[
			
 
				 (\key{<}~e_1~e_2) \quad\Rightarrow\quad
			
 
				 \begin{array}{l}
			
 
				-\key{if}~(\key{<}~e_1~e_2)~\key{then} \\
			
 
				+\key{if}~(\key{<}~e_1~e_2) \\
			
 
				 \qquad\key{goto}~\ell_1\key{;}\\
			
 
				 \key{else}\\
			
 
				 \qquad\key{goto}~\ell_2\key{;}
			
@@ -4233,7 +4332,7 @@ current one, that is, predicate context. So we apply
 
				 \code{explicate-pred} to the ``then'' branch with the two blocks
			
 
				 \GOTO{$\ell_1$} and \GOTO{$\ell_2$} to obtain $B_3$.  Proceed in a
			
 
				 similar way with the ``else'' branch to obtain $B_4$.  Finally, we
			
 
				-apply \code{explicate-pred} to the predicate of hte \code{if} and the
			
 
				+apply \code{explicate-pred} to the predicate of the \code{if} and the
			
 
				 blocks $B_3$ and $B_4$ to obtain the result $B_5$.
			
 
				 \[
			
 
				 (\key{if}\; \itm{cnd}\; \itm{thn}\; \itm{els})
			
@@ -4269,22 +4368,22 @@ approach of encoding them as integers, with true as 1 and false as 0.
 
				 For $\Stmt$, we discuss a couple cases.  The \code{not} operation can
			
 
				 be implemented in terms of \code{xorq} as we discussed at the
			
 
				 beginning of this section. Given an assignment
			
 
				-$\itm{var}$ \key{=} \key{(not} $\Arg$\key{);},
			
 
				+$\itm{var}$ \key{=} \key{(not} $\Atm$\key{);},
			
 
				 if the left-hand side $\itm{var}$ is
			
 
				-the same as $\Arg$, then just the \code{xorq} suffices.
			
 
				+the same as $\Atm$, then just the \code{xorq} suffices.
			
 
				 \[
			
 
				 \Var~\key{=}~ \key{(not}\; \Var\key{);}
			
 
				 \quad\Rightarrow\quad
			
 
				 \key{xorq}~\key{\$}1\key{,}~\Var
			
 
				 \]
			
 
				 Otherwise, a \key{movq} is needed to adapt to the update-in-place
			
 
				-semantics of x86. Let $\Arg'$ be the result of recursively processing
			
 
				-$\Arg$. Then we have
			
 
				+semantics of x86. Let $\Arg$ be the result of translating $\Atm$ to
			
 
				+x86. Then we have
			
 
				 \[
			
 
				-\Var~\key{=}~ \key{(not}\; \Arg\key{);}
			
 
				+\Var~\key{=}~ \key{(not}\; \Atm\key{);}
			
 
				 \quad\Rightarrow\quad
			
 
				 \begin{array}{l}
			
 
				-\key{movq}~\Arg'\key{,}~\Var\\
			
 
				+\key{movq}~\Arg\key{,}~\Var\\
			
 
				 \key{xorq}~\key{\$}1\key{,}~\Var
			
 
				 \end{array}
			
 
				 \]
			
@@ -4297,7 +4396,7 @@ sequence of three instructions. \\
 
				 \begin{tabular}{lll}
			
 
				 \begin{minipage}{0.4\textwidth}
			
 
				 \begin{lstlisting}
			
 
				-|$\Var$| = (eq? |$\Arg_1$| |$\Arg_2$|);
			
 
				+|$\Var$| = (eq? |$\Atm_1$| |$\Atm_2$|);
			
 
				 \end{lstlisting}
			
 
				 \end{minipage}
			
 
				 &
			
@@ -4305,7 +4404,7 @@ $\Rightarrow$
 
				 &
			
 
				 \begin{minipage}{0.4\textwidth}
			
 
				 \begin{lstlisting}
			
 
				-cmpq |$\Arg'_2$|, |$\Arg'_1$|
			
 
				+cmpq |$\Arg_2$|, |$\Arg_1$|
			
 
				 sete %al
			
 
				 movzbq %al, |$\Var$|
			
 
				 \end{lstlisting}
			
@@ -4324,7 +4423,7 @@ to a regular jump (for ``else'').\\
 
				 \begin{tabular}{lll}
			
 
				 \begin{minipage}{0.4\textwidth}
			
 
				 \begin{lstlisting}
			
 
				-if (eq? |$\Arg_1$| |$\Arg_2$|) then
			
 
				+if (eq? |$\Atm_1$| |$\Atm_2$|)
			
 
				    goto |$\ell_1$|;
			
 
				 else
			
 
				    goto |$\ell_2$|;
			
@@ -4335,7 +4434,7 @@ $\Rightarrow$
 
				 &
			
 
				 \begin{minipage}{0.4\textwidth}
			
 
				 \begin{lstlisting}
			
 
				-cmpq |$\Arg'_2$| |$\Arg'_1$|
			
 
				+cmpq |$\Arg_2$|, |$\Arg_1$|
			
 
				 je |$\ell_1$|
			
 
				 jmp |$\ell_2$|
			
 
				 \end{lstlisting}
			
@@ -4578,7 +4677,7 @@ Figure~\ref{fig:R2-passes} lists all the passes needed for the
 
				 compilation of $R_2$.
			
 
				 
			
 
				 
			
 
				-\section{Challenge: Optimize Jumps}
			
 
				+\section{Challenge: Optimize and Remove Jumps}
			
 
				 \label{sec:opt-jumps}
			
 
				 
			
 
				 Recall that in the example output of \code{explicate-control} in