Jeremy Siek 4 years ago
parent
commit
d1491b02c4
1 file changed, with 165 additions and 66 deletions
book.tex

@@ -1395,17 +1395,15 @@ procedure call. The memory layout for an individual frame is shown in
 Figure~\ref{fig:frame}.  The register \key{rsp} is called the
 \emph{stack pointer} and points to the item at the top of the
 stack. The stack grows downward in memory, so we increase the size of
-the stack by subtracting from the stack pointer. Some operating
-systems require the frame size to be a multiple of 16 bytes. In the
-context of a procedure call, the \emph{return address} is the next
-instruction after the call instruction on the caller side. During a
-function call, the return address is pushed onto the stack.  The
-register \key{rbp} is the \emph{base pointer} and is used to access
-variables associated with the current procedure call.  The base
-pointer of the caller is pushed onto the stack after the return
-address. We number the variables from $1$ to $n$. Variable $1$ is
-stored at address $-8\key{(\%rbp)}$, variable $2$ at
-$-16\key{(\%rbp)}$, etc.
+the stack by subtracting from the stack pointer.  In the context of a
+procedure call, the \emph{return address} is the next instruction
+after the call instruction on the caller side. During a function call,
+the return address is pushed onto the stack.  The register \key{rbp}
+is the \emph{base pointer} and is used to access variables associated
+with the current procedure call.  The base pointer of the caller is
+pushed onto the stack after the return address. We number the
+variables from $1$ to $n$. Variable $1$ is stored at address
+$-8\key{(\%rbp)}$, variable $2$ at $-16\key{(\%rbp)}$, etc.
 
 \begin{figure}[tbp]
 \begin{lstlisting}
@@ -1448,36 +1446,49 @@ Position & Contents \\ \hline
 \label{fig:frame}
 \end{figure}
 
-Getting back to the program in Figure~\ref{fig:p1-x86}, the first
-three instructions are the typical \emph{prelude} for a procedure.
-The instruction \key{pushq \%rbp} saves the base pointer for the
-caller onto the stack and subtracts $8$ from the stack pointer. The
-second instruction \key{movq \%rsp, \%rbp} changes the base pointer so
-that it points the location of the old base pointer. The instruction
-\key{subq \$16, \%rsp} moves the stack pointer down to make enough
-room for storing variables.  This program needs one variable ($8$
-bytes) but because the frame size is required to be a multiple of 16
-bytes, the space for variables is rounded up to 16 bytes.
+
+Getting back to the program in Figure~\ref{fig:p1-x86}, consider how
+control is transferred from the operating system to the \code{main}
+function.  The operating system issues a \code{callq main} instruction
+which pushes its return address on the stack and then jumps to
+\code{main}. In x86-64, the stack pointer \code{rsp} must be divisible
+by 16 bytes prior to the execution of any \code{callq} instruction, so
+when control arrives at \code{main}, the \code{rsp} is 8 bytes out of
+alignment (because the \code{callq} pushed the return address).  The
+first three instructions are the typical \emph{prelude} for a
+procedure.  The instruction \code{pushq \%rbp} saves the base pointer
+for the caller onto the stack and subtracts $8$ from the stack
+pointer. At this point the stack pointer is back to being 16-byte
+aligned. The second instruction \code{movq \%rsp, \%rbp} changes the
+base pointer so that it points to the location of the old base
+pointer. The instruction \code{subq \$16, \%rsp} moves the stack
+pointer down to make enough room for storing variables.  This program
+needs one variable ($8$ bytes) but we round up to 16 bytes to maintain
+the 16-byte alignment of the \code{rsp}. With the \code{rsp} aligned,
+we are ready to make calls to other functions. The last instruction of
+the prelude is \code{jmp start}, which transfers control to the
+instructions that were generated from the Racket expression
+\code{(+ 52 (- 10))}.
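+
+As an illustrative check of this alignment argument, suppose that the
+\code{rsp} holds the address $16k$ just before the operating system
+executes \code{callq main} (the symbol $k$ is introduced here only for
+illustration). The \code{rsp} then takes the following values.
+\begin{quote}
+\begin{tabular}{lll}
+Instruction & \code{rsp} afterwards & Aligned? \\ \hline
+\code{callq main} & $16k - 8$ & no \\
+\code{pushq \%rbp} & $16k - 16$ & yes \\
+\code{movq \%rsp, \%rbp} & $16k - 16$ & yes \\
+\code{subq \$16, \%rsp} & $16k - 32$ & yes
+\end{tabular}
+\end{quote}
+Had the prelude subtracted only $8$ for the one variable, the
+\code{rsp} would be $16k - 24$, which is not a multiple of $16$.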
 
 The four instructions under the label \code{start} carry out the work
 of computing \code{(+ 52 (- 10))}. The first instruction
-\key{movq \$10, -8(\%rbp)} stores $10$ in variable $1$. The
-instruction \key{negq -8(\%rbp)} changes variable $1$ to $-10$. The
-instruction \key{movq \$52, \%rax} places $52$ in the register \key{rax} and
-finally \key{addq -8(\%rbp), \%rax} adds the contents of variable $1$ to
-\key{rax}, at which point \key{rax} contains $42$.
+\code{movq \$10, -8(\%rbp)} stores $10$ in variable $1$. The
+instruction \code{negq -8(\%rbp)} changes variable $1$ to $-10$. The
+instruction \code{movq \$52, \%rax} places $52$ in the register \code{rax} and
+finally \code{addq -8(\%rbp), \%rax} adds the contents of variable $1$ to
+\code{rax}, at which point \code{rax} contains $42$.
 
 The three instructions under the label \code{conclusion} are the
-typical \emph{finale} of a procedure.  The first two instructions are
-necessary to get the state of the machine back to where it was at the
-beginning of the procedure.  The instruction \key{addq \$16, \%rsp}
-moves the stack pointer back to point at the old base pointer. The
-amount added here needs to match the amount that was subtracted in the
-prelude of the procedure. Then \key{popq \%rbp} returns the old base
-pointer to \key{rbp} and adds $8$ to the stack pointer.  The last
-instruction, \key{retq}, jumps back to the procedure that called this
-one and adds 8 to the stack pointer, which returns the stack pointer
-to where it was prior to the procedure call.
+typical \emph{conclusion} of a procedure.  The first two instructions
+are necessary to get the state of the machine back to where it was at
+the beginning of the procedure.  The instruction \key{addq \$16,
+  \%rsp} moves the stack pointer back to point at the old base
+pointer. The amount added here needs to match the amount that was
+subtracted in the prelude of the procedure. Then \key{popq \%rbp}
+returns the old base pointer to \key{rbp} and adds $8$ to the stack
+pointer.  The last instruction, \key{retq}, jumps back to the
+procedure that called this one and adds 8 to the stack pointer, which
+returns the stack pointer to where it was prior to the procedure call.
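+
+As an illustrative check that the conclusion mirrors the prelude,
+suppose that the \key{rsp} held the address $16k$ just before the
+operating system executed \code{callq main} (the symbol $k$ is
+introduced here only for illustration), so that it holds $16k - 32$
+when control reaches \code{conclusion}.
+\begin{quote}
+\begin{tabular}{ll}
+Instruction & \key{rsp} afterwards \\ \hline
+\key{addq \$16, \%rsp} & $16k - 16$ \\
+\key{popq \%rbp} & $16k - 8$ \\
+\key{retq} & $16k$
+\end{tabular}
+\end{quote}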
 
 The compiler needs a convenient representation for manipulating x86
 programs, so we define an abstract syntax for x86 in
@@ -2549,7 +2560,7 @@ function call, and the callee is responsible for saving and restoring
 some other registers, the \emph{callee-saved registers}, before and
 after using them. The caller-saved registers are
 \begin{lstlisting}
-  rax rdx rcx rsi rdi r8 r9 r10 r11
+  rax rcx rdx rsi rdi r8 r9 r10 r11
 \end{lstlisting}
 while the callee-saved registers are
 \begin{lstlisting}
@@ -2775,13 +2786,23 @@ move. So we have the following three rules.
   the edge $(d,v)$ for every $v \in L_{\mathsf{after}}(k)$ unless $v =
   d$ or $v = s$.
 \end{enumerate}
-\margincomment{JM: I think you could give examples of each one of these
-  using the example program and use those to help explain why these
-  rules are correct.\\
-  JS: Agreed.}
 
-Working from the top to bottom of Figure~\ref{fig:live-eg}, we obtain
-the following interference for each instruction.
+Working from the top to bottom of Figure~\ref{fig:live-eg}, apply the
+above rules to each instruction. We highlight a few of the
+instructions and then refer the reader to
+Figure~\ref{fig:interference-results} for all the interference results.
+The first instruction is \lstinline{movq $1, v}, so rule 3 applies,
+and the live-after set is $\{v\}$. We do not add any interference
+edges because the one live variable $v$ is also the destination of
+this instruction.
+%
+The second instruction is \lstinline{movq $42, w}, so rule 3 applies
+again, and the live-after set is $\{v,w\}$. So the target $w$ of
+\key{movq} interferes with $v$.
+%
+Next we skip forward to the instruction \lstinline{movq x, y}.
+
+\begin{figure}[tbp]
 \begin{quote}
 \begin{tabular}{ll}
 \lstinline{movq $1, v}& no interference by rule 3,\\
@@ -2798,6 +2819,10 @@ the following interference for each instruction.
   \lstinline{jmp conclusion}& no interference.
 \end{tabular}
 \end{quote}
+\caption{Interference results for the running example.}
+\label{fig:interference-results}
+\end{figure}
+
 The resulting interference graph is shown in
 Figure~\ref{fig:interfere}.
 
@@ -3251,15 +3276,17 @@ The prelude saved the values in \code{rbp} and \code{rsp} and the
 conclusion returned those values to \code{rbp} and \code{rsp}.  The
 reason for this is that our \code{main} function must adhere to the
 x86 calling conventions that we described in
-Section~\ref{sec:calling-conventions}. In addition, the \code{main}
-function needs to restore (in the conclusion) any callee-saved
-registers that get used during register allocation. The simplest
+Section~\ref{sec:calling-conventions}.  Furthermore, if your register
+allocator assigned variables to other callee-saved registers
+(e.g., \code{rbx} or \code{r12}), then those registers must also be
+saved to the stack in the prelude and restored in the conclusion.  The simplest
 approach is to save and restore all of the callee-saved registers. The
 more efficient approach is to keep track of which callee-saved
 registers were used and only save and restore them. Either way, make
 sure to take this use of stack space into account when you are
-calculating the size of the frame. Also, don't forget that the size of
-the frame needs to be a multiple of 16 bytes.
+calculating the size of the frame and adjusting the \code{rsp} in the
+prelude. Also, don't forget that the size of the frame needs to be a
+multiple of 16 bytes!
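+
+One way to carry out this calculation, offered as a sketch rather
+than a prescribed method, is the following. Let $S$ be the number of
+spilled variables and $C$ the number of callee-saved registers, not
+counting \code{rbp}, that the prelude pushes (the names $S$, $C$, and
+$\itm{align}$ are introduced here only for illustration). Each push
+already lowers the \code{rsp} by $8$ bytes, so the amount that remains
+to be subtracted from the \code{rsp} is
+\[
+\itm{align}(8S + 8C) - 8C
+\qquad\mbox{where}\qquad
+\itm{align}(n) = 16 \left\lceil n / 16 \right\rceil .
+\]
+For example, $S = 1$ and $C = 1$ gives $16 - 8 = 8$, whereas $S = 1$
+and $C = 0$ gives $16 - 0 = 16$.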
 
 
 \section{Challenge: Move Biasing}
@@ -3313,10 +3340,12 @@ jmp conclusion
 \end{lstlisting}
 \end{minipage}
 
-While this allocation is quite good, we could do better. For example,
-the variables \key{x} and \key{y} ended up in different registers, but
-if they had been placed in the same register, then the move from
-\key{x} to \key{y} could be removed.
+In the above output code there are two \key{movq} instructions that
+can be removed because their source and target are the same.  However,
+if we had put \key{t}, \key{v}, \key{x}, and \key{y} into the same
+register, we could instead remove three \key{movq} instructions.  We
+can accomplish this by taking into account which variables appear in
+\key{movq} instructions with which other variables.
 
 We say that two variables $p$ and $q$ are \emph{move related} if they
 participate together in a \key{movq} instruction, that is, \key{movq}
@@ -3503,6 +3532,76 @@ programs to make sure that your move biasing is working properly.
 \margincomment{\footnotesize To do: another neat challenge would be to do
   live range splitting~\citep{Cooper:1998ly}. \\ --Jeremy}
 
+\section{Output of the Running Example}
+\label{sec:reg-alloc-output}
+
+Figure~\ref{fig:running-example-x86} shows the x86 code generated for
+the running example (Figure~\ref{fig:reg-eg}) with register allocation
+and move biasing. To demonstrate both the use of registers and the
+stack, we have limited the register allocator to use just two
+registers: \code{rbx} and \code{rcx}.  In the prelude of the
+\code{main} function, we push \code{rbx} onto the stack because it is
+a callee-saved register and it was assigned to a variable by the
+register allocator.  We subtract \code{8} from the \code{rsp} at the
+end of the prelude to reserve space for the one spilled variable.
+After that subtraction, the \code{rsp} is aligned to 16 bytes.
+
+Moving on to the \code{start} block, we see how the registers were
+allocated. Variables \code{v}, \code{x}, and \code{y} were assigned to
+\code{rbx} and variable \code{z} was assigned to \code{rcx}.  Variable
+\code{w} was spilled to the stack location \code{-16(\%rbp)}.  Recall
+that the prelude saved the callee-saved register \code{rbx} onto the
+stack. The spilled variables must be placed lower on the stack than
+the saved callee-saved registers, so in this case \code{w} is placed at
+\code{-16(\%rbp)}.
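+
+Stated as a formula, under the assumption that the $C$ callee-saved
+registers are pushed immediately after \code{rbp} and that the spilled
+variables are numbered from $1$ (both $C$ and this numbering are
+introduced here only for illustration), spilled variable $i$ is stored
+at
+\[
+-8(C + i)\key{(\%rbp)}
+\]
+so with $C = 1$ and \code{w} as spilled variable $1$ we obtain
+$-16\key{(\%rbp)}$.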
+
+In the \code{conclusion}, we undo the work that was done in the
+prelude. We move the stack pointer up by \code{8} bytes (the room for
+spilled variables), then we pop the old values of \code{rbx} and
+\code{rbp} (callee-saved registers), and finish with \code{retq} to
+return control to the operating system.
+
+  
+\begin{figure}[tbp]
+  % s0_28.rkt
+  % (use-minimal-set-of-registers! #t)
+  % and only rbx rcx
+% tmp 0
+% z 1  rcx
+% y 0  rbx
+% w 2  -16(%rbp)
+% v 0  rbx
+% x 0  rbx
+\begin{lstlisting}
+start:
+	movq	$1, %rbx
+	movq	$42, -16(%rbp)
+	addq	$7, %rbx
+	movq	%rbx, %rcx
+	addq	-16(%rbp), %rcx
+	negq	%rbx
+	movq	%rcx, %rax
+	addq	%rbx, %rax
+	jmp conclusion
+
+	.globl main
+main:
+	pushq	%rbp
+	movq	%rsp, %rbp
+	pushq	%rbx
+	subq	$8, %rsp
+	jmp start
+conclusion:
+	addq	$8, %rsp
+	popq	%rbx
+	popq	%rbp
+	retq
+\end{lstlisting}
+\caption{The x86 output from the running example (Figure~\ref{fig:reg-eg}).}
+\label{fig:running-example-x86}
+\end{figure}
+
+
 
 
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
@@ -3827,7 +3926,7 @@ To implement the new logical operations, the comparison operations,
 and the \key{if} expression, we need to delve further into the x86
 language. Figure~\ref{fig:x86-1} defines the abstract syntax for a
 larger subset of x86 that includes instructions for logical
-operations, comparisons, and jumps.
+operations, comparisons, and conditional jumps.
 
 One small challenge is that x86 does not provide an instruction that
 directly implements logical negation (\code{not} in $R_2$ and $C_1$).
@@ -4216,7 +4315,7 @@ less-than comparison is as follows.
 \[
 (\key{<}~e_1~e_2) \quad\Rightarrow\quad
 \begin{array}{l}
-\key{if}~(\key{<}~e_1~e_2)~\key{then} \\
+\key{if}~(\key{<}~e_1~e_2) \\
 \qquad\key{goto}~\ell_1\key{;}\\
 \key{else}\\
 \qquad\key{goto}~\ell_2\key{;}
@@ -4233,7 +4332,7 @@ current one, that is, predicate context. So we apply
 \code{explicate-pred} to the ``then'' branch with the two blocks
 \GOTO{$\ell_1$} and \GOTO{$\ell_2$} to obtain $B_3$.  Proceed in a
 similar way with the ``else'' branch to obtain $B_4$.  Finally, we
-apply \code{explicate-pred} to the predicate of hte \code{if} and the
+apply \code{explicate-pred} to the predicate of the \code{if} and the
 blocks $B_3$ and $B_4$ to obtain the result $B_5$.
 \[
 (\key{if}\; \itm{cnd}\; \itm{thn}\; \itm{els})
@@ -4269,22 +4368,22 @@ approach of encoding them as integers, with true as 1 and false as 0.
 For $\Stmt$, we discuss a couple cases.  The \code{not} operation can
 be implemented in terms of \code{xorq} as we discussed at the
 beginning of this section. Given an assignment
-$\itm{var}$ \key{=} \key{(not} $\Arg$\key{);},
+$\itm{var}$ \key{=} \key{(not} $\Atm$\key{);},
 if the left-hand side $\itm{var}$ is
-the same as $\Arg$, then just the \code{xorq} suffices.
+the same as $\Atm$, then just the \code{xorq} suffices.
 \[
 \Var~\key{=}~ \key{(not}\; \Var\key{);}
 \quad\Rightarrow\quad
 \key{xorq}~\key{\$}1\key{,}~\Var
 \]
 Otherwise, a \key{movq} is needed to adapt to the update-in-place
-semantics of x86. Let $\Arg'$ be the result of recursively processing
-$\Arg$. Then we have
+semantics of x86. Let $\Arg$ be the result of translating $\Atm$ to
+x86. Then we have
 \[
-\Var~\key{=}~ \key{(not}\; \Arg\key{);}
+\Var~\key{=}~ \key{(not}\; \Atm\key{);}
 \quad\Rightarrow\quad
 \begin{array}{l}
-\key{movq}~\Arg'\key{,}~\Var\\
+\key{movq}~\Arg\key{,}~\Var\\
 \key{xorq}~\key{\$}1\key{,}~\Var
 \end{array}
 \]
@@ -4297,7 +4396,7 @@ sequence of three instructions. \\
 \begin{tabular}{lll}
 \begin{minipage}{0.4\textwidth}
 \begin{lstlisting}
-|$\Var$| = (eq? |$\Arg_1$| |$\Arg_2$|);
+|$\Var$| = (eq? |$\Atm_1$| |$\Atm_2$|);
 \end{lstlisting}
 \end{minipage}
 &
@@ -4305,7 +4404,7 @@ $\Rightarrow$
 &
 \begin{minipage}{0.4\textwidth}
 \begin{lstlisting}
-cmpq |$\Arg'_2$|, |$\Arg'_1$|
+cmpq |$\Arg_2$|, |$\Arg_1$|
 sete %al
 movzbq %al, |$\Var$|
 \end{lstlisting}
@@ -4324,7 +4423,7 @@ to a regular jump (for ``else'').\\
 \begin{tabular}{lll}
 \begin{minipage}{0.4\textwidth}
 \begin{lstlisting}
-if (eq? |$\Arg_1$| |$\Arg_2$|) then
+if (eq? |$\Atm_1$| |$\Atm_2$|)
    goto |$\ell_1$|;
 else
    goto |$\ell_2$|;
@@ -4335,7 +4434,7 @@ $\Rightarrow$
 &
 \begin{minipage}{0.4\textwidth}
 \begin{lstlisting}
-cmpq |$\Arg'_2$| |$\Arg'_1$|
+cmpq |$\Arg_2$|, |$\Arg_1$|
 je |$\ell_1$|
 jmp |$\ell_2$|
 \end{lstlisting}
@@ -4578,7 +4677,7 @@ Figure~\ref{fig:R2-passes} lists all the passes needed for the
 compilation of $R_2$.
 
 
-\section{Challenge: Optimize Jumps}
+\section{Challenge: Optimize and Remove Jumps}
 \label{sec:opt-jumps}
 
 Recall that in the example output of \code{explicate-control} in