Jeremy Siek 4 years ago
parent
commit
d1491b02c4
1 changed file, 165 additions and 66 deletions

+ 165 - 66
book.tex

@@ -1395,17 +1395,15 @@ procedure call. The memory layout for an individual frame is shown in
 Figure~\ref{fig:frame}.  The register \key{rsp} is called the
 \emph{stack pointer} and points to the item at the top of the
 stack. The stack grows downward in memory, so we increase the size of
-the stack by subtracting from the stack pointer. Some operating
-systems require the frame size to be a multiple of 16 bytes. In the
-context of a procedure call, the \emph{return address} is the next
-instruction after the call instruction on the caller side. During a
-function call, the return address is pushed onto the stack.  The
-register \key{rbp} is the \emph{base pointer} and is used to access
-variables associated with the current procedure call.  The base
-pointer of the caller is pushed onto the stack after the return
-address. We number the variables from $1$ to $n$. Variable $1$ is
-stored at address $-8\key{(\%rbp)}$, variable $2$ at
-$-16\key{(\%rbp)}$, etc.
+the stack by subtracting from the stack pointer.  In the context of a
+procedure call, the \emph{return address} is the address of the
+instruction that follows the call instruction on the caller side.
+During a function call, the return address is pushed onto the stack.
+The register \key{rbp}
+is the \emph{base pointer} and is used to access variables associated
+with the current procedure call.  The base pointer of the caller is
+pushed onto the stack after the return address. We number the
+variables from $1$ to $n$. Variable $1$ is stored at address
+$-8\key{(\%rbp)}$, variable $2$ at $-16\key{(\%rbp)}$, etc.
 
 
 \begin{figure}[tbp]
 \begin{lstlisting}
@@ -1448,36 +1446,49 @@ Position & Contents \\ \hline
 \label{fig:frame}
 \end{figure}
 
-Getting back to the program in Figure~\ref{fig:p1-x86}, the first
-three instructions are the typical \emph{prelude} for a procedure.
-The instruction \key{pushq \%rbp} saves the base pointer for the
-caller onto the stack and subtracts $8$ from the stack pointer. The
-second instruction \key{movq \%rsp, \%rbp} changes the base pointer so
-that it points the location of the old base pointer. The instruction
-\key{subq \$16, \%rsp} moves the stack pointer down to make enough
-room for storing variables.  This program needs one variable ($8$
-bytes) but because the frame size is required to be a multiple of 16
-bytes, the space for variables is rounded up to 16 bytes.
+
+Getting back to the program in Figure~\ref{fig:p1-x86}, consider how
+control is transferred from the operating system to the \code{main}
+function.  The operating system issues a \code{callq main} instruction
+which pushes its return address on the stack and then jumps to
+\code{main}. In x86-64, the stack pointer \code{rsp} must be a multiple
+of 16 prior to the execution of any \code{callq} instruction, so
+when control arrives at \code{main}, the \code{rsp} is 8 bytes out of
+alignment (because the \code{callq} pushed the return address).  The
+first three instructions are the typical \emph{prelude} for a
+procedure.  The instruction \code{pushq \%rbp} saves the base pointer
+for the caller onto the stack and subtracts $8$ from the stack
+pointer. At this point the stack pointer is back to being 16-byte
+aligned. The second instruction \code{movq \%rsp, \%rbp} changes the
+base pointer so that it points to the location of the old base
+pointer. The instruction \code{subq \$16, \%rsp} moves the stack
+pointer down to make enough room for storing variables.  This program
+needs one variable ($8$ bytes) but we round up to 16 bytes to maintain
+the 16-byte alignment of the \code{rsp}. With the \code{rsp} aligned,
+we are ready to make calls to other functions. The last instruction of
+the prelude is \code{jmp start}, which transfers control to the
+instructions that were generated from the Racket expression
+\code{(+ 52 (- 10))}.
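The alignment argument in this paragraph can be checked with a few lines of arithmetic. The Python sketch below is an illustration only and is not part of the book's sources (the book's compiler is written in Racket); the helper name \code{trace\_prelude} and the starting address are hypothetical. It tracks \code{rsp} through the \code{callq} and the three stack-affecting steps of the prelude.

```python
def trace_prelude(rsp_before_call, locals_bytes=16):
    """Return (step, rsp) pairs for `callq main` followed by the prelude."""
    assert rsp_before_call % 16 == 0, "rsp must be 16-byte aligned before callq"
    steps = []
    rsp = rsp_before_call - 8            # callq pushes the 8-byte return address
    steps.append(("callq main", rsp))
    rsp -= 8                             # pushq %rbp saves the caller's base pointer
    steps.append(("pushq %rbp", rsp))
    steps.append(("movq %rsp, %rbp", rsp))   # copies rsp into rbp; rsp is unchanged
    rsp -= locals_bytes                  # subq makes room for the variables
    steps.append((f"subq ${locals_bytes}, %rsp", rsp))
    return steps

# Hypothetical starting address; any multiple of 16 behaves the same way.
for step, rsp in trace_prelude(0x7FFFFFFFE000):
    print(f"{step:18} rsp = {rsp:#x}  aligned = {rsp % 16 == 0}")
```

Only the first step leaves \code{rsp} misaligned, which is why rounding the space for variables up to a multiple of 16 keeps the stack ready for further \code{callq} instructions.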
 
 
 The four instructions under the label \code{start} carry out the work
 of computing \code{(+ 52 (- 10))}. The first instruction
-\key{movq \$10, -8(\%rbp)} stores $10$ in variable $1$. The
-instruction \key{negq -8(\%rbp)} changes variable $1$ to $-10$. The
-instruction \key{movq \$52, \%rax} places $52$ in the register \key{rax} and
-finally \key{addq -8(\%rbp), \%rax} adds the contents of variable $1$ to
-\key{rax}, at which point \key{rax} contains $42$.
+\code{movq \$10, -8(\%rbp)} stores $10$ in variable $1$. The
+instruction \code{negq -8(\%rbp)} changes variable $1$ to $-10$. The
+instruction \code{movq \$52, \%rax} places $52$ in the register \code{rax} and
+finally \code{addq -8(\%rbp), \%rax} adds the contents of variable $1$ to
+\code{rax}, at which point \code{rax} contains $42$.
 
 
 The three instructions under the label \code{conclusion} are the
-typical \emph{finale} of a procedure.  The first two instructions are
-necessary to get the state of the machine back to where it was at the
-beginning of the procedure.  The instruction \key{addq \$16, \%rsp}
-moves the stack pointer back to point at the old base pointer. The
-amount added here needs to match the amount that was subtracted in the
-prelude of the procedure. Then \key{popq \%rbp} returns the old base
-pointer to \key{rbp} and adds $8$ to the stack pointer.  The last
-instruction, \key{retq}, jumps back to the procedure that called this
-one and adds 8 to the stack pointer, which returns the stack pointer
-to where it was prior to the procedure call.
+typical \emph{conclusion} of a procedure.  The first two instructions
+are necessary to get the state of the machine back to where it was at
+the beginning of the procedure.  The instruction \key{addq \$16,
+  \%rsp} moves the stack pointer back to point at the old base
+pointer. The amount added here needs to match the amount that was
+subtracted in the prelude of the procedure. Then \key{popq \%rbp}
+returns the old base pointer to \key{rbp} and adds $8$ to the stack
+pointer.  The last instruction, \key{retq}, jumps back to the
+procedure that called this one and adds 8 to the stack pointer, which
+returns the stack pointer to where it was prior to the procedure call.
 
 
 The compiler needs a convenient representation for manipulating x86
 programs, so we define an abstract syntax for x86 in
@@ -2549,7 +2560,7 @@ function call, and the callee is responsible for saving and restoring
 some other registers, the \emph{callee-saved registers}, before and
 after using them. The caller-saved registers are
 \begin{lstlisting}
-  rax rdx rcx rsi rdi r8 r9 r10 r11
+  rax rcx rdx rsi rdi r8 r9 r10 r11
 \end{lstlisting}
 while the callee-saved registers are
 \begin{lstlisting}
@@ -2775,13 +2786,23 @@ move. So we have the following three rules.
   the edge $(d,v)$ for every $v \in L_{\mathsf{after}}(k)$ unless $v =
   d$ or $v = s$.
 \end{enumerate}
-\margincomment{JM: I think you could give examples of each one of these
-  using the example program and use those to help explain why these
-  rules are correct.\\
-  JS: Agreed.}
 
 
-Working from the top to bottom of Figure~\ref{fig:live-eg}, we obtain
-the following interference for each instruction.
+Working from the top to bottom of Figure~\ref{fig:live-eg}, we apply the
+above rules to each instruction. We highlight a few of the
+instructions and then refer the reader to
+Figure~\ref{fig:interference-results} for all the interference results.
+The first instruction is \lstinline{movq $1, v}, so rule 3 applies,
+and the live-after set is $\{v\}$. We do not add any interference
+edges because the one live variable $v$ is also the destination of
+this instruction.
+%
+The second instruction is \lstinline{movq $42, w}, so rule 3 applies
+again, and the live-after set is $\{v,w\}$. So the target $w$ of
+\key{movq} interferes with $v$.
+%
+Next we skip forward to the instruction \lstinline{movq x, y}.
+
+\begin{figure}[tbp]
 \begin{quote}
 \begin{tabular}{ll}
 \lstinline{movq $1, v}& no interference by rule 3,\\
@@ -2798,6 +2819,10 @@ the following interference for each instruction.
   \lstinline{jmp conclusion}& no interference.
 \end{tabular}
 \end{quote}
+\caption{Interference results for the running example.}
+\label{fig:interference-results}
+\end{figure}
+
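To make rule 3 concrete, here is a minimal Python sketch (an illustration only; the book's implementation language is Racket, and the function names and tuple encoding are assumptions). It handles only \key{movq}, because that is the rule exercised by the two instructions highlighted above: the destination interferes with every live-after variable other than the destination itself and the source.

```python
def movq_edges(src, dst, live_after):
    """Rule 3: dst interferes with each live-after variable except dst and src."""
    return {frozenset((dst, v)) for v in live_after if v != dst and v != src}

def build_interference(instructions):
    """instructions: (opcode, src, dst, live_after) tuples; only movq is handled."""
    edges = set()
    for op, src, dst, live_after in instructions:
        if op == "movq":
            edges |= movq_edges(src, dst, live_after)
    return edges

first_two = [
    ("movq", "$1", "v", {"v"}),         # no edge: the only live variable is the destination
    ("movq", "$42", "w", {"v", "w"}),   # adds the edge between w and v
]
print(build_interference(first_two))
```

Edges are stored as unordered pairs (\code{frozenset}) because interference is symmetric.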
 The resulting interference graph is shown in
 Figure~\ref{fig:interfere}.
 
@@ -3251,15 +3276,17 @@ The prelude saved the values in \code{rbp} and \code{rsp} and the
 conclusion returned those values to \code{rbp} and \code{rsp}.  The
 reason for this is that our \code{main} function must adhere to the
 x86 calling conventions that we described in
-Section~\ref{sec:calling-conventions}. In addition, the \code{main}
-function needs to restore (in the conclusion) any callee-saved
-registers that get used during register allocation. The simplest
+Section~\ref{sec:calling-conventions}.  Furthermore, if your register
+allocator assigned variables to other callee-saved registers
+(e.g., \code{rbx} or \code{r12}), then those registers must also be
+saved to the stack in the prelude and restored in the conclusion.  The simplest
 approach is to save and restore all of the callee-saved registers. The
 more efficient approach is to keep track of which callee-saved
 registers were used and only save and restore them. Either way, make
 sure to take this use of stack space into account when you are
-calculating the size of the frame. Also, don't forget that the size of
-the frame needs to be a multiple of 16 bytes.
+calculating the size of the frame and adjusting the \code{rsp} in the
+prelude. Also, don't forget that the size of the frame needs to be a
+multiple of 16 bytes!
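One way to carry out this calculation is sketched below in Python (an illustration, not the book's code). It assumes that every spilled variable and every saved register occupies 8 bytes and that the callee-saved registers are pushed immediately after \code{rbp}, so the amount subtracted from \code{rsp} is the aligned frame size minus the bytes already pushed. The names \code{align} and \code{rsp\_adjustment} are hypothetical.

```python
def align(n, alignment=16):
    """Round n up to the nearest multiple of alignment."""
    return (n + alignment - 1) // alignment * alignment

def rsp_adjustment(num_spills, num_callee_saved):
    """Bytes to subtract from rsp after pushing rbp and the callee-saved registers."""
    pushed = 8 * num_callee_saved             # bytes pushed after rbp
    frame = align(8 * num_spills + pushed)    # total space below rbp, a multiple of 16
    return frame - pushed

print(rsp_adjustment(1, 0))   # one variable, no callee-saved registers in use
print(rsp_adjustment(1, 1))   # one spilled variable, one callee-saved register
```

The first call corresponds to a \code{main} with a single variable and no saved registers, the second to a \code{main} that pushes one callee-saved register and spills one variable.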
 
 
 
 
 \section{Challenge: Move Biasing}
@@ -3313,10 +3340,12 @@ jmp conclusion
 \end{lstlisting}
 \end{minipage}
 
-While this allocation is quite good, we could do better. For example,
-the variables \key{x} and \key{y} ended up in different registers, but
-if they had been placed in the same register, then the move from
-\key{x} to \key{y} could be removed.
+In the above output code there are two \key{movq} instructions that
+can be removed because their source and target are the same.  However,
+if we had put \key{t}, \key{v}, \key{x}, and \key{y} into the same
+register, we could instead remove three \key{movq} instructions.  We
+can accomplish this by taking into account which variables appear in
+\key{movq} instructions with which other variables.
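The effect described here can be measured by counting the \key{movq} instructions whose source and target end up in the same location. The Python sketch below is illustrative only: the instruction fragment and the two assignments are hypothetical (they are not the program from the book's figure), and \code{removable\_moves} is an assumed helper name.

```python
def removable_moves(instructions, home):
    """Return the movq instructions whose source and target share a home."""
    trivial = []
    for op, src, dst in instructions:
        # Immediates and registers are not keys of `home`, so they map to themselves.
        if op == "movq" and home.get(src, src) == home.get(dst, dst):
            trivial.append((op, src, dst))
    return trivial

# Hypothetical fragment and register assignments, for illustration only.
fragment = [
    ("movq", "$1", "v"),
    ("movq", "v", "x"),
    ("addq", "$7", "x"),
    ("movq", "x", "y"),
]
separate = {"v": "%rbx", "x": "%rbx", "y": "%rcx"}
shared = {"v": "%rbx", "x": "%rbx", "y": "%rbx"}
print(len(removable_moves(fragment, separate)), len(removable_moves(fragment, shared)))
```

Placing more move-related variables in one register makes more moves trivial, which is the goal of move biasing.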
 
 
 We say that two variables $p$ and $q$ are \emph{move related} if they
 participate together in a \key{movq} instruction, that is, \key{movq}
@@ -3503,6 +3532,76 @@ programs to make sure that your move biasing is working properly.
 \margincomment{\footnotesize To do: another neat challenge would be to do
   live range splitting~\citep{Cooper:1998ly}. \\ --Jeremy}
 
+\section{Output of the Running Example}
+\label{sec:reg-alloc-output}
+
+Figure~\ref{fig:running-example-x86} shows the x86 code generated for
+the running example (Figure~\ref{fig:reg-eg}) with register allocation
+and move biasing. To demonstrate both the use of registers and the
+stack, we have limited the register allocator to use just two
+registers: \code{rbx} and \code{rcx}.  In the prelude of the
+\code{main} function, we push \code{rbx} onto the stack because it is
+a callee-saved register and it was assigned to a variable by the
+register allocator.  We subtract \code{8} from the \code{rsp} at the
+end of the prelude to reserve space for the one spilled variable.
+After that subtraction, the \code{rsp} is aligned to 16 bytes.
+
+Moving on to the \code{start} block, we see how the registers were
+allocated. Variables \code{v}, \code{x}, and \code{y} were assigned to
+\code{rbx} and variable \code{z} was assigned to \code{rcx}.  Variable
+\code{w} was spilled to the stack location \code{-16(\%rbp)}.  Recall
+that the prelude saved the callee-saved register \code{rbx} onto the
+stack. The spilled variables must be placed lower on the stack than
+the saved callee-saved registers, so in this case \code{w} is placed at
+\code{-16(\%rbp)}.
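The placement rule can be written as a small offset calculation. This Python sketch is an illustration under two assumptions: each saved register and each spilled variable takes 8 bytes, and the callee-saved registers are pushed directly below the saved \code{rbp}. The names \code{spill\_offset} and \code{spill\_operand} are hypothetical.

```python
def spill_offset(spill_index, num_callee_saved):
    """Offset from rbp of the spill_index-th spilled variable (0-based)."""
    return -8 * (num_callee_saved + spill_index + 1)

def spill_operand(spill_index, num_callee_saved):
    """Format the stack location as an x86 memory operand."""
    return f"{spill_offset(spill_index, num_callee_saved)}(%rbp)"

# One callee-saved register (rbx) was pushed, so the first spill sits below it.
print(spill_operand(0, 1))
```

With no callee-saved registers in use, the same formula places the first variable at \code{-8(\%rbp)}, matching the frame layout described earlier in the chapter.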
+
+In the \code{conclusion}, we undo the work that was done in the
+prelude. We move the stack pointer up by \code{8} bytes (the room for
+spilled variables), then we pop the old values of \code{rbx} and
+\code{rbp} (callee-saved registers), and finish with \code{retq} to
+return control to the operating system.
+
+  
+\begin{figure}[tbp]
+  % s0_28.rkt
+  % (use-minimal-set-of-registers! #t)
+  % and only rbx rcx
+% tmp 0
+% z 1  rcx
+% y 0  rbx
+% w 2  -16(%rbp)
+% v 0  rbx
+% x 0  rbx
+\begin{lstlisting}
+start:
+	movq	$1, %rbx
+	movq	$42, -16(%rbp)
+	addq	$7, %rbx
+	movq	%rbx, %rcx
+	addq	-16(%rbp), %rcx
+	negq	%rbx
+	movq	%rcx, %rax
+	addq	%rbx, %rax
+	jmp conclusion
+
+	.globl main
+main:
+	pushq	%rbp
+	movq	%rsp, %rbp
+	pushq	%rbx
+	subq	$8, %rsp
+	jmp start
+conclusion:
+	addq	$8, %rsp
+	popq	%rbx
+	popq	%rbp
+	retq
+\end{lstlisting}
+\caption{The x86 output from the running example (Figure~\ref{fig:reg-eg}).}
+\label{fig:running-example-x86}
+\end{figure}
+
+
 
 
 
 
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
@@ -3827,7 +3926,7 @@ To implement the new logical operations, the comparison operations,
 and the \key{if} expression, we need to delve further into the x86
 language. Figure~\ref{fig:x86-1} defines the abstract syntax for a
 larger subset of x86 that includes instructions for logical
-operations, comparisons, and jumps.
+operations, comparisons, and conditional jumps.
 
 One small challenge is that x86 does not provide an instruction that
 directly implements logical negation (\code{not} in $R_2$ and $C_1$).
@@ -4216,7 +4315,7 @@ less-than comparison is as follows.
 \[
 (\key{<}~e_1~e_2) \quad\Rightarrow\quad
 \begin{array}{l}
-\key{if}~(\key{<}~e_1~e_2)~\key{then} \\
+\key{if}~(\key{<}~e_1~e_2) \\
 \qquad\key{goto}~\ell_1\key{;}\\
 \key{else}\\
 \qquad\key{goto}~\ell_2\key{;}
@@ -4233,7 +4332,7 @@ current one, that is, predicate context. So we apply
 \code{explicate-pred} to the ``then'' branch with the two blocks
 \GOTO{$\ell_1$} and \GOTO{$\ell_2$} to obtain $B_3$.  Proceed in a
 similar way with the ``else'' branch to obtain $B_4$.  Finally, we
-apply \code{explicate-pred} to the predicate of hte \code{if} and the
+apply \code{explicate-pred} to the predicate of the \code{if} and the
 blocks $B_3$ and $B_4$ to obtain the result $B_5$.
 \[
 (\key{if}\; \itm{cnd}\; \itm{thn}\; \itm{els})
@@ -4269,22 +4368,22 @@ approach of encoding them as integers, with true as 1 and false as 0.
 For $\Stmt$, we discuss a couple cases.  The \code{not} operation can
 be implemented in terms of \code{xorq} as we discussed at the
 beginning of this section. Given an assignment
-$\itm{var}$ \key{=} \key{(not} $\Arg$\key{);},
+$\itm{var}$ \key{=} \key{(not} $\Atm$\key{);},
 if the left-hand side $\itm{var}$ is
-the same as $\Arg$, then just the \code{xorq} suffices.
+the same as $\Atm$, then just the \code{xorq} suffices.
 \[
 \Var~\key{=}~ \key{(not}\; \Var\key{);}
 \quad\Rightarrow\quad
 \key{xorq}~\key{\$}1\key{,}~\Var
 \]
 Otherwise, a \key{movq} is needed to adapt to the update-in-place
-semantics of x86. Let $\Arg'$ be the result of recursively processing
-$\Arg$. Then we have
+semantics of x86. Let $\Arg$ be the result of translating $\Atm$ to
+x86. Then we have
 \[
-\Var~\key{=}~ \key{(not}\; \Arg\key{);}
+\Var~\key{=}~ \key{(not}\; \Atm\key{);}
 \quad\Rightarrow\quad
 \begin{array}{l}
-\key{movq}~\Arg'\key{,}~\Var\\
+\key{movq}~\Arg\key{,}~\Var\\
 \key{xorq}~\key{\$}1\key{,}~\Var
 \end{array}
 \]
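The two cases for \code{not} can be sketched as a single function. The Python below is an illustration only (the book's passes are written in Racket); instructions are modeled as plain tuples, operands as strings, and the name \code{select\_not} is hypothetical.

```python
def select_not(var, arg):
    """Translate `var = (not atm);` where arg is the x86 operand for atm."""
    if arg == var:
        # The operand is already in place, so flipping the low bit suffices.
        return [("xorq", "$1", var)]
    # Otherwise copy the operand first, then flip the bit in place.
    return [("movq", arg, var), ("xorq", "$1", var)]

print(select_not("x", "x"))
print(select_not("x", "y"))
```

This relies on the encoding of true as 1 and false as 0, so that \code{xorq} with 1 toggles between the two values.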
@@ -4297,7 +4396,7 @@ sequence of three instructions. \\
 \begin{tabular}{lll}
 \begin{minipage}{0.4\textwidth}
 \begin{lstlisting}
-|$\Var$| = (eq? |$\Arg_1$| |$\Arg_2$|);
+|$\Var$| = (eq? |$\Atm_1$| |$\Atm_2$|);
 \end{lstlisting}
 \end{minipage}
 &
@@ -4305,7 +4404,7 @@ $\Rightarrow$
 &
 \begin{minipage}{0.4\textwidth}
 \begin{lstlisting}
-cmpq |$\Arg'_2$|, |$\Arg'_1$|
+cmpq |$\Arg_2$|, |$\Arg_1$|
 sete %al
 movzbq %al, |$\Var$|
 \end{lstlisting}
@@ -4324,7 +4423,7 @@ to a regular jump (for ``else'').\\
 \begin{tabular}{lll}
 \begin{minipage}{0.4\textwidth}
 \begin{lstlisting}
-if (eq? |$\Arg_1$| |$\Arg_2$|) then
+if (eq? |$\Atm_1$| |$\Atm_2$|)
    goto |$\ell_1$|;
 else
    goto |$\ell_2$|;
@@ -4335,7 +4434,7 @@ $\Rightarrow$
 &
 \begin{minipage}{0.4\textwidth}
 \begin{lstlisting}
-cmpq |$\Arg'_2$| |$\Arg'_1$|
+cmpq |$\Arg_2$|, |$\Arg_1$|
 je |$\ell_1$|
 jmp |$\ell_2$|
 \end{lstlisting}
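The two \code{eq?} translations differ only in what follows the \code{cmpq}. The Python sketch below illustrates both (it is not the book's Racket code; the function names, the tuple encoding, and the labels in the example are assumptions). Note that the second argument comes first in the \code{cmpq}, as in the translations above.

```python
def select_eq_assign(var, arg1, arg2):
    """var = (eq? atm1 atm2);  ->  compare, set the byte register, zero-extend."""
    return [("cmpq", arg2, arg1), ("sete", "%al"), ("movzbq", "%al", var)]

def select_eq_branch(arg1, arg2, then_label, else_label):
    """if (eq? atm1 atm2) goto l1; else goto l2;  ->  compare, je, jmp."""
    return [("cmpq", arg2, arg1), ("je", then_label), ("jmp", else_label)]

def emit(instructions):
    """Render instruction tuples as assembly text, one instruction per line."""
    return "\n".join(f"{op} " + ", ".join(operands) for op, *operands in instructions)

# Hypothetical operands and labels.
print(emit(select_eq_assign("t", "x", "$0")))
print(emit(select_eq_branch("x", "$0", "block1", "block2")))
```

In the predicate position the result of the comparison is consumed directly by the conditional jump, so no \code{sete} or \code{movzbq} is needed.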
@@ -4578,7 +4677,7 @@ Figure~\ref{fig:R2-passes} lists all the passes needed for the
 compilation of $R_2$.
 
 
-\section{Challenge: Optimize Jumps}
+\section{Challenge: Optimize and Remove Jumps}
 \label{sec:opt-jumps}
 
 Recall that in the example output of \code{explicate-control} in