uniform notion of block, introduce basic block terminology in the right places

Jeremy Siek, 3 years ago
Parent
commit
1e89e2c626
1 file changed, with 31 additions and 23 deletions
book.tex (+31 -23)

@@ -2587,9 +2587,9 @@ Figure~\ref{fig:x86-int-ast}. We refer to this language as
 The main difference compared to the concrete syntax of \LangXInt{}
 (Figure~\ref{fig:x86-int-concrete}) is that labels are not allowed in
 front of every instruction. Instead instructions are grouped into
-\emph{blocks}\index{subject}{block} with a
-label associated with every block, which is why the \key{X86Program}
-struct includes an alist mapping labels to blocks. The reason for this
+\emph{basic blocks}\index{subject}{basic block} with a
+label associated with every basic block, which is why the \key{X86Program}
+struct includes an alist mapping labels to basic blocks. The reason for this
 organization becomes apparent in Chapter~\ref{ch:Lif} when we
 introduce conditional branching. The \code{Block} structure includes
 an $\itm{info}$ field that is not needed for this chapter but becomes
@@ -2741,9 +2741,8 @@ Our compiler for \LangVar{} consists of the following passes.
 {\if\edition\racketEd
 \item[\key{explicate\_control}] makes the execution order of the
   program explicit. It converts the abstract syntax tree
-  representation into a graph in which each node contains a sequence
-  of statements and the edges between nodes say which nodes contain
-  jumps to other nodes.
+  representation into a graph in which each node is a labeled sequence
+  of statements and the edges are \code{goto} statements.
 \fi}
 
 \item[\key{select\_instructions}] handles the difference between
@@ -7285,6 +7284,7 @@ making it straightforward to compile \code{if} statements to x86.  The
 \key{CProgram} construct contains an alist mapping labels to $\Tail$
 expressions. A \code{goto} statement transfers control to the $\Tail$
 expression corresponding to its label.
+%
 Figure~\ref{fig:c1-concrete-syntax} defines the concrete syntax of the
 \LangCIf{} intermediate language and Figure~\ref{fig:c1-syntax}
 defines its abstract syntax.
@@ -7308,13 +7308,14 @@ statement to finish the program with a specified value.
 %
 The \key{CProgram} construct contains a dictionary mapping labels to
 lists of statements that end with a \code{return} statement, a
-\code{goto}, or a conditional \code{goto}.  Statement lists of this
-form are called \emph{basic blocks}\index{subject}{basic block}: there
-is a control transfer at the end and control only enters at the
-beginning of the list, which is marked by the label.
+\code{goto}, or a conditional \code{goto}.
+%% Statement lists of this
+%% form are called \emph{basic blocks}\index{subject}{basic block}: there
+%% is a control transfer at the end and control only enters at the
+%% beginning of the list, which is marked by the label.
 %
-A \code{goto} statement transfers control to basic block corresponding
-to its label.
+A \code{goto} statement transfers control to the sequence of statements
+associated with its label.
 %
 The concrete syntax for \LangCIf{} is defined in
 Figure~\ref{fig:c1-concrete-syntax} and the abstract syntax is defined
@@ -7366,7 +7367,7 @@ in Figure~\ref{fig:c1-syntax}.
 \Stmt &::=& \PRINT{\Atm} \MID \EXPR{\Exp} \\
      &\MID& \ASSIGN{\VAR{\Var}}{\Exp}  
      \MID \RETURN{\Exp} \MID \GOTO{\itm{label}} \\
-    &\MID& \IFSTMT{\CMP{\Atm}{\itm{cmp}}{\Atm}}{\LS\GOTO{\itm{label}}\RS}{\LS\GOTO{\itm{label}}\RS} 
+     &\MID& \IFSTMT{\CMP{\Atm}{\itm{cmp}}{\Atm}}{\LS\GOTO{\itm{label}}\RS}{\LS\GOTO{\itm{label}}\RS}
 \end{array}
 }
   
@@ -7444,6 +7445,11 @@ language. Figures~\ref{fig:x86-1-concrete} and \ref{fig:x86-1} define
 the concrete and abstract syntax for the \LangXIf{} subset of x86,
 which includes instructions for logical operations, comparisons, and
 \racket{conditional} jumps.
+%
+\python{The abstract syntax for an \LangXIf{} program contains a
+  dictionary mapping labels to sequences of instructions, each of
+  which we refer to as a \emph{basic block}\index{subject}{basic
+    block}.}
 
 One challenge is that x86 does not provide an instruction that
 directly implements logical negation (\code{not} in \LangIf{} and
@@ -7484,7 +7490,7 @@ $\Atm$ to x86.
    \MID \key{cmpq}~\Arg\key{,}~\Arg
     \MID  \key{set}cc~\Arg 
     \MID \key{movzbq}~\Arg\key{,}~\Arg \\
-    &\MID& \key{j}cc~\itm{label}
+    &\MID& \key{j}cc~\itm{label} \\
 \end{array}
 }
 
@@ -7558,7 +7564,8 @@ $\Atm$ to x86.
        &\MID& \BININSTR{\scode{set}}{\itm{cc}}{\Arg} 
        \MID \BININSTR{\scode{movzbq}}{\Arg}{\Arg}\\
        &\MID&  \JMPIF{\itm{cc}}{\itm{label}} \\
-\LangXIfM{} &::= & \XPROGRAM{\itm{info}}{\LC\itm{label} \,\key{:}\, \Instr^{*} \key{,} \ldots \RC }
+\Block &::= & \Instr^{+} \\
+\LangXIfM{} &::= & \XPROGRAM{\itm{info}}{\LC\itm{label} \,\key{:}\, \Block \key{,} \ldots \RC }
 \end{array}
 \]
 \fi}
@@ -7897,16 +7904,17 @@ Unfortunately, this approach duplicates the two branches from the
 outer \code{if} and a compiler must never duplicate code!  After all,
 the two branches could be very large expressions.
 
-We need a way to perform the above transformation but without
-duplicating code. That is, we need a way for different parts of a
-program to refer to the same piece of code.
+How can we apply the above transformation but without duplicating
+code? In other words, how can two different parts of a program refer
+to one piece of code?
 %
-Put another way, we need to move away from abstract syntax
-\emph{trees} and instead use \emph{graphs}.
+The answer is that we must move away from abstract syntax \emph{trees}
+and instead use \emph{graphs}.
 %
 At the level of x86 assembly this is straightforward because we can
 label the code for each branch and insert jumps in all the places that
-need to execute the branch.
+need to execute the branch. In this way, jump instructions are edges
+in the graph and the basic blocks are the nodes.
 %
 Likewise, our language \LangCIf{} provides the ability to label a
 sequence of statements and to jump to a label via \code{goto}.
@@ -14101,8 +14109,8 @@ language, whose syntax is defined in Figure~\ref{fig:x86-3}.
 \Instr &::=& \ldots
      \MID \key{callq}\;\key{*}\Arg \MID \key{tailjmp}\;\Arg 
      \MID \key{leaq}\;\Arg\key{,}\;\key{\%}\Reg \\
-\Block &::= & \itm{label}\key{:}\, \Instr^{*} \\
-\Def &::= & \key{.globl}\,\itm{label}\; \Block^{*} \\
+\Block &::= & \Instr^{+} \\
+\Def &::= & \key{.globl}\,\itm{label}\; (\itm{label}\key{:}\, \Block)^{*} \\
 \LangXIndCallM{} &::= & \Def\ldots
 \end{array}
 \]
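
Note on the terminology introduced by this commit: the revised text describes a program as a mapping from labels to basic blocks, with jump instructions acting as the edges of a graph whose nodes are the blocks. The following is a minimal Python sketch of that idea, not code from the book. The tuple encoding of instructions, the `JUMP_OPCODES` set, and the function name `control_flow_edges` are all assumptions made for illustration.

```python
# Hypothetical, simplified instruction encoding (not the book's AST classes):
# each instruction is a tuple such as ("jmp", label) or ("movq", src, dst).
JUMP_OPCODES = {"jmp", "je", "jne", "jl", "jle", "jg", "jge"}


def control_flow_edges(blocks):
    """Map each block label to the set of labels its jumps target.

    `blocks` is a dict from label to a non-empty list of instructions,
    mirroring the label-to-basic-block dictionary described in the text.
    """
    edges = {}
    for label, instrs in blocks.items():
        targets = set()
        for instr in instrs:
            # A jump instruction contributes one edge to the graph.
            if instr[0] in JUMP_OPCODES:
                targets.add(instr[1])
        edges[label] = targets
    return edges


# Example program: two branches share the jump target "conclusion",
# so neither branch needs to duplicate the code that follows it.
blocks = {
    "start": [("cmpq", "$0", "%rax"), ("je", "block_1"), ("jmp", "block_2")],
    "block_1": [("movq", "$1", "%rax"), ("jmp", "conclusion")],
    "block_2": [("movq", "$2", "%rax"), ("jmp", "conclusion")],
}
edges = control_flow_edges(blocks)
```

In this sketch both `block_1` and `block_2` point at the same `conclusion` label, which is the sharing that a tree representation cannot express and a graph can.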