Jeremy Siek 6 years ago
parent
commit
1b05ea00e9
3 changed files with 81 additions and 117 deletions
  1. 79 116
      book.tex
  2. 1 0
      defs.tex
  3. 1 1
      notes.md

+ 79 - 116
book.tex

@@ -1454,13 +1454,6 @@ the \key{assign-homes}, followed by a third pass, named
 patch-up any outstanding problems regarding instructions that involve
 too many memory accesses.
 
-Figure~\ref{fig:R1-passes} presents the ordering of the compiler
-passes in the form of a graph. Each pass is an edge and the
-input/output language of each pass is a node.
-
-UNDER CONSTRUCTION
-
-
 \begin{figure}[tbp]
 \begin{tikzpicture}[baseline=(current  bounding  box.center)]
 \node (R1) at (0,2)  {\large $R_1$};
@@ -1488,64 +1481,62 @@ UNDER CONSTRUCTION
 \label{fig:R1-passes}
 \end{figure}
 
+Figure~\ref{fig:R1-passes} presents the ordering of the compiler
+passes in the form of a graph. Each pass is an edge and the
+input/output language of each pass is a node in the graph.  The outputs
+of \key{uniquify} and \key{remove-complex-opera*} are programs that
+are still in the $R_1$ language, but the output of the pass
+\key{explicate-control} is in a different language that is designed to
+make the order of evaluation explicit in its syntax, which we
+introduce in the next section. Also, there are two passes of lesser
+importance in Figure~\ref{fig:R1-passes} that we have not yet talked
+about, \key{uncover-locals} and \key{print-x86}. We shall discuss them
+later in this Chapter.
+
+\subsection{The $C_0$ Intermediate Language}
+
+It so happens that the output of \key{explicate-control} is vaguely
+similar to the $C$ language~\citep{Kernighan:1988nx}, so we name it
+$C_0$.  The syntax for $C_0$ is defined in Figure~\ref{fig:c0-syntax}.
+%
+The $C_0$ language supports the same operators as $R_1$ but the
+arguments of operators are now restricted to just variables and
+integers, thanks to the \key{remove-complex-opera*} pass.  In the
+literature this style of intermediate language is called
+administrative normal form, or ANF for
+short~\citep{Danvy:1991fk,Flanagan:1993cg}.  Instead of \key{let}
+expressions, $C_0$ has assignment statements which can be executed in
+sequence using the \key{seq} construct. A sequence of statements always
+ends with \key{return}, a guarantee that is baked into the grammar
+rules for the \itm{tail} non-terminal. The term \emph{tail position}
+refers to an expression that is the last one to execute within a
+function. (An expression may contain subexpressions, and those may or
+may not be in tail position depending on the kind of expression.)  We
+choose the name ``tail'' for this non-terminal in the grammar because
+indeed, it corresponds to the last thing that needs to execute.
+
+A $C_0$ program consists of an association list mapping labels to
+tails, though this is overkill for the present Chapter, as we do not
+yet need to introduce \key{goto} for jumping to a label.  For now
+there will just be one label, \key{start}, and the whole program will
+be in its tail.
+%
+The $\itm{info}$ field of the program, after \key{uncover-locals},
+will contain a mapping from \key{locals} to a list of variables, that
+is, all the variables used in the program. At the start of the
+program, these variables are uninitialized (they contain garbage) and
+each variable becomes initialized on its first assignment.
 
-%% \[
-%% \begin{tikzpicture}[baseline=(current  bounding  box.center)]
-%% \foreach \i/\p in {1/1,2/2,3/3,4/4,5/5,6/6}
-%% {
-%%   \node (\i) at (\p*1.5,0) {$\bullet$};
-%% }
-%% \foreach \x/\y/\lbl in {1/2/e,2/3/b,3/4/c,4/5/a,5/6/d}
-%% {
-%%   \path[->,bend left=15] (\x) edge [above] node {\small\lbl} (\y);
-%% }
-%% \end{tikzpicture}
-%% \]
-We further simplify the translation from $R_1$ to x86 by identifying
-an intermediate language named $C_0$, roughly half-way between $R_1$
-and x86, to provide a rest stop along the way. We name the language
-$C_0$ because it is vaguely similar to the $C$
-language~\citep{Kernighan:1988nx}. The differences (e) and (a),
-regarding variables and nested expressions, will be handled by two
-steps, \key{uniquify} and \key{flatten}, which bring us to
-$C_0$.
-%% \[
-%% \begin{tikzpicture}[baseline=(current  bounding  box.center)]
-%% \foreach \i/\p in {R_1/1,R_1/2,C_0/3}
-%% {
-%%   \node (\p) at (\p*3,0) {\large $\i$};
-%% }
-%% \foreach \x/\y/\lbl in {1/2/uniquify,2/3/flatten}
-%% {
-%%  \path[->,bend left=15] (\x) edge [above] node {\ttfamily\footnotesize \lbl} (\y);
-%% }
-%% \end{tikzpicture}
-%% \]
-
-The syntax for $C_0$ is defined in Figure~\ref{fig:c0-syntax}.  The
-$C_0$ language supports the same operators as $R_1$ but the arguments
-of operators are now restricted to just variables and integers, so all
-intermediate results are bound to variables. In the literature this
-style of intermediate language is called administrative normal form,
-or ANF for short~\citep{Danvy:1991fk,Flanagan:1993cg}.  The \key{let}
-construct of $R_1$ is replaced by an assignment statement and there is
-a \key{return} construct to specify the return value of the program. A
-program consists of a sequence of statements that include at least one
-\key{return} statement. Each program is also annotated with a list of
-variables (viz. {\tt (var*)}). At the start of the program, these
-variables are uninitialized (they contain garbage) and each variable
-becomes initialized on its first assignment. All of the variables used
-in the program must be present in this list exactly once.
-
-\begin{figure}[tp]
+\begin{figure}[tbp]
 \fbox{
 \begin{minipage}{0.96\textwidth}
 \[
 \begin{array}{lcl}
 \Arg &::=& \Int \mid \Var \\
 \Exp &::=& \Arg \mid (\key{read}) \mid (\key{-}\;\Arg) \mid (\key{+} \; \Arg\;\Arg)\\
-\Stmt &::=& \ASSIGN{\Var}{\Exp} \mid \RETURN{\Arg} \\
-C_0 & ::= & (\key{program}\;(\Var^{*})\;\Stmt^{+})
+\Stmt &::=& \ASSIGN{\Var}{\Exp} \\
+\Tail &::= & \RETURN{\Arg} \mid (\key{seq}\; \Stmt\; \Tail) \\
+C_0 & ::= & (\key{program}\;\itm{info}\;((\itm{label}\,\key{.}\,\Tail)^{+}))
 \end{array}
 \]
 \end{minipage}
@@ -1554,64 +1545,36 @@ C_0 & ::= & (\key{program}\;(\Var^{*})\;\Stmt^{+})
 \label{fig:c0-syntax}
 \end{figure}
 
-To get from $C_0$ to x86 assembly, it remains for us to handle
-difference (a) (the format of instructions) and difference (d)
-(variables versus stack locations and registers). These two
-differences are intertwined, creating a bit of a Gordian Knot. To
-handle difference (d), we need to map some variables to registers
-(there are only 16 registers) and the remaining variables to locations
-on the stack (which is unbounded). To make good decisions regarding
-this mapping, we need the program to be close to its final form (in
-x86 assembly) so we know exactly when which variables are used. After
-all, variables that are used at different time periods during program
-execution can be assigned to the same register.  However, our choice
-of x86 instructions depends on whether the variables are mapped to
-registers or stack locations, so we have a circular dependency. We cut
-this knot by doing an optimistic selection of instructions in the
-\key{select-instructions} pass, followed by the \key{assign-homes}
-pass to map variables to registers or stack locations, and conclude by
-finalizing the instruction selection in the \key{patch-instructions}
-pass.
-%% \[
-%% \begin{tikzpicture}[baseline=(current  bounding  box.center)]
-%% \node (1) at (0,0)  {\large $C_0$};
-%% \node (2) at (3,0)  {\large $\text{x86}^{*}_0$};
-%% \node (3) at (6,0)  {\large $\text{x86}^{*}_0$};
-%% \node (4) at (9,0) {\large $\text{x86}_0$};
-
-%% \path[->,bend left=15] (1) edge [above] node {\ttfamily\footnotesize select-instr.} (2);
-%% \path[->,bend left=15] (2) edge [above] node {\ttfamily\footnotesize assign-homes} (3);
-%% \path[->,bend left=15] (3) edge [above] node {\ttfamily\footnotesize patch-instr.} (4);
-%% \end{tikzpicture}
-%% \]
-
-The \key{select-instructions} pass is optimistic in the sense that it
-treats variables as if they were all mapped to registers. The
-\key{select-instructions} pass generates a program that consists of
-x86 instructions but that still uses variables, so it is an
-intermediate language that is technically different than x86, which
-explains the asterisks in the diagram above.
-
-In this Chapter we shall take the easy road to implementing
-\key{assign-homes} and simply map all variables to stack locations.
-The topic of Chapter~\ref{ch:register-allocation} is implementing a
-smarter approach in which we make a best-effort to map variables to
-registers, resorting to the stack only when necessary.
-
-Once variables have been assigned to their homes, we can finalize the
-instruction selection by dealing with an idiosyncrasy of x86
-assembly. Many x86 instructions have two arguments but only one of the
-arguments may be a memory reference (and the stack is a part of
-memory).  Because some variables may get mapped to stack locations,
-some of our generated instructions may violate this restriction.  The
-purpose of the \key{patch-instructions} pass is to fix this problem by
-replacing every violating instruction with a short sequence of
-instructions that use the \key{rax} register. Once we have implemented
-a good register allocator (Chapter~\ref{ch:register-allocation}), the
-need to patch instructions will be relatively rare.
-
-The x86$^{*}$ language extends x86
-with variables and looser rules regarding instruction arguments. The
+
+%% The \key{select-instructions} pass is optimistic in the sense that it
+%% treats variables as if they were all mapped to registers. The
+%% \key{select-instructions} pass generates a program that consists of
+%% x86 instructions but that still uses variables, so it is an
+%% intermediate language that is technically different than x86, which
+%% explains the asterisks in the diagram above.
+
+%% In this Chapter we shall take the easy road to implementing
+%% \key{assign-homes} and simply map all variables to stack locations.
+%% The topic of Chapter~\ref{ch:register-allocation} is implementing a
+%% smarter approach in which we make a best-effort to map variables to
+%% registers, resorting to the stack only when necessary.
+
+%% Once variables have been assigned to their homes, we can finalize the
+%% instruction selection by dealing with an idiosyncrasy of x86
+%% assembly. Many x86 instructions have two arguments but only one of the
+%% arguments may be a memory reference (and the stack is a part of
+%% memory).  Because some variables may get mapped to stack locations,
+%% some of our generated instructions may violate this restriction.  The
+%% purpose of the \key{patch-instructions} pass is to fix this problem by
+%% replacing every violating instruction with a short sequence of
+%% instructions that use the \key{rax} register. Once we have implemented
+%% a good register allocator (Chapter~\ref{ch:register-allocation}), the
+%% need to patch instructions will be relatively rare.
+
+\subsection{The Dialects of x86}
+
+The x86$^{*}$ language, pronounced ``pseudo-x86'', extends x86 with
+variables and looser rules regarding instruction arguments. The
 x86$^{\dagger}$ language is the concrete syntax (string) for x86.
 
 

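Note on the revised $C_0$ grammar in book.tex: the following is a minimal Python sketch, not part of the commit, of how the new `tail` non-terminal guarantees that every statement sequence ends in `return`. The tuple encoding, the use of a dict in place of the association list of labels, and the names `interp_arg`, `interp_exp`, `interp_tail`, `interp_C0`, and `example` are all assumptions made for illustration; the book itself works in Racket.

```python
# Hypothetical encoding of C0 terms as nested tuples mirroring the grammar:
#   arg  ::= int | var (a str)
#   exp  ::= arg | ("read",) | ("-", arg) | ("+", arg, arg)
#   stmt ::= ("assign", var, exp)
#   tail ::= ("return", arg) | ("seq", stmt, tail)
#   C0   ::= ("program", info, {label: tail})   # dict stands in for the alist

def interp_arg(arg, env):
    """Evaluate an atomic argument: an integer literal or a variable."""
    if isinstance(arg, int):
        return arg
    return env[arg]  # KeyError models reading an uninitialized variable

def interp_exp(exp, env, inputs):
    """Evaluate an expression whose operands are already atomic."""
    if isinstance(exp, (int, str)):
        return interp_arg(exp, env)
    op = exp[0]
    if op == "read":
        return next(inputs)
    if op == "-":
        return -interp_arg(exp[1], env)
    if op == "+":
        return interp_arg(exp[1], env) + interp_arg(exp[2], env)
    raise ValueError(f"unknown operator: {op!r}")

def interp_tail(tail, env, inputs):
    """Run assignments in order; the grammar forces the chain to end in return."""
    while tail[0] == "seq":
        _, (tag, var, exp), rest = tail
        if tag != "assign":
            raise ValueError(f"expected assign, got {tag!r}")
        env[var] = interp_exp(exp, env, inputs)  # first assignment initializes var
        tail = rest
    if tail[0] != "return":
        raise ValueError(f"tail must end in return, got {tail[0]!r}")
    return interp_arg(tail[1], env)

def interp_C0(program, inputs=()):
    """Interpret a C0 program by running the tail stored under the start label."""
    _, info, blocks = program
    return interp_tail(blocks["start"], {}, iter(inputs))

# (+ 10 (- (read))) after remove-complex-opera* and explicate-control,
# with info filled in as uncover-locals would (assumed shape).
example = (
    "program",
    {"locals": ["tmp.1", "tmp.2", "tmp.3"]},
    {"start":
        ("seq", ("assign", "tmp.1", ("read",)),
        ("seq", ("assign", "tmp.2", ("-", "tmp.1")),
        ("seq", ("assign", "tmp.3", ("+", 10, "tmp.2")),
        ("return", "tmp.3"))))},
)
```

With the input `3`, `interp_C0(example, [3])` evaluates to `7`. Because `seq` nests to the right and only `return` terminates the chain, a loop over `tail` needs no separate check for a missing return in well-formed programs; the explicit error is there only for malformed input.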
+ 1 - 0
defs.tex

@@ -8,6 +8,7 @@
 \newcommand{\FType}{\itm{ftype}}
 \newcommand{\Instr}{\itm{instr}}
 \newcommand{\Block}{\itm{block}}
+\newcommand{\Tail}{\itm{tail}}
 \newcommand{\Prog}{\itm{prog}}
 \newcommand{\Arg}{\itm{arg}}
 \newcommand{\Reg}{\itm{reg}}

+ 1 - 1
notes.md

@@ -31,7 +31,7 @@ V
     exp ::= arg | (op arg*)
     stmt ::= (assign x exp)
     tail ::= (return exp) | (seq stmt tail)
-    C0 ::= (program () tail)
+    C0 ::= (program () ((label . tail)*))
 
 uncover-locals
 |