6 years ago · 1b05ea00e9
--- a/book.tex
+++ b/book.tex
@@ -1454,13 +1454,6 @@ the \key{assign-homes}, followed by a third pass, named
 patch-up any outstanding problems regarding instructions that involve
 too many memory accesses.
 
-Figure~\ref{fig:R1-passes} presents the ordering of the compiler
-passes in the form of a graph. Each pass is an edge and the
-input/output language of each pass is a node.
-
-UNDER CONSTRUCTION
-
-
 \begin{figure}[tbp]
 \begin{tikzpicture}[baseline=(current  bounding  box.center)]
 \node (R1) at (0,2)  {\large $R_1$};
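The figure fig:R1-passes draws each pass as an edge between two language nodes. As a rough illustration only, the Python sketch below encodes that graph as a list of edges and checks that every pass consumes the language its predecessor produces. The table `PASSES`, the function `check_chain`, and the spelling `x86-dagger` for x86† are hypothetical. The ordering is an assumption pieced together from the surrounding prose: select-instructions, then assign-homes, then patch-instructions, with uncover-locals working on C0 and print-x86 producing the concrete syntax.

```python
# Hypothetical table of passes as edges: (name, input language, output language).
# The order is assumed from the prose around fig:R1-passes, not taken from the book's code.
PASSES = [
    ("uniquify", "R1", "R1"),
    ("remove-complex-opera*", "R1", "R1"),
    ("explicate-control", "R1", "C0"),
    ("uncover-locals", "C0", "C0"),
    ("select-instructions", "C0", "x86*"),
    ("assign-homes", "x86*", "x86*"),
    ("patch-instructions", "x86*", "x86"),
    ("print-x86", "x86", "x86-dagger"),  # x86-dagger stands for the concrete (string) syntax
]


def check_chain(passes):
    """Walk the edges in order and return the languages (nodes) visited.

    Raises ValueError when a pass expects a different input language than
    the one produced by the pass before it.
    """
    languages = [passes[0][1]]
    for name, source, target in passes:
        if source != languages[-1]:
            raise ValueError(f"{name} expects {source}, but receives {languages[-1]}")
        languages.append(target)
    return languages
```

`check_chain(PASSES)` returns `["R1", "R1", "R1", "C0", "C0", "x86*", "x86*", "x86", "x86-dagger"]`; running `uniquify` after `explicate-control`, for example, raises `ValueError` because `uniquify` expects R1 but receives C0.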
			
@@ -1488,64 +1481,62 @@ UNDER CONSTRUCTION
 \label{fig:R1-passes}
 \end{figure}
 
+Figure~\ref{fig:R1-passes} presents the ordering of the compiler
+passes in the form of a graph. Each pass is an edge and the
+input/output language of each pass is a node in the graph.  The outputs
+of \key{uniquify} and \key{remove-complex-opera*} are programs that
+are still in the $R_1$ language, but the output of the pass
+\key{explicate-control} is in a different language that is designed to
+make the order of evaluation explicit in its syntax, which we
+introduce in the next section. Also, there are two passes of lesser
+importance in Figure~\ref{fig:R1-passes} that we have not yet talked
+about, \key{uncover-locals} and \key{print-x86}. We shall discuss them
+later in this Chapter.
+
+\subsection{The $C_0$ Intermediate Language}
+
+It so happens that the output of \key{explicate-control} is vaguely
+similar to the $C$ language~\citep{Kernighan:1988nx}, so we name it
+$C_0$.  The syntax for $C_0$ is defined in Figure~\ref{fig:c0-syntax}.
+%
+The $C_0$ language supports the same operators as $R_1$ but the
+arguments of operators are now restricted to just variables and
+integers, thanks to the \key{remove-complex-opera*} pass.  In the
+literature this style of intermediate language is called
+administrative normal form, or ANF for
+short~\citep{Danvy:1991fk,Flanagan:1993cg}.  Instead of \key{let}
+expressions, $C_0$ has assignment statements, which are executed in
+sequence using the \key{seq} construct. A sequence of statements always
+ends with \key{return}, a guarantee that is baked into the grammar
+rules for the \itm{tail} non-terminal. The term \emph{tail position}
+refers to an expression that is the last one to execute within a
+function. (An expression may contain subexpressions, and those may or
+may not be in tail position depending on the kind of expression.)  We
+choose the name ``tail'' for this non-terminal in the grammar because
+it corresponds to the last thing that needs to execute.
+
+A $C_0$ program consists of an association list mapping labels to
+tails, though this is overkill for the present Chapter, as we do not
+yet need to introduce \key{goto} for jumping to a label.  For now
+there will just be one label, \key{start}, and the whole program will
+be in its tail.
+%
+The $\itm{info}$ field of the program, after \key{uncover-locals},
+will contain a mapping from \key{locals} to a list of variables, that
+is, all the variables used in the program. At the start of the
+program, these variables are uninitialized (they contain garbage) and
+each variable becomes initialized on its first assignment.
 
-%% \[
-%% \begin{tikzpicture}[baseline=(current  bounding  box.center)]
-%% \foreach \i/\p in {1/1,2/2,3/3,4/4,5/5,6/6}
-%% {
-%%   \node (\i) at (\p*1.5,0) {$\bullet$};
-%% }
-%% \foreach \x/\y/\lbl in {1/2/e,2/3/b,3/4/c,4/5/a,5/6/d}
-%% {
-%%   \path[->,bend left=15] (\x) edge [above] node {\small\lbl} (\y);
-%% }
-%% \end{tikzpicture}
-%% \]
-We further simplify the translation from $R_1$ to x86 by identifying
-an intermediate language named $C_0$, roughly half-way between $R_1$
-and x86, to provide a rest stop along the way. We name the language
-$C_0$ because it is vaguely similar to the $C$
-language~\citep{Kernighan:1988nx}. The differences (e) and (a),
-regarding variables and nested expressions, will be handled by two
-steps, \key{uniquify} and \key{flatten}, which bring us to
-$C_0$.
-%% \[
-%% \begin{tikzpicture}[baseline=(current  bounding  box.center)]
-%% \foreach \i/\p in {R_1/1,R_1/2,C_0/3}
-%% {
-%%   \node (\p) at (\p*3,0) {\large $\i$};
-%% }
-%% \foreach \x/\y/\lbl in {1/2/uniquify,2/3/flatten}
-%% {
-%%  \path[->,bend left=15] (\x) edge [above] node {\ttfamily\footnotesize \lbl} (\y);
-%% }
-%% \end{tikzpicture}
-%% \]
-
-The syntax for $C_0$ is defined in Figure~\ref{fig:c0-syntax}.  The
-$C_0$ language supports the same operators as $R_1$ but the arguments
-of operators are now restricted to just variables and integers, so all
-intermediate results are bound to variables. In the literature this
-style of intermediate language is called administrative normal form,
-or ANF for short~\citep{Danvy:1991fk,Flanagan:1993cg}.  The \key{let}
-construct of $R_1$ is replaced by an assignment statement and there is
-a \key{return} construct to specify the return value of the program. A
-program consists of a sequence of statements that include at least one
-\key{return} statement. Each program is also annotated with a list of
-variables (viz. {\tt (var*)}). At the start of the program, these
-variables are uninitialized (they contain garbage) and each variable
-becomes initialized on its first assignment. All of the variables used
-in the program must be present in this list exactly once.
-
-\begin{figure}[tp]
+\begin{figure}[tbp]
 \fbox{
 \begin{minipage}{0.96\textwidth}
 \[
 \begin{array}{lcl}
 \Arg &::=& \Int \mid \Var \\
 \Exp &::=& \Arg \mid (\key{read}) \mid (\key{-}\;\Arg) \mid (\key{+} \; \Arg\;\Arg)\\
-\Stmt &::=& \ASSIGN{\Var}{\Exp} \mid \RETURN{\Arg} \\
-C_0 & ::= & (\key{program}\;(\Var^{*})\;\Stmt^{+})
+\Stmt &::=& \ASSIGN{\Var}{\Exp} \\
+\Tail &::=& \RETURN{\Arg} \mid (\key{seq}\; \Stmt\; \Tail) \\
+C_0 & ::= & (\key{program}\;\itm{info}\;((\itm{label}\,\key{.}\,\Tail)^{+}))
 \end{array}
 \]
 \end{minipage}
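The new prose says that C0 replaces `let` with assignment statements chained by `seq`, and that every tail ends in `return`. The Python sketch below illustrates that translation for R1 expressions that are already in ANF (the output of remove-complex-opera*), after uniquify has made all names distinct. It is a hypothetical rendering, not the book's implementation: the tuple encoding, the helpers `explicate_tail`, `explicate_assign`, and `is_tail`, and the single `start` label are assumptions, and `return` is allowed to carry any C0 expression, as in the notes.md grammar, rather than only an arg.

```python
# Hypothetical tuple encoding, assumed for this sketch only:
#   R1 in ANF:    int | str (a variable) | ("read",) | ("-", arg)
#                 | ("+", arg, arg) | ("let", var, rhs, body)
#   C0 statement: ("assign", var, exp)
#   C0 tail:      ("return", exp) | ("seq", stmt, tail)


def is_let(e):
    """True if e is a let expression in the encoding above."""
    return isinstance(e, tuple) and e[0] == "let"


def explicate_tail(e):
    """Translate an R1 expression in tail position into a C0 tail."""
    if is_let(e):
        _, var, rhs, body = e
        # The let body stays in tail position; the binding becomes an
        # assignment that runs before it.
        return explicate_assign(rhs, var, explicate_tail(body))
    # Anything else is the final value of the program.
    return ("return", e)


def explicate_assign(e, var, cont):
    """Build a C0 tail that assigns the value of e to var, then runs cont."""
    if is_let(e):
        _, inner, rhs, body = e
        # A let on the right-hand side: assign inner first, then assign
        # the value of the let body to var.
        return explicate_assign(rhs, inner, explicate_assign(body, var, cont))
    return ("seq", ("assign", var, e), cont)


def explicate_control(program):
    """Put the translated body under a single "start" label."""
    _, info, body = program
    return ("program", info, [("start", explicate_tail(body))])


def is_tail(t):
    """Check the grammar's invariant: each seq chain ends with return."""
    if t[0] == "return":
        return True
    return t[0] == "seq" and t[1][0] == "assign" and is_tail(t[2])
```

For `(let ([x (let ([y 20]) (+ y 22))]) x)`, encoded as `("let", "x", ("let", "y", 20, ("+", "y", 22)), "x")`, the resulting `start` tail assigns `y`, then `x`, and then returns `x`, so the nested `let` disappears into a flat sequence of assignments.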
			
@@ -1554,64 +1545,36 @@ C_0 & ::= & (\key{program}\;(\Var^{*})\;\Stmt^{+})
 \label{fig:c0-syntax}
 \end{figure}
 
-To get from $C_0$ to x86 assembly, it remains for us to handle
-difference (a) (the format of instructions) and difference (d)
-(variables versus stack locations and registers). These two
-differences are intertwined, creating a bit of a Gordian Knot. To
-handle difference (d), we need to map some variables to registers
-(there are only 16 registers) and the remaining variables to locations
-on the stack (which is unbounded). To make good decisions regarding
-this mapping, we need the program to be close to its final form (in
-x86 assembly) so we know exactly when which variables are used. After
-all, variables that are used at different time periods during program
-execution can be assigned to the same register.  However, our choice
-of x86 instructions depends on whether the variables are mapped to
-registers or stack locations, so we have a circular dependency. We cut
-this knot by doing an optimistic selection of instructions in the
-\key{select-instructions} pass, followed by the \key{assign-homes}
-pass to map variables to registers or stack locations, and conclude by
-finalizing the instruction selection in the \key{patch-instructions}
-pass.
-%% \[
-%% \begin{tikzpicture}[baseline=(current  bounding  box.center)]
-%% \node (1) at (0,0)  {\large $C_0$};
-%% \node (2) at (3,0)  {\large $\text{x86}^{*}_0$};
-%% \node (3) at (6,0)  {\large $\text{x86}^{*}_0$};
-%% \node (4) at (9,0) {\large $\text{x86}_0$};
-
-%% \path[->,bend left=15] (1) edge [above] node {\ttfamily\footnotesize select-instr.} (2);
-%% \path[->,bend left=15] (2) edge [above] node {\ttfamily\footnotesize assign-homes} (3);
-%% \path[->,bend left=15] (3) edge [above] node {\ttfamily\footnotesize patch-instr.} (4);
-%% \end{tikzpicture}
-%% \]
-
-The \key{select-instructions} pass is optimistic in the sense that it
-treats variables as if they were all mapped to registers. The
-\key{select-instructions} pass generates a program that consists of
-x86 instructions but that still uses variables, so it is an
-intermediate language that is technically different than x86, which
-explains the asterisks in the diagram above.
-
-In this Chapter we shall take the easy road to implementing
-\key{assign-homes} and simply map all variables to stack locations.
-The topic of Chapter~\ref{ch:register-allocation} is implementing a
-smarter approach in which we make a best-effort to map variables to
-registers, resorting to the stack only when necessary.
-
-Once variables have been assigned to their homes, we can finalize the
-instruction selection by dealing with an idiosyncrasy of x86
-assembly. Many x86 instructions have two arguments but only one of the
-arguments may be a memory reference (and the stack is a part of
-memory).  Because some variables may get mapped to stack locations,
-some of our generated instructions may violate this restriction.  The
-purpose of the \key{patch-instructions} pass is to fix this problem by
-replacing every violating instruction with a short sequence of
-instructions that use the \key{rax} register. Once we have implemented
-a good register allocator (Chapter~\ref{ch:register-allocation}), the
-need to patch instructions will be relatively rare.
-
-The x86$^{*}$ language extends x86
-with variables and looser rules regarding instruction arguments. The
+
+%% The \key{select-instructions} pass is optimistic in the sense that it
+%% treats variables as if they were all mapped to registers. The
+%% \key{select-instructions} pass generates a program that consists of
+%% x86 instructions but that still uses variables, so it is an
+%% intermediate language that is technically different than x86, which
+%% explains the asterisks in the diagram above.
+
+%% In this Chapter we shall take the easy road to implementing
+%% \key{assign-homes} and simply map all variables to stack locations.
+%% The topic of Chapter~\ref{ch:register-allocation} is implementing a
+%% smarter approach in which we make a best-effort to map variables to
+%% registers, resorting to the stack only when necessary.
+
+%% Once variables have been assigned to their homes, we can finalize the
+%% instruction selection by dealing with an idiosyncrasy of x86
+%% assembly. Many x86 instructions have two arguments but only one of the
+%% arguments may be a memory reference (and the stack is a part of
+%% memory).  Because some variables may get mapped to stack locations,
+%% some of our generated instructions may violate this restriction.  The
+%% purpose of the \key{patch-instructions} pass is to fix this problem by
+%% replacing every violating instruction with a short sequence of
+%% instructions that use the \key{rax} register. Once we have implemented
+%% a good register allocator (Chapter~\ref{ch:register-allocation}), the
+%% need to patch instructions will be relatively rare.
+
+\subsection{The Dialects of x86}
+
+The x86$^{*}$ language, pronounced ``pseudo-x86'', extends x86 with
+variables and looser rules regarding instruction arguments. The
 x86$^{\dagger}$ language is the concrete syntax (string) for x86.
 
 
--- a/defs.tex
+++ b/defs.tex
@@ -8,6 +8,7 @@
 \newcommand{\FType}{\itm{ftype}}
 \newcommand{\Instr}{\itm{instr}}
 \newcommand{\Block}{\itm{block}}
+\newcommand{\Tail}{\itm{tail}}
 \newcommand{\Prog}{\itm{prog}}
 \newcommand{\Arg}{\itm{arg}}
 \newcommand{\Reg}{\itm{reg}}
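The book.tex hunk above describes x86* as x86 extended with variables, and x86† as the concrete (string) syntax for x86. The Python sketch below is a minimal illustration of a printer in the spirit of print-x86, assuming a hypothetical tuple encoding for instructions and arguments and AT&T operand syntax for the output. It renders the instructions only (no labels, directives, or stack-frame setup), and it refuses variables, because a program that still contains them is x86* rather than x86.

```python
# Hypothetical encoding of x86 arguments, assumed for this sketch:
#   ("int", n) | ("reg", name) | ("deref", reg, offset) | ("var", name)
# ("var", name) belongs to pseudo-x86 (x86*) only and cannot be printed.


def print_arg(arg):
    """Render one argument in AT&T syntax."""
    kind = arg[0]
    if kind == "int":
        return f"${arg[1]}"
    if kind == "reg":
        return f"%{arg[1]}"
    if kind == "deref":
        _, reg, offset = arg
        return f"{offset}(%{reg})"
    if kind == "var":
        raise ValueError(f"variable {arg[1]!r} has no home yet; run assign-homes first")
    raise ValueError(f"unknown argument {arg!r}")


def print_instr(instr):
    """Render one instruction, e.g. ("addq", src, dst) as "    addq src, dst"."""
    op, *args = instr
    if not args:
        return f"    {op}"
    return f"    {op} " + ", ".join(print_arg(a) for a in args)


def print_x86(instrs):
    """Join the rendered instructions into one string, one per line."""
    return "".join(print_instr(i) + "\n" for i in instrs)
```

With this encoding, `("movq", ("int", 42), ("deref", "rbp", -8))` prints as `movq $42, -8(%rbp)`, while any instruction mentioning `("var", "x")` raises `ValueError`.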
			
--- a/notes.md
+++ b/notes.md
@@ -31,7 +31,7 @@ V
     exp ::= arg | (op arg*)
     stmt ::= (assign x exp)
     tail ::= (return exp) | (seq stmt tail)
-    C0 ::= (program () tail)
+    C0 ::= (program () ((label . tail)*))
 
 uncover-locals
 |
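The notes.md grammar now gives a C0 program a list of labeled tails, and the book.tex prose says that, after uncover-locals, the info field maps `locals` to all the variables of the program. A minimal Python sketch of that pass, assuming a hypothetical tuple encoding and a dict standing in for the info association list: it walks each `seq` chain and collects the left-hand side of every assignment. That covers every variable only on the assumption that explicate-control introduces each one with an `assign`.

```python
# Hypothetical C0 encoding, assumed for this sketch:
#   program: ("program", info, [(label, tail), ...])
#   tail:    ("return", exp) | ("seq", ("assign", var, exp), tail)


def uncover_locals(program):
    """Return a copy of program whose info maps "locals" to its variables."""
    _, info, blocks = program
    local_vars = []
    for _label, tail in blocks:
        # Walk the seq chain down to its return, noting each assigned variable
        # the first time it appears.
        while tail[0] == "seq":
            _, (_, var, _), tail = tail
            if var not in local_vars:
                local_vars.append(var)
    new_info = dict(info)  # leave the input program's info untouched
    new_info["locals"] = local_vars
    return ("program", new_info, blocks)
```

Applied to a `start` tail that assigns `y` and then `x` before returning `x`, it sets `info["locals"]` to `["y", "x"]` and leaves the labeled blocks unchanged.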