9 years ago · b47629fe9f
--- a/book.tex
+++ b/book.tex
@@ -15,6 +15,7 @@
 
				 \usepackage{semantic}
			
 
				 \usepackage{wrapfig}
			
 
				 \usepackage{tikz}
			
 
				+\usetikzlibrary{arrows}
			
 
				 
			
 
				 % Computer Modern is already the default. -Jeremy
			
 
				 %\renewcommand{\ttdefault}{cmtt}
			
@@ -78,7 +79,8 @@ columns=fullflexible
 
				   Indiana University \\
			
 
				   \\
			
 
				   with contributions from: \\
			
 
				-  Carl Factora
			
 
				+  Carl Factora \\
			
 
				+  Cameron Swords
			
 
				    }
			
 
				 
			
 
				 \begin{document}
			
@@ -120,13 +122,14 @@ Talk about pre-requisites.
 
				 
			
 
				 Need to give thanks to 
			
 
				 \begin{itemize}
			
 
				+\item Bor-Yuh Evan Chang
			
 
				 \item Kent Dybvig
			
 
				 \item Daniel P. Friedman
			
 
				+\item Ronald Garcia
			
 
				 \item Abdulaziz Ghuloum
			
 
				-\item Oscar Waddell
			
 
				+\item Ryan Newton
			
 
				 \item Dipanwita Sarkar
			
 
				-\item Ronald Garcia
			
 
				-\item Bor-Yuh Evan Chang
			
 
				+\item Oscar Waddell
			
 
				 \end{itemize}
			
 
				 
			
 
				 %\mbox{}\\
			
@@ -140,9 +143,9 @@ Need to give thanks to
 
				 In this chapter, we review the basic tools that are needed for
			
 
				 implementing a compiler. We use abstract syntax trees (ASTs) in the
			
 
				 form of S-expressions to represent programs (Section~\ref{sec:ast})
			
 
				-and pattern matching to inspect an AST node
			
 
				+and pattern matching to inspect individual nodes in an AST
			
 
				 (Section~\ref{sec:pattern-matching}).  We use recursion to construct
			
 
				-and deconstruct entire ASTs (Section~\ref{sec:recursion}).
			
 
				+and deconstruct ASTs (Section~\ref{sec:recursion}).
			
 
				 
			
 
				 \section{Abstract Syntax Trees}
			
 
				 \label{sec:ast}
			
@@ -152,6 +155,7 @@ programs is the \emph{abstract syntax tree} (AST). When considering
 
				 some part of a program, a compiler needs to ask what kind of part it
			
 
				 is and what sub-parts it has. For example, the program on the left is
			
 
				 represented by the AST on the right.
			
 
				+\marginpar{\scriptsize The arrow heads need to be bigger. -JGS}
			
 
				 \begin{center}
			
 
				 \begin{minipage}{0.4\textwidth}
			
 
				 \begin{lstlisting}
			
@@ -174,7 +178,7 @@ represented by the AST on the right.
 
				 \end{equation}
			
 
				 \end{minipage}
			
 
				 \end{center}
			
 
				-We shall use the standard terminology for trees: each square above is
			
 
				+We shall use the standard terminology for trees: each circle above is
			
 
				 called a \emph{node}. The arrows connect a node to its \emph{children}
			
 
				 (which are also nodes). The top-most node is the \emph{root}.  Every
			
 
				 node except for the root has a \emph{parent} (the node it is the child
			
@@ -182,14 +186,14 @@ of). If a node has no children, it is a \emph{leaf} node.  Otherwise
 
				 it is an \emph{internal} node.
			
 
				 
			
 
				 When deciding how to compile the above program, we need to know that
			
 
				-the root node an addition and that it has two children: \texttt{read}
			
 
				-and the negation of \texttt{8}. The abstract syntax tree data
			
 
				-structure directly supports these queries and hence is a good
			
 
				+the root node operation is addition and that it has two children:
			
 
				+\texttt{read} and the negation of \texttt{8}. The abstract syntax tree
			
 
				+data structure directly supports these queries and hence is a good
			
 
				 choice. In this book, we will often write down the textual
			
 
				 representation of a program even when we really have in mind the AST,
			
 
				-simply because the textual representation is easier to typeset.  We
			
 
				-recommend that, in your mind, you should alway interpret programs as
			
 
				-abstract syntax trees.
			
 
				+because the textual representation is more concise.  We recommend
			
 
				+that, in your mind, you always interpret programs as abstract syntax
			
 
				+trees.
			
 
				 
			
 
				 \section{Grammars}
			
 
				 \label{sec:grammar}
			
@@ -214,7 +218,7 @@ reader already knows what an integer is.) A name such as $\itm{arith}$
 
				 that is defined by the rules, is a \emph{non-terminal}.
			
 
				 
			
 
				 The second rule for the $\itm{arith}$ language is the \texttt{read}
			
 
				-function to receive an input integer from the user of the program.
			
 
				+operation that receives an input integer from the user of the program.
			
 
				 \begin{equation}
			
 
				   \itm{arith} ::= (\key{read}) \label{eq:arith-read}
			
 
				 \end{equation}
			
@@ -227,8 +231,10 @@ another arith by negating it.
 
				 Symbols such as \key{-} that play an auxiliary role in the abstract
			
 
				 syntax are called \emph{terminal} symbols.
			
 
				 
			
 
				-By rule \eqref{eq:arith-int}, \texttt{8} is an $\itm{arith}$, then by
			
 
				-rule \eqref{eq:arith-neg}, the following AST is an $\itm{arith}$.
			
 
				+We can apply the rules to build ASTs in the $\itm{arith}$
			
 
				+language. For example, by rule \eqref{eq:arith-int}, \texttt{8} is an
			
 
				+$\itm{arith}$, and then by rule \eqref{eq:arith-neg}, the following AST is
			
 
				+an $\itm{arith}$.
			
 
				 \begin{center}
			
 
				 \begin{minipage}{0.25\textwidth}
			
 
				 \begin{lstlisting}
			
@@ -259,18 +265,17 @@ $\itm{arith}$, so we can apply rule \eqref{eq:arith-add} to show that
 
				 \texttt{(+ (read) (- 8))} is in the $\itm{arith}$ language.
			
 
				 
			
 
				 If you have an AST for which the above four rules do not apply, then
			
 
				-the AST is not in $\itm{arith}$. For example, the AST \texttt{(- (read)
			
 
				-  (+ 8))} is not in $\itm{arith}$ because there are no rules for $+$
			
 
				-with only one argument, nor for $-$ with two arguments.  Whenever we
			
 
				-define a language through a grammar, we implicitly mean for the
			
 
				-language to be the smallest set of programs that are justified by the
			
 
				-rules. That is, the language only includes those programs that the
			
 
				-rules allow.
			
 
				+the AST is not in $\itm{arith}$. For example, the AST \texttt{(-
			
 
				+  (read) (+ 8))} is not in $\itm{arith}$ because there are no rules
			
 
				+for \key{+} with only one argument, nor for \key{-} with two
			
 
				+arguments.  Whenever we define a language with a grammar, we
			
 
				+implicitly mean for the language to be the smallest set of programs
			
 
				+that are justified by the rules. That is, the language only includes
			
 
				+those programs that the rules allow.
			
 
				 
			
 
				 It is common to have many rules with the same left-hand side, so the
			
 
				-following vertical bar notation is used to gather several rules on one
			
 
				-line.  We refer to each clause between a vertical bar as an
			
 
				-``alternative''.
			
 
				+following vertical bar notation is used to gather several rules.  We
			
 
				+refer to each clause separated by vertical bars as an ``alternative''.
			
 
				 \[
			
 
				 \itm{arith} ::= \Int \mid ({\tt \key{read}}) \mid (\key{-} \; \itm{arith}) \mid
			
 
				    (\key{+} \; \itm{arith} \; \itm{arith}) 
			
@@ -280,13 +285,13 @@ line.  We refer to each clause between a vertical bar as an
 
				 \label{sec:s-expr}
			
 
				 
			
 
				 Racket, as a descendant of Lisp~\citep{McCarthy:1960dz}, has
			
 
				-particularly convenient support for creating and manipulating abstract
			
 
				-syntax trees with its \emph{symbolic expression} feature, or
			
 
				-S-expression for short. We can create an S-expression simply by
			
 
				-writing a backquote followed by the textual representation of the
			
 
				-AST. (Technically speaking, this is called a \emph{quasiquote} in
			
 
				-Racket.)  For example, an S-expression to represent the AST
			
 
				-\eqref{eq:arith-prog} is created by the following Racket expression:
			
 
				+convenient support for creating and manipulating abstract syntax trees
			
 
				+with its \emph{symbolic expression} feature, or S-expression for
			
 
				+short. We can create an S-expression simply by writing a backquote
			
 
				+followed by the textual representation of the AST. (Technically
			
 
				+speaking, this is called a \emph{quasiquote} in Racket.)  For example,
			
 
				+an S-expression to represent the AST \eqref{eq:arith-prog} is created
			
 
				+by the following Racket expression:
			
 
				 \begin{center}
			
 
				 \texttt{`(+ (read) (- 8))}
			
 
				 \end{center}
			
@@ -500,10 +505,10 @@ arbitrary S-expression, {\tt sexp}, and determines whether or not {\tt
 
				 one grammar rule for $\itm{arith}$ and the body of each clause makes a
			
 
				 recursive call for each child node. This pattern of recursive function
			
 
				 is so common that it has a name, \emph{structural recursion}.  In
			
 
				-general, when a recursive function is defined using a set of match
			
 
				-clauses that correspond to a grammar, and each clause body makes a
			
 
				-recursive call on each child node, then we say the function is defined
			
 
				-by structural recursion.
			
 
				+general, when a recursive function is defined using a sequence of
			
 
				+match clauses that correspond to a grammar, and each clause body makes
			
 
				+a recursive call on each child node, then we say the function is
			
 
				+defined by structural recursion.
			
 
				 
			
 
				 \begin{center}
			
 
				 \begin{minipage}{0.7\textwidth}
			
@@ -589,7 +594,7 @@ by structural recursion.
 
				 
			
 
				 %% \end{verbatim}
			
 
				 
			
 
				-\section{Interpreter}
			
 
				+\section{Interpreters}
			
 
				 \label{sec:interp-arith}
			
 
				 
			
 
				 The meaning, or semantics, of a program is typically defined in the
			
@@ -739,6 +744,7 @@ evaluator on several examples and tests the output program.  The
 
				 \end{lstlisting}
			
 
				 
			
 
				 \begin{exercise}
			
 
				+\normalfont % I don't like the italics for exercises. -Jeremy
			
 
				 We challenge the reader to improve on the simple partial evaluator in
			
 
				 Figure~\ref{fig:pe-arith} by replacing the \texttt{pe-neg} and
			
 
				 \texttt{pe-add} helper functions with functions that know more about
			
@@ -755,7 +761,7 @@ output that takes the form of the $\itm{residual}$ non-terminal in the
 
				 following grammar.
			
 
				 \[
			
 
				 \begin{array}{lcl}
			
 
				-e &::=& (\TTKEY{read}) \mid (\key{-} \;({\tt \TTKEY{read}})) \mid (\key{+} \;e\; e)\\
			
 
				+e &::=& (\key{read}) \mid (\key{-} \;(\key{read})) \mid (\key{+} \; e \; e)\\
			
 
				 \itm{residual} &::=& \Int \mid (\key{+}\; \Int\; e) \mid e
			
 
				 \end{array}
			
 
				 \]
			
@@ -1108,7 +1114,7 @@ communicated from one step of the compiler to the next.
 
				 \label{fig:x86-ast-a}
			
 
				 \end{figure}
			
 
				 
			
 
				-\section{From $S_0$ to x86-64 via $C_0$}
			
 
				+\section{Planning the trip from $S_0$ to x86-64}
			
 
				 \label{sec:plan-s0-x86}
			
 
				 
			
 
				 To compile one language to another it helps to focus on the
			
@@ -1289,6 +1295,7 @@ it to different expressions, as in the last clause for primitive
 
				 operations in Figure~\ref{fig:uniquify-s0}.
			
 
				 
			
 
				 \begin{exercise}
			
 
				+\normalfont % I don't like the italics for exercises. -Jeremy
			
 
				 Complete the \key{uniquify} pass by filling in the blanks, that is,
			
 
				 implement the clauses for variables and for the \key{let} construct.
			
 
				 \end{exercise}
			
@@ -1313,6 +1320,7 @@ implement the clauses for variables and for the \key{let} construct.
 
				 \end{figure}
			
 
				 
			
 
				 \begin{exercise}
			
 
				+\normalfont % I don't like the italics for exercises. -Jeremy
			
 
				 Test your \key{uniquify} pass by creating three example $S_0$ programs
			
 
				 and checking whether the output programs produce the same result as
			
 
				 the input programs. The $S_0$ programs should be designed to test the
			
@@ -1322,8 +1330,8 @@ that overshadow eachother.  The three programs should be in a
 
				 subdirectory named \key{tests} and they should have the same file name
			
 
				 except for a different integer at the end of the name, followed by the
			
 
				 ending \key{.scm}.  Use the \key{interp-tests} function
			
 
				-(Appendix~\ref{appendix:utilities}) from \key{utilities.rkt} to test your
			
 
				-\key{uniquify} pass on the example programs.
			
 
				+(Appendix~\ref{appendix:utilities}) from \key{utilities.rkt} to test
			
 
				+your \key{uniquify} pass on the example programs.
			
 
				 
			
 
				 %% You can use the interpreter \key{interpret-S0} defined in the
			
 
				 %% \key{interp.rkt} file. The entire sequence of tests should be a short
			
@@ -1385,6 +1393,7 @@ of \key{flatten}.
 
				 \]
			
 
				 
			
 
				 \begin{exercise}
			
 
				+\normalfont
			
 
				 Implement the \key{flatten} pass and test it on all of the example
			
 
				 programs that you created to test the \key{uniquify} pass and create
			
 
				 three new example programs that are designed to exercise all of the
			
@@ -1496,7 +1505,7 @@ follows.
 
				 The \key{imulq} instruction is a special case because the destination
			
 
				 argument must be a register.
			
 
				 
			
 
				-\section{Print x86}
			
 
				+\section{Print x86-64}
			
 
				 \label{sec:print-x86}
			
 
				 
			
 
				 The last step of the compiler from $S_0$ to x86-64 is to convert the
			
@@ -1532,7 +1541,7 @@ and then store in the $\itm{info}$ field of the \key{program}.
 
				 \chapter{Register Allocation}
			
 
				 \label{ch:register-allocation}
			
 
				 
			
 
				-In Chapter~\ref{ch:int-exp} we simplified the generation of x86
			
 
				+In Chapter~\ref{ch:int-exp} we simplified the generation of x86-64
			
 
				 assembly by placing all variables on the stack. We can improve the
			
 
				 performance of the generated code considerably if we instead try to
			
 
				 place as many variables as possible into registers.  The CPU can
			
@@ -1541,7 +1550,7 @@ take from several cycles (to go to cache) to hundreds of cycles (to go
 
				 to main memory).  Figure~\ref{fig:reg-eg} shows a program with four
			
 
				 variables that serves as a running example. We show the source program
			
 
				 and also the output of instruction selection. At that point the
			
 
				-program is almost x86 assembly but not quite; it still contains
			
 
				+program is almost x86-64 assembly but not quite; it still contains
			
 
				 variables instead of stack locations or registers.
			
 
				 
			
 
				 \begin{figure}
			
@@ -1995,7 +2004,7 @@ Applying this assignment to our running example
 
				   (movq (stack-loc -16) (reg rax))
			
 
				   (subq (reg rbx) (reg rax)))
			
 
				 \end{lstlisting}
			
 
				-This program is almost an x86 program. The remaining step is to apply
			
 
				+This program is almost an x86-64 program. The remaining step is to apply
			
 
				 the patch instructions pass. In this example, the trivial move of
			
 
				 \key{-16(\%rbp)} to itself is deleted and the addition of
			
 
				 \key{-8(\%rbp)} to \key{-16(\%rbp)} is fixed by going through
			
@@ -2013,11 +2022,11 @@ shown in Figure~\ref{fig:reg-alloc-passes}.
 
				 \[
			
 
				 \begin{tikzpicture}[baseline=(current  bounding  box.center)]
			
 
				 \node (1) at (-2,0)     {$C_0$};
			
 
				-\node (2)  at (0,0)     {$\text{x86}^{*}$};
			
 
				-\node (3)  at (0,-1.5)  {$\text{x86}^{*}$};
			
 
				-\node (4)  at (0,-3)    {$\text{x86}^{*}$};
			
 
				-\node (5)  at (0,-4.5)  {$\text{x86}^{*}$};
			
 
				-\node (6)  at (2,-4.5)  {$\text{x86}$};
			
 
				+\node (2)  at (0,0)     {$\text{x86-64}^{*}$};
			
 
				+\node (3)  at (0,-1.5)  {$\text{x86-64}^{*}$};
			
 
				+\node (4)  at (0,-3)    {$\text{x86-64}^{*}$};
			
 
				+\node (5)  at (0,-4.5)  {$\text{x86-64}^{*}$};
			
 
				+\node (6)  at (2,-4.5)  {$\text{x86-64}$};
			
 
				 
			
 
				 \path[->,bend left=15] (1) edge [above] node {\ttfamily\scriptsize select-instr.}      (2);
			
 
				 \path[->,            ] (2) edge [right] node {\ttfamily\scriptsize uncover-live}       (3);
			
@@ -2240,7 +2249,7 @@ languages considered in this book ($S_0, S_1, \ldots$) and interprets
 
				 the program, returning the result value.  The \key{interp-C} function
			
 
				 interprets an AST for a program in one of the C-like languages ($C_0,
			
 
				 C_1, \ldots$), and the \key{interp-x86} function interprets an AST for
			
 
				-an x86 program.
			
 
				+an x86-64 program.
			
 
				 
			
 
				 \section{Utility Functions}
			
 
				 \label{appendix:utilities}
			
@@ -2276,7 +2285,7 @@ the input for the Scheme program.
 
				 The compiler-tests function takes a compiler name (a string) a
			
 
				 description of the passes (see the comment for \key{interp-tests}) a
			
 
				 test family name (a string), and a list of test numbers (see the
			
 
				-comment for interp-tests), and runs the compiler to generate x86 (a
			
 
				+comment for interp-tests), and runs the compiler to generate x86-64 (a
			
 
				 \key{.s} file) and then runs gcc to generate machine code.  It runs
			
 
				 the machine code and checks that the output is 42.
			
 
				 \begin{lstlisting}
			
--- a/defs.tex
+++ b/defs.tex
@@ -25,5 +25,5 @@
 
				 
			
 
				 \newcommand{\IF}[3]{(\key{if}\,#1\;#2\;#3)}
			
 
				 
			
 
				-\newcommand{\TTKEY}[1]{\normalfont\tt\key{#1}}
			
 
				+\newcommand{\TTKEY}[1]{{\normalfont\tt #1}}