
mention of three-address code

Jeremy Siek 3 years ago
Parent
Commit
37ad5fd61f
3 files changed, with 73 insertions and 41 deletions
  1. book.bib (+11 -0)
  2. book.tex (+60 -40)
  3. defs.tex (+2 -1)

+ 11 - 0
book.bib

@@ -1,3 +1,14 @@
+
+@book{Aho:2006wb,
+	address = {USA},
+	author = {Aho, Alfred V. and Lam, Monica S. and Sethi, Ravi and Ullman, Jeffrey D.},
+	date-added = {2021-10-22 09:38:52 -0400},
+	date-modified = {2021-10-22 09:38:59 -0400},
+	isbn = {0321486811},
+	publisher = {Addison-Wesley Longman Publishing Co., Inc.},
+	title = {Compilers: Principles, Techniques, and Tools (2nd Edition)},
+	year = {2006}}
+
 @article{Logothetis:1981,
 author = {Logothetis, George and Mishra, Prateek},
 title = {Compiling short-circuit boolean expressions in one pass},

+ 60 - 40
book.tex

@@ -454,7 +454,7 @@ that efficiently supports the operations that the compiler needs to
 perform.\index{subject}{concrete syntax}\index{subject}{abstract syntax}\index{subject}{abstract
   syntax tree}\index{subject}{AST}\index{subject}{program}\index{subject}{parse} The translation
 from concrete syntax to abstract syntax is a process called
-\emph{parsing}~\citep{Aho:1986qf}. We do not cover the theory and
+\emph{parsing}~\citep{Aho:2006wb}. We do not cover the theory and
 implementation of parsing in this book.
 %
 \racket{A parser is provided in the support code for translating from
@@ -2565,12 +2565,12 @@ to bridge those differences. What are the differences between \LangVar{}
 and x86 assembly? Here are some of the most important ones:
 
 \begin{enumerate}
-\item x86 arithmetic instructions typically have two arguments
-  and update the second argument in place. In contrast, \LangVar{}
+\item x86 arithmetic instructions typically have two arguments and
+  update the second argument in place. In contrast, \LangVar{}
   arithmetic operations take two arguments and produce a new value.
   An x86 instruction may have at most one memory-accessing argument.
-  Furthermore, some instructions place special restrictions on their
-  arguments.
+  Furthermore, some x86 instructions place special restrictions on
+  their arguments.
 
 
 \item An argument of an \LangVar{} operator can be a deeply-nested
   expression, whereas x86 instructions restrict their arguments to be
@@ -2592,13 +2592,17 @@ and x86 assembly? Here are some of the most important ones:
 \fi}  
 \end{enumerate}
 
-We ease the challenge of compiling from \LangVar{} to x86 by breaking down
-the problem into several steps, dealing with the above differences one
-at a time.  Each of these steps is called a \emph{pass} of the
-compiler.\index{subject}{pass}\index{subject}{compiler pass}
+We ease the challenge of compiling from \LangVar{} to x86 by breaking
+down the problem into several steps, dealing with the above
+differences one at a time. Each of these steps is called a \emph{pass}
+of the compiler.\index{subject}{pass}\index{subject}{compiler pass}
 %
-This terminology comes from the way each step passes over the AST of
-the program.
+This terminology comes from the way each step passes over, that is,
+traverses the AST of the program.
+%
+Furthermore, we follow the nanopass approach, which means we strive
+for each pass to accomplish one clear objective (not two or three at
+the same time).
 %
 We begin by sketching how we might implement each pass, and give them
 names.  We then figure out an ordering of the passes and the
@@ -2611,6 +2615,8 @@ our own design.  Finally, to implement each pass we write one
 recursive function per non-terminal in the grammar of the input
 language of the pass.  \index{subject}{intermediate language}
 
+Our compiler for \LangVar{} consists of the following passes.
+%
 \begin{description}
 {\if\edition\racketEd
 \item[\key{uniquify}] deals with the shadowing of variables by
@@ -2627,7 +2633,7 @@ language of the pass.  \index{subject}{intermediate language}
   
   
 {\if\edition\racketEd
 \item[\key{explicate\_control}] makes the execution order of the
-  program explicit. It convert the abstract syntax tree representation
+  program explicit. It converts the abstract syntax tree representation
   into a control-flow graph in which each node contains a sequence of
   statements and the edges between nodes say which nodes contain jumps
   to other nodes.
@@ -2638,9 +2644,18 @@ language of the pass.  \index{subject}{intermediate language}
   \LangVar{} operation to a short sequence of instructions that
   accomplishes the same task.
 
-\item[\key{assign\_homes}] replaces the variables in \LangVar{} with
-  registers or stack locations in x86.
+\item[\key{assign\_homes}] replaces variables with registers or stack
+  locations.
 \end{description}
+%
+{\if\edition\racketEd
+%
+Our treatment of \code{remove\_complex\_operands} and
+\code{explicate\_control} as separate passes is an example of the
+nanopass approach. The traditional approach is to combine them into a
+single step~\citep{Aho:2006wb}.
+%  
+\fi}
 
 
 The next question is: in what order should we apply these passes? This
 question can be challenging because it is difficult to know ahead of
@@ -2680,10 +2695,9 @@ be forced to assign both arguments to memory locations.
 %
 A sophisticated approach is to iteratively repeat the two passes until
 a solution is found. However, to reduce implementation complexity we
-recommend a simpler approach in which \key{select\_instructions} comes
-first, followed by the \key{assign\_homes}, then a third pass named
-\key{patch\_instructions} that uses a reserved register to fix
-outstanding problems.
+recommend placing \key{select\_instructions} first, followed by the
+\key{assign\_homes}, then a third pass named \key{patch\_instructions}
+that uses a reserved register to fix outstanding problems.
 
 
 \begin{figure}[tbp]
 {\if\edition\racketEd  
@@ -2767,6 +2781,10 @@ of each of the compiler passes in Figure~\ref{fig:Lvar-passes}.
 The output of \code{explicate\_control} is similar to the $C$
 language~\citep{Kernighan:1988nx} in that it has separate syntactic
 categories for expressions and statements, so we name it \LangCVar{}.
+This style of intermediate language is also known as
+\emph{three-address code}, to emphasize that the typical form of a
+statement, \CASSIGN{\key{x}}{\CADD{\key{y}}{\key{z}}}, involves three
+addresses~\citep{Aho:2006wb}.
 
 
 The concrete syntax for \LangCVar{} is defined in
 Figure~\ref{fig:c0-concrete-syntax} and the abstract syntax for
@@ -3038,8 +3056,7 @@ Figure~\ref{fig:Lvar-anf-syntax} presents the grammar for the output
 of this pass, the language \LangVarANF{}. The only difference is that
 operator arguments are restricted to be atomic expressions that are
 defined by the \Atm{} non-terminal. In particular, integer constants
-and variables are atomic. This restriction brings us closer to what is
-known as a \emph{three-address code}~\citep{Aho:1986qf} language.
+and variables are atomic.
 
 
 The atomic expressions are pure (they do not cause side-effects or
 depend on them) whereas complex expressions may have side effects,
@@ -3053,7 +3070,8 @@ between atomic expressions and complex expressions can change and
 often does. The reason that these changes are behaviour preserving is
 that the atomic expressions are pure.
 
-Another well-known form is the \emph{administrative normal form}
+Another well-known form for intermediate languages is the
+\emph{administrative normal form}
 (ANF)~\citep{Danvy:1991fk,Flanagan:1993cg}.
 \index{subject}{administrative normal form} \index{subject}{ANF}
 %
@@ -7487,7 +7505,7 @@ R^{\mathsf{ANF}}_{\mathsf{if}}  &::=& \PROGRAM{\code{()}}{\Exp}
 \fi}
 \end{minipage}
 }
-\caption{\LangIfANF{} is \LangIf{} in administrative normal form (ANF).}
+\caption{\LangIfANF{} is \LangIf{} in monadic normal form.}
 \label{fig:Lif-anf-syntax}
 \end{figure}
 
@@ -9283,12 +9301,14 @@ blocks on several test programs.
 \section{Further Reading}
 \label{sec:cond-further-reading}
 
-The algorithm for the \code{explicate\_control} pass comes from the
-course notes of \citet{Dybvig:2010aa} and it has several similarities
-to an algorithm of \citet{Danvy:2003fk}. The treatment of conditionals
-in the \code{explicate\_control} pass is similar to the case-of-case
-transformation of \citet{PeytonJones:1998} and to short-cut boolean
-evaluation~\citep{Logothetis:1981,Aho:1986qf,Clarke:1989,Danvy:2003fk}.
+The algorithm for the \code{explicate\_control} pass is based on the
+\code{explose-basic-blocks} pass in course notes of
+\citet{Dybvig:2010aa}. It has several similarities to the algorithms
+of \citet{Danvy:2003fk} and \citet{Appel:2003fk}. The treatment of
+conditionals in the \code{explicate\_control} pass is similar to the
+case-of-case transformation of \citet{PeytonJones:1998} and to
+short-cut boolean
+evaluation~\citep{Logothetis:1981,Aho:2006wb,Clarke:1989,Danvy:2003fk}.
 
 
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
 \chapter{Loops and Dataflow Analysis}
 \chapter{Loops and Dataflow Analysis}
@@ -10168,7 +10188,7 @@ Figure~\ref{fig:Rwhile-anf-syntax} defines the output language
 \fi}
 \end{minipage}
 }
-\caption{\LangLoopANF{} is \LangLoop{} in administrative normal form (ANF).}
+\caption{\LangLoopANF{} is \LangLoop{} in monadic normal form.}
 \label{fig:Rwhile-anf-syntax}
 \end{figure}
 
@@ -11333,7 +11353,7 @@ should all be treated as complex operands.
 %% from its enclosing \code{HasType}.
 Figure~\ref{fig:Lvec-anf-syntax}
 shows the grammar for the output language \LangVecANF{} of this
-pass, which is \LangVec{} in administrative normal form.
+pass, which is \LangVec{} in monadic normal form.
 
 
 \begin{figure}[tp]
 \centering
@@ -11357,7 +11377,7 @@ pass, which is \LangVec{} in administrative normal form.
 \]
 \end{minipage}
 }
-\caption{\LangVecANF{} is \LangVec{} in administrative normal form (ANF).}
+\caption{\LangVecANF{} is \LangVec{} in monadic normal form.}
 \label{fig:Lvec-anf-syntax}
 \end{figure}
 
@@ -12954,7 +12974,7 @@ R^{\dagger}_4  &::=& \gray{ \PROGRAMDEFS{\code{'()}}{\Def} }
 \]
 \end{minipage}
 }
-\caption{\LangFunANF{} is \LangFun{} in administrative normal form (ANF).}
+\caption{\LangFunANF{} is \LangFunRefAlloc{} in monadic normal form.}
 \label{fig:Rfun-anf-syntax}
 \end{figure}
 
@@ -14229,15 +14249,15 @@ extract the $5$-bits starting at position $58$ from the tag.
 
 
 \begin{figure}[p]
 \begin{tikzpicture}[baseline=(current  bounding  box.center)]
-\node (Rfun) at (0,2)  {\large \LangFun{}};
-\node (Rfun-2) at (3,2)  {\large \LangFun{}};
-\node (Rfun-3) at (6,2)  {\large \LangFun{}};
-\node (F1-0) at (9,2)  {\large \LangFunRef{}};
-\node (F1-1) at (12,0)  {\large \LangFunRef{}};
+\node (Rfun) at (0,2)  {\large \LangLam{}};
+\node (Rfun-2) at (3,2)  {\large \LangLam{}};
+\node (Rfun-3) at (6,2)  {\large \LangLam{}};
+\node (F1-0) at (9,2)  {\large \LangLamFunRef{}};
+\node (F1-1) at (12,0)  {\large \LangLamFunRef{}};
 \node (F1-2) at (9,0)  {\large \LangFunRef{}};
 \node (F1-2) at (9,0)  {\large \LangFunRef{}};
-\node (F1-4) at (3,0)  {\large $F_1$};
-\node (F1-5) at (0,0)  {\large $F^{\RCO}_1$};
+\node (F1-3) at (6,0)  {\large \LangFunRef{}};
+\node (F1-4) at (3,0)  {\large \LangFunRefAlloc{}};
+\node (F1-5) at (0,0)  {\large \LangFunANF{}};
 \node (C3-2) at (3,-2)  {\large \LangCFun{}};
 
 \node (x86-2) at (3,-4)  {\large \LangXIndCallVar{}};

+ 2 - 1
defs.tex

@@ -37,11 +37,12 @@
 \newcommand{\LangFunM}{\Lang_{\mathsf{Fun}}} %R4
 \newcommand{\LangCFun}{$\CLang_{\mathsf{Fun}}$} %C3
 \newcommand{\LangCFunM}{\CLang_{\mathsf{Fun}}} %C3
-\newcommand{\LangFunANF}{\ensuremath{\Lang^{\RCO}_{\mathsf{Fun}}}} %R4
+\newcommand{\LangFunANF}{\ensuremath{\Lang^{\RCO}_{\mathsf{FunRef}}}} %R4
 \newcommand{\LangFunRef}{$\Lang_{\mathsf{FunRef}}$} %F1
 \newcommand{\LangFunRefM}{\Lang_{\mathsf{FunRef}}} %F1
 \newcommand{\LangFunRefAlloc}{\ensuremath{\Lang^{\mathsf{Alloc}}_{\mathsf{FunRef}}}} %R'4
 \newcommand{\LangLam}{$\Lang_\lambda$} %R5
+\newcommand{\LangLamFunRef}{$\Lang_\lambda^{\mathsf{FunRef}}$} 
 \newcommand{\LangLamM}{\ensuremath{\Lang_\lambda}} %R5
 \newcommand{\LangCLam}{$\CLang_{\mathsf{Clos}}$} %C4
 \newcommand{\LangCLamM}{\CLang_{\mathsf{Clos}}} %C4
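
Note on the three-address code mentioned in the book.tex changes above: the sketch below is only an illustration, not the book's implementation. It flattens a nested addition into statements of the form x = y + z, each of which names three addresses. The names `Add`, `flatten`, `to_three_address`, and the `tmpN` temporaries are hypothetical and do not come from the book or its support code.

```python
from dataclasses import dataclass
from itertools import count
from typing import List, Tuple, Union


@dataclass
class Add:
    """A nested addition; operands may themselves be additions."""
    left: "Expr"
    right: "Expr"


# An expression is an integer constant, a variable name, or an addition.
Expr = Union[int, str, Add]
Atom = Union[int, str]


def flatten(expr: Expr, stmts: List[Tuple[str, Atom, Atom]], fresh) -> Atom:
    """Return an atom holding the value of expr, appending (x, y, z)
    triples that stand for 'x = y + z' to stmts along the way."""
    if isinstance(expr, (int, str)):
        return expr  # constants and variables are already atomic
    left = flatten(expr.left, stmts, fresh)
    right = flatten(expr.right, stmts, fresh)
    tmp = f"tmp{next(fresh)}"  # hypothetical naming scheme for temporaries
    stmts.append((tmp, left, right))
    return tmp


def to_three_address(expr: Expr) -> List[str]:
    """Render expr as a list of three-address statements plus a return."""
    stmts: List[Tuple[str, Atom, Atom]] = []
    result = flatten(expr, stmts, count())
    lines = [f"{x} = {y} + {z};" for x, y, z in stmts]
    lines.append(f"return {result};")
    return lines


if __name__ == "__main__":
    for line in to_three_address(Add(Add("a", 1), Add("b", 2))):
        print(line)
```

This sketch introduces temporaries and orders the statements in a single function. The commit instead describes \code{remove\_complex\_operands} and \code{explicate\_control} as separate passes in keeping with the nanopass approach, and notes that the traditional approach combines them into a single step.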