|
@@ -167,9 +167,9 @@ London, England}
|
|
|
|
|
|
\tableofcontents
|
|
|
|
|
|
-\listoffigures
|
|
|
+%\listoffigures
|
|
|
|
|
|
-\listoftables
|
|
|
+%\listoftables
|
|
|
|
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
|
|
\chapter*{Preface}
|
|
@@ -199,22 +199,20 @@ language design choices to their impact on the compiler and the generated
|
|
|
code.
|
|
|
|
|
|
A compiler is typically organized as a sequence of stages that
|
|
|
-progressively translates a program to code that runs on hardware. We
|
|
|
+progressively translate a program to code that runs on hardware. We
|
|
|
take this approach to the extreme by partitioning our compiler into a
|
|
|
large number of \emph{nanopasses}, each of which performs a single
|
|
|
task. This allows us to test the output of each pass in isolation, and
|
|
|
-furthermore, allows us to focus our attention making the compiler far
|
|
|
-easier to understand.
|
|
|
-
|
|
|
-%% [TODO: easier to understand/debug for those maintaining the compiler,
|
|
|
-%% proving correctness]
|
|
|
+furthermore, allows us to focus our attention on one task at a time,
|
|
|
+which makes the compiler far easier to understand.
|
|
|
|
|
|
The most familiar approach to describing compilers is with one pass
|
|
|
-per chapter. The problem with that is it obfuscates how language
|
|
|
-features motivate design choices in a compiler. We take an
|
|
|
+per chapter. The problem with that approach is that it obfuscates how
|
|
|
+language features motivate design choices in a compiler. We take an
|
|
|
\emph{incremental} approach in which we build a complete compiler in
|
|
|
-each chapter, starting with arithmetic and variables and add new
|
|
|
-features in subsequent chapters.
|
|
|
+each chapter, starting with a small input language that includes only
|
|
|
+arithmetic and variables and adding new language features in
|
|
|
+subsequent chapters.
|
|
|
|
|
|
Our choice of language features is designed to elicit the fundamental
|
|
|
concepts and algorithms used in compilers.
|
|
@@ -226,11 +224,11 @@ concepts and algorithms used in compilers.
|
|
|
\item In Chapter~\ref{ch:register-allocation-Rvar} we apply
|
|
|
\emph{graph coloring} to assign variables to machine registers.
|
|
|
\item Chapter~\ref{ch:Rif} adds \code{if} expressions, which motivates
|
|
|
- an elegant recursive algorithm for mapping expressions to
|
|
|
- \emph{control-flow graphs}.
|
|
|
+ an elegant recursive algorithm for translating them into conditional
|
|
|
+ \code{goto}'s.
|
|
|
\item Chapter~\ref{ch:Rwhile} fleshes out support for imperative
|
|
|
programming languages with the addition of loops\racket{ and mutable
|
|
|
- variables}. These additions elicit the need for \emph{dataflow
|
|
|
+ variables}. This elicits the need for \emph{dataflow
|
|
|
analysis} in the register allocator.
|
|
|
\item Chapter~\ref{ch:Rvec} adds heap-allocated tuples, motivating
|
|
|
\emph{garbage collection}.
|
|
@@ -244,10 +242,15 @@ concepts and algorithms used in compilers.
|
|
|
scoping, i.e., \emph{lambda abstraction}. The reader learns about
|
|
|
\emph{closure conversion}, in which lambdas are translated into a
|
|
|
combination of functions and tuples.
|
|
|
+% Chapter about classes and objects?
|
|
|
\item Chapter~\ref{ch:Rdyn} adds \emph{dynamic typing}. Prior to this
|
|
|
point the input languages are statically typed. The reader extends
|
|
|
the statically typed language with an \code{Any} type which serves
|
|
|
as a target for compiling the dynamically typed language.
|
|
|
+{\if\edition\pythonEd
|
|
|
+\item Chapter~\ref{ch:Robject} adds support for \emph{objects} and
|
|
|
+ \emph{classes}.
|
|
|
+\fi}
|
|
|
\item Chapter~\ref{ch:Rgrad} uses the \code{Any} type of
|
|
|
Chapter~\ref{ch:Rdyn} to implement a \emph{gradually typed language}
|
|
|
in which different regions of a program may be static or dynamically
|
|
@@ -258,44 +261,71 @@ concepts and algorithms used in compilers.
|
|
|
\ref{ch:Rdyn} and \ref{ch:Rgrad}.
|
|
|
\end{itemize}
|
|
|
There are many language features that we do not include. Our choices
|
|
|
-weigh the incidental complexity of a feature against the fundamental
|
|
|
+balance the incidental complexity of a feature against the fundamental
|
|
|
concepts that it exposes. For example, we include tuples and not
|
|
|
records because they both elicit the study of heap allocation and
|
|
|
garbage collection but records come with more incidental complexity.
|
|
|
|
|
|
-Since 2016 drafts of this book have served as the textbook for the
|
|
|
-compiler course at Indiana University, a 16-week course for
|
|
|
-upper-level undergraduates and first-year graduate students.
|
|
|
+Since 2009, drafts of this book have served as the textbook for 16-week
|
|
|
+compiler courses for upper-level undergraduates and first-year
|
|
|
+graduate students at the University of Colorado and Indiana
|
|
|
+University.
|
|
|
%
|
|
|
-Prior to this course, students learn to program in both imperative and
|
|
|
-functional languages, study data structures and algorithms, and take
|
|
|
-discrete mathematics.
|
|
|
+Students come into the course having learned the basics of
|
|
|
+programming, data structures and algorithms, and discrete
|
|
|
+mathematics.
|
|
|
%
|
|
|
At the beginning of the course, students form groups of 2-4 people.
|
|
|
The groups complete one chapter every two weeks, starting with
|
|
|
Chapter~\ref{ch:Rvar}. Many chapters include a challenge problem that
|
|
|
we assign to the graduate students. The last two weeks of the course
|
|
|
involve a final project in which students design and implement a
|
|
|
-compiler extension of their choosing. Chapters~\ref{ch:Rwhile},
|
|
|
-\ref{ch:Rgrad}, and \ref{ch:Rpoly} can be used in support of these
|
|
|
-projects or they can replace some of the other chapters. For example,
|
|
|
-a course with an emphasis on statically-typed imperative languages
|
|
|
-could include Chapter~\ref{ch:Rwhile} but skip
|
|
|
-Chapter~\ref{ch:Rdyn}. For compiler courses at univerities on the
|
|
|
-quarter system, with 10 weeks, we recommend completing up through
|
|
|
-Chapter~\ref{ch:Rfun}. (If pressed for time, one can skip
|
|
|
-Chapter~\ref{ch:Rvec} but still include Chapter~\ref{ch:Rfun} by
|
|
|
-limiting the number of parameters allowed in functions.)
|
|
|
-Figure~\ref{fig:chapter-dependences} depicts the dependencies between
|
|
|
-chapters.
|
|
|
+compiler extension of their choosing. Chapters~\ref{ch:Rgrad} and
|
|
|
+\ref{ch:Rpoly} can be used in support of these projects or they can
|
|
|
+replace some of the other chapters. For example, a course with an
|
|
|
+emphasis on statically typed imperative languages could include
|
|
|
+Chapter~\ref{ch:Rpoly} but skip Chapter~\ref{ch:Rdyn}. For compiler
|
|
|
+courses at universities on the quarter system, with 10 weeks, we
|
|
|
+recommend completing up through Chapter~\ref{ch:Rfun}. (If pressed
|
|
|
+for time, one can skip Chapter~\ref{ch:Rvec} but still include
|
|
|
+Chapter~\ref{ch:Rfun} by limiting the number of parameters allowed in
|
|
|
+functions.) Figure~\ref{fig:chapter-dependences} depicts the
|
|
|
+dependencies between chapters.
|
|
|
|
|
|
This book has also been used in compiler courses at California
|
|
|
Polytechnic State University, Portland State University, Rose–Hulman
|
|
|
-Institute of Technology, University of Massachusetts Lowell,
|
|
|
-University of Colorado, and the University of Vermont.
|
|
|
+Institute of Technology, University of Massachusetts Lowell, and the
|
|
|
+University of Vermont.
|
|
|
|
|
|
|
|
|
\begin{figure}[tp]
|
|
|
+{\if\edition\racketEd
|
|
|
+\begin{tikzpicture}[baseline=(current bounding box.center)]
|
|
|
+ \node (C1) at (0,1.5) {\small Ch.~\ref{ch:trees-recur} Preliminaries};
|
|
|
+ \node (C2) at (4,1.5) {\small Ch.~\ref{ch:Rvar} Variables};
|
|
|
+ \node (C3) at (8,1.5) {\small Ch.~\ref{ch:register-allocation-Rvar} Registers};
|
|
|
+ \node (C4) at (0,0) {\small Ch.~\ref{ch:Rif} Conditionals};
|
|
|
+ \node (C5) at (4,0) {\small Ch.~\ref{ch:Rvec} Tuples};
|
|
|
+ \node (C6) at (8,0) {\small Ch.~\ref{ch:Rfun} Functions};
|
|
|
+ \node (C9) at (0,-1.5) {\small Ch.~\ref{ch:Rwhile} Loops};
|
|
|
+ \node (C8) at (4,-1.5) {\small Ch.~\ref{ch:Rdyn} Dynamic};
|
|
|
+ \node (C7) at (8,-1.5) {\small Ch.~\ref{ch:Rlam} Lambda};
|
|
|
+ \node (C10) at (4,-3) {\small Ch.~\ref{ch:Rgrad} Gradual Typing};
|
|
|
+ \node (C11) at (8,-3) {\small Ch.~\ref{ch:Rpoly} Generics};
|
|
|
+
|
|
|
+ \path[->] (C1) edge [above] node {} (C2);
|
|
|
+ \path[->] (C2) edge [above] node {} (C3);
|
|
|
+ \path[->] (C3) edge [above] node {} (C4);
|
|
|
+ \path[->] (C4) edge [above] node {} (C5);
|
|
|
+ \path[->] (C5) edge [above] node {} (C6);
|
|
|
+ \path[->] (C6) edge [above] node {} (C7);
|
|
|
+ \path[->] (C4) edge [above] node {} (C8);
|
|
|
+ \path[->] (C4) edge [above] node {} (C9);
|
|
|
+ \path[->] (C8) edge [above] node {} (C10);
|
|
|
+ \path[->] (C10) edge [above] node {} (C11);
|
|
|
+\end{tikzpicture}
|
|
|
+\fi}
|
|
|
+{\if\edition\pythonEd
|
|
|
\begin{tikzpicture}[baseline=(current bounding box.center)]
|
|
|
\node (C1) at (0,1.5) {\small Ch.~\ref{ch:trees-recur} Preliminaries};
|
|
|
\node (C2) at (4,1.5) {\small Ch.~\ref{ch:Rvar} Variables};
|
|
@@ -305,6 +335,7 @@ University of Colorado, and the University of Vermont.
|
|
|
\node (C6) at (8,0) {\small Ch.~\ref{ch:Rfun} Functions};
|
|
|
\node (C9) at (0,-1.5) {\small Ch.~\ref{ch:Rwhile} Loops};
|
|
|
\node (C8) at (4,-1.5) {\small Ch.~\ref{ch:Rdyn} Dynamic};
|
|
|
+ \node (CO) at (0,-3) {\small Ch.~\ref{ch:Robject} Objects};
|
|
|
\node (C7) at (8,-1.5) {\small Ch.~\ref{ch:Rlam} Lambda};
|
|
|
\node (C10) at (4,-3) {\small Ch.~\ref{ch:Rgrad} Gradual Typing};
|
|
|
\node (C11) at (8,-3) {\small Ch.~\ref{ch:Rpoly} Generics};
|
|
@@ -318,8 +349,10 @@ University of Colorado, and the University of Vermont.
|
|
|
\path[->] (C4) edge [above] node {} (C8);
|
|
|
\path[->] (C4) edge [above] node {} (C9);
|
|
|
\path[->] (C8) edge [above] node {} (C10);
|
|
|
+ \path[->] (C8) edge [above] node {} (CO);
|
|
|
\path[->] (C10) edge [above] node {} (C11);
|
|
|
\end{tikzpicture}
|
|
|
+\fi}
|
|
|
\caption{Diagram of chapter dependencies.}
|
|
|
\label{fig:chapter-dependences}
|
|
|
\end{figure}
|
|
@@ -359,7 +392,7 @@ We follow the System V calling
|
|
|
conventions~\citep{Bryant:2005aa,Matz:2013aa}, so the assembly code
|
|
|
that we generate works with the runtime system (written in C) when it
|
|
|
is compiled using the GNU C compiler (\code{gcc}) on Linux and MacOS
|
|
|
-operating systems.
|
|
|
+operating systems running on x86-64 hardware.
|
|
|
%
|
|
|
On the Windows operating system, \code{gcc} uses the Microsoft x64
|
|
|
calling convention~\citep{Microsoft:2018aa,Microsoft:2020aa}. So the
|
|
@@ -372,13 +405,13 @@ Linux as the guest operating system.
|
|
|
The tradition of compiler construction at Indiana University goes back
|
|
|
to research and courses on programming languages by Daniel Friedman in
|
|
|
the 1970's and 1980's. One of his students, Kent Dybvig, implemented
|
|
|
-Chez Scheme~\citep{Dybvig:2006aa}, a production-quality, efficient
|
|
|
+Chez Scheme~\citep{Dybvig:2006aa}, an efficient, production-quality
|
|
|
compiler for Scheme. Throughout the 1990's and 2000's, Dybvig taught
|
|
|
the compiler course and continued the development of Chez Scheme.
|
|
|
%
|
|
|
The compiler course evolved to incorporate novel pedagogical ideas
|
|
|
-while also including elements of efficient real-world compilers. One
|
|
|
-of Friedman's ideas was to split the compiler into many small
|
|
|
+while also including elements of real-world compilers. One of
|
|
|
+Friedman's ideas was to split the compiler into many small
|
|
|
passes. Another idea, called ``the game'', was to test the code
|
|
|
generated by each pass using interpreters.
|
|
|
|
|
@@ -467,10 +500,11 @@ Compilers use abstract syntax trees to represent programs because they
|
|
|
often need to ask questions like: for a given part of a program, what
|
|
|
kind of language feature is it? What are its sub-parts? Consider the
|
|
|
program on the left and its AST on the right. This program is an
|
|
|
-addition operation and it has two sub-parts, a read operation and a
|
|
|
-negation. The negation has another sub-part, the integer constant
|
|
|
-\code{8}. By using a tree to represent the program, we can easily
|
|
|
-follow the links to go from one part of a program to its sub-parts.
|
|
|
+addition operation and it has two sub-parts, a
|
|
|
+\racket{read}\python{input} operation and a negation. The negation has
|
|
|
+another sub-part, the integer constant \code{8}. By using a tree to
|
|
|
+represent the program, we can easily follow the links to go from one
|
|
|
+part of a program to its sub-parts.
|
|
|
\begin{center}
|
|
|
\begin{minipage}{0.4\textwidth}
|
|
|
\if\edition\racketEd
|
|
@@ -630,7 +664,7 @@ eight = Constant(8)
|
|
|
We say that the value created by \INT{8} is an
|
|
|
\emph{instance} of the \code{Constant} class.
|
|
|
|
|
|
-The following is class definition for unary operators.
|
|
|
+The following is the class definition for unary operators.
|
|
|
\begin{lstlisting}
|
|
|
class UnaryOp:
|
|
|
def __init__(self, op, operand):
|
|
@@ -640,7 +674,7 @@ class UnaryOp:
|
|
|
The specific operation is specified by the \code{op} parameter. For
|
|
|
example, the class \code{USub} is for unary subtraction. (More unary
|
|
|
operators are introduced in later chapters.) To create an AST that
|
|
|
-negates the number $8$, we write \NEG{\code{eight}}.
|
|
|
+negates the number $8$, we write the following.
|
|
|
\begin{lstlisting}
|
|
|
neg_eight = UnaryOp(USub(), eight)
|
|
|
\end{lstlisting}
|
|
@@ -791,7 +825,7 @@ AST is not in \LangInt{}. For example, the program \racket{\code{(-
|
|
|
(read) 8)}} \python{\code{input\_int() - 8}} is not in \LangInt{}
|
|
|
because there are no rules for the \key{-} operator with two
|
|
|
arguments. Whenever we define a language with a grammar, the language
|
|
|
-only includes those programs that are justified by the rules.
|
|
|
+only includes those programs that are justified by the grammar rules.
|
|
|
|
|
|
{\if\edition\pythonEd
|
|
|
The language \LangInt{} includes a second non-terminal $\Stmt$ for statements.
|
|
@@ -920,7 +954,7 @@ defined in Figure~\ref{fig:r0-concrete-syntax}.
|
|
|
\label{sec:pattern-matching}
|
|
|
|
|
|
As mentioned in Section~\ref{sec:ast}, compilers often need to access
|
|
|
-the parts of an AST node. \racket{Racket}\python{Python} provides the
|
|
|
+the parts of an AST node. \racket{Racket}\python{As of version 3.10, Python} provides the
|
|
|
\texttt{match} feature to access the parts of a value.
|
|
|
Consider the following example. \index{subject}{match} \index{subject}{pattern matching}
|
|
|
\begin{center}
|
|
@@ -1018,8 +1052,6 @@ def leaf(arith):
|
|
|
return False
|
|
|
case BinOp(e1, Add(), e2):
|
|
|
return False
|
|
|
- case _:
|
|
|
- return False
|
|
|
|
|
|
print(leaf(Call(Name('input_int'), [])))
|
|
|
print(leaf(UnaryOp(USub(), eight)))
|
|
@@ -1044,8 +1076,7 @@ print(leaf(Constant(8)))
|
|
|
\end{lstlisting}
|
|
|
\fi}
|
|
|
{\if\edition\pythonEd
|
|
|
- \begin{lstlisting}
|
|
|
-
|
|
|
+\begin{lstlisting}
|
|
|
|
|
|
|
|
|
|
|
@@ -1084,13 +1115,12 @@ of your choice (e.g. \code{e1} and \code{e2}).
|
|
|
\label{sec:recursion}
|
|
|
\index{subject}{recursive function}
|
|
|
|
|
|
-Programs are inherently recursive. For example, an \LangInt{}
|
|
|
-expression is often made of smaller expressions. Thus, the natural way
|
|
|
-to process an entire program is with a recursive function. As a first
|
|
|
-example of such a recursive function, we define the function
|
|
|
-\code{exp} in Figure~\ref{fig:exp-predicate}, which takes an
|
|
|
-arbitrary value and determines whether or not it is an \LangInt{}
|
|
|
-expression.
|
|
|
+Programs are inherently recursive. For example, an expression is often
|
|
|
+made of smaller expressions. Thus, the natural way to process an
|
|
|
+entire program is with a recursive function. As a first example of
|
|
|
+such a recursive function, we define the function \code{exp} in
|
|
|
+Figure~\ref{fig:exp-predicate}, which takes an arbitrary value and
|
|
|
+determines whether or not it is an expression in \LangInt{}.
|
|
|
%
|
|
|
We say that a function is defined by \emph{structural recursion} when
|
|
|
it is defined using a sequence of match \racket{clauses}\python{cases}
|
|
@@ -1102,7 +1132,7 @@ child node.\footnote{This principle of structuring code according to
|
|
|
\python{We define a second function, named \code{stmt}, that recognizes
|
|
|
whether a value is a \LangInt{} statement.}
|
|
|
\python{Finally, }
|
|
|
-Figure~\ref{fig:exp-predicate} \racket{also} defines \code{Rint}, which
|
|
|
+Figure~\ref{fig:exp-predicate} \racket{also} defines \racket{\code{Rint}}\python{\code{P\_int}}, which
|
|
|
determines whether an AST is a program in \LangInt{}. In general we can
|
|
|
expect to write one recursive function to handle each non-terminal in
|
|
|
a grammar.\index{subject}{structural recursion}
|
|
@@ -1177,15 +1207,15 @@ def stmt(s):
|
|
|
case _:
|
|
|
return False
|
|
|
|
|
|
-def Rint(p):
|
|
|
+def P_int(p):
|
|
|
match p:
|
|
|
case Module(body):
|
|
|
return all([stmt(s) for s in body])
|
|
|
case _:
|
|
|
return False
|
|
|
|
|
|
-print(Rint(Module([Expr(ast1_1)])))
|
|
|
-print(Rint(Module([Expr(BinOp(read, Sub(),
|
|
|
+print(P_int(Module([Expr(ast1_1)])))
|
|
|
+print(P_int(Module([Expr(BinOp(read, Sub(),
|
|
|
UnaryOp(Add(), Constant(8))))])))
|
|
|
\end{lstlisting}
|
|
|
\end{minipage}
|
|
@@ -1275,11 +1305,22 @@ designated as the definition of a language is called a
|
|
|
\emph{definitional interpreter}~\citep{reynolds72:_def_interp}.
|
|
|
\index{subject}{definitional interpreter} We warm up by creating a
|
|
|
definitional interpreter for the \LangInt{} language, which serves as
|
|
|
-a second example of structural recursion. The \texttt{interp\_Rint}
|
|
|
-function is defined in Figure~\ref{fig:interp_Rint}. The body of the
|
|
|
-function is a match on the input program followed by a call to the
|
|
|
-\lstinline{interp_exp} helper function, which in turn has one match
|
|
|
-clause per grammar rule for \LangInt{} expressions.
|
|
|
+a second example of structural recursion. The \racket{\code{interp\_Rint}}
|
|
|
+\python{\code{interp\_P\_int}}
|
|
|
+function is defined in Figure~\ref{fig:interp_Rint}.
|
|
|
+%
|
|
|
+\racket{The body of the function is a match on the input program
|
|
|
+ followed by a call to the \lstinline{interp_exp} helper function,
|
|
|
+ which in turn has one match clause per grammar rule for \LangInt{}
|
|
|
+ expressions.}
|
|
|
+%
|
|
|
+\python{The body of the function matches on the \code{Module} AST node
|
|
|
+ and then invokes \code{interp\_stmt} on each statement in the
|
|
|
+ module. The \code{interp\_stmt} function includes a case for each
|
|
|
+ grammar rule of the \Stmt{} non-terminal and it calls
|
|
|
+ \code{interp\_exp} on each subexpression. The \code{interp\_exp}
|
|
|
+ function includes a case for each grammar rule of the \Exp{}
|
|
|
+ non-terminal.}
|
|
|
|
|
|
\begin{figure}[tp]
|
|
|
{\if\edition\racketEd\color{olive}
|
|
@@ -1326,7 +1367,7 @@ def interp_stmt(s):
|
|
|
case Expr(value):
|
|
|
interp_exp(value)
|
|
|
|
|
|
-def interp_Pint(p):
|
|
|
+def interp_P_int(p):
|
|
|
match p:
|
|
|
case Module(body):
|
|
|
for s in body:
|
|
@@ -1377,8 +1418,8 @@ each other, in this case nesting several additions and negations.
|
|
|
print(10 + -(12 + 20))
|
|
|
\end{lstlisting}
|
|
|
\fi}
|
|
|
-
|
|
|
-What is the result of the above program?
|
|
|
+%
|
|
|
+\noindent What is the result of the above program?
|
|
|
|
|
|
{\if\edition\racketEd\color{olive}
|
|
|
As mentioned previously, the \LangInt{} language does not support
|
|
@@ -1430,7 +1471,7 @@ and then subtracts \code{8}. So if we run
|
|
|
\fi}
|
|
|
{\if\edition\pythonEd
|
|
|
\begin{lstlisting}
|
|
|
-interp_Pint(Module([Expr(Call(Name('print'), [ast1_1]))]))
|
|
|
+interp_P_int(Module([Expr(Call(Name('print'), [ast1_1]))]))
|
|
|
\end{lstlisting}
|
|
|
\fi}
|
|
|
\noindent and if the input is \code{50}, the result is \code{42}.
|
|
@@ -1443,7 +1484,7 @@ first instance of this course!}
|
|
|
|
|
|
The job of a compiler is to translate a program in one language into a
|
|
|
program in another language so that the output program behaves the
|
|
|
-same way as the input program does. This idea is depicted in the
|
|
|
+same way as the input program. This idea is depicted in the
|
|
|
following diagram. Suppose we have two languages, $\mathcal{L}_1$ and
|
|
|
$\mathcal{L}_2$, and a definitional interpreter for each language.
|
|
|
Given a compiler that translates from language $\mathcal{L}_1$ to
|
|
@@ -1468,12 +1509,11 @@ In the next section we see our first example of a compiler.
|
|
|
\section{Example Compiler: a Partial Evaluator}
|
|
|
\label{sec:partial-evaluation}
|
|
|
|
|
|
-In this section we consider a compiler that translates \LangInt{} programs
|
|
|
-into \LangInt{} programs that may be more efficient, that is, this compiler
|
|
|
-is an optimizer. This optimizer eagerly computes the parts of the
|
|
|
-program that do not depend on any inputs, a process known as
|
|
|
-\emph{partial evaluation}~\citep{Jones:1993uq}.
|
|
|
-\index{subject}{partial evaluation}
|
|
|
+In this section we consider a compiler that translates \LangInt{}
|
|
|
+programs into \LangInt{} programs that may be more efficient. The
|
|
|
+compiler eagerly computes the parts of the program that do not depend
|
|
|
+on any inputs, a process known as \emph{partial
|
|
|
+evaluation}~\citep{Jones:1993uq}. \index{subject}{partial evaluation}
|
|
|
For example, given the following program
|
|
|
{\if\edition\racketEd\color{olive}
|
|
|
\begin{lstlisting}
|
|
@@ -1482,7 +1522,7 @@ For example, given the following program
|
|
|
\fi}
|
|
|
{\if\edition\pythonEd
|
|
|
\begin{lstlisting}
|
|
|
-print input_int() + -(5 + 3)
|
|
|
+print(input_int() + -(5 + 3))
|
|
|
\end{lstlisting}
|
|
|
\fi}
|
|
|
\noindent our compiler translates it into the program
|
|
@@ -1493,17 +1533,17 @@ print input_int() + -(5 + 3)
|
|
|
\fi}
|
|
|
{\if\edition\pythonEd
|
|
|
\begin{lstlisting}
|
|
|
-print input_int() + -8
|
|
|
+print(input_int() + -8)
|
|
|
\end{lstlisting}
|
|
|
\fi}
|
|
|
|
|
|
Figure~\ref{fig:pe-arith} gives the code for a simple partial
|
|
|
evaluator for the \LangInt{} language. The output of the partial evaluator
|
|
|
-is an \LangInt{} program. In Figure~\ref{fig:pe-arith}, the structural
|
|
|
+is a program in \LangInt{}. In Figure~\ref{fig:pe-arith}, the structural
|
|
|
recursion over $\Exp$ is captured in the \code{pe\_exp} function
|
|
|
whereas the code for partially evaluating the negation and addition
|
|
|
-operations is factored into two separate helper functions:
|
|
|
-\code{pe\_neg} and \code{pe\_add}. The input to these helper
|
|
|
+operations is factored into two auxiliary functions:
|
|
|
+\code{pe\_neg} and \code{pe\_add}. The input to these
|
|
|
functions is the output of partially evaluating the children.
|
|
|
The \code{pe\_neg} and \code{pe\_add} functions check whether their
|
|
|
arguments are integers and if they are, perform the appropriate
|
|
@@ -1569,7 +1609,7 @@ def pe_stmt(s):
|
|
|
case Expr(value):
|
|
|
return Expr(pe_exp(value))
|
|
|
|
|
|
-def pe_Pint(p):
|
|
|
+def pe_P_int(p):
|
|
|
match p:
|
|
|
case Module(body):
|
|
|
new_body = [pe_stmt(s) for s in body]
|
|
@@ -1605,29 +1645,37 @@ Appendix~\ref{appendix:utilities}.\\
|
|
|
\fi}
|
|
|
% TODO: python version of testing the PE
|
|
|
|
|
|
+\begin{exercise}\normalfont
|
|
|
+ Create three programs in the \LangInt{} language and test whether
|
|
|
+ partially evaluating them with \code{pe\_P\_int} and then
|
|
|
+ interpreting them with \code{interp\_P\_int} gives the same result
|
|
|
+ as directly interpreting them with \code{interp\_P\_int}.
|
|
|
+\end{exercise}
|
|
|
+
|
|
|
+
|
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
|
|
\chapter{Integers and Variables}
|
|
|
\label{ch:Rvar}
|
|
|
|
|
|
-This chapter is about compiling a subset of \racket{Racket}\python{Python}
|
|
|
-to x86-64 assembly
|
|
|
+This chapter is about compiling a subset of
|
|
|
+\racket{Racket}\python{Python} to x86-64 assembly
|
|
|
code~\citep{Intel:2015aa}. The subset, named \LangVar{}, includes
|
|
|
-integer arithmetic and local variable binding. We often refer to
|
|
|
-x86-64 simply as x86. The chapter begins with a description of the
|
|
|
+integer arithmetic and local variables. We often refer to x86-64
|
|
|
+simply as x86. The chapter begins with a description of the
|
|
|
\LangVar{} language (Section~\ref{sec:s0}) followed by an introduction
|
|
|
-to x86 assembly (Section~\ref{sec:x86}). The x86 assembly language
|
|
|
-is large so we discuss only the instructions needed for compiling
|
|
|
+to x86 assembly (Section~\ref{sec:x86}). The x86 assembly language is
|
|
|
+large so we discuss only the instructions needed for compiling
|
|
|
\LangVar{}. We introduce more x86 instructions in later chapters.
|
|
|
After introducing \LangVar{} and x86, we reflect on their differences
|
|
|
and come up with a plan to break down the translation from \LangVar{}
|
|
|
to x86 into a handful of steps (Section~\ref{sec:plan-s0-x86}). The
|
|
|
rest of the sections in this chapter give detailed hints regarding
|
|
|
-each step (Sections~\ref{sec:uniquify-Rvar} through \ref{sec:patch-s0}).
|
|
|
-We hope to give enough hints that the well-prepared reader, together
|
|
|
-with a few friends, can implement a compiler from \LangVar{} to x86 in
|
|
|
-a couple weeks. To give the reader a feeling for the scale of this
|
|
|
-first compiler, the instructor solution for the \LangVar{} compiler is
|
|
|
-approximately \racket{500}\python{300} lines of code.
|
|
|
+each step. We hope to give enough hints that the well-prepared
|
|
|
+reader, together with a few friends, can implement a compiler from
|
|
|
+\LangVar{} to x86 in a couple of weeks. To give the reader a feeling for
|
|
|
+the scale of this first compiler, the instructor solution for the
|
|
|
+\LangVar{} compiler is approximately \racket{500}\python{300} lines of
|
|
|
+code.
|
|
|
|
|
|
\section{The \LangVar{} Language}
|
|
|
\label{sec:s0}
|
|
@@ -1637,8 +1685,8 @@ The \LangVar{} language extends the \LangInt{} language with
|
|
|
variables. The concrete syntax of the \LangVar{} language is defined
|
|
|
by the grammar in Figure~\ref{fig:Rvar-concrete-syntax} and the
|
|
|
abstract syntax is defined in Figure~\ref{fig:Rvar-syntax}. The
|
|
|
-non-terminal \Var{} may be any Racket identifier. As in \LangInt{},
|
|
|
-\key{read} is a nullary operator, \key{-} is a unary operator, and
|
|
|
+non-terminal \Var{} may be any \racket{Racket}\python{Python} identifier.
|
|
|
+As in \LangInt{}, \READOP{} is a nullary operator, \key{-} is a unary operator, and
|
|
|
\key{+} is a binary operator. Similar to \LangInt{}, the abstract
|
|
|
syntax of \LangVar{} includes the \racket{\key{Program}
|
|
|
struct}\python{\key{Module} instance} to mark the top of the
|
|
@@ -1726,16 +1774,17 @@ evaluates the body \code{(+ 10 x)}, producing $42$.
|
|
|
\fi}
|
|
|
%
|
|
|
{\if\edition\pythonEd
|
|
|
-The \LangVar{} language adds variables and the assignment statement
|
|
|
-to \LangInt{}. The assignment statement defines a variable for use by
|
|
|
-later statements and initializes the variable with the value of an expression.
|
|
|
-The abstract syntax for assignment is defined in
|
|
|
-Figure~\ref{fig:Rvar-syntax}. The concrete syntax for assignment is
|
|
|
+%
|
|
|
+The \LangVar{} language includes assignment statements, which define a
|
|
|
+variable for use in later statements and initialize the variable with
|
|
|
+the value of an expression. The abstract syntax for assignment is
|
|
|
+defined in Figure~\ref{fig:Rvar-syntax}. The concrete syntax for
|
|
|
+assignment is
|
|
|
\begin{lstlisting}
|
|
|
|$\itm{var}$| = |$\itm{exp}$|
|
|
|
\end{lstlisting}
|
|
|
-For example, the following program initializes \code{x} to $32$ and then
|
|
|
-prints the result of \code{10 + x}, producing $42$.
|
|
|
+For example, the following program initializes the variable \code{x}
|
|
|
+to $32$ and then prints the result of \code{10 + x}, producing $42$.
|
|
|
\begin{lstlisting}
|
|
|
x = 12 + 20
|
|
|
print(10 + x)
|
|
@@ -1743,6 +1792,7 @@ print(10 + x)
|
|
|
\fi}
|
|
|
|
|
|
{\if\edition\racketEd\color{olive}
|
|
|
+%
|
|
|
When there are multiple \key{let}'s for the same variable, the closest
|
|
|
enclosing \key{let} is used. That is, variable definitions overshadow
|
|
|
prior definitions. Consider the following program with two \key{let}'s
|
|
@@ -1809,26 +1859,26 @@ interpreter for \LangVar{}. The following code sketches this idea.
|
|
|
{\if\edition\pythonEd
|
|
|
\begin{minipage}{0.45\textwidth}
|
|
|
\begin{lstlisting}
|
|
|
-def interp_Rvar_exp(e):
|
|
|
+def interp_P_var_exp(e):
|
|
|
match e:
|
|
|
case UnaryOp(USub(), e1):
|
|
|
- return - interp_Rvar_exp(e1)
|
|
|
+ return - interp_P_var_exp(e1)
|
|
|
...
|
|
|
\end{lstlisting}
|
|
|
\end{minipage}
|
|
|
\begin{minipage}{0.45\textwidth}
|
|
|
\begin{lstlisting}
|
|
|
-def interp_Rif_exp(e):
|
|
|
+def interp_P_if_exp(e):
|
|
|
match e:
|
|
|
case IfExp(cnd, thn, els):
|
|
|
- match interp_Rif_exp(cnd):
|
|
|
+ match interp_P_if_exp(cnd):
|
|
|
case True:
|
|
|
- return interp_Rif_exp(thn)
|
|
|
+ return interp_P_if_exp(thn)
|
|
|
case False:
|
|
|
- return interp_Rif_exp(els)
|
|
|
+ return interp_P_if_exp(els)
|
|
|
...
|
|
|
case _:
|
|
|
- return interp_Rvar_exp(e)
|
|
|
+ return interp_P_var_exp(e)
|
|
|
\end{lstlisting}
|
|
|
\end{minipage}
|
|
|
\fi}
|
|
@@ -1847,30 +1897,42 @@ the following program.
|
|
|
print(-(42 if True else 0))
|
|
|
\end{lstlisting}
|
|
|
\fi}
|
|
|
-If we invoke \code{interp\_Rif\_exp} on this program, it dispatches to
|
|
|
-\code{interp\_Rvar\_exp} to handle the \code{-} operator, but then it
|
|
|
-recurisvely calls \code{interp\_Rvar\_exp} again on the argument of \code{-},
|
|
|
-which is an \code{If}. But there is no case for \code{If} in
|
|
|
-\code{interp\_Rvar\_exp}, so we get an error!
|
|
|
+If we invoke
|
|
|
+\racket{\code{interp\_Rif\_exp}}
|
|
|
+\python{\code{interp\_P\_if\_exp}}
|
|
|
+on this program, it dispatches to
|
|
|
+\racket{\code{interp\_Rvar\_exp}}
|
|
|
+\python{\code{interp\_P\_var\_exp}}
|
|
|
+to handle the \code{-} operator, but then it recursively calls
|
|
|
+\racket{\code{interp\_Rvar\_exp}}
|
|
|
+\python{\code{interp\_P\_var\_exp}}
|
|
|
+again on the argument of \code{-}, which is an \racket{\code{If}}\python{\code{IfExp}}. But there is no case for \racket{\code{If}}\python{\code{IfExp}} in
|
|
|
+\racket{\code{interp\_Rvar\_exp}}
|
|
|
+\python{\code{interp\_P\_var\_exp}},
|
|
|
+so we get an error!
|
|
|
|
|
|
To make our interpreters extensible we need something called
|
|
|
\emph{open recursion}\index{subject}{open recursion}, where the tying of the
|
|
|
recursive knot is delayed to when the functions are
|
|
|
-composed. Object-oriented languages provide open recursion with the
|
|
|
-late-binding of overridden methods\index{subject}{method overriding}. The
|
|
|
-following code sketches this idea for interpreting \LangVar{} and
|
|
|
+composed. Object-oriented languages provide open recursion via
|
|
|
+method overriding\index{subject}{method overriding}. The
|
|
|
+following code uses method overriding to interpret \LangVar{} and
|
|
|
\LangIf{} using
|
|
|
+%
|
|
|
\racket{the
|
|
|
-\href{https://docs.racket-lang.org/guide/classes.html}{\code{class}}
|
|
|
-\index{subject}{class} feature of Racket}
|
|
|
-\python{a Python \code{class} definition}. We define one class for each
|
|
|
-language and define a method for interpreting expressions inside each
|
|
|
-class. The class for \LangIf{} inherits from the class for \LangVar{}
|
|
|
-and the method \code{interp\_exp} in \LangIf{} overrides the
|
|
|
-\code{interp\_exp} in \LangVar{}. Note that the default case of
|
|
|
-\code{interp\_exp} in \LangIf{} uses \code{super} to invoke
|
|
|
-\code{interp\_exp}, and because \LangIf{} inherits from \LangVar{},
|
|
|
-that dispatches to the \code{interp\_exp} in \LangVar{}.
|
|
|
+ \href{https://docs.racket-lang.org/guide/classes.html}{\code{class}}
|
|
|
+ \index{subject}{class} feature of Racket}
|
|
|
+%
|
|
|
+\python{a Python \code{class} definition}.
|
|
|
+%
|
|
|
+We define one class for each language and define a method for
|
|
|
+interpreting expressions inside each class. The class for \LangIf{}
|
|
|
+inherits from the class for \LangVar{} and the method
|
|
|
+\code{interp\_exp} in \LangIf{} overrides the \code{interp\_exp} in
|
|
|
+\LangVar{}. Note that the default case of \code{interp\_exp} in
|
|
|
+\LangIf{} uses \code{super} to invoke \code{interp\_exp}, and because
|
|
|
+\LangIf{} inherits from \LangVar{}, that dispatches to the
|
|
|
+\code{interp\_exp} in \LangVar{}.
|
|
|
\begin{center}
|
|
|
{\if\edition\racketEd\color{olive}
|
|
|
\begin{minipage}{0.45\textwidth}
|
|
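The open-recursion pattern described above can be illustrated with a small, self-contained Python sketch. It is an illustration under stated assumptions, not the book's code: expressions are hypothetical tuples such as ('int', 5), ('neg', e), ('bool', True) and ('if', cnd, thn, els) rather than the book's AST classes, the base-class name InterpPvar is a hypothetical stand-in, and variables and environments are omitted. The point to notice is that InterpPvar.interp_exp recurses through self, so when it runs on an InterpPif instance, the recursive call on the argument of negation reaches the overriding method and the if case is handled.

```python
# Open recursion via method overriding: a minimal sketch.
# Assumption: expressions are hypothetical tuples, not the book's AST:
#   ('int', n), ('neg', e), ('bool', b), ('if', cnd, thn, els)


class InterpPvar:
    """Base interpreter (hypothetical stand-in): integers and negation."""

    def interp_exp(self, e):
        tag = e[0]
        if tag == 'int':
            return e[1]
        if tag == 'neg':
            # The recursive call goes through self, so a subclass that
            # overrides interp_exp is used here: this is open recursion.
            return -self.interp_exp(e[1])
        raise ValueError(f'unhandled expression: {e!r}')


class InterpPif(InterpPvar):
    """Extended interpreter: adds Booleans and if-expressions."""

    def interp_exp(self, e):
        tag = e[0]
        if tag == 'bool':
            return e[1]
        if tag == 'if':
            if self.interp_exp(e[1]):
                return self.interp_exp(e[2])
            return self.interp_exp(e[3])
        # Default case: defer to the parent class via super().
        return super().interp_exp(e)


prog = ('neg', ('if', ('bool', True), ('int', 5), ('int', 0)))
print(InterpPif().interp_exp(prog))  # -5
```

Calling InterpPvar().interp_exp(prog) on the same program raises ValueError, because the base class has no case for if. The subclass supplies that case without modifying the base class.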
@@ -1954,7 +2016,7 @@ and calling the \code{interp\_exp} method.
\fi}
{\if\edition\pythonEd
\begin{lstlisting}
-InterpRif().interp_exp(e0)
+InterpPif().interp_exp(e0)
\end{lstlisting}
\fi}
\noindent The default case of \code{interp\_exp} in \LangIf{} handles it by
@@ -2021,13 +2083,19 @@ information about them. We refer to these mappings as
common term for environment in the compiler literature is \emph{symbol
table}\index{subject}{symbol table}.}
%
-We use \racket{an association list
- (alist)}\python{\href{https://docs.python.org/3.10/library/stdtypes.html\#mapping-types-dict}{dictionary}} to represent the
-environment. \racket{Figure~\ref{fig:alist} gives a brief introduction
- to alists and the \code{racket/dict} package.} The
-\code{interp\_exp} function takes the current environment, \code{env},
-as an extra parameter. When the interpreter encounters a variable, it
-looks up the corresponding value in the dictionary.
+We use
+%
+\racket{an association list (alist)}
+%
+\python{a Python \href{https://docs.python.org/3.10/library/stdtypes.html\#mapping-types-dict}{dictionary}}
+to represent the environment.
+%
+\racket{Figure~\ref{fig:alist} gives a brief introduction to alists
+ and the \code{racket/dict} package.}
+%
+The \code{interp\_exp} function takes the current environment,
+\code{env}, as an extra parameter. When the interpreter encounters a
+variable, it looks up the corresponding value in the dictionary.
%
\racket{When the interpreter encounters a \key{Let}, it evaluates the
initializing expression, extends the environment with the result
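The environment-passing scheme described above can be sketched in Python as follows. This is a minimal sketch under assumptions, not the book's interpreter: it uses a hypothetical tuple-based AST with a Racket-style let form, and a plain Python dict as the environment. Extending the environment with {**env, x: value} builds a new dict, so a binding is visible only in the body of its let and shadowing works without mutating the caller's environment; the book's own interpreters may structure this differently.

```python
# Environment-passing interpreter: a minimal sketch.
# Assumption: expressions are hypothetical tuples, not the book's AST:
#   ('int', n), ('var', x), ('add', e1, e2), ('let', x, rhs, body)
# The environment is a plain dict mapping variable names to values.


def interp_exp(e, env):
    tag = e[0]
    if tag == 'int':
        return e[1]
    if tag == 'var':
        # Variable reference: look up its value in the environment.
        return env[e[1]]
    if tag == 'add':
        return interp_exp(e[1], env) + interp_exp(e[2], env)
    if tag == 'let':
        _, x, rhs, body = e
        # Evaluate the initializer in the current environment.
        value = interp_exp(rhs, env)
        # Interpret the body in a new, extended dict; the caller's
        # env is left unchanged.
        return interp_exp(body, {**env, x: value})
    raise ValueError(f'unhandled expression: {e!r}')


prog = ('let', 'x', ('int', 32),
        ('add', ('let', 'x', ('int', 10), ('var', 'x')), ('var', 'x')))
print(interp_exp(prog, {}))  # 42
```

In this example the inner let shadows x only within its own body, so the outer reference to x still sees 32, and the sum is 10 + 32.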
@@ -8922,7 +8990,7 @@ blocks on several test programs.


%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
-\chapter{Loops}
+\chapter{Loops and Dataflow Analysis}
\label{ch:Rwhile}

% TODO: define R'_8
@@ -14540,6 +14608,13 @@ for the compilation of \LangDyn{}.

\fi % racketEd

+%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
+\chapter{Objects}
+\label{ch:Robject}
+\index{subject}{objects}
+\index{subject}{classes}
+
+

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\chapter{Gradual Typing}
@@ -16347,12 +16422,12 @@ for the compilation of \LangPoly{}.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\clearpage

-\if\edition\racketEd
-
\appendix

\chapter{Appendix}

+\if\edition\racketEd
+
\section{Interpreters}
\label{appendix:interp}
\index{subject}{interpreter}
@@ -16469,6 +16544,8 @@ triggered. The alist may contain both immutable pairs (built with

%The \key{map2} function ...

+\fi %\racketEd
+
\section{x86 Instruction Set Quick-Reference}
\label{sec:x86-quick-reference}
\index{subject}{x86}
@@ -16535,8 +16612,8 @@ registers.
\label{tab:x86-instr}
\end{table}

+\if\edition\racketEd
\cleardoublepage
-
\section{Concrete Syntax for Intermediate Languages}

The concrete syntax of \LangAny{} is defined in
@@ -16675,7 +16752,7 @@ and \ref{fig:c3-concrete-syntax}, respectively.
\label{fig:c3-concrete-syntax}
\end{figure}

-\fi %racketEd
+\fi % racketEd

\backmatter
\addtocontents{toc}{\vspace{11pt}}