|
@@ -122,7 +122,6 @@ University.
|
|
|
|
|
|
\mainmatter
|
|
|
|
|
|
|
|
-\if{0}
|
|
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
|
|
|
\chapter*{Preface}
|
|
|
|
|
|
|
|
@@ -141,22 +140,19 @@ Chez Scheme and taught the compiler course.
|
|
|
|
|
|
The compiler course evolved to incorporate novel pedagogical ideas
|
|
|
|
while also including elements of effective real-world compilers. One
|
|
|
|
-of Dan's ideas was to split the compiler into many small passes over
|
|
|
|
-the input program and subsequent intermediate representations, so that
|
|
|
|
-the code for each pass would be easy to understood in isolation. (In
|
|
|
|
-contrast, most compilers of the time were organized into only a few
|
|
|
|
-monolithic passes for reasons of compile-time efficiency.) Kent and
|
|
|
|
-his students, Dipanwita Sarkar and Andrew Keep, developed
|
|
|
|
-infrastructure to support this approach and evolved the course, first
|
|
|
|
-to use micro-sized passes and then into even smaller nano
|
|
|
|
-passes~\citep{Sarkar:2004fk,Keep:2012aa}. I took this compiler course
|
|
|
|
-in the early 2000's, as part of my Ph.D. studies at Indiana
|
|
|
|
-University. Needless to say, I enjoyed the course immensely.
|
|
|
|
-
|
|
|
|
-\rn{I think that 1999 when I took it was the first micropass semester, and that
|
|
|
|
- that approach preceded the infrastructure work by Dipa.}
|
|
|
|
-
|
|
|
|
-One of my classmates, Abdulaziz Ghuloum, observed that the
|
|
|
|
|
|
+of Dan's ideas was to split the compiler into many small ``passes'' so
|
|
|
|
+that the code for each pass would be easy to understand in isolation.
|
|
|
|
+(In contrast, most compilers of the time were organized into only a
|
|
|
|
+few monolithic passes for reasons of compile-time efficiency.) Kent,
|
|
|
|
+with help later from his students Dipanwita Sarkar and Andrew Keep,
|
|
|
|
+developed infrastructure to support this approach and evolved the
|
|
|
|
+course, first to use micro-sized passes and then into even smaller
|
|
|
|
+nano passes~\citep{Sarkar:2004fk,Keep:2012aa}. Jeremy Siek took this
|
|
|
|
+compiler course in the early 2000s, as part of his Ph.D. studies at
|
|
|
|
+Indiana University. Needless to say, Jeremy enjoyed the course
|
|
|
|
+immensely.
|
|
|
|
+
|
|
|
|
+One of Jeremy's classmates, Abdulaziz Ghuloum, observed that the
|
|
front-to-back organization of the course made it difficult for
|
|
|
|
students to understand the rationale for the compiler
|
|
|
|
design. Abdulaziz proposed an incremental approach in which the
|
|
|
|
@@ -167,20 +163,20 @@ add or modify passes to handle the new feature~\citep{Ghuloum:2006bh}.
|
|
In this way, the students see how the language features motivate
|
|
|
|
aspects of the compiler design.
|
|
|
|
|
|
|
|
-After graduating from Indiana University in 2005, I went on to teach
|
|
|
|
-at the University of Colorado. I adapted the nano pass and incremental
|
|
|
|
-approaches to compiling a subset of the Python
|
|
|
|
|
|
+After graduating from Indiana University in 2005, Jeremy went on to
|
|
|
|
+teach at the University of Colorado. He adapted the nano pass and
|
|
|
|
+incremental approaches to compiling a subset of the Python
|
|
language~\citep{Siek:2012ab}. Python and Scheme are quite different
|
|
|
|
on the surface but there is a large overlap in the compiler techniques
|
|
|
|
-required for the two languages. Thus, I was able to teach much of the
|
|
|
|
-same content from the Indiana compiler course. I very much enjoyed
|
|
|
|
-teaching the course organized in this way, and even better, many of
|
|
|
|
-the students learned a lot and got excited about compilers.
|
|
|
|
|
|
+required for the two languages. Thus, Jeremy was able to teach much of
|
|
|
|
+the same content from the Indiana compiler course. He very much
|
|
|
|
+enjoyed teaching the course organized in this way, and even better,
|
|
|
|
+many of the students learned a lot and got excited about compilers.
|
|
|
|
|
|
-It is now 2016 and I too have returned to teach at Indiana University.
|
|
|
|
-In my absence the compiler course had switched from the front-to-back
|
|
|
|
|
|
+Jeremy returned to teach at Indiana University in 2013. In his
|
|
|
|
+absence the compiler course had switched from the front-to-back
|
|
organization to a back-to-front organization. Seeing how well the
|
|
|
|
-incremental approach worked at Colorado, I started porting and
|
|
|
|
|
|
+incremental approach worked at Colorado, he started porting and
|
|
adapting the structure of the Colorado course back into the land of
|
|
|
|
Scheme. In the meantime Indiana had moved on from Scheme to Racket, so
|
|
|
|
the course is now about compiling a subset of Racket to the x86
|
|
|
|
@@ -188,24 +184,24 @@ assembly language and the compiler is implemented in
|
|
Racket~\citep{plt-tr}.
|
|
|
|
|
|
|
|
This is the textbook for the incremental version of the compiler
|
|
|
|
-course at Indiana University (Spring 2016) and it is the first
|
|
|
|
-open textbook for an Indiana compiler course. With this book I hope to
|
|
|
|
-make the Indiana compiler course available to people that have not had
|
|
|
|
-the chance to study in Bloomington in person. Many of the compiler
|
|
|
|
-design decisions in this book are drawn from the assignment
|
|
|
|
-descriptions of \cite{Dybvig:2010aa}. I have captured what I think are
|
|
|
|
-the most important topics from \cite{Dybvig:2010aa} but I have omitted
|
|
|
|
-topics that I think are less interesting conceptually and I have made
|
|
|
|
|
|
+course at Indiana University (Spring 2016--Fall 2018) and it is the
|
|
|
|
+first open textbook for an Indiana compiler course. With this book we
|
|
|
|
+hope to make the Indiana compiler course available to people who have
|
|
|
|
+not had the chance to study in Bloomington in person. Many of the
|
|
|
|
+compiler design decisions in this book are drawn from the assignment
|
|
|
|
+descriptions of \cite{Dybvig:2010aa}. We have captured what we think are
|
|
|
|
+the most important topics from \cite{Dybvig:2010aa} but we have omitted
|
|
|
|
+topics that we think are less interesting conceptually and we have made
|
|
simplifications to reduce complexity. In this way, this book leans
|
|
|
|
more towards pedagogy than towards the absolute efficiency of the
|
|
|
|
-generated code. Also, the book differs in places where I saw the
|
|
|
|
|
|
+generated code. Also, the book differs in places where we saw the
|
|
opportunity to make the topics more fun, such as in relating register
|
|
|
|
allocation to Sudoku (Chapter~\ref{ch:register-allocation}).
|
|
|
|
|
|
|
|
\section*{Prerequisites}
|
|
|
|
|
|
|
|
The material in this book is challenging but rewarding. It is meant to
|
|
|
|
-prepare students for a lifelong career in programming languages. I do
|
|
|
|
|
|
+prepare students for a lifelong career in programming languages. We do
|
|
not recommend this book for students who want to dabble in programming
|
|
|
|
languages. Because the book uses the Racket language both for the
|
|
|
|
implementation of the compiler and for the language that is compiled,
|
|
|
|
@@ -232,14 +228,16 @@ parts of x86-64 assembly language that are needed.
|
|
|
|
|
|
\section*{Acknowledgments}
|
|
|
|
|
|
|
|
-Need to give thanks to
|
|
|
|
|
|
+Many people have contributed to the ideas, techniques, organization,
|
|
|
|
+and teaching of the materials in this book. We especially thank the
|
|
|
|
+following people.
|
|
|
|
+
|
|
\begin{itemize}
|
|
\begin{itemize}
|
|
\item Bor-Yuh Evan Chang
|
|
|
|
\item Kent Dybvig
|
|
|
|
\item Daniel P. Friedman
|
|
|
|
\item Ronald Garcia
|
|
|
|
\item Abdulaziz Ghuloum
|
|
|
|
-\item Ryan Newton
|
|
|
|
\item Dipanwita Sarkar
|
|
|
|
\item Andrew Keep
|
|
|
|
\item Oscar Waddell
|
|
|
|
@@ -248,9 +246,8 @@ Need to give thanks to
|
|
\mbox{}\\
|
|
|
|
\noindent Jeremy G. Siek \\
|
|
|
|
\noindent \url{http://homes.soic.indiana.edu/jsiek} \\
|
|
|
|
-\noindent Spring 2016
|
|
|
|
|
|
+%\noindent Spring 2016
|
|
|
|
|
|
-\fi{} %% End Preface
|
|
|
|
|
|
|
|
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
|
|
|
|
\chapter{Preliminaries}
|
|
|
|
@@ -269,6 +266,7 @@ Scheme, we use S-expressions to represent programs (Section~\ref{sec:ast})
|
|
and pattern matching to inspect individual nodes in an AST
|
|
|
|
(Section~\ref{sec:pattern-matching}). We use recursion to construct
|
|
|
|
and deconstruct entire ASTs (Section~\ref{sec:recursion}).
|
|
|
|
|
|
+This chapter provides an introduction to these ideas.
|
|
|
|
|
|
\section{Abstract Syntax Trees}
|
|
|
|
\label{sec:ast}
|
|
|
|
@@ -307,6 +305,31 @@ node except for the root has a \emph{parent} (the node it is the child
|
|
of). If a node has no children, it is a \emph{leaf} node. Otherwise
|
|
|
|
it is an \emph{internal} node.
|
|
|
|
|
|
|
|
|
|
+Recall that a \emph{symbolic expression} (S-expression) is either
|
|
|
|
+\begin{enumerate}
|
|
|
|
+\item an atom, or
|
|
|
|
+\item a pair of two S-expressions, written $(e_1 \key{.} e_2)$,
|
|
|
|
+ where $e_1$ and $e_2$ are each an S-expression.
|
|
|
|
+\end{enumerate}
|
|
|
|
+An \emph{atom} can be a symbol, such as \code{'hello}, a number, the null
|
|
|
|
+value \code{'()}, etc. It is quite common to use S-expressions
|
|
|
|
+to represent a list, such as $a, b, c$, in the following way:
|
|
|
|
+\begin{lstlisting}
|
|
|
|
+ '(a . (b . (c . ())))
|
|
|
|
+\end{lstlisting}
|
|
|
|
+Each element of the list is in the first slot of a pair, and the
|
|
|
|
+second slot is either the rest of the list or the null value, to mark
|
|
|
|
+the end of the list. Such lists are so common that Racket provides
|
|
|
|
+special notation for them that removes the need for the periods
|
|
|
|
+and so many parentheses:
|
|
|
|
+\begin{lstlisting}
|
|
|
|
+ '(a b c)
|
|
|
|
+\end{lstlisting}
|
|
|
|
+Thus, the S-expression of \eqref{eq:arith-prog} is a list whose first
|
|
|
|
+element is the symbol \code{'+}, whose second element is a list
|
|
|
|
+(containing just one element, the symbol \code{read}), and whose third
|
|
|
|
+element is another list (containing two atoms).
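+
+As a minimal sketch, assuming a standard Racket REPL, the following
+expressions check that the dotted-pair notation and the list notation
+denote the same value. The symbols \code{a}, \code{b}, and \code{c} are
+placeholders chosen only for illustration.
+\begin{lstlisting}
+   ;; Both notations denote the same three-element list.
+   (equal? '(a . (b . (c . ()))) '(a b c))   ; #t
+   ;; The first slot of the outermost pair holds the first element,
+   (car '(a b c))                            ; 'a
+   ;; and the second slot holds the rest of the list.
+   (cdr '(a b c))                            ; '(b c)
+   ;; The same list can be built explicitly from pairs with cons.
+   (cons 'a (cons 'b (cons 'c '())))         ; '(a b c)
+\end{lstlisting}
+Here \code{car} and \code{cdr} select the first and second slot of a
+pair, so repeatedly taking the \code{cdr} walks down the list until
+the null value is reached.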
|
|
|
|
+
|
|
When deciding how to compile the above program, we need to know that
|
|
|
|
the root node operation is addition and that it has two children:
|
|
|
|
\texttt{read} and a negation. The abstract syntax tree data structure
|
|
|