4 years ago · 80d0ceda30
--- a/book.tex
+++ b/book.tex
@@ -75,7 +75,18 @@
 
				 \newcommand{\margincomment}[1]{}
			
 
				 \fi
			
 
				 
			
 
				-\lstset{%
			
 
				+\newcommand{\ocaml}[1]{{\color{blue}{#1}}}
			
 
				+
			
 
				+\newenvironment{ocamlx}{
			
 
				+  \begin{color}{blue}
			
 
				+}
			
 
				+{
			
 
				+  \end{color}
			
 
				+}
			
 
				+
			
 
				+\definecolor{BLUE}{rgb}{0,0,1}  % no idea why we need this
			
 
				+
			
 
				+\lstdefinestyle{racket}{
			
 
				 language=Lisp,
			
 
				 basicstyle=\ttfamily\small,
			
 
				 morekeywords={seq,assign,program,block,define,lambda,match,goto,if,else,then,struct,Integer,Boolean,Vector,Void,Any,while,begin,define,public,override,class},
			
@@ -85,6 +96,17 @@ columns=flexible,
 
				 moredelim=[is][\color{red}]{~}{~},
			
 
				 showstringspaces=false
			
 
				 }
			
 
				+  
			
 
				+\lstset{style=racket}
			
 
				+
			
 
				+\lstdefinestyle{ocaml}{
			
 
				+  language=[Objective]Caml,
			
 
				+  basicstyle=\ttfamily\small\color{blue},
			
 
				+  columns=flexible,
			
 
				+  escapechar={},
			
 
				+  showstringspaces=false
			
 
				+}
			
 
				+
			
 
				 
			
 
				 \newtheorem{theorem}{Theorem}
			
 
				 \newtheorem{lemma}[theorem]{Lemma}
			
@@ -144,8 +166,12 @@ showstringspaces=false
 
				   Ryan Scott \\
			
 
				   Cameron Swords \\
			
 
				   Michael M. Vitousek \\
			
 
				-  Michael Vollmer 
			
 
				-   }
			
 
				+  Michael Vollmer \\
			
 
				+  \\
			
 
				+  \ocaml{OCaml version:} \\
			
 
				+  \ocaml{Andrew Tolmach} \\
			
 
				+  \ocaml{(with inspiration from a Haskell version by Ian Winter)}
			
 
				+}
			
 
				 
			
 
				 \begin{document}
			
 
				 
			
@@ -384,6 +410,32 @@ Bloomington, Indiana
 
				 \chapter{Preliminaries}
			
 
				 \label{ch:trees-recur}
			
 
				 
			
 
				+\begin{ocamlx}
			
 
				+  Text in blue, like this, represents additions to the original book
			
 
				+  text to support the use of OCaml rather than Racket as our compiler
			
 
				+  implementation language.  The original text is never changed, so you
			
 
				+  can see both the Racket and OCaml versions in parallel. The main
			
 
				+  motivation for this is to save a lot of rote editing: the bulk of
			
 
				+  the story being told in this book is substantially the same
			
 
				+  regardless of implementation language, so most of what has been
			
 
				+  written about the Racket version applies directly to OCaml
			
 
				+  with just small mental adjustments between the syntaxes of the two
			
 
				+  languages.  A secondary motivation is that it is sometimes easier to
			
 
				+  see key underlying ideas when they are expressed in more than one
			
 
				+  way.
			
 
				+
			
 
				+  In many respects, Racket and OCaml are very similar languages: they
			
 
				+  both encourage a purely functional style of programming while also supporting
			
 
				+  imperative programming, provide higher-order functions, use
			
 
				+  garbage collection to guarantee memory safety, etc.  Indeed, the
			
 
				+  ``back ends'' of Racket and OCaml implementations are nearly
			
 
				+  interchangeable. By far the most fundamental difference between them is
			
 
				+  that OCaml uses static typing, whereas Racket uses runtime typing.
			
 
				+  The latter can provide useful flexibility, but the former has the
			
 
				+  big advantage of providing compile-time feedback on type errors.
			
 
				+  This is our main motivation for using OCaml.
			
 
				+\end{ocamlx}
			
 
				+
			
 
				 In this chapter we review the basic tools that are needed to implement
			
 
				 a compiler. Programs are typically input by a programmer as text,
			
 
				 i.e., a sequence of characters. The program-as-text representation is
			
@@ -403,7 +455,10 @@ depending on the programming language used to write the compiler.
 
				 %
			
 
				 We use Racket's
			
 
				 \href{https://docs.racket-lang.org/guide/define-struct.html}{\code{struct}}
			
 
				-feature to represent ASTs (Section~\ref{sec:ast}). We use grammars to
			
 
				+feature to represent ASTs (Section~\ref{sec:ast}).
			
 
				+\ocaml{OCaml: we use \emph{variants} (also called algebraic data types) to
			
 
				+  represent ASTs.}
			
 
				+We use grammars to
			
 
				 define the abstract syntax of programming languages
			
 
				 (Section~\ref{sec:grammar}) and pattern matching to inspect individual
			
 
				 nodes in an AST (Section~\ref{sec:pattern-matching}).  We use
			
@@ -411,13 +466,23 @@ recursive functions to construct and deconstruct ASTs
 
				 (Section~\ref{sec:recursion}).  This chapter provides a brief
			
 
				 introduction to these ideas.  \index{struct}
			
 
				 
			
 
				-\section{Abstract Syntax Trees and Racket Structures}
			
 
				+\section{Abstract Syntax Trees and Racket Structures \ocaml{/ OCaml Variants}}
			
 
				 \label{sec:ast}
			
 
				 
			
 
				 Compilers use abstract syntax trees to represent programs because they
			
 
				 often need to ask questions like: for a given part of a program, what
			
 
				 kind of language feature is it? What are its sub-parts? Consider the
			
 
				-program on the left and its AST on the right. This program is an
			
 
				+program on the left and its AST on the right.
			
 
				+\begin{ocamlx}
			
 
				+This program is
			
 
				+  itself in Racket; in addition to using Racket as the compiler implementation
			
 
				+  language, the original version of this book uses subsets of Racket as the
			
 
				+  \emph{source} languages that we compile. In the OCaml version we will be using
			
 
				+  ad-hoc source languages that look a lot like subsets of Racket, but sometimes
			
 
				+  made simpler (because there is no particular advantage to matching the messier details
			
 
				+  of Racket syntax). The code on the left will be valid in all of our source languages too.
			
 
				+\end{ocamlx}
			
 
				+This program is an
			
 
				 addition operation and it has two sub-parts, a read operation and a
			
 
				 negation. The negation has another sub-part, the integer constant
			
 
				 \code{8}. By using a tree to represent the program, we can easily
			
@@ -540,6 +605,33 @@ whereas the addition operator has two children:
 
				 \begin{lstlisting}
			
 
				 (define ast1.1 (Prim '+ (list rd neg-eight)))
			
 
				 \end{lstlisting}
			
 
				+\begin{ocamlx}
			
 
				+We define an OCaml variant type for ASTs, with a different constructor for each
			
 
				+kind of node:
			
 
				+\begin{lstlisting}[style=ocaml]
			
 
				+type exp = 
			
 
				+  Int of int  
			
 
				+| Prim of primop * exp list
			
 
				+\end{lstlisting}
			
 
				+This definition depends on the definition of another variant type that enumerates the possible primops
			
 
				+(in place of the single-quoted symbols used in Racket):
			
 
				+\begin{lstlisting}[style=ocaml]
			
 
				+type primop = 
			
 
				+  Read
			
 
				+| Neg
			
 
				+| Add
			
 
				+\end{lstlisting}
			
 
				+To create an AST node for the integer 8, we write \code{Int 8}.
			
 
				+To create an AST that negates
			
 
				+the number 8, we write \code{Prim(Neg,[Int 8])}, and so on:
			
 
				+\begin{lstlisting}[style=ocaml]
			
 
				+let eight = Int 8
			
 
				+let neg_eight = Prim(Neg,[eight])
			
 
				+let rd = Prim(Read,[])
			
 
				+let ast1_1 = Prim(Add,[rd;neg_eight])
			
 
				+\end{lstlisting}
			
 
				+Note that OCaml identifiers are more restricted in form than those of Racket; we will typically replace uses of dash (\code{-}), dot (\code{.}), etc. by underscores (\code{\_}).
			
 
				+\end{ocamlx}
			
 
				 
			
 
				 We have made a design choice regarding the \code{Prim} structure.
			
 
				 Instead of using one structure for many different operations
			
@@ -555,11 +647,26 @@ of the compiler the code for the different primitive operators is the
 
				 same, so we might as well just write that code once, which is enabled
			
 
				 by using a single structure.
			
 
				 
			
 
				+\begin{ocamlx}
			
 
				+  We have made a similar design choice in OCaml. The corresponding
			
 
				+  alternative would have been to define our AST type as
			
 
				+\begin{lstlisting}[style=ocaml]
			
 
				+type exp = 
			
 
				+    Int of int  
			
 
				+  | Read
			
 
				+  | Add of exp * exp
			
 
				+  | Neg of exp
			
 
				+\end{lstlisting}
			
 
				+Note that one advantage of using this alternative is that it would explicitly enforce
			
 
				+that each primitive operator is given the correct number of arguments (its \emph{arity});
			
 
				+this restriction is not captured in the list-based version.
			
 
				+\end{ocamlx}
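				+\begin{ocamlx}
				+  To make the arity point concrete, here is a minimal sketch that converts the
				+  list-based representation into the fixed-arity alternative. The names
				+  \code{fexp}, \code{FInt}, \code{FRead}, \code{FNeg}, \code{FAdd}, and
				+  \code{to\_fixed} are hypothetical and are not part of the support code.
				+  Ill-formed trees, which the list-based type permits, have no counterpart in
				+  \code{fexp}, so the conversion returns \code{None} for them.
				+\begin{lstlisting}[style=ocaml]
				+(* List-based AST, as in the main text. *)
				+type primop = Read | Neg | Add
				+type exp = Int of int | Prim of primop * exp list
				+
				+(* Fixed-arity alternative (hypothetical names). *)
				+type fexp =
				+  | FInt of int
				+  | FRead
				+  | FNeg of fexp
				+  | FAdd of fexp * fexp
				+
				+(* Convert to the fixed-arity form; None signals an arity error. *)
				+let rec to_fixed (e : exp) : fexp option =
				+  match e with
				+  | Int n -> Some (FInt n)
				+  | Prim(Read,[]) -> Some FRead
				+  | Prim(Neg,[e1]) ->
				+      (match to_fixed e1 with
				+       | Some f1 -> Some (FNeg f1)
				+       | None -> None)
				+  | Prim(Add,[e1;e2]) ->
				+      (match to_fixed e1, to_fixed e2 with
				+       | Some f1, Some f2 -> Some (FAdd (f1, f2))
				+       | _ -> None)
				+  | Prim(_,_) -> None   (* wrong number of arguments *)
				+\end{lstlisting}
				+  For instance, \code{to\_fixed (Prim(Neg,[]))} yields \code{None}, whereas
				+  the analogous ill-formed value cannot even be written using \code{fexp}.
				+\end{ocamlx}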
			
 
				+
			
 
				 When compiling a program such as \eqref{eq:arith-prog}, we need to
			
 
				 know that the operation associated with the root node is addition and
			
 
				 we need to be able to access its two children. Racket provides pattern
			
 
				 matching to support these kinds of queries, as we see in
			
 
				-Section~\ref{sec:pattern-matching}.
			
 
				+Section~\ref{sec:pattern-matching}. \ocaml{So does OCaml.}
			
 
				 
			
 
				 In this book, we often write down the concrete syntax of a program
			
 
				 even when we really have in mind the AST because the concrete syntax
			
@@ -585,6 +692,15 @@ As an example, we describe a small language, named \LangInt{}, that consists of
 
				 integers and arithmetic operations.
			
 
				 \index{grammar}
			
 
				 
			
 
				+\begin{ocamlx}
			
 
				+  Using a grammar to describe abstract syntax is less useful in OCaml than in
			
 
				+  Racket, because our variant type definition for ASTs already serves to specify
			
 
				+  the legal forms of tree (except that it is overly flexible about the arity of
			
 
				+  primops, as mentioned above). So don't worry too much about the details of
			
 
				+  the AST grammar here---but do make sure you understand how the same ideas
			
 
				+  are applied to \emph{concrete} grammars, below.
			
 
				+\end{ocamlx}
			
 
				+  
			
 
				 The first grammar rule for the abstract syntax of \LangInt{} says that an
			
 
				 instance of the \code{Int} structure is an expression:
			
 
				 \begin{equation}
			
@@ -610,6 +726,10 @@ integers on a 64-bit machine. So an $\Int$ is a sequence of decimals
 
				 ($0$ to $9$), possibly starting with $-$ (for negative integers), such
			
 
				 that the sequence of decimals represent an integer in range $-2^{62}$
			
 
				 to $2^{62}-1$.
			
 
				+\ocaml{As it happens, OCaml's standard integer type
			
 
				+  (\code{int}) is also 63 bits on a 64-bit machine. Initially, we 
			
 
				+  will adopt the corresponding convention that $\Int$ is a 63-bit integer,
			
 
				+  but soon we will move to full 64-bit integers.}
			
 
				 
			
 
				 The second grammar rule is the \texttt{read} operation that receives
			
 
				 an input integer from the user of the program.
			
@@ -650,6 +770,10 @@ following AST is an $\Exp$.
 
				 \end{minipage}
			
 
				 \end{center}
			
 
				 
			
 
				+\begin{ocamlx}
			
 
				+  The corresponding OCaml AST expression is \code{Prim(Neg,[Int 8])}.
			
 
				+\end{ocamlx}
			
 
				+
			
 
				 The next grammar rule is for addition expressions:
			
 
				 \begin{equation}
			
 
				   \Exp ::= \ADD{\Exp}{\Exp} \label{eq:arith-add}
			
@@ -663,6 +787,7 @@ to show that
 
				 (Prim '+ (list (Prim 'read '()) (Prim '- (list (Int 8)))))
			
 
				 \end{lstlisting}
			
 
				 is an $\Exp$ in the \LangInt{} language.
			
 
				+\ocaml{\\ OCaml: \code{Prim(Add,[Prim(Read,[]);Prim(Neg,[Int 8])])}.}
			
 
				 
			
 
				 If you have an AST for which the above rules do not apply, then the
			
 
				 AST is not in \LangInt{}. For example, the program \code{(- (read) (+ 8))}
			
@@ -683,6 +808,25 @@ The \code{Program} structure is defined as follows
 
				 where \code{body} is an expression. In later chapters, the \code{info}
			
 
				 part will be used to store auxiliary information but for now it is
			
 
				 just the empty list.
			
 
				+\begin{ocamlx}
			
 
				+  In OCaml:
			
 
				+  \begin{lstlisting}[style=ocaml]
			
 
				+    type 'info rint_program = Program of 'info * exp
			
 
				+  \end{lstlisting}
			
 
				+  Again, we represent the structure as a variant type
			
 
				+  (\code{rint\_program}), this time just with one constructor
			
 
				+  (\code{Program}).  We \emph{parameterize} \code{rint\_program} by a
			
 
				+  \emph{type variable} \code{'info} (type variables are distinguished by having
			
 
				+  a leading tick  mark).  This says that \code{rint\_program} is a family of types which can
			
 
				+  be instantiated to represent programs holding a particular kind of auxiliary information.
			
 
				+  For now, we'll just instantiate \code{'info}
			
 
				+  with the \emph{unit} type, written \code{unit}, whose sole (boring)
			
 
				+  value is written \code{()}.
			
 
				+  \begin{lstlisting}[style=ocaml]
			
 
				+    let p : unit rint_program = Program ((), body)
			
 
				+  \end{lstlisting}
			
 
				+  Here the colon (\code{:}) introduces an explicit type annotation on \code{p}; it can be read ``has type.''
			
 
				+\end{ocamlx}
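				+\begin{ocamlx}
				+  As a small illustration of what ``a family of types'' means, the sketch
				+  below instantiates \code{'info} in two ways. The \code{string list}
				+  instantiation and the helper \code{body\_of} are hypothetical examples chosen
				+  only for illustration; later chapters may store different information.
				+\begin{lstlisting}[style=ocaml]
				+type primop = Read | Neg | Add
				+type exp = Int of int | Prim of primop * exp list
				+type 'info rint_program = Program of 'info * exp
				+
				+let body = Prim(Add,[Int 10; Int 32])
				+
				+(* No auxiliary information: 'info is unit. *)
				+let p1 : unit rint_program = Program((), body)
				+
				+(* Hypothetical auxiliary information: a list of names. *)
				+let p2 : string list rint_program = Program(["x"; "y"], body)
				+
				+(* A function that ignores the info component works for
				+   every instantiation of 'info. *)
				+let body_of (Program(_,e)) = e
				+\end{lstlisting}
				+  Here \code{body\_of} has type \code{'info rint\_program -> exp}, so it can
				+  be applied to both \code{p1} and \code{p2}.
				+\end{ocamlx}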
			
 
				 
			
 
				 It is common to have many grammar rules with the same left-hand side
			
 
				 but different right-hand sides, such as the rules for $\Exp$ in the
			
@@ -690,7 +834,8 @@ grammar of \LangInt{}. As a short-hand, a vertical bar can be used to
 
				 combine several right-hand-sides into a single rule.
			
 
				 
			
 
				 We collect all of the grammar rules for the abstract syntax of \LangInt{}
			
 
				-in Figure~\ref{fig:r0-syntax}. The concrete syntax for \LangInt{} is
			
 
				+in Figure~\ref{fig:r0-syntax} \ocaml{along with the corresponding OCaml type definitions}.
			
 
				+The concrete syntax for \LangInt{} is
			
 
				 defined in Figure~\ref{fig:r0-concrete-syntax}.
			
 
				 
			
 
				 The \code{read-program} function provided in \code{utilities.rkt} of
			
@@ -699,7 +844,61 @@ characters in the concrete syntax of Racket) and parses it into an
 
				 abstract syntax tree. See the description of \code{read-program} in
			
 
				 Appendix~\ref{appendix:utilities} for more details.
			
 
				 
			
 
				-
			
 
				+\begin{ocamlx}
			
 
				+  As noted above, the concrete syntaxes we will use are similar to Racket's own syntax.
			
 
				+  In particular, programs are described as \emph{S-expressions}. An S-expression can be
			
 
				+  either an atom (an integer, symbol, or quoted string) or a list of S-expressions enclosed in
			
 
				+  parentheses.  You can see that the concrete syntax for \LangInt{} is written as
			
 
				+  S-expressions where the symbols used are \code{read}, \code{-}, and \code{+}, and
			
 
				+  a primitive operation invocation is described by a list whose first element is
			
 
				+  the operation symbol and whose remaining elements (0 or more of them) are
			
 
				+  S-expressions representing the arguments (which can themselves be lists).
			
 
				+  All the source languages we consider in this book will be written as S-expressions in
			
 
				+  a similar style;  the details of which symbols and shapes of list are allowed
			
 
				+  will vary from language to language.
			
 
				+  
			
 
				+  To handle all this neatly in OCaml, we split the parsing of concrete
			
 
				+  programs into two phases. First, the \code{parse} function provided
			
 
				+  in \code{sexpr.ml} of the support code reads text from a file and
			
 
				+  parses it into a generic S-expression data type.  (This code is a
			
 
				+  bit complicated and messy, but you don't have to understand its
			
 
				+  internals in order to use it.) Then, a source-language-specific
			
 
				+  program is used to convert the S-expression into the abstract syntax
			
 
				+  of that particular language. We will see later on that OCaml's pattern
			
 
				+  matching facilities make it very easy to write such conversion
			
 
				+  routines.  This is particularly true because the S-expression format
			
 
				+  we use for our concrete source languages is already very close to an
			
 
				+  abstract syntax, which means the conversion has very little work to
			
 
				+  do.  For example, as you have seen, primitive operations are all
			
 
				+  written in prefix, rather than infix, notation, so there is no need
			
 
				+  to worry about issues like precedence and associativity of operators
			
 
				+  in an expression like \code{(2 * 3 + 4)}: the S-expression syntax
			
 
				+  will be either \code{(+ (* 2 3) 4)} or \code{(* 2 (+ 3 4))}, so
			
 
				+  there is no possible ambiguity. The downside is that source programs
			
 
				+  are a bit more tedious to write, and may sometimes seem to be drowning in
			
 
				+  parentheses.
			
 
				+
			
 
				+  The OCaml representation of generic S-expressions is just another
			
 
				+  variant type:
			
 
				+  \begin{lstlisting}[style=ocaml]
			
 
				+    type sexp =
			
 
				+    | SList of sexp list
			
 
				+            (* list of expressions delimited by parentheses *)
			
 
				+    | SNum of Int64.t
			
 
				+            (* 64-bit integers *)
			
 
				+    | SSym of string
			
 
				+            (* non-digit character sequence delimited by white space *)
			
 
				+    | SString of string
			
 
				+            (* arbitrary character sequence delimited by double quotes *)
			
 
				+  \end{lstlisting}
			
 
				+  The generic S-expression parser handles (nestable) comments delimited by
			
 
				+  curly braces (\code{\{} and \code{\}}).  Symbols can contain any
			
 
				+  non-digit, non-whitespace characters except parentheses, curly braces, and
			
 
				+  the back tick (\code{\`}); this last exclusion is handy when we want to
			
 
				+  generate internal names during compilation and be sure they don't clash
			
 
				+  with a user-defined symbol.
			
 
				+\end{ocamlx}
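				+\begin{ocamlx}
				+  The second phase described above might look roughly like the following
				+  sketch for \LangInt{}. The function name \code{exp\_of\_sexp} and the
				+  exception \code{Bad\_syntax} are assumptions made for this example; the
				+  actual support code may be organized differently.
				+\begin{lstlisting}[style=ocaml]
				+type sexp =
				+  | SList of sexp list
				+  | SNum of Int64.t
				+  | SSym of string
				+  | SString of string
				+
				+type primop = Read | Neg | Add
				+type exp = Int of int | Prim of primop * exp list
				+
				+(* Hypothetical exception for malformed source programs. *)
				+exception Bad_syntax of string
				+
				+(* Convert a generic S-expression into a LangInt expression. *)
				+let rec exp_of_sexp (s : sexp) : exp =
				+  match s with
				+  | SNum n -> Int (Int64.to_int n)  (* narrows to OCaml's int *)
				+  | SList [SSym "read"] -> Prim(Read,[])
				+  | SList [SSym "-"; e] -> Prim(Neg,[exp_of_sexp e])
				+  | SList [SSym "+"; e1; e2] ->
				+      Prim(Add,[exp_of_sexp e1; exp_of_sexp e2])
				+  | _ -> raise (Bad_syntax "not a LangInt expression")
				+\end{lstlisting}
				+  Each clause pairs one concrete list shape with one AST constructor, which
				+  is why the conversion has so little work to do.
				+\end{ocamlx}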
			
 
				+  
			
 
				 \begin{figure}[tp]
			
 
				 \fbox{
			
 
				 \begin{minipage}{0.96\textwidth}
			
@@ -729,6 +928,19 @@ Appendix~\ref{appendix:utilities} for more details.
 
				 \]
			
 
				 \end{minipage}
			
 
				 }
			
 
				+
			
 
				+\begin{minipage}{0.96\textwidth}
			
 
				+\begin{lstlisting}[style=ocaml,frame=single]
			
 
				+type primop = 
			
 
				+   Read
			
 
				+ | Neg
			
 
				+ | Add
			
 
				+type exp = 
			
 
				+   Int of int  
			
 
				+ | Prim of primop * exp list
			
 
				+type 'info rint_program = Program of 'info * exp
			
 
				+\end{lstlisting}
			
 
				+\end{minipage}
			
 
				 \caption{The abstract syntax of \LangInt{}.}
			
 
				 \label{fig:r0-syntax}
			
 
				 \end{figure}
			
@@ -775,11 +987,37 @@ The body of a match clause may contain arbitrary Racket code.  The
 
				 pattern variables can be used in the scope of the body, such as
			
 
				 \code{op} in \code{(print op)}.
			
 
				 
			
 
				+\begin{ocamlx}
			
 
				+  Here is the OCaml version, which is quite similar:
			
 
				+\begin{center}
			
 
				+\begin{minipage}{0.5\textwidth}
			
 
				+\begin{lstlisting}[style=ocaml]
			
 
				+match ast1_1 with
			
 
				+| Prim(op,[child1;child2]) -> op
			
 
				+\end{lstlisting}
			
 
				+\end{minipage}
			
 
				+\vrule
			
 
				+\begin{minipage}{0.25\textwidth}
			
 
				+\begin{lstlisting}[style=ocaml]
			
 
				+
			
 
				+   Add
			
 
				+\end{lstlisting}
			
 
				+\end{minipage}
			
 
				+\end{center}
			
 
				+\end{ocamlx}
			
 
				+
			
 
				 A \code{match} form may contain several clauses, as in the following
			
 
				 function \code{leaf?} that recognizes when an \LangInt{} node is a leaf in
			
 
				 the AST. The \code{match} proceeds through the clauses in order,
			
 
				 checking whether the pattern can match the input AST. The body of the
			
 
				-first clause that matches is executed. The output of \code{leaf?} for
			
 
				+first clause that matches is executed.
			
 
				+\begin{ocamlx}
			
 
				+In fact, in OCaml, we will get a warning message about the code above, because the \code{match} only contains
			
 
				+a clause for a {\tt Prim} with two children, not for other possible forms of \code{exp}.
			
 
				+Although in this particular instance, that's OK (because of the value of \code{ast1\_1}), in general
			
 
				+it suggests a possible error. Getting warnings like this is one of the advantages of static typing.
			
 
				+\end{ocamlx}
			
 
				+The output of \code{leaf?} for
			
 
				 several ASTs is shown on the right.
			
 
				 \begin{center}
			
 
				 \begin{minipage}{0.6\textwidth}
			
@@ -827,6 +1065,63 @@ right-hand side $\ADD{\Exp}{\Exp}$. When translating from grammars to
 
				 patterns, replace non-terminals such as $\Exp$ with pattern variables
			
 
				 of your choice (e.g. \code{e1} and \code{e2}).
			
 
				 
			
 
				+\begin{ocamlx}
			
 
				+Here is the directly corresponding OCaml version.
			
 
				+\begin{center}
			
 
				+\begin{minipage}{0.6\textwidth}
			
 
				+\begin{lstlisting}[style=ocaml]
			
 
				+let is_leaf arith = 
			
 
				+  match arith with
			
 
				+  | Int n -> true
			
 
				+  | Prim(Read,[]) -> true
			
 
				+  | Prim(Neg,[e1]) -> false
			
 
				+  | Prim(Add,[e1;e2]) -> false
			
 
				+  | _ -> assert false
			
 
				+  
			
 
				+is_leaf (Prim(Read,[]))
			
 
				+is_leaf (Prim(Neg,[Int 8]))
			
 
				+is_leaf (Int 8)
			
 
				+\end{lstlisting}
			
 
				+\end{minipage}
			
 
				+\vrule
			
 
				+\begin{minipage}{0.25\textwidth}
			
 
				+  \begin{lstlisting}[style=ocaml]
			
 
				+
			
 
				+
			
 
				+
			
 
				+
			
 
				+    
			
 
				+    
			
 
				+    
			
 
				+   true
			
 
				+   false
			
 
				+   true
			
 
				+\end{lstlisting}
			
 
				+\end{minipage}
			
 
				+\end{center}
			
 
				+
			
 
				+The final clause uses a wildcard pattern {\tt \_}, which matches anything of type \code{exp},
			
 
				+to cover the (ill-formed) cases where a primop is given the wrong number of arguments;
			
 
				+otherwise, the compiler will again issue a warning that not all cases have been considered.
			
 
				+The \code{assert false} causes OCaml execution to halt with an uncaught exception message.
			
 
				+
			
 
				+In this particular case, we can use wildcards to write a more idiomatic version of
			
 
				+\code{is\_leaf} that doesn't require a catch-all case (and is also ``future-proof''
			
 
				+against later additions to the \code{primop} type). We also make use of the following
			
 
				+short-cut: a function that takes an argument $arg$ and then immediately performs
			
 
				+a \code{match} over $arg$ can be written more concisely using the \code{function} keyword.
			
 
				+
			
 
				+\begin{center}
			
 
				+\begin{minipage}{0.5\textwidth}
			
 
				+\begin{lstlisting}[style=ocaml]
			
 
				+let is_leaf = function 
			
 
				+  | Int _ -> true
			
 
				+  | Prim(_,[]) -> true
			
 
				+  | _ -> false
			
 
				+\end{lstlisting}
			
 
				+\end{minipage}
			
 
				+\end{center}
			
 
				+\end{ocamlx}
			
 
				 
			
 
				 \section{Recursive Functions}
			
 
				 \label{sec:recursion}
			
@@ -920,6 +1215,79 @@ For example, the above function is subtly wrong:
 
				 \lstinline{(Rint? (Program '() (Program '() (Int 3))))}
			
 
				 returns true when it should return false.
			
 
				 
			
 
				+\begin{ocamlx}
			
 
				+There is almost no point in writing OCaml analogs to \code{exp?} or \code{Rint?}, because static
			
 
				+  typing guarantees that values claimed to be in type \code{exp} or \code{rint\_program} really are
			
 
				+  (or the OCaml program will not pass the OCaml typechecker).  However, it is still worth
			
 
				+  writing a function to check that primops are applied to the right number of arguments.
			
 
				+  Here is an idiomatic way to do that:
			
 
				+
			
 
				+\begin{center}
			
 
				+\begin{minipage}{0.85\textwidth}
			
 
				+\begin{lstlisting}[style=ocaml]
			
 
				+let arity = function 
			
 
				+  | Read -> 0
			
 
				+  | Neg -> 1
			
 
				+  | Add -> 2    
			
 
				+
			
 
				+let rec check_exp = function 
			
 
				+  | Int _ -> true
			
 
				+  | Prim(op,args) ->
			
 
				+      List.length args = arity op && check_exps args
			
 
				+and check_exps = function
			
 
				+  | [] -> true
			
 
				+  | (exp::exps') -> check_exp exp && check_exps exps'
			
 
				+
			
 
				+let check_program (Program(_,e)) = check_exp e
			
 
				+
			
 
				+check_program (Program((),ast1_1))
			
 
				+check_program (Program((),Prim(Neg,[Prim(Read,[]);
			
 
				+                                    Prim(Add,[Int 8])])))
			
 
				+\end{lstlisting}
			
 
				+\end{minipage}
			
 
				+\vrule
			
 
				+\begin{minipage}{0.1\textwidth}
			
 
				+\begin{lstlisting}[style=ocaml]
			
 
				+
			
 
				+
			
 
				+
			
 
				+
			
 
				+
			
 
				+
			
 
				+  
			
 
				+
			
 
				+
			
 
				+
			
 
				+
			
 
				+
			
 
				+
			
 
				+
			
 
				+
			
 
				+   true
			
 
				+
			
 
				+   false
			
 
				+\end{lstlisting}
			
 
				+\end{minipage}
			
 
				+\end{center}
			
 
				+
			
 
				+In the definition of \code{check\_program}, since the argument type \code{rint\_program}
			
 
				+has only one constructor, we can write a pattern \code{Program(\_,e)} which matches that constructor directly in
			
 
				+place of an argument name; this binds the variable(s) (here \code{e}) of the pattern in the body of the function.
			
 
				+Note that \code{check\_exp} is declared to be recursive by using the \code{rec} keyword;
			
 
				+in fact, \code{check\_exp} and \code{check\_exps} are \emph{mutually} recursive because
			
 
				+their definitions are connected by the \code{and} keyword. \code{List.length} is a library
			
 
				+function that returns the length of a list.  Actually, the library also has a handy higher-order
			
 
				+function \code{List.for\_all} that applies a specified boolean-valued function to a list and returns
			
 
				+whether it is true on all elements.  Using that, we could rewrite the \code{Prim}
			
 
				+clause of \code{check\_exp} as
			
 
				+\begin{lstlisting}[style=ocaml]
			
 
				+  | Prim(op,args) ->
			
 
				+           List.length args = arity op && List.for_all check_exp args
			
 
				+\end{lstlisting}
			
 
				+and dispense with \code{check\_exps} altogether.  Being able to operate on entire lists
			
 
				+uniformly like this is one of the payoffs for using a single generic \code{Prim} constructor.
			
 
				+\end{ocamlx}
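				+\begin{ocamlx}
				+  Putting the pieces together, a complete version of the checker based on
				+  \code{List.for\_all} could be written as in the following sketch, which
				+  repeats the type definitions so that it stands alone. The value names
				+  \code{good} and \code{bad} are introduced here only for illustration.
				+\begin{lstlisting}[style=ocaml]
				+type primop = Read | Neg | Add
				+type exp = Int of int | Prim of primop * exp list
				+type 'info rint_program = Program of 'info * exp
				+
				+let arity = function
				+  | Read -> 0
				+  | Neg -> 1
				+  | Add -> 2
				+
				+(* A node is well formed if its argument count matches the
				+   arity of its primop and all of its arguments are well formed. *)
				+let rec check_exp = function
				+  | Int _ -> true
				+  | Prim(op,args) ->
				+      List.length args = arity op && List.for_all check_exp args
				+
				+let check_program (Program(_,e)) = check_exp e
				+
				+let good = Program((),Prim(Add,[Prim(Read,[]);Prim(Neg,[Int 8])]))
				+let bad = Program((),Prim(Neg,[Prim(Read,[]);Prim(Add,[Int 8])]))
				+\end{lstlisting}
				+  With these definitions, \code{check\_program good} is \code{true} and
				+  \code{check\_program bad} is \code{false}.
				+\end{ocamlx}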
			
 
				+  
			
 
				 
			
 
				 \section{Interpreters}
			
 
				 \label{sec:interp-Rint}
			
@@ -938,7 +1306,7 @@ of structural recursion. The \texttt{interp-Rint} function is defined in
 
				 Figure~\ref{fig:interp-Rint}. The body of the function is a match on the
			
 
				 input program followed by a call to the \lstinline{interp-exp} helper
			
 
				 function, which in turn has one match clause per grammar rule for
			
 
				-\LangInt{} expressions.
			
 
				+\LangInt{} expressions. \ocaml{The OCaml version is in Figure~\ref{fig:ocaml-interp-Rint}.}
			
 
				 
			
 
				 \begin{figure}[tp]
			
 
				 \begin{lstlisting}
			
@@ -965,6 +1333,26 @@ function, which in turn has one match clause per grammar rule for
 
				 \label{fig:interp-Rint}
			
 
				 \end{figure}
			
 
				 
			
 
				+\begin{figure}[tp]
			
 
				+\begin{lstlisting}[style=ocaml]
			
 
				+let rec interp_exp exp =
			
 
				+  match exp with
			
 
				+  | Int n -> n
			
 
				+  | Prim(Read,[]) -> read_int()
			
 
				+  | Prim(Neg,[e]) -> - (interp_exp e)
			
 
				+  | Prim(Add,[e1;e2]) ->  
			
 
				+      (* must explicitly sequence evaluation order! *)
			
 
				+      let v1 = interp_exp e1 in
			
 
				+      let v2 = interp_exp e2 in
			
 
				+      v1 + v2
			
 
				+  | _ -> assert false (* arity mismatch *)
			
 
				+  
			
 
				+let interp_program (Program(_,e)) = interp_exp e
			
 
				+\end{lstlisting}
			
 
				+\caption{\ocaml{OCaml interpreter for the \LangInt{} language.}}
			
 
				+\label{fig:ocaml-interp-Rint}
			
 
				+\end{figure}
			
 
				+
			
 
				 Let us consider the result of interpreting a few \LangInt{} programs. The
			
 
				 following program adds two integers.
			
 
				 \begin{lstlisting}
			
@@ -979,6 +1367,12 @@ abstract syntax is:
 
				 \begin{lstlisting}
			
 
				 (Program '() (Prim '+ (list (Int 10) (Int 32))))
			
 
				 \end{lstlisting}
			
 
				+\begin{ocamlx}
			
 
				+  OCaml:
			
 
				+\begin{lstlisting}[style=ocaml]
			
 
				+Program((),Prim(Add,[Int 10; Int 32]))    
			
 
				+\end{lstlisting}  
			
 
				+\end{ocamlx}
			
 
				 
			
 
				 The next example demonstrates that expressions may be nested within
			
 
				 each other, in this case nesting several additions and negations.
			
@@ -1016,6 +1410,11 @@ it is required to report that an error occurred. To signal an error,
 
				 exit with a return code of \code{255}.  The interpreters in chapters
			
 
				 \ref{ch:Rdyn} and \ref{ch:Rgrad} use
			
 
				 \code{trapped-error}.
			
 
				+\begin{ocamlx}
			
 
				+  In OCaml, overflow does not cause a trap; instead values ``wrap around''
			
 
				+  to produce results modulo $2^{63}$.  The result of this program is
			
 
				+  \key{-1223372036854775816}.
			
 
				+\end{ocamlx}
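				+\begin{ocamlx}
				+  The wrap-around behavior is easy to observe directly. The following
				+  sketch uses only standard library values (\code{max\_int}, \code{min\_int},
				+  and \code{Sys.int\_size}); the names \code{wraps\_around} and
				+  \code{int\_width} are introduced here for illustration.
				+\begin{lstlisting}[style=ocaml]
				+(* Adding 1 to the largest int silently yields the smallest int;
				+   no exception is raised. *)
				+let wraps_around = (max_int + 1 = min_int)
				+
				+(* Width of the int type in bits; the standard library
				+   reports it as Sys.int_size. *)
				+let int_width = Sys.int_size
				+\end{lstlisting}
				+  Here \code{wraps\_around} evaluates to \code{true}, and \code{int\_width}
				+  reports how many bits take part in the arithmetic on the machine at hand.
				+\end{ocamlx}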
			
 
				 
			
 
				 %% This convention applies to the languages defined in this
			
 
				 %% book, as a way to simplify the student's task of implementing them,
			
@@ -1023,7 +1422,8 @@ exit with a return code of \code{255}.  The interpreters in chapters
 
				 %% 
			
 
				 
			
 
				 Moving on to the last feature of the \LangInt{} language, the \key{read}
			
 
				-operation prompts the user of the program for an integer.  Recall that
			
 
				+operation prompts the user of the program for an integer. \ocaml{The \code{read\_int}
			
 
				+  function is in the standard library.} Recall that
			
 
				 program \eqref{eq:arith-prog} performs a \key{read} and then subtracts
			
 
				 \code{8}. So if we run
			
 
				 \begin{lstlisting}
			
@@ -1120,6 +1520,34 @@ arguments are integers and if they are, perform the appropriate
 
				 arithmetic.  Otherwise, they create an AST node for the arithmetic
			
 
				 operation.
			
 
				 
			
 
				+\begin{ocamlx}
			
 
				+  The corresponding OCaml code is in Figure~\ref{fig:ocaml-pe-arith}. In \code{pe\_add}, note
			
 
				+  the syntax for matching over a pair of values simultaneously.
			
 
				+
			
 
				+\begin{figure}[tp]
			
 
				+\begin{lstlisting}[style=ocaml]
			
 
				+let pe_neg = function
			
 
				+    Int n -> Int (-n)
			
 
				+  | e -> Prim(Neg,[e])
			
 
				+
			
 
				+let pe_add e1 e2 = 
			
 
				+  match e1,e2 with
			
 
				+    Int n1,Int n2 -> Int (n1+n2)
			
 
				+  | e1,e2 -> Prim(Add,[e1;e2])
			
 
				+
			
 
				+let rec pe_exp = function
			
 
				+    Prim(Neg,[e]) -> pe_neg (pe_exp e)
			
 
				+  | Prim(Add,[e1;e2]) -> pe_add (pe_exp e1) (pe_exp e2)
			
 
				+  | e -> e
			
 
				+
			
 
				+let pe_program (Program(info,e)) = Program(info,pe_exp e)
			
 
				+\end{lstlisting}
			
 
				+\caption{\ocaml{An OCaml partial evaluator for \LangInt{}}.}
			
 
				+\label{fig:ocaml-pe-arith}
			
 
				+\end{figure}
			
 
				+
			
 
				+\end{ocamlx}
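				+\begin{ocamlx}
				+  As a usage sketch, the code below applies the partial evaluator to the
				+  abstract syntax of \code{(+ (read) (- (+ 5 3)))}. The definitions repeat
				+  those of Figure~\ref{fig:ocaml-pe-arith} so that the example stands alone;
				+  the names \code{input} and \code{output} are introduced only for this example.
				+\begin{lstlisting}[style=ocaml]
				+type primop = Read | Neg | Add
				+type exp = Int of int | Prim of primop * exp list
				+type 'info rint_program = Program of 'info * exp
				+
				+let pe_neg = function
				+    Int n -> Int (-n)
				+  | e -> Prim(Neg,[e])
				+
				+let pe_add e1 e2 =
				+  match e1,e2 with
				+    Int n1,Int n2 -> Int (n1+n2)
				+  | e1,e2 -> Prim(Add,[e1;e2])
				+
				+let rec pe_exp = function
				+    Prim(Neg,[e]) -> pe_neg (pe_exp e)
				+  | Prim(Add,[e1;e2]) -> pe_add (pe_exp e1) (pe_exp e2)
				+  | e -> e
				+
				+let pe_program (Program(info,e)) = Program(info,pe_exp e)
				+
				+(* (+ (read) (- (+ 5 3))) *)
				+let input =
				+  Program((),Prim(Add,[Prim(Read,[]);
				+                       Prim(Neg,[Prim(Add,[Int 5; Int 3])])]))
				+
				+(* The constant subtree is folded; the read is left in place. *)
				+let output = pe_program input
				+\end{lstlisting}
				+  The value of \code{output} is
				+  \code{Program((),Prim(Add,[Prim(Read,[]);Int (-8)]))}: the inner addition
				+  and negation are computed at compile time, while the outer addition must
				+  wait for the input.
				+\end{ocamlx}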
			
 
				+
			
 
				 To gain some confidence that the partial evaluator is correct, we can
			
 
				 test whether it produces programs that get the same result as the
			
 
				 input programs. That is, we can test whether it satisfies Diagram
			
@@ -1139,6 +1567,57 @@ Appendix~\ref{appendix:utilities}.\\
 
				 \end{lstlisting}
			
 
				 \end{minipage}
			
 
				 
			
 
				+\begin{ocamlx}
			
 
				+  We can perform a similar kind of test in OCaml using a utility
			
 
				+  function called \code{interp\_from\_string} which is in the support
			
 
				+  code for this chapter (not yet in the Appendix).
			
 
				+
			
 
				+  Note, however, that comparing
			
 
				+  results like this isn't a very satisfactory way of testing programs
			
 
				+  that use \code{Read} anyhow, because it requires us to input the
			
 
				+  same values twice, once for each execution, or the test will fail!
			
 
				+  A more straightforward approach is to know what result value we
			
 
				+  expect from each test program on a given set of input, and simply check
			
 
				+  that the partially evaluated program still produces that result.
			
 
				+  The support code also contains a simple driver that implements this approach.
			
 
				+\end{ocamlx}
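\begin{ocamlx}
  The expected-result approach can be sketched as follows. This is not
  the driver from the support code: the names
  \code{interp\_with\_inputs} and \code{check} are hypothetical, the AST
  types are assumed for the example, and inputs are supplied as a list
  rather than read from the terminal.

\begin{lstlisting}[style=ocaml]
(* Assumed AST shape, for illustration only. *)
type primop = Read | Neg | Add
type exp = Int of int | Prim of primop * exp list

(* Hypothetical interpreter: each Read consumes the next scripted input. *)
let interp_with_inputs (inputs : int list) (e : exp) : int =
  let remaining = ref inputs in
  let rec go = function
      Int n -> n
    | Prim(Read,[]) ->
        (match !remaining with
           n :: rest -> remaining := rest; n
         | [] -> failwith "ran out of input")
    | Prim(Neg,[e]) -> - (go e)
    | Prim(Add,[e1;e2]) ->
        let v1 = go e1 in   (* evaluate operands left to right *)
        let v2 = go e2 in
        v1 + v2
    | _ -> failwith "malformed expression"
  in
  go e

(* A test passes if the transformed program yields the expected value. *)
let check (transform : exp -> exp) (inputs : int list)
          (expected : int) (e : exp) : bool =
  interp_with_inputs inputs (transform e) = expected

(* (+ (read) (- (+ 5 3))) on input 50 should produce 42. *)
let prog = Prim(Add,[Prim(Read,[]); Prim(Neg,[Prim(Add,[Int 5; Int 3])])])
let () = assert (check (fun e -> e) [50] 42 prog)
\end{lstlisting}

  Passing \code{pe\_exp} in place of the identity function checks the
  partially evaluated program against the same expected value, and each
  test needs its inputs only once.
\end{ocamlx}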
			
 
				+
			
 
				+\begin{ocamlx}
			
 
				+{\bf Warmup Exercises}
			
 
				+
			
 
				+1. Extend the concrete language and implementation for \LangInt{} with an additional arity-2 primop that
			
 
				+performs subtraction.  The concrete form for this is \code{(- $e_1$ $e_2$)} where
			
 
				+$e_1$ and $e_2$ are expressions. Note that there are several ways to do this: you can add
			
 
				+an additional primop \code{Sub} to the AST, and add new code to check and interpret it,
			
 
				+or you can choose to ``de-sugar'' the new form into a combination of existing primops when
			
 
				+converting S-expressions to ASTs. Either way, make sure that you understand why the concrete
			
 
				+language remains unambiguous even though (a) we already have a unary negation operator that is also written
			
 
				+with \code{-}, and (b) unlike addition, subtraction is not an associative operator, i.e.
			
 
				+$((a-b)-c)$ is not generally the same thing as $(a-(b-c))$.
			
 
				+
			
 
				+2. Make some non-trivial improvement to the partial evaluator. This task is intentionally open-ended, but here
			
 
				+are some suggestions, in increasing order of difficulty.
			
 
				+\begin{itemize}
			
 
				+\item 
			
 
				+If you added a new primop for subtraction in part 1, add support for
			
 
				+partially evaluating subtractions involving constants, analogous to what is already there
			
 
				+for addition.
			
 
				+\item
			
 
				+  Add support for simplifying expressions
			
 
				+  based on simple algebraic identities, e.g. $x + 0 = x$ for all $x$.
			
 
				+\item Try to simplify expressions to
			
 
				+  the point where they contain no more than one \code{Int} leaf expression (the remaining leaves should all be
			
 
				+\code{Read}s). 
			
 
				+\end{itemize}
			
 
				+
			
 
				+3. Change the AST, interpreter and (improved) partial evaluator for \LangInt{} so that they
			
 
				+use true 64-bit integers throughout.
			
 
				+(Currently, these are used in S-expressions in the front end, but everything else uses 63-bit integers instead.)
			
 
				+This will bring our interpreter and partial evaluator in line with X86-64 machine code, our ultimate
			
 
				+compilation target.
			
 
				+The point of this exercise is to get you familiar with exploring an OCaml library, in this case \code{Int64},
			
 
				+which is documented at \url{https://ocaml.org/releases/4.12/api/Int64.html}.
			
 
				+\end{ocamlx}
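\begin{ocamlx}
  To make the 63-bit versus 64-bit distinction concrete, the following
  small sketch prints the relevant limits. It assumes a 64-bit platform
  and is meant only to show the difference in range, not to suggest how
  the exercise should be solved.

\begin{lstlisting}[style=ocaml]
(* Assumes a 64-bit platform, where OCaml's native int is 63 bits wide. *)
let () =
  Printf.printf "native int bits: %d\n" Sys.int_size;
  Printf.printf "max_int         = %d\n" max_int;
  Printf.printf "Int64.max_int   = %Ld\n" Int64.max_int;
  (* Int64 arithmetic wraps around modulo 2^64. *)
  assert (Int64.add Int64.max_int 1L = Int64.min_int)
\end{lstlisting}

  Note that \code{Int64} literals carry an \code{L} suffix and that the
  ordinary operators such as \code{+} do not apply to \code{Int64}
  values.
\end{ocamlx}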
			
 
				 
			
 
				 
			
 
				 %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
			
@@ -1655,7 +2134,7 @@ conclusion:
 
				 
			
 
				 \begin{figure}[tbp]
			
 
				 \centering
			
 
				-\begin{tabular}{|r|l|} \hline
			
 
				+\begin{tabular}{|r|l|} \hline 
			
 
				 Position & Contents \\ \hline
			
 
				 8(\key{\%rbp}) & return address \\
			
 
				 0(\key{\%rbp}) & old \key{rbp} \\
			
@@ -3441,7 +3920,7 @@ saturation of a vertex, in Sudoku terms, is the set of numbers that
 
				 are no longer available. In graph terminology, we have the following
			
 
				 definition:
			
 
				 \begin{equation*}
			
 
				-  \mathrm{saturation}(u) = \{ c \;|\; \exists v. v \in \mathrm{neighbors}(u)
			
 
				+  \mathrm{saturation}(u) = \{ c \mid \exists v. v \in \mathrm{neighbors}(u)
			
 
				      \text{ and } \mathrm{color}(v) = c \}
			
 
				 \end{equation*}
			
 
				 where $\mathrm{neighbors}(u)$ is the set of vertices that share an
			
@@ -4501,7 +4980,7 @@ because Typed Racket expects the type of the argument to be of the
 
				 form \code{(Listof T)} or \code{(Pairof T1 T2)}.
			
 
				 
			
 
				 The \LangIf{} language performs type checking during compilation like
			
 
				-Typed Racket. In Chapter~\ref{ch:type-dynamic} we study the
			
 
				+Typed Racket. In Chapter~\ref{ch:Rdyn} we study the
			
 
				 alternative choice, that is, a dynamically typed language like Racket.
			
 
				 The \LangIf{} language is a subset of Typed Racket; for some
			
 
				 operations we are more restrictive, for example, rejecting
			
@@ -5840,7 +6319,7 @@ Add the following entry to the list of \code{passes} in
 
				 \end{lstlisting}
			
 
				 \end{exercise}
			
 
				 
			
 
				-\begin{figure}[tbp]
			
 
				+\begin{figure}[tbp]
			
 
				 \begin{tikzpicture}[baseline=(current  bounding  box.center)]
			
 
				 \node (Rif) at (0,2)  {\large \LangIf{}};
			
 
				 \node (Rif-2) at (3,2)  {\large \LangIf{}};