4 years ago · ccf51a1bbf
--- a/book.tex
+++ b/book.tex
@@ -1178,6 +1178,7 @@ $52$ then $10$, the following produces $42$ (not $-42$).
 
				 \end{lstlisting}
			
 
				 
			
 
				 \subsection{Extensible Interpreters via Method Overriding}
			
 
				+\label{sec:extensible-interp}
			
 
				 
			
 
				 To prepare for discussing the interpreter for \LangVar{}, we need to
			
 
				 explain why we choose to implement the interpreter using
			
@@ -4376,19 +4377,16 @@ separately because of its short-circuiting behavior.
 
				       (match e
			
 
				         [(Bool b) b]
			
 
				         [(If cnd thn els)
			
 
				-         (define b (recur cnd))
			
 
				-         (match b
			
 
				+         (match (recur cnd)
			
 
				            [#t (recur thn)]
			
 
				            [#f (recur els)])]
			
 
				         [(Prim 'and (list e1 e2))
			
 
				-         (define v1 (recur e1))
			
 
				-         (match v1
			
 
				+         (match (recur e1)
			
 
				            [#t (match (recur e2) [#t #t] [#f #f])]
			
 
				            [#f #f])]
			
 
				         [(Prim op args)
			
 
				          (apply (interp-op op) (for/list ([e args]) (recur e)))]
			
 
				-        [else ((super interp-exp env) e)]
			
 
				-        ))
			
 
				+        [else ((super interp-exp env) e)]))
			
 
				     ))
			
 
				 
			
 
				 (define (interp-Rif p)
			
@@ -4427,8 +4425,7 @@ separately because of its short-circuiting behavior.
 
				     ['>= (lambda (v1 v2)
			
 
				            (cond [(and (fixnum? v1) (fixnum? v2))
			
 
				                   (>= v1 v2)]))]
			
 
				-    [else (error 'interp-op "unknown operator")]
			
 
				-    ))
			
 
				+    [else (error 'interp-op "unknown operator")]))
			
 
				 \end{lstlisting}
			
 
				 \caption{Interpreter for the primitive operators in the \LangIf{} language.}
			
 
				 \label{fig:interp-op-Rif}
			
@@ -4455,39 +4452,56 @@ produces a \key{Boolean}.
 
				 
			
 
				 Another way to think about type checking is that it enforces a set of
			
 
				 rules about which operators can be applied to which kinds of
			
 
				-values. For example, our type checker for \LangIf{} will signal an error
			
 
				-for the below expression because, as we have seen above, the
			
 
				-expression \code{(+ 10 ...)} has type \key{Integer} but the type
			
 
				-checker enforces the rule that the argument of \code{not} must be a
			
 
				-\key{Boolean}.
			
 
				+values. For example, our type checker for \LangIf{} signals an error
			
 
				+for the expression below
			
 
				 \begin{lstlisting}
			
 
				    (not (+ 10 (- (+ 12 20))))
			
 
				 \end{lstlisting}
			
 
				-
			
 
				-We implement type checking using classes and method overriding for the
			
 
				-same reason that we use them to implement the interpreters. We
			
 
				-separate the type checker for the \LangVar{} fragment into its own class,
			
 
				-shown in Figure~\ref{fig:type-check-Rvar}. The type checker for \LangIf{} is
			
 
				-shown in Figure~\ref{fig:type-check-Rif}; inherits from the one for
			
 
				-\LangVar{}. The code for these type checkers are in the files
			
 
				-\code{type-check-Rvar.rkt} and \code{type-check-Rif.rkt} of the support
			
 
				-code.
			
 
				+The subexpression \code{(+ 10 (- (+ 12 20)))} has type \key{Integer}
			
 
				+but the type checker enforces the rule that the argument of \code{not}
			
 
				+must be a \key{Boolean}.
			
 
				+
			
 
				+We implement type checking using classes and methods because they
			
 
				+provide the open recursion needed to reuse code as we extend the type
			
 
				+checker in later chapters, analogous to the use of classes and methods
			
 
				+for the interpreters (Section~\ref{sec:extensible-interp}).
			
 
				+
			
 
				+We separate the type checker for the \LangVar{} fragment into its own
			
 
				+class, shown in Figure~\ref{fig:type-check-Rvar}. The type checker for
			
 
				+\LangIf{} is shown in Figure~\ref{fig:type-check-Rif} and it inherits
			
 
				+from the type checker for \LangVar{}. These type checkers are in the
			
 
				+files \code{type-check-Rvar.rkt} and \code{type-check-Rif.rkt} of the
			
 
				+support code.
			
 
				 %
			
 
				 Each type checker is a structurally recursive function over the AST.
			
 
				 Given an input expression \code{e}, the type checker either signals an
			
 
				 error or returns an expression and its type (\key{Integer} or
			
 
				-\key{Boolean}). There are situations in which we want to change or
			
 
				-update the expression.
			
 
				-%
			
 
				-The type of an integer literal is \code{Integer} and
			
 
				-the type of a Boolean literal is \code{Boolean}.  To handle variables,
			
 
				-the type checker uses the environment \code{env} to map variables to
			
 
				-types. Consider the clause for \key{let}.  We type check the
			
 
				-initializing expression to obtain its type \key{T} and then associate
			
 
				-type \code{T} with the variable \code{x} in the environment used to
			
 
				-type check the body of the \key{let}. Thus, when the type checker
			
 
				-encounters a use of variable \code{x}, it can find its type in the
			
 
				-environment.
			
 
				+\key{Boolean}). It returns an expression because there are situations
			
 
				+in which we want to change or update the expression.
			
 
				+
			
 
				+Next we discuss the \code{match} clauses in \code{type-check-exp} of
			
 
				+Figure~\ref{fig:type-check-Rvar}.  The type of an integer constant is
			
 
				+\code{Integer}.  To handle variables, the type checker uses the
			
 
				+environment \code{env} to map variables to types. Consider the clause
			
 
				+for \key{let}.  We type check the initializing expression to obtain
			
 
				+its type \key{T} and then associate type \code{T} with the variable
			
 
				+\code{x} in the environment used to type check the body of the
			
 
				+\key{let}. Thus, when the type checker encounters a use of variable
			
 
				+\code{x}, it can find its type in the environment.  Regarding
			
 
				+primitive operators, we recursively analyze the arguments and then
			
 
				+invoke \code{type-check-op} to check whether the argument types are
			
 
				+allowed.
			
 
				+
			
 
				+Several auxiliary methods are used in the type checker. The method
			
 
				+\code{operator-types} defines a dictionary that maps the operator
			
 
				+names to their parameter and return types. The \code{type-equal?}
			
 
				+method determines whether two types are equal, which for now simply
			
 
				+dispatches to \code{equal?}  (deep equality). The
			
 
				+\code{check-type-equal?} method triggers an error if the two types are
			
 
				+not equal. The \code{type-check-op} method looks up the operator in
			
 
				+the \code{operator-types} dictionary and then checks whether the
			
 
				+argument types are equal to the parameter types.  The result is the
			
 
				+return type of the operator.
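				+
				+The lookup-and-compare step can be sketched as follows. This is a
				+minimal standalone version, not the method from the support code: it
				+assumes the dictionary is an association list and that
				+\code{type-check-op} takes only the operator and the argument types.
				+\begin{lstlisting}
				+#lang racket
				+;; Assumed table: operator -> (parameter types . return type).
				+(define operator-types
				+  '((+ . ((Integer Integer) . Integer))
				+    (- . ((Integer) . Integer))
				+    (read . (() . Integer))
				+    (not . ((Boolean) . Boolean))))
				+
				+;; Look up op and compare the argument types to its parameter types.
				+;; Returns the return type or signals an error on a mismatch.
				+(define (type-check-op op arg-types)
				+  (match (dict-ref operator-types op)
				+    [(cons param-types return-type)
				+     (unless (equal? arg-types param-types)
				+       (error 'type-check-op "~a expects ~a but was given ~a"
				+              op param-types arg-types))
				+     return-type]))
				+
				+(type-check-op '+ '(Integer Integer))  ;; returns 'Integer
				+(type-check-op 'not '(Integer))        ;; signals an error
				+\end{lstlisting}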
			
 
				 
			
 
				 \begin{figure}[tbp]
			
 
				 \begin{lstlisting}[basicstyle=\ttfamily\footnotesize]
			
@@ -4517,8 +4531,8 @@ environment.
 
				     (define/public (type-check-exp env)
			
 
				       (lambda (e)
			
 
				         (match e
			
 
				-          [(Var x)  (values (Var x) (dict-ref env x))]
			
 
				           [(Int n)  (values (Int n) 'Integer)]
			
 
				+          [(Var x)  (values (Var x) (dict-ref env x))]
			
 
				           [(Let x e body)
			
 
				            (define-values (e^ Te) ((type-check-exp env) e))
			
 
				            (define-values (b Tb) ((type-check-exp (dict-set env x Te)) body))
			
@@ -4541,7 +4555,7 @@ environment.
 
				 (define (type-check-Rvar p)
			
 
				   (send (new type-check-Rvar-class) type-check-program p))
			
 
				 \end{lstlisting}
			
 
				-\caption{Type checker for the \LangVar{} fragment of \LangIf{}.}
			
 
				+\caption{Type checker for the \LangVar{} language.}
			
 
				 \label{fig:type-check-Rvar}
			
 
				 \end{figure}
			
 
				 
			
@@ -4567,6 +4581,11 @@ environment.
 
				     (define/override (type-check-exp env)
			
 
				       (lambda (e)
			
 
				         (match e
			
 
				+          [(Prim 'eq? (list e1 e2))
			
 
				+           (define-values (e1^ T1) ((type-check-exp env) e1))
			
 
				+           (define-values (e2^ T2) ((type-check-exp env) e2))
			
 
				+           (check-type-equal? T1 T2 e)
			
 
				+           (values (Prim 'eq? (list e1^ e2^)) 'Boolean)]
			
 
				           [(Bool b) (values (Bool b) 'Boolean)]
			
 
				           [(If cnd thn els)
			
 
				            (define-values (cnd^ Tc) ((type-check-exp env) cnd))
			
@@ -4575,11 +4594,6 @@ environment.
 
				            (check-type-equal? Tc 'Boolean e)
			
 
				            (check-type-equal? Tt Te e)
			
 
				            (values (If cnd^ thn^ els^) Te)]
			
 
				-          [(Prim 'eq? (list e1 e2))
			
 
				-           (define-values (e1^ T1) ((type-check-exp env) e1))
			
 
				-           (define-values (e2^ T2) ((type-check-exp env) e2))
			
 
				-           (check-type-equal? T1 T2 e)
			
 
				-           (values (Prim 'eq? (list e1^ e2^)) 'Boolean)]
			
 
				           [else ((super type-check-exp env) e)])))
			
 
				     ))
			
 
				 
			
@@ -4590,46 +4604,93 @@ environment.
 
				 \label{fig:type-check-Rif}
			
 
				 \end{figure}
			
 
				 
			
 
				-Three auxiliary methods are used in the type checker. The method
			
 
				-\code{operator-types} defines a dictionary that maps the operator
			
 
				-names to their parameter and return types. The \code{type-equal?}
			
 
				-method determines whether two types are equal, which for now simply
			
 
				-dispatches to \code{equal?}  (deep equality). The \code{type-check-op}
			
 
				-method looks up the operator in the \code{operator-types} dictionary
			
 
				-and then checks whether the argument types are equal to the parameter
			
 
				-types.  The result is the return type of the operator.
			
 
				+Next we discuss the type checker for \LangIf{} in
			
 
				+Figure~\ref{fig:type-check-Rif}.  The operator \code{eq?} requires the
			
 
				+two arguments to have the same type. The type of a Boolean constant is
			
 
				+\code{Boolean}. The condition of an \code{if} must be of
			
 
				+\code{Boolean} type and the two branches must have the same type.  The
			
 
				+\code{operator-types} function adds dictionary entries for the other
			
 
				+new operators.
			
 
				 
			
 
				 \begin{exercise}\normalfont
			
 
				-Create 10 new example programs in \LangIf{}. Half of the example programs
			
 
				-should have a type error. For those programs, to signal that a type
			
 
				-error is expected, create an empty file with the same base name but
			
 
				-with file extension \code{.tyerr}. For example, if the test
			
 
				-\code{r2\_14.rkt} is expected to error, then create an empty file
			
 
				-named \code{r2\_14.tyerr}.  The other half of the example programs
			
 
				-should not have type errors. Note that if the type checker does not
			
 
				-signal an error for a program, then interpreting that program should
			
 
				-not encounter an error.
			
 
				+Create 10 new test programs in \LangIf{}. Half of the programs should
			
 
				+have a type error. For those programs, create an empty file with the
			
 
				+same base name but with file extension \code{.tyerr}. For example, if
			
 
				+the test \code{cond\_test\_14.rkt} is expected to error, then create
			
 
				+an empty file named \code{cond\_test\_14.tyerr}.  This indicates to
			
 
				+\code{interp-tests} and \code{compiler-tests} that a type error is
			
 
				+expected. The other half of the test programs should not have type
			
 
				+errors.
			
 
				+
			
 
				+In the \code{run-tests.rkt} script, change the second argument of
			
 
				+\code{interp-tests} and \code{compiler-tests} to
			
 
				+\code{type-check-Rif}, which causes the type checker to run prior to
			
 
				+the compiler passes. Temporarily change the \code{passes} to an empty
			
 
				+list and run the script, thereby checking that the new test programs
			
 
				+either type check or not as intended.
			
 
				 \end{exercise}
			
 
				 
			
 
				 
			
 
				+\section{The \LangCIf{} Intermediate Language}
			
 
				+\label{sec:Cif}
			
 
				+
			
 
				+Figure~\ref{fig:c1-syntax} defines the abstract syntax of the
			
 
				+\LangCIf{} intermediate language. (The concrete syntax is in the
			
 
				+Appendix, Figure~\ref{fig:c1-concrete-syntax}.)  Compared to
			
 
				+\LangCVar{}, the \LangCIf{} language adds logical and comparison
			
 
				+operators to the \Exp{} non-terminal and the literals \key{\#t} and
			
 
				+\key{\#f} to the \Atm{} non-terminal.
			
 
				+
			
 
				+Regarding control flow, \LangCIf{} adds \key{goto} and \code{if}
			
 
				+statements to the \Tail{} non-terminal. The condition of an \code{if}
			
 
				+statement is a comparison operation and the branches are \code{goto}
			
 
				+statements, making it straightforward to compile \code{if} statements
			
 
				+to x86.
			
 
				+
			
 
				+
			
 
				+\begin{figure}[tp]
			
 
				+\fbox{
			
 
				+\begin{minipage}{0.96\textwidth}
			
 
				+\small    
			
 
				+\[
			
 
				+\begin{array}{lcl}
			
 
				+\Atm &::=& \gray{\INT{\Int} \mid \VAR{\Var}} \mid \BOOL{\itm{bool}} \\
			
 
				+\itm{cmp} &::= & \key{eq?} \mid \key{<}  \\
			
 
				+\Exp &::= & \gray{ \Atm \mid \READ{} }\\
			
 
				+     &\mid& \gray{ \NEG{\Atm} \mid \ADD{\Atm}{\Atm} } \\
			
 
				+     &\mid& \UNIOP{\key{'not}}{\Atm} 
			
 
				+     \mid \BINOP{\key{'}\itm{cmp}}{\Atm}{\Atm} \\
			
 
				+\Stmt &::=& \gray{ \ASSIGN{\VAR{\Var}}{\Exp} } \\
			
 
				+\Tail &::= & \gray{\RETURN{\Exp} \mid \SEQ{\Stmt}{\Tail} } 
			
 
				+    \mid \GOTO{\itm{label}} \\
			
 
				+    &\mid& \IFSTMT{\BINOP{\itm{cmp}}{\Atm}{\Atm}}{\GOTO{\itm{label}}}{\GOTO{\itm{label}}} \\
			
 
				+\LangCIf{} & ::= & \gray{\CPROGRAM{\itm{info}}{\LP\LP\itm{label}\,\key{.}\,\Tail\RP\ldots\RP}}
			
 
				+\end{array}
			
 
				+\]
			
 
				+\end{minipage}
			
 
				+}
			
 
				+\caption{The abstract syntax of \LangCIf{}, an extension of \LangCVar{}
			
 
				+  (Figure~\ref{fig:c0-syntax}).}
			
 
				+\label{fig:c1-syntax}
			
 
				+\end{figure}
			
 
				+
			
 
				 \section{The \LangXASTIf{} Language}
			
 
				 \label{sec:x86-if}
			
 
				 
			
 
				-\index{x86}
			
 
				-To implement the new logical operations, the comparison operations,
			
 
				-and the \key{if} expression, we need to delve further into the x86
			
 
				-language. Figures~\ref{fig:x86-1-concrete} and \ref{fig:x86-1} define
			
 
				-the concrete and abstract syntax for a larger subset of x86 that
			
 
				-includes instructions for logical operations, comparisons, and
			
 
				-conditional jumps.
			
 
				-
			
 
				-One small challenge is that x86 does not provide an instruction that
			
 
				-directly implements logical negation (\code{not} in \LangIf{} and \LangCIf{}).
			
 
				-However, the \code{xorq} instruction can be used to encode \code{not}.
			
 
				-The \key{xorq} instruction takes two arguments, performs a pairwise
			
 
				-exclusive-or ($\mathrm{XOR}$) operation on each bit of its arguments,
			
 
				-and writes the results into its second argument.  Recall the truth
			
 
				-table for exclusive-or:
			
 
				+\index{x86} To implement the new logical operations, the comparison
			
 
				+operations, and the \key{if} expression, we need to delve further into
			
 
				+the x86 language. Figures~\ref{fig:x86-1-concrete} and \ref{fig:x86-1}
			
 
				+define the concrete and abstract syntax for the \LangXASTIf{} subset
			
 
				+of x86, which includes instructions for logical operations,
			
 
				+comparisons, and conditional jumps.
			
 
				+
			
 
				+One challenge is that x86 does not provide an instruction that
			
 
				+directly implements logical negation (\code{not} in \LangIf{} and
			
 
				+\LangCIf{}).  However, the \code{xorq} instruction can be used to
			
 
				+encode \code{not}.  The \key{xorq} instruction takes two arguments,
			
 
				+performs a pairwise exclusive-or ($\mathrm{XOR}$) operation on each
			
 
				+bit of its arguments, and writes the results into its second argument.
			
 
				+Recall the truth table for exclusive-or:
			
 
				 \begin{center}
			
 
				 \begin{tabular}{l|cc}
			
 
				    & 0 & 1 \\ \hline
			
@@ -4694,16 +4755,16 @@ the first argument:
 
				 \Arg &::=&  \gray{\IMM{\Int} \mid \REG{\Reg} \mid \DEREF{\Reg}{\Int}} 
			
 
				      \mid \BYTEREG{\itm{bytereg}} \\
			
 
				 \itm{cc} & ::= & \key{e} \mid \key{l} \mid \key{le} \mid \key{g} \mid \key{ge} \\
			
 
				-\Instr &::=& \gray{ \BININSTR{\code{'addq}}{\Arg}{\Arg} 
			
 
				-       \mid \BININSTR{\code{'subq}}{\Arg}{\Arg} } \\
			
 
				+\Instr &::=& \gray{ \BININSTR{\code{addq}}{\Arg}{\Arg} 
			
 
				+       \mid \BININSTR{\code{subq}}{\Arg}{\Arg} } \\
			
 
				        &\mid& \gray{ \BININSTR{\code{'movq}}{\Arg}{\Arg} 
			
 
				-       \mid \UNIINSTR{\code{'negq}}{\Arg} } \\
			
 
				+       \mid \UNIINSTR{\code{negq}}{\Arg} } \\
			
 
				        &\mid& \gray{ \CALLQ{\itm{label}}{\itm{int}} \mid \RETQ{} 
			
 
				        \mid \PUSHQ{\Arg} \mid \POPQ{\Arg} \mid \JMP{\itm{label}} } \\
			
 
				-       &\mid& \BININSTR{\code{'xorq}}{\Arg}{\Arg}
			
 
				-       \mid \BININSTR{\code{'cmpq}}{\Arg}{\Arg}\\
			
 
				-       &\mid& \BININSTR{\code{'set}}{\itm{cc}}{\Arg} 
			
 
				-       \mid \BININSTR{\code{'movzbq}}{\Arg}{\Arg}\\
			
 
				+       &\mid& \BININSTR{\code{xorq}}{\Arg}{\Arg}
			
 
				+       \mid \BININSTR{\code{cmpq}}{\Arg}{\Arg}\\
			
 
				+       &\mid& \BININSTR{\code{set}}{\itm{cc}}{\Arg} 
			
 
				+       \mid \BININSTR{\code{movzbq}}{\Arg}{\Arg}\\
			
 
				        &\mid&  \JMPIF{\itm{cc}}{\itm{label}} \\
			
 
				 \Block &::= & \gray{\BLOCK{\itm{info}}{\LP\Instr\ldots\RP}} \\
			
 
				 \LangXASTIf{} &::= & \gray{\XPROGRAM{\itm{info}}{\LP\LP\itm{label} \,\key{.}\, \Block \RP\ldots\RP}}
			
@@ -4724,91 +4785,45 @@ placed. The argument order is backwards: if you want to test whether
 
				 $x < y$, then write \code{cmpq} $y$\code{,} $x$. The result of
			
 
				 \key{cmpq} is placed in the special EFLAGS register. This register
			
 
				 cannot be accessed directly but it can be queried by a number of
			
 
				-instructions, including the \key{set} instruction. The \key{set}
			
 
				-instruction puts a \key{1} or \key{0} into its destination depending
			
 
				-on whether the comparison came out according to the condition code
			
 
				-\itm{cc} (\key{e} for equal, \key{l} for less, \key{le} for
			
 
				-less-or-equal, \key{g} for greater, \key{ge} for greater-or-equal).
			
 
				-The \key{set} instruction has an annoying quirk in that its
			
 
				-destination argument must be single byte register, such as \code{al}
			
 
				-(L for lower bits) or \code{ah} (H for higher bits), which are part of
			
 
				-the \code{rax} register.  Thankfully, the \key{movzbq} instruction can
			
 
				-then be used to move from a single byte register to a normal 64-bit
			
 
				-register.
			
 
				-
			
 
				-The x86 instruction for conditional jump are relevant to the
			
 
				-compilation of \key{if} expressions.  The \key{JmpIf} instruction
			
 
				-updates the program counter to point to the instruction after the
			
 
				-indicated label depending on whether the result in the EFLAGS register
			
 
				-matches the condition code \itm{cc}, otherwise the \key{JmpIf}
			
 
				-instruction falls through to the next instruction.  The abstract
			
 
				-syntax for \key{JmpIf} differs from the concrete syntax for x86 in
			
 
				-that it separates the instruction name from the condition code. For
			
 
				+instructions, including the \key{set} instruction. The instruction
			
 
				+$\key{set}\itm{cc}~d$ puts a \key{1} or \key{0} into the destination $d$
			
 
				+depending on whether the comparison comes out according to the
			
 
				+condition code \itm{cc} (\key{e} for equal, \key{l} for less, \key{le}
			
 
				+for less-or-equal, \key{g} for greater, \key{ge} for
			
 
				+greater-or-equal).  The \key{set} instruction has an annoying quirk in
			
 
				+that its destination argument must be a single byte register, such as
			
 
				+\code{al} (L for lower bits) or \code{ah} (H for higher bits), which
			
 
				+are part of the \code{rax} register.  Thankfully, the \key{movzbq}
			
 
				+instruction can be used to move from a single byte register to a
			
 
				+normal 64-bit register.  The abstract syntax for the \code{set}
			
 
				+instruction differs from the concrete syntax in that it separates the
			
 
				+instruction name from the condition code.
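				+
				+To illustrate how \key{cmpq}, \key{set}, and \key{movzbq} fit
				+together, the following sketch builds the abstract syntax for storing
				+the result of a less-than comparison in a destination.  The struct
				+definitions are simplified stand-ins for those in the support code,
				+and the function \code{select-less-than} is a hypothetical helper
				+introduced only for this example.
				+\begin{lstlisting}
				+#lang racket
				+;; Simplified stand-ins for the x86 AST structs (assumptions).
				+(struct Imm (value) #:transparent)
				+(struct Var (name) #:transparent)
				+(struct ByteReg (name) #:transparent)
				+(struct Instr (name args) #:transparent)
				+
				+;; Instructions that compute (< a1 a2) and store 1 or 0 in dest.
				+(define (select-less-than dest a1 a2)
				+  (list
				+   ;; The operands are swapped: cmpq a2, a1 tests a1 < a2.
				+   (Instr 'cmpq (list a2 a1))
				+   ;; Put 1 in al if the comparison came out "less", otherwise 0.
				+   (Instr 'set (list 'l (ByteReg 'al)))
				+   ;; Move the single byte into the 64-bit destination.
				+   (Instr 'movzbq (list (ByteReg 'al) dest))))
				+
				+(select-less-than (Var 'tmp) (Var 'x) (Imm 1))
				+\end{lstlisting}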
			
 
				+
			
 
				+The x86 instruction for conditional jump is relevant to the
			
 
				+compilation of \key{if} expressions.  The instruction
			
 
				+$\key{j}\itm{cc}~\itm{label}$ updates the program counter to point to
			
 
				+the instruction after \itm{label} depending on whether the result in
			
 
				+the EFLAGS register matches the condition code \itm{cc}, otherwise the
			
 
				+jump instruction falls through to the next instruction.  Like the
			
 
				+abstract syntax for \code{set}, the abstract syntax for conditional
			
 
				+jump separates the instruction name from the condition code. For
			
 
				 example, \code{(JmpIf le foo)} corresponds to \code{jle foo}.  Because
			
 
				-the \key{JmpIf} instruction relies on the EFLAGS register, it is
			
 
				-common for the \key{JmpIf} to be immediately preceded by a \key{cmpq}
			
 
				-instruction to set the EFLAGS register.
			
 
				+the conditional jump instruction relies on the EFLAGS register, it is
			
 
				+common for it to be immediately preceded by a \key{cmpq} instruction
			
 
				+to set the EFLAGS register.
			
 
				 
			
 
				 
			
 
				-\section{The \LangCIf{} Intermediate Language}
			
 
				-\label{sec:Cif}
			
 
				-
			
 
				-As with \LangVar{}, we compile \LangIf{} to a C-like intermediate language, but
			
 
				-we need to grow that intermediate language to handle the new features
			
 
				-in \LangIf{}: Booleans and conditional expressions.
			
 
				-Figure~\ref{fig:c1-syntax} defines the abstract syntax of \LangCIf{}. (The
			
 
				-concrete syntax is in the Appendix,
			
 
				-Figure~\ref{fig:c1-concrete-syntax}.)  The \LangCIf{} language adds logical
			
 
				-and comparison operators to the $\Exp$ non-terminal and the literals
			
 
				-\key{\#t} and \key{\#f} to the $\Arg$ non-terminal.  Regarding control
			
 
				-flow, \LangCIf{} differs considerably from \LangIf{}.  Instead of \key{if}
			
 
				-expressions, \LangCIf{} has \key{goto} and conditional \key{goto} in the
			
 
				-grammar for $\Tail$. This means that a sequence of statements may now
			
 
				-end with a \code{goto} or a conditional \code{goto}. The conditional
			
 
				-\code{goto} jumps to one of two labels depending on the outcome of the
			
 
				-comparison. In Section~\ref{sec:explicate-control-Rif} we discuss how
			
 
				-to translate from \LangIf{} to \LangCIf{}, bridging this gap between \key{if}
			
 
				-expressions and \key{goto}'s.
			
 
				-
			
 
				-\begin{figure}[tp]
			
 
				-\fbox{
			
 
				-\begin{minipage}{0.96\textwidth}
			
 
				-\small    
			
 
				-\[
			
 
				-\begin{array}{lcl}
			
 
				-\Atm &::=& \gray{\INT{\Int} \mid \VAR{\Var}} \mid \BOOL{\itm{bool}} \\
			
 
				-\itm{cmp} &::= & \key{eq?} \mid \key{<}  \\
			
 
				-\Exp &::= & \gray{ \Atm \mid \READ{} }\\
			
 
				-     &\mid& \gray{ \NEG{\Atm} \mid \ADD{\Atm}{\Atm} } \\
			
 
				-     &\mid& \UNIOP{\key{'not}}{\Atm} 
			
 
				-     \mid \BINOP{\key{'}\itm{cmp}}{\Atm}{\Atm} \\
			
 
				-\Stmt &::=& \gray{ \ASSIGN{\VAR{\Var}}{\Exp} } \\
			
 
				-\Tail &::= & \gray{\RETURN{\Exp} \mid \SEQ{\Stmt}{\Tail} } 
			
 
				-    \mid \GOTO{\itm{label}} \\
			
 
				-    &\mid& \IFSTMT{\BINOP{\itm{cmp}}{\Atm}{\Atm}}{\GOTO{\itm{label}}}{\GOTO{\itm{label}}} \\
			
 
				-\LangCIf{} & ::= & \gray{\CPROGRAM{\itm{info}}{\LP\LP\itm{label}\,\key{.}\,\Tail\RP\ldots\RP}}
			
 
				-\end{array}
			
 
				-\]
			
 
				-\end{minipage}
			
 
				-}
			
 
				-\caption{The abstract syntax of \LangCIf{}, an extension of \LangCVar{}
			
 
				-  (Figure~\ref{fig:c0-syntax}).}
			
 
				-\label{fig:c1-syntax}
			
 
				-\end{figure}
			
 
				-
			
 
				-\clearpage
			
 
				-
			
 
				 \section{Shrink the \LangIf{} Language}
			
 
				 \label{sec:shrink-Rif}
			
 
				 
			
 
				 The \LangIf{} language includes several operators that are easily
			
 
				-expressible in terms of other operators. For example, subtraction is
			
 
				-expressible in terms of addition and negation.
			
 
				+expressible with other operators. For example, subtraction is
			
 
				+expressible using addition and negation.
			
 
				 \[
			
 
				  \key{(-}\; e_1 \; e_2\key{)} \quad \Rightarrow \quad \LP\key{+} \; e_1 \; \LP\key{-} \; e_2\RP\RP
			
 
				 \]
			
 
				-Several of the comparison operations are expressible in terms of
			
 
				-less-than and logical negation.
			
 
				+Several of the comparison operations are expressible using less-than
			
 
				+and logical negation.
			
 
				 \[
			
 
				 \LP\key{<=}\; e_1 \; e_2\RP \quad \Rightarrow \quad
			
 
				 \LP\key{let}~\LP\LS\key{tmp.1}~e_1\RS\RP~\LP\key{not}\;\LP\key{<}\;e_2\;\key{tmp.1})\RP\RP
			
@@ -4816,40 +4831,57 @@ less-than and logical negation.
 
				 The \key{let} is needed in the above translation to ensure that
			
 
				 expression $e_1$ is evaluated before $e_2$.
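				+
				+The two translations above can be sketched in Racket as follows.
				+This is a minimal illustration under stated assumptions, not the
				+\code{shrink} pass itself: the struct definitions are simplified
				+stand-ins for those in the support code, the name \code{shrink-exp}
				+is hypothetical, and only subtraction and \key{<=} are handled.
				+\begin{lstlisting}
				+#lang racket
				+;; Simplified stand-ins for the AST structs (assumptions).
				+(struct Int (value) #:transparent)
				+(struct Var (name) #:transparent)
				+(struct Prim (op args) #:transparent)
				+(struct Let (var rhs body) #:transparent)
				+(struct If (cnd thn els) #:transparent)
				+
				+(define (shrink-exp e)
				+  (match e
				+    ;; (- e1 e2)  =>  (+ e1 (- e2))
				+    [(Prim '- (list e1 e2))
				+     (Prim '+ (list (shrink-exp e1)
				+                    (Prim '- (list (shrink-exp e2)))))]
				+    ;; (<= e1 e2)  =>  (let ([tmp e1]) (not (< e2 tmp)))
				+    ;; The let keeps e1 evaluated before e2.
				+    [(Prim '<= (list e1 e2))
				+     (define tmp (gensym 'tmp))
				+     (Let tmp (shrink-exp e1)
				+          (Prim 'not
				+                (list (Prim '< (list (shrink-exp e2) (Var tmp))))))]
				+    ;; All other forms are traversed without change.
				+    [(Prim op es) (Prim op (map shrink-exp es))]
				+    [(Let x rhs body) (Let x (shrink-exp rhs) (shrink-exp body))]
				+    [(If c t f) (If (shrink-exp c) (shrink-exp t) (shrink-exp f))]
				+    [else e]))
				+
				+(shrink-exp (Prim '<= (list (Var 'x) (Prim '- (list (Int 5) (Int 2))))))
				+\end{lstlisting}
				+Using \code{gensym} for the temporary is a design choice of this
				+sketch; any scheme that produces a variable name not already used in
				+the program works.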
			
 
				 
			
 
				-By performing these translations near the front-end of the compiler,
			
 
				-the later passes of the compiler do not need to deal with these
			
 
				-constructs, making those passes shorter. On the other hand, sometimes
			
 
				-these translations make it more difficult to generate the most
			
 
				-efficient code with respect to the number of instructions. However,
			
 
				-these differences typically do not affect the number of accesses to
			
 
				-memory, which is the primary factor that determines execution time on
			
 
				-modern computer architectures.
			
 
				+By performing these translations in the front-end of the compiler, the
			
 
				+later passes of the compiler do not need to deal with these operators,
			
 
				+making the passes shorter.
			
 
				+
			
 
				+%% On the other hand, sometimes
			
 
				+%% these translations make it more difficult to generate the most
			
 
				+%% efficient code with respect to the number of instructions. However,
			
 
				+%% these differences typically do not affect the number of accesses to
			
 
				+%% memory, which is the primary factor that determines execution time on
			
 
				+%% modern computer architectures.
			
 
				 
			
 
				 \begin{exercise}\normalfont
			
 
				-  Implement the pass \code{shrink} that removes subtraction,
			
 
				-  \key{and}, \key{or}, \key{<=}, \key{>}, and \key{>=} from the language
			
 
				-  by translating them to other constructs in \LangIf{}.  Create tests to
			
 
				-  make sure that the behavior of all of these constructs stays the
			
 
				-  same after translation.
			
 
				+Implement the pass \code{shrink} to remove subtraction, \key{and},
			
 
				+\key{or}, \key{<=}, \key{>}, and \key{>=} from the language by
			
 
				+translating them to other constructs in \LangIf{}.
			
 
				+%
			
 
				+Create six test programs that involve these operators.
			
 
				+%
			
 
				+In the \code{run-tests.rkt} script, add the following entry for
			
 
				+\code{shrink} to the list of passes (it should be the only pass at
			
 
				+this point).
			
 
				+\begin{lstlisting}
			
 
				+(list "shrink" shrink interp-Rif type-check-Rif)
			
 
				+\end{lstlisting}
			
 
				+This instructs \code{interp-tests} to run the interpreter
			
 
				+\code{interp-Rif} and the type checker \code{type-check-Rif} on the
			
 
				+output of \code{shrink}.
			
 
				+%
			
 
				+Run the script to test the \code{shrink} pass on all the test
			
 
				+programs.
			
 
				+
			
 
				 \end{exercise}
			
 
				 
			
 
				 \section{Remove Complex Operands}
			
 
				 \label{sec:remove-complex-opera-Rif}
			
 
				 
			
 
				+The output language for this pass is \LangIfANF{}
			
 
				+(Figure~\ref{fig:Rif-anf-syntax}), the administrative normal form of
			
 
				+\LangIf{}.  The \code{Bool} form is an atomic expression but
			
 
				+\code{If} is not.  All three sub-expressions of an \code{If} are
			
 
				+allowed to be complex expressions but the operands of \code{not} and
			
 
				+the comparisons must be atoms.
			
 
				+
			
 
				 Add cases for \code{Bool} and \code{If} to the \code{rco-exp} and
			
 
				-\code{rco-atom} functions according to the definition of the output
			
 
				-language for this pass, \LangIfANF{}, the administrative normal
			
 
				-form of \LangIf{}, which is defined in Figure~\ref{fig:Rif-anf-syntax}. The
			
 
				-\code{Bool} form is an atomic expressions but \code{If} is not. All
			
 
				-three sub-expressions of an \code{If} are allowed to be complex
			
 
				-expressions in the output of \code{remove-complex-opera*}, but the
			
 
				-operands of \code{not} and the comparisons must be atoms.  Regarding
			
 
				-the \code{If} form, it is particularly important to \textbf{not}
			
 
				+\code{rco-atom} functions according to whether the output needs to be
			
 
				+\Exp{} or \Atm{} as specified in the grammar for \LangIfANF{}.
			
 
				+Regarding \code{If}, it is particularly important to \textbf{not}
			
 
				 replace its condition with a temporary variable because that would
			
 
				 interfere with the generation of high-quality output in the
			
 
				 \code{explicate-control} pass.
			
 
				 
			
 
				-
			
 
				 \begin{figure}[tp]
			
 
				 \centering
			
 
				 \fbox{
			
@@ -4877,7 +4909,7 @@ R^{\dagger}_2  &::=& \PROGRAM{\code{'()}}{\Exp}
 
				 
			
 
				 Recall that the purpose of \code{explicate-control} is to make the
			
 
				 order of evaluation explicit in the syntax of the program.  With the
			
 
				-addition of \key{if} in \LangIf{} this get more interesting.
			
 
				+addition of \key{if} this gets more interesting.
			
 
				 
			
 
				 As a motivating example, consider the following program that has an
			
 
				 \key{if} expression nested in the predicate of another \key{if}.
			
@@ -4899,7 +4931,7 @@ handle each of them in isolation, regardless of their context.  Each
 
				 comparison would be translated into a \key{cmpq} instruction followed
			
 
				 by a couple instructions to move the result from the EFLAGS register
			
 
				 into a general purpose register or stack location. Each \key{if} would
			
 
				-be translated into the combination of a \key{cmpq} and a conditional
			
 
				+be translated into a \key{cmpq} instruction followed by a conditional
			
 
				 jump. The generated code for the inner \key{if} in the above example
			
 
				 would be as follows.
			
 
				 \begin{center}
			
@@ -4909,7 +4941,7 @@ would be as follows.
 
				     cmpq $1, x          ;; (< x 1)
			
 
				     setl %al
			
 
				     movzbq %al, tmp
			
 
				-    cmpq $1, tmp        ;; (if (< x 1) ...)
			
 
				+    cmpq $1, tmp        ;; (if ...)
			
 
				     je then_branch_1
			
 
				     jmp else_branch_1
			
 
				     ...
			
@@ -4917,11 +4949,26 @@ would be as follows.
 
				 \end{minipage}
			
 
				 \end{center}
			
 
				 However, if we take context into account we can do better and reduce
			
 
				-the use of \key{cmpq} and EFLAG-accessing instructions.
			
 
				+the number of \key{cmpq} and EFLAGS-accessing instructions.
			
 
				 
			
 
				-One idea is to try and reorganize the code at the level of \LangIf{},
			
 
				-pushing the outer \key{if} inside the inner one. This would yield the
			
 
				-following code.
			
 
				+Our goal is to compile \key{if} expressions so that the relevant
			
 
				+comparison instruction appears directly before the conditional jump.
			
 
				+For example, we want to generate the following code for the inner
			
 
				+\code{if}.
			
 
				+\begin{center}
			
 
				+\begin{minipage}{0.96\textwidth}
			
 
				+\begin{lstlisting}
			
 
				+    ...
			
 
				+    cmpq $1, x
			
 
				+    je then_branch_1
			
 
				+    jmp else_branch_1
			
 
				+    ...
			
 
				+\end{lstlisting}
			
 
				+\end{minipage}
			
 
				+\end{center}
			
 
				+One way to achieve this is to reorganize the code at the level of
			
 
				+\LangIf{}, pushing the outer \key{if} inside the inner one, yielding
			
 
				+the following code.
			
 
				 \begin{center}
			
 
				 \begin{minipage}{0.96\textwidth}
			
 
				 \begin{lstlisting}
			
@@ -4937,24 +4984,24 @@ following code.
 
				 \end{lstlisting}
			
 
				 \end{minipage}
			
 
				 \end{center}
			
 
				-Unfortunately, this approach duplicates the two branches, and a
			
 
				-compiler must never duplicate code!
			
 
				+Unfortunately, this approach duplicates the two branches from the
			
 
				+outer \code{if} and a compiler must never duplicate code!
			
 
				 
			
 
				-We need a way to perform the above transformation, but without
			
 
				+We need a way to perform the above transformation but without
			
 
				 duplicating code. That is, we need a way for different parts of a
			
 
				-program to refer to the same piece of code, that is, to \emph{share}
			
 
				-code. At the level of x86 assembly this is straightforward because we
			
 
				-can label the code for each of the branches and insert jumps in all
			
 
				-the places that need to execute the branches. At the higher level of
			
 
				-our intermediate languages, we need to move away from abstract syntax
			
 
				-\emph{trees} and instead use \emph{graphs}. In particular, we use a
			
 
				-standard program representation called a \emph{control flow graph}
			
 
				-(CFG), due to Frances Elizabeth \citet{Allen:1970uq}.
			
 
				-\index{control-flow graph} Each vertex is a labeled sequence of code,
			
 
				-called a \emph{basic block}, and each edge represents a jump to
			
 
				-another block. The \key{Program} construct of \LangCVar{} and \LangCIf{} contains
			
 
				-a control flow graph represented as an alist mapping labels to basic
			
 
				-blocks. Each basic block is represented by the $\Tail$ non-terminal.
			
 
				+program to refer to the same piece of code. At the level of x86
			
 
				+assembly this is straightforward because we can label the code for
			
 
				+each branch and insert jumps in all the places that need to execute
			
 
				+the branch. In our intermediate language, we need to move away from
			
 
				+abstract syntax \emph{trees} and instead use \emph{graphs}. In
			
 
				+particular, we use a standard program representation called a
			
 
				+\emph{control flow graph} (CFG), due to Frances Elizabeth
			
 
				+\citet{Allen:1970uq}.  \index{control-flow graph} Each vertex is a
			
 
				+labeled sequence of code, called a \emph{basic block}, and each edge
			
 
				+represents a jump to another block. The \key{CProgram} construct of
			
 
				+\LangCVar{} and \LangCIf{} contains a control flow graph represented
			
 
				+as an alist mapping labels to basic blocks. Each basic block is
			
 
				+represented by the $\Tail$ non-terminal.
			
 
				 
			
 
				 Figure~\ref{fig:explicate-control-s1-38} shows the output of the
			
 
				 \code{remove-complex-opera*} pass and then the
			
@@ -4963,18 +5010,17 @@ the output program and then discuss the algorithm.
 
				 %
			
 
				 Following the order of evaluation in the output of
			
 
				 \code{remove-complex-opera*}, we first have two calls to \code{(read)}
			
 
				-and then the less-than-comparison to \code{1} in the predicate of the
			
 
				+and then the comparison \lstinline{(< x 1)} in the predicate of the
			
 
				 inner \key{if}.  In the output of \code{explicate-control}, in the
			
 
				-block labeled \code{start}, this becomes two assignment statements
			
 
				-followed by a conditional \key{goto} to label \code{block40} or
			
 
				+block labeled \code{start}, are two assignment statements followed by an
			
 
				+\code{if} statement that branches to \code{block40} or
			
 
				 \code{block41}. The blocks associated with those labels contain the
			
 
				-translations of the code \code{(eq? x 0)} and \code{(eq? x 2)},
			
 
				-respectively. Regarding the block labeled with \code{block40}, we
			
 
				-start with the comparison to \code{0} and then have a conditional
			
 
				-goto, either to label \code{block38} or label \code{block39}, which
			
 
				-are the two branches of the outer \key{if}, i.e., \code{(+ y 2)} and
			
 
				-\code{(+ y 10)}. The story for the block labeled \code{block41} is
			
 
				-similar.
			
 
				+translations of the code \lstinline{(eq? x 0)} and \lstinline{(eq? x 2)},
			
 
				+respectively.  In particular, we start \code{block40} with the
			
 
				+comparison \lstinline{(eq? x 0)} and then branch to \code{block38} or
			
 
				+\code{block39}, the two branches of the outer \key{if}, i.e.,
			
 
				+\lstinline{(+ y 2)} and \lstinline{(+ y 10)}. The story for
			
 
				+\code{block41} is similar.
			
 
				 
			
 
				 \begin{figure}[tbp]
			
 
				 \begin{tabular}{lll}
			
@@ -5008,20 +5054,14 @@ $\Rightarrow$
 
				 start:
			
 
				     x = (read);
			
 
				     y = (read);
			
 
				-    if (< x 1)
			
 
				-       goto block40;
			
 
				-    else
			
 
				-       goto block41;
			
 
				+    if (< x 1) goto block40;
			
 
				+    else goto block41;
			
 
				 block40:
			
 
				-    if (eq? x 0)
			
 
				-       goto block38;
			
 
				-    else
			
 
				-       goto block39;
			
 
				+    if (eq? x 0) goto block38;
			
 
				+    else goto block39;
			
 
				 block41:
			
 
				-    if (eq? x 2)
			
 
				-       goto block38;
			
 
				-    else
			
 
				-       goto block39;
			
 
				+    if (eq? x 2) goto block38;
			
 
				+    else goto block39;
			
 
				 block38:
			
 
				     return (+ y 2);
			
 
				 block39:
			
@@ -5049,10 +5089,10 @@ Recall that in Section~\ref{sec:explicate-control-r1} we implement
 
				 functions, \code{explicate-tail} and \code{explicate-assign}.  The
			
 
				 former function translates expressions in tail position whereas the
			
 
				 latter function translates expressions on the right-hand-side of a
			
 
				-\key{let}. With the addition of \key{if} expression in \LangIf{} we have a
			
 
				-new kind of context to deal with: the predicate position of the
			
 
				-\key{if}. We need another function, \code{explicate-pred}, that takes
			
 
				-an \LangIf{} expression and two blocks for the then-branch and
			
 
				+\key{let}. With the addition of the \key{if} expression in \LangIf{} we
			
 
				+have a new kind of position to deal with: the predicate position of
			
 
				+the \key{if}. We need another function, \code{explicate-pred}, that
			
 
				+takes an \LangIf{} expression and two blocks for the then-branch and
			
 
				 else-branch. The output of \code{explicate-pred} is a block.
			
 
				 %
			
 
				 %% Note that the three explicate functions need to construct a
			
@@ -5060,15 +5100,14 @@ else-branch. The output of \code{explicate-pred} is a block.
 
				 %% variable.
			
 
				 %
			
 
				 In the following paragraphs we discuss specific cases in the
			
 
				-\code{explicate-pred} function as well as the additions to the
			
 
				+\code{explicate-pred} function as well as additions to the
			
 
				 \code{explicate-tail} and \code{explicate-assign} functions.
			
 
				 
			
 
				 The function \code{explicate-pred} will need a case for every
			
 
				 expression that can have type \code{Boolean}. We detail a few cases
			
 
				 here and leave the rest for the reader. The input to this function is
			
 
				 an expression and two blocks, $B_1$ and $B_2$, for the two branches of
			
 
				-the enclosing \key{if}, though some care will be needed regarding how
			
 
				-we represent the blocks. Suppose the expression is the Boolean
			
 
				+the enclosing \key{if}. Suppose the expression is the Boolean
			
 
				 \code{\#t}.  Then we can perform a kind of partial evaluation
			
 
				 \index{partial evaluation} and translate it to the ``then'' branch
			
 
				 $B_1$. Likewise, we translate \code{\#f} to the ``else'' branch $B_2$.
			
@@ -5078,30 +5117,31 @@ $B_1$. Likewise, we translate \code{\#f} to the ``else`` branch $B_2$.
 
				 \key{\#f} \quad\Rightarrow\quad B_2
			
 
				 \]
			
 
				 These two cases demonstrate that we sometimes discard one of the
			
 
				-blocks that are input to \code{explicate-pred}. We will need to
			
 
				-arrange for the blocks that we actually use to appear in the resulting
			
 
				-control-flow graph, but not the discarded blocks.
			
 
				+blocks that are input to \code{explicate-pred}. We want the blocks
			
 
				+that we actually use to appear in the resulting control-flow graph,
			
 
				+but not the discarded blocks. We return to this issue later.
			
 
				 
			
 
				 The case for \key{if} in \code{explicate-pred} is particularly
			
 
				-illuminating as it deals with the challenges that we discussed above
			
 
				+illuminating because it deals with the challenges we discussed above
			
 
				 regarding the example of the nested \key{if} expressions.  The
			
 
				 ``then'' and ``else'' branches of the current \key{if} inherit their
			
 
				 context from the current one, that is, predicate context. So we
			
 
				 recursively apply \code{explicate-pred} to the ``then'' and ``else''
			
 
				-branches. For both of those recursive calls, we shall pass the blocks
			
 
				-$B_1$ and $B_2$. Thus, $B_1$ may get used twice, once inside each
			
 
				-recursive call, and likewise for $B_2$. As discussed above, to avoid
			
 
				-duplicating code, we need to add these blocks to the control-flow
			
 
				-graph so that we can instead refer to them by name and execute them
			
 
				-with a \key{goto}. However, as we saw in the cases above for \key{\#t}
			
 
				-and \key{\#f}, the blocks $B_1$ or $B_2$ may not get used at all and
			
 
				-we don't want to prematurely add them to the control-flow graph if
			
 
				-they end up being discarded.
			
 
				-
			
 
				-The solution to this conundrum is to use \emph{lazy evaluation} to
			
 
				-delay adding the blocks to the control-flow graph until the points
			
 
				-where we know they will be used~\citep{Friedman:1976aa}.\index{lazy
			
 
				-  evaluation} Racket provides support for lazy evaluation with the
			
 
				+branches. For both of those recursive calls, we pass the blocks $B_1$
			
 
				+and $B_2$. Thus, $B_1$ may get used twice, once inside each recursive
			
 
				+call, and likewise for $B_2$. As discussed above, to avoid duplicating
			
 
				+code, we need to add these blocks to the control-flow graph so that we
			
 
				+can instead refer to them by name and execute them with a
			
 
				+\key{goto}. However, as we saw in the cases above for \key{\#t} and
			
 
				+\key{\#f}, the blocks $B_1$ or $B_2$ may not get used at all and we
			
 
				+don't want to prematurely add them to the control-flow graph if they
			
 
				+end up being discarded.
			
 
				+
			
 
				+The solution to this conundrum is to use \emph{lazy
			
 
				+  evaluation}\index{lazy evaluation} \citep{Friedman:1976aa} to delay
			
 
				+adding the blocks to the control-flow graph until the points where we
			
 
				+know they will be used. Racket provides support for lazy evaluation
			
 
				+with the
			
 
				 \href{https://docs.racket-lang.org/reference/Delayed_Evaluation.html}{\code{racket/promise}}
			
 
				 package. The expression \key{(delay} $e_1 \ldots e_n$\key{)}
			
 
				 \index{delay} creates a \emph{promise}\index{promise} in which the