Browse Source

closure conversion

Jeremy Siek 4 years ago
parent
commit
d821125124
2 changed files with 183 additions and 111 deletions
  1. 182 111
      book.tex
  2. 1 0
      defs.tex

+ 182 - 111
book.tex

@@ -5580,6 +5580,13 @@ Subsequently, in the \code{uncover-locals} pass
 propagated to all variables (including the temporaries generated by
 propagated to all variables (including the temporaries generated by
 \code{remove-complex-opera*}).
 \code{remove-complex-opera*}).
 
 
+To create the s-expression for the \code{Vector} type in
+Figure~\ref{fig:typecheck-R3}, we use the
+\href{https://docs.racket-lang.org/reference/quasiquote.html}{unquote-splicing
+  operator} \code{,@} to insert the list \code{t*} without its usual
+start and end parentheses.  \index{unquote-slicing}
+
+
 \begin{figure}[tp]
 \begin{figure}[tp]
 \begin{lstlisting}
 \begin{lstlisting}
 (define (type-check-exp env)
 (define (type-check-exp env)
@@ -7668,26 +7675,26 @@ $\Downarrow$
 $\Rightarrow$
 $\Rightarrow$
 \begin{minipage}{0.5\textwidth}
 \begin{minipage}{0.5\textwidth}
 \begin{lstlisting}[basicstyle=\ttfamily\scriptsize]
 \begin{lstlisting}[basicstyle=\ttfamily\scriptsize]
-(program ()
- (define (add86)
-   ((locals (x87 . Integer) (y88 . Integer)) 
-    (num-params . 2))
-   ((add86start .
-      (block ()
-        (movq (reg rcx) (var x87))
-        (movq (reg rdx) (var y88))
-        (movq (var x87) (reg rax))
-        (addq (var y88) (reg rax))
-        (jmp add86conclusion)))))
- (define (main)
-   ((locals . ((tmp89 . (Integer Integer -> Integer))))
-    (num-params . 0))
-   ((mainstart .
-      (block ()
-        (leaq (fun-ref add86) (var tmp89))
-        (movq (int 40) (reg rcx))
-        (movq (int 2) (reg rdx))
-        (tail-jmp (var tmp89))))))
+(define (add86)
+  ((locals (x87 . Integer) (y88 . Integer)) 
+   (num-params . 2))
+  ((add86start .
+     (block ()
+       (movq (reg rcx) (var x87))
+       (movq (reg rdx) (var y88))
+       (movq (var x87) (reg rax))
+       (addq (var y88) (reg rax))
+       (jmp add86conclusion)))))
+(define (main)
+  ((locals . ((tmp89 . (Integer Integer
+                        -> Integer))))
+   (num-params . 0))
+  ((mainstart .
+     (block ()
+       (leaq (fun-ref add86) (var tmp89))
+       (movq (int 40) (reg rcx))
+       (movq (int 2) (reg rdx))
+       (tail-jmp (var tmp89)))))
 \end{lstlisting}
 \end{lstlisting}
 $\Downarrow$
 $\Downarrow$
 \end{minipage}
 \end{minipage}
@@ -7851,21 +7858,21 @@ functional languages such as Racket. By lexical scoping we mean that a
 function's body may refer to variables whose binding site is outside
 function's body may refer to variables whose binding site is outside
 of the function, in an enclosing scope.
 of the function, in an enclosing scope.
 %
 %
-Consider the example in Figure~\ref{fig:lexical-scoping} featuring an
-anonymous function defined using the \key{lambda} form.  The body of
-the \key{lambda}, refers to three variables: \code{x}, \code{y}, and
-\code{z}. The binding sites for \code{x} and \code{y} are outside of
-the \key{lambda}. Variable \code{y} is bound by the enclosing
-\key{let} and \code{x} is a parameter of \code{f}. The \key{lambda} is
-returned from the function \code{f}. Below the definition of \code{f},
-we have two calls to \code{f} with different arguments for \code{x},
-first \code{5} then \code{3}. The functions returned from \code{f} are
-bound to variables \code{g} and \code{h}. Even though these two
-functions were created by the same \code{lambda}, they are really
-different functions because they use different values for
-\code{x}. Finally, we apply \code{g} to \code{11} (producing
-\code{20}) and apply \code{h} to \code{15} (producing \code{22}) so
-the result of this program is \code{42}.
+Consider the example in Figure~\ref{fig:lexical-scoping} written in
+$R_5$, which extends $R_4$ with anonymous functions using the
+\key{lambda} form.  The body of the \key{lambda}, refers to three
+variables: \code{x}, \code{y}, and \code{z}. The binding sites for
+\code{x} and \code{y} are outside of the \key{lambda}. Variable
+\code{y} is bound by the enclosing \key{let} and \code{x} is a
+parameter of function \code{f}. The \key{lambda} is returned from the
+function \code{f}. The main expression of the program includes two
+calls to \code{f} with different arguments for \code{x}, first
+\code{5} then \code{3}. The functions returned from \code{f} are bound
+to variables \code{g} and \code{h}. Even though these two functions
+were created by the same \code{lambda}, they are really different
+functions because they use different values for \code{x}. Applying
+\code{g} to \code{11} produces \code{20} whereas applying \code{h} to
+\code{15} produces \code{22}. The result of this program is \code{42}.
 
 
 \begin{figure}[btp]
 \begin{figure}[btp]
 % s4_6.rkt
 % s4_6.rkt
@@ -7884,20 +7891,102 @@ the result of this program is \code{42}.
 \end{figure}
 \end{figure}
 
 
 
 
+The approach that we shall take for implementing lexically scoped
+functions is to compile them into top-level function definitions,
+translating from $R_5$ into $R_4$.  However, the compiler will need to
+provide special treatment for variable occurrences such as \code{x}
+and \code{y} in the body of the \code{lambda} of
+Figure~\ref{fig:lexical-scoping}. After all, an $R_4$ function may not
+refer to variables defined outside of it. To identify such variable
+occurrences, we review the standard notion of free variable.
+
+\begin{definition}
+A variable is \emph{free in expression} $e$ if the variable occurs
+inside $e$ but does not have an enclosing binding in $e$.\index{free
+  variable}
+\end{definition}
+
+For example, in the expression \code{(+ x (+ y z))} the variables
+\code{x}, \code{y}, and \code{z} are all free.  On the other hand,
+only \code{x} and \code{y} are free in the following expression
+because \code{z} is bound by the \code{lambda}.
+\begin{lstlisting}
+   (lambda: ([z : Integer]) : Integer
+      (+ x (+ y z)))
+\end{lstlisting}
+
+So the free variables of a \code{lambda} are the ones that will need
+special treatment. We need to arrange for some way to transport, at
+runtime, the values of those variables from the point where the
+\code{lambda} was created to the point where the \code{lambda} is
+applied. An efficient solution to the problem, due to
+\citet{Cardelli:1983aa}, is to bundle into a vector the values of the
+free variables together with the function pointer for the lambda's
+code, an arrangement called a \emph{flat closure} (which we shorten to
+just ``closure'').  \index{closure}\index{flat closure} Fortunately,
+we have all the ingredients to make closures, Chapter~\ref{ch:tuples}
+gave us vectors and Chapter~\ref{ch:functions} gave us function
+pointers. The function pointer shall reside at index $0$ and the
+values for the free variables will fill in the rest of the vector.
+
+Let us revisit the example in Figure~\ref{fig:lexical-scoping} to see
+how closures work. It's a three-step dance. The program first calls
+function \code{f}, which creates a closure for the \code{lambda}. The
+closure is a vector whose first element is a pointer to the top-level
+function that we will generate for the \code{lambda}, the second
+element is the value of \code{x}, which is \code{5}, and the third
+element is \code{4}, the value of \code{y}. The closure does not
+contain an element for \code{z} because \code{z} is not a free
+variable of the \code{lambda}. Creating the closure is step 1 of the
+dance. The closure is returned from \code{f} and bound to \code{g}, as
+shown in Figure~\ref{fig:closures}.
+%
+The second call to \code{f} creates another closure, this time with
+\code{3} in the second slot (for \code{x}). This closure is also
+returned from \code{f} but bound to \code{h}, which is also shown in
+Figure~\ref{fig:closures}.
+
+\begin{figure}[tbp]
+\centering \includegraphics[width=0.6\textwidth]{figs/closures}
+\caption{Example closure representation for the \key{lambda}'s
+  in Figure~\ref{fig:lexical-scoping}.}
+\label{fig:closures}
+\end{figure}
+
+Continuing with the example, consider the application of \code{g} to
+\code{11} in Figure~\ref{fig:lexical-scoping}.  To apply a closure, we
+obtain the function pointer in the first element of the closure and
+call it, passing in the closure itself and then the regular arguments,
+in this case \code{11}. This technique for applying a closure is step
+2 of the dance.
+%
+But doesn't this \code{lambda} only take 1 argument, for parameter
+\code{z}? The third and final step of the dance is generating a
+top-level function for a \code{lambda}.  We add an additional
+parameter for the closure and we insert a \code{let} at the beginning
+of the function for each free variable, to bind those variables to the
+appropriate elements from the closure parameter.
+%
+This three-step dance is known as \emph{closure conversion}.  We
+discuss the details of closure conversion in
+Section~\ref{sec:closure-conversion} and the code generated from the
+example in Section~\ref{sec:example-lambda}. But first we define the
+syntax and semantics of $R_5$ in Section~\ref{sec:r5}.
+
 \section{The $R_5$ Language}
 \section{The $R_5$ Language}
+\label{sec:r5}
 
 
-The syntax for this language with anonymous functions and lexical
-scoping, $R_5$, is defined in Figure~\ref{fig:r5-syntax}. It adds the
-\key{lambda} form to the grammar for $R_4$, which already has syntax
-for function application.  In this chapter we shall describe how to
-compile $R_5$ back into $R_4$, compiling lexically-scoped functions
-into a combination of functions (as in $R_4$) and tuples (as in
-$R_3$).
+The concrete and abstract syntax for $R_5$, a language with anonymous
+functions and lexical scoping, is defined in
+Figures~\ref{fig:r5-concrete-syntax} and ~\ref{fig:r5-syntax}. It adds
+the \key{lambda} form to the grammar for $R_4$, which already has
+syntax for function application.
 
 
 \begin{figure}[tp]
 \begin{figure}[tp]
 \centering
 \centering
 \fbox{
 \fbox{
-\begin{minipage}{0.96\textwidth}
+  \begin{minipage}{0.96\textwidth}
+    \small
 \[
 \[
 \begin{array}{lcl}
 \begin{array}{lcl}
   \Type &::=& \gray{\key{Integer} \mid \key{Boolean}
   \Type &::=& \gray{\key{Integer} \mid \key{Boolean}
@@ -7913,8 +8002,8 @@ $R_3$).
     &\mid& \gray{(\key{eq?}\;\Exp\;\Exp) \mid \IF{\Exp}{\Exp}{\Exp}} \\
     &\mid& \gray{(\key{eq?}\;\Exp\;\Exp) \mid \IF{\Exp}{\Exp}{\Exp}} \\
     &\mid& \gray{(\key{vector}\;\Exp\ldots) \mid
     &\mid& \gray{(\key{vector}\;\Exp\ldots) \mid
           (\key{vector-ref}\;\Exp\;\Int)} \\
           (\key{vector-ref}\;\Exp\;\Int)} \\
-    &\mid& \gray{(\key{vector-set!}\;\Exp\;\Int\;\Exp)\mid (\key{void})} \\
-    &\mid& \gray{(\Exp \; \Exp\ldots)} \\
+    &\mid& \gray{(\key{vector-set!}\;\Exp\;\Int\;\Exp)\mid (\key{void})
+     \mid (\Exp \; \Exp\ldots) } \\
     &\mid& (\key{lambda:}\; ([\Var \key{:} \Type]\ldots) \key{:} \Type \; \Exp) \\
     &\mid& (\key{lambda:}\; ([\Var \key{:} \Type]\ldots) \key{:} \Type \; \Exp) \\
   \Def &::=& \gray{(\key{define}\; (\Var \; [\Var \key{:} \Type]\ldots) \key{:} \Type \; \Exp)} \\
   \Def &::=& \gray{(\key{define}\; (\Var \; [\Var \key{:} \Type]\ldots) \key{:} \Type \; \Exp)} \\
   R_5 &::=& \gray{(\key{program} \; \Def\ldots \; \Exp)}
   R_5 &::=& \gray{(\key{program} \; \Def\ldots \; \Exp)}
@@ -7922,66 +8011,46 @@ $R_3$).
 \]
 \]
 \end{minipage}
 \end{minipage}
 }
 }
-\caption{Syntax of $R_5$, extending $R_4$ (Figure~\ref{fig:r4-syntax}) 
+\caption{Concrete syntax of $R_5$, extending $R_4$ (Figure~\ref{fig:r4-syntax}) 
   with \key{lambda}.}
   with \key{lambda}.}
-\label{fig:r5-syntax}
+\label{fig:r5-concrete-syntax}
 \end{figure}
 \end{figure}
 
 
-To compile lexically-scoped functions to top-level function
-definitions, the compiler will need to provide special treatment to
-variable occurrences such as \code{x} and \code{y} in the body of the
-\code{lambda} of Figure~\ref{fig:lexical-scoping}, for the functions
-of $R_4$ may not refer to variables defined outside the function. To
-identify such variable occurrences, we review the standard notion of
-free variable.
-
-\begin{definition}
-A variable is \emph{free with respect to an expression} $e$ if the
-variable occurs inside $e$ but does not have an enclosing binding in
-$e$.\index{free variable}
-\end{definition}
-
-For example, the variables \code{x}, \code{y}, and \code{z} are all
-free with respect to the expression \code{(+ x (+ y z))}.  On the
-other hand, only \code{x} and \code{y} are free with respect to the
-following expression because \code{z} is bound by the \code{lambda}.
-\begin{lstlisting}
-   (lambda: ([z : Integer]) : Integer
-      (+ x (+ y z)))
-\end{lstlisting}
-
-Once we have identified the free variables of a \code{lambda}, we need
-to arrange for some way to transport, at runtime, the values of those
-variables from the point where the \code{lambda} was created to the
-point where the \code{lambda} is applied. Referring again to
-Figure~\ref{fig:lexical-scoping}, the binding of \code{x} to \code{5}
-needs to be used in the application of \code{g} to \code{11}, but the
-binding of \code{x} to \code{3} needs to be used in the application of
-\code{h} to \code{15}. An efficient solution to the problem, due to
-\citet{Cardelli:1983aa}, is to bundle into a vector the values of the
-free variables together with the function pointer for the lambda's
-code, an arrangement called a \emph{flat closure} (which we shorten to
-just ``closure'').
-\index{closure}\index{flat closure}
-Fortunately, we have all the ingredients to make
-closures, Chapter~\ref{ch:tuples} gave us vectors and
-Chapter~\ref{ch:functions} gave us function pointers. The function
-pointer shall reside at index $0$ and the values for free variables
-will fill in the rest of the vector. Figure~\ref{fig:closures} depicts
-the two closures created by the two calls to \code{f} in
-Figure~\ref{fig:lexical-scoping}.  Because the two closures came from
-the same \key{lambda}, they share the same function pointer but differ
-in the values for the free variable \code{x}.
-
-\begin{figure}[tbp]
-\centering \includegraphics[width=0.6\textwidth]{figs/closures}
-\caption{Example closure representation for the \key{lambda}'s
-  in Figure~\ref{fig:lexical-scoping}.}
-\label{fig:closures}
+\begin{figure}[tp]
+\centering
+\fbox{
+  \begin{minipage}{0.96\textwidth}
+    \small
+\[
+\begin{array}{lcl}
+\Exp &::=& \gray{ \INT{\Int} \mid \READ{} \mid \NEG{\Exp} } \\
+     &\mid& \gray{ \ADD{\Exp}{\Exp} 
+      \mid \BINOP{\code{'-}}{\Exp}{\Exp} } \\
+     &\mid& \gray{ \VAR{\Var} \mid \LET{\Var}{\Exp}{\Exp} } \\
+     &\mid& \gray{ \BOOL{\itm{bool}} 
+      \mid \AND{\Exp}{\Exp} }\\
+     &\mid& \gray{ \OR{\Exp}{\Exp}
+      \mid \NOT{\Exp} } \\
+     &\mid& \gray{ \BINOP{\itm{cmp}}{\Exp}{\Exp}
+      \mid \IF{\Exp}{\Exp}{\Exp} } \\
+     &\mid& \gray{ \VECTOR{\Exp} } \\
+     &\mid& \gray{ \VECREF{\Exp}{\Int} }\\
+     &\mid& \gray{ \VECSET{\Exp}{\Int}{\Exp}} \\
+     &\mid& \gray{ \VOID{} \mid \LP\key{HasType}~\Exp~\Type \RP 
+     \mid \APPLY{\Exp}{\Exp\ldots} }\\
+     &\mid& \LAMBDA{[\Var\code{:}\Type]\ldots}{\Type}{\Exp}\\
+ \Def &::=& \gray{ \FUNDEF{\Var}{([\Var \code{:} \Type]\ldots)}{\Type}{\code{'()}}{\Exp} }\\
+  R_5 &::=& \gray{ \PROGRAMDEFSEXP{\code{'()}}{(\Def\ldots)}{\Exp} }
+\end{array}
+\]
+\end{minipage}
+}
+\caption{The abstract syntax of $R_5$, extending $R_4$ (Figure~\ref{fig:r4-syntax}).}
+\label{fig:r5-syntax}
 \end{figure}
 \end{figure}
 
 
-
-\section{Interpreting $R_5$}
+\index{interpreter}
+\label{sec:interp-R5}
 
 
 Figure~\ref{fig:interp-R5} shows the definitional interpreter for
 Figure~\ref{fig:interp-R5} shows the definitional interpreter for
 $R_5$. The clause for \key{lambda} saves the current environment
 $R_5$. The clause for \key{lambda} saves the current environment
@@ -7998,13 +8067,13 @@ values.
     (define recur (interp-exp env))
     (define recur (interp-exp env))
     (match e
     (match e
       ...
       ...
-      [`(lambda: ([,xs : ,Ts] ...) : ,rT ,body)
+      [(Lambda (list `[,xs : ,Ts] ...) rT body)
        `(lambda ,xs ,body ,env)]
        `(lambda ,xs ,body ,env)]
-      [`(app ,fun ,args ...)
+      [(Apply fun args)
        (define fun-val ((interp-exp env) fun))
        (define fun-val ((interp-exp env) fun))
        (define arg-vals (map (interp-exp env) args))
        (define arg-vals (map (interp-exp env) args))
        (match fun-val
        (match fun-val
-	 [`(lambda (,xs ...) ,body ,lam-env)
+	 [`(lambda ,xs ,body ,lam-env)
 	  (define new-env (append (map cons xs arg-vals) lam-env))
 	  (define new-env (append (map cons xs arg-vals) lam-env))
 	  ((interp-exp new-env) body)]
 	  ((interp-exp new-env) body)]
 	 [else (error "interp-exp, expected function, not" fun-val)])]
 	 [else (error "interp-exp, expected function, not" fun-val)])]
@@ -8015,7 +8084,7 @@ values.
 \label{fig:interp-R5}
 \label{fig:interp-R5}
 \end{figure}
 \end{figure}
 
 
-\section{Type Checking $R_5$}
+
 \label{sec:type-check-r5}
 \label{sec:type-check-r5}
 \index{type checking}
 \index{type checking}
 
 
@@ -8030,13 +8099,15 @@ require the body's type to match the declared return type.
 (define (typecheck-R5 env)
 (define (typecheck-R5 env)
   (lambda (e)
   (lambda (e)
     (match e
     (match e
-      [`(lambda: ([,xs : ,Ts] ...) : ,rT ,body)
-       (define new-env (append (map cons xs Ts) env))
-       (define bodyT ((typecheck-R5 new-env) body))
-       (cond [(equal? rT bodyT)
-              `(,@Ts -> ,rT)]
-             [else
-              (error "mismatch in return type" bodyT rT)])]
+      [(Lambda (and bnd `([,xs : ,Ts] ...)) rT body)
+       (define-values (new-body bodyT) 
+          ((type-check-exp (append (map cons xs Ts) env)) body))
+       (define ty `(,@Ts -> ,rT))
+       (cond
+         [(equal? rT bodyT)
+           (values (HasType (Lambda bnd rT new-body) ty) ty)]
+         [else
+           (error "mismatch in return type" bodyT rT)])]
       ...
       ...
       )))
       )))
 \end{lstlisting}
 \end{lstlisting}

+ 1 - 0
defs.tex

@@ -47,6 +47,7 @@
 \newcommand{\APPLY}[2]{\key{(Apply}\;#1\;#2\code{)}}
 \newcommand{\APPLY}[2]{\key{(Apply}\;#1\;#2\code{)}}
 \newcommand{\FUNDEF}[5]{\key{(Def}~#1~#2~#3~#4~#5\code{)}}
 \newcommand{\FUNDEF}[5]{\key{(Def}~#1~#2~#3~#4~#5\code{)}}
 \newcommand{\FUNREF}[1]{\key{(FunRef}\;#1\code{)}}
 \newcommand{\FUNREF}[1]{\key{(FunRef}\;#1\code{)}}
+\newcommand{\LAMBDA}[3]{\key{(Lambda}~#1~#2~#3\code{)}}
 
 
 \newcommand{\ASSIGN}[2]{\key{(Assign}~#1\;#2\key{)}}
 \newcommand{\ASSIGN}[2]{\key{(Assign}~#1\;#2\key{)}}
 \newcommand{\RETURN}[1]{\key{(Return}~#1\key{)}}
 \newcommand{\RETURN}[1]{\key{(Return}~#1\key{)}}