
a little python progress

Jeremy Siek 3 years ago
parent
commit
45f3721d22
2 changed files with 154 additions and 89 deletions
  1. book.tex (+151, -88)
  2. defs.tex (+3, -1)

+ 151 - 88
book.tex

@@ -23,7 +23,7 @@
 
 \def\racketEd{0}
 \def\pythonEd{1}
-\def\edition{0}
+\def\edition{1}
 
 % material that is specific to the Racket edition of the book
 \newcommand{\racket}[1]{{\if\edition\racketEd\color{olive}{#1}\fi}}
@@ -379,7 +379,7 @@ compiler course at IU.
 
 We thank professors Bor-Yuh Chang, John Clements, Jay McCarthy, Joseph
 Near, Ryan Newton, Nate Nystrom, Andrew Tolmach, and Michael Wollowski
-for teaching courses based on early drafts of this book and for their
+for teaching courses based on drafts of this book and for their
 invaluable feedback.
 
 We thank Ronald Garcia for helping Jeremy survive Dybvig's compiler
@@ -686,13 +686,13 @@ integers and arithmetic operations.
 \index{subject}{grammar}
 
 The first grammar rule for the abstract syntax of \LangInt{} says that an
-instance of the \code{Int} structure is an expression:
+instance of the \racket{\code{Int} structure}\python{\code{Constant} class} is an expression:
 \begin{equation}
 \Exp ::= \INT{\Int}  \label{eq:arith-int}
 \end{equation}
 %
-Each rule has a left-hand-side and a right-hand-side. The way to read
-a rule is that if you have an AST node that matches the
+Each rule has a left-hand-side and a right-hand-side.
+If you have an AST node that matches the
 right-hand-side, then you can categorize it according to the
 left-hand-side.
 %
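In the Python edition, Python's standard ast module already produces nodes shaped like the right-hand sides of these rules. The following minimal sketch (Python 3.8 or later assumed, since earlier versions used Num rather than Constant) parses an integer literal and shows that it becomes a Constant node, which the first grammar rule then categorizes as an Exp. It also shows what the parser does with a leading minus.

```python
import ast

# mode="eval" parses a single expression; .body is the expression node.
node = ast.parse("8", mode="eval").body
print(type(node).__name__, node.value)  # Constant 8

# For Python's parser, a leading minus is not part of the literal:
# "-8" parses as a negation (UnaryOp with USub) applied to Constant(8).
neg = ast.parse("-8", mode="eval").body
print(type(neg).__name__, type(neg.op).__name__, neg.operand.value)  # UnaryOp USub 8
```

As a result, program text such as -8 reaches the compiler as a negation node, not as a single Constant.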
@@ -700,19 +700,19 @@ A name such as $\Exp$ that is defined by the grammar rules is a
 \emph{non-terminal}.  \index{subject}{non-terminal}
 %
 The name $\Int$ is also a non-terminal, but instead of defining it
-with a grammar rule, we define it with the following explanation.  We
-make the simplifying design decision that all of the languages in this
-book only handle machine-representable integers.  On most modern
-machines this corresponds to integers represented with 64-bits, i.e.,
-the in range $-2^{63}$ to $2^{63}-1$.  We restrict this range further
-to match the Racket \texttt{fixnum} datatype, which allows 63-bit
-integers on a 64-bit machine. So an $\Int$ is a sequence of decimals
-($0$ to $9$), possibly starting with $-$ (for negative integers), such
-that the sequence of decimals represent an integer in range $-2^{62}$
-to $2^{62}-1$.
-
-The second grammar rule is the \texttt{read} operation that receives
-an input integer from the user of the program.
+with a grammar rule, we define it with the following explanation.  An
+$\Int$ is a sequence of decimal digits ($0$ to $9$), possibly starting with
+$-$ (for negative integers), such that the sequence of digits
+represents an integer in the range $-2^{62}$ to $2^{62}-1$.  This enables
+the representation of integers using 63 bits, which simplifies several
+aspects of compilation. \racket{Thus, these integers correspond to
+  the Racket \texttt{fixnum} datatype on a 64-bit machine.}
+\python{In contrast, integers in Python have unlimited precision, but
+  the techniques needed to handle unlimited precision fall outside the
+  scope of this book.}
+
+The second grammar rule is the \READOP{} operation that receives an
+input integer from the user of the program.
 \begin{equation}
   \Exp ::= \READ{} \label{eq:arith-read}
 \end{equation}
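The definition of Int above reduces to a small check. The following minimal Python sketch tests whether a string spells an Int under that definition: an optional minus sign followed by decimal digits, with a value from -2^62 to 2^62 - 1. The helper name is_int_literal and the constants MIN_INT and MAX_INT are hypothetical and exist only for illustration. This is not code from the book's support files.

```python
import re

# Bounds from the definition of Int: -2^62 to 2^62 - 1, inclusive.
MIN_INT = -(2 ** 62)
MAX_INT = 2 ** 62 - 1

# The range holds exactly 2^63 values, i.e. it fits in 63 bits.
assert MAX_INT - MIN_INT + 1 == 2 ** 63

# Concrete syntax: an optional '-' followed by one or more decimal digits.
INT_SYNTAX = re.compile(r"-?[0-9]+")

def is_int_literal(text: str) -> bool:
    """Hypothetical helper: does `text` spell an Int as defined above?"""
    if INT_SYNTAX.fullmatch(text) is None:
        return False
    # An in-range Int has at most 19 significant digits (2^62 has 19).
    # Rejecting longer strings early also avoids the digit limit that
    # recent CPython releases impose on int() conversion of huge strings.
    if len(text.lstrip("-").lstrip("0")) > 19:
        return False
    return MIN_INT <= int(text) <= MAX_INT

print(is_int_literal("-8"))           # True
print(is_int_literal(str(2 ** 62)))   # False: one past the maximum
```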
@@ -762,12 +762,11 @@ to show that
 is an $\Exp$ in the \LangInt{} language.
 
 If you have an AST for which the above rules do not apply, then the
-AST is not in \LangInt{}. For example, the program
-\racket{\code{(- (read) 8)}}
-\python{\code{input\_int() - 8}}
-is not in \LangInt{} because there are no rules for \key{-} with two arguments.
-Whenever we define a language with a grammar, the language only includes those
-programs that are justified by the rules.
+AST is not in \LangInt{}. For example, the program
+\racket{\code{(- (read) 8)}} \python{\code{input\_int() - 8}} is not in
+\LangInt{} because there are no rules for the \key{-} operator with two
+arguments.  Whenever we define a language with a grammar, the language
+only includes those programs that are justified by the rules.
 
 {\if\edition\pythonEd\color{purple}
 The language \LangInt{} includes a second non-terminal $\Stmt$ for statements.
@@ -803,7 +802,7 @@ The last grammar rule for \LangInt{} states that there is a
   \LangInt{} ::= \PROGRAM{}{\Stmt^{*}}
 \]
 The asterisk symbol $*$ indicates a list of the preceding grammar item, in
-this case, a list of statments.
+this case, a list of statements.
 %
 The \code{Module} class is defined as follows
 \begin{lstlisting}
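The Stmt* list in the program rule corresponds to the body field of the Module node that Python's standard ast module produces. The following sketch uses the standard library only and is not the book's own code. It parses a two-statement program and inspects that list. The name input_int does not need to be defined, because the program is only parsed and never run.

```python
import ast

# Two statements: a bare expression and a call to print.
source = "input_int() + 8\nprint(-input_int())"
module = ast.parse(source)

# The program node is a Module whose body is a list of statements.
print(type(module).__name__, len(module.body))  # Module 2

for stmt in module.body:
    # Each top-level expression, including the print call, is wrapped
    # in an ast.Expr statement node.
    print(type(stmt).__name__, type(stmt.value).__name__)
# Expr BinOp
# Expr Call
```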
@@ -900,34 +899,39 @@ defined in Figure~\ref{fig:r0-concrete-syntax}.
 \label{sec:pattern-matching}
 
 As mentioned in Section~\ref{sec:ast}, compilers often need to access
-the parts of an AST node. Racket provides the \texttt{match} form to
-access the parts of a structure. Consider the following example and
-the output on the right. \index{subject}{match} \index{subject}{pattern matching}
+the parts of an AST node. \racket{Racket}\python{Python (as of version 3.10)} provides the
+\texttt{match} feature to access the parts of a value.
+Consider the following example. \index{subject}{match} \index{subject}{pattern matching}
 \begin{center}
 \begin{minipage}{0.5\textwidth}
+{\if\edition\racketEd\color{olive}
 \begin{lstlisting}
 (match ast1.1
   [(Prim op (list child1 child2))
     (print op)])
 \end{lstlisting}
-\end{minipage}
-\vrule
-\begin{minipage}{0.25\textwidth}
+\fi}
+{\if\edition\pythonEd\color{purple}
 \begin{lstlisting}
-
-
-   '+
+match ast1_1:
+    case BinOp(child1, op, child2):
+        print(op)
 \end{lstlisting}
+\fi}
 \end{minipage}
 \end{center}
-In the above example, the \texttt{match} form takes an AST
-\eqref{eq:arith-prog} and binds its parts to the three pattern
-variables \texttt{op}, \texttt{child1}, and \texttt{child2}, and then
-prints out the operator. In general, a match clause consists of a
-\emph{pattern} and a \emph{body}.\index{subject}{pattern} Patterns are
-recursively defined to be either a pattern variable, a structure name
-followed by a pattern for each of the structure's arguments, or an
-S-expression (symbols, lists, etc.).  (See Chapter 12 of The Racket
+
+{\if\edition\racketEd\color{olive}
+%
+In the above example, the \texttt{match} form checks whether the AST
+\eqref{eq:arith-prog} is a binary operator and binds its parts to the
+three pattern variables \texttt{op}, \texttt{child1}, and
+\texttt{child2}, and then prints out the operator. In general, a match
+clause consists of a \emph{pattern} and a
+\emph{body}.\index{subject}{pattern} Patterns are recursively defined
+to be either a pattern variable, a structure name followed by a
+pattern for each of the structure's arguments, or an S-expression
+(symbols, lists, etc.).  (See Chapter 12 of The Racket
 Guide\footnote{\url{https://docs.racket-lang.org/guide/match.html}}
 and Chapter 9 of The Racket
 Reference\footnote{\url{https://docs.racket-lang.org/reference/match.html}}
@@ -936,30 +940,73 @@ for a complete description of \code{match}.)
 The body of a match clause may contain arbitrary Racket code.  The
 pattern variables can be used in the scope of the body, such as
 \code{op} in \code{(print op)}.
+%
+\fi}
+%
+%
+{\if\edition\pythonEd\color{purple}
+%  
+In the above example, the \texttt{match} statement checks whether the AST
+\eqref{eq:arith-prog} is a binary operator and binds its parts to the
+three pattern variables \texttt{child1}, \texttt{op}, and
+\texttt{child2}, and then prints out the operator. In general, each
+\code{case} consists of a \emph{pattern} and a
+\emph{body}.\index{subject}{pattern} Patterns are recursively defined
+to be a pattern variable, a class name followed by a pattern for each
+of its constructor's arguments, a literal such as a string or an
+integer, or a sequence pattern such as a list of patterns.
+%
+The body of each \code{case} may contain arbitrary Python code. The
+pattern variables can be used in the body, such as \code{op} in
+\code{print(op)}.
+%
+\fi}
+
 
 A \code{match} form may contain several clauses, as in the following
-function \code{leaf?} that recognizes when an \LangInt{} node is a leaf in
+function \code{leaf} that recognizes when an \LangInt{} node is a leaf in
 the AST. The \code{match} proceeds through the clauses in order,
 checking whether the pattern can match the input AST. The body of the
-first clause that matches is executed. The output of \code{leaf?} for
+first clause that matches is executed. The output of \code{leaf} for
 several ASTs is shown on the right.
 \begin{center}
 \begin{minipage}{0.6\textwidth}
+{\if\edition\racketEd\color{olive}
 \begin{lstlisting}
-(define (leaf? arith)
+(define (leaf arith)
   (match arith
     [(Int n) #t]
     [(Prim 'read '()) #t]
     [(Prim '- (list e1)) #f]
     [(Prim '+ (list e1 e2)) #f]))
 
-(leaf? (Prim 'read '()))
-(leaf? (Prim '- (list (Int 8))))
-(leaf? (Int 8))
+(leaf (Prim 'read '()))
+(leaf (Prim '- (list (Int 8))))
+(leaf (Int 8))
 \end{lstlisting}
+\fi}
+{\if\edition\pythonEd\color{purple}
+\begin{lstlisting}
+def leaf(arith):
+    match arith:
+        case Constant(n):
+            return True
+        case Call(Name('input_int'), []):
+            return True
+        case UnaryOp(USub(), e1):
+            return False
+        case BinOp(e1, Add(), e2):
+            return False
+
+print(leaf(Call(Name('input_int'), [])))
+print(leaf(UnaryOp(USub(), eight)))
+print(leaf(Constant(8)))
+\end{lstlisting}
+\fi}
 \end{minipage}
 \vrule
 \begin{minipage}{0.25\textwidth}
+{\if\edition\racketEd\color{olive}
   \begin{lstlisting}
 
 
@@ -972,19 +1019,34 @@ several ASTs is shown on the right.
    #f
    #t
 \end{lstlisting}
+  \fi}
+{\if\edition\pythonEd\color{purple}
+  \begin{lstlisting}
+
+
+
+
+
+
+
+
+
+
+
+   True
+   False
+   True
+\end{lstlisting}
+\fi}
 \end{minipage}
 \end{center}
 
 When writing a \code{match}, we refer to the grammar definition to
 identify which non-terminal we are expecting to match against, then we
-make sure that 1) we have one clause for each alternative of that
-non-terminal and 2) that the pattern in each clause corresponds to the
+make sure that 1) we have one \racket{clause}\python{case} for each alternative of that
+non-terminal and 2) the pattern in each \racket{clause}\python{case} matches the
 corresponding right-hand side of a grammar rule. For the \code{match}
-in the \code{leaf?} function, we refer to the grammar for \LangInt{} in
+in the \code{leaf} function, we refer to the grammar for \LangInt{} in
 Figure~\ref{fig:r0-syntax}. The $\Exp$ non-terminal has 4
-alternatives, so the \code{match} has 4 clauses.  The pattern in each
-clause corresponds to the right-hand side of a grammar rule. For
-example, the pattern \code{(Prim '+ (list e1 e2))} corresponds to the
+alternatives, so the \code{match} has 4 \racket{clauses}\python{cases}.
+The pattern in each \racket{clause}\python{case} corresponds to the right-hand side
+of a grammar rule. For example, the pattern \ADD{\code{e1}}{\code{e2}} corresponds to the
 right-hand side $\ADD{\Exp}{\Exp}$. When translating from grammars to
 patterns, replace non-terminals such as $\Exp$ with pattern variables
 of your choice (e.g. \code{e1} and \code{e2}).
@@ -997,17 +1059,18 @@ of your choice (e.g. \code{e1} and \code{e2}).
 Programs are inherently recursive. For example, an \LangInt{} expression is
 often made of smaller expressions. Thus, the natural way to process an
 entire program is with a recursive function.  As a first example of
-such a recursive function, we define \texttt{exp?} below, which takes
+such a recursive function, we define \texttt{exp} below, which takes
 an arbitrary value and determines whether or not it is an \LangInt{}
 expression.
 %
 We say that a function is defined by \emph{structural recursion} when
-it is defined using a sequence of match clauses that correspond to a
-grammar, and the body of each clause makes a recursive call on each
+it is defined using a sequence of match \racket{clauses}\python{cases}
+that correspond to a grammar, and the body of each \racket{clause}\python{case}
+makes a recursive call on each
 child node.\footnote{This principle of structuring code according to
   the data definition is advocated in the book \emph{How to Design
     Programs} \url{https://htdp.org/2020-8-1/Book/index.html}.}
-Below we also define a second function, named \code{Rint?}, that
+Below we also define a second function, named \code{Rint}, that
 determines whether an AST is an \LangInt{} program.  In general we can
 expect to write one recursive function to handle each non-terminal in
 a grammar.\index{subject}{structural recursion}
@@ -1015,22 +1078,22 @@ a grammar.\index{subject}{structural recursion}
 \begin{center}
 \begin{minipage}{0.7\textwidth}
 \begin{lstlisting}
-(define (exp? ast)
+(define (exp ast)
   (match ast
     [(Int n) #t]
     [(Prim 'read '()) #t]
-    [(Prim '- (list e)) (exp? e)]
+    [(Prim '- (list e)) (exp e)]
     [(Prim '+ (list e1 e2))
-      (and (exp? e1) (exp? e2))]
+      (and (exp e1) (exp e2))]
     [else #f]))
 
-(define (Rint? ast)
+(define (Rint ast)
   (match ast
-    [(Program '() e) (exp? e)]
+    [(Program '() e) (exp e)]
     [else #f]))
 
-(Rint? (Program '() ast1.1)
-(Rint? (Program '()
+(Rint (Program '() ast1.1))
+(Rint (Program '()
        (Prim '- (list (Prim 'read '())
                       (Prim '+ (list (Num 8)))))))
 \end{lstlisting}
@@ -1058,29 +1121,29 @@ a grammar.\index{subject}{structural recursion}
 \end{center}
 
 
-You may be tempted to merge the two functions into one, like this:
-\begin{center}
-\begin{minipage}{0.5\textwidth}
-\begin{lstlisting}
-(define (Rint? ast)
-  (match ast
-    [(Int n) #t]
-    [(Prim 'read '()) #t]
-    [(Prim '- (list e)) (Rint? e)]
-    [(Prim '+ (list e1 e2)) (and (Rint? e1) (Rint? e2))]
-    [(Program '() e) (Rint? e)]
-    [else #f]))
-\end{lstlisting}
-\end{minipage}
-\end{center}
-%
-Sometimes such a trick will save a few lines of code, especially when
-it comes to the \code{Program} wrapper.  Yet this style is generally
-\emph{not} recommended because it can get you into trouble.
-%
-For example, the above function is subtly wrong:
-\lstinline{(Rint? (Program '() (Program '() (Int 3))))}
-returns true when it should return false.
+%% You may be tempted to merge the two functions into one, like this:
+%% \begin{center}
+%% \begin{minipage}{0.5\textwidth}
+%% \begin{lstlisting}
+%% (define (Rint ast)
+%%   (match ast
+%%     [(Int n) #t]
+%%     [(Prim 'read '()) #t]
+%%     [(Prim '- (list e)) (Rint e)]
+%%     [(Prim '+ (list e1 e2)) (and (Rint e1) (Rint e2))]
+%%     [(Program '() e) (Rint e)]
+%%     [else #f]))
+%% \end{lstlisting}
+%% \end{minipage}
+%% \end{center}
+%% %
+%% Sometimes such a trick will save a few lines of code, especially when
+%% it comes to the \code{Program} wrapper.  Yet this style is generally
+%% \emph{not} recommended because it can get you into trouble.
+%% %
+%% For example, the above function is subtly wrong:
+%% \lstinline{(Rint (Program '() (Program '() (Int 3))))}
+%% returns true when it should return false.
 
 
 \section{Interpreters}

+ 3 - 1
defs.tex

@@ -112,6 +112,7 @@
 \newcommand{\RS}[0]{\key{]}}
 \if\edition\racketEd
 \newcommand{\INT}[1]{{\color{olive}\key{(Int}~#1\key{)}}}
+\newcommand{\READOP}{{\color{olive}\key{read}}}
 \newcommand{\READ}{{\color{olive}\key{(Prim}~\code{read}~\key{())}}}
 \newcommand{\NEG}[1]{{\color{olive}\key{(Prim}~\code{-}~\code{(}#1\code{))}}}
 \newcommand{\ADD}[2]{{\color{olive}\key{(Prim}~\code{+}~\code{(}#1~#2\code{))}}}
@@ -119,9 +120,10 @@
 \fi
 \if\edition\pythonEd
 \newcommand{\INT}[1]{{\color{purple}\key{Constant(}#1\key{)}}}
+\newcommand{\READOP}{{\color{purple}\key{input\_int}}}
 \newcommand{\READ}{{\color{purple}\key{Call(Name('input\_int'),[])}}}
 \newcommand{\NEG}[1]{{\color{purple}\key{UnaryOp(USub(),} #1\code{)}}}
-\newcommand{\ADD}[2]{{\color{purple}\key{BinOp(Add(),}#1\code{,}#2\code{)}}}
+\newcommand{\ADD}[2]{{\color{purple}\key{BinOp(}#1\code{,}~\key{Add(),}~#2\code{)}}}
 \newcommand{\PRINT}[1]{{\color{purple}\key{Call}\LP\key{Name}\LP\key{print}\RP\key{,}\LS#1\RS\RP}}
 \newcommand{\EXPR}[1]{{\color{purple}\key{Expr}\LP #1\RP}}
 \newcommand{\PROGRAM}[2]{\code{Module}\LP #2\RP}