|
@@ -4130,7 +4130,7 @@ generated lexer for \LangInt{} converts the string
|
|
|
\end{lstlisting}
|
|
|
\noindent into the following sequence of token objects
|
|
|
\begin{center}
|
|
|
-\begin{minipage}{0.5\textwidth}
|
|
|
+\begin{minipage}{0.95\textwidth}
|
|
|
\begin{lstlisting}
|
|
|
Token('PRINT', 'print')
|
|
|
Token('LPAR', '(')
|
|
@@ -4211,7 +4211,7 @@ by a colon and then a regular expression surrounded by \code{/}
|
|
|
characters. For example, the \code{DIGIT}, \code{INT}, and
|
|
|
\code{NEWLINE} types of tokens are specified in the following way.
|
|
|
\begin{center}
|
|
|
-\begin{minipage}{0.5\textwidth}
|
|
|
+\begin{minipage}{0.95\textwidth}
|
|
|
\begin{lstlisting}
|
|
|
DIGIT: /[0-9]/
|
|
|
INT: "-"? DIGIT+
|
|
@@ -4256,11 +4256,23 @@ BNF that we use in this book. In particular, the notation $::=$ is
|
|
|
replaced by a single colon and the use of typewriter font for string
|
|
|
literals is replaced by quotation marks. The following serves as a
|
|
|
first draft of a Lark grammar for \LangInt{}.
|
|
|
+\begin{center}
|
|
|
+\begin{minipage}{0.95\textwidth}
|
|
|
\begin{lstlisting}[escapechar=$]
|
|
|
-exp: INT | "input_int""("")" | "-"exp | exp"+"exp | exp"-"exp | "(" exp ")"
|
|
|
-stmt: "print""(" exp ")" | exp
|
|
|
+exp: INT
|
|
|
+ | "input_int" "(" ")"
|
|
|
+ | "-" exp
|
|
|
+ | exp "+" exp
|
|
|
+ | exp "-" exp
|
|
|
+ | "(" exp ")"
|
|
|
+
|
|
|
+stmt: "print" "(" exp ")"
|
|
|
+ | exp
|
|
|
+
|
|
|
lang_int: (stmt NEWLINE)*
|
|
|
\end{lstlisting}
|
|
|
+\end{minipage}
|
|
|
+\end{center}
|
|
|
|
|
|
Let us begin by discussing the rule \code{exp: INT}. In
|
|
|
Section~\ref{sec:grammar} we defined the corresponding \Int{}
|
|
@@ -4286,7 +4298,7 @@ symbol that is a substring of the input program. The parse tree for
|
|
|
\begin{figure}[tbp]
|
|
|
\begin{tcolorbox}[colback=white]
|
|
|
\centering
|
|
|
-\includegraphics[width=0.5\textwidth]{figs/simple-parse-tree}
|
|
|
+\includegraphics[width=2.0in]{figs/simple-parse-tree}
|
|
|
\end{tcolorbox}
|
|
|
\caption{The parse tree for \code{'1+3'}.}
|
|
|
\label{fig:simple-parse-tree}
|
|
@@ -4306,15 +4318,47 @@ the nodes from the parser are \code{Tree} objects. Each \code{Tree}
|
|
|
object has a \code{data} field containing the name of the nonterminal
|
|
|
for the grammar rule that was applied. Each \code{Tree} object also
|
|
|
has a \code{children} field that is a list containing trees and/or
|
|
|
-tokens. Note that Lark does not produce nodes for the string literals
|
|
|
-in the grammar. For example, the \code{Tree} node for the addition
|
|
|
-expression has two children
|
|
|
+tokens. Note that Lark does not produce nodes for string literals in
|
|
|
+the grammar. For example, the \code{Tree} node for the addition
|
|
|
+expression has only two children for the two integers but is missing
|
|
|
+its middle child for the \code{"+"} terminal. This would be
|
|
|
+problematic except that Lark provides a mechanism for customizing the
|
|
|
+\code{data} field of each \code{Tree} node based on which rule was
|
|
|
+applied. Next to each alternative in a grammar rule, write \code{->}
|
|
|
+followed by a string that you would like to appear in the \code{data}
|
|
|
+field. The following is a second draft of a Lark grammar for
|
|
|
+\LangInt{}, this time with more specific labels on the \code{Tree}
|
|
|
+nodes.
|
|
|
+\begin{center}
|
|
|
+\begin{minipage}{0.95\textwidth}
|
|
|
+\begin{lstlisting}[escapechar=$]
|
|
|
+exp: INT -> int
|
|
|
+ | "input_int" "(" ")" -> input_int
|
|
|
+ | "-" exp -> usub
|
|
|
+ | exp "+" exp -> add
|
|
|
+ | exp "-" exp -> sub
|
|
|
+ | "(" exp ")" -> paren
|
|
|
|
|
|
+stmt: "print" "(" exp ")" -> print
|
|
|
+ | exp -> expr
|
|
|
+
|
|
|
+lang_int: (stmt NEWLINE)* -> module
|
|
|
+\end{lstlisting}
|
|
|
+\end{minipage}
|
|
|
+\end{center}
|
|
|
+The resulting parse tree
|
|
|
+\begin{lstlisting}
|
|
|
+Tree('module',
|
|
|
+ [Tree('expr', [Tree('add', [Tree('int', [Token('INT', '1')]),
|
|
|
+ Tree('int', [Token('INT', '3')])])]),
|
|
|
+ Token('NEWLINE', '\n')])
|
|
|
+\end{lstlisting}
|
|
|
|
|
|
+\subsection{Ambiguous Grammars}
|
|
|
|
|
|
A grammar is \emph{ambiguous}\index{subject}{ambiguous} when there are
|
|
|
strings that can be parsed in more than one way. For example, consider
|
|
|
-the string \code{'1+2+3'}. This string can parsed in two different
|
|
|
+the string \code{'1-2+3'}. This string can parsed in two different
|
|
|
ways using our draft grammar, resulting in the two parse trees shown
|
|
|
in figure~\ref{fig:ambig-parse-tree}.
|
|
|
|
|
@@ -4323,7 +4367,7 @@ in figure~\ref{fig:ambig-parse-tree}.
|
|
|
\centering
|
|
|
\includegraphics[width=0.95\textwidth]{figs/ambig-parse-tree}
|
|
|
\end{tcolorbox}
|
|
|
-\caption{The two parse trees for \code{'1+2+3'}.}
|
|
|
+\caption{The two parse trees for \code{'1-2+3'}.}
|
|
|
\label{fig:ambig-parse-tree}
|
|
|
\end{figure}
|
|
|
|