|
@@ -4214,8 +4214,8 @@ though a parser framework does most of the work for us, using one
|
|
properly requires some knowledge. In particular, we must learn about
|
|
properly requires some knowledge. In particular, we must learn about
|
|
its specification languages and we must learn how to deal with
|
|
its specification languages and we must learn how to deal with
|
|
ambiguity in our language specifications. Also, some algorithms, such
|
|
ambiguity in our language specifications. Also, some algorithms, such
|
|
-as LALR(1) place restrictions on the grammars they can handle, in
|
|
|
|
-which case it helps to know the algorithm when trying to decipher the
|
|
|
|
|
|
+as LALR(1), place restrictions on the grammars they can handle, in
|
|
|
|
+which case knowing the algorithm helps when trying to decipher the
|
|
error messages.
|
|
error messages.
|
|
|
|
|
|
The process of parsing is traditionally subdivided into two phases:
|
|
The process of parsing is traditionally subdivided into two phases:
|
|
@@ -4239,7 +4239,7 @@ and the use of a slower but more powerful algorithm for parsing.
|
|
%
|
|
%
|
|
The Lark parser framework that we use in this chapter includes both
|
|
The Lark parser framework that we use in this chapter includes both
|
|
lexical analyzers and parsers. The next section discusses lexical
|
|
lexical analyzers and parsers. The next section discusses lexical
|
|
-analysis and the remainder of the chapter discusses parsing.
|
|
|
|
|
|
+analysis, and the remainder of the chapter discusses parsing.
|
|
|
|
|
|
|
|
|
|
\section{Lexical Analysis and Regular Expressions}
|
|
\section{Lexical Analysis and Regular Expressions}
|
|
@@ -4251,7 +4251,7 @@ generated lexer for \LangInt{} converts the string
|
|
\begin{lstlisting}
|
|
\begin{lstlisting}
|
|
'print(1 + 3)'
|
|
'print(1 + 3)'
|
|
\end{lstlisting}
|
|
\end{lstlisting}
|
|
-\noindent into the following sequence of token objects
|
|
|
|
|
|
+\noindent into the following sequence of token objects:
|
|
\begin{center}
|
|
\begin{center}
|
|
\begin{minipage}{0.95\textwidth}
|
|
\begin{minipage}{0.95\textwidth}
|
|
\begin{lstlisting}
|
|
\begin{lstlisting}
|
|
@@ -4265,8 +4265,8 @@ Token('NEWLINE', '\n')
|
|
\end{lstlisting}
|
|
\end{lstlisting}
|
|
\end{minipage}
|
|
\end{minipage}
|
|
\end{center}
|
|
\end{center}
|
|
-Each token includes a field for its \code{type}, such as \code{'INT'},
|
|
|
|
-and a field for its \code{value}, such as \code{'1'}.
|
|
|
|
|
|
+Each token includes a field for its \code{type}, such as \skey{INT},
|
|
|
|
+and a field for its \code{value}, such as \skey{1}.
|
|
|
|
|
|
Following in the tradition of \code{lex}~\citep{Lesk:1975uq}, the
|
|
Following in the tradition of \code{lex}~\citep{Lesk:1975uq}, the
|
|
specification language for Lark's lexer is one regular expression for
|
|
specification language for Lark's lexer is one regular expression for
|
|
@@ -4278,20 +4278,20 @@ pattern formed of the following core elements:\index{subject}{regular
|
|
empty regular expression that matches any zero-length part of a
|
|
empty regular expression that matches any zero-length part of a
|
|
string, but Lark does not support the empty regular expression.}
|
|
string, but Lark does not support the empty regular expression.}
|
|
\begin{itemize}
|
|
\begin{itemize}
|
|
-\item A single character $c$ is a regular expression and it only
|
|
|
|
- matches itself. For example, the regular expression \code{a} only
|
|
|
|
- matches with the string \code{'a'}.
|
|
|
|
|
|
+\item A single character $c$ is a regular expression, and it matches
|
|
|
|
+ only itself. For example, the regular expression \code{a} matches
|
|
|
|
+ only the string \skey{a}.
|
|
|
|
|
|
\item Two regular expressions separated by a vertical bar $R_1 \ttm{|}
|
|
\item Two regular expressions separated by a vertical bar $R_1 \ttm{|}
|
|
R_2$ form a regular expression that matches any string that matches
|
|
R_2$ form a regular expression that matches any string that matches
|
|
$R_1$ or $R_2$. For example, the regular expression \code{a|c}
|
|
$R_1$ or $R_2$. For example, the regular expression \code{a|c}
|
|
- matches the string \code{'a'} and the string \code{'c'}.
|
|
|
|
|
|
+ matches the string \skey{a} and the string \skey{c}.
|
|
|
|
|
|
\item Two regular expressions in sequence $R_1 R_2$ form a regular
|
|
\item Two regular expressions in sequence $R_1 R_2$ form a regular
|
|
expression that matches any string that can be formed by
|
|
expression that matches any string that can be formed by
|
|
concatenating two strings, where the first string matches $R_1$ and
|
|
concatenating two strings, where the first string matches $R_1$ and
|
|
the second string matches $R_2$. For example, the regular expression
|
|
the second string matches $R_2$. For example, the regular expression
|
|
- \code{(a|c)b} matches the strings \code{'ab'} and \code{'cb'}.
|
|
|
|
|
|
+ \code{(a|c)b} matches the strings \skey{ab} and \skey{cb}.
|
|
(Parentheses can be used to control the grouping of operators within
|
|
(Parentheses can be used to control the grouping of operators within
|
|
a regular expression.)
|
|
a regular expression.)
|
|
|
|
|
|
@@ -4299,8 +4299,8 @@ pattern formed of the following core elements:\index{subject}{regular
|
|
Kleene closure) is a regular expression that matches any string that
|
|
Kleene closure) is a regular expression that matches any string that
|
|
can be formed by concatenating zero or more strings that each match
|
|
can be formed by concatenating zero or more strings that each match
|
|
the regular expression $R$. For example, the regular expression
|
|
the regular expression $R$. For example, the regular expression
|
|
- \code{"((a|c)b)*"} matches the strings \code{'abcbab'} but not
|
|
|
|
- \code{'abc'}.
|
|
|
|
|
|
+ \code{((a|c)b)*} matches the string \skey{abcbab} but not
|
|
|
|
+ \skey{abc}.
|
|
\end{itemize}
|
|
\end{itemize}
|
|
|
|
|
|
For our convenience, Lark also accepts the following extended set of
|
|
For our convenience, Lark also accepts the following extended set of
|
|
@@ -4310,7 +4310,7 @@ regular expressions.
|
|
\begin{itemize}
|
|
\begin{itemize}
|
|
\item A set of characters enclosed in square brackets $[c_1 c_2 \ldots
|
|
\item A set of characters enclosed in square brackets $[c_1 c_2 \ldots
|
|
c_n]$ is a regular expression that matches any one of the
|
|
c_n]$ is a regular expression that matches any one of the
|
|
- characters. So $[c_1 c_2 \ldots c_n]$ is equivalent to
|
|
|
|
|
|
+ characters. So, $[c_1 c_2 \ldots c_n]$ is equivalent to
|
|
the regular expression $c_1\mid c_2\mid \ldots \mid c_n$.
|
|
the regular expression $c_1\mid c_2\mid \ldots \mid c_n$.
|
|
\item A range of characters enclosed in square brackets $[c_1\ttm{-}c_2]$ is
|
|
\item A range of characters enclosed in square brackets $[c_1\ttm{-}c_2]$ is
|
|
a regular expression that matches any character between $c_1$ and
|
|
a regular expression that matches any character between $c_1$ and
|
|
@@ -4320,19 +4320,21 @@ regular expressions.
|
|
is a regular expression that matches any string that can
|
|
is a regular expression that matches any string that can
|
|
be formed by concatenating one or more strings that each match $R$.
|
|
be formed by concatenating one or more strings that each match $R$.
|
|
So $R+$ is equivalent to $R(R*)$. For example, \code{[a-z]+}
|
|
So $R+$ is equivalent to $R(R*)$. For example, \code{[a-z]+}
|
|
- matches \code{'b'} and \code{'bzca'}.
|
|
|
|
|
|
+ matches \skey{b} and \skey{bzca}.
|
|
\item A regular expression followed by a question mark $R\ttm{?}$
|
|
\item A regular expression followed by a question mark $R\ttm{?}$
|
|
is a regular expression that matches any string that either
|
|
is a regular expression that matches any string that either
|
|
- matches $R$ or that is the empty string.
|
|
|
|
- For example, \code{a?b} matches both \code{'ab'} and \code{'b'}.
|
|
|
|
-\item A string, such as \code{"hello"}, which matches itself,
|
|
|
|
- that is, \code{'hello'}.
|
|
|
|
|
|
+ matches $R$ or is the empty string.
|
|
|
|
+ For example, \code{a?b} matches both \skey{ab} and \skey{b}.
|
|
\end{itemize}
|
|
\end{itemize}
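The operators described above can be tried out with Python's standard
\code{re} module, which uses the same notation for them. The following
is a minimal sketch rather than anything Lark-specific; the helper
\code{matches} and the list \code{examples} are names introduced here
for illustration only.

```python
import re

def matches(pattern, text):
    """Return True when the entire string matches the regular expression."""
    return re.fullmatch(pattern, text) is not None

# Each entry pairs a regular expression with a candidate string.
examples = [
    ("a|c", "a"),            # alternation
    ("(a|c)b", "cb"),        # sequencing with grouping
    ("((a|c)b)*", "abcbab"), # Kleene closure: ab, cb, ab
    ("((a|c)b)*", "abc"),    # trailing 'c' has no 'b' after it
    ("[a-z]+", "bzca"),      # one or more characters in a range
    ("a?b", "b"),            # optional 'a'
]

for pattern, text in examples:
    print(f"{pattern!r} matches {text!r}: {matches(pattern, text)}")
```

Using \code{re.fullmatch} rather than \code{re.search} matters here:
the definitions above speak of a regular expression matching a whole
string, not a substring of it.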
|
|
|
|
|
|
-In a Lark grammar file, specify a name for each type of token followed
|
|
|
|
-by a colon and then a regular expression surrounded by \code{/}
|
|
|
|
-characters. For example, the \code{DIGIT}, \code{INT}, and
|
|
|
|
-\code{NEWLINE} types of tokens are specified in the following way.
|
|
|
|
|
|
+In a Lark grammar file, each kind of token is specified by a
|
|
|
|
+\emph{terminal}\index{subject}{terminal}, which is defined by a rule
|
|
|
|
+that consists of the name of the terminal followed by a colon followed
|
|
|
|
+by a sequence of literals. The literals include strings such as
|
|
|
|
+\code{"abc"}, regular expressions surrounded by \code{/} characters,
|
|
|
|
+terminal names, and literals composed using the regular expression
|
|
|
|
+operators ($+$, $*$, etc.). For example, the \code{DIGIT},
|
|
|
|
+\code{INT}, and \code{NEWLINE} terminals are specified as follows:
|
|
\begin{center}
|
|
\begin{center}
|
|
\begin{minipage}{0.95\textwidth}
|
|
\begin{minipage}{0.95\textwidth}
|
|
\begin{lstlisting}
|
|
\begin{lstlisting}
|
|
@@ -4343,10 +4345,6 @@ NEWLINE: (/\r/? /\n/)+
|
|
\end{minipage}
|
|
\end{minipage}
|
|
\end{center}
|
|
\end{center}
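To make the role of the lexer concrete, the following sketch builds a
tiny tokenizer from a table of named regular expressions using only
Python's \code{re} module. It is not how Lark is implemented. The
names \code{INT} and \code{NEWLINE} come from the text above, whereas
\code{PRINT}, \code{LPAR}, \code{RPAR}, \code{PLUS}, \code{WS}, and the
function \code{tokenize} are assumptions made for this example.

```python
import re
from collections import namedtuple

# Stand-in for a token object: a type and the matched text.
Token = namedtuple("Token", ["type", "value"])

# One regular expression per kind of token (names partly hypothetical).
TOKEN_SPEC = [
    ("INT", r"-?[0-9]+"),
    ("NEWLINE", r"(?:\r?\n)+"),
    ("PRINT", r"print"),
    ("LPAR", r"\("),
    ("RPAR", r"\)"),
    ("PLUS", r"\+"),
    ("WS", r"[ \t]+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

def tokenize(text):
    """Split text into tokens, skipping spaces and tabs."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        if m.lastgroup != "WS":  # white space is recognized but dropped
            tokens.append(Token(m.lastgroup, m.group()))
        pos = m.end()
    return tokens

for tok in tokenize("print(1 + 3)\n"):
    print(tok)
```

Each token records which regular expression matched (its type) and the
text that was matched (its value), which is the same information
carried by the two token fields described earlier.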
|
|
|
|
|
|
-\noindent In Lark, the regular expression operators can be used both
|
|
|
|
-inside a regular expression, that is, between the \code{/} characters,
|
|
|
|
-and they can be used to combine regular expressions, outside the
|
|
|
|
-\code{/} characters.
|
|
|
|
|
|
|
|
\section{Grammars and Parse Trees}
|
|
\section{Grammars and Parse Trees}
|
|
\label{sec:CFG}
|
|
\label{sec:CFG}
|
|
@@ -4356,16 +4354,15 @@ specify the abstract syntax of a language. We now take a closer look
|
|
at using grammar rules to specify the concrete syntax. Recall that
|
|
at using grammar rules to specify the concrete syntax. Recall that
|
|
each rule has a left-hand side and a right-hand side where the
|
|
each rule has a left-hand side and a right-hand side where the
|
|
left-hand side is a nonterminal and the right-hand side is a pattern
|
|
left-hand side is a nonterminal and the right-hand side is a pattern
|
|
-that defines what can be parsed as that nonterminal.
|
|
|
|
-For concrete syntax, each right-hand side expresses a pattern for a
|
|
|
|
-string, instead of a pattern for an abstract syntax tree. In
|
|
|
|
-particular, each right-hand side is a sequence of
|
|
|
|
|
|
+that defines what can be parsed as that nonterminal. For concrete
|
|
|
|
+syntax, each right-hand side expresses a pattern for a string, instead
|
|
|
|
+of a pattern for an abstract syntax tree. In particular, each
|
|
|
|
+right-hand side is a sequence of
|
|
\emph{symbols}\index{subject}{symbol}, where a symbol is either a
|
|
\emph{symbols}\index{subject}{symbol}, where a symbol is either a
|
|
-terminal or nonterminal. A \emph{terminal}\index{subject}{terminal} is
|
|
|
|
-a string. The nonterminals play the same role as in the abstract
|
|
|
|
-syntax, defining categories of syntax. The nonterminals of a grammar
|
|
|
|
-include the tokens defined in the lexer and all the nonterminals
|
|
|
|
-defined by the grammar rules.
|
|
|
|
|
|
+terminal or a nonterminal. The nonterminals play the same role as in
|
|
|
|
+the abstract syntax, defining categories of syntax. The nonterminals
|
|
|
|
+of a grammar include the tokens defined in the lexer and all the
|
|
|
|
+nonterminals defined by the grammar rules.
|
|
|
|
|
|
As an example, let us take a closer look at the concrete syntax of the
|
|
As an example, let us take a closer look at the concrete syntax of the
|
|
\LangInt{} language, repeated here.
|
|
\LangInt{} language, repeated here.
|
|
@@ -4379,7 +4376,7 @@ As an example, let us take a closer look at the concrete syntax of the
|
|
\]
|
|
\]
|
|
The Lark syntax for grammar rules differs slightly from the variant of
|
|
The Lark syntax for grammar rules differs slightly from the variant of
|
|
BNF that we use in this book. In particular, the notation $::=$ is
|
|
BNF that we use in this book. In particular, the notation $::=$ is
|
|
-replaced by a single colon and the use of typewriter font for string
|
|
|
|
|
|
+replaced by a single colon, and the use of typewriter font for string
|
|
literals is replaced by quotation marks. The following grammar serves
|
|
literals is replaced by quotation marks. The following grammar serves
|
|
as a first draft of a Lark grammar for \LangInt{}.
|
|
as a first draft of a Lark grammar for \LangInt{}.
|
|
\begin{center}
|
|
\begin{center}
|
|
@@ -4400,25 +4397,25 @@ lang_int: stmt_list
|
|
\end{minipage}
|
|
\end{minipage}
|
|
\end{center}
|
|
\end{center}
|
|
|
|
|
|
-Let us begin by discussing the rule \code{exp: INT} which says that if
|
|
|
|
-the lexer matches a string to \code{INT}, then the parser also
|
|
|
|
|
|
+Let us begin by discussing the rule \code{exp: INT}, which says that
|
|
|
|
+if the lexer matches a string to \code{INT}, then the parser also
|
|
categorizes the string as an \code{exp}. Recall that in
|
|
categorizes the string as an \code{exp}. Recall that in
|
|
-Section~\ref{sec:grammar} we defined the corresponding \Int{}
|
|
|
|
-nonterminal with an English sentence. Here we specify \code{INT} more
|
|
|
|
-formally using a type of token \code{INT} and its regular expression
|
|
|
|
-\code{"-"? DIGIT+}.
|
|
|
|
|
|
+section~\ref{sec:grammar} we defined the corresponding \Int{}
|
|
|
|
+nonterminal with a sentence in English. Here we specify \code{INT}
|
|
|
|
+more formally using a type of token \code{INT} and its regular
|
|
|
|
+expression \code{"-"? DIGIT+}.
|
|
|
|
|
|
The rule \code{exp: exp "+" exp} says that any string that matches
|
|
The rule \code{exp: exp "+" exp} says that any string that matches
|
|
\code{exp}, followed by the \code{+} character, followed by another
|
|
\code{exp}, followed by the \code{+} character, followed by another
|
|
string that matches \code{exp}, is itself an \code{exp}. For example,
|
|
string that matches \code{exp}, is itself an \code{exp}. For example,
|
|
-the string \code{'1+3'} is an \code{exp} because \code{'1'} and
|
|
|
|
-\code{'3'} are both \code{exp} by the rule \code{exp: INT}, and then
|
|
|
|
-the rule for addition applies to categorize \code{'1+3'} as an
|
|
|
|
|
|
+the string \lstinline{'1+3'} is an \code{exp} because \lstinline{'1'} and
|
|
|
|
+\lstinline{'3'} are both \code{exp} by the rule \code{exp: INT}, and then
|
|
|
|
+the rule for addition applies to categorize \lstinline{'1+3'} as an
|
|
\code{exp}. We can visualize the application of grammar rules to parse
|
|
\code{exp}. We can visualize the application of grammar rules to parse
|
|
a string using a \emph{parse tree}\index{subject}{parse tree}. Each
|
|
a string using a \emph{parse tree}\index{subject}{parse tree}. Each
|
|
internal node in the tree is an application of a grammar rule and is
|
|
internal node in the tree is an application of a grammar rule and is
|
|
labeled with its left-hand side nonterminal. Each leaf node is a
|
|
labeled with its left-hand side nonterminal. Each leaf node is a
|
|
-substring of the input program. The parse tree for \code{'1+3'} is
|
|
|
|
|
|
+substring of the input program. The parse tree for \lstinline{'1+3'} is
|
|
shown in figure~\ref{fig:simple-parse-tree}.
|
|
shown in figure~\ref{fig:simple-parse-tree}.
|
|
|
|
|
|
\begin{figure}[tbp]
|
|
\begin{figure}[tbp]
|
|
@@ -4426,11 +4423,11 @@ shown in figure~\ref{fig:simple-parse-tree}.
|
|
\centering
|
|
\centering
|
|
\includegraphics[width=1.9in]{figs/simple-parse-tree}
|
|
\includegraphics[width=1.9in]{figs/simple-parse-tree}
|
|
\end{tcolorbox}
|
|
\end{tcolorbox}
|
|
-\caption{The parse tree for \code{'1+3'}.}
|
|
|
|
|
|
+\caption{The parse tree for \lstinline{'1+3'}.}
|
|
\label{fig:simple-parse-tree}
|
|
\label{fig:simple-parse-tree}
|
|
\end{figure}
|
|
\end{figure}
|
|
|
|
|
|
-The result of parsing \code{'1+3'} with this Lark grammar is the
|
|
|
|
|
|
+The result of parsing \lstinline{'1+3'} with this Lark grammar is the
|
|
following parse tree as represented by \code{Tree} and \code{Token}
|
|
following parse tree as represented by \code{Tree} and \code{Token}
|
|
objects.
|
|
objects.
|
|
\begin{lstlisting}
|
|
\begin{lstlisting}
|
|
@@ -4439,7 +4436,7 @@ objects.
|
|
Tree('exp', [Token('INT', '3')])])]),
|
|
Tree('exp', [Token('INT', '3')])])]),
|
|
Token('NEWLINE', '\n')])
|
|
Token('NEWLINE', '\n')])
|
|
\end{lstlisting}
|
|
\end{lstlisting}
|
|
-The nodes that come from the lexer are \code{Token} objects whereas
|
|
|
|
|
|
+The nodes that come from the lexer are \code{Token} objects, whereas
|
|
the nodes from the parser are \code{Tree} objects. Each \code{Tree}
|
|
the nodes from the parser are \code{Tree} objects. Each \code{Tree}
|
|
object has a \code{data} field containing the name of the nonterminal
|
|
object has a \code{data} field containing the name of the nonterminal
|
|
for the grammar rule that was applied. Each \code{Tree} object also
|
|
for the grammar rule that was applied. Each \code{Tree} object also
|
|
@@ -4449,9 +4446,9 @@ the grammar. For example, the \code{Tree} node for the addition
|
|
expression has only two children for the two integers but is missing
|
|
expression has only two children for the two integers but is missing
|
|
its middle child for the \code{"+"} terminal. This would be
|
|
its middle child for the \code{"+"} terminal. This would be
|
|
problematic except that Lark provides a mechanism for customizing the
|
|
problematic except that Lark provides a mechanism for customizing the
|
|
-\code{data} field of each \code{Tree} node based on which rule was
|
|
|
|
|
|
+\code{data} field of each \code{Tree} node on the basis of which rule was
|
|
applied. Next to each alternative in a grammar rule, write \code{->}
|
|
applied. Next to each alternative in a grammar rule, write \code{->}
|
|
-followed by a string that you would like to appear in the \code{data}
|
|
|
|
|
|
+followed by a string that you want to appear in the \code{data}
|
|
field. The following is a second draft of a Lark grammar for
|
|
field. The following is a second draft of a Lark grammar for
|
|
\LangInt{}, this time with more specific labels on the \code{Tree}
|
|
\LangInt{}, this time with more specific labels on the \code{Tree}
|
|
nodes.
|
|
nodes.
|
|
@@ -4487,7 +4484,7 @@ Tree('module',
|
|
|
|
|
|
A grammar is \emph{ambiguous}\index{subject}{ambiguous} when a string
|
|
A grammar is \emph{ambiguous}\index{subject}{ambiguous} when a string
|
|
can be parsed in more than one way. For example, consider the string
|
|
can be parsed in more than one way. For example, consider the string
|
|
-\code{'1-2+3'}. This string can parsed in two different ways using
|
|
|
|
|
|
+\lstinline{'1-2+3'}. This string can be parsed in two different ways using
|
|
our draft grammar, resulting in the two parse trees shown in
|
|
our draft grammar, resulting in the two parse trees shown in
|
|
figure~\ref{fig:ambig-parse-tree}. This example is problematic because
|
|
figure~\ref{fig:ambig-parse-tree}. This example is problematic because
|
|
interpreting the second parse tree would yield \code{-4} even though
|
|
interpreting the second parse tree would yield \code{-4} even though
|
|
@@ -4498,12 +4495,12 @@ the correct answer is \code{2}.
|
|
\centering
|
|
\centering
|
|
\includegraphics[width=0.95\textwidth]{figs/ambig-parse-tree}
|
|
\includegraphics[width=0.95\textwidth]{figs/ambig-parse-tree}
|
|
\end{tcolorbox}
|
|
\end{tcolorbox}
|
|
-\caption{The two parse trees for \code{'1-2+3'}.}
|
|
|
|
|
|
+\caption{The two parse trees for \lstinline{'1-2+3'}.}
|
|
\label{fig:ambig-parse-tree}
|
|
\label{fig:ambig-parse-tree}
|
|
\end{figure}
|
|
\end{figure}
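The consequence of the ambiguity can be checked with a few lines of
Python. In this sketch the two parse trees are written by hand as
nested tuples; the tuple encoding and the function \code{interp} are
assumptions made for illustration and are unrelated to Lark's
\code{Tree} objects.

```python
def interp(tree):
    """Evaluate a tree of the form int or (op, left, right)."""
    if isinstance(tree, int):
        return tree
    op, left, right = tree
    l, r = interp(left), interp(right)
    return l + r if op == "add" else l - r

# The two ways of parsing 1-2+3.
first = ("add", ("sub", 1, 2), 3)    # (1-2)+3
second = ("sub", 1, ("add", 2, 3))   # 1-(2+3)

print(interp(first))   # 2
print(interp(second))  # -4
```

Both trees cover the same input string, yet they denote different
values, which is why the grammar must rule one of them out.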
|
|
|
|
|
|
To deal with this problem we can change the grammar by categorizing
|
|
To deal with this problem we can change the grammar by categorizing
|
|
-the syntax in a more fine grained fashion. In this case we want to
|
|
|
|
|
|
+the syntax in a more fine-grained fashion. In this case we want to
|
|
disallow the application of the rule \code{exp: exp "-" exp} when the
|
|
disallow the application of the rule \code{exp: exp "-" exp} when the
|
|
child on the right is an addition. To do this we can replace the
|
|
child on the right is an addition. To do this we can replace the
|
|
\code{exp} after \code{"-"} with a nonterminal that categorizes all
|
|
\code{exp} after \code{"-"} with a nonterminal that categorizes all
|
|
@@ -4525,18 +4522,18 @@ exp_no_add: INT -> int
|
|
\end{center}
|
|
\end{center}
|
|
|
|
|
|
However, there remains some ambiguity in the grammar. For example, the
|
|
However, there remains some ambiguity in the grammar. For example, the
|
|
-string \code{'1-2-3'} can still be parsed in two different ways, as
|
|
|
|
-\code{'(1-2)-3'} (correct) or \code{'1-(2-3)'} (incorrect). That is
|
|
|
|
-to say, subtraction is left associative. Likewise, addition in Python
|
|
|
|
-is left associative. We also need to consider the interaction of unary
|
|
|
|
-subtraction with both addition and subtraction. How should we parse
|
|
|
|
-\code{'-1+2'}? Unary subtraction has higher
|
|
|
|
-\emph{precendence}\index{subject}{precedence} than addition and
|
|
|
|
-subtraction, so \code{'-1+2'} should parse the same as \code{'(-1)+2'}
|
|
|
|
-and not \code{'-(1+2)'}. The grammar in
|
|
|
|
|
|
+string \lstinline{'1-2-3'} can still be parsed in two different ways,
|
|
|
|
+as \lstinline{'(1-2)-3'} (correct) or \lstinline{'1-(2-3)'}
|
|
|
|
+(incorrect). That is, subtraction is left associative. Likewise,
|
|
|
|
+addition in Python is left associative. We also need to consider the
|
|
|
|
+interaction of unary subtraction with both addition and
|
|
|
|
+subtraction. How should we parse \lstinline{'-1+2'}? Unary subtraction
|
|
|
|
+has higher \emph{precedence}\index{subject}{precedence} than addition
|
|
|
|
+and subtraction, so \lstinline{'-1+2'} should parse the same as
|
|
|
|
+\lstinline{'(-1)+2'} and not \lstinline{'-(1+2)'}. The grammar in
|
|
figure~\ref{fig:Lint-lark-grammar} handles the associativity of
|
|
figure~\ref{fig:Lint-lark-grammar} handles the associativity of
|
|
addition and subtraction by using the nonterminal \code{exp\_hi} for
|
|
addition and subtraction by using the nonterminal \code{exp\_hi} for
|
|
-all the other expressions, and uses \code{exp\_hi} for the second
|
|
|
|
|
|
+all the other expressions, and it uses \code{exp\_hi} for the second
|
|
child in the rules for addition and subtraction. Furthermore, unary
|
|
child in the rules for addition and subtraction. Furthermore, unary
|
|
subtraction uses \code{exp\_hi} for its child.
|
|
subtraction uses \code{exp\_hi} for its child.
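One way to confirm the intended associativity and precedence is to ask
Python's own parser, available through the \code{parse} function of
the \code{ast} module. The sketch below inspects the shape of the
resulting trees; the variable names are chosen here for illustration.

```python
import ast

# mode="eval" parses a single expression; .body is its root node.
left_assoc = ast.parse("1-2-3", mode="eval").body
unary_first = ast.parse("-1+2", mode="eval").body

# 1-2-3 is (1-2)-3: the left child is itself a subtraction.
print(ast.dump(left_assoc))

# -1+2 is (-1)+2: the root is an addition whose left child is a negation.
print(ast.dump(unary_first))
```

A grammar for the same language should produce parse trees with the
same nesting, so these trees are a convenient reference when testing
for ambiguities.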
|
|
|
|
|
|
@@ -4573,12 +4570,12 @@ lang_int: stmt_list -> module
|
|
\section{From Parse Trees to Abstract Syntax Trees}
|
|
\section{From Parse Trees to Abstract Syntax Trees}
|
|
|
|
|
|
As we have seen, the output of a Lark parser is a parse tree, that is,
|
|
As we have seen, the output of a Lark parser is a parse tree, that is,
|
|
-a tree consisting of \code{Tree} and \code{Token} nodes. So the next
|
|
|
|
|
|
+a tree consisting of \code{Tree} and \code{Token} nodes. So, the next
|
|
step is to convert the parse tree to an abstract syntax tree. This can
|
|
step is to convert the parse tree to an abstract syntax tree. This can
|
|
be accomplished with a recursive function that inspects the
|
|
be accomplished with a recursive function that inspects the
|
|
\code{data} field of each node and then constructs the corresponding
|
|
\code{data} field of each node and then constructs the corresponding
|
|
AST node, using recursion to handle its children. The following is an
|
|
AST node, using recursion to handle its children. The following is an
|
|
-excerpt of the \code{parse\_tree\_to\_ast} function for \LangInt{}.
|
|
|
|
|
|
+excerpt from the \code{parse\_tree\_to\_ast} function for \LangInt{}.
|
|
|
|
|
|
\begin{center}
|
|
\begin{center}
|
|
\begin{minipage}{0.95\textwidth}
|
|
\begin{minipage}{0.95\textwidth}
|
|
@@ -4603,10 +4600,10 @@ def parse_tree_to_ast(e):
|
|
%
|
|
%
|
|
Use Lark to create a lexer and parser for \LangVar{}. Use Lark's
|
|
Use Lark to create a lexer and parser for \LangVar{}. Use Lark's
|
|
default parsing algorithm (Earley) with the \code{ambiguity} option
|
|
default parsing algorithm (Earley) with the \code{ambiguity} option
|
|
- set to \code{'explicit'} so that if your grammar is ambiguous, the
|
|
|
|
- output will include multiple parse trees which will indicate to you
|
|
|
|
|
|
+ set to \lstinline{'explicit'} so that if your grammar is ambiguous, the
|
|
|
|
+ output will include multiple parse trees that will indicate to you
|
|
that there is a problem with your grammar. Your parser should ignore
|
|
that there is a problem with your grammar. Your parser should ignore
|
|
- white space so we recommend using Lark's \code{\%ignore} directive
|
|
|
|
|
|
+ white space, so we recommend using Lark's \code{\%ignore} directive
|
|
as follows.
|
|
as follows.
|
|
\begin{lstlisting}
|
|
\begin{lstlisting}
|
|
WS: /[ \t\f\r\n]/+
|
|
WS: /[ \t\f\r\n]/+
|
|
@@ -4615,7 +4612,7 @@ WS: /[ \t\f\r\n]/+
|
|
Change your compiler from chapter~\ref{ch:Lvar} to use your
|
|
Change your compiler from chapter~\ref{ch:Lvar} to use your
|
|
Lark parser instead of using the \code{parse} function from
|
|
Lark parser instead of using the \code{parse} function from
|
|
the \code{ast} module. Test your compiler on all of the \LangVar{}
|
|
the \code{ast} module. Test your compiler on all of the \LangVar{}
|
|
-programs that you have created and create four additional programs
|
|
|
|
|
|
+programs that you have created, and create four additional programs
|
|
that test for ambiguities in your grammar.
|
|
that test for ambiguities in your grammar.
|
|
\end{exercise}
|
|
\end{exercise}
|
|
|
|
|
|
@@ -4626,21 +4623,22 @@ that test for ambiguities in your grammar.
|
|
In this section we discuss the parsing algorithm of
|
|
In this section we discuss the parsing algorithm of
|
|
\citet{Earley:1970ly}, the default algorithm used by Lark. The
|
|
\citet{Earley:1970ly}, the default algorithm used by Lark. The
|
|
algorithm is powerful in that it can handle any context-free grammar,
|
|
algorithm is powerful in that it can handle any context-free grammar,
|
|
-which makes it easy to use. However, it is not the most efficient
|
|
|
|
-parsing algorithm: it is $O(n^3)$ for ambiguous grammars and $O(n^2)$
|
|
|
|
-for unambiguous grammars, where $n$ is the number of tokens in the
|
|
|
|
-input string~\citep{Hopcroft06:_automata}. In section~\ref{sec:lalr}
|
|
|
|
-we learn about the LALR(1) algorithm, which is more efficient but
|
|
|
|
-cannot handle all context-free grammars.
|
|
|
|
|
|
+which makes it easy to use. However, it is not a particularly
|
|
|
|
+efficient parsing algorithm. The Earley algorithm is $O(n^3)$ for
|
|
|
|
+ambiguous grammars and $O(n^2)$ for unambiguous grammars, where $n$ is
|
|
|
|
+the number of tokens in the input
|
|
|
|
+string~\citep{Hopcroft06:_automata}. In section~\ref{sec:lalr} we
|
|
|
|
+learn about the LALR(1) algorithm, which is more efficient but cannot
|
|
|
|
+handle all context-free grammars.
|
|
|
|
|
|
The Earley algorithm can be viewed as an interpreter; it treats the
|
|
The Earley algorithm can be viewed as an interpreter; it treats the
|
|
grammar as the program being interpreted, and it treats the concrete
|
|
grammar as the program being interpreted, and it treats the concrete
|
|
syntax of the program-to-be-parsed as its input. The Earley algorithm
|
|
syntax of the program-to-be-parsed as its input. The Earley algorithm
|
|
uses a data structure called a \emph{chart}\index{subject}{chart} to
|
|
uses a data structure called a \emph{chart}\index{subject}{chart} to
|
|
-keep track of its progress and to memoize its results. The chart is an
|
|
|
|
|
|
+keep track of its progress and to store its results. The chart is an
|
|
array with one slot for each position in the input string, where
|
|
array with one slot for each position in the input string, where
|
|
position $0$ is before the first character and position $n$ is
|
|
position $0$ is before the first character and position $n$ is
|
|
-immediately after the last character. So the array has length $n+1$
|
|
|
|
|
|
+immediately after the last character. So, the array has length $n+1$
|
|
for an input string of length $n$. Each slot in the chart contains a
|
|
for an input string of length $n$. Each slot in the chart contains a
|
|
set of \emph{dotted rules}. A dotted rule is simply a grammar rule
|
|
set of \emph{dotted rules}. A dotted rule is simply a grammar rule
|
|
with a period indicating how much of its right-hand side has already
|
|
with a period indicating how much of its right-hand side has already
|
|
@@ -4669,8 +4667,8 @@ grammar in figure~\ref{fig:Lint-lark-grammar}, we place
|
|
\end{lstlisting}
|
|
\end{lstlisting}
|
|
in slot $0$ of the chart. The algorithm then proceeds with
|
|
in slot $0$ of the chart. The algorithm then proceeds with
|
|
\emph{prediction} actions in which it adds more dotted rules to the
|
|
\emph{prediction} actions in which it adds more dotted rules to the
|
|
-chart based on which nonterminals come immediately after a period. In
|
|
|
|
-the above, the nonterminal \code{stmt\_list} appears after a period,
|
|
|
|
|
|
+chart based on the nonterminals that come immediately after a period. In
|
|
|
|
+the dotted rule above, the nonterminal \code{stmt\_list} appears after a period,
|
|
so we add all the rules for \code{stmt\_list} to slot $0$, with a
|
|
so we add all the rules for \code{stmt\_list} to slot $0$, with a
|
|
period at the beginning of their right-hand sides, as follows:
|
|
period at the beginning of their right-hand sides, as follows:
|
|
\begin{lstlisting}
|
|
\begin{lstlisting}
|
|
@@ -4700,7 +4698,7 @@ We have exhausted the opportunities for prediction, so the algorithm
|
|
proceeds to \emph{scanning}, in which we inspect the next input token
|
|
proceeds to \emph{scanning}, in which we inspect the next input token
|
|
and look for a dotted rule at the current position that has a matching
|
|
and look for a dotted rule at the current position that has a matching
|
|
terminal immediately following the period. In our running example, the
|
|
terminal immediately following the period. In our running example, the
|
|
-first input token is \code{"print"} so we identify the rule in slot
|
|
|
|
|
|
+first input token is \code{"print"}, so we identify the rule in slot
|
|
$0$ of the chart where \code{"print"} follows the period:
|
|
$0$ of the chart where \code{"print"} follows the period:
|
|
\begin{lstlisting}
|
|
\begin{lstlisting}
|
|
stmt: . "print" "(" exp ")" (0)
|
|
stmt: . "print" "(" exp ")" (0)
|
|
@@ -4711,7 +4709,7 @@ to slot $1$ of the chart:
|
|
stmt: "print" . "(" exp ")" (0)
|
|
stmt: "print" . "(" exp ")" (0)
|
|
\end{lstlisting}
|
|
\end{lstlisting}
|
|
If the new dotted rule had a nonterminal after the period, we would
|
|
If the new dotted rule had a nonterminal after the period, we would
|
|
-need to carry out a prediction action, adding more dotted rules into
|
|
|
|
|
|
+need to carry out a prediction action, adding more dotted rules to
|
|
slot $1$. That is not the case, so we continue scanning. The next
|
|
slot $1$. That is not the case, so we continue scanning. The next
|
|
input token is \code{"("}, so we add the following to slot $2$ of the
|
|
input token is \code{"("}, so we add the following to slot $2$ of the
|
|
chart.
|
|
chart.
|
|
@@ -4733,12 +4731,12 @@ exp_hi: . "-" exp_hi (2)
|
|
exp_hi: . "(" exp ")" (2)
|
|
exp_hi: . "(" exp ")" (2)
|
|
\end{lstlisting}
|
|
\end{lstlisting}
|
|
With this prediction complete, we return to scanning, noting that the
|
|
With this prediction complete, we return to scanning, noting that the
|
|
-next input token is \code{"1"} which the lexer parses as an
|
|
|
|
|
|
+next input token is \code{"1"}, which the lexer parses as an
|
|
\code{INT}. There is a matching rule in slot $2$:
|
|
\code{INT}. There is a matching rule in slot $2$:
|
|
\begin{lstlisting}
|
|
\begin{lstlisting}
|
|
exp_hi: . INT (2)
|
|
exp_hi: . INT (2)
|
|
\end{lstlisting}
|
|
\end{lstlisting}
|
|
-so we advance the period and put the following rule is slot $3$.
|
|
|
|
|
|
+so we advance the period and put the following rule into slot $3$.
|
|
\begin{lstlisting}
|
|
\begin{lstlisting}
|
|
exp_hi: INT . (2)
|
|
exp_hi: INT . (2)
|
|
\end{lstlisting}
|
|
\end{lstlisting}
|
|
@@ -4746,7 +4744,7 @@ This brings us to \emph{completion} actions. When the period reaches
|
|
the end of a dotted rule, we recognize that the substring
|
|
the end of a dotted rule, we recognize that the substring
|
|
has matched the nonterminal on the left-hand side of the rule, in this case
|
|
has matched the nonterminal on the left-hand side of the rule, in this case
|
|
\code{exp\_hi}. We therefore need to advance the periods in any dotted
|
|
\code{exp\_hi}. We therefore need to advance the periods in any dotted
|
|
-rules in slot $2$ (the starting position for the finished rule) if
|
|
|
|
|
|
+rules into slot $2$ (the starting position for the finished rule) if
|
|
the period is immediately followed by \code{exp\_hi}. So we identify
|
|
the period is immediately followed by \code{exp\_hi}. So we identify
|
|
\begin{lstlisting}
|
|
\begin{lstlisting}
|
|
exp: . exp_hi (2)
|
|
exp: . exp_hi (2)
|
|
@@ -4777,14 +4775,14 @@ exp_hi: . "(" exp ")" (4)
|
|
\end{lstlisting}
|
|
\end{lstlisting}
|
|
The next input token is \code{"3"} which the lexer categorized as an
|
|
The next input token is \code{"3"} which the lexer categorized as an
|
|
\code{INT}, so we advance the period past \code{INT} for the rules in
|
|
\code{INT}, so we advance the period past \code{INT} for the rules in
|
|
-slot $4$, of which there is just one, and put the following in slot $5$.
|
|
|
|
|
|
+slot $4$, of which there is just one, and put the following into slot $5$.
|
|
\begin{lstlisting}[escapechar=$]
|
|
\begin{lstlisting}[escapechar=$]
|
|
exp_hi: INT . (4)
|
|
exp_hi: INT . (4)
|
|
\end{lstlisting}
|
|
\end{lstlisting}
|
|
|
|
|
|
The period at the end of the rule triggers a completion action for the
|
|
The period at the end of the rule triggers a completion action for the
|
|
rules in slot $4$, one of which has a period before \code{exp\_hi}.
|
|
rules in slot $4$, one of which has a period before \code{exp\_hi}.
|
|
-So we advance the period and put the following in slot $5$.
|
|
|
|
|
|
+So we advance the period and put the following into slot $5$.
|
|
\begin{lstlisting}[escapechar=$]
|
|
\begin{lstlisting}[escapechar=$]
|
|
exp: exp "+" exp_hi . (2)
|
|
exp: exp "+" exp_hi . (2)
|
|
\end{lstlisting}
|
|
\end{lstlisting}
|
|
@@ -4797,7 +4795,7 @@ exp: exp . "-" exp_hi (2)
|
|
\end{lstlisting}
|
|
\end{lstlisting}
|
|
|
|
|
|
We scan the next input token \code{")"}, placing the following dotted
|
|
We scan the next input token \code{")"}, placing the following dotted
|
|
-rule in slot $6$.
|
|
|
|
|
|
+rule into slot $6$.
|
|
\begin{lstlisting}[escapechar=$]
|
|
\begin{lstlisting}[escapechar=$]
|
|
stmt: "print" "(" exp ")" . (0)
|
|
stmt: "print" "(" exp ")" . (0)
|
|
\end{lstlisting}
|
|
\end{lstlisting}
|
|
@@ -4806,7 +4804,7 @@ This triggers the completion of \code{stmt} in slot $0$
|
|
stmt_list: stmt . NEWLINE stmt_list (0)
|
|
stmt_list: stmt . NEWLINE stmt_list (0)
|
|
\end{lstlisting}
|
|
\end{lstlisting}
|
|
The last input token is a \code{NEWLINE}, so we advance the period
|
|
The last input token is a \code{NEWLINE}, so we advance the period
|
|
-and place the new dotted rule in slot $7$.
|
|
|
|
|
|
+and place the new dotted rule into slot $7$.
|
|
\begin{lstlisting}
|
|
\begin{lstlisting}
|
|
stmt_list: stmt NEWLINE . stmt_list (0)
|
|
stmt_list: stmt NEWLINE . stmt_list (0)
|
|
\end{lstlisting}
|
|
\end{lstlisting}
|
|
@@ -4841,7 +4839,7 @@ algorithm.
|
|
\item The algorithm repeatedly applies the following three kinds of
|
|
\item The algorithm repeatedly applies the following three kinds of
|
|
actions for as long as there are opportunities to do so.
|
|
actions for as long as there are opportunities to do so.
|
|
\begin{itemize}
|
|
\begin{itemize}
|
|
- \item Prediction: if there is a rule in slot $k$ whose period comes
|
|
|
|
|
|
+ \item Prediction: If there is a rule in slot $k$ whose period comes
|
|
before a nonterminal, add the rules for that nonterminal into slot
|
|
before a nonterminal, add the rules for that nonterminal into slot
|
|
$k$, placing a period at the beginning of their right-hand sides
|
|
$k$, placing a period at the beginning of their right-hand sides
|
|
and recording their starting position as $k$.
|
|
and recording their starting position as $k$.
|
|
@@ -4856,7 +4854,7 @@ algorithm.
|
|
of the completed rule, then advance their period, placing the new
|
|
of the completed rule, then advance their period, placing the new
|
|
dotted rule in slot $k$.
|
|
dotted rule in slot $k$.
|
|
\end{itemize}
|
|
\end{itemize}
|
|
- While repeating these three actions, take care to never add
|
|
|
|
|
|
+ While repeating these three actions, take care never to add
|
|
duplicate dotted rules to the chart.
|
|
duplicate dotted rules to the chart.
|
|
\end{enumerate}
|
|
\end{enumerate}
|
|
|
|
|
|
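The three actions summarized above (prediction, scanning, and completion) can be sketched as a small recognizer. This is a minimal illustration under stated assumptions, not the implementation used by Lark: the names \code{Item}, \code{earley\_recognize}, and \code{GRAMMAR} are hypothetical, tokens are represented only by their terminal names, and the grammar is a fragment reconstructed from the dotted rules in the running example. The empty alternative for \code{stmt\_list} is an assumption.

```python
from collections import namedtuple

# A dotted rule: left-hand side, right-hand side, position of the period,
# and the chart slot where the rule started.
Item = namedtuple("Item", ["lhs", "rhs", "dot", "start"])


def next_symbol(item):
    """Return the symbol right after the period, or None if the rule is finished."""
    return item.rhs[item.dot] if item.dot < len(item.rhs) else None


def earley_recognize(grammar, start, tokens):
    """Return True if the token sequence can be derived from `start`.

    `grammar` maps each nonterminal to a list of right-hand sides (tuples).
    Any symbol that is not a key of `grammar` is treated as a terminal.
    """
    n = len(tokens)
    chart = [[] for _ in range(n + 1)]  # chart[k] holds the dotted rules of slot k

    def add(k, item):
        if item not in chart[k]:  # never add duplicate dotted rules
            chart[k].append(item)

    for rhs in grammar[start]:
        add(0, Item(start, rhs, 0, 0))

    for k in range(n + 1):
        i = 0
        while i < len(chart[k]):  # the slot may grow while we process it
            item = chart[k][i]
            i += 1
            sym = next_symbol(item)
            if sym is None:
                # Completion: advance the rules in the starting slot whose
                # period comes right before the finished nonterminal.
                for waiting in list(chart[item.start]):
                    if next_symbol(waiting) == item.lhs:
                        add(k, waiting._replace(dot=waiting.dot + 1))
            elif sym in grammar:
                # Prediction: add the rules for the nonterminal to slot k.
                for rhs in grammar[sym]:
                    add(k, Item(sym, rhs, 0, k))
                # If that nonterminal already matched the empty string at k,
                # advance right away (needed for rules with empty right-hand sides).
                if any(c.lhs == sym and c.start == k and next_symbol(c) is None
                       for c in chart[k]):
                    add(k, item._replace(dot=item.dot + 1))
            elif k < n and tokens[k] == sym:
                # Scanning: the next token matches the terminal after the period.
                add(k + 1, item._replace(dot=item.dot + 1))

    return any(it.lhs == start and it.start == 0 and next_symbol(it) is None
               for it in chart[n])


# Grammar fragment assumed for illustration only.
GRAMMAR = {
    "stmt_list": [(), ("stmt", "NEWLINE", "stmt_list")],
    "stmt": [("print", "(", "exp", ")")],
    "exp": [("exp", "+", "exp_hi"), ("exp", "-", "exp_hi"), ("exp_hi",)],
    "exp_hi": [("INT",), ("-", "exp_hi"), ("(", "exp", ")")],
}
```

Calling \code{earley\_recognize(GRAMMAR, "stmt\_list", ["print", "(", "INT", "+", "INT", ")", "NEWLINE"])} walks through the same seven chart slots as the running example and accepts the input. The sketch only recognizes; it does not build a parse tree, and it stores each slot as a list for clarity rather than speed.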
@@ -4883,22 +4881,22 @@ efficient enough to use with even the largest of input files.
|
|
\label{sec:lalr}
|
|
\label{sec:lalr}
|
|
|
|
|
|
The LALR(1) algorithm~\citep{DeRemer69,Anderson73} can be viewed as a
|
|
The LALR(1) algorithm~\citep{DeRemer69,Anderson73} can be viewed as a
|
|
-two phase approach in which it first compiles the grammar into a state
|
|
|
|
|
|
+two-phase approach in which it first compiles the grammar into a state
|
|
machine and then runs the state machine to parse an input string. The
|
|
machine and then runs the state machine to parse an input string. The
|
|
second phase has time complexity $O(n)$ where $n$ is the number of
|
|
second phase has time complexity $O(n)$ where $n$ is the number of
|
|
tokens in the input, so LALR(1) is the best one could hope for with
|
|
tokens in the input, so LALR(1) is the best one could hope for with
|
|
respect to efficiency.
|
|
respect to efficiency.
|
|
%
|
|
%
|
|
A particularly influential implementation of LALR(1) is the
|
|
A particularly influential implementation of LALR(1) is the
|
|
-\texttt{yacc} parser generator by \citet{Johnson:1979qy}, which stands
|
|
|
|
-for Yet Another Compiler Compiler.
|
|
|
|
|
|
+\texttt{yacc} parser generator by \citet{Johnson:1979qy};
|
|
|
|
+\texttt{yacc} stands for ``yet another compiler compiler''.
|
|
%
|
|
%
|
|
The LALR(1) state machine uses a stack to record its progress in
|
|
The LALR(1) state machine uses a stack to record its progress in
|
|
parsing the input string. Each element of the stack is a pair: a
|
|
parsing the input string. Each element of the stack is a pair: a
|
|
-state number and a grammar symbol (a terminal or nonterminal). The
|
|
|
|
-symbol characterizes the input that has been parsed so-far and the
|
|
|
|
|
|
+state number and a grammar symbol (a terminal or a nonterminal). The
|
|
|
|
+symbol characterizes the input that has been parsed so far, and the
|
|
state number is used to remember how to proceed once the next
|
|
state number is used to remember how to proceed once the next
|
|
-symbol-worth of input has been parsed. Each state in the machine
|
|
|
|
|
|
+symbol's worth of input has been parsed. Each state in the machine
|
|
represents where the parser stands in the parsing process with respect
|
|
represents where the parser stands in the parsing process with respect
|
|
to certain grammar rules. In particular, each state is associated with
|
|
to certain grammar rules. In particular, each state is associated with
|
|
a set of dotted rules.
|
|
a set of dotted rules.
|
|
@@ -4912,26 +4910,26 @@ exp: INT
|
|
stmt: "print" exp
|
|
stmt: "print" exp
|
|
start: stmt
|
|
start: stmt
|
|
\end{lstlisting}
|
|
\end{lstlisting}
|
|
-Consider state 1 in Figure~\ref{fig:shift-reduce}. The parser has just
|
|
|
|
|
|
+Consider state 1 in figure~\ref{fig:shift-reduce}. The parser has just
|
|
read in a \lstinline{"print"} token, so the top of the stack is
|
|
read in a \lstinline{"print"} token, so the top of the stack is
|
|
\lstinline{(1,"print")}. The parser is part of the way through parsing
|
|
\lstinline{(1,"print")}. The parser is part of the way through parsing
|
|
the input according to grammar rule 1, which is signified by showing
|
|
the input according to grammar rule 1, which is signified by showing
|
|
rule 1 with a period after the \code{"print"} token and before the
|
|
rule 1 with a period after the \code{"print"} token and before the
|
|
-\code{exp} nonterminal. There are several rules that could apply next,
|
|
|
|
-both rule 2 and 3, so state 1 also shows those rules with a period at
|
|
|
|
|
|
+\code{exp} nonterminal. There are two rules that could apply next,
|
|
|
|
+rules 2 and 3, so state 1 also shows those rules with a period at
|
|
the beginning of their right-hand sides. The edges between states
|
|
the beginning of their right-hand sides. The edges between states
|
|
indicate which transitions the machine should make depending on the
|
|
indicate which transitions the machine should make depending on the
|
|
next input token. So, for example, if the next input token is
|
|
next input token. So, for example, if the next input token is
|
|
\code{INT} then the parser will push \code{INT} and the target state 4
|
|
\code{INT} then the parser will push \code{INT} and the target state 4
|
|
-on the stack and transition to state 4. Suppose we are now at the end
|
|
|
|
-of the input. In state 4 it says we should reduce by rule 3, so we pop
|
|
|
|
|
|
+on the stack and transition to state 4. Suppose that we are now at the end
|
|
|
|
+of the input. State 4 says that we should reduce by rule 3, so we pop
|
|
from the stack the same number of items as the number of symbols in
|
|
from the stack the same number of items as the number of symbols in
|
|
the right-hand side of the rule, in this case just one. We then
|
|
the right-hand side of the rule, in this case just one. We then
|
|
momentarily jump to the state at the top of the stack (state 1) and
|
|
momentarily jump to the state at the top of the stack (state 1) and
|
|
then follow the goto edge that corresponds to the left-hand side of
|
|
then follow the goto edge that corresponds to the left-hand side of
|
|
the rule we just reduced by, in this case \code{exp}, so we arrive at
|
|
the rule we just reduced by, in this case \code{exp}, so we arrive at
|
|
state 3. (A slightly longer example parse is shown in
|
|
state 3. (A slightly longer example parse is shown in
|
|
-Figure~\ref{fig:shift-reduce}.)
|
|
|
|
|
|
+figure~\ref{fig:shift-reduce}.)
|
|
|
|
|
|
\begin{figure}[htbp]
|
|
\begin{figure}[htbp]
|
|
\centering
|
|
\centering
|
|
@@ -4940,11 +4938,11 @@ Figure~\ref{fig:shift-reduce}.)
|
|
\label{fig:shift-reduce}
|
|
\label{fig:shift-reduce}
|
|
\end{figure}
|
|
\end{figure}
|
|
|
|
|
|
-In general, the algorithm works as follows. Set the current state to
|
|
|
|
|
|
+In general, the algorithm works as follows. First, set the current state to
|
|
state $0$. Then repeat the following, looking at the next input token.
|
|
state $0$. Then repeat the following, looking at the next input token.
|
|
\begin{itemize}
|
|
\begin{itemize}
|
|
\item If there is a shift edge for the input token in the
|
|
\item If there is a shift edge for the input token in the
|
|
- current state, push the edge's target state and the input token on
|
|
|
|
|
|
+ current state, push the edge's target state and the input token onto
|
|
the stack and proceed to the edge's target state.
|
|
the stack and proceed to the edge's target state.
|
|
\item If there is a reduce action for the input token in the current
|
|
\item If there is a reduce action for the input token in the current
|
|
state, pop $k$ elements from the stack, where $k$ is the number of
|
|
state, pop $k$ elements from the stack, where $k$ is the number of
|
|
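The shift and reduce steps described above can be sketched as a table-driven loop. This is an illustration under stated assumptions rather than the machine in the figure: the tables \code{ACTION}, \code{GOTO}, and \code{RULES} are written by hand for a hypothetical unambiguous variant of the small grammar, in which rule 2 is assumed to be \code{exp: exp "+" INT}. States 1, 3, and 4 are numbered to agree with the prose; the remaining state numbers are arbitrary. A real LALR(1) generator would compute these tables from the grammar.

```python
# Rule number -> (left-hand side, number of symbols on the right-hand side).
# Assumed rules: 1. stmt: "print" exp   2. exp: exp "+" INT   3. exp: INT
RULES = {1: ("stmt", 2), 2: ("exp", 3), 3: ("exp", 1)}

# (state, next token) -> action. Hand-written for illustration.
ACTION = {
    (0, "print"): ("shift", 1),
    (1, "INT"): ("shift", 4),
    (2, "$end"): ("accept", None),
    (3, "+"): ("shift", 5),
    (3, "$end"): ("reduce", 1),
    (4, "+"): ("reduce", 3),
    (4, "$end"): ("reduce", 3),
    (5, "INT"): ("shift", 6),
    (6, "+"): ("reduce", 2),
    (6, "$end"): ("reduce", 2),
}

# (state, nonterminal) -> target state of the goto edge.
GOTO = {(0, "stmt"): 2, (1, "exp"): 3}


def parse(tokens):
    """Run the state machine and return the rule numbers in reduction order."""
    tokens = list(tokens) + ["$end"]  # explicit end-of-input marker
    stack = [(0, None)]               # pairs of (state number, grammar symbol)
    reductions = []
    pos = 0
    while True:
        state = stack[-1][0]
        token = tokens[pos]
        kind, arg = ACTION.get((state, token), ("error", None))
        if kind == "shift":
            # Push the target state and the token, then consume the token.
            stack.append((arg, token))
            pos += 1
        elif kind == "reduce":
            lhs, k = RULES[arg]
            del stack[len(stack) - k:]        # pop one element per right-hand-side symbol
            exposed = stack[-1][0]            # state now at the top of the stack
            stack.append((GOTO[(exposed, lhs)], lhs))  # follow the goto edge
            reductions.append(arg)
        elif kind == "accept":
            return reductions
        else:
            raise SyntaxError(f"unexpected {token!r} in state {state}")
```

For the input \code{["print", "INT", "+", "INT"]}, the loop shifts \code{"print"} and \code{INT}, reduces by rule 3 in state 4, follows the goto edge from state 1 to state 3, and eventually returns \code{[3, 2, 1]}. Each token is shifted once and each reduction pops a bounded number of stack elements, which is where the linear running time comes from.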
@@ -8843,10 +8841,6 @@ upcoming \code{explicate\_control} pass.
|
|
|
|
|
|
\newcommand{\LifMonadASTPython}{
|
|
\newcommand{\LifMonadASTPython}{
|
|
\begin{array}{rcl}
|
|
\begin{array}{rcl}
|
|
-%% \itm{binaryop} &::=& \code{Add()} \MID \code{Sub()} \\
|
|
|
|
-%% \itm{cmp} &::= & \code{Eq()} \MID \code{NotEq()} \MID \code{Lt()} \MID \code{LtE()} \MID \code{Gt()} \MID \code{GtE()} \\
|
|
|
|
-%% \itm{unaryop} &::=& \code{USub()} \MID \code{Not()} \\
|
|
|
|
-%% \itm{bool} &::=& \code{True} \MID \code{False} \\
|
|
|
|
\Atm &::=& \BOOL{\itm{bool}}\\
|
|
\Atm &::=& \BOOL{\itm{bool}}\\
|
|
\Exp &::=& \CMP{\Atm}{\itm{cmp}}{\Atm} \MID \IF{\Exp}{\Exp}{\Exp} \\
|
|
\Exp &::=& \CMP{\Atm}{\itm{cmp}}{\Atm} \MID \IF{\Exp}{\Exp}{\Exp} \\
|
|
&\MID& \BEGIN{\Stmt^{*}}{\Exp}\\
|
|
&\MID& \BEGIN{\Stmt^{*}}{\Exp}\\
|
|
@@ -23609,4 +23603,4 @@ registers.
|
|
% LocalWords: multilanguage Prelim shinan DeRemer lexer Lesk LPAR cb
|
|
% LocalWords: multilanguage Prelim shinan DeRemer lexer Lesk LPAR cb
|
|
% LocalWords: RPAR abcbab abc bzca usub paren expr lang WS Tomita qr
|
|
% LocalWords: RPAR abcbab abc bzca usub paren expr lang WS Tomita qr
|
|
% LocalWords: subparses LCCN ebook hardcover epub pdf LCSH LCC DDC
|
|
% LocalWords: subparses LCCN ebook hardcover epub pdf LCSH LCC DDC
|
|
-% LocalWords: LC partialevaluation pythonEd TOC
|
|
|
|
|
|
+% LocalWords: LC partialevaluation pythonEd TOC TrappedError
|