@@ -4227,7 +4227,7 @@ all, fast code is useless if it produces incorrect results!
In this chapter we learn how to use the Lark parser
framework~\citep{shinan20:_lark_docs} to translate the concrete syntax
of \LangInt{} (a sequence of characters) into an abstract syntax tree.
-You will then be asked to use Lark to create a parser for \LangVar{}.
+You are then asked to create a parser for \LangVar{} using Lark.
We also describe the parsing algorithms used inside Lark, studying the
\citet{Earley:1970ly} and LALR(1) algorithms~\citep{DeRemer69,Anderson73}.
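To make the starting point concrete, the following sketch is only an
illustration (the grammar is a hypothetical cut-down version of the
\LangInt{} rules that appear later in this section): it shows how a
grammar written in Lark's specification language might be handed to
Lark and used to parse a string into a tree.
\begin{lstlisting}
from lark import Lark

# Hypothetical cut-down grammar, mirroring the stmt/exp/exp_hi rules
# used in the Earley walkthrough below.
grammar = r"""
stmt: "print" "(" exp ")"
exp: exp "+" exp_hi
   | exp "-" exp_hi
   | exp_hi
exp_hi: INT
   | "input_int" "(" ")"
   | "-" exp_hi
   | "(" exp ")"
%import common.INT
%import common.WS
%ignore WS
"""

parser = Lark(grammar, start='stmt')   # Earley is Lark's default algorithm
tree = parser.parse("print(1 + 3)")
print(tree.pretty())                   # inspect the resulting parse tree
\end{lstlisting}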
@@ -4238,7 +4238,7 @@ properly requires some knowledge. In particular, we must learn about
its specification languages and we must learn how to deal with
ambiguity in our language specifications. Also, some algorithms, such
as LALR(1), place restrictions on the grammars they can handle, in
-which case knowing the algorithm help with trying to decipher the
+which case knowing the algorithm helps with trying to decipher the
error messages.

The process of parsing is traditionally subdivided into two phases:
@@ -4646,7 +4646,7 @@ that test for ambiguities in your grammar.
In this section we discuss the parsing algorithm of
\citet{Earley:1970ly}, the default algorithm used by Lark. The
algorithm is powerful in that it can handle any context-free grammar,
-which makes it easy to use. However, it is not a particularly
+which makes it easy to use, but it is not a particularly
efficient parsing algorithm. Earley's algorithm is $O(n^3)$ for
ambiguous grammars and $O(n^2)$ for unambiguous grammars, where $n$ is
the number of tokens in the input
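Because Earley is the default, the choice of algorithm is usually only
visible when you opt out of it. The snippet below is a hypothetical
illustration (the deliberately ambiguous toy grammar is not from this
chapter) of selecting Earley explicitly and asking Lark to surface
ambiguity rather than silently resolving it.
\begin{lstlisting}
from lark import Lark

# Hypothetical toy grammar that is deliberately ambiguous:
# "1+2+3" can be grouped as (1+2)+3 or as 1+(2+3).
grammar = r"""
start: start "+" start
     | INT
%import common.INT
"""

parser = Lark(grammar, parser='earley',    # also the default choice
              ambiguity='explicit')        # expose ambiguity as _ambig nodes
print(parser.parse("1+2+3").pretty())
\end{lstlisting}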
@@ -4727,7 +4727,7 @@ $0$ of the chart where \code{"print"} follows the period:
stmt: . "print" "(" exp ")" (0)
\end{lstlisting}
We advance the period past \code{"print"} and add the resulting rule
-to slot $1$ of the chart:
+to slot $1$:
\begin{lstlisting}
stmt: "print" . "(" exp ")" (0)
\end{lstlisting}
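One way to picture the chart is the following Python sketch (an
illustration only, not how Lark represents its internal state): a
dotted rule is a tuple of the left-hand side, the right-hand side, the
position of the period, and the starting slot, and the chart is a list
with one set of dotted rules per position in the input.
\begin{lstlisting}
from collections import namedtuple

# A dotted rule: left-hand side, right-hand side, period position, start slot.
Item = namedtuple('Item', ['lhs', 'rhs', 'dot', 'start'])

tokens = ['print', '(', '1', '+', '3', ')']
chart = [set() for _ in range(len(tokens) + 1)]   # one slot per position

# Slot 0: the period sits at the front of the rule for stmt.
chart[0].add(Item('stmt', ('"print"', '"("', 'exp', '")"'), 0, 0))

# Scanning "print" advances the period and places the result in slot 1.
chart[1].add(Item('stmt', ('"print"', '"("', 'exp', '")"'), 1, 0))
\end{lstlisting}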
@@ -4745,13 +4745,13 @@ prediction actions, adding dotted rules for \code{exp} and
\code{exp\_hi} to slot $2$ with a period at the beginning and with
starting position $2$.
\begin{lstlisting}[escapechar=$]
- exp: . exp "+" exp_hi (2)
- exp: . exp "-" exp_hi (2)
- exp: . exp_hi (2)
- exp_hi: . INT (2)
+ exp: . exp "+" exp_hi (2)
+ exp: . exp "-" exp_hi (2)
+ exp: . exp_hi (2)
+ exp_hi: . INT (2)
exp_hi: . "input_int" "(" ")" (2)
- exp_hi: . "-" exp_hi (2)
- exp_hi: . "(" exp ")" (2)
+ exp_hi: . "-" exp_hi (2)
+ exp_hi: . "(" exp ")" (2)
\end{lstlisting}
With this prediction complete, we return to scanning, noting that the
next input token is \code{"1"}, which the lexer parses as an
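A sketch of the prediction action in this illustrative representation
(reusing the \code{Item} tuple and \code{chart} list from the earlier
listing, with the grammar assumed to be a list of (lhs, rhs) pairs)
might look as follows. The loop repeats because predicted rules can
trigger further predictions, as \code{exp} does for \code{exp\_hi}
above.
\begin{lstlisting}
def predict(chart, i, grammar):
    # Keep adding dotted rules until no new ones appear: predicting exp
    # also brings in the rules for exp_hi, and so on.
    changed = True
    while changed:
        changed = False
        for item in list(chart[i]):
            if item.dot == len(item.rhs):
                continue                    # the period is at the end
            next_sym = item.rhs[item.dot]
            for lhs, rhs in grammar:
                if lhs == next_sym:
                    new_item = Item(lhs, rhs, 0, i)
                    if new_item not in chart[i]:
                        chart[i].add(new_item)
                        changed = True
\end{lstlisting}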
@@ -4791,10 +4791,10 @@ we add the following to slot $4$.
The period precedes the nonterminal \code{exp\_hi}, so prediction adds
the following dotted rules to slot $4$ of the chart.
\begin{lstlisting}[escapechar=$]
- exp_hi: . INT (4)
+ exp_hi: . INT (4)
exp_hi: . "input_int" "(" ")" (4)
- exp_hi: . "-" exp_hi (4)
- exp_hi: . "(" exp ")" (4)
+ exp_hi: . "-" exp_hi (4)
+ exp_hi: . "(" exp ")" (4)
\end{lstlisting}
The next input token is \code{"3"}, which the lexer categorizes as an
\code{INT}, so we advance the period past \code{INT} for the rules in
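The scanning action is the simplest of the three in the illustrative
representation sketched earlier (same assumptions as before): when the
symbol after the period matches the terminal that the lexer assigned to
the current token, the period moves past it and the advanced dotted
rule lands in the next slot.
\begin{lstlisting}
def scan(chart, i, terminal):
    # Advance every dotted rule in slot i whose period precedes the
    # terminal of the current token, placing the result in slot i + 1.
    for item in list(chart[i]):
        if item.dot < len(item.rhs) and item.rhs[item.dot] == terminal:
            chart[i + 1].add(Item(item.lhs, item.rhs,
                                  item.dot + 1, item.start))
\end{lstlisting}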
@@ -4813,8 +4813,8 @@ This triggers another completion action for the rules in slot $2$ that
have a period before \code{exp}.
\begin{lstlisting}[escapechar=$]
stmt: "print" "(" exp . ")" (0)
- exp: exp . "+" exp_hi (2)
- exp: exp . "-" exp_hi (2)
+ exp: exp . "+" exp_hi (2)
+ exp: exp . "-" exp_hi (2)
\end{lstlisting}

We scan the next input token \code{")"}, placing the following dotted
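In the same illustrative representation, the completion action looks
back to the slot where the finished rule started and advances every
dotted rule there whose period precedes the finished rule's left-hand
side, placing the results in the current slot.
\begin{lstlisting}
def complete(chart, i, finished):
    # finished is a dotted rule in slot i whose period is at the end.
    assert finished.dot == len(finished.rhs)
    for item in list(chart[finished.start]):
        if item.dot < len(item.rhs) and item.rhs[item.dot] == finished.lhs:
            chart[i].add(Item(item.lhs, item.rhs,
                              item.dot + 1, item.start))
\end{lstlisting}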
@@ -4887,9 +4887,9 @@ parse tree. The basic idea is simple, but building parse trees in an
efficient way is more complex, requiring a data structure called a
shared packed parse forest~\citep{Tomita:1985qr}. The simple idea is
to attach a partial parse tree to every dotted rule in the chart.
-Initially, the tree node associated with a dotted rule has no
+Initially, the node associated with a dotted rule has no
children. As the period moves to the right, the nodes from the
-subparses are added as children to the tree node.
+subparses are added as children to the node.
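In the illustrative representation used above, the simple (inefficient)
version of this idea amounts to carrying a tuple of children alongside
each dotted rule and extending it whenever the period advances; roughly
speaking, a shared packed parse forest avoids the duplication that this
copying entails.
\begin{lstlisting}
from collections import namedtuple

# Extend the dotted-rule tuple with the children gathered so far
# (a tuple, so items remain hashable and can live in the chart's sets).
TreeItem = namedtuple('TreeItem', ['lhs', 'rhs', 'dot', 'start', 'children'])

def advance(item, child):
    # Move the period one symbol to the right, recording the subparse.
    return TreeItem(item.lhs, item.rhs, item.dot + 1, item.start,
                    item.children + (child,))
\end{lstlisting}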
As mentioned at the beginning of this section, Earley's algorithm is
$O(n^2)$ for unambiguous grammars, which means that it can parse input
@@ -4954,7 +4954,7 @@ the rule we just reduced by, in this case \code{exp}, so we arrive at
state 3. (A slightly longer example parse is shown in
figure~\ref{fig:shift-reduce}.)
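To give a feel for the machinery behind such a trace, here is a heavily
simplified shift-reduce driver over a hand-written table for a toy
grammar in which \code{E} derives either \code{E "+" INT} or
\code{INT}. It is only a sketch of the mechanism; the states and table
entries are not those used in the figure.
\begin{lstlisting}
# A toy LALR(1)-style table, written by hand for the grammar
#   E -> E "+" INT | INT
# States: 0 start, 1 after INT, 2 after E, 3 after E "+", 4 after E "+" INT.
ACTION = {
    (0, 'INT'): ('shift', 1),
    (1, '+'): ('reduce', 'E', 1),   (1, '$'): ('reduce', 'E', 1),
    (2, '+'): ('shift', 3),         (2, '$'): ('accept',),
    (3, 'INT'): ('shift', 4),
    (4, '+'): ('reduce', 'E', 3),   (4, '$'): ('reduce', 'E', 3),
}
GOTO = {(0, 'E'): 2}

def parse(tokens):
    stack = [0]                           # a stack of state numbers
    tokens = tokens + ['$']               # end-of-input marker
    pos = 0
    while True:
        kind = 'INT' if tokens[pos].isdigit() else tokens[pos]
        act = ACTION[(stack[-1], kind)]
        if act[0] == 'shift':             # push the next state, consume token
            stack.append(act[1])
            pos += 1
        elif act[0] == 'reduce':          # pop the rule's right-hand side,
            _, lhs, length = act          # then follow the goto entry
            del stack[-length:]
            stack.append(GOTO[(stack[-1], lhs)])
        else:
            return True                   # accept

print(parse(['1', '+', '2', '+', '3']))   # True
\end{lstlisting}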
-\begin{figure}[htbp]
+\begin{figure}[tbp]
\centering
\includegraphics[width=5.0in]{figs/shift-reduce-conflict}
\caption{An LALR(1) parse table and a trace of an example run.}