|
@@ -4949,20 +4949,20 @@ state $0$. Then repeat the following, looking at the next input token.
|
|
symbols in the right-hand side of the rule being reduced. Jump to
|
|
symbols in the right-hand side of the rule being reduced. Jump to
|
|
the state at the top of the stack and then follow the goto edge for
|
|
the state at the top of the stack and then follow the goto edge for
|
|
the nonterminal that matches the left-hand side of the rule that we
|
|
the nonterminal that matches the left-hand side of the rule that we
|
|
- reducing by. Push the edge's target state and the nonterminal on the
|
|
|
|
|
|
+ are reducing by. Push the edge's target state and the nonterminal on the
|
|
stack.
|
|
stack.
|
|
\end{itemize}
|
|
\end{itemize}
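The shift and reduce steps listed above can be sketched as a small driver loop. This is only an illustration under stated assumptions: the three-rule grammar, the state numbers, and the names \code{SHIFT}, \code{GOTO}, \code{REDUCE}, \code{ACCEPT}, and \code{parse} are hypothetical and are not the table of figure~\ref{fig:shift-reduce}.

```python
# Hypothetical grammar (not the one in the figure):
#   rule 1: exp ::= INT
#   rule 2: exp ::= exp PLUS INT
# Rule number -> (left-hand side, number of right-hand-side symbols).
RULES = {1: ("exp", 1), 2: ("exp", 3)}

# Hand-built table for this toy grammar.
SHIFT = {(0, "INT"): 1, (2, "PLUS"): 3, (3, "INT"): 4}
GOTO = {(0, "exp"): 2}
REDUCE = {1: 1, 4: 2}      # state -> rule number, applied for every token
ACCEPT = {(2, "END")}      # accept in state 2 at the end of the input


def parse(tokens):
    """Run the shift-reduce loop and return the list of actions taken."""
    tokens = list(tokens) + ["END"]
    stack = [(0, None)]    # pairs of (state, symbol); start in state 0
    pos = 0
    trace = []
    while True:
        state = stack[-1][0]
        tok = tokens[pos]
        if (state, tok) in ACCEPT:
            return trace
        if (state, tok) in SHIFT:
            # Shift: push the target state and the token, advance the input.
            stack.append((SHIFT[(state, tok)], tok))
            pos += 1
            trace.append(("shift", tok))
        elif state in REDUCE:
            # Reduce: pop one stack entry per right-hand-side symbol, then
            # follow the goto edge for the rule's left-hand side.
            n = REDUCE[state]
            lhs, length = RULES[n]
            del stack[-length:]
            target = GOTO[(stack[-1][0], lhs)]
            stack.append((target, lhs))
            trace.append(("reduce", n))
        else:
            raise SyntaxError(f"unexpected token {tok} in state {state}")


print(parse(["INT", "PLUS", "INT"]))
```

Because this toy table has no state with both a shift and a reduce action for the same token, the loop never has to choose between two actions.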
|
|
|
|
|
|
-Notice that in state 6 of Figure~\ref{fig:shift-reduce} there is both
|
|
|
|
|
|
+Notice that in state 6 of figure~\ref{fig:shift-reduce} there is both
|
|
a shift and a reduce action for the token \lstinline{PLUS}, so the
|
|
a shift and a reduce action for the token \lstinline{PLUS}, so the
|
|
algorithm does not know which action to take in this case. When a
|
|
algorithm does not know which action to take in this case. When a
|
|
state has both a shift and a reduce action for the same token, we say
|
|
state has both a shift and a reduce action for the same token, we say
|
|
there is a \emph{shift/reduce conflict}. In this case, the conflict
|
|
there is a \emph{shift/reduce conflict}. In this case, the conflict
|
|
-will arise, for example, when trying to parse the input
|
|
|
|
-\lstinline{print 1 + 2 + 3}. After having consumed \lstinline{print 1 + 2}
|
|
|
|
-the parser will be in state 6, and it will not know whether to
|
|
|
|
-reduce to form an \code{exp} of \lstinline{1 + 2}, or whether it
|
|
|
|
-should proceed by shifting the next \lstinline{+} from the input.
|
|
|
|
|
|
+will arise, for example, in trying to parse the input
|
|
|
|
+\lstinline{print 1 + 2 + 3}. After having consumed \lstinline{print 1 + 2},
|
|
|
|
+the parser will be in state 6 and will not know whether to
|
|
|
|
+reduce to form an \code{exp} of \lstinline{1 + 2} or
|
|
|
|
+to proceed by shifting the next \lstinline{+} from the input.
|
|
|
|
|
|
A similar kind of problem, known as a \emph{reduce/reduce} conflict,
|
|
A similar kind of problem, known as a \emph{reduce/reduce} conflict,
|
|
arises when there are two reduce actions in a state for the same
|
|
arises when there are two reduce actions in a state for the same
|
|
@@ -4973,32 +4973,32 @@ generated from the grammar, which we discuss next.
|
|
The parse table is generated one state at a time. State 0 represents
|
|
The parse table is generated one state at a time. State 0 represents
|
|
the start of the parser. We add the grammar rule for the start symbol
|
|
the start of the parser. We add the grammar rule for the start symbol
|
|
to this state with a period at the beginning of the right-hand side,
|
|
to this state with a period at the beginning of the right-hand side,
|
|
-similar to the initialization phase of the Earley parser. If the
|
|
|
|
|
|
+similarly to the initialization phase of the Earley parser. If the
|
|
period appears immediately before another nonterminal, we add all the
|
|
period appears immediately before another nonterminal, we add all the
|
|
rules with that nonterminal on the left-hand side. Again, we place a
|
|
rules with that nonterminal on the left-hand side. Again, we place a
|
|
-period at the beginning of the right-hand side of each the new
|
|
|
|
-rules. This process, called \emph{state closure}, is continued
|
|
|
|
-until there are no more rules to add (similar to the prediction
|
|
|
|
|
|
+period at the beginning of the right-hand side of each new
|
|
|
|
+rule. This process, called \emph{state closure}, is continued
|
|
|
|
+until there are no more rules to add (similarly to the prediction
|
|
actions of an Earley parser). We then examine each dotted rule in the
|
|
actions of an Earley parser). We then examine each dotted rule in the
|
|
-current state $I$. Suppose a dotted rule has the form $A ::=
|
|
|
|
-s_1.\,X s_2$, where $A$ and $X$ are symbols and $s_1$ and $s_2$
|
|
|
|
-are sequences of symbols. We create a new state, call it $J$. If $X$
|
|
|
|
-is a terminal, we create a shift edge from $I$ to $J$ (analogous to
|
|
|
|
|
|
+current state $I$. Suppose that a dotted rule has the form $A ::=
|
|
|
|
+s_1.\,X \,s_2$, where $A$ and $X$ are symbols and $s_1$ and $s_2$
|
|
|
|
+are sequences of symbols. We create a new state and call it $J$. If $X$
|
|
|
|
+is a terminal, we create a shift edge from $I$ to $J$ (analogously to
|
|
scanning in Earley), whereas if $X$ is a nonterminal, we create a
|
|
scanning in Earley), whereas if $X$ is a nonterminal, we create a
|
|
goto edge from $I$ to $J$. We then need to add some dotted rules to
|
|
goto edge from $I$ to $J$. We then need to add some dotted rules to
|
|
state $J$. We start by adding all dotted rules from state $I$ that
|
|
state $J$. We start by adding all dotted rules from state $I$ that
|
|
-have the form $B ::= s_1.\,Xs_2$ (where $B$ is any nonterminal and
|
|
|
|
-$s_1$ and $s_2$ are arbitrary sequences of symbols), but with
|
|
|
|
|
|
+have the form $B ::= s_1.\,X\,s_2$ (where $B$ is any nonterminal and
|
|
|
|
+$s_1$ and $s_2$ are arbitrary sequences of symbols), with
|
|
the period moved past the $X$. (This is analogous to completion in
|
|
the period moved past the $X$. (This is analogous to completion in
|
|
the Earley algorithm.) We then perform state closure on $J$. This
|
|
the Earley algorithm.) We then perform state closure on $J$. This
|
|
process repeats until there are no more states or edges to add.
|
|
process repeats until there are no more states or edges to add.
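The construction just described can be sketched as follows. This is a minimal sketch, not a definitive implementation: the toy grammar and the names \code{GRAMMAR}, \code{closure}, \code{advance}, and \code{build\_states} are assumptions made for illustration. A dotted rule is represented as a pair of a rule index and a period position. The sketch also reuses an existing state whenever the target set of dotted rules has already been created, which is what lets the process terminate.

```python
# Hypothetical grammar; rule 0 is the start rule.
GRAMMAR = [
    ("start", ("exp",)),
    ("exp", ("INT",)),
    ("exp", ("exp", "PLUS", "INT")),
]
NONTERMINALS = {lhs for lhs, _ in GRAMMAR}


def closure(items):
    """State closure: add rules for each nonterminal that follows a period."""
    state = set(items)
    worklist = list(items)
    while worklist:
        rule, dot = worklist.pop()
        rhs = GRAMMAR[rule][1]
        if dot < len(rhs) and rhs[dot] in NONTERMINALS:
            for i, (lhs, _) in enumerate(GRAMMAR):
                if lhs == rhs[dot] and (i, 0) not in state:
                    state.add((i, 0))
                    worklist.append((i, 0))
    return frozenset(state)


def advance(state, symbol):
    """Move the period past `symbol` in every dotted rule that allows it."""
    moved = {
        (rule, dot + 1)
        for rule, dot in state
        if dot < len(GRAMMAR[rule][1]) and GRAMMAR[rule][1][dot] == symbol
    }
    return closure(moved)


def build_states():
    """Generate all states plus their shift and goto edges."""
    start = closure({(0, 0)})
    states = [start]
    edges = {}
    worklist = [start]
    while worklist:
        state = worklist.pop()
        symbols = sorted(
            {GRAMMAR[r][1][d] for r, d in state if d < len(GRAMMAR[r][1])}
        )
        for symbol in symbols:
            target = advance(state, symbol)
            if target not in states:
                states.append(target)
                worklist.append(target)
            kind = "goto" if symbol in NONTERMINALS else "shift"
            edges[(states.index(state), symbol)] = (kind, states.index(target))
    return states, edges


states, edges = build_states()
for (source, symbol), (kind, target) in sorted(edges.items()):
    print(f"state {source} --{symbol}--> state {target} ({kind})")
```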
|
|
|
|
|
|
We then mark states as accepting states if they have a dotted rule
|
|
We then mark states as accepting states if they have a dotted rule
|
|
that is the start rule with a period at the end. Also, to add
|
|
that is the start rule with a period at the end. Also, to add
|
|
-in the reduce actions, we look for any state containing a dotted rule
|
|
|
|
|
|
+the reduce actions, we look for any state containing a dotted rule
|
|
with a period at the end. Let $n$ be the rule number for this dotted
|
|
with a period at the end. Let $n$ be the rule number for this dotted
|
|
rule. We then put a reduce $n$ action into that state for every token
|
|
rule. We then put a reduce $n$ action into that state for every token
|
|
-$Y$. For example, in Figure~\ref{fig:shift-reduce} state 4 has an
|
|
|
|
|
|
+$Y$. For example, in figure~\ref{fig:shift-reduce} state 4 has a
|
|
dotted rule with a period at the end. We therefore put a reduce by
|
|
dotted rule with a period at the end. We therefore put a reduce by
|
|
rule 3 action into state 4 for every
|
|
rule 3 action into state 4 for every
|
|
token.
|
|
token.
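As a rough sketch of this last step, the following function marks accepting states and adds reduce actions to a set of states that is assumed to have been generated already. The state contents, rule numbers, and the name \code{add\_reduce\_actions} are hypothetical. The sketch treats the completed start rule as an accept marker rather than as a reduce action, which is one possible design choice. It also reports every state and token pair that ends up with more than one action.

```python
def add_reduce_actions(states, shifts, rule_lengths, tokens, start_rule=0):
    """Return (actions, accepting states, conflicting table entries).

    states:       state number -> set of (rule number, period position)
    shifts:       (state, token) -> target state
    rule_lengths: rule number -> number of right-hand-side symbols
    """
    actions = {key: [("shift", target)] for key, target in shifts.items()}
    accepting = set()
    for number, items in states.items():
        for rule, dot in sorted(items):
            if dot != rule_lengths[rule]:
                continue            # the period is not at the end
            if rule == start_rule:
                accepting.add(number)
            else:
                # Put a reduce action into this state for every token.
                for token in tokens:
                    actions.setdefault((number, token), []).append(
                        ("reduce", rule)
                    )
    conflicts = {key: acts for key, acts in actions.items() if len(acts) > 1}
    return actions, accepting, conflicts


# Hypothetical state containing both  exp ::= exp PLUS exp .
# and  exp ::= exp . PLUS exp  (rule 1 with the period at positions 3 and 1).
states = {0: {(1, 3), (1, 1)}}
shifts = {(0, "PLUS"): 1}
actions, accepting, conflicts = add_reduce_actions(
    states, shifts, {1: 3}, ["INT", "PLUS", "END"]
)
print(conflicts)
```

In this hypothetical state the entry for \lstinline{PLUS} holds both a shift and a reduce action, which is a shift/reduce conflict.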
|
|
@@ -5011,9 +5011,10 @@ the parse table.
|
|
\begin{exercise}
|
|
\begin{exercise}
|
|
\normalfont\normalsize
|
|
\normalfont\normalsize
|
|
%
|
|
%
|
|
-On a piece of paper, walk through the parse table generation process
|
|
|
|
-for the grammar at the top of figure~\ref{fig:shift-reduce} and check
|
|
|
|
-your results against parse table in figure~\ref{fig:shift-reduce}.
|
|
|
|
|
|
+Working on paper, walk through the parse table generation process for
|
|
|
|
+the grammar at the top of figure~\ref{fig:shift-reduce}, and check
|
|
|
|
+your results against the parse table shown in
|
|
|
|
+figure~\ref{fig:shift-reduce}.
|
|
\end{exercise}
|
|
\end{exercise}
|
|
|
|
|
|
|
|
|
|
@@ -5021,7 +5022,7 @@ your results against parse table in figure~\ref{fig:shift-reduce}.
|
|
\normalfont\normalsize
|
|
\normalfont\normalsize
|
|
%
|
|
%
|
|
Change the parser in your compiler for \LangVar{} to set the
|
|
Change the parser in your compiler for \LangVar{} to set the
|
|
- \code{parser} option of Lark to \code{'lalr'}. Test your compiler on
|
|
|
|
|
|
+ \code{parser} option of Lark to \lstinline{'lalr'}. Test your compiler on
|
|
all the \LangVar{} programs that you have created. In doing so, Lark
|
|
all the \LangVar{} programs that you have created. In doing so, Lark
|
|
may signal an error due to shift/reduce or reduce/reduce conflicts
|
|
may signal an error due to shift/reduce or reduce/reduce conflicts
|
|
in your grammar. If so, change your Lark grammar for \LangVar{} to
|
|
in your grammar. If so, change your Lark grammar for \LangVar{} to
|
|
@@ -5034,16 +5035,16 @@ your results against parse table in figure~\ref{fig:shift-reduce}.
|
|
In this chapter we have just scratched the surface of the field of
|
|
In this chapter we have just scratched the surface of the field of
|
|
parsing, with the study of a very general but less efficient algorithm
|
|
parsing, with the study of a very general but less efficient algorithm
|
|
(Earley) and with a more limited but highly efficient algorithm
|
|
(Earley) and with a more limited but highly efficient algorithm
|
|
-(LALR). There are many more algorithms, and classes of grammars, that
|
|
|
|
-fall between these two ends of the spectrum. We recommend the reader
|
|
|
|
-to \citet{Aho:2006wb} for a thorough treatment of parsing.
|
|
|
|
-
|
|
|
|
-Regarding lexical analysis, we described the specification language,
|
|
|
|
-the regular expressions, but not the algorithms for recognizing them.
|
|
|
|
-In short, regular expressions can be translated to nondeterministic
|
|
|
|
-finite automata, which in turn are translated to finite automata. We
|
|
|
|
-refer the reader again to \citet{Aho:2006wb} for all the details on
|
|
|
|
-lexical analysis.
|
|
|
|
|
|
+(LALR). There are many more algorithms and classes of grammars that
|
|
|
|
+fall between these two ends of the spectrum. We refer the reader to
|
|
|
|
+\citet{Aho:2006wb} for a thorough treatment of parsing.
|
|
|
|
+
|
|
|
|
+Regarding lexical analysis, we have described the specification
|
|
|
|
+language, namely regular expressions, but not the algorithms
|
|
|
|
+for recognizing them. In short, regular expressions can be translated
|
|
|
|
+to nondeterministic finite automata, which in turn are translated to
|
|
|
|
+deterministic finite automata. We refer the reader again to \citet{Aho:2006wb} for
|
|
|
|
+all the details on lexical analysis.
|
|
|
|
|
|
\fi}
|
|
\fi}
|
|
|
|
|