@@ -4298,7 +4298,7 @@ The rule \code{exp: exp "+" exp} says that any string that matches
 \code{exp}, followed by the \code{+} character, followed by another
 string that matches \code{exp}, is itself an \code{exp}. For example,
 the string \code{'1+3'} is an \code{exp} because \code{'1'} and
-\code{'3'} are both \code{exp} by rule \code{exp: INT}, and then the
+\code{'3'} are both \code{exp} by the rule \code{exp: INT}, and then the
 rule for addition applies to categorize \code{'1+3'} as an \Exp{}. We
 can visualize the application of grammar rules to categorize a string
 using a \emph{parse tree}\index{subject}{parse tree}. Each internal
@@ -4426,9 +4426,9 @@ nonterminal \code{exp\_hi} for all the other expressions, and uses
 subtraction. Furthermore, unary subtraction uses \code{exp\_hi} for
 its child.
 
-For languages with more operators with more precedence levels, one
+For languages with more operators and more precedence levels, one
 would need to refine the \code{exp} nonterminal into several
-nonterminals,
+nonterminals, one for each precedence level.
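To illustrate the idea, here is a sketch in the same Lark-style notation (a hypothetical extension, not a grammar from this chapter) that refines \code{exp} into one nonterminal per precedence level, with comparison, additive, and multiplicative operators:

```text
exp: exp "==" exp_add | exp_add          // comparisons (lowest precedence)
exp_add: exp_add "+" exp_mul | exp_mul   // additive operators
exp_mul: exp_mul "*" exp_hi | exp_hi     // multiplicative operators
exp_hi: INT | "(" exp ")"                // atoms (highest precedence)
```

Each level uses the next-higher level for its operands, so lower-precedence operators bind more loosely.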
 
 \begin{figure}[tbp]
 \begin{tcolorbox}[colback=white]
@@ -4514,7 +4514,8 @@ In this section we discuss the parsing algorithm of
 The algorithm is powerful in that it can handle any context-free
 grammar, which makes it easy to use. However, it is not the most
 efficient parsing algorithm: it is $O(n^3)$ for ambiguous grammars and
-$O(n^2)$ for unambiguous grammars~\citep{Hopcroft06:_automata}. In
+$O(n^2)$ for unambiguous grammars~\citep{Hopcroft06:_automata}, where
+$n$ is the number of tokens in the input string. In
 section~\ref{sec:lalr} we learn about the LALR algorithm, which is
 more efficient but can only handle a subset of the context-free
 grammars.
@@ -4531,16 +4532,24 @@ been parsed. For example, the dotted rule
 \begin{lstlisting}
 exp: exp "+" . exp_hi
 \end{lstlisting}
-represents a partial parse that has matched an expression followed by
-\code{+}, but has not yet parsed an expression to the right of
+represents a partial parse that has matched an \code{exp} followed by
+\code{+}, but has not yet parsed an \code{exp} to the right of
 \code{+}.
-
-The algorithm begins by creating dotted rules for all the grammar
-rules whose left-hand side is the start symbol and placing then in
-slot $0$ of the chart. For example, given the grammar in
-figure~\ref{fig:Lint-lark-grammar}, we would place
+%
+The Earley algorithm starts with an initialization phase, and then
+repeats three actions (prediction, scanning, and completion) for as
+long as opportunities arise for those actions. We demonstrate the
+Earley algorithm on a running example, parsing the following program:
 \begin{lstlisting}
- lang_int: . stmt_list
+ print(1 + 3)
+\end{lstlisting}
+The algorithm's initialization phase creates dotted rules for all the
+grammar rules whose left-hand side is the start symbol and places them
+in slot $0$ of the chart. It also records the starting position of each
+dotted rule, shown in parentheses on the right. For example, given the
+grammar in figure~\ref{fig:Lint-lark-grammar}, we place
+\begin{lstlisting}
+ lang_int: . stmt_list (0)
 \end{lstlisting}
 in slot $0$ of the chart. The algorithm then proceeds to its
 \emph{prediction} phase in which it adds more dotted rules to the
@@ -4549,32 +4558,340 @@ the nonterminal \code{stmt\_list} appears after a period, so we add all
 the rules for \code{stmt\_list} to slot $0$, with a period at the
 beginning of their right-hand sides, as follows:
 \begin{lstlisting}
-stmt_list: .
-stmt_list: . stmt NEWLINE stmt_list
+stmt_list: . (0)
+stmt_list: . stmt NEWLINE stmt_list (0)
+\end{lstlisting}
+We continue to perform prediction actions as more opportunities
+arise. For example, the \code{stmt} nonterminal now appears after a
+period, so we add all the rules for \code{stmt}.
+\begin{lstlisting}
+stmt: . "print" "(" exp ")" (0)
+stmt: . exp (0)
+\end{lstlisting}
+This reveals more opportunities for prediction, so we add the grammar
+rules for \code{exp} and \code{exp\_hi}.
+\begin{lstlisting}[escapechar=$]
+exp: . exp "+" exp_hi (0)
+exp: . exp "-" exp_hi (0)
+exp: . exp_hi (0)
+exp_hi: . INT (0)
+exp_hi: . "input_int" "(" ")" (0)
+exp_hi: . "-" exp_hi (0)
+exp_hi: . "(" exp ")" (0)
+\end{lstlisting}
+
+We have exhausted the opportunities for prediction, so the algorithm
+proceeds to \emph{scanning}, in which we inspect the next input token
+and look for a dotted rule at the current position that has a matching
+terminal following the period. In our running example, the first input
+token is \code{"print"}, so we identify the dotted rule in slot $0$ of
+the chart:
+\begin{lstlisting}
+stmt: . "print" "(" exp ")" (0)
+\end{lstlisting}
+and add the following rule to slot $1$ of the chart, with the period
+moved forward past \code{"print"}.
+\begin{lstlisting}
+stmt: "print" . "(" exp ")" (0)
+\end{lstlisting}
+If the new dotted rule had a nonterminal after the period, we would
+need to carry out a prediction action, adding more dotted rules into
+slot $1$. That is not the case, so we continue scanning. The next
+input token is \code{"("}, so we add the following to slot $2$ of the
+chart.
+\begin{lstlisting}
+stmt: "print" "(" . exp ")" (0)
+\end{lstlisting}
+
+Now we have a nonterminal after the period, so we carry out several
+prediction actions, adding dotted rules for \code{exp} and
+\code{exp\_hi} to slot $2$ with a period at the beginning and with
+starting position $2$.
+\begin{lstlisting}[escapechar=$]
+exp: . exp "+" exp_hi (2)
+exp: . exp "-" exp_hi (2)
+exp: . exp_hi (2)
+exp_hi: . INT (2)
+exp_hi: . "input_int" "(" ")" (2)
+exp_hi: . "-" exp_hi (2)
+exp_hi: . "(" exp ")" (2)
+\end{lstlisting}
+With that prediction complete, we return to scanning, noting that the
+next input token is \code{"1"}, which the lexer categorized as an
+\code{INT}. There is a matching rule in slot $2$:
+\begin{lstlisting}
+exp_hi: . INT (2)
+\end{lstlisting}
+so we advance the period and put the following rule in slot $3$.
+\begin{lstlisting}
+exp_hi: INT . (2)
 \end{lstlisting}
-The prediction phase continues to add dotted rules as more
-opportunities arise. For example, the \code{stmt} nonterminal now
-appears after a period, so we add all the rules for \code{stmt}.
+This brings us to \emph{completion} actions. When the period reaches
+the end of a dotted rule, we have finished parsing a substring
+according to the left-hand side of the rule, in this case
+\code{exp\_hi}. We therefore need to advance the period in any dotted
+rules in slot $2$ (the starting position of the finished rule) in
+which the period is immediately followed by \code{exp\_hi}. So we identify
 \begin{lstlisting}
-stmt: . "print" "(" exp ")"
-stmt: . exp
+exp: . exp_hi (2)
+\end{lstlisting}
+and add the following dotted rule to slot $3$:
+\begin{lstlisting}
+exp: exp_hi . (2)
+\end{lstlisting}
+This triggers another completion step for the nonterminal \code{exp},
+adding two more dotted rules to slot $3$.
+\begin{lstlisting}[escapechar=$]
+exp: exp . "+" exp_hi (2)
+exp: exp . "-" exp_hi (2)
+\end{lstlisting}
+
+Returning to scanning, the next input token is \code{"+"}, so
+we add the following to slot $4$.
+\begin{lstlisting}[escapechar=$]
+exp: exp "+" . exp_hi (2)
+\end{lstlisting}
+The period precedes the nonterminal \code{exp\_hi}, so prediction adds
+the following dotted rules to slot $4$ of the chart.
+\begin{lstlisting}[escapechar=$]
+exp_hi: . INT (4)
+exp_hi: . "input_int" "(" ")" (4)
+exp_hi: . "-" exp_hi (4)
+exp_hi: . "(" exp ")" (4)
+\end{lstlisting}
+The next input token is \code{"3"}, which the lexer categorized as an
+\code{INT}, so we advance the period past \code{INT} for the rules in
+slot $4$, of which there is just one, and put the following in slot $5$.
+\begin{lstlisting}[escapechar=$]
+exp_hi: INT . (4)
+\end{lstlisting}
+
+The period at the end of the rule triggers a completion action for the
+rules in slot $4$, one of which has a period before \code{exp\_hi}.
+So we advance the period and put the following in slot $5$.
+\begin{lstlisting}[escapechar=$]
+exp: exp "+" exp_hi . (2)
 \end{lstlisting}
-To finish the preduction phase, we add the grammar rules for
-\code{exp} and \code{exp\_hi}.
+This triggers another completion action for the rules in slot $2$ that
+have a period before \code{exp}.
 \begin{lstlisting}[escapechar=$]
-exp: . exp "+" exp_hi
-exp: . exp "-" exp_hi
-exp: . exp_hi
-exp_hi: . INT
-exp_hi: . "input_int" "(" ")"
-exp_hi: . "-" exp_hi
-exp_hi: . "(" exp ")"
+stmt: "print" "(" exp . ")" (0)
+exp: exp . "+" exp_hi (2)
+exp: exp . "-" exp_hi (2)
 \end{lstlisting}
 
+We scan the next input token \code{")"}, placing the following dotted
+rule in slot $6$.
+\begin{lstlisting}[escapechar=$]
+stmt: "print" "(" exp ")" . (0)
+\end{lstlisting}
+This triggers the completion of \code{stmt} in slot $0$:
+\begin{lstlisting}
+stmt_list: stmt . NEWLINE stmt_list (0)
+\end{lstlisting}
+The last input token is a \code{NEWLINE}, so we advance the period
+and place the new dotted rule in slot $7$.
+\begin{lstlisting}
+stmt_list: stmt NEWLINE . stmt_list (0)
+\end{lstlisting}
+We are close to the end of parsing the input!
+The period is before the \code{stmt\_list} nonterminal, so we
+apply prediction for \code{stmt\_list} and then \code{stmt}.
+\begin{lstlisting}
+stmt_list: . (7)
+stmt_list: . stmt NEWLINE stmt_list (7)
+stmt: . "print" "(" exp ")" (7)
+stmt: . exp (7)
+\end{lstlisting}
+There is immediately an opportunity for completion of \code{stmt\_list},
+so we add the following to slot $7$.
+\begin{lstlisting}
+stmt_list: stmt NEWLINE stmt_list . (0)
+\end{lstlisting}
+This triggers another completion action for \code{stmt\_list} in slot $0$:
+\begin{lstlisting}
+lang_int: stmt_list . (0)
+\end{lstlisting}
+which in turn completes \code{lang\_int}, the start symbol of the
+grammar, so the parsing of the input is complete.
+
+For reference, we now give a general description of the Earley
+algorithm.
+\begin{enumerate}
+\item The algorithm begins by initializing slot $0$ of the chart with the
+  grammar rule for the start symbol, placing a period at the beginning
+  of the right-hand side, and recording its starting position as $0$.
+
+\item The algorithm repeatedly applies the following three kinds of
+  actions for as long as there are opportunities to do so.
+  \begin{itemize}
+  \item Prediction: If there is a dotted rule in slot $k$ whose period
+    comes before a nonterminal, add all the rules for that nonterminal
+    into slot $k$, placing a period at the beginning of their
+    right-hand sides, and recording their starting position as
+    $k$.
+  \item Scanning: If the token at position $k$ of the input string
+    matches the symbol after the period in a dotted rule in slot $k$
+    of the chart, advance the period in the dotted rule, adding
+    the result to slot $k+1$.
+  \item Completion: If a dotted rule in slot $k$ has a period at the
+    end, consider the rules in the slot corresponding to the starting
+    position of the completed rule. If any of those rules have a
+    nonterminal following their period that matches the left-hand side
+    of the completed rule, then advance their period, placing the new
+    dotted rule in slot $k$.
+  \end{itemize}
+  While repeating these three actions, take care never to add
+  duplicate dotted rules to the chart.
+\end{enumerate}
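The three actions above can be sketched in Python. This is a minimal recognizer of our own devising, not Lark's implementation: the grammar below is a simplified stand-in without empty rules (empty rules require extra care during completion), the input is assumed to be already lexed into a list of tokens, and each item is a tuple \code{(lhs, rhs, dot, start)} mirroring the dotted-rule notation.

```python
# Simplified stand-in grammar: nonterminal -> list of alternatives.
# Terminals are token names like "INT" or literal tokens like "+".
GRAMMAR = {
    "exp": [["exp", "+", "exp_hi"], ["exp_hi"]],
    "exp_hi": [["INT"], ["(", "exp", ")"]],
}
START = "exp"

def is_nonterminal(sym):
    return sym in GRAMMAR

def earley_recognize(tokens):
    # chart[k] holds the items (dotted rules) for input position k.
    # An item (lhs, rhs, dot, start) reads "lhs: rhs[:dot] . rhs[dot:] (start)".
    chart = [set() for _ in range(len(tokens) + 1)]
    for rhs in GRAMMAR[START]:                      # initialization
        chart[0].add((START, tuple(rhs), 0, 0))
    for k in range(len(tokens) + 1):
        worklist = list(chart[k])
        while worklist:
            lhs, rhs, dot, start = worklist.pop()
            if dot < len(rhs) and is_nonterminal(rhs[dot]):
                # Prediction: add rules for the nonterminal after the period.
                for alt in GRAMMAR[rhs[dot]]:
                    new = (rhs[dot], tuple(alt), 0, k)
                    if new not in chart[k]:
                        chart[k].add(new)
                        worklist.append(new)
            elif dot < len(rhs):
                # Scanning: advance past a matching terminal into slot k+1.
                if k < len(tokens) and tokens[k] == rhs[dot]:
                    chart[k + 1].add((lhs, rhs, dot + 1, start))
            else:
                # Completion: advance items in chart[start] waiting on lhs.
                for l2, r2, d2, s2 in list(chart[start]):
                    if d2 < len(r2) and r2[d2] == lhs:
                        new = (l2, r2, d2 + 1, s2)
                        if new not in chart[k]:
                            chart[k].add(new)
                            worklist.append(new)
    # Accept if a start rule spans the whole input.
    return any((START, tuple(rhs), len(rhs), 0) in chart[len(tokens)]
               for rhs in GRAMMAR[START])
```

For example, \code{earley\_recognize(["INT", "+", "INT"])} accepts, while \code{earley\_recognize(["INT", "+"])} rejects because no start item with a period at the end spans the input.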
 
-\section{The LALR Algorithm}
+We have described how the Earley algorithm recognizes that an input
+string matches a grammar, but we have not described how it builds a
+parse tree. The basic idea is simple, but it turns out that building
+parse trees in an efficient way is more complex, requiring a data
+structure called a shared packed parse forest~\citep{Tomita:1985qr}.
+The simple idea is to attach a partial parse tree to every dotted
+rule. Initially, the tree node associated with a dotted rule has no
+children. As the period moves to the right, the nodes from the
+subparses are added as children to this tree node.
+
+As mentioned at the beginning of this section, the Earley algorithm is
+$O(n^2)$ for unambiguous grammars, which means that it can parse input
+files that contain thousands of tokens in a reasonable amount of time,
+but not millions. In the next section we discuss the LALR(1) parsing
+algorithm, which has time complexity $O(n)$, making it practical to
+use with even the largest of input files.
+
+\section{The LALR(1) Algorithm}
 \label{sec:lalr}
 
+The LALR(1) algorithm consists of a finite automaton and a stack that
+records its progress in parsing the input string. Each element of the
+stack is a pair: a state number and a grammar symbol (a terminal or
+nonterminal). The symbol characterizes the input that has been parsed
+so far, and the state number is used to remember how to proceed once
+the next symbol's worth of input has been parsed. Each state in the
+finite automaton represents where the parser stands in the parsing
+process with respect to certain grammar rules. In particular, each
+state is associated with a set of dotted rules.
+
+Figure~\ref{fig:shift-reduce} shows an example LALR(1) parse table
+generated by Lark for the following simple but ambiguous grammar:
+\begin{lstlisting}[escapechar=$]
+exp: INT
+   | exp "+" exp
+stmt: "print" exp
+start: stmt
+\end{lstlisting}
+%% When PLY generates a parse table, it also
+%% outputs a textual representation of the parse table to the file
+%% \texttt{parser.out} which is useful for debugging purposes.
+Consider state 1 in Figure~\ref{fig:shift-reduce}. The parser has just
+read in a \lstinline{PRINT} token, so the top of the stack is
+\lstinline{(1,PRINT)}. The parser is part of the way through parsing
+the input according to grammar rule 1, which is signified by showing
+rule 1 with a period after the \code{PRINT} token and before the
+\code{exp} nonterminal. A rule with a period in it is called an
+\emph{item}. There are several rules that could apply next, namely
+rules 2 and 3, so state 1 also shows those rules with a period at the
+beginning of their right-hand sides. The edges between states indicate
+which transitions the automaton should make depending on the next input
+token. So, for example, if the next input token is \code{INT} then the
+parser will push \code{INT} and the target state 4 on the stack and
+transition to state 4. Suppose we are now at the end of the input. In
+state 4 it says we should reduce by rule 3, so we pop from the stack
+the same number of items as the number of symbols in the right-hand
+side of the rule, in this case just one. We then momentarily jump to
+the state at the top of the stack (state 1) and then follow the goto
+edge that corresponds to the left-hand side of the rule we just
+reduced by, in this case \code{exp}, so we arrive at state 3. (A
+slightly longer example parse is shown in
+Figure~\ref{fig:shift-reduce}.)
+
+\begin{figure}[htbp]
+  \centering
+\includegraphics[width=5.0in]{figs/shift-reduce-conflict}
+  \caption{An LALR(1) parse table and a trace of an example run.}
+  \label{fig:shift-reduce}
+\end{figure}
+
+In general, the shift-reduce algorithm works as follows. Look at the
+next input token.
+\begin{itemize}
+\item If there is a shift edge for the input token, push the
+  edge's target state and the input token on the stack and proceed to
+  the edge's target state.
+\item If there is a reduce action for the input token, pop $k$
+  elements from the stack, where $k$ is the number of symbols in the
+  right-hand side of the rule being reduced. Jump to the state at the
+  top of the stack and then follow the goto edge for the nonterminal
+  that matches the left-hand side of the rule we're reducing by. Push
+  the edge's target state and the nonterminal on the stack.
+\end{itemize}
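The shift-reduce loop above can be sketched in Python. The tables below are hand-built for the running example grammar, not taken from Lark's output: the state numbers are chosen to echo the figure but are our own, and the state 6 conflict is resolved here as a reduce, which makes \code{+} left associative.

```python
# Grammar: 0: start -> stmt, 1: stmt -> PRINT exp,
#          2: exp -> exp PLUS exp, 3: exp -> INT
RULES = [("start", ["stmt"]), ("stmt", ["PRINT", "exp"]),
         ("exp", ["exp", "PLUS", "exp"]), ("exp", ["INT"])]

# Hand-built illustration tables; "$" marks the end of the input.
ACTION = {
    0: {"PRINT": ("shift", 1)},
    1: {"INT": ("shift", 4)},
    2: {"$": ("accept", None)},
    3: {"PLUS": ("shift", 5), "$": ("reduce", 1)},
    4: {"PLUS": ("reduce", 3), "$": ("reduce", 3)},
    5: {"INT": ("shift", 4)},
    6: {"PLUS": ("reduce", 2), "$": ("reduce", 2)},  # conflict resolved as reduce
}
GOTO = {0: {"stmt": 2}, 1: {"exp": 3}, 5: {"exp": 6}}

def parse(tokens):
    tokens = tokens + ["$"]
    stack = [(0, None)]              # pairs of (state, grammar symbol)
    pos = 0
    while True:
        state = stack[-1][0]
        action = ACTION.get(state, {}).get(tokens[pos])
        if action is None:
            return False             # no shift or reduce applies: reject
        kind, arg = action
        if kind == "accept":
            return True
        if kind == "shift":          # push the token and the target state
            stack.append((arg, tokens[pos]))
            pos += 1
        else:                        # reduce by rule number arg
            lhs, rhs = RULES[arg]
            for _ in rhs:            # pop one pair per right-hand-side symbol
                stack.pop()
            stack.append((GOTO[stack[-1][0]][lhs], lhs))
```

Running \code{parse(["PRINT", "INT", "PLUS", "INT"])} follows the trace described above: shift to states 1 and 4, reduce by rule 3, shift the \code{PLUS}, and so on until the accept action in state 2.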
+
+Notice that in state 6 of Figure~\ref{fig:shift-reduce} there is both
+a shift and a reduce action for the token \lstinline{PLUS}, so the
+algorithm does not know which action to take in this case. When a
+state has both a shift and a reduce action for the same token, we say
+there is a \emph{shift/reduce conflict}. Here the conflict arises,
+for example, when trying to parse the input
+\lstinline{print 1 + 2 + 3}. After having consumed \lstinline{print 1 + 2},
+the parser will be in state 6, and it will not know whether to
+reduce to form an \emph{exp} of \lstinline{1 + 2}, or whether it
+should proceed by shifting the next \lstinline{+} from the input.
+
+A similar kind of problem, known as a \emph{reduce/reduce} conflict,
+arises when there are two reduce actions in a state for the same
+token. To understand which grammars give rise to shift/reduce and
+reduce/reduce conflicts, it helps to know how the parse table is
+generated from the grammar, which we discuss next.
+
+The parse table is generated one state at a time. State 0 represents
+the start of the parser. We add the grammar rule for the start symbol
+to this state with a period at the beginning of the right-hand side,
+similar to the initialization phase of the Earley parser. If the
+period appears immediately before a nonterminal, we add all the
+rules with that nonterminal on the left-hand side. Again, we place a
+period at the beginning of the right-hand side of each of the new
+rules. This process, called \emph{state closure}, continues
+until there are no more rules to add (similar to the prediction
+actions of an Earley parser). We then examine each dotted rule in the
+current state $I$. Suppose a dotted rule has the form $A ::=
+\alpha.X\beta$, where $A$ is a nonterminal, $X$ is a grammar symbol,
+and $\alpha$ and $\beta$
+are sequences of symbols. We create a new state, call it $J$. If $X$
+is a terminal, we create a shift edge from $I$ to $J$ (analogous to
+scanning in Earley), whereas if $X$ is a nonterminal, we create a
+goto edge from $I$ to $J$. We then need to add some dotted rules to
+state $J$. We start by adding all dotted rules from state $I$ that
+have the form $B ::= \gamma.X\kappa$ (where $B$ is any nonterminal and
+$\gamma$ and $\kappa$ are arbitrary sequences of symbols), but with
+the period moved past the $X$. (This is analogous to completion in
+the Earley algorithm.) We then perform state closure on $J$. This
+process repeats until there are no more states or edges to add.
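The closure-and-goto construction just described can be sketched in Python (our own illustration, using the running example grammar; items are \code{(lhs, rhs, dot)} triples):

```python
# Running example grammar: nonterminal -> list of alternatives.
GRAMMAR = {
    "start": [["stmt"]],
    "stmt": [["PRINT", "exp"]],
    "exp": [["exp", "PLUS", "exp"], ["INT"]],
}

def closure(items):
    # State closure: repeatedly add rules for any nonterminal that
    # appears right after a period, until no more items can be added.
    items = set(items)
    changed = True
    while changed:
        changed = False
        for lhs, rhs, dot in list(items):
            if dot < len(rhs) and rhs[dot] in GRAMMAR:
                for alt in GRAMMAR[rhs[dot]]:
                    new = (rhs[dot], tuple(alt), 0)
                    if new not in items:
                        items.add(new)
                        changed = True
    return items

def goto(items, symbol):
    # Move the period past `symbol` in every item that expects it,
    # then close the resulting set to form the target state.
    moved = {(lhs, rhs, dot + 1)
             for lhs, rhs, dot in items
             if dot < len(rhs) and rhs[dot] == symbol}
    return closure(moved)
```

Starting from the closure of the start item and repeatedly applying \code{goto} over every grammar symbol enumerates the states and the shift/goto edges of the automaton.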
+
+We then mark states as accepting states if they have a dotted rule
+that is the start rule with a period at the end. Also, to add
+in the reduce actions, we look for any state containing a dotted rule
+with a period at the end. Let $n$ be the rule number for this dotted
+rule. We then put a reduce $n$ action into that state for every token
+$Y$. For example, in Figure~\ref{fig:shift-reduce} state 4 has a
+dotted rule with a period at the end. We therefore put a reduce by
+rule 3 action into state 4 for every
+token. (Figure~\ref{fig:shift-reduce} does not show a reduce rule for
+\code{INT} in state 4 because this grammar does not allow two
+consecutive \code{INT} tokens in the input. We will not go into how
+this can be figured out, but in any event it does no harm to have a
+reduce rule for \code{INT} in state 4; it just means the input will be
+rejected at a later point in the parsing process.)
+
+\begin{exercise}
+On a piece of paper, walk through the parse table generation
+process for the grammar in Figure~\ref{fig:parser1} and check
+your results against Figure~\ref{fig:shift-reduce}.
+\end{exercise}
+
 \section{Further Reading}
 
 UNDER CONSTRUCTION