Jeremy Siek 2 years ago
parent
revision
ac5f48a492
2 changed files with 356 additions and 29 deletions
  1. book.bib: +10 -0
  2. book.tex: +346 -29

+ 10 - 0
book.bib

@@ -1,3 +1,13 @@
+@book{Tomita:1985qr,
+	address = {Norwell, MA, USA},
+	author = {Masaru Tomita},
+	date-added = {2008-12-02 14:16:33 -0700},
+	date-modified = {2008-12-02 14:16:39 -0700},
+	isbn = {0898382025},
+	publisher = {Kluwer Academic Publishers},
+	title = {Efficient Parsing for Natural Language: A Fast Algorithm for Practical Systems},
+	year = {1985}}
+
 @article{Earley:1970ly,
 	acmid = {362035},
 	address = {New York, NY, USA},

+ 346 - 29
book.tex

@@ -4298,7 +4298,7 @@ The rule \code{exp: exp "+" exp} says that any string that matches
 \code{exp}, followed by the \code{+} character, followed by another
 string that matches \code{exp}, is itself an \code{exp}.  For example,
 the string \code{'1+3'} is an \code{exp} because \code{'1'} and
-\code{'3'} are both \code{exp} by rule \code{exp: INT}, and then the
+\code{'3'} are both \code{exp} by the rule \code{exp: INT}, and then the
 rule for addition applies to categorize \code{'1+3'} as an \Exp{}. We
 can visualize the application of grammar rules to categorize a string
 using a \emph{parse tree}\index{subject}{parse tree}. Each internal
@@ -4426,9 +4426,9 @@ nonterminal \code{exp\_hi} for all the other expressions, and uses
 subtraction. Furthermore, unary subtraction uses \code{exp\_hi} for
 its child.
 
-For languages with more operators with more precedence levels, one
+For languages with more operators and more precedence levels, one
 would need to refine the \code{exp} nonterminal into several
-nonterminals, 
+nonterminals, one for each precedence level.
 
 \begin{figure}[tbp]
 \begin{tcolorbox}[colback=white]
@@ -4514,7 +4514,8 @@ In this section we discuss the parsing algorithm of
 The algorithm is powerful in that it can handle any context-free
 grammar, which makes it easy to use. However, it is not the most
 efficient parsing algorithm: it is $O(n^3)$ for ambiguous grammars and
-$O(n^2)$ for unambiguous grammars~\citep{Hopcroft06:_automata}.  In
+$O(n^2)$ for unambiguous grammars~\citep{Hopcroft06:_automata}, where
+$n$ is the number of tokens in the input string.  In
 section~\ref{sec:lalr} we learn about the LALR algorithm, which is
 more efficient but can only handle a subset of the context-free
 grammars.
@@ -4531,16 +4532,24 @@ been parsed. For example, the dotted rule
 \begin{lstlisting}
 exp: exp "+" . exp_hi
 \end{lstlisting}
-represents a partial parse that has matched an expression followed by
-\code{+}, but has not yet parsed an expression to the right of
+represents a partial parse that has matched an \code{exp} followed by
+\code{+}, but has not yet parsed an \code{exp} to the right of
 \code{+}.
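Concretely, a dotted rule can be represented as a small record pairing a grammar rule with the position of the period and the chart slot where the partial parse began. The following Python sketch is our own illustrative representation (not Lark's internals):

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DottedRule:
    lhs: str      # e.g. 'exp'
    rhs: tuple    # e.g. ('exp', '"+"', 'exp_hi')
    dot: int      # how many symbols of rhs have been parsed so far
    start: int    # chart slot where this partial parse began

    def next_symbol(self):
        # the symbol after the period, or None if the period is at the end
        return self.rhs[self.dot] if self.dot < len(self.rhs) else None

# the dotted rule  exp: exp "+" . exp_hi  with starting position 2
r = DottedRule('exp', ('exp', '"+"', 'exp_hi'), 2, 2)
```

Here `next_symbol` returns `'exp_hi'` for `r`, signaling that the parser still expects an `exp_hi` to the right of the `+`.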
-
-The algorithm begins by creating dotted rules for all the grammar
-rules whose left-hand side is the start symbol and placing then in
-slot $0$ of the chart.  For example, given the grammar in
-figure~\ref{fig:Lint-lark-grammar}, we would place
+%
+The Earley algorithm starts with an initialization phase, and then
+repeats three actions (prediction, scanning, and completion) for as
+long as opportunities arise for those actions. We demonstrate the
+Earley algorithm on a running example, parsing the following program:
 \begin{lstlisting}
-  lang_int: . stmt_list
+  print(1 + 3)
+\end{lstlisting}
+The algorithm's initialization phase creates dotted rules for all the
+grammar rules whose left-hand side is the start symbol and places them
+in slot $0$ of the chart. It also records the starting position of the
+dotted rule, in parentheses on the right. For example, given the
+grammar in figure~\ref{fig:Lint-lark-grammar}, we place
+\begin{lstlisting}
+  lang_int: . stmt_list         (0)
 \end{lstlisting}
 in slot $0$ of the chart. The algorithm then proceeds to its
 \emph{prediction} phase in which it adds more dotted rules to the
@@ -4549,32 +4558,340 @@ the nonterminal \code{stmt\_list} appears after a period, so we add all
 the rules for \code{stmt\_list} to slot $0$, with a period at the
 beginning of their right-hand sides, as follows:
 \begin{lstlisting}
-stmt_list:  . 
-stmt_list:  .  stmt  NEWLINE  stmt_list
+stmt_list:  .                             (0)
+stmt_list:  .  stmt  NEWLINE  stmt_list   (0)
+\end{lstlisting}
+We continue to perform prediction actions as more opportunities
+arise. For example, the \code{stmt} nonterminal now appears after a
+period, so we add all the rules for \code{stmt}.
+\begin{lstlisting}
+stmt:  .  "print" "("  exp ")"   (0)
+stmt:  .  exp                    (0)
+\end{lstlisting}
+This reveals more opportunities for prediction, so we add the grammar
+rules for \code{exp} and \code{exp\_hi}.
+\begin{lstlisting}[escapechar=$]
+exp: . exp "+" exp_hi         (0)
+exp: . exp "-" exp_hi         (0)
+exp: . exp_hi                 (0)
+exp_hi: . INT                 (0)
+exp_hi: . "input_int" "(" ")" (0)
+exp_hi: . "-" exp_hi          (0)
+exp_hi: . "(" exp ")"         (0)
+\end{lstlisting}
+
+We have exhausted the opportunities for prediction, so the algorithm
+proceeds to \emph{scanning}, in which we inspect the next input token
+and look for a dotted rule at the current position that has a matching
+terminal following the period. In our running example, the first input
+token is \code{"print"} so we identify the dotted rule in slot $0$ of
+the chart:
+\begin{lstlisting}
+stmt:  .  "print" "("  exp ")"       (0)
+\end{lstlisting}
+and add the following rule to slot $1$ of the chart, with the period
+moved forward past \code{"print"}.
+\begin{lstlisting}
+stmt:  "print" . "("  exp ")"        (0)
+\end{lstlisting}
+If the new dotted rule had a nonterminal after the period, we would
+need to carry out a prediction action, adding more dotted rules into
+slot $1$. That is not the case, so we continue scanning. The next
+input token is \code{"("}, so we add the following to slot $2$ of the
+chart.
+\begin{lstlisting}
+stmt:  "print" "(" . exp ")"         (0)
+\end{lstlisting}
+
+Now we have a nonterminal after the period, so we carry out several
+prediction actions, adding dotted rules for \code{exp} and
+\code{exp\_hi} to slot $2$ with a period at the beginning and with
+starting position $2$.
+\begin{lstlisting}[escapechar=$]
+exp: . exp "+" exp_hi         (2)
+exp: . exp "-" exp_hi         (2)
+exp: . exp_hi                 (2)
+exp_hi: . INT                 (2)
+exp_hi: . "input_int" "(" ")" (2)
+exp_hi: . "-" exp_hi          (2)
+exp_hi: . "(" exp ")"         (2)
+\end{lstlisting}
+With that prediction complete, we return to scanning, noting that the
+next input token is \code{"1"}, which the lexer categorized as an
+\code{INT}. There is a matching rule in slot $2$:
+\begin{lstlisting}
+exp_hi: . INT             (2)
+\end{lstlisting}
+so we advance the period and put the following rule in slot $3$.
+\begin{lstlisting}
+exp_hi: INT .             (2)
 \end{lstlisting}
-The prediction phase continues to add dotted rules as more
-opportunities arise. For example, the \code{stmt} nonterminal now
-appears after a period, so we add all the rules for \code{stmt}.
+This brings us to \emph{completion} actions.  When the period reaches
+the end of a dotted rule, we have finished parsing a substring
+according to the left-hand side of the rule, in this case
+\code{exp\_hi}. We therefore need to advance the period in any dotted
+rule in slot $2$ (the starting position of the finished rule) in which
+the period is immediately followed by \code{exp\_hi}. So we identify
 \begin{lstlisting}
-stmt:  .  "print" "("  exp ")"
-stmt:  .  exp
+exp: . exp_hi                 (2)
+\end{lstlisting}
+and add the following dotted rule to slot $3$.
+\begin{lstlisting}
+exp: exp_hi .                 (2)
+\end{lstlisting}
+This triggers another completion step for the nonterminal \code{exp},
+adding two more dotted rules to slot $3$.
+\begin{lstlisting}[escapechar=$]
+exp: exp . "+" exp_hi         (2)
+exp: exp . "-" exp_hi         (2)
+\end{lstlisting}
+
+Returning to scanning, the next input token is \code{"+"}, so
+we add the following to slot $4$.
+\begin{lstlisting}[escapechar=$]
+exp: exp "+" . exp_hi         (2)
+\end{lstlisting}
+The period precedes the nonterminal \code{exp\_hi}, so prediction adds
+the following dotted rules to slot $4$ of the chart.
+\begin{lstlisting}[escapechar=$]
+exp_hi: . INT                 (4)
+exp_hi: . "input_int" "(" ")" (4)
+exp_hi: . "-" exp_hi          (4)
+exp_hi: . "(" exp ")"         (4)
+\end{lstlisting}
+The next input token is \code{"3"}, which the lexer categorized as an
+\code{INT}, so we advance the period past \code{INT} for the rules in
+slot $4$, of which there is just one, and put the following in slot $5$.
+\begin{lstlisting}[escapechar=$]
+exp_hi: INT .                 (4)
+\end{lstlisting}
+
+The period at the end of the rule triggers a completion action for the
+rules in slot $4$, one of which has a period before \code{exp\_hi}.
+So we advance the period and put the following in slot $5$.
+\begin{lstlisting}[escapechar=$]
+exp: exp "+" exp_hi .         (2)
 \end{lstlisting}
-To finish the preduction phase, we add the grammar rules for
-\code{exp} and \code{exp\_hi}.
+This triggers another completion action for the rules in slot $2$ that
+have a period before \code{exp}.
 \begin{lstlisting}[escapechar=$]
-exp: . exp "+" exp_hi
-exp: . exp "-" exp_hi
-exp: . exp_hi
-exp_hi: . INT
-exp_hi: . "input_int" "(" ")"
-exp_hi: . "-" exp_hi
-exp_hi: . "(" exp ")"
+stmt:  "print" "(" exp . ")"  (0)
+exp: exp . "+" exp_hi         (2)
+exp: exp . "-" exp_hi         (2)
 \end{lstlisting}
 
+We scan the next input token \code{")"}, placing the following dotted
+rule in slot $6$.
+\begin{lstlisting}[escapechar=$]
+stmt:  "print" "(" exp ")" .  (0)
+\end{lstlisting}
+This triggers the completion of \code{stmt}: we advance the period in
+the matching rule from slot $0$ and place the result in slot $6$.
+\begin{lstlisting}
+stmt_list:  stmt . NEWLINE  stmt_list   (0)
+\end{lstlisting}
+The last input token is a \code{NEWLINE}, so we advance the period
+and place the new dotted rule in slot $7$.
+\begin{lstlisting}
+stmt_list:  stmt NEWLINE .  stmt_list  (0)
+\end{lstlisting}
+We are close to the end of parsing the input!
+The period is before the \code{stmt\_list} nonterminal, so we
+apply prediction for \code{stmt\_list} and then \code{stmt}.
+\begin{lstlisting}
+stmt_list:  .                             (7)
+stmt_list:  .  stmt  NEWLINE  stmt_list   (7)
+stmt:  .  "print" "("  exp ")"            (7)
+stmt:  .  exp                             (7)
+\end{lstlisting}
+There is immediately an opportunity for completion of \code{stmt\_list},
+so we add the following to slot $7$.
+\begin{lstlisting}
+stmt_list:  stmt NEWLINE stmt_list .  (0)
+\end{lstlisting}
+This triggers another completion action for \code{stmt\_list} in slot $0$
+\begin{lstlisting}
+lang_int: stmt_list .               (0)
+\end{lstlisting}
+which in turn completes \code{lang\_int}, the start symbol of the
+grammar, so the parsing of the input is complete.
+
+For reference, we now give a general description of the Earley
+algorithm.
+\begin{enumerate}
+\item The algorithm begins by initializing slot $0$ of the chart with the
+  grammar rule for the start symbol, placing a period at the beginning
+  of the right-hand side, and recording its starting position as $0$.
+  
+\item The algorithm repeatedly applies the following three kinds of
+  actions for as long as there are opportunities to do so.
+  \begin{itemize}
+  \item Prediction: If there is a dotted rule in slot $k$ whose period
+    comes before a nonterminal, add all the rules for that nonterminal
+    into slot $k$, placing a period at the beginning of their
+    right-hand sides, and recording their starting position as
+    $k$.
+  \item Scanning: If the token at position $k$ of the input string
+    matches the symbol after the period in a dotted rule in slot $k$
+    of the chart, advance the period in the dotted rule, adding
+    the result to slot $k+1$.
+  \item Completion: If a dotted rule in slot $k$ has a period at the
+    end, consider the rules in the slot corresponding to the starting
+    position of the completed rule. If any of those rules have a
+    nonterminal following their period that matches the left-hand side
+    of the completed rule, then advance their period, placing the new
+    dotted rule in slot $k$.
+  \end{itemize}
+  While repeating these three actions, take care to never add
+  duplicate dotted rules to the chart.
+\end{enumerate}
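The three actions above can be sketched compactly in Python. This is a simplified recognizer of our own (not Lark's implementation); it tracks dotted rules as tuples \code{(lhs, rhs, dot, start)} and, for brevity, can miss completions for rules with empty right-hand sides:

```python
# A minimal Earley recognizer: grammar maps each nonterminal to a list
# of right-hand sides (tuples of symbols); tokens is the token list.
def earley_recognize(grammar, start_symbol, tokens):
    n = len(tokens)
    chart = [set() for _ in range(n + 1)]
    # initialization: rules for the start symbol, period at the front
    for rhs in grammar[start_symbol]:
        chart[0].add((start_symbol, rhs, 0, 0))
    for k in range(n + 1):
        agenda = list(chart[k])
        while agenda:
            lhs, rhs, dot, start = agenda.pop()
            if dot < len(rhs) and rhs[dot] in grammar:
                # prediction: add rules for the nonterminal after the period
                for rhs2 in grammar[rhs[dot]]:
                    new = (rhs[dot], rhs2, 0, k)
                    if new not in chart[k]:
                        chart[k].add(new)
                        agenda.append(new)
            elif dot < len(rhs):
                # scanning: the next input token matches the terminal
                if k < n and tokens[k] == rhs[dot]:
                    chart[k + 1].add((lhs, rhs, dot + 1, start))
            else:
                # completion: advance rules in the starting slot whose
                # period is immediately followed by this left-hand side
                for lhs2, rhs2, dot2, start2 in list(chart[start]):
                    if dot2 < len(rhs2) and rhs2[dot2] == lhs:
                        new = (lhs2, rhs2, dot2 + 1, start2)
                        if new not in chart[k]:
                            chart[k].add(new)
                            agenda.append(new)
    # accept when a start rule is complete over the whole input
    return any(lhs == start_symbol and dot == len(rhs) and start == 0
               for lhs, rhs, dot, start in chart[n])
```

The membership checks before each addition implement the rule that no duplicate dotted rules are ever added to the chart.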
 
-\section{The LALR Algorithm}
+We have described how the Earley algorithm recognizes that an input
+string matches a grammar, but we have not described how it builds a
+parse tree. The basic idea is simple, but it turns out that building
+parse trees in an efficient way is more complex, requiring a data
+structure called a shared packed parse forest~\citep{Tomita:1985qr}.
+The simple idea is to attach a partial parse tree to every dotted
+rule.  Initially, the tree node associated with a dotted rule has no
+children. As the period moves to the right, the nodes from the
+subparses are added as children to this tree node.
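The simple idea can be sketched as follows (a hypothetical representation of our own, not the shared packed parse forest): extend each dotted rule with a tuple of the children collected so far, so that scanning appends a token leaf, completion appends a finished subtree, and a rule with the period at the end yields a tree node.

```python
# Each dotted rule carries the children parsed so far:
# (lhs, rhs, dot, start, children).

def scan(item, token):
    # scanning: append the matched token as a leaf
    lhs, rhs, dot, start, children = item
    return (lhs, rhs, dot + 1, start, children + (token,))

def complete(item, finished):
    # completion: append the finished subtree (lhs2, children2)
    lhs, rhs, dot, start, children = item
    return (lhs, rhs, dot + 1, start, children + (finished,))

def node(item):
    # a rule with the period at the end yields a parse-tree node
    lhs, rhs, dot, start, children = item
    assert dot == len(rhs)
    return (lhs, children)
```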
+
+As mentioned at the beginning of this section, the Earley algorithm is
+$O(n^2)$ for unambiguous grammars, which means that it can parse input
+files that contain thousands of tokens in a reasonable amount of time,
+but not millions. In the next section we discuss the LALR(1) parsing
+algorithm, which has time complexity $O(n)$, making it practical to
+use with even the largest of input files.
+
+\section{The LALR(1) Algorithm}
 \label{sec:lalr}
 
+The LALR(1) algorithm consists of a finite automaton and a stack that
+records its progress in parsing the input string.  Each element of the
+stack is a pair: a state number and a grammar symbol (a terminal or
+nonterminal). The symbol characterizes the input that has been parsed
+so far, and the state number is used to remember how to proceed once
+the next symbol's worth of input has been parsed.  Each state in the
+finite automaton represents where the parser stands in the parsing
+process with respect to certain grammar rules. In particular, each
+state is associated with a set of dotted rules.
+
+Figure~\ref{fig:shift-reduce} shows an example LALR(1) parse table
+generated by Lark for the following simple but ambiguous grammar:
+\begin{lstlisting}[escapechar=$]
+exp: INT
+   | exp "+" exp
+stmt: "print" exp
+start: stmt
+\end{lstlisting}
+%% When PLY generates a parse table, it also
+%% outputs a textual representation of the parse table to the file
+%% \texttt{parser.out} which is useful for debugging purposes.
+Consider state 1 in Figure~\ref{fig:shift-reduce}. The parser has just
+read in a \lstinline{PRINT} token, so the top of the stack is
+\lstinline{(1,PRINT)}. The parser is part of the way through parsing
+the input according to grammar rule 1, which is signified by showing
+rule 1 with a period after the \code{PRINT} token and before the
+\code{exp} nonterminal.  A rule with a period in it is called an
+\emph{item}. There are several rules that could apply next, namely
+rules 2 and 3, so state 1 also shows those rules with a period at the
+beginning of their right-hand sides. The edges between states indicate
+which transitions the automaton should make depending on the next input
+token. So, for example, if the next input token is \code{INT} then the
+parser will push \code{INT} and the target state 4 on the stack and
+transition to state 4.  Suppose we are now at the end of the input. In
+state 4 it says we should reduce by rule 3, so we pop from the stack
+the same number of items as the number of symbols in the right-hand
+side of the rule, in this case just one.  We then momentarily jump to
+the state at the top of the stack (state 1) and then follow the goto
+edge that corresponds to the left-hand side of the rule we just
+reduced by, in this case \code{exp}, so we arrive at state 3.  (A
+slightly longer example parse is shown in
+Figure~\ref{fig:shift-reduce}.)
+
+
+\begin{figure}[htbp]
+  \centering
+\includegraphics[width=5.0in]{figs/shift-reduce-conflict}  
+  \caption{An LALR(1) parse table and a trace of an example run.}
+  \label{fig:shift-reduce}
+\end{figure}
+
+In general, the shift-reduce algorithm works as follows. Look at the
+next input token.
+\begin{itemize}
+\item If there is a shift edge for the input token, push the
+  edge's target state and the input token on the stack and proceed to
+  the edge's target state.
+\item If there is a reduce action for the input token, pop $k$
+  elements from the stack, where $k$ is the number of symbols in the
+  right-hand side of the rule being reduced. Jump to the state at the
+  top of the stack and then follow the goto edge for the nonterminal
+  that matches the left-hand side of the rule we're reducing by. Push
+  the edge's target state and the nonterminal on the stack.
+\end{itemize}
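The shift-reduce loop above can be sketched in Python. The table encoding is hypothetical (our own, not Lark's): \code{shift[state][token]} and \code{goto[state][nonterminal]} give target states, \code{reductions[state][token]} gives a rule number, and \code{rules[n]} records each rule's left-hand side and the length of its right-hand side.

```python
def shift_reduce(shift, reductions, goto, rules, accept_states, tokens):
    stack = [(0, None)]            # pairs of (state, symbol)
    pos = 0
    while True:
        state = stack[-1][0]
        token = tokens[pos] if pos < len(tokens) else '$END'
        if token in shift.get(state, {}):
            # shift: push the edge's target state and the token
            stack.append((shift[state][token], token))
            pos += 1
        elif token in reductions.get(state, {}):
            # reduce: pop one pair per symbol in the right-hand side,
            # then follow the goto edge for the rule's left-hand side
            lhs, length = rules[reductions[state][token]]
            for _ in range(length):
                stack.pop()
            stack.append((goto[stack[-1][0]][lhs], lhs))
        elif state in accept_states and token == '$END':
            return True            # input accepted
        else:
            return False           # no action applies: reject

# Hand-built tables for the unambiguous grammar
#   stmt: "print" exp ;  exp: exp "+" INT | INT
rules = {1: ('stmt', 2), 2: ('exp', 3), 3: ('exp', 1)}
shift = {0: {'"print"': 1}, 1: {'INT': 4}, 3: {'"+"': 6}, 6: {'INT': 7}}
reductions = {3: {'$END': 1},
              4: {'"+"': 3, '$END': 3},
              7: {'"+"': 2, '$END': 2}}
goto = {0: {'stmt': 5}, 1: {'exp': 3}}
```

With these tables, parsing \code{print 1 + 2} shifts \code{"print"} and \code{INT}, reduces by rule 3 to an \code{exp}, shifts \code{"+"} and \code{INT}, reduces by rule 2, and finally reduces by rule 1 to reach the accepting state.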
+
+Notice that in state 6 of Figure~\ref{fig:shift-reduce} there is both
+a shift and a reduce action for the token \lstinline{PLUS}, so the
+algorithm does not know which action to take in this case. When a
+state has both a shift and a reduce action for the same token, we say
+there is a \emph{shift/reduce conflict}.  In this case, the conflict
+will arise, for example, when trying to parse the input
+\lstinline{print 1 + 2 + 3}.  After having consumed \lstinline{print 1 + 2}
+the parser will be in state 6, and it will not know whether to
+reduce to form an \emph{exp} of \lstinline{1 + 2}, or whether it
+should proceed by shifting the next \lstinline{+} from the input.
+
+A similar kind of problem, known as a \emph{reduce/reduce} conflict,
+arises when there are two reduce actions in a state for the same
+token. To understand which grammars give rise to shift/reduce and
+reduce/reduce conflicts, it helps to know how the parse table is
+generated from the grammar, which we discuss next.
+
+The parse table is generated one state at a time. State 0 represents
+the start of the parser. We add the grammar rule for the start symbol
+to this state with a period at the beginning of the right-hand side,
+similar to the initialization phase of the Earley parser.  If the
+period appears immediately before another nonterminal, we add all the
+rules with that nonterminal on the left-hand side. Again, we place a
+period at the beginning of the right-hand side of each of the new
+rules. This process, called \emph{state closure}, continues
+until there are no more rules to add (similar to the prediction
+actions of an Earley parser). We then examine each dotted rule in the
+current state $I$. Suppose a dotted rule has the form $A ::=
+\alpha.X\beta$, where $A$ is a nonterminal, $X$ is a symbol, and
+$\alpha$ and $\beta$ are sequences of symbols. We create a new state,
+call it $J$.  If $X$
+is a terminal, we create a shift edge from $I$ to $J$ (analogous to
+scanning in Earley), whereas if $X$ is a nonterminal, we create a
+goto edge from $I$ to $J$.  We then need to add some dotted rules to
+state $J$. We start by adding all dotted rules from state $I$ that
+have the form $B ::= \gamma.X\kappa$ (where $B$ is any nonterminal and
+$\gamma$ and $\kappa$ are arbitrary sequences of symbols), but with
+the period moved past the $X$.  (This is analogous to completion in
+the Earley algorithm.)  We then perform state closure on $J$.  This
+process repeats until there are no more states or edges to add.
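The state closure and edge construction can be sketched as follows. This simplification builds LR(0) items of the form \code{(lhs, rhs, dot)}, ignoring the lookahead computation that distinguishes LALR(1); the function names are our own.

```python
# grammar maps each nonterminal to a list of right-hand sides.

def closure(items, grammar):
    # repeatedly add rules for any nonterminal that follows a period
    items = set(items)
    changed = True
    while changed:
        changed = False
        for lhs, rhs, dot in list(items):
            if dot < len(rhs) and rhs[dot] in grammar:
                for rhs2 in grammar[rhs[dot]]:
                    if (rhs[dot], rhs2, 0) not in items:
                        items.add((rhs[dot], rhs2, 0))
                        changed = True
    return frozenset(items)

def successor(state, X, grammar):
    # follow the shift/goto edge for symbol X: advance the period past
    # X in every item that has X after the period, then take the closure
    moved = {(lhs, rhs, dot + 1)
             for lhs, rhs, dot in state
             if dot < len(rhs) and rhs[dot] == X}
    return closure(moved, grammar)
```

Starting from the closure of the start rule and repeatedly taking successors for every symbol yields all the states and edges of the automaton.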
+
+We then mark states as accepting states if they have a dotted rule
+that is the start rule with a period at the end.  Also, to add
+in the reduce actions, we look for any state containing a dotted rule
+with a period at the end. Let $n$ be the rule number for this dotted
+rule. We then put a reduce $n$ action into that state for every token
+$Y$. For example, in Figure~\ref{fig:shift-reduce} state 4 has a
+dotted rule with a period at the end. We therefore put a reduce by
+rule 3 action into state 4 for every
+token. (Figure~\ref{fig:shift-reduce} does not show a reduce rule for
+\code{INT} in state 4 because this grammar does not allow two
+consecutive \code{INT} tokens in the input. We will not go into how
+this can be figured out, but in any event it does no harm to have a
+reduce rule for \code{INT} in state 4; it just means the input will be
+rejected at a later point in the parsing process.)
+
+\begin{exercise}
+On a piece of paper, walk through the parse table generation 
+process for the grammar in Figure~\ref{fig:parser1} and check
+your results against Figure~\ref{fig:shift-reduce}. 
+\end{exercise}
+
+
 \section{Further Reading}
 
 UNDER CONSTRUCTION