edits to parsing chapter

Jeremy Siek, 2 years ago
commit d6c8fe5f73

1 file changed, 104 insertions(+), 105 deletions(-)

book.tex (+104 −105)
@@ -4167,60 +4167,59 @@ Each token includes a field for its \code{type}, such as \code{'INT'},
 and a field for its \code{value}, such as \code{'1'}.
 
 Following in the tradition of \code{lex}~\citep{Lesk:1975uq}, the
-specification language for Lark's lexical analysis generator is one
-regular expression for each type of token. The term \emph{regular}
-comes from the term \emph{regular languages}, which are the languages
-that can be recognized by a finite automata. A \emph{regular
-  expression} is a pattern formed of the following core
-elements:\index{subject}{regular expression}\footnote{Regular
-  expressions traditionally include the empty regular expression that
-  matches any zero-length part of a string, but Lark does not support
-  the empty regular expression.}
+specification language for Lark's lexer is one regular expression for
+each type of token. The term \emph{regular} comes from the term
+\emph{regular languages}, which are the languages that can be
+recognized by a finite state machine. A \emph{regular expression} is a
+pattern formed of the following core elements:\index{subject}{regular
+  expression}\footnote{Regular expressions traditionally include the
+  empty regular expression that matches any zero-length part of a
+  string, but Lark does not support the empty regular expression.}
 \begin{itemize}
 \item A single character $c$ is a regular expression and it only
   matches itself. For example, the regular expression \code{a} only
   matches with the string \code{'a'}.
 
-\item Two regular expressions separated by a vertical bar $R_1 \mid
+\item Two regular expressions separated by a vertical bar $R_1 \ttm{|}
   R_2$ form a regular expression that matches any string that matches
   $R_1$ or $R_2$. For example, the regular expression \code{a|c}
   matches the string \code{'a'} and the string \code{'c'}.
 
 \item Two regular expressions in sequence $R_1 R_2$ form a regular
   expression that matches any string that can be formed by
-  concatenating two strings, where the first matches $R_1$
-  and the second matches $R_2$. For example, the regular expression
+  concatenating two strings, where the first string matches $R_1$ and
+  the second string matches $R_2$. For example, the regular expression
   \code{(a|c)b} matches the strings \code{'ab'} and \code{'cb'}.
   (Parentheses can be used to control the grouping of operators within
   a regular expression.)
 
-\item A regular expression followed by an asterisks $R*$ (called
+\item A regular expression followed by an asterisk $R\ttm{*}$ (called
   Kleene closure) is a regular expression that matches any string that
   can be formed by concatenating zero or more strings that each match
   the regular expression $R$.  For example, the regular expression
-  \code{"((a|c)b)*"} matches the strings \code{'abcbab'} and
-  \code{''}, but not \code{'abc'}.
+  \code{"((a|c)b)*"} matches the string \code{'abcbab'} but not
+  \code{'abc'}.
 \end{itemize}
 
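The three core elements behave the same way in Python's \code{re} module (the notation Lark's terminals are built on), so the examples above can be checked directly. A small sketch:

```python
# Checking the core regular-expression elements with Python's re module.
import re

# A single character matches only itself.
assert re.fullmatch(r"a", "a") is not None
assert re.fullmatch(r"a", "b") is None

# Alternation: a|c matches 'a' and 'c' but nothing else.
assert re.fullmatch(r"a|c", "a") is not None
assert re.fullmatch(r"a|c", "c") is not None
assert re.fullmatch(r"a|c", "b") is None

# Concatenation (with parentheses for grouping): (a|c)b matches 'ab' and 'cb'.
assert re.fullmatch(r"(a|c)b", "ab") is not None
assert re.fullmatch(r"(a|c)b", "cb") is not None

# Kleene closure: ((a|c)b)* matches 'abcbab' but not 'abc'.
assert re.fullmatch(r"((a|c)b)*", "abcbab") is not None
assert re.fullmatch(r"((a|c)b)*", "abc") is None
```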
-For our convenience, Lark also accepts an extended set of regular
-expressions that are automatically translated into the core regular
-expressions.
+For our convenience, Lark also accepts the following extended set of
+regular expressions that are automatically translated into the core
+regular expressions.
 
 \begin{itemize}
 \item A set of characters enclosed in square brackets $[c_1 c_2 \ldots
   c_n]$ is a regular expression that matches any one of the
   characters. So $[c_1 c_2 \ldots c_n]$ is equivalent to
   the regular expression $c_1\mid c_2\mid \ldots \mid c_n$.
-\item A range of characters enclosed in square brackets $[c_1-c_2]$ is
+\item A range of characters enclosed in square brackets $[c_1\ttm{-}c_2]$ is
   a regular expression that matches any character between $c_1$ and
   $c_2$, inclusive. For example, \code{[a-z]} matches any lowercase
   letter in the alphabet.
-\item A regular expression followed by the plus symbol $R+$
+\item A regular expression followed by the plus symbol $R\ttm{+}$
   is a regular expression that matches any string that can
   be formed by concatenating one or more strings that each match $R$.
   So $R+$ is equivalent to $R(R*)$. For example, \code{[a-z]+}
   matches \code{'b'} and \code{'bzca'}.
-\item A regular expression followed by a question mark $R?$
+\item A regular expression followed by a question mark $R\ttm{?}$
   is a regular expression that matches any string that either
   matches $R$ or that is the empty string.
   For example, \code{a?b} matches both \code{'ab'} and \code{'b'}.
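The extended forms desugar to the core elements, and Python's \code{re} module supports the same notation, so the examples above can be checked the same way:

```python
# Checking the extended regular-expression forms with Python's re module.
import re

# Character set and range: [a-z] is shorthand for a|b|...|z.
assert re.fullmatch(r"[a-z]", "q") is not None
assert re.fullmatch(r"[a-z]", "Q") is None

# R+ is equivalent to R(R*): one or more repetitions.
assert re.fullmatch(r"[a-z]+", "bzca") is not None
assert re.fullmatch(r"[a-z]+", "") is None

# R? is optional: a?b matches both 'ab' and 'b'.
assert re.fullmatch(r"a?b", "ab") is not None
assert re.fullmatch(r"a?b", "b") is not None
```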
@@ -4253,9 +4252,11 @@ and they can be used to combine regular expressions, outside the
 In section~\ref{sec:grammar} we learned how to use grammar rules to
 specify the abstract syntax of a language. We now take a closer look
 at using grammar rules to specify the concrete syntax. Recall that
-each rule has a left-hand side and a right-hand side. However, for
-concrete syntax, each right-hand side expresses a pattern for a
-string, instead of a patter for an abstract syntax tree. In
+each rule has a left-hand side and a right-hand side where the
+left-hand side is a nonterminal and the right-hand side is a pattern
+that defines what can be parsed as that nonterminal.
+For concrete syntax, each right-hand side expresses a pattern for a
+string, instead of a pattern for an abstract syntax tree. In
 particular, each right-hand side is a sequence of
 \emph{symbols}\index{subject}{symbol}, where a symbol is either a
 terminal or nonterminal. A \emph{terminal}\index{subject}{terminal} is
@@ -4297,13 +4298,13 @@ lang_int: stmt_list
 \end{minipage}
 \end{center}
 
-Let us begin by discussing the rule \code{exp: INT}.  In
+Let us begin by discussing the rule \code{exp: INT}, which says that
+if the lexer matches a string to \code{INT}, then the parser also
+categorizes the string as an \code{exp}.  Recall that in
 Section~\ref{sec:grammar} we defined the corresponding \Int{}
 nonterminal with an English sentence. Here we specify \code{INT} more
 formally using a type of token \code{INT} and its regular expression
-\code{"-"? DIGIT+}. Thus, the rule \code{exp: INT} says that if the
-lexer matches a string to \code{INT}, then the parser also categorizes
-the string as an \code{exp}.
+\code{"-"? DIGIT+}.
 
 The rule \code{exp: exp "+" exp} says that any string that matches
 \code{exp}, followed by the \code{+} character, followed by another
@@ -4311,8 +4312,8 @@ string that matches \code{exp}, is itself an \code{exp}.  For example,
 the string \code{'1+3'} is an \code{exp} because \code{'1'} and
 \code{'3'} are both \code{exp} by the rule \code{exp: INT}, and then
 the rule for addition applies to categorize \code{'1+3'} as an
-\Exp{}. We can visualize the application of grammar rules to parse a
-string using a \emph{parse tree}\index{subject}{parse tree}. Each
+\code{exp}. We can visualize the application of grammar rules to parse
+a string using a \emph{parse tree}\index{subject}{parse tree}. Each
 internal node in the tree is an application of a grammar rule and is
 labeled with its left-hand side nonterminal. Each leaf node is a
 substring of the input program.  The parse tree for \code{'1+3'} is
@@ -4363,12 +4364,12 @@ exp: INT                    -> int
    | "(" exp ")"            -> paren
 
 stmt: "print" "(" exp ")"   -> print
-    | exp                   -> expr
+    | exp                    -> expr
 
-stmt_list:                   -> empty_stmt
+stmt_list:                      -> empty_stmt
     | stmt NEWLINE stmt_list -> add_stmt
 
-lang_int: stmt_list          -> module
+lang_int: stmt_list             -> module
 \end{lstlisting}
 \end{minipage}
 \end{center}
@@ -4510,10 +4511,10 @@ WS: /[ \t\f\r\n]/+
 %ignore WS
 \end{lstlisting}
 Change your compiler from chapter~\ref{ch:Lvar} to use your
-Lark-generated parser instead of using the \code{parse} function from
+Lark parser instead of using the \code{parse} function from
 the \code{ast} module. Test your compiler on all of the \LangVar{}
 programs that you have created and create four additional programs
-that would reveal ambiguities in your grammar.
+that test for ambiguities in your grammar.
 \end{exercise}
 
 
@@ -4521,14 +4522,14 @@ that would reveal ambiguities in your grammar.
 \label{sec:earley}
 
 In this section we discuss the parsing algorithm of
-\citet{Earley:1970ly}, which is the default algorithm used by Lark.
-The algorithm is powerful in that it can handle any context-free
-grammar, which makes it easy to use. However, it is not the most
-efficient parsing algorithm: it is $O(n^3)$ for ambiguous grammars and
-$O(n^2)$ for unambiguous grammars, where $n$ is the number of tokens
-in the input string~\citep{Hopcroft06:_automata}.  In
-section~\ref{sec:lalr} we learn about the LALR(1) algorithm, which is
-more efficient but cannot handle all context-free grammars.
+\citet{Earley:1970ly}, the default algorithm used by Lark.  The
+algorithm is powerful in that it can handle any context-free grammar,
+which makes it easy to use. However, it is not the most efficient
+parsing algorithm: it is $O(n^3)$ for ambiguous grammars and $O(n^2)$
+for unambiguous grammars, where $n$ is the number of tokens in the
+input string~\citep{Hopcroft06:_automata}.  In section~\ref{sec:lalr}
+we learn about the LALR(1) algorithm, which is more efficient but
+cannot handle all context-free grammars.
 
 The Earley algorithm can be viewed as an interpreter; it treats the
 grammar as the program being interpreted and it treats the concrete
@@ -4564,7 +4565,7 @@ grammar in figure~\ref{fig:Lint-lark-grammar}, we place
 \begin{lstlisting}
   lang_int: . stmt_list         (0)
 \end{lstlisting}
-in slot $0$ of the chart. The algorithm then proceeds to with
+in slot $0$ of the chart. The algorithm then proceeds with
 \emph{prediction} actions in which it adds more dotted rules to the
 chart based on which nonterminals come immediately after a period. In
 the above, the nonterminal \code{stmt\_list} appears after a period,
@@ -4582,7 +4583,7 @@ stmt:  .  "print" "("  exp ")"   (0)
 stmt:  .  exp                    (0)
 \end{lstlisting}
 This reveals yet more opportunities for prediction, so we add the grammar
-rules for \code{exp} and \code{exp\_hi}.
+rules for \code{exp} and \code{exp\_hi} to slot $0$.
 \begin{lstlisting}[escapechar=$]
 exp: . exp "+" exp_hi         (0)
 exp: . exp "-" exp_hi         (0)
@@ -4596,14 +4597,14 @@ exp_hi: . "(" exp ")"         (0)
 We have exhausted the opportunities for prediction, so the algorithm
 proceeds to \emph{scanning}, in which we inspect the next input token
 and look for a dotted rule at the current position that has a matching
-terminal following the period. In our running example, the first input
-token is \code{"print"} so we identify the rule in slot $0$ of
-the chart whose dot comes before \code{"print"}:
+terminal immediately following the period. In our running example, the
+first input token is \code{"print"} so we identify the rule in slot
+$0$ of the chart where \code{"print"} follows the period:
 \begin{lstlisting}
 stmt:  .  "print" "("  exp ")"       (0)
 \end{lstlisting}
-and add the following rule to slot $1$ of the chart, with the period
-moved forward past \code{"print"}.
+We advance the period past \code{"print"} and add the resulting rule
+to slot $1$ of the chart:
 \begin{lstlisting}
 stmt:  "print" . "("  exp ")"        (0)
 \end{lstlisting}
@@ -4629,9 +4630,9 @@ exp_hi: . "input_int" "(" ")" (2)
 exp_hi: . "-" exp_hi          (2)
 exp_hi: . "(" exp ")"         (2)
 \end{lstlisting}
-With that prediction complete, we return to scanning, noting that the
+With this prediction complete, we return to scanning, noting that the
 next input token is \code{"1"} which the lexer parses as an
-\code{INT}. There is a matching rule is slot $2$:
+\code{INT}. There is a matching rule in slot $2$:
 \begin{lstlisting}
 exp_hi: . INT             (2)
 \end{lstlisting}
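The prediction, scanning, and completion steps being traced here can be sketched as a tiny Earley recognizer. This is an illustrative toy (a hypothetical two-rule grammar over pre-lexed token names), not Lark's implementation:

```python
# A minimal Earley recognizer. Dotted rules are (lhs, rhs, dot, start)
# tuples; chart[k] is the set of dotted rules in slot k.
GRAMMAR = {
    "stmt": [("print", "(", "exp", ")")],
    "exp":  [("INT",)],
}

def earley_recognize(tokens, start="stmt"):
    chart = [set() for _ in range(len(tokens) + 1)]
    for rhs in GRAMMAR[start]:
        chart[0].add((start, rhs, 0, 0))
    for k in range(len(tokens) + 1):
        changed = True
        while changed:  # run prediction/completion to a fixed point
            changed = False
            for (lhs, rhs, dot, origin) in list(chart[k]):
                if dot < len(rhs) and rhs[dot] in GRAMMAR:
                    # Prediction: add rules for the nonterminal after the dot.
                    for alt in GRAMMAR[rhs[dot]]:
                        item = (rhs[dot], alt, 0, k)
                        if item not in chart[k]:
                            chart[k].add(item); changed = True
                elif dot == len(rhs):
                    # Completion: advance dots waiting on this nonterminal
                    # in the slot where the finished rule started.
                    for (l2, r2, d2, o2) in list(chart[origin]):
                        if d2 < len(r2) and r2[d2] == lhs:
                            item = (l2, r2, d2 + 1, o2)
                            if item not in chart[k]:
                                chart[k].add(item); changed = True
        if k < len(tokens):
            # Scanning: move the dot past a matching terminal into slot k+1.
            for (lhs, rhs, dot, origin) in list(chart[k]):
                if dot < len(rhs) and rhs[dot] not in GRAMMAR \
                        and rhs[dot] == tokens[k]:
                    chart[k + 1].add((lhs, rhs, dot + 1, origin))
    return any((start, rhs, len(rhs), 0) in chart[len(tokens)]
               for rhs in GRAMMAR[start])

print(earley_recognize(["print", "(", "INT", ")"]))   # True
```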
@@ -4644,7 +4645,7 @@ the end of a dotted rule, we recognize that the substring
 has matched the nonterminal on the left-hand side of the rule, in this case
 \code{exp\_hi}. We therefore need to advance the periods in any dotted
 rules in slot $2$ (the starting position for the finished rule) if
-period is immediately followed by \code{exp\_hi}. So we identify
+the period is immediately followed by \code{exp\_hi}. So we identify
 \begin{lstlisting}
 exp: . exp_hi                 (2)
 \end{lstlisting}
@@ -4738,17 +4739,16 @@ algorithm.
 \item The algorithm repeatedly applies the following three kinds of
   actions for as long as there are opportunities to do so.
   \begin{itemize}
-  \item Prediction: if there is a dotted rule in slot $k$ whose period
-    comes before a nonterminal, add all the rules for that nonterminal
-    into slot $k$, placing a period at the beginning of their
-    right-hand sides, and recording their starting position as
-    $k$.
+  \item Prediction: if there is a rule in slot $k$ whose period comes
+    before a nonterminal, add the rules for that nonterminal into slot
+    $k$, placing a period at the beginning of their right-hand sides
+    and recording their starting position as $k$.
   \item Scanning: If the token at position $k$ of the input string
     matches the symbol after the period in a dotted rule in slot $k$
-    of the chart, advance the prior in the dotted rule, adding
+    of the chart, advance the period in the dotted rule, adding
     the result to slot $k+1$.
   \item Completion: If a dotted rule in slot $k$ has a period at the
-    end, consider the rules in the slot corresponding to the starting
+    end, inspect the rules in the slot corresponding to the starting
     position of the completed rule. If any of those rules have a
     nonterminal following their period that matches the left-hand side
     of the completed rule, then advance their period, placing the new
@@ -4766,23 +4766,28 @@ shared packed parse forest~\citep{Tomita:1985qr}.  The simple idea is
 to attach a partial parse tree to every dotted rule in the chart.
 Initially, the tree node associated with a dotted rule has no
 children. As the period moves to the right, the nodes from the
-subparses are added as children to this tree node.
+subparses are added as children to the tree node.
 
 As mentioned at the beginning of this section, the Earley algorithm is
 $O(n^2)$ for unambiguous grammars, which means that it can parse input
 files that contain thousands of tokens in a reasonable amount of time,
-but not millions. In the next section we discuss the LALR(1) parsing
-algorithm, which has time complexity $O(n)$, making it practical to
-use with even the largest of input files.
+but not millions.
+%
+In the next section we discuss the LALR(1) parsing algorithm, which is
+efficient enough to use with even the largest of input files.
+
 
 \section{The LALR(1) Algorithm}
 \label{sec:lalr}
 
 The LALR(1) algorithm~\citep{DeRemer69,Anderson73} can be viewed as a
 two-phase approach in which it first compiles the grammar into a state
-machine and then runs the state machine to parse an input string.
+machine and then runs the state machine to parse an input string.  The
+second phase has time complexity $O(n)$ where $n$ is the number of
+tokens in the input, so LALR(1) is the best one could hope for with
+respect to efficiency.
 %
-A particularly influential implementation of LALR(1) was the
+A particularly influential implementation of LALR(1) is the
 \texttt{yacc} parser generator by \citet{Johnson:1979qy}, which stands
 for Yet Another Compiler Compiler.
 %
@@ -4806,25 +4811,24 @@ stmt: "print" exp
 start: stmt
 \end{lstlisting}
 Consider state 1 in Figure~\ref{fig:shift-reduce}. The parser has just
-read in a \lstinline{PRINT} token, so the top of the stack is
-\lstinline{(1,PRINT)}. The parser is part of the way through parsing
+read in a \lstinline{"print"} token, so the top of the stack is
+\lstinline{(1,"print")}. The parser is part of the way through parsing
 the input according to grammar rule 1, which is signified by showing
-rule 1 with a period after the \code{PRINT} token and before the
-\code{exp} nonterminal.  A rule with a period in it is called an
-\emph{item}. There are several rules that could apply next, both rule
-2 and 3, so state 1 also shows those rules with a period at the
-beginning of their right-hand sides. The edges between states indicate
-which transitions the machine should make depending on the next input
-token. So, for example, if the next input token is \code{INT} then the
-parser will push \code{INT} and the target state 4 on the stack and
-transition to state 4.  Suppose we are now at the end of the input. In
-state 4 it says we should reduce by rule 3, so we pop from the stack
-the same number of items as the number of symbols in the right-hand
-side of the rule, in this case just one.  We then momentarily jump to
-the state at the top of the stack (state 1) and then follow the goto
-edge that corresponds to the left-hand side of the rule we just
-reduced by, in this case \code{exp}, so we arrive at state 3.  (A
-slightly longer example parse is shown in
+rule 1 with a period after the \code{"print"} token and before the
+\code{exp} nonterminal. There are several rules that could apply next,
+both rules 2 and 3, so state 1 also shows those rules with a period at
+the beginning of their right-hand sides. The edges between states
+indicate which transitions the machine should make depending on the
+next input token. So, for example, if the next input token is
+\code{INT} then the parser will push \code{INT} and the target state 4
+on the stack and transition to state 4.  Suppose we are now at the end
+of the input. In state 4 it says we should reduce by rule 3, so we pop
+from the stack the same number of items as the number of symbols in
+the right-hand side of the rule, in this case just one.  We then
+momentarily jump to the state at the top of the stack (state 1) and
+then follow the goto edge that corresponds to the left-hand side of
+the rule we just reduced by, in this case \code{exp}, so we arrive at
+state 3.  (A slightly longer example parse is shown in
 Figure~\ref{fig:shift-reduce}.)
 
 \begin{figure}[htbp]
@@ -4834,18 +4838,19 @@ Figure~\ref{fig:shift-reduce}.)
   \label{fig:shift-reduce}
 \end{figure}
 
-In general, the algorithm works as follows. Look at the next input
-token.
+In general, the algorithm works as follows. Set the current state to
+state $0$. Then repeat the following, looking at the next input token.
 \begin{itemize}
-\item If there there is a shift edge for the input token, push the
-  edge's target state and the input token on the stack and proceed to
-  the edge's target state.
-\item If there is a reduce action for the input token, pop $k$
-  elements from the stack, where $k$ is the number of symbols in the
-  right-hand side of the rule being reduced. Jump to the state at the
-  top of the stack and then follow the goto edge for the nonterminal
-  that matches the left-hand side of the rule that we reducing
-  by. Push the edge's target state and the nonterminal on the stack.
+\item If there is a shift edge for the input token in the
+  current state, push the edge's target state and the input token on
+  the stack and proceed to the edge's target state.
+\item If there is a reduce action for the input token in the current
+  state, pop $k$ elements from the stack, where $k$ is the number of
+  symbols in the right-hand side of the rule being reduced. Jump to
+  the state at the top of the stack and then follow the goto edge for
+  the nonterminal that matches the left-hand side of the rule that we
+  are reducing by. Push the edge's target state and the nonterminal on
+  the stack.
 \end{itemize}
 
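The driver loop above can be sketched as a table-driven parser. The ACTION/GOTO tables below are hand-written for an unambiguous variant of the example grammar (\code{exp "+" INT} rather than \code{exp "+" exp}) so they contain no conflicts; the state numbers are illustrative and do not match the figure:

```python
# A table-driven shift-reduce driver (illustrative tables, hand-written
# for the toy grammar below; not generated by an LALR(1) construction).
#   rule 1: stmt ::= "print" exp
#   rule 2: exp  ::= exp "+" INT
#   rule 3: exp  ::= INT
RULES = {1: ("stmt", 2), 2: ("exp", 3), 3: ("exp", 1)}  # lhs, rhs length

ACTION = {
    (0, "print"): ("shift", 1),
    (1, "INT"):   ("shift", 4),
    (3, "+"):     ("shift", 5),
    (3, "$"):     ("reduce", 1),
    (4, "+"):     ("reduce", 3),
    (4, "$"):     ("reduce", 3),
    (5, "INT"):   ("shift", 6),
    (6, "+"):     ("reduce", 2),
    (6, "$"):     ("reduce", 2),
    (2, "$"):     ("accept", None),
}
GOTO = {(0, "stmt"): 2, (1, "exp"): 3}

def parse(tokens):
    tokens = tokens + ["$"]          # end-of-input marker
    stack = [0]                      # alternating states and symbols
    pos = 0
    while True:
        state, tok = stack[-1], tokens[pos]
        act = ACTION.get((state, tok))
        if act is None:
            return False             # no action: reject the input
        kind, arg = act
        if kind == "shift":
            stack += [tok, arg]
            pos += 1
        elif kind == "reduce":
            lhs, n = RULES[arg]
            del stack[len(stack) - 2 * n:]   # pop n (symbol, state) pairs
            s = stack[-1]                    # state now on top
            stack += [lhs, GOTO[(s, lhs)]]   # follow the goto edge
        else:
            return True              # accept

print(parse(["print", "INT", "+", "INT"]))   # True
```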
 Notice that in state 6 of Figure~\ref{fig:shift-reduce} there is both
@@ -4856,7 +4861,7 @@ there is a \emph{shift/reduce conflict}.  In this case, the conflict
 will arise, for example, when trying to parse the input
 \lstinline{print 1 + 2 + 3}. After having consumed \lstinline{print 1 + 2}
 the parser will be in state 6, and it will not know whether to
-reduce to form an \emph{exp} of \lstinline{1 + 2}, or whether it
+reduce to form an \code{exp} of \lstinline{1 + 2}, or whether it
 should proceed by shifting the next \lstinline{+} from the input.
 
 A similar kind of problem, known as a \emph{reduce/reduce} conflict,
@@ -4872,7 +4877,7 @@ similar to the initialization phase of the Earley parser.  If the
 period appears immediately before another nonterminal, we add all the
 rules with that nonterminal on the left-hand side. Again, we place a
 period at the beginning of the right-hand side of each of the new
-rules. This process called \emph{state closure} is continued
+rules. This process, called \emph{state closure}, is continued
 until there are no more rules to add (similar to the prediction
 actions of an Earley parser). We then examine each dotted rule in the
 current state $I$. Suppose a dotted rule has the form $A ::=
@@ -4897,12 +4902,6 @@ $Y$. For example, in Figure~\ref{fig:shift-reduce} state 4 has an
 dotted rule with a period at the end. We therefore put a reduce by
 rule 3 action into state 4 for every
 token.
-%% (Figure~\ref{fig:shift-reduce} does not show a reduce rule for
-%% \code{INT} in state 4 because this grammar does not allow two
-%% consecutive \code{INT} tokens in the input. We will not go into how
-%% this can be figured out, but in any event it does no harm to have a
-%% reduce rule for \code{INT} in state 4; it just means the input will be
-%% rejected at a later point in the parsing process.)
 
 When inserting reduce actions, take care to spot any shift/reduce or
 reduce/reduce conflicts. If there are any, abort the construction of
@@ -5177,8 +5176,8 @@ During liveness analysis we know which variables are call-live because
 we compute which variables are in use at every instruction
 (section~\ref{sec:liveness-analysis-Lvar}). When we build the
 interference graph (section~\ref{sec:build-interference}), we can
-place an edge between each call-live variable and the caller-saved
-registers in the interference graph. This will prevent the graph
+place an edge in the interference graph between each call-live
+variable and the caller-saved registers. This will prevent the graph
 coloring algorithm from assigning call-live variables to caller-saved
 registers.
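As a sketch, adding those edges over a dictionary-of-sets interference graph might look like the following (the register list is the x86-64 caller-saved set; the variable names and helper name are illustrative, not the book's code):

```python
# Add interference edges between call-live variables and caller-saved
# registers, so graph coloring avoids assigning them those registers.
CALLER_SAVED = ["rax", "rcx", "rdx", "rsi", "rdi",
                "r8", "r9", "r10", "r11"]

def add_call_live_edges(interference, live_after_call):
    """interference: dict mapping each node to the set of its neighbors.
    live_after_call: variables live across a call instruction."""
    for var in live_after_call:
        for reg in CALLER_SAVED:
            interference.setdefault(var, set()).add(reg)
            interference.setdefault(reg, set()).add(var)
    return interference

graph = add_call_live_edges({}, {"x", "y"})
```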