
parsing rough draft complete

Jeremy Siek, 2 years ago
commit 0ef137071a
1 changed file with 35 insertions and 20 deletions

book.tex  +35 -20

@@ -4858,14 +4858,14 @@ rules. This process called \emph{state closure} is continued
 until there are no more rules to add (similar to the prediction
 actions of an Earley parser). We then examine each dotted rule in the
 current state $I$. Suppose a dotted rule has the form $A ::=
-\alpha.X\beta$, where $A$ and $X$ are symbols and $\alpha$ and $\beta$
+s_1.\,X s_2$, where $A$ and $X$ are symbols and $s_1$ and $s_2$
 are sequences of symbols. We create a new state, call it $J$.  If $X$
 is a terminal, we create a shift edge from $I$ to $J$ (analogous to
 scanning in Earley), whereas if $X$ is a nonterminal, we create a
 goto edge from $I$ to $J$.  We then need to add some dotted rules to
 state $J$. We start by adding all dotted rules from state $I$ that
-have the form $B ::= \gamma.X\kappa$ (where $B$ is any nonterminal and
-$\gamma$ and $\kappa$ are arbitrary sequences of symbols), but with
+have the form $B ::= s_1.\,X s_2$ (where $B$ is any nonterminal and
+$s_1$ and $s_2$ are arbitrary sequences of symbols), but with
 the period moved past the $X$.  (This is analogous to completion in
 the Earley algorithm.)  We then perform state closure on $J$.  This
 process repeats until there are no more states or edges to add.
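The closure and goto construction described above can be sketched in a few lines of Python. The tiny grammar, the `(head, body, dot)` item representation, and the function names here are illustrative assumptions, not the book's code:

```python
# A sketch of LR(0) state construction, assuming a tiny hypothetical
# grammar E ::= E + INT | INT (augmented with S ::= E).  A dotted rule
# A ::= s1 . s2 is represented as the triple (A, s1 + s2, len(s1)).

GRAMMAR = {
    'S': [('E',)],
    'E': [('E', '+', 'INT'), ('INT',)],
}
NONTERMINALS = set(GRAMMAR)

def closure(items):
    """State closure: for each dotted rule with a nonterminal X just
    after the period, add a dotted rule X ::= . s for every rule of X."""
    items = set(items)
    changed = True
    while changed:
        changed = False
        for (head, body, dot) in list(items):
            if dot < len(body) and body[dot] in NONTERMINALS:
                for rhs in GRAMMAR[body[dot]]:
                    if (body[dot], rhs, 0) not in items:
                        items.add((body[dot], rhs, 0))
                        changed = True
    return frozenset(items)

def goto(items, X):
    """Follow a shift or goto edge on X: move the period past X in
    every rule of the form A ::= s1 . X s2, then take the closure."""
    moved = {(head, body, dot + 1) for (head, body, dot) in items
             if dot < len(body) and body[dot] == X}
    return closure(moved)
```

Starting from the closure of the augmented start item and repeatedly applying `goto` until no new states appear yields the full state graph.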
@@ -4878,40 +4878,55 @@ rule. We then put a reduce $n$ action into that state for every token
 $Y$. For example, in Figure~\ref{fig:shift-reduce} state 4 has a
 dotted rule with a period at the end. We therefore put a reduce by
 rule 3 action into state 4 for every
-token. (Figure~\ref{fig:shift-reduce} does not show a reduce rule for
-\code{INT} in state 4 because this grammar does not allow two
-consecutive \code{INT} tokens in the input. We will not go into how
-this can be figured out, but in any event it does no harm to have a
-reduce rule for \code{INT} in state 4; it just means the input will be
-rejected at a later point in the parsing process.)
+token.
+%% (Figure~\ref{fig:shift-reduce} does not show a reduce rule for
+%% \code{INT} in state 4 because this grammar does not allow two
+%% consecutive \code{INT} tokens in the input. We will not go into how
+%% this can be figured out, but in any event it does no harm to have a
+%% reduce rule for \code{INT} in state 4; it just means the input will be
+%% rejected at a later point in the parsing process.)
+
+When inserting reduce actions, take care to spot any shift/reduce or
+reduce/reduce conflicts. If there are any, abort the construction of
+the parse table.
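The conflict check can be sketched as follows; the `(state, token)` keyed table and the `('shift', n)` / `('reduce', n)` action encoding are illustrative assumptions:

```python
# A sketch of conflict detection while filling the action table.
# Actions are hypothetically encoded as ('shift', state) or
# ('reduce', rule) pairs, keyed by (state, token).

def add_action(table, state, token, action):
    """Record an action, aborting on a shift/reduce or reduce/reduce
    conflict (two different actions for the same state and token)."""
    existing = table.setdefault((state, token), action)
    if existing != action:
        kind = ('shift/reduce'
                if 'shift' in (existing[0], action[0])
                else 'reduce/reduce')
        raise ValueError(f'{kind} conflict in state {state} on {token}')
```

Inserting the same action twice is harmless; only two different actions in the same cell abort the construction.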
+
 
 \begin{exercise}
-On a piece of paper, walk through the parse table generation 
-process for the grammar in Figure~\ref{fig:parser1} and check
-your results against Figure~\ref{fig:shift-reduce}. 
+  \normalfont\normalsize
+%
+On a piece of paper, walk through the parse table generation process
+for the grammar at the top of Figure~\ref{fig:shift-reduce} and check
+your results against the parse table in Figure~\ref{fig:shift-reduce}.
 \end{exercise}
 
 
 \begin{exercise}
+  \normalfont\normalsize
+%
   Change the parser in your compiler for \LangVar{} to set the
   \code{parser} option of Lark to \code{'lalr'}. Test your compiler on
   all the \LangVar{} programs that you have created. In doing so, Lark
   may signal an error due to shift/reduce or reduce/reduce conflicts
   in your grammar. If so, change your Lark grammar for \LangVar{} to
   remove those conflicts.
-
 \end{exercise}
 
 
 \section{Further Reading}
 
-UNDER CONSTRUCTION
-
-finite automata
-
-
-
-
+In this chapter we have just scratched the surface of the field of
+parsing, with the study of a very general but less efficient algorithm
+(Earley) and a more limited but highly efficient algorithm
+(LALR). Many more algorithms, and classes of grammars, fall between
+these two. We refer the reader to \citet{Aho:2006wb} for a thorough
+treatment of parsing.
+
+Regarding lexical analysis, we described the specification language
+(regular expressions) but not the algorithms for recognizing them. In
+short, regular expressions can be translated to nondeterministic
+finite automata, which in turn can be translated to deterministic
+finite automata. We refer the reader again to \citet{Aho:2006wb} for
+the details of lexical analysis.
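The NFA-to-DFA step (the subset construction) can be sketched briefly. The hand-built NFA below, for the regular expression \code{ab|ac}, is an illustrative assumption, and epsilon transitions are omitted for brevity:

```python
# A sketch of the subset construction, assuming a small hand-built NFA
# for the regular expression ab|ac.  Epsilon transitions are omitted
# for brevity; real lexer generators compute epsilon-closures as well.

NFA = {  # state -> {symbol: set of successor states}
    0: {'a': {1, 2}},        # nondeterministic: two successors on 'a'
    1: {'b': {3}},
    2: {'c': {3}},
}

def subset_construction(start):
    """Build a deterministic automaton whose states are sets of NFA
    states, following every NFA transition from each set at once."""
    dfa = {}
    worklist = [frozenset({start})]
    while worklist:
        S = worklist.pop()
        if S in dfa:
            continue
        dfa[S] = {}
        symbols = {sym for q in S for sym in NFA.get(q, {})}
        for sym in symbols:
            T = frozenset(t for q in S for t in NFA.get(q, {}).get(sym, ()))
            dfa[S][sym] = T
            worklist.append(T)
    return dfa
```

Each DFA state is a set of NFA states, so the nondeterministic choice on \code{a} becomes the single deterministic state $\{1, 2\}$.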
 
 \fi}