parsing rough draft complete

Jeremy Siek 2 years ago
parent
commit
0ef137071a
1 changed file with 35 additions and 20 deletions

book.tex (+35 -20)

@@ -4858,14 +4858,14 @@ rules. This process called \emph{state closure} is continued
 until there are no more rules to add (similar to the prediction
 actions of an Earley parser). We then examine each dotted rule in the
 current state $I$. Suppose a dotted rule has the form $A ::=
-\alpha.X\beta$, where $A$ and $X$ are symbols and $\alpha$ and $\beta$
+s_1.\,X s_2$, where $A$ and $X$ are symbols and $s_1$ and $s_2$
 are sequences of symbols. We create a new state, call it $J$.  If $X$
 is a terminal, we create a shift edge from $I$ to $J$ (analogous to
 scanning in Earley), whereas if $X$ is a nonterminal, we create a
 goto edge from $I$ to $J$.  We then need to add some dotted rules to
 state $J$. We start by adding all dotted rules from state $I$ that
-have the form $B ::= \gamma.X\kappa$ (where $B$ is any nonterminal and
-$\gamma$ and $\kappa$ are arbitrary sequences of symbols), but with
+have the form $B ::= s_1.\,Xs_2$ (where $B$ is any nonterminal and
+$s_1$ and $s_2$ are arbitrary sequences of symbols), but with
 the period moved past the $X$.  (This is analogous to completion in
 the Earley algorithm.)  We then perform state closure on $J$.  This
 process repeats until there are no more states or edges to add.
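
Review note on the hunk above: the state-construction loop it describes (closure, then one shift or goto edge per symbol after the period) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the book's implementation. The toy grammar and the names `Item`, `GRAMMAR`, `closure`, `goto`, and `build_states` are hypothetical, and the grammar is not the one from the figure referenced in the text.

```python
from typing import NamedTuple

class Item(NamedTuple):
    """A dotted rule: lhs ::= rhs[:dot] . rhs[dot:] (hypothetical helper type)."""
    lhs: str
    rhs: tuple
    dot: int

# Hypothetical toy grammar (an assumption for illustration only).
# Keys are nonterminals; any symbol that is not a key is a terminal.
GRAMMAR = {
    "start": [("exp",)],
    "exp": [("exp", "PLUS", "INT"), ("INT",)],
}

def closure(items):
    """State closure: keep adding rules for nonterminals that follow a period."""
    state = set(items)
    worklist = list(items)
    while worklist:
        item = worklist.pop()
        if item.dot < len(item.rhs):
            sym = item.rhs[item.dot]
            for rhs in GRAMMAR.get(sym, []):  # terminals have no rules
                new = Item(sym, rhs, 0)
                if new not in state:
                    state.add(new)
                    worklist.append(new)
    return frozenset(state)

def goto(state, sym):
    """Move the period past sym in every matching dotted rule, then close."""
    moved = [Item(i.lhs, i.rhs, i.dot + 1)
             for i in state
             if i.dot < len(i.rhs) and i.rhs[i.dot] == sym]
    return closure(moved)

def build_states():
    """Repeat until there are no more states or edges to add."""
    start = closure([Item("start", rhs, 0) for rhs in GRAMMAR["start"]])
    states = [start]
    edges = {}  # (state index, symbol) -> (edge kind, target state index)
    worklist = [start]
    while worklist:
        current = worklist.pop()
        i = states.index(current)
        symbols = {it.rhs[it.dot] for it in current if it.dot < len(it.rhs)}
        for sym in sorted(symbols):  # sorted only to make numbering repeatable
            target = goto(current, sym)
            if target not in states:
                states.append(target)
                worklist.append(target)
            kind = "goto" if sym in GRAMMAR else "shift"
            edges[(i, sym)] = (kind, states.index(target))
    return states, edges

states, edges = build_states()
for (src, sym), (kind, dst) in sorted(edges.items()):
    print(f"state {src} --{kind} {sym}--> state {dst}")
```

In this sketch, a terminal after the period yields a shift edge and a nonterminal yields a goto edge, matching the prose. The state numbers depend on traversal order and carry no meaning beyond this example.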
@@ -4878,40 +4878,55 @@ rule. We then put a reduce $n$ action into that state for every token
 $Y$. For example, in Figure~\ref{fig:shift-reduce} state 4 has a
 dotted rule with a period at the end. We therefore put a reduce by
 rule 3 action into state 4 for every
-token. (Figure~\ref{fig:shift-reduce} does not show a reduce rule for
-\code{INT} in state 4 because this grammar does not allow two
-consecutive \code{INT} tokens in the input. We will not go into how
-this can be figured out, but in any event it does no harm to have a
-reduce rule for \code{INT} in state 4; it just means the input will be
-rejected at a later point in the parsing process.)
+token.
+%% (Figure~\ref{fig:shift-reduce} does not show a reduce rule for
+%% \code{INT} in state 4 because this grammar does not allow two
+%% consecutive \code{INT} tokens in the input. We will not go into how
+%% this can be figured out, but in any event it does no harm to have a
+%% reduce rule for \code{INT} in state 4; it just means the input will be
+%% rejected at a later point in the parsing process.)
+
+When inserting reduce actions, take care to spot any shift/reduce or
+reduce/reduce conflicts. If there are any, abort the construction of
+the parse table.
+
 
 
 \begin{exercise}
-On a piece of paper, walk through the parse table generation 
-process for the grammar in Figure~\ref{fig:parser1} and check
-your results against Figure~\ref{fig:shift-reduce}. 
+  \normalfont\normalsize
+%
+On a piece of paper, walk through the parse table generation process
+for the grammar at the top of figure~\ref{fig:shift-reduce} and check
+your results against the parse table in figure~\ref{fig:shift-reduce}.
 \end{exercise}
 
 
 \begin{exercise}
+  \normalfont\normalsize
+%
   Change the parser in your compiler for \LangVar{} to set the
   \code{parser} option of Lark to \code{'lalr'}. Test your compiler on
   all the \LangVar{} programs that you have created. In doing so, Lark
   may signal an error due to shift/reduce or reduce/reduce conflicts
   in your grammar. If so, change your Lark grammar for \LangVar{} to
   remove those conflicts.
-
 \end{exercise}
 
 
 \section{Further Reading}
 
-UNDER CONSTRUCTION
-
-finite automata
-
-
-
-
+In this chapter we have just scratched the surface of the field of
+parsing, with the study of a very general but less efficient algorithm
+(Earley) and with a more limited but highly efficient algorithm
+(LALR). There are many more algorithms and classes of grammars that
+fall between these two. We refer the reader to \citet{Aho:2006wb}
+for a thorough treatment of parsing.
+
+Regarding lexical analysis, we described the specification language,
+namely regular expressions, but not the algorithms for recognizing
+them. In short, regular expressions can be translated to
+nondeterministic finite automata, which in turn are translated to
+deterministic finite automata.  We refer the reader again to
+\citet{Aho:2006wb} for all the details of lexical analysis.
 
 
 \fi}
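
Review note on the second hunk: the new paragraph about spotting shift/reduce and reduce/reduce conflicts while inserting reduce actions could be illustrated roughly as below. This is a hedged sketch only. The table layout (a dictionary keyed by state and token), the names `ParseTableConflict`, `add_action`, and `add_reduce_actions`, and the state and rule numbers in the demo are all assumptions made for illustration and do not come from the book.

```python
class ParseTableConflict(Exception):
    """Raised to abort parse table construction (hypothetical name)."""

def add_action(table, state, token, action):
    """Insert one action, refusing to overwrite a different existing action."""
    key = (state, token)
    existing = table.get(key)
    if existing is not None and existing != action:
        # Names the two action kinds in insertion order, for example
        # "shift/reduce" when a shift was already present.
        kind = f"{existing[0]}/{action[0]}"
        raise ParseTableConflict(
            f"{kind} conflict in state {state} on token {token}")
    table[key] = action

def add_reduce_actions(table, state, rule_number, tokens):
    """Put a reduce action into `state` for every token, as the text describes."""
    for token in tokens:
        add_action(table, state, token, ("reduce", rule_number))

# Demo with made-up state and rule numbers.
TOKENS = ["INT", "PLUS", "EOF"]
table = {(2, "PLUS"): ("shift", 3)}  # a shift edge recorded earlier

add_reduce_actions(table, 4, 1, TOKENS)  # state 4 has no shifts: no conflict

try:
    add_reduce_actions(table, 2, 0, TOKENS)  # collides with the shift on PLUS
    conflict = None
except ParseTableConflict as err:
    conflict = str(err)

print(conflict)
```

Raising an exception on the first conflicting cell mirrors the instruction to abort construction. A real generator might instead collect every conflict before reporting.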