UNIX ON-LINE Man Pages - Die Onlinehilfe

Die Syntax von Unixbefehlen wird in den entsprechenden Manpages dokumentiert. Hier können Sie diese Onlinehilfe für viele Standardbefehle abrufen.

Seiten auflisten, welche beginnen mit:
A   B   C   D   E   F   G   H   I   J   K   L   M   N   O   P   Q   R   S   T   U   V   W   X   Y   Z   ALPHA   NUM   ANDERE   ALLE
Parse::RecDescent(3)  User Contributed Perl Documentation Parse::RecDescent(3)

       Parse::RecDescent - Generate Recursive-Descent Parsers

       This document describes version 1.964 of Parse::RecDescent released
       April  9, 2003.

        use Parse::RecDescent;

        # Generate a parser from the specification in $grammar:

            $parser = new Parse::RecDescent ($grammar);

        # Generate a parser from the specification in $othergrammar

            $anotherparser = new Parse::RecDescent ($othergrammar);

        # Parse $text using rule 'startrule' (which must be
        # defined in $grammar):


        # Parse $text using rule 'otherrule' (which must also
        # be defined in $grammar):


        # Change the universal token prefix pattern
        # (the default is: '\s*'):

           $Parse::RecDescent::skip = '[ \t]+';

        # Replace productions of existing rules (or create new ones)
        # with the productions defined in $newgrammar:


        # Extend existing rules (or create new ones)
        # by adding extra productions defined in $moregrammar:


        # Global flags (useful as command line arguments under -s):

           $::RD_ERRORS       # unless undefined, report fatal errors
           $::RD_WARN         # unless undefined, also report non-fatal problems
           $::RD_HINT         # if defined, also suggestion remedies
           $::RD_TRACE        # if defined, also trace parsers' behaviour
           $::RD_AUTOSTUB     # if defined, generates "stubs" for undefined rules
           $::RD_AUTOACTION   # if defined, appends specified action to productions

       Parse::RecDescent incrementally generates top-down recursive-descent
       text parsers from simple yacc-like grammar specifications. It provides:

       o   Regular expressions or literal strings as terminals (tokens),

       o   Multiple (non-contiguous) productions for any rule,

       o   Repeated and optional subrules within productions,

       o   Full access to Perl within actions specified as part of the

       o   Simple automated error reporting during parser generation and

       o   The ability to commit to, uncommit to, or reject particular
           productions during a parse,

       o   The ability to pass data up and down the parse tree ("down" via
           subrule argument lists, "up" via subrule return values)

       o   Incremental extension of the parsing grammar (even during a parse),

       o   Precompilation of parser objects,

       o   User-definable reduce-reduce conflict resolution via "scoring" of
           matching productions.

   Using "Parse::RecDescent"
       Parser objects are created by calling "Parse::RecDescent::new", passing
       in a grammar specification (see the following subsections). If the
       grammar is correct, "new" returns a blessed reference which can then be
       used to initiate parsing through any rule specified in the original
       grammar. A typical sequence looks like this:

           $grammar = q {

           $parser = new Parse::RecDescent ($grammar) or die "Bad grammar!\n";

           # acquire $text

           defined $parser->startrule($text) or print "Bad text!\n";

       The rule through which parsing is initiated must be explicitly defined
       in the grammar (i.e. for the above example, the grammar must include a
       rule of the form: "startrule: <subrules>".

       If the starting rule succeeds, its value (see below) is returned.
       Failure to generate the original parser or failure to match a text is
       indicated by returning "undef". Note that it's easy to set up grammars
       that can succeed, but which return a value of 0, "0", or "".  So don't
       be tempted to write:

           $parser->startrule($text) or print "Bad text!\n";

       Normally, the parser has no effect on the original text. So in the
       previous example the value of $text would be unchanged after having
       been parsed.

       If, however, the text to be matched is passed by reference:


       then any text which was consumed during the match will be removed from
       the start of $text.

       In the grammar from which the parser is built, rules are specified by
       giving an identifier (which must satisfy /[A-Za-z]\w*/), followed by a
       colon on the same line, followed by one or more productions, separated
       by single vertical bars. The layout of the productions is entirely

           rule1:  production1
            |  production2 |
           production3 | production4

       At any point in the grammar previously defined rules may be extended
       with additional productions. This is achieved by redeclaring the rule
       with the new productions. Thus:

           rule1: a | b | c
           rule2: d | e | f
           rule1: g | h

       is exactly equivalent to:

           rule1: a | b | c | g | h
           rule2: d | e | f

       Each production in a rule consists of zero or more items, each of which
       may be either: the name of another rule to be matched (a "subrule"), a
       pattern or string literal to be matched directly (a "token"), a block
       of Perl code to be executed (an "action"), a special instruction to the
       parser (a "directive"), or a standard Perl comment (which is ignored).

       A rule matches a text if one of its productions matches. A production
       matches if each of its items match consecutive substrings of the text.
       The productions of a rule being matched are tried in the same order
       that they appear in the original grammar, and the first matching
       production terminates the match attempt (successfully). If all
       productions are tried and none matches, the match attempt fails.

       Note that this behaviour is quite different from the "prefer the longer
       match" behaviour of yacc. For example, if yacc were parsing the rule:

           seq : 'A' 'B'
           | 'A' 'B' 'C'

       upon matching "AB" it would look ahead to see if a 'C' is next and, if
       so, will match the second production in preference to the first. In
       other words, yacc effectively tries all the productions of a rule
       breadth-first in parallel, and selects the "best" match, where "best"
       means longest (note that this is a gross simplification of the true
       behaviour of yacc but it will do for our purposes).

       In contrast, "Parse::RecDescent" tries each production depth-first in
       sequence, and selects the "best" match, where "best" means first. This
       is the fundamental difference between "bottom-up" and "recursive
       descent" parsing.

       Each successfully matched item in a production is assigned a value,
       which can be accessed in subsequent actions within the same production
       (or, in some cases, as the return value of a successful subrule call).
       Unsuccessful items don't have an associated value, since the failure of
       an item causes the entire surrounding production to immediately fail.
       The following sections describe the various types of items and their
       success values.

       A subrule which appears in a production is an instruction to the parser
       to attempt to match the named rule at that point in the text being
       parsed. If the named subrule is not defined when requested the
       production containing it immediately fails (unless it was "autostubbed"
       - see Autostubbing).

       A rule may (recursively) call itself as a subrule, but not as the left-
       most item in any of its productions (since such recursions are usually

       The value associated with a subrule is the value associated with its
       $return variable (see "Actions" below), or with the last successfully
       matched item in the subrule match.

       Subrules may also be specified with a trailing repetition specifier,
       indicating that they are to be (greedily) matched the specified number
       of times. The available specifiers are:

           subrule(?)  # Match one-or-zero times
           subrule(s)  # Match one-or-more times
           subrule(s?) # Match zero-or-more times
           subrule(N)  # Match exactly N times for integer N > 0
           subrule(N..M)   # Match between N and M times
           subrule(..M)    # Match between 1 and M times
           subrule(N..)    # Match at least N times

       Repeated subrules keep matching until either the subrule fails to
       match, or it has matched the minimal number of times but fails to
       consume any of the parsed text (this second condition prevents the
       subrule matching forever in some cases).

       Since a repeated subrule may match many instances of the subrule
       itself, the value associated with it is not a simple scalar, but rather
       a reference to a list of scalars, each of which is the value associated
       with one of the individual subrule matches. In other words in the rule:

           program: statement(s)

       the value associated with the repeated subrule "statement(s)" is a
       reference to an array containing the values matched by each call to the
       individual subrule "statement".

       Repetition modifiers may include a separator pattern:

           program: statement(s /;/)

       specifying some sequence of characters to be skipped between each
       repetition.  This is really just a shorthand for the <leftop:...>
       directive (see below).

       If a quote-delimited string or a Perl regex appears in a production,
       the parser attempts to match that string or pattern at that point in
       the text. For example:

           typedef: "typedef" typename identifier ';'

           identifier: /[A-Za-z_][A-Za-z0-9_]*/

       As in regular Perl, a single quoted string is uninterpolated, whilst a
       double-quoted string or a pattern is interpolated (at the time of
       matching, not when the parser is constructed). Hence, it is possible to
       define rules in which tokens can be set at run-time:

           typedef: "$::typedefkeyword" typename identifier ';'

           identifier: /$::identpat/

       Note that, since each rule is implemented inside a special namespace
       belonging to its parser, it is necessary to explicitly quantify
       variables from the main package.

       Regex tokens can be specified using just slashes as delimiters or with
       the explicit "m<delimiter>......<delimiter>" syntax:

           typedef: "typedef" typename identifier ';'

           typename: /[A-Za-z_][A-Za-z0-9_]*/

           identifier: m{[A-Za-z_][A-Za-z0-9_]*}

       A regex of either type can also have any valid trailing parameter(s)
       (that is, any of [cgimsox]):

           typedef: "typedef" typename identifier ';'

           identifier: / [a-z_]        # LEADING ALPHA OR UNDERSCORE
                 [a-z0-9_]*    # THEN DIGITS ALSO ALLOWED
               /ix     # CASE/SPACE/COMMENT INSENSITIVE

       The value associated with any successfully matched token is a string
       containing the actual text which was matched by the token.

       It is important to remember that, since each grammar is specified in a
       Perl string, all instances of the universal escape character '\' within
       a grammar must be "doubled", so that they interpolate to single '\'s
       when the string is compiled. For example, to use the grammar:

           word:       /\S+/ | backslash
           line:       prefix word(s) "\n"
           backslash:  '\\'

       the following code is required:

           $parser = new Parse::RecDescent (q{

               word:   /\\S+/ | backslash
               line:   prefix word(s) "\\n"
               backslash:  '\\\\'


   Anonymous subrules
       Parentheses introduce a nested scope that is very like a call to an
       anonymous subrule. Hence they are useful for "in-lining" subroutine
       calls, and other kinds of grouping behaviour. For example, instead of:

           word:       /\S+/ | backslash
           line:       prefix word(s) "\n"

       you could write:

           line:       prefix ( /\S+/ | backslash )(s) "\n"

       and get exactly the same effects.

       Parentheses are also use for collecting unrepeated alternations within
       a single production.

           secret_identity: "Mr" ("Incredible"|"Fantastic"|"Sheen") ", Esq."

   Terminal Separators
       For the purpose of matching, each terminal in a production is
       considered to be preceded by a "prefix" - a pattern which must be
       matched before a token match is attempted. By default, the prefix is
       optional whitespace (which always matches, at least trivially), but
       this default may be reset in any production.

       The variable $Parse::RecDescent::skip stores the universal prefix,
       which is the default for all terminal matches in all parsers built with

       The prefix for an individual production can be altered by using the
       "<skip:...>" directive (see below).

       An action is a block of Perl code which is to be executed (as the block
       of a "do" statement) when the parser reaches that point in a
       production. The action executes within a special namespace belonging to
       the active parser, so care must be taken in correctly qualifying
       variable names (see also "Start-up Actions" below).

       The action is considered to succeed if the final value of the block is
       defined (that is, if the implied "do" statement evaluates to a defined
       value - even one which would be treated as "false"). Note that the
       value associated with a successful action is also the final value in
       the block.

       An action will fail if its last evaluated value is "undef". This is
       surprisingly easy to accomplish by accident. For instance, here's an
       infuriating case of an action that makes its production fail, but only
       when debugging isn't activated:

           description: name rank serial_number
               { print "Got $item[2] $item[1] ($item[3])\n"
               if $::debugging

       If $debugging is false, no statement in the block is executed, so the
       final value is "undef", and the entire production fails. The solution

           description: name rank serial_number
               { print "Got $item[2] $item[1] ($item[3])\n"
               if $::debugging;

       Within an action, a number of useful parse-time variables are available
       in the special parser namespace (there are other variables also
       accessible, but meddling with them will probably just break your
       parser. As a general rule, if you avoid referring to unqualified
       variables - especially those starting with an underscore - inside an
       action, things should be okay):

       @item and %item
           The array slice @item[1..$#item] stores the value associated with
           each item (that is, each subrule, token, or action) in the current
           production. The analogy is to $1, $2, etc. in a yacc grammar.  Note
           that, for obvious reasons, @item only contains the values of items
           before the current point in the production.

           The first element ($item[0]) stores the name of the current rule
           being matched.

           @item is a standard Perl array, so it can also be indexed with
           negative numbers, representing the number of items back from the
           current position in the parse:

               stuff: /various/ bits 'and' pieces "then" data 'end'
                   { print $item[-2] }  # PRINTS data
                        # (EASIER THAN: $item[6])

           The %item hash complements the <@item> array, providing named
           access to the same item values:

               stuff: /various/ bits 'and' pieces "then" data 'end'
                   { print $item{data}  # PRINTS data
                        # (EVEN EASIER THAN USING @item)

           The results of named subrules are stored in the hash under each
           subrule's name (including the repetition specifier, if any), whilst
           all other items are stored under a "named positional" key that
           indictates their ordinal position within their item type:
           __STRINGn__, __PATTERNn__, __DIRECTIVEn__, __ACTIONn__:

               stuff: /various/ bits 'and' pieces "then" data 'end' { save }
                   { print $item{__PATTERN1__}, # PRINTS 'various'
                   $item{__STRING2__},  # PRINTS 'then'
                   $item{__ACTION1__},  # PRINTS RETURN
                            # VALUE OF save

           If you want proper named access to patterns or literals, you need
           to turn them into separate rules:

               stuff: various bits 'and' pieces "then" data 'end'
                   { print $item{various}  # PRINTS various

               various: /various/

           The special entry $item{__RULE__} stores the name of the current
           rule (i.e. the same value as $item[0].

           The advantage of using %item, instead of @items is that it removes
           the need to track items positions that may change as a grammar
           evolves. For example, adding an interim "<skip>" directive of
           action can silently ruin a trailing action, by moving an @item
           element "down" the array one place. In contrast, the named entry of
           %item is unaffected by such an insertion.

           A limitation of the %item hash is that it only records the last
           value of a particular subrule. For example:

               range: '(' number '..' number )'
                   { $return = $item{number} }

           will return only the value corresponding to the second match of the
           "number" subrule. In other words, successive calls to a subrule
           overwrite the corresponding entry in %item. Once again, the
           solution is to rename each subrule in its own rule:

               range: '(' from_num '..' to_num )'
                   { $return = $item{from_num} }

               from_num: number
               to_num:   number

       @arg and %arg
           The array @arg and the hash %arg store any arguments passed to the
           rule from some other rule (see ""Subrule argument lists"). Changes
           to the elements of either variable do not propagate back to the
           calling rule (data can be passed back from a subrule via the
           $return variable - see next item).

           If a value is assigned to $return within an action, that value is
           returned if the production containing the action eventually matches
           successfully. Note that setting $return doesn't cause the current
           production to succeed. It merely tells it what to return if it does
           succeed.  Hence $return is analogous to $$ in a yacc grammar.

           If $return is not assigned within a production, the value of the
           last component of the production (namely: $item[$#item]) is
           returned if the production succeeds.

           The current state of commitment to the current production (see
           "Directives" below).

           The current terminal prefix (see "Directives" below).

           The remaining (unparsed) text. Changes to $text do not propagate
           out of unsuccessful productions, but do survive successful
           productions. Hence it is possible to dynamically alter the text
           being parsed - for example, to provide a "#include"-like facility:

               hash_include: '#include' filename
                   { $text = ::loadfile($item[2]) . $text }

               filename: '<' /[a-z0-9._-]+/i '>'  { $return = $item[2] }
               | '"' /[a-z0-9._-]+/i '"'  { $return = $item[2] }

       $thisline and $prevline
           $thisline stores the current line number within the current parse
           (starting from 1). $prevline stores the line number for the last
           character which was already successfully parsed (this will be
           different from $thisline at the end of each line).

           For efficiency, $thisline and $prevline are actually tied hashes,
           and only recompute the required line number when the variable's
           value is used.

           Assignment to $thisline adjusts the line number calculator, so that
           it believes that the current line number is the value being
           assigned. Note that this adjustment will be reflected in all
           subsequent line numbers calculations.

           Modifying the value of the variable $text (as in the previous
           "hash_include" example, for instance) will confuse the line
           counting mechanism. To prevent this, you should call
           "Parse::RecDescent::LineCounter::resync($thisline)" immediately
           after any assignment to the variable $text (or, at least, before
           the next attempt to use $thisline).

           Note that if a production fails after assigning to or resync'ing
           $thisline, the parser's line counter mechanism will usually be

           Also see the entry for @itempos.

           The line number can be set to values other than 1, by calling the
           start rule with a second argument. For example:

               $parser = new Parse::RecDescent ($grammar);

               $parser->input($text, 10);  # START LINE NUMBERS AT 10

       $thiscolumn and $prevcolumn
           $thiscolumn stores the current column number within the current
           line being parsed (starting from 1). $prevcolumn stores the column
           number of the last character which was actually successfully
           parsed. Usually "$prevcolumn == $thiscolumn-1", but not at the end
           of lines.

           For efficiency, $thiscolumn and $prevcolumn are actually tied
           hashes, and only recompute the required column number when the
           variable's value is used.

           Assignment to $thiscolumn or $prevcolumn is a fatal error.

           Modifying the value of the variable $text (as in the previous
           "hash_include" example, for instance) may confuse the column
           counting mechanism.

           Note that $thiscolumn reports the column number before any
           whitespace that might be skipped before reading a token. Hence if
           you wish to know where a token started (and ended) use something
           like this:

               rule: token1 token2 startcol token3 endcol token4
                   { print "token3: columns $item[3] to $item[5]"; }

               startcol: '' { $thiscolumn }    # NEED THE '' TO STEP PAST TOKEN SEP
               endcol:  { $prevcolumn }

           Also see the entry for @itempos.

       $thisoffset and $prevoffset
           $thisoffset stores the offset of the current parsing position
           within the complete text being parsed (starting from 0).
           $prevoffset stores the offset of the last character which was
           actually successfully parsed. In all cases "$prevoffset ==

           For efficiency, $thisoffset and $prevoffset are actually tied
           hashes, and only recompute the required offset when the variable's
           value is used.

           Assignment to $thisoffset or <$prevoffset> is a fatal error.

           Modifying the value of the variable $text will not affect the
           offset counting mechanism.

           Also see the entry for @itempos.

           The array @itempos stores a hash reference corresponding to each
           element of @item. The elements of the hash provide the following:

               $itempos[$n]{offset}{from}  # VALUE OF $thisoffset BEFORE $item[$n]
               $itempos[$n]{offset}{to}    # VALUE OF $prevoffset AFTER $item[$n]
               $itempos[$n]{line}{from}    # VALUE OF $thisline BEFORE $item[$n]
               $itempos[$n]{line}{to}  # VALUE OF $prevline AFTER $item[$n]
               $itempos[$n]{column}{from}  # VALUE OF $thiscolumn BEFORE $item[$n]
               $itempos[$n]{column}{to}    # VALUE OF $prevcolumn AFTER $item[$n]

           Note that the various "$itempos[$n]...{from}" values record the
           appropriate value after any token prefix has been skipped.

           Hence, instead of the somewhat tedious and error-prone:

               rule: startcol token1 endcol
                 startcol token2 endcol
                 startcol token3 endcol
                   { print "token1: columns $item[1]
                         to $item[3]
                    token2: columns $item[4]
                         to $item[6]
                    token3: columns $item[7]
                         to $item[9]" }

               startcol: '' { $thiscolumn }    # NEED THE '' TO STEP PAST TOKEN SEP
               endcol:  { $prevcolumn }

           it is possible to write:

               rule: token1 token2 token3
                   { print "token1: columns $itempos[1]{column}{from}
                         to $itempos[1]{column}{to}
                    token2: columns $itempos[2]{column}{from}
                         to $itempos[2]{column}{to}
                    token3: columns $itempos[3]{column}{from}
                         to $itempos[3]{column}{to}" }

           Note however that (in the current implementation) the use of
           @itempos anywhere in a grammar implies that item positioning
           information is collected everywhere during the parse. Depending on
           the grammar and the size of the text to be parsed, this may be
           prohibitively expensive and the explicit use of $thisline,
           $thiscolumn, etc. may be a better choice.

           A reference to the "Parse::RecDescent" object through which parsing
           was initiated.

           The value of $thisparser propagates down the subrules of a parse
           but not back up. Hence, you can invoke subrules from another parser
           for the scope of the current rule as follows:

               rule: subrule1 subrule2
               | { $thisparser = $::otherparser } <reject>
               | subrule3 subrule4
               | subrule5

           The result is that the production calls "subrule1" and "subrule2"
           of the current parser, and the remaining productions call the named
           subrules from $::otherparser. Note, however that "Bad Things" will
           happen if "::otherparser" isn't a blessed reference and/or doesn't
           have methods with the same names as the required subrules!

           A reference to the "Parse::RecDescent::Rule" object corresponding
           to the rule currently being matched.

           A reference to the "Parse::RecDescent::Production" object
           corresponding to the production currently being matched.

       $score and $score_return
           $score stores the best production score to date, as specified by an
           earlier "<score:...>" directive. $score_return stores the
           corresponding return value for the successful production.

           See "Scored productions".

       Warning: the parser relies on the information in the various "this..."
       objects in some non-obvious ways. Tinkering with the other members of
       these objects will probably cause Bad Things to happen, unless you
       really know what you're doing. The only exception to this advice is
       that the use of "$this...->{local}" is always safe.

   Start-up Actions
       Any actions which appear before the first rule definition in a grammar
       are treated as "start-up" actions. Each such action is stripped of its
       outermost brackets and then evaluated (in the parser's special
       namespace) just before the rules of the grammar are first compiled.

       The main use of start-up actions is to declare local variables within
       the parser's special namespace:

           { my $lastitem = '???'; }

           list: item(s)   { $return = $lastitem }

           item: book  { $lastitem = 'book'; }
             bell  { $lastitem = 'bell'; }
             candle    { $lastitem = 'candle'; }

       but start-up actions can be used to execute any valid Perl code within
       a parser's special namespace.

       Start-up actions can appear within a grammar extension or replacement
       (that is, a partial grammar installed via "Parse::RecDescent::Extend()"
       or "Parse::RecDescent::Replace()" - see "Incremental Parsing"), and
       will be executed before the new grammar is installed. Note, however,
       that a particular start-up action is only ever executed once.

       It is sometimes desirable to be able to specify a default action to be
       taken at the end of every production (for example, in order to easily
       build a parse tree). If the variable $::RD_AUTOACTION is defined when
       "Parse::RecDescent::new()" is called, the contents of that variable are
       treated as a specification of an action which is to appended to each
       production in the corresponding grammar.

       Alternatively, you can hard-code the autoaction within a grammar, using
       the "<autoaction:...>" directive.

       So, for example, to construct a simple parse tree you could write:

           $::RD_AUTOACTION = q { [@item] };

           parser = Parse::RecDescent->new(q{
           expression: and_expr '||' expression | and_expr
           and_expr:   not_expr '&&' and_expr   | not_expr
           not_expr:   '!' brack_expr       | brack_expr
           brack_expr: '(' expression ')'       | identifier
           identifier: /[a-z]+/i


           parser = Parse::RecDescent->new(q{
           <autoaction: { [@item] } >

           expression: and_expr '||' expression | and_expr
           and_expr:   not_expr '&&' and_expr   | not_expr
           not_expr:   '!' brack_expr       | brack_expr
           brack_expr: '(' expression ')'       | identifier
           identifier: /[a-z]+/i

       Either of these is equivalent to:

           parser = new Parse::RecDescent (q{
           expression: and_expr '||' expression
               { [@item] }
             | and_expr
               { [@item] }

           and_expr:   not_expr '&&' and_expr
               { [@item] }
           |   not_expr
               { [@item] }

           not_expr:   '!' brack_expr
               { [@item] }
           |   brack_expr
               { [@item] }

           brack_expr: '(' expression ')'
               { [@item] }
             | identifier
               { [@item] }

           identifier: /[a-z]+/i
               { [@item] }

       Alternatively, we could take an object-oriented approach, use different
       classes for each node (and also eliminating redundant intermediate

           $::RD_AUTOACTION = q
             { $#item==1 ? $item[1] : "$item[0]_node"->new(@item[1..$#item]) };

           parser = Parse::RecDescent->new(q{
               expression: and_expr '||' expression | and_expr
               and_expr:   not_expr '&&' and_expr   | not_expr
               not_expr:   '!' brack_expr           | brack_expr
               brack_expr: '(' expression ')'       | identifier
               identifier: /[a-z]+/i


           parser = Parse::RecDescent->new(q{
                 $#item==1 ? $item[1] : "$item[0]_node"->new(@item[1..$#item])

               expression: and_expr '||' expression | and_expr
               and_expr:   not_expr '&&' and_expr   | not_expr
               not_expr:   '!' brack_expr           | brack_expr
               brack_expr: '(' expression ')'       | identifier
               identifier: /[a-z]+/i

       which are equivalent to:

           parser = Parse::RecDescent->new(q{
               expression: and_expr '||' expression
                   { "expression_node"->new(@item[1..3]) }
               | and_expr

               and_expr:   not_expr '&&' and_expr
                   { "and_expr_node"->new(@item[1..3]) }
               |   not_expr

               not_expr:   '!' brack_expr
                   { "not_expr_node"->new(@item[1..2]) }
               |   brack_expr

               brack_expr: '(' expression ')'
                   { "brack_expr_node"->new(@item[1..3]) }
               | identifier

               identifier: /[a-z]+/i
                   { "identifer_node"->new(@item[1]) }

       Note that, if a production already ends in an action, no autoaction is
       appended to it. For example, in this version:

           $::RD_AUTOACTION = q
             { $#item==1 ? $item[1] : "$item[0]_node"->new(@item[1..$#item]) };

           parser = Parse::RecDescent->new(q{
               expression: and_expr '&&' expression | and_expr
               and_expr:   not_expr '&&' and_expr   | not_expr
               not_expr:   '!' brack_expr           | brack_expr
               brack_expr: '(' expression ')'       | identifier
               identifier: /[a-z]+/i
                   { 'terminal_node'->new($item[1]) }

       each "identifier" match produces a "terminal_node" object, not an
       "identifier_node" object.

       A level 1 warning is issued each time an "autoaction" is added to some

       A commonly needed autoaction is one that builds a parse-tree. It is
       moderately tricky to set up such an action (which must treat terminals
       differently from non-terminals), so Parse::RecDescent simplifies the
       process by providing the "<autotree>" directive.

       If this directive appears at the start of grammar, it causes
       Parse::RecDescent to insert autoactions at the end of any rule except
       those which already end in an action. The action inserted depends on
       whether the production is an intermediate rule (two or more items), or
       a terminal of the grammar (i.e. a single pattern or string item).

       So, for example, the following grammar:


           file    : command(s)
           command : get | set | vet
           get : 'get' ident ';'
           set : 'set' ident 'to' value ';'
           vet : 'check' ident 'is' value ';'
           ident   : /\w+/
           value   : /\d+/

       is equivalent to:

           file    : command(s)        { bless \%item, $item[0] }
           command : get       { bless \%item, $item[0] }
           | set           { bless \%item, $item[0] }
           | vet           { bless \%item, $item[0] }
           get : 'get' ident ';'   { bless \%item, $item[0] }
           set : 'set' ident 'to' value ';'    { bless \%item, $item[0] }
           vet : 'check' ident 'is' value ';'  { bless \%item, $item[0] }

           ident   : /\w+/  { bless {__VALUE__=>$item[1]}, $item[0] }
           value   : /\d+/  { bless {__VALUE__=>$item[1]}, $item[0] }

       Note that each node in the tree is blessed into a class of the same
       name as the rule itself. This makes it easy to build object-oriented
       processors for the parse-trees that the grammar produces. Note too that
       the last two rules produce special objects with the single attribute
       '__VALUE__'. This is because they consist solely of a single terminal.

       This autoaction-ed grammar would then produce a parse tree in a data
       structure like this:

             file => {
               command => {
                [ get => {
                   identifier => { __VALUE__ => 'a' },
                  set => {
                   identifier => { __VALUE__ => 'b' },
                   value      => { __VALUE__ => '7' },
                  vet => {
                   identifier => { __VALUE__ => 'b' },
                   value      => { __VALUE__ => '7' },

       (except, of course, that each nested hash would also be blessed into
       the appropriate class).

       Normally, if a subrule appears in some production, but no rule of that
       name is ever defined in the grammar, the production which refers to the
       non-existent subrule fails immediately. This typically occurs as a
       result of misspellings, and is a sufficiently common occurance that a
       warning is generated for such situations.

       However, when prototyping a grammar it is sometimes useful to be able
       to use subrules before a proper specification of them is really
       possible.  For example, a grammar might include a section like:

           function_call: identifier '(' arg(s?) ')'

           identifier: /[a-z]\w*/i

       where the possible format of an argument is sufficiently complex that
       it is not worth specifying in full until the general function call
       syntax has been debugged. In this situation it is convenient to leave
       the real rule "arg" undefined and just slip in a placeholder (or

           arg: 'arg'

       so that the function call syntax can be tested with dummy input such

           f2(arg arg)
           f3(arg arg arg)

       et cetera.

       Early in prototyping, many such "stubs" may be required, so
       "Parse::RecDescent" provides a means of automating their definition.
       If the variable $::RD_AUTOSTUB is defined when a parser is built, a
       subrule reference to any non-existent rule (say, "sr"), causes a "stub"
       rule of the form:

           sr: 'sr'

       to be automatically defined in the generated parser.  A level 1 warning
       is issued for each such "autostubbed" rule.

       Hence, with $::AUTOSTUB defined, it is possible to only partially
       specify a grammar, and then "fake" matches of the unspecified
       (sub)rules by just typing in their name.

       If a subrule, token, or action is prefixed by "...", then it is treated
       as a "look-ahead" request. That means that the current production can
       (as usual) only succeed if the specified item is matched, but that the
       matching does not consume any of the text being parsed. This is very
       similar to the "/(?=...)/" look-ahead construct in Perl patterns. Thus,
       the rule:

           inner_word: word ...word

       will match whatever the subrule "word" matches, provided that match is
       followed by some more text which subrule "word" would also match
       (although this second substring is not actually consumed by

       Likewise, a "...!" prefix, causes the following item to succeed
       (without consuming any text) if and only if it would normally fail.
       Hence, a rule such as:

           identifier: ...!keyword ...!'_' /[A-Za-z_]\w*/

       matches a string of characters which satisfies the pattern
       "/[A-Za-z_]\w*/", but only if the same sequence of characters would not
       match either subrule "keyword" or the literal token '_'.

       Sequences of look-ahead prefixes accumulate, multiplying their positive
       and/or negative senses. Hence:

           inner_word: word ...!......!word

       is exactly equivalent the the original example above (a warning is
       issued in cases like these, since they often indicate something left
       out, or misunderstood).

       Note that actions can also be treated as look-aheads. In such cases,
       the state of the parser text (in the local variable $text) after the
       look-ahead action is guaranteed to be identical to its state before the
       action, regardless of how it's changed within the action (unless you
       actually undefine $text, in which case you get the disaster you deserve

       Directives are special pre-defined actions which may be used to alter
       the behaviour of the parser. There are currently twenty-three
       directives: "<commit>", "<uncommit>", "<reject>", "<score>",
       "<autoscore>", "<skip>", "<resync>", "<error>", "<warn>", "<hint>",
       "<trace_build>", "<trace_parse>", "<nocheck>", "<rulevar>",
       "<matchrule>", "<leftop>", "<rightop>", "<defer>", "<nocheck>",
       "<perl_quotelike>", "<perl_codeblock>", "<perl_variable>", and

       Committing and uncommitting
           The "<commit>" and "<uncommit>" directives permit the recursive
           descent of the parse tree to be pruned (or "cut") for efficiency.
           Within a rule, a "<commit>" directive instructs the rule to ignore
           subsequent productions if the current production fails. For

               command: 'find' <commit> filename
                  | 'open' <commit> filename
                  | 'move' filename filename

           Clearly, if the leading token 'find' is matched in the first
           production but that production fails for some other reason, then
           the remaining productions cannot possibly match. The presence of
           the "<commit>" causes the "command" rule to fail immediately if an
           invalid "find" command is found, and likewise if an invalid "open"
           command is encountered.

           It is also possible to revoke a previous commitment. For example:

               if_statement: 'if' <commit> condition
                   'then' block <uncommit>
                   'else' block
                   | 'if' <commit> condition
                   'then' block

           In this case, a failure to find an "else" block in the first
           production shouldn't preclude trying the second production, but a
           failure to find a "condition" certainly should.

           As a special case, any production in which the first item is an
           "<uncommit>" immediately revokes a preceding "<commit>" (even
           though the production would not otherwise have been tried). For
           example, in the rule:

               request: 'explain' expression
                  | 'explain' <commit> keyword
                  | 'save'
                  | 'quit'
                  | <uncommit> term '?'

           if the text being matched was "explain?", and the first two
           productions failed, then the "<commit>" in production two would
           cause productions three and four to be skipped, but the leading
           "<uncommit>" in the production five would allow that production to
           attempt a match.

           Note in the preceding example, that the "<commit>" was only placed
           in production two. If production one had been:

               request: 'explain' <commit> expression

           then production two would be (inappropriately) skipped if a leading
           "explain..." was encountered.

           Both "<commit>" and "<uncommit>" directives always succeed, and
           their value is always 1.

       Rejecting a production
           The "<reject>" directive immediately causes the current production
           to fail (it is exactly equivalent to, but more obvious than, the
           action "{undef}"). A "<reject>" is useful when it is desirable to
           get the side effects of the actions in one production, without
           prejudicing a match by some other production later in the rule. For
           example, to insert tracing code into the parse:

               complex_rule: { print "In complex rule...\n"; } <reject>

               complex_rule: simple_rule '+' 'i' '*' simple_rule
                   | 'i' '*' simple_rule
                   | simple_rule

           It is also possible to specify a conditional rejection, using the
           form "<reject:condition>", which only rejects if the specified
           condition is true. This form of rejection is exactly equivalent to
           the action "{(condition)?undef:1}>".  For example:

               command: save_command
                  | restore_command
                  | <reject: defined $::tolerant> { exit }
                  | <error: Unknown command. Ignored.>

           A "<reject>" directive never succeeds (and hence has no associated
           value). A conditional rejection may succeed (if its condition is
           not satisfied), in which case its value is 1.

           As an extra optimization, "Parse::RecDescent" ignores any
           production which begins with an unconditional "<reject>" directive,
           since any such production can never successfully match or have any
           useful side-effects. A level 1 warning is issued in all such cases.

           Note that productions beginning with conditional "<reject:...>"
           directives are never "optimized away" in this manner, even if they
           are always guaranteed to fail (for example: "<reject:1>")

           Due to the way grammars are parsed, there is a minor restriction on
           the condition of a conditional "<reject:...>": it cannot contain
           any raw '<' or '>' characters. For example:

               line: cmd <reject: $thiscolumn > max> data

           results in an error when a parser is built from this grammar (since
           the grammar parser has no way of knowing whether the first > is a
           "less than" or the end of the "<reject:...>".

           To overcome this problem, put the condition inside a do{} block:

               line: cmd <reject: do{$thiscolumn > max}> data

           Note that the same problem may occur in other directives that take
           arguments. The same solution will work in all cases.

       Skipping between terminals
           The "<skip>" directive enables the terminal prefix used in a
           production to be changed. For example:

               OneLiner: Command <skip:'[ \t]*'> Arg(s) /;/

           causes only blanks and tabs to be skipped before terminals in the
           "Arg" subrule (and any of its subrules>, and also before the final
           "/;/" terminal.  Once the production is complete, the previous
           terminal prefix is reinstated. Note that this implies that distinct
           productions of a rule must reset their terminal prefixes

           The "<skip>" directive evaluates to the previous terminal prefix,
           so it's easy to reinstate a prefix later in a production:

               Command: <skip:","> CSV(s) <skip:$item[1]> Modifier

           The value specified after the colon is interpolated into a pattern,
           so all of the following are equivalent (though their efficiency
           increases down the list):

               <skip: "$colon|$comma">   # ASSUMING THE VARS HOLD THE OBVIOUS VALUES

               <skip: ':|,'>

               <skip: q{[:,]}>

               <skip: qr/[:,]/>

           There is no way of directly setting the prefix for an entire rule,
           except as follows:

               Rule: <skip: '[ \t]*'> Prod1
               | <skip: '[ \t]*'> Prod2a Prod2b
               | <skip: '[ \t]*'> Prod3

           or, better:

               Rule: <skip: '[ \t]*'>
                 | Prod2a Prod2b
                 | Prod3

           Note: Up to release 1.51 of Parse::RecDescent, an entirely
           different mechanism was used for specifying terminal prefixes. The
           current method is not backwards-compatible with that early
           approach. The current approach is stable and will not to change

           The "<resync>" directive provides a visually distinctive means of
           consuming some of the text being parsed, usually to skip an
           erroneous input. In its simplest form "<resync>" simply consumes
           text up to and including the next newline ("\n") character,
           succeeding only if the newline is found, in which case it causes
           its surrounding rule to return zero on success.

           In other words, a "<resync>" is exactly equivalent to the token
           "/[^\n]*\n/" followed by the action "{ $return = 0 }" (except that
           productions beginning with a "<resync>" are ignored when generating
           error messages). A typical use might be:

               script : command(s)

               command: save_command
                  | restore_command
                  | <resync> # TRY NEXT LINE, IF POSSIBLE

           It is also possible to explicitly specify a resynchronization
           pattern, using the "<resync:pattern>" variant. This version
           succeeds only if the specified pattern matches (and consumes) the
           parsed text. In other words, "<resync:pattern>" is exactly
           equivalent to the token "/pattern/" (followed by a
           "{ $return = 0 }" action). For example, if commands were terminated
           by newlines or semi-colons:

               command: save_command
                  | restore_command
                  | <resync:[^;\n]*[;\n]>

           The value of a successfully matched "<resync>" directive (of either
           type) is the text that it consumed. Note, however, that since the
           directive also sets $return, a production consisting of a lone
           "<resync>" succeeds but returns the value zero (which a calling
           rule may find useful to distinguish between "true" matches and
           "tolerant" matches).  Remember that returning a zero value
           indicates that the rule succeeded (since only an "undef" denotes
           failure within "Parse::RecDescent" parsers.

       Error handling
           The "<error>" directive provides automatic or user-defined
           generation of error messages during a parse. In its simplest form
           "<error>" prepares an error message based on the mismatch between
           the last item expected and the text which cause it to fail. For
           example, given the rule:

               McCoy: curse ',' name ', I'm a doctor, not a' a_profession '!'
                | pronoun 'dead,' name '!'
                | <error>

           the following strings would produce the following messages:

           "Amen, Jim!"
                      ERROR (line 1): Invalid McCoy: Expected curse or pronoun
                          not found

           "Dammit, Jim, I'm a doctor!"
                      ERROR (line 1): Invalid McCoy: Expected ", I'm a doctor, not a"
                          but found ", I'm a doctor!" instead

           "He's dead,\n"
                      ERROR (line 2): Invalid McCoy: Expected name not found

           "He's alive!"
                      ERROR (line 1): Invalid McCoy: Expected 'dead,' but found
                          "alive!" instead

           "Dammit, Jim, I'm a doctor, not a pointy-eared Vulcan!"
                      ERROR (line 1): Invalid McCoy: Expected a profession but found
                          "pointy-eared Vulcan!" instead

           Note that, when autogenerating error messages, all underscores in
           any rule name used in a message are replaced by single spaces (for
           example "a_production" becomes "a production"). Judicious choice of
           rule names can therefore considerably improve the readability of
           automatic error messages (as well as the maintainability of the
           original grammar).

           If the automatically generated error is not sufficient, it is
           possible to provide an explicit message as part of the error
           directive. For example:

               Spock: "Fascinating ',' (name | 'Captain') '.'
                | "Highly illogical, doctor."
                | <error: He never said that!>

           which would result in all failures to parse a "Spock" subrule
           printing the following message:

                  ERROR (line <N>): Invalid Spock:  He never said that!

           The error message is treated as a "qq{...}" string and interpolated
           when the error is generated (not when the directive is specified!).

               <error: Mystical error near "$text">

           would correctly insert the ambient text string which caused the

           There are two other forms of error directive: "<error?>" and
           "<error?: msg>". These behave just like "<error>" and
           "<error: msg>" respectively, except that they are only triggered if
           the rule is "committed" at the time they are encountered. For

               Scotty: "Ya kenna change the Laws of Phusics," <commit> name
                 | name <commit> ',' 'she's goanta blaw!'
                 | <error?>

           will only generate an error for a string beginning with "Ya kenna
           change the Laws o' Phusics," or a valid name, but which still fails
           to match the corresponding production. That is,
           "$parser->Scotty("Aye, Cap'ain")" will fail silently (since neither
           production will "commit" the rule on that input), whereas
           "$parser->Scotty("Mr Spock, ah jest kenna do'ut!")"  will fail with
           the error message:

                  ERROR (line 1): Invalid Scotty: expected 'she's goanta blaw!'
                      but found 'I jest kenna do'ut!' instead.

           since in that case the second production would commit after
           matching the leading name.

           Note that to allow this behaviour, all "<error>" directives which
           are the first item in a production automatically uncommit the rule
           just long enough to allow their production to be attempted (that
           is, when their production fails, the commitment is reinstated so
           that subsequent productions are skipped).

           In order to permanently uncommit the rule before an error message,
           it is necessary to put an explicit "<uncommit>" before the
           "<error>". For example:

               line: 'Kirk:'  <commit> Kirk
               | 'Spock:' <commit> Spock
               | 'McCoy:' <commit> McCoy
               | <uncommit> <error?> <reject>
               | <resync>

           Error messages generated by the various "<error...>" directives are
           not displayed immediately. Instead, they are "queued" in a buffer
           and are only displayed once parsing ultimately fails. Moreover,
           "<error...>" directives that cause one production of a rule to fail
           are automatically removed from the message queue if another
           production subsequently causes the entire rule to succeed.  This
           means that you can put "<error...>" directives wherever useful
           diagnosis can be done, and only those associated with actual parser
           failure will ever be displayed. Also see "Gotchas".

           As a general rule, the most useful diagnostics are usually
           generated either at the very lowest level within the grammar, or at
           the very highest. A good rule of thumb is to identify those
           subrules which consist mainly (or entirely) of terminals, and then
           put an "<error...>" directive at the end of any other rule which
           calls one or more of those subrules.

           There is one other situation in which the output of the various
           types of error directive is suppressed; namely, when the rule
           containing them is being parsed as part of a "look-ahead" (see
           "Look-ahead"). In this case, the error directive will still cause
           the rule to fail, but will do so silently.

           An unconditional "<error>" directive always fails (and hence has no
           associated value). This means that encountering such a directive
           always causes the production containing it to fail. Hence an
           "<error>" directive will inevitably be the last (useful) item of a
           rule (a level 3 warning is issued if a production contains items
           after an unconditional "<error>" directive).

           An "<error?>" directive will succeed (that is: fail to fail :-), if
           the current rule is uncommitted when the directive is encountered.
           In that case the directive's associated value is zero. Hence, this
           type of error directive can be used before the end of a production.
           For example:

               command: 'do' <commit> something
                  | 'report' <commit> something
                  | <error?: Syntax error> <error: Unknown command>

           Warning: The "<error?>" directive does not mean "always fail (but
           do so silently unless committed)". It actually means "only fail
           (and report) if committed, otherwise succeed". To achieve the "fail
           silently if uncommitted" semantics, it is necessary to use:

               rule: item <commit> item(s)
               | <error?> <reject>  # FAIL SILENTLY UNLESS COMMITTED

           However, because people seem to expect a lone "<error?>" directive
           to work like this:

               rule: item <commit> item(s)
               | <error?: Error message if committed>
               | <error:  Error message if uncommitted>

           Parse::RecDescent automatically appends a "<reject>" directive if
           the "<error?>" directive is the only item in a production. A level
           2 warning (see below) is issued when this happens.

           The level of error reporting during both parser construction and
           parsing is controlled by the presence or absence of four global
           variables: $::RD_ERRORS, $::RD_WARN, $::RD_HINT, and <$::RD_TRACE>.
           If $::RD_ERRORS is defined (and, by default, it is) then fatal
           errors are reported.

           Whenever $::RD_WARN is defined, certain non-fatal problems are also

           Warnings have an associated "level": 1, 2, or 3. The higher the
           level, the more serious the warning. The value of the corresponding
           global variable ($::RD_WARN) determines the lowest level of warning
           to be displayed. Hence, to see all warnings, set $::RD_WARN to 1.
           To see only the most serious warnings set $::RD_WARN to 3.  By
           default $::RD_WARN is initialized to 3, ensuring that serious but
           non-fatal errors are automatically reported.

           There is also a grammar directive to turn on warnings from within
           the grammar: "<warn>". It takes an optional argument, which
           specifies the warning level: "<warn: 2>".

           See "DIAGNOSTICS" for a list of the varous error and warning
           messages that Parse::RecDescent generates when these two variables
           are defined.

           Defining any of the remaining variables (which are not defined by
           default) further increases the amount of information reported.
           Defining $::RD_HINT causes the parser generator to offer more
           detailed analyses and hints on both errors and warnings.  Note that
           setting $::RD_HINT at any point automagically sets $::RD_WARN to 1.
           There is also a "<hint>" directive, which can be hard-coded into a

           Defining $::RD_TRACE causes the parser generator and the parser to
           report their progress to STDERR in excruciating detail (although,
           without hints unless $::RD_HINT is separately defined). This detail
           can be moderated in only one respect: if $::RD_TRACE has an integer
           value (N) greater than 1, only the N characters of the "current
           parsing context" (that is, where in the input string we are at any
           point in the parse) is reported at any time.
              > $::RD_TRACE is mainly useful for debugging a grammar that
           isn't behaving as you expected it to. To this end, if $::RD_TRACE
           is defined when a parser is built, any actual parser code which is
           generated is also written to a file named "RD_TRACE" in the local

           There are two directives associated with the $::RD_TRACE variable.
           If a grammar contains a "<trace_build>" directive anywhere in its
           specification, $::RD_TRACE is turned on during the parser
           construction phase.  If a grammar contains a "<trace_parse>"
           directive anywhere in its specification, $::RD_TRACE is turned on
           during any parse the parser performs.

           Note that the four variables belong to the "main" package, which
           makes them easier to refer to in the code controlling the parser,
           and also makes it easy to turn them into command line flags
           ("-RD_ERRORS", "-RD_WARN", "-RD_HINT", "-RD_TRACE") under perl -s.

           The corresponding directives are useful to "hardwire" the various
           debugging features into a particular grammar (rather than having to
           set and reset external variables).

       Consistency checks
           Whenever a parser is build, Parse::RecDescent carries out a number
           of (potentially expensive) consistency checks. These include:
           verifying that the grammar is not left-recursive and that no rules
           have been left undefined.

           These checks are important safeguards during development, but
           unnecessary overheads when the grammar is stable and ready to be
           deployed. So Parse::RecDescent provides a directive to disable
           them: "<nocheck>".

           If a grammar contains a "<nocheck>" directive anywhere in its
           specification, the extra compile-time checks are by-passed.

       Specifying local variables
           It is occasionally convenient to specify variables which are local
           to a single rule. This may be achieved by including a
           "<rulevar:...>" directive anywhere in the rule. For example:

               markup: <rulevar: $tag>

               markup: tag {($tag=$item[1]) =~ s/^<|>$//g} body[$tag]

           The example "<rulevar: $tag>" directive causes a "my" variable
           named $tag to be declared at the start of the subroutine
           implementing the "markup" rule (that is, before the first
           production, regardless of where in the rule it is specified).

           Specifically, any directive of the form: "<rulevar:text>" causes a
           line of the form "my text;" to be added at the beginning of the
           rule subroutine, immediately after the definitions of the following
           local variables:

               $thisparser $commit
               $thisrule   @item
               $thisline   @arg
               $text   %arg

           This means that the following "<rulevar>" directives work as

               <rulevar: $count = 0 >

               <rulevar: $firstarg = $arg[0] || '' >

               <rulevar: $myItems = \@item >

               <rulevar: @context = ( $thisline, $text, @arg ) >

               <rulevar: ($name,$age) = $arg{"name","age"} >

           If a variable that is also visible to subrules is required, it
           needs to be "local"'d, not "my"'d. "rulevar" defaults to "my", but
           if "local" is explicitly specified:

               <rulevar: local $count = 0 >

           then a "local"-ized variable is declared instead, and will be
           available within subrules.

           Note however that, because all such variables are "my" variables,
           their values do not persist between match attempts on a given rule.
           To preserve values between match attempts, values can be stored
           within the "local" member of the $thisrule object:

               countedrule: { $thisrule->{"local"}{"count"}++ }
                  | subrule1
                  | subrule2
                  | <reject: $thisrule->{"local"}{"count"} == 1>

           When matching a rule, each "<rulevar>" directive is matched as if
           it were an unconditional "<reject>" directive (that is, it causes
           any production in which it appears to immediately fail to match).
           For this reason (and to improve readability) it is usual to specify
           any "<rulevar>" directive in a separate production at the start of
           the rule (this has the added advantage that it enables
           "Parse::RecDescent" to optimize away such productions, just as it
           does for the "<reject>" directive).

       Dynamically matched rules
           Because regexes and double-quoted strings are interpolated, it is
           relatively easy to specify productions with "context sensitive"
           tokens. For example:

               command:  keyword  body  "end $item[1]"

           which ensures that a command block is bounded by a "<keyword>...end
           <same keyword>" pair.

           Building productions in which subrules are context sensitive is
           also possible, via the "<matchrule:...>" directive. This directive
           behaves identically to a subrule item, except that the rule which
           is invoked to match it is determined by the string specified after
           the colon. For example, we could rewrite the "command" rule like

               command:  keyword  <matchrule:body>  "end $item[1]"

           Whatever appears after the colon in the directive is treated as an
           interpolated string (that is, as if it appeared in "qq{...}"
           operator) and the value of that interpolated string is the name of
           the subrule to be matched.

           Of course, just putting a constant string like "body" in a
           "<matchrule:...>" directive is of little interest or benefit.  The
           power of directive is seen when we use a string that interpolates
           to something interesting. For example:

               command:    keyword <matchrule:$item[1]_body> "end $item[1]"

               keyword:    'while' | 'if' | 'function'

               while_body: condition block

               if_body:    condition block ('else' block)(?)

               function_body:  arglist block

           Now the "command" rule selects how to proceed on the basis of the
           keyword that is found. It is as if "command" were declared:

               command:    'while'    while_body    "end while"
                  |    'if'       if_body   "end if"
                  |    'function' function_body "end function"

           When a "<matchrule:...>" directive is used as a repeated subrule,
           the rule name expression is "late-bound". That is, the name of the
           rule to be called is re-evaluated each time a match attempt is
           made. Hence, the following grammar:

               { $::species = 'dogs' }

               pair:   'two' <matchrule:$::species>(s)

               dogs:   /dogs/ { $::species = 'cats' }

               cats:   /cats/

           will match the string "two dogs cats cats" completely, whereas it
           will only match the string "two dogs dogs dogs" up to the eighth
           letter. If the rule name were "early bound" (that is, evaluated
           only the first time the directive is encountered in a production),
           the reverse behaviour would be expected.

           Note that the "matchrule" directive takes a string that is to be
           treated as a rule name, not as a rule invocation. That is, it's
           like a Perl symbolic reference, not an "eval". Just as you can say:

               $subname = 'foo';

               # and later...


           but not:

               $subname = 'foo(@args)';

               # and later...


           likewise you can say:

               $rulename = 'foo';

               # and in the grammar...


           but not:

               $rulename = 'foo[@args]';

               # and in the grammar...


       Deferred actions
           The "<defer:...>" directive is used to specify an action to be
           performed when (and only if!) the current production ultimately

           Whenever a "<defer:...>" directive appears, the code it specifies
           is converted to a closure (an anonymous subroutine reference) which
           is queued within the active parser object. Note that, because the
           deferred code is converted to a closure, the values of any "local"
           variable (such as $text, <@item>, etc.) are preserved until the
           deferred code is actually executed.

           If the parse ultimately succeeds and the production in which the
           "<defer:...>" directive was evaluated formed part of the successful
           parse, then the deferred code is executed immediately before the
           parse returns. If however the production which queued a deferred
           action fails, or one of the higher-level rules which called that
           production fails, then the deferred action is removed from the
           queue, and hence is never executed.

           For example, given the grammar:

               sentence: noun trans noun
               | noun intrans

               noun:     'the dog'
                   { print "$item[1]\t(noun)\n" }
               |     'the meat'
                   { print "$item[1]\t(noun)\n" }

               trans:    'ate'
                   { print "$item[1]\t(transitive)\n" }

               intrans:  'ate'
                   { print "$item[1]\t(intransitive)\n" }
                  |  'barked'
                   { print "$item[1]\t(intransitive)\n" }

           then parsing the sentence "the dog ate" would produce the output:

               the dog  (noun)
               ate  (transitive)
               the dog  (noun)
               ate  (intransitive)

           This is because, even though the first production of "sentence"
           ultimately fails, its initial subrules "noun" and "trans" do match,
           and hence they execute their associated actions.  Then the second
           production of "sentence" succeeds, causing the actions of the
           subrules "noun" and "intrans" to be executed as well.

           On the other hand, if the actions were replaced by "<defer:...>"

               sentence: noun trans noun
               | noun intrans

               noun:     'the dog'
                   <defer: print "$item[1]\t(noun)\n" >
               |     'the meat'
                   <defer: print "$item[1]\t(noun)\n" >

               trans:    'ate'
                   <defer: print "$item[1]\t(transitive)\n" >

               intrans:  'ate'
                   <defer: print "$item[1]\t(intransitive)\n" >
                  |  'barked'
                   <defer: print "$item[1]\t(intransitive)\n" >

           the output would be:

               the dog  (noun)
               ate  (intransitive)

           since deferred actions are only executed if they were evaluated in
           a production which ultimately contributes to the successful parse.

           In this case, even though the first production of "sentence" caused
           the subrules "noun" and "trans" to match, that production
           ultimately failed and so the deferred actions queued by those
           subrules were subsequently disgarded. The second production then
           succeeded, causing the entire parse to succeed, and so the deferred
           actions queued by the (second) match of the "noun" subrule and the
           subsequent match of "intrans" are preserved and eventually

           Deferred actions provide a means of improving the performance of a
           parser, by only executing those actions which are part of the final
           parse-tree for the input data.

           Alternatively, deferred actions can be viewed as a mechanism for
           building (and executing) a customized subroutine corresponding to
           the given input data, much in the same way that autoactions (see
           "Autoactions") can be used to build a customized data structure for
           specific input.

           Whether or not the action it specifies is ever executed, a
           "<defer:...>" directive always succeeds, returning the number of
           deferred actions currently queued at that point.

       Parsing Perl
           Parse::RecDescent provides limited support for parsing subsets of
           Perl, namely: quote-like operators, Perl variables, and complete
           code blocks.

           The "<perl_quotelike>" directive can be used to parse any Perl
           quote-like operator: 'a string', "m/a pattern/", "tr{ans}{lation}",
           etc.  It does this by calling Text::Balanced::quotelike().

           If a quote-like operator is found, a reference to an array of eight
           elements is returned. Those elements are identical to the last
           eight elements returned by Text::Balanced::extract_quotelike() in
           an array context, namely:

           [0] the name of the quotelike operator -- 'q', 'qq', 'm', 's', 'tr'
               -- if the operator was named; otherwise "undef",

           [1] the left delimiter of the first block of the operation,

           [2] the text of the first block of the operation (that is, the
               contents of a quote, the regex of a match, or substitution or
               the target list of a translation),

           [3] the right delimiter of the first block of the operation,

           [4] the left delimiter of the second block of the operation if
               there is one (that is, if it is a "s", "tr", or "y"); otherwise

           [5] the text of the second block of the operation if there is one
               (that is, the replacement of a substitution or the translation
               list of a translation); otherwise "undef",

           [6] the right delimiter of the second block of the operation (if
               any); otherwise "undef",

           [7] the trailing modifiers on the operation (if any); otherwise

           If a quote-like expression is not found, the directive fails with
           the usual "undef" value.

           The "<perl_variable>" directive can be used to parse any Perl
           variable: $scalar, @array, %hash, $ref->{field}[$index], etc.  It
           does this by calling Text::Balanced::extract_variable().

           If the directive matches text representing a valid Perl variable
           specification, it returns that text. Otherwise it fails with the
           usual "undef" value.

           The "<perl_codeblock>" directive can be used to parse curly-brace-
           delimited block of Perl code, such as: { $a = 1; f() =~ m/pat/; }.
           It does this by calling Text::Balanced::extract_codeblock().

           If the directive matches text representing a valid Perl code block,
           it returns that text. Otherwise it fails with the usual "undef"

           You can also tell it what kind of brackets to use as the outermost
           delimiters. For example:

               arglist: <perl_codeblock ()>

           causes an arglist to match a perl code block whose outermost
           delimiters are "(...)" (rather than the default "{...}").

       Constructing tokens
           Eventually, Parse::RecDescent will be able to parse tokenized
           input, as well as ordinary strings. In preparation for this joyous
           day, the "<token:...>" directive has been provided.  This directive
           creates a token which will be suitable for input to a
           Parse::RecDescent parser (when it eventually supports tokenized

           The text of the token is the value of the immediately preceding
           item in the production. A "<token:...>" directive always succeeds
           with a return value which is the hash reference that is the new
           token. It also sets the return value for the production to that
           hash ref.

           The "<token:...>" directive makes it easy to build a
           Parse::RecDescent-compatible lexer in Parse::RecDescent:

               my $lexer = new Parse::RecDescent q
               lex:    token(s)

               token:  /a\b/          <token:INDEF>
                    |  /the\b/        <token:DEF>
                    |  /fly\b/        <token:NOUN,VERB>
                    |  /[a-z]+/i { lc $item[1] }  <token:ALPHA>
                    |  <error: Unknown token>


           which will eventually be able to be used with a regular
           Parse::RecDescent grammar:

               my $parser = new Parse::RecDescent q
               startrule: subrule1 subrule 2

               # ETC...

           either with a pre-lexing phase:

               $parser->startrule( $lexer->lex($data) );

           or with a lex-on-demand approach:

               $parser->startrule( sub{$lexer->token(\$data)} );

           But at present, only the "<token:...>" directive is actually
           implemented. The rest is vapourware.

       Specifying operations
           One of the commonest requirements when building a parser is to
           specify binary operators. Unfortunately, in a normal grammar, the
           rules for such things are awkward:

               disjunction:    conjunction ('or' conjunction)(s?)
                   { $return = [ $item[1], @{$item[2]} ] }

               conjunction:    atom ('and' atom)(s?)
                   { $return = [ $item[1], @{$item[2]} ] }

           or inefficient:

               disjunction:    conjunction 'or' disjunction
                   { $return = [ $item[1], @{$item[2]} ] }
                  |    conjunction
                   { $return = [ $item[1] ] }

               conjunction:    atom 'and' conjunction
                   { $return = [ $item[1], @{$item[2]} ] }
                  |    atom
                   { $return = [ $item[1] ] }

           and either way is ugly and hard to get right.

           The "<leftop:...>" and "<rightop:...>" directives provide an easier
           way of specifying such operations. Using "<leftop:...>" the above
           examples become:

               disjunction:    <leftop: conjunction 'or' conjunction>
               conjunction:    <leftop: atom 'and' atom>

           The "<leftop:...>" directive specifies a left-associative binary
           operator.  It is specified around three other grammar elements
           (typically subrules or terminals), which match the left operand,
           the operator itself, and the right operand respectively.

           A "<leftop:...>" directive such as:

               disjunction:    <leftop: conjunction 'or' conjunction>

           is converted to the following:

               disjunction:    ( conjunction ('or' conjunction)(s?)
                   { $return = [ $item[1], @{$item[2]} ] } )

           In other words, a "<leftop:...>" directive matches the left operand
           followed by zero or more repetitions of both the operator and the
           right operand. It then flattens the matched items into an anonymous
           array which becomes the (single) value of the entire "<leftop:...>"

           For example, an "<leftop:...>" directive such as:

               output:  <leftop: ident '<<' expr >

           when given a string such as:

               cout << var << "str" << 3

           would match, and $item[1] would be set to:

               [ 'cout', 'var', '"str"', '3' ]

           In other words:

               output:  <leftop: ident '<<' expr >

           is equivalent to a left-associative operator:

               output:  ident          { $return = [$item[1]]   }
                 |  ident '<<' expr        { $return = [@item[1,3]]     }
                 |  ident '<<' expr '<<' expr      { $return = [@item[1,3,5]]   }
                 |  ident '<<' expr '<<' expr '<<' expr    { $return = [@item[1,3,5,7]] }
                 #  ...etc...

           Similarly, the "<rightop:...>" directive takes a left operand, an
           operator, and a right operand:

               assign:  <rightop: var '=' expr >

           and converts them to:

               assign:  ( (var '=' {$return=$item[1]})(s?) expr
                   { $return = [ @{$item[1]}, $item[2] ] } )

           which is equivalent to a right-associative operator:

               assign:  var        { $return = [$item[1]]       }
                 |  var '=' expr       { $return = [@item[1,3]]     }
                 |  var '=' var '=' expr   { $return = [@item[1,3,5]]   }
                 |  var '=' var '=' var '=' expr   { $return = [@item[1,3,5,7]] }
                 #  ...etc...

           Note that for both the "<leftop:...>" and "<rightop:...>"
           directives, the directive does not normally return the operator
           itself, just a list of the operands involved. This is particularly
           handy for specifying lists:

               list: '(' <leftop: list_item ',' list_item> ')'
                   { $return = $item[2] }

           There is, however, a problem: sometimes the operator is itself
           significant.  For example, in a Perl list a comma and a "=>" are
           both valid separators, but the "=>" has additional stringification
           semantics.  Hence it's important to know which was used in each

           To solve this problem the "<leftop:...>" and "<rightop:...>"
           directives do return the operator(s) as well, under two
           circumstances.  The first case is where the operator is specified
           as a subrule. In that instance, whatever the operator matches is
           returned (on the assumption that if the operator is important
           enough to have its own subrule, then it's important enough to

           The second case is where the operator is specified as a regular
           expression. In that case, if the first bracketed subpattern of the
           regular expression matches, that matching value is returned (this
           is analogous to the behaviour of the Perl "split" function, except
           that only the first subpattern is returned).

           In other words, given the input:

               ( a=>1, b=>2 )

           the specifications:

               list:      '('  <leftop: list_item separator list_item>  ')'

               separator: ',' | '=>'


               list:      '('  <leftop: list_item /(,|=>)/ list_item>  ')'

           cause the list separators to be interleaved with the operands in
           the anonymous array in $item[2]:

               [ 'a', '=>', '1', ',', 'b', '=>', '2' ]

           But the following version:

               list:      '('  <leftop: list_item /,|=>/ list_item>  ')'

           returns only the operators:

               [ 'a', '1', 'b', '2' ]

           Of course, none of the above specifications handle the case of an
           empty list, since the "<leftop:...>" and "<rightop:...>" directives
           require at least a single right or left operand to match. To
           specify that the operator can match "trivially", it's necessary to
           add a "(s?)" qualifier to the directive:

               list:      '('  <leftop: list_item /(,|=>)/ list_item>(s?)  ')'

           Note that in almost all the above examples, the first and third
           arguments of the "<leftop:...>" directive were the same subrule.
           That is because "<leftop:...>"'s are frequently used to specify
           "separated" lists of the same type of item. To make such lists
           easier to specify, the following syntax:

               list:   element(s /,/)

           is exactly equivalent to:

               list:   <leftop: element /,/ element>

           Note that the separator must be specified as a raw pattern (i.e.
           not a string or subrule).

       Scored productions
           By default, Parse::RecDescent grammar rules always accept the first
           production that matches the input. But if two or more productions
           may potentially match the same input, choosing the first that does
           so may not be optimal.

           For example, if you were parsing the sentence "time flies like an
           arrow", you might use a rule like this:

               sentence: verb noun preposition article noun { [@item] }
               | adjective noun verb article noun   { [@item] }
               | noun verb preposition article noun { [@item] }

           Each of these productions matches the sentence, but the third one
           is the most likely interpretation. However, if the sentence had
           been "fruit flies like a banana", then the second production is
           probably the right match.

           To cater for such situtations, the "<score:...>" can be used.  The
           directive is equivalent to an unconditional "<reject>", except that
           it allows you to specify a "score" for the current production. If
           that score is numerically greater than the best score of any
           preceding production, the current production is cached for later
           consideration. If no later production matches, then the cached
           production is treated as having matched, and the value of the item
           immediately before its "<score:...>" directive is returned as the

           In other words, by putting a "<score:...>" directive at the end of
           each production, you can select which production matches using
           criteria other than specification order. For example:

               sentence: verb noun preposition article noun { [@item] } <score: sensible(@item)>
               | adjective noun verb article noun   { [@item] } <score: sensible(@item)>
               | noun verb preposition article noun { [@item] } <score: sensible(@item)>

           Now, when each production reaches its respective "<score:...>"
           directive, the subroutine "sensible" will be called to evaluate the
           matched items (somehow). Once all productions have been tried, the
           one which "sensible" scored most highly will be the one that is
           accepted as a match for the rule.

           The variable $score always holds the current best score of any
           production, and the variable $score_return holds the corresponding
           return value.

           As another example, the following grammar matches lines that may be
           separated by commas, colons, or semi-colons. This can be tricky if
           a colon-separated line also contains commas, or vice versa. The
           grammar resolves the ambiguity by selecting the rule that results
           in the fewest fields:

               line: seplist[sep=>',']  <score: -@{$item[1]}>
               | seplist[sep=>':']  <score: -@{$item[1]}>
               | seplist[sep=>" "]  <score: -@{$item[1]}>

               seplist: <skip:""> <leftop: /[^$arg{sep}]*/ "$arg{sep}" /[^$arg{sep}]*/>

           Note the use of negation within the "<score:...>" directive to
           ensure that the seplist with the most items gets the lowest score.

           As the above examples indicate, it is often the case that all
           productions in a rule use exactly the same "<score:...>" directive.
           It is tedious to have to repeat this identical directive in every
           production, so Parse::RecDescent also provides the
           "<autoscore:...>" directive.

           If an "<autoscore:...>" directive appears in any production of a
           rule, the code it specifies is used as the scoring code for every
           production of that rule, except productions that already end with
           an explicit "<score:...>" directive. Thus the rules above could be

               line: <autoscore: -@{$item[1]}>
               line: seplist[sep=>',']
               | seplist[sep=>':']
               | seplist[sep=>" "]

               sentence: <autoscore: sensible(@item)>
               | verb noun preposition article noun { [@item] }
               | adjective noun verb article noun   { [@item] }
               | noun verb preposition article noun { [@item] }

           Note that the "<autoscore:...>" directive itself acts as an
           unconditional "<reject>", and (like the "<rulevar:...>" directive)
           is pruned at compile-time wherever possible.

       Dispensing with grammar checks
           During the compilation phase of parser construction,
           Parse::RecDescent performs a small number of checks on the grammar
           it's given. Specifically it checks that the grammar is not left-
           recursive, that there are no "insatiable" constructs of the form:

               rule: subrule(s) subrule

           and that there are no rules missing (i.e. referred to, but never

           These checks are important during development, but can slow down
           parser construction in stable code. So Parse::RecDescent provides
           the <nocheck> directive to turn them off. The directive can only
           appear before the first rule definition, and switches off checking
           throughout the rest of the current grammar.

           Typically, this directive would be added when a parser has been
           thoroughly tested and is ready for release.

   Subrule argument lists
       It is occasionally useful to pass data to a subrule which is being
       invoked. For example, consider the following grammar fragment:

           classdecl: keyword decl

           keyword:   'struct' | 'class';

           decl:      # WHATEVER

       The "decl" rule might wish to know which of the two keywords was used
       (since it may affect some aspect of the way the subsequent declaration
       is interpreted). "Parse::RecDescent" allows the grammar designer to
       pass data into a rule, by placing that data in an argument list (that
       is, in square brackets) immediately after any subrule item in a
       production. Hence, we could pass the keyword to "decl" as follows:

           classdecl: keyword decl[ $item[1] ]

           keyword:   'struct' | 'class';

           decl:      # WHATEVER

       The argument list can consist of any number (including zero!) of comma-
       separated Perl expressions. In other words, it looks exactly like a
       Perl anonymous array reference. For example, we could pass the keyword,
       the name of the surrounding rule, and the literal 'keyword' to "decl"
       like so:

           classdecl: keyword decl[$item[1],$item[0],'keyword']

           keyword:   'struct' | 'class';

           decl:      # WHATEVER

       Within the rule to which the data is passed ("decl" in the above
       examples) that data is available as the elements of a local variable
       @arg. Hence "decl" might report its intentions as follows:

           classdecl: keyword decl[$item[1],$item[0],'keyword']

           keyword:   'struct' | 'class';

           decl:      { print "Declaring $arg[0] (a $arg[2])\n";
                print "(this rule called by $arg[1])" }

       Subrule argument lists can also be interpreted as hashes, simply by
       using the local variable %arg instead of @arg. Hence we could rewrite
       the previous example:

           classdecl: keyword decl[keyword => $item[1],
               caller  => $item[0],
               type    => 'keyword']

           keyword:   'struct' | 'class';

           decl:      { print "Declaring $arg{keyword} (a $arg{type})\n";
                print "(this rule called by $arg{caller})" }

       Both @arg and %arg are always available, so the grammar designer may
       choose whichever convention (or combination of conventions) suits best.

       Subrule argument lists are also useful for creating "rule templates"
       (especially when used in conjunction with the "<matchrule:...>"
       directive). For example, the subrule:

           list:     <matchrule:$arg{rule}> /$arg{sep}/ list[%arg]
               { $return = [ $item[1], @{$item[3]} ] }
           |     <matchrule:$arg{rule}>
               { $return = [ $item[1]] }

       is a handy template for the common problem of matching a separated
       list.  For example:

           function: 'func' name '(' list[rule=>'param',sep=>';'] ')'

           param:    list[rule=>'name',sep=>','] ':' typename

           name:     /\w+/

           typename: name

       When a subrule argument list is used with a repeated subrule, the
       argument list goes before the repetition specifier:

           list:   /some|many/ thing[ $item[1] ](s)

       The argument list is "late bound". That is, it is re-evaluated for
       every repetition of the repeated subrule.  This means that each
       repeated attempt to match the subrule may be passed a completely
       different set of arguments if the value of the expression in the
       argument list changes between attempts. So, for example, the grammar:

           { $::species = 'dogs' }

           pair:   'two' animal[$::species](s)

           animal: /$arg[0]/ { $::species = 'cats' }

       will match the string "two dogs cats cats" completely, whereas it will
       only match the string "two dogs dogs dogs" up to the eighth letter. If
       the value of the argument list were "early bound" (that is, evaluated
       only the first time a repeated subrule match is attempted), one would
       expect the matching behaviours to be reversed.

       Of course, it is possible to effectively "early bind" such argument
       lists by passing them a value which does not change on each repetition.
       For example:

           { $::species = 'dogs' }

           pair:   'two' { $::species } animal[$item[2]](s)

           animal: /$arg[0]/ { $::species = 'cats' }

       Arguments can also be passed to the start rule, simply by appending
       them to the argument list with which the start rule is called (after
       the "line number" parameter). For example, given:

           $parser = new Parse::RecDescent ( $grammar );

           $parser->data($text, 1, "str", 2, \@arr);

           #         ^^^^^  ^  ^^^^^^^^^^^^^^^
           #       |    |     |
           # TEXT TO BE PARSED  |     |
           # STARTING LINE NUMBER     |
           # ELEMENTS OF @arg WHICH IS PASSED TO RULE data

       then within the productions of the rule "data", the array @arg will
       contain "("str", 2, \@arr)".

       Alternations are implicit (unnamed) rules defined as part of a
       production. An alternation is defined as a series of '|'-separated
       productions inside a pair of round brackets. For example:

           character: 'the' ( good | bad | ugly ) /dude/

       Every alternation implicitly defines a new subrule, whose
       automatically-generated name indicates its origin:
       "_alternation_<I>_of_production_<P>_of_rule<R>" for the appropriate
       values of <I>, <P>, and <R>. A call to this implicit subrule is then
       inserted in place of the brackets. Hence the above example is merely a
       convenient short-hand for:

           character: 'the'

              good | bad | ugly

       Since alternations are parsed by recursively calling the parser
       generator, any type(s) of item can appear in an alternation. For

           character: 'the' ( 'high' "plains"  # Silent, with poncho
                | /no[- ]name/ # Silent, no poncho
                | vengeance_seeking    # Poncho-optional
                | <error>
                ) drifter

       In this case, if an error occurred, the automatically generated message
       would be:

           ERROR (line <N>): Invalid implicit subrule: Expected
                 'high' or /no[- ]name/ or generic,
                 but found "pacifist" instead

       Since every alternation actually has a name, it's even possible to
       extend or replace them:

               'generic Eastwood'"

       More importantly, since alternations are a form of subrule, they can be
       given repetition specifiers:

           character: 'the' ( good | bad | ugly )(?) /dude/

   Incremental Parsing
       "Parse::RecDescent" provides two methods - "Extend" and "Replace" -
       which can be used to alter the grammar matched by a parser. Both
       methods take the same argument as "Parse::RecDescent::new", namely a
       grammar specification string

       "Parse::RecDescent::Extend" interprets the grammar specification and
       adds any productions it finds to the end of the rules for which they
       are specified. For example:

           $add = "name: 'Jimmy-Bob' | 'Bobby-Jim'\ndesc: colour /necks?/";

       adds two productions to the rule "name" (creating it if necessary) and
       one production to the rule "desc".

       "Parse::RecDescent::Replace" is identical, except that it first resets
       are rule specified in the additional grammar, removing any existing
       productions.  Hence after:

           $add = "name: 'Jimmy-Bob' | 'Bobby-Jim'\ndesc: colour /necks?/";

       are are only valid "name"s and the one possible description.

       A more interesting use of the "Extend" and "Replace" methods is to call
       them inside the action of an executing parser. For example:

           typedef: 'typedef' type_name identifier ';'
                  { $thisparser->Extend("type_name: '$item[3]'") }
              | <error>

           identifier: ...!type_name /[A-Za-z_]w*/

       which automatically prevents type names from being typedef'd, or:

           command: 'map' key_name 'to' abort_key
                  { $thisparser->Replace("abort_key: '$item[2]'") }
              | 'map' key_name 'to' key_name
                  { map_key($item[2],$item[4]) }
              | abort_key
                  { exit if confirm("abort?") }

           abort_key: 'q'

           key_name: ...!abort_key /[A-Za-z]/

       which allows the user to change the abort key binding, but not to
       unbind it.

       The careful use of such constructs makes it possible to reconfigure a a
       running parser, eliminating the need for semantic feedback by providing
       syntactic feedback instead. However, as currently implemented,
       "Replace()" and "Extend()" have to regenerate and re-"eval" the entire
       parser whenever they are called. This makes them quite slow for large

       In such cases, the judicious use of an interpolated regex is likely to
       be far more efficient:

           typedef: 'typedef' type_name/ identifier ';'
                  { $thisparser->{local}{type_name} .= "|$item[3]" }
              | <error>

           identifier: ...!type_name /[A-Za-z_]w*/

           type_name: /$thisparser->{local}{type_name}/

   Precompiling parsers
       Normally Parse::RecDescent builds a parser from a grammar at run-time.
       That approach simplifies the design and implementation of parsing code,
       but has the disadvantage that it slows the parsing process down - you
       have to wait for Parse::RecDescent to build the parser every time the
       program runs. Long or complex grammars can be particularly slow to
       build, leading to unacceptable delays at start-up.

       To overcome this, the module provides a way of "pre-building" a parser
       object and saving it in a separate module. That module can then be used
       to create clones of the original parser.

       A grammar may be precompiled using the "Precompile" class method.  For
       example, to precompile a grammar stored in the scalar $grammar, and
       produce a class named PreGrammar in a module file named PreGrammar.pm,
       you could use:

           use Parse::RecDescent;

           Parse::RecDescent->Precompile($grammar, "PreGrammar");

       The first argument is the grammar string, the second is the name of the
       class to be built. The name of the module file is generated
       automatically by appending ".pm" to the last element of the class name.

           Parse::RecDescent->Precompile($grammar, "My::New::Parser");

       would produce a module file named Parser.pm.

       It is somewhat tedious to have to write a small Perl program just to
       generate a precompiled grammar class, so Parse::RecDescent has some
       special magic that allows you to do the job directly from the command-

       If your grammar is specified in a file named grammar, you can generate
       a class named Yet::Another::Grammar like so:

           > perl -MParse::RecDescent - grammar Yet::Another::Grammar

       This would produce a file named Grammar.pm containing the full
       definition of a class called Yet::Another::Grammar. Of course, to use
       that class, you would need to put the Grammar.pm file in a directory
       named Yet/Another, somewhere in your Perl include path.

       Having created the new class, it's very easy to use it to build a
       parser. You simply "use" the new module, and then call its "new" method
       to create a parser object. For example:

           use Yet::Another::Grammar;
           my $parser = Yet::Another::Grammar->new();

       The effect of these two lines is exactly the same as:

           use Parse::RecDescent;

           open GRAMMAR_FILE, "grammar" or die;
           local $/;
           my $grammar = <GRAMMAR_FILE>;

           my $parser = Parse::RecDescent->new($grammar);

       only considerably faster.

       Note however that the parsers produced by either approach are exactly
       the same, so whilst precompilation has an effect on set-up speed, it
       has no effect on parsing speed. RecDescent 2.0 will address that

       This section describes common mistakes that grammar writers seem to
       make on a regular basis.

   1. Expecting an error to always invalidate a parse
       A common mistake when using error messages is to write the grammar like

           file: line(s)

           line: line_type_1
           | line_type_2
           | line_type_3
           | <error>

       The expectation seems to be that any line that is not of type 1, 2 or 3
       will invoke the "<error>" directive and thereby cause the parse to

       Unfortunately, that only happens if the error occurs in the very first
       line.  The first rule states that a "file" is matched by one or more
       lines, so if even a single line succeeds, the first rule is completely
       satisfied and the parse as a whole succeeds. That means that any error
       messages generated by subsequent failures in the "line" rule are
       quietly ignored.

       Typically what's really needed is this:

           file: line(s) eofile    { $return = $item[1] }

           line: line_type_1
           | line_type_2
           | line_type_3
           | <error>

           eofile: /^\Z/

       The addition of the "eofile" subrule  to the first production means
       that a file only matches a series of successful "line" matches that
       consume the complete input text. If any input text remains after the
       lines are matched, there must have been an error in the last "line". In
       that case the "eofile" rule will fail, causing the entire "file" rule
       to fail too.

       Note too that "eofile" must match "/^\Z/" (end-of-text), not "/^\cZ/"
       or "/^\cD/" (end-of-file).

       And don't forget the action at the end of the production. If you just

           file: line(s) eofile

       then the value returned by the "file" rule will be the value of its
       last item: "eofile". Since "eofile" always returns an empty string on
       success, that will cause the "file" rule to return that empty string.
       Apart from returning the wrong value, returning an empty string will
       trip up code such as:

           $parser->file($filetext) || die;

       (since "" is false).

       Remember that Parse::RecDescent returns undef on failure, so the only
       safe test for failure is:

           defined($parser->file($filetext)) || die;

   2. Using a "return" in an action
       An action is like a "do" block inside the subroutine implementing the
       surrounding rule. So if you put a "return" statement in an action:

           range: '(' start '..' end )'
               { return $item{end} }

       that subroutine will immediately return, without checking the rest of
       the items in the current production (e.g. the "/\s+/") and without
       setting up the necessary data structures to tell the parser that the
       rule has succeeded.

       The correct way to set a return value in an action is to set the
       $return variable:

           range: '(' start '..' end )'
               { $return = $item{end} }

       Diagnostics are intended to be self-explanatory (particularly if you
       use -RD_HINT (under perl -s) or define $::RD_HINT inside the program).

       "Parse::RecDescent" currently diagnoses the following:

       o   Invalid regular expressions used as pattern terminals (fatal

       o   Invalid Perl code in code blocks (fatal error).

       o   Lookahead used in the wrong place or in a nonsensical way (fatal

       o   "Obvious" cases of left-recursion (fatal error).

       o   Missing or extra components in a "<leftop>" or "<rightop>"

       o   Unrecognisable components in the grammar specification (fatal

       o   "Orphaned" rule components specified before the first rule (fatal
           error) or after an "<error>" directive (level 3 warning).

       o   Missing rule definitions (this only generates a level 3 warning,
           since you may be providing them later via

       o   Instances where greedy repetition behaviour will almost certainly
           cause the failure of a production (a level 3 warning - see "ON-

       o   Attempts to define rules named 'Replace' or 'Extend', which cannot
           be called directly through the parser object because of the
           predefined meaning of "Parse::RecDescent::Replace" and
           "Parse::RecDescent::Extend". (Only a level 2 warning is generated,
           since such rules can still be used as subrules).

       o   Productions which consist of a single "<error?>" directive, and
           which therefore may succeed unexpectedly (a level 2 warning, since
           this might conceivably be the desired effect).

       o   Multiple consecutive lookahead specifiers (a level 1 warning only,
           since their effects simply accumulate).

       o   Productions which start with a "<reject>" or "<rulevar:...>"
           directive. Such productions are optimized away (a level 1 warning).

       o   Rules which are autogenerated under $::AUTOSTUB (a level 1

       Damian Conway (damian@conway.org)

       There are undoubtedly serious bugs lurking somewhere in this much code
       :-) Bug reports and other feedback are most welcome.

       Ongoing annoyances include:

       o   There's no support for parsing directly from an input stream.  If
           and when the Perl Gods give us regular expressions on streams, this
           should be trivial (ahem!) to implement.

       o   The parser generator can get confused if actions aren't properly
           closed or if they contain particularly nasty Perl syntax errors
           (especially unmatched curly brackets).

       o   The generator only detects the most obvious form of left recursion
           (potential recursion on the first subrule in a rule). More subtle
           forms of left recursion (for example, through the second item in a
           rule after a "zero" match of a preceding "zero-or-more" repetition,
           or after a match of a subrule with an empty production) are not

       o   Instead of complaining about left-recursion, the generator should
           silently transform the grammar to remove it. Don't expect this
           feature any time soon as it would require a more sophisticated
           approach to parser generation than is currently used.

       o   The generated parsers don't always run as fast as might be wished.

       o   The meta-parser should be bootstrapped using "Parse::RecDescent"

       1.  Repetitions are "incorrigibly greedy" in that they will eat
           everything they can and won't backtrack if that behaviour causes a
           production to fail needlessly.  So, for example:

               rule: subrule(s) subrule

           will never succeed, because the repetition will eat all the
           subrules it finds, leaving none to match the second item. Such
           constructions are relatively rare (and "Parse::RecDescent::new"
           generates a warning whenever they occur) so this may not be a
           problem, especially since the insatiable behaviour can be overcome
           "manually" by writing:

               rule: penultimate_subrule(s) subrule

               penultimate_subrule: subrule ...subrule

           The issue is that this construction is exactly twice as expensive
           as the original, whereas backtracking would add only 1/N to the
           cost (for matching N repetitions of "subrule"). I would welcome
           feedback on the need for backtracking; particularly on cases where
           the lack of it makes parsing performance problematical.

       2.  Having opened that can of worms, it's also necessary to consider
           whether there is a need for non-greedy repetition specifiers.
           Again, it's possible (at some cost) to manually provide the
           required functionality:

               rule: nongreedy_subrule(s) othersubrule

               nongreedy_subrule: subrule ...!othersubrule

           Overall, the issue is whether the benefit of this extra
           functionality outweighs the drawbacks of further complicating the
           (currently minimalist) grammar specification syntax, and (worse)
           introducing more overhead into the generated parsers.

       3.  An "<autocommit>" directive would be nice. That is, it would be
           useful to be able to say:

               command: <autocommit>
               command: 'find' name
                  | 'find' address
                  | 'do' command 'at' time 'if' condition
                  | 'do' command 'at' time
                  | 'do' command
                  | unusual_command

           and have the generator work out that this should be "pruned" thus:

               command: 'find' name
                  | 'find' <commit> address
                  | 'do' <commit> command <uncommit>
                   'at' time
                   'if' <commit> condition
                  | 'do' <commit> command <uncommit>
                   'at' <commit> time
                  | 'do' <commit> command
                  | unusual_command

           There are several issues here. Firstly, should the "<autocommit>"
           automatically install an "<uncommit>" at the start of the last
           production (on the grounds that the "command" rule doesn't know
           whether an "unusual_command" might start with "find" or "do") or
           should the "unusual_command" subgraph be analysed (to see if it
           might be viable after a "find" or "do")?

           The second issue is how regular expressions should be treated. The
           simplest approach would be simply to uncommit before them (on the
           grounds that they might match). Better efficiency would be obtained
           by analyzing all preceding literal tokens to determine whether the
           pattern would match them.

           Overall, the issues are: can such automated "pruning" approach a
           hand-tuned version sufficiently closely to warrant the extra set-up
           expense, and (more importantly) is the problem important enough to
           even warrant the non-trivial effort of building an automated

   Mailing List
       Visit <http://www.perlfoundation.org/perl5/index.cgi?parse_recdescent>
       to sign up for the mailing list.

       <http://www.PerlMonks.org> is also a good place to ask questions.

       Visit Parse::RecDescent::FAQ for answers to frequently (and not so
       frequently) asked questions about Parse::RecDescent

       Regexp::Grammars provides Parse::RecDescent style parsing using native
       Perl 5.10 regular expressions.

       Copyright (c) 1997-2007, Damian Conway "<DCONWAY@CPAN.org>". All rights

       This module is free software; you can redistribute it and/or modify it
       under the same terms as Perl itself. See perlartistic.



perl v5.12.1                      2010-07-05              Parse::RecDescent(3)

Scannen Sie den Barcode um die Webseite zu öffnen

Quelle: http://www.trinler.net/de/service/doc/linux/man.html?command=Parse%3A%3ARecDescent
Gedruckt am: 21.11.2017 13:10 GMT+0100 (2017-11-21T13:10:11+01:00)