parsepp  Artifact Content

Artifact a3dc1f596f57ab8d62cc78d425106fc320f5b23a:

Wiki page [RulesOverview] by stephan 2008-05-30 13:39:22.
D 2008-05-30T13:39:22
L RulesOverview
P 242ce44f3eefec319d3a50f51d07ef1e436eb144
U stephan
W 8139
<h1>Overview of core parsepp Rules</h1>

This page gives a brief overview of the various core Rules and
what they do.

(Thanks to
[http://code.google.com/p/pegtl/|the pegtl docs] for their rules table,
on which this one is based.)

Some notes about terminology in the following table:

<ul>

<li>Comments below which refer to <b>State</b> or <b>Client State</b> mean the
client-side State object which is passed to <tt>parse()</tt>.</li>

<li>References to a <b>RuleList</b> type refer to the
<tt>rule_list&lt;></tt> type, a pseudo-variadic typelist template used
by the framework (in C++0x we have true variadics, and don't need this
type). All types contained in a <tt>rule_list&lt;></tt> must be Rules,
except in the case of ActionList:
</li> 

<li>References to an <b>ActionList</b> type also refer to <tt>rule_list&lt;></tt>,
except that the contained types are Actions instead of Rules.
</li>

<li>References to a <b>CharList</b> type refer to
the <tt>char_list&lt;></tt> pseudo-variadic template. In a CharList,
each entry in the list is a single character constant.
</li>

</ul>

You can find out more about the "list types" on the [typelists] page.
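
To illustrate what "pseudo-variadic" means, here is a minimal sketch of the general technique: a template with a fixed number of defaulted parameters, where an empty marker type fills the unused slots. All names here (<tt>sketch::type_list</tt>, <tt>nil</tt>, <tt>length</tt>, the <tt>rule_*</tt> placeholders) are hypothetical, the limit of four entries is arbitrary, and parsepp's actual <tt>rule_list&lt;></tt> may well be implemented differently.

```cpp
#include <cassert>

namespace sketch {

// Marker for an unused slot (hypothetical; not a parsepp type).
struct nil {};

// A list of up to four types. Unused slots default to nil, so
// type_list<A,B> is really type_list<A,B,nil,nil>.
template <typename T1 = nil, typename T2 = nil,
          typename T3 = nil, typename T4 = nil>
struct type_list {
    typedef T1 head;
    typedef type_list<T2, T3, T4> tail;  // shifts the list left by one
};

// Counts the entries by walking head/tail until the empty list.
template <typename List>
struct length {
    static const int value = 1 + length<typename List::tail>::value;
};
template <>
struct length< type_list<> > {
    static const int value = 0;
};

// Placeholder "rules" for the demonstration.
struct rule_a {};
struct rule_b {};
struct rule_c {};
typedef type_list<rule_a, rule_b, rule_c> three_rules;

// Small check of the walk-through above.
inline void self_check() {
    assert(length<three_rules>::value == 3);
    assert(length< type_list<> >::value == 0);
}

}  // namespace sketch
```

With true variadic templates the defaulted parameters and the marker type are no longer needed, which is why such a helper type becomes unnecessary in C++0x.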

Additional information about each of the types shown below can be found in
the API documentation, which lives in the header files.

<hr>

<h1>Core parsing rules</h1>


<table border=1>

<tr><th width=20%>Equivalent expression</th><th>Rule class</th><th>Notes</th></tr>

<tr>
<td>true</td>
<td><tt>r_success</tt></td>
<td>Always matches without consuming input.</td>
</tr>

<tr>
<td>false</td>
<td><tt>r_failure</tt></td>
<td>Always fails to match and consumes no input.</td>
</tr>

<tr>
<td>&nbsp;</td>
<td><tt>r_eof</tt></td>
<td>Matches only if input is exhausted.</td>
</tr>

<tr>
<td>R1 && R2 ... RN<br>
or:<br>
R1 || R2 ... RN
</td>
<td><tt>r_seq&lt;bool FailFast,RuleList></tt></td>
<td>If FailFast is true, it matches only if all contained Rules match.
If FailFast is false, it matches as soon as any one of the Rules matches.
That is, (FailFast==true) is an AND operation and (FailFast==false)
is an OR operation.
</td>
</tr>

<tr>
<td>R1 || R2 ... RN</td>
<td><tt>r_or&lt;RuleList></tt></td>
<td>Convenience form of <tt>r_seq&lt;false,RuleList></tt>.</td>
</tr>

<tr>
<td>R1 && R2 ... RN</td>
<td><tt>r_and&lt;RuleList></tt></td>
<td>Convenience form of <tt>r_seq&lt;true,RuleList></tt>.</td>
</tr>

<tr>
<td>&amp;Rule</td>
<td><tt>r_at&lt;Rule></tt></td>
<td>Matches only if Rule matches, but never consumes input. That is,
it can be used to say "are we AT an integer?" without actually
consuming the token.</td>
</tr>

<tr>
<td>!Rule</td>
<td><tt>r_notat&lt;Rule></tt></td>
<td>The negation of <tt>r_at&lt;Rule></tt>: matches only if Rule does not match.</td>
</tr>

<tr>
<td>Rule?</td>
<td><tt>r_opt&lt;Rule></tt></td>
<td>Always matches, but may or may not consume input.</td>
</tr>

<tr>
<td>Rule+</td>
<td><tt>r_plus&lt;Rule></tt></td>
<td>Matches 1 or more instances of Rule.</td>
</tr>

<tr>
<td>Rule*</td>
<td><tt>r_star&lt;Rule></tt></td>
<td>Matches 0 or more instances of Rule.
Always matches but may not consume input.</td>
</tr>

<tr>
<td>Rule{Min,Max}</td>
<td><tt>r_repeat&lt;Rule,int Min,int Max=Min></tt></td>
<td>Matches only if Rule matches at least Min times, and consumes
at most Max matches.</td>
</tr>

<tr>
<td>if (Rule) then Action</td>
<td><tt>r_action&lt;Rule,Action></tt></td>
<td>If Rule matches then the matching string is passed on
to Action::matched().</td>
</tr>

<tr>
<td>(LeftRule*) && Rule && (RightRule*) </td>
<td><tt>r_pad&lt;Rule,LeftRule,RightRule=LeftRule></tt></td>
<td>Typically used to skip leading and/or trailing whitespace.</td>
</tr>

</table>

There are more rules in the core library, but some
are very specialized, highly experimental, or in some other way
deviant. See parsepp.hpp for full details.
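
The semantics in the table above can be sketched in a few lines of self-contained C++. This is only an illustration of the matching behaviour, not parsepp's implementation: the <tt>sketch</tt> namespace, the <tt>Cursor</tt> type, the rule names and the <tt>consumed()</tt> helper are all hypothetical, and C++11 variadic templates stand in for <tt>rule_list&lt;></tt>.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

namespace sketch {

// Hypothetical input cursor; parsepp's real parser state differs.
struct Cursor {
    const std::string* text;
    std::size_t pos;
    bool eof() const { return pos >= text->size(); }
};

// Matches exactly the character C and consumes it.
template <char C>
struct ch {
    static bool matches(Cursor& in) {
        if (in.eof() || (*in.text)[in.pos] != C) return false;
        ++in.pos;
        return true;
    }
};

// AND: every rule must match in order; the cursor is restored on failure.
template <typename... Rules> struct all_of;
template <> struct all_of<> {
    static bool matches(Cursor&) { return true; }
};
template <typename R, typename... Rest>
struct all_of<R, Rest...> {
    static bool matches(Cursor& in) {
        const std::size_t start = in.pos;
        if (R::matches(in) && all_of<Rest...>::matches(in)) return true;
        in.pos = start;
        return false;
    }
};

// OR: succeeds as soon as one rule matches.
template <typename... Rules> struct any_of;
template <> struct any_of<> {
    static bool matches(Cursor&) { return false; }
};
template <typename R, typename... Rest>
struct any_of<R, Rest...> {
    static bool matches(Cursor& in) {
        return R::matches(in) || any_of<Rest...>::matches(in);
    }
};

// Rule*: always matches; stops when R fails or stops consuming input.
template <typename R>
struct star {
    static bool matches(Cursor& in) {
        for (;;) {
            const std::size_t before = in.pos;
            if (!R::matches(in) || in.pos == before) break;
        }
        return true;
    }
};

// Rule+ is one R followed by Rule*.
template <typename R> struct plus : all_of<R, star<R>> {};

// Rule?: always matches, whether or not R consumed anything.
template <typename R>
struct opt {
    static bool matches(Cursor& in) { R::matches(in); return true; }
};

// &Rule: tests R on a copy of the cursor, so nothing is consumed.
template <typename R>
struct at {
    static bool matches(Cursor& in) { Cursor probe = in; return R::matches(probe); }
};

// !Rule: the negation of at<R>.
template <typename R>
struct not_at {
    static bool matches(Cursor& in) { return !at<R>::matches(in); }
};

// Number of characters consumed by Rule, or -1 if it did not match.
template <typename Rule>
long consumed(const std::string& text) {
    Cursor in = {&text, 0};
    return Rule::matches(in) ? static_cast<long>(in.pos) : -1L;
}

using ab      = all_of<ch<'a'>, ch<'b'>>;
using a_or_b  = any_of<ch<'a'>, ch<'b'>>;
using many_a  = star<ch<'a'>>;
using some_a  = plus<ch<'a'>>;
using maybe_a = opt<ch<'a'>>;
using peek_a  = at<ch<'a'>>;
using not_a   = not_at<ch<'a'>>;

inline void self_check() {
    assert(consumed<ab>("abc") == 2);       // both matched
    assert(consumed<ab>("ax") == -1);       // second rule failed
    assert(consumed<many_a>("b") == 0);     // matches, consumes nothing
    assert(consumed<peek_a>("abc") == 0);   // lookahead never consumes
}

}  // namespace sketch
```

Note how <tt>star</tt>, <tt>opt</tt> and <tt>at</tt> can all succeed while leaving the cursor where it was, which is the "matches without consuming input" behaviour described in the Notes column.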

<hr>

<h1>Core Actions</h1>


<table border=1>
<tr><td colspan=3>
Actions are types which get passed matching tokens by the
<tt>r_action&lt;></tt> Rule.
</td></tr>

<tr>
<td><tt>a_append</tt><br>
(Requires <tt>parsepp_append.hpp</tt>)
</td>
<td colspan=2>"Appends" a matched token to the client-supplied State object,
where the exact meaning of "append" is dependent on the State type. Implementations
exist for std::string, std::vector/set/list&lt;string>, and vector/set/list&lt;ValueType>,
where ValueType is a type which can be lexically cast from a std::string by using
i/o operators.
</td>
</tr>

<tr>
<td><tt>a_actions&lt;ActionList></tt></td>
<td colspan=2>Forwards a matched token to an arbitrary number of Actions.
</td>
</tr>


</table>
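
As a rough illustration of how a Rule, an Action and the client State fit together, here is a self-contained sketch. It is not parsepp's code: the <tt>sketch</tt> namespace, the <tt>Cursor</tt> type, the signature of <tt>matched()</tt> and the <tt>parse_digits()</tt> helper are assumptions made for this example, and C++11 variadic templates stand in for the ActionList.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

namespace sketch {

// Hypothetical input cursor; parsepp's real parser state differs.
struct Cursor {
    const std::string* text;
    std::size_t pos;
};

// Rule: one ASCII digit.
struct digit {
    static bool matches(Cursor& in) {
        if (in.pos >= in.text->size()) return false;
        const char c = (*in.text)[in.pos];
        if (c < '0' || c > '9') return false;
        ++in.pos;
        return true;
    }
};

// Rule: one or more R.
template <typename R>
struct plus {
    static bool matches(Cursor& in) {
        if (!R::matches(in)) return false;
        while (R::matches(in)) {}
        return true;
    }
};

// The meaning of "append" depends on the State type.
inline void append(std::string& state, const std::string& token) {
    state += token;
}
inline void append(std::vector<std::string>& state, const std::string& token) {
    state.push_back(token);
}

// Action: appends the matched token to the State.
struct append_action {
    template <typename State>
    static void matched(const std::string& token, State& state) {
        append(state, token);
    }
};

// Action: forwards the token to every Action in the pack.
template <typename... Actions> struct actions;
template <> struct actions<> {
    template <typename State>
    static void matched(const std::string&, State&) {}
};
template <typename A, typename... Rest>
struct actions<A, Rest...> {
    template <typename State>
    static void matched(const std::string& token, State& state) {
        A::matched(token, state);
        actions<Rest...>::matched(token, state);
    }
};

// Rule: if Rule matches, hand the matched text to Action.
template <typename Rule, typename Action>
struct action {
    template <typename State>
    static bool matches(Cursor& in, State& state) {
        const std::size_t start = in.pos;
        if (!Rule::matches(in)) return false;
        Action::matched(in.text->substr(start, in.pos - start), state);
        return true;
    }
};

using append_twice = actions<append_action, append_action>;

// Parses a run of digits at the start of text and triggers Action on it.
template <typename Action, typename State>
bool parse_digits(const std::string& text, State& state) {
    Cursor in = {&text, 0};
    return action<plus<digit>, Action>::matches(in, state);
}

inline void self_check() {
    std::string s;
    assert(parse_digits<append_action>("123abc", s));
    assert(s == "123");
}

}  // namespace sketch
```

The Action is only triggered when the Rule matches, so a failed parse leaves the State untouched.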

<h1>ASCII-matching Rules</h1>


<table border=1>

<tr>
<td colspan=3>The following rules are specific to matching ASCII characters.
Instead of templated Rule arguments, most take character constants (in the form
of integer template arguments) or a <tt>char_list&lt;></tt> argument.
</td>
</tr>


<tr><th width=20%>Equivalent expression</th><th>Rule class</th><th>Notes</th></tr>


<tr>
<td>C</td>
<td><tt>r_ch&lt;char C></tt></td>
<td>Matches only C.</td>
</tr>

<tr>
<td><nowiki>[Cc]</nowiki></td>
<td><tt>r_chi&lt;char C></tt></td>
<td>Matches C case-insensitively.</td>
</tr>

<tr>
<td><nowiki>[^C]</nowiki></td>
<td><tt>r_notch&lt;char C></tt></td>
<td>Matches any character except C.</td>
</tr>

<tr>
<td><nowiki>[^Cc]</nowiki></td>
<td><tt>r_notchi&lt;char C></tt></td>
<td>Matches any character except C, case-insensitively.</td>
</tr>

<tr>
<td>&nbsp;</td>
<td><tt>r_any</tt></td>
<td>Matches any char except eof.</td>
</tr>

<tr>
<td><nowiki>[Min-Max]</nowiki></td>
<td><tt>r_range&lt;int Min, int Max></tt></td>
<td>Matches if input is in the ASCII range (Min..Max),
e.g. <tt>r_range&lt;'a','z'></tt>.
</td>
</tr>

<tr>
<td>&nbsp;</td>
<td><tt>r_oneof&lt;CharList></tt></td>
<td>Matches if the next input character is any character in the CharList.</td>
</tr>

<tr>
<td>&nbsp;</td>
<td><tt>r_oneofi&lt;CharList></tt></td>
<td>Identical to <tt>r_oneof</tt>, but does a case-insensitive match.</td>
</tr>


<tr>
<td>C && Cn...</td>
<td><tt>r_chseq&lt;CharList></tt></td>
<td>Matches if input matches all characters
in the list, in order.</td>
</tr>

<tr>
<td>&nbsp;</td>
<td><tt>r_chseqi&lt;CharList></tt></td>
<td>Case-insensitive form of <tt>r_chseq</tt>. Results are undefined if
any given char is not alphabetic.</td>
</tr>

<tr>
<td><nowiki>[ \t]</nowiki></td>
<td><tt>r_blank</tt></td>
<td>Matches 1 blank char (space or tab).</td>
</tr>

<tr>
<td><nowiki>[ \n\t\r\f\v]</nowiki></td>
<td><tt>r_space</tt></td>
<td>Matches 1 whitespace char.</td>
</tr>

<tr>
<td>((\r\n)||(\n))</td>
<td><tt>r_eol</tt></td>
<td>Matches only at end-of-line.</td>
</tr>

<tr>
<td><nowiki>[A-Z]</nowiki></td>
<td><tt>r_upper</tt></td>
<td>Matches 1 upper-case letter.</td>
</tr>

<tr>
<td><nowiki>[a-z]</nowiki></td>
<td><tt>r_lower</tt></td>
<td>Matches 1 lower-case letter.</td>
</tr>

<tr>
<td><nowiki>[a-zA-Z]</nowiki></td>
<td><tt>r_alpha</tt></td>
<td>Matches 1 upper- or lower-case letter.</td>
</tr>

<tr>
<td><nowiki>[0-9]</nowiki></td>
<td><tt>r_digit</tt></td>
<td>Matches 1 digit (0..9).</td>
</tr>

<tr>
<td><nowiki>[a-zA-Z0-9]</nowiki></td>
<td><tt>r_alnum</tt></td>
<td>Matches 1 alpha or numeric char.</td>
</tr>

<tr>
<td><nowiki>[0-9a-fA-F]</nowiki></td>
<td><tt>r_xdigit</tt></td>
<td>Matches 1 hexadecimal digit.</td>
</tr>

<tr>
<td> (alpha||_) && (alpha||_||digit)*</td>
<td><tt>r_identifier</tt></td>
<td>C-style identifier string (used by most
programming languages).</td>
</tr>

</table>
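
The character rules above all reduce to "test the next character, and consume it if the test passes". The following self-contained sketch shows that idea for a range, a character list, a case-insensitive match and a C-style identifier. The names (<tt>sketch::one</tt>, <tt>in_range</tt>, <tt>in_list</tt>, <tt>same_letter</tt>, <tt>consumed()</tt>) are hypothetical and are not the parsepp types; a C++11 variadic <tt>char</tt> pack stands in for <tt>char_list&lt;></tt>.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

namespace sketch {

// Hypothetical input cursor; parsepp's real parser state differs.
struct Cursor {
    const std::string* text;
    std::size_t pos;
};

// Consumes one character if Pred accepts it; fails at end of input.
template <typename Pred>
struct one {
    static bool matches(Cursor& in) {
        if (in.pos >= in.text->size()) return false;
        if (!Pred::accepts((*in.text)[in.pos])) return false;
        ++in.pos;
        return true;
    }
};

// [Min-Max]
template <int Min, int Max>
struct in_range {
    static bool accepts(char c) { return c >= Min && c <= Max; }
};

// Any character from a compile-time list.
template <char... Cs> struct in_list;
template <> struct in_list<> {
    static bool accepts(char) { return false; }
};
template <char C, char... Rest>
struct in_list<C, Rest...> {
    static bool accepts(char c) { return c == C || in_list<Rest...>::accepts(c); }
};

// [Cc]: compares after folding both sides to lower case.
template <char C>
struct same_letter {
    static bool accepts(char c) {
        return std::tolower(static_cast<unsigned char>(c)) ==
               std::tolower(static_cast<unsigned char>(C));
    }
};

// (alpha||_) && (alpha||_||digit)*
struct ident_start {
    static bool accepts(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
    }
};
struct ident_rest {
    static bool accepts(char c) {
        return ident_start::accepts(c) || (c >= '0' && c <= '9');
    }
};
struct identifier {
    static bool matches(Cursor& in) {
        if (!one<ident_start>::matches(in)) return false;
        while (one<ident_rest>::matches(in)) {}
        return true;
    }
};

// Number of characters consumed by Rule, or -1 if it did not match.
template <typename Rule>
long consumed(const std::string& text) {
    Cursor in = {&text, 0};
    return Rule::matches(in) ? static_cast<long>(in.pos) : -1L;
}

using lower    = one<in_range<'a', 'z'>>;
using sign     = one<in_list<'+', '-'>>;
using letter_x = one<same_letter<'x'>>;

inline void self_check() {
    assert(consumed<lower>("q1") == 1);
    assert(consumed<sign>("5") == -1);
    assert(consumed<letter_x>("X") == 1);
    assert(consumed<identifier>("_foo9 bar") == 5);
}

}  // namespace sketch
```

Keeping the per-character test separate from the cursor handling means each new character class only needs an <tt>accepts()</tt> function.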

<hr>

<h1>Misc...</h1>

Aside from the core rules, the <tt>parsepp_*.hpp</tt> headers
provide several common types of parsers, such as:

<ul>
<li><tt>parsepp_numeric.hpp</tt>: numbers (decimal int, octal, hex, double)</li>
<li><tt>parsepp_calc.hpp</tt>: A calculator supporting all of the types from
<tt>parsepp_numeric.hpp</tt></li>
</ul>

Contributions of new parsers are of course welcomed.

Z 2c9e89135202b0dcda666a0ff11f1208