parse0x  Artifact Content

Artifact 555faec7f5cc01aadcaaf6ab9ad4032e3e9608a8:

Wiki page [RulesOverview] by stephan 2011-07-15 14:08:45.
<h1>Overview of core parse0x Rules</h1>

This page gives a brief overview of the various core Rules and
what they do.

(Thanks to
[http://code.google.com/p/pegtl/|the pegtl docs] for their rules table,
on which this one is based.)

Comments below which refer to "State..." or "State list" mean the
list of client-side State objects that are passed to <tt>parse()</tt>.
A reference to "the Nth State argument" means one specific object
from (State...).


<h1>Core parsing rules</h1>


<table border=1>

<tr><th width=20%>Equivalent expression</th><th>Rule class</th><th>Notes</th></tr>

<tr>
<td>true</td>
<td><tt>r_success</tt></td>
<td>Always matches without consuming input.</td>
</tr>

<tr>
<td>false</td>
<td><tt>r_failure</tt></td>
<td>Always fails to match and consumes no input.</td>
</tr>

<tr>
<td>&nbsp;</td>
<td><tt>r_eof</tt></td>
<td>Matches only if input is exhausted.</td>
</tr>

<tr>
<td>R1 && R2 ... RN</td>
<td><tt>r_seq&lt;First,Rest...></tt></td>
<td>Matches only if all contained Rules match.</td>
</tr>

<tr>
<td>R1 || R2 ... RN</td>
<td><tt>r_or&lt;First,Rest...></tt></td>
<td>Matches if any of the contained Rules match, stopping
at the first match.</td>
</tr>

<tr>
<td>&amp;Rule</td>
<td><tt>r_at&lt;Rule></tt></td>
<td>Matches only if Rule matches, but never consumes input. That is,
it can be used to say "are we AT an integer?"</td>
</tr>

<tr>
<td>!Rule</td>
<td><tt>r_notat&lt;Rule></tt></td>
<td>The negation of <tt>r_at&lt;Rule></tt>: matches only if Rule does
not match, and never consumes input.</td>
</tr>

<tr>
<td>Rule?</td>
<td><tt>r_opt&lt;Rule></tt></td>
<td>Always matches but may or may not consume input.</td>
</tr>

<tr>
<td>Rule+</td>
<td><tt>r_plus&lt;Rule></tt></td>
<td>Matches 1 or more instances of Rule.</td>
</tr>

<tr>
<td>Rule*</td>
<td><tt>r_star&lt;Rule></tt></td>
<td>Matches 0 or more instances of Rule.
Always matches but may not consume input.</td>
</tr>

<tr>
<td>Rule{Min,Max}</td>
<td><tt>r_repeat&lt;Rule,Min,Max=Min></tt></td>
<td>Matches Rule at least Min and at most Max times. Fails if Rule
matches fewer than Min times.</td>
</tr>

<tr>
<td>&nbsp;</td>
<td><tt>r_action&lt;Rule,Action></tt></td>
<td>If Rule matches then Action::matched() is called, passed
the matching string.</td>
</tr>

<tr>
<td>(Left*) && Rule && (Right*) </td>
<td><tt>r_pad&lt;Rule,Left,Right=Left></tt></td>
<td>Typically used to skip whitespace.</td>
</tr>

<tr>
<td>&nbsp;</td>
<td><tt>r_action_nth&lt;Num, Rule,Action></tt></td>
<td>Like <tt>r_action&lt;Rule,Action></tt>,
but if Rule matches then the Action is sent only
the Num'th State object. This is useful only when a client
has multiple State objects and various Actions which
each expect a different type of State. Using
this (or <tt>a_action_nth</tt>) you can select which
State objects are sent which matches.
</td>
</tr>

</table>
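
The table above can be read as a set of small composable templates. The following is a minimal, self-contained sketch of that idea and is not parse0x's actual implementation: the <tt>sketch</tt> namespace, the <tt>cursor</tt> type, the <tt>matches(cursor&)</tt> interface and the <tt>consumed()</tt> helper are all assumptions made for illustration. Only the rule names are borrowed from the table. See parse0x.hpp for the real interfaces.

```cpp
namespace sketch {  // hypothetical model, NOT the parse0x API

// Assumed input cursor: a pointer range over a character buffer.
struct cursor {
    const char* pos;
    const char* end;
    constexpr bool eof() const { return pos == end; }
};

// Matches exactly the character C and consumes it.
template <int C>
struct r_ch {
    static constexpr bool matches(cursor& in) {
        if (in.eof() || *in.pos != C) return false;
        ++in.pos;
        return true;
    }
};

// Matches only if input is exhausted.
struct r_eof {
    static constexpr bool matches(cursor& in) { return in.eof(); }
};

// R1 && R2 ... RN: every rule must match; on failure the cursor is restored.
template <typename... Rules> struct r_seq;
template <> struct r_seq<> {
    static constexpr bool matches(cursor&) { return true; }
};
template <typename First, typename... Rest>
struct r_seq<First, Rest...> {
    static constexpr bool matches(cursor& in) {
        const char* saved = in.pos;
        if (First::matches(in) && r_seq<Rest...>::matches(in)) return true;
        in.pos = saved;
        return false;
    }
};

// R1 || R2 ... RN: stops at the first rule that matches.
template <typename... Rules> struct r_or;
template <> struct r_or<> {
    static constexpr bool matches(cursor&) { return false; }
};
template <typename First, typename... Rest>
struct r_or<First, Rest...> {
    static constexpr bool matches(cursor& in) {
        return First::matches(in) || r_or<Rest...>::matches(in);
    }
};

// &Rule: tests Rule on a copy of the cursor, so nothing is consumed.
template <typename Rule>
struct r_at {
    static constexpr bool matches(cursor& in) {
        cursor probe = in;
        return Rule::matches(probe);
    }
};

// !Rule: the negation of r_at.
template <typename Rule>
struct r_notat {
    static constexpr bool matches(cursor& in) { return !r_at<Rule>::matches(in); }
};

// Rule?: always matches, whether or not Rule consumed anything.
template <typename Rule>
struct r_opt {
    static constexpr bool matches(cursor& in) {
        (void)Rule::matches(in);
        return true;
    }
};

// Rule*: repeats Rule until it fails or stops making progress.
template <typename Rule>
struct r_star {
    static constexpr bool matches(cursor& in) {
        while (true) {
            const char* before = in.pos;
            if (!Rule::matches(in) || in.pos == before) break;
        }
        return true;
    }
};

// Rule+: one mandatory match followed by Rule*.
template <typename Rule>
struct r_plus {
    static constexpr bool matches(cursor& in) {
        return Rule::matches(in) && r_star<Rule>::matches(in);
    }
};

// Hypothetical helper: number of characters Rule consumes from a string
// literal, or -1 if Rule does not match.
template <typename Rule, unsigned long N>
constexpr long consumed(const char (&text)[N]) {
    cursor in{text, text + N - 1};
    return Rule::matches(in) ? static_cast<long>(in.pos - text) : -1;
}

}  // namespace sketch

// Usage: an optional sign followed by one or more '1' characters, then EOF.
using signed_ones = sketch::r_seq<sketch::r_opt<sketch::r_ch<'-'>>,
                                  sketch::r_plus<sketch::r_ch<'1'>>,
                                  sketch::r_eof>;
static_assert(sketch::consumed<signed_ones>("-11") == 3, "whole input matched");
static_assert(sketch::consumed<signed_ones>("-1x") == -1, "trailing junk rejected");
```

Because each rule in this sketch is a type with a single static function, a grammar is just a nested type expression and needs no runtime objects. Whether parse0x restores the input position on a failed <tt>r_seq</tt> in the same way is not stated on this page.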

There are more rules in the core library, but some
are very specialized, highly experimental, or in some other way
deviant. See parse0x.hpp for full details.


<hr>
<h1>Core Actions</h1>

<table border=1>
<tr>
<td colspan=3>These built-in Actions can be used with <tt>r_action&lt;Rule,Action></tt>.
</td>
</tr>

<tr>
<td><tt>a_actions&lt;Action1,...ActionN></tt></td>
<td colspan=2>Forwards a match to Action1...ActionN.</td>
</tr>

<tr>
<td><tt>a_action_nth&lt;Int,Action></tt></td>
<td colspan=2>A unary Action proxy which calls Action::matched(...), passing
it the matched string and the Nth State argument. You should not use
<tt>a_action_nth&lt;></tt> within an <tt>r_action_nth&lt;></tt>. Use
either <tt>r_action_nth&lt;N,...></tt> or
<tt>r_action&lt;Rule,a_action_nth&lt;N,Action>></tt>.
</td>
</tr>

<tr>
<td><tt>a_append</tt><br>(requires append.hpp)</td>
<td colspan=2>"Appends" the matched string to the State. The definition of "append"
is State-dependent. Specializations exist for std::string and
std::vector/list/set&lt;std::string>.</td>
</tr>

<tr>
<td><tt>a_appendto&lt;Int></tt><br>(requires append.hpp)</td>
<td colspan=2>Like a_append, but "appends" the match only to the Nth State object.
</td>
</tr>

</table>
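
To illustrate how a matched string might travel from a Rule to an Action and on to the State objects, here is a small self-contained sketch. It is a hypothetical model, not the parse0x implementation: the <tt>sketch</tt> namespace, the <tt>cursor</tt> type, the exact <tt>matched()</tt> and <tt>parse()</tt> signatures, the zero-based State index, and the assumption that <tt>a_append</tt> appends to every State object are all guesses made for illustration.

```cpp
#include <cassert>
#include <string>
#include <tuple>
#include <vector>

namespace sketch {  // hypothetical model, NOT the parse0x API

struct cursor { const char* pos; const char* end; };

// Matches one ASCII digit. State objects are accepted but ignored.
struct r_digit {
    template <typename... State>
    static bool matches(cursor& in, State&...) {
        if (in.pos == in.end || *in.pos < '0' || *in.pos > '9') return false;
        ++in.pos;
        return true;
    }
};

// Rule+: one or more instances of Rule.
template <typename Rule>
struct r_plus {
    template <typename... State>
    static bool matches(cursor& in, State&... state) {
        if (!Rule::matches(in, state...)) return false;
        while (in.pos != in.end) {
            const char* before = in.pos;
            if (!Rule::matches(in, state...) || in.pos == before) break;
        }
        return true;
    }
};

// If Rule matches, Action::matched() receives the matched string.
template <typename Rule, typename Action>
struct r_action {
    template <typename... State>
    static bool matches(cursor& in, State&... state) {
        const char* start = in.pos;
        if (!Rule::matches(in, state...)) return false;
        Action::matched(std::string(start, in.pos), state...);
        return true;
    }
};

// State-dependent meaning of "append" for two of the State types the table names.
inline void append(std::string& state, const std::string& m) { state += m; }
inline void append(std::vector<std::string>& state, const std::string& m) {
    state.push_back(m);
}

// Assumption: appends the match to every State object it is given.
struct a_append {
    template <typename... State>
    static void matched(const std::string& m, State&... state) {
        int unused[] = {0, (append(state, m), 0)...};
        (void)unused;
    }
};

// Forwards a match to each listed Action in turn.
template <typename... Actions>
struct a_actions {
    template <typename... State>
    static void matched(const std::string& m, State&... state) {
        int unused[] = {0, (Actions::matched(m, state...), 0)...};
        (void)unused;
    }
};

// Passes only the Nth State object (zero-based here, an assumption) to Action.
template <int N, typename Action>
struct a_action_nth {
    template <typename... State>
    static void matched(const std::string& m, State&... state) {
        Action::matched(m, std::get<N>(std::tie(state...)));
    }
};

// Hypothetical entry point: runs Rule over text with the given State list.
template <typename Rule, typename... State>
bool parse(const std::string& text, State&... state) {
    cursor in{text.data(), text.data() + text.size()};
    return Rule::matches(in, state...);
}

// Usage: the leading run of digits is appended to both State objects.
inline void example() {
    std::string all;
    std::vector<std::string> parts;
    using number = r_action<r_plus<r_digit>, a_append>;
    const bool ok = parse<number>("2011-07", all, parts);
    assert(ok && all == "2011" && parts.size() == 1);
    (void)ok;
}

}  // namespace sketch
```

In this model <tt>r_action&lt;Rule,a_action_nth&lt;N,Action>></tt> is how a single State object gets selected, which mirrors the second of the two spellings the table recommends.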

<hr>
<h1>ASCII-matching Rules</h1>


<table border=1>


<tr><th width=20%>Equivalent expression</th><th>Rule class</th><th>Notes</th></tr>

<tr>
<td colspan=3>The following rules are specific to matching ASCII characters.
Instead of templated Rule arguments, most take character constants (in the form
of integer template arguments).

</td>
</tr>

<tr>
<td>C</td>
<td><tt>r_ch&lt;int C></tt></td>
<td>Matches only C.</td>
</tr>

<tr>
<td><nowiki>[Cc]</nowiki></td>
<td><tt>r_chi&lt;int C></tt></td>
<td>Matches C case-insensitively.</td>
</tr>

<tr>
<td><nowiki>[^C]</nowiki></td>
<td><tt>r_notch&lt;int C></tt></td>
<td>Matches any character except C.</td>
</tr>

<tr>
<td>&nbsp;</td>
<td><tt>r_any</tt></td>
<td>Matches any char except eof.</td>
</tr>

<tr>
<td><nowiki>[Min-Max]</nowiki></td>
<td><tt>r_range&lt;int Min, int Max></tt></td>
<td>Matches if input is in the ASCII range (Min..Max),
e.g. <tt>r_range&lt;'a','z'></tt>.
</td>
</tr>

<tr>
<td>&nbsp;</td>
<td><tt>r_oneof&lt;C,Chars...></tt></td>
<td>Matches if an input char is any of the templatized character arguments.</td>
</tr>

<tr>
<td>C && Cn...</td>
<td><tt>r_chseq&lt;C,Chars...></tt></td>
<td>Matches if input matches all characters
in the list.</td>
</tr>

<tr>
<td>&nbsp;</td>
<td><tt>r_chiseq&lt;C,Chars...></tt></td>
<td>Case-insensitive form of <tt>r_chseq</tt>.</td>
</tr>

<tr>
<td><nowiki>[ \t]</nowiki></td>
<td><tt>r_blank</tt></td>
<td>Matches 1 blank char.</td>
</tr>

<tr>
<td><nowiki>[ \n\t\r\f\v]</nowiki></td>
<td><tt>r_space</tt></td>
<td>Matches 1 whitespace char.</td>
</tr>

<tr>
<td>((\r\n)||(\n))</td>
<td><tt>r_eol</tt></td>
<td>Matches only at end-of-line.</td>
</tr>

<tr>
<td><nowiki>[A-Z]</nowiki></td>
<td><tt>r_upper</tt></td>
<td>Matches 1 upper-case letter.</td>
</tr>

<tr>
<td><nowiki>[a-z]</nowiki></td>
<td><tt>r_lower</tt></td>
<td>Matches 1 lower-case letter.</td>
</tr>

<tr>
<td><nowiki>[a-zA-Z]</nowiki></td>
<td><tt>r_alpha</tt></td>
<td>Matches 1 upper- or lower-case letter.</td>
</tr>

<tr>
<td><nowiki>[0-9]</nowiki></td>
<td><tt>r_digit</tt></td>
<td>Matches 1 digit (0..9).</td>
</tr>

<tr>
<td><nowiki>[a-zA-Z0-9]</nowiki></td>
<td><tt>r_alnum</tt></td>
<td>Matches 1 alpha or numeric char.</td>
</tr>

<tr>
<td><nowiki>[0-9a-fA-F]</nowiki></td>
<td><tt>r_xdigit</tt></td>
<td>Matches 1 hexadecimal digit.</td>
</tr>

<tr>
<td> (alpha||_) && (alpha||_||digit)*</td>
<td><tt>r_identifier</tt></td>
<td>C-style identifier string (used by most
programming languages).</td>
</tr>

</table>
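
The character rules take integer template arguments rather than Rule types. The sketch below shows one way such rules could be written. It is illustrative only and is not taken from parse0x: the <tt>sketch</tt> namespace, the <tt>cursor</tt> type, the <tt>matches(cursor&)</tt> interface and the <tt>consumed()</tt> helper are assumptions, and <tt>r_identifier</tt> is hand-written here rather than composed from other rules.

```cpp
namespace sketch {  // hypothetical model, NOT the parse0x API

struct cursor { const char* pos; const char* end; };

// ASCII-only lower-casing, used for case-insensitive comparison.
constexpr int lower(int c) { return (c >= 'A' && c <= 'Z') ? c - 'A' + 'a' : c; }

// [Cc]: matches C case-insensitively.
template <int C>
struct r_chi {
    static constexpr bool matches(cursor& in) {
        if (in.pos == in.end || lower(*in.pos) != lower(C)) return false;
        ++in.pos;
        return true;
    }
};

// [Min-Max]: matches one character in the inclusive range Min..Max.
template <int Min, int Max>
struct r_range {
    static constexpr bool matches(cursor& in) {
        if (in.pos == in.end || *in.pos < Min || *in.pos > Max) return false;
        ++in.pos;
        return true;
    }
};

// Matches one character if it equals any of the template arguments.
template <int... Chars> struct r_oneof;
template <> struct r_oneof<> {
    static constexpr bool matches(cursor&) { return false; }
};
template <int C, int... Chars>
struct r_oneof<C, Chars...> {
    static constexpr bool matches(cursor& in) {
        if (in.pos != in.end && *in.pos == C) { ++in.pos; return true; }
        return r_oneof<Chars...>::matches(in);
    }
};

// C && Cn...: matches the listed characters in order, or consumes nothing.
template <int... Chars> struct r_chseq;
template <> struct r_chseq<> {
    static constexpr bool matches(cursor&) { return true; }
};
template <int C, int... Chars>
struct r_chseq<C, Chars...> {
    static constexpr bool matches(cursor& in) {
        const char* saved = in.pos;
        if (in.pos != in.end && *in.pos == C) {
            ++in.pos;
            if (r_chseq<Chars...>::matches(in)) return true;
        }
        in.pos = saved;
        return false;
    }
};

// (alpha||_) && (alpha||_||digit)*: a C-style identifier.
struct r_identifier {
    static constexpr bool head(int c) {
        return c == '_' || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
    }
    static constexpr bool matches(cursor& in) {
        if (in.pos == in.end || !head(*in.pos)) return false;
        ++in.pos;
        while (in.pos != in.end &&
               (head(*in.pos) || (*in.pos >= '0' && *in.pos <= '9'))) {
            ++in.pos;
        }
        return true;
    }
};

// Hypothetical helper: characters consumed from a string literal, or -1.
template <typename Rule, unsigned long N>
constexpr long consumed(const char (&text)[N]) {
    cursor in{text, text + N - 1};
    return Rule::matches(in) ? static_cast<long>(in.pos - text) : -1;
}

}  // namespace sketch

// Usage: the identifier stops at the first character outside [a-zA-Z0-9_].
static_assert(sketch::consumed<sketch::r_identifier>("_tmp1 = 0") == 5, "identifier");
static_assert(sketch::consumed<sketch::r_chseq<'i', 'f'>>("in") == -1, "sequence is all or nothing");
```

Taking characters as template arguments means a rule such as <tt>r_chseq&lt;'i','f'></tt> carries its data in its type, so it can be nested inside other rules without any constructor arguments.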

<hr>

<h1>Misc...</h1>

Aside from the core rules, under <tt>include/s11n.net/parse0x/</tt>
can be found several headers implementing some common types of
parsers, such as:

<ul>
<li><tt>numeric.hpp</tt>: numbers (decimal int, octal, hex, double)</li>
<li><tt>strings.hpp</tt>: C-style quoted strings.</li>
<li><tt>ipv4.hpp</tt>: IPv4 addresses</li>
<li><tt>calculator.hpp</tt>: A calculator supporting all of the types from
numeric.hpp</li>
</ul>

Contributions of new parsers are of course welcome.
