Overview of core parsepp Rules
This page gives a brief overview of the various core Rules and what they do.
(Thanks to the pegtl docs for their rules table, on which this one is based.)
Some notes about terminology in the following table:
Comments below which refer to State or Client State mean the client-side State object which is passed to parse().
References to a RuleList type refer to the rule_list<> type, a pseudo-variadic typelist template used by the framework (in C++0x we have true variadics, and don't need this type). All types contained in a rule_list<> must be Rules, except in the case of an ActionList.
References to an ActionList type also refer to rule_list<>, except that the contained types are Actions instead of Rules.
References to a CharList type refer to the char_list<> pseudo-variadic template. In a CharList, each entry in the list is a single character constant.
You can find out more about the "list types" on the typelists page.
Additional information about each of the types shown below can be found in the API documentation, which lives in the header files.
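To make the "list types" idea concrete, here is a minimal sketch of what such lists look like when written with true variadics (C++11 and later). The names type_list and char_list in the sketch_lists namespace are hypothetical stand-ins invented for this example. They are not the parsepp rule_list<> and char_list<> definitions, which are pseudo-variadic and live in the library headers.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

namespace sketch_lists {

// Hypothetical true-variadic type list, standing in for the role that the
// pseudo-variadic rule_list<> plays. Not the parsepp definition.
template <typename... Ts>
struct type_list {
    static constexpr std::size_t size() { return sizeof...(Ts); }
};

// Hypothetical true-variadic character list, standing in for char_list<>.
// Each template argument is a single character constant.
template <char... Cs>
struct char_list {
    static constexpr std::size_t size() { return sizeof...(Cs); }

    // Returns the listed characters as a string, in declaration order.
    static std::string str() {
        const char buf[] = {Cs..., '\0'};
        return std::string(buf, sizeof...(Cs));
    }

    // Returns true if c is one of the listed characters.
    static bool contains(char c) {
        const char buf[] = {Cs..., '\0'};
        for (std::size_t i = 0; i < sizeof...(Cs); ++i) {
            if (buf[i] == c) return true;
        }
        return false;
    }
};

// Small self-check showing how the lists are queried.
inline bool self_check() {
    typedef char_list<'a', 'b', 'c'> abc;
    typedef type_list<int, char, double> three_types;
    assert(abc::size() == 3);
    assert(abc::str() == "abc");
    assert(abc::contains('b') && !abc::contains('z'));
    assert(three_types::size() == 3);
    return true;
}

}  // namespace sketch_lists
```

The point of the sketch is only that a list type carries its entries as template arguments, so a Rule can inspect them at compile time without any runtime container.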
Core parsing rules
Equivalent expression | Rule class | Notes |
---|---|---|
true | r_success | Always matches without consuming input. |
false | r_failure | Always fails to match and consumes no input. |
| | r_eof | Matches only if input is exhausted. |
R1 && R2 ... RN | r_seq<bool FailFast,RuleList> | If FailFast is true, it matches only if all contained Rules match. If FailFast is false, it matches as soon as any one of the Rules matches. That is, (FailFast==true) is an AND operation and (FailFast==false) is an OR operation. |
R1 \|\| R2 ... RN | r_or<RuleList> | Convenience form of r_seq<false,RuleList>. |
R1 && R2 ... RN | r_and<RuleList> | Convenience form of r_seq<true,RuleList>. |
&Rule | r_at<Rule> | Matches only if Rule matches, but never consumes input. That is, it can be used to say "are we AT an integer?" without actually consuming the token. |
!Rule | r_notat<Rule> | The negation of r_at<Rule>. |
Rule? | r_opt<Rule> | Always matches, but may or may not consume input. |
Rule+ | r_plus<Rule> | Matches 1 or more instances of Rule. |
Rule* | r_star<Rule> | Matches 0 or more instances of Rule. Always matches, but may not consume input. |
Rule{Min,Max} | r_repeat<Rule,int Min,int Max=Min> | Matches Rule up to Max times, provided Rule matches at least Min times. |
if (Rule) then Action | r_action<Rule,Action> | If Rule matches then the matching string is passed on to Action::matched(). |
(LeftRule*) && Rule && (RightRule*) | r_pad<Rule,LeftRule,RightRule=LeftRule> | Typically used to skip leading and/or trailing whitespace. |
There are more rules in the core library, but some are very specialized, highly experimental, or in some other way deviant. See parsepp.hpp for full details.
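The behaviour described in the table can be illustrated with a small, self-contained sketch. Everything in the sketch_rules namespace below (input, ch, all_of, any_of, at, star, plus, pad, consumed_by) is a hypothetical stand-in written for this page using C++11 variadics. It is not the parsepp implementation, and the real rule signatures in parsepp.hpp may differ. It assumes a Rule is a type with a static match() function that advances a cursor on success and leaves it untouched on failure.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

namespace sketch_rules {

// Input cursor: the text being parsed plus the current read position.
struct input {
    std::string text;
    std::size_t pos;
    explicit input(const std::string& t) : text(t), pos(0) {}
    bool eof() const { return pos >= text.size(); }
};

// Matches exactly the character C and consumes it.
template <char C>
struct ch {
    static bool match(input& in) {
        if (in.eof() || in.text[in.pos] != C) return false;
        ++in.pos;
        return true;
    }
};

// AND: all rules must match in order; on failure no input is consumed.
template <typename... Rules> struct all_of;
template <> struct all_of<> { static bool match(input&) { return true; } };
template <typename R, typename... Rest>
struct all_of<R, Rest...> {
    static bool match(input& in) {
        const std::size_t start = in.pos;
        if (R::match(in) && all_of<Rest...>::match(in)) return true;
        in.pos = start;  // rewind after a partial match
        return false;
    }
};

// OR: tries each rule in turn and stops at the first one that matches.
template <typename... Rules> struct any_of;
template <> struct any_of<> { static bool match(input&) { return false; } };
template <typename R, typename... Rest>
struct any_of<R, Rest...> {
    static bool match(input& in) {
        return R::match(in) || any_of<Rest...>::match(in);
    }
};

// Lookahead: reports whether R matches here, but never consumes input.
template <typename R>
struct at {
    static bool match(input& in) {
        const std::size_t start = in.pos;
        const bool ok = R::match(in);
        in.pos = start;
        return ok;
    }
};

// Zero or more: always matches; stops when R fails or consumes nothing.
template <typename R>
struct star {
    static bool match(input& in) {
        for (;;) {
            const std::size_t before = in.pos;
            if (!R::match(in) || in.pos == before) break;
        }
        return true;
    }
};

// One or more: R once, then zero or more further matches.
template <typename R>
struct plus : all_of<R, star<R> > {};

// Padding: (LeftRule*) && R && (RightRule*).
template <typename R, typename L, typename Rt = L>
struct pad : all_of<star<L>, R, star<Rt> > {};

// Helper: characters consumed by Rule at the start of text, or -1 on failure.
template <typename Rule>
long consumed_by(const std::string& text) {
    input in(text);
    return Rule::match(in) ? static_cast<long>(in.pos) : -1L;
}

inline bool self_check() {
    typedef all_of<ch<'a'>, ch<'b'> > ab;
    typedef pad<ch<'x'>, ch<' '> > padded_x;
    assert(consumed_by<ab>("abc") == 2);   // both rules matched
    assert(consumed_by<ab>("ac") == -1);   // second rule failed
    assert(consumed_by<at<ab> >("ab") == 0);  // lookahead consumes nothing
    assert(consumed_by<padded_x>("  x ") == 4);
    return true;
}

}  // namespace sketch_rules
```

Note how the lookahead and the failing sequence both restore the cursor. That is the property that lets rules such as r_at and r_opt promise not to consume input.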
Core Actions
Actions are types which get passed matching tokens by the r_action<> Rule.
Action class | Notes |
---|---|
a_append | "Appends" a matched token to the client-supplied State object, where the exact meaning of "append" is dependent on the State type. Implementations exist for std::string, std::vector/set/list<string>, and vector/set/list<ValueType>, where ValueType is a type which can be lexically cast from a std::string by using i/o operators. |
a_actions<ActionList> | Forwards a matched token to an arbitrary number of Actions. |
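The following sketch shows one way the Rule/Action hand-off could work. All names in the sketch_actions namespace are hypothetical, and the matched(token, state) signature is an assumption made for this example. The real Action::matched() signature is documented in parsepp.hpp and may take different arguments.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

namespace sketch_actions {

// Input cursor: the text being parsed plus the current read position.
struct input {
    std::string text;
    std::size_t pos;
    explicit input(const std::string& t) : text(t), pos(0) {}
    bool eof() const { return pos >= text.size(); }
};

// Matches exactly the character C and consumes it.
template <char C>
struct ch {
    template <typename State>
    static bool match(input& in, State&) {
        if (in.eof() || in.text[in.pos] != C) return false;
        ++in.pos;
        return true;
    }
};

// Matches one or more decimal digits.
struct digits {
    template <typename State>
    static bool match(input& in, State&) {
        const std::size_t start = in.pos;
        while (!in.eof() && in.text[in.pos] >= '0' && in.text[in.pos] <= '9') ++in.pos;
        return in.pos > start;
    }
};

// If R matches, passes the matched text and the client State to A::matched().
template <typename R, typename A>
struct action {
    template <typename State>
    static bool match(input& in, State& st) {
        const std::size_t start = in.pos;
        if (!R::match(in, st)) return false;
        A::matched(in.text.substr(start, in.pos - start), st);
        return true;
    }
};

// "Append" means concatenation for a string and push_back for a vector.
struct append {
    static void matched(const std::string& token, std::string& st) { st += token; }
    static void matched(const std::string& token, std::vector<std::string>& st) {
        st.push_back(token);
    }
};

// Forwards one matched token to every Action in the list, in order.
template <typename... As> struct actions;
template <> struct actions<> {
    template <typename State>
    static void matched(const std::string&, State&) {}
};
template <typename A, typename... Rest>
struct actions<A, Rest...> {
    template <typename State>
    static void matched(const std::string& token, State& st) {
        A::matched(token, st);
        actions<Rest...>::matched(token, st);
    }
};

// Collects the numbers of a comma-separated list such as "12,34,5".
inline std::vector<std::string> collect_numbers(const std::string& text) {
    typedef action<digits, append> number;
    input in(text);
    std::vector<std::string> out;
    while (number::match(in, out) && ch<','>::match(in, out)) {}
    return out;
}

// Runs two append Actions on one token, so "42" yields "4242".
inline std::string append_twice(const std::string& text) {
    typedef action<digits, actions<append, append> > number;
    input in(text);
    std::string out;
    number::match(in, out);
    return out;
}

inline bool self_check() {
    assert(collect_numbers("12,34,5").size() == 3);
    assert(append_twice("42") == "4242");
    return true;
}

}  // namespace sketch_actions
```

The same append Action serves both State types because the overload is chosen by the State argument, which mirrors the idea that the meaning of "append" depends on the State type.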
Ascii-matching Rules
The following rules are specific to matching ASCII characters. Instead of templated Rule arguments, most take character constants (in the form of integer template arguments) or a char_list<> argument.
Equivalent expression | Rule class | Notes |
---|---|---|
C | r_ch<char C> | Matches only C. |
[Cc] | r_chi<char C> | Matches C case-insensitively. |
[^C] | r_notch<char C> | Matches any character except C. |
[^Cc] | r_notchi<char C> | Matches any character except C, case-insensitively. |
| | r_any | Matches any char except eof. |
[Min-Max] | r_range<int Min, int Max> | Matches if input is in the ASCII range (Min..Max), e.g. r_range<'a','z'>. |
| | r_oneof<CharList> | Matches if the input character is any character in the CharList. |
| | r_oneofi<CharList> | Identical to r_oneof, but does a case-insensitive match. |
C && Cn... | r_chseq<CharList> | Matches if input matches all characters in the list. |
| | r_chseqi<CharList> | Case-insensitive form of r_chseq. Results are undefined if any given char is not alphabetic. |
[ \t] | r_blank | Matches 1 blank char (space or tab). |
[ \n\t\r\f\v] | r_space | Matches 1 whitespace char. |
((\r\n)\|\|(\n)) | r_eol | Matches only at end-of-line. |
[A-Z] | r_upper | Matches 1 upper-case letter. |
[a-z] | r_lower | Matches 1 lower-case letter. |
[a-zA-Z] | r_alpha | Matches 1 upper- or lower-case letter. |
[0-9] | r_digit | Matches 1 digit 0..9. |
[a-zA-Z0-9] | r_alnum | Matches 1 alpha or numeric char. |
[0-9a-fA-F] | r_xdigit | Matches 1 hexadecimal digit. |
(alpha\|\|_) && (alpha\|\|_\|\|digit)* | r_identifier | C-style identifier string (used by most programming languages). |
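As a rough illustration of the "equivalent expression" column, the sketch below spells out a few of those expressions as plain functions. The function names in the sketch_ascii namespace are hypothetical and exist only for this example. They are not part of parsepp, whose rules are class templates rather than functions.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

namespace sketch_ascii {

// [Min-Max]: true if c lies in the inclusive ASCII range Min..Max.
template <int Min, int Max>
bool in_range(char c) {
    const int v = static_cast<unsigned char>(c);
    return v >= Min && v <= Max;
}

// [Cc]: true if c equals C when letter case is ignored.
template <char C>
bool equals_ignoring_case(char c) {
    return std::tolower(static_cast<unsigned char>(c)) ==
           std::tolower(static_cast<unsigned char>(C));
}

inline bool is_alpha(char c) { return in_range<'a', 'z'>(c) || in_range<'A', 'Z'>(c); }
inline bool is_digit(char c) { return in_range<'0', '9'>(c); }

// ((\r\n)||(\n)): length of the end-of-line sequence at text[pos], or 0.
inline std::size_t eol_length(const std::string& text, std::size_t pos = 0) {
    if (pos < text.size() && text[pos] == '\n') return 1;
    if (pos + 1 < text.size() && text[pos] == '\r' && text[pos + 1] == '\n') return 2;
    return 0;
}

// (alpha||_) && (alpha||_||digit)*: length of the identifier at the start
// of text, or 0 if text does not start with one.
inline std::size_t identifier_length(const std::string& text) {
    if (text.empty() || !(is_alpha(text[0]) || text[0] == '_')) return 0;
    std::size_t n = 1;
    while (n < text.size() &&
           (is_alpha(text[n]) || text[n] == '_' || is_digit(text[n]))) {
        ++n;
    }
    return n;
}

inline bool self_check() {
    assert(identifier_length("_foo1 = 3") == 5);  // stops at the space
    assert(identifier_length("1abc") == 0);       // may not start with a digit
    assert(eol_length("\r\nx") == 2);
    assert(equals_ignoring_case<'a'>('A'));
    return true;
}

}  // namespace sketch_ascii
```

The explicit ranges are used instead of std::isalpha so that the result does not depend on the current locale, which matches the ASCII-only scope of these rules.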
Misc...
Aside from the core rules, the parsepp_*.hpp headers provide several common types of parsers, such as:
- parsepp_numeric.hpp: numbers (decimal int, octal, hex, double)
- parsepp_calc.hpp: A calculator supporting all of the types from parsepp_numeric.hpp
Contributions of new parsers are of course welcomed.