
Overview of core parsepp Rules

This page gives a brief overview of the various core Rules and what they do.

(Thanks to the pegtl docs for its rules table, off of which this one is based.)

Some notes about terminology in the following table:

  • Comments below which refer to State or Client State mean the client-side State object which is passed to parse().

  • References to a RuleList type refer to the rule_list<> type, a pseudo-variadic typelist template used by the framework (in C++0x we have true variadics, and don't need this type). All types contained in a rule_list<> must be Rules, except in the case of ActionList:

  • References to an ActionList type also refer to rule_list<>, except that the contained types are Actions instead of Rules.

  • References to a CharList type refer to the char_list<> pseudo-variadic template. In a CharList, each entry in the list is a single character constant.

You can find out more about the "list types" on the typelists page.
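To make the idea of a list type concrete, here is a minimal sketch of what such lists could look like with true C++11 variadics. It is not the framework's pseudo-variadic implementation: the sketch namespace and the contains(), str() and size members are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

namespace sketch {  // hypothetical names, not the parsepp API

// Stand-in for char_list<>: each entry is a single character constant.
template <char... Cs>
struct char_list {
    static constexpr std::size_t size = sizeof...(Cs);

    // True if c equals any entry in the list.
    static bool contains(char c) {
        const char entries[] = {Cs..., '\0'};
        for (std::size_t i = 0; i < sizeof...(Cs); ++i) {
            if (entries[i] == c) return true;
        }
        return false;
    }

    // The entries, in order, as a string.
    static std::string str() {
        const char entries[] = {Cs..., '\0'};
        return std::string(entries, sizeof...(Cs));
    }
};

// Stand-in for rule_list<>: a compile-time list of Rule (or Action) types.
template <typename... Ts>
struct rule_list {
    static constexpr std::size_t size = sizeof...(Ts);
};

}  // namespace sketch
```

With this sketch, sketch::char_list<'a','b','c'>::contains('b') is true and str() returns "abc".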

Additional information about each of the types shown below can be found in the API documentation, which lives in the header files.


Core parsing rules

Equivalent expression

Rule class  Notes

true

r_success Always matches without consuming input.

false

r_failure Always fails to match and consumes no input.

 

r_eof Matches only if input is exhausted.

R1 && R2 ... RN
or:
R1 || R2 ... RN

r_seq<bool FailFast,RuleList> If FailFast is true, it matches only if all contained Rules match. If FailFast is false, it matches as soon as any one of the Rules matches. That is, (FailFast==true) is an AND operation and (FailFast==false) is an OR operation.

R1 || R2 ... RN

r_or<RuleList> Convenience form of r_seq<false,RuleList>.

R1 && R2 ... RN

r_and<RuleList> Convenience form of r_seq<true,RuleList>.

&Rule

r_at<Rule> Matches only if Rule matches, but never consumes input. That is, it can be used to say "are we AT an integer?" without actually consuming the token.

!Rule

r_notat<Rule> The negation of r_at<Rule>: matches only if Rule does not match, and never consumes input.

Rule?

r_opt<Rule> Always matches, but may or may not consume input.

Rule+

r_plus<Rule> Matches 1 or more instances of Rule.

Rule*

r_star<Rule> Matches 0 or more instances of Rule. Always matches but may not consume input.

Rule{Min,Max}

r_repeat<Rule,int Min,int Max=Min> Matches only if Rule matches at least Min times; consumes up to Max consecutive matches.

if (Rule) then Action

r_action<Rule,Action> If Rule matches then the matching string is passed on to Action::matched().

(LeftRule*) && Rule && (RightRule*)

r_pad<Rule,LeftRule,RightRule=LeftRule> Typically used to skip leading and/or trailing whitespace.

There are more rules in the core library, but some are very specialized, highly experimental, or in some other way deviant. See parsepp.hpp for full details.
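The semantics in the table above can be sketched in a few lines of standalone C++. This is not parsepp's implementation or API: the sketch namespace, the Input cursor, the static matches() interface and the two-Rule forms of and_/or_ (in place of a RuleList) are all assumptions chosen to keep the example short.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

namespace sketch {  // hypothetical names throughout, not the parsepp API

// Input cursor: the text plus the current read position.
struct Input {
    std::string text;
    std::size_t pos;
    explicit Input(const std::string& s) : text(s), pos(0) {}
};

// Matches exactly the character C and consumes it.
template <char C>
struct ch {
    static bool matches(Input& in) {
        if (in.pos >= in.text.size() || in.text[in.pos] != C) return false;
        ++in.pos;
        return true;
    }
};

// R1 && R2: both must match in order; on failure the position is restored.
template <typename R1, typename R2>
struct and_ {
    static bool matches(Input& in) {
        const std::size_t start = in.pos;
        if (R1::matches(in) && R2::matches(in)) return true;
        in.pos = start;
        return false;
    }
};

// R1 || R2: succeeds as soon as one alternative matches.
template <typename R1, typename R2>
struct or_ {
    static bool matches(Input& in) { return R1::matches(in) || R2::matches(in); }
};

// &Rule: reports whether Rule matches here, but never consumes input.
template <typename Rule>
struct at {
    static bool matches(Input& in) {
        const std::size_t start = in.pos;
        const bool ok = Rule::matches(in);
        in.pos = start;
        return ok;
    }
};

// !Rule: succeeds only where Rule would fail; never consumes input.
template <typename Rule>
struct notat {
    static bool matches(Input& in) { return !at<Rule>::matches(in); }
};

// Rule*: zero or more matches, so it always succeeds.
template <typename Rule>
struct star {
    static bool matches(Input& in) {
        for (;;) {
            const std::size_t before = in.pos;
            // Stop on failure, or on an empty match (avoids looping forever).
            if (!Rule::matches(in) || in.pos == before) return true;
        }
    }
};

// Rule+: one match, then zero or more.
template <typename Rule>
struct plus {
    static bool matches(Input& in) { return Rule::matches(in) && star<Rule>::matches(in); }
};

// Rule?: always succeeds, whether or not Rule consumed anything.
template <typename Rule>
struct opt {
    static bool matches(Input& in) { Rule::matches(in); return true; }
};

// (Left*) && Rule && (Right*): skips padding on both sides of Rule.
template <typename Rule, typename Left, typename Right = Left>
struct pad {
    static bool matches(Input& in) {
        const std::size_t start = in.pos;
        star<Left>::matches(in);
        if (!Rule::matches(in)) { in.pos = start; return false; }
        star<Right>::matches(in);
        return true;
    }
};

// Helper: characters consumed by Rule on s, or -1 if Rule failed.
template <typename Rule>
long consumed(const std::string& s) {
    Input in(s);
    return Rule::matches(in) ? static_cast<long>(in.pos) : -1;
}

}  // namespace sketch
```

In this sketch, star<ch<'a'>> consumes 3 characters of "aaab" and 0 characters of "b" (still a match), while plus<ch<'a'>> fails on "b". The at<> wrapper succeeds on "a" without moving the position.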


Core Actions

Actions are types which get passed matching tokens by the r_action<> Rule.
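As a rough illustration of how a Rule hands its matched token to an Action, consider the standalone sketch below. Only the name Action::matched() comes from this page; its signature, the Input cursor, the digits Rule and the store_token Action are assumptions invented for the example and may differ from the real headers.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

namespace sketch {  // hypothetical names, not the parsepp API

// Input cursor: the text plus the current read position.
struct Input {
    std::string text;
    std::size_t pos;
    explicit Input(const std::string& s) : text(s), pos(0) {}
};

// Example Rule: one or more decimal digits.
struct digits {
    static bool matches(Input& in) {
        const std::size_t start = in.pos;
        while (in.pos < in.text.size() &&
               in.text[in.pos] >= '0' && in.text[in.pos] <= '9') {
            ++in.pos;
        }
        return in.pos > start;
    }
};

// Example Action: stores the token in a std::string State.
// The (token, state) signature is an assumption.
struct store_token {
    static void matched(const std::string& token, std::string& state) {
        state = token;
    }
};

// if (Rule) then Action: on a match, the consumed text goes to the Action.
template <typename Rule, typename Action>
struct action {
    template <typename State>
    static bool matches(Input& in, State& state) {
        const std::size_t start = in.pos;
        if (!Rule::matches(in)) return false;  // Action is not called
        Action::matched(in.text.substr(start, in.pos - start), state);
        return true;
    }
};

}  // namespace sketch
```

Run against the input "42abc", the combination action<digits, store_token> leaves "42" in the State and the cursor at position 2. On "abc" the Rule fails and the State is left untouched.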

a_append
(Requires parsepp_append.hpp)

"Appends" a matched token to the client-supplied State object, where the exact meaning of "append" is dependent on the State type. Implementations exist for std::string, std::vector/set/list<string>, and vector/set/list<ValueType>, where ValueType is a type which can be lexically cast from a std::string by using i/o operators.

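The State-dependent meaning of "append" can be pictured as a set of overloads, one per State type. The sketch below is standalone and is not the code in parsepp_append.hpp: the append() function name is hypothetical, only the string and vector cases are shown, and silently skipping a token that fails conversion is an assumption.

```cpp
#include <cassert>
#include <sstream>
#include <string>
#include <vector>

namespace sketch {  // hypothetical names, not the parsepp API

// std::string State: "append" means concatenation.
inline void append(std::string& state, const std::string& token) {
    state += token;
}

// vector<string> State: the token is stored as-is.
inline void append(std::vector<std::string>& state, const std::string& token) {
    state.push_back(token);
}

// vector<ValueType> State: the token is lexically cast with operator>>.
template <typename ValueType>
void append(std::vector<ValueType>& state, const std::string& token) {
    std::istringstream is(token);
    ValueType value = ValueType();
    if (is >> value) {
        state.push_back(value);
    }
    // Assumption: a token that cannot be converted is skipped.
}

}  // namespace sketch
```

For example, appending "12" and then "7" to a std::vector<int> yields the elements 12 and 7, while appending "ab" and "cd" to a std::string yields "abcd".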
a_actions<ActionList>

Forwards a matched token to an arbitrary number of Actions.
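Forwarding one token to several Actions might look like the following standalone sketch. It uses C++11 variadic templates instead of an ActionList, and the matched() signature, the Log State and both example Actions are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

namespace sketch {  // hypothetical names, not the parsepp API

// Example State shared by the Actions below.
struct Log {
    std::vector<std::string> tokens;
    std::size_t chars;
    Log() : chars(0) {}
};

// Example Action: remembers every token.
struct record_token {
    static void matched(const std::string& token, Log& log) {
        log.tokens.push_back(token);
    }
};

// Example Action: counts the characters seen so far.
struct count_chars {
    static void matched(const std::string& token, Log& log) {
        log.chars += token.size();
    }
};

// Forwards a token to each listed Action, in order.
template <typename... Actions>
struct actions;

// Empty list: nothing left to call.
template <>
struct actions<> {
    template <typename State>
    static void matched(const std::string&, State&) {}
};

// Non-empty list: call the first Action, then recurse on the rest.
template <typename First, typename... Rest>
struct actions<First, Rest...> {
    template <typename State>
    static void matched(const std::string& token, State& state) {
        First::matched(token, state);
        actions<Rest...>::matched(token, state);
    }
};

}  // namespace sketch
```

Passing "abc" and then "de" through actions<record_token, count_chars> leaves two tokens in the Log and a character count of 5.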

Ascii-matching Rules

The following rules are specific to matching ASCII characters. Instead of templated Rule arguments, most take character constants (in the form of integer template arguments) or a char_list<> argument.

Equivalent expression

Rule class  Notes

C

r_ch<char C> Matches only C.

[Cc]

r_chi<char C> Matches C case-insensitively.

[^C]

r_notch<char C> Matches any character except C.

[^Cc]

r_notchi<char C> Matches any character except C, case-insensitively.

 

r_any Matches any char except eof.

[Min-Max]

r_range<int Min, int Max> Matches if input is in the ASCII range (Min..Max), e.g. r_range<'a','z'>.

 

r_oneof<CharList> Matches if the next input character is any one of the characters in the CharList.

 

r_oneofi<CharList> Identical to r_oneof, but does a case-insensitive match.

C && Cn...

r_chseq<CharList> Matches if input matches all characters in the list, in order.

 

r_chseqi<CharList> Case-insensitive form of r_chseq. Results are undefined if any given char is not alphabetic.

[ \t]

r_blank Matches 1 blank char (space or tab).

[ \n\t\r\f\v]

r_space Matches 1 whitespace char.

((\r\n)||(\n))

r_eol Matches only at end-of-line.

[A-Z]

r_upper Matches 1 upper-case letter.

[a-z]

r_lower Matches 1 lower-case letter.

[a-zA-Z]

r_alpha Matches 1 upper- or lower-case letter.

[0-9]

r_digit Matches 1 digit (0..9).

[a-zA-Z0-9]

r_alnum Matches 1 alpha or numeric char.

[0-9a-fA-F]

r_xdigit Matches 1 hexadecimal digit.

(alpha||_) && (alpha||_||digit)*

r_identifier C-style identifier string (used by most programming languages).
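The character classes above reduce to simple range checks, which the standalone sketch below spells out for a few of them. These are plain helper functions with hypothetical names, not the Rule classes themselves, and the return conventions (a length of 0 meaning "no match") are assumptions.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>

namespace sketch {  // hypothetical names, not the parsepp API

// [Min-Max]: true if c lies in the inclusive range Min..Max.
template <int Min, int Max>
bool in_range(char c) {
    const int v = static_cast<unsigned char>(c);
    return v >= Min && v <= Max;
}

// [Cc]: compares c with C, ignoring case.
template <char C>
bool is_chi(char c) {
    return std::tolower(static_cast<unsigned char>(c)) ==
           std::tolower(static_cast<unsigned char>(C));
}

inline bool is_alpha(char c) { return in_range<'a', 'z'>(c) || in_range<'A', 'Z'>(c); }
inline bool is_digit(char c) { return in_range<'0', '9'>(c); }

// ((\r\n)||(\n)): length of the line ending at pos, or 0 if there is none.
inline std::size_t eol_length(const std::string& text, std::size_t pos) {
    if (pos >= text.size()) return 0;
    if (text.compare(pos, 2, "\r\n") == 0) return 2;
    return text[pos] == '\n' ? 1 : 0;
}

// (alpha||_) && (alpha||_||digit)*: length of the identifier at the start
// of text, or 0 if text does not start with one.
inline std::size_t identifier_length(const std::string& text) {
    if (text.empty() || !(is_alpha(text[0]) || text[0] == '_')) return 0;
    std::size_t n = 1;
    while (n < text.size() &&
           (is_alpha(text[n]) || is_digit(text[n]) || text[n] == '_')) {
        ++n;
    }
    return n;
}

}  // namespace sketch
```

Here identifier_length("_foo1 = 2") is 5, while identifier_length("9lives") is 0 because an identifier may not start with a digit.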

Misc...

Aside from the core rules, the parsepp_*.hpp headers provide several common types of parsers, such as:

  • parsepp_numeric.hpp: numbers (decimal int, octal, hex, double)
  • parsepp_calc.hpp: A calculator supporting all of the types from parsepp_numeric.hpp

Contributions of new parsers are of course welcomed.