parse0x  Update of "RulesOverview"


Overview

Artifact ID: 555faec7f5cc01aadcaaf6ab9ad4032e3e9608a8
Page Name:RulesOverview
Date: 2011-07-15 14:08:45
Original User: stephan
Parent: 4c8cd7e64c034e0bf0b268fae79de0552d2dc01e
Content

Overview of core parse0x Rules

This page gives a brief overview of the various core Rules and what they do.

(Thanks to the PEGTL docs for their rules table, on which this one is based.)

Comments below that refer to "State..." or "State list" mean the list of client-side State objects passed to parse(). "The Nth State argument" refers to a single object from that list (State...).

Core parsing rules

Equivalent expression

Rule class  Notes

true

r_success Always matches without consuming input.

false

r_failure Always fails to match and consumes no input.

(none)

r_eof Matches only if input is exhausted.

R1 && R2 && ... && RN

r_seq<First,Rest...> Matches only if all contained Rules match.

R1 || R2 || ... || RN

r_or<First,Rest...> Matches if any of the contained Rules match, stopping at the first match.

&Rule

r_at<Rule> Matches only if Rule matches, but never consumes input. It can be used to ask, "are we AT an integer?"

!Rule

r_notat<Rule> The negation of r_at<Rule>: matches only if Rule does not match. Like r_at, it never consumes input.

Rule?

r_opt<Rule> Always matches, but consumes input only if Rule matches.

Rule+

r_plus<Rule> Matches 1 or more instances of Rule.

Rule*

r_star<Rule> Matches 0 or more instances of Rule. Always matches but may not consume input.

Rule{Min,Max}

r_repeat<Rule,Min,Max=Min> Matches if Rule matches at least Min times in a row, consuming up to Max matches. Max defaults to Min.

(none)

r_action<Rule,Action> If Rule matches, Action::matched() is called and passed the matched string.

(Left*) && Rule && (Right*)

r_pad<Rule,Left,Right=Left> Typically used to skip whitespace.

(none)

r_action_nth<Num,Rule,Action> Like r_action<Rule,Action>, but if Rule matches then the Action is sent only the Num'th State object. This is useful only when a client has multiple State objects and various Actions which each expect a different type of State. Using this (or a_action_nth) you can select which State object receives which match.

There are more rules in the core library, but some are very specialized, highly experimental, or in some other way deviant. See parse0x.hpp for full details.
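
To make the semantics in the table concrete, here is a minimal, self-contained sketch of how such combinators can be written. This is NOT the parse0x implementation: the Input struct, the static match() member, and the consumed() helper are assumptions made up for this illustration. Only the rule names and their described behaviour come from the table above.

```cpp
#include <cstddef>
#include <string>

// Hypothetical input cursor; parse0x's real input type may differ.
struct Input {
    std::string text;
    std::size_t pos;
};

struct r_success { static bool match(Input&) { return true; } };
struct r_failure { static bool match(Input&) { return false; } };
struct r_eof { static bool match(Input& in) { return in.pos >= in.text.size(); } };

// Minimal single-char rule so the combinators below have something to match.
template <int C> struct r_ch {
    static bool match(Input& in) {
        if (in.pos >= in.text.size() || in.text[in.pos] != static_cast<char>(C)) return false;
        ++in.pos;
        return true;
    }
};

// r_seq: every rule must match in order; the cursor is rewound on failure.
template <typename First, typename... Rest> struct r_seq {
    static bool match(Input& in) {
        const std::size_t start = in.pos;
        if (First::match(in) && r_seq<Rest...>::match(in)) return true;
        in.pos = start;
        return false;
    }
};
template <typename Last> struct r_seq<Last> {
    static bool match(Input& in) { return Last::match(in); }
};

// r_or: tries each rule in turn and stops at the first match.
template <typename First, typename... Rest> struct r_or {
    static bool match(Input& in) { return First::match(in) || r_or<Rest...>::match(in); }
};
template <typename Last> struct r_or<Last> {
    static bool match(Input& in) { return Last::match(in); }
};

// r_at / r_notat: pure lookahead; the cursor is always restored.
template <typename Rule> struct r_at {
    static bool match(Input& in) {
        const std::size_t start = in.pos;
        const bool ok = Rule::match(in);
        in.pos = start;
        return ok;
    }
};
template <typename Rule> struct r_notat {
    static bool match(Input& in) { return !r_at<Rule>::match(in); }
};

// r_opt: always succeeds, whether or not Rule consumed anything.
template <typename Rule> struct r_opt {
    static bool match(Input& in) { Rule::match(in); return true; }
};

// r_star: zero or more; stops if Rule matches without consuming (avoids looping forever).
template <typename Rule> struct r_star {
    static bool match(Input& in) {
        std::size_t before;
        do { before = in.pos; } while (Rule::match(in) && in.pos != before);
        return true;
    }
};
template <typename Rule> struct r_plus {
    static bool match(Input& in) { return Rule::match(in) && r_star<Rule>::match(in); }
};

// r_repeat: needs at least Min matches, consumes at most Max.
template <typename Rule, int Min, int Max = Min> struct r_repeat {
    static bool match(Input& in) {
        const std::size_t start = in.pos;
        int n = 0;
        while (n < Max && Rule::match(in)) ++n;
        if (n >= Min) return true;
        in.pos = start;
        return false;
    }
};

// r_pad: (Left*) && Rule && (Right*), as in the table.
template <typename Rule, typename Left, typename Right = Left>
struct r_pad : r_seq<r_star<Left>, Rule, r_star<Right> > {};

// Illustration-only helper: chars consumed by Rule on text, or -1 if it did not match.
template <typename Rule> int consumed(const std::string& text) {
    Input in{text, 0};
    return Rule::match(in) ? static_cast<int>(in.pos) : -1;
}
```

With these definitions, consumed<r_at<r_ch<'a'>>>("abc") reports a match that consumed nothing, while consumed<r_plus<r_ch<'a'>>>("baaa") reports a failure. The sketch assumes that a failing rule leaves the cursor where it found it, which is why r_or can simply try the next alternative.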


Core Actions

These built-in Actions can be used with r_action<Rule,Action>.

a_actions<Action1,...ActionN>

Forwards a match to Action1...ActionN.

a_action_nth<Int,Action>

A unary Action proxy which calls Action::matched(...), passing it the matched string and the Nth State argument. You should not use a_action_nth<> within an r_action_nth<>. Use either r_action_nth<N,...> or r_action<Rule,a_action_nth<N,Action>>.

a_append
(requires append.hpp)

"Appends" the matched string to the State. The definition of "append" is State-dependent. Specializations exist for std::string and for std::vector, std::list, and std::set of std::string.

a_appendto<Int>
(requires append.hpp)

Like a_append, but "appends" the match only to the Nth State object.
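
The way Actions and the State list fit together can be sketched as follows. Everything here is an assumption for illustration rather than parse0x's real code: the matched(const std::string&, State&...) signature, the 0-based State index, the reading of a_append as "append to every State it is given", and the Input, demo_letters, and demo() names. See parse0x.hpp and append.hpp for the real definitions.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <tuple>
#include <vector>

// Hypothetical input cursor, not parse0x's real input type.
struct Input {
    std::string text;
    std::size_t pos;
};

// Demo-only rule (not part of parse0x): matches a run of letters.
struct demo_letters {
    template <typename... State>
    static bool match(Input& in, State&...) {
        const std::size_t start = in.pos;
        while (in.pos < in.text.size() &&
               std::isalpha(static_cast<unsigned char>(in.text[in.pos]))) {
            ++in.pos;
        }
        return in.pos > start;
    }
};

// r_action: if Rule matches, pass the matched string and the State list to Action.
template <typename Rule, typename Action> struct r_action {
    template <typename... State>
    static bool match(Input& in, State&... st) {
        const std::size_t start = in.pos;
        if (!Rule::match(in, st...)) return false;
        Action::matched(in.text.substr(start, in.pos - start), st...);
        return true;
    }
};

// a_actions: forwards one match to each Action in turn.
template <typename... Actions> struct a_actions;
template <> struct a_actions<> {
    template <typename... State>
    static void matched(const std::string&, State&...) {}
};
template <typename First, typename... Rest> struct a_actions<First, Rest...> {
    template <typename... State>
    static void matched(const std::string& m, State&... st) {
        First::matched(m, st...);
        a_actions<Rest...>::matched(m, st...);
    }
};

// a_action_nth: passes only the N-th State object (0-based here, by assumption).
template <int N, typename Action> struct a_action_nth {
    template <typename... State>
    static void matched(const std::string& m, State&... st) {
        Action::matched(m, std::get<N>(std::tie(st...)));
    }
};

// What "append" means depends on the State type.
inline void append(std::string& s, const std::string& m) { s += m; }
inline void append(std::vector<std::string>& v, const std::string& m) { v.push_back(m); }

// a_append: assumed here to append the match to every State it receives.
struct a_append {
    static void matched(const std::string&) {}
    template <typename First, typename... Rest>
    static void matched(const std::string& m, First& first, Rest&... rest) {
        append(first, m);
        matched(m, rest...);
    }
};

// a_appendto<N>: append only to the N-th State.
template <int N> using a_appendto = a_action_nth<N, a_append>;

// Demo: one match goes to a std::string (State 0) and to a vector (State 1).
inline bool demo(std::string& joined, std::vector<std::string>& parts) {
    Input in{"hello world", 0};
    typedef r_action<demo_letters, a_actions<a_appendto<0>, a_appendto<1> > > rule;
    return rule::match(in, joined, parts);
}
```

In this sketch a_appendto<N> is just a_action_nth<N,a_append>, which mirrors the advice above to select the State either in the Rule or in the Action, but not in both.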

ASCII-matching Rules

The following rules are specific to matching ASCII characters. Instead of templated Rule arguments, most take character constants (in the form of integer template arguments).

Equivalent expression

Rule class  Notes

C

r_ch<int C> Matches only C.

[Cc]

r_chi<int C> Matches C case-insensitively.

[^C]

r_notch<int C> Matches any character except C.

(none)

r_any Matches any 1 char. Fails only at EOF.

[Min-Max]

r_range<int Min, int Max> Matches if input is in the ASCII range (Min..Max), e.g. r_range<'a','z'>.

(none)

r_oneof<C,Chars...> Matches if an input char is any of the templatized character arguments.

C && Cn...

r_chseq<C,Chars...> Matches if input matches all characters in the list, in order.

(none)

r_chiseq<C,Chars...> Case-insensitive form of r_chseq

[ \t]

r_blank Matches 1 blank char.

[ \n\t\r\f\v]

r_space Matches 1 whitespace char.

((\r\n)||(\n))

r_eol Matches only at end-of-line (\r\n or \n).

[A-Z]

r_upper Matches 1 upper-case letter.

[a-z]

r_lower Matches 1 lower-case letter.

[a-zA-Z]

r_alpha Matches 1 upper- or lower-case letter.

[0-9]

r_digit Matches 1 digit 0..9

[a-zA-Z0-9]

r_alnum Matches 1 alpha or numeric char.

[0-9a-fA-F]

r_xdigit Matches 1 hexadecimal digit.

(alpha||_) && (alpha||_||digit)*

r_identifier C-style identifier string (used by most programming languages).
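
The character rules take integer template arguments rather than Rule types. A minimal sketch of that style follows, again as an illustration and not as the parse0x source: the Input struct, the match_one() and consumed() helpers, and the static match() member are assumptions, and r_identifier is written by hand here rather than composed from other rules.

```cpp
#include <cstddef>
#include <string>

// Hypothetical input cursor, not parse0x's real input type.
struct Input {
    std::string text;
    std::size_t pos;
};

// Illustration-only helper: consumes one char if pred(c) holds, else consumes nothing.
template <typename Pred> bool match_one(Input& in, Pred pred) {
    if (in.pos >= in.text.size()) return false;
    if (!pred(static_cast<unsigned char>(in.text[in.pos]))) return false;
    ++in.pos;
    return true;
}

// ASCII-only helpers (no locale involved).
inline bool is_alpha(int c) { return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'); }
inline bool is_digit(int c) { return c >= '0' && c <= '9'; }
inline int to_lower(int c) { return (c >= 'A' && c <= 'Z') ? c + ('a' - 'A') : c; }

template <int C> struct r_ch {
    static bool match(Input& in) { return match_one(in, [](int c) { return c == C; }); }
};
template <int C> struct r_chi {
    static bool match(Input& in) {
        return match_one(in, [](int c) { return to_lower(c) == to_lower(C); });
    }
};
template <int C> struct r_notch {
    static bool match(Input& in) { return match_one(in, [](int c) { return c != C; }); }
};
struct r_any {
    static bool match(Input& in) { return match_one(in, [](int) { return true; }); }
};
template <int Min, int Max> struct r_range {
    static bool match(Input& in) {
        return match_one(in, [](int c) { return c >= Min && c <= Max; });
    }
};

// r_oneof: matches one char that equals any of the template arguments.
template <int C, int... Chars> struct r_oneof {
    static bool match(Input& in) {
        const int set[] = {C, Chars...};
        return match_one(in, [&set](int c) -> bool {
            for (int s : set) if (s == c) return true;
            return false;
        });
    }
};

// r_chseq: matches the given chars in order; rewinds on a partial match.
template <int C, int... Chars> struct r_chseq {
    static bool match(Input& in) {
        const int seq[] = {C, Chars...};
        const std::size_t start = in.pos;
        for (int s : seq) {
            if (!match_one(in, [s](int c) { return c == s; })) {
                in.pos = start;
                return false;
            }
        }
        return true;
    }
};

// r_identifier: (alpha||_) && (alpha||_||digit)*
struct r_identifier {
    static bool match(Input& in) {
        if (!match_one(in, [](int c) { return is_alpha(c) || c == '_'; })) return false;
        while (match_one(in, [](int c) { return is_alpha(c) || is_digit(c) || c == '_'; })) {}
        return true;
    }
};

// Illustration-only helper: chars consumed by Rule on text, or -1 if it did not match.
template <typename Rule> int consumed(const std::string& text) {
    Input in{text, 0};
    return Rule::match(in) ? static_cast<int>(in.pos) : -1;
}
```

For example, consumed<r_chseq<'i','f'>>("if(x)") consumes the two characters of the keyword, and consumed<r_identifier>("9lives") fails because an identifier may not start with a digit.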

Misc...

Beyond the core rules, several headers under include/s11n.net/parse0x/ implement common types of parsers, such as:

  • numeric.hpp: numbers (decimal int, octal, hex, double)
  • strings.hpp: C-style quoted strings.
  • ipv4.hpp: IPv4 addresses
  • calculator.hpp: A calculator supporting all of the types from numeric.hpp

Contributions of new parsers are of course welcomed.