parse0x  Artifact Content

Artifact e7a9767924d75bed11b18ba8aaae01f43e22c984:

Wiki page [IntroToParse0x] by stephan 2008-05-16 16:54:05.
<h1>An Unimpressively Brief Introduction to parse0x</h1>

ACHTUNG: VERY INCOMPLETE!

This page provides a very brief introduction to using parse0x. It assumes a fair
amount of prior C++ knowledge on the reader's part, as well as knowledge
about what "parsing" means, what it is used for, and generally how it works.


<h2>Include the header file(s)</h2>

parse0x's core library is distributed as a single header file,
intended to be included like:

<verbatim>#include <s11n.net/parse0x/parse0x.hpp></verbatim>

From there, all of the code is in the <tt>parse0x</tt> namespace.

Several utility headers also ship with the library. These include
parsers for common things like numbers (in various formats),
IPv4 addresses, and a mathematical expression evaluator (i.e. a calculator).


<h2>What are Rules?</h2>

Rules are the core-most concept of the library. A Rule is a struct or
class type which implements a certain interface (shown below). A
Rule's purpose is to traverse a given input and see if it "matches",
where "matches" is a Rule-dependent decision. Rules are combined by
simply declaring typedefs or empty structs which subclass a desired
Rule. They can be combined in arbitrarily complex manners, making them
easy to use and maintain, powerful, and a tad bit frightening.

Rules have the following interface:

<verbatim>
struct RuleConcept {
    template <typename ... State>
    static bool matches( parser_state &, State && ... );
};
</verbatim>

(We'll talk more about (State...) later.)
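
To make the interface concrete, here is a minimal, self-contained sketch of a type with that shape. It does not use parse0x itself: <tt>toy_state</tt> is a hypothetical stand-in for <tt>parser_state</tt> (whose real interface is documented in the header), and <tt>digit_rule</tt> is an illustrative name, not a library rule.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>

namespace sketch {

// Hypothetical stand-in for parse0x's parser_state: the input plus a cursor.
// The real class has its own (richer) interface.
struct toy_state {
    std::string input;
    std::size_t pos = 0;
    explicit toy_state(std::string s) : input(std::move(s)) {}
    bool eof() const { return pos >= input.size(); }
};

// A type in the shape of RuleConcept: a static, variadic matches().
// It consumes one ASCII decimal digit and ignores any State arguments.
struct digit_rule {
    template <typename... State>
    static bool matches(toy_state& ps, State&&...) {
        if (ps.eof()) return false;
        if (!std::isdigit(static_cast<unsigned char>(ps.input[ps.pos]))) return false;
        ++ps.pos;  // consume the matched character
        return true;
    }
};

} // namespace sketch
```

Note that the rule has no data members and is never instantiated; all of the work happens in the static member function.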

The full docs for that interface are in the
[/doc/tip/include/s11n.net/parse0x/parse0x.hpp|header file].

You can find an overview of the built-in rules
[RulesOverview|here].

<h2>How parsers are constructed</h2>

Let's say we have the string "abc123" and we want to match all leading
alphabetic characters. That looks something like this:

<verbatim>
typedef r_star<r_alpha> myrule;
</verbatim>

That's the only typedef you need
to match any number of alpha chars (equivalent to the regular expression
<nowiki>[a-zA-Z]*</nowiki>, which is where it gets the name "star" from).

(Side-note: by library conventions, core rules all have an <tt>r_</tt> prefix.)
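
For the curious, a "star" combinator can be sketched in a few lines. This is not the library's implementation of <tt>r_star</tt>, just a self-contained illustration of the idea; <tt>toy_state</tt>, <tt>alpha</tt>, <tt>star</tt> and the helper function are hypothetical names which deliberately lack the <tt>r_</tt> prefix.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>

namespace sketch {

// Hypothetical stand-in for parser_state: the input plus a cursor.
struct toy_state {
    std::string input;
    std::size_t pos = 0;
    explicit toy_state(std::string s) : input(std::move(s)) {}
    bool eof() const { return pos >= input.size(); }
};

// Leaf rule: consumes one ASCII letter.
struct alpha {
    template <typename... State>
    static bool matches(toy_state& ps, State&&...) {
        if (ps.eof() || !std::isalpha(static_cast<unsigned char>(ps.input[ps.pos])))
            return false;
        ++ps.pos;
        return true;
    }
};

// Combinator: applies Rule zero or more times, like the regex '*'.
template <typename Rule>
struct star {
    template <typename... State>
    static bool matches(toy_state& ps, State&&... st) {
        for (;;) {
            const std::size_t before = ps.pos;
            // Stop on the first failure, or if Rule matched without consuming
            // input (which would otherwise loop forever).
            if (!Rule::matches(ps, st...) || ps.pos == before) {
                ps.pos = before;
                break;
            }
        }
        return true;  // zero repetitions is still a match
    }
};

// Combining rules is just a matter of naming a type.
typedef star<alpha> leading_alpha;

// Returns how many leading characters of 'text' the combined rule consumed.
inline std::size_t consumed_by_leading_alpha(const std::string& text) {
    toy_state ps(text);
    leading_alpha::matches(ps);
    return ps.pos;
}

} // namespace sketch
```

With this sketch, <tt>consumed_by_leading_alpha("abc123")</tt> returns 3, and an input with no leading letters still "matches", consuming nothing.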

We can check any given input string against our new rule with:

<verbatim>
std::string input("abc123");
if( parse<myrule>( input ) ) {
    // ... success ...
}
</verbatim>

That's what it looks like. It's all fine and good to know whether a parse
succeeds, but we also want to get data from it. We'll do that using...


<h2>Actions</h2>

Actions are a counterpart of Rules. Where Rules match, Actions collect. Actions
are added to a parser using the <tt>r_action</tt> rule, like this:

<verbatim>
typedef r_action< myrule, myaction > active_rule;
</verbatim>

That rule says "if myrule matches, call myaction."

Actions have this interface:

<verbatim>struct ActionConcept {
  /**
     The second arg is the matching string which triggered this
     action. The State object is the object passed by the client
     to parse().
  */
  template <typename ... State>
      static void matched( parser_state &, std::string const &, State && ... );
};
</verbatim>

Actions almost always ignore their first input parameter. The second
parameter contains a string of all matched text which led to this
Action. The (State...) argument(s) are Action-dependent, and explained
below.
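
The relationship between a Rule, an Action and the matched text can be sketched as follows. This is a self-contained illustration, not parse0x code: <tt>toy_state</tt> stands in for <tt>parser_state</tt>, <tt>with_action</tt> plays the role of <tt>r_action</tt>, and all other names are hypothetical. For brevity the Action takes exactly one State object (a std::string) rather than a variadic list.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>

namespace sketch {

// Hypothetical stand-in for parser_state: the input plus a cursor.
struct toy_state {
    std::string input;
    std::size_t pos = 0;
    explicit toy_state(std::string s) : input(std::move(s)) {}
    bool eof() const { return pos >= input.size(); }
};

// Leaf rule: consumes one ASCII letter.
struct alpha {
    template <typename... State>
    static bool matches(toy_state& ps, State&&...) {
        if (ps.eof() || !std::isalpha(static_cast<unsigned char>(ps.input[ps.pos])))
            return false;
        ++ps.pos;
        return true;
    }
};

// Action: ignores the parser state and appends the matched text to 'dest'.
struct append_to_string {
    static void matched(toy_state&, const std::string& text, std::string& dest) {
        dest += text;
    }
};

// "If Rule matches, call Action with the text Rule consumed."
template <typename Rule, typename Action>
struct with_action {
    template <typename... State>
    static bool matches(toy_state& ps, State&&... st) {
        const std::size_t start = ps.pos;
        if (!Rule::matches(ps, st...)) {
            ps.pos = start;  // leave the cursor untouched on failure
            return false;
        }
        Action::matched(ps, ps.input.substr(start, ps.pos - start), st...);
        return true;
    }
};

// Runs the rule once and returns whatever the action collected.
inline std::string first_letter_of(const std::string& text) {
    toy_state ps(text);
    std::string collected;
    with_action<alpha, append_to_string>::matches(ps, collected);
    return collected;
}

} // namespace sketch
```

The action is only triggered when the wrapped rule succeeds, so an input which does not start with a letter leaves the collected string empty.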


<h2>State objects</h2>

There are two distinctly different categories of State types in
parse0x. The first is the <tt>parser_state</tt> type, which keeps
track of the input and provides a few basic features to help support
error tracking. When we say State, however, we normally mean
client-determined types passed to Rules via their matches() method.

Rules take an arbitrary number of State arguments (the <tt>State...</tt>
parameters of matches()), and all core rules ignore these parameters.
Rules simply pass those arguments on to Actions, which are types that
perform some action on strings matched by a Rule.

One or more State objects (of any type) may optionally be passed to
parse() by the client, and those objects are passed as-is to all
Actions invoked for that parse() run. This allows client-side actions
to interact with client-side state during the parsing process.
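
The pass-through of State objects boils down to ordinary variadic forwarding. The following self-contained sketch shows two State objects of different types travelling from an entry point, through a Rule, to an Action. None of it is parse0x code: <tt>toy_state</tt>, <tt>toy_parse</tt>, <tt>rest_of_input</tt> and <tt>record_and_count</tt> are hypothetical names chosen for this illustration.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

namespace sketch {

// Hypothetical stand-in for parser_state: the input plus a cursor.
struct toy_state {
    std::string input;
    std::size_t pos = 0;
    explicit toy_state(std::string s) : input(std::move(s)) {}
};

// Rule: consumes the rest of the input (if any) and reports it to Action,
// handing every State argument through untouched.
template <typename Action>
struct rest_of_input {
    template <typename... State>
    static bool matches(toy_state& ps, State&&... st) {
        if (ps.pos >= ps.input.size()) return false;
        const std::string text = ps.input.substr(ps.pos);
        ps.pos = ps.input.size();
        Action::matched(ps, text, std::forward<State>(st)...);
        return true;
    }
};

// Action which expects exactly two State objects from the client.
struct record_and_count {
    static void matched(toy_state&, const std::string& text,
                        std::vector<std::string>& log, int& hits) {
        log.push_back(text);
        ++hits;
    }
};

typedef rest_of_input<record_and_count> logging_rule;

// Toy entry point: builds the parser state and forwards the client's State.
template <typename Rule, typename... State>
bool toy_parse(const std::string& input, State&&... st) {
    toy_state ps(input);
    return Rule::matches(ps, std::forward<State>(st)...);
}

} // namespace sketch
```

A call such as <tt>toy_parse<logging_rule>("abc", log, hits)</tt> only compiles if the State objects passed in are ones the Action can accept, so a mismatch between client state and Actions shows up at compile time in this sketch.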

As a short example, we'll demonstrate how to append a matched string
to a vector of strings. First, our Rule:


<verbatim>
  typedef r_plus< r_action< r_pad<r_plus<r_alpha>,r_space>, a_append> > Rule2;
</verbatim>


That rule will match the regex <nowiki>((\s*[a-zA-Z]+\s*)+)</nowiki>.
If the innermost group matches (the r_pad part) then the enclosing
r_action will call
a_append::matched(...) and pass it the State object(s). There are
implementations of a_append for State objects of type std::string
and std::vector<std::string>, and those can easily be used as a basis
for creating new ones.
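
As a rough idea of what "creating new ones" might look like, here is a self-contained sketch of an append-style Action with one overload per State type, including a new one for std::set. It is an assumption about the general approach, not the actual a_append code; <tt>toy_state</tt> stands in for <tt>parser_state</tt> and <tt>append</tt> is a hypothetical name.

```cpp
#include <set>
#include <string>
#include <vector>

namespace sketch {

// Hypothetical stand-in for parser_state; the Actions below ignore it.
struct toy_state {};

struct append {
    // State is a std::string: concatenate all matches.
    static void matched(toy_state&, const std::string& text, std::string& dest) {
        dest += text;
    }
    // State is a vector: one element per match, in order.
    static void matched(toy_state&, const std::string& text,
                        std::vector<std::string>& dest) {
        dest.push_back(text);
    }
    // A new overload for another State type: a set keeps unique matches only.
    static void matched(toy_state&, const std::string& text,
                        std::set<std::string>& dest) {
        dest.insert(text);
    }
};

} // namespace sketch
```

Which overload runs is decided purely by the type of the State object the client passes in, so supporting a new container is a matter of adding one more function.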

Now to see the whole example:

<verbatim>
  input = "abc def";
  parser_state ps(input);
  typedef std::vector<std::string> VecT;
  VecT state;
  parse<Rule2>(ps,state);
  std::copy( state.begin(), state.end(),
             std::ostream_iterator<std::string>(std::cout,"\n") );
</verbatim>

That will output:
<verbatim>
        abc
        def
</verbatim>

