parse0x  Update of "IntroToParse0x"

Overview

Artifact ID: e7a9767924d75bed11b18ba8aaae01f43e22c984
Page Name: IntroToParse0x
Date: 2008-05-16 16:54:05
Original User: stephan
Parent: 5e5f5b124ab53f8fed6e01c8199636b8f87d3ec0
Content

An Unimpressively Brief Introduction to parse0x

ACHTUNG: VERY INCOMPLETE!

This page provides a very brief introduction to using parse0x. It assumes a fair amount of prior C++ knowledge on the reader's part, as well as knowledge about what "parsing" means, what it is used for, and generally how it works.

Include the header file(s)

parse0x's core library is distributed as a single header file, intended to be included like:

#include <s11n.net/parse0x/parse0x.hpp>

From there, all of the code is in the parse0x namespace.

Several utility headers also ship with the library. These include parsers for common things like numbers (in various formats) and IPv4 addresses, as well as a mathematical expression evaluator (i.e. a calculator).

What are Rules?

Rules are the core-most concept of the library. A Rule is a struct or class type which implements a certain interface (shown below). A Rule's purpose is to traverse a given input and see if it "matches", where "matches" is a Rule-dependent decision. Rules are combined by simply declaring typedefs or empty structs which subclass a desired Rule. They can be combined in arbitrarily complex manners, making them easy to use and maintain, powerful, and a tad bit frightening.

Rules have the following interface:

struct RuleConcept {
    template <typename ... State>
    static bool matches( parser_state &, State && ... );
};

(We'll talk more about (State...) later.)

The full docs for that interface are in the header file.
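
To make the shape of that interface concrete, here is a minimal self-contained sketch of a type that satisfies it. It does not use parse0x itself: toy_state is a hypothetical stand-in for parser_state (whose real members are not shown on this page), and toy_alpha is an invented name for illustration only.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical stand-in for parse0x's parser_state: the input plus a cursor.
struct toy_state {
    std::string input;
    std::size_t pos;
    explicit toy_state(std::string const & in) : input(in), pos(0) {}
};

// A type with the same shape as RuleConcept: one static, variadic matches().
// It consumes a single alphabetic character and ignores the State arguments.
struct toy_alpha {
    template <typename ... State>
    static bool matches(toy_state & ps, State && ...) {
        if (ps.pos >= ps.input.size()) return false;  // nothing left to read
        if (!std::isalpha(static_cast<unsigned char>(ps.input[ps.pos]))) return false;
        ++ps.pos;  // consume the character only on success
        return true;
    }
};
```

Because matches() is static, such a Rule never needs to be instantiated; it is only ever named as a type.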

You can find an overview of the built-in rules here.

How parsers are constructed

Let's say we have the string "abc123" and we want to match all leading alphabetic characters. That looks something like this:

typedef r_star<r_alpha> myrule;

That's the only typedef you need to match any number of alpha chars (equivalent to the regular expression [a-zA-Z]*, which is where it gets the name "star" from).

(Side-note: by library conventions, core rules all have an r_ prefix.)
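
For readers wondering how a typedef can be a parser, the following sketch shows one way a "star" combinator could be written with templates. It is an assumption about the general technique, not parse0x's actual implementation; toy_state, toy_alpha, toy_star and toy_rule are hypothetical names.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical stand-in for parser_state: the input plus a cursor.
struct toy_state {
    std::string input;
    std::size_t pos;
    explicit toy_state(std::string const & in) : input(in), pos(0) {}
};

// Consumes one alphabetic character.
struct toy_alpha {
    template <typename ... State>
    static bool matches(toy_state & ps, State && ...) {
        if (ps.pos < ps.input.size() &&
            std::isalpha(static_cast<unsigned char>(ps.input[ps.pos]))) {
            ++ps.pos;
            return true;
        }
        return false;
    }
};

// Greedy repetition: apply Rule until it fails or stops consuming input.
// Zero repetitions still count as a match, as with the regex '*'.
template <typename Rule>
struct toy_star {
    template <typename ... State>
    static bool matches(toy_state & ps, State && ... st) {
        for (;;) {
            std::size_t const before = ps.pos;
            // The State objects are passed on as lvalues so they can be reused
            // on every iteration.
            if (!Rule::matches(ps, st...) || ps.pos == before) {
                ps.pos = before;  // undo any partial consumption
                break;
            }
        }
        return true;
    }
};

typedef toy_star<toy_alpha> toy_rule;  // same shape as r_star<r_alpha>
```

All of the composition happens at compile time: toy_rule carries no data, and the nesting of template arguments is the grammar.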

We can check any given input string against our new rule with:

std::string input("abc123");
if( parse<myrule>( input ) ) {
 // ... success ...
}

That's what it looks like. It's all fine and good to know whether a parse succeeds, but we also want to get data from it. We'll do that using...

Actions

Actions are a counterpart of Rules. Where Rules match, Actions collect. Actions are added to a parser using the r_action rule, like this:

typedef r_action< myrule, myaction > active_rule;

That rule says "if myrule matches, call myaction."

Actions have this interface:

struct ActionConcept {
  /**
     The second arg is the matching string which triggered this
     action. The State object is the object passed by the client
     to parse().
  */
  template <typename ... State>
      static void matched( parser_state &, std::string const &, State && ... );
};

Actions almost always ignore their first input parameter. The second parameter contains a string of all matched text which led to this Action. The (State...) argument(s) are Action-dependent, and explained below.
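
The sketch below illustrates how a rule in the spirit of r_action might hand the matched text to an Action. It is a self-contained approximation built on stated assumptions, not the library's code: toy_state, toy_word, toy_action, toy_store and toy_active_rule are hypothetical, and using a non-template overload to pick out a specific State type is only one possible technique.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical stand-in for parser_state: the input plus a cursor.
struct toy_state {
    std::string input;
    std::size_t pos;
    explicit toy_state(std::string const & in) : input(in), pos(0) {}
};

// Matches one or more alphabetic characters.
struct toy_word {
    template <typename ... State>
    static bool matches(toy_state & ps, State && ...) {
        std::size_t const start = ps.pos;
        while (ps.pos < ps.input.size() &&
               std::isalpha(static_cast<unsigned char>(ps.input[ps.pos]))) {
            ++ps.pos;
        }
        return ps.pos > start;
    }
};

// If Rule matches, call Action::matched() with the text Rule consumed.
template <typename Rule, typename Action>
struct toy_action {
    template <typename ... State>
    static bool matches(toy_state & ps, State && ... st) {
        std::size_t const start = ps.pos;
        if (!Rule::matches(ps, st...)) return false;
        Action::matched(ps, ps.input.substr(start, ps.pos - start), st...);
        return true;
    }
};

// An Action with the ActionConcept shape.
struct toy_store {
    // Fallback for any State list this Action does not understand: do nothing.
    template <typename ... State>
    static void matched(toy_state &, std::string const &, State && ...) {}
    // Preferred by overload resolution when the State is a single std::string.
    static void matched(toy_state &, std::string const & m, std::string & dest) {
        dest = m;
    }
};

typedef toy_action<toy_word, toy_store> toy_active_rule;
```

Note that toy_store ignores its first parameter, in line with the remark above about Actions.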

State objects

There are two distinctly different categories of State types in parse0x. The first is the parser_state type, which keeps track of the input and provides a few basic features to help support error tracking. When we say State, however, we normally mean client-determined types passed to Rules via their matches() method.

A Rule's matches() method takes an arbitrary number of State arguments (a template parameter pack), and all core rules ignore these parameters apart from passing them along. Rules pass those arguments on to Actions, which are the types that perform some action on strings matched by a Rule.

One or more State objects (of any type) may optionally be passed to parse() by the client, and those objects are passed as-is to all Actions invoked for that parse() run. This allows client-side actions to interact with client-side state during the parsing process.
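
As an illustration of that flow, here is a small self-contained sketch in which two State objects travel from a parse()-like entry point, through a rule, into an action. None of these names come from parse0x: toy_state, toy_count_and_keep, toy_digits, toy_number and toy_parse are hypothetical, and the real parse() may be implemented quite differently.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Hypothetical stand-in for parser_state: the input plus a cursor.
struct toy_state {
    std::string input;
    std::size_t pos;
    explicit toy_state(std::string const & in) : input(in), pos(0) {}
};

// An action that expects two client-side State objects:
// a running match count and the most recent match.
struct toy_count_and_keep {
    static void matched(toy_state &, std::string const & m,
                        int & count, std::string & last) {
        ++count;
        last = m;
    }
};

// Consumes a non-empty run of digits, then fires Action with the State objects.
template <typename Action>
struct toy_digits {
    template <typename ... State>
    static bool matches(toy_state & ps, State && ... st) {
        std::size_t const start = ps.pos;
        while (ps.pos < ps.input.size() &&
               std::isdigit(static_cast<unsigned char>(ps.input[ps.pos]))) {
            ++ps.pos;
        }
        if (ps.pos == start) return false;
        Action::matched(ps, ps.input.substr(start, ps.pos - start), st...);
        return true;
    }
};

typedef toy_digits<toy_count_and_keep> toy_number;

// Entry point in the spirit of parse<Rule>(input, state...): whatever the
// client passes after the input is handed, untouched, to the rule.
template <typename Rule, typename ... State>
bool toy_parse(std::string const & in, State && ... st) {
    toy_state ps(in);
    return Rule::matches(ps, st...);
}
```

Neither toy_parse() nor toy_digits knows what the State objects are; only the action gives them meaning.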

As a short example, we'll demonstrate how to append a matched string to a vector of strings. First, our Rule:

  typedef r_plus< r_action< r_pad<r_plus<r_alpha>,r_space>, a_append> > Rule2;

That rule will match the regex ((\s*[a-zA-Z]+\s*)+). If the innermost group matches (the r_pad part) then the enclosing r_action will call a_append::matched(...) and pass it the State object(s). There are implementations of a_append for State objects of type std::string and std::vector<std::string>, and those can easily be used as a basis for creating new ones.

Now to see the whole example:

  input = "abc def";
  parser_state ps(input);
  typedef std::vector<std::string> VecT;
  VecT state;
  parse<Rule2>(ps,state);
  std::copy( state.begin(), state.end(),
             std::ostream_iterator<std::string>(std::cout,"\n") );

That will output:

        abc
        def
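
For experimentation without the library at hand, the whole pipeline can be approximated in a few dozen lines. The sketch below is a toy re-creation under stated assumptions, not parse0x code: every toy_ name is hypothetical, and the action is deliberately placed inside the padding rule (unlike Rule2) so that the collected text excludes whitespace.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for parser_state: the input plus a cursor.
struct toy_state {
    std::string input;
    std::size_t pos;
    explicit toy_state(std::string const & in) : input(in), pos(0) {}
};

struct toy_alpha {  // one alphabetic character
    template <typename ... State>
    static bool matches(toy_state & ps, State && ...) {
        if (ps.pos < ps.input.size() &&
            std::isalpha(static_cast<unsigned char>(ps.input[ps.pos]))) { ++ps.pos; return true; }
        return false;
    }
};

struct toy_space {  // one whitespace character
    template <typename ... State>
    static bool matches(toy_state & ps, State && ...) {
        if (ps.pos < ps.input.size() &&
            std::isspace(static_cast<unsigned char>(ps.input[ps.pos]))) { ++ps.pos; return true; }
        return false;
    }
};

template <typename Rule>
struct toy_star {  // zero or more repetitions of Rule
    template <typename ... State>
    static bool matches(toy_state & ps, State && ... st) {
        for (;;) {
            std::size_t const before = ps.pos;
            if (!Rule::matches(ps, st...) || ps.pos == before) { ps.pos = before; break; }
        }
        return true;
    }
};

template <typename Rule>
struct toy_plus {  // one or more repetitions of Rule
    template <typename ... State>
    static bool matches(toy_state & ps, State && ... st) {
        std::size_t const start = ps.pos;
        if (!Rule::matches(ps, st...)) { ps.pos = start; return false; }
        return toy_star<Rule>::matches(ps, st...);
    }
};

template <typename Rule, typename Pad>
struct toy_pad {  // Rule with optional Pad on both sides
    template <typename ... State>
    static bool matches(toy_state & ps, State && ... st) {
        toy_star<Pad>::matches(ps, st...);
        if (!Rule::matches(ps, st...)) return false;
        toy_star<Pad>::matches(ps, st...);
        return true;
    }
};

template <typename Rule, typename Action>
struct toy_action {  // on a match, pass the consumed text to Action
    template <typename ... State>
    static bool matches(toy_state & ps, State && ... st) {
        std::size_t const start = ps.pos;
        if (!Rule::matches(ps, st...)) return false;
        Action::matched(ps, ps.input.substr(start, ps.pos - start), st...);
        return true;
    }
};

struct toy_append {  // appends each match to a vector of strings
    static void matched(toy_state &, std::string const & m,
                        std::vector<std::string> & dest) { dest.push_back(m); }
};

typedef toy_plus< toy_pad< toy_action< toy_plus<toy_alpha>, toy_append >,
                           toy_space > > toy_rule2;

// Convenience wrapper: collect every word found at the start of the input.
std::vector<std::string> toy_words(std::string const & in) {
    toy_state ps(in);
    std::vector<std::string> out;
    toy_rule2::matches(ps, out);
    return out;
}
```

Calling toy_words("abc def") yields a vector holding "abc" and "def", mirroring the output shown above.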