pegc

Welcome to the pegc fossil repository

pegc is an experimental toolkit for writing PEG parsers in C using something similar to functional composition, conceptually similar to C++ parsing toolkits like Boost.Spirit, PEGTL, and parsepp. While several C++ implementations exist for this model, it is currently (September 2008) believed that pegc is the first C library of its kind :).

Author: Stephan Beal (http://wanderinghorse.net/home/stephan)

License: The pegc code itself is released into the Public Domain. However, pegc uses some hashtable code which has an MIT license.

Overview

pegc attempts to implement a parser model which has become quite popular in C++, but to do so within the limitations of C (e.g. lack of type safety in many places, and no safe casts).

The basic idea is that one defines a grammar as a list of Rule objects. A grammar starts with a top rule, and that rule may then delegate all parsing, as it sees fit, to other rules. The result of a parse is either true (the top-most rule matches) or false (the top-most rule fails). It is roughly modelled on recursive descent parsers, and follows most of their conventions. For example, a parsing rule which does not match (i.e. it returns false) must not consume input. Most rules which do match, on the other hand, do consume input (though there are several exceptions).

In C++ we would build the parser using templates (at least that's how i'd do it). In C we don't have that option, so we build up little objects which contain a Rule function and some data for that function. Those rules can then be processed in a recursive-descent (RD) fashion.
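
The "little objects" idea can be sketched in a few lines of C. This is only an illustration of the technique, not the pegc API: every name here (Parser, Rule, rule_char, rule_and, rule_or, match_len) is hypothetical. It shows a rule as a function pointer plus untyped data, and it follows the convention that a rule which fails leaves the input position untouched.

```c
#include <stddef.h>

/* All names below are hypothetical; this is NOT the pegc API. */

typedef struct Parser {
    const char *pos;  /* current read position */
    const char *end;  /* one past the last input character */
} Parser;

typedef struct Rule Rule;
typedef int (*RuleFn)(const Rule *self, Parser *p);

struct Rule {
    RuleFn fn;         /* the matching function */
    const void *data;  /* data for that function; untyped, so casts are unchecked */
};

/* Matches one specific character. data points to that char. */
static int rule_char(const Rule *self, Parser *p)
{
    char want = *(const char *)self->data;
    if (p->pos < p->end && *p->pos == want) {
        ++p->pos;  /* a matching rule consumes input */
        return 1;
    }
    return 0;      /* a failing rule consumes nothing */
}

/* Sequence: every child must match in order.
   data points to a NULL-terminated array of Rule pointers. */
static int rule_and(const Rule *self, Parser *p)
{
    const Rule *const *kids = (const Rule *const *)self->data;
    const char *start = p->pos;
    for (; *kids; ++kids) {
        if (!(*kids)->fn(*kids, p)) {
            p->pos = start;  /* roll back whatever earlier children consumed */
            return 0;
        }
    }
    return 1;
}

/* Ordered choice: the first child that matches wins. Same data layout. */
static int rule_or(const Rule *self, Parser *p)
{
    const Rule *const *kids = (const Rule *const *)self->data;
    for (; *kids; ++kids) {
        if ((*kids)->fn(*kids, p)) return 1;
    }
    return 0;
}

/* Grammar: top <- ('a' 'b') / 'c'
   Returns the number of characters consumed, or -1 if top fails. */
int match_len(const char *input, size_t len)
{
    static const char a = 'a', b = 'b', c = 'c';
    static const Rule ra = { rule_char, &a };
    static const Rule rb = { rule_char, &b };
    static const Rule rc = { rule_char, &c };
    static const Rule *const ab_kids[] = { &ra, &rb, NULL };
    static const Rule rab = { rule_and, ab_kids };
    static const Rule *const top_kids[] = { &rab, &rc, NULL };
    static const Rule top = { rule_or, top_kids };
    Parser p;
    p.pos = input;
    p.end = input + len;
    if (!top.fn(&top, &p)) return -1;
    return (int)(p.pos - input);
}
```

Under these assumptions, match_len("ab", 2) consumes two characters, while match_len("ax", 2) fails: the sequence matches 'a', fails on 'x', rolls back, and the alternative 'c' does not match either. The void pointer in Rule is where C's lack of safe casts shows up: nothing stops a rule from being paired with the wrong kind of data.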

The peg/leg project is similar to pegc but solves the problem from the exact opposite direction - it uses a custom PEG grammar as input to a code generator, whereas pegc is not a generator (but could be used to implement one).

My theory is that once the basic set of rules is in place, it will be relatively easy to implement a self-hosted code generator which can read a lex/yacc/lemon-like grammar and generate pegc-based parsers. That is, a pegc-parsed grammar which in turn generates pegc-based parsers.

Features

  • Relatively simple creation of PEG-style parsers via C code, using "struct composition" to perform some of the tricks which C++ libraries of this type implement using templates or functional composition.
  • Subparsers can normally be developed and tested in isolation from the rest of the parser. So once a parser is in place, it can be reused in arbitrary grammars.
  • Supports attaching semantic actions to rules. These can be "immediate" actions (run when the match is made) or "delayed" (queued when they match, then triggered after a whole parse succeeds).
  • Has a "sort of garbage collector" to free up resources allocated for a specific parser. This greatly simplifies memory management within the library and client-side rules. (Creation of rules often requires the allocation of dynamic resources.)
  • Complete API documentation. i'm a stickler for API docs.
  • Fairly small. The entire library currently compiles on gcc in about 1 second and using TinyCC it compiles in under 0.5 seconds. (Times for a 2.1GHz i386, compiling using 1 CPU.)
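
The difference between "immediate" and "delayed" actions can be sketched as follows. This is a minimal illustration under assumed names (Action, QueuedAction, Parser, match_digits, parse_sum and the Counters struct are all hypothetical), not how pegc itself implements actions. An immediate action runs as soon as its rule matches, even if the overall parse later fails; a delayed action is only queued, and the queue is run once the whole parse has succeeded.

```c
#include <stddef.h>

/* All names below are hypothetical; this is NOT the pegc API. */

typedef void (*Action)(const char *begin, const char *end, void *state);

#define MAX_QUEUED 16

typedef struct QueuedAction {
    Action fn;
    const char *begin;
    const char *end;
} QueuedAction;

typedef struct Parser {
    const char *pos;
    const char *end;
    void *state;                     /* client data handed to every action */
    QueuedAction queue[MAX_QUEUED];  /* delayed actions waiting for success */
    size_t queued;
} Parser;

typedef struct Counters {
    int immediate;  /* how often the immediate action ran */
    int delayed;    /* how often the delayed action ran */
} Counters;

static void on_immediate(const char *b, const char *e, void *state)
{
    (void)b; (void)e;
    ((Counters *)state)->immediate++;
}

static void on_delayed(const char *b, const char *e, void *state)
{
    (void)b; (void)e;
    ((Counters *)state)->delayed++;
}

/* Matches one or more digits. On a match, runs `immediate` right away
   and queues `delayed`. A fixed-size queue keeps the sketch short; a
   full queue silently drops the delayed action here. */
static int match_digits(Parser *p, Action immediate, Action delayed)
{
    const char *start = p->pos;
    while (p->pos < p->end && *p->pos >= '0' && *p->pos <= '9') ++p->pos;
    if (p->pos == start) return 0;  /* no match, nothing consumed */
    if (immediate) immediate(start, p->pos, p->state);
    if (delayed && p->queued < MAX_QUEUED) {
        p->queue[p->queued].fn = delayed;
        p->queue[p->queued].begin = start;
        p->queue[p->queued].end = p->pos;
        p->queued++;
    }
    return 1;
}

static int match_char(Parser *p, char c)
{
    if (p->pos < p->end && *p->pos == c) { ++p->pos; return 1; }
    return 0;
}

/* Grammar: digits '+' digits, followed by end of input.
   The caller zeroes *c first. Returns 1 on success, 0 on failure. */
int parse_sum(const char *input, size_t len, Counters *c)
{
    Parser p;
    size_t i;
    int ok;
    p.pos = input;
    p.end = input + len;
    p.state = c;
    p.queued = 0;
    ok = match_digits(&p, on_immediate, on_delayed)
      && match_char(&p, '+')
      && match_digits(&p, on_immediate, on_delayed)
      && p.pos == p.end;
    if (ok) {
        /* Whole parse succeeded: now trigger the queued actions in order. */
        for (i = 0; i < p.queued; ++i) {
            p.queue[i].fn(p.queue[i].begin, p.queue[i].end, p.state);
        }
    }
    return ok;  /* on failure the queue is simply discarded */
}
```

With a zeroed Counters, parsing "12+34" runs both kinds of action twice. Parsing "12+x" fails after the first number matched, so the immediate action has already run once while the delayed action never runs. That is the trade-off: immediate actions are simple but can fire for matches that end up discarded, whereas delayed actions only fire for a parse that succeeded as a whole.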

Requirements

  • pegc has no 3rd-party library requirements, only functions defined by ISO C (as if ISO is relevant any more, after the OOXML debacle).
  • pegc may (this is currently unclear) use some features which require C99 support, and may require that the user set a specific compiler option to enable this (e.g. in gcc it's -std=c99). On some compilers (some versions of gcc) the C99 flag apparently need not be explicit, but on others it apparently must be. Currently (20080929) pegc compiles on gcc 3.4.3, gcc 4.2.x, and Sun Studio 12 without any special options, implying that either C99 is on by default on those compilers or that the code doesn't have any C99-specific features.
  • Platforms: pegc is primarily developed on Linux with gcc 4.2+, but has also been shown to work on Solaris platforms with gcc 3.4.x and the Sun Studio 12 C compiler. It is believed that it will work on any C compiler which supports C99. (PS: the Sun Studio compiler apparently has no way to report the frigging version number - i had to search for 30 minutes to find it: cc -xhelp=readme)
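
One way to find out which language standard a compiler uses by default is to look at the standard __STDC_VERSION__ macro, which is 199901L for C99 and is not defined at all by C89/C90 compilers. The helper below is a small sketch for that purpose and is not part of pegc; the function names stdc_version and stdc_name are made up for this example.

```c
/* Hypothetical helpers, not part of pegc. */

/* Returns the value of __STDC_VERSION__, or 0 if the compiler does not
   define it (as is the case in C89/C90 mode). */
long stdc_version(void)
{
#ifdef __STDC_VERSION__
    return __STDC_VERSION__;
#else
    return 0L;
#endif
}

/* Maps a __STDC_VERSION__ value to a human-readable description. */
const char *stdc_name(long version)
{
    if (version >= 199901L) return "C99 or later";
    if (version >= 199409L) return "C89 plus Amendment 1";
    return "C89/C90 or pre-standard";
}
```

Printing stdc_name(stdc_version()) from a small main(), once with no options and once with -std=c99, shows whether the flag changes anything on a given compiler. If the default build reports C89/C90 and pegc still compiles, that would suggest the code does not depend on C99-only features, or only on ones the compiler accepts as extensions.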

Current Status

pegc is very much beta software, but the core parsing features are all in place and "seem to work", so it's mostly just a matter of refinement now. "Refinement" is a friendly way of saying, "i change the public API a lot as i experiment with different techniques."

News

  • 23 Dec 2008: Zajcev Evgeny surprised and stunned the world when he announced that he had created a PEG engine for the Lisp programming language using pegc as the back-end! Zajcev claims, "pegc is quite solid and extensible."

Download

See this page for instructions.

TODOs

  • This page needs some info! Currently you've got to download the source to find out anything.
  • Add more core rules (port in any missing rules from parsepp).
  • Port in some of the utility parsers from the parsepp tree (e.g. URLs, quoted strings, numbers, calculator).
  • Add error reporting and logging mechanisms to the API.