libcwal  Update of "s2"



Artifact ID: aea50bbc93a9dc03400b5155d261f1f8705d382e
Page Name:s2
Date: 2017-12-01 09:57:33
Original User: stephan
Parent: e35071ef8a821e8be4e51feb5657e4e213afb1ce

s2: cwal's 2nd scripting language

(Actually, it's the 3rd, but (A) the first one was a throwaway prototype and (B) S3 is the name of a chip manufacturer.)

s2 is a cwal-based scripting engine and the successor to th1ish. It aims at the same niche: a small platform for adding scripted test harnesses to C libraries. s2 provides a JavaScript-like environment and is feature-complete. Experience with th1ish (which s2 surpasses in almost every way) has shown that the underlying engine can do quite a lot more than run unit tests, such as generating whole web sites. For example, libfossil uses custom s2 bindings to provide SQLite database access. Aside from scripting test harnesses, s2 can be used for general-purpose scripting in a variety of applications or libraries, provided its built-in limits are respected: it is strictly single-threaded, has highly unconventional symbol lookup rules, and in some places sacrifices performance in favor of a smaller memory footprint.

s2 is intended to be embedded directly into client projects, as opposed to being provided as an external library (though it can be used that way, too, of course). It is distributed as two source files (one header and one C file) which clients compile directly into their project. To build this distribution, run "make amal" from the s2 subdirectory of the source tree. Its sources do not include the "exports" needed for making a Windows-format DLL.

s2, as of this writing (20141207), implements features undreamed of when the cwal project was started, pushing it well past any initial visions for the code. In particular, subtle refinements in the garbage collection in November/December 2014 have essentially ended any concerns i formerly had about the viability of the GC core. (That's not to say it's bullet-proof, but the biggest of the known GC death-traps have been adequately resolved.) That said, there is still plenty of room for experimentation and the cwal/s2 code bases have proven to be relatively open to experimentation.

s2 includes a header-only C++ cwal/native type conversion layer which, using only template voodoo, converts near-arbitrary C/C++ values, functions, methods, non-function members, and constructors to cwal callback functions (or accessor/mutator functions, respectively) with complete type and NULL-pointer safety. It includes a mechanism for defining complex callback dispatching rules, created using only typedefs, to route script-side calls to one of any number of potential native functions. See s2/cwal_convert.hpp for details and s2/sample_cpp2.cpp and s2/sample_cpp2.s2 for examples. While it currently lives in the s2 directory, it does not depend on s2 and can work with any cwal client code. The C++ layer saves so much work (writing and testing code) that it's a compelling option for writing bindings even when the rest of the host project is pure C.
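To give a feel for the general technique, here is a minimal sketch of how templates can wrap a plain native function in a uniform, type-checked callback. It is not the code from s2/cwal_convert.hpp and does not use any cwal API: Value, Args, Callback, FromValue, ToCallback, add and repeat are all hypothetical names invented for this illustration, and the real layer covers far more (methods, members, constructors, dispatching rules).

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical stand-in for a script-side value. This is NOT cwal's value type.
struct Value {
  enum Kind { Undefined, Integer, Double, String };
  Kind kind;
  long i;
  double d;
  std::string s;
  Value() : kind(Undefined), i(0), d(0) {}
  Value(long v) : kind(Integer), i(v), d(0) {}
  Value(double v) : kind(Double), i(0), d(v) {}
  Value(std::string v) : kind(String), i(0), d(0), s(std::move(v)) {}
};
using Args = std::vector<Value>;
// The one uniform signature a script engine would call (assumed for this sketch).
using Callback = Value (*)(const Args&);

// Script-to-native conversions, one specialization per supported native type.
template <typename T> struct FromValue;
template <> struct FromValue<long> {
  static long get(const Value& v) {
    if (v.kind != Value::Integer) throw std::invalid_argument("expected an integer");
    return v.i;
  }
};
template <> struct FromValue<std::string> {
  static const std::string& get(const Value& v) {
    if (v.kind != Value::String) throw std::invalid_argument("expected a string");
    return v.s;
  }
};

// Primary template: takes a function-pointer type and a pointer of that type.
template <typename FnPtr, FnPtr Fn> struct ToCallback;

// Specialization which picks apart the native signature R(A...).
template <typename R, typename... A, R (*Fn)(A...)>
struct ToCallback<R (*)(A...), Fn> {
  static Value call(const Args& args) {
    if (args.size() != sizeof...(A))
      throw std::invalid_argument("wrong number of arguments");
    return invoke(args, std::index_sequence_for<A...>());
  }

 private:
  // Converts args[I] to the I-th native parameter type, then calls Fn.
  template <std::size_t... I>
  static Value invoke(const Args& args, std::index_sequence<I...>) {
    assert(args.size() == sizeof...(I));
    return Value(Fn(FromValue<typename std::decay<A>::type>::get(args[I])...));
  }
};

// Two ordinary native functions which know nothing about Value or Args.
long add(long a, long b) { return a + b; }

std::string repeat(const std::string& text, long count) {
  std::string out;
  for (long n = 0; n < count; ++n) out += text;
  return out;
}
```

With these pieces, "Callback cb = &ToCallback<decltype(&add), &add>::call;" yields a function pointer with the uniform signature. Argument-count and type mismatches surface as exceptions instead of undefined behavior. The binding itself is a single type expression, which is the property that makes this style of layer save so much hand-written glue.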


Those documents (in particular the first two) are "the" place for all information about s2 (totaling well over 160 pages, as of 20160225), but here's a brief overview of its primary advantages over th1ish:

  • th1ish is fairly light compared to all other engines i compared it to, but s2 is lighter. Current rough estimates, based on apples-to-oranges comparisons, put it at just over half as much memory as th1ish for the unit tests and far less for more generic work.
  • A stack-machine-based evaluation engine (th1ish is pure recursive-descent, and has no stack machine). Supports C/C++ operator precedence, and adding new operators is normally a simple matter (unlike in th1ish, where it could be quite difficult).
  • Much easier to maintain (internally) due to a better separation of duties between the parser and the stack machine.
  • A cleaned-up, more flexible syntax. Supports JavaScript-style inlined Arrays and Objects, as well as the [brackets] property access operator. It also does away with most of th1ish's TCL heritage in favor of a more JavaScript-like style.
  • The only way in which th1ish is potentially superior is tokenization speed. th1ish pre-tokenizes an entire script and internally works with a higher-level linked list of tokens which (normally) takes up much more memory than the inputs. s2 tokenizes one token at a time, and has to re-tokenize parts of a script it traverses more than once (e.g., loop and function bodies). Computationally speaking, th1ish's approach is faster in that regard. However, s2's core evaluation engine is (computationally speaking) faster, which "might" make up for the re-tokenization effort. Neither has been measured because they are what they are and th1ish is now obsolete, so it's not worth spending much effort on.
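The stack-machine and one-token-at-a-time points above can be illustrated with a small sketch. It is not s2's evaluator. It is a generic operator-precedence evaluator with two stacks (a swapped-in, textbook shunting-yard style approach), limited to integer arithmetic with + - * / % and parentheses. The names precedence, reduce and evaluate are hypothetical.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Binding strength of each binary operator, in C's relative order.
// Supporting a new operator mostly means adding a row here and in reduce().
static int precedence(char op) {
  switch (op) {
    case '*': case '/': case '%': return 2;
    case '+': case '-': return 1;
    default: return 0;  // not a binary operator (this includes '(')
  }
}

// Pops one operator and its two operands, then pushes the result.
static void reduce(std::vector<long>& values, std::vector<char>& ops) {
  if (ops.empty() || values.size() < 2)
    throw std::runtime_error("malformed expression");
  const char op = ops.back(); ops.pop_back();
  const long rhs = values.back(); values.pop_back();
  const long lhs = values.back(); values.pop_back();
  if ((op == '/' || op == '%') && rhs == 0)
    throw std::runtime_error("division by zero");
  switch (op) {
    case '+': values.push_back(lhs + rhs); break;
    case '-': values.push_back(lhs - rhs); break;
    case '*': values.push_back(lhs * rhs); break;
    case '/': values.push_back(lhs / rhs); break;
    case '%': values.push_back(lhs % rhs); break;
    default: throw std::runtime_error("unknown operator");
  }
}

// Reads one token at a time from src and evaluates as it goes.
// No token list is ever built.
long evaluate(const std::string& src) {
  std::vector<long> values;  // operand stack
  std::vector<char> ops;     // operator stack
  bool expectOperand = true;
  std::size_t pos = 0;
  while (pos < src.size()) {
    const char c = src[pos];
    if (std::isspace(static_cast<unsigned char>(c))) { ++pos; continue; }
    if (std::isdigit(static_cast<unsigned char>(c))) {
      if (!expectOperand) throw std::runtime_error("unexpected number");
      long n = 0;
      while (pos < src.size() && std::isdigit(static_cast<unsigned char>(src[pos])))
        n = n * 10 + (src[pos++] - '0');
      values.push_back(n);
      expectOperand = false;
    } else if (c == '(') {
      if (!expectOperand) throw std::runtime_error("unexpected '('");
      ops.push_back(c);
      ++pos;
    } else if (c == ')') {
      if (expectOperand) throw std::runtime_error("unexpected ')'");
      while (!ops.empty() && ops.back() != '(') reduce(values, ops);
      if (ops.empty()) throw std::runtime_error("unbalanced ')'");
      ops.pop_back();  // discard the matching '('
      ++pos;
    } else if (precedence(c) > 0) {
      if (expectOperand) throw std::runtime_error("unexpected operator");
      // Left-associative: first apply pending operators that bind at least as tightly.
      while (!ops.empty() && precedence(ops.back()) >= precedence(c))
        reduce(values, ops);
      ops.push_back(c);
      expectOperand = true;
      ++pos;
    } else {
      throw std::runtime_error("unexpected character");
    }
  }
  if (expectOperand) throw std::runtime_error("incomplete expression");
  while (!ops.empty()) {
    if (ops.back() == '(') throw std::runtime_error("unbalanced '('");
    reduce(values, ops);
  }
  assert(values.size() == 1);
  return values.back();
}
```

In this sketch, evaluate("1 + 2 * 3") applies the multiplication first, and evaluate("10 - 4 - 3") groups left to right. Because the input is consumed as it is scanned, evaluating the same text again (as a loop body would require) means scanning it again, which is the memory-for-speed trade-off described in the last bullet.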

All that being said, s2 reuses (well, copy/pastes) the vast majority of th1ish's prototype-level infrastructure, as well as its module loader. It also leveraged a lot of experience gained from developing th1ish, allowing it to be created from scratch in about 5 weeks and roughly 200 commits of effort. th1ish's first edition took roughly 10 weeks and well over twice as many commits. Much of that (basically everything outside the core evaluation engine) was code which s2 got to steal outright, saving a lot of time on this go-around. It turns out that most code at the prototype/class level depends only on the core engine (cwal), not the script language (th1ish or s2), making it easy to share C-level code between the two cwal clients.