
Embedding and Extending s2 from C/C++

Jump to...

Introduction
- cwal Setup
- Core Concept: What is a Value?
Binding Native Functions
Custom Native Wrappers
- Rescopers
Simplify Bindings with C++ Template Magic

Introduction

This chapter introduces the process of binding client-side C and C++ code with s2/cwal.

s2, the scripting language, is built upon a language-independent scripting engine called cwal. cwal provides the base Value and memory management system, but knows nothing at all about any script language. Client-side bindings mostly concern themselves with the core scripting engine (cwal) rather than the concrete language (s2), so these docs refer far more to the former than the latter.

The first rule of binding client code to cwal is: never, ever (ever! (ever ever!!)) let a C++ exception pass through the s2/cwal C APIs. NEVER! Doing so will corrupt s2's internal state. Likewise, do not evaluate script code in the engine from an interrupt handler, as that will almost certainly (depending on many factors) also corrupt its internal state.

cwal Setup

Because s2 cannot know how a given client would like their cwal-level engine configured, setting up s2 unfortunately requires a bit of bootstrapping. First a cwal instance has to be configured, which includes an allocator, the output stream, allocation/recycling strategies, and a number of other flags which have to be set up before it is handed over to s2. The s2 engine is then given the cwal instance, taking over ownership of it. As of that point they are inseparable: cwal provides the basic script-engine services s2 uses for the creation and management of memory (mostly abstract Values), including, if needed, the memory to allocate the s2 engine itself. (That said, the cwal engine and s2 engine are normally allocated on the stack, and only their internals need to be dynamically allocated. Both engines may optionally be dynamically allocated, but in practice never are except for testing purposes.)

The following apps all demonstrate that setup in full:

- cwal's test.c demonstrates standalone cwal setup, independent of s2.
- s2's test.c is the "original" s2 test app, from the time long before s2sh could run tests. It's still used as a sanity-checking test, and it provides the simplest (though not best-documented) demonstration of setting up an s2 engine.
- s2's shell.c is the main source code for s2sh and demonstrates, in gross detail, the "real-world" setup of a cwal/s2 pair and the surrounding infrastructure.

Those, in particular s2's shell.c, demonstrate everything there is to know about setting up cwal and/or s2 from a client-side perspective. (That said, it is recommended that interested clients inspect those files in the order they are listed above, i.e. from most basic to most advanced.)

Core Concept: What is a Value?

cwal's base-most abstract, script-side data type is called cwal_value, which the documentation abstractly refers to as Value. It's an opaque type which acts as a handle for one of the higher-level types, be it an integer or an array or a buffer¹. Given a Value handle, cwal can access its higher-level representation, and vice versa. For example:

cwal_engine * e = ... /* the app's cwal context (typically 1 per
    app). All APIs which may need to allocate memory, and most which
    may need to free it, take a cwal_engine as their first argument,
    as that type manages *ALL* memory in cwal/s2. The exceptions to
    this pattern are functions from which cwal can derive the
    cwal_engine instance from other arguments.
*/;
cwal_array * a = cwal_new_array( e );
if(!a) return CWAL_RC_OOM;
cwal_value * v = cwal_array_value(a);
// The following hold:
assert( a == cwal_value_get_array(v) );
assert( NULL == cwal_value_get_object(v) );
assert( NULL == cwal_value_get_string(v) );

When this documentation refers to a Value, it means a cwal_value or one of its associated higher-level types.

Binding Native Functions

cwal's primary interface with client code is the cwal_callback_f() interface, which looks like:

int (*)( cwal_callback_args const * args, cwal_value ** resultValue );

This is the only interface cwal uses to bind native-side functions with script-side functions, and any cwal bindings are likely to have many implementations of this interface (one per script-bound function).

Such callbacks return 0 on success, with non-0 (see below) triggering an error in the interpreter. The args parameter holds the argument count and list, the this value, the callee (the function being called), and a reference to the underlying cwal engine (required in many contexts). Callbacks set their script-side result value (if any) in the resultValue parameter (defaulting, in script code, to the undefined value if they provide no result or assign *resultValue=NULL). If they return non-0, any resultValue is ignored, but practice suggests that implementations should not assign *resultValue except on success. The vast majority of implementations never need concern themselves with the intricacies of Value lifetimes, reference counts, and whatnot.

ACHTUNG: non-0 values returned from callbacks MUST either (A) come from the CWAL_RC_xxx or CWAL_SCR_xxx family of codes or (B) be guaranteed (by the user) not to collide with any such cwal-level result code. Otherwise results are undefined (as in Undefined Behaviour, not the s2 undefined value!). Why? Because many of the CWAL_xxx codes are interpreted specially by cwal and/or s2. For example, CWAL_RC_EXCEPTION means an exception was thrown (the catch keyword looks for this and expects to find an exception stored in the engine), and CWAL_RC_OOM means out of memory (a fatal error which ends script execution and is propagated back to the C-level caller). Returning a non-cwal result code which collides with a cwal code will (at some point) confuse cwal or s2. As a general rule, callbacks intended for execution from script-space must return 0, CWAL_RC_EXCEPTION, or CWAL_RC_OOM, as any other result codes are likely to be interpreted as fatal to the script. Other non-fatal codes, e.g. CWAL_RC_MISUSE or CWAL_RC_RANGE, can be wrapped up in an exception, if needed, using cwal_exception_setf() or one of its relatives, e.g. s2_cb_throw().
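
One way to follow that rule is to funnel every native-side failure through a single translation helper before returning from a callback. The sketch below is self-contained and deliberately does not include the cwal headers: the DEMO_RC_xxx entries are hypothetical stand-ins for 0, CWAL_RC_EXCEPTION, and CWAL_RC_OOM (their numeric values are invented here), and demo_to_script_rc() is an illustrative name, not part of cwal or s2.

```c
#include <assert.h>
#include <errno.h>

/* Hypothetical stand-ins for 0, CWAL_RC_EXCEPTION, and CWAL_RC_OOM.
   The numeric values are invented for this sketch; real code would
   use the constants from the cwal headers. */
enum demo_rc {
  DEMO_RC_OK = 0,
  DEMO_RC_EXCEPTION = 1,
  DEMO_RC_OOM = 2
};

/* Translates an errno-style code from native code into one of the
   three results a script-bound callback should return. */
static int demo_to_script_rc( int nativeErr ){
  if( 0 == nativeErr ) return DEMO_RC_OK;
  if( ENOMEM == nativeErr ) return DEMO_RC_OOM;
  /* Everything else is reported as a (catchable) exception. A real
     binding would also store an exception in the engine at this
     point, e.g. via cwal_exception_setf(). */
  return DEMO_RC_EXCEPTION;
}
```

With such a helper in place, a callback which wraps an errno-reporting C function can end with a single return statement, and no stray native code can leak into the interpreter by accident.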

We refer the user to the example loadable module for several documented examples of creating and binding callback functions, as well as complete error handling.

That's about it for the intro. The example loadable module, along with its demo script, exists to provide a documented introduction to the topic, and interested readers are referred there for more details.

Custom Native Wrappers

cwal supports a mechanism, called "native" values, for binding custom client-side C/C++ values to Value instances. This mechanism is type-safe, in that clients can safely determine whether or not a given Value really holds a native of the type they are expecting. It allows the client to provide a finalizer function which gets called when the Value gets cleaned up, and it also provides a mechanism for clients to cleanly disconnect the native instance from its cwal-side value (optionally calling its finalizer or not). It allows clients to bind both dynamically-allocated and stack-allocated native values to the engine, provided they use valid combinations of finalizers (or none at all) and cleanup logic.

This section provides an overview of how it works and how to use it, with a small amount of example code. Full-fledged code can be found in the s2 source tree, most notably in the PathFinder class's implementation and the example loadable module, both of which demonstrate the complete native-binding process.

The first thing we need is a native type to bind to cwal. Let's start with this:

typedef struct {
 int value;
 // whatever other members we need...
} MyNative;

Next, we need (for the common case, anyway) a finalizer function which implements the cwal_finalizer_f() interface. It looks like:

static void MyNative_finalizer( cwal_engine * e, void * mem ){
 MyNative * my = (MyNative*)mem;
 // … do any type-specific cleanup here …
 cwal_free( e, mem )
   /* *ONLY* if mem was allocated using
      cwal_malloc()/cwal_realloc() */;
}

Note that not all bindings need a finalizer. Bindings to static resources, or those with lifetimes outside of (and longer than) s2's interpreter, often don't need a finalizer.

Normally (because i'm pedantic like that) we've also got an allocator function which initializes the native's memory to any defaults we need, but for simple cases, direct calls to cwal_malloc() and memset() suffice. Client code is not strictly required to use cwal's allocation APIs for its own memory, but doing so allows it to partake in cwal's memory recycling. In any case, memory allocated via cwal_malloc() or cwal_realloc() must be freed using cwal_free() or cwal_free2() (the latter is an optional optimization which is useful only if the caller knows the exact size of the to-be-freed memory).

Lastly, so that cwal can provide type safety for us, we need an arbitrary static constant internal pointer address to use as a type identifier tag. Any internal value we can get a void const * pointer to will suffice. Here's one:

static const int MyNative_TypeId = 0
  /* the value is 100% irrelevant, we use only its address */;

There are other useful utilities we'll want to add once everything is up and running, but with the above bits in place, we can now bind a MyNative instance to cwal and, by extension, s2.

Creating new native Values is done using cwal_new_native() and cwal_new_native_value() (which does the same thing but returns a Value handle instead of a Native handle; given one of those two we can always derive the other). The exact context of such setup is binding-dependent, but abstractly it looks like the following (including error handling):

cwal_engine * e = …your local cwal_engine instance…;
MyNative * my = 0; // The "Native" part of the binding
cwal_value * vMy = 0; // The "Value" part of the binding

// First create the client-side half of the native:
my = (MyNative*)cwal_malloc( e, sizeof(MyNative) )/* [^53] */;
if(!my) return CWAL_RC_OOM
  /*    ^^^^^^^^^^^^^^^^^^
    *IMPORTANT*: all cwal/s2 contexts *require* CWAL_RC_xxx
    or S2_RC_xxx codes! Passing code from outside that
    range will cause Grief.
  */;
memset( my, 0, sizeof(MyNative) ); // or proper init code
// Now create the cwal half of the native:
vMy = cwal_new_native_value(e, my, MyNative_finalizer,
                            &MyNative_TypeId);
if(!vMy){ // Alloc failure: clean up...
 MyNative_finalizer(e, my);
 return CWAL_RC_OOM;
}
// As of here, the following assertions hold:
assert(
    my == cwal_native_get(
        cwal_value_get_native(vMy),
        &MyNative_TypeId
        /* ^^^ this part is what makes this operation
          type-safe. If we pass cwal_native_get() a second
          argument which does not match the one we
          passed to cwal_new_native_value(), or if we
          pass a non-MyNative Value instance as the first
          argument, cwal_native_get() will return NULL.
        */
    )
);
assert(
    my == cwal_native_get(
        cwal_value_native_part(e, vMy, &MyNative_TypeId),
        &MyNative_TypeId
        // ^^^ again, these (&MyNative_TypeId) type-id
        // flags are what make this type-safe.
    )
);

A complete example of such initialization can be found in the s2_pf_new() function in the PathFinder code.

Normally a value will be cleaned up, and its finalizer (if any) called, by the lifetime management system. For client-side types it is often necessary (or helpful) to add "explicit finalizers" in order to guarantee proper behaviour in the C parts of the bindings, in particular when there are relationships between multiple parts which have both C- and script-side relationships. The canonical example is a database which creates prepared statement handles. It is normally necessary, for proper behaviour of the DB driver, to destroy the prepared statement handles before destroying the DB connection which created them. While the full intricacies of such lifetime management are beyond the scope of this introduction, we will say this: the basis of writing such "manual finalizers" is the cwal_native_clear() function, which disassociates a Value from its Native part and optionally calls its native finalizer (if any). The Value itself may still be visible in script-space, but any attempt to extract the MyNative part from it will result in a NULL pointer, which bindings must check for before dereferencing it. An example of how this is used can be seen in the POSIX regex module, some version of which is demonstrated here:

/* helper macro for use in cwal_callback_f() bindings */
#define THIS_REGEX \
    cwal_native * nself = cwal_value_get_native(args->self);    \
    regex_t * self = \
        nself  ? (regex_t *)cwal_native_get(nself, &s2porex_typeid) : NULL; \
    if(!self) return s2_cb_throw(args, CWAL_RC_TYPE,                      \
                                 "'this' is not (or is no longer) "\
                                 "a regex_t instance.")

/* cwal_finalizer_f() impl. for POSIX regex objects. */
static void s2porex_finalizer( cwal_engine * e, void * v ){
  regfree((regex_t *)v);
  cwal_free2(e, v, (cwal_size_t)sizeof(regex_t));
}

/* "Manual destructor" cwal_callback_f() binding. */
static int s2porex_cb_destroy( cwal_callback_args const * args,
                               cwal_value **rv ){
    THIS_REGEX;
    cwal_native_clear( nself, 1 )
      /* a truthy 2nd argument means to call the native finalizer */;
    *rv = cwal_value_undefined();
    return 0;
}

The THIS_REGEX macro seen there for "extracting" the native value from the callback arguments is a common sight in client-side native bindings, but is difficult to encapsulate directly into s2 because it requires type-specific knowledge and potentially type-specific handling, so must be implemented client-side.
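
The consequence of such a manual destructor is that every other binding must tolerate a wrapper whose native part is already gone. The following self-contained sketch models that in plain C. DemoNative, demo_native_clear(), and demo_counter_get() are hypothetical names which stand in for the cwal native wrapper, cwal_native_clear(), and a bound method performing the same kind of check as THIS_REGEX.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*demo_finalizer_f)( void * mem );

/* Hypothetical stand-in for the wrapper around a native pointer. */
typedef struct {
  void * native;
  demo_finalizer_f finalizer;
} DemoNative;

static int demoFinalizedCount = 0; /* counts finalizer calls */

static void demo_counter_finalizer( void * mem ){
  (void)mem; /* the demo's native lives on the stack: nothing to free */
  ++demoFinalizedCount;
}

/* Detaches the native part, optionally finalizing it. In this demo
   a second call sees a NULL native and does nothing. */
static void demo_native_clear( DemoNative * n, int callFinalizer ){
  if( n->native && callFinalizer && n->finalizer ){
    n->finalizer( n->native );
  }
  n->native = NULL;
}

/* A "bound method": fetches the counter, or returns -1 if the
   native part was already cleared. */
static int demo_counter_get( DemoNative const * n ){
  return n->native ? *(int const *)n->native : -1;
}
```

After demo_native_clear() the wrapper still exists, much like a script-side Value which outlives its native, but demo_counter_get() reports the failure instead of dereferencing a dead pointer.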

Rescopers

One element missing from the examples above is a so-called "rescoper". A rescoper is a callback used by cwal to tell a native value that the value is undergoing a re-scope and that the native must (if appropriate) re-scope any (cwal_value*) members it's personally managing. Values which are held as, e.g., object-level properties are rescoped by the core library and need no such handling. A rescoper is necessary only when a native holds cwal_value pointers, or pointers to one of the concrete Value-derived types, directly, outside of some other lifetime-management construct, and not all natives do this. There are several examples to be found in the sources for the s2 loadable modules. For a concrete example, search mod_sqlite3.c for the text rescoper_f_. As of this writing, that function looks like this (edited slightly for readability and clarity):

static int
cwal_value_rescoper_f_s2x_sq3( cwal_scope * s, cwal_value * v ){
  cwal_native * n =
      cwal_value_native_part( s->e, v, &cwal_type_id_s2x_sq3 );
  s2x_sq3 * db =
      (s2x_sq3 *)cwal_native_get( n, &cwal_type_id_s2x_sq3 );
  int rc = 0;
  assert(n);
  assert(db);
  if(db->udfStore){
    rc = cwal_value_rescope(s, db->udfStore);
  }
  return rc;
}

That function implements the cwal_value_rescoper_f() interface in order to rescope the binding's udfStore member, a cwal_value*.

Search that file for that function name to see how the rescoper gets installed.
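
Why this matters can be illustrated with a toy model. The sketch below is not cwal code: it assumes, purely for illustration, that scopes are numbered by depth (lower means older and longer-lived) and that rescoping only ever moves a value toward an older scope. DemoValue, DemoNative, demo_rescope(), demo_native_rescoper(), and demo_engine_rescope_native() are all hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Toy value: remembers the depth of the scope which owns it. */
typedef struct {
  int scopeLevel; /* 1 = oldest scope, larger = newer */
} DemoValue;

/* Toy native which privately holds a value the engine cannot see. */
typedef struct {
  DemoValue self;
  DemoValue * hidden; /* plays the role of udfStore; may be NULL */
} DemoNative;

/* Moves v to the scope at newLevel, but only if that scope is older. */
static void demo_rescope( DemoValue * v, int newLevel ){
  if( newLevel < v->scopeLevel ) v->scopeLevel = newLevel;
}

/* The rescoper: drags the privately-held value along so that it
   cannot be cleaned up before the native which points to it. */
static void demo_native_rescoper( DemoNative * n, int newLevel ){
  if( n->hidden ) demo_rescope( n->hidden, newLevel );
}

/* What the toy "engine" does when the native itself is rescoped. */
static void demo_engine_rescope_native( DemoNative * n, int newLevel ){
  demo_rescope( &n->self, newLevel );
  demo_native_rescoper( n, newLevel );
}
```

Without the rescoper call, the hidden value in this model would remain in the newer scope and be cleaned up when that scope ends, leaving the longer-lived native holding a stale pointer.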

Simplify Bindings with C++ Template Magic

The file cwal_convert.hpp defines a header-only², C++98-compatible, template-based mechanism for converting between cwal value types and "native" C++ types. In principle it can be "taught" (via template specializations) to convert any types to which we can apply reasonable conversion semantics. Notably excluded is T **, which has multiple possible semantics, and we have to be careful with conversions from T* to ensure we aren't leaking memory. This mechanism is derived from earlier projects implemented for the Google V8 and Mozilla SpiderMonkey JavaScript engines, and is known to work very well in practice.

By itself, the type conversion APIs can simplify implementing bindings in C++ somewhat. The API takes it a step further, however, and provides template APIs to convert free functions, class methods, and constructors to cwal callback functions. Then it goes one little step further to implement (via templates) getter and/or setter function bindings to non-function class members. These features can easily save hundreds of lines of code (compared to native C) in any moderately-sized script bindings.

The sample C++ module fully demonstrates how to use the API and contains copious amounts of documentation. Here are a few examples of the type conversions from that code:

cwal_engine * e = /* … the app's cwal context */;
cwal_callback_f cb; // the cwal callback function interface
cwal_function * f; // high-level representation of a Function value
cwal_value * fv; // cwal Value representation of a Function/Object/whatever

// Each of the next three statements converts one
// function or method into a script-bindable equivalent:

cwal_callback_f freeFunc =
  ToCb< FunctionPtr<unsigned (unsigned), ::sleep> >::callback;

cwal_callback_f nonConstMethod =
  ToCb< MethodPtr<T, int(), &T::foo> >::callback;

cwal_callback_f constMethod =
  ToCb< MethodPtr<T const, int(), &T::bar> >::callback;

// Creating non-method function bindings: bind sleep(2):

typedef cwal::FunctionPtr<unsigned (unsigned), ::sleep> tSleep;
cb = cwal::ToCb< tSleep >::callback; // that was *easy*!

// or:
f = cwal::newFunction< tSleep >(e);
fv = cwal_function_value(f);

// or:
fv = cwal::newFunctionValue< tSleep >(e);
f = cwal_value_get_function(fv);

// Creating [const] method bindings requires teaching
// (via template specializations) the API how to
// convert one's own types, and enables the following:

typedef cwal::MethodPtr<
    MyType const,
    int(int), &MyType::func1c
> mFunc;
cb = cwal::ToCb< mFunc >::callback;

// or:
f = cwal::newFunction< mFunc >(e);

// or:
fv = cwal::newFunctionValue< mFunc >(e);

Sidebar: doing the equivalent in C requires at least 8-10x as much code (maybe 15-20x), adds a non-trivial amount of test effort, and leaves lots of room for "manual error," making this API an attractive approach even for applications which are otherwise (except for the bindings layer) pure C. If s2 is relegated to use as a testing tool within a C-only project's source tree, never intended to be distributed, it may indeed make sense to use C++ for the bindings solely for the sake of the time savings. (That said, using them properly requires a fair level of proficiency with C++ templates, and deciphering compilation errors for such templates takes some practice.)

There are various template options to control whether or not result values of native functions get converted back to cwal (some cannot be, either due to non-convertible types or because of semantic discrepancies), as well as whether or not to convert C++ exceptions to script exceptions (this is normally desired, but may not be needed/desired when going through multiple levels of proxies or when one knows that a given binding cannot throw, simply to avoid the try/catch overhead). When in doubt, leave "exception conversion" on, to avoid the possibility that a C++ exception propagates through the C API (which will, at the very least, corrupt C-level lifetimes/memory).

Similar templates can be used to create getter/setter routines for member properties.

Interested users can experiment with those example bindings via s2sh:

s2sh> var my = s2.loadModule('./sample_cpp.so')
sample_cpp.cpp:31:MyType(): MyType@0x1612380::MyType()
result: native@0x16123a0[scope=#1@0x7fffbd5246e8 ref#=2] ==> native@0x16123a0
s2sh> my.value()
result: integer@0x686601[scope=#0@(nil) ref#=0] ==> 1
s2sh> my.value(2)
result: native@0x16123a0[scope=#1@0x7fffbd5246e8 ref#=1] ==> native@0x16123a0
s2sh> my.value()
result: integer@0x1607340[scope=#1@0x7fffbd5246e8 ref#=0] ==> 2
s2sh> my.value(3).value()
result: integer@0x1607340[scope=#1@0x7fffbd5246e8 ref#=0] ==> 3
s2sh> my.funcStr("via std::string conversion")
sample_cpp.cpp:54:funcStr(): MyType@0x1612380::funcStr(via std::string conversion)
result: double@0x1607340[scope=#1@0x7fffbd5246e8 ref#=0] ==> 3.0

The value() function is mapped directly to a non-function member, and a template creates a combination getter-setter routine for us. If called with an argument, it converts and sets the value, then returns this (which these templates then convert from a native C++ type). Otherwise it converts and returns the member's value. The conversions layer is relatively powerful, and has no problems converting, e.g., T const &, provided there is a legal conversion to T* (which the reference conversions use as their basis) or the client has "trained" it (via template specializations) to do the conversion for this type. When using the default cwal-to-native-reference type conversions, most of them will trigger an exception if they cannot perform the conversion (the alternative would be stepping on an invalid reference). Some common cases, like std::string const &, are handled via template specializations which try to behave intuitively, insofar as cross-script/native semantics allow.

Footnotes

¹ There is one exception: the so-called "unique" data type, which s2 uses for enum entries, has no higher-level representation.
² There's a second, script-generated header as well, which implements most of the N-ary templates.

[^53] Tip: if the memory's size is static, prefer cwal_free2() when freeing it, to allow the core to recycle that memory better.
