libcwal  Artifact Content

Artifact 769b61dd3abcfad304d1f86c6da0f82807157fe7:

Wiki page [HowTo] by stephan 2016-01-12 20:21:35.
D 2016-01-12T20:21:35.531
L HowTo
P ecee4942eb41c6a8ca6778c61cace2455a6da597
U stephan
W 13093
<h1>Using the cwal C API</h1>

This page provides a brief how-to for cwal's C API.

<h2>Headers</h2>

<nowiki><pre>
#include "wh/cwal/cwal.h"
/* or, if using the amalgamation build: */
#include "cwal_amalgamation.h"
</pre></nowiki>

<h2>Initializing and Cleaning Up</h2>

The first thing we need is an "engine" instance, which manages scoping and memory for us. An engine instance requires a so-called "vtab", which is a collection of "virtual" functions which clients can provide to customize certain behaviours of the engine. For example, the vtab defines the memory allocator and a generic output mechanism suitable for use by scripting implementations.

Here is an example of setting up an engine (see [/finfo?name=test.c|test.c] for the full implementation):


<nowiki><pre>
    cwal_engine_vtab vtab = cwal_engine_vtab_basic;
    cwal_engine E = cwal_engine_empty
        // MAKE SURE to initialize stack-allocated instances this way!!!!
        ;
    cwal_engine * e = // demonstrate using a stack- vs heap-based engine.
        1 ? &E : NULL;
    int rc;

    // Set up our "virtual" functions:

    // cwal_output() sends its output to...
    vtab.outputer.output = cwal_output_f_FILE;
    vtab.outputer.state.data = stdout;

    // Tracing output (if enabled) goes here:
    vtab.tracer = cwal_engine_tracer_FILE;
    vtab.tracer.state = stdout;

    // Our app-local initialization hook:
    vtab.hook.on_init = e_init_engine;
    vtab.hook.init_state = e;

    // Now initialize e with our vtab:
    rc = cwal_engine_init( &e, &vtab )
      // yes, e's address  ^^! See the API docs for why.
      ;
    if(rc) { ...error... }
    else { ...use e... }

    // Eventually clean up (regardless of success or failure):
    cwal_engine_destroy(e);
</pre></nowiki>


<h2>Creating and Freeing Values</h2>

Consider this code:

<nowiki><pre>
int rc = cwal_scope_push( e, NULL );
if(rc) {
    // Serious error!
    // Will only fail if !e, an OOM condition, or perhaps memory
    // corruption or some such
}else{
 // use e, then... 
 cwal_scope_pop( e ); // MUST be called for each _successful_ push()
}

// sidebar: there are a couple other options for pushing scopes, depending on how
// one wants to manage the memory of the scope object (e.g. it may be stack-allocated).
</pre></nowiki>

Between the push/pop calls, all new values are initially owned by the pushed scope and will be cleaned up before <tt>cwal_scope_pop()</tt> returns. The library internally tracks the <em>oldest</em> scope which ever "claims" a value, such that a value will (generally) always be tracked by exactly one scope. There are brief periods in a value's lifetime where it has no scope or points to a higher-level scope than it should, but the library accounts for these, correcting them if needed.

When a value is inserted into a container value (an Array or Object) the container ensures that the value is managed by (or transferred to) a scope which  is at the container's level or lower. This effectively ensures that values live as long as all of their references and gives us a solid set of rules which we can use for proper cleanup of scopes in the face of cycles.

Values are always created within the context of the currently-active cwal scope (the top scope on the stack). There are a number of factory functions which create values, as demonstrated here:

<nowiki><pre>
cwal_value * vInt = cwal_new_integer( e, 42 );
cwal_value * vBool = cwal_new_bool(1); //cwal_value_true()/cwal_value_false()
cwal_value * vDbl = cwal_new_double( e, 24.42 );
cwal_value * vNull = cwal_value_null();
cwal_value * vundef = cwal_value_undefined();
cwal_string * str = cwal_new_string( e, "Hi!", 3 );
cwal_value * vStr = cwal_string_value(str);
assert( cwal_value_get_string(vStr) == str );
cwal_object * obj = cwal_new_object(e);
cwal_array * ar = cwal_new_array(e);
// There are a few others as well.
</pre></nowiki>

Each of those values is initially managed by the current scope, as described above, and the client does not need to explicitly release them (it can be done, though).

All non-container values are <em>immutable</em> - their underlying "native" value cannot be modified without destroying/recreating them (this is not as expensive as it sounds due to refcounting and memory recycling). The container types are of course mutable (or they would be pretty useless). Strings are immutable, and cannot be made otherwise without sacrificing at least two <tt>malloc()</tt>-reduction optimizations which rely on the current string-allocation mechanism.

When values are inserted into and removed from containers, their reference counts are managed to ensure that a value can survive longer than its scope and get cleaned up when the last reference is released. If a container holding it gets moved to a higher-level (older) scope , it gets up-scoped, too, if needed.

Clients, if they want to keep values alive, manage lifetimes using a combination of refcounts and scope ownership. Let's suffice with this simplified description (which is accurate but is not the whole truth)...

A client may obtain a reference for himself by calling <tt>cwal_value_ref()</tt>. For each time a value is passed to that, it <em>must</em> be passed to <tt>cwal_value_unref()</tt>. Think of <tt>cwal_value_ref()</tt> as resource acquisition and <tt>cwal_value_unref()</tt> as the release of that resource. Unref'ing might or might not immediately free the value, but the client must treat it as if it were semantically freed, and not refer to the pointer again. (While, in fact, it <em>might</em> be valid for some time due to recycling and such, accessing it invokes Undefined Behaviour.) If a client fails to call unref, the owning scope will eventually reap the value (all references to it, minus any cleaned up by containers in the same scope). i.e. it will be cleaned up eventually, but it represents a virtual memory leak as long as the referencing scope is alive).

<h2>Working with Values</h2>

The cwal value types map naturally to commonly-used value types. An overview of the types:

   *  Integers, doubles, booleans.
   *  The special-case "null" and "undefined" values.
   *  Strings: the library does not assume a specific encoding, with the exception that the cwal-to-json utility code assumes UTF8. Minor bug: currently cannot work with UTF16 because it assumes 0 is the end-of-string byte. UTF8 should be fine.
   *  Arrays provide generic list support. They can also hold properties like...
   *  Objects provide map-like support, with the caveat that they currently have O(N) search time because the scoping rules originally required that they be kept in insertion order (that is no longer the case, since we cannot guaranty destruction order once a value leaves its original scope). At one point hashtables were used for property storage (with O(1) search speed), but the memory cost was deemed too high.
  * Hash Tables offer faster property storage and retrieval, but (normally) cost more memory than Objects (though recycling improvements in late 2014 reduced that gap considerably).
   *  "Natives" are client-specified types wrapped behind a cwal value handle, allowing clients to integrate their own types via cwal's value/scoping/lifetime mechanisms, including a finalizer method to clean up the value when the engine is done with it. Natives transparently take part in reference counting and virtual extend the Object type, so they can hold member properties (and thus participate in graphs).

<nowiki><pre>
// Be sure to read the APIs to understand which conversions are legal
// and which are not!
cwal_int t i = cwal_value_get_integer(v);
cwal_double_t d = cwal_value_get_double(v);
cwal_string * str = cwal_value_get_string(v);
char const * cstr = cwal_string_cstr(str);
cwal_size_t slen = cwal_string_length_bytes(str);

int rc;

rc = cwal_object_set(e, obj, "foo", 3, someValue ); //3==strlen("foo")
assert(0==rc);
rc = cwal_object_set_s(e, obj, str, someValue );
assert(0==rc);
rc = cwal_array_set( e, array, someValue );
assert(0==rc);
...
</pre></nowiki>

The container APIs are missing a good deal of functionality because they derive from a JSON API which didn't need a full set of utility APIs. They will be fleshed out as needed. Containers have to take care to keep the scope-tracking mechanism on track (primarily (possibly only) for the sake of cycles), making them slightly more tedious to implement than they would otherwise be (but not overly-so).

<h2>Trivia: How Destruction Works</h2>

Destruction of values happens at only one time: when its reference count drops to 0. This normally happens when a scope is popped and it cleans up anything it holds a reference to, normally in reverse of the order it acquired the references (unless scope-migration or reference management changes this). Trying to clean up a set of cyclic values from client code can be messy, and this is best left to a parent scope. It is capable of weeding out cycles during destruction. <em>How</em> a it cleans up is important to understand in order to avoid some orphaning- and mis-destruction pitfalls the API currently allows for (but aims to make difficult to do).

The scope cleanup algorithm <em>basically</em> looks like this extremely over-simplified rendition:

<nowiki><pre>
// values == a list of value handles
while( values.length > 0  ) {
  foreach(values as v [in reverse order]) unref(v);
}
</pre></nowiki>

Note the outer "while" loop. When a value's refcount drops to 0, part of the cleanup process is to remove it from its controlling scope (which this loop asserts is the case). That changes the list we are looping over (the algo accounts for that, of course). When we are done unref'ing each entry 1 time, we check the size of the list. If it is not empty, we do it again. And again, and again... This process weeds out any cycles one step at a time. However, it also means that if a client references a value multiple times but does not unref an equal number of times, its controlling scope will actually loop over all reference counts (including the ones from the client), as many times as it takes to clean it up. This is not <em>actually</em> a feature, and it might make implementing closures more difficult or impossible, but it turns out to work rather well with the whole strict-scoping model as long as clients understand and apply some rules if they want to take reference/lifetime management into their own hands (as opposed to leaving it to a scope). The rules are not yet completely documented because the APIs for these parts are still under consideration. Stay tuned...


cwal's slowest component is the wiring required for handling proper cleanup in the face of cycles, and this wiring affects other places than just the cleanup code. (Actually: valgrind says that <tt>malloc()</tt> takes far more time than the cleanup!) Cleanup is, computationally speaking, somewhat slow because it has to traverse cyclic graphs one time per cycle, reducing the refcount of each item along the way once per traversal. (Side-note: it doesn't actually know if there are cycles but side-effects the ownership/reference counting rules allow it deduce this as it goes.) For non-graphs this is the same cost as normal cleanup (effectively O(N), N=total number of values), but for cycles it turns into some function of O(N*X), where X is a function of the number of cycles in the current cleanup-run and N might or might decrement by some number less than or equal to N for each iteration over the clean-up list (depending on whether a given iteration breaks any cycles or removes any non-cyclic items). My brain isn't big enough to do the math. This infrastructure also requires tracking of a parent scope for each value, and that infrastructure can impact the speed of operations which move a value from one scope to another (because they might have to ensure that their contained children also have the proper scope).

Historic note: cwal originally used arrays to manage ownership lists. It was easy to do but led to several corner cases where an OOM error could cause an otherwise harmless operation to lead to undefined ownership of a value (i.e. potentially a leak). These arrays included the ownership list in a scope, the recycle bin, and the deferred GC queue. As an experiment i turned the Value handles into linked lists of Value handles to speed up the various list-related operations (from worst-case O(N) to guaranteed O(1) for what we do with them). The switch to linked lists costs (predictably) fewer overall allocations, but also decreased the average Value memory cost because we lost the arrays being held elsewhere (which were holding the same data in a more verbose form). It also removed several potential error cases in OOM conditions from which we had no recovery strategy. And it turns out to be easier to manage the values' scope-level ownership this way, so that transition was an overall win-win.
Z 5970ebbec57d737c7d2577e9722aedad