
File jaccwabyt/jaccwabyt.md from the latest check-in


Jaccwabyt 🐇

Jaccwabyt: JavaScript ⇄ C Struct Communication via WASM Byte Arrays

Welcome to Jaccwabyt, a JavaScript API which creates bindings for WASM-compiled C structs, defining them in such a way that changes to their state in JS are visible in C/WASM, and vice versa, permitting two-way interchange of struct state with very little user-side friction.

(If that means nothing to you, neither will the rest of this page!)

Browser compatibility: this library requires a recent browser and makes no attempt whatsoever to accommodate "older" or lesser-capable ones, where "recent," very roughly, means released in mid-2018 or later, with late 2021 releases required for some optional features in some browsers (e.g. BigInt64Array in Safari). It also relies on a couple of features which are not part of the ECMAScript language standard but are widely available, namely TextEncoder and TextDecoder. It is developed primarily on Firefox and Chrome on Linux, and all claims of Safari compatibility are based solely on the feature compatibility tables provided at MDN.
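Whether a given runtime offers the features named above can be probed up front. The following is only a sketch: the function name checkJaccwabytPrereqs is hypothetical and is not part of this library, and the list of checks is limited to the features this section mentions.

```javascript
// Reports which of the features named above exist in the given global scope.
function checkJaccwabytPrereqs(g = globalThis) {
  return {
    webAssembly: typeof g.WebAssembly === 'object',
    textEncoder: typeof g.TextEncoder === 'function',
    textDecoder: typeof g.TextDecoder === 'function',
    // Described above as needed only for some optional features.
    bigInt64Array: typeof g.BigInt64Array === 'function',
  };
}

const features = checkJaccwabytPrereqs();
const missing = Object.keys(features).filter((k) => !features[k]);
if (missing.length) console.warn('Missing features:', missing.join(', '));
```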

Formalities:

The license for both this documentation and the software it documents is the same as sqlite3, the project from which this spinoff project was spawned:


2022-06-30:

The author disclaims copyright to this source code. In place of a legal notice, here is a blessing:

May you do good and not evil. May you find forgiveness for yourself and forgive others. May you share freely, never taking more than you give.



Overview

Management summary: this JavaScript-only framework provides limited two-way bindings between C structs and JavaScript objects, such that changes to the struct in one environment are visible in the other.

Details...

It works by creating JavaScript proxies for C structs. Reads and writes of the JS-side members are marshaled through a flat byte array allocated from the WASM heap. As that heap is shared with the C-side code, and the memory block is written using the same approach C does, that byte array can be used to access and manipulate a given struct instance from both JS and C.
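The mechanism can be illustrated without the library itself. The following is a minimal sketch and not Jaccwabyt's actual implementation: it assumes a standalone WebAssembly.Memory in place of a real module's heap, an arbitrarily chosen address, and a hypothetical helper named bindInt32. It defines a JS property whose getter and setter decode and encode a 32-bit integer directly in heap memory.

```javascript
// Stand-in for the WASM heap; in a real app this comes from the WASM module.
const memory = new WebAssembly.Memory({ initial: 1 }); // one 64KiB page

// Hypothetical helper: exposes the int32 at (ptr + offset) as obj[key].
function bindInt32(obj, key, ptr, offset) {
  Object.defineProperty(obj, key, {
    enumerable: true,
    // A fresh DataView per access stays valid even if the memory grows.
    get: () => new DataView(memory.buffer).getInt32(ptr + offset, true),
    set: (v) => new DataView(memory.buffer).setInt32(ptr + offset, v, true),
  });
  return obj;
}

const ptr = 64; // arbitrary address, standing in for an allocator's result
const proxy = bindInt32({}, 'member1', ptr, 0);
proxy.member1 = 12;

// The same bytes are visible through any other view of the heap, which is
// how C code would see them (WASM memory is little-endian).
const bytes = Array.from(new Uint8Array(memory.buffer, ptr, 4));
```

After the assignment, bytes holds the little-endian encoding of 12, and anything which later writes to those four bytes changes what proxy.member1 returns.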

Motivating use case: this API was initially developed as an experiment to determine whether it would be feasible to implement, completely in JS, custom "VFS" and "virtual table" objects for the WASM build of sqlite3. Doing so was going to require some form of two-way binding of several structs. Once the proof of concept was demonstrated, a rabbit hole appeared and down we went... It has since grown beyond its humble proof-of-concept origins and is believed to be a useful (or at least interesting) tool for mixed JS/C applications.

Portability notes:

Architecture

[Diagram: relationships between the major components]

  - StructBinderFactory generates StructBinder.
  - StructBinder contains StructType<T>.
  - StructBinder generates the Struct<T> Ctor.
  - StructType<T> is the prototype of the Struct<T> Ctor.
  - The Struct<T> Ctor constructs Struct<T> Instances.
  - Struct<T> Instances inherit from StructType<T>.
  - Struct<T> Instances share memory with C Structs.
  - StructType<T> mirrors the struct model from C Structs.

Its major classes and functions are:

An app may have any number of StructBinders, but will typically need only one. Each StructBinder is effectively a separate namespace for struct creation.

Creating and Binding Structs

From the amount of documentation provided, it may seem that creating and using struct bindings is a daunting task, but it essentially boils down to:

  1. Configure Jaccwabyt for your WASM environment. This is a one-time task per project and results in a factory function which can create new struct bindings.
  2. Create a JSON-format description of your C structs. This is required once for each struct and requires updating if the C structs change.
  3. Feed (2) to the function generated by (1) to create JS constructor functions for each struct. This is done at runtime, as opposed to during a build-process step, and can be set up in such a way that it does not require any maintenance after its initial setup.
  4. Create and use instances of those structs.

Detailed instructions for each of those steps follow...

Step 1: Configure Jaccwabyt for the Environment

Jaccwabyt's highest-level API is a single function. It creates a factory for processing struct descriptions, but does not process any descriptions itself. This level of abstraction exists primarily so that the struct-specific factories can be configured for a given WASM environment. Its usage looks like:

const MyBinder = StructBinderFactory({
  // These config options are all required:
  heap: WebAssembly.Memory instance or a function which returns
        a Uint8Array or Int8Array view of the WASM memory,
  alloc:   function(howMuchMemory){...},
  dealloc: function(pointerToFree){...}
});

It also offers a number of other settings, but all are optional except for the ones shown above. Those three config options abstract away details which are specific to a given WASM environment. They provide the WASM "heap" memory, the memory allocator, and the deallocator. In a conventional Emscripten setup, that config might simply look like:

{
    heap:    Module['asm']['memory'],
    //Or:
    // heap: ()=>Module['HEAP8'],
    alloc:   (n)=>Module['_malloc'](n),
    dealloc: (m)=>Module['_free'](m)
}

The StructBinder factory function returns a function which can then be used to create bindings for our structs.
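For experimentation outside of Emscripten, the three required options could be stubbed out along the following lines. This is only a sketch under stated assumptions: the standalone WebAssembly.Memory, the toy bump allocator, and the names wasmMemory, nextFree, and binderConfig are all hypothetical, and a real project should delegate to the allocator exported by its WASM module.

```javascript
// Assumption: a standalone memory stands in for the module's exported memory.
const wasmMemory = new WebAssembly.Memory({ initial: 1 });

// Toy bump allocator, for illustration only: it never reuses memory and
// does not grow the heap.
let nextFree = 8; // skip address 0 so that it can keep meaning "NULL"
const binderConfig = {
  heap: () => new Uint8Array(wasmMemory.buffer),
  alloc: (n) => {
    const ptr = nextFree;
    const end = ptr + ((n + 7) & ~7); // keep allocations 8-byte aligned
    if (end > wasmMemory.buffer.byteLength) return 0; // mimic malloc() failure
    nextFree = end;
    return ptr;
  },
  dealloc: (ptr) => { /* no-op in this toy allocator */ },
};
// binderConfig would then be passed to StructBinderFactory().
```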

Step 2: Create a Struct Description

The primary input for this framework is a JSON-compatible construct which describes a struct we want to bind. For example, given this C struct:

// C-side:
struct Foo {
  int member1;
  void * member2;
  int64_t member3;
};

Its JSON description looks like:

{
  "name": "Foo",
  "sizeof": 16,
  "members": {
    "member1": {"offset": 0,"sizeof": 4,"signature": "i"},
    "member2": {"offset": 4,"sizeof": 4,"signature": "p"},
    "member3": {"offset": 8,"sizeof": 8,"signature": "j"}
  }
}

These data must match up with the C-side definition of the struct (if any). See Appendix G for one way to easily generate these from C code.

Each entry in the members object maps the member's name to its low-level layout:

The order of the members entries is not important: their memory layout is determined by their offset and sizeof members. The name property is technically optional, but one of the steps in the binding process requires that either it be passed an explicit name or there be one in the struct description. The names of the members entries need not match their C counterparts. Project conventions may call for giving them different names in the JS side and the StructBinderFactory can be configured to automatically add a prefix and/or suffix to their names.
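Because the layout is defined entirely by offset and sizeof, a description can be sanity-checked before it is bound. The helper below is hypothetical and not part of Jaccwabyt; it is a sketch which assumes a plain struct without unions and reports members which overlap each other or extend past the end of the struct.

```javascript
// Hypothetical sanity check: returns a list of human-readable problems.
function checkDescription(desc) {
  const problems = [];
  const members = Object.entries(desc.members)
    .map(([name, m]) => ({ name, start: m.offset, end: m.offset + m.sizeof }))
    .sort((a, b) => a.start - b.start);
  members.forEach((m, i) => {
    if (m.end > desc.sizeof) problems.push(`${m.name} exceeds sizeof`);
    const next = members[i + 1];
    if (next && next.start < m.end) {
      problems.push(`${m.name} overlaps ${next.name}`);
    }
  });
  return problems;
}

const fooDescription = {
  name: 'Foo',
  sizeof: 16,
  members: {
    // Deliberately listed out of order: only offset and sizeof matter.
    member3: { offset: 8, sizeof: 8, signature: 'j' },
    member1: { offset: 0, sizeof: 4, signature: 'i' },
    member2: { offset: 4, sizeof: 4, signature: 'p' },
  },
};
console.log(checkDescription(fooDescription)); // prints []
```

Such a check only catches internal inconsistencies. It cannot confirm that the numbers match what the C compiler produced.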

Nested structs are as-yet unsupported by this tool.

Struct member "signatures" describe the data types of the members and are an extended variant of the format used by Emscripten's addFunction(). A signature for a non-function-pointer member, or function pointer member which is to be modelled as an opaque pointer, is a single letter. A signature for a function pointer may also be modelled as a series of letters describing the call signature. The supported letters are:

Noting that:

Sidebar: Emscripten's public docs do not mention p, but their generated code includes p as an alias for i, presumably to mean "pointer". Though i is legal for pointer types in the signature, p is more descriptive, so this framework encourages the use of p for pointer-type members. Using p for pointers also helps future-proof the signatures against the possibility that WASM eventually supports 64-bit pointers. Note that sometimes p really means pointer-to-pointer, but the Emscripten JS/WASM glue does not offer that level of expressiveness in these signatures. We simply have to be aware of when we need to deal with pointers and pointers-to-pointers in JS code.

Trivia: this API treats p as distinctly different from i in some contexts, so its use is encouraged for pointer types.

Signatures in the form x(...) denote function-pointer members and x denotes non-function members. Functions with no arguments use the form x(). For function-type signatures, the strings are formulated such that they can be passed to Emscripten's addFunction() after stripping out the ( and ) characters. For good measure, to match the public Emscripten docs, p, c, and C, should also be replaced with i. In JavaScript that might look like:

signature.replace(/[^vipPsjfdcC]/g,'').replace(/[pPscC]/g,'i');
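Wrapped in a function, that conversion behaves as follows. The function name toEmscriptenSignature is hypothetical and the sample signatures are chosen purely for illustration.

```javascript
// Wraps the conversion shown above in a reusable function.
function toEmscriptenSignature(signature) {
  return signature
    .replace(/[^vipPsjfdcC]/g, '') // drop "(", ")" and anything unexpected
    .replace(/[pPscC]/g, 'i');     // map p, P, s, c, and C to i
}

console.log(toEmscriptenSignature('v(p)'));  // prints vi
console.log(toEmscriptenSignature('i()'));   // prints i
console.log(toEmscriptenSignature('i(pj)')); // prints iij
```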

P vs p in Method Signatures

This support is experimental and subject to change.

The method signature letter p means "pointer," which, in WASM, means "integer." p is treated as an integer for most contexts, while still also being a separate type (analogous to how pointers in C are just a special use of unsigned numbers). A capital P changes the semantics of plain member pointers (but not, as of this writing, function pointer members) as follows:

Step 3: Binding the Struct

We can now use the results of steps 1 and 2:

const MyStruct = MyBinder(myStructDescription);

That creates a new constructor function, MyStruct, which can be used to instantiate new instances. The binder will throw if it encounters any problems.

That's all there is to it.

Sidebar: that function may modify the struct description object and/or its sub-objects, or may even replace sub-objects, in order to simplify certain later operations. If that is not desired, then feed it a copy of the original, e.g. by passing it JSON.parse(JSON.stringify(structDefinition)).
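The effect of passing a copy can be demonstrated with a stand-in for the binder. In this sketch, mutatingBinder and the key property it adds are invented purely for the demonstration and say nothing about what a real StructBinder changes.

```javascript
const structDefinition = {
  name: 'Foo', sizeof: 16,
  members: { member1: { offset: 0, sizeof: 4, signature: 'i' } },
};

// Stand-in for a binder which modifies its argument (purely hypothetical).
function mutatingBinder(desc) {
  for (const [key, m] of Object.entries(desc.members)) m.key = key;
  return desc;
}

const copy = JSON.parse(JSON.stringify(structDefinition));
mutatingBinder(copy);
// structDefinition is unchanged, including its nested member objects:
console.log('key' in structDefinition.members.member1); // prints false
console.log('key' in copy.members.member1);             // prints true
```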

Step 4: Creating, Using, and Destroying Struct Instances

Now that we have our constructor...

const my = new MyStruct();

It is important to understand that creating a new instance allocates memory on the WASM heap. We must not simply rely on garbage collection to clean up the instances because doing so will not free up the WASM heap memory. The correct way to free up that memory is to use the object's dispose() method.

The following usage pattern offers one way to easily ensure proper cleanup of struct instances:

const my = new MyStruct();
try {
  console.log(my.member1, my.member2, my.member3);
  my.member1 = 12;
  assert(12 === my.member1);
  /* ^^^ it may seem silly to test that, but recall that assigning that
     property encodes the value into a byte array in heap memory, not
     a normal JS property. Similarly, fetching the property decodes it
     from the byte array. */
  // Pass the struct to C code which takes a MyStruct pointer:
  aCFunction( my.pointer );
} finally {
  my.dispose();
}

Sidebar: the finally block will be run no matter how the try exits, whether it runs to completion, propagates an exception, or uses flow-control keywords like return or break. It is perfectly legal to use try/finally without a catch, and doing so is an ideal match for the memory management requirements of Jaccwabyt-bound struct instances.
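Projects which create many short-lived instances might wrap that pattern in a helper. The sketch below is not part of this API: withStruct is a hypothetical name, and MockStruct merely stands in for a bound struct constructor so that the example can run without a WASM module.

```javascript
// Hypothetical helper: constructs an instance, passes it to a callback,
// and disposes of it no matter how the callback exits.
function withStruct(StructCtor, callback) {
  const instance = new StructCtor();
  try {
    return callback(instance);
  } finally {
    instance.dispose();
  }
}

// Mock standing in for a bound struct constructor such as MyStruct.
class MockStruct {
  constructor() { MockStruct.liveCount++; }
  dispose() { MockStruct.liveCount--; }
}
MockStruct.liveCount = 0;

const result = withStruct(MockStruct, (s) => 42); // normal return
let caught = null;
try {
  withStruct(MockStruct, () => { throw new Error('oops'); });
} catch (e) {
  caught = e.message; // the exception still propagates to the caller
}
console.log(result, caught, MockStruct.liveCount); // prints 42 oops 0
```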

It is often useful to wrap an existing instance of a C-side struct without taking over ownership of its memory. That can be achieved by simply passing a pointer to the constructor. For example:

const m = new MyStruct( functionReturningASharedPtr() );
// calling m.dispose() will _not_ free the wrapped C-side instance
// but will trigger any ondispose handler.

Now that we have struct instances, there are a number of things we can do with them, as covered in the rest of this document.

API Reference

API: Binder Factory

This is the top-most function of the API, from which all other functions and types are generated. The binder factory's signature is:

Function StructBinderFactory(object configOptions);

It returns a function which these docs refer to as a StructBinder (covered in the next section). It throws on error.

The binder factory supports the following options in its configuration object argument:

API: Struct Binder

Struct Binders are factories which are created by the StructBinderFactory. A given Struct Binder can process any number of distinct structs. In a typical setup, an app will have only one shared Binder Factory and one Struct Binder. Struct Binders which are created via different StructBinderFactory calls are unrelated to each other, sharing no state except, perhaps, indirectly via StructBinderFactory configuration (e.g. the memory heap).

These factories have two call signatures:

Function StructBinder([string structName,] object structDescription)

If the struct description argument has a name property then the name argument is optional, otherwise it is required.

The returned object is a constructor for instances of the struct described by its argument(s), each of which derives from a separate StructType instance.

The Struct Binder has the following members:

API: Struct Type

The StructType class is a property of the StructBinder function.

Each constructor created by a StructBinder inherits from its own instance of the StructType class, which contains state specific to that struct type (e.g. the struct name and description metadata). StructTypes which are created via different StructBinder instances are unrelated to each other, sharing no state except StructBinderFactory config options.

The StructType constructor cannot be called from client code. It is only called by the StructBinder-generated constructors. The StructBinder.StructType object has the following "static" properties[1]:

The base StructType prototype has the following members, all of which are inherited by struct instances and may only legally be called on concrete struct instances unless noted otherwise:

API: Struct Constructors

Struct constructors (the functions returned from StructBinder) are used for, intuitively enough, creating new instances of a given struct type:

const x = new MyStruct;

Normally they should be passed no arguments, but they optionally accept a single argument: a WASM heap pointer address of memory which the object will use for storage. It does not take over ownership of that memory and that memory must remain valid for at least as long as this struct instance. This is used, for example, to proxy static/shared C-side instances:

const x = new MyStruct( someCFuncWhichReturnsAMyStructPointer() );
...
x.dispose(); // does NOT free the memory

The JS-side construct does not own the memory in that case and has no way of knowing when the C-side struct is destroyed. Results are specifically undefined if the JS-side struct is used after the C-side struct's memory is freed.

Potential TODO: add a way of passing ownership of the C-side struct to the JS-side object. e.g. maybe simply pass true as the second argument to tell the constructor to take over ownership. Currently the pointer can be taken over using something like myStruct.ondispose=[myStruct.pointer] immediately after creation.

These constructors have the following "static" members:

API: Struct Prototypes

The prototypes of structs created via the constructors described in the previous section are each a struct-type-specific instance of StructType and add the following struct-type-specific properties to the mix:

API: Struct Instances

Instances of structs created via the constructors described above each have the following instance-specific state in common:

Appendices

Appendix A: Limitations, TODOs, and Non-TODOs

Appendix D: Debug Info

The StructBinderFactory, StructBinder, and StructType classes all have the following "unsupported" method intended primarily to assist in their own development, as opposed to being for use in client code:

Appendix G: Generating Struct Descriptions From C

Struct definitions are ideally generated from WASM-compiled C, as opposed to simply guessing the sizeofs and offsets, so that the sizeof and offset information can be collected using C's sizeof() and offsetof() features (noting that struct padding may impact offsets in ways which might not be immediately obvious, so writing them by hand is most certainly not recommended).

How exactly the description is generated is necessarily project-dependent. It's tempting to say, "oh, that's easy! We'll just write it by hand!" but that would be folly. The struct sizes and byte offsets into the struct must be precisely how C-side code sees the struct or the runtime results are completely undefined.
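To see why hand-computed numbers go wrong, consider how padding arises. The following sketch uses a hypothetical helper, naturalLayout, which aligns each scalar member to its own size. That rule is an assumption which holds for the scalar types used here with 4-byte pointers (wasm32), but it is not a general rule and is no substitute for asking the compiler.

```javascript
// Hypothetical helper: lays out scalar members in declaration order,
// aligning each one to its own size.
function naturalLayout(members) {
  let offset = 0, maxAlign = 1;
  const out = {};
  for (const [name, size] of members) {
    offset = Math.ceil(offset / size) * size; // insert padding if needed
    out[name] = { offset, sizeof: size };
    offset += size;
    maxAlign = Math.max(maxAlign, size);
  }
  // The struct's total size is rounded up to its strictest alignment.
  return { sizeof: Math.ceil(offset / maxAlign) * maxAlign, members: out };
}

// int, void*, int64_t, function pointer, with 4-byte pointers:
const example = naturalLayout([['v4', 4], ['ppV', 4], ['v8', 8], ['xFunc', 4]]);
console.log(example.sizeof); // prints 24, not 4+4+8+4 = 20

// Reordering the same members moves the padding elsewhere:
const swapped = naturalLayout([['v4', 4], ['v8', 8], ['ppV', 4], ['xFunc', 4]]);
console.log(swapped.members.v8.offset); // prints 8, after 4 bytes of padding
```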

The approach used in developing and testing this software is...

Below is a complete copy/pastable example of how we can use a small set of macros to generate struct descriptions from C99 or later into static string memory. Simply add such a file to your WASM build, arrange for its function to be exported[2], and call it from JS (noting that it requires environment-specific JS glue to convert the returned pointer to a JS-side string). Use JSON.parse() to process it, then feed the included struct descriptions into the struct binder at your leisure.


#include <string.h> /* memset() */
#include <stddef.h> /* offsetof() */
#include <stdio.h>  /* snprintf() */
#include <stdint.h> /* int64_t */
#include <assert.h>

struct ExampleStruct {
  int v4;
  void * ppV;
  int64_t v8;
  void (*xFunc)(void*);
};
typedef struct ExampleStruct ExampleStruct;

const char * wasm__ctype_json(void){
  static char strBuf[512 * 8] = {0}
    /* Static buffer which must be sized large enough for
       our JSON. The string-generation macros try very
       hard to assert() if this buffer is too small. */;
  int n = 0, structCount = 0 /* counters for the macros */;
  char * pos = &strBuf[1]
    /* Write-position cursor. Skip the first byte for now to help
       protect against a small race condition */;
  char const * const zEnd = strBuf + sizeof(strBuf)
    /* one-past-the-end cursor (virtual EOF) */;
  if(strBuf[0]) return strBuf; // Was set up in a previous call.

  ////////////////////////////////////////////////////////////////////
  // First we need to build up our macro framework...

  ////////////////////////////////////////////////////////////////////
  // Core output-generating macros...
#define lenCheck assert(pos < zEnd - 100)
#define outf(format,...) \
  pos += snprintf(pos, ((size_t)(zEnd - pos)), format, __VA_ARGS__); \
  lenCheck
#define out(TXT) outf("%s",TXT)
#define CloseBrace(LEVEL) \
  assert(LEVEL<5); memset(pos, '}', LEVEL); pos+=LEVEL; lenCheck

  ////////////////////////////////////////////////////////////////////
  // Macros for emitting StructBinders...
#define StructBinder__(TYPE)                 \
  n = 0;                                     \
  outf("%s{", (structCount++ ? ", " : ""));  \
  out("\"name\": \"" # TYPE "\",");          \
  outf("\"sizeof\": %d", (int)sizeof(TYPE)); \
  out(",\"members\": {");
#define StructBinder_(T) StructBinder__(T)
// ^^^ extra indirection needed to expand CurrentStruct
#define StructBinder StructBinder_(CurrentStruct)
#define _StructBinder CloseBrace(2)
#define M(MEMBER,SIG)                                         \
  outf("%s\"%s\": "                                           \
       "{\"offset\":%d,\"sizeof\": %d,\"signature\":\"%s\"}", \
       (n++ ? ", " : ""), #MEMBER,                            \
       (int)offsetof(CurrentStruct,MEMBER),                   \
       (int)sizeof(((CurrentStruct*)0)->MEMBER),              \
       SIG)
  // End of macros.
  ////////////////////////////////////////////////////////////////////

  ////////////////////////////////////////////////////////////////////
  // With that out of the way, we can do what we came here to do.
  out("\"structs\": ["); {

// For each struct description, do...
#define CurrentStruct ExampleStruct
    StructBinder {
      M(v4,"i");
      M(ppV,"p");
      M(v8,"j");
      M(xFunc,"v(p)");
    } _StructBinder;
#undef CurrentStruct

  } out( "]"/*structs*/);
  ////////////////////////////////////////////////////////////////////
  // Done! Finalize the output...
  out("}"/*top-level wrapper*/);
  *pos = 0;
  strBuf[0] = '{'/*end of the race-condition workaround*/;
  return strBuf;

// If this file will ever be concatenated or #included with others,
// it's good practice to clean up our macros:
#undef StructBinder
#undef StructBinder_
#undef StructBinder__
#undef M
#undef _StructBinder
#undef CloseBrace
#undef out
#undef outf
#undef lenCheck
}
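On the JS side, the environment-specific glue mentioned above might look roughly like the following. This is a sketch under stated assumptions: cstrToJs is a hypothetical helper, and the call to the exported C function is simulated by writing a sample of its output into a standalone WebAssembly.Memory at an arbitrary address.

```javascript
// Hypothetical glue: decodes the NUL-terminated UTF-8 string at ptr.
function cstrToJs(heap8u, ptr) {
  let end = ptr;
  while (end < heap8u.length && heap8u[end] !== 0) ++end;
  // slice() copies the bytes, which also sidesteps TextDecoder's refusal
  // to decode views of shared memory in some environments.
  return new TextDecoder('utf-8').decode(heap8u.slice(ptr, end));
}

// Simulate the C function's result by writing its JSON into a fake heap.
const memory = new WebAssembly.Memory({ initial: 1 });
const heap = new Uint8Array(memory.buffer);
const json = '{"structs": [{"name": "ExampleStruct","sizeof": 24,"members": {' +
  '"v4": {"offset":0,"sizeof": 4,"signature":"i"}}}]}';
const ptr = 128; // stand-in for the pointer returned by wasm__ctype_json()
heap.set(new TextEncoder().encode(json), ptr); // the rest of the heap is 0

const parsed = JSON.parse(cstrToJs(heap, ptr));
for (const desc of parsed.structs) {
  console.log(desc.name, desc.sizeof); // each desc can be fed to a StructBinder
}
```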



  1. Which are accessible from individual instances via theInstance.constructor.
  2. In Emscripten, add its name, prefixed with _, to the project's EXPORTED_FUNCTIONS list.