nosjob  AtomInternals

Atom Internals: what's inside an Atom?

The library's basic value type is called Atom. Each of the concrete data types (e.g. Integer, Double, and Array) subclasses the Atom type. One key property of nosjob is that it does not use virtual member functions. Why not? Because Atoms are passed around by value, and copying a subclass into a plain Atom would slice off anything the subclass added. To keep the value semantics which this library needs (or really, really wants), all of the internal state of any concrete Atom must fit within the Atom type itself. This allows us to do the following:

Atom atom = Object();
assert( Object::isObject(atom) );
// ... pass atom around and do things with it ...
// ... downstream we can do:
Object o = Object::cast(atom);

That "upgrades" the basic atom value to an Object (or throws if the Atom was not an Object to begin with).

This support is achieved as follows...

The Atom class internally holds two values:

  • A (AtomAPI const *), which defines the "virtual" operations which Atoms must support.
  • A (mutable void *), which holds any type-specific internal state. Some simple types (Integer and Boolean) store their value directly in this pointer. More complex types (e.g. the string types, Object, and Array) have to allocate an internal structure and store its address here. The pointer is mutable so that the internal state can be instantiated lazily in some cases, namely in copy constructors and copy-assignment operators when the source (right-hand side) object has not yet been populated.
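
As a rough illustration of that two-member layout, here is a minimal sketch. The names SketchAPI, SketchAtom, packInt, and unpackInt are made up for this page and are not nosjob's actual declarations; the integer-in-pointer trick assumes a platform where an integer survives a round trip through a pointer, which holds on mainstream platforms but is implementation-defined.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for AtomAPI; only a declaration is needed here.
struct SketchAPI;

// Hypothetical stand-in for Atom: everything a value needs fits in two pointers.
struct SketchAtom {
    SketchAPI const* api;  // shared per-type table of operations
    mutable void* data;    // type-specific state, or the value itself
};

// Small values can be stored in the pointer's bits instead of on the heap.
// Assumption: the integer -> pointer -> integer round trip preserves the value.
inline void* packInt(std::intptr_t v) {
    return reinterpret_cast<void*>(v);
}

inline std::intptr_t unpackInt(void const* p) {
    return reinterpret_cast<std::intptr_t>(p);
}
```

Because the whole value is just these two members, copying a SketchAtom by value can never lose subclass state.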

AtomAPI is a class which specifies the core operations which all concrete Atom types must support. In the abstract, these operations are:

  • Copy the value of a source Atom to a target Atom.
  • Compare two Atoms to each other, following memcmp() semantics (a negative, zero, or positive result).
  • Destroy an Atom's internal state.
  • A "to boolean" operation which returns the value of an Atom as a boolean. For example, strings evaluate to true if they are not empty.
  • Fetch an Atom's Type ID (an enum value specified by the core API).
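
One way to picture such an operations table is as a struct of function pointers with a single static instance per concrete type. The sketch below is an assumption about the general shape only: ToyOps, ToyAtom, ToyTypeID, integerOps, and makeInteger are hypothetical names, and the real AtomAPI may differ in its signatures and members.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical names throughout: a sketch, not nosjob's actual declarations.
enum class ToyTypeID { Integer /* other type IDs would follow */ };

struct ToyAtom;

// One function pointer per core operation listed above.
struct ToyOps {
    void (*copy)(ToyAtom const& src, ToyAtom& dest);
    int (*compare)(ToyAtom const& lhs, ToyAtom const& rhs);
    void (*destroy)(ToyAtom& self);
    bool (*toBool)(ToyAtom const& self);
    ToyTypeID (*typeID)();
};

struct ToyAtom {
    ToyOps const* ops;
    mutable void* data;
};

// Integer flavor: the value lives directly in the data pointer's bits.
namespace toy_integer {
inline std::intptr_t get(ToyAtom const& a) {
    return reinterpret_cast<std::intptr_t>(a.data);
}
inline void copy(ToyAtom const& src, ToyAtom& dest) {
    dest.ops = src.ops;
    dest.data = src.data;
}
inline int compare(ToyAtom const& lhs, ToyAtom const& rhs) {
    std::intptr_t const a = get(lhs), b = get(rhs);
    return (a < b) ? -1 : (a > b) ? 1 : 0;  // memcmp()-style result
}
inline void destroy(ToyAtom& self) { self.data = nullptr; }  // nothing to free
inline bool toBool(ToyAtom const& self) { return get(self) != 0; }
inline ToyTypeID typeID() { return ToyTypeID::Integer; }
}  // namespace toy_integer

// Exactly one table instance exists for the Integer flavor.
inline ToyOps const* integerOps() {
    static ToyOps const ops = {
        toy_integer::copy, toy_integer::compare, toy_integer::destroy,
        toy_integer::toBool, toy_integer::typeID
    };
    return &ops;
}

inline ToyAtom makeInteger(std::intptr_t v) {
    return ToyAtom{ integerOps(), reinterpret_cast<void*>(v) };
}
```

Every ToyAtom made by makeInteger() points at the same table, so "virtual" dispatch costs one pointer indirection and no per-object vtable is needed.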

For any given concrete AtomAPI implementation, there is one AtomAPI instance, and each Atom of a given concrete type holds a pointer to that one instance. When a high-level Atom is copied, the following happens:

  • The AtomAPI implementation cleans up the target Atom's internal state (e.g. decrements the reference count of its current data object).
  • The AtomAPI implementation copies the internal state from the assigned-from Atom to the target Atom. More details below.

The meaning of "copy" is type-specific. Some types (e.g. Integer and Boolean) simply copy the internal data pointer, whereas an Object or Array will copy the data pointer and then increase a reference count in the referenced data object. Thus copying an Object is only a few operations more expensive than copying an Integer (an immeasurably small difference, for all intents and purposes).
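
The two copy steps for a reference-counted type can be sketched roughly as follows. SharedState, Handle, makeHandle, release, and copyInto are hypothetical names invented for this illustration, and the non-atomic counter assumes single-threaded use; nosjob's real bookkeeping may differ.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical shared data object with an intrusive reference count.
struct SharedState {
    int refCount;
    std::map<std::string, int> members;
};

// Hypothetical handle playing the role of an Object-like Atom.
struct Handle {
    SharedState* state;
};

inline Handle makeHandle() {
    return Handle{ new SharedState{1, {}} };
}

// Step 1 of a copy: drop the target's claim on its current data object.
inline void release(Handle& h) {
    if (h.state && --h.state->refCount == 0) {
        delete h.state;
    }
    h.state = nullptr;
}

// Step 2: share the source's data object and bump its reference count.
inline void copyInto(Handle const& src, Handle& dest) {
    if (src.state == dest.state) return;  // already sharing (or self-copy)
    release(dest);
    dest.state = src.state;
    if (dest.state) ++dest.state->refCount;
}
```

The early return guards the self-copy case, where releasing the target first would otherwise destroy the state that is about to be shared.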

When an Atom is copied, the logical type of the target Atom changes to that of the source Atom. For example:

Atom a = Integer(3);
assert( Integer::isInteger(a) ); // true
a = Double(3.4);
assert( Double::isDouble(a) ); // true
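
The reason the type can follow the value is that the type lives in a per-type pointer which is copied along with the data. Here is a minimal sketch of that idea, together with a checked cast that throws on a mismatch. MiniAtom, KindInfo, Kind, and the free functions are hypothetical names, and Boolean stands in for Double only to keep the example free of allocations.

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical names; a sketch of the idea, not nosjob's implementation.
enum class Kind { Integer, Boolean };

// Stand-in for the per-type operations table: one static instance per type.
struct KindInfo {
    Kind kind;
};

inline KindInfo const* integerInfo() {
    static KindInfo const info{Kind::Integer};
    return &info;
}

inline KindInfo const* booleanInfo() {
    static KindInfo const info{Kind::Boolean};
    return &info;
}

struct MiniAtom {
    KindInfo const* info;  // identifies the logical type
    std::intptr_t bits;    // the value itself
};

inline MiniAtom makeInteger(std::intptr_t v) {
    return MiniAtom{integerInfo(), v};
}

inline MiniAtom makeBoolean(bool b) {
    return MiniAtom{booleanInfo(), b ? 1 : 0};
}

// Type checks compare the per-type pointer, which is unique per type.
inline bool isInteger(MiniAtom const& a) {
    return a.info == integerInfo();
}

// Checked "upgrade": throws if the atom does not hold an Integer.
inline std::intptr_t castInteger(MiniAtom const& a) {
    if (!isInteger(a)) {
        throw std::invalid_argument("atom is not an Integer");
    }
    return a.bits;
}
```

Assigning makeBoolean(true) to a MiniAtom that held an Integer overwrites both members, so isInteger() reports false afterwards and castInteger() throws.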

Some concrete types support cross-type copy operations to some degree. For example, the numeric types (Boolean, Integer, and Double) can all freely convert between themselves. Likewise, Utf8String and Utf16String objects can be converted to and from each other, but they cannot be "converted" to an Object or an Array because such a conversion is meaningless.

Copying an "immutable" Atom yields a logically independent value: reassigning the copy never affects the original. For example:

Integer i1(1);
Integer i2(2);
Integer i3 = i2;
assert( i2.value() == i3.value() ); // true
i3 = Integer(3);
assert( i2.value() == 2 ); // true

Again, this "copy" is literally just a pointer copy (or reference count increase, which is logically the same as a copy).

This is in contrast to mutable types (Object and Array):

Object o1;
Object o2(o1);
Object o3(o2);
// o1, o2, and o3 all reference the same internal state,
// and a change to one affects them all:
o3.set( Integer(3), Double(3.4) );
assert( 1 == o1.size() ); // true
assert( 1 == o2.size() ); // true
assert( 1 == o3.size() ); // true

The easiest (but neither the cheapest nor the most efficient) way to deeply copy a mutable Atom is to convert it to JSON:

Utf8String json = atomToJSON(object);

And then back to an Object tree:

Atom root = JsonParser().parse(json);

See the JsonParser page for more details on using the parser.

TODO: add a deep-copy operation, e.g. Atom::clone(), which performs a deep copy.