Wiki page [MemoryModel] by stephan 2014-05-09 15:22:20.
See also: [DataTypes], [cwal_gc]

<h1>cwal's Memory Model</h1>

cwal gets its memory from a client-supplied allocator (a default which uses the system allocator is provided, of course). On top of that allocator cwal performs various higher-level memory-management tasks such as the initialization, destruction, and recycling of abstract Values. 

cwal's primary feature is its management of Values on behalf of the client. It manages such memory in scopes, which behave much like C++ scopes: only one is active at a time, they form a stack, and when one ends (is popped from the stack), all values created in that scope but not somehow passed out of it are cleaned up. That is cwal's core functionality, and it is implemented via several abstractions...
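To make the scope/ownership relationship concrete, here is a minimal sketch in plain C. It is <em>not</em> cwal's actual API (all names here are illustrative): it only shows the core idea that newly-created values belong to the current scope and die with it unless moved elsewhere.

<blockquote><nowiki><pre>
#include &lt;assert.h&gt;
#include &lt;stdlib.h&gt;

/* Illustrative sketch, not cwal's API: values are owned by the scope
   they were created in, and popping a scope frees everything it still
   owns. */
typedef struct Value Value;
struct Value {
  int payload;
  Value *next; /* sibling in the owning scope's list */
};

typedef struct Scope Scope;
struct Scope {
  Value *owned;  /* singly-linked list of owned values */
  Scope *parent; /* the older (higher) scope */
};

static int freed_count = 0;

static Scope *scope_push(Scope *parent) {
  Scope *s = calloc(1, sizeof *s);
  s->parent = parent;
  return s;
}

static Value *value_new(Scope *s, int payload) {
  Value *v = calloc(1, sizeof *v);
  v->payload = payload;
  v->next = s->owned; /* new values belong to the current scope */
  s->owned = v;
  return v;
}

static Scope *scope_pop(Scope *s) {
  Scope *parent = s->parent;
  for (Value *v = s->owned; v;) { /* clean up everything still owned */
    Value *next = v->next;
    free(v);
    ++freed_count;
    v = next;
  }
  free(s);
  return parent;
}

int main(void) {
  Scope *top = scope_push(NULL);
  Scope *inner = scope_push(top);
  value_new(inner, 1);
  value_new(inner, 2);
  top = scope_pop(inner); /* both values die with the inner scope */
  assert(freed_count == 2);
  scope_pop(top);
  return 0;
}
</pre></nowiki></blockquote>

What this sketch omits, and what the following sections explain, is reference counting and the migration of values into older scopes when they outlive their birth scope.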

Values are all of a base type which provides a small degree of polymorphic behaviour via a simple subclassing mechanism, allowing the engine to work with a base interface more often than not. Sometimes, for the sake of handling cycles, we have to differentiate between "containers" (values which can contain child values) and "simple" values (those which cannot), because only containers can lead to cycles.

Scopes act as a "root" for allocations, and newly-allocated values are owned by the currently-active scope. When a scope is cleaned up, all values it currently owns are destroyed deterministically, even in the face of cycles: all cycles are broken and cleaned up properly.

Ownership of values uses a mixture of scope-level ownership and reference counting. At the base, reference counting tells us when a value can be freed. Scope ownership tells us who is responsible for eventually triggering the destruction if a client does not do so on his own. (Were it not for cycles, this would be a lot simpler, by the way.) When a value becomes referenced by a higher-level (older) scope, its ownership migrates to that scope, such that only one scope will ever take responsibility for cleaning up a value. The rules guarantee (insofar as the code is bug-free!) that values in a scope cannot reference values in lower-level (newer) scopes, because referencing such a value migrates it into the older scope; each scope can therefore be cleaned up without risk of stomping on cycles created via other scopes (i.e. the cleanup roots). It is not possible to have a cycle which reaches into a lower (newer) scope because such a reference would cause the lower-scoped (newer) value to be migrated to the higher (older) scope (<em>only</em> for memory management purposes, not client-side-visibility purposes!).

Destruction of a value is really a three-step process. The first is "unreferencing" the value: reducing its reference count via <tt>cwal_value_unref()</tt>. As long as the adjusted refcount is above zero, unref'ing does nothing more. Once it reaches zero, a value-type-specific cleanup function (part of the Value's virtual interface) is called to free any resources owned by that value, but not the Value's own memory. After the type-specific cleanup, the value's memory is sent to the recycling subsystem. This memory is still "stamped" with enough information to allow us to re-use much of it for future allocations of the same Value type. (We cannot recycle memory across different Value types because of restrictions imposed by other malloc-reduction optimizations.) There are cases where reaching a zero refcount causes a value to re-enter the "temporary" state in a higher scope, rather than being destroyed, in order to avoid having to "leak" a reference in some cases involving value propagation.
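The three steps can be sketched as follows. This is an illustrative model only (the struct and function names are invented for this example; only <tt>cwal_value_unref()</tt> is a real cwal name): decrement the refcount, run the type-specific cleanup, then hand the memory to a recycle list rather than the allocator.

<blockquote><nowiki><pre>
#include &lt;assert.h&gt;
#include &lt;stdlib.h&gt;

/* Hypothetical sketch of the three-step teardown: unref, type-specific
   cleanup, recycle. Not cwal's real internals. */
typedef struct Val Val;
struct Val {
  int refcount;
  void (*cleanup)(Val *); /* type-specific: frees owned resources only */
  Val *next;              /* reused as the recycle-list link */
};

static Val *recycle_bin = NULL;
static int cleaned = 0;

static void noop_cleanup(Val *v) { (void)v; ++cleaned; }

static void unref(Val *v) {
  if (--v->refcount > 0) return;  /* step 1: still referenced */
  if (v->cleanup) v->cleanup(v);  /* step 2: release owned resources */
  v->next = recycle_bin;          /* step 3: the memory goes to the */
  recycle_bin = v;                /* bin, not back to the allocator */
}

int main(void) {
  Val *v = calloc(1, sizeof *v);
  v->refcount = 2;
  v->cleanup = noop_cleanup;
  unref(v);               /* refcount 2 -> 1: nothing happens */
  assert(cleaned == 0 && recycle_bin == NULL);
  unref(v);               /* 1 -> 0: cleanup runs, memory is recycled */
  assert(cleaned == 1 && recycle_bin == v);
  free(recycle_bin);
  return 0;
}
</pre></nowiki></blockquote>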

cwal uses a simple recycling mechanism to keep some configurable number of value allocations (configured on a per-Value-type basis) in memory instead of freeing them immediately when their time has come. When creating a new value, the recycle bin for that type is checked, and if an entry exists it is recycled. This means that a loop which simply allocates and deallocates a value might actually only allocate once. Some types (namely strings) are not "as recyclable" because we can only recycle strings for new strings of (roughly) the same size, but auto-interning of strings makes often-used strings very cheap. Recycling does not require any additional memory; it only delays the freeing of already-allocated memory for potential re-use. Because Values can form a singly-linked list, the recycler can use that to keep track of recycled values without requiring any significant extra infrastructure. The (small) tracking bits have a static size and are allocated as part of the core engine structure, and the linked-list capabilities of the base Value type provide the rest.
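A minimal sketch of that free-list trick (illustrative names; cwal's internals differ in detail): freed values are chained through their own link pointer, so the bin itself costs no allocations, and an allocate/free loop hits malloc only once.

<blockquote><nowiki><pre>
#include &lt;assert.h&gt;
#include &lt;stdlib.h&gt;

/* Sketch of a per-type recycle bin. Illustrative only. */
typedef struct RVal RVal;
struct RVal { RVal *next; int data; };

static RVal *bin = NULL;      /* head of the free list */
static int bin_size = 0;
static int alloc_calls = 0;   /* how often we really hit malloc() */
enum { BIN_MAX = 10 };        /* configurable per-type capacity */

static RVal *rval_new(void) {
  if (bin) {                  /* serve the request from the bin... */
    RVal *v = bin;
    bin = v->next;
    --bin_size;
    return v;
  }
  ++alloc_calls;              /* ...or fall back to the allocator */
  return calloc(1, sizeof(RVal));
}

static void rval_free(RVal *v) {
  if (bin_size < BIN_MAX) {   /* delay the real free for later reuse */
    v->next = bin;
    bin = v;
    ++bin_size;
  } else {
    free(v);
  }
}

int main(void) {
  /* A loop that allocates and frees repeatedly mallocs only once. */
  for (int i = 0; i < 1000; ++i) rval_free(rval_new());
  assert(alloc_calls == 1);
  return 0;
}
</pre></nowiki></blockquote>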

When cycles come into play, it is possible that a "dead reference" will be traversed during cleanup of those cycles. cwal manages this (partially) by using a "deferred gc queue." All container values which are sent to the recycling bin (as a side-effect of unref'ing) during destruction of a scope are temporarily redirected to the gc queue, which requires no additional memory and O(1) time (just re-linking of one linked-list entry). The cleanup process recognizes when it is stepping on a dead reference and skips the entry (emitting a tracing message if tracing is activated). When the scope has finished cleaning up, it "flushes" the gc queue, which causes those entries either to go into the recycling bin or (if it is full or disabled) to be freed.

A final level of memory management is the "interning" of strings: cwal can optionally "internalize" strings automatically, such that if two strings with the same bytes are created, they will share the same underlying Value handle, requiring no allocation for the 2nd and subsequent instances. This feature is quite fast (amortized O(1)) but uses "significant" memory: 2kb-4kb (depending on how i've configured it!) per page of its interning table, and the number of pages equals the highest number of collisions for any single hash value amongst all unique strings. For small uses a single page normally suffices; 500 unique strings can cause it to create 6-9 pages, but that's 6-9 mallocs vs. 500 (assuming those strings are re-used - if not, interning is a waste of memory).

Interning uses a special-purpose hashtable-like structure which allocates whole tables instead of individual entry objects and holds only values (not keys). This causes <em>far</em> fewer mallocs and <em>can</em>, under ideal circumstances (but normally does not), use less memory than an equivalent conventional hashtable with the same load. Interning of strings is managed solely during allocation and finalization of string values, and does not <em>directly</em> change how other library-level code works (but it does force us to disallow some minor internal optimizations we might otherwise make, and its side-effects influence a significant internal design decision in the scope cleanup code). Regardless of the memory effects, string interning can vastly speed up certain types of string usage (e.g. identifier lookups in scripts). In some tests (not specifically aimed at this problem) script code runs twice as fast with string interning due to how the scripting language (relatively inefficiently) resolves identifier strings.
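The paged-table idea can be sketched like this (purely illustrative; cwal's table differs in detail, and all names here are invented): every string hashes to the same slot index on every page, and a collision spills to the next whole page instead of allocating a per-entry node.

<blockquote><nowiki><pre>
#include &lt;assert.h&gt;
#include &lt;stdlib.h&gt;
#include &lt;string.h&gt;

/* Sketch of a paged interning table. Illustrative only. */
enum { PAGE_SLOTS = 379 };

typedef struct Page Page;
struct Page {
  const char *slot[PAGE_SLOTS];
  Page *next;
};

static Page *pages = NULL;

static unsigned hash_str(const char *s) {
  unsigned h = 5381;
  while (*s) h = h * 33 + (unsigned char)*s++;
  return h % PAGE_SLOTS;
}

static char *dup_str(const char *s) {
  size_t n = strlen(s) + 1;
  char *c = malloc(n);
  memcpy(c, s, n);
  return c;
}

/* Returns the interned copy; reuses an existing entry if the bytes match. */
static const char *intern(const char *s) {
  unsigned const h = hash_str(s);
  Page **pp = &pages;
  for (; *pp; pp = &(*pp)->next) {
    const char *have = (*pp)->slot[h];
    if (!have) break;                      /* free slot on this page */
    if (0 == strcmp(have, s)) return have; /* already interned */
  }
  if (!*pp) *pp = calloc(1, sizeof(Page)); /* collision: add a page */
  return (*pp)->slot[h] = dup_str(s);
}

int main(void) {
  const char *a = intern("hello");
  const char *b = intern("hello");
  assert(a == b); /* same bytes ==> same shared instance */
  assert(intern("world") != a);
  return 0;
}
</pre></nowiki></blockquote>

Note how the page count grows only with the worst-case collision depth, matching the "number of pages equals the highest number of collisions" behaviour described above.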

Does this all really work?

<h1>The Proof is in the Pudding...</h1>

Now a short demonstration of the above-mentioned moving parts, working in unison. The following output was generated by [th1ish], and shows cwal-internal allocation metrics when run across 23 unit test script files, totaling about 3000 lines of code, in a single interpreter session. An explanation of the numbers follows:

<blockquote><nowiki><pre>
ValueTypeId/Name     CreationRequests   Actually allocated count * Value-type sizeof() ==> total bytes from allocator

3  integer           21044              96     (000.46%)  * 48 ==> 4608     
4  double            101                18     (017.82%)  * 48 ==> 864      
5  string            3164               468    (014.79%)  * 48 ==> 32064    Incl. string bytes
6  array             600                29     (004.83%)  * 88 ==> 8032     Incl. peak list memory
7  object            732                32     (004.37%)  * 64 ==> 2048     
8  function          233                158    (067.81%)  * 96 ==> 15168    
9  exception         49                 5      (010.20%)  * 72 ==> 360      
10 native            2                  2      (100.00%)  * 96 ==> 192      
11 buffer            11                 4      (036.36%)  * 64 ==> 183394   [1] Incl. peak buffered memory and non-Value buffers
12 hash              4                  4      (100.00%)  * 88 ==> 1136     Incl. table memory
13 cwal_scope        8903               1      (000.01%)  * 80 ==> 80       [2]
14 cwal_kvp          3031               304    (010.03%)  * 32 ==> 9728     
16 x-string          32                 32     (100.00%)  * 56 ==> 1792     Not incl. external string bytes
17 z-string          4                  4      (100.00%)  * 56 ==> 242      Incl. string bytes

Totals:              29008              1157   (003.99%)[1]    ==> 259708  

[1] = % applies to allocation counts, not sizes. The total count does not account for buffer memory (re)allocations, but the total size does.

[2] = Scopes are almost always stack allocated and skew the statistics, so only allocated scopes are counted for totals purposes.

String interning tables:  4 page(s) of size 379 (3056 bytes) ==> 12224 bytes

General-purpose buffer capacity: 151 bytes

Property-sorting buffer capacity: 40 bytes

cwal_engine instance appears to have been stack-allocated (not by cwal_engine_init()).

Total bytes allocated for metrics-tracked resources: 272123

</pre></nowiki></blockquote>

The first two columns tell us which Value type the metrics are for. The third tells us how many "requests" we made to allocate a value of that type. The fourth tells us how many times cwal actually had to call into the system allocator to serve such a request. The remaining columns are stats based on those values.

Sidebar: <tt>cwal_kvp</tt> holds key/value pairs. Each named value, including scope-level variables and named function parameters, is stored as a pair of Values in one of these (they're called key/value pairs, but the key is itself a Value).

This particular setup only had to allocate memory for 4% of all Value creation requests. The rest were served either by the recycling subsystem or (in the case of some of the strings) the string interning subsystem. Not all scripts achieve a 25:1 "savings" ratio in allocations, but larger scripts, in particular those which run loops, tend to have very high ratios. Short-lived scripts tend to serve most requests from the system allocator.

Sidebar: these numbers do not include any of the [th1ish]-allocated memory. cwal manages the Values and generic buffers and th1ish manages the script-specific parts (much of it via cwal, so it's indirectly included here). My "best guess" is that the th1ish part takes up roughly the same amount of memory, though it currently has no metrics-collection mechanism like the cwal core does.

For comparison, disabling recycling and string interning leads to <em>much different</em> metrics:

<blockquote><nowiki><pre>
ValueTypeId/Name     CreationRequests   Actually allocated count * Value-type sizeof() ==> total bytes from allocator

3  integer           21044              10647  (050.59%)  * 48 ==> 511056   
4  double            101                87     (086.14%)  * 48 ==> 4176     
5  string            51152              51133  (099.96%)  * 48 ==> 2737213  Incl. string bytes
6  array             600                600    (100.00%)  * 88 ==> 58296    Incl. peak list memory
7  object            732                732    (100.00%)  * 64 ==> 46848    
8  function          233                233    (100.00%)  * 96 ==> 22368    
9  exception         49                 49     (100.00%)  * 72 ==> 3528     
10 native            2                  2      (100.00%)  * 96 ==> 192      
11 buffer            11                 11     (100.00%)  * 64 ==> 183842   [1] Incl. peak buffered memory and non-Value buffers
12 hash              4                  4      (100.00%)  * 88 ==> 1136     Incl. table memory
13 cwal_scope        8903               1      (000.01%)  * 80 ==> 80       [2]
14 cwal_kvp          3031               3031   (100.00%)  * 32 ==> 96992    
16 x-string          32                 32     (100.00%)  * 56 ==> 1792     Not incl. external string bytes
17 z-string          4                  4      (100.00%)  * 56 ==> 242      Incl. string bytes

Totals:              76996              66566  (086.45%)[1]    ==> 3667761 

[1] = % applies to allocation counts, not sizes. The total count does not account for buffer memory (re)allocations, but the total size does.

[2] = Scopes are almost always stack allocated and skew the statistics, so only allocated scopes are counted for totals purposes.

General-purpose buffer capacity: 151 bytes

Property-sorting buffer capacity: 40 bytes

cwal_engine instance appears to have been stack-allocated (not by cwal_engine_init()).

Total bytes allocated for metrics-tracked resources: 3667952
</pre></nowiki></blockquote>

The integers are very interesting there: only half required allocation. One might reasonably ask, "how is that possible if recycling is disabled?" The trick is that the integer and double values -1, 0, and 1 are built-in constants and never require allocation. This is an optimization which, in hindsight, paid off in spades. We can infer from these numbers that those three values make up half of the integers used by the script code. (Maybe that's because i often type "1" as a short-hand for "true"?)

Strings have a similar special case: length-zero strings are all the same string instance. So we can infer from these numbers that only about 0.04% of the string requests were for empty strings.

Likewise, the boolean, null, and undefined values are also built-in constants which (like the above-mentioned cases) do not partake in reference counting, scope ownership, or several other intricacies of the engine. Though clients use them just like any other values, refcounting and rescoping are no-ops for all built-in constant values. Internally, determining whether a given Value pointer refers to a built-in constant requires only two pointer address comparisons, so it's fast. We do not keep a flag for this; the way the builtins are stored allows us to determine their "built-in-edness" very quickly without one. Search [/finfo/cwal.c|cwal.c] for <tt>CWAL_BUILTIN_VALS_</tt> and <tt>V_IS_BUILTIN(V)</tt> for an explanation of how it works.
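The address-range trick can be sketched in a few lines. This is an illustrative model (the struct and function names are invented here; the real implementation lives behind <tt>CWAL_BUILTIN_VALS_</tt> and <tt>V_IS_BUILTIN(V)</tt> in cwal.c): because all built-in constants live in one static array, "is this a builtin?" is just two pointer comparisons.

<blockquote><nowiki><pre>
#include &lt;assert.h&gt;
#include &lt;stddef.h&gt;

/* Illustrative sketch of the builtin-detection trick. */
typedef struct { int typeId; int intValue; } ConstVal;

static const ConstVal BUILTINS[] = {
  {3, -1}, {3, 0}, {3, 1} /* e.g. the built-in integers -1, 0, 1 */
};

/* Two pointer comparisons: is v inside the static builtins array? */
static int is_builtin(const ConstVal *v) {
  return v >= BUILTINS
      && v < BUILTINS + sizeof(BUILTINS)/sizeof(BUILTINS[0]);
}

int main(void) {
  ConstVal heapLike = {3, 42}; /* stands in for a normally-allocated value */
  assert(is_builtin(&BUILTINS[1]));
  assert(!is_builtin(&heapLike));
  return 0;
}
</pre></nowiki></blockquote>

Refcounting and rescoping can then simply return early for any pointer that passes this check.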

That concludes our introduction to cwal's memory system!