whalloc  Update of "WhallocBasics"

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview

Artifact ID: 995bf747824ea49dc8bd6a2c2c1ec957a319430f
Page Name:WhallocBasics
Date: 2010-03-01 15:12:36
Original User: stephan
Parent: a067be3e91cc26dc8c4eceae1c1747e811ba714c
Content

(Editorial note: this page was written when the library had only two allocators, with no aspirations for any others. We now have several allocators, listed in the TableOfContents page.)

whalloc Basics

This library is intended for use where one does not wish to allocate dynamic memory, due to application requirements or system constraints. The client provides a memory space (typically a stack-allocated char buffer), which this API can then divide up for use with individual objects.

Using the library looks a bit like:

#include <whalloc.h>
...
typedef struct my_type_ {
  int x;
  int y;
  struct my_type_ * next;
} my_type;
whalloc_bt pool = whalloc_bt_empty;
enum { BufLen = 1024 * 8 };
unsigned char buffer[BufLen];
whalloc_size_t blockSize = sizeof(my_type);
int rc = whalloc_bt_init(&pool, buffer, BufLen, blockSize );
if( whalloc_rc.OK != rc ) { ... error ... }
my_type * m = whalloc_bt_alloc(&pool, sizeof(my_type));
// ^^^ m now lives somewhere inside of buffer
...
whalloc_bt_free( &pool, m ); // frees only m
whalloc_bt_drain( &pool ); // Clears pool state. It then can be re-used.

Notice that we pass a block size to the initializer function. The allocator manages memory in blocks of that size, and any given allocation will internally mark at least that much memory as used. Internally it divides the buffer into roughly (bufferSize/blockSize) chunks. Thus a small block size, relative to the buffer size, requires more management than a larger block size (see below).

Semantically, this API works like the standard malloc() and free() functions. It just happens to use an alternate data store with very frugal memory needs.

Speaking of memory: if this API manages tables of allocated objects, doesn't it need memory to do so? Yes - it reserves part of the client-provided memory range! This allows it to avoid allocating any dynamic resources.

We have two different implementations. Here is an overview of how they use memory:

  • The hashtable variant (class whalloc_ht) builds up a near-optimal hashtable in the user-supplied memory space, reserving two bytes per data block. This makes is very space-inefficient for small block sizes (and it cannot handle a block size of 1). Assuming WHALLOC_BITNESS=16, a block size of 8 requires about 20% memory overhead, whereas a block size of 16 requires about 12%, 32 about 6%, the overhead approximately halving for every doubling of the block size.
  • The bitmap variant (class whalloc_bt) also uses the client-supplied memory space, but only requires enough reserved bytes to store two bits of information per memory block. A block size of 8 has only a 3% memory overhead and the worst case (block size=1) has 26%. The overhead approximately halves for each doubling of the block size. A block size of 64 or higher takes up less than half of one percent of the client-provided memory. If the bits needed are smaller than some internal buffer size (default=32 bytes=128 block entries), it will use its internal buffer, leaving the full user-supplied space available for allocation (except possibly for the final block, depending on how the division works out).

In terms of speed, they're about equivalent. Their underlying algorithms are roughly the same, the just manage the memory differently. For de/allocation of memory chunks of blockSize or smaller requires O(1) time for the average case. It is possible, with some rather contrived de/allocation ordering, to make it run as slow as O(N) (N based blockSize and the requested de/allocation size), but that is worst/pathological case.

De/allocation of chunks larger than one block have a linear component, as they must check ranges of blocks to see if they are taken/free. The bitmap variant can theoretically, with the inclusion of a few optimizations (on the TODO list), outperform the hashtable variant in this regard.

Both variants use a first-fit allocation algorithm, and use internal hints to speed up the search process to O(1) for the most common cases (where objects are deallocated in reverse order of their allocation). When the user frees memory allocated via this API (via whalloc_ht_free() or whalloc_bt_free()), locating the memory management data is always O(1) and freeing it is O(N), where N is the number of blocks associated with that memory.

Each allocator instance can manage, at most, 2^(WHALLOC_BITNESS) memory, and no single allocation can be larger than 2^(WHALLOC_BITNESS-1). See WhallocConfig for information about WHALLOC_BITNESS.

Overview of memory costs

Here is an overview of the amount of memory (in percent) each allocator class typically needs to reserve for its own use. These number are approximate, and depend slightly on the memory buffer size.

Block Size

whalloc_bt allocator whalloc_ht allocator
1 25-26% not supported
2 11-13% 50%
3 8-9% 40%
4 6-7% 33%
8 3-4% 20%
16 1.5% 11-12%
32 0.8% 6%
64 0.4% 3%
128 0.2% 1.5%

Obviously, whalloc_bt is much more memory-friendly (and it supports a realloc operation), and should probably be the default choice. There are no compelling reasons to choose whalloc_ht over whalloc_bt, and the omission of a realloc operation for whalloc_ht might be reason enough alone not to use it. The primary difference in the two is that the former stores the actual size of each allocation, which is sometimes useful for debugging and makes multi-block deallocation simpler compared to whalloc_bt's chain-walking approach.