whio  Artifact Content

Artifact c7d1db172c6be5f13ac438db24a821aa90422ec7:

Wiki page [whio_ht] by stephan 2011-05-23 18:50:57.
D 2011-05-23T18:50:57.123
L whio_ht
P 5aaeb8967ac3990b837b257862819a47a5ce53d4
U stephan
W 3159
<strong>ACHTUNG: AS OF 20110523, THIS PAGE IS NOW MAINTAINED IN THE NEW WIKI:</strong> [http://whiki.wanderinghorse.net/wikis/whio/?page=whio_ht]

<h1>whio_ht: a whio_dev-stored hashtable</h1>

Header: <tt>&lt;wh/whio/whio_ht.h&gt;</tt>

whio_ht is similar to [whio_udb], except that the keys and values may have variable lengths (essentially of unbounded size). It has the same overall performance characteristics of whio_udb: insertion, searching, and removal are all amortized O(1). Unlike whio_udb, this implementation has constant memory costs.

whio_ht is highly optimized for low memory usage, and a given instance requires far less than 1000 bytes of dynamically-allocated memory. As a basis for comparison, on my system <tt>sizeof(FILE)</tt>==216 and <tt>fopen()</tt> allocates about twice that. That said, <em>in-memory</em> hashtables (using a RAM-based [whio_dev] as storage) are <em>not</em> memory-light, due to the overhead of holding the various block-level and linking-related information in RAM (but they should be pretty fast).

It might interest one to know that whio_ht is basically a wrapper around a [whio_vlbm] object, and the VLBM API is used for the underlying block management.

<h2>Primary Features</h2>

   *  Can use any [whio_dev] device as its storage. (e.g., files, in-memory devices, subdevices, or embedded in a [whio_epfs] instance).
   *  Excellent overall performance: insertion, removal, and searching are all amortized O(1).
   *  Tiny memory footprint and constant memory usage (unless the user explicitly tells it to allocate memory for keys/values). It requires far less than 1000 bytes of dynamically-allocated memory.
   *  Well-documented.
   *  Optionally supports a client-provided mutex.
   *  Can operate in read-only or read-write modes.
   *  Data format is platform- and endianness-neutral. Encoding of keys/values, if necessary, is up to the client.
   *  Supports iteration over all records.

<h2>Primary Misfeatures</h2>

   *  It's very young (it was made public on 20101203) and needs a lot more testing.
   *  All keys use the same hash and comparison functions (provided by the client, or use one of the ones provided by the library), so mixed-type keys cannot be reliably used. (This is normal for hashtables, though.)
   *  It is not very storage-space-efficient for data sets made up only of very small keys and values. If the average (key+value) length is less than about 50 bytes,  [whio_udb] might be a more space-efficient choice.
   *  A poorly-timed SIGINT (or similar) can leave the table in an inconsistent state.
   *  Cannot actually free storage for removed records. Those blocks are marked for recycling, but the storage size never shrinks. That said, it only allocates new blocks when necessary.

Some of the more notable TODOs, in no particular order:

   *  Storage-level locking.
   *  Currently refuses to compare on-storage keys larger than a couple kb, because doing so would require us to dynamically allocate space for the comparison or change the hashing interfaces to allow incremental hashing.


See [whio_ht_HOWTO].

Z dbfb0c53628480a1566cf4ccfde185cc