whio  whio_ht

ACHTUNG: AS OF 20110523, THIS PAGE IS NOW MAINTAINED IN THE NEW WIKI: http://whiki.wanderinghorse.net/wikis/whio/?page=whio_ht

whio_ht: a whio_dev-stored hashtable

Header: <wh/whio/whio_ht.h>

whio_ht is similar to whio_udb, except that the keys and values may have variable lengths (essentially of unbounded size). It has the same overall performance characteristics of whio_udb: insertion, searching, and removal are all amortized O(1). Unlike whio_udb, this implementation has constant memory costs.

whio_ht is highly optimized for low memory usage, and a given instance requires far less than 1000 bytes of dynamically-allocated memory. As a basis for comparison, on my system sizeof(FILE)==216 and fopen() allocates about twice that. That said, in-memory hashtables (using a RAM-based whio_dev as storage) are not memory-light, due to the overhead of holding the various block-level and linking-related information in RAM (but they should be pretty fast).

It might interest one to know that whio_ht is basically a wrapper around a whio_vlbm object, and the VLBM API is used for the underlying block management.

Primary Features

  • Can use any whio_dev device as its storage. (e.g., files, in-memory devices, subdevices, or embedded in a whio_epfs instance).
  • Excellent overall performance: insertion, removal, and searching are all amortized O(1).
  • Tiny memory footprint and constant memory usage (unless the user explicitly tells it to allocate memory for keys/values). It requires far less than 1000 bytes of dynamically-allocated memory.
  • Well-documented.
  • Optionally supports a client-provided mutex.
  • Can operate in read-only or read-write modes.
  • Data format is platform- and endianness-neutral. Encoding of keys/values, if necessary, is up to the client.
  • Supports iteration over all records.

Primary Misfeatures

  • It's very young (it was made public on 20101203) and needs a lot more testing.
  • All keys use the same hash and comparison functions (provided by the client, or use one of the ones provided by the library), so mixed-type keys cannot be reliably used. (This is normal for hashtables, though.)
  • It is not very storage-space-efficient for data sets made up only of very small keys and values. If the average (key+value) length is less than about 50 bytes, whio_udb might be a more space-efficient choice.
  • A poorly-timed SIGINT (or similar) can leave the table in an inconsistent state.
  • Cannot actually free storage for removed records. Those blocks are marked for recycling, but the storage size never shrinks. That said, it only allocates new blocks when necessary.

Some of the more notable TODOs, in no particular order:

  • Storage-level locking.
  • Currently refuses to compare on-storage keys larger than a couple kb, because doing so would require us to dynamically allocate space for the comparison or change the hashing interfaces to allow incremental hashing.

Examples

See whio_ht_HOWTO.