whio  Update of "whio_udb_HOWTO"

Many hyperlinks are disabled.
Use anonymous login to enable hyperlinks.

Overview

Artifact ID: 52b4436bf32c3de8f669c51e75af5b5111c65df5
Page Name:whio_udb_HOWTO
Date: 2011-05-23 18:50:58
Original User: stephan
Parent: f06f9d2efc50a0dc378eaf0b57e72e0ca0853e7f
Content

ACHTUNG: AS OF 20110523, THIS PAGE IS NOW MAINTAINED IN THE NEW WIKI: http://whiki.wanderinghorse.net/wikis/whio/?page=whio_udb_HOWTO

whio_udb HOWTO

Here's a quick intro on how to use whio_udb...

Step 1: Create a database...

The creation of a UDB is rather involved because we have to tell it:

  • The maximum key and value lengths.
  • The size of the hashtable.
  • The device to use as storage.
  • The functions we will use for hashing, comparing, and finding the length of keys and data. The API allows one to map names to sets of functions, and it uses this to record (in the DB) which function set is required for a given DB.

The formatting process looks like:

#include <wh/whio/whio_udb.h>

...

whio_udb * db = NULL; // DO NOT use an uninitialized ptr. Undefined results!
whio_dev * dev = whio_dev_for_filename("my.udb","w+");

whio_udb_opt opt = whio_udb_opt_empty;
opt.hashSize = 3779; // ideally a prime number
opt.keyLen = 16; // maximum length of keys
opt.dataLen = 128; // maximum length of values

int rc = whio_udb_dev_format( &db, dev, &opt, "string=string" );
if( rc ) { // Error! We must clean up the dev object:
   dev->api->finalize(dev);
   return ...;
}
// ... the db object now owns the dev object ...
// ... when we're done, close the db ...
whio_udb_free(db);

The "string=string" argument to whio_udb_dev_format() is the name of a built-in function set suitable for use with string-like keys and values. If we wanted case-insensitive key handling we could use "string:nocase=string". The function set name for a db must be resolvable via whio_udb_funcs_parse().

Step 2: Opening a database

Opening a database is simpler than formatting one. The sizes/ranges are stored in the db, so we do not need to provide them. Abstractly it also requires that we tell it which functions we will use, but the formatting process stores the set as a string name. That string, in conjunction the whio_udb_funcs_register() (and friends) API, can be used to load the function set on demand.

whio_dev * dev = whio_dev_for_filename("my.udb","r+");
whio_udb * db = NULL;
int rc = whio_udb_dev_open( &db, dev );
if( rc ) { // ... Error! We must clean up the dev object:
    dev->api->finalize(dev);
    return ...;
}
//... use db, and close it when done ...
whio_udb_free(db);

Step 3: Insertion and Removal

Inserting is trivial:

bool replaceIfItAlreadyExists = false;
int rc = whio_udb_insert( db, key, value, replaceIfItAlreadyExists );

The key and value parameters are (void const *) and must be of the type expected by the database's hashing and length-counting functions. The db i/o does no encoding of the key/data parts of a record, treating them as raw byte data, but all metadata for a record is encoded into a platform-neutral format.

Removal looks like:

int rc = whio_udb_remove( db, key );

Removal fails with the error code whio_rc.NotFoundError if no such entry exists. In most applications we don't care if removal really succeeds or not (it is assumed to just always works or fails because the resource did not exist). In this context, however (like in most i/o-driven contexts), a failure other than whio_rc.NotFoundError can really mean the removal failed (or partially failed) at the storage level.

Step 4: Searching

Because of buffering and threading-related issued, searching requires that the user allocate an object which can be used to store search results. These objects serve these purposes:

  • They hold the memory used for the encoding/decoding buffer for i/o on the record.
  • They allow multiple records to be opened concurrently without stepping on the db's own internal buffer.

In their simplest form are used like this:

whio_udb_record * r = whio_udb_search( db, key, NULL );
if( r ) { // we got a match
    ... use r, then release its memory ...
    whio_udb_record_free(db, r);
}

Internally the db recycles record allocations, so allocating and deallocating for each search is not as expensive as it might sound (it is, more often than not, an O(1) operation).

However, because allocating for every search is a real drag, we can instead do the following:

// Create a re-usable record:
whio_udb_record * RecBuf = whio_udb_record_alloc(db);

// Search using it:

whio_udb_record * r = whio_udb_search( db, key, RecBuf );
if( ! r ) { ... no match ... }
else {
  assert( r == RecBuf ); // just to demonstrate the return semantics
  void const * key = NULL;
  whio_size_t keyLen = 0;
  void const * data = NULL;
  whio_size_t dataLen = 0;
  key = whio_udb_record_key( r, &keyLen );
  data = whio_udb_record_data( r, &dataLen );
  ... use key and data ...
  ... r _is_ RecBuf. No need to free it ...
}
// But we have to free it when we're done:
whio_udb_record_free( db, RecBuf );

Each time the RecBuf object is used for a search, its memory is over-written by the decoding of the record from storage. On a search miss, the contents of the RecBuf are unspecified - they may contain data from the last search or a different record with the same hash key which was read during hash collision resolution.

The memory required for a single record object is directly proportional to (but a fixed amount larger than) the maximum key/value lengths. Clients can query the size using whio_udb_sizeof_record(db) (all records in a given db are the same size). Note however, that the db allocates them in groups, with an unspecified (but fairly small) number of objects per group, so the size of a single record cannot be used to 100% accurately predict uDB's memory usage, but it provides a good indicator.

The buffer used by the db for the i/o encoding/decoding is included in an allocated record's memory, but not in a stack-allocated record, since the key/data lengths are not known at compile time. Partly because of this limitation, the record class is opaque to clients. Internally we do some embarrassingly geeky memory allocation tricks to keep the number of calls to malloc() to a minimum.