whio  whio_epfs_namer

ACHTUNG: AS OF 20110523, THIS PAGE IS NOW MAINTAINED IN THE NEW WIKI: http://whiki.wanderinghorse.net/wikis/whio/?page=whio_epfs_namer

See also: whio_epfs

whio_epfs_namer

ACHTUNG: This API was introduced in late December 2010 and is still highly beta. The whio_epfs_namer_ht implementation is believed to work as designed, but it's still beta. You Have Been Warned.

whio_epfs_namer is an API which allows whio_epfs instances to associate names with inodes. It is designed such that the details of how inode names are set, fetched, removed, and searched for are unknown to the whio_epfs core, which allows us to experiment with different approaches for handling inode-to-name mappings without touching the core code. In libwhefs, EPFS' genetic father, the naming support was built into the core library, but after trying three different naming models i wasn't happy with (due to the memory-vs-performance-vs-complexity tradeoffs), when it came time to implement whio_epfs as the new basis for libwhefs, i decided to leave the details of inode naming out of the core and delegate it to higher-level code. The EPFS core now only concerns itself with those details with a few lines of code, where it interacts with the generic whio_epfs_namer interface. This allows us, for example, to swap out inode-to-name mapping implementations without changing the EFS file format or losing pseudofile-contained data within an EFS.

The "namer" interface provides for the following features:

  • Setting/getting/removing the name for a given inode ID.
  • Optional: Searching for an inode by name.
  • Optional: Iterating over all names in a for-each style.

The interface does not require that all operations be implemented by a given implementation. e.g. the iteration routine might not be feasible for a given implementation. Likewise, search-by-name normally requires a two-way mapping between inode IDs and their names, which might be overkill for a particular client who will never need to search by name.

Names will normally be strings, but the library allows binary names as well. While i can think of few purposes for this, aside from using UTF-16 strings, it can be done if one needs to (assuming the namer implementation being used supports it).

Each EPFS can have, at most, one namer associated with it. Clients use the function whio_epfs_has_namer() to determine if a given EFS has a namer associated with it. The functions whio_epfs_name_get(), whio_epfs_name_set(), whio_epfs_name_search() to get and set the names. Un-setting/removing the name is done by setting its name to a NULL or length-zero string. whio_epfs_name_foreach() can be used to iterate over all names. Note, however , that the search and for-each operations may or may not be implemented by any given concrete namer implementation. For information on installing a namer, see the section on that topic further down this page.

Where does a namer store it the inode-to-name mappings? The API leaves that to implementations. Some examples off the top of my head:

  • In one or more EFS pseudofiles, using whatever raw data format they want.
  • In memory.
  • In an external data source (e.g. an sqlite3 database, though that would have a significant memory overhead).

How Namers are Stored and Loaded

Namers are made available to EPFS by registering them with the library. This maps their name to the set of functions which implement their API-specified behaviours. This registration is necessary so that when opening an EFS we can find and load the appropriate namer implementation. Clients can register their own, but it is hoped that the provided ones (described down further on this page) will suit most client's needs.

The EPFS data format has a small place reserved for storing namer metadata, but the EPFS is really only interested in one thing: the namer's unique identifier (i.e. its name). When preparing a namer for a given EPFS instance, the core initializes the namer by calling a namer-specific initialization routine which is part of the namer interface. If initialization succeeds then it writes the namer's name and its private metadata (which is initialized by the namer init routine) to its reserved spot in the EFS. When opening an EFS, we check to see if a namer has been set. If one is found, we check the namer registration database for that namer. If found, we instantiate and initialize it, passing it the metadata which the initial namer setup code wrote to the EFS. The metadata area is quite small, but gives the namer a place to store, for example, the inode ID(s) where its internal data are stored.

Setting up a Namer

Setting up a namer abstractly looks like:

  • Format an EPFS.
  • Register a custom namer class or fetch a built-in namer.
  • Tell the EPFS to use the given namer.

In code that looks like:

// First, try to load a registered namer class:

whio_epfs_namer_reg reg = whio_epfs_namer_reg_empty;
int rc = whio_epfs_namer_reg_search( "whio_epfs_namer_ht", &reg );
if( 0 != rc ) { ... error ... }

// Second, apply it to a freshly-formatted EFS:
// (Assume that 'fs' is such an an object.)

rc = whio_epfs_namer_format( fs, &reg );
if( 0 != rc ) { ... error ... }

If that fails, the EFS is "very probably" still usable, but it could not initialize the namer for some reason. For example, this operation will fail on a read-only EFS, or the EFS might fill up while the namer is storing its state there.

If the formating step works, naming operations on EFS instance will use the namer we loaded in the first couple of lines. When we close the EFS and open it later on, that namer will be used as well, so that we can fetch names stored in previous sessions. The minor caveat here is: if clients register their own namers, only applications which see those registrations can use the namer. If such registrations cannot be found then other applications using that same EFS will not have access to the inode-to-name mappings.

If, while re-opening an EFS, there is an error while initializing the stored namer, the EFS will disregard the namer and will continue to work without it. In that case (or if the EFS) has no namer, all of the various name-handling routines will return whio_rc.UnsupportedError.

Current Namer Implementations

As of 20101218 we have the following implementations. The names given here are the names one should use when calling whio_epfs_namer_reg_search() or passed to whio-epfs-mkfs' --namer=xyz option.

Pedantic warning: even though the interface allows for binary file names, none of the implementations listed here explicitly support them.

whio_epfs_namer_array is a "transient" namer, intended primarily as initial proof of concept of the whio_epfs_namer API. Its entries are all stored in memory and are lost when the EFS is closed. It has memory costs proportional to the number of inodes and the lengths of their names. This namer imposes no specific hard limit on the length of names.

whio_epfs_namer_ht uses whio_ht to store the naming information as a pseudofile in the EFS. It has constant memory costs except for one case, and getting/setting/searching are amortized O(1) (interestingly, though, search by ID requires two O(1) lookups, whereas search by name only needs one). The one case where extra memory is required is the for-each iteration operation, as it must have memory to hold the name it passes to the client's for-each callback function. In that case, a buffer is reserved for the duration of the for-each loop, and it may be expanded as larger names are encountered. The maximum size of the buffer will be proportional to the longest name it traverses. This namer imposes no specific hard limit on the length of names, but the underlying whio_ht code currently (Dec. 2010) has a built-in upper limit of a few kb for hashtable keys. Any "reasonable" name will work with the current code, as long as it is a valid C-style string. Binary names are not supported here because the internal hashtable is set up for string keys and the hashing/comparison functions may not work as-is with binary data.

Registering Custom Namers

To register your own, you must:

  • Implement the functions specified by the whio_epfs_namer_api interface.
  • Populate a whio_api_namer_api object which points to those implementations. This object is, in practice, a static/const defined only in one implementation file (and not leaked to the client via a public API).
  • Register it as shown in the following example.
static const whio_epfs_namer_api myAPI = { ... member functions ... };
whio_epfs_namer_reg reg = whio_epfs_namer_reg_empty;
reg.api = &myAPI;
reg.alloc = namer_allocation_function;
reg.free = namer_deallocation_function;
strcpy( (char *)reg.label, "my_namer" );
int rc = whio_epfs_namer_reg_add( &reg );
if( 0 != rc ) { ... error ... }

(The use of strcpy() and a cast here is a bit unsightly, but the reg object's label member is of type (unsigned char[some_fixed_size]) for internal reasons.

Registration error conditions include:

  • If any of reg's members are NULL or the first byte of reg.label is 0 then whio_rc.ArgError is returned.
  • There is limited internal space for registrations, and if the list fills up then whio_rc.AllocError is returned (though there is no allocation, an allocation error is abstractly what happens).
  • If the list already has an entry with the requested name, whio_rc.AccessError is returned.

There is a hard-coded limit to the size of the namer's label. whio_epfs_namer_reg::label is an array of that size, and one must be sure not to write outside of its bounds. The library guarantees at least 128 bytes for the label, but the label space is also used to store metadata associated with a given namer, so the namer's name itself must be small enough to leave space for the metadata. See the API docs for full details.

Note that the library copies registration object's contents, so it need not be static/shared, but the client may want to keep it around in static/const memory so that he can reference elsewhere (e.g. to pass it to whio_epfs_namer_format(). It is not strictly necessary to do so, however, because whio_epfs_namer_search("my_namer",...) can be used to find the registered implementation later on. The only part which absolutely must be stable after registration is the whio_epfs_namer_reg::api member (a whio_epfs_namer_api *) of the registration object (reg.api in the above example). Its memory must be valid for the lifetime of the namer, and it is normally a static/const object defined in a private implementation file.

TODOs

  • There is not yet an API for removing a namer from a given EFS. This is bad because if we somehow hose the namer we cannot replace it, and if we do not want it, we cannot remove it.
  • Not all of the various EPFS tools use the naming information yet (or support them in a limited form).