whio  Artifact Content

Artifact 9b13122ae77e5199fd3c4af7b2100513e124caed:

Wiki page [whio_epfs] by stephan 2011-05-23 18:50:54.
D 2011-05-23T18:50:54.564
L whio_epfs
P ae12fb187c760a45b1ae4ef195e92daf6931f2e2
U stephan
W 9424
<strong>ACHTUNG: AS OF 20110523, THIS PAGE IS NOW MAINTAINED IN THE NEW WIKI:</strong> [http://whiki.wanderinghorse.net/wikis/whio/?page=whio_epfs]

See also: [whio_dev], [whio_epfs_tools], [whio_epfs_namer]


Files: <tt>whio_epfs.h</tt> (public API), <tt>whio_epfs_config.h</tt> (configuration options), <tt>whio_epfs_internal.h</tt> (private/internal API)

whoi_epfs (hereafter called EPFS, where EPFS=<em>Embedded Pseudo-FileSystem</em>) is a component of the whio i/o library, originally written to replace most of the internals of [http://code.google.com/p/whefs|libwhefs]. libwhefs is this library's genetic father, and that library's web pages describe, in detail, the concepts of EPFS.

EPFS basically provides basic support for an "embedded" filesystem. A so-called container file (a.k.a. EFS container) acts as storage for a number of pseudofiles. The EFS container file, as well as the pseudofiles within an EFS, use the [whio_dev] interface for all i/o.

An EPFS container is similar to an archive file, like a ZIP or TAR file, except that it allows random read/write access to the files within the archive. They can grow and shrink just like normal files, but they live entirely within the EFS container file.

See [whio_epfs_tools] for a list of tools for manipulating EPF containers.

<h2>Library Features</h2>

  *  Simple to use and pedantically-well documented. See the API docs in the headers.
  *  Provides an API for creating and manipulating EFS container files and "pseudofiles" within those containers.
  *  An EFS can operate in r/o or r/w modes. Pseudofiles can be r/o or r/w (if the EFS is r/w).
  *  All i/o, for both the underlying storage and embedded files, is via the [whio_dev] interface. This allows an EFS container to be stored in any storage supported by whio_dev (including a file, in memory, or in a pseudofile embedded in another EFS).
  *  The EFS container file format is bitness- and endian-neutral. EFSs created on one platform can be read on other platforms.
  *  Very low memory usage (under 1k concurrently for typical uses) and an explicit focus on eliminating calls to <tt>malloc()</tt> where possible (e.g. by using stack-allocated memory whenever possible). Optionally supports using a client-supplied de/allocator for its internal allocations, allowing an EPFS to operate without internally using <tt>malloc()</tt> at all.
  *  Fairly good all-around performance. All of the significant algorithms are (as of 20100312) amortized O(1). :-D
  *  Gets run through [http://www.valgrind.org/|Valgrind] fairly often to unsure that it doesn't leak memory.
  *  Allows clients to add their own flags (32 bits of data) to each pseudofile. This can be used to store the type of an entry's data, Unix filesystem access rights, or similar.
  *  Comes with [whio_epfs_tools|several tools] with which one can create and manipulate EFS containers. These tools also act as sample applications for how to use the library.
  *  Thorough error checking and validation of ranges (and the like). There are no known crash cases if one uses the API in conformance with the API docs.
  *  It's one of only two open source projects of its type (that i'm aware of), the other being [http://code.google.com/p/whefs|libwhefs] (where EPFS was born).

<h2>Library Misfeatures</h2>

  *  The file format is likely to undergo incompatible changes from [/timeline?y=ci|commit to commit] until the library stabilizes.
  *  The upper limit on the number of pseudofile entries an EFS may contain must be specified when the EFS container is created, and cannot be modified later on. (We now have the components we need to create an unbounded inode table, but that will require a notable amount of re-architecting, so implementing it not a high priority.)
  *  Does not directly support the notion of OS-specific access rights. Within an EFS container, there is no concept of access rights except the read-only resp. read/write modes. High-level access rights were deemed superfluous for an embedded context.
  *  It is not legal for more than one thread to use an EPFS object, and doing so <em>will</em> corrupt the EFS state at some point.
  *  Currently [whio_epfs_locking|only skeletal support for file locking]: the library locks the whole container file, as opposed to locking individual pseudofiles. This means that processes may wait on a lock before trying to us the EFS. (Without this, multiple writers would eventually cause corruption.) Improving the granularity of the locking (e.g. only lock opened inodes and blocks) is on the TODO list.
  *  A poorly-timed C signal during a <tt>write()</tt> or <tt>flush()</tt> could lead to undefined behaviour, including EFS corruption (lost blocks, orphaned or mis-linked inodes, and the like).
  *  On Linux systems (and possibly others) the system-level functions <tt>gmtime_r()</tt> and <tt>mktime()</tt> (which we use for updating inode timestamps) both call <tt>malloc()</tt>. Thus valgrind may report that this code allocates <em>way</em> more often than it really does. In my simple tests, those functions account for more than 98% of all calls to <tt>malloc()</tt>.

<h2>Code Status</h2>

EPFS is considered beta. It seems to work, but the nature of the problem means there are lots of room for errors. Don't use it for data you cannot afford to lose. That said, it works quite well for me, and i've not yet experienced any data loss or corruption.

<h2>Why EPFS does not support inode names</h2>

(See the [whio_epfs_namer] page for information about naming pseudofiles. The information below is not quite irrelevant, but some support for naming inodes was added in December 2010, and the text below is not quite up to date.)

The most notable missing feature of EPFS is the mapping of filenames to pseudofiles. Instead, it only maps unique sequential IDs to pseudofiles.

The justifications for this lacking feature is as follows...

There are many options for mapping names to inodes, with varying storage and memory requirements, as well as widely varying performance characteristics. Within [http://code.google.com/p/whefs|libwhefs] (this library's genetic father), an estimated 20-25% of the code has _something_ to do with inode names (loading, saving, searching, or caching them). Since, however, i haven't been 90% (or more) satisfied with any one inode naming solution (and i've tried a small handful of them), i decided to refactor whefs into two layers:

  *  whio_epfs provides all of the primary basic features. This includes management of inodes and data blocks, as well as random read/write access to inodes via the [whio_dev] interface. This part is inherently memory-light and has well-definable performance characteristics.
  *  Higher-level clients (like whefs) can then provide inode naming support (and other features) on top of that using whatever mechanisms they wish. e.g. if someone could create an on-disk b-tree (or similar search structure) which uses [whio_dev] as its storage, we could store the names as a search tree in the first inode of the EFS. (My attempts at implementing such a class have failed. i've been unable to find a good/small FILE-based b-tree i can use as a code base, and my own understanding of the data structures and algos is too poor to hack it together myself.)

Benefits of the restructuring include:

  *  The EPFS layer now takes care of all the trickiest parts.
  *  The EPFS layer can get by with very, very little memory (under 500 bytes for small use cases). Its memory costs depend only on how many pseudofiles are currently opened and how many data blocks each of those inodes contains. If inodes are never opened, it never has to dynamically allocate any memory.
  *  The EPFS layer has support for its own internal memory allocator which can use client-provided memory (e.g. a stack-allocated or static buffer) for its internal allocations, allowing it to run without internally calling <tt>malloc()</tt>.
  *  We can experiment with different name-to-inode mapping approaches in whefs without touching the core EFS code, which should help keep the EFS code stabler.
  *  Storage locking got simpler because of a locking abstraction added to the the core [whio_dev] API. (But the record-level locking turns out to be hyper-complicated because EPFS allows a client to open any given inode an arbitrary number of times in read-only or read-write modes.)

Some other side effects (not necessarily benefits) of the restructuring include:

  *  The EPFS container must be embedded within the whefs container (e.g. as a whio subdevice). whefs will then manage only the parts of the container which have to do with its inode/names mapping, and leave the rest to EPFS.
  *  whefs will be able to add blocks dynamically to the EFS, but the inode count is still set when the EFS container is created.
  *  We can tell EPFS to fall back to system memory allocators if the client-supplied [whio_epfs_mempool|memory pool] runs out, or we can tell it to fail if it needs more memory than we give it. This will give us fairly flexible control over EPFS' memory consumption.
  *  Though the whefs public API will likely not change much, the internals will have to be completely rewritten (but will require far, far less code this time, since EPFS takes over most of the features and core data structures).

Z 76d859b831e0f8e0004cae71b428aa56