whio  Update of "whio_epfs_consistency"

Overview

Artifact ID: b8db45cefec813bc3a33eba497efd2492b3520a2
Page Name: whio_epfs_consistency
Date: 2011-05-23 18:50:54
Original User: stephan
Parent: 36918588c925baae69b95322b6f249a8b9cc2cc2
Content

ACHTUNG: AS OF 20110523, THIS PAGE IS NOW MAINTAINED IN THE NEW WIKI: http://whiki.wanderinghorse.net/wikis/whio/?page=whio_epfs_consistency

EFS Consistency

The whio_epfs API does not have a whio_epfs_fsck() function, a consistency-checking tool with which to confirm that an EFS is completely intact. Eventually i would like to add such a tool, but it is more complex a problem than i'm up for tackling for the time being. This page describes what EPFS does offer in terms of EFS container consistency...

Built-in/Automatic Consistency Checks

All of the metadata in an EFS is stored in blocks. Each different type of metadata block has a fixed size. When we write, e.g., inode metadata to the EFS we encode the inode to such a memory block and write that block using a single seek followed by a single write operation. For reads we do the converse: read into a small block of raw memory and decode the metadata from that block. All of the data is encoded using an endian-neutral format, and encoded numbers have fixed sizes so there are no surprises when switching between 32- and 64-bit builds.
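As a rough illustration of the fixed-size, endian-neutral encoding described above, here is a minimal sketch. It assumes big-endian byte order and a made-up two-field inode record; the names (demo_inode, demo_put_be(), etc.) and the layout are hypothetical and are not taken from the whio_epfs sources.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: a 4-byte id followed by an 8-byte size.
   The real EPFS inode layout is different. */
enum { DEMO_INODE_ENCODED_SIZE = 4 + 8 };

typedef struct {
    uint32_t id;
    uint64_t size;
} demo_inode;

/* Writes the low nbytes bytes of v to dest, most significant byte first.
   The result is identical on little- and big-endian hosts. */
static void demo_put_be(unsigned char *dest, uint64_t v, size_t nbytes) {
    size_t i;
    for (i = 0; i < nbytes; ++i) {
        dest[nbytes - 1 - i] = (unsigned char)(v & 0xFF);
        v >>= 8;
    }
}

/* Reads nbytes bytes (most significant first) back into an integer. */
static uint64_t demo_get_be(const unsigned char *src, size_t nbytes) {
    uint64_t v = 0;
    size_t i;
    for (i = 0; i < nbytes; ++i) {
        v = (v << 8) | src[i];
    }
    return v;
}

/* Encodes the whole record into one fixed-size memory block, which a
   caller could then store with a single seek and a single write. */
static void demo_inode_encode(const demo_inode *ino, unsigned char *dest) {
    demo_put_be(dest, ino->id, 4);
    demo_put_be(dest + 4, ino->size, 8);
}

/* The converse: decode a record from a block which was read in one go. */
static void demo_inode_decode(demo_inode *ino, const unsigned char *src) {
    ino->id = (uint32_t)demo_get_be(src, 4);
    ino->size = demo_get_be(src + 4, 8);
}
```

Because every field has a fixed encoded width, the record always occupies DEMO_INODE_ENCODED_SIZE bytes, regardless of the sizes of the native integer types in a given build.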

The encoded metadata blocks are fairly resistant to corruption, in the sense that it is quite unlikely that we could successfully decode a corrupted metadata block. Each metadata block is preceded by a consistency-checking byte (with a fixed value) and each field stored within that block also has a consistency-checking tag byte. (Any given metadata block typically has 3-5 fields or so.) If any of those tag bytes is not what we expect, deserialization of the metadata fails and the API is likely to return a whio_rc.ConsistencyError (though it might show up as a different error code in some higher-level APIs). It is possible that the tag bytes are intact but the field values they guard become corrupted, in which case we enter the realm of Undefined Behaviour, but to detect that we would need to checksum every metadata block. (That might be an interesting addition someday.)
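The tag-byte scheme can be sketched roughly as follows. The tag values, the two-field record and the error codes here are invented for illustration (the real code reports whio_rc.ConsistencyError and uses its own tags and layouts); only the general idea of a block tag plus one tag per field comes from the description above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical tag values and layout: 1 block tag + 2 * (1 field tag + 4 bytes). */
enum {
    DEMO_TAG_BLOCK = 0xB1,
    DEMO_TAG_U32 = 0xC4,
    DEMO_BLOCK_SIZE = 1 + 2 * (1 + 4)
};
enum { DEMO_OK = 0, DEMO_ERR_CONSISTENCY = 1 };

typedef struct {
    uint32_t id;
    uint32_t next;
} demo_meta;

/* Writes a field tag followed by a big-endian 32-bit value. */
static unsigned char *demo_put_u32(unsigned char *p, uint32_t v) {
    *p++ = (unsigned char)DEMO_TAG_U32;
    *p++ = (unsigned char)(v >> 24);
    *p++ = (unsigned char)(v >> 16);
    *p++ = (unsigned char)(v >> 8);
    *p++ = (unsigned char)v;
    return p;
}

/* Returns NULL if the field tag is not the expected one. */
static const unsigned char *demo_get_u32(const unsigned char *p, uint32_t *v) {
    if (p[0] != DEMO_TAG_U32) return NULL;
    *v = ((uint32_t)p[1] << 24) | ((uint32_t)p[2] << 16)
       | ((uint32_t)p[3] << 8) | (uint32_t)p[4];
    return p + 5;
}

static void demo_meta_encode(const demo_meta *m, unsigned char *buf) {
    *buf++ = (unsigned char)DEMO_TAG_BLOCK;
    buf = demo_put_u32(buf, m->id);
    demo_put_u32(buf, m->next);
}

/* Decodes into a temporary so that *m is only modified on success. */
static int demo_meta_decode(demo_meta *m, const unsigned char *buf) {
    demo_meta tmp;
    const unsigned char *p = buf;
    if (*p++ != DEMO_TAG_BLOCK) return DEMO_ERR_CONSISTENCY;
    if (!(p = demo_get_u32(p, &tmp.id))) return DEMO_ERR_CONSISTENCY;
    if (!(p = demo_get_u32(p, &tmp.next))) return DEMO_ERR_CONSISTENCY;
    *m = tmp;
    return DEMO_OK;
}
```

Damaging the block tag or either field tag makes demo_meta_decode() fail. Damaging one of the value bytes does not: the decode succeeds and silently yields a wrong number, which is exactly the case that would need a per-block checksum to catch.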

Given that, the fundamental consistency checking of an EFS is baked into every read/write operation that it performs, with the exception that it does not consistency-check client-provided bytes (pseudofile content bytes). (i'm not sure how we might do that without having to checksum the whole file each time.)

However, it is easy to envision corruption cases which might not disrupt normal use or may cause intermittent problems. It's those types of problems which should be findable by a filesystem checking ("fsck") application.

Where's whio_epfs_fsck()?

It doesn't exist yet, and i may never get around to writing it.

Here's a list of things we could potentially check for with such a tool:

  • "Orphaned" blocks (marked as used but have no owning inode)
  • "Extra" namer entries: unused inodes which nevertheless have names assigned to them.
  • Length mismatches: the real length of an inode (its number of blocks) does not agree with its virtual length (its recorded byte count).
  • Multi-linked blocks: a block to which more than one other block points.
  • Unused inodes which nonetheless have blocks associated with them.
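
A few of the checks listed above can be sketched against a simplified in-memory model. Everything here is an assumption made for illustration: the fsck_inode and fsck_block structs, 1-based block IDs with 0 meaning "no block", and the fixed block limit. A real tool would have to read this information from the EFS via the whio_epfs API.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model: block IDs are 1-based, 0 means "no block". */
typedef struct { int used; uint32_t first_block; } fsck_inode;
typedef struct { int used; uint32_t next; } fsck_block;

typedef struct {
    size_t orphaned;                  /* used blocks not reachable from a used inode */
    size_t multi_linked;              /* blocks pointed to by more than one block */
    size_t unused_inodes_with_blocks; /* unused inodes which still reference a block */
} fsck_report;

enum { FSCK_MAX_BLOCKS = 64 };

/* Returns 0 on success, -1 if the model is too large for this sketch. */
static int fsck_check(const fsck_inode *inodes, size_t ni,
                      const fsck_block *blocks, size_t nb,
                      fsck_report *out) {
    unsigned char reachable[FSCK_MAX_BLOCKS] = {0};
    unsigned char links[FSCK_MAX_BLOCKS] = {0};
    size_t i;
    if (nb > FSCK_MAX_BLOCKS) return -1;
    out->orphaned = 0;
    out->multi_linked = 0;
    out->unused_inodes_with_blocks = 0;
    /* Walk each used inode's block chain and mark what it owns.
       The step counter guards against cycles in a corrupted chain. */
    for (i = 0; i < ni; ++i) {
        uint32_t id = inodes[i].first_block;
        size_t steps = 0;
        if (!inodes[i].used) {
            if (id != 0) ++out->unused_inodes_with_blocks;
            continue;
        }
        while (id != 0 && id <= nb && steps++ < nb) {
            reachable[id - 1] = 1;
            id = blocks[id - 1].next;
        }
    }
    /* Count how many used blocks point at each block. */
    for (i = 0; i < nb; ++i) {
        uint32_t n = blocks[i].next;
        if (blocks[i].used && n != 0 && n <= nb && links[n - 1] < 255) {
            ++links[n - 1];
        }
    }
    for (i = 0; i < nb; ++i) {
        if (blocks[i].used && !reachable[i]) ++out->orphaned;
        if (links[i] > 1) ++out->multi_linked;
    }
    return 0;
}
```

The sketch only reports problems. Deciding how to repair them (e.g. freeing orphaned blocks versus attaching them to a recovery inode) is a separate and harder question.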

i've never actually seen any of those problems occur, but i know (because i wrote the code) that they are possible when e.g. a certain operation fails halfway through due to an I/O error or an untimely C signal. Some operations have to update several different metadata blocks, and a serious failure (e.g. I/O, allocation, or consistency-checking errors) in the middle of such an operation can easily lead to inconsistent state.

Having some sort of journal file would of course be cool, but that would add a lot more complexity to the code. However, i do think that most of the journaling code could be abstracted into a custom whio_dev implementation which we would shove between an EPFS instance and its underlying storage.
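
To make the "device in the middle" idea a bit more concrete, here is a very rough sketch. It does not use the real whio_dev interface: demo_dev is a stand-in with a single write operation, and the journal only lives in memory, so it shows deferred, all-or-nothing application of a multi-block update but not crash recovery. A real journal would have to persist its records before applying them.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical, minimal device interface (NOT the whio_dev API). */
typedef struct demo_dev demo_dev;
struct demo_dev {
    int (*write)(demo_dev *self, size_t pos, const void *src, size_t n);
    void *state;
};

/* A tiny in-memory device standing in for the underlying storage. */
typedef struct { unsigned char bytes[64]; } demo_mem;

static int demo_mem_write(demo_dev *self, size_t pos, const void *src, size_t n) {
    demo_mem *m = (demo_mem *)self->state;
    if (pos > sizeof m->bytes || n > sizeof m->bytes - pos) return -1;
    memcpy(m->bytes + pos, src, n);
    return 0;
}

/* The journaling layer: it records writes instead of applying them. */
enum { DEMO_J_MAX = 8, DEMO_J_REC = 16 };
typedef struct { size_t pos; size_t n; unsigned char data[DEMO_J_REC]; } demo_j_rec;
typedef struct {
    demo_dev *target;            /* the device the writes are destined for */
    demo_j_rec recs[DEMO_J_MAX]; /* pending writes, in order */
    size_t count;
} demo_journal;

static int demo_journal_write(demo_dev *self, size_t pos, const void *src, size_t n) {
    demo_journal *j = (demo_journal *)self->state;
    demo_j_rec *r;
    if (j->count == DEMO_J_MAX || n > DEMO_J_REC) return -1;
    r = &j->recs[j->count++];
    r->pos = pos;
    r->n = n;
    memcpy(r->data, src, n);
    return 0;
}

/* Applies all pending writes, in order, to the target device. */
static int demo_journal_commit(demo_journal *j) {
    size_t i;
    for (i = 0; i < j->count; ++i) {
        int rc = j->target->write(j->target, j->recs[i].pos,
                                  j->recs[i].data, j->recs[i].n);
        if (rc != 0) return rc;
    }
    j->count = 0;
    return 0;
}

/* Discards all pending writes; the target device is left untouched. */
static void demo_journal_rollback(demo_journal *j) {
    j->count = 0;
}
```

With this shape, an operation which updates several metadata blocks would write through the journaling device and commit only once all of its writes have been recorded; a failure partway through would be followed by a rollback instead.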