libfossil
Fossil is/is not...

In the process of porting the main fossil application into library form, the following things have become very clear (or been reinforced)...

Fossil is...

  • _Exceedingly_ robust. Not only is sqlite literally the single most robust application-agnostic container file format on the planet, but Fossil goes way out of its way to ensure that what gets put in is what gets pulled out. It cuts zero corners on data integrity, even adding in checks which seem superfluous but provide another layer of data integrity (I'm primarily talking about the R-card here, but there are other validation checks). It does this at the cost of memory and performance (that said, it's still easily fast enough for its intended uses). "Robust" doesn't mean that it never crashes nor fails, but that it does so with (insofar as is technically possible) essentially zero chance of data loss/corruption.
  • Long-lived: the underlying data format is independent of its storage format. It is, in principle, usable by systems as yet unconceived by the next generation of programmers. This implementation is based on sqlite, but the model can work with arbitrary underlying storage.
  • Amazingly space-efficient. The size of a repository database necessarily grows as content is modified. However, Fossil's use of zlib-compressed deltas, using a very space-efficient delta format, leads to tremendous compression ratios. As of this writing (September, 2013), the main Fossil repo contains approximately 1.3GB of content, were we to check out every single version in its history. Its repository database is only 42MB, however, equating to a 32:1 compression ratio. Ratios in the range of 20:1 to 40:1 are common, and more active repositories tend to have higher ratios. The TCL core repository, with just over 15 years of code history (imported, of course, as Fossil was introduced in 2007), is only 187MB, with 6.2GB of content and a 33:1 compression ratio.

Fossil is not...

  • Memory-light. Even very small uses can easily suck up 1MB of RAM and many operations (verification of the R card, for example) can quickly allocate and free up hundreds of MB because they have to compose various versions of content on their way to a specific version. To be clear, that is total RAM usage, not _peak_ RAM usage. Peak usage is normally a function of the content it works with at a given time. For any given delta application operation, Fossil needs the original content, the new content, and the delta all in memory at once, and may go through several such iterations while resolving deltified content. Verification of its 'R-card' alone can require a thousand or more underlying DB operations and hundreds of delta applications. The internals use caching where it would save us a significant amount of db work relative to the operation in question, but relatively high memory costs are unavoidable. That's not to say we can't optimize a bit, but first make it work, then optimize it. The library takes care to re-use memory buffers where it is feasible (and not too intrusive) to do so, but there is yet more RAM to be optimized away in this regard.