libfossil
Fossil Architecture Overview

An introduction to the Fossil architecture.

These docs are basically just a reformulation of other, more detailed, docs which can be found via the main Fossil site, e.g.:

Fossil's internals break down into two basic parts. The first is a "collection of blobs." The simplest way to think of this (and it's not far from the full truth) is as a directory containing lots of files, each one named after the SHA1 hash of its contents. This pool contains ALL content required for a repository: all other data can be generated from the data contained here. Included in the blob pool are so-called Artifacts. Artifacts are simple text files with a very strict format, which hold the identities of, the relationships between, and other metadata for each type of blob in the pool. The most basic Artifact type is called a Manifest. A Manifest tells us, amongst other things, which of the SHA1-based file names corresponds to which "real" file name, which version (or versions!) is the parent, and other data required for a "commit" operation.
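
To make the "directory of files named after their hashes" picture concrete, here is a minimal sketch of a content-addressed pool in Python. The BlobPool class and its put/get methods are hypothetical names invented for this illustration; they are not part of libfossil's API, and the real storage is more involved.

```python
import hashlib


class BlobPool:
    """Toy in-memory content-addressed store (hypothetical, for illustration)."""

    def __init__(self):
        self._blobs = {}  # hex SHA1 name -> raw bytes

    def put(self, content: bytes) -> str:
        """Store content and return its name, the SHA1 hash of its bytes."""
        name = hashlib.sha1(content).hexdigest()
        self._blobs[name] = content  # identical content always gets the same name
        return name

    def get(self, name: str) -> bytes:
        """Fetch content by name, re-hashing it to detect corruption."""
        content = self._blobs[name]
        if hashlib.sha1(content).hexdigest() != name:
            raise ValueError("blob content does not match its name: " + name)
        return content


pool = BlobPool()
first_id = pool.put(b"hello fossil\n")
second_id = pool.put(b"hello fossil\n")  # same bytes, so same name, stored once
```

Because a blob's name is derived from its content, storing the same bytes twice yields the same name, and any change to the bytes yields a different one.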

The blob pool and the Manifests are all a Fossil repository really needs in order to function. On top of that basis, other forms of Artifacts provide features such as tagging (which is the basis of branching and merging), wiki pages, and tickets. From those Artifacts, Fossil can create/calculate all sorts of information. For example, as new Artifacts are inserted it transforms the Artifact's metadata into a relational model which sqlite can work with. That leads us to what is conceptually the next-higher-up level, but is in practice a core-most component...
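
The transformation of Artifact metadata into a relational model might be sketched as follows. This is an illustration under stated assumptions, not Fossil's actual format or schema: the manifest text is a simplified, hypothetical one-card-per-line layout (C for a comment, F for a file name and blob ID, P for parent IDs), the IDs are placeholders rather than real hashes, and the file_entry and parent_link tables are invented for the example.

```python
import sqlite3

# Placeholder 40-character IDs standing in for real SHA1 hashes.
README_ID = "1" * 40
MAIN_C_ID = "2" * 40
PARENT_ID = "3" * 40
MANIFEST_ID = "4" * 40

# Simplified, hypothetical manifest text: one "card" per line.
MANIFEST_TEXT = (
    "C initial-import\n"
    f"F README.txt {README_ID}\n"
    f"F src/main.c {MAIN_C_ID}\n"
    f"P {PARENT_ID}\n"
)


def parse_manifest(text):
    """Split the simplified manifest into (comment, files, parents)."""
    comment, files, parents = None, [], []
    for line in text.splitlines():
        if not line:
            continue
        card, _, rest = line.partition(" ")
        if card == "C":
            comment = rest
        elif card == "F":
            # Assumes file names contain no spaces (a simplification).
            name, blob_id = rest.split(" ")
            files.append((name, blob_id))
        elif card == "P":
            parents.extend(rest.split(" "))
    return comment, files, parents


def index_manifest(db, manifest_id, text):
    """Insert the manifest's metadata as rows, inside one transaction."""
    _comment, files, parents = parse_manifest(text)
    with db:  # commits on success, rolls back on error
        db.executemany(
            "INSERT INTO file_entry(manifest_id, name, blob_id) VALUES (?, ?, ?)",
            [(manifest_id, name, blob_id) for name, blob_id in files],
        )
        db.executemany(
            "INSERT INTO parent_link(child_id, parent_id) VALUES (?, ?)",
            [(manifest_id, parent) for parent in parents],
        )


db = sqlite3.connect(":memory:")
db.executescript(
    """
    CREATE TABLE file_entry(manifest_id TEXT, name TEXT, blob_id TEXT);
    CREATE TABLE parent_link(child_id TEXT, parent_id TEXT);
    """
)
index_manifest(db, MANIFEST_ID, MANIFEST_TEXT)

file_rows = db.execute(
    "SELECT name, blob_id FROM file_entry WHERE manifest_id = ? ORDER BY name",
    (MANIFEST_ID,),
).fetchall()
parent_rows = db.execute(
    "SELECT parent_id FROM parent_link WHERE child_id = ?", (MANIFEST_ID,)
).fetchall()
```

Once the metadata is in tables, questions such as "which files does this version contain?" become ordinary SQL queries instead of repeated parsing of Artifact text.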

Storage. Fossil's core model is agnostic about how its blobs are stored, but libfossil and fossil(1) both make heavy use of sqlite to implement many of their features. These include:

  • Transaction-capable storage. It's almost impossible to corrupt a Fossil db in normal use. sqlite3 offers one of the most robust general-purpose file formats available.
  • The storage of the raw blobs.
  • Artifact metadata is transformed into various DB structures which allow libfossil to traverse historical data much more efficiently than would be possible without a db-like infrastructure (and everything that implies). These structures are updated incrementally as new Artifacts are stored in a repository, either via local edits or by syncing in remote content.
  • A tremendous amount of the "leg-work" in processing the repository state is handled by SQL queries, without which the library would easily require 5-10x more code in the form of equivalent hard-coded data structures and corresponding functionality. The db approach allows us to create ad-hoc structures as we need them, providing us a great deal of flexibility.
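
As a rough illustration of two of the points above (transactional updates, and history traversal done in SQL rather than in hand-written data structures), the following sketch uses Python's standard sqlite3 module. The parent_link table, the short version IDs, and the ancestors() helper are all hypothetical and chosen only for this example; Fossil's real schema and queries differ.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE parent_link(child_id TEXT, parent_id TEXT)")

# A tiny hypothetical history: c1 <- c2 <- c3 <- c4, where c4 also merges b1.
with db:
    db.executemany(
        "INSERT INTO parent_link(child_id, parent_id) VALUES (?, ?)",
        [("c2", "c1"), ("c3", "c2"), ("c4", "c3"), ("c4", "b1")],
    )


def ancestors(db, start):
    """Return every ancestor of `start`, found with a recursive SQL query."""
    rows = db.execute(
        """
        WITH RECURSIVE anc(id) AS (
            SELECT parent_id FROM parent_link WHERE child_id = ?
            UNION
            SELECT p.parent_id
              FROM parent_link AS p
              JOIN anc AS a ON p.child_id = a.id
        )
        SELECT id FROM anc ORDER BY id
        """,
        (start,),
    ).fetchall()
    return [row[0] for row in rows]


c4_ancestors = ancestors(db, "c4")

# A failure in the middle of an update leaves the database unchanged:
# the `with db:` block rolls the transaction back when an exception escapes.
try:
    with db:
        db.execute("INSERT INTO parent_link(child_id, parent_id) VALUES ('c5', 'c4')")
        raise RuntimeError("simulated failure mid-update")
except RuntimeError:
    pass

c5_ancestors = ancestors(db, "c5")  # empty: the insert was rolled back
```

The ancestry walk is a single query; the equivalent graph traversal written by hand would need its own visited-set and queue bookkeeping.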

All content in a Fossil repository is in fact stored in a single database file. Fossil additionally uses another database (a "checkout" db) to keep track of local changes, but the repo contains all "fossilized" content. Each copy of a repo is a full-fledged repo, each capable of acting as a central copy for any number of clones or checkouts.

That's really all there is to understand about Fossil. How it does its magic (keeping everything aligned properly, merging in content, storing content, etc.) is an internal detail which most clients will not need to know anything about in order to make use of fossil(1). Using libfossil effectively, though, does require learning _some_ amount of how Fossil works. That will require taking some time with _other_ docs, however: see the links at the top of this section for some starting points.

Sidebar:

  • The only file-level permission Fossil tracks is the "executable" (a.k.a. "+x") bit. It internally marks symlinks as a permission attribute, but that is applied much differently than the executable bit and only does anything useful on platforms which support symlinks.
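
For readers unfamiliar with these attributes, the sketch below shows how the executable bit and symlink status of a file can be inspected on a POSIX-like platform using Python's standard library. The file_kind() helper and its three labels are hypothetical and only illustrate the distinction; they do not reflect how Fossil records these attributes internally.

```python
import os
import stat
import tempfile


def file_kind(path):
    """Classify a path as "symlink", "executable", or "regular".

    Uses lstat() so that a symlink is reported as such instead of being
    followed to its target. Assumes a platform with POSIX-style modes.
    """
    mode = os.lstat(path).st_mode
    if stat.S_ISLNK(mode):
        return "symlink"
    if mode & stat.S_IXUSR:  # owner-executable ("+x") bit
        return "executable"
    return "regular"


with tempfile.TemporaryDirectory() as tmp:
    script = os.path.join(tmp, "build.sh")
    with open(script, "w") as fh:
        fh.write("#!/bin/sh\n")

    os.chmod(script, 0o644)
    kind_before = file_kind(script)  # no +x bit set yet

    os.chmod(script, 0o755)
    kind_after = file_kind(script)  # +x bit now set

    link = os.path.join(tmp, "latest")
    os.symlink(script, link)  # requires a platform that supports symlinks
    kind_link = file_kind(link)
```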