cson  Update of "cson_session"



Artifact ID: e6b09ba61b4f1afaffc5e1c31451c2029f1ceb9e
Page Name:cson_session
Date: 2011-05-09 17:58:44
Original User: stephan
Parent: bf7570f60fab6e06d2d49e94c6d3c923ce66aee0

ACHTUNG: THIS PAGE IS NOW MAINTAINED IN THE NEW WIKI: http://whiki.wanderinghorse.net/wikis/cson/?page=cson_session


The cson_session API is a small extension to the cson core. It defines the operations necessary for loading and storing persistent application sessions as cson-based JSON trees. It was originally written as part of cson_cgi but is independent of that code.

By "persistent session" we mean a JSON object tree (in this case, implemented as a cson object tree) which can be saved and later loaded again, so that application state survives from one run of the application to the next. With this in place, applications can easily save/restore their app-specific state to JSON.

The API only defines the (small) interface needed for managing sessions, and does not specify how the sessions are to be stored. The library comes with these implementations, however:

  • cson_sessmgr_file: file-based storage. This requires that the user account under which the application is running has write access to the session storage (or at least read access, to load an existing session). For local applications this is rarely a problem, but it can be a problem when run as a CGI on a hoster where CGI apps run as a different user than the account holder.
  • cson_sessmgr_cpdo: uses a database as the backend. The cpdo database access abstraction library is used here to support any database it supports (currently MySQL v5 and sqlite3). (cpdo is included in this source tree and need not be installed separately.) When using sqlite3 storage the database and the directory containing it must be writable by the user account under which CGIs are run. This is because sqlite3 creates a temporary journal file there and will fail if it cannot create it.
  • cson_sessmgr_whio_ht: a file-based shared hashtable, where session IDs are the keys and JSON data are the values. It is based on whio_ht, and that code is included in this tree. The makefile does not build this one by default because (A) i'm the only one likely to use it and (B) it requires including/linking a huge amount of external code from libwhio.
  • cson_sessmgr_whio_epfs: an "embedded filesystem" container file, based on libwhio_epfs. Like the hashtable-based solution, this one is not compiled in by default (for the same reasons given above).
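
The sqlite3 requirement above (both the database file and its directory must be writable) can be checked up front, before a session manager is ever created. The following is only a sketch and not part of cson or cpdo: the function name sqlite_storage_writable() is made up for this example, and it assumes a POSIX system where access() is available.

```c
#include <assert.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper (not a cson/cpdo function): returns 1 if the
   directory holding db_path is writable and, when the database file
   already exists, the file is writable too. Returns 0 otherwise.
   Note that access() tests the real user ID, not the effective one. */
int sqlite_storage_writable(char const *db_path)
{
    char dir[1024];
    char const *slash;
    size_t n;
    assert(db_path != NULL);
    slash = strrchr(db_path, '/');
    if (!slash) {
        strcpy(dir, ".");            /* bare file name: current directory */
    } else if (slash == db_path) {
        strcpy(dir, "/");            /* file directly under the root */
    } else {
        n = (size_t)(slash - db_path);
        if (n >= sizeof dir) return 0;
        memcpy(dir, db_path, n);
        dir[n] = '\0';
    }
    /* The journal file is created next to the database, so the
       directory needs write and search permission. */
    if (access(dir, W_OK | X_OK) != 0) return 0;
    /* The database may not exist yet; if it does, it must be writable. */
    if (access(db_path, F_OK) == 0 && access(db_path, W_OK) != 0) return 0;
    return 1;
}
```

A CGI could call this once at startup and emit a useful error message instead of failing later inside the database layer.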

Regarding the file/directory permissions for CGIs: most hosters (in my experience) run CGI scripts as the account holder's user, and thus CGI apps tend to have full access to anything the account holder has. Some configurations may, on the other hand, execute CGI scripts as, e.g., the www or apache user, in which case the session storage has to be writable by that user (and probably cannot be cleaned up by the account holder!).

See the file include/wh/cson/cson_session.h for the full API docs and for information on how to configure the back-ends for the local environment. Configuration options are given as cson-based JSON trees, which means the configuration data is easy to import from JSON config files or to create programmatically. There are examples of such configuration details shown further down on this page.

The API does not provide interfaces for handling administrative details like cleaning up expired sessions from the underlying storage. Implementing these things is, more often than not, much simpler using backend/system-specific scripting than it would be to do it in C. We also cannot generically implement the cleanup logic at the library level. For example, some clients might require that the session storage not be deleted but instead be moved to an archive. Thus cleanup of stale session storage is left to the system administrator (or the developer of the client app which creates the sessions). A periodic cron job, or app logic which runs the cleanup commands every so often, is often a viable solution.
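
For apps which prefer to do the cleanup in their own code rather than in a script, the idea can be sketched in C for file-based storage. Everything here is an assumption made for illustration: the function name purge_stale_sessions(), the one-file-per-session layout, and the ".json" suffix are not taken from cson, and the code relies on POSIX directory and stat() calls.

```c
#include <assert.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical cleanup routine: deletes regular files named *.json in
   dir whose modification time is more than max_age_sec seconds before
   now. Returns the number of files removed, or -1 if dir cannot be
   opened. */
int purge_stale_sessions(char const *dir, long max_age_sec, time_t now)
{
    DIR *d;
    struct dirent *e;
    int removed = 0;
    assert(dir != NULL);
    d = opendir(dir);
    if (!d) return -1;
    while ((e = readdir(d)) != NULL) {
        char path[1024];
        struct stat st;
        size_t n = strlen(e->d_name);
        int len;
        /* Only consider names ending in ".json" (assumed naming scheme). */
        if (n < 5 || strcmp(e->d_name + n - 5, ".json") != 0) continue;
        len = snprintf(path, sizeof path, "%s/%s", dir, e->d_name);
        if (len < 0 || (size_t)len >= sizeof path) continue;
        if (stat(path, &st) != 0 || !S_ISREG(st.st_mode)) continue;
        if (difftime(now, st.st_mtime) > (double)max_age_sec
            && unlink(path) == 0) {
            ++removed;
        }
    }
    closedir(d);
    return removed;
}
```

Passing the current time as a parameter keeps the function easy to test. A client which must archive sessions rather than delete them could replace the unlink() call with a rename() into an archive directory.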

It is important to note that session management is not "automatic." The client app must do the following:

  • Instantiate a session manager (C type cson_sessmgr) using one of the provided factory functions. (Or roll their own - it's quite easy.)
  • Generate or acquire a unique ID for the session. How you do this is your business. (i like UUIDs.)
  • Use the manager to load the session for a given unique ID. The session itself is a cson JSON object tree which can be queried/modified by the client to fetch/set the session data.
  • Eventually use the manager to save the session.
  • Optionally use the manager to delete that single session instance from the storage.
  • Free the session manager instance (not using free()! See the API docs!).
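
The steps above can be sketched with a stand-in manager which keeps one file per session. This is not the cson_sessmgr API: the fsess_* names are invented for this example, and the session is handled as raw JSON text instead of a cson object tree so that the code compiles without the library. See cson_session.h for the real signatures.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a file-based session manager. */
typedef struct fsess_mgr { char dir[256]; } fsess_mgr;

fsess_mgr *fsess_new(char const *dir)
{
    fsess_mgr *m;
    assert(dir != NULL);
    if (strlen(dir) >= sizeof m->dir) return NULL;
    m = malloc(sizeof *m);
    if (m) strcpy(m->dir, dir);
    return m;
}

/* Builds "<dir>/<id>.json". IDs containing '/' are rejected so that a
   session file can never land outside of dir. */
static int fsess_path(fsess_mgr const *m, char const *id, char *buf, size_t n)
{
    int rc;
    if (!id || !*id || strchr(id, '/')) return -1;
    rc = snprintf(buf, n, "%s/%s.json", m->dir, id);
    return (rc > 0 && (size_t)rc < n) ? 0 : -1;
}

/* Returns malloc()ed JSON text, or NULL if no session is stored for id. */
char *fsess_load(fsess_mgr const *m, char const *id)
{
    char path[512];
    char *json = NULL;
    long len;
    FILE *f;
    if (fsess_path(m, id, path, sizeof path) != 0) return NULL;
    f = fopen(path, "rb");
    if (!f) return NULL;
    if (fseek(f, 0, SEEK_END) == 0 && (len = ftell(f)) >= 0) {
        rewind(f);
        json = malloc((size_t)len + 1);
        if (json && fread(json, 1, (size_t)len, f) == (size_t)len) {
            json[len] = '\0';
        } else {
            free(json);
            json = NULL;
        }
    }
    fclose(f);
    return json;
}

int fsess_save(fsess_mgr const *m, char const *id, char const *json)
{
    char path[512];
    FILE *f;
    int ok;
    if (fsess_path(m, id, path, sizeof path) != 0) return -1;
    f = fopen(path, "wb");
    if (!f) return -1;
    ok = (fputs(json, f) >= 0);
    if (fclose(f) != 0) ok = 0;
    return ok ? 0 : -1;
}

int fsess_remove(fsess_mgr const *m, char const *id)
{
    char path[512];
    if (fsess_path(m, id, path, sizeof path) != 0) return -1;
    return remove(path) == 0 ? 0 : -1;
}

/* The manager has its own destructor; clients do not call free() on it. */
void fsess_free(fsess_mgr *m) { free(m); }

/* Walks through the steps from the list, for a caller-supplied ID. */
int fsess_lifecycle_demo(char const *dir, char const *id)
{
    fsess_mgr *m = fsess_new(dir);             /* instantiate the manager */
    char *json;
    int rc;
    if (!m) return -1;
    json = fsess_load(m, id);                  /* load; NULL means "new session" */
    free(json);                                /* a real app would parse and modify it */
    rc = fsess_save(m, id, "{\"visits\":1}");  /* save the (modified) session */
    if (rc == 0) rc = fsess_remove(m, id);     /* optional: delete the session */
    fsess_free(m);                             /* free via the manager's destructor */
    return rc;
}
```

Generating the session ID is deliberately left to the caller here, just as it is in the list above.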

The cson_cgi API's session support takes care of those details for clients of that API, loading and saving the session, as well as tracking a session ID via an HTTP cookie, leaving the client only to provide the configuration information and deal with the actual JSON object tree.

Memory and Storage Costs

The memory cost of a session manager object is, in and of itself, very low (well under 100 bytes, not counting the actual underlying storage handle). The concrete storage type plays the largest role in the cost. Here are some approximate memory usages (as measured by valgrind) for a demo/test application on a 32-bit system...

  • File-based: 15kb (most of this was unrelated to session management)
  • hashtable-based: 19kb
  • sqlite3: 105kb
  • mysql5: 222kb

(Each of those was about 5kb bigger on a 64-bit build.)

For the file- and hashtable-based managers, far less than 1kb was the actual session management, another 1kb or so was the session data, and the remaining 13kb (or so) was application state unrelated to session management. Thus we can, for purposes of the sqlite3/mysql values, derive the approximate session management costs by subtracting 13kb from their values: roughly 92kb for sqlite3 and 209kb for mysql5. These costs will of course differ depending on the size of the application's data set.

Storage costs are of course directly related to the size of the session data, plus any overhead imposed by the concrete session manager implementation. Session JSON data is stored without any extra spacing for human-readability, to save on storage space. Some managers, like the hashtable- or cpdo-based managers, have to buffer their JSON input/output into RAM before it can be processed or written to the storage, whereas others, namely the file-based manager, can stream the data directly through cson_parse() and cson_output() without an intermediary buffer.

i may someday add the option to compress the session data, but that would require compressing the contents entirely in memory because we cannot generically stream the data to/from the backend. e.g. we cannot stream column data to a database this way, so we would have to compress it in memory first and then write the compressed data as a BLOB (which is not inherently wrong, by the way). Likewise for reading. What i'm trying to say is that the load/save-related memory costs would approximately double due to the requirement of buffering the de/compressed data. And, given the (anticipated) average size of session data (a few kb or less), the de/compression-related buffering and overhead is likely to swamp any storage gains.

Implementing Custom Back-ends

The library only specifies a generic interface, and not how it is to be implemented. It also comes with very different implementations (two of which are built by default), but that's mainly to show that the interface itself is suited to its purpose (it was designed to support cson_cgi). The implication is that clients can provide concrete implementations of these interfaces to provide their own storage back-ends (e.g. an on-disk hashtable or an embedded filesystem).

The cson_sessmgr interface, presented in cson_session.h, declares only four operations for session management. Those are, in conceptual terms (the signatures actually look a bit different, and are simplified here for clarity):

  • cson_value * load( char const * session_ID ) loads a JSON object/array tree by session ID.
  • int save( cson_value * json, char const * session_ID ) saves a JSON object/array tree using the given session ID.
  • int remove( char const * session_ID ) removes the given ID from the underlying storage.
  • void finalize() cleans up the session manager instance, including any resources it owns (e.g. a database connection handle).
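
One common way to express such an interface in C is a struct of function pointers plus an opaque pointer to the back-end's private state. The sketch below follows the conceptual signatures above, not the real ones from cson_session.h. All demo_* and ram_* names are made up, sessions are passed around as JSON text instead of cson_value trees, and the back-end is a trivial fixed-size in-memory table.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct demo_sessmgr demo_sessmgr;

/* The four operations. Each takes the manager as its first argument. */
typedef struct demo_sessmgr_api {
    char *(*load)(demo_sessmgr *self, char const *id); /* caller frees result */
    int (*save)(demo_sessmgr *self, char const *json, char const *id);
    int (*remove)(demo_sessmgr *self, char const *id);
    void (*finalize)(demo_sessmgr *self);
} demo_sessmgr_api;

struct demo_sessmgr {
    demo_sessmgr_api const *api;
    void *impl; /* back-end-private state */
};

/* A minimal in-memory back-end. */
enum { RAM_SLOTS = 16 };
typedef struct ram_slot { char *id; char *json; } ram_slot;
typedef struct ram_state { ram_slot slots[RAM_SLOTS]; } ram_state;

static char *dup_str(char const *s)
{
    size_t n = strlen(s) + 1;
    char *c = malloc(n);
    if (c) memcpy(c, s, n);
    return c;
}

static ram_slot *ram_find(ram_state *st, char const *id)
{
    int i;
    for (i = 0; i < RAM_SLOTS; ++i)
        if (st->slots[i].id && strcmp(st->slots[i].id, id) == 0)
            return &st->slots[i];
    return NULL;
}

static char *ram_load(demo_sessmgr *self, char const *id)
{
    ram_slot *s;
    assert(self && id);
    s = ram_find(self->impl, id);
    return s ? dup_str(s->json) : NULL;
}

static int ram_save(demo_sessmgr *self, char const *json, char const *id)
{
    ram_state *st = self->impl;
    ram_slot *s = ram_find(st, id);
    char *copy = dup_str(json);
    int i;
    if (!copy) return -1;
    if (s) { free(s->json); s->json = copy; return 0; } /* overwrite */
    for (i = 0; i < RAM_SLOTS; ++i) {
        if (!st->slots[i].id) {                          /* free slot */
            st->slots[i].id = dup_str(id);
            if (!st->slots[i].id) break;
            st->slots[i].json = copy;
            return 0;
        }
    }
    free(copy);
    return -1; /* table full or out of memory */
}

static int ram_remove(demo_sessmgr *self, char const *id)
{
    ram_slot *s = ram_find(self->impl, id);
    if (!s) return -1;
    free(s->id);
    free(s->json);
    s->id = s->json = NULL;
    return 0;
}

static void ram_finalize(demo_sessmgr *self)
{
    ram_state *st = self->impl;
    int i;
    for (i = 0; i < RAM_SLOTS; ++i) {
        free(st->slots[i].id);
        free(st->slots[i].json);
    }
    free(st);
    free(self);
}

static demo_sessmgr_api const ram_api = {
    ram_load, ram_save, ram_remove, ram_finalize
};

/* Factory function: the only symbol a client needs to know about. */
demo_sessmgr *demo_sessmgr_ram(void)
{
    demo_sessmgr *m = malloc(sizeof *m);
    ram_state *st = calloc(1, sizeof *st);
    if (!m || !st) { free(m); free(st); return NULL; }
    m->api = &ram_api;
    m->impl = st;
    return m;
}
```

A client would call, for example, m->api->save(m, json, id) and eventually m->api->finalize(m). Because finalize() releases the back-end state as well as the manager itself, calling free() on the manager directly would leak or corrupt that state.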

Implementing a custom back-end is simple. As a benchmark, the file- and cpdo-based implementations each took under two hours to implement, and much of that time was due to me changing the API and configuration-related interfaces several times during development. If one were to sit down and write a custom file-based implementation without the "development mode" overhead of the first implementations, it could probably be implemented in under an hour of work, provided the underlying library can do what needs to be done without much shoehorning.

See the files cson_session_file.c and cson_session_cpdo.c for example implementations. The file-based implementation currently (20110411) has only about 170 real lines of code and the database implementation has about 280.

Once a custom manager is implemented it can be registered for use with other session-using code by calling cson_sessmgr_register(), and clients can instantiate a registered session manager by name using cson_sessmgr_load().
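
The register-then-load-by-name idea can be illustrated with a small table which maps names to factory functions. This sketch does not reproduce the signatures of cson_sessmgr_register() or cson_sessmgr_load() (see cson_session.h for those). The reg_* names, the fixed table size, and the return codes are assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical manager type and factory signature. */
typedef struct reg_mgr { char const *backend; } reg_mgr;
typedef reg_mgr *(*reg_factory)(void);

enum { REG_MAX = 8 };
static struct { char const *name; reg_factory make; } reg_table[REG_MAX];
static size_t reg_count = 0;

/* Registers a factory under a name. Returns 0 on success, -1 if the
   name is already taken, -2 if the table is full. The name pointer is
   stored as-is, so it must outlive the registry. */
int reg_register(char const *name, reg_factory make)
{
    size_t i;
    assert(name != NULL && make != NULL);
    for (i = 0; i < reg_count; ++i)
        if (strcmp(reg_table[i].name, name) == 0) return -1;
    if (reg_count == REG_MAX) return -2;
    reg_table[reg_count].name = name;
    reg_table[reg_count].make = make;
    ++reg_count;
    return 0;
}

/* Instantiates a manager by name, or returns NULL for unknown names. */
reg_mgr *reg_load(char const *name)
{
    size_t i;
    assert(name != NULL);
    for (i = 0; i < reg_count; ++i)
        if (strcmp(reg_table[i].name, name) == 0)
            return reg_table[i].make();
    return NULL;
}

/* Example factory for an imaginary "ram" back-end. */
reg_mgr *make_ram_mgr(void)
{
    static reg_mgr instance = { "ram" };
    return &instance;
}
```

With this in place, code which only knows a back-end's name (e.g. read from a config file) can obtain a manager without linking against the back-end's factory function directly.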


Notable TODOs...

  • An implementation which uses HTTP cookies for storage. That requires that i first add Object/Array support to the cookie serialization bits. In and of itself, that's easy enough to do, but some cookies are themselves stored as objects, so we have an ambiguity problem. If i add the ability to add client-specific flags to cson_object, i could work around it, but that would be kludgy. i'd fundamentally like to have client-specified flags for all cson_value instances, but i'm not keen on adding that to their memory cost and it would interfere with the "special" internal constant values like empty strings, booleans, null, and the number 0.

Less pressing TODOs...

  • Standardize the last-saved timestamps to GMT/UTC instead of the current use of local time.
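
For the timestamp item, standard C already offers what is needed: gmtime() converts a time_t to broken-down UTC, where localtime() would apply the local time zone. A minimal sketch follows; the function name format_utc() and the ISO-8601-style output format are choices made for this example, not something the session code currently does.

```c
#include <assert.h>
#include <stddef.h>
#include <time.h>

/* Hypothetical helper: writes t as "YYYY-MM-DDTHH:MM:SSZ" (UTC) into
   buf. Returns the number of characters written, not counting the
   terminating NUL, or 0 on error (including a too-small buffer).
   gmtime() uses static storage and is not thread-safe. */
size_t format_utc(time_t t, char *buf, size_t n)
{
    struct tm const *utc;
    assert(buf != NULL);
    utc = gmtime(&t);
    if (!utc) return 0;
    return strftime(buf, n, "%Y-%m-%dT%H:%M:%SZ", utc);
}
```

Storing the raw time_t value (seconds since the epoch) would avoid the time zone question entirely, at the cost of human-readability.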