cson  Artifact Content

Artifact 2b152f1a1e9bf5cb582900d9636948037815b907:

Wiki page [TODOs] by stephan 2011-05-09 17:58:47.
D 2011-05-09T17:58:47.054
P f8f6ed98f011b365584098e2d369c3f6b37b6508
U stephan
W 4743
<strong>ACHTUNG: THIS PAGE IS NOW MAINTAINED IN THE NEW WIKI:</strong> [http://whiki.wanderinghorse.net/wikis/cson/?page=TODOs]

<h1>cson TODOs</h1>

The more significant TODOs include:

   *  The object iteration API needs to change so that an object can be modified during iteration without invalidating the iterator. Doing this probably requires a much different (more complex) internal API for handling object key/value pairs. Currently we use an unsorted list of key/value pairs, which is okay for generic use cases but is not high-performance (lookups are a linear search) and has the iterator invalidation problem. Initial experimentation with a binary search tree made it clear that we would normally get worst-case search performance (O(N)), because JSON keys from foreign sources are, more often than not, already sorted, and inserting keys in sorted order degenerates a non-balancing tree into a linked list.
   *  [cson_cgi] has one routine (for parsing HTTP cookies) which has a couple of known buffer-overrun possibilities. These need to be fixed. (The rest of the code is, as far as i can determine, safe against malicious buffer overruns.)
   *  <tt>cson_cgi_init()</tt> needs to be able to take client-side configuration in the form of a JSON object tree, instead of only as an external JSON file.
   *  [cson_cgi] needs a debug channel so that we can log output without interrupting the JSON output, e.g. for non-fatal initialization-related problems (failure to load the config file or to connect to the session db, both of which are currently silently ignored).
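
The binary search tree problem mentioned in the first item can be reproduced with a small stand-alone sketch. It assumes a plain, non-balancing tree keyed via <tt>strcmp()</tt>. All <tt>demo_</tt> names are hypothetical and are not part of the cson API, and this is not the code used in the original experiment.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical node type for illustration only; not part of cson. */
typedef struct demo_node {
    const char *key;            /* borrowed pointer, not owned by the node */
    struct demo_node *left;
    struct demo_node *right;
} demo_node;

/* Inserts key into a plain (non-balancing) BST.
   Returns 0 on success or if the key already exists, -1 on allocation error. */
static int demo_insert(demo_node **root, const char *key) {
    demo_node **slot = root;
    assert(root && key);
    while (*slot) {
        int cmp = strcmp(key, (*slot)->key);
        if (cmp == 0) return 0;
        slot = (cmp < 0) ? &(*slot)->left : &(*slot)->right;
    }
    *slot = calloc(1, sizeof(demo_node));
    if (!*slot) return -1;
    (*slot)->key = key;
    return 0;
}

/* Height of the tree: the number of nodes on the longest root-to-leaf path. */
static size_t demo_height(const demo_node *n) {
    size_t l, r;
    if (!n) return 0;
    l = demo_height(n->left);
    r = demo_height(n->right);
    return 1 + (l > r ? l : r);
}

static void demo_free(demo_node *n) {
    if (!n) return;
    demo_free(n->left);
    demo_free(n->right);
    free(n);
}

/* Builds a tree from keys[0..n-1] in the given order and returns its height,
   or 0 if an allocation fails. */
static size_t demo_height_for(const char **keys, size_t n) {
    demo_node *root = NULL;
    size_t i, h;
    for (i = 0; i < n; ++i) {
        if (demo_insert(&root, keys[i]) != 0) {
            demo_free(root);
            return 0;
        }
    }
    h = demo_height(root);
    demo_free(root);
    return h;
}
```

Feeding this seven keys in sorted order ("a" through "g") yields a tree of height 7, i.e. a linked list, whereas the same keys in the order d, b, f, a, c, e, g yield a height of 3.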

Potential TODOs which don't really have a priority:

   *  We're probably missing a number of 5-line convenience functions which simply need to be written.
   *  The <s>object and</s> array APIs provide no way to <em>remove</em> entries from the containers. These features are normally not needed when generating/reading JSON, and will be deferred until they are needed. (20110324: Properties can now be removed from objects, but entries cannot yet be removed from arrays.)
   *  <tt>cson_parse()</tt> currently reads one byte at a time, instead of buffering. This is because i've been too lazy to add the additional nested loop to handle buffering a few kb of input at a time. The underlying JSON parser reads byte-by-byte, so buffering is not really needed, but would be an optimization.
   *  <em>Maybe</em> more complete object/array APIs. Currently they only have the functions which are necessary for generating and parsing JSON, and traversing the results. They do not have a full suite of mutator APIs, for example.
   *  Maybe pull in the whprintf code so we can add support for setting string values via printf-like commands.
   *  Consider pulling in the whalloc_pager code from the nosjob tree, and using the paging allocator for the base POD types. For disparate types with the same <tt>sizeof()</tt> we can use the same allocator instance. This introduces threading issues, but the allocator supports a client-specified mutex.
   *  Objects internally use string keys for properties (as is proper for JSON). It might be worthwhile to have them internally use <tt>cson_value</tt> to hold the keys, so that we could do reference counting on them. However, actually getting any benefit from that would require a good deal of plumbing (and extra overhead) to "internalize" property keys for re-use. That introduces threading issues and reference-counting issues, and adds a good deal of complexity.
   *  Do some memory optimization. In input mode it's using considerably more memory than i feel it really should, and i'm not certain where the exact culprit lies. In one test: about 410kb of peak RAM needed to build a JSON DOM from a 60kb JSON file containing 3126 keys and 3825 values. That averages out to about 60 bytes per key or value, and that seems high to me (though the data contained several large strings).
   *  Use the JSON_parser code's option to allow us to self-parse double values, so that we can support long double.
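
The buffering idea from the <tt>cson_parse()</tt> item above could look roughly like the following sketch: an outer loop fetches a few kb at a time and an inner loop hands the bytes, one by one, to the byte-wise consumer. The callback types and all <tt>demo_</tt> names are assumptions made for this example. They do not reflect the actual cson or JSON_parser interfaces.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { DEMO_BUF_SIZE = 4096 };

/* Hypothetical input callback: copies up to max bytes into dest and returns
   the number of bytes copied (0 at end of input). */
typedef size_t (*demo_read_f)(void *state, void *dest, size_t max);

/* Hypothetical per-byte consumer standing in for a byte-wise parser.
   Returns 0 to continue, non-zero to stop. */
typedef int (*demo_byte_f)(void *state, unsigned char byte);

/* Fetches input in chunks of up to DEMO_BUF_SIZE bytes and feeds each byte
   to consume(). Returns 0 at end of input, else the consumer's result. */
static int demo_feed_buffered(demo_read_f fetch, void *fetch_state,
                              demo_byte_f consume, void *consume_state) {
    unsigned char buf[DEMO_BUF_SIZE];
    size_t n, i;
    assert(fetch && consume);
    while ((n = fetch(fetch_state, buf, sizeof buf)) > 0) {
        for (i = 0; i < n; ++i) {
            int rc = consume(consume_state, buf[i]);
            if (rc != 0) return rc;
        }
    }
    return 0;
}

/* In-memory source used only for the demonstration. */
typedef struct {
    const unsigned char *data;
    size_t len;
    size_t pos;
    size_t calls;   /* how often the source was asked for data */
} demo_mem_source;

static size_t demo_mem_read(void *state, void *dest, size_t max) {
    demo_mem_source *s = state;
    size_t left = s->len - s->pos;
    size_t n = left < max ? left : max;
    ++s->calls;
    memcpy(dest, s->data + s->pos, n);
    s->pos += n;
    return n;
}

/* Example consumer: only counts the bytes it is given. */
static int demo_count_byte(void *state, unsigned char byte) {
    (void)byte;
    ++*(size_t *)state;
    return 0;
}
```

With a 10000-byte input, the source in this sketch is called 4 times (three chunks plus the final empty read) rather than once per byte, while the consumer still sees every byte individually.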

Random thoughts on further memory optimizations:

   *  Add reference counting to cson_string. During parsing, "internalize" all keys into a sorted list, and increase the refcount instead of allocating anew. This could quickly get ugly, and requires us to store (and sort) a list of (unique key count) cson_string pointers. But it would be kinda cool. If we used a hashtable it'd be easier, but that would have a lot more memory overhead. This would introduce threading issues, since the internalized strings would need to be shared to be of any benefit.
   *  We might be able to come up with an interface which lets the client provide string de/allocation routines, and then he could add his own back-end more suited to his data. Hmm. (Strings are used heavily by the API, as all object keys are strings.)
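
A minimal sketch of the "internalize into a sorted list" idea, assuming a stand-alone string type and a table kept sorted with <tt>strcmp()</tt> and searched via binary search. The <tt>demo_</tt> types and functions are hypothetical and do not exist in cson. The sketch does no locking, so the threading issues mentioned above are left untouched.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical refcounted string; stands in for a refcounted cson_string. */
typedef struct {
    char *bytes;
    unsigned refcount;
} demo_istr;

typedef struct {
    demo_istr **list;   /* unique strings, sorted by strcmp() on bytes */
    size_t count;
    size_t capacity;
} demo_intern_table;

/* Binary search. Returns the index of key, or the index at which it would
   have to be inserted. *found is set to 1 only if the key is present. */
static size_t demo_find(const demo_intern_table *t, const char *key, int *found) {
    size_t lo = 0, hi = t->count;
    *found = 0;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        int cmp = strcmp(key, t->list[mid]->bytes);
        if (cmp == 0) { *found = 1; return mid; }
        if (cmp < 0) hi = mid; else lo = mid + 1;
    }
    return lo;
}

/* Returns the shared string for key, bumping its refcount if it already
   exists and allocating a copy otherwise. Returns NULL on allocation error. */
static demo_istr *demo_intern(demo_intern_table *t, const char *key) {
    int found;
    size_t at = demo_find(t, key, &found);
    size_t len;
    demo_istr *s;
    if (found) {
        ++t->list[at]->refcount;
        return t->list[at];
    }
    if (t->count == t->capacity) {
        size_t cap = t->capacity ? t->capacity * 2 : 8;
        demo_istr **grown = realloc(t->list, cap * sizeof *grown);
        if (!grown) return NULL;
        t->list = grown;
        t->capacity = cap;
    }
    len = strlen(key) + 1;
    s = malloc(sizeof *s);
    if (!s) return NULL;
    s->bytes = malloc(len);
    if (!s->bytes) { free(s); return NULL; }
    memcpy(s->bytes, key, len);
    s->refcount = 1;
    /* Shift the tail up by one slot to keep the list sorted. */
    memmove(t->list + at + 1, t->list + at, (t->count - at) * sizeof *t->list);
    t->list[at] = s;
    ++t->count;
    return s;
}

/* Drops one reference; frees the string and removes it from the table
   when the last reference goes away. */
static void demo_release(demo_intern_table *t, demo_istr *s) {
    int found;
    size_t at;
    assert(t && s && s->refcount > 0);
    if (--s->refcount > 0) return;
    at = demo_find(t, s->bytes, &found);
    assert(found && t->list[at] == s);
    memmove(t->list + at, t->list + at + 1,
            (t->count - at - 1) * sizeof *t->list);
    --t->count;
    free(s->bytes);
    free(s);
}
```

Interning "name" twice returns the same pointer with a refcount of 2, and the table holds one pointer per unique key. Each lookup is a binary search, but each new key shifts part of the list, which is some of the "ugly" cost noted above.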

Z 9edf42fe71229d2d33c10de1a0bfeec4