cwal

s2: Hashtables
Login

s2: Hashtables

(⬑Table of Contents)

The Hash (a.k.a. Hashtable) Type

Jump to...

Hashes

Hashes, also known as hash tables, work similarly to Objects except that they support both "normal" properties (as for containers) and "fast" properties (a hash table). Normal properties are set and fetched as for other objects, and accessing the hashtable properties uses separate routines. Hashes, due to the nature of hash tables, take up notably more memory than objects (approximately hash-size times sizeof(void*) more), and should only be used when speed of access to properties is of the utmost importance or when storing large mappings of keys to values, as searching through large sets is much faster with hashes than objects. Hashed property access is amortized O(1) and does not traverse prototypes if an entry is not found.

As of 2021-07-12, the object class supports a build-time operation which enables internal use of a hashtable for property access. With that option enabled, normal property access is actually a tick faster than hashtable entries (because it lacks the script-side function-call overhead of hashtable entries) but the two types of storage have one notable difference: lookups for object-level properties will, when a property is not found in an object, be searched for in the prototype(s) containers. Conversely, prototype lookups are never performed when looking for plain hashtable entries.

To create a hash table:

var h = new s2.Hash(713);
assert h inherits s2.Hash;
assert 'hash' === typeinfo(name h);
assert typeinfo(ishash h);

Or use the "hash literal" syntax, which identical to object literals except for the addition of the # character as the first character of the literal's body:

{# // <== denotes a hash literal
  a: 1,
  b:= 'hi', // := sets the entry as const (cannot be re-set later)
  ...
}

The (optional) first parameter to the constructor defines the (approximate) size of the hash table (s2 might round it to some "nearby" prime value). The default value is relatively small but is fine for simple uses. A value of 0, or larger than some internally-set limit, causes some unspecified default to be used. A negative value triggers an exception. A hash's table can be resized using its resize() method. One simple way to get a prime is to call 0.nthPrime(20), which returns the Nth prime (where N is its argument), limited to the first 1000 prime numbers (up to and including 7919).

Tip: the inherited withThis() method can be used to initialize a hash table at its construction point:

const h = new s2.Hash(0.nthPrime(20)).withThis(proc(){
 this.insert('a', 'aaaa');
 this.insert('b', 'bbbb');
 ….
});

Alternately (and newer):

const h = new s2.Hash(0.nthPrime(20)) {
 this.insert('a', 'aaaa');
 this.insert('b', 'bbbb');
 ….
};

Hashed values (as opposed to "normal" properties) are added to the object using the insert() method, searched for using search() or (much more efficiently!) the # operator, and are removed using remove(). The full list of member functions is in the next subsection.

ACHTUNG, ACHTUNG, ACHTUNG: As with object properties, hashes can contain any Value type, both as keys and values. However, unlike object properties, hashed keys are type-strict, meaning that the keys integer 1, string "1", and double 1.0 are not equivalent hash keys as they are for "normal" properties. This is a side-effect of how hashing works (type-lax, hashcode-based comparisons of keys cannot work (can it?)). Similarly, Tuples do not hash based on their contents (even though they compare using their contents) because hashing requires stable values (and tuples are mutable). Because of this, Tuples as keys will never hash as equivalent to each other except via some blind twist of fate (an internal hash collision with a different Tuple instance and equivalent content). Summary: do not use Tuples as hash keys. The engine does not currently disallow it, but it is not recommended and may be disabled in the future.

Hash Methods

Hashes add the following methods to their class hierarchy (in alphabetical order):

Hash clearEntries([bool clearProperties=false])

Clears all entries from the hash table. If the optional parameter is true then non-hash properties (those belonging to "the object part" of the Hash) are also cleared (only those set in this object, not properties inherited from the prototype). This does not change the size of the hashtable. Returns this object.

bool containsEntry(Key)

Returns true if the hash contains the given key.

Hash eachEntry(Function)

Calls the given function one time for each key/value in the hash table, passing it the (key,value). Note that the order of the keys is non-deterministic. It is illegal to modify the hash table while it is being iterated over, and doing so will lead to an exception. If the callback returns a literal false value, then looping over the callback stops immediately without an error. If the callback throws, the exception is propagated. Returns this object.

Note that foreach(#aHash) is much more efficient, but was added to the language long after this method.

Hash eachEntry(Hash, Function)

Works like eachEntry(Function), but uses the first parameter as the "this" for the call.

Hash eachEntry(Hash)

Copies all properties from this object to the given hash. Equivalent to: myHash.eachEntry(targetHash, targetHash.insert)

integer entryCount()

Returns the number of entries in the hash table (an O(1) operation). Note that this is a function, not a value property (because s2 lacks so-called property interceptor functions)!

Array entryKeys()

Returns a new array of all keys in the hash table, in an indeterminate order. (Note that hashes inherit the Object.propertyKeys() method as well.)

Array entryValues()

Returns a new array of all values in the hash table. Their ordering is the same as their corresponding keys (but that ordering is indeterminate). It is guaranteed that the arrays returned from this function and entryKeys() will have their entries aligned such that a given index in the key list properly goes with the value at that same index in the values list, provided the hash table is not modified between the calls to entryKeys() and entryValues().

Hash growIfLoaded([double loadFactor=approx. 0.8])

If this hash's load factor is over the given value, it is resized to have (roughly) the given load factor. Returns this hash. Unusually small or large numbers might be trimmed to within "reasonable" limits. This routine never shrinks the table.

bool hasEntries()

Returns true if this hash contains any hash table entries, else returns false.

Design note: in order to avoid confusion wrt the inherited Object.isEmpty(), we don't override it, but instead provide this function, which has the opposite return semantics.

integer hashSize()

Returns the size of the hash table (not the number of entries). Ideally (for search speed) this value is a prime number larger than the number of entries. This value is set at creation time or later on via the resize() method.

Value insert(Key, Value)

Inserts the given key/value pair, overwriting any existing entry with an equivalent key. Returns the value it sets.

Potential TODO: add a 3rd (boolean) param specifying whether to overwrite or fail if an entry already exists.

new Hash(integer tableSize)

Evaluates to a new hashtable with the given table size (which really, really should be a prime number). The constructor might (and might not) adjusted the provided size to fit within some certain range and/or make it a prime number. The nthPrime() method of the integer prototype can be used to fetch any of the first 1000 prime numbers, e.g. 0.nthPrime(20) fetches the 20th prime.

bool remove(Key)

Removes the given key. Returns true if it found an entry, else false.

Hash resize(integer newSizePleaseUseAPrimeNumber)

Resizes the internal lookup table, retaining all existing entries. Its argument must be an integer value greater than 0 and it really, really should be a prime number. The library reserves the right to cap the size at some internal limit. Tip: the expression 0.nthPrime(N), where N is an integer between 1 and 1000, will return the Nth prime number.

Value search(Key)

Searches for the given key and returns its value, or undefined if no entry was found (but note that undefined is also a legal value!). The # ("hash") operator provides this same feature but without the function call overhead (i.e. it's faster and needs less memory). e.g. hashInstance # key is equivalent to, but slightly faster than, hashInstance.search(key).

Hash takeProperties(Container src [, integer overwritePolicy=1])

Warning: this member is not implemented (will throw an exception) if the library is compiled with the CWAL_OBASE_ISA_HASH option. The underlying algorithm has to be reimplemented for that mode of operation.

Transfers (moves) all properties from the given container, inserting them as hash table entries in such a way that does not require allocating any new memory. The second parameter determines how overwriting is handled: 0 means throw an exception if a key collision occurs. Greater than 0 means to overwrite any entries in the hash with the entry from the source container. Less than 0 means to not overwrite entries, keeping any existing values (and treat it as success). If the passed-in container contains any properties after this method returns, it means those properties were not transferred because the overwritePolicy did not allow for it. Returns this object.

If the 2nd argument is of type "unique" (i.e. is an enum entry) then its wrapped value is used in its place. (Potential TODO: add enum entries to this method which correspond to the 3 legal values of the policy.)