cwal

s2: Tips and Tricks

This page describes various random tips and tricks for using, and making more effective use of, s2.

The s2 Mindset

While s2 superficially looks like JavaScript, and does indeed have much in common with it, it does behave differently (more on that in a separate subsection of this chapter) and can be used more effectively if one thinks not in terms of a JavaScript-like language, but in terms of an expression evaluation engine (with, incidentally, a JS-like syntax).

Keep in mind that keywords, like return and throw and even if/for/while, are all (unlike in JS) expressions and can be used anywhere any other expression is allowed. A variable declaration is - you guessed it! - an expression, which makes it possible to optionally declare a variable or not (which is rarely useful, but interesting nonetheless ;), as in this example:

foo() ? const x = 3 : false /* any useless value will do here */;
// equivalent to:
foo() && (const x = 3);
// NOT equivalent to:
if( foo() ) const x = 3; // x lives in the if()'s scope!

In the first two examples, if foo() evaluates to a truthy value, x gets declared in the current scope, otherwise it does not. This is admittedly not a generally useful trick, but it demonstrates a primary difference between s2 and most other languages it conceptually derives from or resembles, namely that it treats keywords like expressions (the exception is inherits, which behaves like an operator because it sits between its operands (a.k.a. infix)).

This is somewhat unconventional, but once one gets used to the idea, it allows constructs which other languages don't because of their strict separation of, e.g. keywords vs. function calls. s2, on the other hand, allows (because keywords are expressions) an if/else block to be a function argument:

print(if(0) 0;else if(0) 0); // outputs "false"[^62]
print(if(0) 0;else if(1) 0); // outputs "true"

(Remember that an if/else block evaluates to true if any of its if conditions evaluates to true, else it evaluates to false.)

Not that one would really want to use that particular construct, but there are more sensible uses.
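One arguably more sensible use is combining throw and catch as plain expressions. The following is an untested sketch: requirePositive is a hypothetical helper invented for this example, and it assumes that catch evaluates to the exception thrown inside its block (or to undefined if nothing was thrown) and that the thrown value ends up in the exception's message property.

```s2
// requirePositive is a hypothetical name used only for this sketch.
const requirePositive = proc(n){
  // throw is an expression, so it can be the right-hand side of ||.
  n > 0 || throw "Expecting a positive number";
  return n;
};

// catch is an expression as well, so its result can initialize a
// variable. Assumption: it yields the exception thrown inside the
// block, or undefined if the block completed without throwing.
const err = catch{ requirePositive(-1) };
print(err ? err.message : "no error");
```

Because both keywords are expressions, no separate try/catch statement structure is needed just to capture an error as a value.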

Non-obvious Differences from JavaScript

s2 has a lot in common with JavaScript, at least cosmetically. Its data type model is directly derived from JSON, which is a subset of JavaScript. Nonetheless, s2 has some features which superficially appear to be similar to counterparts in JavaScript but which behave differently, or appear to work the same but do so very differently. Here is an overview of things which those used to JavaScript might stumble over because of their deceptive similarity to JS:

Editing s2 Script Code

Most editors which support JavaScript syntax highlighting can handle s2 very well. The syntax is close enough to JavaScript that the majority of editors will have no problems with it (or with most of it, though they do sometimes trip up on heredocs and unusual keyword usage). For convenience, configure your editor to use JavaScript mode for the file extension(s) you use for your s2 scripts (i use, quite uncreatively, ".s2"). If your editor is intelligent enough to do "code inspections," you may want to disable them, as JS-specific inspections are wrong more often than not in s2.

The only s2 construct which consistently gives me grief in JS syntax-highlighting modes is heredocs.

See the mindset section for suggestions regarding the "mental side" of editing s2 code.

Use importSymbols() resp. using for Inner Functions

When a function defines inner functions, s2 has to (re-)create those functions each time the outer function is run (remember, s2 is exceedingly memory-light, and does not "compile" its code). For this reason, it is good practice to define inner functions as imported symbols unless the outer function will rarely be called or will only be called once or twice during the life of a given script.

For the general case, instead of this:

const f = proc(x){
 x.foo( proc() { … } );
 const aFunc = proc(){ … };
 aFunc(x);
 ...
};

Prefer this:

const f = proc(x){
 x.foo( xCallback );
 aFunc(x);
 ...
} using {
 aFunc: proc() { … },
 xCallback: proc() { … }
};

The main reason is memory usage: the second form only has to allocate the inner functions one time (and then keeps them in memory as long as the containing function references them), whereas the first form has to do it on every call. (To be clear: it might not allocate anything - it may get all its memory from the recycler. Abstractly, though, it has to allocate a new, local instance.) Another reason is parsing effort - the second form requires, in the aggregate, less work unless the function is only called once or twice. Functions which rarely get called do not benefit, or not as much, from this guideline, and may even cost more memory: if a function is never called, the imported symbols will have been created but will never be used.

The first form is arguably more readable, though, and if that is more important to you than memory allocation and performance, feel free to use it.

Such constructs can be nested - see this script for an example of inner functions nested several levels deep this way.

Another option is to set up inner functions only the first time they are needed:

const f = proc callee(){
 if(!callee.inner){
   callee.inner = proc(){...};
 }
 callee.inner(...);
};

Here's a slightly different syntax with the same effect:

const f = proc callee(){
 callee.inner || (callee.inner = proc(){...});
 callee.inner(...);
 // or, more compactly (note the ||| operator):
 (callee.inner ||| (callee.inner = proc(){...}))(...);
};

Those variants have a bit of parsing overhead on each call so that the tokenizer can skip over the function creation on the 2nd and subsequent calls, but it is not as expensive as instantiating the inner function on each call. The form shown last (with the ||| operator) is, in terms of parser overhead, the most efficient of the initialize-on-first-use approaches shown, but the first option (with an if) is arguably more readable by humans.

All that being said, there are cases where a new function instance must, for proper behaviour, be created on each use. This script demonstrates such a case, where each instance of an inner function will be returned to the caller and needs its own imported symbols, independent of all other copies of the function.
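As a minimal, untested sketch of that situation (makeGreeter and who are hypothetical names, not taken from the script referred to above), a factory function can hand out inner functions which each carry their own imported symbol:

```s2
// makeGreeter is a hypothetical name used only for this sketch.
const makeGreeter = proc(name){
  // The inner proc is deliberately created on every call so that each
  // returned function gets its own imported symbol named who.
  return proc(){
    return "Hello, " + who;
  } using {who: name};
};

const a = makeGreeter("Alice");
const b = makeGreeter("Bob");
// a and b are independent instances, each with its own who.
print(a(), b());
```

Moving the inner function into a using clause of makeGreeter itself would share a single instance across all calls, which is exactly what this case needs to avoid.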

Descriptive Assertions

When an assertion or affirmation fails via the assert resp. affirm keywords, the text of the error contains the whole source of the failed expression, up to the end-of-expression terminator. This means that one can comment assertions in as much detail as they like, and have those details bubble up to the user on error:

assert false
  /* a comment before the semicolon
     gets included in the error message */;

The assertion message text will include the comments because they come before the semicolon resp. the end of the expression (colloquially called EOX). The following comment will not be included in the error message because it comes after the semicolon resp. EOX:

assert false; // a comment after the semicolon is not included in the message

Clients can use this as a way to explain to the user what went wrong at that point, what the assertion is really checking, why it does so, and possibly recovery suggestions. Or maybe to annotate assertions with corresponding ticket numbers. This is also useful for function argument validation:

const getCityCoordinatesByName = proc(a){
 affirm typeinfo(isstring a) /* expecting a city name */;
 ...
};

Alternately:

const getCityCoordinatesByName = proc(a){
 typeinfo(isstring a) || throw "Expecting a city name";
 ...
};

Sidebar: personally, i prefer affirm over throw because it's shorter and comments are (A) faster to parse than strings because they require no unescaping and (B) cost the interpreter no extra memory unless/until they're used in an error string. That said, the above throw is skipped over if the affirmation is true, and the string literal costs us no extra memory in that case because skipping over an expression necessarily tokenizes it but does not allocate any memory along the way. (That said, tokenizing a string is ever-so-slightly slower than tokenizing a comment.)
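To see the comment surface in the error text, one can trap a failed affirmation and print its message. This is an untested sketch which assumes that a failed affirm is an ordinary exception which catch can intercept (a failed assert may not be), and that the failed expression's text lands in the exception's message property:

```s2
// Assumption: a failed affirm throws a catchable exception.
const ex = catch{
  affirm 1 === 2 /* the two values are expected to match */;
};
// The message should contain the source of the failed expression,
// including the comment placed before the semicolon.
print(ex.message);
```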

Relative Performance of Various Common Ops

The following list orders (approximately!) various types of script-side operations from "least expensive" to "most expensive," in terms of abstract computational costs (including memory, though not all operations actually allocate). The ordering is based solely on my (relatively intimate) understanding of the implementation, and not on profiling, so interpret the list as an "educated suggestion" as opposed to a "mathematical rule":

  1. Comments, newlines, and other noise/junk tokens get filtered out early on, and affect only the tokenizer level. Such tokens inside function bodies take up memory, though, because a function body is internally stored as a copy of its original string source plus the location information for where the function was defined (for error reporting).
  2. Tokenizing operators is the cheapest of the truly "moving" parts, hands-down.
  3. Tokenizing identifiers. This normally requires a string allocation because the operator layer (the stack machine) does not have access to the raw token memory (it works with a higher-level form of the tokens).
  4. Using operators:
    • Comparisons and instanceof are cheapest, as their results require no allocation.
    • Simple operators (+, -, *, etc.) often have to allocate a value for the result.
    • Compound assignment ops (+=, *=, etc.) cost a bit more because they first have to do a lookup of their LHS.
    • Overloaded operators cost the same as a property lookup, a function call, plus an operator.
  5. Property lookup (implies an operator call) is approximately O(N) on the number of total properties (including those in prototypes), but the keys are (as of 20170320) kept sorted to cut average search time. In practice the performance has never been problematic. Getting notably faster lookup would require prohibitive (to me/cwal) memory costs. (← At the time that was written, hash tables cost considerably more memory, on average, than Objects, but that gap has since been cut notably.)
  6. Keywords most often basically act as a recursive call back into the main eval loop, with a C function in the middle to change the parsing semantics a bit. Sometimes they're cheaper than operators (e.g. a simple break or return) but larger ones (if/while/etc.) may of course run an arbitrary number of expressions and scopes. The setup for that is not all that high (cheaper than a function call), but it's more costly than a simple list of values and operators.
  7. Function calls. While they appear simple, they have the highest relative setup and teardown cost of any core s2 operations, quite possibly by an order of magnitude or more (they've never been profiled).

Garbage collection is going on all the time, as values reach the ends of their lives and as the core evaluation engine cleans up every now and then (always between atomic expressions, though scoping might temporarily redefine what that really means). As such, it doesn't have a cost, per se, which we can measure in terms of the above operations. GC costs depend 100% on the amount of stuff to clean up, which depends largely on how much memory the previous 1 to N expressions in the current scope allocated (where N is a "sweep interval" config option of s2 which defaults to 1). Once the recycling bins have gotten a few entries, further allocation sometimes drops to 0.

Don't Need a Prototype? Remove it!

If you really don't want a given container value to have a prototype, simply unset it:

unset myObj.prototype;
// or:
myObj.unset('prototype');
// or, as a special case, assignment to null or undefined:
myObj.prototype = null;
myObj.prototype = undefined;

(Assigning the null or undefined values to a prototype removes it, and is also legal in the body of object literals. That is a special-case behaviour of the prototype pseudo-property. For all other properties, doing so simply assigns them that value.)
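The object-literal form mentioned above might look like the following untested sketch (bare is a hypothetical variable name):

```s2
// Creates an object which has no prototype from the start.
const bare = {
  prototype: null, // special case: removes the prototype, does not store null
  x: 1
};
// Ordinary properties behave as usual.
print(bare.x);
```

This avoids the separate unset step when an object is known up front not to need any inherited members.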

This is not allowed on non-containers because all non-containers of a given type share (for memory usage reasons) the same prototype pointer at the C level, and s2 does not allow clients (for reasons of sanity) to replace the prototypes of the non-container types (it does allow those prototypes to be modified, however).

myObj.unset('prototype') works even though prototype is not a "real" property and myObj.hasOwnProperty('prototype') always returns false. Interestingly, though, after calling the former, myObj no longer has an unset() method(!) because that method is inherited from (surprise!) its prototype or a further-up ancestor:

s2sh> g.prototype
result: object@0x1f18010[scope=#1@0x7fff825ac558 ref#=17] ==> {
}
s2sh> g.unset('prototype')
result: bool@0x684718[scope=#0@(nil) ref#=0] ==> true
s2sh> g.unset('x')
rc=105 (CWAL_RC_EXCEPTION)
EXCEPTION: exception@0x1f4aaf0[scope=#1@0x7fff825ac558 ref#=0]
==> {
 "code": 304,
 "column": 7,
 "line": 1,
 "message": "Non-function (undefined) LHS before a call() operation.",
 "script": "shell input",
 "stackTrace": [{
 "column": 7,
 "line": 1,
 "script": "shell input"
 }]
}

Parsing Efficiency Tweaks

Here are some tips regarding squeezing a few extra cycles out of the tokenizer and parser. These are not rules, they are just tips for the performance-conscious.

Footnotes

  1. ^ In hindsight possibly a poor idea, but changing this may well break stuff (in the sense that its docs would need fixing - it's unlikely that any non-s2/th1ish-test script code actually relies on the current behaviour).
  2. ^ Reminder to self: the C++ conversions can theoretically hide the exit in some unusual constructs, essentially stopping its propagation. It would require quite an unusual setup to do it, though. We might need an s2-internal flag for is-exiting, instead of relying solely on result code propagation. (Some of that has since been put in place, but it's currently only used for the optional interruption/Ctrl-C handling.)
  3. [^1] Prior to 2021-06-24, `[]` in property access context did not use its own scope, which made it possible to declare variables which would be available outside of that access. For example:

    var o = {}; o[var x = 'hi'] = 1; assert 'hi'===x;

    As of 2021-06-24, property access in the forms `x[y]` and `x.(y)` runs those blocks in their own scopes.
  4. [^62] This is an example of how the end of a parenthesis block acts as an implicit EOX, which is why the "else" part does not need a semicolon after it.