Symbol Resolution in WHCL
whcl's symbol resolution initially followed the highly unconventional approach used by its predecessor, s2, but was changed on 2022-03-14 to a more conventional approach. That approach still differs enough from what one might expect that it's worth explaining and demonstrating in detail. It is more limiting than the prior approach but is also less likely to lead to the shooting of oneself in the proverbial foot.
Reminder to self: if this approach doesn't work out, we simply have to make a small tweak to whcl_var_search_v() to restore the previous behavior.
When whcl goes to look up a symbolic name (an identifier or variable reference), it goes through the following steps, stopping at the first one which resolves the symbol:
- Is it a builtin value?
- Look in the properties stored in the current scope.
- If the current scope is not a function call scope, look in the parent scope, recursively, until either a function call scope is checked or the top-most scope (a.k.a. the "global" scope) is checked. (A "function call scope" is the one started specifically in response to calling a function. It's the one where builtin variables such as this and argv live, as well as variables which are named after function parameters.)
- Once a function call scope has been checked (whether it is the current scope or was reached by walking up from the current scope), skip directly to the top-most (global) scope and check there.
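The steps above can be modeled in a few lines of code. What follows is a minimal Python sketch, not whcl's actual C implementation (see whcl_var_search_v(), mentioned above). The Scope class, its is_call_scope flag, and the contents of the BUILTINS table are hypothetical stand-ins, and each scope's parent is assumed to be whichever scope was active when it was pushed. The scenario at the end mirrors the proc f / proc ff case discussed below: f lives in a non-global scope, so looking it up from inside ff's call scope fails until a global f exists.

```python
class Scope:
    """One scope in a simplified, hypothetical model of whcl's scope stack."""

    def __init__(self, parent=None, is_call_scope=False):
        self.parent = parent                # the scope which was active when this one was pushed
        self.is_call_scope = is_call_scope  # True for scopes started by calling a function
        self.vars = {}                      # properties (variables) stored in this scope


# Hypothetical stand-in for whcl's builtin values.
BUILTINS = {"true": True, "false": False}


def top_scope(scope):
    """Return the top-most (global) scope of the chain containing scope."""
    while scope.parent is not None:
        scope = scope.parent
    return scope


def lookup(scope, name):
    """Resolve name per the steps above. Raises NameError if it cannot be resolved."""
    if name in BUILTINS:                    # step 1: builtin values
        return BUILTINS[name]
    s = scope
    while s is not None:                    # steps 2 and 3: current scope, then parents...
        if name in s.vars:
            return s.vars[name]
        if s.is_call_scope:                 # ...but never past a function call scope
            break
        s = s.parent
    g = top_scope(scope)                    # step 4: finally, the global scope
    if name in g.vars:
        return g.vars[name]
    raise NameError(name)


# The proc f / proc ff case: f lives in a non-global script scope.
global_scope = Scope()
script_scope = Scope(parent=global_scope)
script_scope.vars["f"] = "<function f>"
ff_call_scope = Scope(parent=script_scope, is_call_scope=True)

try:
    lookup(ff_call_scope, "f")
    found = True
except NameError:
    found = False
print("f resolved from inside ff:", found)      # False: the call scope is a boundary

global_scope.vars["f"] = "<global function f>"  # roughly like proc -global f ...
print(lookup(ff_call_scope, "f"))
```

Note that the f sitting in ff's immediate parent scope is never consulted: the walk stops at ff's call scope and jumps straight to the global scope.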
Many contexts which perform symbol lookup will fail if the searched-for symbol is not found and some will fail if it is found in the current scope (e.g. you cannot declare a symbol twice in the same scope).
The above might seem straightforward, but it may leave a scripter scratching their head in certain cases. Most notably, functions created with the proc builtin command are, by default, created in the current scope. More often than not, the current scope is not the global scope. That means, for example, that the following lookup may not behave as one expects:
proc f {} { return 1 }
proc ff {} {
    f; # It may seem intuitive that f will resolve to the
       # function declared above, but that's not the case
       # unless f is global.
}
ff ; # will fail because it cannot resolve f.
The reason that will fail (if it's performed outside of the global scope) is straightforward but possibly not intuitive, especially to programmers used to languages with "automatic" closure support (e.g. JavaScript).
Because functions are first-class values in whcl, they may be referenced like any other values and may be propagated, e.g. assigned as properties in arbitrary objects or returned as the result of a function or eval block. Because of that, it's never generically possible to know whether a given function is being called in the scope it was initially declared in. When a function is declared, its body is converted to a form suitable for later evaluation, but no state about its scope, or about any values its body references, is retained. The code in the function is only evaluated when the function is called. It "would" be possible for a function declaration to grab a reference to the scope which is active when the function is created, but (A) that would have atrocious side-effects with regards to garbage collection and (B) it's not always obvious to an onlooker where the language injects new scopes, so which scope is the current one might not always be apparent to the user.
Sidebar: OMG. Since the scoping overhaul merged into trunk on 2022-03-22, having a function grab a reference to its scope would no longer have such a bad effect on garbage collection, but it would, more often than not, capture far, far more state than the function needs (indirectly holding references to a great deal of state it doesn't use). That's something worth experimenting with, in any case, perhaps via a new flag to proc which tells it to do so.
In order for the above ff to resolve f properly, any one of several conditions must exist (in no particular order):
1. f must be a builtin value.
2. f must be declared inside of ff's body.
3. f must be imported into ff with the using modifier or the import-symbols method.
4. f must be declared in the global scope. This can be achieved with the -global flag to the proc command, but practice suggests that declaring functions as global is usually unnecessary.
We'll demonstrate the final three options below (the first is not possible without modifying whcl's C code to add a new builtin value).
#2: a nested function
proc ff {} {
    proc f {} { return 1 }
    assert 1 == [f];
}
ff
Sidebar: don't do that (if ff will be called many times) because it requires recreating f on every call to ff, which is relatively expensive. Instead, prefer one of the options demonstrated below. The exception is when the containing function is only called once, or perhaps a few times, in the life of a script. In such cases the approach demonstrated above can actually save memory.
#3a: using
proc f {} { return 1 }
proc ff {} {assert 1 == [f]} using {f $f}
ff
#3b: using again (an alternate, and more efficient, formulation)
proc ff {} {
    assert 1 == [f]
} using {
    f [proc {} {return 1}]
}
ff
#3c: import-symbols (equivalent to #3a, above)
proc f {} { return 1 }
proc ff {} {assert 1 == [f]}
ff.import-symbols f
ff
#4: global f declaration
proc -global f {} { return 1 }
proc ff {} {assert 1 == [f]}
ff
Be Aware of Lazy Symbol Resolution
The symbol resolution gremlins are lazy. They don't like to look up anything until they're asked to and they don't like to remember what they've looked up before.
To continue the previous section's example: because any symbols referenced by ff's default parameter values and its body are looked up each time it is called, it is entirely possible that the f they refer to is a different f than the one which was in scope during a prior call. In some use cases this is desirable, but in others it may be less so. In order for ff to bind to a specific f instance, and keep using that instance even if the global-scope f is replaced, one of the following things has to happen:
- The using or import-symbols approach must be used, as those resolve f immediately when they are invoked and retain that reference to f.
- f has to be declared const so that it cannot be replaced, as demonstrated below.
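The difference between lazy lookup and the two remedies listed above can be sketched in Python. This is a hypothetical model, not whcl code: ScopeVars, decl(), assign(), ff_lazy() and make_ff_using() are illustrative names only. ff_lazy() looks f up on every call, as ff's body does; make_ff_using() resolves f once, when the function is created, as using and import-symbols are described as doing; and a const name is modeled as one which assign() refuses to replace.

```python
class ConstError(Exception):
    """Raised when trying to replace a name which was declared const."""


class ScopeVars:
    """Hypothetical model of one scope's variables, some of which may be const."""

    def __init__(self):
        self.vars = {}
        self.consts = set()

    def decl(self, name, value, const=False):
        # Roughly like decl: declaring the same name twice in one scope is an error.
        if name in self.vars:
            raise NameError(f"{name} is already declared")
        self.vars[name] = value
        if const:
            self.consts.add(name)

    def assign(self, name, value):
        # Replaces an existing value unless the name was declared const.
        if name in self.consts:
            raise ConstError(name)
        self.vars[name] = value


g = ScopeVars()
g.decl("f", lambda: 1)


def ff_lazy():
    # Like ff's body: f is looked up anew on every call.
    return g.vars["f"]()


def make_ff_using(f):
    # Like "using {f $f}": f is resolved once, when ff is created, and kept.
    def ff():
        return f()
    return ff


ff_using = make_ff_using(g.vars["f"])

g.assign("f", lambda: 2)          # replace the f which ff_lazy will find
print(ff_lazy(), ff_using())      # prints: 2 1

g.decl("h", lambda: 3, const=True)
try:
    g.assign("h", lambda: 4)
    h_replaced = True
except ConstError:
    h_replaced = False
print("const h replaced:", h_replaced)  # prints: const h replaced: False
```

Replacing f changes what ff_lazy() sees but not what ff_using() sees, while the const h keeps its original value.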
In the general case, a const declaration looks like:
decl -const f proc {} {...}
But the caveat is that script code doesn't normally run in the global scope, so the above will not declare a global-scope f. One workaround is to store f in the whcl.client object, which is reserved specifically for client-side use, and set it const there:
set -const whcl.client.f proc {} { ... }
The Caveats
whcl's symbol resolution rules differ from those of all other cwal-based languages so far. Generally speaking, that's an improvement, in that it keeps code from resolving symbols in potentially surprising ways. On the other hand, it has at least one significant drawback: it rules out a usage pattern which the previous model allowed. Consider a function which takes a script snippet as an argument:
my-function {a script snippet}
Historically (in whcl prior to this change, as well as in whcl's predecessors), it was common to have such functions and to use them along the lines of:
decl i 0
myDb for-each "select * from t" {
    incr i
    # ... do something with this db record ...
}
Using whcl's current symbol lookup rules, the first line of that final argument cannot resolve the symbol i (or will resolve one from the global scope). Such constructs must now be reformulated as a callback function, along the lines of:
decl o object i 0
myDb for-each "select * from t" [proc {} {
    incr o.i
    # ... do something with this db record ...
} using {o $o}]
The extra object (or array) around the i reference is necessary because of how imported symbols work.
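One plausible reading of "because of how imported symbols work" is that an imported symbol holds the value the name had when the proc was created, so incrementing an imported integer changes only the proc's own copy, whereas an object is shared by reference and changes to its properties are visible to the caller. The following Python sketch models that assumption only; it is not taken from whcl's implementation, and make_proc(), incr_i() and incr_o_i() are hypothetical names.

```python
# The caller's variables, as created by "decl i 0" and "decl o object i 0".
caller = {"i": 0, "o": {"i": 0}}


def make_proc(body, using):
    """Hypothetical model of "proc {} {...} using {...}": the proc keeps its own
    table of imported names, bound to the values they had at creation time."""
    symbols = dict(using)

    def call():
        body(symbols)
    return call


def incr_i(syms):
    syms["i"] += 1          # like "incr i": rebinds the proc's own i only


def incr_o_i(syms):
    syms["o"]["i"] += 1     # like "incr o.i": mutates the object shared with the caller


cb_plain = make_proc(incr_i, {"i": caller["i"]})      # using {i $i}
cb_wrapped = make_proc(incr_o_i, {"o": caller["o"]})  # using {o $o}

for _record in range(3):    # pretend for-each visited three db records
    cb_plain()
    cb_wrapped()

print(caller["i"], caller["o"]["i"])  # prints: 0 3
```

Under this model only the object-wrapped counter reports the three visited records back to the caller, which matches the document's advice to wrap i in an object.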
The caveat here, in case it's not clear, is the added complexity of requiring a wrapper function, instead of a plain script snippet, and an additional level of object/container wrapper for the results.
Hypothetically, but only hypothetically, it would be possible to tell whcl to run such eval-able code blocks in the scope from which their containing function was called. That would not work with the current code, however, because its structure is based on the engine's age-old assumption that only the newest stack is ever active, and the implications of breaking that assumption are akin to the proverbial "crossing of the streams."
Those familiar with TCL might fairly say "just add upvar support."
The most correct answer to that is: much easier said than done. The way vars are stored does not directly support such a thing: it would require adding new tooling to the vars storage to be able to handle it, and it would be easily bypassable because that tooling would live at the whcl layer, whereas property/variable access, at its lowest level, happens via the cwal layer's API. Though it would be trivial to add a builtin which resolves vars from scopes an arbitrary number of levels up, it would be limited in terms of what it could do with them. For example, assignment of values through such resolution would not work.
(Hypothetical script code...)
decl i
proc {} {
    echo "i =" [lookup -1 i] ; # this would be easy to do, but...
    incr [lookup -1 i] ; # ... could not work. I think.
}
It's worth experimenting with someday, though.
Having said all of that...
Lookup Across Call Boundaries
The proc builtin now has the -xsym flag, which tells it that calling the resulting function does not create the lookup boundary which a function call scope normally creates.
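A rough Python model of what -xsym changes, assuming the flag does nothing more than make the function's call scope behave like an ordinary scope for lookup purposes, and that a call scope's parent is the scope which was active when the function was called. How the flag is passed to proc, and any other effects it may have, are not modeled; Scope, lookup() and the xsym parameter are hypothetical names.

```python
class Scope:
    """Hypothetical scope model; only the lookup-boundary behavior is modeled."""

    def __init__(self, parent=None, is_call_scope=False, xsym=False):
        self.parent = parent
        # A call scope normally stops the upward walk; xsym=True (standing in
        # for proc's -xsym flag) means this call scope is not a boundary.
        self.is_boundary = is_call_scope and not xsym
        self.vars = {}


def lookup(scope, name):
    """Walk up from scope, stop after a boundary, then fall back to the top-most scope."""
    top = scope
    while top.parent is not None:
        top = top.parent
    s = scope
    while s is not None:
        if name in s.vars:
            return s.vars[name]
        if s.is_boundary:
            break
        s = s.parent
    if name in top.vars:
        return top.vars[name]
    raise NameError(name)


global_scope = Scope()
caller = Scope(parent=global_scope)
caller.vars["i"] = 0                    # like "decl i 0" in a non-global scope

plain_call = Scope(parent=caller, is_call_scope=True)
xsym_call = Scope(parent=caller, is_call_scope=True, xsym=True)

print(lookup(xsym_call, "i"))           # prints: 0
try:
    lookup(plain_call, "i")
    plain_sees_i = True
except NameError:
    plain_sees_i = False
print("visible without -xsym:", plain_sees_i)  # prints: visible without -xsym: False
```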