CGI: You know, for s2-powered CGI scripts!
- Source code: /dir/s2/mod/cgi?ci=trunk
- Test/demo code: /finfo/s2/mod/cgi/test.s2
Intro
The CGI module provides a basic framework for implementing CGI scripts using s2. It's possibly missing a few useful features. What's there works reasonably well, though. It is in active use in several of my own sites to power JSON-centric back-end services.
Example code:
This module is the basis of s2cgi, a perpetually-in-development higher-level framework for JSON APIs, implemented in script code:
My own sites use several CGIs based on s2cgi, including:
- https://fossil.wanderinghorse.net/r/www-wh/finfo/cgi-bin/search.s2
- https://fossil.wanderinghorse.net/r/morbad-card-tools-s2cgi
- https://fossil.wanderinghorse.net/r/cide-site-s2cgi
- https://fossil.wanderinghorse.net/r/byoo
- https://fossil.wanderinghorse.net/r/rita-cgi
The CGI loadable module, when its init() method is called, populates
script-space with various CGI input data and utilities for working
with it. It is intended to be used for implementing CGI applications
entirely in s2, and its direct code-predecessor was used to implement
CGI demo pages for the libfossil/th1ish[1] bindings. This module
provides a basis for writing CGI applications, but not a complete
framework. All but the lowest-level work is intended to be done in
script code (e.g. routing/dispatching, output buffering management
(assisted by the C layer), page rendering, ...).
As of this writing (20181122), the CGI module is used by several
back-end scripts on my own website, and s2.tmpl() is used to generate
most of the static HTML files from various input sources, including an
sqlite3 database.
Its main services/features include:
- Converts all GET parameters and cookies into a script-usable form.
- Makes incoming POSTed JSON or form-urlencoded data available to the script.
- Intended to be used with the output buffering API (s2.ob.*), to enable setting of HTTP headers (e.g. cookies) at arbitrary points throughout the page generation process (headers must be output first, but buffering allows headers to be changed throughout the generation of the page).
Loading the CGI module:
const cgi = s2.loadModule('cgi.so');
When using require.s2:
const cgi = s2.require(['s2mod!cgi']).0;
// or:
s2.require( ['s2mod!cgi'], proc(cgi){ … } );
In order to properly accommodate this module's usage via both a DLL and
statically built in to an app, its init() method must be called once
before its other methods are used. Because that method returns the CGI
object, it may be called directly when loading the module, e.g.:
const cgi = s2.loadModule('cgi.so').init();
// or:
const cgi = s2.require(['s2mod!cgi']).0.init();
CGI Environment Variables
The following description of the relationship of certain CGI environment variables is taken verbatim from code comments in the Fossil SCM project:
REQUEST_URI, PATH_INFO, and SCRIPT_NAME are related as follows:
REQUEST_URI == SCRIPT_NAME + PATH_INFO
Or if QUERY_STRING is not empty:
REQUEST_URI == SCRIPT_NAME + PATH_INFO + '?' + QUERY_STRING
Where "+" means concatenate.
Sometimes PATH_INFO is missing and SCRIPT_NAME is not a prefix of REQUEST_URI. (See https://fossil-scm.org/forum/forumpost/049e8650ed.)
SCGI typically omits PATH_INFO. CGI sometimes omits REQUEST_URI and PATH_INFO when it is empty.
CGI Parameter quick reference:
                               REQUEST_URI
                       _____________|________________
                      /                              \
https://fossil-scm.org/forum/info/12736b30c072551a?t=c
\___/   \____________/\____/\____________________/ \_/
  |           |          |             |            |
  |       HTTP_HOST      |         PATH_INFO  QUERY_STRING
  |                      |
REQUEST_SCHEMA      SCRIPT_NAME
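As a quick check of the relationship quoted above, here is a minimal Python sketch that rebuilds REQUEST_URI from its parts, using the values from the diagram. It is not part of this module: the function name `request_uri` is made up for illustration, and it ignores the cases where a server omits PATH_INFO or REQUEST_URI.

```python
def request_uri(script_name, path_info="", query_string=""):
    """Rebuild REQUEST_URI from SCRIPT_NAME, PATH_INFO and QUERY_STRING.

    Hypothetical helper: it only restates the concatenation rule quoted
    above, where "+" means concatenate.
    """
    uri = script_name + path_info
    if query_string:
        # The '?' separator only appears when QUERY_STRING is not empty.
        uri += "?" + query_string
    return uri


# Values taken from the quick-reference diagram.
print(request_uri("/forum", "/info/12736b30c072551a", "t=c"))
```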
CGI Methods
This module defines the methods described in this section. The intent is that the CGI object be fleshed out via script code, e.g. with convenience methods for working with the CGI environment's GET/POST data, as well as any app-specific routing functionality. Those sort of features are much simpler to implement and evolve in script code than in C.
integer httpStatus()
CGI httpStatus(integer code [, string message = code-dependent default])
Gets or sets the HTTP response status code. The default, if not explicitly set, is 200 (success). If called with no arguments (the getter) it returns the current status code, i.e. the one set via the setter variant of this function, else it returns this object. When passing a message, it must not contain any newlines, nor should it be "unduly long", else it may corrupt the response header.
CGI init([object options])
As of 20171201, to accommodate this module's inclusion in both a DLL and statically linked in, this module must be manually initialized, by calling this function, before using any of its other methods. Returns this object and throws if called more than once. It optionally takes an object with configuration options for the CGI framework:
pushOb (bool, default=true)
: If truthy, push an output buffer ("ob") layer. If false, the client must ensure that output buffering is enabled before generating body output (e.g. by using s2out or using this object's << operator). It is necessary to buffer the body so that HTTP response headers can be sent in the proper order (before the body, but new headers may be created up until the body is output).
importEnv (bool, default=false)
: If truthy, it imports all environment variables into cgi.request.ENV, otherwise it only imports those which are specified for CGI scripts. Any which have no value are set as empty strings. The core environment vars are:
AUTH_CONTENT
AUTH_TYPE
CONTENT_LENGTH (noting that the content will have already been consumed by the native-level CGI bits).
CONTENT_TYPE
DOCUMENT_ROOT
HTTP_ACCEPT
HTTP_ACCEPT_ENCODING
HTTP_COOKIE
HTTP_HOST
HTTP_IF_MODIFIED_SINCE
HTTP_IF_NONE_MATCH
HTTP_REFERER
HTTP_SCHEME
HTTP_USER_AGENT
HTTPS
PATH
PATH_INFO
QUERY_STRING
REMOTE_ADDR
REQUEST_METHOD
REQUEST_URI
REMOTE_USER
SCGI
SCRIPT_DIRECTORY
SCRIPT_FILENAME
SCRIPT_NAME
SERVER_NAME
SERVER_PORT
SERVER_PROTOCOL
SERVER_SOFTWARE
CGI operator<<(arg)
Overloads the << operator, effectively behaving like the s2out keyword
but also pushes an output buffer level to the stack if none has yet
been initialized. This operator adds no extra spacing nor newlines to
the output. It returns this object, so calls may be chained.
string headerTimestamp([integer unixEpochTime = -1])
For a given Unix epoch time, this returns the RFC-7231 time string. A negative time value is interpreted as the current time. This time format is often seen in HTTP headers.
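For readers who have not seen that time format, the following Python sketch produces the same kind of HTTP-date string using only the standard library. It is an illustration of the format, not this module's implementation; `header_timestamp` is a hypothetical name chosen to mirror the method above, including its "negative means now" convention.

```python
import time
from email.utils import formatdate


def header_timestamp(unix_epoch_time=-1):
    """Format a Unix epoch time as an HTTP-date string (always in GMT).

    Mirrors the documented convention that a negative value is
    interpreted as the current time.
    """
    if unix_epoch_time < 0:
        unix_epoch_time = time.time()
    return formatdate(unix_epoch_time, usegmt=True)


print(header_timestamp(0))  # Thu, 01 Jan 1970 00:00:00 GMT
```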
void respond([bool exit=false])
Outputs all pending output, submitting the HTTP headers first,
followed by any body content living in the output buffering subsystem
(it flushes each buffering level, one at a time - all levels pushed by
init() or since init() was called). This must only be called one
time, and calling it cleans up the module internals. Calling any
C-native methods on this object after this will trigger an
exception. If it is passed a truthy value, it triggers an exit, such
that control never returns to the calling script. After this object
answers its request, there's really no more for it to do, in any case,
and exiting immediately seems easier for script code than handling
waiting-to-unwind client code which is expecting to be able to pop
output buffers which are no longer there.
void respondPassthrough(string filename [, bool exit=false])
An alternative to respond() (see above) intended for efficiently
serving static files. This function bypasses all output buffering,
emits any pending output headers, then streams the contents of the
given filename directly to the lowest-level cwal-defined output
channel (typically sent to stdout, but s2sh's -o filename option can
override that). See respond() for details and implications of this
function cleaning up the native object, as well as a description of
the 2nd parameter.
CGI setContentType(string)
Sets the Content-type header to the given value, replacing any previous one. Please use this function instead of setting that header directly so that we can avoid any case-sensitivity problems (the HTTP standard is case-insensitive with regard to the header keys, but this C code is not!). Returns this object. TODO?: define an s2 enum for the set of content type strings, to avoid any case-sensitivity issues.
CGI setCookie(key, value)
Sets the given cookie. Set the value to null to cause it to
expire. Setting a cookie multiple times replaces the to-send value
each time. By default, all cookies are session cookies, expiring when
the browser is closed. Note that this simply queues up a cookie for
the respond() method, and does not change any visible cookie
state. Its value will not be available in cgi.request.COOKIES until
the client sends it again. Any basic value type is fine as a cookie
value, but if the value is a container type then it is expected to have
the following structure (all listed properties are optional and extra
properties are ignored):
{
  value: mixed, // Use null/undefined to unset the cookie
  domain: string,
  path: string,
  expires: integer, // Unix Epoch time. null/undefined/values<=0 expire the cookie.
  secure: bool,
  httpOnly: bool,
  sameSite: string // https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Set-Cookie/SameSite
}
Those properties all correspond to standard cookie attributes.
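To show how such a structure maps onto the standard cookie attributes, here is a rough Python sketch which turns a key and a cookie spec into a Set-Cookie header line. Everything in it is an assumption made for illustration: the function name `set_cookie_header`, the URL-encoding of the value, the attribute order, and the way an unset cookie is expressed (empty value plus an expiry in the past). The module's C code may well serialize cookies differently.

```python
from email.utils import formatdate
from urllib.parse import quote


def set_cookie_header(key, spec):
    """Build one Set-Cookie header line from a plain value or a spec dict."""
    if not isinstance(spec, dict):
        spec = {"value": spec}
    value = spec.get("value")
    expires = spec.get("expires")
    if value is None:
        # Assumption: unset a cookie by sending an empty value which
        # expired long ago (one second after the Unix epoch).
        value, expires = "", 1
    parts = [f"{key}={quote(str(value), safe='')}"]
    if isinstance(expires, int) and expires > 0:
        parts.append("Expires=" + formatdate(expires, usegmt=True))
    # String-valued attributes are emitted only when present.
    for prop, attr in (("domain", "Domain"), ("path", "Path"),
                       ("sameSite", "SameSite")):
        if spec.get(prop):
            parts.append(f"{attr}={spec[prop]}")
    # Boolean attributes are bare flags.
    if spec.get("secure"):
        parts.append("Secure")
    if spec.get("httpOnly"):
        parts.append("HttpOnly")
    return "Set-Cookie: " + "; ".join(parts)


print(set_cookie_header("one", 1))
print(set_cookie_header("sid", {"value": "a b", "path": "/", "httpOnly": True}))
```

With no expires property the sketch emits no Expires attribute, which is what makes a cookie a session cookie in browsers.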
CGI setHeader(string key, Value)
Queues up the given HTTP response header to be sent via respond(),
overwriting any prior value. It technically accepts any value type,
but your mileage may vary with non-strings/numbers. Returns this
object. As a special case, it allows an arbitrary enum entry as the
first parameter, and the value of that entry will be used as the
header key. The intention of that is to allow script-side code to
define an enum holding common headers, so that they don't have to
worry about the case of the key. Minor achtung: because HTTP headers are
case-insensitive, but this native code is not, this implementation
lower-cases all headers so that clients do not have to be 100% careful
to use consistent casing.
CGI setExpires(integer unixEpochTime)
Sets an "Expires" HTTP header to the specified UNIX timestamp. A negative value sets it for some arbitrarily long time in the future. Each call to this function overwrites any previous call's header value. Returns this object.
string urldecode(string)
Returns the URL-decoded copy of the given string. Note that the API decodes incoming request data automatically, so this is not normally needed. Intended for decoding individual elements at a time, not a whole query string.
string urlencode(string)
Returns the URL-encoded form of the given string. Use this for escaping, e.g. values for use in hyperlinks. It is intended to be passed individual values, not a whole query string (it will URL-escape the query string's special characters!).
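The "individual values, not a whole query string" caveat is easy to demonstrate. The sketch below uses Python's standard `urllib.parse.quote` and `unquote` as stand-ins for urlencode() and urldecode(); that substitution is an assumption, and the module's exact output (e.g. how it encodes spaces) may differ.

```python
from urllib.parse import quote, unquote

# Encode and decode a single value.
value = "fish & chips=1"
encoded = quote(value, safe="")
print(encoded)           # fish%20%26%20chips%3D1
print(unquote(encoded))  # fish & chips=1

# Correct: encode each key and value on its own, then join them with
# the query string's separators.
params = {"q": "fish & chips", "page": "2"}
query = "&".join(
    f"{quote(k, safe='')}={quote(v, safe='')}" for k, v in params.items()
)
print(query)             # q=fish%20%26%20chips&page=2

# Wrong: encoding the finished query string escapes '=' and '&' too,
# and double-encodes the '%' of the already-escaped values.
broken = quote(query, safe="")
print(broken)
```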
Non-function Members
ob
This object holds the output buffering ("ob") API normally bound to
s2.ob. It gets installed here just in case it's not otherwise installed as
s2.ob in a given s2 runtime environment.
request
An Object which holds data from the request, described in the next section.
Request Data (cgi.request)
Data can arrive at the CGI script via several paths, and those data are
encapsulated in the cgi.request Object.
COOKIES
: this object contains any HTTP cookies sent by the client, in the form of key/value pairs. COOKIES is undefined if no cookies were sent.
ENV
: this object only gets installed if init() is explicitly told to (see init() above). Contains all variables from the C-level environ array. Note that ENV values are always strings, even if they look like numbers. Apply unary + to a value to convert it to a number (or 0 if it's not a number). Note that this object may contain security-relevant data (e.g. system-level paths) and should not generally be exposed to remote users via script code.
GET
: undefined if no URL parameters are passed in, otherwise this Object contains any GET parameters passed to the URL. If parameters are named like name[] then the value is an Array and each new instance of that name is appended to that array. It does not support indexes/names inside the brackets, nor nested arrays, however. See ENV (above) for notes regarding the "stringiness" of the values. Though GET parameters are technically always strings, this API tries to convert them to their "apparent" cwal type (number, boolean, etc.). A GET parameter with no '=' is considered to be a flag with a boolean true value.
POST
: The module accepts either raw UTF-8-encoded JSON or form-urlencoded data as POST data (which it internally transforms into a JSON-like construct). That data will then be set here. POST is undefined if no POST data was submitted to the script or parsing it failed (silently). Note that form-urlencoded data are technically always strings, even if they look like numbers, but this module tries to convert them to their "apparent" type (numbers, booleans, etc.). JSON data retains whatever data type(s) the JSON was generated with. Note that it only accepts Arrays or Objects as top-level JSON values (as RFC 4627 specifies) and not arbitrary single values (as some JSON parsers allow). The API optimistically assumes that inbound content with any of the types "application/json", "application/javascript", or "text/plain" might be JSON. Any other type of content which arrives via POST, except for form-urlencoded, is not consumed. Invalid JSON is discarded without any sort of error (downstream code which expects it may fail loudly, of course).
user
: Reserved for client-side use. Script-side frameworks which use user authentication are encouraged to store relevant information about the current user in the request object for easy access throughout scripts.
TODO? HEADERS
: if someone can tell me how to access arbitrary request headers via CGI, I'll add those to the request object. I'm not actually sure we get those at this level, other than the ones which CGI specifies go in the environment? How to get X-MyHeader headers?
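The GET rules above (undefined when empty, name[] collecting into an array, a bare name acting as a boolean flag) can be sketched in a few lines of Python. This is only a model of the documented behavior, not the module's parser: `parse_query` is a hypothetical name, None stands in for s2's undefined, '+' is assumed to decode to a space, and the "apparent type" conversion is deliberately left out, so all values stay decoded strings.

```python
from urllib.parse import unquote_plus


def parse_query(query_string):
    """Parse a query string following the GET rules described above."""
    if not query_string:
        return None  # stands in for s2's undefined
    result = {}
    for pair in query_string.split("&"):
        if not pair:
            continue
        name, sep, value = pair.partition("=")
        name = unquote_plus(name)
        # A parameter with no '=' is a flag with a boolean true value.
        value = unquote_plus(value) if sep else True
        if name.endswith("[]"):
            # name[] appends every occurrence to one list; indexes
            # inside the brackets and nesting are not supported.
            key = name[:-2]
            if not isinstance(result.get(key), list):
                result[key] = []
            result[key].append(value)
        else:
            result[name] = value
    return result or None


print(parse_query("a[]=1%202&a[]=2"))  # {'a': ['1 2', '2']}
print(parse_query("debug&x=1"))        # {'debug': True, 'x': '1'}
```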
Sidebar: Why uppercase names? It's inherited from PHP, which uses
similarly-named globals for this same purpose. It also incidentally
avoids any confusion with the built-in Object.get() method. It's
worth noting that the CGI object removes the prototypes from the
GET, POST, etc. objects, to ensure that clients don't
inadvertently get (as it were) any inherited properties when checking
for request data.
Example Usage
Here is an example, using the interactive shell, which demonstrates the general workflow for a CGI app:
[stephan@host:~/...]$ QUERY_STRING='a[]=1%202&a[]=2' s2sh -v
...
s2sh> const c=s2.loadModule('cgi.so')
s2sh> c.init(); // required as of 201712. Accepts config params.
// ^^^ init() pushes an output buffer onto the stack by default
s2sh> c.setCookie('one',1) // headers get buffered separately
s2sh> c.setContentType('application/json')
s2sh> print(c.request.toJSONString(2)) // goes to the output buffer
s2sh> c.respond()
// ^^^^ sends headers first, then flushes pending buffered content
Status: 200 OK
Content-type: application/json
Set-Cookie: one=1
{"GET": {"a": [
"1 2",
"2"]
}}
The use of the output buffering API is important so that the content
type, cookies, and other headers may be changed throughout the life of
the app. Once any output is sent, headers may no longer be set/changed.
The respond() call will output any headers first and flush any pending
buffered output (any number of buffering levels). respond() must only be
called a single time, and calling it actually shuts down/cleans up the
CGI module. Calling any of its methods after respond() will cause an
exception because they will no longer be able to find their (already
destroyed) C-side data.
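The reason buffering matters can be modeled in a few lines. The following Python toy is not how the module is implemented: the class `BufferedResponse` and its method names are invented for illustration. It queues headers under lower-cased keys, collects body output separately, and only decides the final header set when respond() is called.

```python
import io


class BufferedResponse:
    """Toy model of "queue headers, buffer the body, emit headers first"."""

    def __init__(self):
        self.status = (200, "OK")
        self.headers = {}          # keyed by lower-cased header name
        self.body = io.StringIO()  # stands in for the output buffer stack
        self.responded = False

    def set_header(self, key, value):
        # Lower-casing makes "Content-Type" and "content-type" the same key,
        # so a later call overwrites an earlier one.
        self.headers[key.lower()] = str(value)
        return self

    def write(self, text):
        self.body.write(text)
        return self

    def respond(self):
        if self.responded:
            raise RuntimeError("respond() may only be called once")
        self.responded = True
        lines = [f"Status: {self.status[0]} {self.status[1]}"]
        lines += [f"{key}: {value}" for key, value in self.headers.items()]
        # Headers, a blank line, then everything that was buffered.
        return "\r\n".join(lines) + "\r\n\r\n" + self.body.getvalue()


response = BufferedResponse()
response.write('{"ok": true}')  # body output happens first...
response.set_header("Content-Type", "text/plain")
response.set_header("content-type", "application/json")  # ...headers still change
output = response.respond()
print(output)
```

Without the buffer, the first write would already have gone to the client and the later header changes would arrive too late.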
Footnotes
- [1] th1ish is s2's direct ancestor.