CGI: You know, for s2-powered CGI scripts!

Source code: /dir/s2/mod/cgi?ci=trunk
Test/demo code: /finfo/s2/mod/cgi/test.s2

Jump to:

Intro
- Loading the CGI module
CGI Environment Vars
CGI Methods
Non-function Members
- Request Data (cgi.request)
Example Usage

Intro

The CGI module provides a basic framework for implementing CGI scripts using s2. It's possibly missing a few useful features. What's there works reasonably well, though. It is in active use in several of my own sites to power JSON-centric back-end services.

Example code: this module is the basis of s2cgi, a perpetually-in-development higher-level framework for JSON APIs, implemented in script code:

https://fossil.wanderinghorse.net/r/s2cgi

Several CGIs on my own sites are based on s2cgi.

The CGI loadable module, when its init() method is called, populates script-space with various CGI input data and utilities for working with it. It is intended to be used for implementing CGI applications entirely in s2, and its direct code-predecessor was used to implement CGI demo pages for the libfossil/th1ish¹ bindings. This module provides a basis for writing CGI applications, but not a complete framework. All but the lowest-level work is intended to be done in script code (e.g. routing/dispatching, output buffering management (assisted by the C layer), page rendering, ...).

As of this writing (20181122), the CGI module is used by several back-end scripts on my own website and s2.tmpl() is used to generate most of the static HTML files from various input sources, including an sqlite3 database.

Its main services/features include:

Converts all GET parameters and cookies into a script-usable form.
Makes incoming POSTed JSON or form-urlencoded data available to the script.
Intended to be used with the output buffering API (s2.ob.*), to enable setting of HTTP headers (e.g. cookies) at arbitrary points throughout the page generation process (headers must be output first, but buffering allows headers to be changed throughout the generation of the page).

Loading the CGI module:

const cgi = s2.loadModule('cgi.so');

When using require.s2:

const cgi = s2.require(['s2mod!cgi']).0;
// or:
s2.require( ['s2mod!cgi'], proc(cgi){ … } );

To properly accommodate this module's use both as a DLL and when statically built into an app, its init() method must be called once before any of its other methods are used. Because that method returns the CGI object, it may be called directly when loading the module, e.g.:

const cgi = s2.loadModule('cgi.so').init();
// or:
const cgi = s2.require(['s2mod!cgi']).0.init();

CGI Environment Variables

The following description of the relationship of certain CGI environment variables is taken verbatim from code comments in the Fossil SCM project:

REQUEST_URI, PATH_INFO, and SCRIPT_NAME are related as follows:

REQUEST_URI == SCRIPT_NAME + PATH_INFO

Or if QUERY_STRING is not empty:

REQUEST_URI == SCRIPT_NAME + PATH_INFO + '?' + QUERY_STRING

Where "+" means concatenate.

Sometimes PATH_INFO is missing and SCRIPT_NAME is not a prefix of REQUEST_URI. (See https://fossil-scm.org/forum/forumpost/049e8650ed.)

SCGI typically omits PATH_INFO. CGI sometimes omits REQUEST_URI and PATH_INFO when it is empty.

CGI Parameter quick reference:

                               REQUEST_URI
                       _____________|________________
                      /                              \
https://fossil-scm.org/forum/info/12736b30c072551a?t=c
\___/   \____________/\____/\____________________/ \_/
  |           |          |             |            |
  |       HTTP_HOST      |        PATH_INFO     QUERY_STRING
  |                      |
REQUEST_SCHEMA         SCRIPT_NAME
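
As a rough illustration of the relationship described above, the following untested s2 sketch rebuilds REQUEST_URI from its parts and prints it next to the value the server provided. It assumes the variables are available via cgi.request.ENV (importEnv is enabled here to be safe) and that missing variables are empty strings or undefined. The variable names are illustrative only and, as noted above, the two values will not always match.

```s2
const cgi = s2.loadModule('cgi.so').init({importEnv: true});
const env = cgi.request.ENV;
// REQUEST_URI == SCRIPT_NAME + PATH_INFO [+ '?' + QUERY_STRING]
var rebuilt = env.SCRIPT_NAME + env.PATH_INFO;
if(env.QUERY_STRING){
  rebuilt = rebuilt + '?' + env.QUERY_STRING;
}
cgi.setContentType('text/plain');
cgi << "rebuilt: " << rebuilt << "\n";
cgi << "actual:  " << env.REQUEST_URI << "\n";
cgi.respond();
```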

CGI Methods

This module defines the methods described in this section. The intent is that the CGI object be fleshed out via script code, e.g. with convenience methods for working with the CGI environment's GET/POST data, as well as any app-specific routing functionality. Those sort of features are much simpler to implement and evolve in script code than in C.

integer httpStatus()
CGI httpStatus(integer code [, string message = code-dependent default])

Gets or sets the HTTP response status code. The default, if not explicitly set, is 200 (success). If called with no arguments (the getter), it returns the current status code, i.e. the one set via the setter variant of this function or the default. Otherwise (the setter) it returns this object. When passing a message, it must not contain any newlines, nor should it be "unduly long", else it may corrupt the response header.
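
A minimal, untested sketch of both variants of httpStatus(), assuming the module was loaded as shown earlier. The 404 response and its message text are arbitrary examples, not something the module requires.

```s2
const cgi = s2.loadModule('cgi.so').init();
// Getter: nothing has been set yet, so this is expected to be the default (200).
const initialStatus = cgi.httpStatus();
// Setter: returns the CGI object, so further calls can be chained onto it.
cgi.httpStatus(404, 'Not Found').setContentType('text/plain');
cgi << "No such page (status was " << initialStatus << " before the change).\n";
cgi.respond();
```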

CGI init([object options])

As of 20171201, to accommodate this module being used both as a DLL and statically linked into an app, it must be manually initialized, by calling this function, before any of its other methods are used. Returns this object and throws if called more than once. It optionally takes an object with configuration options for the CGI framework:

pushOb (bool, default=true): If truthy, push an output buffer ("ob") layer. If falsy, the client must ensure that output buffering is enabled before generating any body output (e.g. via s2out or this object's << operator). It is necessary to buffer the body so that HTTP response headers can be sent in the proper order (before the body, but new headers may be created up until the body is output).
importEnv (bool, default=false): If truthy, it imports all environment variables into cgi.request.ENV, otherwise it only imports those which are specified for CGI scripts. Any which have no value are set as empty strings. The core environment vars are:
- AUTH_CONTENT
- AUTH_TYPE
- CONTENT_LENGTH (noting that the content will have already been consumed by the native-level CGI bits).
- CONTENT_TYPE
- DOCUMENT_ROOT
- HTTP_ACCEPT
- HTTP_ACCEPT_ENCODING
- HTTP_COOKIE
- HTTP_HOST
- HTTP_IF_MODIFIED_SINCE
- HTTP_IF_NONE_MATCH
- HTTP_REFERER
- HTTP_SCHEME
- HTTP_USER_AGENT
- HTTPS
- PATH
- PATH_INFO
- QUERY_STRING
- REMOTE_ADDR
- REQUEST_METHOD
- REQUEST_URI
- REMOTE_USER
- SCGI
- SCRIPT_DIRECTORY
- SCRIPT_FILENAME
- SCRIPT_NAME
- SERVER_NAME
- SERVER_PORT
- SERVER_PROTOCOL
- SERVER_SOFTWARE
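
For illustration, an untested sketch of calling init() with both options spelled out. The option names come from the list above; the chosen values and variable names are assumptions made for the example.

```s2
const cgi = s2.loadModule('cgi.so').init({
  pushOb: true,    // the default, shown only for clarity
  importEnv: true  // import every environment variable, not just the CGI ones
});
const env = cgi.request.ENV;
cgi.setContentType('text/plain');
// ENV values are strings; missing core variables are set to empty strings.
cgi << env.REQUEST_METHOD << " " << env.SCRIPT_NAME << env.PATH_INFO << "\n";
cgi.respond();
```

Because cgi.request.ENV may contain security-relevant data, echoing it like this is only appropriate for local debugging.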

CGI operator<<(arg)

Overloads the << operator. It effectively behaves like the s2out keyword, but it also pushes an output buffer level onto the stack if none has yet been initialized. This operator adds no extra spacing or newlines to the output. It returns this object, so calls may be chained.
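
A short, untested sketch of chained output via this operator. Since no spacing or newlines are added, the example includes them explicitly; the HTML content is arbitrary.

```s2
const cgi = s2.loadModule('cgi.so').init();
cgi.setContentType('text/html');
const title = 'Hello';
// Each << returns the CGI object, so operands can be chained.
cgi << "<h1>" << title << "</h1>\n";
cgi << "<p>Answer: " << 42 << "</p>\n";
cgi.respond();
```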

string headerTimestamp([integer unixEpochTime = -1])

For a given Unix epoch time, this returns the corresponding RFC 7231 time string. A negative time value is interpreted as the current time. This time format is often seen in HTTP headers.
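
One plausible use, sketched here without having been run against the module, is producing the value for a Last-Modified header. The timestamp is a made-up example and the header choice is an assumption.

```s2
const cgi = s2.loadModule('cgi.so').init();
// Hypothetical modification time: 2018-11-22 00:00:00 UTC, in Unix epoch seconds.
const lastChange = 1542844800;
cgi.setHeader('Last-Modified', cgi.headerTimestamp(lastChange));
cgi.setContentType('text/plain');
// With no argument (or a negative one) the current time is used.
cgi << "Generated: " << cgi.headerTimestamp() << "\n";
cgi.respond();
```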

void respond([bool exit=false])

Outputs all pending output, submitting the HTTP headers first, followed by any body content living in the output buffering subsystem (it flushes each buffering level, one at a time - all levels pushed by init() or since init() was called). This must only be called one time, and calling it cleans up the module internals. Calling any C-native methods on this object after this will trigger an exception. If it is passed a truthy value, it triggers an exit, such that control never returns to the calling script. After this object answers its request, there's really no more for it to do, in any case, and exiting immediately seems easier for script code than handling waiting-to-unwind client code which is expecting to be able to pop output buffers which are no longer there.

void respondPassthrough(string filename [, bool exit=false])

An alternative to respond() (see above) intended for efficiently serving static files. This function bypasses all output buffering, emits any pending output headers, then streams the contents of the given filename directly to the lowest-level cwal-defined output channel (typically sent to stdout, but s2sh's (-o filename) option can override that). See respond() for details and implications of this function cleaning up the native object, as well as a description of the 2nd parameter.
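
A minimal, untested sketch of serving a static file this way. The file path and content type are hypothetical; only the method names come from this document.

```s2
const cgi = s2.loadModule('cgi.so').init();
const file = '/var/www/static/site.css'; // hypothetical path
cgi.setContentType('text/css');
cgi.setExpires(-1); // negative: some arbitrarily long time in the future
// Headers are emitted first, then the file is streamed, bypassing the buffers.
cgi.respondPassthrough(file, true);
// Not reached: the truthy 2nd argument triggers an exit.
```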

CGI setContentType(string)

Sets the Content-type header to the given value, replacing any previous one. Please use this function instead of setting that header directly so that we can avoid any case-sensitivity problems (the HTTP standard is case-insensitive with regard to the header keys, but this C code is not!). Returns this object. TODO?: define an s2 enum for the set of content type strings, to avoid any case-sensitivity issues.

CGI setCookie(key, value)

Sets the given cookie. Set the value to null to cause it to expire. Setting a cookie multiple times replaces the to-send value each time. By default, all cookies are session cookies, expiring when the browser is closed. Note that this simply queues up a cookie for the respond() method and does not change any visible cookie state: its value will not be available in cgi.request.COOKIES until the client sends it again. Any basic value type is fine as a cookie value, but if the value is a container type then it is expected to have the following structure (all listed properties are optional and extra properties are ignored):

{
  value: mixed, // Use null/undefined to unset the cookie
  domain: string,
  path: string,
  expires: integer, // Unix Epoch time. null/undefined/values<=0 expire the cookie.
  secure: bool,
  httpOnly: bool,
  sameSite: string // https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Set-Cookie/SameSite
}

Those properties all correspond to standard cookie attributes.
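
The following untested sketch shows the three forms described above: a plain value, a container with attributes, and expiring a cookie. The cookie names, the token value, and the expiry time are invented for the example.

```s2
const cgi = s2.loadModule('cgi.so').init();
// Plain value: a session cookie.
cgi.setCookie('theme', 'dark');
// Container value: only the properties listed above are used.
cgi.setCookie('session', {
  value: 'abc123',      // hypothetical token
  path: '/',
  expires: 1893456000,  // 2030-01-01 00:00:00 UTC
  secure: true,
  httpOnly: true,
  sameSite: 'Strict'
});
// A null value causes the cookie to expire.
cgi.setCookie('legacy', null);
cgi.respond();
```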

CGI setHeader(string key, Value)

Queues up the given HTTP response header to be sent via respond(), overwriting any prior value. It technically accepts any value type, but your mileage may vary with non-strings/numbers. Returns this object. As a special case, it allows an arbitrary enum entry as the first parameter, and the value of that entry will be used as the header key. The intention of that is to allow script-side code to define an enum holding common headers, so that they don't have to worry about the case of the key. Minor achtung: because HTTP headers are case-insensitive, but this native code is not, this implementation lower-cases all headers so that clients do not have to be 100% careful to use consistent casing.
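
A brief, untested sketch of the overwrite and lower-casing behavior described above. The header names and values are arbitrary examples.

```s2
const cgi = s2.loadModule('cgi.so').init();
cgi.setHeader('Cache-Control', 'no-store');
// Keys are lower-cased internally, so this replaces the value set above
// rather than queuing a second header.
cgi.setHeader('cache-control', 'max-age=60');
// Numbers are accepted as values, and the call returns the CGI object.
cgi.setHeader('X-Request-Cost', 3).setContentType('text/plain');
cgi << "ok\n";
cgi.respond();
```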

CGI setExpires(integer unixEpochTime)

Sets an "Expires" HTTP header to the specified UNIX timestamp. A negative value sets it for some arbitrarily long time in the future. Each call to this function overwrites any previous call's header value. Returns this object.

string urldecode(string)

Returns the URL-decoded copy of the given string. Note that the API decodes incoming request data automatically, so this is not normally needed. Intended for decoding individual elements at a time, not a whole query string.

string urlencode(string)

Returns the URL-encoded form of the given string. Use this for escaping, e.g. values for use in hyperlinks. It is intended to be passed individual values, not a whole query string (it will URL-escape the query string's special characters!).
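
An untested sketch of encoding a single value for use in a link, as opposed to encoding the whole query string. The /search path and the q parameter are hypothetical.

```s2
const cgi = s2.loadModule('cgi.so').init();
cgi.setContentType('text/html');
const term = 'fish & chips';
// Encode only the value: the '?' and '=' separators must stay as-is.
const href = '/search?q=' + cgi.urlencode(term);
cgi << '<a href="' << href << '">search</a>' << "\n";
cgi.respond();
```

Passing the complete string '/search?q=fish & chips' to urlencode() would also escape the '?' and '=', which is why the value is encoded on its own.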

Non-function Members

ob

This object holds the output buffering ("ob") API normally bound to s2.ob. It gets installed here just in case it's not otherwise installed as s2.ob in a given s2 runtime environment.

request

An Object which holds data from the request, described in the next section.

Request Data (cgi.request)

Data can arrive to the CGI script via several paths, and those data are encapsulated in the cgi.request Object.

COOKIES: this object contains any HTTP cookies sent by the client, in the form of key/value pairs. COOKIES is undefined if no cookies were sent.
ENV: this object only gets installed if init() is explicitly told to (see init() above). Contains all variables from the C-level environ array. Note that ENV values are always strings, even if they look like numbers. Apply unary + to a numeric-looking value to convert it to a number (the result is 0 if the value is not a number). Note that this object may contain security-relevant data (e.g. system-level paths) and should not generally be exposed to remote users via script code.
GET: undefined if no URL parameters are passed in, otherwise this Object contains any GET parameters passed to the URL. If parameters are named like name[] then the value is an Array and each new instance of that name is appended to that array. It does not support indexes/names inside the brackets, nor nested arrays, however. See ENV (above) for notes regarding the "stringiness" of the values. Though GET parameters are technically always strings, this API tries to convert them to their "apparent" cwal type (number, boolean, etc.). A GET parameter with no '=' is considered to be a flag with a boolean true value.
POST: The module accepts either raw UTF-8-encoded JSON or form-urlencoded data as POST data (which it internally transforms into a JSON-like construct). That data will then be set here. POST is undefined if no POST data was submitted to the script or parsing it failed (silently). Note that form-urlencoded data are technically always strings, even if they look like numbers, but this module tries to convert them to their "apparent" type (numbers, booleans, etc.). JSON data retains whatever data type(s) the JSON was generated with. Note that it only accepts Arrays or Objects as top-level JSON values (as RFC 4627 specifies) and not arbitrary single values (as some JSON parsers allow). The API optimistically assumes that inbound content with any of the types "application/json", "application/javascript", or "text/plain" might be JSON. Any other type of content which arrives via POST, except for form-urlencoded, is not consumed. Invalid JSON is discarded without any sort of error (downstream code which expects it may fail loudly, of course).
user: Reserved for client-side use. Script-side frameworks which use user authentication are encouraged to store relevant information about the current user in the request object for easy access throughout scripts.
TODO? HEADERS: if someone can tell me how to access arbitrary request headers via CGI, i'll add those to the request object. i'm not actually sure we get those at this level, other than the ones which CGI specifies go in the environment? How to get X-MyHeader headers?

Sidebar: Why uppercase names? It's inherited from PHP, which uses similarly-named globals for this same purpose. It also incidentally avoids any confusion with the built-in Object.get() method. It's worth noting that the CGI object removes the prototypes from the GET, POST, etc. objects, to ensure that clients don't inadvertently get (as it were) any inherited properties when checking for request data.
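
To make the above concrete, here is an untested sketch of reading request data defensively. It assumes a request such as QUERY_STRING='tag[]=a&tag[]=b&verbose'; the parameter names and the POSTed name property are invented. Because GET, POST, and COOKIES are undefined when empty and have no prototypes, the sketch checks each one first and uses only plain property access on them.

```s2
const cgi = s2.loadModule('cgi.so').init();
const req = cgi.request;
cgi.setContentType('text/plain');
if(req.GET){
  // A parameter with no '=' is a flag with the value true.
  if(req.GET.verbose){
    cgi << "verbose flag set\n";
  }
  // tag[]=a&tag[]=b arrives as an array under the key 'tag'.
  const tags = req.GET.tag;
  if(typeinfo(isarray tags)){
    foreach(tags => tag){
      cgi << "tag: " << tag << "\n";
    }
  }
}
if(req.POST){
  // POST holds parsed JSON or form-urlencoded data.
  cgi << "posted name: " << req.POST.name << "\n";
}
if(req.COOKIES){
  cgi << "cookies were sent\n";
}
cgi.respond();
```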

Example Usage

Here is an example, using the interactive shell, which demonstrates the general workflow for a CGI app:

[stephan@host:~/...]$ QUERY_STRING='a[]=1%202&a[]=2' s2sh -v
...
s2sh> const c=s2.loadModule('cgi.so')
s2sh> c.init(); // required as of 201712. Accepts config params.
    // ^^^ init() pushes an output buffer onto the stack by default
s2sh> c.setCookie('one',1) // headers get buffered separately
s2sh> c.setContentType('application/json')
s2sh> print(c.request.toJSONString(2)) // goes to the output buffer
s2sh> c.respond()
    // ^^^^ sends headers first, then flushes pending buffered content
Status: 200 OK
Content-type: application/json
Set-Cookie: one=1

{"GET": {"a": [
      "1 2",
      "2"]
    }}

The use of the output buffering API is important so that the content type, cookies, and other headers may be changed throughout the life of the app. Once any output is sent, headers may no longer be set/changed. The respond() call will output any headers first and flush any pending buffered output (any number of buffering levels). respond() must only be called a single time, and calling it actually shuts down/cleans up the CGI module. Calling any of its methods after respond() will cause an exception because they will no longer be able to find their (already destroyed) C-side data.

Footnotes

¹ th1ish is s2's direct ancestor.
