libfossil-forum

RFC: require C99?
Login

RFC: require C99?

RFC: require C99?

(1) By Stephan Beal (stephan) on 2021-02-12 05:55:46 [source]

libf has, since its inception, been purely C89, with only tiny exceptions:

  • It requires long long because sqlite3 uses it, which isn't strictly C89 but is supported by all platforms.

  • It optionally supports C99 for its <stdint.h> and <inttypes.h>, which define fixed-size integers and portable printf-style format flags for them.

When libf was started, MSVC did not support stdint/inttypes (not until 2015, i'm told), so C89 was a no-brainer.

Is there a compelling reason to switch to C99 as a minimum platform? The relatively minor benefits i can imagine from it:

  • Remove fsl_int32_t, fsl_int64_t, and use the standard variants instead.
  • ... nothing else comes to mind, really.

Opinions?

It's always easy to say, "C89 is ancient history! Upgrade!" but, IMHO, the newer versions of the language have yet to make any truly compelling/can't-life-without additions other than portable fixed-size integers and standard macros for their printf-style format flags. (C89 compilers can use those by simply including the C99 headers - there's no language-level change needed there.)

"If it ain't broke, don't fix it," but i'm open to other opinions on the topic. In short: is C89 holding us back in any way?

(2) By sdr on 2021-02-12 06:49:37 in reply to 1 [link] [source]

As a C library, I think it should be left at ANSI C for the same reason SQLite does: Make it compatible with as many platforms as possible. Someone could create a tiny fossil compatible tool that would work in an environment that fossil itself might not work in.

Any pre-build configure step can detect if "stdint.h" and/or "inttypes.h" is available, conditionally including them if they exist and defining the same type names as the C99 standard if the headers do not exist. This allows libfossil to have the benefit of the shorter standard names when they are guaranteed to exist, and to use the same names when they are not reserved by the system (as they are not reserved identifiers in C89).

There are other reasons to stick with fsl_* prefixed identifiers if you don't like that solution, as it is less likely to conflict with other software that might try to do the same thing in an incompatible way.

The other issue with C99 (which libfossil will have to take into consideration) is that people will compile it with toolchains that claim C89 compliance, but use the same standard library headers with C99 or later. Those headers often have platform specific features to detect C99 nonstandard or undefined issues that simply weren't specified in C89. For example, if my memory holds, "memcpy(0,0,0);" wasn't undefined in C89, but is undefined in C99 due to null pointers being passed to memcpy, even though the specified size means the pointers need never be dereferenced.

So it really should IMO stick to C89 unless there is a compelling reason (64 bit integers) and it should use the more restrictive definition for standard functions that are still basically compatible.

It could mean something as simple as having a fsl_memcpy() (for example) that does the null pointer checking and skips the memcpy() call if the size is 0 to avoid passing null pointers to the function. Or just ensuring one never passes null pointers to those functions when that is perfectly fine.

Mind you, I personally prefer to use the latest standard C++ version available on a platform for my own projects, but I'm generally not trying to write such portable tools, either.

(3) By Stephan Beal (stephan) on 2021-02-12 07:26:56 in reply to 2 [link] [source]

Any pre-build configure step can detect if "stdint.h" and/or "inttypes.h" is available, conditionally including them if they exist and defining the same type names as the C99 standard if the headers do not exist. This allows libfossil to have the benefit of the shorter standard names when they are guaranteed to exist, and to use the same names when they are not reserved by the system (as they are not reserved identifiers in C89).

It does that, but it still has to create project-local typedefs for those so that the project-level code doesn't have to know whether we have the C99 ints or not:

https://fossil.wanderinghorse.net/r/libfossil/file?ci=tip&name=include%2Ffossil-scm%2Ffossil-config.h&ln=73-120

So it really should IMO stick to C89 unless there is a compelling reason (64 bit integers) and it should use the more restrictive definition for standard functions that are still basically compatible.

We necessarily inherit a portable (though not strictly C89-standard) int64 from sqlite, in the form of long long. When compiling with -std=c89 -Wall -pedantic, it will complain about long long unless we disable that warning (which we necessarily do, as we also build with -Werror). Thus we don't need C99 for an int64, but having C99 would give us the well-defined int64_t.

It's six one way and half a dozen the other, i guess.

It could mean something as simple as having a fsl_memcpy() (for example) that does the null pointer checking and skips the memcpy() call if the size is 0 to avoid passing null pointers to the function.

That level of defined/undefined doesn't bother me - i've never seen anyone pass NULL to memcpy(). We do, however, implement fsl_strlen() and similar util functions for similar concerns.

(4.1) By sdr on 2021-02-12 09:44:07 edited from 4.0 in reply to 3 [link] [source]

It does that, but it still has to create project-local typedefs for those so that the project-level code doesn't have to know whether we have the C99 ints or not:

What I meant was it is perfectly legal in C89 to write:

typedef unsigned char uint8_t;

Because C89 doesn't reserve that identifier. Thus one could write:

#ifdef HAVE_STDINT_H
#include <stdint.h>
#else
typedef unsigned char uint8_t;
/* etc */
#endif

The idea being if stdint.h isn't available, then the name isn't reserved. But it can still conflict with another incompatible use of the same symbol. That is not a consideration for an application that controls what libraries it uses, but it is a consideration for a library that would like to be useful when combined with yet more libraries.

So ... yeah. libfossil could decide to define those types in C89 with the reserved names from C99, or use the C99 names when stdint.h is available, but the names would be the same. But defining those names in C89 potentially conflicts with other libraries that attempt to do the same thing because they all exist in the same namespace.

The machinations required to accomplish it might not be desirable. It is probably best to use fsl_intx types, at least in headers. In a C file you have more flexibility because you control what you include.

In typing that example text, however, it does occur to me that there are two additional benefits to C99.

  1. Variables can be declared anywhere a statement can go. Some people feel strongly that all variables belong at the beginning of a scope. I personally like putting them as close to first use as possible. Both are appropriate at different times.

  2. // style comments

I'm not saying those two things are sufficient to go C99. I agree with you that C99 is a marginal improvement to C in just a few places. Things like variable length arrays were a mistake I think. They are useful but just don't feel C-like. But then as I said, I like C++. :)

(5.1) By sdr on 2021-02-12 09:58:11 edited from 5.0 in reply to 4.1 [link] [source]

OR!

/* fsl_stdint.h */

#ifdef HAVE_STDINT_H
#include <stdint.h>
typedef uint8_t fsl_uint8_t;
/* etc */
#else
typedef unsigned char fsl_uint8_t;
/* etc */
#endif

#ifdef LIBFOSSIL_INTERNAL
#define uint8_t fsl_uint8_t
/* etc */
#endif

Something like this could allow one to have fsl_uintX_t types for the public interface, but "switch" to the shorter names for internal use of the library.

It may not be worth it, and it may have a standard compliance issue with some or all versions of the C standard (not using the preprocessor to redefine symbols reserved by the standard), just something that popped into my head as I lay here not falling asleep.

(6) By Stephan Beal (stephan) on 2021-02-12 12:48:51 in reply to 4.1 [link] [source]

So ... yeah. libfossil could decide to define those types in C89 with the reserved names from C99, or use the C99 names when stdint.h is available, but the names would be the same. But defining those names in C89 potentially conflicts with other libraries that attempt to do the same thing because they all exist in the same namespace.

Interestingly... i don't know if this applies to C, but it does (or once did) to C++: the suffix _t is supposedly reserved for use by standard library implementors. i think (but am not certain) that that's a C++ thing. i use xyz_t all the time and it hasn't yet slapped me, but having read that somewhere always sticks in the back of my mind.

Variables can be declared anywhere a statement can go.

Once you get used to the C89 way, it's not all that onerous. i'm not strongly for one or the other, and generally prefer the "declare at the latest possible point" approach, but (OTOH) often find myself overlooking decls when they're not all grouped together. That's not a strong preference, though. i see it as "slight syntactic sugar" rather than a reason to switch.

// style comments

Those i do miss. Often, actually. Also minor syntactic sugar, but it is annoying to have to wrap a line in /* ... */ just to comment it out.

i have no idea, really, what my hang-up about updating is. If a single platform, like pre-2015 MSVC, can't be bothered to update then screw 'em.

Hmmm.

(7) By sdr on 2021-02-12 16:56:00 in reply to 6 [link] [source]

Interestingly... [I] don't know if this applies to C, but it does (or once did) to C++: the suffix _t is supposedly reserved for use by standard library implementors. [I] think (but am not certain) that that's a C++ thing. [I] use xyz_t all the time and it hasn't yet slapped me, but having read that somewhere always sticks in the back of my mind.

C++ inherited that from C, I believe, though I don't have standards at hand to "prove" it. Gnu credits C and posix: https://www.gnu.org/software/libc/manual/html_node/Reserved-Names.html

So if one really wants to be as standard conforming as possible, _t suffixed identifiers (whether types or not) are to be avoided. So forget what I said which works in every environment I've tried but is not "portable" for some definition. Really anything with a leading underscore or any double underscore (at least for C++).

I find cppreference.com a great resource for both C and C++. It isn't "the standard" but it attempts to reference them and show when things were introduced or deprecated or removed. In particular: https://en.cppreference.com/w/c/language/identifier.

(8) By Stephan Beal (stephan) on 2021-02-12 21:19:54 in reply to 1 [link] [source]

As proof that the internet works...

Opinions?

After a brief off-list exchange with Warren, specifically about the topic covered here:

https://fossil-scm.org/forum/forumpost/bc132b9cb8

he's convinced me (without trying) that C99 is the way to go. In short, libf makes tons of use of variadics with custom format specifiers, and C99's behavior with them is simply better-defined. i will sleep far better knowing that i don't have to cast every (non-literal) variadic argument in order to get well-defined behaviour.

i'll go ahead and add a check for C99 in auto.def (configure script) and remove the "if we don't have C99" options from the config header.

So... feel free to use any C99 bits you like now, including //-one-line-comments.

(9) By Stephan Beal (stephan) on 2021-02-12 21:32:25 in reply to 8 [link] [source]

i'll go ahead and add a check for C99 in auto.def (configure script) and remove the "if we don't have C99" options from the config header.

i'll also eliminate the fsl_intXX_t and fsl_uintXX_t types, but will retain fsl_int_t and fsl_uint_t because they have very specific meanings which are more about documentation and less about whether they are 32- or 64-bitmisref. That said, those might also get renamed to replace the _t suffix with some as-yet-undefined suffixed, as per the upthread discussion.

misref see include/fossil-scm/fossil-config.h for the docs which explain that.


  1. ^ a b Misreference

(10) By sdr on 2021-02-13 01:09:44 in reply to 9 [link] [source]

I feel like the platform size_t is the right unsigned type for fsl_uint_t on a brief reading of the referenced file.

There isn't a signed equivalent in standard C. There is a ssize_t defined in posix that is supposed to be the same bitwidth as size_t but can accommodate at least a -1 for an invalid size value. I've rarely if ever wanted it unless I was using a specific posix function that used it.

I think it could definitely be an "implementation detail" in the C code if needed that would not need to be exposed in the interface layer, where size_t should be able to handle any possible use of fsl_uint_t.

Except for the possibility of greater than 4GB. If it might be used to indicate an artifact size or a file size, then size_t could be inadequate.

Or if there is a desire to port to a platform smaller than 32 bit, but that seems unlikely.

(11) By Stephan Beal (stephan) on 2021-02-13 03:34:37 in reply to 10 [link] [source]

I feel like the platform size_t is the right unsigned type for fsl_uint_t on a brief reading of the referenced file.

i'm not even sure that that one is actually used, but fsl_size_t (which i forgot to rename to size_t but will do that) is. fsl_int_t (signed) is used primarily for documentation purposes, rather than technical ones. We have so many different integer semantics that i find it useful to use different typedefs to distinguish between them at times: intXX_t vs int vs fsl_int_t vs fsl_id_t.

Similarly, the API uses fsl_uuid_str and fsl_uuid_cstr to specifically mean C-style strings/const strings which are required (or, depending on the context, guaranteed) to be full UUIDs, as opposed to hash prefixes or tag names or some such. That makes the API easier for me to read. So never try to pass a hash prefix to a routine which is declared as taking a fsl_uuid_cstr, as that one requires a full SHA1/SHA3 hash instead.

Except for the possibility of greater than 4GB. If it might be used to indicate an artifact size or a file size, then size_t could be inadequate.

Fossil's actually limited to 32-bit signed blob sizes, so 2GB. It's not currently possible to get a blob bigger than that into fossil, and applying a delta to such a blob would be prohibitively RAM-expensive on many systems (e.g. shared hosters or an SBC system), requiring approx. 4GB plus the size of the delta in RAM concurrently.

The only thing fossil really needs 64 bits for in the public interface is statistics, e.g. the dbstat command. It hypothetically needs them for record IDs, but i don't think that it's even realistically possible for a repo to get 2+ billion blob entries and overflow the 32-bit signed int which fossil currently uses for them in the C code (whereas the db supports 64-bit row IDs).

Or if there is a desire to port to a platform smaller than 32 bit, but that seems unlikely.

The design initially (very briefly) aimed for that but, AFAIK, there's no 16-bit platform which can run sqlite.