jckarter

everyone already knows i'm a dog

the swift programming language is my fault to some degree. mostly here to see dogs, shitpost, fix old computers, and/or talk about math and weird computer programming things. for effortposts check the #longpost pinned tag. asks are open.


email
mailto:joe@duriansoftware.com

This book I got in a pile of FM TOWNS books turns out to be a lot more interesting than I was expecting an '80s C compiler manual to be. For as long as C and its relatives have been in mainstream use, it has been necessary to use vendor language extensions to actually get anything done with them, though in today's GCC/Clang/MSVC oligopoly those extensions tend to be focused on the yak-shaving details of dealing with the underlying platform. Things were much more interesting in the 80s, when there were a lot more, smaller companies competing for adoption. Phar Lap wrote one of the first DOS extenders that allowed programs to take full advantage of the 32-bit 80386 processor from the otherwise 16-bit-bound MS-DOS environment, and they hired MetaWare to port their High C Compiler to their SDK.

Fujitsu in turn chose Phar Lap's DOS extender to integrate into the OS for their 80386-based FM TOWNS platform, and High C became the first-party C compiler for the platform. The FM TOWNS came out in 1989, the same year the first ANSI C standard, C89, was ratified. High C has its share of DOS-specific extensions, but it also contains a lot of interesting user-oriented language extensions I haven't seen in other C compilers I've used, ranging from small quality-of-life improvements to fairly advanced features you wouldn't think would be possible in C, let alone a late-80s dialect of C! Some of these things would take literal decades to make it into an official standard of C or C++, and some of them still don't have equivalents in either language today. Here are some of the extensions I found interesting:


Underscores in numeric literals

manual page explaining that _ can be placed in numeric literals, like 1_000_000 for 1000000

It's a little thing, but it always bothers me when a programming language doesn't let you write long numeric literals with separators to make them readable. Plenty of other languages have had this for ages, but C++ didn't get anything like it until C++14, which uses the single quote as a separator (1'000'000) instead of an underscore, and C only followed suit earlier this year with C23.

Labeled arguments

manual page showing the use of labeled arguments. after declaring void P(int A, float B, Color C, Color D);, you can call it with named arguments as P(C => Red, D => Blue, B => X*10.0, A => y);

When calling functions with lots of parameters, or with parameters of nondescriptive types like bool, it's extremely helpful to be able to label the arguments in the call site. This is one of Python's most popular features, and High C's variant works a lot like Python. Argument labels are optional, but when they're present, you can specify the arguments in any order, using argumentName => value syntax, and you can combine unlabeled and labeled arguments arbitrarily as long as every parameter to the function has one matching argument. Neither standard C nor C++ has this feature yet.
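The closest thing standard C has is passing a struct built from a compound literal with designated initializers (C99). Here's a minimal sketch, assuming a hypothetical P_args struct and P_impl function that mirror the P from the manual page; the P macro just wraps its arguments in a compound literal so fields can be named in any order. One difference from High C: an omitted field silently becomes zero instead of being an error.

```c
typedef enum { Red, Green, Blue } Color;

/* Hypothetical parameter bundle mirroring void P(int A, float B, Color C, Color D). */
struct P_args {
    int A;
    float B;
    Color C;
    Color D;
};

/* The real function takes one struct argument; it returns a checksum
   of the fields just so the call is observable. */
static int P_impl(struct P_args args) {
    return args.A + (int)args.B + (int)args.C + (int)args.D;
}

/* P(.name = value, ...) expands to P_impl((struct P_args){ ... }), so
   designated and positional arguments can be mixed, and any field that
   isn't mentioned is zero-initialized. */
#define P(...) P_impl((struct P_args){ __VA_ARGS__ })
```

So P(.C = Red, .D = Blue, .B = 2 * 10.0f, .A = 3) reads a lot like the High C call, and P(1, .D = Blue) mixes positional and labeled arguments, though C fills positional fields in declaration order after the last designator rather than matching High C's rules exactly.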

Case ranges

manual page screenshot showing the use of case ranges case 'A'..'Z': to match all ASCII uppercase letters

Pascal lets you match a range of values with case low..high; wouldn't it be great if C had that feature? High C does; it's another feature standard C and C++ have never adopted.
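Without range labels, standard C code usually checks the range before falling into a switch (GCC and Clang do accept their own extension spelled case 'A' ... 'Z':). A minimal sketch with a hypothetical classify function; note that the 'A'..'Z' test assumes ASCII-style contiguous letters, while C does guarantee that '0'..'9' are contiguous.

```c
enum char_class { CC_UPPER, CC_DIGIT, CC_SPACE, CC_OTHER };

static enum char_class classify(char c) {
    /* High C would write this as  case 'A'..'Z':  inside the switch.
       Assumes the letters are contiguous, as in ASCII. */
    if (c >= 'A' && c <= 'Z')
        return CC_UPPER;
    /* Digits are contiguous in every C execution character set. */
    if (c >= '0' && c <= '9')
        return CC_DIGIT;
    switch (c) {
    case ' ': case '\t': case '\n':   /* single values still need one label each */
        return CC_SPACE;
    default:
        return CC_OTHER;
    }
}
```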

Nested functions

manual page screenshot showing the use of nested functions, including the void Callme()! type syntax for declaring the "full function value" type, and the ability to goto from nested functions into the parent function

The previous features were just very nice to have, but here we get into features that start greatly increasing the expressivity of the language. High C lets you nest functions inside of other functions, another borrow from Pascal. However, High C's implementation is much more interesting and complete than standard Pascal's nested functions or GCC's nested function extension. Not only can you declare nested functions, but you can also declare "full function value" types. Unlike traditional C function pointers, these work as nonescaping closures, carrying a context pointer in addition to the function pointer so the nested function can find its captured context again. (GCC infamously did horrible things to let nested functions be referenced by normal function pointers, writing executable code onto the call stack to thunk the context pointer, an obvious security nightmare that led many platforms to disable the feature entirely.) This allows local function references to be used as first-class values, though their lifetime doesn't extend past the return of the surrounding function. Nested functions can even goto back into their parent function, so, like Smalltalk blocks, they support nonlocal exits, which makes it possible to build control-flow-like functions out of them.
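To make the "function pointer plus context pointer" idea concrete, here's a rough standard-C approximation. It's a sketch, not how High C actually compiles things: int_visitor, for_each, and the other names are hypothetical, the "captured" locals are moved into a struct by hand, and since standard C has no goto between functions, the nonlocal exit is swapped for setjmp/longjmp.

```c
#include <setjmp.h>
#include <stddef.h>

/* Hypothetical stand-in for a "full function value": code pointer + context pointer. */
typedef struct {
    void (*fn)(void *ctx, int value);
    void *ctx;
} int_visitor;

/* A higher-order function that calls the closure for each element. */
static void for_each(const int *xs, size_t n, int_visitor v) {
    for (size_t i = 0; i < n; i++)
        v.fn(v.ctx, xs[i]);
}

/* The "nested function" body; its captured state lives in the parent's frame. */
struct sum_ctx { int total; };

static void add_value(void *ctx, int value) {
    ((struct sum_ctx *)ctx)->total += value;
}

static int sum_all(const int *xs, size_t n) {
    struct sum_ctx c = { 0 };
    for_each(xs, n, (int_visitor){ add_value, &c });
    return c.total;   /* nonescaping: &c is only valid until we return */
}

/* Nonlocal exit: High C can goto from the nested function into the parent;
   here longjmp abandons for_each's frame instead. found is volatile because
   it changes between setjmp and longjmp. */
struct find_ctx { jmp_buf escape; volatile int found; };

static void stop_at_negative(void *ctx, int value) {
    struct find_ctx *c = ctx;
    if (value < 0) {
        c->found = value;
        longjmp(c->escape, 1);
    }
}

static int first_negative(const int *xs, size_t n) {
    struct find_ctx c;
    c.found = 0;
    if (setjmp(c.escape) == 0)
        for_each(xs, n, (int_visitor){ stop_at_negative, &c });
    return c.found;   /* 0 if no negative value was seen */
}
```

The by-hand context struct is exactly the bookkeeping High C's nested functions do for you, and the volatile/setjmp dance shows why a compiler-supported goto is a lot nicer.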

Objective-C got blocks in 2009, which can be used as escaping closures, and C++ got lambdas in 2011, but neither language got the nonlocal exit ability. Standard C still has no official nested function feature.

Generators

manual page demonstrating the generator and yield syntax, along with the for loop syntax to consume it

MetaWare was clearly proud of this since they dedicate a whole chapter to explaining it. All the way back in 1989, they supported Python-style generator coroutines! In plain C! A function declared with the syntax void foo(Arg arguments) -> (Yield yields) can call the magic function yield(values...) multiple times to generate a sequence of values. Callers can then use a new for loop syntax for variable... <- foo(arguments...) do { ... } to run a loop over each of the generator's yielded values in turn.

manual page showing an example of a recursive local function call traversing a tree, and yield-ing to the outer generator function

The implementation even allows for some pretty intricate interactions with the nested function feature. A function nested in a generator can capture the yield operation from the outer generator, and the nested function can call itself recursively to traverse a tree or other recursive data structure, yield-ing at each level to produce values for the generator. I don't think you can do that in Python or in many other mainstream languages with generator coroutines.
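Here's roughly what that tree example looks like once the yield closure is passed explicitly, as a sketch in standard C. The node layout, int_yield, walk, inorder, and the collecting consumer are all hypothetical; in High C, walk would be a nested function that captures yield instead of taking it as a parameter.

```c
#include <stddef.h>

struct node {
    int value;
    struct node *left, *right;
};

/* Hypothetical yield closure: code pointer + context pointer. */
typedef struct {
    void (*fn)(void *ctx, int value);
    void *ctx;
} int_yield;

/* Plays the role of the recursive nested function: it calls itself at each
   level and hands every value to the generator's yield. */
static void walk(const struct node *n, int_yield yield) {
    if (n == NULL)
        return;
    walk(n->left, yield);
    yield.fn(yield.ctx, n->value);   /* in-order: left, self, right */
    walk(n->right, yield);
}

/* Generator entry point, roughly  void inorder(struct node *root) -> (int value) */
static void inorder(const struct node *root, int_yield yield) {
    walk(root, yield);
}

/* A consumer that collects yielded values into a caller-provided array. */
struct collect { int *out; size_t len; };

static void collect_one(void *ctx, int value) {
    struct collect *c = ctx;
    c->out[c->len++] = value;
}
```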

manual page demonstrating the desugaring of generators and for loops into nested functions

How does all this work in plain C without a fancy runtime? High C's generators act as relatively straightforward syntax sugar over the nested function feature. When you declare a generator function void foo(Arg arguments) -> (Yield yields), that's equivalent to declaring a normal function void foo(void yield(Yield yields)!, Arg arguments), where yield is an implicit parameter of "full function value" type. Using yield(values) inside the generator body is a regular function call into that implicit function parameter. On the caller's side, a for loop's body is transformed into a nested function, which is passed as the yield argument to the generator. Simple, yet effective. Since nested functions allow for nonlocal exits, a break, continue, or goto out of the for loop body also works, by doing a goto to the appropriate place.
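A hand-desugared version of a small generator and its for loop might look like the sketch below. The upto generator and the loop body are hypothetical examples, and there's one deliberate substitution: instead of High C's nonlocal goto for break, this version has yield return a bool that tells the generator whether to keep going, which is the usual internal-iteration trick in standard C.

```c
#include <stdbool.h>

/* Desugared form of a hypothetical generator
       void upto(int n) -> (int i)
   The implicit yield parameter becomes an explicit closure argument.
   Assumption: yield returns false to request a stop (High C uses a goto). */
typedef struct {
    bool (*fn)(void *ctx, int i);
    void *ctx;
} int_yield;

static void upto(int n, int_yield yield) {
    for (int i = 0; i < n; i++)
        if (!yield.fn(yield.ctx, i))
            return;
}

/* Caller side, roughly:
       for i <- upto(100) do { if (i*i > limit) break; sum += i; }
   The loop body becomes a function over the captured locals. */
struct body_ctx { int sum; int limit; };

static bool loop_body(void *ctx, int i) {
    struct body_ctx *c = ctx;
    if (i * i > c->limit)
        return false;   /* "break" */
    c->sum += i;
    return true;        /* reaching the end of the body = next iteration */
}

static int sum_while_square_at_most(int limit) {
    struct body_ctx c = { 0, limit };
    upto(100, (int_yield){ loop_body, &c });
    return c.sum;
}
```

For example, sum_while_square_at_most(10) adds 0 + 1 + 2 + 3 and stops at 4, since 16 > 10.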

It's unlikely that standard C would ever attempt to integrate a feature like this. C++20 now has an extremely flexible and complicated coroutine feature, based on compile-time coroutine transformations, and you can probably implement generators using it, though the result probably wouldn't compose as straightforwardly with local functions.



in reply to @jckarter's post:

i know stroustrup has been on record saying that "functions shouldn't have enough parameters to need labeled arguments because that means they're too complicated" and, lol you made C++. but there are practical issues with just using the declared parameter names as labels in C and C++, since that makes those parameter names part of your API. functions tend to get redeclared all over the place in various headers, and then the function definition is separate from the declaration, and in cases like that, it isn't clear whose parameter names would win. it really seems like a feature that needs to be part of the language from inception, like in Python or Smalltalk, so that APIs are designed with it in mind

ah yeah, that would be a big compatibility problem, and trying to rework headers to stop doing that would break way too many things.

i guess options structs are the way to go then. really glad that c++20 finally got designated initializers

yeah one of the examples in the generator chapter is basically "look at this two-page-long #define you'd have to write to loop over a tree in standard C, now let's write it as a regular function using generators"