Simple and terse C object-oriented programming for single inheritance designs

By Joel K. Pettersson. Added 2022-01-03. Updated 2022-04-28.

Loosely based on a 2012 article of mine, which I had on an older personal website in the early 2010s. Extended based on both an old project and a fresh look at it.

In 2013, I put the fanciest implementation of these ideas I had into a C library named 'SCOOP' for this kind of simple C OOP.

In plain C, object-oriented type inheritance (of C struct members), functionally similar to how it works for C++ classes, can be implemented using the preprocessor – with inherited members of a superclass becoming members of the derived class, accessed with the same syntax as members unique to that subclass – as long as only single inheritance is used. As with many other tricks, though, there are drawbacks and things to keep in mind.

With some additional elaboration on top of the basic approach, mainly adding information about each class, a virtual function table, and some code to handle basic tasks for all classes, this preprocessor-centered approach makes for the most concise way of writing single inheritance "C with classes" in plain C (rather than C++) that I've seen so far. It's not likely to be useful to copy this approach as-is for a project, though. I'd recommend looking at the ideas it contains and picking only what fits and properly works for the intended uses.

Object-oriented design in C is usually best done differently, when it is to be done at all. For a more thorough and conventional look at the virtual functions part of the picture, I'd recommend reading Christopher Wellons's article.

Aggregation and prefixing

First, however, a look at the more common alternative to using the preprocessor: including an instance of the supertype at the very beginning of its subtype. This accomplishes two things:

  1. The ability to access inherited members, albeit through an added prefix – the name of the included instance of the supertype.
  2. Type-cast compatibility from type to supertype and back.

A very simple example showing the first point is below.

typedef struct TypeA {
        int x;
} TypeA;

typedef struct TypeB {
        TypeA a;
        int y;
} TypeB;

TypeB *b;
...
b->a.x = 10; /* access x through prefix */
b->y = 5;

There is nothing wrong with this approach, and in some ways it remains "the cleanest" solution. Alternatives which get rid of such an access prefix for inherited members have their own quirks and subtle uglinesses. However, whenever more than one level of type inheritance is done, it becomes rather nice to get rid of those prefixes which otherwise add up.

It makes sense to stick to aggregation when simple aggregation rather than an object-oriented hierarchy is wanted, though. Aggregation can also be combined with the below single inheritance approach for a touch of multiple inheritance, though I think that's only good design when the aggregation is a natural fit rather than a complication added to work around a lack of multiple inheritance.

Defining with shared macros

In various C codebases, there's one or another set of types which have a little something in common – such as a pointer or two – while there isn't really more in the way of any object-oriented hierarchy. For that purpose, it works very well to list the members common to the types in a macro definition and then use it at the beginning of each struct. For example...

#define LIST_MEMBERS \
        struct List *next, *prev;

typedef struct List { LIST_MEMBERS } List;

typedef struct TypeWithList {
        LIST_MEMBERS
        int value;
} TypeWithList;

Each type with the members of List included at the beginning can then be used by functions or macros for List. As long as the members included in this way are simply pointers, or of the same size as pointers, there's little to worry about.

This trick can however be used to include data of smaller sizes as well, in ways which do not line up with how struct sizes are padded at the end (with inserted extra bytes) by the compiler at compile time. How such padding is done may vary between systems. If a char c; were added to the end of LIST_MEMBERS, for example, then the type List may technically end up as large as TypeWithList (if pointers are 8 bytes large and both List and TypeWithList are padded to the next multiple of 8 bytes, say). Many things will still work the same regardless, but some things won't. If code uses sizeof(SomeBaseType) to decide how much to read from or write to an object which happens to be of ADerivedType, and the latter doesn't include all padding from SomeBaseType, the wrong thing will happen and data used by the derived type will be messed with.

That's the big problem with what is otherwise a straightforward way to get rid of the need to use a prefix to access members of an inherited type. This problem can be solved using the C11 feature of anonymous structs, by wrapping the list of inherited members in an anonymous struct when defining the list, or at least when using it. (This problem can also be non-portably solved by disabling padding when compilers support that, but that would result in different problems, as padding is used for a reason.) With that said, below is the earlier aggregation example again, rewritten to use this style, both with and without the use of an anonymous struct.

#define TYPE_A_MEMBERS \
        int x;

typedef struct TypeA {
        TYPE_A_MEMBERS
} TypeA;

typedef struct TypeB {
        TYPE_A_MEMBERS
        int y;
} TypeB;

typedef struct TypeB_C11 {
        struct { TYPE_A_MEMBERS };
        int y;
} TypeB_C11;

TypeB *b;
...
b->x = 10; /* access x directly */
b->y = 5;

This works for accessing members by name; the prefix is removed. But more generally, with the non-C11 version, padding at the end of TypeA will sometimes not be included in TypeB, which in practice means that type-cast compatibility is only approximate. Taking the safer C11 pattern a step further can look like the below.

#define TYPE_B_MEMBERS struct { TYPE_A_MEMBERS }; \
        int y;

typedef struct TypeB {
        TYPE_B_MEMBERS
} TypeB;

typedef struct TypeC {
        struct { TYPE_B_MEMBERS };
        int z;
} TypeC;

Either version can be repeated for many levels. But compared to simple aggregation, this approach becomes somewhat more verbose and messy when defining types (as opposed to when using them), which is probably part of why it's seldom seen used for more than two levels of types.

Proceeding along this path, however, terseness can be gained by using the C preprocessor more fully to define types, and by deciding on a naming convention for macros providing information about types. This technique could be used for more elaborate collections of related struct types, but I have yet to find it useful by itself. For each type Name, if its members are listed in a macro Name_, for example, then the convenience macro below can be used to define the type.

#define structdef(Name) \
        typedef struct Name { Name##_ } Name

Following this convention, with the basic idea of listing the members of each type of a hierarchy in a macro, the resulting code is terser, albeit a bit different-looking:

#define TypeA_ \
        int x;
structdef(TypeA);

#define TypeB_ TypeA_ \
        int y;
structdef(TypeB);

#define TypeB_C11_ struct { TypeA_ }; \
        int y;
structdef(TypeB_C11);

There's now an additional, smaller catch, in the form of slightly worse diagnostics from the compiler if something is wrong with the list of members used for such a type. The contents of a macro definition always end up on a single line, as the compiler sees it – both when defined and when referenced – so the line and character numbers in any warnings or errors when compiling that piece of code will be less useful.

Elaborating a real type system

This is a summary of what my old SCOOP library contains and does, written roughly a decade later. Keep in mind that the approach builds on the unsafe (non-C11) version of the above, which means that data added in derived class structs may overlap with padding at the end of a base class struct. (Changing that is possible using a combination of anonymous union and anonymous struct, but it's not done below.)

The above is too simple for when features such as virtual functions and run-time type identification are wanted, but can be extended to also support that. At the core, a more elaborate version of the convenience macro for defining a type can insert an extra pointer to a meta-type instance at the start of each class struct, before it uses the macro with the list of normal members. Each class requires only one extra pointer, and in each class the pointer will go to a struct of a different type – the unique meta-type of that class.

#define classtype(Name) \
        typedef struct Name { const struct Name##_Meta *meta; Name##_ } Name

The meta-type can be defined by the same use of a class-defining convenience macro. Its struct can hold a pointer to the supertype meta-type, pointers for virtual functions, etc. Function pointers for virtual function use can be listed in another macro named after the type, similarly to the one for normal class members, allowing more of them to be added at each step of the inheritance chain.

Below, enough is defined to allow defining the main and meta types of Class by writing classdef(Class);, and to allow defining a (global) meta-type instance describing the class by writing metainst(Class, ...). Constructor and "new"-function pairs can also be added using ctordef() (and optionally ctordec()). Together, this is basically a stripped-down core of what SCOOP's Object.h provides (it also has extensive comments).

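#include <stddef.h> /* size_t */
#include <stdlib.h> /* free(), used by ctordef() */

typedef void (*Dtor)(void *o);
typedef void (*Vtinit)(void *o);

/* defined with the runtime code; used by ctordef() */
void *raw_new(void *mem, void *_meta);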

#define metatype(Class) \
typedef struct Class##_Virt { Dtor dtor; Class##__ } Class##_Virt; \
typedef struct Class##_Meta { \
        const struct Object_Meta *super; \
        size_t size; \
        unsigned short vnum; \
        unsigned char done; \
        const char *name; \
        Vtinit vtinit; /* virtual table init function, passed meta */ \
        Class##_Virt virt; \
} Class##_Meta

#define classdef(Class) \
        classtype(Class); \
        metatype(Class); \
        extern Class##_Meta _##Class##_meta

// global meta type instance
#define metaof(Class) (&(_##Class##_meta))

// universal Object and Object_Meta structs, for type cast use only
#define Object_
#define Object__
classtype(Object);
metatype(Object);

#define metainst(Class, Superclass, dtor, vtinit) \
struct Class##_Meta _##Class##_meta = { \
        (Object_Meta*) metaof(Superclass), \
        sizeof(Class), \
        (sizeof(Class##_Virt) / sizeof(void (*)())), \
        0, \
        #Class, \
        (Vtinit)vtinit, \
        {(Dtor)dtor} \
}

// allow "None" to be used as a (super)class name with NULL meta type instance
#define _None_meta (*(Object_Meta*)(0))

// declare "new" and constructor function pair, end with ;
#define ctordec(Class, FunctionName, Parlist) \
Class* FunctionName##_new Parlist; \
unsigned char FunctionName##_ctor Parlist

// define "new" and constructor function pair, end with { ... }
#define ctordef(Class, FunctionName, Parlist, Arglist, o) \
unsigned char FunctionName##_ctor Parlist; \
Class* FunctionName##_new Parlist \
{ \
        void *ctordef__mem = (o); \
        (o) = raw_new(ctordef__mem, metaof(Class)); \
        if ((o) && !FunctionName##_ctor Arglist) { \
                ((Object*)(o))->meta = metaof(None); \
                if (!ctordef__mem) free(o); \
                return 0; \
        } \
        return (o); \
} \
unsigned char FunctionName##_ctor Parlist

If each meta-type is given a global instance named after the type, then the need to add and use boilerplate code along with each type defined can be reduced greatly. Most of the work can be done by a few functions common to all types, for handling generic allocation given a passed meta-type, meta-type initialization (when the global instance used isn't already all-set), run-time type checks, etc. Terse and simple use along with the types, and for defining things like constructors making use of supertype constructors, etc., simply requires more convenience macros. With such a framework, there's no need to require any explicit registering of types before being able to create and destroy instances, as used in some object-oriented C designs, nor any need for clean-up de-registration.

Below is minimal runtime code which can be used by all types. It's a somewhat simplified version of the code in SCOOP's Object.c.

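#include <stdio.h>  /* fputs() */
#include <stdlib.h> /* calloc(), free(), exit() */
#include <string.h> /* memset() */

void pure_virtual(void *o) {
        fputs("error: pure virtual function called\n", stderr);
        exit(EXIT_FAILURE);
}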

/* recursively fills in blank parts of meta type instance chain */
void init_meta(Object_Meta *o) {
        void (**virt)() = (void (**)()) &o->virt,
                         (**super_virtab)() = 0;
        unsigned int i = 1, max; /* skip dtor */
        if (o->super) {
                if (!o->super->done)
                        init_meta((Object_Meta*)o->super);
                super_virtab = (void (**)()) &o->super->virt;
                for (max = o->super->vnum; i < max; ++i)
                        if (!virt[i]) virt[i] = super_virtab[i];
        }
        if (o->vtinit)
                o->vtinit(o);
        for (max = o->vnum; i < max; ++i)
                if (!virt[i]) virt[i] = pure_virtual;
        o->done = 1;
}

void *raw_new(void *mem, void *_meta) {
        Object_Meta *meta = _meta;
        if (!mem) {
                if (!(mem = calloc(1, meta->size)))
                        return 0;
        } else {
                memset(mem, 0, meta->size);
        }
        if (!meta->done) init_meta(meta);
        ((Object*)mem)->meta = meta;
        return mem;
}

void delete(void *o) {
        const Object_Meta *meta = ((Object*)o)->meta;
        do {
                if (meta->virt.dtor) meta->virt.dtor(o);
                meta = meta->super;
        } while (meta);
        free(o);
}

void finalize(void *o) {
        const Object_Meta *meta = ((Object*)o)->meta;
        do {
                if (meta->virt.dtor) meta->virt.dtor(o);
                meta = meta->super;
        } while (meta);
        ((Object*)o)->meta = metaof(None);
}

/* core of type comparison */
int rtticheck(const void *submeta, const void *meta) {
        const Object_Meta *subclass = submeta, *class = meta;
        if (subclass == class)
                return 0;
        do {
                subclass = subclass->super;
                if (subclass == class)
                        return 1;
        } while (subclass);
        return -1;
}

The above code allows simply allocating zero-filled instances of any class – both new heap allocations, and the (re-)using of other memory for allocations. The delete() function fully handles clean-up when it should include a free(), while finalize() is for other cases. But unlike the use of destructors for clean-up, the use of constructors for further initialization is not included in the automation, because they can take varied forms – and sometimes it's good, or necessary, to be able to define several versions for one class. Making that simpler is possible e.g. using the macros which define pairs of functions where a Class_new() calls a Class_ctor() (the names can be varied) with the return value from the raw_new() function above for Class.

On delete() and destructors, the runtime code above ensures all destructors in a class hierarchy are automatically called, instead of giving a subclass destructor the responsibility of calling the destructor of its superclass. More generally, there's a good reason to treat the destructor of each class in a special way, instead of lumping it together with the rest of the virtual functions. Allowing a destructor function to be NULL reduces boilerplate code when defining simple types. The same is true for the function a class may use for setting the vtable members added or changed by the class; it can be allowed to be NULL if there are no such virtual functions (virtual destructors don't count).

The disadvantage of coupling all destructor calls to delete() is that there's always some checking done for each object to finalize it. Decoupling it instead would make it easier to write code where, for example, a large container object can register the destructors of allocated sub-objects, performing all checking ahead of the eventual clean-up in one step and later only doing what is strictly needed. That would matter most if implementing a memory pool, the destroying of which is intended to free all memory for contained objects with the bare minimum of operations.

Run-time type identification logic can use the above rtticheck() function for most basic things, as very simple macros can use it to implement checks to see if an object is of_class(), of_subclass(), or to compare class names passed to metaof(), etc.

Revisiting the example types

Redefining the two trivial types "TypeA" and "TypeB" from earlier using this framework can be done as follows. The second part below, which unlike the first is not suitable to place in a header file, is needed to create the global instances of the meta types. That's just one line per class, though, since these classes don't need destructors, nor functions to fill in the vtables.

#define TypeA_ \
        int x;
#define TypeA__
classdef(TypeA);

#define TypeB_ TypeA_ \
        int y;
#define TypeB__ TypeA__
classdef(TypeB);

metainst(TypeA, None, NULL, NULL);
metainst(TypeB, TypeA, NULL, NULL);

The above leaves out the declaring and defining of constructor and class-specific "new" functions. Let's try again and include them. (The lazy alternative would be to simply allocate an instance of TypeB using the call raw_new(NULL, metaof(TypeB));. All fields other than meta will then be filled with zero bytes.)

Below, I follow the simple convention of naming the object pointer variable for a method o (it would be called this in C++). Whatever its name and place in the parameter list, the ctordef() macro requires that name to be passed as the macro's last argument, so that the inserted boilerplate "new" code can assign the variable before passing it on to the constructor. The constructor call uses the argument list passed as the macro's next-to-last argument.

#define TypeA_ \
        int x;
#define TypeA__
classdef(TypeA);
ctordec(TypeA, TypeA, (TypeA* o));

#define TypeB_ TypeA_ \
        int y;
#define TypeB__ TypeA__
classdef(TypeB);
ctordec(TypeB, TypeB, (TypeB* o));

metainst(TypeA, None, NULL, NULL);
ctordef(TypeA, TypeA, (TypeA* o), (o), o) {
        o->x = 10;
        return 1; /* non-zero for success */
}

metainst(TypeB, TypeA, NULL, NULL);
ctordef(TypeB, TypeB, (TypeB* o), (o), o) {
        TypeA_ctor((TypeA*) o);
        o->y = 5;
        return 1; /* non-zero for success */
}

An instance of TypeB can now be allocated from the heap using TypeB_new(NULL); – or from the stack by passing an address.

In the SCOOP tests/ directory, there's a simple demonstration of adding constructors and using virtual functions, for types declared and defined in separate files (as is common in old-style "modular" pre-module C++ code, and C code imitating such structure). SCOOP also includes some extra preprocessor logic which can optionally be used to reduce the need to insert type casts, such as the cast above in the text TypeA_ctor((TypeA*) o);, when types are placed in different compilation units.

Background and thoughts on uses

I began experimenting with this in 2010. I had been working on and off on a fork of the C++ GUI library FLTK 2.0 for some years, FLPTK – the added "P" for "Plugin" (as I'd made a reworked version of the GUI toolkit more suitable for my audio DSP plugin purposes back then). Now it's long-abandoned, but before I put it all aside, back in 2010 I began to work on a replacement for it in C, based on porting a mixture of my old code, the new (FLTK versioning is messy) FLTK 1.3, and some things from eFLTK (another FLTK 2.0 fork). It was a large project, which I found interesting at first but which after a time didn't seem practically worthwhile.

The next year I lost motivation to complete that project, but I kept the little C framework I worked out for the conversion (and which would in principle fully allow it), adding it to FLPTK, and then in 2013 splitting it out into the separate C library SCOOP. I've not directly used it for anything else since, but have mused on related ideas from time to time.

(A few years later, I toyed with the idea of trying to do a similar project, but then basing a C port on the FLTK 1.3 fork NTK instead, for better Cairo-based graphics rendering. The FLTK project then went on to work on FLTK 1.4, and they more recently seem to plan further graphics changes in newer work...)

The SCOOP library seems to contain everything that would be needed for such endeavors – due to the particular form of the code to be converted, which is single inheritance C++ restricted in its use of features. But for more general – and practical – purposes, it's probably not suitable for comfortably converting C++ code to C. (As of the early 2020s, I've been thinking that modern C++ is getting more interesting, and in any case it seems best to simply use that when fancier C++ features are wanted.)

For adding OO organization to existing C projects – particularly when a collection of types would genuinely fit neatly into single inheritance hierarchies (I think OO is best in such cases, and usually overrated outside of them) – I think it can be worthwhile to extract some ideas from SCOOP.