CC - a self-hosting, bootstrappable, minimal C compiler

Introduction

On the never-ending quest for a minimal system, I found Swieros and C4 (the C compiler in 4 functions). Inspired and intrigued, I started to implement my own.

I cloned the minimal C library of abaos (a small operating system of mine, also written in C), so we can build a freestanding version of C4.

C4 serves as a test of whether my own CC is minimal enough and doesn't use silly functions. Additionally, both C4 and CC are compiled in a hosted version (on Linux) and in a freestanding version. We use a series of compilers, such as gcc, clang, tcc and pcc, to make sure that we are not using silly C constructs.

To keep porting easy, we make almost no use of system calls. The ones we need are:

Similarly, we restrict ourselves to a simplified C language and avoid certain features which can cause trouble when bootstrapping:

Local version of C4

The local version of C4 has the following adaptations and extensions:

Note: I discovered only too late that there is a C5 version of the same compiler, which might have served better as a basis.

Examples

Running on the host system using the host's C compiler

CC can be compiled in either hosted mode (host libc) or freestanding mode (our own libc, currently supporting only IA-32 Linux kernel syscalls):

./build.sh cc hostcc hosted d
./build.sh cc hostcc freestanding d
./cc < test1.c > test1.asm

Create a plain binary from the assembly code:

fasm test1.asm test1.bin

Disassemble it to verify its correctness:

ndisasm -b32 -o1000000h -a test1.bin

You can choose gcc, clang, tcc or pcc as the host compiler (hostcc).

Running on the host in the C4 interpreter

To run CC in the C4 interpreter, C4 itself can again be compiled in hosted or freestanding mode:

./build.sh c4 hostcc hosted d
./build.sh c4 hostcc freestanding d

Here again you can choose the host compiler for compiling C4.

Then we have to create the standard input for C4 using:

echo -n -e "\034" > EOF
cat cc.c EOF hello.c | ./c4
cat c4.c EOF cc.c EOF hello.c | ./c4
cat c4.c4 EOF c4.c EOF cc.c EOF hello.c | ./c4

The file EOF contains the traditional ASCII FS (file separator) character (octal 034). Every time c4 is invoked, it reads exactly one input file, up to the first FS character or the end of stdin.
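The reading rule described above can be sketched in a few lines of C. This is only an illustration of the idea, not the code from c4.c: the names `read_until_fs` and `FS_CHAR` are made up for this example, and it uses hosted stdio for brevity, which the freestanding build would not have in this form.

```c
#include <stdio.h>
#include <stddef.h>

/* ASCII FS (file separator), octal 034: the byte stored in the EOF file. */
#define FS_CHAR 0x1C

/*
 * Hypothetical helper (an assumption, not taken from c4.c):
 * copy bytes from `in` into `buf` until an FS byte or the end of the
 * stream is reached. The FS byte itself is consumed but not stored.
 * Input that does not fit into `cap - 1` bytes is read and dropped,
 * so the stream is always left positioned after the separator.
 * Returns the number of bytes stored; `buf` is NUL-terminated.
 */
size_t read_until_fs(FILE *in, char *buf, size_t cap)
{
    size_t n = 0;
    int c;

    if (cap == 0)
        return 0;

    while ((c = fgetc(in)) != EOF && c != FS_CHAR) {
        if (n + 1 < cap)
            buf[n++] = (char)c;
    }
    buf[n] = '\0';
    return n;
}
```

Calling such a function repeatedly on the same stream yields one file per call. That matches the stacked invocations above, where each level presumably consumes its own source and leaves the rest of stdin to the program it runs.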

We can also use -s or -d on every level, as follows:

cat cc.c EOF hello.c | ./c4 -d
