From f5c8ee1f889824eaa4bb89a01711a538e295dba0 Mon Sep 17 00:00:00 2001
From: Andreas Baumann
Date: Thu, 30 Sep 2021 11:21:37 +0200
Subject: added a Makefile for wordgrinder export and markdown to HMTL

---
 miniany/Makefile    |  10 +++++
 miniany/README      | 113 ----------------------------------------------------
 miniany/README.html |  79 ++++++++++++++++++++++++++++++++++++
 miniany/cc.c        |  43 ++++++++++++++++++++
 4 files changed, 132 insertions(+), 113 deletions(-)
 create mode 100644 miniany/Makefile
 delete mode 100644 miniany/README
 create mode 100644 miniany/README.html

diff --git a/miniany/Makefile b/miniany/Makefile
new file mode 100644
index 0000000..24c535e
--- /dev/null
+++ b/miniany/Makefile
@@ -0,0 +1,10 @@
+.PHONY: doc
+
+doc: README.html
+
+README.html: cc.md
+	md2html --fpermissive-url-autolinks < cc.md > README.html
+
+cc.md: cc.wg
+	wordgrinder -c cc.wg cc.md
+	
\ No newline at end of file
diff --git a/miniany/README b/miniany/README
deleted file mode 100644
index 814f6b3..0000000
--- a/miniany/README
+++ /dev/null
@@ -1,113 +0,0 @@
-
-# CC - a self-hosting, bootstrappable, minimal C compiler
-
-## Introduction
-
-On the never-ending quest of a minimal system I found Swieros and C4 (the C compiler in 4 functions). Inspired and intrigued I started to implement my own.
-
-For abaos (a small operating system of mine, also in C) I cloned the minimal C library, so we can build a freestanding version of C4.
-
-C4 serves as a test whether my own CC is minimal enough and doesn't use silly functions. Additionally C4 as well as CC are compiled both in a (on Linux) hosted version and a freestanding version. We use a series of compilers like gcc, clang, tcc and pcc to make sure that we are not using silly C constructs.
-
-In order to be able to port easily we make almost no use of system calls, the ones we need are:
-
-
-- brk: for malloc/free, change the start address of the heap segment of the process, if the OS only assigns a single static space, then brk results in a NOP.
-- exit: terminate the process, return does not always work in all combinations (for instance with pcc on Linux). Can be a NOP, we don't require any trickery as atext and we don't use buffering anywhere (for instance flushing stdout on exit).
-- read/write: read from stdin linearly, write to stdout linearly, this is essentially a model using an input and an output tape. Those two functions must really exist. This basically eliminates the need for a file system which we might not have during early bootstrapping.
-
-Similarly we simplify the C language to not use certain features which can cause trouble when bootstrapping:
-
-
-- variable arguments: though simple in principle (just some pointers into the stack if you use a stack for function parameters), it is not typesafe. And the only example in practice it's really heavily used for is in printf-like functions.
-- preprocessor: it needs a filesystem, we take this outside of the compiler by feeding it an (eventually) concatenated list of \*.c files.
-- two types: int and char, so we can interpret memory as words or as bytes.
-
-## Local version of C4
-
-The local version of C4 has the following adaoptions and extensions:
-
-
-- switch statement from the switch-and-structs branch, adapted c4 itself to use switch statements instead of if's (as in the switch-and-structs branch)
-- struct support from switch-and-structs
-- constants like EOF, EXIT\_SUCCESS, NULL
-- standard C block comments along to c++ end of line ones
-- negative enum initializers
-- do/while loops
-- more C functions like isspace, getc, strcmp
-- some simplified functions for printing like putstring, putint, putnl
-- strict C89 conformance, mainly use standard comment blocks, also removed some warnings
-- some casts around malloc and memset to fit to non-void freestanding-libc
-- converted printf to putstring/putint/putnl and some helper functions for error reporting like error()
-- removed all memory leaks
-- de-POSIX-ified, no open/read/close, use getchar from stdin only (don't assume the existence of a file system), this also means we had to create sort of an old style tape-file with FS markers to separate the files piped to c4.
-
-Note: only too late I discovered that there was a C5 version of the same compiler, which would maybe have served better as a basis.
-
-## Examples
-
-### Running on the host system using the hosts C compiler
-
-Compiled in either hosted (host libc) or freestanding (our own libc, currently IA-32 Linux kernel only syscalls):
-
-`./build.sh cc hostcc hosted d
-./build.sh cc hostcc freestanding d
-./cc \< test1.c \> test1.asm`
-Create a plain binary from the assembly code:
-
-`fasm test1.asm test1.bin`
-Disassemble it to verify it's correctness:
-
-`ndisasm -b32 -o1000000h -a test1.bin`
-You can choose gcc, clang, tcc or pcc as host compiler (hostcc).
-
-### Running on the host in the C4 interpreter
-
-Running in C4 interpreter, again, the C4 program can be compiled in hosted or freestanding mode:
-
-`./build.sh c4 hostcc hosted d
-./build.sh c4 hostcc freestanding d`
-Here again you can choose the host compiler for compiling C4.
-
-Then we have to create the standard input for C4 using:
-
-`echo -n -e "\034" \> EOF
-cat cc.c EOF hello.c | ./c4
-cat c4.c EOF cc.c EOF hello.c | ./c4
-cat c4.c4 EOF c4.c EOF cc.c EOF hello.c | ./c4`
-EOF contains the traditional FS (file separator) character in the ASCII character set. Every time c4/c4.c is invoked it reads exacly one input file up to the first FS character (or stops at the end of stdin).
-
-We can also use -s, or -d on every level as follows:
-
-`cat cc.c EOF hello.c | ./c4 -d`
-## References
-
-Compiler construction in general:
-
-
-- "Compiler Construction"", Niklaus Wirth
-- https://github.com/DoctorWkt/acwj: a nice series on building a C compiler, step by step with lots of good explanations
-- https://www.engr.mun.ca/~theo/Misc/exp\_parsing.htm\#climbing, https://en.wikipedia.org/wiki/Operator-precedence\_parser\#Precedence\_climbing\_method
-- https://github.com/lotabout/write-a-C-interpreter/blob/master/tutorial/en/, tutorial based on C4 how to build a C interpreter, explains nicely details in C4.
-
-C4:
-
-
-- https://github.com/rswier/c4.git, C4 - C in four functions, Robert Swierczek, minimalistic C compiler running on an emulator on the IR, inspiration for this project
-- https://github.com/rswier/c4/blob/switch-and-structs/c4.c, c4 adaptions to provide switch and structs
-- https://github.com/EarlGray/c4: a X86 JIT version of c4
-- https://github.com/jserv/amacc: based on C4, JIT or native code, for ARM, quite well documented, also very nice list of compiler resources on Github page
-
-Other minimal compilers and systems:
-
-
-- http://selfie.cs.uni-salzburg.at/: C\* self-hosting C compiler (also emulator, hypervisor) for RISCV, inspiration for what makes up a minimal C language
-- http://www.iro.umontreal.ca/~felipe/IFT2030-Automne2002/Complements/tinyc.c, Marc Feeley, really easy and much more readable, meant as educational compiler
-- https://github.com/rswier/swieros.git: c.c in swieros, Robert Swierczek
-
-Assembly:
-
-
-- https://github.com/felipensp/assembly/blob/master/x86/itoa.s, for putint (early debugging keyword)
-- https://baptiste-wicht.com/posts/2011/11/print-strings-integers-intel-assembly.htm (earldy debugging keyword)
-
diff --git a/miniany/README.html b/miniany/README.html
new file mode 100644
index 0000000..411c667
--- /dev/null
+++ b/miniany/README.html
@@ -0,0 +1,79 @@
+

CC - a self-hosting, bootstrappable, minimal C compiler

+

Introduction

+

On the never-ending quest for a minimal system I found Swieros and C4 (the C compiler in 4 functions). Inspired and intrigued, I started to implement my own.

+

For abaos (a small operating system of mine, also in C) I cloned the minimal C library, so we can build a freestanding version of C4.

+

C4 serves as a test of whether my own CC is minimal enough and doesn't use silly functions. Additionally, both C4 and CC are compiled in a hosted version (on Linux) and in a freestanding version. We use a series of compilers like gcc, clang, tcc and pcc to make sure that we are not using silly C constructs.

+

To keep porting easy we make almost no use of system calls; the ones we need are:

+ +

Similarly we simplify the C language to not use certain features which can cause trouble when bootstrapping:

+ +

Local version of C4

+

The local version of C4 has the following adaptations and extensions:

+ +

Note: I discovered only too late that there is a C5 version of the same compiler, which might have served better as a basis.

+

Examples

+

Running on the host system using the host's C compiler

+

Compiled in either hosted mode (host libc) or freestanding mode (our own libc, which currently supports only IA-32 Linux kernel syscalls):

+

./build.sh cc hostcc hosted d
./build.sh cc hostcc freestanding d
./cc < test1.c > test1.asm
+Create a plain binary from the assembly code:

+

fasm test1.asm test1.bin
+Disassemble it to verify its correctness:

+

ndisasm -b32 -o1000000h -a test1.bin
+You can choose gcc, clang, tcc or pcc as host compiler (hostcc).

+

Running on the host in the C4 interpreter

+

When running in the C4 interpreter, C4 itself can again be compiled in hosted or freestanding mode:

+

./build.sh c4 hostcc hosted d
./build.sh c4 hostcc freestanding d
+Here again you can choose the host compiler for compiling C4.

+

Then we have to create the standard input for C4 using:

+

echo -n -e "\034" > EOF
cat cc.c EOF hello.c | ./c4
cat c4.c EOF cc.c EOF hello.c | ./c4
cat c4.c4 EOF c4.c EOF cc.c EOF hello.c | ./c4
+The file EOF contains the traditional FS (file separator) character of the ASCII character set. Every time c4/c4.c is invoked it reads exactly one input file up to the first FS character (or stops at the end of stdin).
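
The tape-style input can be sketched in C as follows. This is only an illustration under stated assumptions, not the code from c4.c or cc.c: the function name `read_segment`, the `FS` macro and the `FILE *` parameter are hypothetical (the README says the real programs read from stdin only), and the fixed-size buffer is chosen purely for brevity.

```c
#include <stdio.h>

/* ASCII file separator, octal 034 (the byte written by the echo command). */
#define FS 28

/* Hypothetical helper: copy one "file" from the input tape into buf, that is
 * everything up to the next FS character or the end of input. The FS itself
 * is consumed but not stored. Returns the number of bytes stored; input that
 * does not fit into size - 1 bytes is read and dropped. size must be >= 1. */
int read_segment(FILE *in, char *buf, int size)
{
    int c, n = 0;

    while ((c = getc(in)) != EOF && c != FS) {
        if (n < size - 1) {
            buf[n++] = (char)c;
        }
    }
    buf[n] = '\0';
    return n;
}
```

Calling `read_segment(stdin, buf, sizeof buf)` repeatedly would hand each level of the pipeline its own source text: the first call sees the first file, and whatever follows the FS stays on the tape for the next reader.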

+

We can also pass -s or -d at every level, as follows:

+

cat cc.c EOF hello.c | ./c4 -d

+

References

+

Compiler construction in general:

+ +

C4:

+ +

Other minimal compilers and systems:

+ +

Assembly:

+
diff --git a/miniany/cc.c b/miniany/cc.c
index c814f3d..28d9f1c 100644
--- a/miniany/cc.c
+++ b/miniany/cc.c
@@ -1295,6 +1295,49 @@ void parseIf( struct Compiler *compiler )
 	free( label1 );
 }
 
+/*
+ void parseIf( struct Compiler *compiler )
+{
+	struct Parser *parser;
+	struct ASTnode *node;
+	char *label1, *label2;
+
+	parser = compiler->parser;
+	parserExpect( parser, S_IF, "if" );
+	parserExpect( parser, S_LPAREN, "(" );
+	node = parseExpression( parser, 0 );
+	if( compiler->generator->debug ) {
+		putstring( "; if then" ); putnl( );
+	}
+	generateFromAST( compiler->generator, node, NOREG );
+	putstring( "cmp al, 0" ); putnl( );
+	label1 = genGetLabel( compiler, compiler->parser->global_scope );
+	putstring( "je " ); putstring( label1 ); putnl( );
+	genFreeAllRegs( compiler->generator );
+	parserExpect( parser, S_RPAREN, ")" );
+	parseStatementBlock( compiler );
+	if( parser->token == S_ELSE ) {
+		label2 = genGetLabel( compiler, compiler->parser->global_scope );
+		putstring( "jmp " ); putstring( label2 ); putnl( );
+		if( compiler->generator->debug ) {
+			putstring( "; else" ); putnl( );
+		}
+		putstring( label1 ); putchar( ':' ); putnl( );
+		compiler->parser->token = getToken( compiler->parser->scanner );
+		parseStatementBlock( compiler );
+		putstring( label2 ); putchar( ':' ); putnl( );
+		free( label2 );
+	} else {
+		putstring( label1 ); putchar( ':' ); putnl( );
+	}
+
+	if( compiler->generator->debug ) {
+		putstring( "; fi" ); putnl( );
+	}
+	free( label1 );
+}
+
+ */
 void parseStatement( struct Compiler *compiler )
 {
 	struct Parser *parser;
-- 
cgit v1.2.3-54-g00ecf