From b055c157e04ae30f6ffb9eef1fa97d512ce4e65a Mon Sep 17 00:00:00 2001
From: Andreas Baumann
Date: Thu, 30 Sep 2021 07:57:04 +0000
Subject: README is markdown export of cc.wg now, Wordgrinder file is the master

---
 miniany/README | 188 +++++++++++++++++++++++++++++++++------------------------
 miniany/cc.wg  |  37 +++++++++---
 2 files changed, 138 insertions(+), 87 deletions(-)

diff --git a/miniany/README b/miniany/README
index 6367011..814f6b3 100644
--- a/miniany/README
+++ b/miniany/README
@@ -1,85 +1,113 @@
-# Running on host:
-```
-./build.sh cc tcc hosted d
-./build.sh cc tcc freestanding d
-./cc < test1.c > test1.asm
-fasm test1.asm test1.bin
-ndisasm -b32 -o1000000h -a test1.bin
-```
+# CC - a self-hosting, bootstrappable, minimal C compiler

-# Running on c4:
+## Introduction

-```
-echo -n -e "\034" > EOF
+On the never-ending quest of a minimal system I found Swieros and C4 (the C compiler in 4 functions). Inspired and intrigued I started to implement my own.
+
+For abaos (a small operating system of mine, also in C) I cloned the minimal C library, so we can build a freestanding version of C4.
+
+C4 serves as a test whether my own CC is minimal enough and doesn't use silly functions. Additionally C4 as well as CC are compiled both in a (on Linux) hosted version and a freestanding version. We use a series of compilers like gcc, clang, tcc and pcc to make sure that we are not using silly C constructs.
+
+In order to be able to port easily we make almost no use of system calls, the ones we need are:
+
+
+- brk: for malloc/free, change the end address of the heap segment of the process, if the OS only assigns a single static space, then brk results in a NOP.
+- exit: terminate the process, return does not always work in all combinations (for instance with pcc on Linux). Can be a NOP, we don't require any trickery such as atexit and we don't use buffering anywhere (for instance flushing stdout on exit).
+- read/write: read from stdin linearly, write to stdout linearly, this is essentially a model using an input and an output tape. Those two functions must really exist. This basically eliminates the need for a file system which we might not have during early bootstrapping.
+
+Similarly we simplify the C language to not use certain features which can cause trouble when bootstrapping:
+
+
+- variable arguments: though simple in principle (just some pointers into the stack if you use a stack for function parameters), it is not typesafe. And the only example in practice it's really heavily used for is in printf-like functions.
+- preprocessor: it needs a filesystem, we take this outside of the compiler by feeding it an (eventually) concatenated list of \*.c files.
+- two types: int and char, so we can interpret memory as words or as bytes.
+
+## Local version of C4
+
+The local version of C4 has the following adaptations and extensions:
+
+
+- switch statement from the switch-and-structs branch, adapted c4 itself to use switch statements instead of if's (as in the switch-and-structs branch)
+- struct support from switch-and-structs
+- constants like EOF, EXIT\_SUCCESS, NULL
+- standard C block comments along to c++ end of line ones
+- negative enum initializers
+- do/while loops
+- more C functions like isspace, getc, strcmp
+- some simplified functions for printing like putstring, putint, putnl
+- strict C89 conformance, mainly use standard comment blocks, also removed some warnings
+- some casts around malloc and memset to fit to non-void freestanding-libc
+- converted printf to putstring/putint/putnl and some helper functions for error reporting like error()
+- removed all memory leaks
+- de-POSIX-ified, no open/read/close, use getchar from stdin only (don't assume the existence of a file system), this also means we had to create sort of an old style tape-file with FS markers to separate the files piped to c4.
+
+Note: only too late I discovered that there was a C5 version of the same compiler, which would maybe have served better as a basis.
+
+## Examples
+
+### Running on the host system using the host's C compiler
+
+Compiled in either hosted (host libc) or freestanding (our own libc, currently IA-32 Linux kernel only syscalls):
+
+`./build.sh cc hostcc hosted d
+./build.sh cc hostcc freestanding d
+./cc \< test1.c \> test1.asm`
+Create a plain binary from the assembly code:
+
+`fasm test1.asm test1.bin`
+Disassemble it to verify its correctness:
+
+`ndisasm -b32 -o1000000h -a test1.bin`
+You can choose gcc, clang, tcc or pcc as host compiler (hostcc).
+
+### Running on the host in the C4 interpreter
+
+Running in C4 interpreter, again, the C4 program can be compiled in hosted or freestanding mode:
+
+`./build.sh c4 hostcc hosted d
+./build.sh c4 hostcc freestanding d`
+Here again you can choose the host compiler for compiling C4.
+
+Then we have to create the standard input for C4 using:
+
+`echo -n -e "\034" \> EOF
 cat cc.c EOF hello.c | ./c4
 cat c4.c EOF cc.c EOF hello.c | ./c4
-cat c4.c4 EOF c4.c EOF cc.c EOF hello.c | ./c4
-```
-
-# Local version of c4
-
-This local version of c4 is used to guarantee that our own bootstrapped
-cc.c can also run on c4. This makes sure we are not integrating too
-complex things into our own compiler.
-
-* currently integrated:
-  - switch statement from switch-and-structs, adapted c4 to use switch
-    statements instead of ifs (as in switch-and-structs)
-  - structures from switch-and-structs
-* my own changes:
-  - constants like EOF, EXIT_SUCCESS, NULL
-  - standard C block comments along to c++ end of line ones
-  - negative enum initializers
-  - do/while loops
-  - more C functions like isspace, getc, strcmp
-  - some simplified functions for printing like putstring, putint, putnl
-  - strict C89 conformance, mainly use standard comment blocks, also
-    removed some warnings
-  - some casts around malloc and memset to fit to non-void freestanding-libc
-  - converted printf to putstring/putint/putnl and some helper functions
-    for error reporting like error()
-  - removed memory leaks
-  - de-POSIX-ified, no open/read/close, use getchar from stdin only
-    (don't assume the existence of a file system), this also means
-    we have to create sort of an old style tape-file with FS markers
-    to separate the files piped to c4.
-
-# Acknoledgments and references
-
-* c4
-  * https://github.com/rswier/c4.git
-    c4 - C in four functions
-    minimalistic C compiler running on an emulator on the IR, inspiration
-    for this project
-  * https://github.com/rswier/c4/blob/switch-and-structs/c4.c:
-    c4 adaptions to provide switch and structs
-  * https://github.com/EarlGray/c4: a X86 JIT version of c4
-  * https://github.com/jserv/amacc: based on C4, JIT or native code, for
-    ARM, quite well documented, also very nice list of compiler
-    resources on Github page
-
-* selfie
-  * http://selfie.cs.uni-salzburg.at/:
-    C* self-hosting C compiler (also emulator, hypervisor) for RISCV,
-    inspiration for what makes up a minimal C language
-
-* tiny.c
-  * http://www.iro.umontreal.ca/~felipe/IFT2030-Automne2002/Complements/tinyc.c,
-    Marc Feeley, really easy and much more readable, meant as educational compiler
-
-* c.c in swieros: https://github.com/rswier/swieros.git
-
-* https://github.com/lotabout/write-a-C-interpreter/blob/master/tutorial/en/: tutorial
-  based on C4 how to build a C interpreter, explains nicely details in C4.
-
-* https://github.com/felipensp/assembly/blob/master/x86/itoa.s,
-  https://baptiste-wicht.com/posts/2011/11/print-strings-integers-intel-assembly.html,
-  putint for early debugging
-
-* documentation
-  * "Compiler Construction", Niklaus Wirth
-  * https://github.com/DoctorWkt/acwj: a nice series on building a C compiler,
-    step by step with lots of good explanations
-  * https://www.engr.mun.ca/~theo/Misc/exp_parsing.htm#climbing,
-    https://en.wikipedia.org/wiki/Operator-precedence_parser#Precedence_climbing_method
+cat c4.c4 EOF c4.c EOF cc.c EOF hello.c | ./c4`
+EOF contains the traditional FS (file separator) character in the ASCII character set. Every time c4/c4.c is invoked it reads exactly one input file up to the first FS character (or stops at the end of stdin).
+
+We can also use -s, or -d on every level as follows:
+
+`cat cc.c EOF hello.c | ./c4 -d`
+## References
+
+Compiler construction in general:
+
+
+- "Compiler Construction", Niklaus Wirth
+- https://github.com/DoctorWkt/acwj: a nice series on building a C compiler, step by step with lots of good explanations
+- https://www.engr.mun.ca/~theo/Misc/exp\_parsing.htm\#climbing, https://en.wikipedia.org/wiki/Operator-precedence\_parser\#Precedence\_climbing\_method
+- https://github.com/lotabout/write-a-C-interpreter/blob/master/tutorial/en/, tutorial based on C4 how to build a C interpreter, explains nicely details in C4.
+
+C4:
+
+
+- https://github.com/rswier/c4.git, C4 - C in four functions, Robert Swierczek, minimalistic C compiler running on an emulator on the IR, inspiration for this project
+- https://github.com/rswier/c4/blob/switch-and-structs/c4.c, c4 adaptions to provide switch and structs
+- https://github.com/EarlGray/c4: a X86 JIT version of c4
+- https://github.com/jserv/amacc: based on C4, JIT or native code, for ARM, quite well documented, also very nice list of compiler resources on Github page
+
+Other minimal compilers and systems:
+
+
+- http://selfie.cs.uni-salzburg.at/: C\* self-hosting C compiler (also emulator, hypervisor) for RISCV, inspiration for what makes up a minimal C language
+- http://www.iro.umontreal.ca/~felipe/IFT2030-Automne2002/Complements/tinyc.c, Marc Feeley, really easy and much more readable, meant as educational compiler
+- https://github.com/rswier/swieros.git: c.c in swieros, Robert Swierczek
+
+Assembly:
+
+
+- https://github.com/felipensp/assembly/blob/master/x86/itoa.s, for putint (early debugging keyword)
+- https://baptiste-wicht.com/posts/2011/11/print-strings-integers-intel-assembly.html (early debugging keyword)
+

diff --git a/miniany/cc.wg b/miniany/cc.wg
index e5ae626..e2b2c47 100644
--- a/miniany/cc.wg
+++ b/miniany/cc.wg
@@ -23,14 +23,20 @@ WordGrinder dumpfile v3: this is a text file; diff me!
 .addons.spellchecker.enabled: false
 .addons.spellchecker.usesystemdictionary: true
 .addons.spellchecker.useuserdictionary: true
-.documents.1.co: 1
-.documents.1.cp: 31
-.documents.1.cw: 1
+.clipboard.co: 1
+.clipboard.cp: 1
+.clipboard.cw: 1
+.clipboard.margin: 0
+.clipboard.viewmode: 1
+.clipboard.wordcount: 19
+.documents.1.co: 9
+.documents.1.cp: 71
+.documents.1.cw: 4
 .documents.1.margin: 0
 .documents.1.name: "main"
 .documents.1.sticky_selection: false
 .documents.1.viewmode: 1
-.documents.1.wordcount: 774
+.documents.1.wordcount: 910
 .fileformat: 8
 .findtext: "C4"
 .menu.accelerators.^@: "ZM"
@@ -139,6 +145,9 @@ WordGrinder dumpfile v3: this is a text file; diff me!
 .replacetext: ""
 .statusbar: true
 .current: 1
+#clipboard
+LB http://selfie.cs.uni-salzburg.at/: C* self-hosting C compiler (also emulator, hypervisor) for RISCV, inspiration for what makes up a minimal C language
+.
 #1
 H1 CC - a self-hosting, bootstrappable, minimal C compiler
 H2 Introduction
@@ -193,8 +202,22 @@ PRE cat c4.c4 EOF c4.c EOF cc.c EOF hello.c | ./c4
 P EOF contains the traditional FS (file separator) character in the ASCII character set. Every time c4/c4.c is invoked it reads exactly one input file up to the first FS character (or stops at the end of stdin).
 P We can also use -s, or -d on every level as follows:
 PRE cat cc.c EOF hello.c | ./c4 -d
-H2 References
-LB "Compiler Construction", Niklaus Wirth
+H2 References
+P Compiler construction in general:
+LB "Compiler Construction", Niklaus Wirth
 LB https://github.com/DoctorWkt/acwj: a nice series on building a C compiler, step by step with lots of good explanations
-LB https://www.engr.mun.ca/~theo/Misc/exp_parsing.htm#climbing, https://en.wikipedia.org/wiki/Operator-precedence_parser#Precedence_climbing_method
+LB https://www.engr.mun.ca/~theo/Misc/exp_parsing.htm#climbing, https://en.wikipedia.org/wiki/Operator-precedence_parser#Precedence_climbing_method
+LB https://github.com/lotabout/write-a-C-interpreter/blob/master/tutorial/en/, tutorial based on C4 how to build a C interpreter, explains nicely details in C4.
+P C4:
+LB https://github.com/rswier/c4.git, C4 - C in four functions, Robert Swierczek, minimalistic C compiler running on an emulator on the IR, inspiration for this project
+LB https://github.com/rswier/c4/blob/switch-and-structs/c4.c, c4 adaptions to provide switch and structs
+LB https://github.com/EarlGray/c4: a X86 JIT version of c4
+LB https://github.com/jserv/amacc: based on C4, JIT or native code, for ARM, quite well documented, also very nice list of compiler resources on Github page
+P Other minimal compilers and systems:
+LB http://selfie.cs.uni-salzburg.at/: C* self-hosting C compiler (also emulator, hypervisor) for RISCV, inspiration for what makes up a minimal C language
+LB http://www.iro.umontreal.ca/~felipe/IFT2030-Automne2002/Complements/tinyc.c, Marc Feeley, really easy and much more readable, meant as educational compiler
+LB https://github.com/rswier/swieros.git: c.c in swieros, Robert Swierczek
+P Assembly:
+LB https://github.com/felipensp/assembly/blob/master/x86/itoa.s, for putint (early debugging keyword)
+LB https://baptiste-wicht.com/posts/2011/11/print-strings-integers-intel-assembly.html (early debugging keyword)
 .
-- cgit v1.2.3-54-g00ecf
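
Note on the FS-separated input that the README text in this patch describes: the following is only a minimal shell sketch, assuming a POSIX shell with printf, tr and awk available. The files a.c and b.c and the temporary directory are hypothetical stand-ins for cc.c and hello.c, and printf '\034' is used as a portable equivalent of the echo -n -e "\034" in the patch. It builds a two-file tape and shows that everything before the first FS byte forms the first input file.

```shell
# Hypothetical demo files; a.c and b.c stand in for cc.c and hello.c.
tmp=$(mktemp -d)
printf '\034' > "$tmp/EOF"      # a single FS byte (octal 034)
printf 'int a;\n' > "$tmp/a.c"
printf 'int b;\n' > "$tmp/b.c"

# Concatenate the files into one tape, separated by the FS marker.
cat "$tmp/a.c" "$tmp/EOF" "$tmp/b.c" > "$tmp/tape"

# Count the FS separators on the tape.
seps=$(tr -cd '\034' < "$tmp/tape" | wc -c | tr -d ' ')

# Extract the first "file": everything before the first FS byte.
first=$(awk 'BEGIN { RS = "\034" } NR == 1 { printf "%s", $0 }' "$tmp/tape")

echo "separators: $seps"
echo "first file: $first"
```

With N files on the tape there are N-1 separators, so each nested compiler level consumes one file and leaves the rest of stdin to the next level.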