Diffstat (limited to 'miniany/README')
-rw-r--r-- miniany/README 188
1 files changed, 108 insertions, 80 deletions
diff --git a/miniany/README b/miniany/README
index 6367011..814f6b3 100644
--- a/miniany/README
+++ b/miniany/README
@@ -1,85 +1,113 @@
-# Running on host:
-```
-./build.sh cc tcc hosted d
-./build.sh cc tcc freestanding d
-./cc < test1.c > test1.asm
-fasm test1.asm test1.bin
-ndisasm -b32 -o1000000h -a test1.bin
-```
+# CC - a self-hosting, bootstrappable, minimal C compiler
-# Running on c4:
+## Introduction
-```
-echo -n -e "\034" > EOF
+On the never-ending quest for a minimal system I found Swieros and C4 (the C compiler in four functions). Inspired and intrigued, I started to implement my own.
+
+For abaos (a small operating system of mine, also in C) I cloned the minimal C library, so we can build a freestanding version of C4.
+
+C4 serves as a test of whether my own CC is minimal enough and doesn't use silly functions: CC has to run in C4. Additionally, both C4 and CC are compiled in a hosted version (on Linux) and in a freestanding version. We use a series of compilers (gcc, clang, tcc and pcc) to make sure that we are not using silly C constructs.
+
+To keep porting easy, we make almost no use of system calls; the only ones we need are:
+
+
+- brk: used by malloc/free; moves the end of the process's data segment (the program break), which grows or shrinks the heap. If the OS only assigns a single static memory area, brk becomes a NOP.
+- exit: terminates the process, because returning from main does not work in all combinations (for instance with pcc on Linux). exit can also be a NOP: we don't require any trickery such as <i>atexit</i>, and we don't use buffering anywhere (which would require, for instance, flushing stdout on exit).
+- read/write: read from stdin and write to stdout sequentially; this is essentially a model with one input tape and one output tape. These two functions must really exist. This model eliminates the need for a file system, which we might not have during early bootstrapping.
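To illustrate how little an allocator needs from the OS, here is a minimal sketch of a bump allocator on top of brk, assuming the case above where the OS assigns one static memory area. `sys_brk`, `tiny_malloc`, `tiny_free` and `ARENA_WORDS` are hypothetical names, not the actual code of our libc; a real version would issue the brk system call instead of only moving a pointer, and would reuse freed memory.

```c
#include <stddef.h>

/* Hypothetical single static memory area assigned by the OS. It is an int
 * array so that the blocks handed out are word aligned. */
#define ARENA_WORDS 1024

static int arena_words[ARENA_WORDS];
static char *current_break = (char *)arena_words;

/* Hypothetical stand-in for brk: sets the end of the heap (the program
 * break) and returns it; NULL only queries the current break. With a
 * single static area there is nothing to request from the OS. */
static char *sys_brk(char *addr)
{
	if (addr != NULL)
		current_break = addr;
	return current_break;
}

/* Bump allocator: a new block starts at the old break, then the break
 * is moved behind it. Returns NULL if the area is exhausted. */
void *tiny_malloc(int size)
{
	char *old_break = sys_brk(NULL);
	int used = (int)(old_break - (char *)arena_words);
	int capacity = ARENA_WORDS * (int)sizeof(int);

	if (size <= 0 || size > capacity - used)
		return NULL;
	/* Round up to whole words so the next block stays aligned. Since
	 * capacity - used is a multiple of the word size, the rounded size
	 * still fits. */
	size = (size + (int)sizeof(int) - 1) / (int)sizeof(int) * (int)sizeof(int);
	sys_brk(old_break + size);
	return old_break;
}

/* This sketch never reuses memory, so free does nothing. */
void tiny_free(void *ptr)
{
	(void)ptr;
}
```

Only the break pointer is state here, which is why the same allocator works whether brk really talks to the kernel or is a NOP over a fixed area.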
+
+Similarly, we restrict the C language we use and avoid certain features that can cause trouble when bootstrapping:
+
+
+- variable arguments: though simple in principle (just some pointers into the stack if function parameters are passed on the stack), they are not type-safe, and in practice they are heavily used only in printf-like functions.
+- preprocessor: it needs a file system (for #include); we move it outside of the compiler and feed the compiler a single input, possibly a concatenation of several \*.c files.
+- only two types, int and char, so memory is interpreted either as words or as bytes.
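Without variable arguments, a call such as `printf("%s: %d\n", msg, n)` has to be spelled out with fixed-arity calls. The following is a hedged sketch of such helpers using only int and char. The names putstring, putint and putnl are taken from the local C4 version, but these bodies and the helper int_to_str are hypothetical, and this hosted sketch uses putchar instead of a raw write system call.

```c
#include <stdio.h>

/* Hypothetical helper: convert n to decimal into buf, which must hold at
 * least 12 chars (enough for a 32-bit int). Returns buf. */
char *int_to_str(int n, char *buf)
{
	char tmp[12];
	int len = 0;
	int i = 0;
	int neg = n < 0;

	/* Work on non-positive values so that INT_MIN cannot overflow. */
	if (!neg)
		n = -n;
	do {
		tmp[len++] = (char)('0' - n % 10); /* n % 10 is in -9..0 */
		n /= 10;
	} while (n != 0);
	if (neg)
		buf[i++] = '-';
	while (len > 0)
		buf[i++] = tmp[--len]; /* digits were produced in reverse */
	buf[i] = '\0';
	return buf;
}

void putstring(char *s)
{
	while (*s)
		putchar(*s++);
}

void putint(int n)
{
	char buf[12];
	putstring(int_to_str(n, buf));
}

void putnl(void)
{
	putchar('\n');
}
```

With these, `printf("line %d\n", line)` becomes `putstring("line "); putint(line); putnl();`, which every compiler in the chain, including C4, can handle without any stack tricks.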
+
+## Local version of C4
+
+The local version of C4 has the following adaptations and extensions:
+
+
+- switch statement from the <i>switch-and-structs</i> branch; C4 itself was adapted to use switch statements instead of if's (as in the <i>switch-and-structs</i> branch)
+- struct support from <i>switch-and-structs</i>
+- constants like <i>EOF</i>, <i>EXIT\_SUCCESS</i>, <i>NULL</i>
+- standard C block comments in addition to C++ end-of-line ones
+- negative enum initializers
+- do/while loops
+- more C functions like <i>isspace</i>, <i>getc</i>, <i>strcmp</i>
+- some simplified functions for printing like <i>putstring</i>, <i>putint</i>, <i>putnl</i>
+- strict C89 conformance (mainly using standard comment blocks); also removed some warnings
+- some casts around malloc and memset to fit to non-void freestanding-libc
+- converted printf to putstring/putint/putnl and some helper functions for error reporting like error()
+- removed all memory leaks
+- de-POSIX-ified: no open/read/close, only getchar from stdin (don't assume the existence of a file system); this also means we had to create a sort of old-style tape file with FS markers to separate the files piped to C4.
+
+<i>Note:</i> only too late did I discover that there is a C5 version of the same compiler, which might have served better as a basis.
+
+## Examples
+
+### Running on the host system using the host's C compiler
+
+CC can be compiled either hosted (with the host libc) or freestanding (with our own libc, which currently only supports IA-32 Linux system calls):
+
+```
+./build.sh cc hostcc hosted d
+./build.sh cc hostcc freestanding d
+./cc < test1.c > test1.asm
+```
+Create a plain binary from the assembly code:
+
+```
+fasm test1.asm test1.bin
+```
+Disassemble it to verify its correctness:
+
+```
+ndisasm -b32 -o1000000h -a test1.bin
+```
+You can choose <i>gcc</i>, <i>clang</i>, <i>tcc</i> or <i>pcc</i> as host compiler (<i>hostcc</i>).
+
+### Running on the host in the C4 interpreter
+
+Again, the C4 program itself can be compiled in hosted or freestanding mode:
+
+```
+./build.sh c4 hostcc hosted d
+./build.sh c4 hostcc freestanding d
+```
+Here again you can choose the host compiler for compiling C4.
+
+Then we have to create the standard input for C4 using:
+
+`echo -n -e "\034" > EOF
cat cc.c EOF hello.c | ./c4
cat c4.c EOF cc.c EOF hello.c | ./c4
-cat c4.c4 EOF c4.c EOF cc.c EOF hello.c | ./c4
-```
-
-# Local version of c4
-
-This local version of c4 is used to guarantee that our own bootstrapped
-cc.c can also run on c4. This makes sure we are not integrating too
-complex things into our own compiler.
-
-* currently integrated:
- - switch statement from switch-and-structs, adapted c4 to use switch
- statements instead of ifs (as in switch-and-structs)
- - structures from switch-and-structs
-* my own changes:
- - constants like EOF, EXIT_SUCCESS, NULL
- - standard C block comments along to c++ end of line ones
- - negative enum initializers
- - do/while loops
- - more C functions like isspace, getc, strcmp
- - some simplified functions for printing like putstring, putint, putnl
- - strict C89 conformance, mainly use standard comment blocks, also
- removed some warnings
- - some casts around malloc and memset to fit to non-void freestanding-libc
- - converted printf to putstring/putint/putnl and some helper functions
- for error reporting like error()
- - removed memory leaks
- - de-POSIX-ified, no open/read/close, use getchar from stdin only
- (don't assume the existence of a file system), this also means
- we have to create sort of an old style tape-file with FS markers
- to separate the files piped to c4.
-
-# Acknoledgments and references
-
-* c4
- * https://github.com/rswier/c4.git
- c4 - C in four functions
- minimalistic C compiler running on an emulator on the IR, inspiration
- for this project
- * https://github.com/rswier/c4/blob/switch-and-structs/c4.c:
- c4 adaptions to provide switch and structs
- * https://github.com/EarlGray/c4: a X86 JIT version of c4
- * https://github.com/jserv/amacc: based on C4, JIT or native code, for
- ARM, quite well documented, also very nice list of compiler
- resources on Github page
-
-* selfie
- * http://selfie.cs.uni-salzburg.at/:
- C* self-hosting C compiler (also emulator, hypervisor) for RISCV,
- inspiration for what makes up a minimal C language
-
-* tiny.c
- * http://www.iro.umontreal.ca/~felipe/IFT2030-Automne2002/Complements/tinyc.c,
- Marc Feeley, really easy and much more readable, meant as educational compiler
-
-* c.c in swieros: https://github.com/rswier/swieros.git
-
-* https://github.com/lotabout/write-a-C-interpreter/blob/master/tutorial/en/: tutorial
- based on C4 how to build a C interpreter, explains nicely details in C4.
-
-* https://github.com/felipensp/assembly/blob/master/x86/itoa.s,
- https://baptiste-wicht.com/posts/2011/11/print-strings-integers-intel-assembly.html,
- putint for early debugging
-
-* documentation
- * "Compiler Construction", Niklaus Wirth
- * https://github.com/DoctorWkt/acwj: a nice series on building a C compiler,
- step by step with lots of good explanations
- * https://www.engr.mun.ca/~theo/Misc/exp_parsing.htm#climbing,
- https://en.wikipedia.org/wiki/Operator-precedence_parser#Precedence_climbing_method
+cat c4.c4 EOF c4.c EOF cc.c EOF hello.c | ./c4`
+The file EOF contains the traditional FS (file separator) character of the ASCII character set (octal 034, hex 0x1C). Every time c4 or an interpreted c4.c is invoked, it reads exactly one input file up to the first FS character (or up to the end of stdin).
+
+The <i>-s</i> or <i>-d</i> option can also be used on every level, for example:
+
+`cat cc.c EOF hello.c | ./c4 -d`
+## References
+
+Compiler construction in general:
+
+
+- <i>"Compiler Construction"</i>, Niklaus Wirth
+- https://github.com/DoctorWkt/acwj: a nice series on building a C compiler, step by step with lots of good explanations
+- https://www.engr.mun.ca/~theo/Misc/exp_parsing.htm#climbing, https://en.wikipedia.org/wiki/Operator-precedence_parser#Precedence_climbing_method: precedence climbing for parsing expressions
+- https://github.com/lotabout/write-a-C-interpreter/blob/master/tutorial/en/: a tutorial based on C4 on how to build a C interpreter; explains the details of C4 nicely.
+
+C4:
+
+
+- https://github.com/rswier/c4.git: <i>C4 - C in four functions</i>, Robert Swierczek; a minimalistic C compiler that compiles to an intermediate representation executed by its own virtual machine; inspiration for this project
+- https://github.com/rswier/c4/blob/switch-and-structs/c4.c: C4 adaptations to provide switch and structs
+- https://github.com/EarlGray/c4: an x86 JIT version of C4
+- https://github.com/jserv/amacc: based on C4, JIT or native code for ARM, quite well documented, also a very nice list of compiler resources on its GitHub page
+
+Other minimal compilers and systems:
+
+
+- http://selfie.cs.uni-salzburg.at/: C\* self-hosting C compiler (also emulator, hypervisor) for RISC-V, inspiration for what makes up a minimal C language
+- http://www.iro.umontreal.ca/~felipe/IFT2030-Automne2002/Complements/tinyc.c: Marc Feeley, really easy and much more readable, meant as an educational compiler
+- https://github.com/rswier/swieros.git: c.c in swieros, Robert Swierczek
+
+Assembly:
+
+
+- https://github.com/felipensp/assembly/blob/master/x86/itoa.s: putint for early debugging
+- https://baptiste-wicht.com/posts/2011/11/print-strings-integers-intel-assembly.html: printing strings and integers for early debugging
+