From 121209ac9702979d16b73340ec6a3a38adbe1b03 Mon Sep 17 00:00:00 2001
From: Andreas Baumann
Date: Sun, 10 Oct 2021 19:44:33 +0200
Subject: added documenation from REQUIREMENTS to cc.wg (README.html) added a TODO file added an automatic build file with entr

---
 miniany/README.html  | 257 ++++++++++++++++++++++++++++++++++++++++++++++++---
 miniany/REQUIREMENTS | 142 ----------------------------
 miniany/TODOS        |   0
 miniany/autobuild.sh |  17 ++++
 miniany/cc.wg        | 191 ++++++++++++++++++++++++++++++++++----
 5 files changed, 435 insertions(+), 172 deletions(-)
 delete mode 100644 miniany/REQUIREMENTS
 create mode 100644 miniany/TODOS
 create mode 100755 miniany/autobuild.sh

diff --git a/miniany/README.html b/miniany/README.html
index 53eac72..7be5708 100644
--- a/miniany/README.html
+++ b/miniany/README.html
@@ -2,18 +2,18 @@

Introduction

On the never-ending quest for a minimal system I found Swieros and C4 (the C compiler in 4 functions). Inspired and intrigued, I started to implement my own.

For abaos (a small operating system of mine, also in C) I cloned the minimal C library, so we can build a freestanding version of C4.

-

C4 serves as a test whether my own CC is minimal enough and doesn't use silly functions. Additionally C4 as well as CC are compiled both in a (on Linux) hosted version and a freestanding version. We use a series of compilers like gcc, clang, tcc and pcc to make sure that we are not using silly C constructs.

+

C4 serves as a test whether my own CC is minimal enough and doesn't use silly functions. Additionally C4 as well as CC are compiled both in a (on Linux) hosted version and a freestanding version. We use a series of compilers like gcc, clang, tcc and pcc to make sure that we are not using more silly C constructs.

In order to be able to port easily we make almost no use of system calls, the ones we need are:

Similarly we simplify the C language to not use certain features which can cause trouble when bootstrapping:

Local version of C4

The local version of C4 has the following adaptations and extensions:

@@ -29,11 +29,12 @@
  • BSD-style string functions like strlcpy, strlcat
  • strict C89 conformance, mainly use standard comment blocks, also removed some warnings
  • some casts around malloc and memset to fit to non-void freestanding-libc
  • -
  • converted printf to putstring/putint/putnl and some helper functions for error reporting like error()
  • +
  • converted printf to putstring/putint/putnl and some helper functions for error reporting like error()
  • removed all memory leaks
  • -
  • de-POSIX-ified, no open/read/close, use getchar from stdin only (don't assume the existence of a file system), this also means we had to create sort of an old style tape-file with FS markers to separate the files piped to c4.
  • +
  • de-POSIX-ified, no open/read/close, use getchar from stdin only (don't assume the existence of a file system), this also means we had to create sort of an old style tape-file with FS markers to separate the files piped to c4.
  • -

    Note: only too late I discovered that there was a C5 version of the same compiler, which would maybe have served better as a basis.

    +

    The reason for all those adaptions is to minimize the dependency on the host system and to be able to use libc-freestanding.c.
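As an illustration of the printf replacement mentioned in the list above, here is a minimal sketch built on putchar alone. The names putstring, putint and putnl come from the text, but the signatures and the helper int_to_dec are assumptions made for this example and are not taken from c4.c. The buffer size assumes a 32-bit int.

```c
#include <stdio.h> /* putchar only; a freestanding libc would supply its own */

/* Hypothetical helper: writes the decimal form of n into buf and returns buf.
 * buf must hold at least 12 bytes (sign, 10 digits, NUL for a 32-bit int). */
char *int_to_dec(int n, char *buf)
{
    char tmp[12];
    int i = 0;
    int j = 0;
    /* Work on the unsigned magnitude so the most negative int is safe. */
    unsigned int u = (n < 0) ? 0u - (unsigned int)n : (unsigned int)n;

    do {
        tmp[i++] = (char)('0' + u % 10);
        u /= 10;
    } while (u != 0);

    if (n < 0) {
        buf[j++] = '-';
    }
    while (i > 0) {
        buf[j++] = tmp[--i]; /* digits were produced in reverse order */
    }
    buf[j] = '\0';
    return buf;
}

void putstring(const char *s)
{
    while (*s) {
        putchar(*s++);
    }
}

void putint(int n)
{
    char buf[12];
    putstring(int_to_dec(n, buf));
}

void putnl(void)
{
    putchar('\n');
}
```

A call such as `putstring("line "); putint(42); putnl();` then stands in for `printf("line %d\n", 42)` without pulling in any format-string parsing.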

    +

    Note: only too late I discovered that there was a C5 version of the same compiler, which would maybe have served better as a basis.

    Examples

    Running on the host system using the host's C compiler

    Compiled either hosted (host libc) or freestanding (our own libc, currently using Linux kernel syscalls on IA-32 only):

    @@ -64,14 +65,248 @@ cat c4.c4 EOF c4.c EOF cc.c EOF hello.c | ./c4

    We can also use -s, or -d on every level as follows:

    cat cc.c EOF hello.c | ./c4 -d
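The pipelines above concatenate several sources into one stream, with a separator file between them. A minimal sketch of how a reader might split such a tape is shown here. It assumes the separator is the ASCII file separator character 0x1C, which the text only calls "FS markers"; the names TAPE_FS, read_tape_file and the demo_* helpers are hypothetical and exist only for this example.

```c
#include <stdio.h> /* EOF, getchar */

#define TAPE_FS 0x1C /* assumption: ASCII "file separator" marks a file boundary */

/* Reads one file from the tape into buf (NUL-terminated, truncated to
 * size - 1 characters; size must be at least 1). The character source is
 * passed in as a function so that getchar can be swapped out for testing.
 * Returns 1 if a separator ended the file, 0 if the input ended. */
int read_tape_file(int (*get)(void), char *buf, int size)
{
    int c;
    int n = 0;

    while ((c = get()) != EOF && c != TAPE_FS) {
        if (n < size - 1) {
            buf[n++] = (char)c;
        }
    }
    buf[n] = '\0';
    return c == TAPE_FS;
}

/* Demonstration source: a two-file tape held in memory. */
static const char *demo_tape = "int a;" "\x1c" "int b;";

int demo_get(void)
{
    return *demo_tape ? (unsigned char)*demo_tape++ : EOF;
}
```

On a real run the first argument would simply be `getchar`, so the compiler never needs open, read or close.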
     
    +

    Features and Requirements

    +

    We have to be careful about what to put into a bootstrapping compiler; there is a tradeoff between

    + +

    So we collect some ideas here about which features we add or leave out, and why. We also note what the implications of implementing them are.

    +

    We also have to be careful about what C4 can do for us, and add a missing feature there only if it is small enough, in order not to lose this test case.

    +

    Preprocessor for modularisation

    +

    Implementation status: no

    +

    Reasoning:

    + +

    Alternative:

    + +

    Counter arguments:

    + +

    Preprocessor for conditional compilation

    +

    Implementation status: no

    +

    Reasoning:

    + +

    Alternative:

    + +

    Preprocessor for constant declarations

    +

    Implementation status: no

    +

    Reasoning:

    + +

    Caveats:

    + +

    Variable Initializers

    +

    Implementation status: no

    +

    Reasoning:

    + +

    Counter arguments:

    + +

    Inline Assembly

    +

    Implementation status: yes

    +

    Reasoning:

    + +

    Counter arguments:

    + +

    Alternative:

    + +

    Some general notes:

    +

    The GNU inline asm statement has become the de-facto standard (which is too complicated IMHO): I would require sort of a .byte 0xXX instruction only, and for readability maybe a simple fasm-like syntax. We must be careful that our invention of an inline assembler can be mapped somehow to the GNU inline asm version, so that we can use that one on the host with gcc/clang/tcc/pcc.

    +

    c.c in swieros (the c4 successor) has asm(NOP); this is something we could implement easily. u.h contains an enum with opcodes (most likely doable for an easy architecture like the one in swieros; I doubt this works for Intel opcodes, but we should check if it works for our simplified Intel opcode subset).

    +

    There should, though, be only one single point of information for opcodes per architecture, so asm gets sort of an inline string generator for the assembly output. Or we share a common C file with enums for the opcodes and cat it to both the assembler and the compiler during the build (this should not result in increased code size, as those are enums).

    +

    The asm(x) or asm(x,y) constructs can be mapped on the host compilers to asm __volatile__ .byte ugliness. In cc and c4 we can take the swieros approach. This should give us nice low-level inline assembly in a really simplified way (basically embedding bytes).
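A minimal sketch of the host-side mapping described above, assuming a GNU-style `__asm__` and an x86 host. The names OP_NOP, ASM1 and after_nops are hypothetical. The opcode is a macro rather than an enum member here only because the value has to be pasted into the assembler string on the host; cc and c4 themselves could still read it from a shared enum.

```c
/* Hypothetical opcode name; 0x90 is NOP on both IA-32 and x86-64. */
#define OP_NOP 0x90

/* Two-level stringification so OP_NOP is expanded to 0x90 before quoting. */
#define ASM_STR(x) #x
#define ASM_XSTR(x) ASM_STR(x)

#if defined(__i386__) || defined(__x86_64__)
/* Host mapping: emit exactly one byte into the instruction stream. */
#define ASM1(x) __asm__ __volatile__(".byte " ASM_XSTR(x))
#else
/* Other hosts: this sketch emits nothing. */
#define ASM1(x) ((void)0)
#endif

/* Executes two embedded NOP bytes and then returns normally. */
int after_nops(void)
{
    ASM1(OP_NOP);
    ASM1(OP_NOP);
    return 42;
}
```

Only bytes that form complete, harmless instructions can be embedded this way without further care; anything touching registers would need the clobber lists of the full GNU syntax on the host.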

    +

    Not having inline assembly means you need compilation units written and linked to the program in assembly, which - well - adds a linker and calling conventions, which might be too early in bootstrapping.

    +

    Object formats and linkers

    +

    Implementation status: no

    +

    Reasoning:

    + +

    Alternative:

    + +

    Forward declarations of function prototypes

    +

    Implementation status: yes (TODO)

    +

    Reasoning:

    + +

    Caveats:

    + +

    Counter arguments:

    + +

    Functions with variable arguments

    +

    Implementation status: no

    +

    Reasoning:

    + +

    Requirements

    + +

    Alternative:

    + +

    Counter arguments:

    + +

    FILE* and stderr

    +

    Implementation status: no

    +

    Reasoning:

    + +

    Counter arguments:

    + +

    Typedefs

    +

    Implementation status: no

    +

    Reasoning:

    + +

    Counter arguments:

    + +

    For-loops

    +

    Implementation status: no

    +

    Reasoning:

    + +

    Counter arguments:

    + +

    Passing arguments to main

    +

    Implementation status: yes

    +

    Reasoning:

    + +

    Counter argument:

    + +

    bool

    +

    Implementation status: no

    +

    Reasoning:

    + +

    Union

    +

    Implementation status: no

    +

    Reasoning:

    + +

    Counter arguments:

    + +

    Dangling else

    +

    Implementation status: no

    +

    Reasoning:

    + +

    Register Allocation

    +

    Implementation status: yes

    +

    Reasoning:

    + +

    Abstract Syntax Trees

    +

    Implementation status: yes

    +

    Reasoning:

    + +

    Caveats:

    + +

    Counter arguments:

    + +

    Builtin functions

    +

    Implementation status: yes

    +

    Reasoning:

    + +

    Caveats:

    +

    References

    Compiler construction in general:

    +

    Some special compiler building topics:

    +

    C4: