implementing:
- userland
  - argument passing to main function (argc, argv)
- libc
  - print_char
    - requires a 3-parameter syscall via int 80h (Linux)
    - requires
      - inline assembly

not implementing:
- libc
  - variadic functions are not type-safe, do we need them?
  - printf -> putint, putchar, etc.
    - format string only, as replacement for puts
    - vararg required in compiler
    - not type-safe
    - snprintf is no option, strcat, strstr etc. also not really
    - newer formatting functions and logging: strfmon, error, warn, syslog
  - syscall
  - puts
    - requires stdout, which is a FILE structure
  - print_char
    - requires a 3-parameter syscall via int 80h (Linux)
    - requires
      - either inline assembly
      - or linker and calling convention
- preprocessor
  - have a cat building up the required modules instead
  - needs file operations (at least open, close, read)
  - needs a file system on the host and the destination (alternative: have a tape-like file system)
- linker
  - having compilation units needs a linker to build an executable
- symname[t] printing the symbol and not the number, requires static initializers for array of char*
- ASTs are basically only useful when you start to optimize; until then you can use an intermediate format (as C4 does) and a stack machine. They also make the code more readable. For us they force the introduction of pointers, references and structs. In expression parsing we see that constant folding already needs an AST, because we should not emit code while still reading a constant expression. It also separates syntactical stuff like '[' from logical stuff like 'declaration of array size' and 'dereferencing a pointer'.
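A minimal sketch of the constant-folding point, assuming a hypothetical binary-tree node. `struct Node`, `new_node`, `fold` and the `K_*` kinds are illustrative names, not taken from c4 or cc.c. The parser would only build nodes; folding runs afterwards, so no code is emitted while a constant expression is still being read.

```c
#include <stdlib.h>

/* Hypothetical node kinds, for illustration only. */
enum Kind { K_CONST, K_VAR, K_ADD, K_MUL };

struct Node {
    enum Kind kind;
    int value;           /* only meaningful for K_CONST */
    struct Node *left;   /* operands of K_ADD / K_MUL */
    struct Node *right;
};

struct Node *new_node(enum Kind kind, int value,
                      struct Node *left, struct Node *right)
{
    struct Node *n;

    n = (struct Node *)malloc(sizeof(struct Node));
    if (n == NULL) {
        exit(1);
    }
    n->kind = kind;
    n->value = value;
    n->left = left;
    n->right = right;
    return n;
}

/*
 * Fold bottom-up: an operator whose two operands are constants
 * is rewritten in place into a single K_CONST node.
 * Leaves (K_CONST, K_VAR) are returned unchanged.
 * Integer overflow is not handled in this sketch.
 */
struct Node *fold(struct Node *n)
{
    int l;
    int r;

    if (n->kind != K_ADD && n->kind != K_MUL) {
        return n;
    }
    n->left = fold(n->left);
    n->right = fold(n->right);
    if (n->left->kind == K_CONST && n->right->kind == K_CONST) {
        l = n->left->value;
        r = n->right->value;
        n->value = (n->kind == K_ADD) ? l + r : l * r;
        n->kind = K_CONST;
        free(n->left);
        free(n->right);
        n->left = NULL;
        n->right = NULL;
    }
    return n;
}
```

For `2 + 3 * 4` the tree collapses into a single `K_CONST` node, while `x + 1` keeps its `K_ADD` node because one operand is not a constant. A code generator walking the folded tree then emits one load instead of two arithmetic instructions.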
- void *: allows omitting the (char *) casts from and to, for instance, structs in dynamic memory management
- typedefs are just syntactic sugar, I use them mostly for 'struct T' -> 'T'
- initializers of globals and locals: not that important as we use C89 anyway, forcing us to separate declaration and usage of variables per scope
- unions: useful to save space in the AST, but not strictly necessary
- bool: useful, but not strictly necessary
- enums as constant replacement (instead of preprocessor); real enum types are not really useful.
- forward struct definitions or typedefs (handy for Compiler structure), but.. and we can work around by not producing any loops (hopefully)
- for loop: unless we start optimizing (SIMD) there is no real benefit in a generic 'for'; a strict for i=0 to N, i++ is easier to optimize when you have a grammatical construct to help recognize it.
- register number for register allocation: https://en.wikipedia.org/wiki/Strahler_number
- volatile: we are not doing any optimizations for now, so volatile (like const) can just be an ignored keyword.
- c4 freestanding
  - uses some casts; the malloc ones are actually good for clarification, the ones in memset are not so useful (this is all because we don't have 'void *')
  - open/read/close is POSIX, we would prefer either C style file handling (we have it in libc-freestanding.c) or some stdin, stdout thingy
  - again printf and varargs: either use libc-freestanding.c or revert to putint, putstring, putnl..
    - if (tk == '(') next(); else { printf("%d: open paren expected\n", line); exit(-1); } => error("open paren expected"); }
    - printf("%d: compiler error tk=%d\n", line, tk); exit(-1);
    - printf("could not malloc(%d) symbol area\n") => remove size, also map to error
    - printf("read() returned %d\n", i); => ditto
      - we also print a nonsensical line, but we don't really care about this
    - printf("%d: bad enum identifier %d\n", line, tk); the number 'tk' looks like debug output here, so we drop it.
      error1int is the other option (also chosen in other places)
    - other cases translate by hand:
      - case EXIT: /* putstring("exit("); putint(*sp); putstring(") cycle = "); putint(cycle); putnl(); */ return *sp;
      - default: putstring("unknown instruction = "); putint(i); putstring("! cycle = "); putint(cycle); putnl(); return -1;

TODO:
- global char array declarations
- void parameter

TODO:
- avoid GNU-style inline assembler (it is far too complex); rather have an inline bytecode adder for explicit opcodes, e.g. nop -> .byte 0x90
  - c.c in swieros (the c4 successor) has 'asm(NOP)'; this is something we could implement easily, preferably just as 'asm(0x90)'.
  - u.h contains an enum with opcodes (most likely doable for an easy architecture like the one in swieros, I doubt this works for Intel opcodes).
  - There should, though, be only one single point of information for opcodes per architecture, so asm becomes a sort of inline string generator for the assembly output. Or we share a common C file with enums for the opcodes and cat it to both the assembler and the compiler during the build (this should not result in increased code size, as those are enums).
  - The asm(x) or asm(x,y) constructs can be mapped on the host compilers to the asm __volatile__ .byte ugliness. In cc and c4 we can take the swieros approach. This should give us nice low-level inline assembly in a really simplified way (basically embedding bytes).
  - Not having inline assembly means you need compilation units written in assembly and linked to the program, which - well - adds a linker and calling conventions, which might be too early in bootstrapping.
- asm-i386: devise a new version which runs on c4 and is again freestanding
- static: just ignore, we don't have a linker; OTOH, just rewrite it without static, vararg, etc.
- c4.c: checkout c5-AST branch (darn, that one looks more promising to extend!)
- cc.c: putint as a command in the language for early debugging (as in early Pascal) points to a fundamental conflict: bootstrapping is easier with stdout and stdin in the language (no linker, no function calls etc. needed). OTOH we don't want to have I/O as part of the language later; it should rather be in the standard library. Inline assembly in the generated code duplicates code with the putint in libc-freestanding.
- error output is not on stderr; well, are we going to add stdout, stderr now or do we write errors as a sort of assembly comments?
- AST debate:
  - expressions really require an AST (just the A_ASSIGN itself with its reversed order). IF, WHILE, etc. do not; they can be in an AST or be encoded directly (or if not, what kind of optimizations do we lose?)
  - the context of a boolean expression can be an if (in this case we would directly generate the jnXX instruction) or the far less often seen case of assigning it to a variable. Knowing the context would require an AST.
  - AST should not be the output of whole programs, scoping is maybe better
- don't allow non-blocked if/else, just avoid dangling else problems
- for loops: they simply have too general and weird semantics in C (missing semicolon after the last triplet). IMHO a for loop makes sense for SIMD operations only when we can use a stricter grammar to optimize certain iterations.
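
The boolean-context point from the AST debate could look roughly like this. It is only a sketch under assumptions: `enum Context`, `gen_less`, the fixed registers eax/ebx and the Intel-syntax output are all hypothetical and not what cc.c emits today. The host's sprintf is used purely for brevity. The same comparison produces a direct conditional jump when it sits in an if/while, and a 0/1 value when it is assigned.

```c
#include <stdio.h>

/* Where the result of a comparison is consumed (hypothetical names). */
enum Context {
    CTX_BRANCH,  /* condition of an if/while: only the jump matters */
    CTX_VALUE    /* assigned to a variable: a 0/1 value is needed */
};

/*
 * Emit code for "eax < ebx" into buf (Intel syntax, assumed here).
 * buf must have room for at least 64 characters.
 * false_label is only used for CTX_BRANCH.
 * Returns the number of characters written.
 */
int gen_less(char *buf, enum Context ctx, int false_label)
{
    if (ctx == CTX_BRANCH) {
        /* jump away on the negated condition (jge is the same as jnl) */
        return sprintf(buf, "cmp eax, ebx\njge L%d\n", false_label);
    }
    /* materialize the comparison result as 0 or 1 in eax */
    return sprintf(buf, "cmp eax, ebx\nsetl al\nmovzx eax, al\n");
}
```

Without an AST the expression parser does not yet know which of the two contexts applies when it reaches the '<', so it would always have to emit the longer CTX_VALUE form and test the value again afterwards.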