From 121209ac9702979d16b73340ec6a3a38adbe1b03 Mon Sep 17 00:00:00 2001
From: Andreas Baumann
Date: Sun, 10 Oct 2021 19:44:33 +0200
Subject: added documentation from REQUIREMENTS to cc.wg (README.html)

added a TODO file
added an automatic build file with entr
---
 miniany/REQUIREMENTS | 142 ---------------------------------------------------
 1 file changed, 142 deletions(-)
 delete mode 100644 miniany/REQUIREMENTS

(limited to 'miniany/REQUIREMENTS')

diff --git a/miniany/REQUIREMENTS b/miniany/REQUIREMENTS
deleted file mode 100644
index a6fce03..0000000
--- a/miniany/REQUIREMENTS
+++ /dev/null
@@ -1,142 +0,0 @@
-implementing:
-
-- userland
-  - argument passing to main function (argc, argv)
-- libc
-  - print_char
-    - requires a 3 parameter syscall to 80h (Linux)
-    - requires
-      - inline assembly
-not implementing:
-- libc
-  - variadic functions are not type-safe, do we need them?
-    - printf -> putint, putchar, etc.
-      - format string only, as replacement for puts
-      - vararg required in compiler
-      - not type-safe
-    - snprintf no option, strcat, strstr etc also not really
-    - newer formatting functions and logging: strfmon, error, warn, syslog
-  - syscall
-  - puts
-    - requires stdout, which is a FILE structure
-  - print_char
-    - requires a 3 parameter syscall to 80h (Linux)
-    - requires
-      - either inline assembly
-      - linker and calling convention
-- preprocessor
-  - have a cat building up the required modules instead
-  - needs file operations (at least open, close, read)
-  - needs a file system on the host and the destination
-    (alternative: have a tape-like file system)
-- linker
-  - having compilation units needs a linker to build
-    an executable
-- symname[t] printing the symbol and not the number,
-  requires static initializers for array of char*
-- ASTs are basically only useful when you start to optimize,
-  till then you can use an intermediate format (as C4 does)
-  and a stack machine. They also make the code easier to read.
-  For use, they force the introduction of pointers, references and structs.
-  In expression parsing we see that const folding already needs
-  an AST, because we should not emit code while still reading
-  a constant expression. It also separates syntactical stuff like '['
-  from logical stuff like 'declaration of array size' and 'dereferencing
-  a pointer'.
-- void * allowing to omit (char *) from and to for instance structs
-  in dynamic memory management
-- typedefs are just syntactic sugar, I use them mostly for 'struct T' -> 'T'
-- initializers of global and locals, not that important as we use C89 anyway,
-  forcing us to separate declaration and usage of variables per scope
-- unions, useful to save space in AST, but not strictly necessary
-- bool, useful, but not strictly necessary
-- enums as constant replacement (instead of preprocessor), real enum types
-  are not really useful.
-- forward struct definitions or typedefs (handy for Compiler structure), but..
-  and we can work around by not producing any loops (hopefully)
-- for loop: unless we start optimizing (SIMD) there is no real benefit
-  for a generic 'for', a strict for i=0 to N, i++ is easier to optimize, when
-  you have a grammatical construct to help recognizing it.
-- register number for register allocation
-  https://en.wikipedia.org/wiki/Strahler_number
-- volatile: we are not doing any optimizations for now, so volatile (as const)
-  can just be an ignored keyword.
-- c4 freestanding
-  - uses some casts, the malloc ones are actually good for clarification,
-    the ones in memset are not so useful (this is all because we don't
-    have 'void *')
-  - open/read/close is POSIX, we would prefer either C style file handling
-    (we have it in libc-freestanding.c or some stdin, stdout thingy)
-  - again printf and varargs, either use libc-freestanding.c or revert to
-    putint, putstring, putnl..
-    - if (tk == '(') next(); else { printf("%d: open paren expected\n", line); exit(-1); }
-      =>
-      error("open paren expected"); }
-    - printf("%d: compiler error tk=%d\n", line, tk); exit(-1);
-    - printf("could not malloc(%d) symbol area\n") => remove size, also map to error
-    - printf("read() returned %d\n", i); => ditto
-      - we also print a nonsensical line, but we don't really care about this
-    - printf("%d: bad enum identifier %d\n", line, tk); the number 'tk' looks like
-      debug output here, so we drop it.
-      error1int is the other option (also chosen in other places)
-    - other cases translate by hand:
-      - case EXIT: /* putstring("exit("); putint(*sp); putstring(") cycle = "); putint(cycle); putnl(); */ return *sp;
-      - default: putstring("unknown instruction = "); putint(i); putstring("! cycle = "); putint(cycle); putnl(); return -1;
-  TODO:
-  - global char array declarations
-  - void parameter
-TODO:
-- avoid GNU-style inline assembler (is far too complex), rather have an
-  inline bytecode adder for explicit opcodes, e.g. nop -> .byte 0x90
-  - c.c in swieros (the c4 successor) has 'asm(NOP)', this is something we
-    could implement easily, preferably just as 'asm(0x90)'.
-    u.h contains an enum with opcodes (most likely doable for an easy architecture
-    like the one in swieros, I doubt this works for Intel opcodes).
-    There should though be only one single point of information for
-    opcodes per architecture, so asm gets sort of an inline string
-    generator for the assembly output. Or we share a common C-file with
-    enums for the opcodes and cat it to both the assembler and the compiler
-    during the build (should not result in increased code size, as
-    those are enums).
-    the asm(x) or asm(x,y) constructs can be mapped on the host compilers
-    to asm __volatile__ .byte ugliness. In cc and c4 we can take the swieros
-    approach. This should give us nice low-level inline assembly in a really
-    simplified way (basically embedding bytes).
-    Not having inline assembly means you need compilation units written
-    and linked to the program in assembly, which - well - adds a linker
-    and calling conventions, which might be too early in bootstrapping.
-- asm-i386: devise a new version which runs on c4 and is again freestanding
-  - static: just ignore, we don't have a linker, otoh, just rewrite it without static,
-    vararg, etc.
-- c4.c: check out c5-AST branch (darn, that one looks more promising to extend!)
-- cc.c: putint as a command in the language for early debugging (as in early Pascal),
-  points to a fundamental conflict: bootstrapping is better with stdout and stdin in
-  the language (no linker, no function calls etc. needed). OTOH we don't want to
-  have I/O as part of the language later, but rather in the standard library.
-  Inline assembly in the generated code duplicates code with the putint in libc-freestanding.
-- error output is not on stderr, well, are we going to add stdout, stderr now
-  or do we write errors as sort of assembly comments?
-- AST debate:
-  - expressions really require an AST (just the A_ASSIGN itself with its reversed
-    order). IF, WHILE, etc. not, they can be in an AST or be encoded directly (or if
-    not, what kind of optimizations do we lose?)
-  - the context of a boolean expression can be an if (in this case we would generate
-    directly the jnXX instruction) and the far less often seen case of assigning it to
-    a variable. Knowing the context would require an AST.
-  - AST should not be the output of whole programs, scoping is maybe better
-- don't allow non-blocked if/else, just avoid dangling else problems
-- for loops: they simply have a too general and weird semantic in C (missing
-  semicolon after the last triplet). IMHO a for loop makes sense for SIMD
-  operations only when we can use a stricter grammar to optimize certain
-  iterations.
-- c4: recursive descent parsing requires forward function declarations. Forward
-  function declarations are not that easy to implement, because you have to
-  generate a placeholder for the call address before you get the whole
-  definition of the forwarded function (especially its entry address).
-  Or we create sort of a temporary jump into a jump table (sort of a GOT) which
-  we patch when we know the address of the implementation of the function.
-  Having a global table at one place scales easier, as we don't have to keep
-  the whole generated code around just for patching (remember, we have tapes
-  and memory, no seek of files).
-
-- 
cgit v1.2.3-54-g00ecf
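
Note on the last item of the removed file: the jump-table idea (call sites
go through a GOT-like table that is patched once the entry address of the
function is known) might look roughly like the C sketch below. This is only
an illustration under assumptions: all names (declare_function, emit_call,
define_function, call_target, OP_CALL, UNRESOLVED) are hypothetical and are
not taken from c4 or cc.c, and addresses are plain integers standing in for
code offsets.

```c
#include <assert.h>

/* All names below are hypothetical; nothing here is taken from c4 or cc.c. */
enum { MAX_FUNCS = 64, CODE_SIZE = 256, OP_CALL = 1, UNRESOLVED = -1 };

static int jump_table[MAX_FUNCS]; /* one entry address per function slot */
static int n_slots = 0;
static int code[CODE_SIZE];       /* emitted code, written strictly front to back */
static int code_len = 0;

/* Forward declaration seen: reserve a table slot, address still unknown. */
int declare_function(void)
{
    assert(n_slots < MAX_FUNCS);
    jump_table[n_slots] = UNRESOLVED;
    return n_slots++;
}

/* Call site: emit the slot index instead of the (possibly unknown) address. */
int emit_call(int slot)
{
    int at = code_len;
    assert(slot >= 0 && slot < n_slots);
    assert(code_len + 2 <= CODE_SIZE);
    code[code_len++] = OP_CALL;
    code[code_len++] = slot;
    return at;
}

/* Definition seen: patch only the table, never the emitted code. */
void define_function(int slot, int entry_address)
{
    assert(slot >= 0 && slot < n_slots);
    jump_table[slot] = entry_address;
}

/* What the machine would do at run time: look the target up in the table. */
int call_target(int at)
{
    assert(at >= 0 && at + 1 < code_len);
    assert(code[at] == OP_CALL);
    return jump_table[code[at + 1]];
}
```

In this sketch emit_call never has to be revisited: only the table entry
changes when the definition arrives, so code that has already been written
out does not need to be kept around or seeked back into.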