Diffstat (limited to 'miniany/REQUIREMENTS')
-rw-r--r-- miniany/REQUIREMENTS 142
1 files changed, 0 insertions, 142 deletions
diff --git a/miniany/REQUIREMENTS b/miniany/REQUIREMENTS
deleted file mode 100644
index a6fce03..0000000
--- a/miniany/REQUIREMENTS
+++ /dev/null
@@ -1,142 +0,0 @@
-implementing:
-
-- userland
- - argument passing to main function (argc, argv)
-- libc
- - print_char
- - requires a 3-parameter syscall via int 80h (Linux)
- - requires
- - inline assembly
-
-not implementing:
-- libc
- - variadic functions are not type-safe, do we need them?
- - printf -> putint, putchar, etc.
- - format string only, as replacement for puts
- - vararg required in compiler
- - not type-safe
- - snprintf is no option; strcat, strstr etc. are not really either
- - newer formatting functions and logging: strfmon, error, warn, syslog
- - syscall
- - puts
- - requires stdout, which is a FILE structure
- - print_char
- - requires a 3-parameter syscall via int 80h (Linux)
- - requires
- - either inline assembly
- - or a linker and a calling convention
-- preprocessor
- - instead, have cat concatenate the required modules
- - needs file operations (at least open, close, read)
- - needs a file system on the host and the destination
- (alternative: have a tape-like file system)
-- linker
- - having compilation units requires a linker to build
- an executable
-- symname[t]: printing the symbol name rather than the number
- requires static initializers for an array of char *
-- ASTs are basically only useful when you start to optimize;
- until then you can use an intermediate format (as C4 does)
- and a stack machine. They also make the code easier to read.
- For us they force the introduction of pointers, references and structs.
- In expression parsing we see that constant folding already needs
- an AST, because we should not emit code while still reading
- a constant expression. It also separates syntactic stuff like '['
- from logical stuff like 'declaration of array size' and 'dereferencing
- a pointer'.
-- void *, allowing us to omit (char *) casts to and from, for instance, structs
- in dynamic memory management
-- typedefs are just syntactic sugar; I use them mostly for 'struct T' -> 'T'
-- initializers of globals and locals: not that important, as we use C89 anyway,
- forcing us to separate declaration and usage of variables per scope
-- unions: useful to save space in the AST, but not strictly necessary
-- bool: useful, but not strictly necessary
-- enums as a replacement for constants (instead of the preprocessor); real enum types
- are not really useful.
-- forward struct declarations or typedefs (handy for the Compiler structure), but...
- we can work around them by not having any circular struct references (hopefully)
-- for loop: unless we start optimizing (SIMD) there is no real benefit
- to a generic 'for'; a strict 'for i=0 to N, i++' is easier to optimize when
- you have a grammatical construct that helps to recognize it.
-- register number for register allocation
- https://en.wikipedia.org/wiki/Strahler_number
-- volatile: we are not doing any optimizations for now, so volatile (like const)
- can just be an ignored keyword.
-- c4 freestanding
- - uses some casts, the malloc ones are actually good for clarification,
- the ones in memset are not so useful (this is all because we don't
- have 'void *')
- - open/read/close are POSIX; we would prefer either C-style file handling
- (we have it in libc-freestanding.c) or some stdin/stdout thingy
- - again printf and varargs: either use libc-freestanding.c or revert to
- putint, putstring, putnl...
- - if (tk == '(') next(); else { printf("%d: open paren expected\n", line); exit(-1); }
- =>
- if (tk == '(') next(); else { error("open paren expected"); }
- - printf("%d: compiler error tk=%d\n", line, tk); exit(-1);
- - printf("could not malloc(%d) symbol area\n") => remove size, also map to error
- - printf("read() returned %d\n", i); => ditto
- - we also print a nonsensical line, but we don't really care about this
- - printf("%d: bad enum identifier %d\n", line, tk); the number 'tk' looks like
- debug output here, so we drop it.
- error1int is the other option (also chosen in other places)
- - other cases translate by hand:
- - case EXIT: /* putstring("exit("); putint(*sp); putstring(") cycle = "); putint(cycle); putnl(); */ return *sp;
- - default: putstring("unknown instruction = "); putint(i); putstring("! cycle = "); putint(cycle); putnl(); return -1;
- TODO:
- - global char array declarations
- - void parameter
-TODO:
-- avoid GNU-style inline assembler (it is far too complex); rather have an
- inline byte inserter for explicit opcodes, e.g. nop -> .byte 0x90
- - c.c in swieros (the c4 successor) has 'asm(NOP)'; this is something we
- could implement easily, preferably just as 'asm(0x90)'.
- u.h contains an enum with opcodes (most likely doable for an easy architecture
- like the one in swieros; I doubt this works for Intel opcodes).
- There should, though, be only a single point of information for
- opcodes per architecture, so asm gets sort of an inline string
- generator for the assembly output. Or we share a common C file with
- enums for the opcodes and cat it to both the assembler and the compiler
- during the build (this should not increase the code size, as
- those are enums).
- The asm(x) or asm(x,y) constructs can be mapped on the host compilers
- to the asm __volatile__ .byte ugliness. In cc and c4 we can take the swieros
- approach. This should give us nice low-level inline assembly in a really
- simplified way (basically embedding bytes).
- Not having inline assembly means you need compilation units written
- in assembly and linked to the program, which, well, adds a linker
- and calling conventions, which might be too early in bootstrapping.
-- asm-i386: devise a new version which runs on c4 and is again freestanding
- - static: just ignore it, we don't have a linker; OTOH, just rewrite it without static,
- varargs, etc.
-- c4.c: check out the c5-AST branch (darn, that one looks more promising to extend!)
-- cc.c: putint as a command in the language for early debugging (as in early Pascal)
- points to a fundamental conflict: bootstrapping is easier with stdout and stdin in
- the language (no linker, no function calls etc. needed). OTOH we don't want to
- have I/O as part of the language later; it should rather live in the standard library.
- Inline assembly in the generated code duplicates the putint code in libc-freestanding.
-- error output does not go to stderr; so, are we going to add stdout and stderr now
- or do we write errors as sort of assembly comments?
-- AST debate:
- - expressions really require an AST (if only for A_ASSIGN with its reversed
- order). IF, WHILE, etc. do not; they can be in an AST or be encoded directly (and if
- not, what kind of optimizations do we lose?)
- - the context of a boolean expression can be an if (in which case we would directly generate
- the jnXX instruction) or the far less common case of assigning it to
- a variable. Knowing the context would require an AST.
- - the AST should not span the whole program; building it per scope is maybe better
-- don't allow non-blocked if/else bodies, simply to avoid dangling-else problems
-- for loops: they simply have too general and weird semantics in C (no
- semicolon after the last part of the triplet). IMHO a for loop makes sense for SIMD
- operations only when we can use a stricter grammar to optimize certain
- iterations.
-- c4: recursive descent parsing requires forward function declarations. Forward
- function declarations are not that easy to implement, because you have to
- generate a placeholder for the call address before you get the whole
- definition of the forward-declared function (especially its entry address).
- Or we create sort of a temporary jump into a jump table (sort of a GOT), which
- we patch when we know the address of the function's implementation.
- Having a global table in one place scales better, as we don't have to keep
- the whole generated code around just for patching (remember, we have tapes
- and memory, no seeking in files).
-
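The print_char item in the 'implementing' list (a 3-parameter syscall to 80h, needing inline assembly) could be sketched in C as below. This assumes GCC/Clang extended inline assembly and the 32-bit x86 Linux ABI, where write is syscall number 4 with fd, buffer and length in ebx, ecx and edx. The fallback to POSIX write() on other hosts is an addition of this sketch so that it builds elsewhere; it is not part of the original plan.

```c
#include <unistd.h>  /* write(), used only by the fallback branch */

/* Write one byte to file descriptor 1 (stdout).
 * On 32-bit x86 Linux this issues the 3-parameter write syscall directly:
 *   eax = 4 (write), ebx = fd, ecx = buffer address, edx = length.
 * Returns the number of bytes written, or a negative value on error. */
int print_char(char c)
{
#if defined(__i386__) && defined(__linux__)
    int ret;
    __asm__ __volatile__(
        "int $0x80"
        : "=a"(ret)
        : "a"(4), "b"(1), "c"(&c), "d"(1)
        : "memory");
    return ret;
#else
    /* Assumed fallback for hosts without the i386 int 0x80 ABI. */
    return (int)write(1, &c, 1);
#endif
}
```

Everything above putchar-level output (putint, putstring) can then be built on this single primitive.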
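The notes replace printf and varargs with putint, putstring and putnl, and map the c4 error paths onto error() and error1int(). A minimal sketch, assuming putchar() is the only output primitive and that line is a global kept up to date by the scanner as in c4; fmtint() is a hypothetical helper added so that the number conversion can be checked on its own. Output goes to stdout, in line with the note that error output is not on stderr.

```c
#include <stdio.h>   /* putchar() as the single output primitive */
#include <stdlib.h>  /* exit() */

int line = 1;  /* current source line, assumed to be maintained by the scanner */

/* Convert n to decimal in buf (at least 12 bytes, enough for INT_MIN);
 * returns the number of characters written, excluding the terminating NUL. */
int fmtint(int n, char *buf)
{
    char tmp[12];
    int len = 0, i = 0;
    /* negate in unsigned arithmetic so that INT_MIN does not overflow */
    unsigned int u = n < 0 ? 0u - (unsigned int)n : (unsigned int)n;
    do {
        tmp[i++] = (char)('0' + u % 10);
        u /= 10;
    } while (u != 0);
    if (n < 0) buf[len++] = '-';
    while (i > 0) buf[len++] = tmp[--i];  /* digits were produced in reverse */
    buf[len] = '\0';
    return len;
}

void putstring(const char *s) { while (*s) putchar(*s++); }
void putint(int n) { char buf[12]; fmtint(n, buf); putstring(buf); }
void putnl(void) { putchar('\n'); }

/* Replaces: printf("%d: <msg>\n", line); exit(-1); */
void error(const char *msg)
{
    putint(line); putstring(": "); putstring(msg); putnl();
    exit(-1);
}

/* One integer argument instead of varargs, e.g. for "bad enum identifier". */
void error1int(const char *msg, int i)
{
    putint(line); putstring(": "); putstring(msg); putint(i); putnl();
    exit(-1);
}
```

With this, a c4 line such as `else { printf("%d: open paren expected\n", line); exit(-1); }` shrinks to `else { error("open paren expected"); }`.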
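The symname[t] item needs a static initializer for an array of char *. A small illustration with hypothetical token names: to support it, the compiler has to emit the string literals plus a table of their addresses into the data section.

```c
/* Token numbers as a scanner might produce them (names are illustrative). */
enum { T_EOF, T_IDENT, T_NUM, T_IF, T_ELSE, T_WHILE, T_LPAREN, T_RPAREN, T_NOF };

/* Static initializer for an array of char *, indexed by token number. */
static char *symname[] = {
    "EOF", "identifier", "number", "if", "else", "while", "(", ")"
};

/* Print-friendly name for token t instead of its number. */
char *token_name(int t)
{
    if (t < 0 || t >= T_NOF) return "?";
    return symname[t];
}
```

Keeping T_NOF as the last enum value makes it easy to check that the table and the enum stay in sync.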
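The AST note argues that constant folding needs a tree, because whether an expression is constant is only known after it has been read completely. A small sketch with hypothetical node kinds (A_NUM, A_VAR, A_ADD, A_MUL; only A_ASSIGN appears in the notes) that folds constant subtrees bottom-up, before any code would be emitted.

```c
#include <stdlib.h>

/* Minimal expression AST; the node kinds are hypothetical. */
enum { A_NUM, A_VAR, A_ADD, A_MUL };

struct Node {
    int kind;
    int value;                  /* A_NUM: the constant; A_VAR: variable slot */
    struct Node *left, *right;  /* operands of A_ADD / A_MUL */
};

struct Node *mknode(int kind, int value, struct Node *l, struct Node *r)
{
    struct Node *n = malloc(sizeof *n);
    if (n == NULL) exit(1);
    n->kind = kind; n->value = value; n->left = l; n->right = r;
    return n;
}

/* Fold constant subtrees bottom-up: an operator node whose two children
 * are both constants is replaced in place by a single A_NUM node, and the
 * child nodes are freed. */
struct Node *fold(struct Node *n)
{
    if (n->kind == A_NUM || n->kind == A_VAR) return n;
    n->left = fold(n->left);
    n->right = fold(n->right);
    if (n->left->kind == A_NUM && n->right->kind == A_NUM) {
        int a = n->left->value, b = n->right->value;
        free(n->left);
        free(n->right);
        n->value = n->kind == A_ADD ? a + b : a * b;
        n->kind = A_NUM;
        n->left = n->right = NULL;
    }
    return n;
}
```

For `(2+3)*4` the whole tree collapses into one A_NUM node holding 20; for `x + (1+2)` only the right operand is folded, and code generation would then see `x + 3`.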
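The register-allocation item points to the Strahler number. A sketch of computing it on a binary expression tree, assuming every leaf has to be loaded into a register; under that assumption it follows the same recurrence as the Sethi-Ullman labeling, i.e. it gives the number of registers needed to evaluate the tree without spilling when the more demanding subtree is evaluated first. The struct Expr layout is hypothetical.

```c
#include <stddef.h>

/* Expression tree node; a leaf has both children NULL (hypothetical layout). */
struct Expr {
    struct Expr *left, *right;
};

/* Strahler number:
 *   leaf                        -> 1
 *   both children equal (s, s)  -> s + 1
 *   otherwise                   -> max of the two children
 * Unary nodes (one child) simply pass their child's number through. */
int strahler(const struct Expr *e)
{
    int l, r;
    if (e->left == NULL && e->right == NULL) return 1;
    if (e->right == NULL) return strahler(e->left);
    if (e->left == NULL) return strahler(e->right);
    l = strahler(e->left);
    r = strahler(e->right);
    return l == r ? l + 1 : (l > r ? l : r);
}
```

For example, `(a+b)*(c+d)` gets 3, while `(a+b)*c` only needs 2, because `c` can be loaded after `a+b` has been reduced to one register.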
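For the inline-assembly TODO, a sketch of the 'single point of information' idea: an opcode enum that could be cat'ed into both the assembler and the compiler, plus the compiler-side step that turns asm(x) or asm(x,y) into .byte directives. The opcode values are the real one-byte i386 encodings; emit_asm() and the host-compiler macro ASM1 are hypothetical names. The hex formatting is done by hand rather than with snprintf, in line with the libc notes.

```c
/* Shared opcode table (one-byte i386 encodings). */
enum {
    OP_NOP  = 0x90,
    OP_RET  = 0xC3,
    OP_INT3 = 0xCC,
    OP_HLT  = 0xF4
};

/* On host compilers (GCC/Clang), asm(0x90) could map onto the .byte ugliness. */
#define ASM1(x) __asm__ __volatile__(".byte " #x)

static const char hexdigits[] = "0123456789abcdef";

/* Compiler side: write "\t.byte 0xNN[, 0xNN...]\n" for n opcode bytes into out,
 * which must hold at least 8 + 6 * n chars. Returns the length without NUL. */
int emit_asm(char *out, const int *bytes, int n)
{
    const char *prefix = "\t.byte ";
    int len = 0, i;
    while (*prefix) out[len++] = *prefix++;
    for (i = 0; i < n; i++) {
        if (i > 0) { out[len++] = ','; out[len++] = ' '; }
        out[len++] = '0';
        out[len++] = 'x';
        out[len++] = hexdigits[(bytes[i] >> 4) & 0xF];
        out[len++] = hexdigits[bytes[i] & 0xF];
    }
    out[len++] = '\n';
    out[len] = '\0';
    return len;
}
```

Since the enum produces no code or data by itself, sharing it between the assembler and the compiler should not increase their size, which matches the reasoning in the note.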
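The boolean-context note can be illustrated with code generation for `a < b` in its two contexts. This sketch assumes Intel-syntax i386 output with the operands already in eax and ebx; gen_less() and the CTX_* names are hypothetical. In a branch context the inverted condition jumps straight to the false label (jnl, jump if not less), while in a value context the result has to be materialized as 0 or 1 with setl and movzx.

```c
#include <string.h>

enum { CTX_BRANCH, CTX_VALUE };

/* Emit code for the signed comparison eax < ebx into out.
 * CTX_BRANCH: fall through if true, jump to false_label if false.
 * CTX_VALUE:  leave 0 or 1 in eax; false_label is unused (may be NULL). */
void gen_less(char *out, int ctx, const char *false_label)
{
    strcpy(out, "\tcmp eax, ebx\n");
    if (ctx == CTX_BRANCH) {
        strcat(out, "\tjnl ");
        strcat(out, false_label);
        strcat(out, "\n");
    } else {
        strcat(out, "\tsetl al\n\tmovzx eax, al\n");
    }
}
```

The caller has to know the context before emitting anything, which is why the note concludes that knowing it requires an AST.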
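The rule to not allow non-blocked if/else can be enforced directly in a recursive descent parser: once braces are mandatory, an else can only belong to the if whose closing brace precedes it. A sketch over pre-tokenized input, with hypothetical token names and a grammar reduced to if/else and expression statements. Note that statement() has to be forward-declared, which is the same issue the c4 note at the end raises.

```c
/* Token kinds for a toy statement grammar (hypothetical). */
enum { T_IF = 1, T_ELSE, T_LBRACE, T_RBRACE, T_EXPR, T_SEMI, T_END };

static const int *tk;  /* current token */
static int failed;     /* set on the first syntax error */

static void expect(int t) { if (*tk == t) tk++; else failed = 1; }

static void statement(void);  /* forward declaration needed for recursion */

/* block := '{' { stmt } '}' */
static void block(void)
{
    expect(T_LBRACE);
    while (!failed && *tk != T_RBRACE && *tk != T_END) statement();
    expect(T_RBRACE);
}

/* stmt := 'if' expr block [ 'else' block ] | expr ';'
 * Braces are mandatory, so there is no dangling else to resolve. */
static void statement(void)
{
    if (*tk == T_IF) {
        tk++;
        expect(T_EXPR);
        block();
        if (!failed && *tk == T_ELSE) {
            tk++;
            block();
        }
    } else {
        expect(T_EXPR);
        expect(T_SEMI);
    }
}

/* Returns 1 if the token sequence parses, 0 otherwise. */
int parse(const int *tokens)
{
    tk = tokens;
    failed = 0;
    while (!failed && *tk != T_END) statement();
    return !failed;
}
```

`if (x) y; else z;` without braces is rejected at the first missing '{', so the grammar never has to decide which if an else belongs to.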
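The jump-table idea for forward function declarations could be simulated like this: every call is emitted as an indirect call through a slot number, and a definition only patches the table, so already generated code never has to be revisited (no seeking on a tape). The bytecode (CALLI, RET), the fixed table sizes and all function names are hypothetical, and this sketch has no overflow checks.

```c
#include <string.h>

#define MAXFUNCS 64
#define MAXCODE  256

/* Hypothetical bytecode: CALLI n means "call through jump-table slot n". */
enum { CALLI = 1, RET = 2 };

static char *fname[MAXFUNCS];  /* function name per slot */
static int jumptab[MAXFUNCS];  /* entry address per slot, -1 = not yet known */
static int nfuncs;
static int code[MAXCODE];      /* generated code, only ever appended to */
static int pc;

/* Find or allocate the jump-table slot for a function name. */
static int slot(char *name)
{
    int i;
    for (i = 0; i < nfuncs; i++)
        if (strcmp(fname[i], name) == 0) return i;
    fname[nfuncs] = name;
    jumptab[nfuncs] = -1;
    return nfuncs++;
}

/* A call only needs the slot number, even if the callee is defined later. */
void emit_call(char *name)
{
    code[pc++] = CALLI;
    code[pc++] = slot(name);
}

/* When the definition appears, patch the table entry, not the emitted code. */
void define_function(char *name)
{
    jumptab[slot(name)] = pc;
}
```

The price is one extra indirection per call; in exchange, the only thing that must stay in memory until the end is the table itself, which can be written out last.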