implementing:

- userland
  - argument passing to main function (argc, argv)
- libc
  - print_char
    - requires a 3-parameter system call via int 80h (Linux)
      - requires
        - inline assembly
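
A minimal sketch of print_char, assuming an i386 Linux target. There, write is system call number 4, its three parameters (fd, buffer, count) go in ebx, ecx and edx before int 0x80, and the result comes back in eax. The sketch uses GNU-style extended asm only because gcc and clang accept it. The fallback to POSIX write() on other hosts is an added assumption so that the sketch also runs there.

```c
#include <unistd.h>

/* print_char: write one character to stdout (fd 1) and return the
   number of bytes written (negative on error).
   Assumption: on i386 Linux, use the 3-parameter write system call
   directly: eax = 4 (write), ebx = fd, ecx = buffer, edx = count.
   On any other host, fall back to POSIX write(). */
static int print_char(int c)
{
    char ch = (char)c;
    long ret;
#if defined(__i386__) && defined(__linux__)
    __asm__ __volatile__ ("int $0x80"
                          : "=a" (ret)
                          : "0" (4L), "b" (1L), "c" (&ch), "d" (1L)
                          : "memory");
#else
    ret = (long)write(1, &ch, 1);
#endif
    return (int)ret;
}
```

In cc or c4 the asm statement would have to be replaced by whatever simpler inline form the compiler supports. The register assignment stays the same.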

not implementing:
- libc
  - variadic functions are not type-safe, do we need them?
    - printf -> putint, putchar, etc.
      - format string only, as replacement for puts
      - vararg required in compiler
      - not type-safe
    - snprintf is not an option; strcat, strstr etc. aren't really either
    - newer formatting functions and logging: strfmon, error, warn, syslog
    - syscall
  - puts
    - requires stdout, which is a FILE structure
  - print_char
    - requires a 3-parameter system call via int 80h (Linux)
      - requires
        - either inline assembly
        - or a linker and a calling convention
- preprocessor
  - use cat to concatenate the required modules instead
  - needs file operations (at least open, close, read)
  - needs a file system on the host and the destination
    (alternative: have a tape-like file system)
- linker
  - having compilation units requires a linker to build
    an executable
- symname[t]: printing the symbol name instead of the number
  requires static initializers for arrays of char*
- ASTs are basically only useful when you start to optimize;
  until then you can use an intermediate format (as C4 does)
  and a stack machine. They also make the code more readable.
  For us they force the introduction of pointers, references and structs.
  In expression parsing we see that constant folding already needs
  an AST, because we should not emit code while still reading
  a constant expression. It also separates syntactical stuff like '['
  from logical stuff like 'declaration of array size' and 'dereferencing
  a pointer'.
- void *, allowing us to omit (char *) casts to and from, for instance, structs
  in dynamic memory management
- typedefs are just syntactic sugar, I use them mostly for 'struct T' -> 'T'
- initializers of globals and locals: not that important, as we use C89 anyway,
  which forces us to separate declaration and usage of variables per scope
- unions: useful to save space in the AST, but not strictly necessary
- bool: useful, but not strictly necessary
- enums as constant replacement (instead of the preprocessor); real enum types
  are not really useful.
- forward struct declarations or typedefs (handy for the compiler structure), but..
  we can work around them by not producing any reference loops between structs (hopefully)
- for loop: unless we start optimizing (SIMD) there is no real benefit
  in a generic 'for'. A strict 'for i=0 to N, i++' is easier to optimize when
  you have a grammatical construct that helps recognizing it.
- register number for register allocation
  https://en.wikipedia.org/wiki/Strahler_number
- volatile: we are not doing any optimizations for now, so volatile (like const)
  can just be an ignored keyword.
- c4 freestanding
  - uses some casts, the malloc ones are actually good for clarification,
    the ones in memset are not so useful (this is all because we don't
    have 'void *')
  - open/read/close is POSIX; we would prefer either C-style file handling
    (we have it in libc-freestanding.c) or some stdin/stdout thingy
  - again printf and varargs: either use libc-freestanding.c or revert to
    putint, putstring, putnl, etc.
    - if (tk == '(') next(); else { printf("%d: open paren expected\n", line); exit(-1); }
      =>
      if (tk == '(') next(); else error("open paren expected");
    - printf("%d: compiler error tk=%d\n", line, tk); exit(-1);
    - printf("could not malloc(%d) symbol area\n") => remove size, also map to error
    - printf("read() returned %d\n", i); => ditto
    - we also print a nonsensical line, but we don't really care about this
    - printf("%d: bad enum identifier %d\n", line, tk); the number 'tk' looks like
      debug output here, so we drop it.
      error1int is the other option (also chosen in other places)
    - other cases translate by hand:
      - case EXIT: /* putstring("exit("); putint(*sp); putstring(") cycle = "); putint(cycle); putnl(); */ return *sp;
      - default: putstring("unknown instruction = "); putint(i); putstring("! cycle = "); putint(cycle); putnl(); return -1;
  TODO:
  - global char array declarations
  - void parameter
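
The register-number idea from the Strahler number link can be sketched as a small recursive function. The sketch assumes a hypothetical binary expression node (struct node) and a simple model in which every leaf is loaded into a register. Under that model, the Strahler number of the tree is the number of registers needed to evaluate it without spilling, provided that the operand with the larger number is evaluated first.

```c
#include <stddef.h>

/* Hypothetical minimal expression node; a leaf has no children,
   a unary operator has only a left child. */
struct node {
    int op;               /* operator or leaf kind, not used here */
    struct node *left;
    struct node *right;
};

/* Strahler number: a leaf gets 1. An inner node takes the larger of its
   children's numbers, or that number plus one if both are equal
   (both subtrees need all registers, so one more is required to hold
   the first result while the second is computed). */
static int strahler(const struct node *n)
{
    int l, r;
    if (n->left == NULL && n->right == NULL)
        return 1;
    if (n->right == NULL)          /* unary: the result reuses the operand's register */
        return strahler(n->left);
    if (n->left == NULL)
        return strahler(n->right);
    l = strahler(n->left);
    r = strahler(n->right);
    if (l == r)
        return l + 1;
    return l > r ? l : r;
}
```

For (a+b)*(c+d) this gives 3, and for (a+b)*c it gives 2. A code generator with memory operands for right-hand leaves (as in Sethi-Ullman numbering) would need fewer registers, so this is the most pessimistic variant.
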
TODO:
- avoid GNU-style inline assembler (it is far too complex); rather have an
  inline byte emitter for explicit opcodes, e.g. nop -> .byte 0x90
  - c.c in swieros (the c4 successor) has 'asm(NOP)', this is something we
    could implement easily, preferably just as 'asm(0x90)'.
    u.h contains an enum with opcodes (most likely doable for an easy architecture
    like the one in swieros; I doubt this works for Intel opcodes).
    There should though be only one single point of information for
    opcodes per architecture, so asm gets sort of an inline string
    generator for the assembly output. Or we share a common C-file with
    enums for the opcodes and cat it to both the assembler and the compiler
    during the build (should not result in increased code size, as
    those are enums).
    the asm(x) or asm(x,y) constructs can be mapped on the host compilers
    to asm __volatile__ .byte ugliness. In cc and c4 we can take the swieros
    approach. This should give us nice lowlevel inline assembly in a really
    simplified way (basically embedding bytes).
    Not having inline assembly means you need compilation units written
    and linked to the program in assembly, which - well - adds a linker
    and calling conventions, which might be too early in bootstrapping.
- asm-i386: devise a new version which runs on c4 and is again freestanding
  - static: just ignore it, we don't have a linker; otoh, just rewrite it without static,
    vararg, etc.
- c4.c: checkout c5-AST branch (darn, that one looks more promising to extend!)
- cc.c: putint as a command in the language for early debugging (as in early Pascal),
  points to a fundamental conflict: bootstrapping is better with stdout and stdin in
  the language (no linker, no function calls etc. needed). OTOH we don't want to
  have I/O as part of the language later; it should rather live in the standard library.
  Inline assembly in the generated code duplicates code with the putint in libc-freestanding.
- error output is not on stderr. Well, are we going to add stdout and stderr now,
  or do we write errors as a sort of assembly comment?
- AST debate:
  - expressions really require an AST (just the A_ASSIGN itself with its reversed
    order). IF, WHILE, etc. do not; they can be in an AST or be encoded directly (or, if
    not, what kind of optimizations do we lose?)
    - the context of a boolean expression can be an if (in this case we would generate
      the jnXX instruction directly) or the far less common case of assigning it to
      a variable. Knowing the context would require an AST.
  - an AST should not be built for whole programs; scoping it is maybe better
- don't allow non-blocked if/else; this simply avoids dangling-else problems
- for loops: they simply have overly general and weird semantics in C (no
  semicolon after the last of the three clauses). IMHO a for loop makes sense for SIMD
  operations only when we can use a stricter grammar to optimize certain
  iterations.
- c4: recursive descent parsing requires forward function declarations. Forward
  function declarations are not that easy to implement, because you have to
  generate a placeholder for the call address before you get the whole
  definition of the forwarded function (especially its entry address).
  Or we create sort of a temporary jump into a jump table (sort of a GOT) which
  we patch when we know the address of the implementation of the function.
  Having a global table in one place scales better, as we don't have to keep
  the whole generated code around just for patching (remember, we have tapes
  and memory, no seeking in files).
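
The jump-table variant for forward function declarations can be sketched as follows. Everything here is hypothetical: struct slot, slot_for, emit_call, define_function, count_unresolved and the 64-slot limit. In this sketch, a call site is emitted as an indirect call through a slot index. Only the table entry is patched once the definition's entry address is known, so code that has already been written to tape never has to be revisited.

```c
#include <string.h>

#define MAX_FUNCS  64
#define UNRESOLVED (-1)

/* One jump-table slot per function name (hypothetical layout).
   Names longer than 31 characters are truncated. */
struct slot {
    char name[32];
    int addr;              /* entry address, UNRESOLVED until the definition is seen */
};

static struct slot table[MAX_FUNCS];
static int nslots;

/* Find the slot for name, allocating an unresolved one on first use.
   Returns -1 if the table is full. */
static int slot_for(const char *name)
{
    int i;
    for (i = 0; i < nslots; i++)
        if (strcmp(table[i].name, name) == 0)
            return i;
    if (nslots == MAX_FUNCS)
        return -1;
    strncpy(table[nslots].name, name, sizeof table[nslots].name - 1);
    table[nslots].name[sizeof table[nslots].name - 1] = '\0';
    table[nslots].addr = UNRESOLVED;
    return nslots++;
}

/* A call site only encodes the slot index (think "call [table + 4*slot]"),
   so it is valid before the callee's address is known. */
static int emit_call(const char *name)
{
    return slot_for(name);
}

/* When the definition appears, patch the single table entry. */
static int define_function(const char *name, int addr)
{
    int s = slot_for(name);
    if (s >= 0)
        table[s].addr = addr;
    return s;
}

/* At the end of the compilation every slot must be resolved;
   a remaining unresolved slot is a call to an undefined function. */
static int count_unresolved(void)
{
    int i, n = 0;
    for (i = 0; i < nslots; i++)
        if (table[i].addr == UNRESOLVED)
            n++;
    return n;
}
```

A call to expr() before its definition gets slot 0 with an unresolved address. When expr is later defined at, say, address 0x1000, only table entry 0 changes. The cost is one extra indirection per call at run time.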