mission
-------

minimal C to program the kernel.

design decisions
----------------

Use enum constants rather than preprocessor constants.

Do we allow structs, functions etc.? We could go with global variables
only and basic types, which makes the compiler simpler, but the code
may not be so well-structured.

The C dialect we use is a simplified version, so we would
rather write i = i + 1 than i++ or ++i.

Constructs we need
------------------

From CPP
--------

In principle we would like to avoid having to implement CPP
functionality, but...

header file includes
--------------------

#include "filename.h" (but we could go without a preprocessor and make
it a special command in the language itself). We don't want or need
preprocessor tricks (we think). On the other hand we want to use
only things known to standard compilers so that we can bootstrap
from a host with a standard compiler.

=> #include "filename.h"

include guards
--------------

#import "a.h" is deprecated in gcc, was a Microsoft extension and is
part of Objective-C.

ok, no go then.

#pragma once in a.h

very simple to implement, avoids clashes of names of guard macros,
which is very good.

instead of

#ifndef XXXX
#define XXXX
#include "a.h"
#endif

We could implement a simple preprocessor functionality which only
works with defined symbols (names and no values).

#pragma once is not portable, but maybe quite well supported?
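
A minimal sketch of how "#pragma once" could be tracked inside the
compiler, as an illustration of why it is simple to implement. The
names once_mark and once_seen and the table sizes are assumptions, not
existing minic code. Comparing path strings is a simplification: two
different spellings of the same file would not be recognized.

```c
#include <string.h>

enum { MAX_ONCE_FILES = 64, MAX_PATH_LEN = 256 };

/* paths of all files in which "#pragma once" has been seen so far */
static char once_files[MAX_ONCE_FILES][MAX_PATH_LEN];
static int nof_once_files = 0;

/* Record that 'path' contains "#pragma once".
 * Returns 0 on success, -1 if the table is full or the path too long. */
int once_mark(const char *path)
{
    if (nof_once_files >= MAX_ONCE_FILES) {
        return -1;
    }
    if (strlen(path) >= (size_t)MAX_PATH_LEN) {
        return -1;
    }
    strcpy(once_files[nof_once_files], path);
    nof_once_files = nof_once_files + 1;
    return 0;
}

/* Returns 1 if 'path' was marked before (the #include must be
 * skipped), 0 otherwise. */
int once_seen(const char *path)
{
    int i;

    i = 0;
    while (i < nof_once_files) {
        if (strcmp(once_files[i], path) == 0) {
            return 1;
        }
        i = i + 1;
    }
    return 0;
}
```

The #include handler would call once_seen( ) before opening a file and
once_mark( ) when it reads the pragma. No guard macro names are
involved, so they cannot clash.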

include_next
------------

To extend a standard header. I see this only in a hosted environment.
For instance to shim standard headers if a compiler doesn't provide
some things (see stdlib.h in abaos/libc).

platform switches
-----------------

#ifdef HOSTED
#define print(X) puts(X)
#else
// our own function print to our kernel
#endif

alternatives?

Linking to specific implementation files io-host.c vs io-abaos.c

pimpls and casts for OS-specific structs are not type-safe. On the
other hand #ifdefs in structs are also quite dangerous (ABI
mismatches).
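
A sketch of the linking alternative, shown as one listing with
comments marking the assumed file boundaries. The int return value of
print and the function greet are illustrative assumptions. Only
io-host.c is spelled out; io-abaos.c would define the same print( ) on
top of the kernel console.

```c
#include <stdio.h>

/* io.h: the only interface the rest of the program sees */
int print(const char *s);

/* io-host.c: hosted implementation, passed to the linker when
 * building on the host. Returns 0 on success, -1 on error. */
int print(const char *s)
{
    if (puts(s) < 0) {
        return -1;
    }
    return 0;
}

/* io-abaos.c (not shown) would provide the kernel version of
 * print( ); exactly one of the two files is linked. */

/* user code: identical on both platforms, no #ifdef HOSTED needed */
int greet(void)
{
    return print("hello from minic");
}
```

The platform decision moves from the preprocessor into the build
command, so the compiler never has to evaluate a conditional.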

debug code
----------

#ifdef DEBUG
#endif

constants
---------

for array dimensions mainly.

#define ACONSTANT 5
int b[ACONSTANT];

constants can be done with 'enum { ACONSTANT = 5 };'
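
A small sketch of the enum variant, written in the simplified dialect
(no ++, no preprocessor). The function fill_and_sum is only there for
illustration. Note that enumeration constants always have type int, so
this only replaces integer macros.

```c
/* enum constant used as an array dimension instead of a #define */
enum { ACONSTANT = 5 };

int b[ACONSTANT];

/* Illustrative helper: stores 0..ACONSTANT-1 in b and returns the
 * sum of all elements. */
int fill_and_sum(void)
{
    int i;
    int sum;

    sum = 0;
    i = 0;
    while (i < ACONSTANT) {
        b[i] = i;
        sum = sum + b[i];
        i = i + 1;
    }
    return sum;
}
```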

No, this really is a problem when parts of standard C require a macro
processor just to define things like NULL:

#if !defined(NULL)
    #define NULL ((void*)0)
#endif

runtime
-------

We need a minimalistic runtime (basically the functions needed to write
the self-compiling first-stage compiler).

crt0 type entry points:

raise
_start
_exit

testing
-------

From the simplest test program on, make sure we generate some output
and verify it! Even if this means we must write special bootstrap test
code.

Make sure we can output all phases of the compiler, lexing/parsing,
semantics, code generation, etc.

Have a pseudo code generator like "cucu", or maybe better something
which even runs (Jasmin, Java byte-code). cucu has a Python interpreter
for the pseudo assembler. The real target we should postpone, as it
involves linking, ELF, GPT, Intel assembly code and other things which
complicate issues in the beginning.

Also having a small virtual CPU as in Oberon (where the virtual CPU
actually can get real :-) ) or as in
http://schweigi.github.io/assembler-simulator/instruction-set.html
is an idea.
The idea is even usable for a running system like
erlm.github.io/OberonEmulator/
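
To give an idea of how small such a virtual CPU can be, here is a
sketch of a stack machine with four instructions. The opcodes and the
function vm_run are assumptions made up for this example; they are not
the instruction set of cucu, Oberon or the simulator linked above
(which is register based).

```c
enum { OP_HALT = 0, OP_PUSH = 1, OP_ADD = 2, OP_MUL = 3 };
enum { STACK_SIZE = 32 };

/* Runs 'code' (nof_words ints) and returns the top of the stack when
 * OP_HALT is reached. Returns -1 for a malformed program (unknown
 * opcode, stack overflow or underflow, missing OP_HALT), so a
 * legitimate result of -1 cannot be told apart from an error. */
int vm_run(const int *code, int nof_words)
{
    int stack[STACK_SIZE];
    int sp; /* number of values currently on the stack */
    int pc; /* index of the next word to read */
    int op;

    sp = 0;
    pc = 0;
    while (pc < nof_words) {
        op = code[pc];
        pc = pc + 1;
        if (op == OP_HALT) {
            if (sp < 1) {
                return -1;
            }
            return stack[sp - 1];
        } else if (op == OP_PUSH) {
            /* the operand is the word following the opcode */
            if (pc >= nof_words || sp >= STACK_SIZE) {
                return -1;
            }
            stack[sp] = code[pc];
            sp = sp + 1;
            pc = pc + 1;
        } else if (op == OP_ADD || op == OP_MUL) {
            if (sp < 2) {
                return -1;
            }
            if (op == OP_ADD) {
                stack[sp - 2] = stack[sp - 2] + stack[sp - 1];
            } else {
                stack[sp - 2] = stack[sp - 2] * stack[sp - 1];
            }
            sp = sp - 1;
        } else {
            return -1;
        }
    }
    return -1;
}
```

A code generator targeting this machine would emit, for (2 + 3) * 4,
the words PUSH 2, PUSH 3, ADD, PUSH 4, MUL, HALT. The test harness can
then compare the returned value with the expected one without any
linker or ELF handling.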

header files
------------

Generate them, later make the compiler self-aware of source files
in Oberon-style. For bootstrapping via another C compiler we
need simple header files (#pragma once only, #include only in the
C file). For including things which are prerequisites either:
- the program can define a global series of includes
- the library or component (like minilib) defines a facade header
  file

generating header files:
http://www.hwaci.com/sw/mkhdr/

But, we would like the minic compiler to do this, not yet another
tool in the chain.

makeheaders plays too many tricks with preprocessors. Another possibility
is to have a modified C with other constructs which then first gets
converted into plain C89 with full-preprocessor support.

io.c:

typedef struct Io {
} Io;

implicit
-> #include "io.h"

declaration gets moved into io.h

user of a module uses:

import io;

shadowing
---------

Traditional C has it, but is it a good idea? Why was it
invented? How much does it increase the complexity of symbol management?

TODO: Find a counterexample.

CoffeeScript doesn't have explicit shadowing, so you always have to
choose proper names.

We have to avoid confusion between assignment and declaration:

x = a;
int x = a;

One symbol space (as in Pascal), so you cannot have a type 'X' and
a variable 'X' or a function 'X' in the same scope. Here, too,
C deviates and has separate namespaces.
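
For reference, this is what standard C currently allows; the function
names are made up for the example. Strictly speaking only struct,
union and enum tags live in their own namespace. A typedef name shares
the namespace of variables and functions.

```c
/* Shadowing: the inner 'x' hides the outer one until the block ends. */
int shadow_demo(void)
{
    int x;
    int seen_inside;

    x = 1;
    {
        int x;           /* declaration: a new variable, shadows outer x */
        x = 2;           /* assignment: changes only the inner x */
        seen_inside = x;
    }
    /* the outer x is still 1 here */
    return x * 10 + seen_inside;
}

struct X {
    int value;
};

/* Separate namespaces: the tag 'X' and the variable 'X' do not clash. */
int namespace_demo(void)
{
    struct X X;

    X.value = 7;
    return X.value;
}
```

With a single symbol space and no shadowing, both the inner 'int x'
and the variable 'X' would be rejected as redeclarations.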

floating point arithmetic
-------------------------

For kernel programming more a nuisance than helpful, so it's second
priority.

unicode
-------

Traditionally a mess, so maybe having it in userland as user library
only?

compiler and linker for CPU features
------------------------------------

Imagine a linker which can handle f_sse2( ), f_i486( ) as function f( )
at runtime. This would also eliminate the need for #ifdef i386 and
stuff like that.

SSE2 {
}

i486 {
}

could be like namespaces with special meaning. But this would require
inline assembly, as optimization of C inside the namespace might not be
enough.
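
What such a linker would do automatically can be approximated by hand
today with a function pointer. This is only a sketch: bind_f, the
has_sse2 flag and the function bodies are assumptions, and how the
flag is detected (for example via CPUID) is left out.

```c
/* Two implementations of the same operation. Both bodies are plain C
 * here; the SSE2 one stands in for a tuned version. */
static int f_i486(int a, int b)
{
    return a + b;
}

static int f_sse2(int a, int b)
{
    return a + b;
}

/* What callers use: f( ) points to the baseline version by default. */
int (*f)(int a, int b) = f_i486;

/* Bind f once at startup according to the detected CPU feature. */
void bind_f(int has_sse2)
{
    if (has_sse2) {
        f = f_sse2;
    } else {
        f = f_i486;
    }
}

/* Returns 1 if f currently points to the SSE2 version. */
int f_is_sse2(void)
{
    return f == f_sse2;
}
```

Callers write f(2, 3) and never mention the CPU variant, which is the
property the namespace idea is after.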

Approaches
----------

C4
--

C4 is self-hosting and has the minimum features we need, but it has
some shortcomings:
- it cannot create object files for running (only in-memory execution)
- too many OS dependencies
- functions are part of the parser

This shows that the compiler is indeed self-hosting:

./c4 c4.c c4.c hello.c
hello, world
exit(0) cycle = 9
exit(0) cycle = 26015
exit(0) cycle = 10059669

Minimalistic, usable with modifications for bootstrapping a compiler.

lcc
---

There is a book, which is very good to read and shows practical issues.
Sadly the coding style is not to our liking. The distinction between
front end and back end is very good.
picoc looks like a bootstrapping interpreter.

qbe
---

Interesting project, with a bootstrapping minic. Sort of an intermediate
LLVM-like language, but much simpler.

links
-----

https://github.com/alexfru/SmallerC
https://github.com/rswier/c4
http://c9x.me/compile/doc/il.html (QBE)

Building
--------

gcc -I../minilib -g -O0 -m32 -march=i386 -ffreestanding -Werror -Wall -Wno-return-type -pedantic -std=c89 -o minic *.c ../minilib/*.c
clang -I../minilib -g -O0 -march=i386 -fno-builtin -std=c89 -Werror -Wall -Wno-return-type -o minic *.c ../minilib/*.c
tcc -I../minilib -g -O0 -march=i386 -fno-builtin -std=c89 -Werror -Wall -Wno-return-type -o minic *.c ../minilib/*.c
pcc -I../minilib -g -O0 -march=i386 -fno-builtin -std=c89 -Wall -Wno-return-type -o minic *.c ../minilib/*.c