path: root/minie/TODOS
blob: d57248d979e7b00568f5b1a3409954cee738c8d3 (plain)
expression can also be the result of a function (system.readline)
assignment depends on the type of the left-hand-side string (s := system.readline)
or we introduce VAR parameters

s := system.readline( );
system.readline( var s : array of char );

the problem here is: we cannot return a structure of arbitrary size
via the stack. So the VAR version is the one that fits better with a system
module written in language E itself. The embedded-function version of the
pseudo system module looks more like a special assignment for string
arrays.
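The VAR version can be sketched in C, the language e2c emits, as a function that fills a caller-owned buffer, much like fgets. This is a minimal sketch under assumptions: the name system_readline, its parameter order and the truncation policy (drop the rest of an over-long line) are hypothetical, not an existing API.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical C lowering of the VAR form
 *   system.readline( var s : array of char );
 * The caller owns the storage and passes its size, so nothing of
 * arbitrary size has to be returned via the stack (like fgets). */
size_t system_readline(FILE *in, char *s, size_t len)
{
    size_t n = 0;
    int c;

    assert(in != NULL && s != NULL && len > 0);
    while ((c = fgetc(in)) != EOF && c != '\n') {
        if (n + 1 < len)        /* keep room for the terminating 0 */
            s[n++] = (char)c;
        /* else: the line is too long, drop the remaining characters */
    }
    s[n] = '\0';
    return n;
}

/* The assignment form  s := system.readline;  could be lowered to the
 * same call, because the size of 's' is known from its declaration
 * (e.g. array[20] of char). Only valid for real arrays, not pointers. */
#define SYSTEM_READLINE_ASSIGN(in, s) system_readline((in), (s), sizeof(s))
```

With this lowering both notations end up as the same call; the assignment form is only sugar as long as the size of s is known at compile time.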

types:

boolean: and or not operators

integer : + - * / mod, max(int), min(int)

char: 'A' or char(13)

byte: 8-bit unsigned value
word: 32-bit addressable value

enum colors (red, blue, green, yellow) we allow int(red)=0
what about assignments of explicit values as in C?

<= <> = (assignment :=)

const types?

no floats for now

structured types: record, array, set

array a[1:25]

strings are arrays of variable len?
do we need ranges for arrays?

a : array[20] of integer;

is clearer than in Edison: array a[0..20] (int)

array 10, 10 of integer is Oberon syntactic sugar (for array 10 of array 10 of integer)

another way is to store the length in the last bytes of the array
and also zero-terminate the string (Bron-Dijkstra strings):
https://github.com/norayr/Bron-Dijkstra-Strings/blob/master/bdStrings.Mod
Dijkstra - Efficient String.pdf
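A minimal C sketch of the idea, under our reading of the Bron-Dijkstra scheme: the last byte of a fixed buffer holds the number of unused character slots, so when the string is full that count is 0 and doubles as the terminator. The type BDString, the capacity CAP and the function names are hypothetical; the linked bdStrings.Mod may differ in details.

```c
#include <assert.h>
#include <string.h>

/* Fixed-capacity string: buf[CAP-1] holds the number of free character
 * slots. A full string (CAP-1 chars) has 0 there, which is also the
 * terminating zero; shorter strings get an ordinary 0 after the text. */
#define CAP 8
static_assert(CAP >= 2 && CAP <= 256, "free count must fit in one byte");

typedef struct {
    unsigned char buf[CAP];
} BDString;

/* Store up to CAP-1 characters of src; returns the stored length. */
size_t bd_set(BDString *s, const char *src)
{
    size_t len = strlen(src);

    if (len > CAP - 1)
        len = CAP - 1;                              /* truncate */
    memcpy(s->buf, src, len);
    if (len < CAP - 1)
        s->buf[len] = 0;                            /* zero terminator */
    s->buf[CAP - 1] = (unsigned char)(CAP - 1 - len); /* free slots */
    return len;
}

/* O(1) length: no scan for the terminator is needed. */
size_t bd_length(const BDString *s)
{
    return (CAP - 1) - s->buf[CAP - 1];
}
```

The buffer stays usable as a C string in both cases, while the length never needs more than one byte of overhead.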

---

edison-es drops modules. I actually find system.writeln, system.readln quite
appealing.

writeinteger
writeboolean
writechar
writeln/writeline/writestring, but there is no basic type for a sequence
of chars; is this an array[20] of char?

explicit skip

strings as types: "Abc" is a string constant that can be represented as
array[3] of char, but then, how can this be assigned to an array[4] of char?
So types can be assigned if they are compatible, so we can say assigning
an array[3] of char (also 'Abc') to array[4] of char is possible, but
not the other way round as it would violate the boundaries!
array of char is only possible with dynamic memory management, which is
a thing we might not want at all?
0-terminated vs. length-prefixed, but not limited to 255 like Pascal:
use an RLE scheme for the first N bytes.
char can also be unicode, conversion to integer is possible, but not to
byte. Use array[128] of byte for buffers.
certain functions might have to work on arrays of arbitrary size, like
a 'StrCopy' function with 'array of char' with an unknown size. They need
a relaxed type check and delegate checking of boundaries if needed into
the runtime.
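One way a transpiler could implement the relaxed check is to pass the declared length of each open array as a hidden parameter and test boundaries at run time. A minimal sketch, assuming this hidden-length calling convention; StrCopy's C signature, CopyResult and the STRCOPY macro are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical lowering of
 *   procedure StrCopy( var dst : array of char; src : array of char );
 * The compiler passes the declared length of each open array as a hidden
 * parameter: the static type check is relaxed to "any array of char" and
 * the boundary check moves into the runtime. */
typedef enum { COPY_OK, COPY_TRUNCATED } CopyResult;

CopyResult StrCopy(char *dst, size_t dstLen, const char *src, size_t srcLen)
{
    size_t i = 0;

    assert(dst != NULL && src != NULL);
    if (dstLen == 0)
        return COPY_TRUNCATED;
    /* the source ends at a 0 byte or at its declared length */
    while (i < srcLen && src[i] != '\0') {
        if (i + 1 >= dstLen) {      /* no room left for the terminator */
            dst[i] = '\0';
            return COPY_TRUNCATED;
        }
        dst[i] = src[i];
        i++;
    }
    dst[i] = '\0';
    return COPY_OK;
}

/* At a call site the compiler knows both declared sizes. */
#define STRCOPY(dst, src) StrCopy((dst), sizeof(dst), (src), sizeof(src))
```

Copying 'Abc' (an array[3] of char without terminator) into an array[4] of char succeeds, while the other direction is reported as truncated at run time, matching the compatibility rule described in the text.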

built-in functions like LEN or system.length: len and sizeof sound
more like compiler things, system.length more like a library thing.
Actually, we don't want len to be a platform dependency, so that using
length in a piece of code stays portable.

So we have an internal set of functions related to compiler things:
- domains of data types
- conversion of data types
- len, size, addr of variables/arrays
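As an illustration of why these belong to the compiler, here is a hypothetical C mapping that a transpiler like e2c might emit for them. It assumes E's integer maps to C int; all E_ macro names are invented for this sketch. Everything is resolved from declared types at compile time, without any system module.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* domains of data types, e.g. max(integer), assuming integer == C int */
#define E_MAX_INTEGER INT_MAX
#define E_MIN_INTEGER INT_MIN

/* conversion of data types, e.g. int(red) = 0 */
enum colors { red, blue, green, yellow };
#define E_INT(x) ((int)(x))

/* len, size, addr of variables/arrays */
#define E_LEN(a)  (sizeof(a) / sizeof((a)[0]))  /* number of elements */
#define E_SIZE(v) sizeof(v)                     /* size in bytes */
#define E_ADDR(v) ((void *)&(v))                /* address of a variable */
```

E_LEN only works on real arrays; for open-array parameters the compiler would have to supply a hidden length instead.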

The system module on the other hand contains things which relate
somehow to the environment, e.g. backend, operating system and which
might have to be ported heavily. Most likely they are still invoked
inside the compiler when generating code.

expressions

var
	b : boolean;

b := s[i] <> char( 0 );
if b do
	x
end

is the same as:

if s[i] <> char( 0 ) do
	x
end

The '<>' operator must return a boolean type. So we just parse the condition
inside 'if' with the same expression rule as in the assignment (later also
in the 'while' condition).
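A toy C sketch of this in a recursive-descent checker: the 'if' condition and the right-hand side of ':=' go through the same parseExpression(), and only the caller decides which type it expects. It is deliberately tiny: single-letter operands with a hard-coded symbol table stand in for s[i] and char( 0 ), and all names are hypothetical.

```c
#include <assert.h>

typedef enum { T_ERROR, T_CHAR, T_INTEGER, T_BOOLEAN } Type;

static const char *p;  /* current position in the source text */

static Type typeOf(char name)
{
    switch (name) {
    case 'c': return T_CHAR;     /* c : char */
    case 'i': return T_INTEGER;  /* i : integer */
    case 'b': return T_BOOLEAN;  /* b : boolean */
    default:  return T_ERROR;    /* undeclared */
    }
}

static void skipSpaces(void) { while (*p == ' ') p++; }

static Type parseFactor(void)
{
    skipSpaces();
    if (*p == '\0') return T_ERROR;
    return typeOf(*p++);
}

/* Expression = Factor [ ( "=" | "<>" | "<=" | "<" ) Factor ] */
Type parseExpression(const char *src)
{
    Type left, right;

    p = src;
    left = parseFactor();
    skipSpaces();
    if (*p == '=' || *p == '<') {
        p += (p[0] == '<' && (p[1] == '>' || p[1] == '=')) ? 2 : 1;
        right = parseFactor();
        if (left == T_ERROR || left != right) return T_ERROR;
        return T_BOOLEAN;  /* relational operators yield a boolean */
    }
    return left;
}

/* if Expression do ... end: the condition must be boolean */
int checkIf(const char *cond) { return parseExpression(cond) == T_BOOLEAN; }

/* name := Expression: the types on both sides must match */
int checkAssign(char name, const char *expr)
{
    Type t = parseExpression(expr);
    return t != T_ERROR && t == typeOf(name);
}
```

Both "b := c <> c" followed by "if b do" and the direct "if c <> c do" pass, just like the two versions in the example.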

return expression: only at the end, after the statement block, or as
"begin" StatementList [ "return" Expression ] "end", or
as a semantic check allowing "return" everywhere but knowing whether
the context is a procedure or a function context, or, as in C, allowing
it everywhere because everything is a function.
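The semantic variant needs only the kind of the enclosing declaration while checking each return statement. A minimal sketch with hypothetical names; checking that a function returns a value on every path would be a separate, flow-based analysis not shown here.

```c
#include <assert.h>

/* The grammar accepts  "return" [ Expression ]  anywhere in a statement
 * list; the checker consults the enclosing context. */
typedef enum { CTX_PROCEDURE, CTX_FUNCTION } Context;

typedef enum {
    RET_OK,
    RET_MISSING_VALUE,    /* plain 'return' inside a function */
    RET_UNEXPECTED_VALUE  /* 'return x' inside a procedure */
} ReturnCheck;

ReturnCheck checkReturn(Context ctx, int hasExpression)
{
    if (ctx == CTX_FUNCTION && !hasExpression)
        return RET_MISSING_VALUE;
    if (ctx == CTX_PROCEDURE && hasExpression)
        return RET_UNEXPECTED_VALUE;
    return RET_OK;
}
```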

system.readline( s ) fits better with fgets, but s := system.readline; is
more what I want.

memory management
-----------------

options:
- static allocation
- stack-based
- explicit: C malloc/free
- region-based
- thread-local heap
- implicit:
  - garbage collection
  - ARC: reference counting and weak pointers
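Region-based allocation can sit on top of static allocation: a fixed buffer, a bump offset, and a reset that frees everything at once, so the region itself cannot fragment. A minimal sketch; REGION_SIZE and the function names are assumptions, and it requires C11 for _Alignas and max_align_t.

```c
#include <assert.h>
#include <stddef.h>

#define REGION_SIZE 1024

typedef struct {
    _Alignas(max_align_t) unsigned char mem[REGION_SIZE];
    size_t used;
} Region;

static Region region;  /* static allocation: no malloc involved */

/* Bump allocation; there is no per-object free. */
void *region_alloc(Region *r, size_t n)
{
    const size_t align = _Alignof(max_align_t);
    size_t start = (r->used + align - 1) / align * align;  /* round up */

    if (start > REGION_SIZE || n > REGION_SIZE - start)
        return NULL;  /* region exhausted: the caller must handle this */
    r->used = start + n;
    return r->mem + start;
}

/* Release everything allocated in the region at once. */
void region_reset(Region *r) { r->used = 0; }
```

Allocation and release are both constant time, which also matters for the real-time concerns in these notes.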

decouple memory management from polymorphism; this seems to be a big
design problem in most programming languages.

dangers in real-time programming:
- priority inversion on locks
- fragmentation of memory: the program fails because there is not enough
  unfragmented space; a copying garbage collector might help, or
  compacting and rewriting pointers, but this is again a real-time
  issue if not done incrementally

how to decouple read-only and read-write parts of the statically
allocated memory?

Stack-only allocation if possible. This also means temporary structures
cannot be trees with pointers. This means a transpiler must emit code
(in our case C source code) while parsing, which might be challenging.

Even better are statically sized local buffers and global statically allocated
structures (e.g. a symbol table with at most 50 types). These limits have
to be adapted and the compiler recompiled when they are exceeded. But the
benefit is that you are not using any dynamic memory allocation, which can
go wrong in several ways.
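A statically allocated table in C shows the trade-off: a full table is a clear, reportable compile-time limit instead of a failing malloc. A minimal sketch using the limit of 50 types from the text; MAX_NAME, TypeEntry and the function names are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define MAX_TYPES 50
#define MAX_NAME  32

typedef struct {
    char name[MAX_NAME];
    int  size;               /* size in bytes of the type */
} TypeEntry;

static TypeEntry types[MAX_TYPES];
static int nofTypes = 0;

/* Returns the index of the new entry, or -1 if the table is full or the
 * name is too long; the compiler then reports an error and stops. */
int insertType(const char *name, int size)
{
    if (nofTypes >= MAX_TYPES || strlen(name) >= MAX_NAME)
        return -1;
    strcpy(types[nofTypes].name, name);
    types[nofTypes].size = size;
    return nofTypes++;
}

int findType(const char *name)
{
    for (int i = 0; i < nofTypes; i++)
        if (strcmp(types[i].name, name) == 0)
            return i;
    return -1;
}
```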

procedure/function declarations:

Pascalish:
procedure f : char;
procedure f;

C-ish:
procedure f( ) : char;
procedure f( );
=> matter of definition of ParameterList in ProcedureDeclaration

function getChar : char;
procedure getChar : char;
=> doesn't add anything to help parsing
=> pascal/oberon call them function procedures
=> had some weird discussions telling me functions are not procedures..

procedure f( var a : integer );
function sin( x : float ) : float;
=> function seems more mathematical, but otherwise an extra keyword gains us
   nothing: it doesn't detect anything we couldn't detect already

using/calling:

procedure call:
init;
proc( a, b );

function call in expression:
a+sin( b )
a+rand( )
=> even here it might feel more logical, and the syntax element '(' actually
helps us detect that it is a function call; on the other hand, we can get the
same information from the symbol table.
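A minimal C sketch of the symbol-table approach: the kind of the identifier decides whether it is a call, and '(' only serves as a consistency check. Accepting "rand" without parentheses is one possible design choice; the table contents and all names are hypothetical.

```c
#include <assert.h>
#include <string.h>

typedef enum { K_VARIABLE, K_FUNCTION, K_UNDECLARED } Kind;
typedef enum { USE_VARIABLE, USE_CALL, USE_ERROR } Use;

typedef struct { const char *name; Kind kind; } Symbol;

static const Symbol table[] = {
    { "a", K_VARIABLE },   { "b", K_VARIABLE },
    { "sin", K_FUNCTION }, { "rand", K_FUNCTION },
};

static Kind lookup(const char *name)
{
    for (size_t i = 0; i < sizeof(table) / sizeof(table[0]); i++)
        if (strcmp(table[i].name, name) == 0)
            return table[i].kind;
    return K_UNDECLARED;
}

/* followedByParen: whether the next token is '(' */
Use classify(const char *ident, int followedByParen)
{
    switch (lookup(ident)) {
    case K_VARIABLE:   return followedByParen ? USE_ERROR : USE_VARIABLE;
    case K_FUNCTION:   return USE_CALL;   /* "rand" and "rand( )" alike */
    case K_UNDECLARED: break;
    }
    return USE_ERROR;
}
```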

enums:

Oberon has none.
You can always use constants or sets, but then the switch statement cannot
be protected against wrong use of constants. C and Java went the way from
constants to proper enums.
=> subtyping problem: to be defined in a sane way, extending enums means
removing states. But removing states from an enum hardly makes code relying
on more states behave in a consistent way.
=> subtype explosion; enums are just a fancy way of defining integer constants.
The only practical applications I have are avoiding implicit type coercion to
ints and handling the ranges in a state-machine switch.
=> enums used in array subscripts lead to the sub-range problem of pascal/edison
   unless I force enums to always start from 0,1,2,... as internal representation
=> OOP has no need for enums, as I can discriminate and extend a basic type,
e.g. KEYWORD extended to KEYWORD_MODULE, KEYWORD_IF, etc.
=> enum constants have no const value, so they cannot be used to define an
   array (or at least, this needs a special cast again)
compared to functional languages the C version of enums is quite limited,
see tagged unions (for instance in Rust).
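For comparison, the closest C equivalent of a Rust enum with data is a hand-written tagged union: the enum tag says which union member is valid, but C does not enforce the pairing. Tags numbered 0,1,2,... can also index arrays, as in the kindName table. A minimal sketch; the token kinds and names are hypothetical.

```c
#include <assert.h>

typedef enum { TOK_NUMBER, TOK_IDENT, TOK_CHAR, TOK_COUNT } TokenKind;

static const char *const kindName[TOK_COUNT] = { "number", "ident", "char" };

typedef struct {
    TokenKind kind;
    union {
        int  number;    /* valid if kind == TOK_NUMBER */
        char ident[16]; /* valid if kind == TOK_IDENT */
        char ch;        /* valid if kind == TOK_CHAR */
    } u;
} Token;

/* Numeric value of a token, or -1 if it has none. Listing every tag and
 * omitting 'default' lets gcc/clang -Wswitch report a forgotten kind. */
int tokenValue(const Token *t)
{
    switch (t->kind) {
    case TOK_NUMBER: return t->u.number;
    case TOK_CHAR:   return (unsigned char)t->u.ch;
    case TOK_IDENT:
    case TOK_COUNT:  break;
    }
    return -1;
}
```

In Rust the compiler rejects reading the wrong variant and checks match exhaustiveness; in C, -Wswitch covers only the second part.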

underscores:
started when trying to add an S_module constant, so de facto a workaround for
a missing namespace/module called 'Scanner' with constant 'module'. Do we
forbid _ altogether, as they are a sign of bad modularization or namespace
emulation? On the other hand we will have longer identifiers, so _ is needed
to separate words.

AST:
https://stackoverflow.com/questions/21150454/representing-an-abstract-syntax-tree-in-c

design
Scanner class or struct vs. OPS module containing all variables. all modules
in the Oberon compiler act as singletons.
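The two designs side by side in C, as a minimal sketch with hypothetical names: module-level static state gives exactly one scanner (a singleton, as in the Oberon compiler), while an explicit struct lets several scanners coexist, e.g. one per included file.

```c
#include <assert.h>

/* (1) Oberon-style module: state in file-static variables. */
static const char *src;
static int pos;

void Scanner_Init(const char *text) { src = text; pos = 0; }
int  Scanner_Next(void) { return src[pos] ? (unsigned char)src[pos++] : -1; }

/* (2) Struct-based: the state is passed explicitly. */
typedef struct {
    const char *src;
    int pos;
} Scanner;

void scanner_init(Scanner *s, const char *text) { s->src = text; s->pos = 0; }

int scanner_next(Scanner *s)
{
    return s->src[s->pos] ? (unsigned char)s->src[s->pos++] : -1;
}
```

The Scanner_ prefix in variant (1) is exactly the kind of namespace emulation via underscores discussed in the text.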

nested procedures

intermediate formats can be in memory or on external media. The latter is
the older design, from when memory was scarce. It also avoids using
complicated data structures in memory.

symbols

all allocation on stack might not be the best idea..
..we allocate all symbols with their type, we must split symbols and
types. Variables point to their defined type, parameters in a function
to local variables (or variables of the upper scope in case of a VAR
parameter), names in a record type definition to symbols again, which
point to types.
This also means we usually get a global namespace this way, so a variable
'a', a type 'a' and a procedure 'a' cannot co-exist.
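A minimal C sketch of the split: a symbol has a name, a kind and a pointer to its type, and record fields are symbols again, linked from their record type. Because all declarations share one table, a duplicate name is rejected whatever its kind, which gives the single global namespace described in the text. All names and MAX_SYMBOLS are hypothetical.

```c
#include <assert.h>
#include <string.h>

typedef enum { S_TYPE, S_VARIABLE, S_PROCEDURE } SymbolKind;
typedef enum { T_INTEGER, T_CHAR, T_RECORD } TypeForm;

struct Symbol;

typedef struct Type {
    TypeForm form;
    struct Symbol *fields;   /* record fields: a list of symbols */
} Type;

typedef struct Symbol {
    const char *name;
    SymbolKind kind;
    Type *type;
    struct Symbol *next;     /* next field in a record */
} Symbol;

#define MAX_SYMBOLS 64
static Symbol symbols[MAX_SYMBOLS];
static int nofSymbols = 0;

/* Returns NULL if the name exists already (of any kind) or the table is full. */
Symbol *declare(const char *name, SymbolKind kind, Type *type)
{
    for (int i = 0; i < nofSymbols; i++)
        if (strcmp(symbols[i].name, name) == 0)
            return NULL;
    if (nofSymbols >= MAX_SYMBOLS)
        return NULL;
    symbols[nofSymbols] = (Symbol){ name, kind, type, NULL };
    return &symbols[nofSymbols++];
}
```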

procedure types

type
	Func = procedure( x : integer, y : integer ) : char;
var
	f : Func;
begin
	f := nil;

this is a pointer to a function, so would 'pointer to procedure' be better, as
in a pointer to a procedure implementing that interface? Also, the 'x'
and 'y' don't really have a semantic meaning in the type declaration,
as I can also assign a matching function 'f2':

procedure f2( a : integer, b : integer ) : char;
begin
	...
end

f := f2;

calling nil should most likely end in an exception, but this means we have to
check a special condition on every call through a procedure variable.
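In C, which e2c emits, the procedure type maps naturally to a function pointer typedef. Parameter names are not part of a C function type either, so f2 can be assigned to f. The nil check is then one comparison before each indirect call. A minimal sketch: callFunc and its abort-on-nil policy are assumptions standing in for a real exception mechanism, and f2's body is invented.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* type Func = procedure( x : integer, y : integer ) : char; */
typedef char (*Func)(int x, int y);

/* Any function with a matching signature fits, whatever its parameter names. */
char f2(int a, int b) { return (char)('a' + (a + b) % 26); }

/* Every call through a procedure variable tests for nil first. */
static char callFunc(Func f, int x, int y)
{
    if (f == NULL) {
        fprintf(stderr, "runtime error: call of nil procedure\n");
        abort();
    }
    return f(x, y);
}
```

Direct calls of declared procedures need no such check; only calls through procedure variables pay for it.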

e2c or C-transpiler questions

Symbol for scanner and Symbol for symbol table clash, we should use modules
and different names. And why are we insisting on having no preprocessor
and only one e2c.c?

set/bitset/enum

set: mutually exclusive
bitset: usable as switches, flags
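The distinction in C terms, as a minimal sketch: mutually exclusive states fit an enum (exactly one value at a time), while independent switches fit a bitset where any combination is valid. incl and excl resemble Oberon's INCL/EXCL but take a bit mask rather than an element number; all names are hypothetical.

```c
#include <assert.h>

/* mutually exclusive: exactly one state at a time */
typedef enum { STATE_IDLE, STATE_RUNNING, STATE_STOPPED } State;

/* switches/flags: one bit each, any combination allowed */
typedef unsigned int Flags;
#define FLAG_VERBOSE (1u << 0)
#define FLAG_DEBUG   (1u << 1)
#define FLAG_TRACE   (1u << 2)

static Flags incl(Flags s, Flags f) { return s | f; }
static Flags excl(Flags s, Flags f) { return s & ~f; }
static int   in(Flags s, Flags f)   { return (s & f) != 0; }
```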

have a look at System/360 and how the grandfathers and fathers did it:
XPL/XCOM: strings of variable size with garbage collection

links
-----

https://hackernoon.com/considerations-for-programming-language-design-a-rebuttal-5fb7ef2fd4ba
https://en.wikibooks.org/wiki/Oberon/A2/Oberon.Strings.Mod
https://en.wikipedia.org/wiki/Tombstone_diagram