author: Andreas Baumann <mail@andreasbaumann.cc>, 2022-05-13 10:57:30 +0200
committer: Andreas Baumann <mail@andreasbaumann.cc>, 2022-05-13 10:57:30 +0200
commit: 79610f324861f32fff88ac34c97e6bc5a7f3bda8
tree: 0cb5aa7731fe4b948807c0083ae7a97f101ea951 /miniany/README.html
parent: 5a1bc4fc3c5990f7e71ccf5191bc33c5874dea04
fixed main documentation
Diffstat (limited to 'miniany/README.html')
-rw-r--r-- miniany/README.html | 58
1 files changed, 33 insertions, 25 deletions
diff --git a/miniany/README.html b/miniany/README.html
index a736885..ab8a828 100644
--- a/miniany/README.html
+++ b/miniany/README.html
@@ -73,8 +73,8 @@ cat c4.c4 EOF c4.c EOF cc.c EOF hello.c | ./c4
<li>required features from the operating system: for instance the requirement for a POSIX layer</li>
<li>ugliness of the resulting code: it doesn't make sense to omit features which are somewhat hard to implement but make the code much more readable: the best examples are <i>switch/case </i>vs. <i>if/else </i>and the funny array indexing vs. structures in C4.</li>
</ul>
-<p>So we collect some ideas here about features we add or do not add and why. We also collect here what their implications are, when we are implementing them.</p>
-<p>We also have to be careful what C4 can do for us and either add it there (but only if small enough) in order no to loose this test case.</p>
+<p>So we collect some ideas here about features we add or do not add and why. We also collect here the implications when we are implementing them.</p>
+<p>We also have to be careful what C4 can do for us and either add it there (but only if small enough) in order not to lose this test case.</p>
<h3>Preprocessor for modularisation</h3>
<p>Implementation status: no</p>
<p>Reasoning:</p>
@@ -90,6 +90,7 @@ cat c4.c4 EOF c4.c EOF cc.c EOF hello.c | ./c4
<ul>
<li>we could have sort of a special tape filesystem with a directory at the beginning and offset to allow the preprocessor to find the files for the early destination OS</li>
<li>there is no reason not to use the preprocessor on the host</li>
+<li>We can end up in nasty chicken-and-egg problems with functions needing each other for different flavours of implementation (e. g. <i>atoi </i>in hosted and freestanding), so we might end up duplicating source code.</li>
</ul>
<h3>Preprocessor for conditional compilation</h3>
<p>Implementation status: no</p>
@@ -144,8 +145,8 @@ cat c4.c4 EOF c4.c EOF cc.c EOF hello.c | ./c4
<p>Some general notes:</p>
<p>GNU inline asm statement has become the de-facto standard (which is too complicated IMHO): I would require sort of a <i>.byte </i><i>0xXX</i> instruction only, for readability maybe a simple fasm-like syntax. We must be careful that our invention of an inline assembler can be mapped somehow to the GNU inline asm version, so that we can use that one on the host with gcc/clang/tcc/pcc.</p>
<p>c.c in swieros (the c4 successor) has <i>asm(NOP)</i>, this is something we could implement easily. u.h contains an enum with opcodes (most likely doable for an easy architecture like the one in swieros, I doubt this works for Intel opcodes, but we should check if it works for our simplified Intel opcode subset).</p>
-<p>There should though be only one single point of information for opcodes per architecture, so asm gets sort of an inline string generator for the assembly output. Or we share a common C-file with enums for the opcodes and cat it to both the assembler and the compiler during the build (should not result in increaed code size, as those are enums).</p>
-<p>The asm(x) or asm(x,y) constructs can be mapped on the host compilers to asm __volatile__ .byte ugliness. In cc and c4 we can take the swieros approach. This should give us nice lowlevel inline assembly in a really simplified way (basically embedding bytes).</p>
+<p>There should though be only one single point of information for opcodes per architecture, so <i>asm </i>gets sort of an inline string generator for the assembly output. Or we share a common C-file with enums for the opcodes and cat it to both the assembler and the compiler during the build (should not result in increased code size, as those are enums).</p>
+<p>The <i>asm(x) </i>or <i>asm(x,y) </i>constructs can be mapped on the host compilers to <i>asm </i><i>__volatile__ </i><i>.byte </i>ugliness. In cc and c4 we can take the swieros approach. This should give us nice low-level inline assembly in a really simplified way (basically embedding bytes).</p>
<p>Not having inline assembly means you need compilation units written and linked to the program in assembly, which - well - adds a linker and calling conventions, which might be too early in bootstrapping.</p>
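<p>A minimal sketch of how <i>asm(x) </i>might be mapped onto GNU inline asm on the host, assuming an x86 host and a GNU-compatible compiler. The names <i>OP_NOP</i>, <i>ASM</i> and <i>run_nop</i> are made up for illustration. The opcode is a macro rather than an enum here, because the preprocessor cannot turn an enum value into the string that <i>.byte </i>needs.</p>

```c
/* Hypothetical opcode definition; a real table would live in one shared file. */
#define OP_NOP 0x90

/* Two-level stringification so that OP_NOP is expanded to 0x90 first. */
#define STR2(x) #x
#define STR(x) STR2(x)

#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
/* Host mapping (assumption): embed one raw byte via GNU inline asm. */
#define ASM(x) __asm__ __volatile__(".byte " STR(x))
#else
/* Other hosts: compile to nothing so the sketch still builds. */
#define ASM(x) ((void)0)
#endif

/* Executes a single NOP on x86 hosts and returns 1 to show control came back. */
int run_nop(void)
{
    ASM(OP_NOP);
    return 1;
}
```

<p>On the host, <i>run_nop() </i>simply returns 1. In cc and c4 the same <i>ASM(OP_NOP) </i>line could be handled natively in the swieros style.</p>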
<h3>Object formats and linkers</h3>
<p>Implementation status: no</p>
@@ -158,7 +159,7 @@ cat c4.c4 EOF c4.c EOF cc.c EOF hello.c | ./c4
<li>make compilation units from a bunch of source code, this results in bigger binaries, as we cannot share too much. This might be acceptable for early bootstrapping. Later on it would be nice to add a linker as an optional addon (which can be used outside of bootstrapping).</li>
</ul>
<h3>Forward declarations of function prototypes</h3>
-<p>Implementation status: yes (<b>TODO</b>)</p>
+<p>Implementation status: yes</p>
<p>Reasoning:</p>
<ul>
<li>Recursive descent parsing requires forward function declarations.</li>
@@ -183,7 +184,7 @@ cat c4.c4 EOF c4.c EOF cc.c EOF hello.c | ./c4
<p>Requirements</p>
<ul>
<li>on the hosted Linux environment we need syscalls to <i>syscall</i>, <i>int </i><i>0x80</i>, etc.</li>
-<li>we need inline assembly to create to syscalls</li>
+<li>we need inline assembly to create the syscalls</li>
</ul>
<p>Alternative:</p>
<ul>
@@ -193,16 +194,16 @@ cat c4.c4 EOF c4.c EOF cc.c EOF hello.c | ./c4
</ul>
<p>Counter arguments:</p>
<ul>
-<li>we deviate from the C standard, printf just belongs to C</li>
-<li>printf is actually not hard to implement in a type-unsafe-way</li>
-<li>syscalls are variable arguments</li>
+<li>we deviate from the C standard, <i>printf </i>just belongs to C</li>
+<li><i>printf </i>is actually not hard to implement in a type-unsafe-way</li>
+<li>syscalls have variable arguments</li>
</ul>
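<p>To illustrate the second counter argument, here is a rough sketch of a tiny <i>printf</i>-like formatter. It assumes the host's <i>stdarg.h </i>is available (the minimal compiler may not offer it), writes into a buffer instead of stdout, and handles only <i>%d</i>, <i>%s</i>, <i>%c </i>and <i>%%</i>. The name <i>mini_sprintf </i>is hypothetical. It is type-unsafe in the sense that it blindly trusts the format string.</p>

```c
#include <stdarg.h>
#include <stddef.h>

/* Append one character if there is room, always keeping space for '\0'. */
static size_t put(char *out, size_t cap, size_t n, char c)
{
    if (n + 1 < cap) out[n++] = c;
    return n;
}

/* Hypothetical minimal formatter: returns the number of characters written. */
int mini_sprintf(char *out, size_t cap, const char *fmt, ...)
{
    va_list ap;
    size_t n = 0;

    if (cap == 0) return 0;
    va_start(ap, fmt);
    for (; *fmt; fmt++) {
        if (*fmt != '%') { n = put(out, cap, n, *fmt); continue; }
        fmt++;
        if (*fmt == '\0') break;            /* lone '%' at the end */
        if (*fmt == 'd') {
            int v = va_arg(ap, int);
            /* negate in unsigned arithmetic so INT_MIN does not overflow */
            unsigned u = v < 0 ? 0u - (unsigned)v : (unsigned)v;
            char digits[12];
            int i = 0;
            if (v < 0) n = put(out, cap, n, '-');
            do { digits[i++] = (char)('0' + u % 10); u /= 10; } while (u);
            while (i > 0) n = put(out, cap, n, digits[--i]);
        } else if (*fmt == 's') {
            const char *s = va_arg(ap, const char *);
            while (*s) n = put(out, cap, n, *s++);
        } else if (*fmt == 'c') {
            n = put(out, cap, n, (char)va_arg(ap, int));
        } else {
            n = put(out, cap, n, *fmt);     /* "%%" and unknown specifiers */
        }
    }
    va_end(ap);
    out[n] = '\0';
    return (int)n;
}
```

<p>Nothing checks that the arguments match the format string, which is exactly the trade-off the counter argument accepts.</p>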
<h3>FILE* and stderr</h3>
<p>Implementation status: no</p>
<p>Reasoning:</p>
<ul>
<li>requires FILE * structures, requires various write channels from the operating system</li>
-<li>we can write error messages into the output stream as comment <i>; </i><i>ERROR </i><i>in </i><i>line </i><i>32, </i><i>pos </i><i>2: </i><i>generic </i><i>error </i>(<b>TODO</b>)</li>
+<li>we can write error messages into the output stream as comments like <i>; </i><i>ERROR </i><i>in </i><i>line </i><i>32, </i><i>pos </i><i>2: </i><i>generic </i><i>error </i>(<b>TODO</b>)</li>
</ul>
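<p>The error-as-comment idea could look roughly like this. The helper name <i>format_error </i>is made up, and the sketch uses the host's <i>snprintf </i>for brevity, which the bootstrapped compiler itself would have to replace.</p>

```c
#include <stdio.h>

/* Hypothetical helper: format an error as an assembler comment line. */
int format_error(char *out, size_t cap, int line, int pos, const char *msg)
{
    return snprintf(out, cap, "; ERROR in line %d, pos %d: %s\n", line, pos, msg);
}
```

<p>Because the line starts with <i>;</i>, an assembler reading the output stream would skip it, while a human or a test script can still grep for <i>ERROR</i>.</p>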
<p>Counter arguments:</p>
<ul>
@@ -213,7 +214,7 @@ cat c4.c4 EOF c4.c EOF cc.c EOF hello.c | ./c4
<p>Implementation status: no</p>
<p>Reasoning:</p>
<ul>
-<li>typedefs are syntactic sugar for <i>typedef </i><i>struct </i><i>T </i>as T, not strictly necessary and they don't make our code look too ugly.</li>
+<li>typedefs are syntactic sugar for <i>typedef </i><i>struct </i><i>T </i>as <i>T</i>, not strictly necessary and they don't make our code look too ugly.</li>
</ul>
<p>Counter arguments:</p>
<ul>
@@ -223,12 +224,12 @@ cat c4.c4 EOF c4.c EOF cc.c EOF hello.c | ./c4
<p>Implementation status: no</p>
<p>Reasoning:</p>
<ul>
-<li>unless we start optimizing (SIMD) there is no real benefit, for a generic 'for', a strict for i=0 to N, i++ is easier to optimize, when you have a grammatical construct to help recognizing it.</li>
+<li>unless we start optimizing (SIMD) there is no real benefit for a generic 'for', a strict for i=0 to N, i++ is easier to optimize, when you have a grammatical construct to help recognizing it.</li>
<li>the C for loop has a funny ending semicolon issue and you can add arbitrary code blocks into the three parts of the for, this is way too generic for a minimalistic language</li>
</ul>
<p>Counter arguments:</p>
<ul>
-<li>for loops are not hard to implement</li>
+<li>for-loops are not hard to implement</li>
</ul>
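<p>As a small illustration of why a <i>for </i>is not strictly necessary, the two hypothetical functions below compute the same sum, once with <i>for </i>and once with the equivalent <i>while</i>. The rewrite is only this mechanical as long as the body contains no <i>continue</i>.</p>

```c
/* Sum of 0..n-1 written with a for loop. */
int sum_for(int n)
{
    int i, s = 0;
    for (i = 0; i < n; i++)
        s += i;
    return s;
}

/* The same loop with init, condition and increment spelled out. */
int sum_while(int n)
{
    int i, s = 0;
    i = 0;              /* init part */
    while (i < n) {     /* condition part */
        s += i;
        i++;            /* increment part, moved to the end of the body */
    }
    return s;
}
```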
<h3>Passing arguments to main</h3>
<p>Implementation status: yes</p>
@@ -240,18 +241,23 @@ cat c4.c4 EOF c4.c EOF cc.c EOF hello.c | ./c4
<ul>
<li>Do really everything over the input stream, but this would feel a little bit too mainframe-ish.</li>
</ul>
-<h3>bool</h3>
+<h3>Boolean Type</h3>
<p>Implementation status: no</p>
<p>Reasoning:</p>
<ul>
-<li>C89 has no boolean</li>
+<li>C89 has no <i>bool </i>type</li>
<li>useful, but not strictly necessary, we can live with <i>int </i>holding a boolean value</li>
</ul>
+<p>Counter arguments:</p>
+<ul>
+<li>boolean and integer values and variables form a nice little type system for expressions so introducing booleans might have some educational value.</li>
+</ul>
<h3>Union</h3>
<p>Implementation status: no</p>
<p>Reasoning:</p>
<ul>
<li>in the compiler there is little benefit of compressing parts of e. g. ASTs into unions</li>
+<li>unions allow accessing the same memory via different means, this can also be achieved with code accessing the memory differently and doing some byte/word conversions for instance.</li>
</ul>
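<p>A sketch of the byte/word conversion mentioned above, as a replacement for a union of a word and four bytes. The function names are made up and <i>uint32_t </i>is taken from the host's <i>stdint.h</i>, which a minimal compiler would have to substitute with its own integer type.</p>

```c
#include <stdint.h>

/* Store a 32-bit word into four bytes, least significant byte first. */
void store_le32(unsigned char *p, uint32_t w)
{
    p[0] = (unsigned char)(w & 0xFF);
    p[1] = (unsigned char)((w >> 8) & 0xFF);
    p[2] = (unsigned char)((w >> 16) & 0xFF);
    p[3] = (unsigned char)((w >> 24) & 0xFF);
}

/* Read the same four bytes back as one 32-bit word. */
uint32_t load_le32(const unsigned char *p)
{
    return (uint32_t)p[0]
         | ((uint32_t)p[1] << 8)
         | ((uint32_t)p[2] << 16)
         | ((uint32_t)p[3] << 24);
}
```

<p>Unlike a union, this fixes the byte order in the code, so the result does not depend on the endianness of the machine.</p>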
<p>Counter arguments:</p>
<ul>
@@ -273,7 +279,7 @@ cat c4.c4 EOF c4.c EOF cc.c EOF hello.c | ./c4
<p>Implementation status: yes, but..</p>
<p>Reasoning:</p>
<ul>
-<li>There are good reasons not to allow <i>return </i>everywhere in the code, see newest Oberon revisions allowing <i>RETURN </i>only at the end of the function declaration. There are benefits like easier detection whether the function returns a function, easier flow analysis (image <i>returns </i>in complicated <i>if-else</i>-statements). For now we allow it everywhere, but we should try hard not to use it in the middle of code blocks.</li>
+<li>There are good reasons for not allowing <i>return </i>everywhere in the code, see newest Oberon revisions allowing <i>RETURN </i>only at the end of the function declaration. There are benefits like easier detection whether the function returns a value, easier flow analysis (imagine <i>returns </i>in complicated <i>if-else</i>-statements). For now we allow it everywhere, but we should try hard not to use it in the middle of code blocks.</li>
<li>There is an argument from the code correctness point of view as <i>return </i>in the middle of code makes the code hard to reason about (similar to too many if-else-statements)</li>
<li>Allowing <i>return </i>only at the end of a function and nowhere else makes tail-recursion easy.</li>
<li>Error handling is really hard when <i>return </i>appears everywhere in the body, it's much easier to check whether there is a <i>return </i>at the end and whether the returned type matches the prototype of the function,</li>
@@ -286,7 +292,7 @@ cat c4.c4 EOF c4.c EOF cc.c EOF hello.c | ./c4
</ul>
<p>Counter arguments:</p>
<ul>
-<li>Stack machine with the top of stack in EAX is also quite a simple solution.</li>
+<li>Stack machine with the top of stack in EAX is also quite a simple and efficient solution (see <i>Write </i><i>Your </i><i>Own </i><i>Compiler, </i>Holm).</li>
</ul>
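<p>The stack-machine scheme from the counter argument can be sketched as follows. This is an assumption about how such a generator might look, not the code in cc: the node layout, the use of <i>ebx </i>as scratch register and the fasm-like mnemonics are all chosen for illustration. The caller must pass an empty string in <i>out</i>.</p>

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical expression node: op 'n' is a number, else '+', '-' or '*'. */
struct Expr {
    char op;
    int value;
    const struct Expr *left, *right;
};

/* Append one line of assembly text to the output buffer. */
static void emit(char *out, size_t cap, const char *line)
{
    size_t used = strlen(out);
    if (used < cap) snprintf(out + used, cap - used, "%s\n", line);
}

/* Generate code that leaves the value of e in eax (the top of stack). */
void gen(const struct Expr *e, char *out, size_t cap)
{
    char line[32];

    if (e->op == 'n') {
        snprintf(line, sizeof line, "mov eax, %d", e->value);
        emit(out, cap, line);
        return;
    }
    gen(e->left, out, cap);          /* left operand now in eax */
    emit(out, cap, "push eax");      /* spill it to the machine stack */
    gen(e->right, out, cap);         /* right operand now in eax */
    emit(out, cap, "mov ebx, eax");
    emit(out, cap, "pop eax");       /* left operand back in eax */
    if (e->op == '+')      emit(out, cap, "add eax, ebx");
    else if (e->op == '-') emit(out, cap, "sub eax, ebx");
    else                   emit(out, cap, "imul eax, ebx");
}
```

<p>No register allocator is involved: every intermediate result either sits in <i>eax </i>or on the stack.</p>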
<h3>Abstract Syntax Trees</h3>
<p>Implementation status: yes</p>
@@ -294,11 +300,11 @@ cat c4.c4 EOF c4.c EOF cc.c EOF hello.c | ./c4
<ul>
<li>Delaying code generation is essential when doing an assignment (rvalue must be evaluated before lvalue), in const folding (do not generate code as you don't know the expression is a constant but instead just compute the current value of the constant expression).</li>
<li>Separate semantic operation <i>array </i><i>index </i><i>evaluation </i>from <i>definition </i><i>of </i><i>the </i><i>size </i><i>of </i><i>an </i><i>array </i>for instance with the '[' character (we do not want to react on the scanner symbol directly).</li>
-<li>When evaluating a boolean expression we don't know yet it's context (can be in a <i>if/while </i>condition, in which case we would geneate a conditional jump or whether in an assignment where we would store the value to the boolean variable).</li>
+<li>When evaluating a boolean expression we don't know yet its context (can be in an <i>if/while </i>condition, in which case we would generate a conditional jump or it can be in an <i>assignment</i>, in which case we would store the value into a boolean variable).</li>
</ul>
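<p>The constant folding case above might be sketched like this on a small tree. The node layout and the name <i>fold </i>are assumptions for illustration, and the sketch ignores integer overflow.</p>

```c
/* Hypothetical AST node: kind 'n' is a number literal, 'v' a variable,
 * anything else a binary operator ('+', '-' or '*'). */
struct Node {
    char kind;
    int value;
    struct Node *left, *right;
};

/* Fold constant subtrees in place; return 1 if n is now a literal. */
int fold(struct Node *n)
{
    int l, r;

    if (n->kind == 'n') return 1;
    if (n->kind == 'v') return 0;

    /* fold both sides first, even if only one turns out constant */
    l = fold(n->left);
    r = fold(n->right);
    if (!(l && r)) return 0;

    if (n->kind == '+')      n->value = n->left->value + n->right->value;
    else if (n->kind == '-') n->value = n->left->value - n->right->value;
    else if (n->kind == '*') n->value = n->left->value * n->right->value;
    else return 0;           /* unknown operator: leave the tree alone */

    /* the operator node becomes a literal, no code was generated for it */
    n->kind = 'n';
    n->left = n->right = 0;
    return 1;
}
```

<p>Code generation then runs on the folded tree, so <i>(2+3)*4 </i>results in a single constant load instead of two arithmetic instructions.</p>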
<p>Caveats:</p>
<ul>
-<li>Try to keep the scope of an AST as small as possible and as big as necessary (the output of the parser should not be the complete source code). Minimal is an expression and some context, maybe maximal is the scope of a function.</li>
+<li>Try to keep the scope of an AST as small as possible and as big as necessary (the output of the parser should not be the complete source code). The minimum we need is an expression and some context, the maximum maybe is the scope of a function.</li>
</ul>
<p>Counter arguments:</p>
<ul>
@@ -308,11 +314,11 @@ cat c4.c4 EOF c4.c EOF cc.c EOF hello.c | ./c4
<p>Implementation status: yes</p>
<p>Reasoning:</p>
<ul>
-<li>Adding <i>putint/getchar </i>style of functions as elements of the language is tempting, as it allows early debugging and testing (as in PASCAL). The fundemental conflict here is betweet Bootstrapping is better with stdout and stdin in the language (no function calls, no linker, etc. needed). But later on we want those functions be part of a language library and not of the langyage itself.</li>
+<li>Adding <i>putint/getchar </i>style of functions as elements of the language is tempting, as it allows early debugging and testing (as in PASCAL). The fundamental conflict here is that bootstrapping is better with stdout and stdin in the language (no function calls, no linker, etc. needed). But later on we want those functions to be part of a language library and not of the language itself.</li>
</ul>
<p>Caveats:</p>
<ul>
-<li>Avoid code duplication (inline assembly in the compiler for the keyword implementation and with intline assembly in the language library). (<b>TODO</b>)</li>
+<li>Avoid code duplication (inline assembly in the compiler for the keyword implementation and with inline assembly in the language library). (<b>TODO</b>)</li>
</ul>
<h2>References</h2>
<p>Compiler construction in general:</p>
@@ -335,14 +341,16 @@ cat c4.c4 EOF c4.c EOF cc.c EOF hello.c | ./c4
</ul>
<p>Other minimal compilers and systems:</p>
<ul>
-<li><a href="http://selfie.cs.uni-salzburg.at/">http://selfie.cs.uni-salzburg.at/</a>: C* self-hosting C compiler (also emulator, hypervisor) for RISCV, inspiration for what makes up a minimal C language</li>
-<li><a href="http://www.iro.umontreal.ca/%7Efelipe/IFT2030-Automne2002/Complements/tinyc.c">http://www.iro.umontreal.ca/~felipe/IFT2030-Automne2002/Complements/tinyc.c</a>, Marc Feeley, really easy and much more readable, meant as educational compiler</li>
-<li><a href="https://github.com/rswier/swieros.git">https://github.com/rswier/swieros.git</a>: c.c in swieros, Robert Swierczek</li>
+<li><a href="http://selfie.cs.uni-salzburg.at/">http://selfie.cs.uni-salzburg.at/</a>: <i>Christoph </i><i>Kirsch</i>, C* self-hosting C compiler (also emulator, hypervisor) for RISCV, inspiration for what makes up a minimal C language</li>
+<li><a href="http://www.iro.umontreal.ca/%7Efelipe/IFT2030-Automne2002/Complements/tinyc.c">http://www.iro.umontreal.ca/~felipe/IFT2030-Automne2002/Complements/tinyc.c</a>, <i>Marc </i><i>Feeley</i>, really easy and much more readable, meant as educational compiler</li>
+<li><a href="https://github.com/rswier/swieros.git">https://github.com/rswier/swieros.git</a>: <i>Robert </i><i>Swierczek</i>, c.c in swieros</li>
+<li><a href="https://github.com/ras52/boostrap">https://github.com/ras52/boostrap</a>: <i>Richard </i><i>Smith</i>, bootstrapping experiment</li>
+<li><a href="http://t3x.org/t3x">http://t3x.org/t3x</a>: <i>Nils </i><i>M </i><i>Holm</i>, the T3X programming language, especially the bootstrapping version T3X9</li>
</ul>
<p>Assembly:</p>
<ul>
<li><a href="https://github.com/felipensp/assembly/blob/master/x86/itoa.s">https://github.com/felipensp/assembly/blob/master/x86/itoa.s</a>, for putint (early debugging keyword)</li>
-<li><a href="https://baptiste-wicht.com/posts/2011/11/print-strings-integers-intel-assembly.htm">https://baptiste-wicht.com/posts/2011/11/print-strings-integers-intel-assembly.htm</a> (earldy debugging keyword)</li>
+<li><a href="https://baptiste-wicht.com/posts/2011/11/print-strings-integers-intel-assembly.htm">https://baptiste-wicht.com/posts/2011/11/print-strings-integers-intel-assembly.htm</a> (early debugging keyword)</li>
</ul>
<p>Documentation:</p>
<ul>