diff options
Diffstat (limited to 'miniany/doc/www.andreadrian.de_tbng.txt')
-rw-r--r-- | miniany/doc/www.andreadrian.de_tbng.txt | 914 |
1 files changed, 914 insertions, 0 deletions
diff --git a/miniany/doc/www.andreadrian.de_tbng.txt b/miniany/doc/www.andreadrian.de_tbng.txt new file mode 100644 index 0000000..b3f150c --- /dev/null +++ b/miniany/doc/www.andreadrian.de_tbng.txt @@ -0,0 +1,914 @@ + BASICO - Programming Language, Self-compiling compiler + + Author: Andre Adrian + Version: 22.apr.2008 + +Abstract + + BASICO + * is a small imperative programming language that is just powerful + enough to compile itself (compiler bootstrapping). + * has no GOTO, but has while-break-wend and multiple return + * has C-like string handling. + * is implemented in less then 1000 source code lines for the + compiler. + * produces real binary programs for x86 processors, not P-code or + Byte-Code. + * uses the C compiler toolchain (assembler, linker) + * uses C library functions like printf(), getchar(), strcpy(), + isdigit(), rand() for run-time support. + + Please check the section [1]BASICO source code reading to get a + detailed discussion of the source code! + +Contents + + * [2]Abstract + * [3]Contents + * [4]Introduction + * [5]BASICO syntax ideas + + [6]GOTO considered harmful + + [7]Statement separator + + [8]Statement list + + [9]Dangling else + + [10]Assignment and Equal + + [11]Evaluation of conditional expressions + + [12]Variable type declaration + + [13]Automatic type conversion + + [14]Function calls + + [15]Nested functions + + [16]Call-by-value and Call-by-reference + + [17]Named constants + + [18]Comment + + [19]Compiler error handling + * [20]BASICO syntax + * [21]BASICO example program + * [22]BASICO development strategy + * [23]Predictive Parser for Expression + * [24]Examples for Compiler bootstrapping + * [25]Symbol Table + * [26]Code Generator + * [27]Peephole optimization + * [28]BASICO sources + + [29]Version 0.9 + + [30]Version 0.8 + + [31]Version 0.7 + + [32]Version 0.6 + + [33]Version 0.5 + + [34]Version 0.4 + + [35]Version 0.3 + + [36]Version 0.2 + + [37]Version 0.1 + * [38]Books + +Introduction + + In January 2006 Dr. Dobb's journal celebrated the [39]30th anniversary + of Tiny BASIC. Tiny BASIC was specified by Dennis Allison and was + implemented by Dick Whipple and John Arnold. Tiny BASIC needed 3 + Kilobytes of memory and was sold for 5 US-dollars. The Microsoft BASIC + of this time needed 8 Kilobytes and 150 US-dollars. + The original Tiny BASIC interpreter is written in an intermediate + language (IL) to save memory. The CPU runs a virtual machine (VM) to + emulate the IL processor. The IL processor interprets the Tiny BASIC + program. Li-Chen Wang created a Tiny BASIC version without IL for the + Z80 8-bit CPU. + One drawback of this approach is that the IL can handle strings, + because it has to parse the BASIC keywords like PRINT, GOTO, but the + Tiny BASIC itself has no string handling commands. IL has small memory + requirement but also slow execution speed. + Dennis Ritchie used compiler bootstrapping for the C compiler, like + Niklaus Wirth did for the computer languages PASCAL, Modula and Oberon. + The compiler itself is written in the language the compiler can + translate. After an first iteration that needs external help, the + compiler of iteration N can compile the source code that creates + compiler of iteration N+1. + The project is pure fun. It is just a "what if" you ask 30 years later. + In 1976 the knowledge for BASICO was already present. The "[40]goto + considered harmful" paper of Edsger W. Dijkstra was written in 1968. + The first PASCAL compiler was written in 1971. The UNIX developers used + compiler bootstrapping in 1973. The compiler tools [41]lex (Lesk, 1975) + and [42]yacc (Johnson, 1974) were available. + The word BASICO can be read as "[43]Beginner's All-purpose Symbolic + Instruction Code zero" or as Basico, just something with an italian + sound like lambrusco. + Note: The project started with the name "Tiny BASIC no GOTO (TB-NG)", + but BASICO sounds so much better. Basico is the name of a [44]small + italian town (745 inhabitants). + +BASICO syntax ideas + + Today the programming language C is the standard. Other languages are + comments to C. PASCAL is C with nested procedures but without stdio + library. C++ is C overkill. Modula-2 is C without printf but with + import and export. Ada is PASCAL++ for the DoD. JAVA is C++ with + garbage collection but byte-code. Tcl/Tk is no C at all, but a language + with minimum syntax. Last but not least BASIC is still not dead, but + has included a lot of syntax from other languages. + + GOTO considered harmful + + Today everybody agrees on the "one coding block shall have one entry + and one exit" paradigma " to shorten the conceptual gap between the + static program and the dynamic process, to make the correspondence + between the program (spread out in text space) and the process (spread + out in time) as trivial as possible". But in the same paper Dijkstra + also wrote "I remember having read the explicit recommendation to + restrict the use of the go to statement to alarm exits". + Just deleting the goto statement and not introduce some other syntax + for "alarm exits" does make programs less structured. Some alarm exits + are + * conditional break in a loop + * multiple return in a function + * conditional return stack un-rolling (try/throw/catch or + setjmp/longjmp) + + The traditional structured control flow for the "alarm exit" problem + needs additional state variables or special function return-values that + remember the alarm-status. Specially for return stack un-rolling the + dynamic control flow is hard to see out of the static program listing. + + Statement separator + + The original BASIC of 1964 used linefeed as statement terminator. + PASCAL uses ; as statement separator, C uses ; as statement terminator. + Beginners often forget this silent syntax element. BASIC later adopted + the : as an additional statement separator. BASICO avoids silent syntax + elements, therefore both linefeed and ; are used as statement + terminators. With linefeed as statement terminator a long statement can + not be split in two source code lines. Such a statement needs a + continuation symbol to undo the statement terminator effect of the + linefeed. In Tcl/Tk the \ as last character before linefeed works as + continuation symbol. BASICO follows Tcl/Tk syntax. + + Statement list + + PASCAL encloses statements between the keywords BEGIN and END. C uses { + and } for the same purpose. These statement list keywords give trouble + to the beginner. First, they are silent syntax elements, second the + beginner has problems to see the difference between statement and + statement list. + The statement list keywords are not needed if the control flow keywords + include them. One example is the WHILE condition DO statement-list WEND + syntax. Another is the IF condition THEN statement-list ENDIF + construct. The first statement-list is enclosed by DO and WEND, the + second statement-list with THEN and ENDIF. + + Dangling else + + Dangling else is solved with the IF condition THEN statement-list ELSE + statement-list ENDIF construct. With ENDIF there is no more dangling + else shift/reduce conflict. + + Assignment and Equal + + BASIC uses = both for assigment and for test on equivalence, as it is + mathematical tradition. Pascal uses := and =, C uses = and ==. As long + as multiple assignment like a = b = c is not needed, the both meanings + of = do not produce a conflict. The BASICO meaning of a = b = c is a + assign (b equal c), that is a becomes 1 if b is equal to c and 0 else. + The author's all time favorite in simple C bugs, the nasty if (a = b) + instead of if (a == b) is gone with BASICO. + + Evaluation of conditional expressions + + The symbols '<', '>=', .. are defined as conditional operators. The + expression a >= b can result in 1 for true or 0 for false. There is no + short circuit evaluation for conditional expressions. This is fine for + today's CPUs that do not like jumps very much because of the execution + queue flushing. + + Variable type declaration + + Pascal and C have strict variable types. Without declaration no + variable can be used. BASIC needs a variable declaration only for array + variables. Scalar variables can be used directly. BASICO follows Pascal + and C. The variable types are integer, character, one-dimensional array + of integer and one dimensional array of character. + + Automatic type conversion + + All calculations are done with integer. Therefore the char variables + are promoted to integer. The BASICO char is an unsigned char. There are + no negative char values. + + Function calls + + The C library functions like printf() or strcpy() can be called out of + a BASICO program. The C call convention for the GNU C compiler is used. + The function arguments are pushed from right to left on the stack. All + arguments have the same size (32bit for GNU C on Intel x86). The return + value in placed into a CPU register (eax for GNU C on Intel x86). + + Nested functions + + BASICO does not support nested functions. I even think that nested + functions are a bad method for information hiding. + + Call-by-value and Call-by-reference + + Call-by-value is the method for int and char variables as function + arguments: A copy of the variable is forwarded to the function. With + call-by-value only input parameters are possible. Array variables are + forwarded with call-by-reference: The memory location of the start of + the array is given. This method allows arrays as input and output + parameters. + The return value of the function is one output parameter. To get more + output parameters, the trick in BASICO is to use arrays as function + arguments to hold these output parameters. + + Named constants + + Pascal has named constants with the const keyword. C uses the #define + preprocessor command for constants. BASICO has no named constants (for + now). + + Comment + + BASIC uses REM to start a one-line comment. PASCAL and C have + multi-line comments with { } or /* */. C++ re-introduced the one-line + comment with //. BASICO uses the C++ one-line comment. + + Compiler error handling + + The compiler stops at the first error. The source code line is shown up + to the point where the scanner, parser or code-generator detected the + bug and an error message is given. For the batch processing age + compiler gurus out there: one error is all you get from this compiler. + +BASICO syntax + + The following LL(1) context-free grammar in Extended Backus-Naur-Form + (EBNF) for BASICO is final. It is based on the [45]PASCAL syntax. The + syntax is free of shift/reduce conflicts. All variable declarations are + before variable use. A simple LL(1) predictive parser needs one pass to + translate the source. Syntax terminals are enclosed in ' ' or " " like + '<' and "if". Capital letter non-terminals come from the scanner + (CONST, IDENT, ..). The empty set is commented. The scanner translates + some linefeeds into ';'. This trick makes the syntax semicolon-free. + The EBNF was checked by [46]Coco/R a scanner and parser generator for + LL(1) grammars from Hanspeter Mössenböck. In EBNF, [ ] is 0 to 1 + repetition, { } is 0 to n repetion, ( ) is grouping alternatives. + basico = + [ "var" { globalDecl ';' } ] + { "func" IDENT '(' paramList ')' ':' returnType ';' blockOrForward + ';' } + . + globalDecl = + IDENT ':' ( + "int" + | "char" + | "array" CONST ( + "int" + | "char" + ) + ) . + paramList = + [ paramDecl { ',' paramDecl } ] + . + paramDecl = + IDENT ':' ( + "int" + | "char" + | "array" ( + "int" + | "char" + ) + ) . + + returnType = + "int" + | "char" + | "void" + . + + blockOrForward = + pblock3 + | "forward" + . + + pblock3 = + [ "var" { localDecl ';' } ] + "begin" stmtList "end" + . + localDecl = + IDENT ':' ( + "int" + | "char" + | "array" CONST ( + "int" + | "char" + ) + ) . + + stmtList = + { [ statement ] ';' } + . + + statement = + IDENT ( + [ '[' expr ']' ] '=' expr + | exprList + ) + | "if" expr "then" stmtList [ "else" stmtList ] "endif" + | "while" expr "do" stmtList "wend" + | "return" [ expr ] + | "break" + . + exprList = + '(' [ expr { ',' expr } ] ')' + . + + expr = + addexpr { '=' addexpr + | '#' addexpr + | '<' [ '=' ] addexpr + | '>' [ '=' ] addexpr + } . + + addexpr = + term { '+' term + | '-' term + | '|' term + | '^' term + } . + + term = + unaryfact { '*' unaryfact + | '/' unaryfact + | '%' unaryfact + | '&' unaryfact + } . + + unaryfact = + '-' fact + | '~' fact + | fact + . + + fact = + IDENT [ '[' expr ']' | exprList ] + | CONST + | CHRCONST + | STRCONST + | '(' expr ')' + . + +BASICO example program + + This is the old "Guess a number" game in BASICO. The example program + uses the C functions getchar(), printf(), time(), srand() and rand(). + The function main() is the entry point like in C. This is the first + BASICO game ever written. Please note the structured use of alarm exits + with "return decimal" in getDecimal() and "break" in main(). + // guess.bas + // Guess a number game + // Compile with BASICO version 0.9 + func getDecimal():int + var decimal: int + ch: char + begin + decimal = 0; + while 1 do + ch = getchar() + if (ch>='0')&(ch<='9') then + decimal = decimal * 10 + ch - '0' + else + return decimal + endif + wend + end + func main():void + var myNumber: int + yourNumber: int + guesses: int + ch: char + begin + printf("Guess a number game\n") + printf("Numbers are between 1 and 50\n") + srand(time(0)) // get time in seconds since 1970, init Random + Generator + while 1 do + myNumber = rand() // get a random number + myNumber = myNumber % 50 + 1 + guesses = 0 + while 1 do + printf("Your guess: ") + yourNumber = getDecimal() + guesses = guesses + 1 + if yourNumber = myNumber then + printf("You guessed it in %d guesses!\n", guesses) + break + else + if yourNumber > myNumber then + printf("Your guess is to high\n") + else + printf("Your guess is to low\n") + endif + endif + wend + printf("Another game (y or n): ") + ch = getchar(); + if (ch='n')|(ch='N') then + break + endif + getchar() // eat \n + wend + printf("Goodbye.\n") + end + Compile program: + ./basico<guess.bas >guess.s + cc -o guess guess.s + Run program: + ./guess + Guess a number game + Numbers are between 1 and 50 + Your guess: 25 + Your guess is to high + Your guess: 13 + Your guess is to low + Your guess: 19 + Your guess is to high + Your guess: 16 + You guessed it in 4 guesses! + Another game (y or n): n + Goodbye. + +BASICO development strategy + + Compiler bootstrapping is done in several steps. We start with the + development environment for one language, in our case C, and we end + with the development environment for another language, BASICO. To + minimize our effort, we only replace the C compiler with the BASICO + compiler but keep the assembler, linker and library of the C + tool-chain. + * Define the BASICO source language. For compiler bootstrap you need + character, character arrays (strings), integer and integer arrays. + A recursive descent parser needs recursive function calls. Function + calls need parameters (formal arguments), local variables, return + values and call-by-reference in/out variables. See + basico05_bnf.txt. + * Define the BASICO target language. The output of a recursive + descent parser (compiler) are op-codes for a stack-machine. These + stack op-codes are further translated into op-codes (assembler + listing) for a real register machine like the Intel x86 or the + Renesas R8C. See x86.txt. + * Implement a throw-away compiler that translates BASICO source into + target op-codes. This initial compiler is done with lex and yacc. + * Establish the tool-chain with BASICO throw-away compiler, + assembler, linker and library. + * Convert the BASICO grammer to LL(1) with the help of the CoCo/R + scanner and parser generator. + * Write the first version of the BASICO compiler in BASICO source + code. + * Translate the first BASICO compiler with the initial lex/yacc + compiler into a binary program. + * Now the development environment is complete. We have the compiler + source, a throw-away compiler that can translate this source into a + program. This program can compile it's own source. + +Predictive Parser for Expression + + The following parser is an enhanced version of the infix to postfix + translator in chapter 2.5 of "COMPILERS Principles, Techniques and + Tools" by Aho, Sethi and Ullman. The program can only handle single + digit numbers and can't skip white space, but does understand all + single character BASICO operators like =, #, <, >, +, -, |, ^, *, /, %, + &, ~, ( and ). + var lookahead: int + func main(): void + begin + lookahead = getchar() + expr() + putchar('\n') + end + func expr(): void + begin + addexpr() + while 1 do + if lookahead = '=' then + match('='); addexpr(); putchar('=') + else if lookahead = '#' then + match('#'); addexpr(); putchar('#') + else if lookahead = '<' then + match('<'); addexpr(); putchar('<') + else if lookahead = '>' then + match('>'); addexpr(); putchar('>') + else; break; endif; endif; endif; endif + wend + end + func addexpr(): void + begin + term() + while 1 do + if lookahead = '+' then + match('+'); term(); putchar('+') + else if lookahead = '-' then + match('-'); term(); putchar('-') + else if lookahead = '|' then + match('|'); term(); putchar('|') + else if lookahead = '^' then + match('^'); term(); putchar('^') + else; break; endif; endif; endif; endif + wend + end + func term(): void + begin + unaryfact(); + while 1 do + if lookahead = '*' then + match('*'); unaryfact(); putchar('*') + else if lookahead = '/' then + match('/'); unaryfact(); putchar('/') + else if lookahead = '%' then + match('%'); unaryfact(); putchar('%') + else if lookahead = '&' then + match('&'); unaryfact(); putchar('&') + else; break; endif; endif; endif; endif + wend + end + func unaryfact(): void + begin + if lookahead = '-' then + match('-'); fact(); putchar('-') + else if lookahead = '~' then + match('~'); fact(); putchar('~') + else; fact(); endif; endif + end + func fact(): void + begin + if lookahead = '(' then + match('('); expr(); match(')') + else if isdigit(lookahead) then + putchar(lookahead); match(lookahead) + else; error(); endif; endif + end + func match(t: int): void + begin + if lookahead = t then + lookahead = getchar() + else; error(); endif + end + func error(): void + begin + printf("syntax error\n") + exit(1) + end + Compile program: + ./basico04<rpn3.bas >rpn3.s + cc -o rpn3 rpn3.s + Run program: + ./rpn3 + 7=1+2*3 + 7123*+= + The infix expression 7 test for equal with 1 plus 2 multiply by 3 is + translated into the postfix expression push 7, 1, 2, 3, then multiply 2 + by 3, add this with 1 and test this for equal with 7. + +Examples for Compiler bootstrapping + + * Niklaus Wirth published the source code of many small languages + like PL/0 in 1976, [47]PASCAL-S in 1975 and [48]Oberon-0 + ([49]sources) in 1996. These languages can not bootstrap + themselves. BASICO follows the PASCAL syntax. + * [50]P4-Pascal and [51]UCSD-Pascal can bootstrap themselves. + * Dennis Ritchie used compiler bootstrapping for the UNIX C compiler. + Today the sources of the "[52]last1120c" compiler are on his home + page. This compiler has no struct and creates PDP11/20 object code. + Together with the UNIX version 1 binaries and the PDP11 emulator + you can go back to 1973 and program like the old heros. Even the + UNIX version 1 man pages are available again. BASICO follows the C + semantics. + * Jack W. Crenshaw published between 1988 and 1989 the [53]TINY + language. He used characters instead of constants for keywords in + the parser. This idea is used in BASICO, too. TINY can not + bootstrap itself. BASICO follows TINY's code generation ideas. + * [54]Euphoria is a BASIC like programming language. Programs can run + interpreted or compiled with help of an Euphoria to C translator. + There is a free Euphoria interpreter in Euphoria. + * Edmund Grimley Evans wrote in 2001 "[55]Bootstrapping a simple + compiler from nothing". He started on the pure binary machine code + level. The language is Forth like. + * Fabrice Bellard wrote in 2002 the [56]Obfuscated Tiny C Compiler + (OTCC). Macro extended with gcc -E and pretty printed with indent + the source is 463 lines and can compile itself on Linux into 80386 + machine code. See this masterpiece yourself ([57]readable otcc.c). + +Symbol Table + + Niklaus Wirth used in PASCAL-S a two-dimensional symbol table. BASICO + provides only one-dimensional arrays. The compiler program itself has + to do the abstraction from 1-dim to 2-dim. + The identifiers in the program can be keywords (if, then, else), global + variable names, function names, function parameter names or local + variable names. The search in the symbol table is keywords first, then + local and parameter variables and last global variables and function + names. These three namespaces (keywords, local, global) can be + implemented with three symbol tables. Or in one symbal table that has + three sections. The first approach is implemented. + The variable name is the unique identifier in the symbol table to get + and set the attributes of the variable. One attribute is the variable + type like int, char, array-of-int and array-of char. Another is the + storage type like global storage or local storage. One more attribute + for parameter and local variables is the frame-pointer offset + (frame-pointer relative memory location). See symtab.c for details. + +Code Generator + + The recursive decent parser creates op-codes for a stack-machine. + Normal microprocessors are register machines that can handle 1 to 3 + addresses per op-code. The Intel x86 chips or the Renesas R8C chip can + handle 2 addresses. To bridge the gap between stack-machine and + 2-address machine two or more registers of the CPU are used as + stack-cache, that is TOS (top of stack), NOS (next of stack), third of + stack and fourth of stack are in registers. Instead of moving the + contents of the CPU registers to perform the push and pop stack + operations, the stack-machine cache labels (TOS, NOS, THIRD, FOURTH) + are re-mapped. Therefore CPU register %eax can be NOS at one time and + TOS at another time. See codegen.c for details. Some examples of code + generation for the x86. The BASICO statement follows as comment the + assembler code: + movl a, %eax + movl $1, %ebx + subl %ebx, %eax + movl %eax, x + # x=a-1 + movl a, %eax + movl b, %ebx + subl %ebx, %eax + movl c, %ebx + movl d, %ecx + subl %ecx, %ebx + movl %edx, %esi + cltd + idivl %ebx + movl %esi, %edx + movl %eax, x + # x=(a-b)/(c-d) + movl a, %eax + negl %eax + movl b, %ebx + movl c, %ecx + movl d, %edx + imul %edx, %ecx + subl %ecx, %ebx + subl %ebx, %eax + movl %eax, x + # x=-a-(b-c*d) + movl i, %eax + movl $1, %ebx + addl %ebx, %eax + movl n, %ebx + movl i, %ecx + movl bb(,%ecx,4), %ecx + imul %ecx, %ebx + movl %ebx, bb(,%eax,4) + # bb[i+1] = n*bb[i] + movl a, %eax + movl b, %ebx + notl %ebx + andl %ebx, %eax + movl a, %ebx + notl %ebx + movl b, %ecx + andl %ecx, %ebx + orl %ebx, %eax + movl %eax, x + # x=a&~b|~a&b + +Peephole optimization + + [58]Wiktionary definition: An optimization that works by eliminating + redundant instructions from a small area of code. + Some examples of possible peephole optimization for the Basico + compiler: + + original code + optimized code + comment + movl $0, %eax + movl %eax, -12(%ebp) movl $0, -12(%ebp) The x86 CPU can + move a constant to a memory location. + movl $48, %ebx + cmpl %ebx, %eax + cmpl $48, %eax + The x86 CPU can compare a constant with a register. + movl $48, %ebx + subl %ebx, %eax + subl $48, %eax + The x86 CPU can subtract a constant from a register. + movl $0, %eax + pushl %eax + pushl $0 + The x86 CPU can push a constant. + cmpl %ebx, %eax + sete %al + movzbl %al, %eax + andl %eax, %eax + jz .L8 + cmpl %ebx, %eax + jnz .L8 The compare op code sets the flags. + cmpl %ebx, %eax + setg %al + movzbl %al, %eax + andl %eax, %eax + jz .L10 + cmpl %ebx, %eax + jng .L10 + The compare op code sets the flags. + + A peephole optimizer uses pattern matching to identify a code segment. + +BASICO sources + + The source code of BASICO version 0.8 is discussed in detail. The + findings of this code review are implemented in BASICO version 0.9. + [59]BASICO source code reading + + Version 0.9 + + Here are the bugfixes as mentioned in the source code reading. + [60]BASICO version 0.9 + + Version 0.8 + + The statements of a function that is only called from one location is + copied to this location (inline the function). This reduced the source + lines to 975. The stripped throw-away compiler binary is 16432 bytes + long and needs 46ms to process the basico.bas input file, the BASICO + version is 18124 bytes long and needs 82ms. The GNU C compiler was + using -O1 optimization. The simple idea of using CPU registers as stack + machine cache pays of very well. + [61]BASICO version 0.8 + + Version 0.7 + + The parser calls now the code generation functions. The BASICO compiler + is written in BASICO. Compiler bootstrapping is possible. For the input + file basico.bas the BASICO version 0.5 compiler produces the same + assembler code as the BASICO 0.7 compiler. The compiler is 1279 source + lines long. + [62]BASICO version 0.7 + + Version 0.6 + + Now scanner, symbol table and parser are in BASICO. The parser can + successful parse itself. The BASICO parser was transcribed from the + CoCo/R Parser.cpp output file of the basico06.atg grammar input file. + [63]BASICO version 0.6 + + Version 0.5 + + The compiler components scanner and symbol table are now available in + BASICO. The BASICO version scanner.bas was created by transcribing + scanner.c. The input file scanner.bas produces the same output file + with the BASICO version and the C version of the scanner - like it + should be. The compiled C version is 5844 bytes long and needs 12ms to + process the scanner.bas input file, the BASICO version is 9928 bytes + long and needs 22ms. + Version 0.51 combines the infix to postfix translator in BASICO with + the scanner and symbol table. This is an intermediate step to the full + BASICO parser in BASICO. + [64]BASICO version 0.51 + [65]BASICO version 0.5 + + Version 0.4 + + The scanner is now written in C. For easy re-writing of the scanner in + BASICO only a subset of C was used. The BASICO source is copied as + comment into the output assembler listing. + Version 0.41 has the infix to postfix translator examples. + [66]BASICO version 0.41 + [67]BASICO version 0.4 + + Version 0.3 + + This version uses symbol tables. Now different variable types and + different storage types are working. Even call-by-reference for arrays + as function parameters is implemented. The first version of our + throw-away compiler is ready. Now work moves from back-end (code + generation) to front-end (scanner, parser). + In version 0.31 the lex scanner translates the keywords into + characters. The bison parser uses these characters as tokens. These + changes prepare the parser for the new scanner. Another detail is the + new handling of <= and >=. The boolean operator ^ for xor is new. The + not operator is now ~ as in C. + [68]BASICO version 0.31 + [69]BASICO version 0.3 + + Version 0.2 + + This version can compile expressions with global integer and global + array-of-integer variables. Function call with return value is + possible. Function call to C library functions with 3 parameters + maximum is working, too. The central missing element is the symbol + table. The symbol table tells the type of the variable like char + variable and/or local variable. Op-code generation for char and local + variables is missing, too. But we have reached some level of "Tiny + BASIC Compiler". + Version 0.21 allows comments after a statement. + [70]BASICO version 0.21 + [71]BASICO version 0.2 + + Version 0.1 + + This version uses lex and yacc to implement the parser of the + throw-away compiler. The parser does not only check that the syntax is + okay, but does emit some semantic information. Some very first ideas on + symbol table and code generation are included too. + [72]BASICO version 0.1 + +Books + + Compiler bauen mit UNIX + Introduction to compiler construction with UNIX + Axel-Tobias Schreiner with H. G. Friedman, Jr., + german 1985, Hanser, München, ISBN 3-446-14359-9; + englisch 1985, Prentice-Hall Software Series, New York, ISBN + 0-13-474396-2; + The book is no longer in print. But you can download the [73]example + programs here. + The UNIX Programming Environment + Der UNIX - Werkzeugkasten. Programmieren mit UNIX + Brian W. Kernighan, Rob Pike + german 2002, Hanser, München, ISBN 978-3446142732; + english 1984, Prentice Hall, Inc., ISBN 0-13-937681-X + The english version of the book is still in print. The [74]example + programs are here. + +References + + 1. http://www.andreadrian.de/tbng/BASICO_part1.html + 2. http://www.andreadrian.de/tbng/#mozTocId868038 + 3. http://www.andreadrian.de/tbng/#mozTocId936395 + 4. http://www.andreadrian.de/tbng/#mozTocId571548 + 5. http://www.andreadrian.de/tbng/#mozTocId888819 + 6. http://www.andreadrian.de/tbng/#mozTocId9185 + 7. http://www.andreadrian.de/tbng/#mozTocId738865 + 8. http://www.andreadrian.de/tbng/#mozTocId972790 + 9. http://www.andreadrian.de/tbng/#mozTocId769710 + 10. http://www.andreadrian.de/tbng/#mozTocId41406 + 11. http://www.andreadrian.de/tbng/#mozTocId528848 + 12. http://www.andreadrian.de/tbng/#mozTocId696380 + 13. http://www.andreadrian.de/tbng/#mozTocId60605 + 14. http://www.andreadrian.de/tbng/#mozTocId667307 + 15. http://www.andreadrian.de/tbng/#mozTocId798052 + 16. http://www.andreadrian.de/tbng/#mozTocId300437 + 17. http://www.andreadrian.de/tbng/#mozTocId334313 + 18. http://www.andreadrian.de/tbng/#mozTocId35039 + 19. http://www.andreadrian.de/tbng/#mozTocId276292 + 20. http://www.andreadrian.de/tbng/#mozTocId343981 + 21. http://www.andreadrian.de/tbng/#mozTocId802188 + 22. http://www.andreadrian.de/tbng/#mozTocId636454 + 23. http://www.andreadrian.de/tbng/#mozTocId460571 + 24. http://www.andreadrian.de/tbng/#mozTocId973061 + 25. http://www.andreadrian.de/tbng/#mozTocId333375 + 26. http://www.andreadrian.de/tbng/#mozTocId865845 + 27. http://www.andreadrian.de/tbng/#mozTocId476164 + 28. http://www.andreadrian.de/tbng/#mozTocId201449 + 29. http://www.andreadrian.de/tbng/#mozTocId271148 + 30. http://www.andreadrian.de/tbng/#mozTocId36461 + 31. http://www.andreadrian.de/tbng/#mozTocId843374 + 32. http://www.andreadrian.de/tbng/#mozTocId423416 + 33. http://www.andreadrian.de/tbng/#mozTocId327801 + 34. http://www.andreadrian.de/tbng/#mozTocId625187 + 35. http://www.andreadrian.de/tbng/#mozTocId954198 + 36. http://www.andreadrian.de/tbng/#mozTocId647194 + 37. http://www.andreadrian.de/tbng/#mozTocId63465 + 38. http://www.andreadrian.de/tbng/#mozTocId98862 + 39. http://www.ddj.com/184406381 + 40. http://www.acm.org/classics/oct95/ + 41. http://dinosaur.compilertools.net/ + 42. http://dinosaur.compilertools.net/ + 43. http://www.bitsavers.org/pdf/dartmouth/BASIC_Oct64.pdf + 44. http://www.comune.basico.me.it/ + 45. http://www.db.informatik.uni-kassel.de/Help/pascal/einfuehrung/pas_bnf.html + 46. http://ssw.jku.at/Coco/ + 47. http://www.246.dk/pascals.html + 48. http://www.oberon.ethz.ch/books.html + 49. http://www.cs.inf.ethz.ch/%7Ewirth/books/CompilerConstruction/ + 50. http://homepages.cwi.nl/%7Esteven/pascal/ + 51. ftp://ftp.freepascal.org/pub/fpc/attic/ucsd-pascal/ + 52. http://cm.bell-labs.com/who/dmr/ + 53. http://compilers.iecc.com/crenshaw/ + 54. http://www.rapideuphoria.com/index.html + 55. http://www.rano.org/bcompiler.html + 56. http://fabrice.bellard.free.fr/otcc/ + 57. http://www.andreadrian.de/tbng/otcc1.c + 58. http://en.wiktionary.org/wiki/peephole_optimization + 59. http://www.andreadrian.de/tbng/BASICO_part1.html + 60. http://www.andreadrian.de/tbng/basico_20060715.tgz + 61. http://www.andreadrian.de/tbng/basico_20060629.tgz + 62. http://www.andreadrian.de/tbng/basico_20060628.tgz + 63. http://www.andreadrian.de/tbng/basico_20060625a.tgz + 64. http://www.andreadrian.de/tbng/basico_20060621.tgz + 65. http://www.andreadrian.de/tbng/basico_20060620.tgz + 66. http://www.andreadrian.de/tbng/basico_20060618.tgz + 67. http://www.andreadrian.de/tbng/basico_20060611h.tgz + 68. http://www.andreadrian.de/tbng/basico_20060611b.tgz + 69. http://www.andreadrian.de/tbng/basico_20060610c.tgz + 70. http://www.andreadrian.de/tbng/basico_20060609.tgz + 71. http://www.andreadrian.de/tbng/basico_20060605b.tgz + 72. http://www.andreadrian.de/tbng/tbng01_20060528.tgz + 73. http://www.cs.rit.edu/%7Eats/books/index.html + 74. http://cm.bell-labs.com/cm/cs/upe/ |