summaryrefslogtreecommitdiff
path: root/miniany/doc/www.andreadrian.de_tbng.txt
diff options
context:
space:
mode:
Diffstat (limited to 'miniany/doc/www.andreadrian.de_tbng.txt')
-rw-r--r--miniany/doc/www.andreadrian.de_tbng.txt914
1 files changed, 914 insertions, 0 deletions
diff --git a/miniany/doc/www.andreadrian.de_tbng.txt b/miniany/doc/www.andreadrian.de_tbng.txt
new file mode 100644
index 0000000..b3f150c
--- /dev/null
+++ b/miniany/doc/www.andreadrian.de_tbng.txt
@@ -0,0 +1,914 @@
+ BASICO - Programming Language, Self-compiling compiler
+
+ Author: Andre Adrian
+ Version: 22.apr.2008
+
+Abstract
+
+ BASICO
+ * is a small imperative programming language that is just powerful
+ enough to compile itself (compiler bootstrapping).
+ * has no GOTO, but has while-break-wend and multiple return
+ * has C-like string handling.
+ * is implemented in less then 1000 source code lines for the
+ compiler.
+ * produces real binary programs for x86 processors, not P-code or
+ Byte-Code.
+ * uses the C compiler toolchain (assembler, linker)
+ * uses C library functions like printf(), getchar(), strcpy(),
+ isdigit(), rand() for run-time support.
+
+ Please check the section [1]BASICO source code reading to get a
+ detailed discussion of the source code!
+
+Contents
+
+ * [2]Abstract
+ * [3]Contents
+ * [4]Introduction
+ * [5]BASICO syntax ideas
+ + [6]GOTO considered harmful
+ + [7]Statement separator
+ + [8]Statement list
+ + [9]Dangling else
+ + [10]Assignment and Equal
+ + [11]Evaluation of conditional expressions
+ + [12]Variable type declaration
+ + [13]Automatic type conversion
+ + [14]Function calls
+ + [15]Nested functions
+ + [16]Call-by-value and Call-by-reference
+ + [17]Named constants
+ + [18]Comment
+ + [19]Compiler error handling
+ * [20]BASICO syntax
+ * [21]BASICO example program
+ * [22]BASICO development strategy
+ * [23]Predictive Parser for Expression
+ * [24]Examples for Compiler bootstrapping
+ * [25]Symbol Table
+ * [26]Code Generator
+ * [27]Peephole optimization
+ * [28]BASICO sources
+ + [29]Version 0.9
+ + [30]Version 0.8
+ + [31]Version 0.7
+ + [32]Version 0.6
+ + [33]Version 0.5
+ + [34]Version 0.4
+ + [35]Version 0.3
+ + [36]Version 0.2
+ + [37]Version 0.1
+ * [38]Books
+
+Introduction
+
+ In January 2006 Dr. Dobb's journal celebrated the [39]30th anniversary
+ of Tiny BASIC. Tiny BASIC was specified by Dennis Allison and was
+ implemented by Dick Whipple and John Arnold. Tiny BASIC needed 3
+ Kilobytes of memory and was sold for 5 US-dollars. The Microsoft BASIC
+ of this time needed 8 Kilobytes and 150 US-dollars.
+ The original Tiny BASIC interpreter is written in an intermediate
+ language (IL) to save memory. The CPU runs a virtual machine (VM) to
+ emulate the IL processor. The IL processor interprets the Tiny BASIC
+ program. Li-Chen Wang created a Tiny BASIC version without IL for the
+ Z80 8-bit CPU.
+ One drawback of this approach is that the IL can handle strings,
+ because it has to parse the BASIC keywords like PRINT, GOTO, but the
+ Tiny BASIC itself has no string handling commands. IL has small memory
+ requirement but also slow execution speed.
+ Dennis Ritchie used compiler bootstrapping for the C compiler, like
+ Niklaus Wirth did for the computer languages PASCAL, Modula and Oberon.
+ The compiler itself is written in the language the compiler can
+ translate. After an first iteration that needs external help, the
+ compiler of iteration N can compile the source code that creates
+ compiler of iteration N+1.
+ The project is pure fun. It is just a "what if" you ask 30 years later.
+ In 1976 the knowledge for BASICO was already present. The "[40]goto
+ considered harmful" paper of Edsger W. Dijkstra was written in 1968.
+ The first PASCAL compiler was written in 1971. The UNIX developers used
+ compiler bootstrapping in 1973. The compiler tools [41]lex (Lesk, 1975)
+ and [42]yacc (Johnson, 1974) were available.
+ The word BASICO can be read as "[43]Beginner's All-purpose Symbolic
+ Instruction Code zero" or as Basico, just something with an italian
+ sound like lambrusco.
+ Note: The project started with the name "Tiny BASIC no GOTO (TB-NG)",
+ but BASICO sounds so much better. Basico is the name of a [44]small
+ italian town (745 inhabitants).
+
+BASICO syntax ideas
+
+ Today the programming language C is the standard. Other languages are
+ comments to C. PASCAL is C with nested procedures but without stdio
+ library. C++ is C overkill. Modula-2 is C without printf but with
+ import and export. Ada is PASCAL++ for the DoD. JAVA is C++ with
+ garbage collection but byte-code. Tcl/Tk is no C at all, but a language
+ with minimum syntax. Last but not least BASIC is still not dead, but
+ has included a lot of syntax from other languages.
+
+ GOTO considered harmful
+
+ Today everybody agrees on the "one coding block shall have one entry
+ and one exit" paradigma " to shorten the conceptual gap between the
+ static program and the dynamic process, to make the correspondence
+ between the program (spread out in text space) and the process (spread
+ out in time) as trivial as possible". But in the same paper Dijkstra
+ also wrote "I remember having read the explicit recommendation to
+ restrict the use of the go to statement to alarm exits".
+ Just deleting the goto statement and not introduce some other syntax
+ for "alarm exits" does make programs less structured. Some alarm exits
+ are
+ * conditional break in a loop
+ * multiple return in a function
+ * conditional return stack un-rolling (try/throw/catch or
+ setjmp/longjmp)
+
+ The traditional structured control flow for the "alarm exit" problem
+ needs additional state variables or special function return-values that
+ remember the alarm-status. Specially for return stack un-rolling the
+ dynamic control flow is hard to see out of the static program listing.
+
+ Statement separator
+
+ The original BASIC of 1964 used linefeed as statement terminator.
+ PASCAL uses ; as statement separator, C uses ; as statement terminator.
+ Beginners often forget this silent syntax element. BASIC later adopted
+ the : as an additional statement separator. BASICO avoids silent syntax
+ elements, therefore both linefeed and ; are used as statement
+ terminators. With linefeed as statement terminator a long statement can
+ not be split in two source code lines. Such a statement needs a
+ continuation symbol to undo the statement terminator effect of the
+ linefeed. In Tcl/Tk the \ as last character before linefeed works as
+ continuation symbol. BASICO follows Tcl/Tk syntax.
+
+ Statement list
+
+ PASCAL encloses statements between the keywords BEGIN and END. C uses {
+ and } for the same purpose. These statement list keywords give trouble
+ to the beginner. First, they are silent syntax elements, second the
+ beginner has problems to see the difference between statement and
+ statement list.
+ The statement list keywords are not needed if the control flow keywords
+ include them. One example is the WHILE condition DO statement-list WEND
+ syntax. Another is the IF condition THEN statement-list ENDIF
+ construct. The first statement-list is enclosed by DO and WEND, the
+ second statement-list with THEN and ENDIF.
+
+ Dangling else
+
+ Dangling else is solved with the IF condition THEN statement-list ELSE
+ statement-list ENDIF construct. With ENDIF there is no more dangling
+ else shift/reduce conflict.
+
+ Assignment and Equal
+
+ BASIC uses = both for assigment and for test on equivalence, as it is
+ mathematical tradition. Pascal uses := and =, C uses = and ==. As long
+ as multiple assignment like a = b = c is not needed, the both meanings
+ of = do not produce a conflict. The BASICO meaning of a = b = c is a
+ assign (b equal c), that is a becomes 1 if b is equal to c and 0 else.
+ The author's all time favorite in simple C bugs, the nasty if (a = b)
+ instead of if (a == b) is gone with BASICO.
+
+ Evaluation of conditional expressions
+
+ The symbols '<', '>=', .. are defined as conditional operators. The
+ expression a >= b can result in 1 for true or 0 for false. There is no
+ short circuit evaluation for conditional expressions. This is fine for
+ today's CPUs that do not like jumps very much because of the execution
+ queue flushing.
+
+ Variable type declaration
+
+ Pascal and C have strict variable types. Without declaration no
+ variable can be used. BASIC needs a variable declaration only for array
+ variables. Scalar variables can be used directly. BASICO follows Pascal
+ and C. The variable types are integer, character, one-dimensional array
+ of integer and one dimensional array of character.
+
+ Automatic type conversion
+
+ All calculations are done with integer. Therefore the char variables
+ are promoted to integer. The BASICO char is an unsigned char. There are
+ no negative char values.
+
+ Function calls
+
+ The C library functions like printf() or strcpy() can be called out of
+ a BASICO program. The C call convention for the GNU C compiler is used.
+ The function arguments are pushed from right to left on the stack. All
+ arguments have the same size (32bit for GNU C on Intel x86). The return
+ value in placed into a CPU register (eax for GNU C on Intel x86).
+
+ Nested functions
+
+ BASICO does not support nested functions. I even think that nested
+ functions are a bad method for information hiding.
+
+ Call-by-value and Call-by-reference
+
+ Call-by-value is the method for int and char variables as function
+ arguments: A copy of the variable is forwarded to the function. With
+ call-by-value only input parameters are possible. Array variables are
+ forwarded with call-by-reference: The memory location of the start of
+ the array is given. This method allows arrays as input and output
+ parameters.
+ The return value of the function is one output parameter. To get more
+ output parameters, the trick in BASICO is to use arrays as function
+ arguments to hold these output parameters.
+
+ Named constants
+
+ Pascal has named constants with the const keyword. C uses the #define
+ preprocessor command for constants. BASICO has no named constants (for
+ now).
+
+ Comment
+
+ BASIC uses REM to start a one-line comment. PASCAL and C have
+ multi-line comments with { } or /* */. C++ re-introduced the one-line
+ comment with //. BASICO uses the C++ one-line comment.
+
+ Compiler error handling
+
+ The compiler stops at the first error. The source code line is shown up
+ to the point where the scanner, parser or code-generator detected the
+ bug and an error message is given. For the batch processing age
+ compiler gurus out there: one error is all you get from this compiler.
+
+BASICO syntax
+
+ The following LL(1) context-free grammar in Extended Backus-Naur-Form
+ (EBNF) for BASICO is final. It is based on the [45]PASCAL syntax. The
+ syntax is free of shift/reduce conflicts. All variable declarations are
+ before variable use. A simple LL(1) predictive parser needs one pass to
+ translate the source. Syntax terminals are enclosed in ' ' or " " like
+ '<' and "if". Capital letter non-terminals come from the scanner
+ (CONST, IDENT, ..). The empty set is commented. The scanner translates
+ some linefeeds into ';'. This trick makes the syntax semicolon-free.
+ The EBNF was checked by [46]Coco/R a scanner and parser generator for
+ LL(1) grammars from Hanspeter Mössenböck. In EBNF, [ ] is 0 to 1
+ repetition, { } is 0 to n repetion, ( ) is grouping alternatives.
+ basico =
+ [ "var" { globalDecl ';' } ]
+ { "func" IDENT '(' paramList ')' ':' returnType ';' blockOrForward
+ ';' }
+ .
+ globalDecl =
+ IDENT ':' (
+ "int"
+ | "char"
+ | "array" CONST (
+ "int"
+ | "char"
+ )
+ ) .
+ paramList =
+ [ paramDecl { ',' paramDecl } ]
+ .
+ paramDecl =
+ IDENT ':' (
+ "int"
+ | "char"
+ | "array" (
+ "int"
+ | "char"
+ )
+ ) .
+
+ returnType =
+ "int"
+ | "char"
+ | "void"
+ .
+
+ blockOrForward =
+ pblock3
+ | "forward"
+ .
+
+ pblock3 =
+ [ "var" { localDecl ';' } ]
+ "begin" stmtList "end"
+ .
+ localDecl =
+ IDENT ':' (
+ "int"
+ | "char"
+ | "array" CONST (
+ "int"
+ | "char"
+ )
+ ) .
+
+ stmtList =
+ { [ statement ] ';' }
+ .
+
+ statement =
+ IDENT (
+ [ '[' expr ']' ] '=' expr
+ | exprList
+ )
+ | "if" expr "then" stmtList [ "else" stmtList ] "endif"
+ | "while" expr "do" stmtList "wend"
+ | "return" [ expr ]
+ | "break"
+ .
+ exprList =
+ '(' [ expr { ',' expr } ] ')'
+ .
+
+ expr =
+ addexpr { '=' addexpr
+ | '#' addexpr
+ | '<' [ '=' ] addexpr
+ | '>' [ '=' ] addexpr
+ } .
+
+ addexpr =
+ term { '+' term
+ | '-' term
+ | '|' term
+ | '^' term
+ } .
+
+ term =
+ unaryfact { '*' unaryfact
+ | '/' unaryfact
+ | '%' unaryfact
+ | '&' unaryfact
+ } .
+
+ unaryfact =
+ '-' fact
+ | '~' fact
+ | fact
+ .
+
+ fact =
+ IDENT [ '[' expr ']' | exprList ]
+ | CONST
+ | CHRCONST
+ | STRCONST
+ | '(' expr ')'
+ .
+
+BASICO example program
+
+ This is the old "Guess a number" game in BASICO. The example program
+ uses the C functions getchar(), printf(), time(), srand() and rand().
+ The function main() is the entry point like in C. This is the first
+ BASICO game ever written. Please note the structured use of alarm exits
+ with "return decimal" in getDecimal() and "break" in main().
+ // guess.bas
+ // Guess a number game
+ // Compile with BASICO version 0.9
+ func getDecimal():int
+ var decimal: int
+ ch: char
+ begin
+ decimal = 0;
+ while 1 do
+ ch = getchar()
+ if (ch>='0')&(ch<='9') then
+ decimal = decimal * 10 + ch - '0'
+ else
+ return decimal
+ endif
+ wend
+ end
+ func main():void
+ var myNumber: int
+ yourNumber: int
+ guesses: int
+ ch: char
+ begin
+ printf("Guess a number game\n")
+ printf("Numbers are between 1 and 50\n")
+ srand(time(0)) // get time in seconds since 1970, init Random
+ Generator
+ while 1 do
+ myNumber = rand() // get a random number
+ myNumber = myNumber % 50 + 1
+ guesses = 0
+ while 1 do
+ printf("Your guess: ")
+ yourNumber = getDecimal()
+ guesses = guesses + 1
+ if yourNumber = myNumber then
+ printf("You guessed it in %d guesses!\n", guesses)
+ break
+ else
+ if yourNumber > myNumber then
+ printf("Your guess is to high\n")
+ else
+ printf("Your guess is to low\n")
+ endif
+ endif
+ wend
+ printf("Another game (y or n): ")
+ ch = getchar();
+ if (ch='n')|(ch='N') then
+ break
+ endif
+ getchar() // eat \n
+ wend
+ printf("Goodbye.\n")
+ end
+ Compile program:
+ ./basico<guess.bas >guess.s
+ cc -o guess guess.s
+ Run program:
+ ./guess
+ Guess a number game
+ Numbers are between 1 and 50
+ Your guess: 25
+ Your guess is to high
+ Your guess: 13
+ Your guess is to low
+ Your guess: 19
+ Your guess is to high
+ Your guess: 16
+ You guessed it in 4 guesses!
+ Another game (y or n): n
+ Goodbye.
+
+BASICO development strategy
+
+ Compiler bootstrapping is done in several steps. We start with the
+ development environment for one language, in our case C, and we end
+ with the development environment for another language, BASICO. To
+ minimize our effort, we only replace the C compiler with the BASICO
+ compiler but keep the assembler, linker and library of the C
+ tool-chain.
+ * Define the BASICO source language. For compiler bootstrap you need
+ character, character arrays (strings), integer and integer arrays.
+ A recursive descent parser needs recursive function calls. Function
+ calls need parameters (formal arguments), local variables, return
+ values and call-by-reference in/out variables. See
+ basico05_bnf.txt.
+ * Define the BASICO target language. The output of a recursive
+ descent parser (compiler) are op-codes for a stack-machine. These
+ stack op-codes are further translated into op-codes (assembler
+ listing) for a real register machine like the Intel x86 or the
+ Renesas R8C. See x86.txt.
+ * Implement a throw-away compiler that translates BASICO source into
+ target op-codes. This initial compiler is done with lex and yacc.
+ * Establish the tool-chain with BASICO throw-away compiler,
+ assembler, linker and library.
+ * Convert the BASICO grammer to LL(1) with the help of the CoCo/R
+ scanner and parser generator.
+ * Write the first version of the BASICO compiler in BASICO source
+ code.
+ * Translate the first BASICO compiler with the initial lex/yacc
+ compiler into a binary program.
+ * Now the development environment is complete. We have the compiler
+ source, a throw-away compiler that can translate this source into a
+ program. This program can compile it's own source.
+
+Predictive Parser for Expression
+
+ The following parser is an enhanced version of the infix to postfix
+ translator in chapter 2.5 of "COMPILERS Principles, Techniques and
+ Tools" by Aho, Sethi and Ullman. The program can only handle single
+ digit numbers and can't skip white space, but does understand all
+ single character BASICO operators like =, #, <, >, +, -, |, ^, *, /, %,
+ &, ~, ( and ).
+ var lookahead: int
+ func main(): void
+ begin
+ lookahead = getchar()
+ expr()
+ putchar('\n')
+ end
+ func expr(): void
+ begin
+ addexpr()
+ while 1 do
+ if lookahead = '=' then
+ match('='); addexpr(); putchar('=')
+ else if lookahead = '#' then
+ match('#'); addexpr(); putchar('#')
+ else if lookahead = '<' then
+ match('<'); addexpr(); putchar('<')
+ else if lookahead = '>' then
+ match('>'); addexpr(); putchar('>')
+ else; break; endif; endif; endif; endif
+ wend
+ end
+ func addexpr(): void
+ begin
+ term()
+ while 1 do
+ if lookahead = '+' then
+ match('+'); term(); putchar('+')
+ else if lookahead = '-' then
+ match('-'); term(); putchar('-')
+ else if lookahead = '|' then
+ match('|'); term(); putchar('|')
+ else if lookahead = '^' then
+ match('^'); term(); putchar('^')
+ else; break; endif; endif; endif; endif
+ wend
+ end
+ func term(): void
+ begin
+ unaryfact();
+ while 1 do
+ if lookahead = '*' then
+ match('*'); unaryfact(); putchar('*')
+ else if lookahead = '/' then
+ match('/'); unaryfact(); putchar('/')
+ else if lookahead = '%' then
+ match('%'); unaryfact(); putchar('%')
+ else if lookahead = '&' then
+ match('&'); unaryfact(); putchar('&')
+ else; break; endif; endif; endif; endif
+ wend
+ end
+ func unaryfact(): void
+ begin
+ if lookahead = '-' then
+ match('-'); fact(); putchar('-')
+ else if lookahead = '~' then
+ match('~'); fact(); putchar('~')
+ else; fact(); endif; endif
+ end
+ func fact(): void
+ begin
+ if lookahead = '(' then
+ match('('); expr(); match(')')
+ else if isdigit(lookahead) then
+ putchar(lookahead); match(lookahead)
+ else; error(); endif; endif
+ end
+ func match(t: int): void
+ begin
+ if lookahead = t then
+ lookahead = getchar()
+ else; error(); endif
+ end
+ func error(): void
+ begin
+ printf("syntax error\n")
+ exit(1)
+ end
+ Compile program:
+ ./basico04<rpn3.bas >rpn3.s
+ cc -o rpn3 rpn3.s
+ Run program:
+ ./rpn3
+ 7=1+2*3
+ 7123*+=
+ The infix expression 7 test for equal with 1 plus 2 multiply by 3 is
+ translated into the postfix expression push 7, 1, 2, 3, then multiply 2
+ by 3, add this with 1 and test this for equal with 7.
+
+Examples for Compiler bootstrapping
+
+ * Niklaus Wirth published the source code of many small languages
+ like PL/0 in 1976, [47]PASCAL-S in 1975 and [48]Oberon-0
+ ([49]sources) in 1996. These languages can not bootstrap
+ themselves. BASICO follows the PASCAL syntax.
+ * [50]P4-Pascal and [51]UCSD-Pascal can bootstrap themselves.
+ * Dennis Ritchie used compiler bootstrapping for the UNIX C compiler.
+ Today the sources of the "[52]last1120c" compiler are on his home
+ page. This compiler has no struct and creates PDP11/20 object code.
+ Together with the UNIX version 1 binaries and the PDP11 emulator
+ you can go back to 1973 and program like the old heros. Even the
+ UNIX version 1 man pages are available again. BASICO follows the C
+ semantics.
+ * Jack W. Crenshaw published between 1988 and 1989 the [53]TINY
+ language. He used characters instead of constants for keywords in
+ the parser. This idea is used in BASICO, too. TINY can not
+ bootstrap itself. BASICO follows TINY's code generation ideas.
+ * [54]Euphoria is a BASIC like programming language. Programs can run
+ interpreted or compiled with help of an Euphoria to C translator.
+ There is a free Euphoria interpreter in Euphoria.
+ * Edmund Grimley Evans wrote in 2001 "[55]Bootstrapping a simple
+ compiler from nothing". He started on the pure binary machine code
+ level. The language is Forth like.
+ * Fabrice Bellard wrote in 2002 the [56]Obfuscated Tiny C Compiler
+ (OTCC). Macro extended with gcc -E and pretty printed with indent
+ the source is 463 lines and can compile itself on Linux into 80386
+ machine code. See this masterpiece yourself ([57]readable otcc.c).
+
+Symbol Table
+
+ Niklaus Wirth used in PASCAL-S a two-dimensional symbol table. BASICO
+ provides only one-dimensional arrays. The compiler program itself has
+ to do the abstraction from 1-dim to 2-dim.
+ The identifiers in the program can be keywords (if, then, else), global
+ variable names, function names, function parameter names or local
+ variable names. The search in the symbol table is keywords first, then
+ local and parameter variables and last global variables and function
+ names. These three namespaces (keywords, local, global) can be
+ implemented with three symbol tables. Or in one symbal table that has
+ three sections. The first approach is implemented.
+ The variable name is the unique identifier in the symbol table to get
+ and set the attributes of the variable. One attribute is the variable
+ type like int, char, array-of-int and array-of char. Another is the
+ storage type like global storage or local storage. One more attribute
+ for parameter and local variables is the frame-pointer offset
+ (frame-pointer relative memory location). See symtab.c for details.
+
+Code Generator
+
+ The recursive decent parser creates op-codes for a stack-machine.
+ Normal microprocessors are register machines that can handle 1 to 3
+ addresses per op-code. The Intel x86 chips or the Renesas R8C chip can
+ handle 2 addresses. To bridge the gap between stack-machine and
+ 2-address machine two or more registers of the CPU are used as
+ stack-cache, that is TOS (top of stack), NOS (next of stack), third of
+ stack and fourth of stack are in registers. Instead of moving the
+ contents of the CPU registers to perform the push and pop stack
+ operations, the stack-machine cache labels (TOS, NOS, THIRD, FOURTH)
+ are re-mapped. Therefore CPU register %eax can be NOS at one time and
+ TOS at another time. See codegen.c for details. Some examples of code
+ generation for the x86. The BASICO statement follows as comment the
+ assembler code:
+ movl a, %eax
+ movl $1, %ebx
+ subl %ebx, %eax
+ movl %eax, x
+ # x=a-1
+ movl a, %eax
+ movl b, %ebx
+ subl %ebx, %eax
+ movl c, %ebx
+ movl d, %ecx
+ subl %ecx, %ebx
+ movl %edx, %esi
+ cltd
+ idivl %ebx
+ movl %esi, %edx
+ movl %eax, x
+ # x=(a-b)/(c-d)
+ movl a, %eax
+ negl %eax
+ movl b, %ebx
+ movl c, %ecx
+ movl d, %edx
+ imul %edx, %ecx
+ subl %ecx, %ebx
+ subl %ebx, %eax
+ movl %eax, x
+ # x=-a-(b-c*d)
+ movl i, %eax
+ movl $1, %ebx
+ addl %ebx, %eax
+ movl n, %ebx
+ movl i, %ecx
+ movl bb(,%ecx,4), %ecx
+ imul %ecx, %ebx
+ movl %ebx, bb(,%eax,4)
+ # bb[i+1] = n*bb[i]
+ movl a, %eax
+ movl b, %ebx
+ notl %ebx
+ andl %ebx, %eax
+ movl a, %ebx
+ notl %ebx
+ movl b, %ecx
+ andl %ecx, %ebx
+ orl %ebx, %eax
+ movl %eax, x
+ # x=a&~b|~a&b
+
+Peephole optimization
+
+ [58]Wiktionary definition: An optimization that works by eliminating
+ redundant instructions from a small area of code.
+ Some examples of possible peephole optimization for the Basico
+ compiler:
+
+ original code
+ optimized code
+ comment
+ movl $0, %eax
+ movl %eax, -12(%ebp) movl $0, -12(%ebp) The x86 CPU can
+ move a constant to a memory location.
+ movl $48, %ebx
+ cmpl %ebx, %eax
+ cmpl $48, %eax
+ The x86 CPU can compare a constant with a register.
+ movl $48, %ebx
+ subl %ebx, %eax
+ subl $48, %eax
+ The x86 CPU can subtract a constant from a register.
+ movl $0, %eax
+ pushl %eax
+ pushl $0
+ The x86 CPU can push a constant.
+ cmpl %ebx, %eax
+ sete %al
+ movzbl %al, %eax
+ andl %eax, %eax
+ jz .L8
+ cmpl %ebx, %eax
+ jnz .L8 The compare op code sets the flags.
+ cmpl %ebx, %eax
+ setg %al
+ movzbl %al, %eax
+ andl %eax, %eax
+ jz .L10
+ cmpl %ebx, %eax
+ jng .L10
+ The compare op code sets the flags.
+
+ A peephole optimizer uses pattern matching to identify a code segment.
+
+BASICO sources
+
+ The source code of BASICO version 0.8 is discussed in detail. The
+ findings of this code review are implemented in BASICO version 0.9.
+ [59]BASICO source code reading
+
+ Version 0.9
+
+ Here are the bugfixes as mentioned in the source code reading.
+ [60]BASICO version 0.9
+
+ Version 0.8
+
+ The statements of a function that is only called from one location is
+ copied to this location (inline the function). This reduced the source
+ lines to 975. The stripped throw-away compiler binary is 16432 bytes
+ long and needs 46ms to process the basico.bas input file, the BASICO
+ version is 18124 bytes long and needs 82ms. The GNU C compiler was
+ using -O1 optimization. The simple idea of using CPU registers as stack
+ machine cache pays of very well.
+ [61]BASICO version 0.8
+
+ Version 0.7
+
+ The parser calls now the code generation functions. The BASICO compiler
+ is written in BASICO. Compiler bootstrapping is possible. For the input
+ file basico.bas the BASICO version 0.5 compiler produces the same
+ assembler code as the BASICO 0.7 compiler. The compiler is 1279 source
+ lines long.
+ [62]BASICO version 0.7
+
+ Version 0.6
+
+ Now scanner, symbol table and parser are in BASICO. The parser can
+ successful parse itself. The BASICO parser was transcribed from the
+ CoCo/R Parser.cpp output file of the basico06.atg grammar input file.
+ [63]BASICO version 0.6
+
+ Version 0.5
+
+ The compiler components scanner and symbol table are now available in
+ BASICO. The BASICO version scanner.bas was created by transcribing
+ scanner.c. The input file scanner.bas produces the same output file
+ with the BASICO version and the C version of the scanner - like it
+ should be. The compiled C version is 5844 bytes long and needs 12ms to
+ process the scanner.bas input file, the BASICO version is 9928 bytes
+ long and needs 22ms.
+ Version 0.51 combines the infix to postfix translator in BASICO with
+ the scanner and symbol table. This is an intermediate step to the full
+ BASICO parser in BASICO.
+ [64]BASICO version 0.51
+ [65]BASICO version 0.5
+
+ Version 0.4
+
+ The scanner is now written in C. For easy re-writing of the scanner in
+ BASICO only a subset of C was used. The BASICO source is copied as
+ comment into the output assembler listing.
+ Version 0.41 has the infix to postfix translator examples.
+ [66]BASICO version 0.41
+ [67]BASICO version 0.4
+
+ Version 0.3
+
+ This version uses symbol tables. Now different variable types and
+ different storage types are working. Even call-by-reference for arrays
+ as function parameters is implemented. The first version of our
+ throw-away compiler is ready. Now work moves from back-end (code
+ generation) to front-end (scanner, parser).
+ In version 0.31 the lex scanner translates the keywords into
+ characters. The bison parser uses these characters as tokens. These
+ changes prepare the parser for the new scanner. Another detail is the
+ new handling of <= and >=. The boolean operator ^ for xor is new. The
+ not operator is now ~ as in C.
+ [68]BASICO version 0.31
+ [69]BASICO version 0.3
+
+ Version 0.2
+
+ This version can compile expressions with global integer and global
+ array-of-integer variables. Function call with return value is
+ possible. Function call to C library functions with 3 parameters
+ maximum is working, too. The central missing element is the symbol
+ table. The symbol table tells the type of the variable like char
+ variable and/or local variable. Op-code generation for char and local
+ variables is missing, too. But we have reached some level of "Tiny
+ BASIC Compiler".
+ Version 0.21 allows comments after a statement.
+ [70]BASICO version 0.21
+ [71]BASICO version 0.2
+
+ Version 0.1
+
+ This version uses lex and yacc to implement the parser of the
+ throw-away compiler. The parser does not only check that the syntax is
+ okay, but does emit some semantic information. Some very first ideas on
+ symbol table and code generation are included too.
+ [72]BASICO version 0.1
+
+Books
+
+ Compiler bauen mit UNIX
+ Introduction to compiler construction with UNIX
+ Axel-Tobias Schreiner with H. G. Friedman, Jr.,
+ german 1985, Hanser, München, ISBN 3-446-14359-9;
+ englisch 1985, Prentice-Hall Software Series, New York, ISBN
+ 0-13-474396-2;
+ The book is no longer in print. But you can download the [73]example
+ programs here.
+ The UNIX Programming Environment
+ Der UNIX - Werkzeugkasten. Programmieren mit UNIX
+ Brian W. Kernighan, Rob Pike
+ german 2002, Hanser, München, ISBN 978-3446142732;
+ english 1984, Prentice Hall, Inc., ISBN 0-13-937681-X
+ The english version of the book is still in print. The [74]example
+ programs are here.
+
+References
+
+ 1. http://www.andreadrian.de/tbng/BASICO_part1.html
+ 2. http://www.andreadrian.de/tbng/#mozTocId868038
+ 3. http://www.andreadrian.de/tbng/#mozTocId936395
+ 4. http://www.andreadrian.de/tbng/#mozTocId571548
+ 5. http://www.andreadrian.de/tbng/#mozTocId888819
+ 6. http://www.andreadrian.de/tbng/#mozTocId9185
+ 7. http://www.andreadrian.de/tbng/#mozTocId738865
+ 8. http://www.andreadrian.de/tbng/#mozTocId972790
+ 9. http://www.andreadrian.de/tbng/#mozTocId769710
+ 10. http://www.andreadrian.de/tbng/#mozTocId41406
+ 11. http://www.andreadrian.de/tbng/#mozTocId528848
+ 12. http://www.andreadrian.de/tbng/#mozTocId696380
+ 13. http://www.andreadrian.de/tbng/#mozTocId60605
+ 14. http://www.andreadrian.de/tbng/#mozTocId667307
+ 15. http://www.andreadrian.de/tbng/#mozTocId798052
+ 16. http://www.andreadrian.de/tbng/#mozTocId300437
+ 17. http://www.andreadrian.de/tbng/#mozTocId334313
+ 18. http://www.andreadrian.de/tbng/#mozTocId35039
+ 19. http://www.andreadrian.de/tbng/#mozTocId276292
+ 20. http://www.andreadrian.de/tbng/#mozTocId343981
+ 21. http://www.andreadrian.de/tbng/#mozTocId802188
+ 22. http://www.andreadrian.de/tbng/#mozTocId636454
+ 23. http://www.andreadrian.de/tbng/#mozTocId460571
+ 24. http://www.andreadrian.de/tbng/#mozTocId973061
+ 25. http://www.andreadrian.de/tbng/#mozTocId333375
+ 26. http://www.andreadrian.de/tbng/#mozTocId865845
+ 27. http://www.andreadrian.de/tbng/#mozTocId476164
+ 28. http://www.andreadrian.de/tbng/#mozTocId201449
+ 29. http://www.andreadrian.de/tbng/#mozTocId271148
+ 30. http://www.andreadrian.de/tbng/#mozTocId36461
+ 31. http://www.andreadrian.de/tbng/#mozTocId843374
+ 32. http://www.andreadrian.de/tbng/#mozTocId423416
+ 33. http://www.andreadrian.de/tbng/#mozTocId327801
+ 34. http://www.andreadrian.de/tbng/#mozTocId625187
+ 35. http://www.andreadrian.de/tbng/#mozTocId954198
+ 36. http://www.andreadrian.de/tbng/#mozTocId647194
+ 37. http://www.andreadrian.de/tbng/#mozTocId63465
+ 38. http://www.andreadrian.de/tbng/#mozTocId98862
+ 39. http://www.ddj.com/184406381
+ 40. http://www.acm.org/classics/oct95/
+ 41. http://dinosaur.compilertools.net/
+ 42. http://dinosaur.compilertools.net/
+ 43. http://www.bitsavers.org/pdf/dartmouth/BASIC_Oct64.pdf
+ 44. http://www.comune.basico.me.it/
+ 45. http://www.db.informatik.uni-kassel.de/Help/pascal/einfuehrung/pas_bnf.html
+ 46. http://ssw.jku.at/Coco/
+ 47. http://www.246.dk/pascals.html
+ 48. http://www.oberon.ethz.ch/books.html
+ 49. http://www.cs.inf.ethz.ch/%7Ewirth/books/CompilerConstruction/
+ 50. http://homepages.cwi.nl/%7Esteven/pascal/
+ 51. ftp://ftp.freepascal.org/pub/fpc/attic/ucsd-pascal/
+ 52. http://cm.bell-labs.com/who/dmr/
+ 53. http://compilers.iecc.com/crenshaw/
+ 54. http://www.rapideuphoria.com/index.html
+ 55. http://www.rano.org/bcompiler.html
+ 56. http://fabrice.bellard.free.fr/otcc/
+ 57. http://www.andreadrian.de/tbng/otcc1.c
+ 58. http://en.wiktionary.org/wiki/peephole_optimization
+ 59. http://www.andreadrian.de/tbng/BASICO_part1.html
+ 60. http://www.andreadrian.de/tbng/basico_20060715.tgz
+ 61. http://www.andreadrian.de/tbng/basico_20060629.tgz
+ 62. http://www.andreadrian.de/tbng/basico_20060628.tgz
+ 63. http://www.andreadrian.de/tbng/basico_20060625a.tgz
+ 64. http://www.andreadrian.de/tbng/basico_20060621.tgz
+ 65. http://www.andreadrian.de/tbng/basico_20060620.tgz
+ 66. http://www.andreadrian.de/tbng/basico_20060618.tgz
+ 67. http://www.andreadrian.de/tbng/basico_20060611h.tgz
+ 68. http://www.andreadrian.de/tbng/basico_20060611b.tgz
+ 69. http://www.andreadrian.de/tbng/basico_20060610c.tgz
+ 70. http://www.andreadrian.de/tbng/basico_20060609.tgz
+ 71. http://www.andreadrian.de/tbng/basico_20060605b.tgz
+ 72. http://www.andreadrian.de/tbng/tbng01_20060528.tgz
+ 73. http://www.cs.rit.edu/%7Eats/books/index.html
+ 74. http://cm.bell-labs.com/cm/cs/upe/