1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
|
BASICO - Programming Language, Self-compiling compiler
Author: Andre Adrian
Version: 22.apr.2008
Abstract
BASICO
* is a small imperative programming language that is just powerful
enough to compile itself (compiler bootstrapping).
* has no GOTO, but has while-break-wend and multiple return
* has C-like string handling.
* is implemented in less then 1000 source code lines for the
compiler.
* produces real binary programs for x86 processors, not P-code or
Byte-Code.
* uses the C compiler toolchain (assembler, linker)
* uses C library functions like printf(), getchar(), strcpy(),
isdigit(), rand() for run-time support.
Please check the section [1]BASICO source code reading to get a
detailed discussion of the source code!
Contents
* [2]Abstract
* [3]Contents
* [4]Introduction
* [5]BASICO syntax ideas
+ [6]GOTO considered harmful
+ [7]Statement separator
+ [8]Statement list
+ [9]Dangling else
+ [10]Assignment and Equal
+ [11]Evaluation of conditional expressions
+ [12]Variable type declaration
+ [13]Automatic type conversion
+ [14]Function calls
+ [15]Nested functions
+ [16]Call-by-value and Call-by-reference
+ [17]Named constants
+ [18]Comment
+ [19]Compiler error handling
* [20]BASICO syntax
* [21]BASICO example program
* [22]BASICO development strategy
* [23]Predictive Parser for Expression
* [24]Examples for Compiler bootstrapping
* [25]Symbol Table
* [26]Code Generator
* [27]Peephole optimization
* [28]BASICO sources
+ [29]Version 0.9
+ [30]Version 0.8
+ [31]Version 0.7
+ [32]Version 0.6
+ [33]Version 0.5
+ [34]Version 0.4
+ [35]Version 0.3
+ [36]Version 0.2
+ [37]Version 0.1
* [38]Books
Introduction
In January 2006 Dr. Dobb's journal celebrated the [39]30th anniversary
of Tiny BASIC. Tiny BASIC was specified by Dennis Allison and was
implemented by Dick Whipple and John Arnold. Tiny BASIC needed 3
Kilobytes of memory and was sold for 5 US-dollars. The Microsoft BASIC
of this time needed 8 Kilobytes and 150 US-dollars.
The original Tiny BASIC interpreter is written in an intermediate
language (IL) to save memory. The CPU runs a virtual machine (VM) to
emulate the IL processor. The IL processor interprets the Tiny BASIC
program. Li-Chen Wang created a Tiny BASIC version without IL for the
Z80 8-bit CPU.
One drawback of this approach is that the IL can handle strings,
because it has to parse the BASIC keywords like PRINT, GOTO, but the
Tiny BASIC itself has no string handling commands. IL has small memory
requirement but also slow execution speed.
Dennis Ritchie used compiler bootstrapping for the C compiler, like
Niklaus Wirth did for the computer languages PASCAL, Modula and Oberon.
The compiler itself is written in the language the compiler can
translate. After an first iteration that needs external help, the
compiler of iteration N can compile the source code that creates
compiler of iteration N+1.
The project is pure fun. It is just a "what if" you ask 30 years later.
In 1976 the knowledge for BASICO was already present. The "[40]goto
considered harmful" paper of Edsger W. Dijkstra was written in 1968.
The first PASCAL compiler was written in 1971. The UNIX developers used
compiler bootstrapping in 1973. The compiler tools [41]lex (Lesk, 1975)
and [42]yacc (Johnson, 1974) were available.
The word BASICO can be read as "[43]Beginner's All-purpose Symbolic
Instruction Code zero" or as Basico, just something with an italian
sound like lambrusco.
Note: The project started with the name "Tiny BASIC no GOTO (TB-NG)",
but BASICO sounds so much better. Basico is the name of a [44]small
italian town (745 inhabitants).
BASICO syntax ideas
Today the programming language C is the standard. Other languages are
comments to C. PASCAL is C with nested procedures but without stdio
library. C++ is C overkill. Modula-2 is C without printf but with
import and export. Ada is PASCAL++ for the DoD. JAVA is C++ with
garbage collection but byte-code. Tcl/Tk is no C at all, but a language
with minimum syntax. Last but not least BASIC is still not dead, but
has included a lot of syntax from other languages.
GOTO considered harmful
Today everybody agrees on the "one coding block shall have one entry
and one exit" paradigma " to shorten the conceptual gap between the
static program and the dynamic process, to make the correspondence
between the program (spread out in text space) and the process (spread
out in time) as trivial as possible". But in the same paper Dijkstra
also wrote "I remember having read the explicit recommendation to
restrict the use of the go to statement to alarm exits".
Just deleting the goto statement and not introduce some other syntax
for "alarm exits" does make programs less structured. Some alarm exits
are
* conditional break in a loop
* multiple return in a function
* conditional return stack un-rolling (try/throw/catch or
setjmp/longjmp)
The traditional structured control flow for the "alarm exit" problem
needs additional state variables or special function return-values that
remember the alarm-status. Specially for return stack un-rolling the
dynamic control flow is hard to see out of the static program listing.
Statement separator
The original BASIC of 1964 used linefeed as statement terminator.
PASCAL uses ; as statement separator, C uses ; as statement terminator.
Beginners often forget this silent syntax element. BASIC later adopted
the : as an additional statement separator. BASICO avoids silent syntax
elements, therefore both linefeed and ; are used as statement
terminators. With linefeed as statement terminator a long statement can
not be split in two source code lines. Such a statement needs a
continuation symbol to undo the statement terminator effect of the
linefeed. In Tcl/Tk the \ as last character before linefeed works as
continuation symbol. BASICO follows Tcl/Tk syntax.
Statement list
PASCAL encloses statements between the keywords BEGIN and END. C uses {
and } for the same purpose. These statement list keywords give trouble
to the beginner. First, they are silent syntax elements, second the
beginner has problems to see the difference between statement and
statement list.
The statement list keywords are not needed if the control flow keywords
include them. One example is the WHILE condition DO statement-list WEND
syntax. Another is the IF condition THEN statement-list ENDIF
construct. The first statement-list is enclosed by DO and WEND, the
second statement-list with THEN and ENDIF.
Dangling else
Dangling else is solved with the IF condition THEN statement-list ELSE
statement-list ENDIF construct. With ENDIF there is no more dangling
else shift/reduce conflict.
Assignment and Equal
BASIC uses = both for assigment and for test on equivalence, as it is
mathematical tradition. Pascal uses := and =, C uses = and ==. As long
as multiple assignment like a = b = c is not needed, the both meanings
of = do not produce a conflict. The BASICO meaning of a = b = c is a
assign (b equal c), that is a becomes 1 if b is equal to c and 0 else.
The author's all time favorite in simple C bugs, the nasty if (a = b)
instead of if (a == b) is gone with BASICO.
Evaluation of conditional expressions
The symbols '<', '>=', .. are defined as conditional operators. The
expression a >= b can result in 1 for true or 0 for false. There is no
short circuit evaluation for conditional expressions. This is fine for
today's CPUs that do not like jumps very much because of the execution
queue flushing.
Variable type declaration
Pascal and C have strict variable types. Without declaration no
variable can be used. BASIC needs a variable declaration only for array
variables. Scalar variables can be used directly. BASICO follows Pascal
and C. The variable types are integer, character, one-dimensional array
of integer and one dimensional array of character.
Automatic type conversion
All calculations are done with integer. Therefore the char variables
are promoted to integer. The BASICO char is an unsigned char. There are
no negative char values.
Function calls
The C library functions like printf() or strcpy() can be called out of
a BASICO program. The C call convention for the GNU C compiler is used.
The function arguments are pushed from right to left on the stack. All
arguments have the same size (32bit for GNU C on Intel x86). The return
value in placed into a CPU register (eax for GNU C on Intel x86).
Nested functions
BASICO does not support nested functions. I even think that nested
functions are a bad method for information hiding.
Call-by-value and Call-by-reference
Call-by-value is the method for int and char variables as function
arguments: A copy of the variable is forwarded to the function. With
call-by-value only input parameters are possible. Array variables are
forwarded with call-by-reference: The memory location of the start of
the array is given. This method allows arrays as input and output
parameters.
The return value of the function is one output parameter. To get more
output parameters, the trick in BASICO is to use arrays as function
arguments to hold these output parameters.
Named constants
Pascal has named constants with the const keyword. C uses the #define
preprocessor command for constants. BASICO has no named constants (for
now).
Comment
BASIC uses REM to start a one-line comment. PASCAL and C have
multi-line comments with { } or /* */. C++ re-introduced the one-line
comment with //. BASICO uses the C++ one-line comment.
Compiler error handling
The compiler stops at the first error. The source code line is shown up
to the point where the scanner, parser or code-generator detected the
bug and an error message is given. For the batch processing age
compiler gurus out there: one error is all you get from this compiler.
BASICO syntax
The following LL(1) context-free grammar in Extended Backus-Naur-Form
(EBNF) for BASICO is final. It is based on the [45]PASCAL syntax. The
syntax is free of shift/reduce conflicts. All variable declarations are
before variable use. A simple LL(1) predictive parser needs one pass to
translate the source. Syntax terminals are enclosed in ' ' or " " like
'<' and "if". Capital letter non-terminals come from the scanner
(CONST, IDENT, ..). The empty set is commented. The scanner translates
some linefeeds into ';'. This trick makes the syntax semicolon-free.
The EBNF was checked by [46]Coco/R a scanner and parser generator for
LL(1) grammars from Hanspeter Mössenböck. In EBNF, [ ] is 0 to 1
repetition, { } is 0 to n repetion, ( ) is grouping alternatives.
basico =
[ "var" { globalDecl ';' } ]
{ "func" IDENT '(' paramList ')' ':' returnType ';' blockOrForward
';' }
.
globalDecl =
IDENT ':' (
"int"
| "char"
| "array" CONST (
"int"
| "char"
)
) .
paramList =
[ paramDecl { ',' paramDecl } ]
.
paramDecl =
IDENT ':' (
"int"
| "char"
| "array" (
"int"
| "char"
)
) .
returnType =
"int"
| "char"
| "void"
.
blockOrForward =
pblock3
| "forward"
.
pblock3 =
[ "var" { localDecl ';' } ]
"begin" stmtList "end"
.
localDecl =
IDENT ':' (
"int"
| "char"
| "array" CONST (
"int"
| "char"
)
) .
stmtList =
{ [ statement ] ';' }
.
statement =
IDENT (
[ '[' expr ']' ] '=' expr
| exprList
)
| "if" expr "then" stmtList [ "else" stmtList ] "endif"
| "while" expr "do" stmtList "wend"
| "return" [ expr ]
| "break"
.
exprList =
'(' [ expr { ',' expr } ] ')'
.
expr =
addexpr { '=' addexpr
| '#' addexpr
| '<' [ '=' ] addexpr
| '>' [ '=' ] addexpr
} .
addexpr =
term { '+' term
| '-' term
| '|' term
| '^' term
} .
term =
unaryfact { '*' unaryfact
| '/' unaryfact
| '%' unaryfact
| '&' unaryfact
} .
unaryfact =
'-' fact
| '~' fact
| fact
.
fact =
IDENT [ '[' expr ']' | exprList ]
| CONST
| CHRCONST
| STRCONST
| '(' expr ')'
.
BASICO example program
This is the old "Guess a number" game in BASICO. The example program
uses the C functions getchar(), printf(), time(), srand() and rand().
The function main() is the entry point like in C. This is the first
BASICO game ever written. Please note the structured use of alarm exits
with "return decimal" in getDecimal() and "break" in main().
// guess.bas
// Guess a number game
// Compile with BASICO version 0.9
func getDecimal():int
var decimal: int
ch: char
begin
decimal = 0;
while 1 do
ch = getchar()
if (ch>='0')&(ch<='9') then
decimal = decimal * 10 + ch - '0'
else
return decimal
endif
wend
end
func main():void
var myNumber: int
yourNumber: int
guesses: int
ch: char
begin
printf("Guess a number game\n")
printf("Numbers are between 1 and 50\n")
srand(time(0)) // get time in seconds since 1970, init Random
Generator
while 1 do
myNumber = rand() // get a random number
myNumber = myNumber % 50 + 1
guesses = 0
while 1 do
printf("Your guess: ")
yourNumber = getDecimal()
guesses = guesses + 1
if yourNumber = myNumber then
printf("You guessed it in %d guesses!\n", guesses)
break
else
if yourNumber > myNumber then
printf("Your guess is to high\n")
else
printf("Your guess is to low\n")
endif
endif
wend
printf("Another game (y or n): ")
ch = getchar();
if (ch='n')|(ch='N') then
break
endif
getchar() // eat \n
wend
printf("Goodbye.\n")
end
Compile program:
./basico<guess.bas >guess.s
cc -o guess guess.s
Run program:
./guess
Guess a number game
Numbers are between 1 and 50
Your guess: 25
Your guess is to high
Your guess: 13
Your guess is to low
Your guess: 19
Your guess is to high
Your guess: 16
You guessed it in 4 guesses!
Another game (y or n): n
Goodbye.
BASICO development strategy
Compiler bootstrapping is done in several steps. We start with the
development environment for one language, in our case C, and we end
with the development environment for another language, BASICO. To
minimize our effort, we only replace the C compiler with the BASICO
compiler but keep the assembler, linker and library of the C
tool-chain.
* Define the BASICO source language. For compiler bootstrap you need
character, character arrays (strings), integer and integer arrays.
A recursive descent parser needs recursive function calls. Function
calls need parameters (formal arguments), local variables, return
values and call-by-reference in/out variables. See
basico05_bnf.txt.
* Define the BASICO target language. The output of a recursive
descent parser (compiler) are op-codes for a stack-machine. These
stack op-codes are further translated into op-codes (assembler
listing) for a real register machine like the Intel x86 or the
Renesas R8C. See x86.txt.
* Implement a throw-away compiler that translates BASICO source into
target op-codes. This initial compiler is done with lex and yacc.
* Establish the tool-chain with BASICO throw-away compiler,
assembler, linker and library.
* Convert the BASICO grammer to LL(1) with the help of the CoCo/R
scanner and parser generator.
* Write the first version of the BASICO compiler in BASICO source
code.
* Translate the first BASICO compiler with the initial lex/yacc
compiler into a binary program.
* Now the development environment is complete. We have the compiler
source, a throw-away compiler that can translate this source into a
program. This program can compile it's own source.
Predictive Parser for Expression
The following parser is an enhanced version of the infix to postfix
translator in chapter 2.5 of "COMPILERS Principles, Techniques and
Tools" by Aho, Sethi and Ullman. The program can only handle single
digit numbers and can't skip white space, but does understand all
single character BASICO operators like =, #, <, >, +, -, |, ^, *, /, %,
&, ~, ( and ).
var lookahead: int
func main(): void
begin
lookahead = getchar()
expr()
putchar('\n')
end
func expr(): void
begin
addexpr()
while 1 do
if lookahead = '=' then
match('='); addexpr(); putchar('=')
else if lookahead = '#' then
match('#'); addexpr(); putchar('#')
else if lookahead = '<' then
match('<'); addexpr(); putchar('<')
else if lookahead = '>' then
match('>'); addexpr(); putchar('>')
else; break; endif; endif; endif; endif
wend
end
func addexpr(): void
begin
term()
while 1 do
if lookahead = '+' then
match('+'); term(); putchar('+')
else if lookahead = '-' then
match('-'); term(); putchar('-')
else if lookahead = '|' then
match('|'); term(); putchar('|')
else if lookahead = '^' then
match('^'); term(); putchar('^')
else; break; endif; endif; endif; endif
wend
end
func term(): void
begin
unaryfact();
while 1 do
if lookahead = '*' then
match('*'); unaryfact(); putchar('*')
else if lookahead = '/' then
match('/'); unaryfact(); putchar('/')
else if lookahead = '%' then
match('%'); unaryfact(); putchar('%')
else if lookahead = '&' then
match('&'); unaryfact(); putchar('&')
else; break; endif; endif; endif; endif
wend
end
func unaryfact(): void
begin
if lookahead = '-' then
match('-'); fact(); putchar('-')
else if lookahead = '~' then
match('~'); fact(); putchar('~')
else; fact(); endif; endif
end
func fact(): void
begin
if lookahead = '(' then
match('('); expr(); match(')')
else if isdigit(lookahead) then
putchar(lookahead); match(lookahead)
else; error(); endif; endif
end
func match(t: int): void
begin
if lookahead = t then
lookahead = getchar()
else; error(); endif
end
func error(): void
begin
printf("syntax error\n")
exit(1)
end
Compile program:
./basico04<rpn3.bas >rpn3.s
cc -o rpn3 rpn3.s
Run program:
./rpn3
7=1+2*3
7123*+=
The infix expression 7 test for equal with 1 plus 2 multiply by 3 is
translated into the postfix expression push 7, 1, 2, 3, then multiply 2
by 3, add this with 1 and test this for equal with 7.
Examples for Compiler bootstrapping
* Niklaus Wirth published the source code of many small languages
like PL/0 in 1976, [47]PASCAL-S in 1975 and [48]Oberon-0
([49]sources) in 1996. These languages can not bootstrap
themselves. BASICO follows the PASCAL syntax.
* [50]P4-Pascal and [51]UCSD-Pascal can bootstrap themselves.
* Dennis Ritchie used compiler bootstrapping for the UNIX C compiler.
Today the sources of the "[52]last1120c" compiler are on his home
page. This compiler has no struct and creates PDP11/20 object code.
Together with the UNIX version 1 binaries and the PDP11 emulator
you can go back to 1973 and program like the old heros. Even the
UNIX version 1 man pages are available again. BASICO follows the C
semantics.
* Jack W. Crenshaw published between 1988 and 1989 the [53]TINY
language. He used characters instead of constants for keywords in
the parser. This idea is used in BASICO, too. TINY can not
bootstrap itself. BASICO follows TINY's code generation ideas.
* [54]Euphoria is a BASIC like programming language. Programs can run
interpreted or compiled with help of an Euphoria to C translator.
There is a free Euphoria interpreter in Euphoria.
* Edmund Grimley Evans wrote in 2001 "[55]Bootstrapping a simple
compiler from nothing". He started on the pure binary machine code
level. The language is Forth like.
* Fabrice Bellard wrote in 2002 the [56]Obfuscated Tiny C Compiler
(OTCC). Macro extended with gcc -E and pretty printed with indent
the source is 463 lines and can compile itself on Linux into 80386
machine code. See this masterpiece yourself ([57]readable otcc.c).
Symbol Table
Niklaus Wirth used in PASCAL-S a two-dimensional symbol table. BASICO
provides only one-dimensional arrays. The compiler program itself has
to do the abstraction from 1-dim to 2-dim.
The identifiers in the program can be keywords (if, then, else), global
variable names, function names, function parameter names or local
variable names. The search in the symbol table is keywords first, then
local and parameter variables and last global variables and function
names. These three namespaces (keywords, local, global) can be
implemented with three symbol tables. Or in one symbal table that has
three sections. The first approach is implemented.
The variable name is the unique identifier in the symbol table to get
and set the attributes of the variable. One attribute is the variable
type like int, char, array-of-int and array-of char. Another is the
storage type like global storage or local storage. One more attribute
for parameter and local variables is the frame-pointer offset
(frame-pointer relative memory location). See symtab.c for details.
Code Generator
The recursive decent parser creates op-codes for a stack-machine.
Normal microprocessors are register machines that can handle 1 to 3
addresses per op-code. The Intel x86 chips or the Renesas R8C chip can
handle 2 addresses. To bridge the gap between stack-machine and
2-address machine two or more registers of the CPU are used as
stack-cache, that is TOS (top of stack), NOS (next of stack), third of
stack and fourth of stack are in registers. Instead of moving the
contents of the CPU registers to perform the push and pop stack
operations, the stack-machine cache labels (TOS, NOS, THIRD, FOURTH)
are re-mapped. Therefore CPU register %eax can be NOS at one time and
TOS at another time. See codegen.c for details. Some examples of code
generation for the x86. The BASICO statement follows as comment the
assembler code:
movl a, %eax
movl $1, %ebx
subl %ebx, %eax
movl %eax, x
# x=a-1
movl a, %eax
movl b, %ebx
subl %ebx, %eax
movl c, %ebx
movl d, %ecx
subl %ecx, %ebx
movl %edx, %esi
cltd
idivl %ebx
movl %esi, %edx
movl %eax, x
# x=(a-b)/(c-d)
movl a, %eax
negl %eax
movl b, %ebx
movl c, %ecx
movl d, %edx
imul %edx, %ecx
subl %ecx, %ebx
subl %ebx, %eax
movl %eax, x
# x=-a-(b-c*d)
movl i, %eax
movl $1, %ebx
addl %ebx, %eax
movl n, %ebx
movl i, %ecx
movl bb(,%ecx,4), %ecx
imul %ecx, %ebx
movl %ebx, bb(,%eax,4)
# bb[i+1] = n*bb[i]
movl a, %eax
movl b, %ebx
notl %ebx
andl %ebx, %eax
movl a, %ebx
notl %ebx
movl b, %ecx
andl %ecx, %ebx
orl %ebx, %eax
movl %eax, x
# x=a&~b|~a&b
Peephole optimization
[58]Wiktionary definition: An optimization that works by eliminating
redundant instructions from a small area of code.
Some examples of possible peephole optimization for the Basico
compiler:
original code
optimized code
comment
movl $0, %eax
movl %eax, -12(%ebp) movl $0, -12(%ebp) The x86 CPU can
move a constant to a memory location.
movl $48, %ebx
cmpl %ebx, %eax
cmpl $48, %eax
The x86 CPU can compare a constant with a register.
movl $48, %ebx
subl %ebx, %eax
subl $48, %eax
The x86 CPU can subtract a constant from a register.
movl $0, %eax
pushl %eax
pushl $0
The x86 CPU can push a constant.
cmpl %ebx, %eax
sete %al
movzbl %al, %eax
andl %eax, %eax
jz .L8
cmpl %ebx, %eax
jnz .L8 The compare op code sets the flags.
cmpl %ebx, %eax
setg %al
movzbl %al, %eax
andl %eax, %eax
jz .L10
cmpl %ebx, %eax
jng .L10
The compare op code sets the flags.
A peephole optimizer uses pattern matching to identify a code segment.
BASICO sources
The source code of BASICO version 0.8 is discussed in detail. The
findings of this code review are implemented in BASICO version 0.9.
[59]BASICO source code reading
Version 0.9
Here are the bugfixes as mentioned in the source code reading.
[60]BASICO version 0.9
Version 0.8
The statements of a function that is only called from one location is
copied to this location (inline the function). This reduced the source
lines to 975. The stripped throw-away compiler binary is 16432 bytes
long and needs 46ms to process the basico.bas input file, the BASICO
version is 18124 bytes long and needs 82ms. The GNU C compiler was
using -O1 optimization. The simple idea of using CPU registers as stack
machine cache pays of very well.
[61]BASICO version 0.8
Version 0.7
The parser calls now the code generation functions. The BASICO compiler
is written in BASICO. Compiler bootstrapping is possible. For the input
file basico.bas the BASICO version 0.5 compiler produces the same
assembler code as the BASICO 0.7 compiler. The compiler is 1279 source
lines long.
[62]BASICO version 0.7
Version 0.6
Now scanner, symbol table and parser are in BASICO. The parser can
successful parse itself. The BASICO parser was transcribed from the
CoCo/R Parser.cpp output file of the basico06.atg grammar input file.
[63]BASICO version 0.6
Version 0.5
The compiler components scanner and symbol table are now available in
BASICO. The BASICO version scanner.bas was created by transcribing
scanner.c. The input file scanner.bas produces the same output file
with the BASICO version and the C version of the scanner - like it
should be. The compiled C version is 5844 bytes long and needs 12ms to
process the scanner.bas input file, the BASICO version is 9928 bytes
long and needs 22ms.
Version 0.51 combines the infix to postfix translator in BASICO with
the scanner and symbol table. This is an intermediate step to the full
BASICO parser in BASICO.
[64]BASICO version 0.51
[65]BASICO version 0.5
Version 0.4
The scanner is now written in C. For easy re-writing of the scanner in
BASICO only a subset of C was used. The BASICO source is copied as
comment into the output assembler listing.
Version 0.41 has the infix to postfix translator examples.
[66]BASICO version 0.41
[67]BASICO version 0.4
Version 0.3
This version uses symbol tables. Now different variable types and
different storage types are working. Even call-by-reference for arrays
as function parameters is implemented. The first version of our
throw-away compiler is ready. Now work moves from back-end (code
generation) to front-end (scanner, parser).
In version 0.31 the lex scanner translates the keywords into
characters. The bison parser uses these characters as tokens. These
changes prepare the parser for the new scanner. Another detail is the
new handling of <= and >=. The boolean operator ^ for xor is new. The
not operator is now ~ as in C.
[68]BASICO version 0.31
[69]BASICO version 0.3
Version 0.2
This version can compile expressions with global integer and global
array-of-integer variables. Function call with return value is
possible. Function call to C library functions with 3 parameters
maximum is working, too. The central missing element is the symbol
table. The symbol table tells the type of the variable like char
variable and/or local variable. Op-code generation for char and local
variables is missing, too. But we have reached some level of "Tiny
BASIC Compiler".
Version 0.21 allows comments after a statement.
[70]BASICO version 0.21
[71]BASICO version 0.2
Version 0.1
This version uses lex and yacc to implement the parser of the
throw-away compiler. The parser does not only check that the syntax is
okay, but does emit some semantic information. Some very first ideas on
symbol table and code generation are included too.
[72]BASICO version 0.1
Books
Compiler bauen mit UNIX
Introduction to compiler construction with UNIX
Axel-Tobias Schreiner with H. G. Friedman, Jr.,
german 1985, Hanser, München, ISBN 3-446-14359-9;
englisch 1985, Prentice-Hall Software Series, New York, ISBN
0-13-474396-2;
The book is no longer in print. But you can download the [73]example
programs here.
The UNIX Programming Environment
Der UNIX - Werkzeugkasten. Programmieren mit UNIX
Brian W. Kernighan, Rob Pike
german 2002, Hanser, München, ISBN 978-3446142732;
english 1984, Prentice Hall, Inc., ISBN 0-13-937681-X
The english version of the book is still in print. The [74]example
programs are here.
References
1. http://www.andreadrian.de/tbng/BASICO_part1.html
2. http://www.andreadrian.de/tbng/#mozTocId868038
3. http://www.andreadrian.de/tbng/#mozTocId936395
4. http://www.andreadrian.de/tbng/#mozTocId571548
5. http://www.andreadrian.de/tbng/#mozTocId888819
6. http://www.andreadrian.de/tbng/#mozTocId9185
7. http://www.andreadrian.de/tbng/#mozTocId738865
8. http://www.andreadrian.de/tbng/#mozTocId972790
9. http://www.andreadrian.de/tbng/#mozTocId769710
10. http://www.andreadrian.de/tbng/#mozTocId41406
11. http://www.andreadrian.de/tbng/#mozTocId528848
12. http://www.andreadrian.de/tbng/#mozTocId696380
13. http://www.andreadrian.de/tbng/#mozTocId60605
14. http://www.andreadrian.de/tbng/#mozTocId667307
15. http://www.andreadrian.de/tbng/#mozTocId798052
16. http://www.andreadrian.de/tbng/#mozTocId300437
17. http://www.andreadrian.de/tbng/#mozTocId334313
18. http://www.andreadrian.de/tbng/#mozTocId35039
19. http://www.andreadrian.de/tbng/#mozTocId276292
20. http://www.andreadrian.de/tbng/#mozTocId343981
21. http://www.andreadrian.de/tbng/#mozTocId802188
22. http://www.andreadrian.de/tbng/#mozTocId636454
23. http://www.andreadrian.de/tbng/#mozTocId460571
24. http://www.andreadrian.de/tbng/#mozTocId973061
25. http://www.andreadrian.de/tbng/#mozTocId333375
26. http://www.andreadrian.de/tbng/#mozTocId865845
27. http://www.andreadrian.de/tbng/#mozTocId476164
28. http://www.andreadrian.de/tbng/#mozTocId201449
29. http://www.andreadrian.de/tbng/#mozTocId271148
30. http://www.andreadrian.de/tbng/#mozTocId36461
31. http://www.andreadrian.de/tbng/#mozTocId843374
32. http://www.andreadrian.de/tbng/#mozTocId423416
33. http://www.andreadrian.de/tbng/#mozTocId327801
34. http://www.andreadrian.de/tbng/#mozTocId625187
35. http://www.andreadrian.de/tbng/#mozTocId954198
36. http://www.andreadrian.de/tbng/#mozTocId647194
37. http://www.andreadrian.de/tbng/#mozTocId63465
38. http://www.andreadrian.de/tbng/#mozTocId98862
39. http://www.ddj.com/184406381
40. http://www.acm.org/classics/oct95/
41. http://dinosaur.compilertools.net/
42. http://dinosaur.compilertools.net/
43. http://www.bitsavers.org/pdf/dartmouth/BASIC_Oct64.pdf
44. http://www.comune.basico.me.it/
45. http://www.db.informatik.uni-kassel.de/Help/pascal/einfuehrung/pas_bnf.html
46. http://ssw.jku.at/Coco/
47. http://www.246.dk/pascals.html
48. http://www.oberon.ethz.ch/books.html
49. http://www.cs.inf.ethz.ch/%7Ewirth/books/CompilerConstruction/
50. http://homepages.cwi.nl/%7Esteven/pascal/
51. ftp://ftp.freepascal.org/pub/fpc/attic/ucsd-pascal/
52. http://cm.bell-labs.com/who/dmr/
53. http://compilers.iecc.com/crenshaw/
54. http://www.rapideuphoria.com/index.html
55. http://www.rano.org/bcompiler.html
56. http://fabrice.bellard.free.fr/otcc/
57. http://www.andreadrian.de/tbng/otcc1.c
58. http://en.wiktionary.org/wiki/peephole_optimization
59. http://www.andreadrian.de/tbng/BASICO_part1.html
60. http://www.andreadrian.de/tbng/basico_20060715.tgz
61. http://www.andreadrian.de/tbng/basico_20060629.tgz
62. http://www.andreadrian.de/tbng/basico_20060628.tgz
63. http://www.andreadrian.de/tbng/basico_20060625a.tgz
64. http://www.andreadrian.de/tbng/basico_20060621.tgz
65. http://www.andreadrian.de/tbng/basico_20060620.tgz
66. http://www.andreadrian.de/tbng/basico_20060618.tgz
67. http://www.andreadrian.de/tbng/basico_20060611h.tgz
68. http://www.andreadrian.de/tbng/basico_20060611b.tgz
69. http://www.andreadrian.de/tbng/basico_20060610c.tgz
70. http://www.andreadrian.de/tbng/basico_20060609.tgz
71. http://www.andreadrian.de/tbng/basico_20060605b.tgz
72. http://www.andreadrian.de/tbng/tbng01_20060528.tgz
73. http://www.cs.rit.edu/%7Eats/books/index.html
74. http://cm.bell-labs.com/cm/cs/upe/
|