summaryrefslogtreecommitdiff
path: root/miniany/doc/www.andreadrian.de_tbng.txt
blob: b3f150c67b19f8719f589b3b09d92909a93edf7b (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
             BASICO - Programming Language, Self-compiling compiler

   Author: Andre Adrian
   Version: 22.apr.2008

Abstract

   BASICO
     * is a small imperative programming language that is just powerful
       enough to compile itself (compiler bootstrapping).
     * has no GOTO, but has while-break-wend and multiple return
     * has C-like string handling.
     * is implemented in less then 1000 source code lines for the
       compiler.
     * produces real binary programs for x86 processors, not P-code or
       Byte-Code.
     * uses the C compiler toolchain (assembler, linker)
     * uses C library functions like printf(), getchar(), strcpy(),
       isdigit(), rand() for run-time support.

   Please check the section [1]BASICO source code reading to get a
   detailed discussion of the source code!

Contents

     * [2]Abstract
     * [3]Contents
     * [4]Introduction
     * [5]BASICO syntax ideas
          + [6]GOTO considered harmful
          + [7]Statement separator
          + [8]Statement list
          + [9]Dangling else
          + [10]Assignment and Equal
          + [11]Evaluation of conditional expressions
          + [12]Variable type declaration
          + [13]Automatic type conversion
          + [14]Function calls
          + [15]Nested functions
          + [16]Call-by-value and Call-by-reference
          + [17]Named constants
          + [18]Comment
          + [19]Compiler error handling
     * [20]BASICO syntax
     * [21]BASICO example program
     * [22]BASICO development strategy
     * [23]Predictive Parser for Expression
     * [24]Examples for Compiler bootstrapping
     * [25]Symbol Table
     * [26]Code Generator
     * [27]Peephole optimization
     * [28]BASICO sources
          + [29]Version 0.9
          + [30]Version 0.8
          + [31]Version 0.7
          + [32]Version 0.6
          + [33]Version 0.5
          + [34]Version 0.4
          + [35]Version 0.3
          + [36]Version 0.2
          + [37]Version 0.1
     * [38]Books

Introduction

   In January 2006 Dr. Dobb's journal celebrated the [39]30th anniversary
   of Tiny BASIC. Tiny BASIC was specified by Dennis Allison and was
   implemented by Dick Whipple and John Arnold. Tiny BASIC needed 3
   Kilobytes of memory and was sold for 5 US-dollars. The Microsoft BASIC
   of this time needed 8 Kilobytes and 150 US-dollars.
   The original Tiny BASIC interpreter is written in an intermediate
   language (IL) to save memory. The CPU runs a virtual machine (VM) to
   emulate the IL processor. The IL processor interprets the Tiny BASIC
   program. Li-Chen Wang created a Tiny BASIC version without IL for the
   Z80 8-bit CPU.
   One drawback of this approach is that the IL can handle strings,
   because it has to parse the BASIC keywords like PRINT, GOTO, but the
   Tiny BASIC itself has no string handling commands. IL has small memory
   requirement but also slow execution speed.
   Dennis Ritchie used  compiler bootstrapping for the C compiler, like
   Niklaus Wirth did for the computer languages PASCAL, Modula and Oberon.
   The compiler itself is written in the language the compiler can
   translate. After an first iteration that needs external help, the
   compiler of iteration N can compile the source code that creates
   compiler of iteration N+1.
   The project is pure fun. It is just a "what if" you ask 30 years later.
   In 1976 the knowledge for BASICO was already present. The "[40]goto
   considered harmful" paper of Edsger W. Dijkstra was written in 1968.
   The first PASCAL compiler was written in 1971. The UNIX developers used
   compiler bootstrapping in 1973. The compiler tools [41]lex (Lesk, 1975)
   and [42]yacc (Johnson, 1974) were available.
   The word BASICO can be read as "[43]Beginner's All-purpose Symbolic
   Instruction Code zero" or as Basico, just something with an italian
   sound like lambrusco.
   Note: The project started with the name "Tiny BASIC no GOTO (TB-NG)",
   but BASICO sounds so much better. Basico is the name of a [44]small
   italian town (745 inhabitants).

BASICO syntax ideas

   Today the programming language C is the standard. Other languages are
   comments to C. PASCAL is C with nested procedures but without stdio
   library. C++ is C overkill. Modula-2 is C without printf but with
   import and export. Ada is PASCAL++ for the DoD. JAVA is C++ with
   garbage collection but byte-code. Tcl/Tk is no C at all, but a language
   with minimum syntax. Last but not least BASIC is still not dead, but
   has included a lot of syntax from other languages.

  GOTO considered harmful

   Today everybody agrees on the "one coding block shall have one entry
   and one exit" paradigma " to shorten the conceptual gap between the
   static program and the dynamic process, to make the correspondence
   between the program (spread out in text space) and the process (spread
   out in time) as trivial as possible". But in the same paper Dijkstra
   also wrote "I remember having read the explicit recommendation to
   restrict the use of the go to statement to alarm exits".
   Just deleting the goto statement and not introduce some other syntax
   for "alarm exits" does make programs less structured. Some alarm exits
   are
     * conditional break in a loop
     * multiple return in a function
     * conditional return stack un-rolling (try/throw/catch or
       setjmp/longjmp)

   The traditional structured control flow for the "alarm exit" problem
   needs additional state variables or special function return-values that
   remember the alarm-status. Specially for return stack un-rolling the
   dynamic control flow is hard to see out of the static program listing.

  Statement separator

   The original BASIC of 1964 used linefeed as statement terminator.
   PASCAL uses ; as statement separator, C uses ; as statement terminator.
   Beginners often forget this silent syntax element. BASIC later adopted
   the : as an additional statement separator. BASICO avoids silent syntax
   elements, therefore both linefeed and ; are used as statement
   terminators. With linefeed as statement terminator a long statement can
   not be split in two source code lines.  Such a statement needs a
   continuation symbol to undo the statement terminator effect of the
   linefeed. In Tcl/Tk the \ as last character before linefeed works as
   continuation symbol. BASICO follows Tcl/Tk syntax.

  Statement list

   PASCAL encloses statements between the keywords BEGIN and END. C uses {
   and } for the same purpose. These statement list keywords give trouble
   to the beginner. First, they are silent syntax elements, second the
   beginner has problems to see the difference between statement and
   statement list.
   The statement list keywords are not needed if the control flow keywords
   include them. One example is the WHILE condition DO statement-list WEND
   syntax. Another is the IF condition THEN statement-list ENDIF
   construct. The first statement-list is enclosed by DO and WEND, the
   second statement-list with THEN and ENDIF.

  Dangling else

   Dangling else is solved with the IF condition THEN statement-list ELSE
   statement-list ENDIF construct. With ENDIF there is no more dangling
   else shift/reduce conflict.

  Assignment and Equal

   BASIC uses = both for assigment and for test on equivalence, as it is
   mathematical tradition. Pascal uses := and =, C uses = and ==. As long
   as multiple assignment like a = b = c is not needed, the both meanings
   of = do not produce a conflict. The BASICO meaning of  a = b = c is a
   assign (b equal c), that is a becomes 1 if b is equal to c and 0 else.
   The author's all time favorite in simple C bugs, the nasty if (a = b)
   instead of if (a == b) is gone with BASICO.

  Evaluation of conditional expressions

   The symbols '<', '>=', .. are defined as conditional operators. The
   expression a >= b can result in 1 for true or 0 for false. There is no
   short circuit evaluation for conditional expressions. This is fine for
   today's CPUs that do not like jumps very much because of the execution
   queue flushing.

  Variable type declaration

   Pascal and C have strict variable types. Without declaration no
   variable can be used. BASIC needs a variable declaration only for array
   variables. Scalar variables can be used directly. BASICO follows Pascal
   and C. The variable types are integer, character, one-dimensional array
   of integer and one dimensional array of character.

  Automatic type conversion

   All calculations are done with integer. Therefore the char variables
   are promoted to integer. The BASICO char is an unsigned char. There are
   no negative char values.

  Function calls

   The C library functions like printf() or strcpy() can be called out of
   a BASICO program. The C call convention for the GNU C compiler is used.
   The function arguments are pushed from right to left on the stack. All
   arguments have the same size (32bit for GNU C on Intel x86). The return
   value in placed into a CPU register (eax for GNU C on Intel x86).

  Nested functions

   BASICO does not support nested functions. I even think that nested
   functions are a bad method for information hiding.

  Call-by-value and Call-by-reference

   Call-by-value is the method for int and char variables as function
   arguments: A copy of the variable is forwarded to the function. With
   call-by-value only input parameters are possible. Array variables are
   forwarded with call-by-reference: The memory location of the start of
   the array is given. This method allows arrays as input and output
   parameters.
   The return value of the function is one output parameter. To get more
   output parameters, the trick in BASICO is to use arrays as function
   arguments to hold these output parameters.

  Named constants

   Pascal has named constants with the const keyword. C uses the #define
   preprocessor command for constants. BASICO has no named constants (for
   now).

  Comment

   BASIC uses REM to start a one-line comment. PASCAL and C have
   multi-line comments with { } or /* */. C++ re-introduced the one-line
   comment with //. BASICO uses the C++ one-line comment.

  Compiler error handling

   The compiler stops at the first error. The source code line is shown up
   to the point where the scanner, parser or code-generator detected the
   bug and an error message is given. For the batch processing age
   compiler gurus out there: one error is all you get from this compiler.

BASICO syntax

   The following LL(1) context-free grammar in Extended Backus-Naur-Form
   (EBNF) for BASICO is final. It is based on the [45]PASCAL syntax. The
   syntax is free of shift/reduce conflicts. All variable declarations are
   before variable use. A simple LL(1) predictive parser needs one pass to
   translate the source. Syntax terminals are enclosed in ' ' or " " like
   '<' and "if". Capital letter non-terminals come from the scanner
   (CONST, IDENT, ..). The empty set is commented. The scanner translates
   some linefeeds into ';'. This trick makes the syntax semicolon-free.
   The EBNF was checked by [46]Coco/R a scanner and parser generator for
   LL(1) grammars from Hanspeter Mössenböck. In EBNF, [ ] is 0 to 1
   repetition, { } is 0 to n repetion, ( ) is grouping alternatives.
   basico =
     [ "var" { globalDecl ';' } ]
     { "func" IDENT '(' paramList ')' ':' returnType ';' blockOrForward
   ';' }
     .
   globalDecl =
     IDENT ':' (
       "int"
       | "char"
       | "array" CONST (
         "int"
         | "char"
       )
     ) .
   paramList =
     [ paramDecl { ',' paramDecl } ]
     .
   paramDecl =
     IDENT ':' (
       "int"
       | "char"
       | "array" (
         "int"
         | "char"
       )
     ) .

   returnType =
     "int"
     | "char"
     | "void"
     .

   blockOrForward =
     pblock3
     | "forward"
     .

   pblock3 =
     [ "var" { localDecl ';' } ]
     "begin" stmtList "end"
     .
   localDecl =
     IDENT ':' (
       "int"
       | "char"
       | "array" CONST (
         "int"
         | "char"
       )
     ) .

   stmtList =
     { [ statement ] ';' }
     .

   statement =
     IDENT (
       [ '[' expr ']' ] '=' expr
       | exprList
     )
     | "if" expr "then" stmtList [ "else" stmtList ] "endif"
     | "while" expr "do" stmtList "wend"
     | "return" [ expr ]
     | "break"
     .
   exprList =
     '(' [ expr { ',' expr } ] ')'
     .

   expr =
     addexpr { '=' addexpr
             | '#' addexpr
             | '<' [ '=' ] addexpr
             | '>' [ '=' ] addexpr
     } .

   addexpr =
     term {  '+' term
           | '-' term
           | '|' term
           | '^' term
     } .

   term =
     unaryfact { '*' unaryfact
               | '/' unaryfact
               | '%' unaryfact
               | '&' unaryfact
     } .

   unaryfact =
     '-' fact
     | '~' fact
     | fact
     .

   fact =
     IDENT [ '[' expr ']' | exprList ]
     | CONST
     | CHRCONST
     | STRCONST
     | '(' expr ')'
     .

BASICO example program

   This is the old "Guess a number" game in BASICO. The example program
   uses the C functions getchar(), printf(), time(),  srand() and rand().
   The function main() is the entry point like in C. This is the first
   BASICO game ever written. Please note the structured use of alarm exits
   with "return decimal" in getDecimal() and "break" in main().
   // guess.bas
   // Guess a number game
   // Compile with BASICO version 0.9
   func getDecimal():int
   var decimal: int
     ch: char
   begin
     decimal = 0;
     while 1 do
       ch = getchar()
       if (ch>='0')&(ch<='9') then
         decimal = decimal * 10 + ch - '0'
       else
         return decimal
       endif
     wend
   end
   func main():void
   var myNumber: int
     yourNumber: int
     guesses: int
     ch: char
   begin
     printf("Guess a number game\n")
     printf("Numbers are between 1 and 50\n")
     srand(time(0))  // get time in seconds since 1970, init Random
   Generator
     while 1 do
       myNumber = rand()     // get a random number
       myNumber = myNumber % 50 + 1
       guesses = 0
       while 1 do
         printf("Your guess: ")
         yourNumber = getDecimal()
         guesses = guesses + 1
         if yourNumber = myNumber then
           printf("You guessed it in %d guesses!\n", guesses)
           break
         else
           if yourNumber > myNumber then
             printf("Your guess is to high\n")
           else
             printf("Your guess is to low\n")
           endif
         endif
       wend
       printf("Another game (y or n): ")
       ch = getchar();
       if (ch='n')|(ch='N') then
         break
       endif
       getchar()     // eat \n
     wend
     printf("Goodbye.\n")
   end
   Compile program:
   ./basico<guess.bas >guess.s
   cc -o guess guess.s
   Run program:
   ./guess
   Guess a number game
   Numbers are between 1 and 50
   Your guess: 25
   Your guess is to high
   Your guess: 13
   Your guess is to low
   Your guess: 19
   Your guess is to high
   Your guess: 16
   You guessed it in 4 guesses!
   Another game (y or n): n
   Goodbye.

BASICO development strategy

   Compiler bootstrapping is done in several steps. We start with the
   development environment for one language, in our case C, and we end
   with the development environment for another language, BASICO. To
   minimize our effort, we only replace the C compiler with the BASICO
   compiler but keep the assembler, linker and library of the C
   tool-chain.
     * Define the BASICO source language. For compiler bootstrap you need
       character, character arrays (strings), integer and integer arrays.
       A recursive descent parser needs recursive function calls. Function
       calls need parameters (formal arguments), local variables, return
       values and call-by-reference in/out variables. See
       basico05_bnf.txt.
     * Define the BASICO target language. The output of a recursive
       descent parser (compiler) are op-codes for a stack-machine. These
       stack op-codes are further translated into op-codes (assembler
       listing) for a real register machine like the Intel x86 or the
       Renesas R8C. See x86.txt.
     * Implement a throw-away compiler that translates BASICO source into
       target op-codes. This initial compiler is done with lex and yacc.
     * Establish the tool-chain with BASICO throw-away compiler,
       assembler, linker and library.
     * Convert the BASICO grammer to LL(1) with the help of the CoCo/R
       scanner and parser generator.
     * Write the first version of the BASICO compiler in BASICO source
       code.
     * Translate the first BASICO compiler with the initial lex/yacc
       compiler into a binary program.
     * Now the development environment is complete. We have the compiler
       source, a throw-away compiler that can translate this source into a
       program. This program can compile it's own source.

Predictive Parser for Expression

   The following parser is an enhanced version of the infix to postfix
   translator in chapter 2.5 of "COMPILERS Principles, Techniques and
   Tools" by Aho, Sethi and Ullman. The program can only handle single
   digit numbers and can't skip white space, but does understand all
   single character BASICO operators like =, #, <, >, +, -, |, ^, *, /, %,
   &, ~, ( and ).
   var lookahead: int
   func main(): void
   begin
     lookahead = getchar()
     expr()
     putchar('\n')
   end
   func expr(): void
   begin
     addexpr()
     while 1 do
       if lookahead = '=' then
         match('='); addexpr(); putchar('=')
       else if lookahead = '#' then
         match('#'); addexpr(); putchar('#')
       else if lookahead = '<' then
         match('<'); addexpr(); putchar('<')
       else if lookahead = '>' then
         match('>'); addexpr(); putchar('>')
       else; break; endif; endif; endif; endif
     wend
   end
   func addexpr(): void
   begin
     term()
     while 1 do
       if lookahead = '+' then
         match('+'); term(); putchar('+')
       else if lookahead = '-' then
         match('-'); term(); putchar('-')
       else if lookahead = '|' then
         match('|'); term(); putchar('|')
       else if lookahead = '^' then
         match('^'); term(); putchar('^')
       else; break; endif; endif; endif; endif
     wend
   end
   func term(): void
   begin
     unaryfact();
     while 1 do
       if lookahead = '*' then
         match('*'); unaryfact(); putchar('*')
       else if lookahead = '/' then
         match('/'); unaryfact(); putchar('/')
       else if lookahead = '%' then
         match('%'); unaryfact(); putchar('%')
       else if lookahead = '&' then
         match('&'); unaryfact(); putchar('&')
       else; break; endif; endif; endif; endif
     wend
   end
   func unaryfact(): void
   begin
     if lookahead = '-' then
       match('-'); fact(); putchar('-')
     else if lookahead = '~' then
       match('~'); fact(); putchar('~')
     else; fact(); endif; endif
   end
   func fact(): void
   begin
     if lookahead = '(' then
       match('('); expr(); match(')')
     else if isdigit(lookahead) then
       putchar(lookahead); match(lookahead)
     else; error(); endif; endif
   end
   func match(t: int): void
   begin
     if lookahead = t then
       lookahead = getchar()
     else; error(); endif
   end
   func error(): void
   begin
     printf("syntax error\n")
     exit(1)
   end
   Compile program:
   ./basico04<rpn3.bas >rpn3.s
   cc -o rpn3 rpn3.s
   Run program:
   ./rpn3
   7=1+2*3
   7123*+=
   The infix expression 7 test for equal with 1 plus 2 multiply by 3 is
   translated into the postfix expression push 7, 1, 2, 3, then multiply 2
   by 3, add this with 1 and test this for equal with 7.

Examples for Compiler bootstrapping

     * Niklaus Wirth published the source code of many small languages
       like PL/0 in 1976,  [47]PASCAL-S in 1975 and [48]Oberon-0
       ([49]sources) in 1996. These languages can not bootstrap
       themselves. BASICO follows the PASCAL syntax.
     * [50]P4-Pascal and [51]UCSD-Pascal can bootstrap themselves.
     * Dennis Ritchie used compiler bootstrapping for the UNIX C compiler.
       Today the sources of the "[52]last1120c" compiler are on his home
       page. This compiler has no struct and creates PDP11/20 object code.
       Together with the UNIX version 1 binaries and the PDP11 emulator
       you can go back to 1973 and program like the old heros. Even the
       UNIX version 1 man pages are available again. BASICO follows the C
       semantics.
     * Jack W. Crenshaw published between 1988 and 1989 the [53]TINY
       language. He used characters instead of constants for keywords in
       the parser. This idea is used in BASICO, too. TINY can not
       bootstrap itself. BASICO follows TINY's code generation ideas.
     * [54]Euphoria is a BASIC like programming language. Programs can run
       interpreted or compiled with help of an Euphoria to C translator.
       There is a free Euphoria interpreter in Euphoria.
     * Edmund Grimley Evans wrote in 2001 "[55]Bootstrapping a simple
       compiler from nothing". He started on the pure binary machine code
       level. The language is Forth like.
     * Fabrice Bellard wrote in 2002 the [56]Obfuscated Tiny C Compiler
       (OTCC). Macro extended with gcc -E and pretty printed with indent
       the source is 463 lines and can compile itself on Linux into 80386
       machine code. See this masterpiece yourself ([57]readable otcc.c).

Symbol Table

   Niklaus Wirth used in PASCAL-S a two-dimensional symbol table. BASICO
   provides only one-dimensional arrays. The compiler program itself has
   to do the abstraction from 1-dim to 2-dim.
   The identifiers in the program can be keywords (if, then, else), global
   variable names, function names, function parameter names or local
   variable names. The search in the symbol table is keywords first, then
   local and parameter variables and last global variables and function
   names. These three namespaces (keywords, local, global) can be
   implemented with three symbol tables. Or in one symbal table that has
   three sections. The first approach is implemented.
   The variable name is the unique identifier in the symbol table to get
   and set the attributes of the variable. One attribute is the variable
   type like int, char, array-of-int and array-of char. Another is the
   storage type like global storage or local storage. One more attribute
   for parameter and local variables is the frame-pointer offset
   (frame-pointer relative memory location). See symtab.c for details.

Code Generator

   The recursive decent parser creates op-codes for a stack-machine.
   Normal microprocessors are register machines that can handle 1 to 3
   addresses per op-code. The Intel x86 chips or the Renesas R8C chip can
   handle 2 addresses. To bridge the gap between stack-machine and
   2-address machine two or more registers of the CPU are used as
   stack-cache, that is TOS (top of stack), NOS (next of stack), third of
   stack and fourth of stack are in registers. Instead of moving the
   contents of the CPU registers to perform the push and pop stack
   operations, the stack-machine cache labels (TOS, NOS, THIRD, FOURTH)
   are re-mapped. Therefore CPU register %eax can be NOS at one time and
   TOS at another time. See codegen.c for details. Some examples of code
   generation for the x86. The BASICO statement follows as comment the
   assembler code:
       movl    a, %eax
       movl    $1, %ebx
       subl    %ebx, %eax
       movl    %eax, x
   #   x=a-1
       movl    a, %eax
       movl    b, %ebx
       subl    %ebx, %eax
       movl    c, %ebx
       movl    d, %ecx
       subl    %ecx, %ebx
       movl    %edx, %esi
       cltd
       idivl   %ebx
       movl    %esi, %edx
       movl    %eax, x
   #   x=(a-b)/(c-d)
       movl    a, %eax
       negl    %eax
       movl    b, %ebx
       movl    c, %ecx
       movl    d, %edx
       imul    %edx, %ecx
       subl    %ecx, %ebx
       subl    %ebx, %eax
       movl    %eax, x
   #   x=-a-(b-c*d)
       movl    i, %eax
       movl    $1, %ebx
       addl    %ebx, %eax
       movl    n, %ebx
       movl    i, %ecx
       movl    bb(,%ecx,4), %ecx
       imul    %ecx, %ebx
       movl    %ebx, bb(,%eax,4)
   #   bb[i+1] = n*bb[i]
       movl    a, %eax
       movl    b, %ebx
       notl    %ebx
       andl    %ebx, %eax
       movl    a, %ebx
       notl    %ebx
       movl    b, %ecx
       andl    %ecx, %ebx
       orl    %ebx, %eax
       movl    %eax, x
   #   x=a&~b|~a&b

Peephole optimization

   [58]Wiktionary definition: An optimization that works by eliminating
   redundant instructions from a small area of code.
   Some examples of possible peephole optimization for the Basico
   compiler:

   original code
   optimized code
   comment
       movl    $0, %eax
       movl    %eax, -12(%ebp)     movl    $0, -12(%ebp) The x86 CPU can
   move a constant to a memory location.
       movl    $48, %ebx
       cmpl    %ebx, %eax
       cmpl    $48, %eax
   The x86 CPU can compare a constant with a register.
       movl    $48, %ebx
       subl    %ebx, %eax
       subl    $48, %eax
   The x86 CPU can subtract a constant from a register.
       movl    $0, %eax
       pushl   %eax
       pushl   $0
   The x86 CPU can push a constant.
       cmpl    %ebx, %eax
       sete    %al
       movzbl  %al, %eax
       andl    %eax, %eax
       jz      .L8
       cmpl    %ebx, %eax
       jnz     .L8 The compare op code sets the flags.
       cmpl    %ebx, %eax
       setg    %al
       movzbl  %al, %eax
       andl    %eax, %eax
       jz      .L10
       cmpl    %ebx, %eax
       jng    .L10
   The compare op code sets the flags.

   A peephole optimizer uses pattern matching to identify a code segment.

BASICO sources

   The source code of BASICO version 0.8 is discussed in detail. The
   findings of this code review are implemented in BASICO version 0.9.
   [59]BASICO source code reading

  Version 0.9

   Here are the bugfixes as mentioned in the source code reading.
   [60]BASICO version 0.9

  Version 0.8

   The statements of a function that is only called from one location is
   copied to this location (inline the function). This reduced the source
   lines to 975. The  stripped throw-away compiler binary is 16432 bytes
   long and needs 46ms to process the basico.bas input file, the BASICO
   version is 18124 bytes long and needs 82ms. The GNU C compiler was
   using -O1 optimization. The simple idea of using CPU registers as stack
   machine cache pays of very well.
   [61]BASICO version 0.8

  Version 0.7

   The parser calls now the code generation functions. The BASICO compiler
   is written in BASICO. Compiler bootstrapping is possible. For the input
   file basico.bas the BASICO version 0.5 compiler produces the same
   assembler code as the BASICO 0.7 compiler. The compiler is 1279 source
   lines long.
   [62]BASICO version 0.7

  Version 0.6

   Now scanner, symbol table and parser are in BASICO. The parser can
   successful parse itself. The BASICO parser was transcribed from the
   CoCo/R Parser.cpp output file of the basico06.atg grammar input file.
   [63]BASICO version 0.6

  Version 0.5

   The compiler components scanner and symbol table are now available in
   BASICO. The BASICO version scanner.bas was created by transcribing
   scanner.c. The input file scanner.bas produces the same output file
   with the BASICO version and the C version of the scanner - like it
   should be. The compiled C version is 5844 bytes long and needs 12ms to
   process the scanner.bas input file, the BASICO version is 9928 bytes
   long and needs 22ms.
   Version 0.51 combines the infix to postfix translator in BASICO with
   the scanner and symbol table. This is an intermediate step to the full
   BASICO parser in BASICO.
   [64]BASICO version 0.51
   [65]BASICO version 0.5

  Version 0.4

   The scanner is now written in C. For easy re-writing of the scanner in
   BASICO only a subset of C was used. The BASICO source is copied as
   comment into the output assembler listing.
   Version 0.41 has the infix to postfix translator examples.
   [66]BASICO version 0.41
   [67]BASICO version 0.4

  Version 0.3

   This version uses symbol tables. Now different variable types and
   different storage types are working. Even call-by-reference for arrays
   as function parameters is implemented. The first version of our
   throw-away compiler is ready. Now work moves from back-end (code
   generation) to front-end (scanner, parser).
   In version 0.31 the lex scanner translates the keywords into
   characters. The bison parser uses these characters as tokens. These
   changes prepare the parser for the new scanner. Another detail is the
   new handling of <= and >=. The boolean operator ^ for xor is new. The
   not operator is now ~ as in C.
   [68]BASICO version 0.31
   [69]BASICO version 0.3

  Version 0.2

   This version can compile expressions with global integer and global
   array-of-integer variables. Function call with return value is
   possible. Function call to C library functions with 3 parameters
   maximum is working, too.  The central missing element is the symbol
   table. The symbol table tells the type of the variable like char
   variable and/or local variable. Op-code generation for char and local
   variables is missing, too. But we have reached some level of "Tiny
   BASIC Compiler".
   Version 0.21 allows comments after a statement.
   [70]BASICO version 0.21
   [71]BASICO version 0.2

  Version 0.1

   This version uses lex and yacc to implement the parser of the
   throw-away compiler. The parser does not only check that the syntax is
   okay, but does emit some semantic information. Some very first ideas on
   symbol table and code generation are included too.
   [72]BASICO version 0.1

Books

   Compiler bauen mit UNIX
   Introduction to compiler construction with UNIX
   Axel-Tobias Schreiner with H. G. Friedman, Jr.,
   german 1985, Hanser, München, ISBN 3-446-14359-9;
   englisch 1985, Prentice-Hall Software Series, New York, ISBN
   0-13-474396-2;
   The book is no longer in print. But you can download the [73]example
   programs here.
   The UNIX Programming Environment
   Der UNIX - Werkzeugkasten. Programmieren mit UNIX
   Brian W. Kernighan, Rob Pike
   german 2002, Hanser, München, ISBN 978-3446142732;
   english 1984, Prentice Hall, Inc., ISBN 0-13-937681-X
   The english version of the book is still in print. The [74]example
   programs are here.

References

   1. http://www.andreadrian.de/tbng/BASICO_part1.html
   2. http://www.andreadrian.de/tbng/#mozTocId868038
   3. http://www.andreadrian.de/tbng/#mozTocId936395
   4. http://www.andreadrian.de/tbng/#mozTocId571548
   5. http://www.andreadrian.de/tbng/#mozTocId888819
   6. http://www.andreadrian.de/tbng/#mozTocId9185
   7. http://www.andreadrian.de/tbng/#mozTocId738865
   8. http://www.andreadrian.de/tbng/#mozTocId972790
   9. http://www.andreadrian.de/tbng/#mozTocId769710
  10. http://www.andreadrian.de/tbng/#mozTocId41406
  11. http://www.andreadrian.de/tbng/#mozTocId528848
  12. http://www.andreadrian.de/tbng/#mozTocId696380
  13. http://www.andreadrian.de/tbng/#mozTocId60605
  14. http://www.andreadrian.de/tbng/#mozTocId667307
  15. http://www.andreadrian.de/tbng/#mozTocId798052
  16. http://www.andreadrian.de/tbng/#mozTocId300437
  17. http://www.andreadrian.de/tbng/#mozTocId334313
  18. http://www.andreadrian.de/tbng/#mozTocId35039
  19. http://www.andreadrian.de/tbng/#mozTocId276292
  20. http://www.andreadrian.de/tbng/#mozTocId343981
  21. http://www.andreadrian.de/tbng/#mozTocId802188
  22. http://www.andreadrian.de/tbng/#mozTocId636454
  23. http://www.andreadrian.de/tbng/#mozTocId460571
  24. http://www.andreadrian.de/tbng/#mozTocId973061
  25. http://www.andreadrian.de/tbng/#mozTocId333375
  26. http://www.andreadrian.de/tbng/#mozTocId865845
  27. http://www.andreadrian.de/tbng/#mozTocId476164
  28. http://www.andreadrian.de/tbng/#mozTocId201449
  29. http://www.andreadrian.de/tbng/#mozTocId271148
  30. http://www.andreadrian.de/tbng/#mozTocId36461
  31. http://www.andreadrian.de/tbng/#mozTocId843374
  32. http://www.andreadrian.de/tbng/#mozTocId423416
  33. http://www.andreadrian.de/tbng/#mozTocId327801
  34. http://www.andreadrian.de/tbng/#mozTocId625187
  35. http://www.andreadrian.de/tbng/#mozTocId954198
  36. http://www.andreadrian.de/tbng/#mozTocId647194
  37. http://www.andreadrian.de/tbng/#mozTocId63465
  38. http://www.andreadrian.de/tbng/#mozTocId98862
  39. http://www.ddj.com/184406381
  40. http://www.acm.org/classics/oct95/
  41. http://dinosaur.compilertools.net/
  42. http://dinosaur.compilertools.net/
  43. http://www.bitsavers.org/pdf/dartmouth/BASIC_Oct64.pdf
  44. http://www.comune.basico.me.it/
  45. http://www.db.informatik.uni-kassel.de/Help/pascal/einfuehrung/pas_bnf.html
  46. http://ssw.jku.at/Coco/
  47. http://www.246.dk/pascals.html
  48. http://www.oberon.ethz.ch/books.html
  49. http://www.cs.inf.ethz.ch/%7Ewirth/books/CompilerConstruction/
  50. http://homepages.cwi.nl/%7Esteven/pascal/
  51. ftp://ftp.freepascal.org/pub/fpc/attic/ucsd-pascal/
  52. http://cm.bell-labs.com/who/dmr/
  53. http://compilers.iecc.com/crenshaw/
  54. http://www.rapideuphoria.com/index.html
  55. http://www.rano.org/bcompiler.html
  56. http://fabrice.bellard.free.fr/otcc/
  57. http://www.andreadrian.de/tbng/otcc1.c
  58. http://en.wiktionary.org/wiki/peephole_optimization
  59. http://www.andreadrian.de/tbng/BASICO_part1.html
  60. http://www.andreadrian.de/tbng/basico_20060715.tgz
  61. http://www.andreadrian.de/tbng/basico_20060629.tgz
  62. http://www.andreadrian.de/tbng/basico_20060628.tgz
  63. http://www.andreadrian.de/tbng/basico_20060625a.tgz
  64. http://www.andreadrian.de/tbng/basico_20060621.tgz
  65. http://www.andreadrian.de/tbng/basico_20060620.tgz
  66. http://www.andreadrian.de/tbng/basico_20060618.tgz
  67. http://www.andreadrian.de/tbng/basico_20060611h.tgz
  68. http://www.andreadrian.de/tbng/basico_20060611b.tgz
  69. http://www.andreadrian.de/tbng/basico_20060610c.tgz
  70. http://www.andreadrian.de/tbng/basico_20060609.tgz
  71. http://www.andreadrian.de/tbng/basico_20060605b.tgz
  72. http://www.andreadrian.de/tbng/tbng01_20060528.tgz
  73. http://www.cs.rit.edu/%7Eats/books/index.html
  74. http://cm.bell-labs.com/cm/cs/upe/