Wirth
-----

BNF -> EBNF: regular
interpretation always rests on the syntactical analysis
every sentence in the language has exactly one syntactical structure,
languages are preferably context-free
regular languages have no recursion
example of why 815xx can't be an identifier => it is not regular!
regular languages can be checked with a finite state machine
EBNF can be translated directly into code; this is the charm of the solution
(see yacc/bison)
lexical analysis of the concrete representation (example: charset), next
step is checking the syntax of the abstract terminal symbols
-> scanner (regular language) and parser pattern (context-free language)
usual blank/space rule -> it's a scanner thing, separates symbols, but has
otherwise no meaning later (famous bad counterexample is Python!)
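
The FSM check above can be sketched in a few lines — a hypothetical minimal machine (not from the notes) for identifiers of the form letter {letter | digit}; 815xx is rejected because the very first character is a digit:

```python
# Finite state machine for: identifier = letter { letter | digit }.
# Two states: "start" and the accepting state "ident".

def is_identifier(s):
    state = "start"
    for ch in s:
        if state == "start":
            if ch.isalpha():
                state = "ident"
            else:
                return False      # first character must be a letter
        elif state == "ident":
            if ch.isalnum():
                state = "ident"   # stay in the accepting state
            else:
                return False      # any other character: reject
    return state == "ident"

print(is_identifier("x815"))      # True
print(is_identifier("815xx"))     # False
```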

non-terminals map to (recursive) functions
push-down automaton for the context-free language
recursive descent parsing, goal-oriented parsing
the stack serves as "things we will tackle later"
one-symbol lookahead
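
A sketch of those ideas for a toy grammar (the grammar and class names are illustrative, not from the notes): each non-terminal becomes a function, the call stack is the push-down automaton's stack, and `peek` is the one-symbol lookahead.

```python
# Recursive-descent recognizer for:
#   expr   = term { "+" term } .
#   term   = factor { "*" factor } .
#   factor = digit | "(" expr ")" .

class Parser:
    def __init__(self, text):
        self.toks = list(text)      # one-character tokens for simplicity
        self.pos = 0

    def peek(self):                 # the one-symbol lookahead
        return self.toks[self.pos] if self.pos < len(self.toks) else None

    def eat(self, tok):
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r}, got {self.peek()!r}")
        self.pos += 1

    def expr(self):
        self.term()
        while self.peek() == "+":   # lookahead decides: loop or stop
            self.eat("+")
            self.term()

    def term(self):
        self.factor()
        while self.peek() == "*":
            self.eat("*")
            self.factor()

    def factor(self):
        if self.peek() is not None and self.peek().isdigit():
            self.pos += 1
        elif self.peek() == "(":
            self.eat("(")
            self.expr()             # recursion = pushing work for later
            self.eat(")")
        else:
            raise SyntaxError(f"unexpected {self.peek()!r}")

def accepts(text):
    p = Parser(text)
    try:
        p.expr()
        return p.peek() is None     # all input must be consumed
    except SyntaxError:
        return False

print(accepts("1+2*(3+4)"))   # True
print(accepts("1+*2"))        # False
```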

LL: top-down parsing, recursive descent
LL(n): n look-ahead top-down parser, Coco/R
LL(1) being most common (1 symbol look-ahead)

LR: bottom-up parsing
LALR(1): yacc
LR(1): most general, but most complex
LR(n): not practical

EBNF syntax: Wirth/Algol or http://www.ietf.org/rfc/rfc2234.txt

building on attributed grammars:
type systems
semantics
code generation
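
A minimal sketch of an attributed grammar (assumed toy grammar, not from the notes): each non-terminal's function returns a synthesized attribute — here, the numeric value — so parsing and evaluation happen in one recursive descent.

```python
# Attributed recursive descent for:
#   expr   = term { "+" term } .
#   term   = factor { "*" factor } .
#   factor = digit | "(" expr ")" .
# The return value of each function is the synthesized attribute.

def evaluate(text):
    toks, pos = list(text), [0]

    def peek():
        return toks[pos[0]] if pos[0] < len(toks) else None

    def eat(t):
        assert peek() == t, f"expected {t!r}"
        pos[0] += 1

    def expr():
        v = term()
        while peek() == "+":
            eat("+")
            v += term()          # attribute flows bottom-up
        return v

    def term():
        v = factor()
        while peek() == "*":
            eat("*")
            v *= factor()
        return v

    def factor():
        if peek() == "(":
            eat("(")
            v = expr()
            eat(")")
            return v
        v = int(peek())
        pos[0] += 1
        return v

    return expr()

print(evaluate("1+2*(3+4)"))     # 15
```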

Aho
---

analysis:
- check syntax
- check semantics
- symbol table
- intermediate representation
synthesis:
- produce final target

- syntax tree
- intermediate machine-independent format
- optimizations
- code generation: optimized intermediate format -> target language
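
A sketch (hypothetical function names) of that middle step: lowering a tiny syntax tree into a machine-independent three-address format, the representation that sits between analysis and target code generation.

```python
from itertools import count

def lower(tree):
    """Flatten a nested (op, left, right) tree into three-address code."""
    code, temps = [], count(1)

    def go(node):
        if isinstance(node, int):          # leaf: a constant
            return str(node)
        op, left, right = node
        l, r = go(left), go(right)
        t = f"t{next(temps)}"              # fresh temporary per result
        code.append(f"{t} = {l} {op} {r}")
        return t

    go(tree)
    return code

# 1 + 2 * 3
print(lower(("+", 1, ("*", 2, 3))))
# ['t1 = 2 * 3', 't2 = 1 + t1']
```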

- multi-core, parallelism, memory hierarchies

- static vs. dynamic scope => this is where the static keyword comes from!
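
A sketch contrasting the two scope rules. Python itself is statically (lexically) scoped; dynamic scope is simulated below with an explicit, hypothetical stack of environments that the callee searches at run time.

```python
# Static scope: a name is resolved where the function is written.
def make_reader():
    x = "lexical"
    def read():
        return x                  # bound to the enclosing definition
    return read

print(make_reader()())            # lexical

# Dynamic scope (simulated): the most recent binding on the call
# stack wins, regardless of where the callee was written.
env_stack = [{"x": "global"}]

def dyn_lookup(name):
    for frame in reversed(env_stack):
        if name in frame:
            return frame[name]
    raise NameError(name)

def callee():
    return dyn_lookup("x")

def caller():
    env_stack.append({"x": "caller"})   # caller's binding shadows
    try:
        return callee()
    finally:
        env_stack.pop()

print(caller())                   # caller
```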

- environment: names to l-values
- state: l-values to r-values
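
The two mappings composed, as a sketch (locations are just integers here, an assumption for illustration): the environment takes names to l-values, the state takes l-values to r-values; assignment mutates the state, never the environment.

```python
environment = {"x": 0, "y": 1}    # name -> location (l-value)
state = {0: 10, 1: 20}            # location -> contents (r-value)

def rvalue(name):
    return state[environment[name]]

def assign(name, value):
    state[environment[name]] = value   # state changes, environment doesn't

assign("x", 99)
print(rvalue("x"))                # 99

# Aliasing: two names sharing one l-value.
environment["z"] = environment["x"]
assign("z", 7)
print(rvalue("x"))                # 7 -- x changed through its alias z
```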

- qualified names, e.g. "x.y"

- declaration: about the type of things
- definition: about the value of things

- scopes
  - declarations within block scopes
- visibility of methods
- dynamic scope
  - preprocessor
  - methods, RTTI (not ML)

- call-by-value
  - Java de facto, but many are disguised pointers, no choice,
    primitives are by value, String/Array/objects by reference
- call-by-reference
  - for bigger structures
- call-by-name: dead since Algol 60
- aliases of parameters
- syntax-directed translation scheme
  - attributed grammars, attaching properties to nodes
    - synthesized attributes
  - attributed grammars with code fragments
- descending, predictive parser: the classical approach when
  written by hand
- white spaces should be in the lexer/scanner, not in the parser/grammar!
  Too complex otherwise (where is a space allowed in which part of
  the production, tons of ambiguities)
- keywords are reserved identifiers, easier to handle
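
The Java situation described above can be reproduced in Python (both pass object references by value): reassigning the parameter never affects the caller, but mutating the referenced object does — the "disguised pointer".

```python
def reassign(lst):
    lst = [99]            # rebinds only the local copy of the reference

def mutate(lst):
    lst.append(99)        # follows the reference to the shared object

data = [1, 2, 3]
reassign(data)
print(data)               # [1, 2, 3] -- caller unaffected
mutate(data)
print(data)               # [1, 2, 3, 99] -- caller sees the mutation
```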

- VMs
  - https://github.com/tekknolagi/carp
  - https://github.com/jakogut/tinyvm
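
In the spirit of the VMs linked above, a minimal sketch (not based on either project): a stack-based bytecode interpreter with a handful of opcodes.

```python
def run(program):
    """Execute a list of (opcode, *args) tuples on an operand stack."""
    stack = []
    for op, *args in program:
        if op == "push":
            stack.append(args[0])
        elif op == "add":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "mul":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
        else:
            raise ValueError(f"unknown opcode {op!r}")
    return stack.pop()            # result is the top of stack

# 1 + 2 * 3, already flattened into evaluation order
print(run([("push", 2), ("push", 3), ("mul",),
           ("push", 1), ("add",)]))          # 7
```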

- error handling in compilers
  - in the times of punch cards, or with C++ compiling thousands of lines,
    it's essential to get as many error reports per compiler run as possible.
    Modular linking and language design should result in compiles of <1000
    lines of code, so redoing a compile should really not be a problem on
    today's computers.
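
One common way to get many diagnostics per run is panic-mode recovery: on a bad statement, record the error, skip ahead to a synchronizing token, and keep going. A sketch with a hypothetical statement shape (IDENT '=' NUMBER ';') and ";" as the synchronizing token:

```python
def check_statements(tokens):
    """Collect one diagnostic per bad statement instead of stopping."""
    errors, i = [], 0
    while i < len(tokens):
        stmt = []
        while i < len(tokens) and tokens[i] != ";":
            stmt.append(tokens[i])
            i += 1
        i += 1                    # skip the ";" -- resynchronized here
        ok = (len(stmt) == 3 and stmt[0].isidentifier()
              and stmt[1] == "=" and stmt[2].isdigit())
        if not ok:
            errors.append(f"bad statement: {' '.join(stmt)}")
    return errors

toks = ["x", "=", "1", ";", "=", "2", ";", "y", "=", "3", ";"]
for e in check_statements(toks):
    print(e)                      # bad statement: = 2
```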