initial checkin

author: Andreas Baumann <mail@andreasbaumann.cc> 2017-01-01 19:31:12 +0100
committer: Andreas Baumann <mail@andreasbaumann.cc> 2017-01-01 19:31:12 +0100
commit: 69fe7b182a1eedfb75c611f7dd35fa60200426f4 (patch)
tree: 329a0c6cc9b06c23d8782ece09f0f7dfa9b16b13 /README.Aitken
download: compilertests-69fe7b182a1eedfb75c611f7dd35fa60200426f4.tar.gz
compilertests-69fe7b182a1eedfb75c611f7dd35fa60200426f4.tar.bz2
1 files changed, 205 insertions, 0 deletions
diff --git a/README.Aitken b/README.Aitken
new file mode 100644
index 0000000..35648e5
--- /dev/null
+++ b/README.Aitken
@@ -0,0 +1,205 @@
+Cool
+spim
+string -> lexical -> token = <class,string>
+single characters token classes (brakets, semicolon)
+= assignment token, but == operator class
+words of a program. lexems
+<token class, lexeme> -> token
+lexical systems should have a minimal lookahead (1 character is normal)
+- ahead examples: == and =
+- T<T> vs >>. V<W<T>>
+regular languages
+-> sigma, alphabet, sigma* (kleen)
+- regular expression: e, 1 char, union, concatenation, iteration (kleen)
+- regular language for the set of strings in tokens
+  - epsilon and on character-only language
+  - union, concat, iteration
+- format languages
+- formal lanugae
+- meaning function L(e)=M
+  - map something in syntax to semantics
+  - L: expression -> sets(string)
+  - semantic the same, syntax is arabic vs roman numbers, very good example
+  - mulitple expression usually have the same meaning/semantic
+  - L: N:1 function, basis of optimization
+- regex for parsing
+- deducelt
+- lexical specification
+  - monster regex
+  - maximal munch, also choose the longer token
+  - ambiguities: keyword vs. identifier
+    => priority ordering (firstly defined)
+  - no rules matches: make an error production
+  - finite automata, implementation for regular expressions
+  - implementation of lexical analyzers using regular expressions
+    - epsilon-move, state transition without input
+      DFA have no choice, no epsilon-moves, one transition per input per state
+      deterministic
+      NFA: non-deterministic
+    - DFA: 2-dim array with state and input as dimensions, resulting state
+      as matrix element, share duplicate state transitions in lists of
+      vectors lists per input,
+    - table-driven DFA
+    - no NFA to DFA transition at all
+    - table-driven NFA, 2-dim matrix
+    - CoCo/R: DFA or NFA, which switches
+- chapter 5
+  - regular languages can only count mod k
+  - context-free grammars (CFG)
+  - CFG gives us: program valid or not, nothing more
+  - there is a N:1 relation from grammar to language, quite likely
+    to fit the grammar to the specific tools at hand
+  - parse tree derivation with production rules
+    N derivations define a parse tree of a language
+    left-mode derivation, right-mode derivation
+  - ambiguous grammer
+    - rewrite grammar
+    - prioritize production rules
+    - example: expression -> term and factor
+    - classical optional else and nested if problem
+      else -> closest then
+    - ambiguous grammar, precedence and associativity (%left +)
+      -> more natural language, used in tools
+    - Oberon and hand-written parsers use the first approach
+  - errors: lexical, syntax, semantical
+    currently used:
+    - panic mode (bison: error, skip)
+    - error productions (e.g gcc warnings about problematic constructs)
+  - we don't want the parse tree, we need AST, abstract syntax tree
+    AST is simpler than the parse tree, algorithms get easier on an AST
+  - top-down
+  - bottom-up
+  - rd parser (recursive descendant): most common one, very easy to
+    write by hand. top-down
+    - very easy to implement
+    - left recursion -> use right recursion (rewrite grammar), must
+      have terminals in productions to make the problem decidabale
+      S -> Sa | b1 | b2 | .. | bn
+      =>
+      S -> b1 S' | b2 S' | ... | bn S'
+      S' -> a S'
+      -> generic case is loops of non-terminals inducing left-recusion
+      => parser generators can do this automatically making the
+         grammar human-readable!
+      => nevertheless done by hand to keep control over semantics
+    - gcc used bison/flex, now a hand-written recursive decent
+  - predictive parsing, top-down, no backtracking, LL(k)-grammar
+    left-to-right-scan, left-most-derivation, k: k tokens look-ahead
+    practice k=1, LL(1)
+    - needs left-factoring
+    - table-constructable
+    - Coco/R is LL(k)
+    - LL(1) table construction
+  - things we cannot do with parsers alone:
+    - checking whether identifiers are declared before used
+    - checking validity of function parameters and if they fit to
+      their declaration
+  - first sets first(A)
+    t follow(A)
+    => construct LL(1) parsing table
+  - LL(1) can be checked by double entries in the parsing table
+    - non-left factored
+    - left-recursive
+    - ambiguous
+    - k>1 lookahead
+    - ...
+    - most programming languages are not LL(1)
+  - bottom-up, as efficient and more general, this could explain why
+    LR and LALR are more popular today
+    - work on left-recrusve grammars
+    - reduction instead of productions
+    - right-most derivation in reverse
+    - shift-reduce-parsing is the general form
+      - shift-reduce conflicts, happens
+      - reduce-reduce conflicts, grammar problem
+    - reduce only at handles, otherwise shift
+    - handle are places on the top of the stack which still
+      lead to the start symbol by n reductions
+    - algorithm is very easy compared to LL(1), but how to
+      find the handles in the grammar is tricky
+  - LR(k) grammars, used in practice LALR(k) grammars,
+    SLR(k) grammar is even more restrictive
+  - set of viable prefixes (handles) is a regular language
+  - LR(0) items (dots in the grammar)
+  - stack contains stacked prefixes of rhs of productions
+    we are currently working on, so if one matches we are
+    guarnateed we don't get stuck and can append it to
+    the stack of prefixes
+  - NFA on the stack of the shift/reduce parser, every state
+    is accepting, e-transitions are when we hope for another
+    production to produce a valid reduction, seeing terminals
+    or non-terminals on the stack leads to a state, where the
+    symbol has been consumed.
+  - canonical collections of LR(0) items, DFA of the NFA
+  - LR(0) parsing 
+    - if X->S in DFA(stack) -> reduce, S is terminal or non-termonal
+    - if X->t in DFA(stack) -> shift, consume t
+    - reduce/reduce conflict: N choices for a reduction to pick
+    - shift/reduce conflict: cannot decide whether to shift or reduce
+    - LR(0) -> SLR, avoid conflict by seing beyong he stack,
+      don't recude if the terminal is not in a follow statte of
+      a production
+    - SLR grammar is defined by heuristical algorithm, not precise
+    - SLR precedence declarations, E*E higher than E+E so we can
+      reduce the * instead of shifting the +.
+      => should be named shift/reduce conflict resolution rules
+    - the tools show the parsing automaton, so we see were we
+      have conflicts in the grammar
+    - in practice SLR(1)
+    - having eos always in follow(X) means if we are out of output
+      our only hope is to reduce.
+  - SLR improvements
+    - remember state of each stack prefix
+  - SLR(1) again cannot handle certain situations in real grammars:
+    - LR(1) is used: lookahead of one is one item + one symbol in
+      input instead of just the one symbol and the follow sets
+      those are the things we see in yacc
+- semantic analysis
+  - scoping
+    - static scoping won over dynamic scoping
+  - types
+    - classes == types in modern OOP languages
+    - statically typed (C, C++, Java, well mostly, unsafe casts for exceptions)
+    - dynamically typed (Python)
+    - untyped (machine code)
+    - declarations of types for every identifier
+    - infer types for all expressions
+    - type checking and type inference
+  - logical rules of inference
+    - find best valid rule, "34" is int is better than "34" is object, though latter is also valid
+    - no external data for proving an inference rule
+    - type environment , types for free variables
+    - least upper bound of two subtypes
+  - dynamic type of an object at runtime, compile time type (static time)
+    x:A, later x = new B (static type A, dynamic type B)
+  - SELF_TYPE, return value can change at runtime
+    - type checking environemnt is O,M,C (C for SELF_TYPE)
+    - x@T: static dispatch
+    - academic to show the concepts, in practice not important
+  - type errors:
+    - assign Object to unknown types => cascading not-understandable errors
+    - No_type: internal type, subtype of all classes, every operation defined and
+      results in No_type => not a tree, harder to implemnent
+- front-end, lexer, parser, semantic analysis
+- back-end
+  - compile-time and runtime correlation
+  - code, data arangment
+  - code genreation
+    - correct and fast
+    - method activations
+      - exceptions, corooutines, call/cc
+  - data
+    - stack of activations (currently running procedure frame)
+      activation record (AR) or frame
+    - AR: no control link as we know where we are (array as stack), also return
+      values are usually passed in registers
+    - frame contents is in registers for real compilers
+    - global, statically allocated, also new objects (heap)
+    - classic code (text), data, stack and heap
+    - alignment is important, SIBGUS SPARC, ARM we choose, Intel allowed but performance penalty
+      - padding
+  - code generation
+    - stack machines
+    - register machine
+    - 1-register stack machine, accumulator, acc is the 1-position extensions of the top of the stack
+  
+\ No newline at end of file
author	Andreas Baumann <mail@andreasbaumann.cc>	2017-01-01 19:31:12 +0100
committer	Andreas Baumann <mail@andreasbaumann.cc>	2017-01-01 19:31:12 +0100
commit	69fe7b182a1eedfb75c611f7dd35fa60200426f4 (patch)
tree	329a0c6cc9b06c23d8782ece09f0f7dfa9b16b13 /README.Aitken
download	compilertests-69fe7b182a1eedfb75c611f7dd35fa60200426f4.tar.gz compilertests-69fe7b182a1eedfb75c611f7dd35fa60200426f4.tar.bz2