diff options
author | Andreas Baumann <mail@andreasbaumann.cc> | 2017-01-01 19:31:12 +0100 |
---|---|---|
committer | Andreas Baumann <mail@andreasbaumann.cc> | 2017-01-01 19:31:12 +0100 |
commit | 69fe7b182a1eedfb75c611f7dd35fa60200426f4 (patch) | |
tree | 329a0c6cc9b06c23d8782ece09f0f7dfa9b16b13 /README | |
download | compilertests-69fe7b182a1eedfb75c611f7dd35fa60200426f4.tar.gz compilertests-69fe7b182a1eedfb75c611f7dd35fa60200426f4.tar.bz2 |
initial checkin
Diffstat (limited to 'README')
-rw-r--r-- | README | 96 |
1 files changed, 96 insertions, 0 deletions
@@ -0,0 +1,96 @@ +Wirth +----- + +BNF -> EBNF: regular +interpretation always rests on the syntactical analysis +every sentence in language has exactly one syntactical structure, +languages are preferrably context-free +regular languages have no recursion +example with why 815xx can't be an identifier => is not regular! +regular languages can be checked with a finite state machine +EBNF can be directly translated into code, this makes the charm of the solution +(see yacc/bison) +lexicographical analysis of concrete representation (example charset), next +step is checking the syntax of the abstract terminal symbols +-> scanner (regular language) and parser pattern (context free language) +usual blank, space rule -> it's a scanner thing, seperates symbols, but has +otherwise no meaning later (famous bad counterexample is python!) + +non-terminals get to (recursive) functions +push down automaton for the context-free language +recursive descent parsing, goal-oriented parsing +the stack serves as "things we will takle later" +one-symbol-lookahead + +LL: top-down parsing, recursive descending +LL(n): n look-ahead top-down parser, cocor +LL(1) being most common (1 symbol look-ahead) + +LR: bottom up parsing +LALR(1): yacc +LR(1): most general, but most complex +LR(n): not practical + +EBNF syntax: Wirth/Algol or http://www.ietf.org/rfc/rfc2234.txt + +along to attributed grammar: +type systems +semantics +code generation + +Aho +--- + +analysis: +- check syntax +- check semantics +- symbol table +- intermediate representation +synthesis: +- produce final target + +- syntax tree +- intermediare machine-independent format +- optimizations +- code generation: optimized intermed format -> target language + +- multi-core, parallelism, memory hierarchies + +- static vs. dynamic scope => here the static keyword comes from! + +- environment: names to l-values +- state: l-values to r-values + +- qualified names, e.g. "x.y" + +- declararation: about the type of things +- definition: about the value of things + +- scopes + - declarations within block scopes +- visibility of methods +- dynamic scope + - prepocessor + - methods, RTTI (not ML) + +- call-by-value + - Java defacto, but many are disguised pointers, no choice, + primitives are by value, String/Array/objects by reference +- call-by-reference + - for bigger structures +- call-by-name: dead Algol60 +- aliases of parameters +- syntax-directed translation scheme + - attributed grammars, attaching properties to nodes + - synthesized attributes + - attributed grammars with code fragments +- descending, predictive parser, classical approach, when + written by hand +- white spaces should be in the lexer/scanner, not in the parser/grammar! + Too complex otherwise (where is a space allowed in which part of + the production, tons of ambiguities) +- keywords are reserved identifiers, easier to handle + +- vms + - https://github.com/tekknolagi/carp + - https://github.com/jakogut/tinyvm |