Wirth
-----
BNF -> EBNF:
- regular interpretation always rests on the syntactical analysis
- every sentence in the language has exactly one syntactical structure
- languages are preferably context-free
- regular languages have no recursion
- example: why "815xx" can't be an identifier => it is not a word of the regular identifier language!
- regular languages can be checked with a finite state machine
- EBNF can be directly translated into code; this is the charm of the solution (see yacc/bison)
- lexicographical analysis of the concrete representation (example: charset); the next step is checking the syntax of the abstract terminal symbols -> scanner (regular language) and parser (context-free language)
- the usual blank/space rule is a scanner thing: it separates symbols but otherwise has no meaning later (famous bad counterexample: Python!)
- non-terminals become (recursive) functions
- push-down automaton for the context-free language
- recursive descent parsing, goal-oriented parsing
- the stack serves as "things we will tackle later"
- one-symbol lookahead
- LL: top-down parsing, recursive descent
  - LL(n): n-symbol look-ahead top-down parser, Coco/R; LL(1) most common (1 symbol look-ahead)
- LR: bottom-up parsing
  - LALR(1): yacc
  - LR(1): most general, but most complex
  - LR(n): not practical
- EBNF syntax: Wirth/Algol or http://www.ietf.org/rfc/rfc2234.txt
- leads on to attributed grammars: type systems, semantics, code generation

Aho
---
- analysis:
  - check syntax
  - check semantics
  - symbol table
  - intermediate representation
- synthesis:
  - produce final target
  - syntax tree
  - intermediate machine-independent format
  - optimizations
  - code generation: optimized intermediate format -> target language
  - multi-core, parallelism, memory hierarchies
- static vs. dynamic scope => this is where the `static` keyword comes from!
- environment: names to l-values
- state: l-values to r-values
- qualified names, e.g.
"x.y" - declararation: about the type of things - definition: about the value of things - scopes - declarations within block scopes - visibility of methods - dynamic scope - prepocessor - methods, RTTI (not ML) - call-by-value - Java defacto, but many are disguised pointers, no choice, primitives are by value, String/Array/objects by reference - call-by-reference - for bigger structures - call-by-name: dead Algol60 - aliases of parameters - syntax-directed translation scheme - attributed grammars, attaching properties to nodes - synthesized attributes - attributed grammars with code fragments - descending, predictive parser, classical approach, when written by hand - white spaces should be in the lexer/scanner, not in the parser/grammar! Too complex otherwise (where is a space allowed in which part of the production, tons of ambiguities) - keywords are reserved identifiers, easier to handle - vms - https://github.com/tekknolagi/carp - https://github.com/jakogut/tinyvm - error handling in compilers - in times of punch cards or C++ compiling 1000nds of lines it's essential to get as many error reports per compiler run as possible. Modular linking and language should result in <1000 lines code compiles, so redoing it should really not be a problem on nowadays computers.