summaryrefslogtreecommitdiff
path: root/README
diff options
context:
space:
mode:
authorAndreas Baumann <mail@andreasbaumann.cc>2017-01-01 19:31:12 +0100
committerAndreas Baumann <mail@andreasbaumann.cc>2017-01-01 19:31:12 +0100
commit69fe7b182a1eedfb75c611f7dd35fa60200426f4 (patch)
tree329a0c6cc9b06c23d8782ece09f0f7dfa9b16b13 /README
downloadcompilertests-69fe7b182a1eedfb75c611f7dd35fa60200426f4.tar.gz
compilertests-69fe7b182a1eedfb75c611f7dd35fa60200426f4.tar.bz2
initial checkin
Diffstat (limited to 'README')
-rw-r--r--README96
1 files changed, 96 insertions, 0 deletions
diff --git a/README b/README
new file mode 100644
index 0000000..7e41d02
--- /dev/null
+++ b/README
@@ -0,0 +1,96 @@
+Wirth
+-----
+
+BNF -> EBNF: regular
+interpretation always rests on the syntactical analysis
+every sentence in language has exactly one syntactical structure,
+languages are preferrably context-free
+regular languages have no recursion
+example with why 815xx can't be an identifier => is not regular!
+regular languages can be checked with a finite state machine
+EBNF can be directly translated into code, this makes the charm of the solution
+(see yacc/bison)
+lexicographical analysis of concrete representation (example charset), next
+step is checking the syntax of the abstract terminal symbols
+-> scanner (regular language) and parser pattern (context free language)
+usual blank, space rule -> it's a scanner thing, seperates symbols, but has
+otherwise no meaning later (famous bad counterexample is python!)
+
+non-terminals get to (recursive) functions
+push down automaton for the context-free language
+recursive descent parsing, goal-oriented parsing
+the stack serves as "things we will takle later"
+one-symbol-lookahead
+
+LL: top-down parsing, recursive descending
+LL(n): n look-ahead top-down parser, cocor
+LL(1) being most common (1 symbol look-ahead)
+
+LR: bottom up parsing
+LALR(1): yacc
+LR(1): most general, but most complex
+LR(n): not practical
+
+EBNF syntax: Wirth/Algol or http://www.ietf.org/rfc/rfc2234.txt
+
+along to attributed grammar:
+type systems
+semantics
+code generation
+
+Aho
+---
+
+analysis:
+- check syntax
+- check semantics
+- symbol table
+- intermediate representation
+synthesis:
+- produce final target
+
+- syntax tree
+- intermediare machine-independent format
+- optimizations
+- code generation: optimized intermed format -> target language
+
+- multi-core, parallelism, memory hierarchies
+
+- static vs. dynamic scope => here the static keyword comes from!
+
+- environment: names to l-values
+- state: l-values to r-values
+
+- qualified names, e.g. "x.y"
+
+- declararation: about the type of things
+- definition: about the value of things
+
+- scopes
+ - declarations within block scopes
+- visibility of methods
+- dynamic scope
+ - prepocessor
+ - methods, RTTI (not ML)
+
+- call-by-value
+ - Java defacto, but many are disguised pointers, no choice,
+ primitives are by value, String/Array/objects by reference
+- call-by-reference
+ - for bigger structures
+- call-by-name: dead Algol60
+- aliases of parameters
+- syntax-directed translation scheme
+ - attributed grammars, attaching properties to nodes
+ - synthesized attributes
+ - attributed grammars with code fragments
+- descending, predictive parser, classical approach, when
+ written by hand
+- white spaces should be in the lexer/scanner, not in the parser/grammar!
+ Too complex otherwise (where is a space allowed in which part of
+ the production, tons of ambiguities)
+- keywords are reserved identifiers, easier to handle
+
+- vms
+ - https://github.com/tekknolagi/carp
+ - https://github.com/jakogut/tinyvm