1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
|
#[1]alternate
[2]Home[3]Books[4]About
[5]C[6]compiler
2016-02-06
Let's Build a C Compiler(0) -- Preface
Table of Contents
1. [7]1. Why Compiler Theory
2. [8]2. Hard to understand, hard to implement?
3. [9]3. Original intention is for self-practicing
4. [10]4. Caution
5. [11]5. References
EDIT: Note that I've include the full tutorial in the project
[12]write-a-C-interpreter. Please check that instead.
In "Let's Build a C Compiler" series, we will build a compiler from
scratch for C programming language. I hope you will get some
understanding of compiler construction by the end of this tutorial. At
the same time we will build a usable compiler of C though some syntaxes
are not supported.
Note that it is actually an Interpreter and can interpret itself. I use
the word "compiler" because it is more attractive, but we did more than
that. Also this series is actually written in Chinese in the first
place, If you are confused by my English, please leave some comments.
In this very first chapter there will not be any code. If you are those
that likes code instead of texts, please skip. I'll talk about the
intention of this series.
Why Compiler Theory
What is the most important courses in computer science? I would give
"Data Structure", "Algorithm" and "Compiler Theory". In my point of
view, understanding recursion is the first level for programmers, and
writing a compiler is the next one.
(Of course, there exists a lot of excellent programmers that don't
write a compiler, but at least writing one is a big challenge)
People used to say that you can write more efficient code if you know
how the compiler works. But who cares when the modern computers have
performance so high that we can hardly imagine before? Then why bother
with compiler theory?
Because it is cool!
OK, admit it, you are still reading mainly because you are curious how
far would I go with this tutorial. But be careful, it will go quite
far.
No? You just want to know how to build a compiler? OK then... my
mistake.
Hard to understand, hard to implement?
I have always been in awe of compiler. So when I went to college and
they taught compiler theory, I was so enthusiastic! And then... then I
quit, because I could not understand a single part.
Normally a course about compiler will cover:
1. How to represent syntax (such as BNF, etc.)
2. Lexer, with somewhat NFA(nondeterministic finite automata),
DFA(deterministic finite automata).
3. Parser, such as recursive descent, LL(k), LALR, etc.
4. Intermediate Languages.
5. Code generation.
6. Code optimization.
I believe that most(98%) students will not care anything beyond
parser(at least in my school). And the most important thing is: we
still don't know how to build a compiler! Even after all these
theories. The main reason is that what "Compiler Theory" try to teach
is actually "how to build a parser generator", namely a tool that
consumes syntax grammar and generates compiler (such as lex/yacc).
These theories try to taught us how to solve problems in a common way
automatically. That means once you master them, you are able to deal
with all kinds of grammars. They are indeed useful in industry.
Nevertheless they are too powerful and too complicate for students and
most programmers. You will be convinced if you read the source code of
lex/yacc (or flex/bison).
The good news is, building a compiler is far simpler than you'd ever
imagined. I won't lie, it is not easy, but not that hard.
Original intention is for self-practicing
I saw [13]c4 on Github. It is a small C interpreter which is claimed to
be implemented by only 4 functions. The most amazing part is that it is
bootstrapping (that interpret itself). Also it is done with about 500
lines!
Existing tutorials is either very simple(such as implementing a simple
calculator) or using automation tools(such as flex/bison). c4 is
implemented all on its own. The bad thing is that it try to be minimal,
so the code is quite a mess, hard to understand. So I started a new
project that:
1. implement a working C compiler(interpreter actually)
2. Writing this tutorial to show how to do it.
c4 is about 500 Lines, it took 1 week for me to re-write it, resulting
1400 lines including comments. The project is hosted on Github:
[14]Write a C Interpreter
Note: Almost all logic of this project is taken from c4. So the
original author(rswier) takes credit.
Caution
Two major problem I met when I working with this project are:
1. boring, there will be codes that are almost identical.
2. hard to debug. We don't have good test cases. On the other hand if
the output is wrong, I could only follow the generated code all by
myself to debug.
So I hope you'll take out enough time and patience for studying, cause
I am sure that you will feel a great sense of accomplishment just like
I do.
References
1. [15]Let's Build a Compiler: a very good tutorial of building a
compiler for fresh starters.
2. [16]Lemon Parser Generator: the parser generator that is used in
SQLite. Good to read if you won't to understand compiler theory in
code.
In the end, I am human with a general level, there will be inevitably
wrong with the articles and codes(also my English). Feel free to
correct me!
Hope you enjoy it.
Please enable JavaScript to view the [17]comments powered by Disqus.
[18]Blog comments powered by Disqus
References
Visible links:
1. https://lotabout.me/atom.xml
2. https://lotabout.me/
3. https://lotabout.me/books
4. https://lotabout.me/about
5. https://lotabout.me/tags/C/
6. https://lotabout.me/tags/compiler/
7. https://lotabout.me/2016/Let-s-Build-a-C-Interpreter-0/#why-compiler-theory
8. https://lotabout.me/2016/Let-s-Build-a-C-Interpreter-0/#hard-to-understand-hard-to-implement
9. https://lotabout.me/2016/Let-s-Build-a-C-Interpreter-0/#original-intention-is-for-self-practicing
10. https://lotabout.me/2016/Let-s-Build-a-C-Interpreter-0/#caution
11. https://lotabout.me/2016/Let-s-Build-a-C-Interpreter-0/#references
12. https://github.com/lotabout/write-a-C-interpreter/tree/master/tutorial/en
13. https://github.com/rswier/c4
14. https://github.com/lotabout/write-a-C-interpreter
15. http://compilers.iecc.com/crenshaw/
16. http://www.hwaci.com/sw/lemon/
17. http://disqus.com/?ref_noscript
18. http://disqus.com/
Hidden links:
20. https://lotabout.me/2016/Let-s-Build-a-C-Interpreter-0/#why-compiler-theory
21. https://lotabout.me/2016/Let-s-Build-a-C-Interpreter-0/#hard-to-understand-hard-to-implement
22. https://lotabout.me/2016/Let-s-Build-a-C-Interpreter-0/#original-intention-is-for-self-practicing
23. https://lotabout.me/2016/Let-s-Build-a-C-Interpreter-0/#caution
24. https://lotabout.me/2016/Let-s-Build-a-C-Interpreter-0/#references
|