summaryrefslogtreecommitdiff
path: root/miniany/doc/lotabout.me_2016_Let-s-Build-a-C-Interpreter-0.txt
blob: 78b8c4f4d87c61016a21d76515613fb59daefb28 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
   #[1]alternate

   [2]Home[3]Books[4]About
   [5]C[6]compiler
   2016-02-06

Let's Build a C Compiler(0) -- Preface

   Table of Contents
    1. [7]1. Why Compiler Theory
    2. [8]2. Hard to understand, hard to implement?
    3. [9]3. Original intention is for self-practicing
    4. [10]4. Caution
    5. [11]5. References

   EDIT: Note that I've include the full tutorial in the project
   [12]write-a-C-interpreter. Please check that instead.

   In "Let's Build a C Compiler" series, we will build a compiler from
   scratch for C programming language. I hope you will get some
   understanding of compiler construction by the end of this tutorial. At
   the same time we will build a usable compiler of C though some syntaxes
   are not supported.

   Note that it is actually an Interpreter and can interpret itself. I use
   the word "compiler" because it is more attractive, but we did more than
   that. Also this series is actually written in Chinese in the first
   place, If you are confused by my English, please leave some comments.

   In this very first chapter there will not be any code. If you are those
   that likes code instead of texts, please skip. I'll talk about the
   intention of this series.

Why Compiler Theory

   What is the most important courses in computer science? I would give
   "Data Structure", "Algorithm" and "Compiler Theory". In my point of
   view, understanding recursion is the first level for programmers, and
   writing a compiler is the next one.

   (Of course, there exists a lot of excellent programmers that don't
   write a compiler, but at least writing one is a big challenge)

   People used to say that you can write more efficient code if you know
   how the compiler works. But who cares when the modern computers have
   performance so high that we can hardly imagine before? Then why bother
   with compiler theory?

   Because it is cool!

   OK, admit it, you are still reading mainly because you are curious how
   far would I go with this tutorial. But be careful, it will go quite
   far.

   No? You just want to know how to build a compiler? OK then... my
   mistake.

Hard to understand, hard to implement?

   I have always been in awe of compiler. So when I went to college and
   they taught compiler theory, I was so enthusiastic! And then... then I
   quit, because I could not understand a single part.

   Normally a course about compiler will cover:
    1. How to represent syntax (such as BNF, etc.)
    2. Lexer, with somewhat NFA(nondeterministic finite automata),
       DFA(deterministic finite automata).
    3. Parser, such as recursive descent, LL(k), LALR, etc.
    4. Intermediate Languages.
    5. Code generation.
    6. Code optimization.

   I believe that most(98%) students will not care anything beyond
   parser(at least in my school). And the most important thing is: we
   still don't know how to build a compiler! Even after all these
   theories. The main reason is that what "Compiler Theory" try to teach
   is actually "how to build a parser generator", namely a tool that
   consumes syntax grammar and generates compiler (such as lex/yacc).

   These theories try to taught us how to solve problems in a common way
   automatically. That means once you master them, you are able to deal
   with all kinds of grammars. They are indeed useful in industry.
   Nevertheless they are too powerful and too complicate for students and
   most programmers. You will be convinced if you read the source code of
   lex/yacc (or flex/bison).

   The good news is, building a compiler is far simpler than you'd ever
   imagined. I won't lie, it is not easy, but not that hard.

Original intention is for self-practicing

   I saw [13]c4 on Github. It is a small C interpreter which is claimed to
   be implemented by only 4 functions. The most amazing part is that it is
   bootstrapping (that interpret itself). Also it is done with about 500
   lines!

   Existing tutorials is either very simple(such as implementing a simple
   calculator) or using automation tools(such as flex/bison). c4 is
   implemented all on its own. The bad thing is that it try to be minimal,
   so the code is quite a mess, hard to understand. So I started a new
   project that:
    1. implement a working C compiler(interpreter actually)
    2. Writing this tutorial to show how to do it.

   c4 is about 500 Lines, it took 1 week for me to re-write it, resulting
   1400 lines including comments. The project is hosted on Github:
   [14]Write a C Interpreter

   Note: Almost all logic of this project is taken from c4. So the
   original author(rswier) takes credit.

Caution

   Two major problem I met when I working with this project are:
    1. boring, there will be codes that are almost identical.
    2. hard to debug. We don't have good test cases. On the other hand if
       the output is wrong, I could only follow the generated code all by
       myself to debug.

   So I hope you'll take out enough time and patience for studying, cause
   I am sure that you will feel a great sense of accomplishment just like
   I do.

References

    1. [15]Let's Build a Compiler: a very good tutorial of building a
       compiler for fresh starters.
    2. [16]Lemon Parser Generator: the parser generator that is used in
       SQLite. Good to read if you won't to understand compiler theory in
       code.

   In the end, I am human with a general level, there will be inevitably
   wrong with the articles and codes(also my English). Feel free to
   correct me!

   Hope you enjoy it.

   Please enable JavaScript to view the [17]comments powered by Disqus.
   [18]Blog comments powered by Disqus

References

   Visible links:
   1. https://lotabout.me/atom.xml
   2. https://lotabout.me/
   3. https://lotabout.me/books
   4. https://lotabout.me/about
   5. https://lotabout.me/tags/C/
   6. https://lotabout.me/tags/compiler/
   7. https://lotabout.me/2016/Let-s-Build-a-C-Interpreter-0/#why-compiler-theory
   8. https://lotabout.me/2016/Let-s-Build-a-C-Interpreter-0/#hard-to-understand-hard-to-implement
   9. https://lotabout.me/2016/Let-s-Build-a-C-Interpreter-0/#original-intention-is-for-self-practicing
  10. https://lotabout.me/2016/Let-s-Build-a-C-Interpreter-0/#caution
  11. https://lotabout.me/2016/Let-s-Build-a-C-Interpreter-0/#references
  12. https://github.com/lotabout/write-a-C-interpreter/tree/master/tutorial/en
  13. https://github.com/rswier/c4
  14. https://github.com/lotabout/write-a-C-interpreter
  15. http://compilers.iecc.com/crenshaw/
  16. http://www.hwaci.com/sw/lemon/
  17. http://disqus.com/?ref_noscript
  18. http://disqus.com/

   Hidden links:
  20. https://lotabout.me/2016/Let-s-Build-a-C-Interpreter-0/#why-compiler-theory
  21. https://lotabout.me/2016/Let-s-Build-a-C-Interpreter-0/#hard-to-understand-hard-to-implement
  22. https://lotabout.me/2016/Let-s-Build-a-C-Interpreter-0/#original-intention-is-for-self-practicing
  23. https://lotabout.me/2016/Let-s-Build-a-C-Interpreter-0/#caution
  24. https://lotabout.me/2016/Let-s-Build-a-C-Interpreter-0/#references