Very early C compilers and language
Several years ago, Paul Vixie and Keith Bostic found a DECtape drive,
attached it to a VAX, and offered to read old DECtapes. Even at the
time, this was an antiquarian pursuit, and it presented an opportunity
to mine beneath the raised floor of the computer room and unearth some
of the DECtapes we'd stored since the early 1970s. Gradually, I've been
curating some of this, and here offer some of the artifacts.
Unfortunately, the existing tapes lack interesting things like the earliest
Unix OS source, but some indicative fossils have been prepared for
exhibition.
New information: Warren Toomey, now at Bond University, has
managed to make one of the compilers (last1120c, see below) compile
itself using a First/Second edition Unix emulator for the PDP-11; see
his [1]ftp-available directory. More generally, it's worth looking into
the [2]PDP-11 Unix Preservation Society pages for sources and
simulators.
As described in the [3]C History paper, 1972-73 were the truly
formative years in the development of the C language: this is when the
transition from typeless B to weakly typed C took place, mediated by
the (Neanderthal?) NB language, of which no source seems to survive. It
was also the period in which Unix was rewritten in C.
In looking over this material, I have mixed emotions; so much of this
stuff is immature and not well-done, and there is an element of
embarrassment about displaying it. But at the same time it does capture
two moments in a period of creativeness and may have some historical
interest.
Two tapes are present here; the first is labeled "last1120c", the
second "prestruct-c". I know from distant memory what these names mean:
the first is a saved copy of the compiler preserved just as we were
abandoning the PDP-11/20, which did not have multiply or divide
instructions, but instead a separate, optional unit that did these
operations (and also shifts) by storing the operands into memory
locations. (A [4]story about using this hardware is told elsewhere.)
"prestruct-c" is a copy of the compiler just before I started changing
it to use structures itself.
It's a bit hard to get really accurate dates for these compilers,
except that they are certainly 1972-73. There are date bits on the tape
image, but they suffer from a possible off-by-a-year error because we
changed epochs more than once during this era, and also because the
files may have been copied or fiddled after they were the source for
the compiler in contemporaneous use.
The earlier compiler does not know about structures at all: the string
"struct" does not appear anywhere. The second tape has a compiler that
does implement structures in a way that begins to approach their
current meaning. Their declaration syntax seems to use () instead of
{}, but . and -> for specifying members of a structure itself and
members of a pointed-to structure are both there.
Neither compiler yet handled the general declaration syntax of today or
even K&R I, with its compound declarators like the one in int **ipp.
The compilers have not yet evolved the notion of compounding of type
constructors ("array of pointers to functions", for example). These
would appear, though, by 5th or 6th edition Unix (say 1975), as
described (in Postscript) in the [5]C manual a couple of years after
these versions.
Instead, pointer declarations were written in the style int ip[];. A
fossil from this era survives even in modern C, where the notation can
be used in declarations of arguments. On the other hand, the later of
the two does accept the * notation, even though it doesn't use it.
(Evolving compilers written in their own language are careful not to
take advantage of their own latest features.)
It's interesting to note that the earlier compiler has a commented-out
preparation for a "long" keyword; the later one takes over its slot for
"struct." Implementation of long was a few years away.
Aside from their small size, perhaps the most striking thing about
these programs is their primitive construction, particularly the many
constants strewn throughout; they are used for names of tokens, for
example. This is because the preprocessor didn't exist at the time.
A second, less noticeable, but astonishing peculiarity is the space
allocation: temporary storage is allocated that deliberately overwrites
the beginning of the program, smashing its initialization code to save
space. The two compilers differ in the details of how they cope with
this. In the earlier one, the start is found by naming a function; in
the later, the start is simply taken to be 0. This indicates that the
first compiler was written before we had a machine with memory mapping,
so the origin of the program was not at location 0, whereas by the time
of the second, we had a PDP-11 that did provide mapping. (See the
[6]Unix History paper.) In one of the files (prestruct-c/c10.c) the
kludgery is especially evident.
Links to the source of the compilers are listed below. The files named
c0?.c are the first passes, which parse source and write syntax trees
intermingled with some text on an intermediate file. The c1?.c files
are the code generators, which read the trees and generate code. The
format is straight text (with just NL characters separating lines; the
browsers I've tried cope with this).
The code generation technique uses tables of instruction prototypes; a
parse tree is recursively matched against the part of the table
corresponding to its root operator. Restrictions on the types and
complexity of the operands can be expressed, and the table is searched
sequentially for the earliest matching fragment. Following each
restriction specification is the expansion specification; lower case
letters are literal, upper case things are replaced by things from the
operands in the tree. This is described in more detail in the paper
[7]A Tour through the PDP-11 Compiler. (This reference is troff source;
it can also be found in Postscript or PDF forms, though bundled with
other papers, under the [8]7th Edition Manual's home page). But do note
that this Tour describes the state of things after several years had
passed.
There are four tables specifying how to compile an expression to a
register, to compile only for side effects, to compile only to test
condition codes, and to compile to push on the stack (used for function
arguments, or for temporaries). They were saved only with the
"last1120c" compiler; the tables for the later one would have been
similar.
The source for the last1120c compiler also has a subsidiary table for
each pass with a bit of stuff that was not in the library, and some
encoding of facts about various operators as .s (assembler language)
files.
Finally, there is the cvopt program, used to convert the nonce-language
expression template tables into assembler. With a lot of handwork,
there is probably enough material to construct a working version of the
last1120c compiler, where "works" means "turns source into PDP-11
assembler." (See the [9]top of the page for one who succeeded.)
The links for the files are:
last1120c
[10]c00.c
[11]c01.c
[12]c02.c
[13]c03.c
[14]c0t.s
[15]c10.c
[16]c11.c
[17]c1t.s
[18]regtab.s
[19]cctab.s
[20]sptab.s
[21]efftab.s
[22]cvopt.c
prestruct-c
[23]c00.c
[24]c01.c
[25]c02.c
[26]c03.c
[27]c10.c
[28]c11.c
References
1. ftp://minnie.tuhs.org/pub/PDP-11/Sims/Apout/
2. http://minnie.tuhs.org/PUPS
3. https://www.bell-labs.com/usr/dmr/www/chist.html
4. https://www.bell-labs.com/usr/dmr/www/odd.html
5. https://www.bell-labs.com/usr/dmr/www/cman.ps
6. https://www.bell-labs.com/usr/dmr/www/hist.html
7. http://plan9.bell-labs.com/7thEdMan/vol2/ctour.bun
8. http://plan9.bell-labs.com/7thEdMan/index.html
9. https://www.bell-labs.com/usr/dmr/www/primevalC.html#works
10. https://www.bell-labs.com/usr/dmr/www/last1120c/c00.c
11. https://www.bell-labs.com/usr/dmr/www/last1120c/c01.c
12. https://www.bell-labs.com/usr/dmr/www/last1120c/c02.c
13. https://www.bell-labs.com/usr/dmr/www/last1120c/c03.c
14. https://www.bell-labs.com/usr/dmr/www/last1120c/c0t.s
15. https://www.bell-labs.com/usr/dmr/www/last1120c/c10.c
16. https://www.bell-labs.com/usr/dmr/www/last1120c/c11.c
17. https://www.bell-labs.com/usr/dmr/www/last1120c/c1t.s
18. https://www.bell-labs.com/usr/dmr/www/last1120c/regtab.s
19. https://www.bell-labs.com/usr/dmr/www/last1120c/cctab.s
20. https://www.bell-labs.com/usr/dmr/www/last1120c/sptab.s
21. https://www.bell-labs.com/usr/dmr/www/last1120c/efftab.s
22. https://www.bell-labs.com/usr/dmr/www/last1120c/cvopt.c
23. https://www.bell-labs.com/usr/dmr/www/prestruct-c/c00.c
24. https://www.bell-labs.com/usr/dmr/www/prestruct-c/c01.c
25. https://www.bell-labs.com/usr/dmr/www/prestruct-c/c02.c
26. https://www.bell-labs.com/usr/dmr/www/prestruct-c/c03.c
27. https://www.bell-labs.com/usr/dmr/www/prestruct-c/c10.c
28. https://www.bell-labs.com/usr/dmr/www/prestruct-c/c11.c