                      Very early C compilers and language

   Several years ago, Paul Vixie and Keith Bostic found a DECtape drive,
   attached it to a VAX, and offered to read old DECtapes. Even at the
   time, this was an antiquarian pursuit, and it presented an opportunity
   to mine beneath the raised floor of the computer room and unearth some
   of the DECtapes we'd stored since the early 1970s. Gradually, I've been
   curating some of this, and here offer some of the artifacts.
   Unfortunately, the existing tapes lack interesting things like the
   earliest Unix OS source, but some indicative fossils have been prepared
   for exhibition.

   New information: Warren Toomey, now at Bond University, has
   managed to make one of the compilers (last1120c, see below) compile
   itself using a First/Second edition Unix emulator for the PDP-11; see
   his [1]ftp-available directory. More generally, it's worth looking into
   the [2]PDP-11 Unix Preservation Society pages for sources and
   simulators.

   As described in the [3]C History paper, 1972-73 were the truly
   formative years in the development of the C language: this is when the
   transition from typeless B to weakly typed C took place, mediated by
   the (Neanderthal?) NB language, of which no source seems to survive. It
   was also the period in which Unix was rewritten in C.

   In looking over this material, I have mixed emotions; so much of this
   stuff is immature and not well-done, and there is an element of
   embarrassment about displaying it. But at the same time it does capture
   two moments in a period of creativeness and may have some historical
   interest.

   Two tapes are present here; the first is labeled "last1120c", the
   second "prestruct-c". I know from distant memory what these names mean:
   the first is a saved copy of the compiler preserved just as we were
   abandoning the PDP-11/20, which did not have multiply or divide
   instructions, but instead a separate, optional unit that did these
   operations (and also shifts) by storing the operands into memory
   locations. (A [4]story about using this hardware is told elsewhere.)

   "prestruct-c" is a copy of the compiler just before I started changing
   it to use structures itself.

   It's a bit hard to get really accurate dates for these compilers,
   except that they are certainly 1972-73. There are date bits on the tape
   image, but they suffer from a possible off-by-a-year error because we
   changed epochs more than once during this era, and also because the
   files may have been copied or fiddled after they were the source for
   the compiler in contemporaneous use.

   The earlier compiler does not know about structures at all: the string
   "struct" does not appear anywhere. The second tape has a compiler that
   does implement structures in a way that begins to approach their
   current meaning. Their declaration syntax seems to use () instead of
   {}, but . and -> for specifying members of a structure itself and
   members of a pointed-to structure are both there.
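
   As a reminder of what the two member operators do, here is a minimal
   sketch in present-day C. It uses the modern {} declaration syntax, not
   the () form the second compiler seems to accept, and the names point,
   sum_direct and sum_via_pointer are invented for illustration.

```c
/* A structure declared with today's brace syntax. */
struct point {
    int x;
    int y;
};

/* `.` selects a member of a structure object itself. */
int sum_direct(struct point p)
{
    return p.x + p.y;
}

/* `->` selects a member of a pointed-to structure. */
int sum_via_pointer(const struct point *pp)
{
    return pp->x + pp->y;
}
```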

   Neither compiler yet handled the general declaration syntax of today or
   even K&R I, with its compound declarators like the one in int **ipp; .
   The compilers have not yet evolved the notion of compounding of type
   constructors ("array of pointers to functions", for example). These
   would appear, though, by 5th or 6th edition Unix (say 1975), as
   described (in Postscript) in the [5]C manual a couple of years after
   these versions.

   Instead, pointer declarations were written in the style int ip[];. A
   fossil from this era survives even in modern C, where the notation can
   be used in declarations of arguments. On the other hand, the later of
   the two does accept the * notation, even though it doesn't use it.
   (Evolving compilers written in their own language are careful not to
   take advantage of their own latest features.)
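
   The surviving fossil can be seen in any current C compiler: in a
   parameter list, a declaration written int ip[] is adjusted to mean
   int *ip. The sketch below is modern C, not code from either tape, and
   the function names are made up for the example.

```c
#include <stddef.h>

/* Parameter written in the old array style; the compiler treats
   `int ip[]` here exactly as `int *ip`. */
int sum_array_style(int ip[], size_t n)
{
    int total = 0;
    for (size_t i = 0; i < n; i++)
        total += ip[i];
    return total;
}

/* The same function with the parameter written using `*`. */
int sum_pointer_style(int *ip, size_t n)
{
    int total = 0;
    while (n-- > 0)
        total += *ip++;
    return total;
}
```

   Both functions receive a pointer to the first element of the caller's
   array, so they can be called interchangeably.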

   It's interesting to note that the earlier compiler has a commented-out
   preparation for a "long" keyword; the later one takes over its slot for
   "struct." Implementation of long was a few years away.

   Aside from their small size, perhaps the most striking thing about
   these programs is their primitive construction, particularly the many
   constants strewn throughout; they are used for names of tokens, for
   example. This is because the preprocessor didn't exist at the time.
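
   To illustrate the difference a preprocessor makes, the sketch below
   contrasts a test written with bare numbers against the same test
   written with named constants. The token codes 40 and 41 and all of the
   names are hypothetical; they are not taken from either compiler.

```c
/* Without a preprocessor: the reader must remember that, in this
   invented numbering, 40 means "plus" and 41 means "minus". */
int is_additive_bare(int tok)
{
    return tok == 40 || tok == 41;
}

/* With a preprocessor: the same (hypothetical) codes get names. */
#define PLUS  40
#define MINUS 41

int is_additive_named(int tok)
{
    return tok == PLUS || tok == MINUS;
}
```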

   A second, less noticeable, but astonishing peculiarity is the space
   allocation: temporary storage is allocated that deliberately overwrites
   the beginning of the program, smashing its initialization code to save
   space. The two compilers differ in the details of how they cope with
   this. In the earlier one, the start is found by naming a function; in
   the later, the start is simply taken to be 0. This indicates that the
   first compiler was written before we had a machine with memory mapping,
   so the origin of the program was not at location 0, whereas by the time
   of the second, we had a PDP-11 that did provide mapping. (See the
   [6]Unix History paper.) In one of the files (prestruct-c/c10.c) the
   kludgery is especially evident.

   Links to the source of the compilers are listed below. The files named
   c0?.c are the first passes, which parse source and write syntax trees
   intermingled with some text on an intermediate file. The c1?.c files
   are the code generators, which read the trees and generate code. The
   format is straight text (with just NL characters separating lines; the
   browsers I've tried cope with this).

   The code generation technique uses tables of instruction prototypes; a
   parse tree is recursively matched against the part of the table
   corresponding to its root operator. Restrictions on the types and
   complexity of the operands can be expressed, and the table is searched
   sequentially for the earliest matching fragment. Following each
   restriction specification is the expansion specification; lower case
   letters are literal, upper case things are replaced by things from the
   operands in the tree. This is described in more detail in the paper
   [7]A Tour through the PDP-11 Compiler. (This reference is troff source;
   it can also be found in Postscript or PDF forms, though bundled with
   other papers, under the [8]7th Edition Manual's home page). But do note
   that this Tour describes the state of things after several years had
   passed.
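
   The general idea can be sketched in a few lines of modern C. This is
   a simplified illustration, not the format of the real tables: it
   handles only one operator with two leaf operands (no recursion), and
   the restriction kinds, the substitution letters A and B, the table
   contents and every identifier are assumptions made for the example.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical operand kinds; OPD_ANY is used only as a restriction. */
enum kind { OPD_ANY, OPD_NAME, OPD_CONST };

struct operand { enum kind kind; const char *text; };
struct node    { char op; struct operand left, right; };

struct entry {
    char op;            /* root operator this fragment applies to */
    enum kind right;    /* restriction on the right operand */
    const char *expand; /* lower case is literal; A and B are replaced */
};

/* Searched in order, so more specific fragments come first. */
static const struct entry table[] = {
    { '+', OPD_CONST, "mov A,r0\nadd $B,r0\n" },
    { '+', OPD_ANY,   "mov A,r0\nadd B,r0\n"  },
    { '-', OPD_ANY,   "mov A,r0\nsub B,r0\n"  },
};

/* Return the earliest entry matching the node, or NULL if none does. */
const struct entry *match(const struct node *n)
{
    for (size_t i = 0; i < sizeof table / sizeof table[0]; i++) {
        const struct entry *e = &table[i];
        if (e->op == n->op &&
            (e->right == OPD_ANY || e->right == n->right.kind))
            return e;
    }
    return NULL;
}

/* Expand the matching fragment into out; 0 on success, -1 if no
   fragment matches or the buffer is too small. */
int generate(const struct node *n, char *out, size_t outsz)
{
    const struct entry *e = match(n);
    size_t used = 0;

    if (e == NULL || outsz == 0)
        return -1;
    for (const char *p = e->expand; *p != '\0'; p++) {
        char one[2] = { *p, '\0' };
        const char *piece = one;        /* default: copy literally */
        if (*p == 'A')
            piece = n->left.text;       /* substitute left operand */
        else if (*p == 'B')
            piece = n->right.text;      /* substitute right operand */
        size_t len = strlen(piece);
        if (used + len >= outsz)
            return -1;
        memcpy(out + used, piece, len);
        used += len;
    }
    out[used] = '\0';
    return 0;
}
```

   Because the table is scanned from the top, a node for x + 1 picks the
   constant fragment while x + y falls through to the general one; an
   operator with no entry at all is reported as a failure.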

   There are four tables specifying how to compile an expression to a
   register, to compile only for side effects, to compile only to test
   condition codes, and to compile to push on the stack (used for function
   arguments, or for temporaries). They were saved only with the
   "last1120c" compiler; the tables for the later one would have been
   similar.

   The source for the last1120c compiler also has a subsidiary table for
   each pass with a bit of stuff that was not in the library, and some
   encoding of facts about various operators as .s (assembler language)
   files.

   Finally, there is the cvopt program, used to convert the nonce-language
   expression template tables into assembler. With a lot of handwork,
   there is probably enough material to construct a working version of the
   last1120c compiler, where "works" means "turns source into PDP-11
   assembler." (See the [9]top of the page for one who succeeded.)

   The links for the files are:

    last1120c

   [10]c00.c
   [11]c01.c
   [12]c02.c
   [13]c03.c
   [14]c0t.s
   [15]c10.c
   [16]c11.c
   [17]c1t.s
   [18]regtab.s
   [19]cctab.s
   [20]sptab.s
   [21]efftab.s
   [22]cvopt.c

    prestruct-c

   [23]c00.c
   [24]c01.c
   [25]c02.c
   [26]c03.c
   [27]c10.c
   [28]c11.c

References

   1. ftp://minnie.tuhs.org/pub/PDP-11/Sims/Apout/
   2. http://minnie.tuhs.org/PUPS
   3. https://www.bell-labs.com/usr/dmr/www/chist.html
   4. https://www.bell-labs.com/usr/dmr/www/odd.html
   5. https://www.bell-labs.com/usr/dmr/www/cman.ps
   6. https://www.bell-labs.com/usr/dmr/www/hist.html
   7. http://plan9.bell-labs.com/7thEdMan/vol2/ctour.bun
   8. http://plan9.bell-labs.com/7thEdMan/index.html
   9. https://www.bell-labs.com/usr/dmr/www/primevalC.html#works
  10. https://www.bell-labs.com/usr/dmr/www/last1120c/c00.c
  11. https://www.bell-labs.com/usr/dmr/www/last1120c/c01.c
  12. https://www.bell-labs.com/usr/dmr/www/last1120c/c02.c
  13. https://www.bell-labs.com/usr/dmr/www/last1120c/c03.c
  14. https://www.bell-labs.com/usr/dmr/www/last1120c/c0t.s
  15. https://www.bell-labs.com/usr/dmr/www/last1120c/c10.c
  16. https://www.bell-labs.com/usr/dmr/www/last1120c/c11.c
  17. https://www.bell-labs.com/usr/dmr/www/last1120c/c1t.s
  18. https://www.bell-labs.com/usr/dmr/www/last1120c/regtab.s
  19. https://www.bell-labs.com/usr/dmr/www/last1120c/cctab.s
  20. https://www.bell-labs.com/usr/dmr/www/last1120c/sptab.s
  21. https://www.bell-labs.com/usr/dmr/www/last1120c/efftab.s
  22. https://www.bell-labs.com/usr/dmr/www/last1120c/cvopt.c
  23. https://www.bell-labs.com/usr/dmr/www/prestruct-c/c00.c
  24. https://www.bell-labs.com/usr/dmr/www/prestruct-c/c01.c
  25. https://www.bell-labs.com/usr/dmr/www/prestruct-c/c02.c
  26. https://www.bell-labs.com/usr/dmr/www/prestruct-c/c03.c
  27. https://www.bell-labs.com/usr/dmr/www/prestruct-c/c10.c
  28. https://www.bell-labs.com/usr/dmr/www/prestruct-c/c11.c