summaryrefslogtreecommitdiff
path: root/miniany/doc/www.bell-labs.com_usr_dmr_www_primevalC.txt
diff options
context:
space:
mode:
Diffstat (limited to 'miniany/doc/www.bell-labs.com_usr_dmr_www_primevalC.txt')
-rw-r--r--miniany/doc/www.bell-labs.com_usr_dmr_www_primevalC.txt188
1 files changed, 188 insertions, 0 deletions
diff --git a/miniany/doc/www.bell-labs.com_usr_dmr_www_primevalC.txt b/miniany/doc/www.bell-labs.com_usr_dmr_www_primevalC.txt
new file mode 100644
index 0000000..e7bcff2
--- /dev/null
+++ b/miniany/doc/www.bell-labs.com_usr_dmr_www_primevalC.txt
@@ -0,0 +1,188 @@
+ Very early C compilers and language
+
+ Several years ago, Paul Vixie and Keith Bostic found a DECtape drive,
+ attached it to a VAX, and offered to read old DECtapes. Even at the
+ time, this was an antiquarian pursuit, and it presented an opportunity
+ to mine beneath the raised floor of the computer room and unearth some
+ of the DECtapes we'd stored since the early 1970s. Gradually, I've been
+ curating some of this, and here offer some of the artifacts.
+ Unfortunately existing tapes lack interesting things like earliest Unix
+ OS source, but some indicative fossils have been prepared for
+ exhibition.
+
+ [new.gif] information: Warren Toomey, now at Bond University, has
+ managed to make one of the compilers (last1120c, see below) compile
+ itself using a First/Second edition Unix emulator for the PDP-11; see
+ his [1]ftp-available directory. More generally, it's worth looking into
+ the [2]PDP-11 Unix Preservation Society pages for sources and
+ simulators.
+
+ As described in the [3]C History paper, 1972-73 were the truly
+ formative years in the development of the C language: this is when the
+ transition from typeless B to weakly typed C took place, mediated by
+ the (Neanderthal?) NB language, of which no source seems to survive. It
+ was also the period in which Unix was rewritten in C.
+
+ In looking over this material, I have mixed emotions; so much of this
+ stuff is immature and not well-done, and there is an element of
+ embarrassment about displaying it. But at the same time it does capture
+ two moments in a period of creativeness and may have some historical
+ interest.
+
+ Two tapes are present here; the first is labeled "last1120c", the
+ second "prestruct-c". I know from distant memory what these names mean:
+ the first is a saved copy of the compiler preserved just as we were
+ abandoning the PDP-11/20, which did not have multiply or divide
+ instructions, but instead a separate, optional unit that did these
+ operations (and also shifts) by storing the operands into memory
+ locations. (A [4]story about using this hardware is told elsewhere.)
+
+ "prestruct-c" is a copy of the compiler just before I started changing
+ it to use structures itself.
+
+ It's a bit hard to get really accurate dates for these compilers,
+ except that they are certainly 1972-73. There are date bits on the tape
+ image, but they suffer from a possible off-by-a-year error because we
+ changed epochs more than once during this era, and also because the
+ files may have been copied or fiddled after they were the source for
+ the compiler in contemporaneous use.
+
+ The earlier compiler does not know about structures at all: the string
+ "struct" does not appear anywhere. The second tape has a compiler that
+ does implement structures in a way that begins to approach their
+ current meaning. Their declaration syntax seems to use () instead of
+ {}, but . and -> for specifying members of a structure itself and
+ members of a pointed-to structure are both there.
+
+ Neither compiler yet handled the general declaration syntax of today or
+ even K&R I, with its compound declarators like the one in int **ipp; .
+ The compilers have not yet evolved the notion of compounding of type
+ constructors ("array of pointers to functions", for example). These
+ would appear, though, by 5th or 6th edition Unix (say 1975), as
+ described (in Postscript) in the [5]C manual a couple of years after
+ these versions.
+
+ Instead, pointer declarations were written in the style int ip[];. A
+ fossil from this era survives even in modern C, where the notation can
+ be used in declarations of arguments. On the other hand, the later of
+ the two does accept the * notation, even though it doesn't use it.
+ (Evolving compilers written in their own language are careful not to
+ take advantage of their own latest features.)
+
+ It's interesting to note that the earlier compiler has a commented-out
+ preparation for a "long" keyword; the later one takes over its slot for
+ "struct." Implementation of long was a few years away.
+
+ Aside from their small size, perhaps the most striking thing about
+ these programs is their primitive construction, particularly the many
+ constants strewn throughout; they are used for names of tokens, for
+ example. This is because the preprocessor didn't exist at the time.
+
+ A second, less noticeable, but astonishing peculiarity is the space
+ allocation: temporary storage is allocated that deliberately overwrites
+ the beginning of the program, smashing its initialization code to save
+ space. The two compilers differ in the details in how they cope with
+ this. In the earlier one, the start is found by naming a function; in
+ the later, the start is simply taken to be 0. This indicates that the
+ first compiler was written before we had a machine with memory mapping,
+ so the origin of the program was not at location 0, whereas by the time
+ of the second, we had a PDP-11 that did provide mapping. (See the
+ [6]Unix History paper). In one of the files (prestruct-c/c10.c) the
+ kludgery is especially evident.
+
+ Links to the source of the compilers are listed below. The files named
+ c0?.c are the first passes, which parse source and writes syntax trees
+ intermingled with some text on an intermediate file. The c1?.c files
+ are the code generators, which read the trees and generate code. The
+ format is straight text (with just NL characters separating lines; the
+ browsers I've tried cope with this).
+
+ The code generation technique uses tables of instruction prototypes; a
+ parse tree is recursively matched against the part of the table
+ corresponding to its root operator. Restrictions on the types and
+ complexity of the operands can be expressed, and the table is searched
+ sequentially for the earliest matching fragment. Following each
+ restriction specification is the expansion specification; lower case
+ letters are literal, upper case things are replaced by things from the
+ operands in the tree. This is described in more detail in the paper
+ [7]A Tour through the PDP-11 Compiler. (This reference is troff source;
+ it can also be found in Postscript or PDF forms, though bundled with
+ other papers, under the [8]7th Edition Manual's home page). But do note
+ that this Tour describes the state of things after several years had
+ passed.
+
+ There are four tables specifying how to compile an expression to a
+ register, to compile only for side effects, to compile only to test
+ condition codes, and to compile to push on the stack (used for function
+ arguments, or for temporaries). They were saved only with the
+ "last1120c" compiler; the tables for the later one would have been
+ similar.
+
+ The source for the last1120c compiler also has a subsidiary table for
+ each pass with a bit of stuff that was not in the library, and some
+ encoding of facts about various operators as .s (assembler language)
+ files.
+
+ Finally, there is the cvopt program, used to convert the nonce-language
+ expression template tables into assembler. With a lot of handwork,
+ there is probably enough material to construct a working version of the
+ last1120c compiler, where "works" means "turns source into PDP-11
+ assembler." (See the [9]top of the page for one who succeeded.)
+
+ The links for the files are:
+
+ last1120c
+
+ [10]c00.c
+ [11]c01.c
+ [12]c02.c
+ [13]c03.c
+ [14]c0t.s
+ [15]c10.c
+ [16]c11.c
+ [17]c1t.s
+ [18]regtab.s
+ [19]cctab.s
+ [20]sptab.s
+ [21]efftab.s
+ [22]cvopt.c
+
+ prestruct-c
+
+ [23]c00.c
+ [24]c01.c
+ [25]c02.c
+ [26]c03.c
+ [27]c10.c
+ [28]c11.c
+
+References
+
+ 1. ftp://minnie.tuhs.org/pub/PDP-11/Sims/Apout/
+ 2. http://minnie.tuhs.org/PUPS
+ 3. https://www.bell-labs.com/usr/dmr/www/chist.html
+ 4. https://www.bell-labs.com/usr/dmr/www/odd.html
+ 5. https://www.bell-labs.com/usr/dmr/www/cman.ps
+ 6. https://www.bell-labs.com/usr/dmr/www/hist.html
+ 7. http://plan9.bell-labs.com/7thEdMan/vol2/ctour.bun
+ 8. http://plan9.bell-labs.com/7thEdMan/index.html
+ 9. https://www.bell-labs.com/usr/dmr/www/primevalC.html#works
+ 10. https://www.bell-labs.com/usr/dmr/www/last1120c/c00.c
+ 11. https://www.bell-labs.com/usr/dmr/www/last1120c/c01.c
+ 12. https://www.bell-labs.com/usr/dmr/www/last1120c/c02.c
+ 13. https://www.bell-labs.com/usr/dmr/www/last1120c/c03.c
+ 14. https://www.bell-labs.com/usr/dmr/www/last1120c/c0t.s
+ 15. https://www.bell-labs.com/usr/dmr/www/last1120c/c10.c
+ 16. https://www.bell-labs.com/usr/dmr/www/last1120c/c11.c
+ 17. https://www.bell-labs.com/usr/dmr/www/last1120c/c1t.s
+ 18. https://www.bell-labs.com/usr/dmr/www/last1120c/regtab.s
+ 19. https://www.bell-labs.com/usr/dmr/www/last1120c/cctab.s
+ 20. https://www.bell-labs.com/usr/dmr/www/last1120c/sptab.s
+ 21. https://www.bell-labs.com/usr/dmr/www/last1120c/efftab.s
+ 22. https://www.bell-labs.com/usr/dmr/www/last1120c/cvopt.c
+ 23. https://www.bell-labs.com/usr/dmr/www/prestruct-c/c00.c
+ 24. https://www.bell-labs.com/usr/dmr/www/prestruct-c/c01.c
+ 25. https://www.bell-labs.com/usr/dmr/www/prestruct-c/c02.c
+ 26. https://www.bell-labs.com/usr/dmr/www/prestruct-c/c03.c
+ 27. https://www.bell-labs.com/usr/dmr/www/prestruct-c/c10.c
+ 28. https://www.bell-labs.com/usr/dmr/www/prestruct-c/c11.c