1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
|
[1]PREVIOUS [2]TABLE OF CONTENTS [3]NEXT
__________________________________________________________________
Microperl
Simon Cozens
Perl 5.7.0 shipped with an obscure and barely announced new feature --
microperl. microperl is something I've worked on for quite a few months
now and, while I expect it to be useful to a tiny fraction of Perl
users, I'd like to explain why it's included and what's so cool about
it.
First, though, what is it? Well, when you compile a version of Perl,
the first thing that gets built is a program called miniperl. miniperl
is more or less just like your ordinary perl, except that it doesn't
have the DynaLoader XS module linked in. An XS module is a C extension
module, allowing Perl to run C code, and it's usually written in a
special glue language, XS, rather than in C.
What makes DynaLoader special is that it's the module which allows Perl
to load other XS modules dynamically -- without that, you can't use
modules like IO::File, your DBM database library (DB_File, SDBM_File or
equivalent) or any of the modules on CPAN with XS components.
miniperl, however, has just enough brains to run a program which
translates the XS language in which DynaLoader is written into a C
program; once we've done that, we can compile DynaLoader and link it
into perl. In effect, we're building perl in stages: first without XS
support, and then using that first effort to help build the next stage
with XS support.
Bootstrapping
This process -- starting small and using the result to build up to the
next stage -- is called "bootstrapping", since you've started from the
ground and are pulling yourself up by your own bootstraps. In fact,
it's exactly what happens when you turn up your computer and it "boots
up" -- the raw circuitry knows enough to activate the BIOS, the BIOS
knows enough to find and run the first block on the disk, which is a
program which in turn knows enough to find the boot loader, which finds
and runs the operating system.
The idea behind microperl is to take the bootstrapping nature of
miniperl and perl to its logical conclusion. microperl is a very, very
simple build of perl which will hopefully one day be used to build
miniperl.
At the moment, before miniperl can be built, a program called Configure
must be run. To make sure that perl builds properly on the plethora of
different computers it supports, Configure performs hundreds of tests
to work out the characteristics of the current system -- which
character set is being used, how large the various C datatypes are on
the machine, what libraries are available, how to use the C compiler,
and so on.
The problem with Configure is that you need to make a few assumptions
about the system in order to run it. Configure is written in Bourne
shell, so you need a copy of sh around. It also uses grep, awk, sed and
a bunch of other Unix utilities to probe the system. Of course, this
will only work on something that smells like Unix. It would be a lot
better if Configure was written in something portable -- something like
Perl, for instance. Of course, you'd need a Perl interpreter to be able
to run it, and since the purpose of the exercise is to build a Perl
interpreter, we've hit a chicken-and-egg problem.
This is where microperl comes in. The aim of microperl is to be a Perl
interpreter which can be built on as many machines as possible, even
small operating systems like WinCE and PalmOS, before any probing of
the system occurs. Simply unpack your Perl distribution, compile it,
and go. In fact, let's do that.
Building Microperl
Unpack Perl 5.7.0, and change to the directory perl5.7.0/. Now issue
this command:
% make -f Makefile.micro
If all goes well, you should see something like this happen:
cc -c -o uav.o -DPERL_CORE -DPERL_MICRO av.c
cc -c -o udeb.o -DPERL_CORE -DPERL_MICRO deb.c
cc -c -o udoio.o -DPERL_CORE -DPERL_MICRO doio.c
cc -c -o udoop.o -DPERL_CORE -DPERL_MICRO doop.c
[... time passes ...]
cc -c -o uutil.o -DPERL_CORE -DPERL_MICRO util.c
cc -c -o uperlapi.o -DPERL_CORE -DPERL_MICRO perlapi.c
cc -o microperl uav.o udeb.o udoio.o udoop.o udump.o uglobals.o ugv.o
uhv.o umg.o uperlmain.o uop.o uperl.o uperlio.o uperly.o upp.o upp_ctl.o
upp_hot.o upp_sys.o uregcomp.o uregexec.o urun.o uscope.o usv.o utaint.o
utoke.o uuniversal.o uutf8.o uutil.o uperlapi.o -lm
You might see a few warnings, which are probably harmless. If you don't
get past the first file, check that your C compiler is available and
replace the line CC = cc in Makefile.micro with CC = /path/to/your/cc.
If you use a separate linker, alter the line LD = $(CC) to, for
example, LD = ld. Eventually, you should end up with an executable file
called microperl. (If you get stuck between the first file and the big
statement at the end, then we probably have a bug.)
microperl is a real, honest-to-goodness Perl interpreter; no core
elements of the Perl language have been removed. The regular expression
engine is exactly the same, the language is exactly the same, it has
the same Unicode support, and so on. The only things that have been
removed from it are functions that are completely system-specific, like
crypt and readdir.
% ./microperl -le 'print q/Hello world/'
Hello world
% ./microperl t/base/cond.t
1..4
ok 1
ok 2
ok 3
ok 4
How Does It Work?
The idea for microperl came from Ilya Zakharevich, who produced a
package called crazyperl very much along the same lines. What I've
done, together with a lot of help from Jarkko Hietaniemi, is to make it
easy to build, extend the number of systems it can work on, and keep up
it to date with the changes as Perl develops.
To understand how it works, however, we've got to go back to looking at
how perl is built. Once Configure has tested the system and found
everything it needs to know, it writes out its results to a shell file,
config.sh. This file is then re-arranged into a C header file,
config.h, and the rest of the Perl source files use the values from
that as they are compiled.
You can examine the config.sh file that was used to build your version
of Perl through the Config module:
#!/usr/bin/perl
use warnings;
use strict;
use Config qw(config_sh);
print config_sh();
This should print out something like the following:
archlibexp='/usr/local/lib/perl5/5.7.0/cygwin'
archname='cygwin'
cc='gcc'
ccflags='-fno-strict-aliasing -I/usr/local/include'
cppflags='-fno-strict-aliasing -I/usr/local/include'
...
This tells us that this Perl will store its machine-specific modules in
the directory /usr/local/lib/perl5/5.7.0/cygwin, that the architecture
we're running on is cygwin, and the flags passed to the C compiler and
C preprocessor respectively; you can find documentation on what the
rest of the options mean with perldoc Config. You can use this to
determine characteristics about the current Perl build:
#!/usr/bin/perl
use warnings;
use strict;
use Config;
...
if ($Config{use5005threads}) {
# OK, we have threads:
require Threads; import Threads;
threading_child();
} else {
# Make do with fork:
forking_child();
}
microperl provides a config.sh which specifies the lowest common
denominator: almost all optional items are turned off, all tests are
set to have failed, and so on. We then build a version of Perl with
this minimal configuration -- since Perl is able to cope with pretty
much every combination that can be thrown at it, it does its best to
work around everything that we claim is lacking.
Why?
What's the point? Is microperl anything more than a cool hack?
Well, it certainly is a cool hack, and I quite like it just for that,
but here are three real, practical uses for microperl.
• Hacking
When you're working on the Perl core, you'll naturally end up doing
a load of tweaking, testing ideas, and so on. This means a lot of
recompiling, and recompiling a full perl -- or even just miniperl --
takes a lot of time. microperl builds fast. Furthermore, it's simple
and uncluttered, and therefore useful for debugging and making
patches; the microperl kit, if placed in a separate directory, would
be a core of 71 files, (29 C program files) rather than the
1700-or-so files in a full Perl distribution.
It's also great for checking out a fresh Perl distribution without
having to take the time to plod through Configure; Configure is
pretty slow, taking between ten and twenty minutes on my old
machine.
On top of this, because microperl assumes the absolute minimum
permissible, it can help root out edge cases in the configure
process -- if a function is used on the assumption that everyone has
it, for instance. microperl has already found a few assumptions in
the Perl source, and will hopefully guard against any more.
• Porting
Because microperl is a lowest common denominator, it's very, very
easy to port to new systems: you don't need to know anything about
the characteristics of the system you're compiling on. Replace CC in
Makefile.micro with a cross-compiler, and you can instantly port a
version of Perl to another operating system. (In fact, this is what
I'm doing to put Perl onto the Palm Pilot port of Linux, but there
are a few wrinkles left to be smoothed.)
Further, since you can build microperl with nothing other than a C
compiler and make, it's useful for porting to systems which don't
have sh, awk, grep, and all the other things Configure demands. Once
you have experience of how config.sh works, you can build up a
version of Perl by steadily adding features to microperl.
• Bootstrapping
As mentioned before, we can theoretically use microperl to bootstrap
a Perl build; the "steadily adding features" part mentioned above
can be automated and run by the machine itself. Ideally, you'd
unpack your Perl kit, type make, and a microperl would be built,
which would run a Perl version of Configure and then build a full
perl.
Now, when someone says "theoretically", they usually mean "not
really", but let's have a look at a simple proof of concept. One of
the tests we need to do is to examine the size of C integer storage:
$ ./microperl.exe fixbytes.pl
Checking to see how big your integers are...
ints are 4, longs are 4, shorts are 2
Fixing uconfig.sh
Reprocessing uconfig.sh
Extracting uconfig.h (with variable substitutions)
Making myself!
cc -c -o uav.o -DPERL_CORE -DPERL_MICRO av.c
...
cc -o microperl uav.o udeb.o udoio.o udoop.o udump.o
uglobals.o ugv.o uhv.o umg.o uperlmain.o uop.o uperl.o
uperlio.o uperly.o upp.o upp_ctl.o
upp_hot.o upp_sys.o uregcomp.o uregexec.o urun.o
uscope.o usv.o utaint.o utoke.o uuniversal.o uutf8.o
uutil.o uperlapi.o -lm
I'm still here.
Now we've built a new version of microperl using the information
we've discovered. Here's the code which did that:
require "bootstrap.pl";
print "Checking to see how big your integers are...\n";
open (OUT, ">intsize.c") or die $!;
print OUT <<'EOF';
#include <stdio.h>
int main()
{
printf("$intsize=%d;\n", (int)sizeof(int) );
printf("$longsize=%d;\n", (int)sizeof(long) );
printf("$shortsize=%d;\n", (int)sizeof(short));
exit(0);
}
EOF
close OUT;
system("cc -o intsize intsize.c");
if (!-x "./intsize") { die "Didn't compile" }
$sizes = './intsize';
unlink "intsize", "intsize.c";
eval $sizes;
print "ints are $intsize, longs are $longsize, shorts are $shortsize\n";
changeit ( intsize => $intsize,
longsize => $longsize,
shortsize => $shortsize
);
rebuild();
Of course, it'll take a lot of work before we can bootstrap Perl to
the same degree as an ordinary Configure run. But I'll get there!
Problems
The major problem I've had developing microperl is that the definitions
that config.sh needs to provide to config.h constantly change, and so
I've had to add new entries to microperl's config.sh.
To avoid doing this manually, I wrote a little Perl program that checks
for undefined symbols during the make process and attempts to divine
what they should be for microperl, adds in the relevant entries to
config.sh and tries again. It's a crude future-proofing, but it saves
me a lot of work.
Apart from that, it's been a question of re-arranging things in the
Perl core to make them more friendly to completely impotent
configurations like microperl's: signal handling, for instance, had to
be excised, and file modes such as O_CREAT generated for systems
without the relevant system header files.
The Future
Where do I see microperl going? Hopefully, it'll one day be more than
just a toy for me. It's certainly useful for anyone working on the Perl
core to quickly check out their changes or Perl's operation; it's good
for people learning the internals of Perl because it strips away
everything but the essentials. I'd like to see it building itself and
hopefully making an attack on the current Configure system.
But no matter what happens to it, it's still a neat hack.
_ _END_ _
__________________________________________________________________
Simon Cozens is an open source programmer and author, and a member of
the Perl development team. He lives in Tokyo, and his hobbies include
the Greek language and culture. He is the author of "Beginning Perl"
and a number of articles on Perl and on Open Source topics.
__________________________________________________________________
[4]PREVIOUS [5]TABLE OF CONTENTS [6]NEXT
References
1. https://www.foo.be/docs/tpj/issues/vol5_3/tpj0503-0002.html
2. https://www.foo.be/docs/tpj/issues/vol5_3/ewtoc.html
3. https://www.foo.be/docs/tpj/issues/vol5_3/tpj0503-0004.html
4. https://www.foo.be/docs/tpj/issues/vol5_3/tpj0503-0002.html
5. https://www.foo.be/docs/tpj/issues/vol5_3/ewtoc.html
6. https://www.foo.be/docs/tpj/issues/vol5_3/tpj0503-0004.html
|