miniany/doc/www.muppetlabs.com_~breadbox_software_tiny_teensy.txt



  A Whirlwind Tutorial on Creating Really Teensy ELF Executables for Linux

                         (or, "Size Is Everything")
     __________________________________________________________________

     She studied it carefully for about 15 minutes. Finally, she spoke.
     "There's something written on here," she said, frowning, "but it's
     really teensy."

                                     [Dave Barry, "The Columnist's Caper"]

   If you're a programmer who's become fed up with software bloat, then
   may you find herein the perfect antidote.

   This document explores methods for squeezing excess bytes out of simple
   programs. (Of course, the more practical purpose of this document is to
   describe a few of the inner workings of the ELF file format and the
   Linux operating system. But hopefully you can also learn something
   about how to make really teensy ELF executables in the process.)

   Please note that the information and examples given here are, for the
   most part, specific to ELF executables on a Linux platform running
   under an Intel x86 architecture. I imagine that a good bit of the
   information is applicable to other ELF-based Unices, but my experiences
   with such are too limited for me to say with certainty.

   Please also note that if you aren't a little bit familiar with assembly
   code, you may find parts of this document sort of hard to follow. (The
   assembly code that appears in this document is written using Nasm; see
   [1]http://www.nasm.us/.)
     __________________________________________________________________

   In order to start, we need a program. Almost any program will do, but
   the simpler the program the better, since we're more interested in how
   small we can make the executable than what the program does.

   Let's take an incredibly simple program, one that does nothing but
   return a number back to the operating system. Why not? After all, Unix
   already comes with no less than two such programs: true and false.
   Since 0 and 1 are already taken, we'll use the number 42.

   So, here is our first version:

  /* tiny.c */
  int main(void) { return 42; }

   which we can compile and test like so:

  $ gcc -Wall tiny.c
  $ ./a.out ; echo $?
  42

   So. How big is it? Well, on my machine, I get:

  $ wc -c a.out
     3998 a.out

   (Yours will probably differ some.) Admittedly, that's pretty small by
   today's standards, but it's almost certainly bigger than it needs to
   be.

   The obvious first step is to strip the executable:

  $ gcc -Wall -s tiny.c
  $ ./a.out ; echo $?
  42
  $ wc -c a.out
     2632 a.out

   That's certainly an improvement. For the next step, how about
   optimizing?

  $ gcc -Wall -s -O3 tiny.c
  $ wc -c a.out
     2616 a.out

   That also helped, but only just. Which makes sense: there's hardly
   anything there to optimize.

   It seems unlikely that there's much else we can do to shrink a
   one-statement C program. We're going to have to leave C behind, and use
   assembler instead. Hopefully, this will cut out all the extra overhead
   that C programs automatically incur.

   So, on to our second version. All we need to do is return 42 from
   main(). In assembly language, this means that the function should set
   the accumulator, eax, to 42, and then return:

  ; tiny.asm
  BITS 32
  GLOBAL main
  SECTION .text
  main:
                mov     eax, 42
                ret

   We can then build and test like so:

  $ nasm -f elf tiny.asm
  $ gcc -Wall -s tiny.o
  $ ./a.out ; echo $?
  42

   (Hey, who says assembly code is difficult?) And now how big is it?

  $ wc -c a.out
     2604 a.out

   Looks like we shaved off a measly twelve bytes. So much for all the
   extra overhead that C automatically incurs, eh?

   Well, the problem is that we are still incurring a lot of overhead by
   using the main() interface. The linker is still adding an interface to
   the OS for us, and it is that interface that actually calls main(). So
   how do we get around that if we don't need it?

   The actual entry point that the linker uses by default is the symbol
   with the name _start. When we link with gcc, it automatically includes
   a _start routine, one that sets up argc and argv, among other things,
   and then calls main().

   So, let's see if we can bypass this, and define our own _start routine:

  ; tiny.asm
  BITS 32
  GLOBAL _start
  SECTION .text
  _start:
                mov     eax, 42
                ret

   Will gcc do what we want?

  $ nasm -f elf tiny.asm
  $ gcc -Wall -s tiny.o
  tiny.o(.text+0x0): multiple definition of `_start'
  /usr/lib/crt1.o(.text+0x0): first defined here
  /usr/lib/crt1.o(.text+0x36): undefined reference to `main'

   No. Well, actually, yes it will, but first we need to learn how to ask
   for what we want.

   It so happens that gcc recognizes an option called -nostartfiles. From
   the gcc info pages:

     -nostartfiles
     Do not use the standard system startup files when linking. The
     standard libraries are used normally.

   Aha! Now let's see what we can do:

  $ nasm -f elf tiny.asm
  $ gcc -Wall -s -nostartfiles tiny.o
  $ ./a.out ; echo $?
  Segmentation fault
  139

   Well, gcc didn't complain, but the program doesn't work. What went
   wrong?

   What went wrong is that we treated _start as if it were a C function,
   and tried to return from it. In reality, it's not a function at all.
   It's just a symbol in the object file which the linker uses to locate
   the program's entry point. When our program is invoked, it's invoked
   directly. If we were to look, we would see that the value on the top of
   the stack was the number 1, which is certainly very un-address-like. In
   fact, what is on the stack is our program's argc value. After this
   come the elements of the argv array, including the terminating NULL
   element, followed by the elements of envp. And that's all. There is no
   return address on the stack.
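
   To make that layout concrete, here is a minimal C sketch of how
   startup code might pick the pieces apart. The names entry_stack and
   read_entry_stack are made up for illustration, and the function is
   fed an ordinary array that merely imitates the stack; it is not what
   crt1.o actually contains.

```c
#include <stddef.h>
#include <stdint.h>

/* The pieces found at the stack pointer when _start begins. */
struct entry_stack {
    long   argc;
    char **argv;   /* argc pointers followed by a NULL */
    char **envp;   /* environment pointers followed by a NULL */
};

/* sp points at the first machine word on the stack (the argc slot).
   Hypothetical helper: real startup code does this in assembly. */
static struct entry_stack read_entry_stack(char **sp)
{
    struct entry_stack s;
    s.argc = (long)(intptr_t)sp[0];  /* an integer, not an address */
    s.argv = sp + 1;                 /* argv[] starts right after argc */
    s.envp = s.argv + s.argc + 1;    /* skip argv[] and its NULL */
    return s;
}
```

   Note that nothing in this walk ever turns up a return address, which
   is why the ret in our _start had nowhere sensible to go.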

   So, how does _start ever exit? Well, it calls the exit() function!
   That's what it's there for, after all.

   Actually, I lied. What it really does is call the _exit() function.
   (Notice the leading underscore.) exit() is required to finish up some
   tasks on behalf of the process, but those tasks will never have been
   started, because we're bypassing the library's startup code. So we need
   to bypass the library's shutdown code as well, and go directly to the
   operating system's shutdown processing.
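
   The difference between the two is easy to observe from C. The sketch
   below is only an illustration (handler_ran and cleanup_marker are
   made-up names, and it assumes a POSIX system with fork() and pipe()):
   a child process registers an atexit() handler and then leaves via
   either exit() or _exit(), and the parent checks whether the handler
   got to run.

```c
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

static int report_fd = -1;

/* Registered with atexit(); leaves one byte in the pipe if it runs. */
static void cleanup_marker(void)
{
    ssize_t n = write(report_fd, "x", 1);
    (void)n;
}

/* Fork a child that ends via exit() or _exit().
   Returns 1 if the child's atexit handler ran, 0 if not, -1 on error. */
static int handler_ran(int use_underscore_exit)
{
    int fds[2];
    char c;
    ssize_t n;
    pid_t pid;

    if (pipe(fds) != 0)
        return -1;
    pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    if (pid == 0) {                 /* child */
        close(fds[0]);
        report_fd = fds[1];
        atexit(cleanup_marker);
        if (use_underscore_exit)
            _exit(42);              /* straight to the kernel */
        exit(42);                   /* library shutdown runs first */
    }
    close(fds[1]);                  /* parent */
    n = read(fds[0], &c, 1);        /* 0 bytes: the handler never ran */
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return n == 1;
}
```

   Calling handler_ran(0) should report that the handler ran, while
   handler_ran(1) should report that it did not, since _exit() skips
   the library's shutdown work entirely.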

   So, let's try this again. We're going to call _exit(), which is a
   function that takes a single integer argument. So all we need to do is
   push the number onto the stack and call the function. (We also need to
   declare _exit() as external.) Here's our assembly:

  ; tiny.asm
  BITS 32
  EXTERN _exit
  GLOBAL _start
  SECTION .text
  _start:
                push    dword 42
                call    _exit

   And we build and test as before:

  $ nasm -f elf tiny.asm
  $ gcc -Wall -s -nostartfiles tiny.o
  $ ./a.out ; echo $?
  42

   Success at last! And now how big is it?

  $ wc -c a.out
     1340 a.out

   Almost half the size! Not bad. Not bad at all. Hmmm ... so what other
   interesting obscure options does gcc have?

   Well, this one, appearing immediately after -nostartfiles in the
   documentation, is certainly eye-catching:

     -nostdlib
     Don't use the standard system libraries and startup files when
     linking. Only the files you specify will be passed to the linker.

   That's gotta be worth investigating:

  $ gcc -Wall -s -nostdlib tiny.o
  tiny.o(.text+0x6): undefined reference to `_exit'

   Oops. That's right ... _exit() is, after all, a library function. It
   has to be filled in from somewhere.

   Okay. But surely, we don't need libc's help just to end a program, do
   we?

   No, we don't. If we're willing to leave behind all pretenses of
   portability, we can make our program exit without having to link with
   anything else. First, though, we need to know how to make a system call
   under Linux.
     __________________________________________________________________

   Linux, like most operating systems, provides basic necessities to the
   programs it hosts via system calls. This includes things like opening a
   file, reading and writing to file handles -- and, of course, shutting
   down a process.

   The Linux system call interface is a single instruction: int 0x80. All
   system calls are done via this interrupt. To make a system call, eax
   should contain a number that indicates which system call is being
   invoked, and other registers are used to hold the arguments, if any. If
   the system call takes one argument, it will be in ebx; a system call
   with two arguments will use ebx and ecx. Likewise, edx, esi, and edi
   are used if a third, fourth, or fifth argument is required,
   respectively. Upon return from a system call, eax will contain the
   return value. If an error occurs, eax will contain a negative value,
   with the absolute value indicating the error.
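
   As a small aide-memoire, the convention can be written down in C.
   This is a sketch, not kernel code: arg_register and raw_errno are
   hypothetical helpers, and the assumption that error values fall in
   the range -4095 to -1 comes from common Linux practice rather than
   from anything stated above.

```c
#include <stddef.h>

/* Registers that carry arguments 1..5 of an int 0x80 system call. */
static const char *const arg_regs[5] = { "ebx", "ecx", "edx", "esi", "edi" };

/* Name of the register holding argument n, or NULL if n is out of range. */
static const char *arg_register(int n)
{
    return (n >= 1 && n <= 5) ? arg_regs[n - 1] : NULL;
}

/* Decode the raw value left in eax after the call.
   Assumption: values in [-4095, -1] are negated error numbers. */
static int raw_errno(long ret)
{
    return (ret < 0 && ret >= -4095) ? (int)-ret : 0;
}
```

   So a raw return of -2 decodes to error number 2, while a return of
   42 is simply the result.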

   The numbers for the different system calls are listed in
   /usr/include/asm/unistd.h. A quick peek will tell us that the exit
   system call is assigned the number 1. Like the C function, it takes one
   argument, the value to return to the parent process, and so this will
   go into ebx.

   We now know all we need to know to create the next version of our
   program, one that won't need assistance from any external functions to
   work:

  ; tiny.asm
  BITS 32
  GLOBAL _start
  SECTION .text
  _start:
                mov     eax, 1
                mov     ebx, 42
                int     0x80

   Here we go:

  $ nasm -f elf tiny.asm
  $ gcc -Wall -s -nostdlib tiny.o
  $ ./a.out ; echo $?
  42

   Ta-da! And the size?

  $ wc -c a.out
      372 a.out

   Now that's tiny! Almost a fourth the size of the previous version!

   So ... can we do anything else to make it even smaller?

   How about using shorter instructions?

   If we generate a list file for the assembly code, we'll find the
   following:

  00000000 B801000000        mov        eax, 1
  00000005 BB2A000000        mov        ebx, 42
  0000000A CD80              int        0x80

   Well, gee, we don't need to initialize all of ebx, since the operating
   system is only going to use the lowest byte. Setting bl alone will be
   sufficient, and will take two bytes instead of five.

   We can also set eax to one by xor'ing it to zero and then using a
   one-byte increment instruction; this will save two more bytes.

  00000000 31C0              xor        eax, eax
  00000002 40                inc        eax
  00000003 B32A              mov        bl, 42
  00000005 CD80              int        0x80

   I think it's pretty safe to say that we're not going to make this
   program any smaller than that.
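
   For the skeptical, the byte counts can be checked by transcribing the
   two listings into C arrays. The bytes are copied from the list file
   output above; the array names and bytes_saved are just illustrative.

```c
#include <stddef.h>

/* First version: two full 32-bit immediate loads. */
static const unsigned char exit42_long[] = {
    0xB8, 0x01, 0x00, 0x00, 0x00,   /* mov eax, 1   */
    0xBB, 0x2A, 0x00, 0x00, 0x00,   /* mov ebx, 42  */
    0xCD, 0x80                      /* int 0x80     */
};

/* Second version: xor/inc for eax, byte-sized load for bl. */
static const unsigned char exit42_short[] = {
    0x31, 0xC0,                     /* xor eax, eax */
    0x40,                           /* inc eax      */
    0xB3, 0x2A,                     /* mov bl, 42   */
    0xCD, 0x80                      /* int 0x80     */
};

static size_t bytes_saved(void)
{
    return sizeof exit42_long - sizeof exit42_short;
}
```

   That is twelve bytes down to seven, a saving of five.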

   As an aside, we might as well stop using gcc to link our executable,
   seeing as we're not using any of its added functionality, and just call
   the linker, ld, ourselves:

  $ nasm -f elf tiny.asm
  $ ld -s tiny.o
  $ ./a.out ; echo $?
  42
  $ wc -c a.out
      368 a.out

   Four bytes smaller. (Hey! Didn't we shave five bytes off? Well, we did,
   but alignment considerations within the ELF file caused it to require
   an extra byte of padding.)
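
   The arithmetic works out if we assume that something after the code
   has to start on a four-byte boundary. That boundary is a guess on my
   part (the text above only says "alignment considerations"), and
   round_up is a hypothetical helper, but the numbers fit:

```c
#include <stddef.h>

/* Round n up to the next multiple of align (align must be nonzero). */
static size_t round_up(size_t n, size_t align)
{
    return (n + align - 1) / align * align;
}
```

   With four-byte alignment, the old twelve-byte program occupies 12
   bytes and the new seven-byte one occupies 8, so the file shrinks by
   four bytes rather than five.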

   So ... have we reached the end? Is this as small as we can go?

   Well, hm. Our program is now seven bytes long. Do ELF files really
   require 361 bytes of overhead? What's in this file, anyway?

   We can peek into the contents of the file using objdump:

  $ objdump -x a.out | less

   The output may look like gibberish, but right now let's just focus on
   the list of sections:

  Sections:
  Idx Name          Size      VMA       LMA       File off  Algn
    0 .text         00000007  08048080  08048080  00000080  2**4
                    CONTENTS, ALLOC, LOAD, READONLY, CODE
    1 .comment      0000001c  00000000  00000000  00000087  2**0
                    CONTENTS, READONLY

   The complete .text section is listed as being seven bytes long, just as
   we specified. So it seems safe to conclude that we now have complete
   control of the machine-language content of our program.

   But then there's this other section named ".comment". Who ordered that?
   And it's 28 bytes long, even! We may not be sure what this .comment
   section is, but it seems a good bet that it isn't a necessary
   feature....

   The .comment section is listed as being located at file offset 00000087
   (hexadecimal). If we use a hexdump program to look at that area of the
   file, we will see:

  00000080: 31C0 40B3 2ACD 8000 5468 6520 4E65 7477  1.@.*...The Netw
  00000090: 6964 6520 4173 7365 6D62 6C65 7220 302E  ide Assembler 0.
  000000A0: 3938 0000 2E73 796D 7461 6200 2E73 7472  98...symtab..str

   Well, well, well. Who'd've thought that Nasm would undermine our quest
   like this? Maybe we should switch to using gas, AT&T syntax
   notwithstanding....

   Alas, if we do:

  ; tiny.s
  .globl _start
  .text
  _start:
                xorl    %eax, %eax
                incl    %eax
                movb    $42, %bl
                int     $0x80

   ... we will find:

  $ gcc -s -nostdlib tiny.s
  $ ./a.out ; echo $?
  42
  $ wc -c a.out
      368 a.out

   ... no difference!

   Well, actually there is some difference. Turning once again to objdump,
   we see:

  Sections:
  Idx Name          Size      VMA       LMA       File off  Algn
    0 .text         00000007  08048074  08048074  00000074  2**2
                    CONTENTS, ALLOC, LOAD, READONLY, CODE
    1 .data         00000000  0804907c  0804907c  0000007c  2**2
                    CONTENTS, ALLOC, LOAD, DATA
    2 .bss          00000000  0804907c  0804907c  0000007c  2**2
                    ALLOC

   No comment section, but now we have two useless sections for storing
   our nonexistent data. And even though these sections are zero bytes
   long, they incur overhead, bringing our file size up for no good
   reason.

   Okay, so just what is all this overhead, and how do we get rid of it?

   Well, to answer these questions, we must begin diving into some real
   wizardry. We need to understand the ELF format.
     __________________________________________________________________

   The canonical document describing the ELF format for Intel-386
   architectures can be found at
   [2]http://refspecs.linuxbase.org/elf/elf.pdf. (You can also find a
   flat-text version of version 1.0 of the standard at
   [3]http://www.muppetlabs.com/~breadbox/software/ELF.txt.) This
   specification covers a lot of territory, so if you'd prefer to not read
   the whole thing yourself, I'll understand. Basically, here's what we
   need to know:

   Every ELF file begins with a structure called the ELF header. This
   structure is 52 bytes long, and contains several pieces of information
   that describe the contents of the file. For example, the first sixteen
   bytes contain an "identifier", which includes the file's magic-number
   signature (7F 45 4C 46), and some one-byte flags indicating that the
   contents are 32-bit or 64-bit, little-endian or big-endian, etc. Other
   fields in the ELF header contain information such as: the target
   architecture; whether the ELF file is an executable, an object file, or
   a shared-object library; the program's starting address; and the
   locations within the file of the program header table and the section
   header table.
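
   Here is a rough C sketch of how one might read those first sixteen
   bytes. The struct and function names are invented for this example;
   the byte positions and values (class at index 4 with 1 for 32-bit and
   2 for 64-bit, data encoding at index 5 with 1 for little-endian and 2
   for big-endian) are the ones given in the ELF specification.

```c
#include <stddef.h>
#include <string.h>

enum { IDENT_SIZE = 16 };

struct ident_info {
    int is_elf;         /* 1 if the magic number matched */
    int bits;           /* 32, 64, or 0 if unrecognized */
    int little_endian;  /* 1, 0, or -1 if unrecognized */
};

static struct ident_info read_ident(const unsigned char id[IDENT_SIZE])
{
    static const unsigned char magic[4] = { 0x7F, 'E', 'L', 'F' };
    struct ident_info info = { 0, 0, -1 };

    info.is_elf = memcmp(id, magic, sizeof magic) == 0;
    if (id[4] == 1)
        info.bits = 32;
    else if (id[4] == 2)
        info.bits = 64;
    if (id[5] == 1)
        info.little_endian = 1;
    else if (id[5] == 2)
        info.little_endian = 0;
    return info;
}
```

   Feeding it the bytes 7F 45 4C 46 01 01 01 followed by zeros should
   report a 32-bit, little-endian ELF file.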

   These two tables can appear anywhere in the file, but typically the
   former appears immediately following the ELF header, and the latter
   appears at or near the end of the file. The two tables serve similar
   purposes, in that they identify the component parts of the file.
   However, the section header table focuses more on identifying where the
   various parts of the program are within the file, while the program
   header table describes where and how these parts are to be loaded into
   memory. In brief, the section header table is for use by the compiler
   and linker, while the program header table is for use by the program
   loader. The program header table is optional for object files, and in
   practice is never present. Likewise, the section header table is
   optional for executables -- but is almost always present!

   So, this is the answer to our first question. A fair piece of the
   overhead in our program is a completely unnecessary section header
   table, and maybe some equally useless sections that don't contribute to
   our program's memory image.

   So, we turn to our second question: how do we go about getting rid of
   all that?

   Alas, we're on our own here. None of the standard tools will deign to
   make an executable without a section header table of some kind. If we
   want such a thing, we'll have to do it ourselves.

   This doesn't quite mean that we have to pull out a binary editor and
   code the hexadecimal values by hand, though. Good old Nasm has a flat
   binary output format, which will serve us well. All we need now is the
   image of an empty ELF executable, which we can fill in with our
   program. Our program, and nothing else.

   We can look at the ELF specification, and /usr/include/linux/elf.h, and
   executables created by the standard tools, to figure out what our empty
   ELF executable should look like. But, if you're the impatient type, you
   can just use the one I've supplied here:

  BITS 32

                org     0x08048000

  ehdr:                                                 ; Elf32_Ehdr
                db      0x7F, "ELF", 1, 1, 1, 0         ;   e_ident
        times 8 db      0
                dw      2                               ;   e_type
                dw      3                               ;   e_machine
                dd      1                               ;   e_version
                dd      _start                          ;   e_entry
                dd      phdr - $$                       ;   e_phoff
                dd      0                               ;   e_shoff
                dd      0                               ;   e_flags
                dw      ehdrsize                        ;   e_ehsize
                dw      phdrsize                        ;   e_phentsize
                dw      1                               ;   e_phnum
                dw      0                               ;   e_shentsize
                dw      0                               ;   e_shnum
                dw      0                               ;   e_shstrndx

  ehdrsize      equ     $ - ehdr

  phdr:                                                 ; Elf32_Phdr
                dd      1                               ;   p_type
                dd      0                               ;   p_offset
                dd      $$                              ;   p_vaddr
                dd      $$                              ;   p_paddr
                dd      filesize                        ;   p_filesz
                dd      filesize                        ;   p_memsz
                dd      5                               ;   p_flags
                dd      0x1000                          ;   p_align

  phdrsize      equ     $ - phdr

  _start:

  ; your program here

  filesize      equ     $ - $$

   This image contains an ELF header, identifying the file as an Intel 386
   executable, with no section header table and a program header table
   containing one entry. Said entry instructs the program loader to load
   the entire file into memory (it's normal behavior for a program to
   include its ELF header and program header table in its memory image)
   starting at memory address 0x08048000 (which is the default address for
   executables to load), and to begin executing the code at _start, which
   appears immediately after the program header table. No .data segment,
   no .bss segment, no commentary -- nothing but the bare necessities.
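
   If you'd like to double-check the sizes involved, the two structures
   can be mirrored in C. This is a sketch with made-up struct names
   (the real definitions live in the system's elf.h), and it assumes a
   compiler that adds no padding between these naturally aligned
   fields, which is the usual case.

```c
#include <stddef.h>
#include <stdint.h>

struct ehdr32 {                 /* mirrors Elf32_Ehdr */
    unsigned char e_ident[16];
    uint16_t e_type;
    uint16_t e_machine;
    uint32_t e_version;
    uint32_t e_entry;
    uint32_t e_phoff;
    uint32_t e_shoff;
    uint32_t e_flags;
    uint16_t e_ehsize;
    uint16_t e_phentsize;
    uint16_t e_phnum;
    uint16_t e_shentsize;
    uint16_t e_shnum;
    uint16_t e_shstrndx;
};

struct phdr32 {                 /* mirrors Elf32_Phdr */
    uint32_t p_type;
    uint32_t p_offset;
    uint32_t p_vaddr;
    uint32_t p_paddr;
    uint32_t p_filesz;
    uint32_t p_memsz;
    uint32_t p_flags;
    uint32_t p_align;
};

/* File offset at which _start lands in the template above. */
static size_t headers_size(void)
{
    return sizeof(struct ehdr32) + sizeof(struct phdr32);
}
```

   The ELF header comes to 52 bytes and the program header entry to 32,
   so in this layout our code begins at file offset 84.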

   So, let's add in our little program:

  ; tiny.asm
                org     0x08048000

  ;
  ; (as above)
  ;


  _start:
                mov     bl, 42
                xor     eax, eax
                inc     eax
                int     0x80

  filesize      equ     $ - $$

   and try it out:

  $ nasm -f bin -o a.out tiny.asm
  $ chmod +x a.out
  $ ./a.out ; echo $?
  42

   We have just created an executable completely from scratch. How about
   that? And now, take a look at its size:

  $ wc -c a.out
       91 a.out

   Ninety-one bytes. Less than one-fourth the size of our previous
   attempt, and less than one-fortieth the size of our first!

   What's more, this time we can account for every last byte. We know
   exactly what's in the executable, and why it needs to be there. This
   is, finally, the limit. We can't get any smaller than this.

   Or can we?
     __________________________________________________________________

   Well, if you actually stopped to read the ELF specification, you might
   have noticed a couple of facts. 1) The different parts of an ELF file
   are permitted to be located anywhere (except the ELF header, which must
   be at the top of the file), and they can even overlap each other. 2)
   Some of the fields in the headers aren't actually used.

   In particular, I'm thinking of that string of zeros at the end of the
   16-byte identification field. They are pure padding, to make room for
   future expansion of the ELF standard. So the OS shouldn't care at all
   what's in there. And we're already loading everything into memory
   anyway, and our program is only seven bytes long....

   Can we put our code inside the ELF header itself?

   Why not?

  ; tiny.asm

  BITS 32

                org     0x08048000

  ehdr:                                                 ; Elf32_Ehdr
                db      0x7F, "ELF"                     ;   e_ident
                db      1, 1, 1, 0, 0
  _start:       mov     bl, 42
                xor     eax, eax
                inc     eax
                int     0x80
                dw      2                               ;   e_type
                dw      3                               ;   e_machine
                dd      1                               ;   e_version
                dd      _start                          ;   e_entry
                dd      phdr - $$                       ;   e_phoff
                dd      0                               ;   e_shoff
                dd      0                               ;   e_flags
                dw      ehdrsize                        ;   e_ehsize
                dw      phdrsize                        ;   e_phentsize
                dw      1                               ;   e_phnum
                dw      0                               ;   e_shentsize
                dw      0                               ;   e_shnum
                dw      0                               ;   e_shstrndx

  ehdrsize      equ     $ - ehdr

  phdr:                                                 ; Elf32_Phdr
                dd      1                               ;   p_type
                dd      0                               ;   p_offset
                dd      $$                              ;   p_vaddr
                dd      $$                              ;   p_paddr
                dd      filesize                        ;   p_filesz
                dd      filesize                        ;   p_memsz
                dd      5                               ;   p_flags
                dd      0x1000                          ;   p_align

  phdrsize      equ     $ - phdr

  filesize      equ     $ - $$

   After all, bytes are bytes!

  $ nasm -f bin -o a.out tiny.asm
  $ chmod +x a.out
  $ ./a.out ; echo $?
  42
  $ wc -c a.out
       84 a.out

   Not bad, eh?

   Now we've really gone as low as we can go. Our file is exactly as long
   as one ELF header and one program header table entry, both of which we
   absolutely require in order to get loaded into memory and run. So
   there's nothing left to reduce now!

   Except ...

   Well, what if we could do the same thing to the program header table
   that we just did to the program? Have it overlap with the ELF header,
   that is. Is it possible?

   It is indeed. Take a look at our program. Note that the last eight
   bytes in the ELF header bear a certain kind of resemblance to the first
   eight bytes in the program header table. A certain kind of resemblance
   that might be described as "identical".

   So ...

  ; tiny.asm

  BITS 32

                org     0x08048000

  ehdr:
                db      0x7F, "ELF"             ; e_ident
                db      1, 1, 1, 0, 0
  _start:       mov     bl, 42
                xor     eax, eax
                inc     eax
                int     0x80
                dw      2                       ; e_type
                dw      3                       ; e_machine
                dd      1                       ; e_version
                dd      _start                  ; e_entry
                dd      phdr - $$               ; e_phoff
                dd      0                       ; e_shoff
                dd      0                       ; e_flags
                dw      ehdrsize                ; e_ehsize
                dw      phdrsize                ; e_phentsize
  phdr:         dd      1                       ; e_phnum       ; p_type
                                                ; e_shentsize
                dd      0                       ; e_shnum       ; p_offset
                                                ; e_shstrndx
  ehdrsize      equ     $ - ehdr
                dd      $$                                      ; p_vaddr
                dd      $$                                      ; p_paddr
                dd      filesize                                ; p_filesz
                dd      filesize                                ; p_memsz
                dd      5                                       ; p_flags
                dd      0x1000                                  ; p_align
  phdrsize      equ     $ - phdr

  filesize      equ     $ - $$

   And sure enough, Linux doesn't mind our parsimony one bit:

  $ nasm -f bin -o a.out tiny.asm
  $ chmod +x a.out
  $ ./a.out ; echo $?
  42
  $ wc -c a.out
       76 a.out
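
   The eight-byte figure can be confirmed mechanically. The following C
   sketch (overlap is a hypothetical helper, and the two arrays are
   hand-transcribed from the listing, in little-endian byte order) finds
   the longest run where the tail of the ELF header matches the head of
   the program header table entry.

```c
#include <stddef.h>
#include <string.h>

/* Longest k such that the last k bytes of a equal the first k bytes of b. */
static size_t overlap(const unsigned char *a, size_t alen,
                      const unsigned char *b, size_t blen)
{
    size_t k = alen < blen ? alen : blen;

    for (; k > 0; k--)
        if (memcmp(a + alen - k, b, k) == 0)
            return k;
    return 0;
}

/* Last twelve bytes of the ELF header: e_ehsize (52), e_phentsize (32),
   e_phnum (1), e_shentsize, e_shnum, e_shstrndx (all 0). */
static const unsigned char ehdr_tail[12] = {
    0x34, 0x00, 0x20, 0x00, 0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
};

/* First twelve bytes of the program header: p_type (1), p_offset (0),
   p_vaddr (0x08048000). */
static const unsigned char phdr_head[12] = {
    0x01, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x80, 0x04, 0x08
};
```

   Running overlap() over these two arrays gives 8, which is exactly
   the difference between the 84-byte and 76-byte versions.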

   Now we've really gone as low as we can go. There's no way to overlap
   the two structures any more than this. The bytes simply don't match up.
   This is the end of the line!

   Unless, that is, we could change the contents of the structures to make
   them match even further....

   How many of these fields is Linux actually looking at, anyway? For
   example, does Linux actually check to see if the e_machine field
   contains 3 (indicating an Intel 386 target), or is it just assuming
   that it does?

   As a matter of fact, in that case it does. But a surprising number of
   other fields are being quietly ignored.

   So: Here's what is and isn't essential in the ELF header. The first
   four bytes have to contain the magic number, or else Linux won't touch
   it. The remaining bytes in the e_ident field are not checked,
   however, which means we have no less than twelve contiguous bytes we
   can set to anything at all. e_type has to be set to 2, to indicate an
   executable, and e_machine has to be 3, as just noted. e_version is,
   like the version number inside e_ident, completely ignored. (Which is
   sort of understandable, seeing as currently there's only one version of
   the ELF standard.) e_entry naturally has to be valid, since it points
   to the start of the program. And clearly, e_phoff needs to contain the
   correct offset of the program header table in the file, and e_phnum
   needs to contain the right number of entries in said table. e_flags,
   however, is documented as being currently unused for Intel, so it
   should be free for us to reuse. e_ehsize is supposed to be used to
   verify that the ELF header has the expected size, but Linux pays it no
   mind. e_phentsize is likewise for validating the size of the program
   header table entries. This one was unchecked in older kernels, but now
   it needs to be set correctly. Everything else in the ELF header is
   about the section header table, which doesn't come into play with
   executable files.
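
   To make that checklist concrete, here is a minimal sketch in Python
   that decodes the 52-byte header and tests only the fields called
   essential above. It is a toy model, not the kernel's own validation
   code: the names EHDR, parse_ehdr, looks_loadable and make, the entry
   address, and the 0xAA filler bytes are all made up for illustration.

```python
import struct

# ELF32 header: e_ident[16] followed by thirteen fields, 52 bytes total.
EHDR = struct.Struct("<16sHHIIIIIHHHHHH")
FIELDS = ("e_ident e_type e_machine e_version e_entry e_phoff e_shoff "
          "e_flags e_ehsize e_phentsize e_phnum e_shentsize e_shnum "
          "e_shstrndx").split()

def parse_ehdr(data):
    """Decode the first 52 bytes of data into a dict of header fields."""
    return dict(zip(FIELDS, EHDR.unpack_from(data)))

def looks_loadable(h):
    """Hypothetical check covering only the fields the text calls essential."""
    return (h["e_ident"][:4] == b"\x7fELF"
            and h["e_type"] == 2           # executable
            and h["e_machine"] == 3        # Intel 386
            and h["e_phentsize"] == 0x20   # size of one table entry
            and h["e_phnum"] >= 1)         # at least one entry

def make(ident_rest, machine, version, ehsize):
    """Pack a header; the entry address 0x08048054 is just an example."""
    return EHDR.pack(b"\x7fELF" + ident_rest, 2, machine, version,
                     0x08048054, 52, 0, 0, ehsize, 32, 1, 0, 0, 0)

clean = make(b"\x01\x01\x01" + bytes(9), 3, 1, 52)
dirty = make(b"\xAA" * 12, 3, 0xDEADBEEF, 0)    # ignored fields trashed
wrong = make(b"\xAA" * 12, 62, 1, 52)           # e_machine is not 3

for name, blob in (("clean", clean), ("dirty", dirty), ("wrong", wrong)):
    print(name, looks_loadable(parse_ehdr(blob)))
```

   The clean and dirty headers differ only in bytes listed above as
   ignored, so the sketch treats them alike; the third differs in
   e_machine and is turned away.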

   And now how about the program header table entry? Well, p_type has to
   contain 1, to mark it as a loadable segment. p_offset really needs to
   have the correct file offset to start loading. Likewise, p_vaddr needs
   to contain the proper load address. Note, however, that we're not
   required to load at 0x08048000. Almost any address can be used as long
   as it's above 0x00000000, below 0x80000000, and page-aligned. The
   p_paddr field is documented as being ignored, so that's guaranteed to
   be free. p_filesz indicates how many bytes to load out of the file into
   memory, and p_memsz indicates how large the memory segment needs to be,
   so these numbers ought to be relatively sane. p_flags indicates what
   permissions to give the memory segment. It needs to be readable (4), or
   it won't be usable at all, and it needs to also be executable (1), or
   else we can't execute code in it. Other bits can probably be set as
   well, but we need to have those at minimum. Finally, p_align gives the
   alignment requirements for the memory segment. This field is mainly
   used when relocating segments containing position-independent code (as
   for shared libraries), so for an executable file Linux will ignore
   whatever garbage we store here.
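
   The constraints on the program header entry can be sketched in a few
   lines of Python. This is only an illustration of the rules as this
   paragraph states them, not a description of the loader itself. The
   helper names are hypothetical, and the 4096-byte page size is an
   assumption taken from the 0x1000 stored in p_align.

```python
import struct

PHDR = struct.Struct("<8I")      # eight 32-bit words, 32 bytes in all
NAMES = ("p_type p_offset p_vaddr p_paddr "
         "p_filesz p_memsz p_flags p_align").split()
PF_X, PF_W, PF_R = 1, 2, 4       # permission bits in p_flags

def parse_phdr(data, offset=0):
    """Decode one program header table entry starting at offset."""
    return dict(zip(NAMES, PHDR.unpack_from(data, offset)))

def vaddr_ok(addr, page=0x1000):
    """The text's rule: above 0, below 0x80000000, and page-aligned."""
    return 0 < addr < 0x80000000 and addr % page == 0

def flags_ok(flags):
    """Readable and executable at minimum; other bits are allowed."""
    return (flags & (PF_R | PF_X)) == (PF_R | PF_X)

# An example entry; p_paddr is filled with junk since it is ignored.
entry = PHDR.pack(1, 0, 0x08048000, 0xFFFFFFFF, 76, 76, 5, 0x1000)
p = parse_phdr(entry)
print(hex(p["p_vaddr"]), vaddr_ok(p["p_vaddr"]), flags_ok(p["p_flags"]))
```

   Addresses such as 0x00200000 or 0x00010000 satisfy vaddr_ok just as
   well as the customary 0x08048000 does.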

   All in all, that's a fair bit of leeway. In particular, a bit of
   scrutiny will reveal that most of the necessary fields in the ELF
   header are in the first half - the second half is almost completely
   free for munging. With this in mind, we can interpose the two
   structures quite a bit more than we did previously:

  ; tiny.asm

  BITS 32

                org     0x00200000

                db      0x7F, "ELF"             ; e_ident
                db      1, 1, 1, 0, 0
  _start:
                mov     bl, 42
                xor     eax, eax
                inc     eax
                int     0x80
                dw      2                       ; e_type
                dw      3                       ; e_machine
                dd      1                       ; e_version
                dd      _start                  ; e_entry
                dd      phdr - $$               ; e_phoff
  phdr:         dd      1                       ; e_shoff       ; p_type
                dd      0                       ; e_flags       ; p_offset
                dd      $$                      ; e_ehsize      ; p_vaddr
                                                ; e_phentsize
                dw      1                       ; e_phnum       ; p_paddr
                dw      0                       ; e_shentsize
                dd      filesize                ; e_shnum       ; p_filesz
                                                ; e_shstrndx
                dd      filesize                                ; p_memsz
                dd      5                                       ; p_flags
                dd      0x1000                                  ; p_align

  filesize      equ     $ - $$

   As you can (hopefully) see, the first twenty bytes of the program
   header table now overlap the last twenty bytes of the ELF header. The
   two dovetail quite nicely, actually. There are only two parts of the
   ELF header within the overlapped region that matter. The first is the
   e_phnum field, which just happens to coincide with the p_paddr field,
   one of the few fields in the program header table which is definitely
   ignored. The other is the e_phentsize field, which coincides with the
   top half of the p_vaddr field. These are made to match up by selecting
   a non-standard load address for our program, with a top half equal to
   0x0020.
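
   The overlap is easier to believe when the same bytes are decoded both
   ways. The following sketch rebuilds the 64-byte image in Python and
   reads offsets 40 through 47 once as ELF header fields and once as
   program header fields. The seven machine-code bytes are assumed to be
   what nasm emits for the four instructions; they are not taken from
   the listing itself.

```python
import struct

ORG = 0x00200000
# Assumed encoding: mov bl,42 / xor eax,eax / inc eax / int 0x80
CODE = bytes.fromhex("b32a31c040cd80")

image = b"\x7fELF" + bytes([1, 1, 1, 0, 0]) + CODE     # e_ident
image += struct.pack("<HHIII", 2, 3, 1, ORG + 9, 32)   # e_type .. e_phoff
image += struct.pack("<III", 1, 0, ORG)                # p_type .. p_vaddr
image += struct.pack("<HH", 1, 0)                      # e_phnum, e_shentsize
image += struct.pack("<IIII", 64, 64, 5, 0x1000)       # p_filesz .. p_align

# ELF header view: e_ehsize, e_phentsize, e_phnum live at offsets 40-45.
e_ehsize, e_phentsize, e_phnum = struct.unpack_from("<HHH", image, 40)

# Program header view: the table starts at 32, so p_vaddr is at 40
# and p_paddr at 44.
p_vaddr, p_paddr = struct.unpack_from("<II", image, 32 + 8)

print(len(image), hex(e_phentsize), e_phnum, hex(p_vaddr), p_paddr)
```

   The upper sixteen bits of p_vaddr are the very bytes that the ELF
   header view reads as e_phentsize, which is why the load address had
   to change.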

   Now we have really left behind all pretenses of portability ...

  $ nasm -f bin -o a.out tiny.asm
  $ chmod +x a.out
  $ ./a.out ; echo $?
  42
  $ wc -c a.out
       64 a.out

   ... but it works! And the program is twelve bytes shorter, exactly as
   predicted.

   This is where I say that we can't do any better than this, but of
   course, we already know that we can -- if we could get the program
   header table to reside completely within the ELF header. Can this holy
   grail be achieved?

   Well, we can't just move it up another twelve bytes without hitting
   hopeless obstacles trying to reconcile several fields in both
   structures. The only other possibility would be to have it start
   immediately following the first four bytes. This puts the first part of
   the program header table comfortably within the e_ident area, but still
   leaves problems with the rest of it. After some experimenting, it looks
   like it isn't quite going to be possible.

   However, it turns out that there are still a couple more fields in the
   program header table that we can pervert.

   We noted that p_memsz indicates how much memory to allocate for the
   memory segment. Obviously it needs to be at least as big as p_filesz,
   but there wouldn't be any harm if it was larger. Just because we ask
   for memory doesn't mean we have to use it, after all.

   Secondly, contrary to all my expectations, the executable bit can be
   dropped from the p_flags field. The readable and executable bits turn
   out to be redundant: either one will imply the other.

   So, with these facts in mind, we can reorganize the file into this
   little monstrosity:

  ; tiny.asm

  BITS 32

                org     0x00010000

                db      0x7F, "ELF"             ; e_ident
                dd      1                                       ; p_type
                dd      0                                       ; p_offset
                dd      $$                                      ; p_vaddr
                dw      2                       ; e_type        ; p_paddr
                dw      3                       ; e_machine
                dd      _start                  ; e_version     ; p_filesz
                dd      _start                  ; e_entry       ; p_memsz
                dd      4                       ; e_phoff       ; p_flags
  _start:
                mov     bl, 42                  ; e_shoff       ; p_align
                xor     eax, eax
                inc     eax                     ; e_flags
                int     0x80
                db      0
                dw      0x34                    ; e_ehsize
                dw      0x20                    ; e_phentsize
                dw      1                       ; e_phnum
                dw      0                       ; e_shentsize
                dw      0                       ; e_shnum
                dw      0                       ; e_shstrndx

  filesize      equ     $ - $$

   The p_flags field has been changed from 5 to 4, as we noted we could
   get away with doing. This 4 is also the value of the e_phoff field,
   which gives the offset into the file for the program header table,
   which is exactly where we've located it. The program (remember that?)
   has been moved down to the lower part of the ELF header, beginning at
   the e_shoff field and ending inside the e_flags field.
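
   For anyone who would rather check the layout than take it on faith,
   here is a rough Python sketch that assembles the same 52 bytes by
   hand and decodes them under both interpretations. As an assumption,
   the program's seven bytes are written as the encoding nasm would be
   expected to produce; everything else follows the listing.

```python
import struct

ORG = 0x00010000
START = ORG + 32                         # _start sits at file offset 32
# Assumed encoding: mov bl,42 / xor eax,eax / inc eax / int 0x80
CODE = bytes.fromhex("b32a31c040cd80")

image = b"\x7fELF"                                      # magic number
image += struct.pack("<III", 1, 0, ORG)                 # p_type .. p_vaddr
image += struct.pack("<HH", 2, 3)                       # e_type, e_machine
image += struct.pack("<III", START, START, 4)           # e_version .. e_phoff
image += CODE + b"\x00"                                 # e_shoff, e_flags
image += struct.pack("<HHHHHH", 0x34, 0x20, 1, 0, 0, 0) # e_ehsize onward

ehdr = struct.unpack_from("<16sHHIIIIIHHHHHH", image, 0)
e_entry, e_phoff = ehdr[4], ehdr[5]
phdr = struct.unpack_from("<8I", image, e_phoff)        # table at offset 4
p_filesz, p_memsz, p_flags = phdr[4], phdr[5], phdr[6]

first = image.index(CODE)
last = first + len(CODE) - 1

print("size    ", len(image))
print("e_phoff ", e_phoff, " p_flags", p_flags)
print("e_entry ", hex(e_entry), " p_memsz", hex(p_memsz))
print("code at ", first, "to", last)
```

   The code runs from offset 32 to offset 38; e_shoff covers 32 through
   35 and e_flags covers 36 through 39, which matches the description
   above.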

   Note that the load address has been changed to a much lower number --
   about as low as it can be, in fact. This keeps the value in the e_entry
   field to a reasonably small number, which is good since it's also the
   p_memsz number. (Actually, with virtual memory it hardly matters -- we
   could have left it at our original value and it would work just as
   well. But there's no harm in being polite.)

   The change to p_filesz may require an explanation. Because we aren't
   setting the write bit in the p_flags field, Linux won't let us define a
   p_memsz value greater than p_filesz, since it can't zero-initialize
   those extra bytes if they aren't writeable. Since we can't change the
   p_flags field without moving the program header table out of alignment,
   you might think that the only solution would be to lower the p_memsz
   value back down to equal p_filesz (which would make it impossible to
   share it with e_entry). However, another solution exists, namely to
   increase p_filesz to equal p_memsz. That means they're both larger than
   the real file size -- quite a bit larger, in fact -- but it absolves
   the loader from having to write to read-only memory, which is all it
   cared about.
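
   The reasoning can be boiled down to a tiny predicate. This Python
   sketch is a toy model of the rule as described in this text, and not
   a transcription of what the kernel actually does; segment_acceptable,
   REAL_SIZE and SHARED are names invented for the example.

```python
PF_X, PF_W, PF_R = 1, 2, 4   # p_flags permission bits

def segment_acceptable(p_filesz, p_memsz, p_flags):
    """Toy model of the rule as the text describes it, not kernel code."""
    if p_memsz < p_filesz:
        return False                      # memory must hold the file bytes
    needs_zero_fill = p_memsz > p_filesz  # extra bytes must be zeroed
    writable = bool(p_flags & PF_W)
    return writable or not needs_zero_fill

REAL_SIZE = 52          # actual length of the file
SHARED = 0x00010020     # value of e_entry, reused for both size fields

# p_memsz larger than p_filesz on a read-only segment
print(segment_acceptable(REAL_SIZE, SHARED, PF_R))
# the same sizes with the write bit set
print(segment_acceptable(REAL_SIZE, SHARED, PF_R | PF_W))
# p_filesz raised to equal p_memsz, still read-only
print(segment_acceptable(SHARED, SHARED, PF_R))
```

   Only the first case is rejected by this model. The third case is the
   one used here, since the write bit can't be set without disturbing
   e_phoff.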

   And so ...

  $ nasm -f bin -o a.out tiny.asm
  $ chmod +x a.out
  $ ./a.out ; echo $?
  42
  $ wc -c a.out
       52 a.out

   ... and so, with both the program header table and the program itself
   completely embedded within the ELF header, our executable file is now
   exactly as big as the ELF header! No more, no less. And still running
   without a single complaint from Linux!

   Now, finally, we have truly and certainly reached the absolute minimum
   possible. There can be no question about it, right? After all, we have
   to have a complete ELF header (even if it is badly mangled), or else
   Linux wouldn't give us the time of day!

   Right?

   Wrong. We have one last dirty trick left.

   It seems to be the case that if the file isn't quite the size of a full
   ELF header, Linux will still play ball, and fill out the missing bytes
   with zeros. We have no less than seven zeros at the end of our file,
   and if we drop them from the file image:

  ; tiny.asm

  BITS 32

                org     0x00010000

                db      0x7F, "ELF"             ; e_ident
                dd      1                                       ; p_type
                dd      0                                       ; p_offset
                dd      $$                                      ; p_vaddr
                dw      2                       ; e_type        ; p_paddr
                dw      3                       ; e_machine
                dd      _start                  ; e_version     ; p_filesz
                dd      _start                  ; e_entry       ; p_memsz
                dd      4                       ; e_phoff       ; p_flags
  _start:
                mov     bl, 42                  ; e_shoff       ; p_align
                xor     eax, eax
                inc     eax                     ; e_flags
                int     0x80
                db      0
                dw      0x34                    ; e_ehsize
                dw      0x20                    ; e_phentsize
                db      1                       ; e_phnum
                                                ; e_shentsize
                                                ; e_shnum
                                                ; e_shstrndx

  filesize      equ     $ - $$

   ... we can, incredibly enough, still produce a working executable:

  $ nasm -f bin -o a.out tiny.asm
  $ chmod +x a.out
  $ ./a.out ; echo $?
  42
  $ wc -c a.out
       45 a.out

   Here, at last, we have honestly gone as far as we can go. There is no
   getting around the fact that the 45th byte in the file, which specifies
   the number of entries in the program header table, needs to be
   non-zero, needs to be present, and needs to be in the 45th position
   from the start of the ELF header. We are forced to conclude that there
   is nothing more that can be done.
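
   As a sanity check on the byte counting, the short Python sketch below
   writes the 45 bytes out by hand, pads them with zeros to the full
   header size the way Linux is described as doing, and reads back the
   last four fields. The hex dump assumes the usual encodings for the
   four instructions, so treat it as an illustration and not as a
   substitute for running nasm.

```python
import struct

# The 45-byte file, transcribed by hand from the listing above.
short = bytes.fromhex(
    "7f454c46"            # magic number
    "01000000"            # p_type
    "00000000"            # p_offset
    "00000100"            # p_vaddr = 0x00010000
    "0200" "0300"         # e_type, e_machine
    "20000100"            # e_version / p_filesz
    "20000100"            # e_entry / p_memsz
    "04000000"            # e_phoff / p_flags
    "b32a31c040cd80"      # the program (assumed encoding)
    "00"                  # last byte of e_flags
    "3400" "2000"         # e_ehsize, e_phentsize
    "01"                  # low byte of e_phnum
)

# Model of the behaviour described in the text: missing bytes read as zero.
padded = short.ljust(52, b"\x00")

e_phnum, e_shentsize, e_shnum, e_shstrndx = struct.unpack_from(
    "<4H", padded, 44)
print(len(short), len(padded), e_phnum, e_shentsize, e_shnum, e_shstrndx)
```

   Index 44, counting from zero, is the 45th byte, and it is the only
   non-zero byte from there to the end of the padded header.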
     __________________________________________________________________

   This forty-five-byte file is less than one-eighth the size of the
   smallest ELF executable we could create using the standard tools, and
   is less than one-fiftieth the size of the smallest file we could create
   using pure C code. We have stripped everything out of the file that we
   could, and put to dual purpose most of what we couldn't.

   Of course, half of the values in this file violate some part of the ELF
   standard, and it's a wonder that Linux will even consent to sneeze on
   it, much less give it a process ID. This is not the sort of program to
   which one would normally be willing to confess authorship.

   On the other hand, every single byte in this executable file can be
   accounted for and justified. How many executables have you created
   lately that you can say that about?


     __________________________________________________________________

   [5]Tiny
   [6]Software
   [7]Brian Raiter

References

   1. http://www.nasm.us/
   2. http://refspecs.linuxbase.org/elf/elf.pdf
   3. http://www.muppetlabs.com/~breadbox/software/ELF.txt
   4. https://www.muppetlabs.com/~breadbox/software/tiny/teensyps.html
   5. http://www.muppetlabs.com/~breadbox/software/tiny/
   6. http://www.muppetlabs.com/~breadbox/software/
   7. http://www.muppetlabs.com/~breadbox/