summaryrefslogtreecommitdiff
path: root/miniany/doc/www.muppetlabs.com_~breadbox_software_tiny_teensy.txt
diff options
context:
space:
mode:
Diffstat (limited to 'miniany/doc/www.muppetlabs.com_~breadbox_software_tiny_teensy.txt')
-rw-r--r--miniany/doc/www.muppetlabs.com_~breadbox_software_tiny_teensy.txt1000
1 files changed, 1000 insertions, 0 deletions
diff --git a/miniany/doc/www.muppetlabs.com_~breadbox_software_tiny_teensy.txt b/miniany/doc/www.muppetlabs.com_~breadbox_software_tiny_teensy.txt
new file mode 100644
index 0000000..902fed3
--- /dev/null
+++ b/miniany/doc/www.muppetlabs.com_~breadbox_software_tiny_teensy.txt
@@ -0,0 +1,1000 @@
+ A Whirlwind Tutorial on Creating Really Teensy ELF Executables for Linux
+
+ (or, "Size Is Everything")
+ __________________________________________________________________
+
+ She studied it carefully for about 15 minutes. Finally, she spoke.
+ "There's something written on here," she said, frowning, "but it's
+ really teensy."
+
+ [Dave Barry, "The Columnist's Caper"]
+
+ If you're a programmer who's become fed up with software bloat, then
+ may you find herein the perfect antidote.
+
+ This document explores methods for squeezing excess bytes out of simple
+ programs. (Of course, the more practical purpose of this document is to
+ describe a few of the inner workings of the ELF file format and the
+ Linux operating system. But hopefully you can also learn something
+ about how to make really teensy ELF executables in the process.)
+
+ Please note that the information and examples given here are, for the
+ most part, specific to ELF executables on a Linux platform running
+ under an Intel x86 architecture. I imagine that a good bit of the
+ information is applicable to other ELF-based Unices, but my experiences
+ with such are too limited for me to say with certainty.
+
+ Please also note that if you aren't a little bit familiar with assembly
+ code, you may find parts of this document sort of hard to follow. (The
+ assembly code that appears in this document is written using Nasm; see
+ [1]http://www.nasm.us/.)
+ __________________________________________________________________
+
+ In order to start, we need a program. Almost any program will do, but
+ the simpler the program the better, since we're more interested in how
+ small we can make the executable than what the program does.
+
+ Let's take an incredibly simple program, one that does nothing but
+ return a number back to the operating system. Why not? After all, Unix
+ already comes with no less than two such programs: true and false.
+ Since 0 and 1 are already taken, we'll use the number 42.
+
+ So, here is our first version:
+
+ /* tiny.c */
+ int main(void) { return 42; }
+
+ which we can compile and test like so:
+
+ $ gcc -Wall tiny.c
+ $ ./a.out ; echo $?
+ 42
+
+ So. How big is it? Well, on my machine, I get:
+
+ $ wc -c a.out
+ 3998 a.out
+
+ (Yours will probably differ some.) Admittedly, that's pretty small by
+ today's standards, but it's almost certainly bigger than it needs to
+ be.
+
+ The obvious first step is to strip the executable:
+
+ $ gcc -Wall -s tiny.c
+ $ ./a.out ; echo $?
+ 42
+ $ wc -c a.out
+ 2632 a.out
+
+ That's certainly an improvement. For the next step, how about
+ optimizing?
+
+ $ gcc -Wall -s -O3 tiny.c
+ $ wc -c a.out
+ 2616 a.out
+
+ That also helped, but only just. Which makes sense: there's hardly
+ anything there to optimize.
+
+ It seems unlikely that there's much else we can do to shrink a
+ one-statement C program. We're going to have to leave C behind, and use
+ assembler instead. Hopefully, this will cut out all the extra overhead
+ that C programs automatically incur.
+
+ So, on to our second version. All we need to do is return 42 from
+ main(). In assembly language, this means that the function should set
+ the accumulator, eax, to 42, and then return:
+
+ ; tiny.asm
+ BITS 32
+ GLOBAL main
+ SECTION .text
+ main:
+ mov eax, 42
+ ret
+
+ We can then build and test like so:
+
+ $ nasm -f elf tiny.asm
+ $ gcc -Wall -s tiny.o
+ $ ./a.out ; echo $?
+ 42
+
+ (Hey, who says assembly code is difficult?) And now how big is it?
+
+ $ wc -c a.out
+ 2604 a.out
+
+ Looks like we shaved off a measly twelve bytes. So much for all the
+ extra overhead that C automatically incurs, eh?
+
+ Well, the problem is that we are still incurring a lot of overhead by
+ using the main() interface. The linker is still adding an interface to
+ the OS for us, and it is that interface that actually calls main(). So
+ how do we get around that if we don't need it?
+
+ The actual entry point that the linker uses by default is the symbol
+ with the name _start. When we link with gcc, it automatically includes
+ a _start routine, one that sets up argc and argv, among other things,
+ and then calls main().
+
+ So, let's see if we can bypass this, and define our own _start routine:
+
+ ; tiny.asm
+ BITS 32
+ GLOBAL _start
+ SECTION .text
+ _start:
+ mov eax, 42
+ ret
+
+ Will gcc do what we want?
+
+ $ nasm -f elf tiny.asm
+ $ gcc -Wall -s tiny.o
+ tiny.o(.text+0x0): multiple definition of `_start'
+ /usr/lib/crt1.o(.text+0x0): first defined here
+ /usr/lib/crt1.o(.text+0x36): undefined reference to `main'
+
+ No. Well, actually, yes it will, but first we need to learn how to ask
+ for what we want.
+
+ It so happens that gcc recognizes an option called -nostartfiles. From
+ the gcc info pages:
+
+ -nostartfiles
+ Do not use the standard system startup files when linking. The
+ standard libraries are used normally.
+
+ Aha! Now let's see what we can do:
+
+ $ nasm -f elf tiny.asm
+ $ gcc -Wall -s -nostartfiles tiny.o
+ $ ./a.out ; echo $?
+ Segmentation fault
+ 139
+
+ Well, gcc didn't complain, but the program doesn't work. What went
+ wrong?
+
+ What went wrong is that we treated _start as if it were a C function,
+ and tried to return from it. In reality, it's not a function at all.
+ It's just a symbol in the object file which the linker uses to locate
+ the program's entry point. When our program is invoked, it's invoked
+ directly. If we were to look, we would see that the value on the top of
+ the stack was the number 1, which is certainly very un-address-like. In
+ fact, what is on the stack is our program's argc value. After this
+ comes the elements of the argv array, including the terminating NULL
+ element, followed by the elements of envp. And that's all. There is no
+ return address on the stack.
+
+ So, how does _start ever exit? Well, it calls the exit() function!
+ That's what it's there for, after all.
+
+ Actually, I lied. What it really does is call the _exit() function.
+ (Notice the leading underscore.) exit() is required to finish up some
+ tasks on behalf of the process, but those tasks will never have been
+ started, because we're bypassing the library's startup code. So we need
+ to bypass the library's shutdown code as well, and go directly to the
+ operating system's shutdown processing.
+
+ So, let's try this again. We're going to call _exit(), which is a
+ function that takes a single integer argument. So all we need to do is
+ push the number onto the stack and call the function. (We also need to
+ declare _exit() as external.) Here's our assembly:
+
+ ; tiny.asm
+ BITS 32
+ EXTERN _exit
+ GLOBAL _start
+ SECTION .text
+ _start:
+ push dword 42
+ call _exit
+
+ And we build and test as before:
+
+ $ nasm -f elf tiny.asm
+ $ gcc -Wall -s -nostartfiles tiny.o
+ $ ./a.out ; echo $?
+ 42
+
+ Success at last! And now how big is it?
+
+ $ wc -c a.out
+ 1340 a.out
+
+ Almost half the size! Not bad. Not bad at all. Hmmm ... so what other
+ interesting obscure options does gcc have?
+
+ Well, this one, appearing immediately after -nostartfiles in the
+ documentation, is certainly eye-catching:
+
+ -nostdlib
+ Don't use the standard system libraries and startup files when
+ linking. Only the files you specify will be passed to the linker.
+
+ That's gotta be worth investigating:
+
+ $ gcc -Wall -s -nostdlib tiny.o
+ tiny.o(.text+0x6): undefined reference to `_exit'
+
+ Oops. That's right ... _exit() is, after all, a library function. It
+ has to be filled in from somewhere.
+
+ Okay. But surely, we don't need libc's help just to end a program, do
+ we?
+
+ No, we don't. If we're willing to leave behind all pretenses of
+ portability, we can make our program exit without having to link with
+ anything else. First, though, we need to know how to make a system call
+ under Linux.
+ __________________________________________________________________
+
+ Linux, like most operating systems, provides basic necessities to the
+ programs it hosts via system calls. This includes things like opening a
+ file, reading and writing to file handles -- and, of course, shutting
+ down a process.
+
+ The Linux system call interface is a single instruction: int 0x80. All
+ system calls are done via this interrupt. To make a system call, eax
+ should contain a number that indicates which system call is being
+ invoked, and other registers are used to hold the arguments, if any. If
+ the system call takes one argument, it will be in ebx; a system call
+ with two arguments will use ebx and ecx. Likewise, edx, esi, and edi
+ are used if a third, fourth, or fifth argument is required,
+ respectively. Upon return from a system call, eax will contain the
+ return value. If an error occurs, eax will contain a negative value,
+ with the absolute value indicating the error.
+
+ The numbers for the different system calls are listed in
+ /usr/include/asm/unistd.h. A quick peek will tell us that the exit
+ system call is assigned the number 1. Like the C function, it takes one
+ argument, the value to return to the parent process, and so this will
+ go into ebx.
+
+ We now know all we need to know to create the next version of our
+ program, one that won't need assistance from any external functions to
+ work:
+
+ ; tiny.asm
+ BITS 32
+ GLOBAL _start
+ SECTION .text
+ _start:
+ mov eax, 1
+ mov ebx, 42
+ int 0x80
+
+ Here we go:
+
+ $ nasm -f elf tiny.asm
+ $ gcc -Wall -s -nostdlib tiny.o
+ $ ./a.out ; echo $?
+ 42
+
+ Ta-da! And the size?
+
+ $ wc -c a.out
+ 372 a.out
+
+ Now that's tiny! Almost a fourth the size of the previous version!
+
+ So ... can we do anything else to make it even smaller?
+
+ How about using shorter instructions?
+
+ If we generate a list file for the assembly code, we'll find the
+ following:
+
+ 00000000 B801000000 mov eax, 1
+ 00000005 BB2A000000 mov ebx, 42
+ 0000000A CD80 int 0x80
+
+ Well, gee, we don't need to initialize all of ebx, since the operating
+ system is only going to use the lowest byte. Setting bl alone will be
+ sufficient, and will take two bytes instead of five.
+
+ We can also set eax to one by xor'ing it to zero and then using a
+ one-byte increment instruction; this will save two more bytes.
+
+ 00000000 31C0 xor eax, eax
+ 00000002 40 inc eax
+ 00000003 B32A mov bl, 42
+ 00000005 CD80 int 0x80
+
+ I think it's pretty safe to say that we're not going to make this
+ program any smaller than that.
+
+ As an aside, we might as well stop using gcc to link our executable,
+ seeing as we're not using any of its added functionality, and just call
+ the linker, ld, ourselves:
+
+ $ nasm -f elf tiny.asm
+ $ ld -s tiny.o
+ $ ./a.out ; echo $?
+ 42
+ $ wc -c a.out
+ 368 a.out
+
+ Four bytes smaller. (Hey! Didn't we shave five bytes off? Well, we did,
+ but alignment considerations within the ELF file caused it to require
+ an extra byte of padding.)
+
+ So ... have we reached the end? Is this as small as we can go?
+
+ Well, hm. Our program is now seven bytes long. Do ELF files really
+ require 361 bytes of overhead? What's in this file, anyway?
+
+ We can peek into the contents of the file using objdump:
+
+ $ objdump -x a.out | less
+
+ The output may look like gibberish, but right now let's just focus on
+ the list of sections:
+
+ Sections:
+ Idx Name Size VMA LMA File off Algn
+ 0 .text 00000007 08048080 08048080 00000080 2**4
+ CONTENTS, ALLOC, LOAD, READONLY, CODE
+ 1 .comment 0000001c 00000000 00000000 00000087 2**0
+ CONTENTS, READONLY
+
+ The complete .text section is listed as being seven bytes long, just as
+ we specified. So it seems safe to conclude that we now have complete
+ control of the machine-language content of our program.
+
+ But then there's this other section named ".comment". Who ordered that?
+ And it's 28 bytes long, even! We may not be sure what this .comment
+ section is, but it seems a good bet that it isn't a necessary
+ feature....
+
+ The .comment section is listed as being located at file offset 00000087
+ (hexadecimal). If we use a hexdump program to look at that area of the
+ file, we will see:
+
+ 00000080: 31C0 40B3 2ACD 8000 5468 6520 4E65 7477 1.@.*...The Netw
+ 00000090: 6964 6520 4173 7365 6D62 6C65 7220 302E ide Assembler 0.
+ 000000A0: 3938 0000 2E73 796D 7461 6200 2E73 7472 98...symtab..str
+
+ Well, well, well. Who'd've thought that Nasm would undermine our quest
+ like this? Maybe we should switch to using gas, AT&T syntax
+ notwithstanding....
+
+ Alas, if we do:
+
+ ; tiny.s
+ .globl _start
+ .text
+ _start:
+ xorl %eax, %eax
+ incl %eax
+ movb $42, %bl
+ int $0x80
+
+ ... we will find:
+
+ $ gcc -s -nostdlib tiny.s
+ $ ./a.out ; echo $?
+ 42
+ $ wc -c a.out
+ 368 a.out
+
+ ... no difference!
+
+ Well, actually there is some difference. Turning once again to objdump,
+ we see:
+
+ Sections:
+ Idx Name Size VMA LMA File off Algn
+ 0 .text 00000007 08048074 08048074 00000074 2**2
+ CONTENTS, ALLOC, LOAD, READONLY, CODE
+ 1 .data 00000000 0804907c 0804907c 0000007c 2**2
+ CONTENTS, ALLOC, LOAD, DATA
+ 2 .bss 00000000 0804907c 0804907c 0000007c 2**2
+ ALLOC
+
+ No comment section, but now we have two useless sections for storing
+ our nonexistent data. And even though these sections are zero bytes
+ long, they incur overhead, bringing our file size up for no good
+ reason.
+
+ Okay, so just what is all this overhead, and how do we get rid of it?
+
+ Well, to answer these questions, we must begin diving into some real
+ wizardry. We need to understand the ELF format.
+ __________________________________________________________________
+
+ The canonical document describing the ELF format for Intel-386
+ architectures can be found at
+ [2]http://refspecs.linuxbase.org/elf/elf.pdf. (You can also find a
+ flat-text version of version 1.0 of the standard at
+ [3]http://www.muppetlabs.com/~breadbox/software/ELF.txt.) This
+ specification covers a lot of territory, so if you'd prefer to not read
+ the whole thing yourself, I'll understand. Basically, here's what we
+ need to know:
+
+ Every ELF file begins with a structure called the ELF header. This
+ structure is 52 bytes long, and contains several pieces of information
+ that describe the contents of the file. For example, the first sixteen
+ bytes contain an "identifier", which includes the file's magic-number
+ signature (7F 45 4C 46), and some one-byte flags indicating that the
+ contents are 32-bit or 64-bit, little-endian or big-endian, etc. Other
+ fields in the ELF header contain information such as: the target
+ architecture; whether the ELF file is an executable, an object file, or
+ a shared-object library; the program's starting address; and the
+ locations within the file of the program header table and the section
+ header table.
+
+ These two tables can appear anywhere in the file, but typically the
+ former appears immediately following the ELF header, and the latter
+ appears at or near the end of the file. The two tables serve similar
+ purposes, in that they identify the component parts of the file.
+ However, the section header table focuses more on identifying where the
+ various parts of the program are within the file, while the program
+ header table describes where and how these parts are to be loaded into
+ memory. In brief, the section header table is for use by the compiler
+ and linker, while the program header table is for use by the program
+ loader. The program header table is optional for object files, and in
+ practice is never present. Likewise, the section header table is
+ optional for executables -- but is almost always present!
+
+ So, this is the answer to our first question. A fair piece of the
+ overhead in our program is a completely unnecessary section header
+ table, and maybe some equally useless sections that don't contribute to
+ our program's memory image.
+
+ So, we turn to our second question: how do we go about getting rid of
+ all that?
+
+ Alas, we're on our own here. None of the standard tools will deign to
+ make an executable without a section header table of some kind. If we
+ want such a thing, we'll have to do it ourselves.
+
+ This doesn't quite mean that we have to pull out a binary editor and
+ code the hexadecimal values by hand, though. Good old Nasm has a flat
+ binary output format, which will serve us well. All we need now is the
+ image of an empty ELF executable, which we can fill in with our
+ program. Our program, and nothing else.
+
+ We can look at the ELF specification, and /usr/include/linux/elf.h, and
+ executables created by the standard tools, to figure out what our empty
+ ELF executable should look like. But, if you're the impatient type, you
+ can just use the one I've supplied here:
+
+ BITS 32
+
+ org 0x08048000
+
+ ehdr: ; Elf32_Ehdr
+ db 0x7F, "ELF", 1, 1, 1, 0 ; e_ident
+ times 8 db 0
+ dw 2 ; e_type
+ dw 3 ; e_machine
+ dd 1 ; e_version
+ dd _start ; e_entry
+ dd phdr - $$ ; e_phoff
+ dd 0 ; e_shoff
+ dd 0 ; e_flags
+ dw ehdrsize ; e_ehsize
+ dw phdrsize ; e_phentsize
+ dw 1 ; e_phnum
+ dw 0 ; e_shentsize
+ dw 0 ; e_shnum
+ dw 0 ; e_shstrndx
+
+ ehdrsize equ $ - ehdr
+
+ phdr: ; Elf32_Phdr
+ dd 1 ; p_type
+ dd 0 ; p_offset
+ dd $$ ; p_vaddr
+ dd $$ ; p_paddr
+ dd filesize ; p_filesz
+ dd filesize ; p_memsz
+ dd 5 ; p_flags
+ dd 0x1000 ; p_align
+
+ phdrsize equ $ - phdr
+
+ _start:
+
+ ; your program here
+
+ filesize equ $ - $$
+
+ This image contains an ELF header, identifying the file as an Intel 386
+ executable, with no section header table and a program header table
+ containing one entry. Said entry instructs the program loader to load
+ the entire file into memory (it's normal behavior for a program to
+ include its ELF header and program header table in its memory image)
+ starting at memory address 0x08048000 (which is the default address for
+ executables to load), and to begin executing the code at _start, which
+ appears immediately after the program header table. No .data segment,
+ no .bss segment, no commentary -- nothing but the bare necessities.
+
+ So, let's add in our little program:
+
+ ; tiny.asm
+ org 0x08048000
+
+ ;
+ ; (as above)
+ ;
+
+
+ _start:
+ mov bl, 42
+ xor eax, eax
+ inc eax
+ int 0x80
+
+ filesize equ $ - $$
+
+ and try it out:
+
+ $ nasm -f bin -o a.out tiny.asm
+ $ chmod +x a.out
+ $ ./a.out ; echo $?
+ 42
+
+ We have just created an executable completely from scratch. How about
+ that? And now, take a look at its size:
+
+ $ wc -c a.out
+ 91 a.out
+
+ Ninety-one bytes. Less than one-fourth the size of our previous
+ attempt, and less than one-fortieth the size of our first!
+
+ What's more, this time we can account for every last byte. We know
+ exactly what's in the executable, and why it needs to be there. This
+ is, finally, the limit. We can't get any smaller than this.
+
+ Or can we?
+ __________________________________________________________________
+
+ Well, if you actually stopped to read the ELF specification, you might
+ have noticed a couple of facts. 1) The different parts of an ELF file
+ are permitted to be located anywhere (except the ELF header, which must
+ be at the top of the file), and they can even overlap each other. 2)
+ Some of the fields in the headers aren't actually used.
+
+ In particular, I'm thinking of that string of zeros at the end of the
+ 16-byte identification field. They are pure padding, to make room for
+ future expansion of the ELF standard. So the OS shouldn't care at all
+ what's in there. And we're already loading everything into memory
+ anyway, and our program is only seven bytes long....
+
+ Can we put our code inside the ELF header itself?
+
+ Why not?
+
+ ; tiny.asm
+
+ BITS 32
+
+ org 0x08048000
+
+ ehdr: ; Elf32_Ehdr
+ db 0x7F, "ELF" ; e_ident
+ db 1, 1, 1, 0, 0
+ _start: mov bl, 42
+ xor eax, eax
+ inc eax
+ int 0x80
+ dw 2 ; e_type
+ dw 3 ; e_machine
+ dd 1 ; e_version
+ dd _start ; e_entry
+ dd phdr - $$ ; e_phoff
+ dd 0 ; e_shoff
+ dd 0 ; e_flags
+ dw ehdrsize ; e_ehsize
+ dw phdrsize ; e_phentsize
+ dw 1 ; e_phnum
+ dw 0 ; e_shentsize
+ dw 0 ; e_shnum
+ dw 0 ; e_shstrndx
+
+ ehdrsize equ $ - ehdr
+
+ phdr: ; Elf32_Phdr
+ dd 1 ; p_type
+ dd 0 ; p_offset
+ dd $$ ; p_vaddr
+ dd $$ ; p_paddr
+ dd filesize ; p_filesz
+ dd filesize ; p_memsz
+ dd 5 ; p_flags
+ dd 0x1000 ; p_align
+
+ phdrsize equ $ - phdr
+
+ filesize equ $ - $$
+
+ After all, bytes are bytes!
+
+ $ nasm -f bin -o a.out tiny.asm
+ $ chmod +x a.out
+ $ ./a.out ; echo $?
+ 42
+ $ wc -c a.out
+ 84 a.out
+
+ Not bad, eh?
+
+ Now we've really gone as low as we can go. Our file is exactly as long
+ as one ELF header and one program header table entry, both of which we
+ absolutely require in order to get loaded into memory and run. So
+ there's nothing left to reduce now!
+
+ Except ...
+
+ Well, what if we could do the same thing to the program header table
+ that we just did to the program? Have it overlap with the ELF header,
+ that is. Is it possible?
+
+ It is indeed. Take a look at our program. Note that the last eight
+ bytes in the ELF header bear a certain kind of resemblence to the first
+ eight bytes in the program header table. A certain kind of resemblence
+ that might be described as "identical".
+
+ So ...
+
+ ; tiny.asm
+
+ BITS 32
+
+ org 0x08048000
+
+ ehdr:
+ db 0x7F, "ELF" ; e_ident
+ db 1, 1, 1, 0, 0
+ _start: mov bl, 42
+ xor eax, eax
+ inc eax
+ int 0x80
+ dw 2 ; e_type
+ dw 3 ; e_machine
+ dd 1 ; e_version
+ dd _start ; e_entry
+ dd phdr - $$ ; e_phoff
+ dd 0 ; e_shoff
+ dd 0 ; e_flags
+ dw ehdrsize ; e_ehsize
+ dw phdrsize ; e_phentsize
+ phdr: dd 1 ; e_phnum ; p_type
+ ; e_shentsize
+ dd 0 ; e_shnum ; p_offset
+ ; e_shstrndx
+ ehdrsize equ $ - ehdr
+ dd $$ ; p_vaddr
+ dd $$ ; p_paddr
+ dd filesize ; p_filesz
+ dd filesize ; p_memsz
+ dd 5 ; p_flags
+ dd 0x1000 ; p_align
+ phdrsize equ $ - phdr
+
+ filesize equ $ - $$
+
+ And sure enough, Linux doesn't mind our parsimony one bit:
+
+ $ nasm -f bin -o a.out tiny.asm
+ $ chmod +x a.out
+ $ ./a.out ; echo $?
+ 42
+ $ wc -c a.out
+ 76 a.out
+
+ Now we've really gone as low as we can go. There's no way to overlap
+ the two structures any more than this. The bytes simply don't match up.
+ This is the end of the line!
+
+ Unless, that is, we could change the contents of the structures to make
+ them match even further....
+
+ How many of these fields is Linux actually looking at, anyway? For
+ example, does Linux actually check to see if the e_machine field
+ contains 3 (indicating an Intel 386 target), or is it just assuming
+ that it does?
+
+ As a matter of fact, in that case it does. But a surprising number of
+ other fields are being quietly ignored.
+
+ So: Here's what is and isn't essential in the ELF header. The first
+ four bytes have to contain the magic number, or else Linux won't touch
+ it. The other three bytes in the e_ident field are not checked,
+ however, which means we have no less than twelve contiguous bytes we
+ can set to anything at all. e_type has to be set to 2, to indicate an
+ executable, and e_machine has to be 3, as just noted. e_version is,
+ like the version number inside e_ident, completely ignored. (Which is
+ sort of understandable, seeing as currently there's only one version of
+ the ELF standard.) e_entry naturally has to be valid, since it points
+ to the start of the program. And clearly, e_phoff needs to contain the
+ correct offset of the program header table in the file, and e_phnum
+ needs to contain the right number of entries in said table. e_flags,
+ however, is documented as being currently unused for Intel, so it
+ should be free for us to reuse. e_ehsize is supposed to be used to
+ verify that the ELF header has the expected size, but Linux pays it no
+ mind. e_phentsize is likewise for validating the size of the program
+ header table entries. This one was unchecked in older kernels, but now
+ it needs to be set correctly. Everything else in the ELF header is
+ about the section header table, which doesn't come into play with
+ executable files.
+
+ And now how about the program header table entry? Well, p_type has to
+ contain 1, to mark it as a loadable segment. p_offset really needs to
+ have the correct file offset to start loading. Likewise, p_vaddr needs
+ to contain the proper load address. Note, however, that we're not
+ required to load at 0x08048000. Almost any address can be used as long
+ as it's above 0x00000000, below 0x80000000, and page-aligned. The
+ p_paddr field is documented as being ignored, so that's guaranteed to
+ be free. p_filesz indicates how many bytes to load out of the file into
+ memory, and p_memsz indicates how large the memory segment needs to be,
+ so these numbers ought to be relatively sane. p_flags indicates what
+ permissions to give the memory segment. It needs to be readable (4), or
+ it won't be usable at all, and it needs to also be executable (1), or
+ else we can't execute code in it. Other bits can probably be set as
+ well, but we need to have those at minimum. Finally, p_align gives the
+ alignment requirements for the memory segment. This field is mainly
+ used when relocating segments containing position-independent code (as
+ for shared libraries), so for an executable file Linux will ignore
+ whatever garbage we store here.
+
+ All in all, that's a fair bit of leeway. In particular, a bit of
+ scrutiny will reveal that most of the necessary fields in the ELF
+ header are in the first half - the second half is almost completely
+ free for munging. With this in mind, we can interpose the two
+ structures quite a bit more than we did previously:
+
+ ; tiny.asm
+
+ BITS 32
+
+ org 0x00200000
+
+ db 0x7F, "ELF" ; e_ident
+ db 1, 1, 1, 0, 0
+ _start:
+ mov bl, 42
+ xor eax, eax
+ inc eax
+ int 0x80
+ dw 2 ; e_type
+ dw 3 ; e_machine
+ dd 1 ; e_version
+ dd _start ; e_entry
+ dd phdr - $$ ; e_phoff
+ phdr: dd 1 ; e_shoff ; p_type
+ dd 0 ; e_flags ; p_offset
+ dd $$ ; e_ehsize ; p_vaddr
+ ; e_phentsize
+ dw 1 ; e_phnum ; p_paddr
+ dw 0 ; e_shentsize
+ dd filesize ; e_shnum ; p_filesz
+ ; e_shstrndx
+ dd filesize ; p_memsz
+ dd 5 ; p_flags
+ dd 0x1000 ; p_align
+
+ filesize equ $ - $$
+
+ As you can (hopefully) see, the first twenty bytes of the program
+ header table now overlap the last twenty bytes of the ELF header. The
+ two dovetail quite nicely, actually. There are only two parts of the
+ ELF header within the overlapped region that matter. The first is the
+ e_phnum field, which just happens to coincide with the p_paddr field,
+ one of the few fields in the program header table which is definitely
+ ignored. The other is the e_phentsize field, which coincides with the
+ top half of the p_vaddr field. These are made to match up by selecting
+ a non-standard load address for our program, with a top half equal to
+ 0x0020.
+
+ Now we have really left behind all pretenses of portability ...
+
+ $ nasm -f bin -o a.out tiny.asm
+ $ chmod +x a.out
+ $ ./a.out ; echo $?
+ 42
+ $ wc -c a.out
+ 64 a.out
+
+ ... but it works! And the program is twelve bytes shorter, exactly as
+ predicted.
+
+ This is where I say that we can't do any better than this, but of
+ course, we already know that we can -- if we could get the program
+ header table to reside completely within the ELF header. Can this holy
+ grail be achieved?
+
+ Well, we can't just move it up another twelve bytes without hitting
+ hopeless obstacles trying to reconcile several fields in both
+ structures. The only other possibility would be to have it start
+ immediately following the first four bytes. This puts the first part of
+ the program header table comfortably within the e_ident area, but still
+ leaves problems with the rest of it. After some experimenting, it looks
+ like it isn't going to quite be possible.
+
+ However, it turns out that there are still a couple more fields in the
+ program header table that we can pervert.
+
+ We noted that p_memsz indicates how much memory to allocate for the
+ memory segment. Obviously it needs to be at least as big as p_filesz,
+ but there wouldn't be any harm if it was larger. Just because we ask
+ for memory doesn't mean we have to use it, after all.
+
+ Secondly, it turns out that, contrary to all my expectations, the
+ executable bit can be dropped from the p_flags field. It turns out that
+ the readable and executable bits are redundant: either one will imply
+ the other.
+
+ So, with these facts in mind, we can reorganize the file into this
+ little monstrosity:
+
+ ; tiny.asm
+
+ BITS 32
+
+ org 0x00010000
+
+ db 0x7F, "ELF" ; e_ident
+ dd 1 ; p_type
+ dd 0 ; p_offset
+ dd $$ ; p_vaddr
+ dw 2 ; e_type ; p_paddr
+ dw 3 ; e_machine
+ dd _start ; e_version ; p_filesz
+ dd _start ; e_entry ; p_memsz
+ dd 4 ; e_phoff ; p_flags
+ _start:
+ mov bl, 42 ; e_shoff ; p_align
+ xor eax, eax
+ inc eax ; e_flags
+ int 0x80
+ db 0
+ dw 0x34 ; e_ehsize
+ dw 0x20 ; e_phentsize
+ dw 1 ; e_phnum
+ dw 0 ; e_shentsize
+ dw 0 ; e_shnum
+ dw 0 ; e_shstrndx
+
+ filesize equ $ - $$
+
+ The p_flags field has been changed from 5 to 4, as we noted we could
+ get away with doing. This 4 is also the value of the e_phoff field,
+ which gives the offset into the file for the program header table,
+ which is exactly where we've located it. The program (remember that?)
+ has been moved down to lower part of the ELF header, beginning at the
+ e_shoff field and ending inside the e_flags field.
+
+ Note that the load address has been changed to a much lower number --
+ about as low as it can be, in fact. This keeps the value in the e_entry
+ field to a reasonably small number, which is good since it's also the
+ p_memsz number. (Actually, with virtual memory it hardly matters -- we
+ could have left it at our original value and it would work just as
+ well. But there's no harm in being polite.)
+
+ The change to p_filesz may require an explanation. Because we aren't
+ setting the write bit in the p_flags field, Linux won't let us define a
+ p_memsz value greater than p_filesz, since it can't zero-initialize
+ those extra bytes if they aren't writeable. Since we can't change the
+ p_flags field without moving the program header table out of alignment,
+ you might think that the only solution would be to lower the p_memsz
+ value back down to equal p_filesz (which would make it impossible to
+ share it with e_entry). However, another solution exists, namely to
+ increase p_filesz to equal p_memsz. That means they're both larger than
+ the real file size -- quite a bit larger, in fact -- but it absolves
+ the loader from having to write to read-only memory, which is all it
+ cared about.
+
+ And so ...
+
+ $ nasm -f bin -o a.out tiny.asm
+ $ chmod +x a.out
+ $ ./a.out ; echo $?
+ 42
+ $ wc -c a.out
+ 52 a.out
+
+ ... and so, with both the program header table and the program itself
+ completely embedded within the ELF header, our executable file is now
+ exactly as big as the ELF header! No more, no less. And still running
+ without a single complaint from Linux!
+
+ Now, finally, we have truly and certainly reached the absolute minimum
+ possible. There can be no question about it, right? After all, we have
+ to have a complete ELF header (even if it is badly mangled), or else
+ Linux wouldn't give us the time of day!
+
+ Right?
+
+ Wrong. We have one last dirty trick left.
+
+ It seems to be the case that if the file isn't quite the size of a full
+ ELF header, Linux will still play ball, and fill out the missing bytes
+ with zeros. We have no less than seven zeros at the end of our file,
+ and if we drop them from the file image:
+
+ ; tiny.asm
+
+ BITS 32
+
+ org 0x00010000
+
+ db 0x7F, "ELF" ; e_ident
+ dd 1 ; p_type
+ dd 0 ; p_offset
+ dd $$ ; p_vaddr
+ dw 2 ; e_type ; p_paddr
+ dw 3 ; e_machine
+ dd _start ; e_version ; p_filesz
+ dd _start ; e_entry ; p_memsz
+ dd 4 ; e_phoff ; p_flags
+ _start:
+ mov bl, 42 ; e_shoff ; p_align
+ xor eax, eax
+ inc eax ; e_flags
+ int 0x80
+ db 0
+ dw 0x34 ; e_ehsize
+ dw 0x20 ; e_phentsize
+ db 1 ; e_phnum
+ ; e_shentsize
+ ; e_shnum
+ ; e_shstrndx
+
+ filesize equ $ - $$
+
+ ... we can, incredibly enough, still produce a working executable:
+
+ $ nasm -f bin -o a.out tiny.asm
+ $ chmod +x a.out
+ $ ./a.out ; echo $?
+ 42
+ $ wc -c a.out
+ 45 a.out
+
+ Here, at last, we have honestly gone as far as we can go. There is no
+ getting around the fact that the 45th byte in the file, which specifies
+ the number of entries in the program header table, needs to be
+ non-zero, needs to be present, and needs to be in the 45th position
+ from the start of the ELF header. We are forced to conclude that there
+ is nothing more that can be done.
+ __________________________________________________________________
+
+ This forty-five-byte file is less than one-eighth the size of the
+ smallest ELF executable we could create using the standard tools, and
+ is less than one-fiftieth the size of the smallest file we could create
+ using pure C code. We have stripped everything out of the file that we
+ could, and put to dual purpose most of what we couldn't.
+
+ Of course, half of the values in this file violate some part of the ELF
+ standard, and it's a wonder that Linux will even consent to sneeze on
+ it, much less give it a process ID. This is not the sort of program to
+ which one would normally be willing to confess authorship.
+
+ On the other hand, every single byte in this executable file can be
+ accounted for and justified. How many executables have you created
+ lately that you can say that about?
+
+
+ [4](next)
+ __________________________________________________________________
+
+ [5]Tiny
+ [6]Software
+ [7]Brian Raiter
+
+References
+
+ 1. http://www.nasm.us/
+ 2. http://refspecs.linuxbase.org/elf/elf.pdf
+ 3. http://www.muppetlabs.com/~breadbox/software/ELF.txt
+ 4. https://www.muppetlabs.com/~breadbox/software/tiny/teensyps.html
+ 5. http://www.muppetlabs.com/~breadbox/software/tiny/
+ 6. http://www.muppetlabs.com/~breadbox/software/
+ 7. http://www.muppetlabs.com/~breadbox/