Diffstat (limited to 'miniany/doc/www.muppetlabs.com_~breadbox_software_tiny_teensy.txt')
-rw-r--r-- | miniany/doc/www.muppetlabs.com_~breadbox_software_tiny_teensy.txt | 1000 |
1 files changed, 1000 insertions, 0 deletions
diff --git a/miniany/doc/www.muppetlabs.com_~breadbox_software_tiny_teensy.txt b/miniany/doc/www.muppetlabs.com_~breadbox_software_tiny_teensy.txt
new file mode 100644
index 0000000..902fed3
--- /dev/null
+++ b/miniany/doc/www.muppetlabs.com_~breadbox_software_tiny_teensy.txt
@@ -0,0 +1,1000 @@
+   A Whirlwind Tutorial on Creating Really Teensy ELF Executables for Linux
+
+                      (or, "Size Is Everything")
+     __________________________________________________________________
+
+     She studied it carefully for about 15 minutes. Finally, she spoke.
+     "There's something written on here," she said, frowning, "but it's
+     really teensy."
+
+     [Dave Barry, "The Columnist's Caper"]
+
+   If you're a programmer who's become fed up with software bloat, then
+   may you find herein the perfect antidote.
+
+   This document explores methods for squeezing excess bytes out of simple
+   programs. (Of course, the more practical purpose of this document is to
+   describe a few of the inner workings of the ELF file format and the
+   Linux operating system. But hopefully you can also learn something
+   about how to make really teensy ELF executables in the process.)
+
+   Please note that the information and examples given here are, for the
+   most part, specific to ELF executables on a Linux platform running
+   under an Intel x86 architecture. I imagine that a good bit of the
+   information is applicable to other ELF-based Unices, but my experiences
+   with such are too limited for me to say with certainty.
+
+   Please also note that if you aren't a little bit familiar with assembly
+   code, you may find parts of this document sort of hard to follow. (The
+   assembly code that appears in this document is written using Nasm; see
+   [1]http://www.nasm.us/.)
+     __________________________________________________________________
+
+   In order to start, we need a program.
+   Almost any program will do, but
+   the simpler the program the better, since we're more interested in how
+   small we can make the executable than what the program does.
+
+   Let's take an incredibly simple program, one that does nothing but
+   return a number back to the operating system. Why not? After all, Unix
+   already comes with no less than two such programs: true and false.
+   Since 0 and 1 are already taken, we'll use the number 42.
+
+   So, here is our first version:
+
+     /* tiny.c */
+     int main(void) { return 42; }
+
+   which we can compile and test like so:
+
+     $ gcc -Wall tiny.c
+     $ ./a.out ; echo $?
+     42
+
+   So. How big is it? Well, on my machine, I get:
+
+     $ wc -c a.out
+     3998 a.out
+
+   (Yours will probably differ some.) Admittedly, that's pretty small by
+   today's standards, but it's almost certainly bigger than it needs to
+   be.
+
+   The obvious first step is to strip the executable:
+
+     $ gcc -Wall -s tiny.c
+     $ ./a.out ; echo $?
+     42
+     $ wc -c a.out
+     2632 a.out
+
+   That's certainly an improvement. For the next step, how about
+   optimizing?
+
+     $ gcc -Wall -s -O3 tiny.c
+     $ wc -c a.out
+     2616 a.out
+
+   That also helped, but only just. Which makes sense: there's hardly
+   anything there to optimize.
+
+   It seems unlikely that there's much else we can do to shrink a
+   one-statement C program. We're going to have to leave C behind, and use
+   assembler instead. Hopefully, this will cut out all the extra overhead
+   that C programs automatically incur.
+
+   So, on to our second version. All we need to do is return 42 from
+   main(). In assembly language, this means that the function should set
+   the accumulator, eax, to 42, and then return:
+
+     ; tiny.asm
+     BITS 32
+     GLOBAL main
+     SECTION .text
+     main:
+                   mov     eax, 42
+                   ret
+
+   We can then build and test like so:
+
+     $ nasm -f elf tiny.asm
+     $ gcc -Wall -s tiny.o
+     $ ./a.out ; echo $?
+     42
+
+   (Hey, who says assembly code is difficult?) And now how big is it?
+
+     $ wc -c a.out
+     2604 a.out
+
+   Looks like we shaved off a measly twelve bytes. So much for all the
+   extra overhead that C automatically incurs, eh?
+
+   Well, the problem is that we are still incurring a lot of overhead by
+   using the main() interface. The linker is still adding an interface to
+   the OS for us, and it is that interface that actually calls main(). So
+   how do we get around that if we don't need it?
+
+   The actual entry point that the linker uses by default is the symbol
+   with the name _start. When we link with gcc, it automatically includes
+   a _start routine, one that sets up argc and argv, among other things,
+   and then calls main().
+
+   So, let's see if we can bypass this, and define our own _start routine:
+
+     ; tiny.asm
+     BITS 32
+     GLOBAL _start
+     SECTION .text
+     _start:
+                   mov     eax, 42
+                   ret
+
+   Will gcc do what we want?
+
+     $ nasm -f elf tiny.asm
+     $ gcc -Wall -s tiny.o
+     tiny.o(.text+0x0): multiple definition of `_start'
+     /usr/lib/crt1.o(.text+0x0): first defined here
+     /usr/lib/crt1.o(.text+0x36): undefined reference to `main'
+
+   No. Well, actually, yes it will, but first we need to learn how to ask
+   for what we want.
+
+   It so happens that gcc recognizes an option called -nostartfiles. From
+   the gcc info pages:
+
+     -nostartfiles
+            Do not use the standard system startup files when linking. The
+            standard libraries are used normally.
+
+   Aha! Now let's see what we can do:
+
+     $ nasm -f elf tiny.asm
+     $ gcc -Wall -s -nostartfiles tiny.o
+     $ ./a.out ; echo $?
+     Segmentation fault
+     139
+
+   Well, gcc didn't complain, but the program doesn't work. What went
+   wrong?
+
+   What went wrong is that we treated _start as if it were a C function,
+   and tried to return from it. In reality, it's not a function at all.
+   It's just a symbol in the object file which the linker uses to locate
+   the program's entry point. When our program is invoked, it's invoked
+   directly.
+   If we were to look, we would see that the value on the top of
+   the stack was the number 1, which is certainly very un-address-like. In
+   fact, what is on the stack is our program's argc value. After this
+   come the elements of the argv array, including the terminating NULL
+   element, followed by the elements of envp. And that's all. There is no
+   return address on the stack.
+
+   So, how does _start ever exit? Well, it calls the exit() function!
+   That's what it's there for, after all.
+
+   Actually, I lied. What it really does is call the _exit() function.
+   (Notice the leading underscore.) exit() is required to finish up some
+   tasks on behalf of the process, but those tasks will never have been
+   started, because we're bypassing the library's startup code. So we need
+   to bypass the library's shutdown code as well, and go directly to the
+   operating system's shutdown processing.
+
+   So, let's try this again. We're going to call _exit(), which is a
+   function that takes a single integer argument. So all we need to do is
+   push the number onto the stack and call the function. (We also need to
+   declare _exit() as external.) Here's our assembly:
+
+     ; tiny.asm
+     BITS 32
+     EXTERN _exit
+     GLOBAL _start
+     SECTION .text
+     _start:
+                   push    dword 42
+                   call    _exit
+
+   And we build and test as before:
+
+     $ nasm -f elf tiny.asm
+     $ gcc -Wall -s -nostartfiles tiny.o
+     $ ./a.out ; echo $?
+     42
+
+   Success at last! And now how big is it?
+
+     $ wc -c a.out
+     1340 a.out
+
+   Almost half the size! Not bad. Not bad at all. Hmmm ... so what other
+   interesting obscure options does gcc have?
+
+   Well, this one, appearing immediately after -nostartfiles in the
+   documentation, is certainly eye-catching:
+
+     -nostdlib
+            Don't use the standard system libraries and startup files when
+            linking. Only the files you specify will be passed to the linker.
+
+   That's gotta be worth investigating:
+
+     $ gcc -Wall -s -nostdlib tiny.o
+     tiny.o(.text+0x6): undefined reference to `_exit'
+
+   Oops.
+   That's right ... _exit() is, after all, a library function. It
+   has to be filled in from somewhere.
+
+   Okay. But surely, we don't need libc's help just to end a program, do
+   we?
+
+   No, we don't. If we're willing to leave behind all pretenses of
+   portability, we can make our program exit without having to link with
+   anything else. First, though, we need to know how to make a system call
+   under Linux.
+     __________________________________________________________________
+
+   Linux, like most operating systems, provides basic necessities to the
+   programs it hosts via system calls. This includes things like opening a
+   file, reading and writing to file handles -- and, of course, shutting
+   down a process.
+
+   The Linux system call interface is a single instruction: int 0x80. All
+   system calls are done via this interrupt. To make a system call, eax
+   should contain a number that indicates which system call is being
+   invoked, and other registers are used to hold the arguments, if any. If
+   the system call takes one argument, it will be in ebx; a system call
+   with two arguments will use ebx and ecx. Likewise, edx, esi, and edi
+   are used if a third, fourth, or fifth argument is required,
+   respectively. Upon return from a system call, eax will contain the
+   return value. If an error occurs, eax will contain a negative value,
+   with the absolute value indicating the error.
+
+   The numbers for the different system calls are listed in
+   /usr/include/asm/unistd.h. A quick peek will tell us that the exit
+   system call is assigned the number 1. Like the C function, it takes one
+   argument, the value to return to the parent process, and so this will
+   go into ebx.
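The exit system call described above can also be tried from C before dropping down to assembly. The following is only a sketch under stated assumptions: it goes through libc's syscall() wrapper rather than issuing int 0x80 directly, so it saves no bytes, and it uses the SYS_exit macro from <sys/syscall.h> instead of the literal 1 because the number differs between architectures. The helper name status_of_raw_exit is made up for this illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that terminates through the raw exit
 * system call, and return the exit status that its parent observes.
 * Returns -1 if the child could not be created or did not exit normally. */
int status_of_raw_exit(int code)
{
    pid_t pid;
    int status = 0;

    assert(code >= 0);
    pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Child: syscall() places the call number and the single argument
         * where this platform's kernel expects them and traps into the
         * kernel, skipping exit() and all of libc's shutdown code. */
        syscall(SYS_exit, code);
    }
    /* Parent: collect the value the child handed to the kernel. */
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```

Calling status_of_raw_exit(42) from a small main() should report 42, the same value that echo $? shows for the assembly versions.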
+
+   We now know all we need to know to create the next version of our
+   program, one that won't need assistance from any external functions to
+   work:
+
+     ; tiny.asm
+     BITS 32
+     GLOBAL _start
+     SECTION .text
+     _start:
+                   mov     eax, 1
+                   mov     ebx, 42
+                   int     0x80
+
+   Here we go:
+
+     $ nasm -f elf tiny.asm
+     $ gcc -Wall -s -nostdlib tiny.o
+     $ ./a.out ; echo $?
+     42
+
+   Ta-da! And the size?
+
+     $ wc -c a.out
+     372 a.out
+
+   Now that's tiny! Almost a fourth the size of the previous version!
+
+   So ... can we do anything else to make it even smaller?
+
+   How about using shorter instructions?
+
+   If we generate a list file for the assembly code, we'll find the
+   following:
+
+     00000000 B801000000        mov     eax, 1
+     00000005 BB2A000000        mov     ebx, 42
+     0000000A CD80              int     0x80
+
+   Well, gee, we don't need to initialize all of ebx, since the operating
+   system is only going to use the lowest byte. Setting bl alone will be
+   sufficient, and will take two bytes instead of five.
+
+   We can also set eax to one by xor'ing it to zero and then using a
+   one-byte increment instruction; this will save two more bytes.
+
+     00000000 31C0              xor     eax, eax
+     00000002 40                inc     eax
+     00000003 B32A              mov     bl, 42
+     00000005 CD80              int     0x80
+
+   I think it's pretty safe to say that we're not going to make this
+   program any smaller than that.
+
+   As an aside, we might as well stop using gcc to link our executable,
+   seeing as we're not using any of its added functionality, and just call
+   the linker, ld, ourselves:
+
+     $ nasm -f elf tiny.asm
+     $ ld -s tiny.o
+     $ ./a.out ; echo $?
+     42
+     $ wc -c a.out
+     368 a.out
+
+   Four bytes smaller. (Hey! Didn't we shave five bytes off? Well, we did,
+   but alignment considerations within the ELF file caused it to require
+   an extra byte of padding.)
+
+   So ... have we reached the end? Is this as small as we can go?
+
+   Well, hm. Our program is now seven bytes long. Do ELF files really
+   require 361 bytes of overhead? What's in this file, anyway?
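The byte counts quoted above can be double-checked mechanically. The sketch below simply copies the two listings into C arrays and lets sizeof do the counting; the array and function names are invented for this illustration, and 368 is the file size reported by wc on the author's machine.

```c
#include <assert.h>
#include <stddef.h>

/* Encodings copied from the first listing (12 bytes). */
static const unsigned char long_form[] = {
    0xB8, 0x01, 0x00, 0x00, 0x00,   /* mov eax, 1   */
    0xBB, 0x2A, 0x00, 0x00, 0x00,   /* mov ebx, 42  */
    0xCD, 0x80                      /* int 0x80     */
};

/* Encodings copied from the second listing (7 bytes). */
static const unsigned char short_form[] = {
    0x31, 0xC0,                     /* xor eax, eax */
    0x40,                           /* inc eax      */
    0xB3, 0x2A,                     /* mov bl, 42   */
    0xCD, 0x80                      /* int 0x80     */
};

/* Bytes saved by switching to the shorter instruction sequence. */
size_t bytes_saved(void)
{
    return sizeof long_form - sizeof short_form;
}

/* Everything in a file of file_size bytes that is not program code. */
size_t elf_overhead(size_t file_size)
{
    assert(file_size >= sizeof short_form);
    return file_size - sizeof short_form;
}
```

With these definitions, bytes_saved() gives the five bytes mentioned above, and elf_overhead(368) gives the 361 bytes of overhead in question.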
+
+   We can peek into the contents of the file using objdump:
+
+     $ objdump -x a.out | less
+
+   The output may look like gibberish, but right now let's just focus on
+   the list of sections:
+
+     Sections:
+     Idx Name          Size      VMA       LMA       File off  Algn
+       0 .text         00000007  08048080  08048080  00000080  2**4
+                       CONTENTS, ALLOC, LOAD, READONLY, CODE
+       1 .comment      0000001c  00000000  00000000  00000087  2**0
+                       CONTENTS, READONLY
+
+   The complete .text section is listed as being seven bytes long, just as
+   we specified. So it seems safe to conclude that we now have complete
+   control of the machine-language content of our program.
+
+   But then there's this other section named ".comment". Who ordered that?
+   And it's 28 bytes long, even! We may not be sure what this .comment
+   section is, but it seems a good bet that it isn't a necessary
+   feature....
+
+   The .comment section is listed as being located at file offset 00000087
+   (hexadecimal). If we use a hexdump program to look at that area of the
+   file, we will see:
+
+     00000080: 31C0 40B3 2ACD 8000 5468 6520 4E65 7477  1.@.*...The Netw
+     00000090: 6964 6520 4173 7365 6D62 6C65 7220 302E  ide Assembler 0.
+     000000A0: 3938 0000 2E73 796D 7461 6200 2E73 7472  98...symtab..str
+
+   Well, well, well. Who'd've thought that Nasm would undermine our quest
+   like this? Maybe we should switch to using gas, AT&T syntax
+   notwithstanding....
+
+   Alas, if we do:
+
+     ; tiny.s
+     .globl _start
+     .text
+     _start:
+                   xorl    %eax, %eax
+                   incl    %eax
+                   movb    $42, %bl
+                   int     $0x80
+
+   ... we will find:
+
+     $ gcc -s -nostdlib tiny.s
+     $ ./a.out ; echo $?
+     42
+     $ wc -c a.out
+     368 a.out
+
+   ... no difference!
+
+   Well, actually there is some difference.
+   Turning once again to objdump,
+   we see:
+
+     Sections:
+     Idx Name          Size      VMA       LMA       File off  Algn
+       0 .text         00000007  08048074  08048074  00000074  2**2
+                       CONTENTS, ALLOC, LOAD, READONLY, CODE
+       1 .data         00000000  0804907c  0804907c  0000007c  2**2
+                       CONTENTS, ALLOC, LOAD, DATA
+       2 .bss          00000000  0804907c  0804907c  0000007c  2**2
+                       ALLOC
+
+   No comment section, but now we have two useless sections for storing
+   our nonexistent data. And even though these sections are zero bytes
+   long, they incur overhead, bringing our file size up for no good
+   reason.
+
+   Okay, so just what is all this overhead, and how do we get rid of it?
+
+   Well, to answer these questions, we must begin diving into some real
+   wizardry. We need to understand the ELF format.
+     __________________________________________________________________
+
+   The canonical document describing the ELF format for Intel-386
+   architectures can be found at
+   [2]http://refspecs.linuxbase.org/elf/elf.pdf. (You can also find a
+   flat-text version of version 1.0 of the standard at
+   [3]http://www.muppetlabs.com/~breadbox/software/ELF.txt.) This
+   specification covers a lot of territory, so if you'd prefer to not read
+   the whole thing yourself, I'll understand. Basically, here's what we
+   need to know:
+
+   Every ELF file begins with a structure called the ELF header. This
+   structure is 52 bytes long, and contains several pieces of information
+   that describe the contents of the file. For example, the first sixteen
+   bytes contain an "identifier", which includes the file's magic-number
+   signature (7F 45 4C 46), and some one-byte flags indicating that the
+   contents are 32-bit or 64-bit, little-endian or big-endian, etc. Other
+   fields in the ELF header contain information such as: the target
+   architecture; whether the ELF file is an executable, an object file, or
+   a shared-object library; the program's starting address; and the
+   locations within the file of the program header table and the section
+   header table.
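The description of the ELF header above maps directly onto the Elf32_Ehdr structure. Here is a minimal sketch of reading the identifier bytes, assuming a system whose C library ships <elf.h> (glibc does); the function names are invented for this illustration, and the checks follow what the specification describes, which is not necessarily what any particular kernel enforces.

```c
#include <assert.h>
#include <elf.h>
#include <string.h>

/* Size of the 32-bit ELF header as laid out by <elf.h>. */
size_t elf32_header_size(void)
{
    return sizeof(Elf32_Ehdr);
}

/* Hypothetical helper: returns 1 if buf begins with something that looks
 * like a 32-bit little-endian ELF header, and 0 otherwise. */
int looks_like_elf32_le(const unsigned char *buf, size_t len)
{
    Elf32_Ehdr hdr;

    assert(buf != NULL);
    if (len < sizeof hdr)               /* need the whole header */
        return 0;
    memcpy(&hdr, buf, sizeof hdr);      /* copy to avoid unaligned reads */

    /* The first four identifier bytes are the magic number 7F 45 4C 46. */
    if (memcmp(hdr.e_ident, ELFMAG, SELFMAG) != 0)
        return 0;

    /* The next two identifier bytes are the one-byte flags giving the
     * word size and the byte order. */
    return hdr.e_ident[EI_CLASS] == ELFCLASS32
        && hdr.e_ident[EI_DATA] == ELFDATA2LSB;
}
```

A 52-byte buffer starting with 7F 45 4C 46 01 01 01 passes this check, and elf32_header_size() reports the 52 bytes mentioned above.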
+
+   These two tables can appear anywhere in the file, but typically the
+   former appears immediately following the ELF header, and the latter
+   appears at or near the end of the file. The two tables serve similar
+   purposes, in that they identify the component parts of the file.
+   However, the section header table focuses more on identifying where the
+   various parts of the program are within the file, while the program
+   header table describes where and how these parts are to be loaded into
+   memory. In brief, the section header table is for use by the compiler
+   and linker, while the program header table is for use by the program
+   loader. The program header table is optional for object files, and in
+   practice is never present. Likewise, the section header table is
+   optional for executables -- but is almost always present!
+
+   So, this is the answer to our first question. A fair piece of the
+   overhead in our program is a completely unnecessary section header
+   table, and maybe some equally useless sections that don't contribute to
+   our program's memory image.
+
+   So, we turn to our second question: how do we go about getting rid of
+   all that?
+
+   Alas, we're on our own here. None of the standard tools will deign to
+   make an executable without a section header table of some kind. If we
+   want such a thing, we'll have to do it ourselves.
+
+   This doesn't quite mean that we have to pull out a binary editor and
+   code the hexadecimal values by hand, though. Good old Nasm has a flat
+   binary output format, which will serve us well. All we need now is the
+   image of an empty ELF executable, which we can fill in with our
+   program. Our program, and nothing else.
+
+   We can look at the ELF specification, and /usr/include/linux/elf.h, and
+   executables created by the standard tools, to figure out what our empty
+   ELF executable should look like.
+   But, if you're the impatient type, you
+   can just use the one I've supplied here:
+
+     BITS 32
+
+                   org     0x08048000
+
+     ehdr:                                         ; Elf32_Ehdr
+                   db      0x7F, "ELF", 1, 1, 1, 0 ;   e_ident
+           times 8 db      0
+                   dw      2                       ;   e_type
+                   dw      3                       ;   e_machine
+                   dd      1                       ;   e_version
+                   dd      _start                  ;   e_entry
+                   dd      phdr - $$               ;   e_phoff
+                   dd      0                       ;   e_shoff
+                   dd      0                       ;   e_flags
+                   dw      ehdrsize                ;   e_ehsize
+                   dw      phdrsize                ;   e_phentsize
+                   dw      1                       ;   e_phnum
+                   dw      0                       ;   e_shentsize
+                   dw      0                       ;   e_shnum
+                   dw      0                       ;   e_shstrndx
+
+     ehdrsize      equ     $ - ehdr
+
+     phdr:                                         ; Elf32_Phdr
+                   dd      1                       ;   p_type
+                   dd      0                       ;   p_offset
+                   dd      $$                      ;   p_vaddr
+                   dd      $$                      ;   p_paddr
+                   dd      filesize                ;   p_filesz
+                   dd      filesize                ;   p_memsz
+                   dd      5                       ;   p_flags
+                   dd      0x1000                  ;   p_align
+
+     phdrsize      equ     $ - phdr
+
+     _start:
+
+     ; your program here
+
+     filesize      equ     $ - $$
+
+   This image contains an ELF header, identifying the file as an Intel 386
+   executable, with no section header table and a program header table
+   containing one entry. Said entry instructs the program loader to load
+   the entire file into memory (it's normal behavior for a program to
+   include its ELF header and program header table in its memory image)
+   starting at memory address 0x08048000 (which is the default address for
+   executables to load), and to begin executing the code at _start, which
+   appears immediately after the program header table. No .data segment,
+   no .bss segment, no commentary -- nothing but the bare necessities.
+
+   So, let's add in our little program:
+
+     ; tiny.asm
+                   org     0x08048000
+
+     ;
+     ; (as above)
+     ;
+
+
+     _start:
+                   mov     bl, 42
+                   xor     eax, eax
+                   inc     eax
+                   int     0x80
+
+     filesize      equ     $ - $$
+
+   and try it out:
+
+     $ nasm -f bin -o a.out tiny.asm
+     $ chmod +x a.out
+     $ ./a.out ; echo $?
+     42
+
+   We have just created an executable completely from scratch. How about
+   that? And now, take a look at its size:
+
+     $ wc -c a.out
+     91 a.out
+
+   Ninety-one bytes. Less than one-fourth the size of our previous
+   attempt, and less than one-fortieth the size of our first!
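The figure of ninety-one bytes is just the sum of the three pieces in the image: one ELF header, one program header table entry, and the seven bytes of code. A small sketch of that arithmetic follows, assuming the structure definitions from <elf.h>; the function names are invented for this illustration.

```c
#include <assert.h>
#include <elf.h>
#include <stddef.h>

/* Size of a flat image made of one ELF header, one program header table
 * entry, and code_bytes bytes of code, with nothing else in the file. */
size_t minimal_image_size(size_t code_bytes)
{
    return sizeof(Elf32_Ehdr) + sizeof(Elf32_Phdr) + code_bytes;
}

/* How many times larger one file is than another, rounded down. */
size_t times_larger(size_t bigger, size_t smaller)
{
    assert(smaller != 0);
    return bigger / smaller;
}
```

Here minimal_image_size(7) works out to 52 + 32 + 7 = 91. Likewise, times_larger(368, 91) is 4 and times_larger(3998, 91) is 43, which is where "less than one-fourth" and "less than one-fortieth" come from.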
+
+   What's more, this time we can account for every last byte. We know
+   exactly what's in the executable, and why it needs to be there. This
+   is, finally, the limit. We can't get any smaller than this.
+
+   Or can we?
+     __________________________________________________________________
+
+   Well, if you actually stopped to read the ELF specification, you might
+   have noticed a couple of facts. 1) The different parts of an ELF file
+   are permitted to be located anywhere (except the ELF header, which must
+   be at the top of the file), and they can even overlap each other. 2)
+   Some of the fields in the headers aren't actually used.
+
+   In particular, I'm thinking of that string of zeros at the end of the
+   16-byte identification field. They are pure padding, to make room for
+   future expansion of the ELF standard. So the OS shouldn't care at all
+   what's in there. And we're already loading everything into memory
+   anyway, and our program is only seven bytes long....
+
+   Can we put our code inside the ELF header itself?
+
+   Why not?
+
+     ; tiny.asm
+
+     BITS 32
+
+                   org     0x08048000
+
+     ehdr:                                         ; Elf32_Ehdr
+                   db      0x7F, "ELF"             ;   e_ident
+                   db      1, 1, 1, 0, 0
+     _start:       mov     bl, 42
+                   xor     eax, eax
+                   inc     eax
+                   int     0x80
+                   dw      2                       ;   e_type
+                   dw      3                       ;   e_machine
+                   dd      1                       ;   e_version
+                   dd      _start                  ;   e_entry
+                   dd      phdr - $$               ;   e_phoff
+                   dd      0                       ;   e_shoff
+                   dd      0                       ;   e_flags
+                   dw      ehdrsize                ;   e_ehsize
+                   dw      phdrsize                ;   e_phentsize
+                   dw      1                       ;   e_phnum
+                   dw      0                       ;   e_shentsize
+                   dw      0                       ;   e_shnum
+                   dw      0                       ;   e_shstrndx
+
+     ehdrsize      equ     $ - ehdr
+
+     phdr:                                         ; Elf32_Phdr
+                   dd      1                       ;   p_type
+                   dd      0                       ;   p_offset
+                   dd      $$                      ;   p_vaddr
+                   dd      $$                      ;   p_paddr
+                   dd      filesize                ;   p_filesz
+                   dd      filesize                ;   p_memsz
+                   dd      5                       ;   p_flags
+                   dd      0x1000                  ;   p_align
+
+     phdrsize      equ     $ - phdr
+
+     filesize      equ     $ - $$
+
+   After all, bytes are bytes!
+
+     $ nasm -f bin -o a.out tiny.asm
+     $ chmod +x a.out
+     $ ./a.out ; echo $?
+     42
+     $ wc -c a.out
+     84 a.out
+
+   Not bad, eh?
+
+   Now we've really gone as low as we can go.
+   Our file is exactly as long
+   as one ELF header and one program header table entry, both of which we
+   absolutely require in order to get loaded into memory and run. So
+   there's nothing left to reduce now!
+
+   Except ...
+
+   Well, what if we could do the same thing to the program header table
+   that we just did to the program? Have it overlap with the ELF header,
+   that is. Is it possible?
+
+   It is indeed. Take a look at our program. Note that the last eight
+   bytes in the ELF header bear a certain kind of resemblance to the first
+   eight bytes in the program header table. A certain kind of resemblance
+   that might be described as "identical".
+
+   So ...
+
+     ; tiny.asm
+
+     BITS 32
+
+                   org     0x08048000
+
+     ehdr:
+                   db      0x7F, "ELF"             ; e_ident
+                   db      1, 1, 1, 0, 0
+     _start:       mov     bl, 42
+                   xor     eax, eax
+                   inc     eax
+                   int     0x80
+                   dw      2                       ; e_type
+                   dw      3                       ; e_machine
+                   dd      1                       ; e_version
+                   dd      _start                  ; e_entry
+                   dd      phdr - $$               ; e_phoff
+                   dd      0                       ; e_shoff
+                   dd      0                       ; e_flags
+                   dw      ehdrsize                ; e_ehsize
+                   dw      phdrsize                ; e_phentsize
+     phdr:         dd      1                       ; e_phnum      ; p_type
+                                                   ; e_shentsize
+                   dd      0                       ; e_shnum      ; p_offset
+                                                   ; e_shstrndx
+     ehdrsize      equ     $ - ehdr
+                   dd      $$                                     ; p_vaddr
+                   dd      $$                                     ; p_paddr
+                   dd      filesize                               ; p_filesz
+                   dd      filesize                               ; p_memsz
+                   dd      5                                      ; p_flags
+                   dd      0x1000                                 ; p_align
+     phdrsize      equ     $ - phdr
+
+     filesize      equ     $ - $$
+
+   And sure enough, Linux doesn't mind our parsimony one bit:
+
+     $ nasm -f bin -o a.out tiny.asm
+     $ chmod +x a.out
+     $ ./a.out ; echo $?
+     42
+     $ wc -c a.out
+     76 a.out
+
+   Now we've really gone as low as we can go. There's no way to overlap
+   the two structures any more than this. The bytes simply don't match up.
+   This is the end of the line!
+
+   Unless, that is, we could change the contents of the structures to make
+   them match even further....
+
+   How many of these fields is Linux actually looking at, anyway? For
+   example, does Linux actually check to see if the e_machine field
+   contains 3 (indicating an Intel 386 target), or is it just assuming
+   that it does?
+
+   As a matter of fact, in that case it does. But a surprising number of
+   other fields are being quietly ignored.
+
+   So: Here's what is and isn't essential in the ELF header. The first
+   four bytes have to contain the magic number, or else Linux won't touch
+   it. The other three defined bytes in the e_ident field (the class,
+   data-encoding, and version bytes) are not checked, however, which,
+   together with the nine bytes of padding after them, means we have no
+   less than twelve contiguous bytes we can set to anything at all. e_type
+   has to be set to 2, to indicate an
+   executable, and e_machine has to be 3, as just noted. e_version is,
+   like the version number inside e_ident, completely ignored. (Which is
+   sort of understandable, seeing as currently there's only one version of
+   the ELF standard.) e_entry naturally has to be valid, since it points
+   to the start of the program. And clearly, e_phoff needs to contain the
+   correct offset of the program header table in the file, and e_phnum
+   needs to contain the right number of entries in said table. e_flags,
+   however, is documented as being currently unused for Intel, so it
+   should be free for us to reuse. e_ehsize is supposed to be used to
+   verify that the ELF header has the expected size, but Linux pays it no
+   mind. e_phentsize is likewise for validating the size of the program
+   header table entries. This one was unchecked in older kernels, but now
+   it needs to be set correctly. Everything else in the ELF header is
+   about the section header table, which doesn't come into play with
+   executable files.
+
+   And now how about the program header table entry? Well, p_type has to
+   contain 1, to mark it as a loadable segment. p_offset really needs to
+   have the correct file offset to start loading. Likewise, p_vaddr needs
+   to contain the proper load address. Note, however, that we're not
+   required to load at 0x08048000. Almost any address can be used as long
+   as it's above 0x00000000, below 0x80000000, and page-aligned. The
+   p_paddr field is documented as being ignored, so that's guaranteed to
+   be free.
+   p_filesz indicates how many bytes to load out of the file into
+   memory, and p_memsz indicates how large the memory segment needs to be,
+   so these numbers ought to be relatively sane. p_flags indicates what
+   permissions to give the memory segment. It needs to be readable (4), or
+   it won't be usable at all, and it needs to also be executable (1), or
+   else we can't execute code in it. Other bits can probably be set as
+   well, but we need to have those at minimum. Finally, p_align gives the
+   alignment requirements for the memory segment. This field is mainly
+   used when relocating segments containing position-independent code (as
+   for shared libraries), so for an executable file Linux will ignore
+   whatever garbage we store here.
+
+   All in all, that's a fair bit of leeway. In particular, a bit of
+   scrutiny will reveal that most of the necessary fields in the ELF
+   header are in the first half - the second half is almost completely
+   free for munging. With this in mind, we can interpose the two
+   structures quite a bit more than we did previously:
+
+     ; tiny.asm
+
+     BITS 32
+
+                   org     0x00200000
+
+                   db      0x7F, "ELF"             ; e_ident
+                   db      1, 1, 1, 0, 0
+     _start:
+                   mov     bl, 42
+                   xor     eax, eax
+                   inc     eax
+                   int     0x80
+                   dw      2                       ; e_type
+                   dw      3                       ; e_machine
+                   dd      1                       ; e_version
+                   dd      _start                  ; e_entry
+                   dd      phdr - $$               ; e_phoff
+     phdr:         dd      1                       ; e_shoff      ; p_type
+                   dd      0                       ; e_flags      ; p_offset
+                   dd      $$                      ; e_ehsize     ; p_vaddr
+                                                   ; e_phentsize
+                   dw      1                       ; e_phnum      ; p_paddr
+                   dw      0                       ; e_shentsize
+                   dd      filesize                ; e_shnum      ; p_filesz
+                                                   ; e_shstrndx
+                   dd      filesize                               ; p_memsz
+                   dd      5                                      ; p_flags
+                   dd      0x1000                                 ; p_align
+
+     filesize      equ     $ - $$
+
+   As you can (hopefully) see, the first twenty bytes of the program
+   header table now overlap the last twenty bytes of the ELF header. The
+   two dovetail quite nicely, actually. There are only two parts of the
+   ELF header within the overlapped region that matter.
+   The first is the
+   e_phnum field, which just happens to coincide with the p_paddr field,
+   one of the few fields in the program header table which is definitely
+   ignored. The other is the e_phentsize field, which coincides with the
+   top half of the p_vaddr field. These are made to match up by selecting
+   a non-standard load address for our program, with a top half equal to
+   0x0020.
+
+   Now we have really left behind all pretenses of portability ...
+
+     $ nasm -f bin -o a.out tiny.asm
+     $ chmod +x a.out
+     $ ./a.out ; echo $?
+     42
+     $ wc -c a.out
+     64 a.out
+
+   ... but it works! And the program is twelve bytes shorter, exactly as
+   predicted.
+
+   This is where I say that we can't do any better than this, but of
+   course, we already know that we can -- if we could get the program
+   header table to reside completely within the ELF header. Can this holy
+   grail be achieved?
+
+   Well, we can't just move it up another twelve bytes without hitting
+   hopeless obstacles trying to reconcile several fields in both
+   structures. The only other possibility would be to have it start
+   immediately following the first four bytes. This puts the first part of
+   the program header table comfortably within the e_ident area, but still
+   leaves problems with the rest of it. After some experimenting, it looks
+   like it isn't going to quite be possible.
+
+   However, it turns out that there are still a couple more fields in the
+   program header table that we can pervert.
+
+   We noted that p_memsz indicates how much memory to allocate for the
+   memory segment. Obviously it needs to be at least as big as p_filesz,
+   but there wouldn't be any harm if it was larger. Just because we ask
+   for memory doesn't mean we have to use it, after all.
+
+   Secondly, it turns out that, contrary to all my expectations, the
+   executable bit can be dropped from the p_flags field. It turns out that
+   the readable and executable bits are redundant: either one will imply
+   the other.
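The p_flags values that appear in these listings are small bitmasks: readable is 4 and executable is 1, as noted earlier, and the write bit is 2. The sketch below decodes them using the PF_R, PF_W, and PF_X constants from <elf.h>; the function name is invented for this illustration, and the code only describes what the field says, not how any particular kernel chooses to interpret it.

```c
#include <assert.h>
#include <elf.h>

/* Hypothetical helper: writes an "rwx"-style summary of p_flags into out,
 * which must have room for four bytes including the terminator. */
void describe_p_flags(unsigned int p_flags, char out[4])
{
    assert(out != 0);
    out[0] = (p_flags & PF_R) ? 'r' : '-';   /* 4: readable   */
    out[1] = (p_flags & PF_W) ? 'w' : '-';   /* 2: writable   */
    out[2] = (p_flags & PF_X) ? 'x' : '-';   /* 1: executable */
    out[3] = '\0';
}
```

The value 5 used in the listings so far decodes to "r-x". Dropping the executable bit leaves 4, which decodes to "r--".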
+
+   So, with these facts in mind, we can reorganize the file into this
+   little monstrosity:
+
+     ; tiny.asm
+
+     BITS 32
+
+                   org     0x00010000
+
+                   db      0x7F, "ELF"             ; e_ident
+                   dd      1                                      ; p_type
+                   dd      0                                      ; p_offset
+                   dd      $$                                     ; p_vaddr
+                   dw      2                       ; e_type       ; p_paddr
+                   dw      3                       ; e_machine
+                   dd      _start                  ; e_version    ; p_filesz
+                   dd      _start                  ; e_entry      ; p_memsz
+                   dd      4                       ; e_phoff      ; p_flags
+     _start:
+                   mov     bl, 42                  ; e_shoff      ; p_align
+                   xor     eax, eax
+                   inc     eax                     ; e_flags
+                   int     0x80
+                   db      0
+                   dw      0x34                    ; e_ehsize
+                   dw      0x20                    ; e_phentsize
+                   dw      1                       ; e_phnum
+                   dw      0                       ; e_shentsize
+                   dw      0                       ; e_shnum
+                   dw      0                       ; e_shstrndx
+
+     filesize      equ     $ - $$
+
+   The p_flags field has been changed from 5 to 4, as we noted we could
+   get away with doing. This 4 is also the value of the e_phoff field,
+   which gives the offset into the file for the program header table,
+   which is exactly where we've located it. The program (remember that?)
+   has been moved down to the lower part of the ELF header, beginning at
+   the e_shoff field and ending inside the e_flags field.
+
+   Note that the load address has been changed to a much lower number --
+   about as low as it can be, in fact. This keeps the value in the e_entry
+   field to a reasonably small number, which is good since it's also the
+   p_memsz number. (Actually, with virtual memory it hardly matters -- we
+   could have left it at our original value and it would work just as
+   well. But there's no harm in being polite.)
+
+   The change to p_filesz may require an explanation. Because we aren't
+   setting the write bit in the p_flags field, Linux won't let us define a
+   p_memsz value greater than p_filesz, since it can't zero-initialize
+   those extra bytes if they aren't writeable. Since we can't change the
+   p_flags field without moving the program header table out of alignment,
+   you might think that the only solution would be to lower the p_memsz
+   value back down to equal p_filesz (which would make it impossible to
+   share it with e_entry).
+   However, another solution exists, namely to
+   increase p_filesz to equal p_memsz. That means they're both larger than
+   the real file size -- quite a bit larger, in fact -- but it absolves
+   the loader from having to write to read-only memory, which is all it
+   cared about.
+
+   And so ...
+
+     $ nasm -f bin -o a.out tiny.asm
+     $ chmod +x a.out
+     $ ./a.out ; echo $?
+     42
+     $ wc -c a.out
+     52 a.out
+
+   ... and so, with both the program header table and the program itself
+   completely embedded within the ELF header, our executable file is now
+   exactly as big as the ELF header! No more, no less. And still running
+   without a single complaint from Linux!
+
+   Now, finally, we have truly and certainly reached the absolute minimum
+   possible. There can be no question about it, right? After all, we have
+   to have a complete ELF header (even if it is badly mangled), or else
+   Linux wouldn't give us the time of day!
+
+   Right?
+
+   Wrong. We have one last dirty trick left.
+
+   It seems to be the case that if the file isn't quite the size of a full
+   ELF header, Linux will still play ball, and fill out the missing bytes
+   with zeros. We have no less than seven zeros at the end of our file,
+   and if we drop them from the file image:
+
+     ; tiny.asm
+
+     BITS 32
+
+                   org     0x00010000
+
+                   db      0x7F, "ELF"             ; e_ident
+                   dd      1                                      ; p_type
+                   dd      0                                      ; p_offset
+                   dd      $$                                     ; p_vaddr
+                   dw      2                       ; e_type       ; p_paddr
+                   dw      3                       ; e_machine
+                   dd      _start                  ; e_version    ; p_filesz
+                   dd      _start                  ; e_entry      ; p_memsz
+                   dd      4                       ; e_phoff      ; p_flags
+     _start:
+                   mov     bl, 42                  ; e_shoff      ; p_align
+                   xor     eax, eax
+                   inc     eax                     ; e_flags
+                   int     0x80
+                   db      0
+                   dw      0x34                    ; e_ehsize
+                   dw      0x20                    ; e_phentsize
+                   db      1                       ; e_phnum
+                                                   ; e_shentsize
+                                                   ; e_shnum
+                                                   ; e_shstrndx
+
+     filesize      equ     $ - $$
+
+   ... we can, incredibly enough, still produce a working executable:
+
+     $ nasm -f bin -o a.out tiny.asm
+     $ chmod +x a.out
+     $ ./a.out ; echo $?
+     42
+     $ wc -c a.out
+     45 a.out
+
+   Here, at last, we have honestly gone as far as we can go.
+   There is no
+   getting around the fact that the 45th byte in the file, which specifies
+   the number of entries in the program header table, needs to be
+   non-zero, needs to be present, and needs to be in the 45th position
+   from the start of the ELF header. We are forced to conclude that there
+   is nothing more that can be done.
+     __________________________________________________________________
+
+   This forty-five-byte file is less than one-eighth the size of the
+   smallest ELF executable we could create using the standard tools, and
+   is less than one-fiftieth the size of the smallest file we could create
+   using pure C code. We have stripped everything out of the file that we
+   could, and put to dual purpose most of what we couldn't.
+
+   Of course, half of the values in this file violate some part of the ELF
+   standard, and it's a wonder that Linux will even consent to sneeze on
+   it, much less give it a process ID. This is not the sort of program to
+   which one would normally be willing to confess authorship.
+
+   On the other hand, every single byte in this executable file can be
+   accounted for and justified. How many executables have you created
+   lately that you can say that about?
+
+
+   [4](next)
+     __________________________________________________________________
+
+   [5]Tiny
+   [6]Software
+   [7]Brian Raiter
+
+References
+
+   1. http://www.nasm.us/
+   2. http://refspecs.linuxbase.org/elf/elf.pdf
+   3. http://www.muppetlabs.com/~breadbox/software/ELF.txt
+   4. https://www.muppetlabs.com/~breadbox/software/tiny/teensyps.html
+   5. http://www.muppetlabs.com/~breadbox/software/tiny/
+   6. http://www.muppetlabs.com/~breadbox/software/
+   7. http://www.muppetlabs.com/~breadbox/