Diffstat (limited to 'miniany/doc/unixwiz.net_techtips_win32-callconv-asm.txt')
-rw-r--r-- | miniany/doc/unixwiz.net_techtips_win32-callconv-asm.txt | 321 |
1 files changed, 321 insertions, 0 deletions
diff --git a/miniany/doc/unixwiz.net_techtips_win32-callconv-asm.txt b/miniany/doc/unixwiz.net_techtips_win32-callconv-asm.txt
new file mode 100644
index 0000000..a11fca6
--- /dev/null
+++ b/miniany/doc/unixwiz.net_techtips_win32-callconv-asm.txt
@@ -0,0 +1,321 @@
+   #[1]RSS 2.0
+
+Does this site look plain?
+
+   This site uses advanced css techniques
+
+   [2][Steve Friedl Logo]
+
+Steve Friedl's Unixwiz.net Tech Tips
+
+Intel x86 Function-call Conventions - Assembly View
+
+     * [3]Home
+     * [4]Contact
+     * [5]About
+     * [6]TechTips
+     * [7]Tools&Source
+     * [8]Evo Payroll
+     * [9]CmdLetters
+     * [10]Research
+     * [11]AT&T 3B2
+     * [12]Advisories
+     * [13]News/Pubs
+     * [14]Literacy
+     * [15]Calif.Voting
+     * [16]Personal
+     * [17]Tech Blog
+     * [18]Evo Blog
+
+   One of the "big picture" issues in looking at compiled C code is the
+   function-calling conventions. These are the methods that a calling
+   function and a called function agree on how parameters and return
+   values should be passed between them, and how the stack is used by the
+   function itself. The layout of the stack constitutes the "stack frame",
+   and knowing how this works can go a long way to decoding how something
+   works.
+
+   In C and modern CPU design conventions, the stack frame is a chunk of
+   memory, allocated from the stack, at run-time, each time a function is
+   called, to store its automatic variables. Hence nested or recursive
+   calls to the same function, each successively obtain their own separate
+   frames.
+
+   Physically, a function's stack frame is the area between the addresses
+   contained in esp, the stack pointer, and ebp, the frame pointer (base
+   pointer in Intel terminology). Thus, if a function pushes more values
+   onto the stack, it is effectively growing its frame.
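As a rough illustration of nested calls each obtaining their own frame, the C sketch below records the address of one automatic variable at every recursion depth and checks that no two addresses coincide. The names record_frames, frames_are_distinct and MAX_DEPTH are made up for this example and do not come from the article. It says nothing about how any particular compiler lays out a frame; it only relies on the fact that simultaneously live locals are distinct objects.

```c
#include <stdint.h>

#define MAX_DEPTH 8   /* hypothetical limit chosen for this sketch */

/* Address of the local variable seen at each recursion depth. */
static uintptr_t frame_addr[MAX_DEPTH];

/* Make 'remaining' nested calls, noting where each call's local lives. */
static void record_frames(int depth, int remaining)
{
    volatile int local = depth;             /* automatic variable in this call's frame */

    frame_addr[depth] = (uintptr_t)&local;
    if (remaining > 1)
        record_frames(depth + 1, remaining - 1);
    local = local + 1;                      /* use after the call: not a tail call */
}

/* Returns 1 if every nested call saw a different address, 0 otherwise. */
int frames_are_distinct(int calls)
{
    if (calls < 1 || calls > MAX_DEPTH)
        return 0;
    record_frames(0, calls);
    for (int i = 0; i < calls; i++)
        for (int j = i + 1; j < calls; j++)
            if (frame_addr[i] == frame_addr[j])
                return 0;
    return 1;
}
```

Calling frames_are_distinct(4) makes four nested calls to the same function and reports whether each one had separate storage for its local.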
+
+   This is a very low-level view: the picture as seen from the C/C++
+   programmer is illustrated elsewhere:
+
+     o [19]Unixwiz.net Tech Tip: Intel x86 Function-call Conventions - C
+       Programmer's View
+
+   For the sake of discussion, we're using the terms that the Microsoft
+   Visual C compiler uses to describe these conventions, even though other
+   platforms may use other terms.
+
+   __cdecl (pronounced see-DECK-'ll rhymes with "heckle")
+          This convention is the most common because it supports semantics
+          required by the C language. The C language supports variadic
+          functions (variable argument lists, à la printf), and this means
+          that the caller must clean up the stack after the function call:
+          the called function has no way to know how to do this. It's not
+          terribly optimal, but the C language semantics demand it.
+
+   __stdcall
+          Related to the older __pascal convention (which also has the
+          called function clean up, but pushes its parameters from left
+          to right), this requires that each function take a fixed number
+          of parameters, and this means that the called function can do
+          argument cleanup in one place rather than have this be
+          scattered throughout the program in every place that calls it.
+          The Win32 API primarily uses __stdcall.
+
+   It's important to note that these are merely conventions, and any
+   collection of cooperating code can agree on nearly anything. There are
+   other conventions (passing parameters in registers, for instance) that
+   behave differently, and of course the optimizer can make mincemeat of
+   any clear picture as well.
+
+   Our focus here is to provide an overview, and not an authoritative
+   definition for these conventions.
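To make the variadic case concrete, here is a minimal C sketch of a printf-style function using the standard <stdarg.h> macros. The function sum_ints is hypothetical and exists only for illustration. The compiled body is the same for every call, and it learns how many arguments it received only at run time from its first parameter, which is why a fixed callee-side cleanup cannot be baked into it.

```c
#include <stdarg.h>

/* Hypothetical variadic function: adds up 'count' int arguments. */
int sum_ints(int count, ...)
{
    va_list args;
    int total = 0;

    va_start(args, count);              /* start just past the last named parameter */
    for (int i = 0; i < count; i++)
        total += va_arg(args, int);     /* fetch the next int argument */
    va_end(args);

    return total;
}
```

A call such as sum_ints(3, 10, 20, 30) and a call such as sum_ints(1, 5) reach the same code with different amounts of argument data, so only each call site knows how much there is to remove afterwards.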
+
+Register use in the stack frame
+
+   In both __cdecl and __stdcall conventions, the same set of three
+   registers is involved in the function-call frame:
+
+   %ESP - Stack Pointer
+          This 32-bit register is implicitly manipulated by several CPU
+          instructions (PUSH, POP, CALL, and RET among others); it always
+          points to the last element used on the stack (not the first free
+          element). This means that the PUSH and POP operations would be
+          specified in pseudo-C as:
+
+*--ESP = value;         // push
+
+value = *ESP++;         // pop
+
+          The "Top of the stack" is an occupied location, not a free one,
+          and is at the lowest memory address.
+
+   %EBP - Base Pointer
+          This 32-bit register is used to reference all the function
+          parameters and local variables in the current stack frame.
+          Unlike the %esp register, the base pointer is manipulated only
+          explicitly. This is sometimes called the "Frame Pointer".
+
+   %EIP - Instruction Pointer
+          This holds the address of the next CPU instruction to be
+          executed, and it's saved onto the stack as part of the CALL
+          instruction. As well, any of the "jump" instructions modify the
+          %EIP directly.
+
+Assembler notation
+
+   Virtually everybody in the Intel assembler world uses the Intel
+   notation, but the GNU C compiler uses what they call the "AT&T syntax"
+   for backwards compatibility. This seems to us to be a really dumb idea,
+   but it's a fact of life.
+
+   There are minor notational differences between the two notations, but
+   by far the most annoying is that the AT&T syntax reverses the source
+   and destination operands. To move the immediate value 4 into the EAX
+   register:
+mov $4, %eax            // AT&T notation
+
+mov eax, 4              // Intel notation
+
+   More recent GNU compilers have a way to generate the Intel syntax, but
+   it's not clear if the GNU assembler takes it. In any case, we'll use
+   the Intel notation exclusively.
+
+   There are other minor differences that are not of much concern to the
+   reverse engineer.
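The pseudo-C for PUSH and POP given under %ESP can be tried out with a small model in real C. This is only a sketch: model_stack, model_init, model_push, model_pop and STACK_WORDS are names invented here, an ordinary array stands in for stack memory, and there is no overflow or underflow checking.

```c
#include <stdint.h>

#define STACK_WORDS 16   /* hypothetical size of the modeled stack */

/* An array standing in for stack memory, plus a pointer playing %ESP. */
typedef struct {
    uint32_t  mem[STACK_WORDS];
    uint32_t *esp;        /* points at the last value pushed (an occupied slot) */
} model_stack;

/* Empty stack: esp sits one element past the highest slot. */
void model_init(model_stack *s)
{
    s->esp = s->mem + STACK_WORDS;
}

/* push: move esp down first, then store (the article's *--ESP = value) */
void model_push(model_stack *s, uint32_t value)
{
    *--s->esp = value;
}

/* pop: read the occupied slot, then move esp up (value = *ESP++) */
uint32_t model_pop(model_stack *s)
{
    return *s->esp++;
}
```

After three pushes, esp sits three elements below its starting position and points at the most recently pushed value, which matches the description of the top of the stack as an occupied location at the lowest address in use.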
+
+Calling a __cdecl function
+
+   The best way to understand the stack organization is to see each step
+   in calling a function with the __cdecl conventions. These steps are
+   taken automatically by the compiler, and though not all of them are
+   used in every case (sometimes no parameters, sometimes no local
+   variables, sometimes no saved registers), this shows the overall
+   mechanism employed.
+
+   Push parameters onto the stack, from right to left
+          Parameters are pushed onto the stack, one at a time, from right
+          to left. Whether the parameters are evaluated from right to left
+          is a different matter, and in any case this is unspecified by
+          the language and code should never rely on this. The calling
+          code must keep track of how many bytes of parameters have been
+          pushed onto the stack so it can clean it up later.
+
+   Call the function
+          Here, the processor pushes contents of the %EIP (instruction
+          pointer) onto the stack, and it points to the first byte after
+          the CALL instruction. After this finishes, the caller has lost
+          control, and the callee is in charge. This step does not change
+          the %ebp register.
+
+   Save and update the %ebp
+          Now that we're in the new function, we need a new local stack
+          frame pointed to by %ebp, so this is done by saving the current
+          %ebp (which belongs to the previous function's frame) and making
+          it point to the top of the stack.
+
+push ebp
+mov  ebp, esp           // ebp « esp
+
+          Once %ebp has been changed, it can now refer directly to the
+          function's arguments as 8(%ebp), 12(%ebp).
+          Note that 0(%ebp) is the old base pointer and 4(%ebp) is the old
+          instruction pointer, but this applies to near calls only - far
+          calls include segment registers too, but these are uncommon in
+          real programs.
+
+   Save CPU registers used for temporaries
+          [__cdecl stack frame] If this function will use any CPU
+          registers, it has to save the old values first lest it walk on
+          data used by the calling functions.
+          Each register to be used is
+          pushed onto the stack one at a time, and the compiler must
+          remember what it did so it can unwind it later.
+
+   Allocate local variables
+          The function may choose to use local stack-based variables, and
+          they are allocated here simply by decrementing the stack pointer
+          by the amount of space required. This is always done in
+          four-byte chunks.
+          Now, the local variables are located on the stack between the
+          %ebp and %esp registers, and though it would be possible to
+          refer to them as offsets from either one, by convention the %ebp
+          register is used. This means that -4(%ebp) refers to the first
+          local variable.
+
+   Perform the function's purpose
+          At this point, the stack frame is set up correctly, and this is
+          represented by the diagram to the right. All the parameters and
+          locals are offsets from the %ebp register:
+
+           16(%ebp) - third function parameter
+           12(%ebp) - second function parameter
+            8(%ebp) - first function parameter
+            4(%ebp) - old %EIP (the function's "return address")
+            0(%ebp) - old %EBP (previous function's base pointer)
+           -4(%ebp) - first local variable
+           -8(%ebp) - second local variable
+          -12(%ebp) - third local variable
+
+          The function is free to use any of the registers that had been
+          saved onto the stack upon entry, but it must not change the
+          stack pointer or all Hell will break loose upon function return.
+
+   Release local storage
+          When the function allocates local, temporary space, it does so
+          by decrementing the stack pointer by the amount of space needed,
+          and this process must be reversed to reclaim that space. It's
+          usually done by adding to the stack pointer the same amount
+          which was subtracted previously, though a series of POP
+          instructions could achieve the same thing.
+
+   Restore saved registers
+          For each register saved onto the stack upon entry, it must be
+          restored from the stack in reverse order.
+          If the "save" and
+          "restore" phases don't match exactly, catastrophic stack
+          corruption will occur.
+
+   Restore the old base pointer
+          The first thing this function did upon entry was save the
+          caller's %ebp base pointer, and by restoring it now (popping the
+          top item from the stack), we effectively discard the entire
+          local stack frame and put the caller's frame back in play.
+
+   Return from the function
+          This is the last step of the called function, and the RET
+          instruction pops the old %EIP from the stack and jumps to that
+          location. This gives control back to the calling function. Only
+          the stack pointer and instruction pointers are modified by a
+          subroutine return.
+
+   Clean up pushed parameters
+          In the __cdecl convention, the caller must clean up the
+          parameters pushed onto the stack, and this is done either by
+          popping the stack into don't-care registers (for a few
+          parameters) or by adding the parameter-block size to the stack
+          pointer directly.
+
+__cdecl -vs- __stdcall
+
+   The __stdcall convention is mainly used by the Windows API, and it's a
+   bit more compact than __cdecl. The main difference is that any given
+   function has a hard-coded set of parameters, and this cannot vary from
+   call to call like it can in C (no "variadic functions").
+
+   Because the size of the parameter block is fixed, the burden of
+   cleaning these parameters off the stack can be shifted to the called
+   function, instead of being done by the calling function as in __cdecl.
+   There are several effects of this:
+    1. the code is a tiny bit smaller, because the parameter-cleanup code
+       is found once -- in the called function itself -- rather than in
+       every place the function is called. These may be only a few bytes
+       per call, but for commonly-used functions it can add up. This
+       presumably means that the code may be a tiny bit faster as well.
+    2. calling the function with the wrong number of parameters is
+       catastrophic - the stack will be badly misaligned, and general
+       havoc will surely ensue.
+    3. As an offshoot of #2, Microsoft Visual C takes special care of
+       functions that are __stdcall. Since the number of parameters is
+       known at compile time, the compiler encodes the parameter byte
+       count in the symbol name itself, and this means that calling the
+       function wrong leads to a link error.
+       For instance, the function int foo(int a, int b) would generate --
+       at the assembler level -- the symbol "_foo@8", where "8" is the
+       number of bytes expected. This means that not only will a call with
+       1 or 3 parameters not resolve (due to the size mismatch), but
+       neither will a call expecting the __cdecl parameters (which looks
+       for _foo). It's a clever mechanism that avoids a lot of problems.
+
+Variations and Notes
+
+   The x86 architecture provides a number of built-in mechanisms for
+   assisting with frame management, but they don't seem to be commonly
+   used by C compilers. Of particular interest is the ENTER instruction,
+   which handles most of the function-prolog code.
+ENTER 10,0              PUSH ebp
+                        MOV ebp, esp
+                        SUB esp, 10
+
+   We're pretty sure these are functionally equivalent, but our 80386
+   processor reference suggests that the ENTER version is more compact (6
+   bytes -vs- 9) but slower (15 clocks -vs- 6). The newer processors are
+   probably harder to pin down, but somebody has probably figured out that
+   ENTER is slower. Sigh.
+   [20]More Tech Tips
+
+   [21]Home [22]Stephen J. Friedl Software Consultant Orange County,
+   CA USA [Steve's Email] [23][RSS Feed available]
+
+References
+
+   1. http://unixwiz.net/techtips/techtips.rss
+   2. http://unixwiz.net/
+   3. http://unixwiz.net/
+   4. http://unixwiz.net/contact
+   5. http://unixwiz.net/about/
+   6. http://unixwiz.net/techtips/
+   7. http://unixwiz.net/tools/
+   8. http://unixwiz.net/evo/
+   9. http://unixwiz.net/cmdletters/
+  10. http://unixwiz.net/research/
+  11. http://unixwiz.net/3b2.html
+  12. http://unixwiz.net/advisories.html
+  13. http://unixwiz.net/news.html
+  14. http://unixwiz.net/literacy.html
+  15. http://unixwiz.net/voting/
+  16. http://unixwiz.net/personal/
+  17. http://blog.unixwiz.net/
+  18. http://evoblog.unixwiz.net/
+  19. http://unixwiz.net/techtips/win32-callconv.html
+  20. http://unixwiz.net/techtips/index.html
+  21. http://unixwiz.net/
+  22. http://unixwiz.net/contact.html
+  23. http://unixwiz.net/techtips/techtips.rss