1 files changed, 841 insertions, 0 deletions
diff --git a/miniany/doc/www.cs.virginia.edu_~evans_cs216_guides_x86.txt b/miniany/doc/www.cs.virginia.edu_~evans_cs216_guides_x86.txt
new file mode 100644
index 0000000..ed0736f
--- /dev/null
+++ b/miniany/doc/www.cs.virginia.edu_~evans_cs216_guides_x86.txt
@@ -0,0 +1,841 @@
+                   University of Virginia Computer Science
+             CS216: Program and Data Representation, Spring 2006
+                                            08 March 2022
+
+                             x86 Assembly Guide
+
+    Contents: [1]Registers | [2]Memory and Addressing | [3]Instructions |
+                            [4]Calling Convention
+
+   This guide describes the basics of 32-bit x86 assembly language
+   programming, covering a small but useful subset of the available
+   instructions and assembler directives. There are several different
+   assembly languages for generating x86 machine code. The one we will use
+   in CS216 is the Microsoft Macro Assembler (MASM). MASM uses the
+   standard Intel syntax for writing x86 assembly code.
+
+   The full x86 instruction set is large and complex (Intel's x86
+   instruction set manuals comprise over 2900 pages), and we do not cover
+   it all in this guide. For example, there is a 16-bit subset of the x86
+   instruction set. Using the 16-bit programming model can be quite
+   complex. It has a segmented memory model, more restrictions on register
+   usage, and so on. In this guide, we will limit our attention to more
+   modern aspects of x86 programming, and delve into the instruction set
+   only in enough detail to get a basic feel for x86 programming.
+
+Resources
+
+     * [5]Guide to Using Assembly in Visual Studio -- a tutorial on
+       building and debugging assembly code in Visual Studio
+     * [6]Intel x86 Instruction Set Reference
+     * [7]Intel's Pentium Manuals (the full gory details)
+
+Registers
+
+   Modern (i.e., 386 and beyond) x86 processors have eight 32-bit general
+   purpose registers, as depicted in Figure 1. The register names are
+   mostly historical. For example, EAX used to be called the accumulator
+   since it was used by a number of arithmetic operations, and ECX was
+   known as the counter since it was used to hold a loop index. Whereas
+   most of the registers have lost their special purposes in the modern
+   instruction set, by convention, two are reserved for special purposes
+   -- the stack pointer (ESP) and the base pointer (EBP).
+
+   For the EAX, EBX, ECX, and EDX registers, subsections may be used. For
+   example, the least significant 2 bytes of EAX can be treated as a
+   16-bit register called AX. The least significant byte of AX can be used
+   as a single 8-bit register called AL, while the most significant byte
+   of AX can be used as a single 8-bit register called AH. These names
+   refer to the same physical register. When a two-byte quantity is placed
+   into DX, the update affects the value of DH, DL, and EDX. These
+   sub-registers are mainly hold-overs from older, 16-bit versions of the
+   instruction set. However, they are sometimes convenient when dealing
+   with data that are smaller than 32 bits (e.g. 1-byte ASCII characters).
+
+   When referring to registers in assembly language, the names are not
+   case-sensitive. For example, the names EAX and eax refer to the same
+   register.
+
+                           [8][x86-registers.png]
+                           Figure 1. x86 Registers
+
+Memory and Addressing Modes
+
+Declaring Static Data Regions
+
+   You can declare static data regions (analogous to global variables) in
+   x86 assembly using special assembler directives for this purpose. Data
+   declarations should be preceded by the .DATA directive. Following this
+   directive, the directives DB, DW, and DD can be used to declare one,
+   two, and four byte data locations, respectively. Declared locations can
+   be labeled with names for later reference -- this is similar to
+   declaring variables by name, but abides by some lower level rules. For
+   example, locations declared in sequence will be located in memory next
+   to one another.
+
+   Example declarations:
+
+   .DATA
+   var  DB 64     ; Declare a byte, referred to as location var,
+                  ; containing the value 64.
+   var2 DB ?      ; Declare an uninitialized byte, referred to as
+                  ; location var2.
+        DB 10     ; Declare a byte with no label, containing the value
+                  ; 10. Its location is var2 + 1.
+   X    DW ?      ; Declare a 2-byte uninitialized value, referred to
+                  ; as location X.
+   Y    DD 30000  ; Declare a 4-byte value, referred to as location Y,
+                  ; initialized to 30000.
+
+   Unlike in high level languages where arrays can have many dimensions
+   and are accessed by indices, arrays in x86 assembly language are simply
+   a number of cells located contiguously in memory. An array can be
+   declared by just listing the values, as in the first example below. Two
+   other common methods used for declaring arrays of data are the DUP
+   directive and the use of string literals. The DUP directive tells the
+   assembler to duplicate an expression a given number of times. For
+   example, 4 DUP(2) is equivalent to 2, 2, 2, 2.
+
+   Some examples:
+
+   Z     DD 1, 2, 3     ; Declare three 4-byte values, initialized to
+                        ; 1, 2, and 3. The value at location Z + 8
+                        ; will be 3.
+   bytes DB 10 DUP(?)   ; Declare 10 uninitialized bytes starting at
+                        ; location bytes.
+   arr   DD 100 DUP(0)  ; Declare 100 4-byte words starting at
+                        ; location arr, all initialized to 0.
+   str   DB 'hello',0   ; Declare 6 bytes starting at the address str,
+                        ; initialized to the ASCII character values
+                        ; for hello and the null (0) byte.
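+
+   As a small sketch of how such arrays are accessed (using the
+   addressing modes described in the next subsection), element i of a
+   4-byte array lives at the array's address plus 4*i. The index value
+   below is an arbitrary choice for illustration:
+
+   mov eax, [Z + 8]       ; Load the third element of Z (the value 3).
+   mov esi, 5             ; Hypothetical index, must be between 0 and 99.
+   mov [arr + 4*esi], eax ; Store EAX into element 5 of arr.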
+
+Addressing Memory
+
+   Modern x86-compatible processors are capable of addressing up to 2^32
+   bytes of memory: memory addresses are 32 bits wide. In the examples
+   above, where we used labels to refer to memory regions, these labels
+   are actually replaced by the assembler with 32-bit quantities that
+   specify addresses in memory. In addition to letting you refer to
+   memory regions by labels (i.e. constant values), the x86 provides a
+   flexible scheme for computing and referring to memory addresses: up to
+   two of the 32-bit registers and a 32-bit signed constant can be added
+   together to compute a memory address. One of the registers can be
+   optionally pre-multiplied by 2, 4, or 8.
+
+   The addressing modes can be used with many x86 instructions (we'll
+   describe them in the next section). Here we illustrate some examples
+   using the mov instruction that moves data between registers and memory.
+   This instruction has two operands: the first is the destination and the
+   second specifies the source.
+
+   Some examples of mov instructions using address computations are:
+
+   mov eax, [ebx]        ; Move the 4 bytes in memory at the address
+                         ; contained in EBX into EAX.
+   mov [var], ebx        ; Move the contents of EBX into the 4 bytes at
+                         ; memory address var. (Note, var is a 32-bit
+                         ; constant.)
+   mov eax, [esi-4]      ; Move 4 bytes at memory address ESI + (-4)
+                         ; into EAX.
+   mov [esi+eax], cl     ; Move the contents of CL into the byte at
+                         ; address ESI+EAX.
+   mov edx, [esi+4*ebx]  ; Move the 4 bytes of data at address
+                         ; ESI+4*EBX into EDX.
+
+   Some examples of invalid address calculations include:
+
+     mov eax, [ebx-ecx]         ; Can only add register values
+     mov [eax+esi+edi], ebx     ; At most 2 registers in address computation
+
+Size Directives
+
+   In general, the intended size of the data item at a given memory
+   address can be inferred from the assembly code instruction in which it
+   is referenced. For example, in all of the above instructions, the size
+   of the memory regions could be inferred from the size of the register
+   operand. When we were loading a 32-bit register, the assembler could
+   infer that the region of memory we were referring to was 4 bytes wide.
+   When we were storing the value of a one byte register to memory, the
+   assembler could infer that we wanted the address to refer to a single
+   byte in memory.
+
+   However, in some cases the size of a referred-to memory region is
+   ambiguous. Consider the instruction mov [ebx], 2. Should this
+   instruction move the value 2 into the single byte at address EBX?
+   Perhaps it should move the 32-bit integer representation of 2 into the
+   4 bytes starting at address EBX. Since either is a valid possible
+   interpretation, the assembler must be explicitly directed as to which
+   is correct. The size directives BYTE PTR, WORD PTR, and DWORD PTR serve
+   this purpose, indicating sizes of 1, 2, and 4 bytes respectively.
+
+   For example:
+
+   mov BYTE PTR [ebx], 2   ; Move 2 into the single byte at the address
+                           ; stored in EBX.
+   mov WORD PTR [ebx], 2   ; Move the 16-bit integer representation of
+                           ; 2 into the 2 bytes starting at the address
+                           ; in EBX.
+   mov DWORD PTR [ebx], 2  ; Move the 32-bit integer representation of
+                           ; 2 into the 4 bytes starting at the address
+                           ; in EBX.
+
+Instructions
+
+   Machine instructions generally fall into three categories: data
+   movement, arithmetic/logic, and control-flow. In this section, we will
+   look at important examples of x86 instructions from each category. This
+   section should not be considered an exhaustive list of x86
+   instructions, but rather a useful subset. For a complete list, see
+   Intel's instruction set reference.
+
+   We use the following notation:
+
+   <reg32>     Any 32-bit register (EAX, EBX, ECX, EDX, ESI, EDI, ESP, or
+   EBP)
+   <reg16> Any 16-bit register (AX, BX, CX, or DX)
+   <reg8> Any 8-bit register (AH, BH, CH, DH, AL, BL, CL, or DL)
+   <reg> Any register
+   <mem> A memory address (e.g., [eax], [var + 4], or dword ptr [eax+ebx])
+   <con32> Any 32-bit constant
+   <con16> Any 16-bit constant
+   <con8> Any 8-bit constant
+   <con> Any 8-, 16-, or 32-bit constant
+
+Data Movement Instructions
+
+   mov -- Move (Opcodes: 88, 89, 8A, 8B, 8C, 8E, ...)
+
+     The mov instruction copies the data item referred to by its second
+     operand (i.e. register contents, memory contents, or a constant
+     value) into the location referred to by its first operand (i.e. a
+     register or memory). While register-to-register moves are possible,
+     direct memory-to-memory moves are not. In cases where memory
+     transfers are desired, the source memory contents must first be
+     loaded into a register, then can be stored to the destination memory
+     address.
+
+     Syntax
+     mov <reg>,<reg>
+     mov <reg>,<mem>
+     mov <mem>,<reg>
+     mov <reg>,<con>
+     mov <mem>,<con>
+
+     Examples
+     mov eax, ebx -- copy the value in ebx into eax
+     mov byte ptr [var], 5 -- store the value 5 into the byte at location
+     var
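+
+     Since direct memory-to-memory moves are not possible, copying one
+     memory location to another takes two instructions. A minimal
+     sketch, using the byte-sized locations var and var2 declared
+     earlier and AL as a scratch register (the choice of AL is
+     arbitrary):
+
+     mov al, [var]  -- load the byte at location var into AL
+     mov [var2], al -- store AL into the byte at location var2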
+
+   push -- Push stack (Opcodes: FF, 50+r, 6A, 68, ...)
+
+     The push instruction places its operand onto the top of the hardware
+     supported stack in memory. Specifically, push first decrements ESP
+     by 4, then places its operand into the contents of the 32-bit
+     location at address [ESP]. ESP (the stack pointer) is decremented by
+     push since the x86 stack grows down - i.e. the stack grows from high
+     addresses to lower addresses.
+
+     Syntax
+     push <reg32>
+     push <mem>
+     push <con32>
+
+     Examples
+     push eax -- push eax on the stack
+     push [var] -- push the 4 bytes at address var onto the stack
+
+   pop -- Pop stack
+
+     The pop instruction removes the 4-byte data element from the top of
+     the hardware-supported stack into the specified operand (i.e.
+     register or memory location). It first moves the 4 bytes located at
+     memory location [ESP] into the specified register or memory location,
+     and then increments ESP by 4.
+
+     Syntax
+     pop <reg32>
+     pop <mem>
+
+     Examples
+     pop edi -- pop the top element of the stack into EDI.
+     pop [ebx] -- pop the top element of the stack into memory at the
+     four bytes starting at location EBX.
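+
+     As a rough illustration of the descriptions above, push eax and
+     pop edi behave like the following instruction pairs. This is only
+     a sketch: sub and add update the condition codes, whereas push
+     and pop leave them untouched.
+
+     sub esp, 4     -- push eax, step 1: make room on the stack
+     mov [esp], eax -- push eax, step 2: store EAX at the new top
+
+     mov edi, [esp] -- pop edi, step 1: load the top element into EDI
+     add esp, 4     -- pop edi, step 2: discard the top element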
+
+   lea -- Load effective address
+
+     The lea instruction places the address specified by its second
+     operand into the register specified by its first operand. Note, the
+     contents of the memory location are not loaded, only the effective
+     address is computed and placed into the register. This is useful for
+     obtaining a pointer into a memory region.
+
+     Syntax
+     lea <reg32>,<mem>
+
+     Examples
+     lea edi, [ebx+4*esi] -- the quantity EBX+4*ESI is placed in EDI.
+     lea eax, [var] -- the address of var is placed in EAX.
+     lea eax, [val] -- the value val (used as an address, not the memory
+     contents at that address) is placed in EAX.
+
+Arithmetic and Logic Instructions
+
+   add -- Integer Addition
+
+     The add instruction adds together its two operands, storing the
+     result in its first operand. Note, whereas both operands may be
+     registers, at most one operand may be a memory location.
+
+     Syntax
+     add <reg>,<reg>
+     add <reg>,<mem>
+     add <mem>,<reg>
+     add <reg>,<con>
+     add <mem>,<con>
+
+     Examples
+     add eax, 10 -- EAX <- EAX + 10
+     add BYTE PTR [var], 10 -- add 10 to the single byte stored at memory
+     address var
+
+   sub -- Integer Subtraction
+
+     The sub instruction subtracts the value of its second operand from
+     the value of its first operand and stores the result in its first
+     operand. As with add, whereas both operands may be registers, at
+     most one operand may be a memory location.
+
+     Syntax
+     sub <reg>,<reg>
+     sub <reg>,<mem>
+     sub <mem>,<reg>
+     sub <reg>,<con>
+     sub <mem>,<con>
+
+     Examples
+     sub al, ah -- AL <- AL - AH
+     sub eax, 216 -- subtract 216 from the value stored in EAX
+
+   inc, dec -- Increment, Decrement
+
+     The inc instruction increments the contents of its operand by one.
+     The dec instruction decrements the contents of its operand by one.
+
+     Syntax
+     inc <reg>
+     inc <mem>
+     dec <reg>
+     dec <mem>
+
+     Examples
+     dec eax -- subtract one from the contents of EAX.
+     inc DWORD PTR [var] -- add one to the 32-bit integer stored at
+     location var
+
+   imul -- Integer Multiplication
+
+     The imul instruction has two basic formats: two-operand (first two
+     syntax listings below) and three-operand (last two syntax listings
+     below).
+
+     The two-operand form multiplies its two operands together and stores
+     the result in the first operand. The result (i.e. first) operand
+     must be a register.
+
+     The three operand form multiplies its second and third operands
+     together and stores the result in its first operand. Again, the
+     result operand must be a register. Furthermore, the third operand is
+     restricted to being a constant value.
+
+     Syntax
+     imul <reg32>,<reg32>
+     imul <reg32>,<mem>
+     imul <reg32>,<reg32>,<con>
+     imul <reg32>,<mem>,<con>
+
+     Examples
+
+   imul eax, [var] -- multiply the contents of EAX by the 32-bit contents
+   of the memory location var. Store the result in EAX.
+
+   imul esi, edi, 25 -- ESI <- EDI * 25
+
+   idiv -- Integer Division
+
+     The idiv instruction divides the contents of the 64-bit integer
+     EDX:EAX (constructed by viewing EDX as the most significant four
+     bytes and EAX as the least significant four bytes) by the specified
+     operand value. The quotient result of the division is stored into
+     EAX, while the remainder is placed in EDX.
+
+     Syntax
+     idiv <reg32>
+     idiv <mem>
+
+     Examples
+
+   idiv ebx -- divide the contents of EDX:EAX by the contents of EBX.
+   Place the quotient in EAX and the remainder in EDX.
+
+   idiv DWORD PTR [var] -- divide the contents of EDX:EAX by the 32-bit
+   value stored at memory location var. Place the quotient in EAX and the
+   remainder in EDX.
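+
+   Because idiv divides the full 64-bit value EDX:EAX, EDX must hold a
+   meaningful value before the division. A common idiom, sketched here
+   on the assumption that the dividend is a signed 32-bit value already
+   in EAX, uses the cdq instruction (not otherwise covered in this
+   guide) to sign-extend EAX into EDX. The operand values are arbitrary
+   examples:
+
+   mov eax, -17 -- dividend (example value)
+   mov ebx, 5   -- divisor (example value)
+   cdq          -- sign-extend EAX into EDX:EAX
+   idiv ebx     -- EAX <- -3 (quotient), EDX <- -2 (remainder)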
+
+   and, or, xor -- Bitwise logical and, or and exclusive or
+
+     These instructions perform the specified logical operation (logical
+     bitwise and, or, and exclusive or, respectively) on their operands,
+     placing the result in the first operand location.
+
+     Syntax
+     and <reg>,<reg>
+     and <reg>,<mem>
+     and <mem>,<reg>
+     and <reg>,<con>
+     and <mem>,<con>
+
+     or <reg>,<reg>
+     or <reg>,<mem>
+     or <mem>,<reg>
+     or <reg>,<con>
+     or <mem>,<con>
+
+     xor <reg>,<reg>
+     xor <reg>,<mem>
+     xor <mem>,<reg>
+     xor <reg>,<con>
+     xor <mem>,<con>
+
+     Examples
+     and eax, 0fH -- clear all but the last 4 bits of EAX.
+     xor edx, edx -- set the contents of EDX to zero.
+
+   not -- Bitwise Logical Not
+
+     Logically negates the operand contents (that is, flips all bit
+     values in the operand).
+
+     Syntax
+     not <reg>
+     not <mem>
+
+     Example
+     not BYTE PTR [var] -- negate all bits in the byte at the memory
+     location var.
+
+   neg -- Negate
+
+     Performs the two's complement negation of the operand contents.
+
+     Syntax
+     neg <reg>
+     neg <mem>
+
+     Example
+     neg eax -- EAX <- - EAX
+
+   shl, shr -- Shift Left, Shift Right
+
+     These instructions shift the bits in their first operand's contents
+     left and right, padding the resulting empty bit positions with
+     zeros. The number of bits to shift is specified by the second
+     operand, which can be either an 8-bit constant or the register CL.
+     In either case, shift counts greater than 31 are performed modulo
+     32, so the operand is effectively shifted by at most 31 places.
+
+     Syntax
+     shl <reg>,<con8>
+     shl <mem>,<con8>
+     shl <reg>,<cl>
+     shl <mem>,<cl>
+
+     shr <reg>,<con8>
+     shr <mem>,<con8>
+     shr <reg>,<cl>
+     shr <mem>,<cl>
+
+     Examples
+
+   shl eax, 1 -- Multiply the value of EAX by 2 (if the most significant
+   bit is 0)
+
+   shr ebx, cl -- Store in EBX the floor of the result of dividing the
+   value of EBX by 2^n, where n is the value in CL.
+
+Control Flow Instructions
+
+   The x86 processor maintains an instruction pointer (IP) register that
+   is a 32-bit value indicating the location in memory where the current
+   instruction starts. Normally, after executing an instruction, it
+   increments to point to where the next instruction in memory begins.
+   The IP register cannot be manipulated directly, but is updated
+   implicitly by the provided control flow instructions.
+
+   We use the notation <label> to refer to labeled locations in the
+   program text. Labels can be inserted anywhere in x86 assembly code text
+   by entering a label name followed by a colon. For example,
+
+       mov esi, [ebp+8]
+begin: xor ecx, ecx
+       mov eax, [esi]
+
+   The second instruction in this code fragment is labeled begin.
+   Elsewhere in the code, we can refer to the memory location of this
+   instruction using the more convenient symbolic name begin. This label
+   is just a convenient way of expressing the location instead of its
+   32-bit value.
+
+   jmp -- Jump
+
+     Transfers program control flow to the instruction at the memory
+     location indicated by the operand.
+
+     Syntax
+     jmp <label>
+
+     Example
+     jmp begin -- Jump to the instruction labeled begin.
+
+   jcondition -- Conditional Jump
+
+     These instructions are conditional jumps that are based on the
+     status of a set of condition codes that are stored in a special
+     register called the machine status word (the EFLAGS register on
+     the x86). The contents of the machine status word include
+     information about the last arithmetic operation performed. For
+     example, one bit of this word indicates if the last result was
+     zero. Another indicates if the last result was negative.
+     Based on these condition codes, a number of conditional jumps can be
+     performed. For example, the jz instruction performs a jump to the
+     specified operand label if the result of the last arithmetic
+     operation was zero. Otherwise, control proceeds to the next
+     instruction in sequence.
+
+     A number of the conditional branches are given names that are
+     intuitively based on the last operation performed being a special
+     compare instruction, cmp (see below). For example, conditional
+     branches such as jle and jne are based on first performing a cmp
+     operation on the desired operands.
+
+     Syntax
+     je <label> (jump when equal)
+     jne <label> (jump when not equal)
+     jz <label> (jump when last result was zero)
+     jg <label> (jump when greater than)
+     jge <label> (jump when greater than or equal to)
+     jl <label> (jump when less than)
+     jle <label> (jump when less than or equal to)
+
+     Example
+     cmp eax, ebx
+     jle done
+
+   If the contents of EAX are less than or equal to the contents of EBX,
+   jump to the label done. Otherwise, continue to the next instruction.
+
+   cmp -- Compare
+
+     Compare the values of the two specified operands, setting the
+     condition codes in the machine status word appropriately. This
+     instruction is equivalent to the sub instruction, except the result
+     of the subtraction is discarded instead of replacing the first
+     operand.
+
+     Syntax
+     cmp <reg>,<reg>
+     cmp <reg>,<mem>
+     cmp <mem>,<reg>
+     cmp <reg>,<con>
+     cmp <mem>,<con>
+
+     Example
+     cmp DWORD PTR [var], 10
+     je loop
+
+   If the 4 bytes stored at location var are equal to the 4-byte integer
+   constant 10, jump to the location labeled loop.
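+
+   Putting cmp and a conditional jump together gives a loop. The sketch
+   below sums the 100 elements of the array arr declared earlier; the
+   label name next and the choice of registers are illustrative
+   assumptions:
+
+       xor eax, eax           ; EAX holds the running sum, initially 0
+       xor ecx, ecx           ; ECX holds the index, initially 0
+next:  add eax, [arr + 4*ecx] ; Add element ECX of arr to the sum
+       inc ecx                ; Advance to the next index
+       cmp ecx, 100           ; Compare the index with the element count
+       jl next                ; Repeat while the index is less than 100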
+
+   call, ret -- Subroutine call and return
+
+     These instructions implement a subroutine call and return. The call
+     instruction first pushes the current code location onto the hardware
+     supported stack in memory (see the push instruction for details),
+     and then performs an unconditional jump to the code location
+     indicated by the label operand. Unlike the simple jump instructions,
+     the call instruction saves the location to return to when the
+     subroutine completes.
+
+     The ret instruction implements a subroutine return mechanism. This
+     instruction first pops a code location off the hardware supported
+     in-memory stack (see the pop instruction for details). It then
+     performs an unconditional jump to the retrieved code location.
+
+     Syntax
+     call <label>
+     ret
+
+Calling Convention
+
+   To allow separate programmers to share code and develop libraries for
+   use by many programs, and to simplify the use of subroutines in
+   general, programmers typically adopt a common calling convention. The
+   calling convention is a protocol about how to call and return from
+   routines. For example, given a set of calling convention rules, a
+   programmer need not examine the definition of a subroutine to determine
+   how parameters should be passed to that subroutine. Furthermore, given
+   a set of calling convention rules, high-level language compilers can be
+   made to follow the rules, thus allowing hand-coded assembly language
+   routines and high-level language routines to call one another.
+
+   In practice, many calling conventions are possible. We will use the
+   widely used C language calling convention. Following this convention
+   will allow you to write assembly language subroutines that are safely
+   callable from C (and C++) code, and will also enable you to call C
+   library functions from your assembly language code.
+
+   The C calling convention is based heavily on the use of the
+   hardware-supported stack. It is based on the push, pop, call, and ret
+   instructions. Subroutine parameters are passed on the stack. Registers
+   are saved on the stack, and local variables used by subroutines are
+   placed in memory on the stack. The vast majority of high-level
+   procedural languages implemented on most processors have used similar
+   calling conventions.
+
+   The calling convention is broken into two sets of rules. The first set
+   of rules is employed by the caller of the subroutine, and the second
+   set of rules is observed by the writer of the subroutine (the callee).
+   It should be emphasized that mistakes in the observance of these rules
+   quickly result in fatal program errors since the stack will be left in
+   an inconsistent state; thus meticulous care should be used when
+   implementing the call convention in your own subroutines.
+
+                         [9][stack-convention.png]
+                        Stack during Subroutine Call
+    [Thanks to Maxence Faldor for providing a correct figure and to James
+   Peterson for finding and fixing the bug in the original version of this
+                                  figure!]
+
+   A good way to visualize the operation of the calling convention is to
+   draw the contents of the nearby region of the stack during subroutine
+   execution. The image above depicts the contents of the stack during the
+   execution of a subroutine with three parameters and three local
+   variables. The cells depicted in the stack are 32-bit wide memory
+   locations, thus the memory addresses of the cells are 4 bytes apart.
+   The first parameter resides at an offset of 8 bytes from the base
+   pointer. Above the parameters on the stack (and below the base
+   pointer), the call instruction placed the return address, thus leading
+   to an extra 4 bytes of offset from the base pointer to the first
+   parameter. When the ret instruction is used to return from the
+   subroutine, it will jump to the return address stored on the stack.
+
+Caller Rules
+
+   To make a subroutine call, the caller should:
+    1. Before calling a subroutine, save the contents of certain
+       registers that are designated caller-saved. The caller-saved
+       registers are EAX, ECX, and EDX. Since the called subroutine is
+       allowed to modify these registers, if the caller relies on their
+       values after the subroutine returns, the caller must push the
+       values in these registers onto the stack (so they can be
+       restored after the subroutine returns).
+    2. To pass parameters to the subroutine, push them onto the stack
+       before the call. The parameters should be pushed in inverted order
+       (i.e. last parameter first). Since the stack grows down, the first
+       parameter will be stored at the lowest address (this inversion of
+       parameters was historically used to allow functions to be passed a
+       variable number of parameters).
+    3. To call the subroutine, use the call instruction. This instruction
+       places the return address on top of the parameters on the stack,
+       and branches to the subroutine code. This invokes the subroutine,
+       which should follow the callee rules below.
+
+   After the subroutine returns (immediately following the call
+   instruction), the caller can expect to find the return value of the
+   subroutine in the register EAX. To restore the machine state, the
+   caller should:
+    1. Remove the parameters from the stack. This restores the stack to its
+       state before the call was performed.
+    2. Restore the contents of caller-saved registers (EAX, ECX, EDX) by
+       popping them off of the stack. The caller can assume that no other
+       registers were modified by the subroutine.
+
+   Example
+   The code below shows a function call that follows the caller rules. The
+   caller is calling a function _myFunc that takes three integer
+   parameters. The first parameter is in EAX, the second parameter is the
+   constant 216; the third parameter is in memory location var.
+
+push [var] ; Push last parameter first
+push 216   ; Push the second parameter
+push eax   ; Push first parameter last
+
+call _myFunc ; Call the function (assume C naming)
+
+add esp, 12
+
+   Note that after the call returns, the caller cleans up the stack using
+   the add instruction. We have 12 bytes (3 parameters * 4 bytes each) on
+   the stack, and the stack grows down. Thus, to get rid of the
+   parameters, we can simply add 12 to the stack pointer.
+
+   The result produced by _myFunc is now available for use in the register
+   EAX. The values of the caller-saved registers (ECX and EDX) may have
+   been changed. If the caller uses them after the call, it needs to
+   save them on the stack before the call and restore them after it.
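+
+   For instance, if the caller needed ECX and EDX to survive the call,
+   the example above could be extended as in the following sketch. The
+   saves come before the parameters are pushed, and the restores happen
+   in reverse order after the parameters are removed:
+
+push ecx     ; Save caller-saved registers that are still needed
+push edx
+
+push [var]   ; Push the parameters, last parameter first
+push 216
+push eax
+
+call _myFunc ; The return value arrives in EAX
+
+add esp, 12  ; Remove the three parameters
+pop edx      ; Restore in the reverse order of the pushes
+pop ecx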
+
+Callee Rules
+
+   The definition of the subroutine should adhere to the following rules
+   at the beginning of the subroutine:
+    1. Push the value of EBP onto the stack, and then copy the value of
+       ESP into EBP using the following instructions:
+    push ebp
+    mov  ebp, esp
+
+       This initial action maintains the base pointer, EBP. The base
+       pointer is used by convention as a point of reference for finding
+       parameters and local variables on the stack. When a subroutine is
+       executing, the base pointer holds a copy of the stack pointer value
+       from when the subroutine started executing. Parameters and local
+       variables will always be located at known, constant offsets away
+       from the base pointer value. We push the old base pointer value at
+       the beginning of the subroutine so that we can later restore the
+       appropriate base pointer value for the caller when the subroutine
+       returns. Remember, the caller is not expecting the subroutine to
+       change the value of the base pointer. We then move the stack
+       pointer into EBP to obtain our point of reference for accessing
+       parameters and local variables.
+    2. Next, allocate local variables by making space on the stack.
+       Recall, the stack grows down, so to make space on the top of the
+       stack, the stack pointer should be decremented. The amount by which
+       the stack pointer is decremented depends on the number and size of
+       local variables needed. For example, if 3 local integers (4 bytes
+       each) were required, the stack pointer would need to be decremented
+       by 12 to make space for these local variables (i.e., sub esp, 12).
+       As with parameters, local variables will be located at known
+       offsets from the base pointer.
+    3. Next, save the values of the callee-saved registers that will be
+       used by the function. To save registers, push them onto the stack.
+       The callee-saved registers are EBX, EDI, and ESI (ESP and EBP will
+       also be preserved by the calling convention, but need not be pushed
+       on the stack during this step).
+
+   After these three actions are performed, the body of the subroutine may
+   proceed. When the subroutine returns, it must follow these steps:
+    1. Leave the return value in EAX.
+    2. Restore the old values of any callee-saved registers (EBX, EDI,
+       and ESI) that were modified. The register contents are restored by
+       popping them from the stack. The registers should be popped in the
+       inverse order that they were pushed.
+    3. Deallocate local variables. The obvious way to do this might be to
+       add the appropriate value to the stack pointer (since the space was
+       allocated by subtracting the needed amount from the stack pointer).
+       In practice, a less error-prone way to deallocate the variables is
+       to move the value in the base pointer into the stack pointer:
+       mov esp, ebp. This works because the base pointer always contains
+       the value that the stack pointer contained immediately prior to the
+       allocation of the local variables.
+    4. Immediately before returning, restore the caller's base pointer
+       value by popping EBP off the stack. Recall that the first thing we
+       did on entry to the subroutine was to push the base pointer to save
+       its old value.
+    5. Finally, return to the caller by executing a ret instruction. This
+       instruction will find and remove the appropriate return address
+       from the stack.
+
+   Note that the callee's rules fall cleanly into two halves that are
+   mirror images of one another. The first half applies to the beginning
+   of the function and is commonly called the function prologue. The
+   second half applies to the end of the function and is commonly called
+   the function epilogue.
+
+   Example
+
+   Here is an example function definition that follows the callee rules:
+
+.486
+.MODEL FLAT
+.CODE
+PUBLIC _myFunc
+_myFunc PROC
+  ; Subroutine Prologue
+  push ebp     ; Save the old base pointer value.
+  mov ebp, esp ; Set the new base pointer value.
+  sub esp, 4   ; Make room for one 4-byte local variable.
+  push edi     ; Save the values of registers that the function
+  push esi     ; will modify. This function uses EDI and ESI.
+  ; (EBX is not modified, so it is not saved; EBP was saved above,
+  ; and ESP is restored by the epilogue)
+
+  ; Subroutine Body
+  mov eax, [ebp+8]   ; Move value of parameter 1 into EAX
+  mov esi, [ebp+12]  ; Move value of parameter 2 into ESI
+  mov edi, [ebp+16]  ; Move value of parameter 3 into EDI
+
+  mov [ebp-4], edi   ; Move EDI into the local variable
+  add [ebp-4], esi   ; Add ESI into the local variable
+  add eax, [ebp-4]   ; Add the contents of the local variable
+                     ; into EAX (final result)
+
+  ; Subroutine Epilogue
+  pop esi      ; Recover register values
+  pop edi
+  mov esp, ebp ; Deallocate local variables
+  pop ebp      ; Restore the caller's base pointer value
+  ret
+_myFunc ENDP
+END
+
+   The subroutine prologue performs the standard actions of saving a
+   snapshot of the stack pointer in EBP (the base pointer), allocating
+   local variables by decrementing the stack pointer, and saving register
+   values on the stack.
+
+   In the body of the subroutine we can see the use of the base pointer.
+   Both parameters and local variables are located at constant offsets
+   from the base pointer for the duration of the subroutine's execution.
+   In particular, since parameters were placed onto the stack before the
+   subroutine was called, they are always located below the base pointer
+   (i.e., at higher addresses) on the stack. The first parameter to the
+   subroutine can always be found at memory location EBP + 8, the second
+   at EBP + 12, and the third at EBP + 16. Similarly, since local
+   variables are allocated after the base pointer is set, they always
+   reside above the base pointer (i.e., at lower addresses) on the stack.
+   In particular, the first local variable is always located at EBP - 4,
+   the second at EBP - 8, and so on. This conventional use of the
+   base pointer allows us to quickly identify the use of local variables
+   and parameters within a function body.
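+
+   To make these offsets concrete, the following comment block sketches
+   the stack as _myFunc sees it once its prologue has finished. The
+   layout is derived from the listing above (one local variable, with
+   EDI and ESI saved in that order); addresses decrease from top to
+   bottom:
+
+  ; Location    Contents
+  ; [ebp+16]    parameter 3
+  ; [ebp+12]    parameter 2
+  ; [ebp+8]     parameter 1
+  ; [ebp+4]     return address (pushed by call)
+  ; [ebp]       caller's EBP (pushed by the prologue)
+  ; [ebp-4]     local variable
+  ; [ebp-8]     saved EDI
+  ; [ebp-12]    saved ESI   <- ESP points here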
+
+   The function epilogue is basically a mirror image of the function
+   prologue. The caller's register values are recovered from the stack,
+   the local variables are deallocated by resetting the stack pointer, the
+   caller's base pointer value is recovered, and the ret instruction is
+   used to return to the appropriate code location in the caller.
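+
+   For completeness, here is a sketch of how a caller might invoke
+   _myFunc. The argument values 10, 20, and 30 are hypothetical, and the
+   fragment assumes the caller has no live values in EAX, ECX, or EDX
+   that it would need to save before the call:
+
+  push 30        ; Parameter 3 (pushed first, ends up at [ebp+16])
+  push 20        ; Parameter 2
+  push 10        ; Parameter 1 (pushed last, ends up at [ebp+8])
+  call _myFunc   ; On return, EAX = 10 + (30 + 20) = 60
+  add esp, 12    ; Remove the three 4-byte parameters from the stack
+
+   Because _myFunc ends with a plain ret, the parameters are still on
+   the stack when control returns, which is why the caller adjusts ESP
+   afterwards.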
+
+Using these Materials
+
+   These materials are released under a [10]Creative Commons
+   Attribution-Noncommercial-Share Alike 3.0 United States License. We are
+   delighted when people want to use or adapt the course materials we
+   developed, and you are welcome to reuse and adapt these materials for
+   any non-commercial purposes (if you would like to use them for a
+   commercial purpose, please contact [11]David Evans for more
+   information). If you do adapt or use these materials, please include a
+   credit like "Adapted from materials developed for University of
+   Virginia cs216 by David Evans. This guide was revised for cs216 by
+   David Evans, based on materials originally created by Adam Ferrari many
+   years ago, and since updated by Alan Batson, Mike Lack, and Anita
+   Jones." and a link back to this page.
+     __________________________________________________________________
+
+                 [12]CS216: Program and Data Representation
+                         [13]University of Virginia
+                               [14]David Evans
+                          [15]evans@cs.virginia.edu
+                          [16]Using these Materials
+
+References
+
+   1. https://www.cs.virginia.edu/~evans/cs216/guides/x86.html#registers
+   2. https://www.cs.virginia.edu/~evans/cs216/guides/x86.html#memory
+   3. https://www.cs.virginia.edu/~evans/cs216/guides/x86.html#instructions
+   4. https://www.cs.virginia.edu/~evans/cs216/guides/x86.html#calling
+   5. https://www.cs.virginia.edu/~evans/cs216/guides/vsasm.html
+   6. http://www.felixcloutier.com/x86/
+   7. http://www.intel.com/content/www/us/en/processors/architectures-software-developer-manuals.html
+   8. https://www.cs.virginia.edu/~evans/cs216/guides/x86-registers.png
+   9. https://www.cs.virginia.edu/~evans/cs216/guides/stack-convention.png
+  10. https://creativecommons.org/licenses/by-nc-sa/3.0/us/
+  11. http://www.cs.virginia.edu/evans/
+  12. http://www.cs.virginia.edu/evans/cs216/
+  13. http://www.cs.virginia.edu/
+  14. http://www.cs.virginia.edu/evans/
+  15. mailto:evans@cs.virginia.edu
+  16. http://www.cs.virginia.edu/evans/cs216/reuse.html