diff options
Diffstat (limited to 'miniany/doc/pages.cs.wisc.edu_~markhill_cs354_Fall2008_notes_Pentium.txt')
-rw-r--r-- | miniany/doc/pages.cs.wisc.edu_~markhill_cs354_Fall2008_notes_Pentium.txt | 645 |
1 files changed, 645 insertions, 0 deletions
diff --git a/miniany/doc/pages.cs.wisc.edu_~markhill_cs354_Fall2008_notes_Pentium.txt b/miniany/doc/pages.cs.wisc.edu_~markhill_cs354_Fall2008_notes_Pentium.txt new file mode 100644 index 0000000..c349efd --- /dev/null +++ b/miniany/doc/pages.cs.wisc.edu_~markhill_cs354_Fall2008_notes_Pentium.txt @@ -0,0 +1,645 @@ + IA-32 (x86) Architecture + +History + + As technology improved over the years, there developed a race to get + the first (usable) processors on a single integrated circuit. + + When able to place approximately 10,000 transistors on a single IC, + then we have just about enough circuitry to put a (simple) processor on + a this single IC. + + The Intel 8086 was Intel's entry in the race. On the way to getting + their processor out (on the market) as fast as possible, they made some + unusual design decisions. + Year + 1974 8080 8-bit architecture with 8-bit bus + + 1978 8086 16 bit architecture w/ 16-bit bus + + 8088 - like 8086, 16-bit architecture, but + only had an 8-bit (internal) bus + selected for IBM PC -- golden handcuffs + + 1980 8087 FPU + + 1982 80286 24-bit weird addresses + + 1985 80386 32b registers and addresses + + 1989 80486, + 1993 Pentium, + 1995 Pentium Pro -- few changes + + 1997 MMX + + Backward Compatibility + + SECTION NOT YET WRITTEN! + + Current implementations + + Pentium Pro and after + + Instruction decode translates machine code into "RISC OPS" (like + decoded MIPS instructions) + + An execution unit runs RISC OPS + + + Backward compatiblity + - Complex decoding + + execution unit as fast as RISC like MIPS + +The Pentium Architecture + + * It is not a load/store architecture. + * The instruction set is huge! We go over only a fraction of the + instruction set. 16bit, 32bit operations on memory and registers + decoding nightmare: a single machine code instruction can be from 1 + to 17 bytes long w/ prefixes & postfixes. But, mainline (most + common) 386 instructions not terrible + * There are lots of restrictions on how instructions/operands are put + together, but there is also an amazing amount of flexibility. + + Registers + + The Intel architectures as a set just do not have enough registers to + satisfy most assembly language programmers. Still, the processors have + been around for a LONG time, and they have a sufficient number of + registers to do whatever is necessary. + + For our (mostly) general purpose use, we get + 32-bit 16-bit 8-bit 8-bit + (high part of 16) (low part of 16) + + EAX AX AH AL + EBX BX BH BL + ECX CX CH CL + EDX DX DH DL + + + and + + EBP BP + ESI SI + EDI DI + ESP SP + + There are a few more, but we will not use or discuss them. They are + only used for memory accessability in the segmented memory model. + + Note that it is unusual to be able to designate part of a register as + an operand. This evolved, due to the backward compatibility to previous + processors that had 16-bit registers. + + Using the registers: + As an operand, just use the name (upper case and lower case both work + interchangeably). + + EBP is a frame pointer. + ESP is a stack pointer. + + ONE MORE REGISTER: + Many bits used for controlling the action of the processor and setting + state are in the register called EFLAGS. This register contains the + condition codes: + OF Overflow flag + SF Sign flag + ZF Zero flag + PF Parity flag + CF Carry flag + + The settings of these flags are checked in conditional control + instructions. Many instructions set one or more of the flags. + + The use of the EFLAGS register is implied (rather than explicit) in + instructions. + + Accessing Memory + + There are 2 memory models supported in the Pentium architecture. + (Actually it is the 486 and more recent models that support 2 models.) + + In both models, memory is accessed using an address. It is the way that + addresses are formed (within the processor) that differs in the 2 + models. + + FLAT MEMORY MODEL + + -- The memory model that every one else uses. + + SEGMENTED MEMORY MODEL + + -- Different parts of a program are assumed to be in their own, + set-aside portions of memory. These portions are called segments. + + -- An address is formed from 2 pieces: a segment location and an offset + within a segment. + + Note that each of these pieces can be shorter (contain fewer bits) than + a whole address. This is much of the reason that Intel chose this form + of memory model for its earliest single-chip processors. + + -- There are segments for: + + code + data + stack + other + + -- Which segment something is in can be implied by the memory access + involved. An instruction fetch will always be looking in the code + segment. An instruction to push data onto the stack always accesses the + stack segment. + + Addressing Modes + + Some would say that the Intel architectures only support 1 addressing + mode. It looks (something like) this: + effective address = base reg + (index reg x scaling factor) + displacement + + where + base reg is EAX, EBX, ECX, EDX or ESP or EBP + index reg is EDI or ESI + scaling factor is 1, 2, 4, or 8 + + The syntax of using this (very general) addressing mode will vary from + system to system. It depends on the preprocessor and the syntax + accepted by the assembler. + + For our implementation, an operand within an instruction that uses this + addressing mode could look like + [EAX][EDI*2 + 80] + + + The effective address calculated with be the contents of register EDI + multiplied times 2 added to the constant 80, added to the contents of + register EAX. + + There are extremely few times where a high-level language compiler can + utilize such a complex addressing mode. It is much more likely that + simplified versions of this mode will be used. + + SOME ADDRESSING MODES + + -- register mode -- The operand is in a register. The effective address + is the register (wierd). + + Example instruction: + mov eax, ecx + + Both operands use register mode. The contents of register ecx is copied + to register eax. + + -- immediate mode -- The operand is in the instruction. The effective + address is within the instruction. + + Example instruction: + mov eax, 26 + + The second operand uses immediate mode. Within the instruction is the + operand. It is copied to register eax. + + -- register direct mode -- The effective address is in a register. + + Example instruction: + mov eax, [esp] + + The second operand uses register direct mode. The contents of register + esp is the effective address. The contents of memory at the effective + address are copied into register eax. + + -- direct mode -- The effective address is in the instruction. + + Example instruction: + mov eax, var_name + + The second operand uses direct mode. The instruction contains the + effective address. The contents of memory at the effective address are + copied into register eax. + + -- base displacement mode -- The effective address is the sum of a + constant and the contents of a register. + + Example instruction: + mov eax, [esp + 4] + + The second operand uses base displacement mode. The instruction + contains a constant. That constant is added to the contents of register + esp to form an effective address. The contents of memory at the + effective address are copied into register eax. + + -- base-indexed mode -- (Intel's name) The effective address is the sum + of the contents of two registers. + + Example instruction: + mov eax, [esp][esi] + + The contents of registers esp and esi are added to form an effective + address. The contents of memory at the effective address are copied + into register eax. + + Note that there are restrictions on the combinations of registers that + can be used in this addressing mode. + + -- PC relative mode -- The effective address is the sum of the contents + of the PC and a constant contained within the instruction. + + Example instruction: + jmp a_label + + The contents of the program counter is added to an offset that is + within the machine code for the instruction. The resulting sum is + placed back into the program counter. Note that from the assembly + language it is not clear that a PC relative addressing mode is used. It + is the assembler that generates the offset to place in the instruction. + +Instruction Set + + Generalities: + + -- Many (most?) of the instructions have exactly 2 operands. If there + are 2 operands, then one of them will be required to use register mode, + and the other will have no restrictions on its addressing mode. + + -- There are most often ways of specifying the same instruction for 8-, + 16-, or 32-bit oeprands. Note that on a 32-bit machine, with newly + written code, the 16-bit form will never be used. + + Meanings of the operand specifications: + reg - register mode operand, 32-bit register + reg8 - register mode operand, 8-bit register + r/m - general addressing mode, 32-bit + r/m8 - general addressing mode, 8-bit + immed - 32-bit immediate is in the instruction + immed8 - 8-bit immediate is in the instruction + m - symbol (label) in the instruction is the effective address + + Data Movement + + mov reg, r/m ; copy data + r/m, reg + reg, immed + r/m, immed + + movsx reg, r/m8 ; sign extend and copy data + + movzx reg, r/m8 ; zero extend and copy data + + lea reg, m ; get effective address + (A newer instruction, so its format is much restricted + over the other ones.) + + EXAMPLES: + + mov EAX, 23 ; places 32-bit 2's complement immediate 23 + ; into register EAX + movsx ECX, AL ; sign extends the 8-bit quantity in register + ; AL to 32 bits, and places it in ECX + mov [esp], -1 ; places value -1 into memory, address given + ; by contents of esp + lea EBX, loop_top ; put the address assigned (by the assembler) + ; to label loop_top into register EBX + + Integer Arithmetic + + add reg, r/m ; two's complement addition + r/m, reg + reg, immed + r/m, immed + + inc reg ; add 1 to operand + r/m + + sub reg, r/m ; two's complement subtraction + r/m, reg + reg, immed + r/m, immed + + dec reg ; subtract 1 from operand + r/m + + neg r/m ; get additive inverse of operand + + mul eax, r/m ; unsigned multiplication + ; edx||eax <- eax * r/m + + imul r/m ; 2's comp. multiplication + ; edx||eax <- eax * r/m + reg, r/m ; reg <- reg * r/m + reg, immed ; reg <- reg * immed + + div r/m ; unsigned division + ; does edx||eax / r/m + ; eax <- quotient + ; edx <- remainder + + idiv r/m ; 2's complement division + ; does edx||eax / r/m + ; eax <- quotient + ; edx <- remainder + + cmp reg, r/m ; sets EFLAGS based on + r/m, immed ; second operand - first operand + r/m8, immed8 + r/m, immed8 ; sign extends immed8 before subtract + + + + EXAMPLES: + + neg [eax + 4] ; takes doubleword at address eax+4 + ; and finds its additive inverse, then places + ; the additive inverse back at that address + ; the instruction should probably be + ; neg dword ptr [eax + 4] + + inc ecx ; adds one to contents of register ecx, and + ; result goes back to ecx + + Logical + + not r/m ; logical not + + and reg, r/m ; logical and + reg8, r/m8 + r/m, reg + r/m8, reg8 + r/m, immed + r/m8, immed8 + + or reg, r/m ; logical or + reg8, r/m8 + r/m, reg + r/m8, reg8 + r/m, immed + r/m8, immed8 + + xor reg, r/m ; logical exclusive or + reg8, r/m8 + r/m, reg + r/m8, reg8 + r/m, immed + r/m8, immed8 + + test r/m, reg ; logical and to set EFLAGS + r/m8, reg8 + r/m, immed + r/m8, immed8 + + + + + EXAMPLES: + + and edx, 00330000h ; logical and of contents of register + ; edx (bitwise) with 0x00330000, + ; result goes back to edx + + Floating Point Arithmetic + + Since the newer architectures have room for floating point hardware on + chip, Intel defined a simple-to-implement extension to the architecture + to do floating point arithmetic. In their usual zeal, they have + included MANY instructions to do floating point operations. + + The mechanism is simple. A set of 8 registers are organized and + maintained (by hardware) as a stack of floating point values. ST refers + to the stack top. ST(1) refers to the register within the stack that is + next to ST. ST and ST(0) are synonyms. + + There are separate instructions to test and compare the values of + floating point variables. + finit ; initialize the FPU + + fld m32 ; load floating point value + m64 + ST(i) + + fldz ; load floating point value 0.0 + + fst m32 ; store floating point value + m64 + ST(i) + + fstp m32 ; store floating point value + m64 ; and pop ST + ST(i) + + fadd m32 ; floating point addition + m64 + ST, ST(i) + ST(i), ST + + faddp ST(i), ST ; floating point addition + ; and pop ST + + Control Instructions + + All conditional control instructions in the Intel architectures are + called jumps. Their machine code is similar to the MIPS branch + instructions. + + Just some of the many control instructions: + jmp m ; unconditional jump + jg m ; jump if greater than 0 + jge m ; jump if greater than or equal to 0 + jl m ; jump if less than 0 + jle m ; jump if less than or equal to 0 + + Note that a control instruction takes a single operand, which specifies + the jump target. The conditional control instructions look at the + condition code bits (in the EFLAGS register) to make a decision on + whether to take the jump or not. + + The condition code bits are set by separate instructions. Several + arithmetic and logical instructions set some of the condition code + bits. There are also specific instructions to compare operands and set + the condition code bits based on the comparison (examples: cmp, test). + + Some sample code, for fun: + + Pentium code to add 1 to each element of an array of integers. + + Assume that there is an array of 100 integers in memory. The label + associated with the first element is int_array. + + Comments are placed to the right, and preceded by a semicolon (;). + + + lea EAX, int_array ; like la in MIPS, EAX is pointer + mov ECX, 100 ; register ECX contains counter +loop_top: + cmp ECX, 0 ; must set condition codes + je all_done ; uses condition codes to branch + inc [EAX] ; a register direct addressing mode! + add EAX, 4 ; updates pointer + dec ECX ; update counter + jmp loop_top ; unconditional branch to loop_top + +all_done: + + + Some things to notice about this code: + + -- You can figure it out, although you only know MIPS assembly + language! That is because most assembly languages look similar. + + -- The 2-address instruction set does not generate a larger number of + instructions for this example (than a 3-address instruction set would, + like MIPS). It does do the same number of memory accesses. + +Intel MMX (Optional) + + MultiMedia eXtension to Intel Arch. + [source: Peleg & Weiser, IEEE Micro, Aug. 96] + + Motivation + + Q: Why might people want to buy newer, faster PCs? + A: Processing audio and video + + Let's make audio and video perform better + + Method 1: add special-purpose card + Method 2: make regular microprocessor perform better at audio/video + + Intel's MMX follows Method 2 + The goal is 2x performance in audio, video, etc. + + Key observation: precision of data required << 32 bits + + For video, + Red/Green/Blue might use 8 (16) bits each for 256 (64K) colors per + pixel (picture element) + + Key technique: pack multiple low-precision items into a 64-bit + floating-point register add instructions to manipulate them + + (This is an example of a general technique called "single instruction + multiple data", or SIMD) + + MMX Datatypes ------------- + * 1 x 64 bit quad word + * 2 x 32 bit double-word + * 4 x 16 bit word + * 8 x 8 bit byte + + MMX Instructions + + Example, ADDB (B stands for byte) + + 17 87 100 ... 5 more + + 17 13 200 ... 5 more + ---- ---- ---- ... + 34 100 44 = 300 mod 256 ==> wraparound + 255 = max value ==> saturating + + This can be used to do arithmetic/logical operations on more than 1 + pixel's worth of data in 1 instruction. + + Also MOV's == load & stores + + Example: + 16 element dot product (from matrix multiply) + + [a1 a2 ... a16]* [b1 b2 ... b16]^T = a1*b1 + b2*b2 + ... + a16*b16 + + comparision with Intel IA-32 gives: + + -> 32 loads + -> 16 * + -> 15 + + -> 12 loop ctrl + --- + 76 instructions + int ==> 200 cycles + fp ==> 76 cycles + + Intel MMX assuming 16b values + + -> 16 instructions + -> 12 cycles (6x better than fp) + + Other Instructions + + PACK/UNPACK -- putting multiple values in single register & back + + MASK + Example, "make 0xff if equal" + + 15 15 100 120 101 76 15 15 + 15 15 15 15 15 15 15 15 + ------------------------------------- + FF FF 00 00 00 00 FF FF + + Why? Mask for weatherman! + * * film weatherperson in front of blue background (0x15) + * * wthmsk = use above mask instruction + wthmsk==FF -- no weatherperson + wthmsk==00 -- weatherperson + + image = (~wthmsk & weatherperson ) | (wthmsk & weathermap) + + (What happens if weatherperson wears suit of color 15?) + + MMX Constraints + * Instruction Set Architecture extensions, but perfect backward + compatibility + * 100% Operating System compatible (no new registers, flags, + exceptions) + * Independent Software Vendor (ISV) support (bit in CPUID instruction + so applications can test for MMX and include code for both) + +IA-64/Merced (Optional) + + Motivation + * IA-32 has 32-bit addresses + * 2^32 ==> 4G bytes of memory Current large servers want more! + * Near future medium servers will want more ... Someday desktops will + want more? + + What to do? + 1. Kludge IA-32 to support > 32-bit addresses + 2. Do new instruction set with binary compatibility strategy + (a) have new chips also support IA-32 + (b) use binary translation, etc. + + Intel claims to be doing 2a, but has only partially revealed plans. (as + of Nov '98) + + New instruction set architecture: IA-64 + * 64-bit addresses + * Mode for running old code + * First implementation is called "Merced" + * Has extra large instructions so that 128b = 4 * 32b holds + instrn0 instrn1 instrn2 template + + "template" gives "relationships" between instuctions. + Example: whether instrn1 shares no registers or memory locations w/ + instrn0 + * Uses "templates" and "predication". + Each instruction is "predicated" + Example, + if $1 < $2 + then + $3 = $4 + else + $5 = $6 + endif + + Is normally: + bge $1, $2, else + mov $3, $4 + b endif + else: mov $5, $6 + endif: + + With predication: + setlt p0, $1, $2 + if (p0==TRUE) mov $3, $4 + if (p0==FALSE) mov $5, $6 /* three instructions & no branches */ + + Aren't you glad we did not teach 354 with IA-64/Merced? + + Copyright © Karen Miller, 2006 |