Diffstat (limited to 'miniany/doc/pages.cs.wisc.edu_~markhill_cs354_Fall2008_notes_Pentium.txt')
-rw-r--r-- miniany/doc/pages.cs.wisc.edu_~markhill_cs354_Fall2008_notes_Pentium.txt | 645
1 files changed, 645 insertions, 0 deletions
diff --git a/miniany/doc/pages.cs.wisc.edu_~markhill_cs354_Fall2008_notes_Pentium.txt b/miniany/doc/pages.cs.wisc.edu_~markhill_cs354_Fall2008_notes_Pentium.txt
new file mode 100644
index 0000000..c349efd
--- /dev/null
+++ b/miniany/doc/pages.cs.wisc.edu_~markhill_cs354_Fall2008_notes_Pentium.txt
@@ -0,0 +1,645 @@
+ IA-32 (x86) Architecture
+
+History
+
+ As technology improved over the years, there developed a race to get
+ the first (usable) processors on a single integrated circuit.
+
+ Once approximately 10,000 transistors could be placed on a single IC,
+ there was just about enough circuitry to put a (simple) processor on
+ that single IC.
+
+ The Intel 8086 was Intel's entry in the race. On the way to getting
+ their processor out (on the market) as fast as possible, they made some
+ unusual design decisions.
+ Year  Processor     Notes
+
+ 1974  8080          8-bit architecture with 8-bit bus
+
+ 1978  8086          16-bit architecture with 16-bit bus
+
+       8088          like the 8086, a 16-bit architecture, but it
+                     only had an 8-bit (external) bus;
+                     selected for the IBM PC -- golden handcuffs
+
+ 1980  8087          FPU
+
+ 1982  80286         24-bit weird addresses
+
+ 1985  80386         32-bit registers and addresses
+
+ 1989  80486,
+ 1993  Pentium,
+ 1995  Pentium Pro   -- few changes
+
+ 1997  MMX
+
+ Backward Compatibility
+
+ SECTION NOT YET WRITTEN!
+
+ Current implementations
+
+ Pentium Pro and after
+
+ Instruction decode translates machine code into "RISC OPS" (like
+ decoded MIPS instructions)
+
+ An execution unit runs RISC OPS
+
+ + Backward compatibility
+ - Complex decoding
+ + execution unit as fast as RISC like MIPS
+
+The Pentium Architecture
+
+ * It is not a load/store architecture.
+ * The instruction set is huge! We go over only a fraction of the
+ instruction set. There are 16-bit and 32-bit operations on memory
+ and registers. Decoding is a nightmare: a single machine code
+ instruction can be from 1 to 15 bytes long with prefixes and
+ postfixes. But the mainline (most common) 386 instructions are
+ not terrible.
+ * There are lots of restrictions on how instructions/operands are put
+ together, but there is also an amazing amount of flexibility.
+
+ Registers
+
+ The Intel architectures as a set just do not have enough registers to
+ satisfy most assembly language programmers. Still, the processors have
+ been around for a LONG time, and they have a sufficient number of
+ registers to do whatever is necessary.
+
+ For our (mostly) general purpose use, we get
+
+   32-bit   16-bit   8-bit               8-bit
+                     (high part of 16)   (low part of 16)
+
+   EAX      AX       AH                  AL
+   EBX      BX       BH                  BL
+   ECX      CX       CH                  CL
+   EDX      DX       DH                  DL
+
+ and
+
+   32-bit   16-bit
+
+   EBP      BP
+   ESI      SI
+   EDI      DI
+   ESP      SP
+
+ There are a few more, but we will not use or discuss them. They are
+ only used for accessing memory in the segmented memory model.
+
+ Note that it is unusual to be able to designate part of a register as
+ an operand. This ability evolved to keep backward compatibility with
+ previous processors that had 16-bit registers.
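+
+ A minimal sketch of how AX, AH and AL are just views onto parts of
+ EAX. It is written in Python (not a language these notes use), and
+ the function names sub_registers and write_al are hypothetical,
+ chosen only for illustration.
+
```python
def sub_registers(eax):
    """Return the (AX, AH, AL) views of a 32-bit EAX value."""
    eax &= 0xFFFFFFFF          # keep only 32 bits
    ax = eax & 0xFFFF          # low 16 bits of EAX
    ah = (ax >> 8) & 0xFF      # high byte of AX
    al = ax & 0xFF             # low byte of AX
    return ax, ah, al


def write_al(eax, value):
    """Model 'mov al, value': only the low 8 bits of EAX change."""
    return (eax & 0xFFFFFF00) | (value & 0xFF)


ax, ah, al = sub_registers(0x12345678)
print(hex(ax), hex(ah), hex(al))
```
+
+ Writing to AL leaves the upper 24 bits of EAX untouched, which is
+ what the masking in write_al models.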
+
+ Using the registers:
+ As an operand, just use the name (upper case and lower case both work
+ interchangeably).
+
+ EBP is a frame pointer.
+ ESP is a stack pointer.
+
+ ONE MORE REGISTER:
+ Many bits used for controlling the action of the processor and setting
+ state are in the register called EFLAGS. This register contains the
+ condition codes:
+ OF Overflow flag
+ SF Sign flag
+ ZF Zero flag
+ PF Parity flag
+ CF Carry flag
+
+ The settings of these flags are checked in conditional control
+ instructions. Many instructions set one or more of the flags.
+
+ The use of the EFLAGS register is implied (rather than explicit) in
+ instructions.
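+
+ As a rough illustration of what "setting the flags" means, the
+ following Python sketch computes the five condition codes listed
+ above for a 32-bit addition. The function name add32_flags and the
+ dictionary layout are assumptions made for this example; the real
+ processor keeps these bits inside EFLAGS.
+
```python
MASK32 = 0xFFFFFFFF


def add32_flags(a, b):
    """Model a 32-bit 'add' and return (result, condition codes)."""
    a &= MASK32
    b &= MASK32
    full = a + b                    # unbounded sum
    result = full & MASK32          # what fits in a 32-bit register
    flags = {
        # carry out of bit 31 (unsigned overflow)
        "CF": int(full > MASK32),
        # result is zero
        "ZF": int(result == 0),
        # copy of the result's sign bit
        "SF": result >> 31,
        # signed overflow: both inputs differ in sign from the result
        "OF": ((a ^ result) & (b ^ result)) >> 31,
        # set when the low byte has an even number of 1 bits
        "PF": int(bin(result & 0xFF).count("1") % 2 == 0),
    }
    return result, flags


print(add32_flags(0x7FFFFFFF, 1))
```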
+
+ Accessing Memory
+
+ There are 2 memory models supported in the Pentium architecture.
+ (Actually it is the 486 and more recent models that support 2 models.)
+
+ In both models, memory is accessed using an address. It is the way that
+ addresses are formed (within the processor) that differs in the 2
+ models.
+
+ FLAT MEMORY MODEL
+
+ -- The memory model that everyone else uses.
+
+ SEGMENTED MEMORY MODEL
+
+ -- Different parts of a program are assumed to be in their own,
+ set-aside portions of memory. These portions are called segments.
+
+ -- An address is formed from 2 pieces: a segment location and an offset
+ within a segment.
+
+ Note that each of these pieces can be shorter (contain fewer bits) than
+ a whole address. This is much of the reason that Intel chose this form
+ of memory model for its earliest single-chip processors.
+
+ -- There are segments for:
+
+ code
+ data
+ stack
+ other
+
+ -- Which segment something is in can be implied by the memory access
+ involved. An instruction fetch will always be looking in the code
+ segment. An instruction to push data onto the stack always accesses the
+ stack segment.
+
+ Addressing Modes
+
+ Some would say that the Intel architectures only support 1 addressing
+ mode. It looks (something like) this:
+ effective address = base reg + (index reg x scaling factor) + displacement
+
+ where
+ base reg is EAX, EBX, ECX, EDX or ESP or EBP
+ index reg is EDI or ESI
+ scaling factor is 1, 2, 4, or 8
+
+ The syntax of using this (very general) addressing mode will vary from
+ system to system. It depends on the preprocessor and the syntax
+ accepted by the assembler.
+
+ For our implementation, an operand within an instruction that uses this
+ addressing mode could look like
+ [EAX][EDI*2 + 80]
+
+
+ The effective address calculated will be the contents of register EDI
+ multiplied by 2, added to the constant 80, added to the contents of
+ register EAX.
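+
+ The formula can be sketched in Python as follows. The function name
+ effective_address and the register values used in the example are
+ made up for illustration; the wrap to 32 bits is an assumption that
+ matches 32-bit addresses.
+
```python
def effective_address(base, index, scale, displacement):
    """base reg + (index reg x scaling factor) + displacement, mod 2^32."""
    if scale not in (1, 2, 4, 8):
        raise ValueError("scaling factor must be 1, 2, 4, or 8")
    return (base + index * scale + displacement) & 0xFFFFFFFF


# Operand [EAX][EDI*2 + 80] with assumed contents EAX = 0x1000, EDI = 3
eax, edi = 0x1000, 3
print(hex(effective_address(eax, edi, 2, 80)))
```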
+
+ There are extremely few times where a high-level language compiler can
+ utilize such a complex addressing mode. It is much more likely that
+ simplified versions of this mode will be used.
+
+ SOME ADDRESSING MODES
+
+ -- register mode -- The operand is in a register. The effective address
+ is the register (weird).
+
+ Example instruction:
+ mov eax, ecx
+
+ Both operands use register mode. The contents of register ecx is copied
+ to register eax.
+
+ -- immediate mode -- The operand is in the instruction. The effective
+ address is within the instruction.
+
+ Example instruction:
+ mov eax, 26
+
+ The second operand uses immediate mode. Within the instruction is the
+ operand. It is copied to register eax.
+
+ -- register direct mode -- The effective address is in a register.
+
+ Example instruction:
+ mov eax, [esp]
+
+ The second operand uses register direct mode. The contents of register
+ esp is the effective address. The contents of memory at the effective
+ address are copied into register eax.
+
+ -- direct mode -- The effective address is in the instruction.
+
+ Example instruction:
+ mov eax, var_name
+
+ The second operand uses direct mode. The instruction contains the
+ effective address. The contents of memory at the effective address are
+ copied into register eax.
+
+ -- base displacement mode -- The effective address is the sum of a
+ constant and the contents of a register.
+
+ Example instruction:
+ mov eax, [esp + 4]
+
+ The second operand uses base displacement mode. The instruction
+ contains a constant. That constant is added to the contents of register
+ esp to form an effective address. The contents of memory at the
+ effective address are copied into register eax.
+
+ -- base-indexed mode -- (Intel's name) The effective address is the sum
+ of the contents of two registers.
+
+ Example instruction:
+ mov eax, [esp][esi]
+
+ The contents of registers esp and esi are added to form an effective
+ address. The contents of memory at the effective address are copied
+ into register eax.
+
+ Note that there are restrictions on the combinations of registers that
+ can be used in this addressing mode.
+
+ -- PC relative mode -- The effective address is the sum of the contents
+ of the PC and a constant contained within the instruction.
+
+ Example instruction:
+ jmp a_label
+
+ The contents of the program counter is added to an offset that is
+ within the machine code for the instruction. The resulting sum is
+ placed back into the program counter. Note that from the assembly
+ language it is not clear that a PC relative addressing mode is used. It
+ is the assembler that generates the offset to place in the instruction.
+
+Instruction Set
+
+ Generalities:
+
+ -- Many (most?) of the instructions have exactly 2 operands. If there
+ are 2 operands, then one of them will be required to use register mode,
+ and the other will have no restrictions on its addressing mode.
+
+ -- There are most often ways of specifying the same instruction for 8-,
+ 16-, or 32-bit operands. Note that on a 32-bit machine, with newly
+ written code, the 16-bit form will never be used.
+
+ Meanings of the operand specifications:
+ reg - register mode operand, 32-bit register
+ reg8 - register mode operand, 8-bit register
+ r/m - general addressing mode, 32-bit
+ r/m8 - general addressing mode, 8-bit
+ immed - 32-bit immediate is in the instruction
+ immed8 - 8-bit immediate is in the instruction
+ m - symbol (label) in the instruction is the effective address
+
+ Data Movement
+
+ mov reg, r/m ; copy data
+ r/m, reg
+ reg, immed
+ r/m, immed
+
+ movsx reg, r/m8 ; sign extend and copy data
+
+ movzx reg, r/m8 ; zero extend and copy data
+
+ lea reg, m ; get effective address
+ (A newer instruction, so its format is much restricted
+ over the other ones.)
+
+ EXAMPLES:
+
+ mov EAX, 23 ; places 32-bit 2's complement immediate 23
+ ; into register EAX
+ movsx ECX, AL ; sign extends the 8-bit quantity in register
+ ; AL to 32 bits, and places it in ECX
+ mov dword ptr [esp], -1 ; places 32-bit value -1 into memory, address
+ ; given by contents of esp (dword ptr gives the
+ ; operand size, which [esp] alone does not)
+ lea EBX, loop_top ; put the address assigned (by the assembler)
+ ; to label loop_top into register EBX
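+
+ To make the difference between movsx and movzx concrete, here is a
+ small Python sketch of extending an 8-bit value to 32 bits. The
+ function names movsx8 and movzx8 are hypothetical.
+
```python
def movzx8(value):
    """Zero extend: upper 24 bits of the 32-bit result are 0."""
    return value & 0xFF


def movsx8(value):
    """Sign extend: upper 24 bits are copies of bit 7."""
    value &= 0xFF
    if value & 0x80:               # negative as an 8-bit 2's complement value
        return value | 0xFFFFFF00
    return value


# AL = 0xFE is -2 as a signed byte, 254 as an unsigned byte
print(hex(movsx8(0xFE)), hex(movzx8(0xFE)))
```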
+
+ Integer Arithmetic
+
+ add reg, r/m ; two's complement addition
+ r/m, reg
+ reg, immed
+ r/m, immed
+
+ inc reg ; add 1 to operand
+ r/m
+
+ sub reg, r/m ; two's complement subtraction
+ r/m, reg
+ reg, immed
+ r/m, immed
+
+ dec reg ; subtract 1 from operand
+ r/m
+
+ neg r/m ; get additive inverse of operand
+
+ mul r/m ; unsigned multiplication
+ ; edx||eax <- eax * r/m
+
+ imul r/m ; 2's comp. multiplication
+ ; edx||eax <- eax * r/m
+ reg, r/m ; reg <- reg * r/m
+ reg, immed ; reg <- reg * immed
+
+ div r/m ; unsigned division
+ ; does edx||eax / r/m
+ ; eax <- quotient
+ ; edx <- remainder
+
+ idiv r/m ; 2's complement division
+ ; does edx||eax / r/m
+ ; eax <- quotient
+ ; edx <- remainder
+
+ cmp reg, r/m ; sets EFLAGS based on
+ r/m, immed ; first operand - second operand
+ r/m8, immed8
+ r/m, immed8 ; sign extends immed8 before subtract
+
+
+
+ EXAMPLES:
+
+ neg dword ptr [eax + 4] ; takes doubleword at address eax+4
+ ; and finds its additive inverse, then places
+ ; the additive inverse back at that address;
+ ; dword ptr tells the assembler that the
+ ; memory operand is 32 bits wide
+
+ inc ecx ; adds one to contents of register ecx, and
+ ; result goes back to ecx
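+
+ The edx||eax notation used for mul and div above can be sketched in
+ Python like this. The names mul32 and div32 are assumptions for the
+ example, and Python exceptions stand in for the divide error that the
+ hardware raises.
+
```python
MASK32 = 0xFFFFFFFF


def mul32(eax, operand):
    """Unsigned mul: returns (edx, eax), the high and low halves of the product."""
    product = (eax & MASK32) * (operand & MASK32)   # up to 64 bits
    return product >> 32, product & MASK32


def div32(edx, eax, operand):
    """Unsigned div of edx||eax by operand: returns (eax, edx) = (quotient, remainder)."""
    operand &= MASK32
    if operand == 0:
        raise ZeroDivisionError("divide error")
    dividend = ((edx & MASK32) << 32) | (eax & MASK32)
    quotient, remainder = divmod(dividend, operand)
    if quotient > MASK32:
        raise OverflowError("quotient does not fit in EAX")
    return quotient, remainder


edx, eax = mul32(0xFFFFFFFF, 2)
print(hex(edx), hex(eax))
```
+
+ Dividing that 64-bit product by 2 again with div32 gives back the
+ original value in the quotient.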
+
+ Logical
+
+ not r/m ; logical not
+
+ and reg, r/m ; logical and
+ reg8, r/m8
+ r/m, reg
+ r/m8, reg8
+ r/m, immed
+ r/m8, immed8
+
+ or reg, r/m ; logical or
+ reg8, r/m8
+ r/m, reg
+ r/m8, reg8
+ r/m, immed
+ r/m8, immed8
+
+ xor reg, r/m ; logical exclusive or
+ reg8, r/m8
+ r/m, reg
+ r/m8, reg8
+ r/m, immed
+ r/m8, immed8
+
+ test r/m, reg ; logical and to set EFLAGS
+ r/m8, reg8
+ r/m, immed
+ r/m8, immed8
+
+
+
+
+ EXAMPLES:
+
+ and edx, 00330000h ; logical and of contents of register
+ ; edx (bitwise) with 0x00330000,
+ ; result goes back to edx
+
+ Floating Point Arithmetic
+
+ Since the newer architectures have room for floating point hardware on
+ chip, Intel defined a simple-to-implement extension to the architecture
+ to do floating point arithmetic. In their usual zeal, they have
+ included MANY instructions to do floating point operations.
+
+ The mechanism is simple. A set of 8 registers is organized and
+ maintained (by hardware) as a stack of floating point values. ST refers
+ to the stack top. ST(1) refers to the register within the stack that is
+ next to ST. ST and ST(0) are synonyms.
+
+ There are separate instructions to test and compare the values of
+ floating point variables.
+ finit ; initialize the FPU
+
+ fld m32 ; load floating point value
+ m64
+ ST(i)
+
+ fldz ; load floating point value 0.0
+
+ fst m32 ; store floating point value
+ m64
+ ST(i)
+
+ fstp m32 ; store floating point value
+ m64 ; and pop ST
+ ST(i)
+
+ fadd m32 ; floating point addition
+ m64
+ ST, ST(i)
+ ST(i), ST
+
+ faddp ST(i), ST ; floating point addition
+ ; and pop ST
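+
+ A toy Python model of the register stack, covering only a few of the
+ instructions listed above. The class name FpuStack and its method
+ names are hypothetical. It uses Python floats, and it raises an
+ exception where real hardware would signal a stack fault; both are
+ simplifications.
+
```python
class FpuStack:
    """Eight floating point registers treated as a stack; regs[0] is ST."""

    def __init__(self):
        self.regs = []

    def finit(self):
        """Initialize: empty the stack."""
        self.regs = []

    def fld(self, value):
        """Push a value; it becomes the new ST."""
        if len(self.regs) == 8:
            raise OverflowError("FPU stack is full")
        self.regs.insert(0, float(value))

    def st(self, i=0):
        """Read ST(i)."""
        return self.regs[i]

    def fadd(self, i):
        """fadd ST, ST(i): ST <- ST + ST(i)."""
        self.regs[0] += self.regs[i]

    def faddp(self, i):
        """faddp ST(i), ST: ST(i) <- ST(i) + ST, then pop ST."""
        self.regs[i] += self.regs[0]
        self.regs.pop(0)

    def fstp(self):
        """Store ST (returned here) and pop it."""
        return self.regs.pop(0)


fpu = FpuStack()
fpu.fld(1.5)
fpu.fld(2.25)
fpu.faddp(1)
print(fpu.st(0))
```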
+
+ Control Instructions
+
+ All conditional control instructions in the Intel architectures are
+ called jumps. Their machine code is similar to the MIPS branch
+ instructions.
+
+ Just some of the many control instructions:
+ jmp m ; unconditional jump
+ jg m ; jump if greater than (signed compare)
+ jge m ; jump if greater than or equal to (signed compare)
+ jl m ; jump if less than (signed compare)
+ jle m ; jump if less than or equal to (signed compare)
+
+ Note that a control instruction takes a single operand, which specifies
+ the jump target. The conditional control instructions look at the
+ condition code bits (in the EFLAGS register) to make a decision on
+ whether to take the jump or not.
+
+ The condition code bits are set by separate instructions. Several
+ arithmetic and logical instructions set some of the condition code
+ bits. There are also specific instructions to compare operands and set
+ the condition code bits based on the comparison (examples: cmp, test).
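+
+ A hedged Python sketch of how a compare followed by a conditional
+ jump can be modeled. It assumes that cmp a, b sets the flags from
+ a - b and that the signed jumps test SF, OF and ZF as shown; the
+ function names cmp32 and jump_taken are invented for this example.
+
```python
MASK32 = 0xFFFFFFFF


def cmp32(a, b):
    """Return the condition codes produced by computing a - b in 32 bits."""
    a &= MASK32
    b &= MASK32
    result = (a - b) & MASK32
    return {
        "ZF": int(result == 0),
        "SF": result >> 31,
        # borrow: a is smaller than b when both are seen as unsigned
        "CF": int(a < b),
        # signed overflow: operands differ in sign and result sign differs from a
        "OF": ((a ^ b) & (a ^ result)) >> 31,
    }


def jump_taken(mnemonic, flags):
    """Decide whether a conditional jump is taken, given the flags."""
    zf, sf, of = flags["ZF"], flags["SF"], flags["OF"]
    conditions = {
        "je": zf == 1,
        "jg": zf == 0 and sf == of,
        "jge": sf == of,
        "jl": sf != of,
        "jle": zf == 1 or sf != of,
    }
    return conditions[mnemonic]


# -1 compared with 1, as signed 32-bit values
print(jump_taken("jl", cmp32(-1, 1)))
```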
+
+ Some sample code, for fun:
+
+ Pentium code to add 1 to each element of an array of integers.
+
+ Assume that there is an array of 100 integers in memory. The label
+ associated with the first element is int_array.
+
+ Comments are placed to the right, and preceded by a semicolon (;).
+
+
+ lea EAX, int_array ; like la in MIPS, EAX is pointer
+ mov ECX, 100 ; register ECX contains counter
+loop_top:
+ cmp ECX, 0 ; must set condition codes
+ je all_done ; uses condition codes to branch
+ inc dword ptr [EAX] ; a register direct addressing mode!
+ add EAX, 4 ; updates pointer
+ dec ECX ; update counter
+ jmp loop_top ; unconditional branch to loop_top
+
+all_done:
+
+
+ Some things to notice about this code:
+
+ -- You can figure it out, although you only know MIPS assembly
+ language! That is because most assembly languages look similar.
+
+ -- The 2-address instruction set does not generate a larger number of
+ instructions for this example (than a 3-address instruction set would,
+ like MIPS). It does do the same number of memory accesses.
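+
+ For comparison, the same loop written as a Python sketch that mirrors
+ the assembly line by line. The function name add_one_to_each is
+ hypothetical, a Python list stands in for memory, and the pointer is
+ modeled as a list index rather than a byte address.
+
```python
def add_one_to_each(int_array):
    """Add 1 to every element, following the structure of the assembly loop."""
    eax = 0                    # lea EAX, int_array (index instead of address)
    ecx = len(int_array)       # mov ECX, 100 (here: however many elements exist)
    while True:                # loop_top:
        if ecx == 0:           # cmp ECX, 0 / je all_done
            break
        # inc on the memory operand, wrapping like a 32-bit integer
        int_array[eax] = (int_array[eax] + 1) & 0xFFFFFFFF
        eax += 1               # add EAX, 4 (one 4-byte element)
        ecx -= 1               # dec ECX
    return int_array           # all_done:


print(add_one_to_each([10, 20, 30]))
```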
+
+Intel MMX (Optional)
+
+ MultiMedia eXtension to Intel Arch.
+ [source: Peleg & Weiser, IEEE Micro, Aug. 96]
+
+ Motivation
+
+ Q: Why might people want to buy newer, faster PCs?
+ A: Processing audio and video
+
+ Let's make audio and video perform better
+
+ Method 1: add special-purpose card
+ Method 2: make regular microprocessor perform better at audio/video
+
+ Intel's MMX follows Method 2
+ The goal is 2x performance in audio, video, etc.
+
+ Key observation: precision of data required << 32 bits
+
+ For video,
+ Red/Green/Blue might use 8 (16) bits each, giving 256 (64K) levels per
+ color component of each pixel (picture element)
+
+ Key technique: pack multiple low-precision items into a 64-bit
+ floating-point register, and add instructions to manipulate them
+
+ (This is an example of a general technique called "single instruction
+ multiple data", or SIMD)
+
+ MMX Datatypes
+ * 1 x 64 bit quad word
+ * 2 x 32 bit double-word
+ * 4 x 16 bit word
+ * 8 x 8 bit byte
+
+ MMX Instructions
+
+ Example, PADDB (packed add; B stands for byte)
+
+ 17 87 100 ... 5 more
+ + 17 13 200 ... 5 more
+ ---- ---- ---- ...
+ 34 100 44 = 300 mod 256 ==> wraparound
+ 255 = max value ==> saturating
+
+ This can be used to do arithmetic/logical operations on more than 1
+ pixel's worth of data in 1 instruction.
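+
+ The two behaviors in the example (wraparound and saturating) can be
+ sketched in Python, treating a 64-bit register as a list of eight
+ unsigned bytes. The function names below are hypothetical, and the
+ five trailing zero bytes are filler for the "5 more" elements that
+ the example does not spell out.
+
```python
def packed_add_wraparound(a, b):
    """Add byte lanes independently; each lane wraps modulo 256."""
    return [(x + y) & 0xFF for x, y in zip(a, b)]


def packed_add_saturating(a, b):
    """Add byte lanes independently; each lane clamps at 255."""
    return [min(x + y, 255) for x, y in zip(a, b)]


a = [17, 87, 100, 0, 0, 0, 0, 0]
b = [17, 13, 200, 0, 0, 0, 0, 0]
print(packed_add_wraparound(a, b))
print(packed_add_saturating(a, b))
```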
+
+ Also MOV's == load & stores
+
+ Example:
+ 16 element dot product (from matrix multiply)
+
+ [a1 a2 ... a16]* [b1 b2 ... b16]^T = a1*b1 + a2*b2 + ... + a16*b16
+
+ comparison with Intel IA-32 gives:
+
+ -> 32 loads
+ -> 16 *
+ -> 15 +
+ -> 12 loop ctrl
+ ---
+ 76 instructions
+ int ==> 200 cycles
+ fp ==> 76 cycles
+
+ Intel MMX assuming 16b values
+
+ -> 16 instructions
+ -> 12 cycles (6x better than fp)
+
+ Other Instructions
+
+ PACK/UNPACK -- putting multiple values in single register & back
+
+ MASK
+ Example, "make 0xff if equal"
+
+ 15 15 100 120 101 76 15 15
+ 15 15 15 15 15 15 15 15
+ -------------------------------------
+ FF FF 00 00 00 00 FF FF
+
+ Why? Mask for weatherman!
+ * film weatherperson in front of blue background (0x15)
+ * wthmsk = use above mask instruction
+ wthmsk==FF -- no weatherperson
+ wthmsk==00 -- weatherperson
+
+ image = (~wthmsk & weatherperson ) | (wthmsk & weathermap)
+
+ (What happens if weatherperson wears suit of color 15?)
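+
+ A small Python sketch of the mask-and-select idea, using lists of
+ bytes in place of packed registers. The function names and the
+ weathermap pixel values are made up for illustration; the
+ weatherperson pixels and the key value 15 come from the example
+ above.
+
```python
def compare_equal_mask(pixels, key):
    """Per byte: 0xFF where the pixel equals the key color, else 0x00."""
    return [0xFF if p == key else 0x00 for p in pixels]


def composite(weatherperson, weathermap, key):
    """image = (~wthmsk & weatherperson) | (wthmsk & weathermap), per byte."""
    wthmsk = compare_equal_mask(weatherperson, key)
    return [
        ((~m & 0xFF) & person) | (m & background)
        for m, person, background in zip(wthmsk, weatherperson, weathermap)
    ]


weatherperson = [15, 15, 100, 120, 101, 76, 15, 15]
weathermap = [1, 2, 3, 4, 5, 6, 7, 8]      # assumed background pixels
print(composite(weatherperson, weathermap, 15))
```
+
+ Wherever the camera pixel equals the key color, the weather map shows
+ through; everywhere else the weatherperson is kept.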
+
+ MMX Constraints
+ * Instruction Set Architecture extensions, but perfect backward
+ compatibility
+ * 100% Operating System compatible (no new registers, flags,
+ exceptions)
+ * Independent Software Vendor (ISV) support (bit in CPUID instruction
+ so applications can test for MMX and include code for both)
+
+IA-64/Merced (Optional)
+
+ Motivation
+ * IA-32 has 32-bit addresses
+ * 2^32 ==> 4G bytes of memory
+ * Current large servers want more!
+ * Near future medium servers will want more ...
+ * Someday desktops will want more?
+
+ What to do?
+ 1. Kludge IA-32 to support > 32-bit addresses
+ 2. Do new instruction set with binary compatibility strategy
+ (a) have new chips also support IA-32
+ (b) use binary translation, etc.
+
+ Intel claims to be doing 2a, but has only partially revealed plans. (as
+ of Nov '98)
+
+ New instruction set architecture: IA-64
+ * 64-bit addresses
+ * Mode for running old code
+ * First implementation is called "Merced"
+ * Has extra large instructions so that 128b = 4 * 32b holds
+ instrn0 instrn1 instrn2 template
+
+ "template" gives "relationships" between instructions.
+ Example: whether instrn1 shares no registers or memory locations w/
+ instrn0
+ * Uses "templates" and "predication".
+ Each instruction is "predicated"
+ Example,
+ if $1 < $2
+ then
+ $3 = $4
+ else
+ $5 = $6
+ endif
+
+ Is normally:
+ bge $1, $2, else
+ mov $3, $4
+ b endif
+ else: mov $5, $6
+ endif:
+
+ With predication:
+ setlt p0, $1, $2
+ if (p0==TRUE) mov $3, $4
+ if (p0==FALSE) mov $5, $6 /* three instructions & no branches */
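+
+ A rough Python sketch comparing the two versions. The helper mov_if
+ is a hypothetical stand-in for a predicated mov: it always "executes",
+ but only changes the destination when its predicate is true. The
+ register names r1 to r6 stand for $1 to $6.
+
```python
def branch_version(r1, r2, r3, r4, r5, r6):
    """The normal code: control flow picks which mov runs."""
    if r1 < r2:
        r3 = r4
    else:
        r5 = r6
    return r3, r5


def mov_if(predicate, dest, src):
    """Predicated mov: yields src when the predicate holds, else keeps dest."""
    return src if predicate else dest


def predicated_version(r1, r2, r3, r4, r5, r6):
    """The predicated code: both movs are issued, no branches."""
    p0 = r1 < r2                   # setlt p0, $1, $2
    r3 = mov_if(p0, r3, r4)        # if (p0==TRUE)  mov $3, $4
    r5 = mov_if(not p0, r5, r6)    # if (p0==FALSE) mov $5, $6
    return r3, r5


print(predicated_version(1, 2, 30, 40, 50, 60))
```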
+
+ Aren't you glad we did not teach 354 with IA-64/Merced?
+
+ Copyright © Karen Miller, 2006