
Computer Architecture
Unit 2: Instruction Set Architecture


CIS 501 (Martin/Roth): Instruction Set Architectures
• Chapter 2
• Further reading: Appendix C (RISC) and Appendix D
• Available from web page
• The Evolution of RISC Technology at IBM
Much of this chapter will be “on your own reading”
• Hard to talk about ISA features without knowing what they do
• Will revisit many of these issues
Instruction Set Architecture (ISA)
• What is a good ISA?
• Aspects of ISAs
• RISC vs. CISC
• Implementing CISC
• ISA (instruction set architecture)
• A well-defined hardware/software interface
• The “contract” between software and hardware
• Functional definition of operations, modes, and storage locations supported by hardware
• Precise description of how to invoke, and access them
• No guarantees regarding
• How operations are implemented
• Which operations are fast and which are slow
• Which operations take more power

Language Analogy
• Allows communication
• Language: person to person
• ISA: hardware to software
• Need to speak the same language/ISA
• Many common aspects
• Part of speech: verbs, nouns, adjectives, adverbs, etc.
• Common operations: calculation, control/branch, memory
• Many different languages/ISAs, many similarities, many differences
• Different structure
• Both evolve over time
• Key differences: ISAs are explicitly engineered and extended
• Easy to express programs efficiently?
• Easy to design high-performance implementations?
• More recently
• Easy to design low-power implementations?
• Easy to design high-reliability implementations?
• Easy to design low-cost implementations?
• Easy to maintain programmability (implementability) as languages and programs (technology) evolve?
• x86 (IA32) generations: 8086, 286, 386, 486, Pentium, PentiumIII, Pentium4,…
RISC vs CISC Foreshadowing
• Recall performance equation:
• (instructions/program) * (cycles/instruction) * (seconds/cycle)
• CISC (Complex Instruction Set Computing)
• Improve “instructions/program” with “complex” instructions
• Easy for assembly-level programmers
• RISC (Reduced Instruction Set Computing)
• Improve “cycles/instruction” with many single-cycle instructions
• Increases “instruction/program”, but hopefully not as much
• Help from smart compiler
• Perhaps improve clock cycle time (seconds/cycle) via aggressive implementation allowed by simpler instructions
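The performance equation can be tried out with a few lines of Python. This is only a sketch: the function name `execution_time` and all of the numbers are hypothetical, picked to illustrate how a higher instruction count can still win if cycles per instruction drop enough.

```python
def execution_time(insns_per_program, cycles_per_insn, seconds_per_cycle):
    """seconds/program = (insns/program) * (cycles/insn) * (seconds/cycle)."""
    return insns_per_program * cycles_per_insn * seconds_per_cycle


# Hypothetical machines, not measurements of any real processor.
cisc_time = execution_time(1_000_000, 4.0, 1e-9)  # fewer, more complex insns
risc_time = execution_time(1_500_000, 1.2, 1e-9)  # more, mostly single-cycle insns

print(f"CISC-like: {cisc_time * 1e3:.2f} ms")
print(f"RISC-like: {risc_time * 1e3:.2f} ms")
```

With these made-up inputs the RISC-like machine executes 50% more instructions but finishes sooner, because the product of the three terms is what matters.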
• Easy to express programs efficiently?
• For whom?
• Before 1985: human
• Compilers were terrible, most code was hand-assembled
• Want high-level coarse-grain instructions
• As similar to high-level language as possible
• After 1985: compiler
• Optimizing compilers generate much better code than you or I
• Want low-level fine-grain instructions
• Compiler can’t tell if two high-level idioms match exactly or not

• What makes an ISA easy for a human to program in?
• Proximity to a high-level language (HLL)
• Closing the “semantic gap”
• Semantically heavy insns that capture complete idioms
• “loop”, “procedure call”
• Example: SPARC save/restore
• Bad example: x86 rep movsb (copy string)
• Ridiculous example: VAX insque (insert-into-queue)
• “Semantic clash”: what if you have many high-level languages?
• Stranger than fiction
• People once thought computers would execute language directly
• Fortunately, never materialized (but keeps coming back around)
Today’s Semantic Gap
• Today’s ISAs are targeted to one language…
• Just so happens that this language is very low level
• The C programming language
• Will ISAs be different when Java/C# become dominant?
• Object-oriented? Probably not
• Support for garbage collection? Maybe
• Smart compilers transform high-level languages to simple instructions
• Any benefit of tailored ISA is likely small
• What makes an ISA easy for a compiler to program in?
• Low level primitives from which solutions can be synthesized
• Wulf: “primitives not solutions”
• Computers good at breaking complex structures to simple ones
• Not so good at combining simple structures into complex
• Requires search, pattern matching (why AI is hard)
• Easier to synthesize complex insns than to compare them
• Regularity: “principle of least astonishment”
• Orthogonality
• One-vs.-all
• Every ISA can be implemented
• Not every ISA can be implemented efficiently
• Classic high-performance implementation techniques
• Pipelining, parallel execution, out-of-order execution (more later)
• Certain ISA features make these difficult
– Variable instruction lengths/formats: complicate decoding
– Implicit state: complicates dynamic scheduling
– Variable latencies: complicate scheduling
– Difficult to interrupt instructions: complicate many things

• No-one buys new hardware… if it requires new software
• Intel was the first company to realize this
• ISA must remain compatible, no matter what
• x86 one of the worst designed ISAs EVER, but survives
• As does IBM’s 360/370 (the first “ISA family”)
• Backward compatibility
• New processors must support old programs (can’t drop features)
• Very important
• Forward (upward) compatibility
• Old processors must support new programs (with software help)
• New processors redefine only previously-illegal opcodes
• Allow software to detect support for specific new instructions
• Old processors emulate new instructions in low-level software
• Trap: instruction makes low-level “function call” to OS handler
• Nop: “no operation” – instructions with no functional semantics
• Handle rarely used but hard to implement “legacy” opcodes
• Define to trap in new implementation and emulate in software
• Rid yourself of some ISA mistakes of the past
• Problem: performance suffers
• Reserve sets of trap & nop opcodes (don’t define uses)
• Add ISA functionality by overloading traps
• Release firmware patch to “add” to old implementation
• Add ISA hints by overloading nops
• Easy compatibility requires forethought
• Temptation: use some ISA extension for 5% performance gain
• Frequent outcome: gain diminishes, disappears, or turns to loss
• Must continue to support gadget for eternity
• Example: register windows (SPARC)
• Adds difficulty to out-of-order implementations of SPARC
• Details shortly
Aspects of ISAs
• Von Neumann model
• Implicit structure of all modern ISAs
• Length and encoding
• Operand model
• Where (other than memory) are operands stored?
• Datatypes and operations
• Overview only
• Read about the rest in the book and appendices

Sequential Model
• Implicit model of all modern ISAs
• Often called VonNeumann, but in ENIAC before
• Basic feature: the program counter (PC)
• Defines total order on dynamic instruction
• Next PC is PC++ unless insn says otherwise
• Order and named storage define computation
• Value flows from insn X to insn Y via storage A iff X names A as output, Y names A as input, and Y after X
• Processor logically executes loop
• Instruction execution assumed atomic
• Instruction X finishes before insn X+1 starts
• Alternatives have been proposed…
[Figure: processor execution loop; surviving stage labels: Read Inputs, Write Output]
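The sequential model above can be sketched as a small interpreter loop. Everything here is an assumption made for illustration: the three-field instruction tuples, the opcode names `add` and `branch-nonzero`, and the use of a dictionary as named storage are not from the slides.

```python
def run(program, storage):
    """Execute insns one at a time, in the total order defined by the PC."""
    pc = 0
    while pc < len(program):
        op, *args = program[pc]              # fetch and decode
        next_pc = pc + 1                     # default: next PC is PC++
        if op == "add":                      # dst = a + b
            dst, a, b = args
            storage[dst] = storage[a] + storage[b]   # read inputs, write output
        elif op == "branch-nonzero":         # the insn "says otherwise"
            src, target = args
            if storage[src] != 0:
                next_pc = target
        else:
            raise ValueError(f"unknown opcode: {op}")
        pc = next_pc                         # insn X finishes before X+1 starts
    return storage


# Hypothetical program: total = 3 + 2 + 1, counting n down to zero.
program = [
    ("add", "total", "total", "n"),
    ("add", "n", "n", "minus1"),
    ("branch-nonzero", "n", 0),
]
result = run(program, {"n": 3, "total": 0, "minus1": -1})
print(result["total"])
```

Values flow between instructions only through named storage: the first `add` names `n` as input, the second names it as output, and the PC order decides which write each read sees.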
Example: MIPS Format
• 3 formats, simple encoding
• R-type: Op(6) Rs(5) Rt(5) Rd(5) Sh(5) Func(6)
• I-type: Op(6) Rs(5) Rt(5) Immed(16)
• J-type: Op(6) Target(26)
• Q: how many instructions can be encoded? A: 127
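A fixed format makes decoding a matter of shifts and masks. The sketch below assumes the standard MIPS R-type layout Op(6) Rs(5) Rt(5) Rd(5) Sh(5) Func(6); the helper name `decode_r_type` is made up for this example, and the sample word is the usual encoding of `add $3, $1, $2`.

```python
def decode_r_type(word):
    """Split a 32-bit MIPS R-type instruction word into its six fields."""
    return {
        "op":   (word >> 26) & 0x3F,   # 6 bits
        "rs":   (word >> 21) & 0x1F,   # 5 bits
        "rt":   (word >> 16) & 0x1F,   # 5 bits
        "rd":   (word >> 11) & 0x1F,   # 5 bits
        "sh":   (word >> 6) & 0x1F,    # 5 bits
        "func": word & 0x3F,           # 6 bits
    }


fields = decode_r_type(0x00221820)     # add $3, $1, $2
print(fields)
```

One plausible reading of the answer 127: a 6-bit opcode gives 64 values, one of which selects R-type, whose 6-bit Func field gives 64 more, so 63 + 64 = 127.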
• Fixed length
• Most common is 32 bits
+ Simple implementation: compute next PC using only PC
– Code density: 32 bits to increment a register by 1?
• x86 can do this in one 8-bit instruction
• Variable length
– Complex implementation
+ Code density
• Compromise: two lengths
• A few simple encodings simplify decoder implementation
Operand Model: Memory Only
• Where (other than memory) can operands come from?
• And how are they specified?
• Example: A = B + C
• Several options
• Memory only
add B,C,A        mem[A] = mem[B] + mem[C]

• Accumulator: implicit single element storage
load B           ACC = mem[B]
add C            ACC = ACC + mem[C]
store A          mem[A] = ACC

• General-purpose register: multiple explicit accumulators
load B,R1        R1 = mem[B]
add C,R1         R1 = R1 + mem[C]
store R1,A       mem[A] = R1
• Load-store: GPR and only loads/stores access memory
load B,R1        R1 = mem[B]
load C,R2        R2 = mem[C]
add R1,R2,R1     R1 = R1 + R2
store R1,A       mem[A] = R1
• Stack: TOS implicit in instructions
push B           stk[TOS++] = mem[B]
push C           stk[TOS++] = mem[C]
add              stk[TOS++] = stk[--TOS] + stk[--TOS]
pop A            mem[A] = stk[--TOS]
Operand Model Pros and Cons
• Metric I: static code size
• Number of instructions needed to represent program, size of each
• Want many implicit operands, high level instructions
• Good → bad: memory, accumulator, stack, load-store
• Metric II: data memory traffic
• Number of bytes moved to and from memory
• Want as many long-lived operands in on-chip storage as possible
• Good → bad: load-store, stack, accumulator, memory
• Metric III: cycles per instruction
• Want short (1 cycle?), little variability, few nearby dependences
• Good → bad: load-store, stack, accumulator, memory
• Upshot: most new ISAs are load-store or hybrids
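The four operand models can be compared on the running example A = B + C with a small simulation. This is a sketch under stated assumptions: the `CountingMemory` class and the four function names are hypothetical, each function hand-translates the instruction sequence for its model, and the returned number is an instruction count only (it ignores the size of each instruction).

```python
class CountingMemory:
    """Data memory that counts every load and store."""

    def __init__(self, values):
        self.values = dict(values)
        self.accesses = 0

    def load(self, name):
        self.accesses += 1
        return self.values[name]

    def store(self, name, value):
        self.accesses += 1
        self.values[name] = value


def memory_only(mem):
    mem.store("A", mem.load("B") + mem.load("C"))   # add B,C,A
    return 1                                         # instructions executed


def accumulator(mem):
    acc = mem.load("B")                  # load B
    acc = acc + mem.load("C")            # add C
    mem.store("A", acc)                  # store A
    return 3


def stack(mem):
    stk = []
    stk.append(mem.load("B"))            # push B
    stk.append(mem.load("C"))            # push C
    stk.append(stk.pop() + stk.pop())    # add
    mem.store("A", stk.pop())            # pop A
    return 4


def load_store(mem):
    r1 = mem.load("B")                   # load B,R1
    r2 = mem.load("C")                   # load C,R2
    r1 = r1 + r2                         # add R1,R2,R1
    mem.store("A", r1)                   # store R1,A
    return 4


results = {}
for model in (memory_only, accumulator, stack, load_store):
    mem = CountingMemory({"A": 0, "B": 2, "C": 3})
    insns = model(mem)
    results[model.__name__] = (insns, mem.accesses, mem.values["A"])
    print(f"{model.__name__:12} insns={insns} accesses={mem.accesses}")
```

For this single statement every model touches data memory three times, so only the instruction counts differ. The Metric II ranking shows up in longer code, where a load-store machine can keep long-lived operands in registers instead of reloading them.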

How Many Registers?
• Registers faster than memory, have as many as possible?
• One reason registers are faster is that there are fewer of them
• Small is fast (hardware truism)
• Another is that they are directly addressed
– More registers means more bits for register specifiers
• Fewer registers per instruction or indirect addressing
• Not everything can be put in registers
• Structures, arrays
• Although compilers are getting better at putting more in registers
– More registers means more saving/restoring
• Upshot: trend to more registers: 8 (x86) → 32 (MIPS) → 128 (IA64)
• 64-bit x86 has 16 64-bit integer and 16 128-bit FP registers
Address Size
• Support memory size of 2^n
• Alternative (wrong) definition: size of calculation operations
• Virtual address size
• Determines size of addressable (usable) memory
• 32-bit or 64-bit address spaces
• All ISAs moving to (if not already at) 64 bits
• Most critical, inescapable ISA design decision
• Too small? Will limit the lifetime of the ISA
• May require nasty hacks (E.g., x86 segments)
• x86 evolution:
• 4-bit (4004), 8-bit (8008), 16-bit (8086),
• 32-bit + protected memory (80386)
• 64-bit (AMD’s Opteron & Intel’s EM64T Pentium4)
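The "memory size of 2^n" point is easy to make concrete. The sketch below assumes byte-addressable memory (one address per byte), which the slides do not state; the function name `addressable_bytes` is made up for the example.

```python
def addressable_bytes(address_bits):
    """Number of distinct byte addresses an n-bit address can name: 2**n."""
    return 2 ** address_bits


for bits in (16, 32, 64):
    size = addressable_bytes(bits)
    print(f"{bits}-bit addresses: {size} bytes ({size / 2**30:g} GiB)")
```

Going from 32 to 64 bits does not double the address space; it squares it, which is why running out of address bits is so painful and so hard to patch later.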
• Register windows: hardware activation records
• Sun SPARC (from the RISC I)
• 32 integer registers divided into: 8 global, 8 local, 8 input, 8 output
• Explicit save/restore instructions
• Global registers fixed
• save: inputs “pushed”, outputs → inputs, locals zeroed
• restore: locals zeroed, inputs → outputs, inputs “popped”
• Hardware stack provides few (4) on-chip register frames
• Spilled-to/filled-from memory
+ Automatic parameter passing, caller-saved registers
+ No memory traffic on shallow (<4 deep) call graphs
– Hidden memory operations (some restores fast, others slow)
– A nightmare for register renaming (more later)

Memory Addressing
• Addressing mode: way of specifying address
• Used in memory-memory or load/store instructions in register ISA
• Examples
• Register-Indirect: R1=mem[R2]
• Displacement: R1=mem[R2+immed]
• Index-base: R1=mem[R2+R3]
• Memory-indirect: R1=mem[mem[R2]]
• Auto-increment: R1=mem[R2], R2=R2+1
• Auto-indexing: R1=mem[R2+immed], R2=R2+immed
• Scaled: R1=mem[R2+R3*immed1+immed2]
• PC-relative: R1=mem[PC+imm]
• What high-level program idioms are these used for?

Example: MIPS Addressing
• MIPS implements only displacement
• Why? Experiment on VAX (ISA with every mode) found distribution
• Disp: 61%, reg-ind: 19%, scaled: 11%, mem-ind: 5%, other: 4%
• Most accesses use small displacement or register indirect (displacement 0)
• I-type instructions: Op(6) Rs(5) Rt(5) Immed(16)
• Is 16-bits enough?
• Yes? VAX experiment showed 1% accesses use displacement larger than 16 bits
• Reg+Reg mode

Control Instructions
• One issue: testing for conditions
• Option I: compare and branch insns
branch-less-than R1,10,target
– Two ALUs: one for condition, one for target address
• Option II: implicit condition codes
branch-neg target        (tests the “negative” code set by an earlier insn)
– Implicit dependence is tricky
• Option III: condition registers, separate branch insns
set-less-than R2,R1,10
branch-not-equal-zero R2,target
– Additional instructions, + one ALU per, + explicit dependence

• Access alignment: address % size == 0?
• Aligned: address is a multiple of the access size
• Unaligned: anything else (e.g., load-half at an odd address)
• Question: what to do with unaligned accesses (uncommon case)?
• Support in hardware? Makes all accesses slow
• Trap to software routine? Possibility
• Load, shift, load, shift, …
• MIPS? ISA support: unaligned access using two instructions
lwl @XXXX10; lwr
• Big-endian: sensible order (e.g., MIPS, PowerPC)
• A 4-byte integer: “00000000 00000000 00000010 00000011” is 515
• Little-endian: reverse order (e.g., x86)
• A 4-byte integer: “00000011 00000010 00000000 00000000” is 515
• Why little endian? To be different? To be annoying? Nobody knows

Example: MIPS Conditional Branches
• MIPS uses combination of options II/III
• Compare 2 registers and branch: beq, bne
• Equality and inequality only
+ Don’t need an adder for comparison
• Compare 1 register to zero and branch: bgtz, bgez, bltz, blez
• Greater/less than comparisons
+ Don’t need adder for comparison
• Set explicit condition registers: slt, sltu, slti, sltiu, etc.
• More than 80% of branches are (in)equalities or comparisons to 0
• OK to take two insns to do remaining branches (MCCF)

Control Instructions
• Another issue: computing targets
• Option I: PC-relative
• Position-independent within procedure
• Used for branches
• Option II: Absolute
• Position independent outside procedure
• Used for procedure calls
• Option III: Indirect (target found in register)
• Needed for jumping to dynamic targets
• Used for returns, dynamic procedure calls, switches
• How far do you need to jump?
• Typically not so far within a procedure (they don’t get that big)
• Further from one procedure to another

Control Instructions
• Another issue: support for procedure calls
• Link (remember) address of calling insn
• Implicit return address register
• Direct jump-and-link: jal
• Indirect jump-and-link: jalr
• Return address is calling insn + 4
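The jump-and-link idea can be sketched with a toy machine that keeps one implicit link register. The opcode names (`jal`, `jump-link`, `nop`, `halt`), the dictionary-as-instruction-memory layout, and the addresses are all hypothetical. Following the slide, the return address is the calling insn + 4, so the sketch ignores branch delay slots.

```python
def run(program, start=0):
    """Run a toy program keyed by byte address; return the PCs visited."""
    pc = start
    link = None                 # implicit return-address register
    trace = []
    while True:
        op, target = program[pc]
        trace.append(pc)
        if op == "jal":         # direct jump-and-link
            link = pc + 4       # remember the insn after the call
            pc = target
        elif op == "jump-link": # indirect jump through the link register
            pc = link
        elif op == "nop":
            pc += 4             # fixed 4-byte insns: next PC is PC + 4
        elif op == "halt":
            return trace
        else:
            raise ValueError(f"unknown opcode: {op}")


program = {
    0:   ("jal", 100),          # call the procedure at address 100
    4:   ("halt", None),        # execution resumes here after the return
    100: ("nop", None),         # procedure body
    104: ("jump-link", None),   # return
}
trace = run(program)
print(trace)
```

The call site uses an absolute target while the return is an indirect jump, matching the pairing of Option II for calls and Option III for returns described above.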
