# CS计算机代考程序代写 mips compiler ECE437: Introduction to Digital Computer Design

ECE437: Introduction to Digital Computer Design

Chapter 1b (performance, 1.4 onwards)

Spring 2021

Performance of Computers

• Which computer is fastest? • Not so simple

– scientific simulation – FP performance

– program development – Integer performance

– commercial work – memory + I/O

ECE437, S’21 © Vijaykumar and Thottethodi (2)

Performance of Computers

• Want to buy the fastest computer for what you want to do

– workload is important

• Want to design the fastest computer for what they want to pay

– cost is an important criterion

ECE437, S’21 © Vijaykumar and Thottethodi (3)

Outline

• Time and performance

• Iron law

• MIPS and MFLOPS

• Which programs and how to average • Amdahl’s law

ECE437, S’21 © Vijaykumar and Thottethodi (4)

1

Defining Performance

• What is important to whom

Limited model

“Start job, wait 1. Computer system user for end.” Suggest

– minimize elapsed time for program = alternate

time_end – time_start – called response time

performance metrics.

2. Computer center manager

– maximize completion rate = #jobs/second – called throughput

ECE437, S’21 © Vijaykumar and Thottethodi (5)

Response Time vs. Throughput

• Is throughput = 1/av. response time?

– ONLY if NO overlap

– with overlap, throughput > 1/av.response time

• e.g., a lunch buffet – assume 5 entrees

– each person takes 2 minutes at every entree

• throughput is 1 person every 2 minutes

• BUT time to fill up tray is 10 minutes

– Why? Otherwise, what would the throughput be?

• because there are 5 people (each at 1 entree) simultaneously;

• if there is no such overlap throughput = 1/10

ECE437, S’21 © Vijaykumar and Thottethodi (6)

What is Performance for us?

• For computer architects

– CPU execution time = time spent running a program

• Shorter time better but people like better to be bigger to match intuition

– performance = 1/time (shorter time better perf.)

– where time = response time, CPU run time, etc.

• Elapsed time = CPU execution time + I/O wait

• We will concentrate mostly on CPU execution time

ECE437, S’21 © Vijaykumar and Thottethodi (7)

Improve Performance

• Improve (a) response time or (b) throughput?

– faster CPU

• both (a) and (b)

– Add more CPUs

• (b) but (a) may be improved due to reduced queueing

ECE437, S’21 © Vijaykumar and Thottethodi (8)

Give an example of this phenomenon

2

Performance Comparison

• Machine A is n times faster than machine B iff

– perf(A)/perf(B) = time(B)/time(A) = n

• Machine A is x% faster than machine B iff – perf(A)/perf(B) = time(B)/time(A) = 1 + x/100

• E.g., A 10s, B 15s

– 15/10 = 1.5 => A is 1.5 times faster than B

– 15/10=1+50/100=>Ais50%fasterthanB

ECE437, S’21 © Vijaykumar and Thottethodi (9)

Breaking down Performance

• A program is broken into instructions

– H/W is aware of instructions, not programs

• At lower level, H/W breaks instructions into cycles

– lower level state machines change state every cycle

• E.g., 4 GHz Opteron runs 4 B cycles/sec, 1 cycle = 0.25 ns

ECE437, S’21 © Vijaykumar and Thottethodi (10)

Iron law

• Time/program = instrs/program x cycles/instr x sec/cycle

– NEVER forget this!

• sec/cycle (a.k.a. cycle time, clock time)

– mostly determined by technology and CPU orgn. • cycles/instr (a.k.a. CPI)

– mostly determined by ISA and CPU organization – EFFECTIVE cycles/instr and NOT actual latency – overlap among instructions makes this smaller

• Each instr 5 cycles but 5 instrs overlap CPI = 1

– AVERAGE over instrs (instrs have different CPI)

• instr/program (a.k.a. instruction count)

– instrs executed, NOT static code

– mostly determined by program, compiler, ISA

ECE437, S’21 © Vijaykumar and Thottethodi (11)

Our Goal

• Minimize time which is the product, NOT isolated terms

– Tempting because of the separate terms

– Optimizing one term may worsen time by worsening other terms!

• Common error to miss terms while devising optimizations

– E.g., ISA change to decrease instr. count – BUT leads to slower clock

ECE437, S’21 © Vijaykumar and Thottethodi (12)

3

Iron Law Example

• Machine A: clock 1 ns, CPI 2.0, for a program

• Machine B: clock 2 ns, CPI 1.2, for same program

• Which is faster and how much?

• Time/program = instrs/program x cycles/instr x sec/cycle

– Time(A):Nx2.0x1=2N

– Time(B):Nx1.2×2=2.4N

• Compare: Time(B)/Time(A) = 2.4N/2N = 1.2

• So, Machine A is 20% faster than Machine B for this program

ECE437, S’21 © Vijaykumar and Thottethodi (13)

Iron Law Example

• KeepclockofAat1nsandclockofBat 2 ns

• For equal performance, if CPI of B is 1.2, what is A’s CPI?

– Time(B)/Time(A) = 1 = (N x 2 x 1.2)/(N x 1 x CPI(A))

– CPI(A) = 2.4

ECE437, S’21 © Vijaykumar and Thottethodi (14)

Iron Law Example

• LetCPIofA= 2.0andCPIofB=1.2

• For equal performance, if clock of B is 2 ns, what is A’s clock?

ECE437, S’21 © Vijaykumar and Thottethodi (15)

Iron Law Example

• LetCPIofA=2.0andCPIofB=1.2

• For equal performance, if clock of B is 2 ns, what is A’s clock?

– Time(B)/Time(A) = 1 = (N x 1.2 x 2)/(N x 2.0 x clock(A))

– clock(A) = 1.2 ns

ECE437, S’21 © Vijaykumar and Thottethodi (16)

4

Iron Law notes

• You will see ch4a (single cycle) in the lab

• But not Ch1b (Iron law) so spend an hour to think about and internalize Ch1b

ECE437, S’21 © Vijaykumar and Thottethodi (17)

Other Metrics

• Other than execution time

• MIPS and MFLOPS – Bogus!

• MIPS = millions of instructions per sec

= instruction count/(execution time x 106) – Not MIPS, the instruction set

= clock rate/(CPI x 106) (How?) • BUT MIPS has problems

ECE437, S’21 © Vijaykumar and Thottethodi (18)

Problems with MIPS

• E.g.,withoutFPH/W,anFPopmaytake50single- cycle instrs

• withFPH/Wonlyone2-cycleinstratsameclock

• ThusaddingFPH/W

– CPIincreases(why?)TheFPopgoesfrom50/50to2/1 – butinstrs/progdecreasesmore(why?)eachoftheFPop

reduces from 50 to 1, factor of 50

– totalexecutiontimedecreases

• ForMIPS

– instrs/prog ignored

• MIPSgetsworse!

ECE437, S’21 © Vijaykumar and Thottethodi (19)

Problems with MIPS

• Ignores program

• Usually used to quote peak performance

– ideal conditions => guarantee not to exceed!! • When is MIPS ok?

– same compiler and same ISA

– e.g., same binary running on Xeon Ivy Bridge and

Xeon Broadwell

– why? instrs/prog is constant and may be ignored

ECE437, S’21 © Vijaykumar and Thottethodi (20)

5

Other (Bogus) Metrics

• MFLOPS = FP ops in program/(execution time x 106)

• Assuming FP ops independent of compiler and ISA

– Assumption not true

– Eg may not have divide instruction in ISA

– optimizing compilers can remove some insts

• Relative MIPS and normalized MFLOPS – adds to confusion!

ECE437, S’21 © Vijaykumar and Thottethodi (21)

Rules

• Use ONLY Time

– Beware when reading, especially if details are omitted

– Beware of Peak

• Peak is the performance that the chip maker guarantees not to exceed – meaningless!

ECE437, S’21 © Vijaykumar and Thottethodi (22)

Outline

• Time and performance • Iron law

• MIPS and MFLOPS

• Which programs

• How to average • Amdahl’s law

ECE437, S’21 © Vijaykumar and Thottethodi (23)

Which Programs

• Execution time of what

• Best case – you run the same set of programs everyday

– port them and time the whole “workload” • In reality, use benchmarks

– programs chosen to measure performance

– predict performance of actual workload (hopefully) – saves effort and money

– representative? honest?

ECE437, S’21 © Vijaykumar and Thottethodi (24)

6

Benchmarks: SPEC2006

• SPEC: System Performance Evaluation Cooperative

• Latest is SPEC2006, before SPEC89, SPEC92, SPEC95, SPEC2000

• 12 integer and 17 floating point programs

– normalize run time with a Gold Standard processor (SPEC ratio)

– Geometric Mean (GM) of the normalized times (why GM?)

ECE437, S’21 © Vijaykumar and Thottethodi (25)

One-minute quiz

• State the Iron Law – Time per program =

• Previous quiz

– What is the ALU operation for loads and stores?

• Add

ECE437, S’21 © Vijaykumar and Thottethodi (26)

SPEC CINT2006 http://www.spec.org/cpu2006/CINT2006/

Benchmark

Description

Xalancbmk

XML processing

astar

Path finding

mcf

Combinatorial optimization

gcc

GNU C compiler

Omnetpp

Discrete event simulation

h264ref

Video compression

libquantum

Quantum computing

gobmk

Artificial Intelligence: Go

hmmer

Gene Sequence Search

Bzip2

Compression

sjeng

Artificial Intelligence: chess

Perlbench

PERL programming language

ECE437, S’21 © Vijaykumar and Thottethodi (27)

SPEC CFP2006 http://www.spec.org/cpu2006/CFP2006/

Benchmark

Description

milc

Quantum chromodynamics

Bwaves

Fluid dynamics

Soplex

Simplex LP solver

Wrf

Weather modeling

17 in all, see webpage

Speech recognition, quantum chemistry, structural mechanics, ray tracing, etc.

ECE437, S’21 © Vijaykumar and Thottethodi (28)

7

How to average

• SPEC has 29 programs – how to average? • Example

• One answer: total execution time, then how much faster than A is B? 1001/110 = 9.1

ECE437, S’21 © Vijaykumar and (29) Thottethodi

Machine A

Machine B

Program 1

1s

10s

Program 2

1000s

100s

Total

1001s

110s

How to average

• Another:arithmeticmean(sameresult:B9.1times

faster than A )

• Arithmetic mean of times: time

programs

i1

i

• AM(A)=1001/2=500.5

• AM(B)=110/2=55

• 500.5/55=9.1

• Validonlyifprogramsrunequallyoften,elseuse

“weight” factors

• Weighted arithmetic mean:

ECE437, S’21 © Vijaykumar and (30) Thottethodi

i1

n

/ n

for n

n

weight time / n

ii

Other Averages • E.g., 30 mph for first 10 miles

• 90 mph for next 10 miles. Average speed?

• Average speed = (30+90)/2 =60mph? WRONG

• Average speed = total distance / total time = (20 / (10/30+10/90))

= 45 mph

• What if it was 10 hours at each speed?

– instead of 10 miles

ECE437, S’21 © Vijaykumar and Thottethodi (31)

Harmonic Mean

• Harmonic mean of rates =

1 n1

rate i1 i n

– Use HM if forced to start and end with rates • Trick to do arithmetic mean of times but

using rates and not times

ECE437, S’21 © Vijaykumar and (32) Thottethodi

8

Practice

• Machine A runs 10M instructions at 15 MIPS and the next 10M instructions at 45 MIPS

– Average MIPS = ??

• Machine A runs for 10 seconds at 15 MIPS and the next 10 seconds at 45 MIPS

– Average MIPS = ??

ECE437, S’21 © Vijaykumar and Thottethodi (33)

Dealing with Ratios

• Absolute times

• Now consider ratios (w.r.t. A)

• Averages: A = 1, B = 5.05 ECE437, S’21 © Vijaykumar and Thottethodi (34)

Machine A

Machine B

Program 1

1s

10s

Program 2

1000s

100s

Machine A

Machine B

Program 1

1

10

Program 2

1

0.1

Dealing with Ratios

• Absolute times

• Now consider ratios (w.r.t. B)

• Averages: A = 5.05, B = 1

• Both cannot be true

ECE437, S’21 © Vijaykumar and Thottethodi (35)

Machine A

Machine B

Program 1

1s

10s

Program 2

1000s

100s

Machine A

Machine B

Program 1

0.1

1

Program 2

10

1

Geometric Mean

• Don’tusearithmeticmeanonratios(normalized numbers)

• Usegeometricmeanforratios

– geometric mean of ratios =

– Use GM if forced to use ratios

• Independentofreferencemachine(mathproperty)

• Intheexample,GMformachineAis1,formachine

B is also 1

• normalizedwithrespecttoeithermachine

ECE437, S’21 © Vijaykumar and (36) Thottethodi

n

n ratio

i i1

9

But..

• Geometric mean of ratios is not proportional to total time

• AM in example says machine B is 9.1 times faster

• GM says they are equal

• If we took total execution time, A and B are equal only if

– program 1 is run 100 times more often than program 2

• Generally, GM will mispredict for three or more machines

ECE437, S’21 © Vijaykumar and Thottethodi (37)

Previous Midterm Question

• Machine A runs 9 times faster than machine B when running program-P,

• Machine B runs 4 times faster than machine A when running program Q.

• Which machine is faster (and by what factor) when averaged across both programs?

ECE437, S’21 © Vijaykumar and Thottethodi (38)

Summary for Averages

• Use AM for times

• Use HM if forced to use rates

• Use GM if forced to use ratios

• Better yet, use unnormalized, raw run times to compute performance

ECE437, S’21 © Vijaykumar and Thottethodi (39)

Amdahl’s Law

• Whydoesthecommoncasematterthemost?

• Letanoptimizationspeedffractionoftimebya

factor of s

• assumingthatoldtime=T,whatisthespeedup?

– fisthe“affected”fractionofT – (1-f) is the unaffected fraction

• Speedup = timeold unaffectedold affectedold timenew unaffectednew affectednew

• = (1f)TfT ( 1 f ) T sf T

ECE437, S’21 © Vijaykumar and Thottethodi

1

( 1 f ) sf

(40)

10

Amdahl’s Law Example

• Your boss asks you to improve processor performance

• Two options: What should you do?

– improve the ALU used 95% of time, by 10%

– improve the square-root unit used 5%, by a factor of 10

ECE437, S’21 © Vijaykumar and (41) Thottethodi

f

s

Speedup

95%

1.10

1.094

5%

10

1.047

5%

∞

1.052

Amdahl’s Law: Limit

• Make common case fast because:

10

8

11 6 lim

s 1ff s 1f 4 2 0

ECE437, S’21 © Vijaykumar and (42) Thottethodi

0 0.2 0.4 0.6 0.8 1 f

Amdahl’s Law

• “Make common case fast”

– Scientific heuristic, not religious commandment – Use for intuition, verify with numbers

• 60% can be improved by a factor of 2 – Speedup = 1/(0.4+0.6/2) = 1/0.7

• 40% can be improved by a factor of 8 – Speedup = 1/(0.6+0.4/8) = 1/0.65

• Second option is better

– Less common case, but higher speedup compensates

ECE437, S’21 © Vijaykumar and Thottethodi (43)

Summary of Chapter 1b

• Timeandperformance:

– MachineAntimesfasterthanMachineB – iff Time(B)/Time(A) = n

• IronLaw:Time/prog

– InstrcountxCPIxCycletime

• OtherMetrics:MIPSandMFLOPS – Bewareofpeakandomitteddetails

• Benchmarks:SPEC2006(int+FP) • Summarizeperformance:

– AMfortime,HMforrate,GMforratio

• Amdahl’s Law: Speedup = 1 common case fast

1ff s

ECE437, S’21 © Vijaykumar and (44) Thottethodi

11

Speedup

Single-cycle Datapath

CPI

Inst. Count

Cycle Time

• Performance Implications

– Minimize all three

– Insts/prog fixed — f(ISA,compiler)

– CPI = 1 : As good as it gets (*)

– Clock cycle time : high, “lw” critical path

ECE437, S’21 © Vijaykumar and Thottethodi (45)

Back to Ch4b

• To improve performance beyond single- cycle datapath

– Now that we understand performance

ECE437, S’21 © Vijaykumar and Thottethodi (46)

12