# CS计算机代考程序代写 x86 mips assembly compiler algorithm Name:_______________________________

Name:_______________________________

ECE 437: Introduction to Digital Computer Design & Prototyping Midterm #1

Your exam should have 14 (fourteen) pages.

Page 13 is intentionally left blank.

Write your name on this page and at least one other page.

Closed book, closed notes.

Switch off and put away cell-phones/pagers/calculators.

Numerical expressions do NOT have to evaluated. The correct expression is

sufficient to get full credit.

Apportion your time carefully.

For multiple choice questions, please choose the most appropriate choice if you

think more than one choice is acceptable.

Numbers in brackets represent points for that question. Points add up to 100.

You may request additional copies of Figure 1. If you use additional copies,

you must unambiguously mark the copy that you want graded. If you submit multiple attempts without marking one copy as your solution, you will get zero credit for that problem.

Good luck.

Prob.

Max.

Score

I

30

II

20

III

25

IV

25

Total

100

Page 1 of 12

I.

(30 points) Multiple Choice Questions (3 points each)

a. Which of the following instructions is not likely to be in a RISC instruction set?

(1) div reg3, reg2, reg1 // reg3 <- reg2/reg1
(2) move reg1, reg2 // move contents of reg1 to reg2
(3) max reg2, reg1, imm-n // find the maximum of “imm-n” integers beginning at
address contained in reg1 and copy it to reg2.
(4) add reg3, reg1, reg2 // place sum of reg1 and reg2 in reg3
(5) Choices (3) and (4) are both unlikely to be in a RISC instruction set.
b. What happens if the register file reads and writes happen on the same clock edge. (In the lecture we assumed that writes happen in the first half-clock and the values can be read in the second half.)
(1) It is a structural hazard that increases CPI.
(2) It necessitates changes in forwarding logic but does not affect performance. (3) It introduces true data-hazards which require stalling.
(4) Either (2) or (3) could occur.
(5) None of the above.
c. Which of the following is not correct?
(1) The key benefit of pipelining is that it offers a huge improvement in clock cycle
with a modest loss in CPI.
(2) The control signal generation in a pipelined processor is identical to that of a
single-cycled processor.
(3) The x86 ISA is RISC-like ISA.
(4) If delay slots cannot be filled with useful work, it is perfectly okay to fill them with
NOOPs.
(5) Harmonic means should be used to average rates.
d. What is the CPI on a pipelined processor if 30% of instructions are load instructions and 50% of loads are followed by an immediate use. (Assume an implementation like the one shown in Fig 2 and assume that there are no other RAW hazards.) Tricky!
(1) 1.05
(2) 1.10
(3) 1.15
(4) Cannot be determined without additional information (5) None of the above
e. Choose the correct expression for overall speedup when applying three non- overlapping fractions f1, f2 and f3 when each fraction fi achieves a speedup of si. (1) 1/( (1-f1-f2-f3) + (f1/s1+f2/s2+f3/s3) )
(2) (f1+f2+f3)/(s1+s2+s2)
(3) (f1/s1 + f2/s2 + f3/s3)
(4) 1/(1–f1 +f1/s1)+1/(1–f2 +f2/s2)+1/(1–f3 +f3/s3) (5) None of the above
Name:_______________________________
Page 2 of 12
Name:_______________________________
f. Characterize the following statements as TRUE or FALSE.
(1) A four stage pipeline with the work distributed in 25%:30%:20%:25% proportions
is 20% faster than an evenly divided four-stage pipeline. (Assuming no pipeline
hazards.)
(2) The “jal” instruction in the MIPS instruction set requires the Register Write Enable
control signal to be asserted
(3) An optimization that makes 75% of execution 3 times faster is better than an
optimization that makes 25% of execution 10 times faster.
(4) MFLOPS is a meaningful comparison metric if the same workload is used.
(5) Averaging performance across programs using unweighted arithmetic means
implicitly assumes that all programs are equally frequent.
g. If machine A is p times, q times and r times as fast as machine B on three different workloads (say P,Q, and R), on average machine A is faster than machine B by:
(1) HM(p,q,r)
(2) (p+q+r)/3
(3) GM(p,q,r)
(4) AM(p,q,r)
(5) Options (2) and (4) are both correct.
h. In general, increasing the pipeline depth increases
(1) The performance penalty of branch misprediction
(2) Probability of data hazards
(3) Clockspeed
(4) All of the above
(5) None of the above
i. Why is the stack necessary?
(1) To enable function calls within function calls.
(2) To enable floating point operations.
(3) It is not really necessary. It is a software convention that can be redesigned to
use FIFO queues.
(4) It is necessary only for specific algorithms that require a stack data-structure.
(5) None of the above
j. Which of the three terms in the execution time expression do compiler optimizations improve?
(1) CPI
(2) Clock period
(3) Number of Instructions
(4) More than one of the above
(5) None of the above
Page 3 of 12
II.
(20 points) Short Questions (5 points each. ) There are two pairs of related questions (a and b are related. c and d are related.) So it may help to read both before you attempt either.
a. Consider a pipeline with branch resolution in the ID stage as shown in Fig 2. Though the comparator input values are shown as coming from the register file, they may need to be forwarded/bypassed from the pipeline registers of older instructions that have not yet written their results back to the register file. Provide logic equations to handle bypassing to the ID-stage branch resolution for cases that do not require stalling
b. For the same branch resolution hardware, provide logic equations to detect hazards that do require a stall. (Your solution must also identify cases which require stalling.)
Name:_______________________________
Page 4 of 12
c. (5 points) Consider two pipelined processor implementations. One (say, implementation A) resolves branches in the MEM stage. The other (implementation B) resolves branches in the ID stage. Assume that there are no pipeline hazards other than branch-related control flow hazards. Also, assume that there the branch hazards are handled by introducing bubbles (i.e., no branch prediction, no delay slots). Finally, assume that the two implementations operate at the same clock speed. If we assume a workload with 20% branches, which implementation is faster and by what factor?
d. (5 points) Provide an expression for CPI for an arbitrary n-stage pipeline assuming that branches are resolved in the kth (1 < k < n) stage. Assume that there are no other hazards, branches are 20% of all instructions, and branch prediction accuracy is x (0

4. ALUSrc (0 Selects 2nd register, 1 selects sign-extended immediate as second ALU operand)

5. RegWrite (Asserted to enable writes to the register file.)

6. Branch (Asserted if instruction is a “beq” instruction.)

7. MemRead (Asserted to enable data memory read.)

8. MemWrite (Asserted to enable data memory write.)

Name:_______________________________

Page 6 of 12

Name:_______________________________

Fig 1. Single Cycle Datapath

Page 7 of 12

Name:_______________________________

Fig 2. Pipelined Datapath

Page 8 of 12

IV. (25 points) Pipelined Processor Implementation.

This question is used to satisfy the pipeline ABET outcome. To successfully satisfy the ABET outcome, you must score at least 70% on this question.

Consider the code below

outer:

inner:

mov $3, 3

mov $4, 2

subi $4,$4,1

bne $4, $0, inner

subi $3 $3, 1

bne $3, $0, outer

Name:_______________________________

(a) (10 points) Compute the overall branch prediction accuracy for the code shown above assuming a single-bit branch predictor. Assume that the two branches map to different entries in the branch predictor table and that both predictors are initially in the “Predict Taken” state. Your solution must list whether each occurrence of each of the two branches is predicted correctly or not.

# branch 1

# branch 2

Page 9 of 12

Name:_______________________________

(b) (10 points) Now compute the branch prediction accuracy assuming a 2-bit branch predictor. As before, assume that the two branches map to different entries in the branch predictor table and that both predictors are initially in the “Strongly Predict Taken” state. Your solution must list whether each occurrence of each of the two branches is predicted correctly or not.

Page 10 of 12

Name:_______________________________

(c) (5 points) Assume that there are two machines, one with a single bit-predictor and another with a 2-bit predictor. If the branch prediction accuracy computed in parts (a) and (b) are the average prediction accuracies for workloads of interest, which machine is faster and by what factor? (Assume that branch resolution is at the end of MEM stage in both cases.).

Assume the CPI ignoring branches is 1 and every 6th instruction is a branch.

Page 11 of 12

Powers of 2

Name:_______________________________

0

1

1

2

2

4

3

8

4

16

5

32

6

64

7

128

8

256

9

512

10

1,024

11

2,048

12

4,096

13

8,192

14

16,384

15

32,768

16

65,536

17

131,072

18

262,144

19

524,288

20

1,048,576

21

2,097,152

22

4,194,304

23

8,388,608

24

16,777,216

25

33,554,432

26

67,108,864

27

134,217,728

28

268,435,456

29

536,870,912

30

1,073,741,824

31

2,147,483,648

32

4,294,967,296

Glossary:

CPI: Cycles per Instruction

CC: Clock cycletime

IC: Instruction Count

MIPS: (a) Millions of instructions per second (b) The MIPS ISA (depending on context) FSM: Finite State Machine

ALU: Arithmetic Logic Unit

RISC: Reduced Instruction Set Computer

CISC: Complex Instruction Set Computer

OS: Operating System

RTL: Register Transfer Language

Page 12 of 12