Monthly Archives: March 2021

CS计算机代考程序代写 python assignment_5

assignment_5

MSE3114 Computational Methods for Physicists and Materials Engineers

Assignment 5

Q1. (Vector norm) Vector norm is a measure of the “size” of a vector. We have learned three norms: $\ell_1$-norm, $\ell_2$-norm and $\ell_\infty$-norm. In this question we will understand these norms by some examples.

Figure (a) shows the google map of Manhattan of New York City. Consider how to measure the distance between two locations on this map. To simplify the problem, we only consider the simplified map shown in Fig. (b) and measure the distance between “Start” point and “End” point. Suppose that the address of “Start” is $i$-th St $j$-th Ave, the address of “End” is $p$-th St $q$-th Ave, and the size of block is $L_1$ by $L_2$. Then, the coordinate of “Start” is $\mathbf{x}_\text{start} = (iL_1, jL_2)$ and the coordinate of “End” is $\mathbf{x}_\text{end} = (pL_1, qL_2)$. The straight distance between “Start” and “End” is the $\ell_2$-norm: $\lVert \mathbf{x}_\text{end} – \mathbf{x}_\text{start}\rVert_2$; i.e., the blue line in Fig. (b). Write a python code to define a function straight_distance(i, j, p, q, L1, L2), which takes in the values of $i$, $j$, $p$, $q$, $L_1$ and $L_2$ and returns the straight distance.

However, a taxi cab cannot run along the straight line; it must run along the streets and avenues. For the taxi cab, the distance between “Start” and “End” is the $\ell_1$-norm: $\lVert \mathbf{x}_\text{end} – \mathbf{x}_\text{start}\rVert_1$; i.e., the red line in Fig. (a). Define a function taxi_cab_distance(i, j, p, q, L1, L2), which takes in the values of $i$, $j$, $p$, $q$, $L_1$ and $L_2$ and returns the distance for a taxi cab.
If the address of “Start” point is 10th St 4th Ave, the address of “End” point is 3th St 7th Ave, and the size of block is $L_1 = 300$ m and $L_2 = 100$ m. Apply the function straight_distance() defined in Q1.1 to calculate the straight distance. Apply the function taxi_cab_distance() defined in Q1.2 to calculate the distance for a taxi cab.
Overhead crane is used to move an object from one place to another; see Fig. (a). The object can be moved along $\mathbf{e}_1$- and $\mathbf{e}_2$-directions simultaneously with the same velocity $v$. Consider how much time an overhead crane takes to move an object for “Start” point to “End” point in Fig. (b). Suppose that the coordinate of “Start” is $\mathbf{x}_\text{start} = (x_1, x_2)$ and the coordinate of “End” is $\mathbf{x}_\text{end} = (y_1, y_2)$. The time for the overhead crane takes to move an object from $\mathbf{x}_\text{start}$ to $\mathbf{x}_\text{end}$ is $t = \lVert \mathbf{x}_\text{end} – \mathbf{x}_\text{start}\rVert_\infty / v$, i.e., the $\ell_\infty$-norm divided by the velocity $v$. Define a function overhead_crane_time(x1, x2, y1, y2, v), which takes in the coordinates $(x_1, x_2)$ and $(y_1, y_2)$ and the velocity $v$ and returns the time for an overhead crane to move an object from $(x_1, x_2)$ to $(y_1, y_2)$. Apply this function to calculate the time to move an object from $(2.1, 0.3)$ to $(11.3, 31.0)$ (unit: m) with the velocity $v = 0.1$ m/s.

(25 marks)

Q2. (Contraction) If an operator $\mathbf{f}$ is a contraction, there must exist $q \in [0, 1)$ such that
$$
\lVert \mathbf{f}(\mathbf{x}) – \mathbf{f}(\mathbf{y}) \rVert
\le q \lVert \mathbf{x} – \mathbf{y} \rVert
$$
for all $\mathbf{x}, \mathbf{y} \in \mathbb{R}$.
You can choose any norm convenient for your calculation, e.g. $\ell_1$-norm, $\ell_2$-norm, $\ell_\infty$-norm, etc. Check if the following operators are contractions. (You don’t need python code for solving this question.)

Consider the following linear equations:
$$
\mathbf{f}(\mathbf{x})
\equiv \left(\begin{array}{c}
0.1 x_1 + 0.2 x_2 – 10 \\
0.3 x_1 – 0.6 x_2 + 1
\end{array}\right)
= \left(\begin{array}{c}
x_1 \\ x_2
\end{array}\right)
\equiv \mathbf{x}.
$$
Is $\mathbf{f}$ a contraction for $\mathbf{x} \in \mathbb{R}$? Prove your answer.
Hint: (i) you can use $\ell_1$-norm or $\ell_\infty$-norm. (ii) Recall the inequality: $|x + y| \le |x| + |y|$ and $|x – y| \le |x| + |-y| = |x| + |y|$.

Kepler’s equation is
$$
M + e \sin E = E,
$$
where $M$ is the mean anomaly, $E$ is the eccentric anomaly and $e$ is the eccentricity.
Consider the equation $f(E) = E$, where $f(E) \equiv M + e\sin E$.
Is $f$ a contraction for $E \in \mathbb{R}$? Prove your answer.
Hint: (i) Note the relationship:
$$
\sin x – \sin y = 2 \sin\left(\frac{x – y}{2}\right) \cos\left(\frac{x + y}{2}\right).
$$
(ii) Note the inequalities: $|\sin x| \le |x|$ and $|\cos x| \le 1$.
(iii) Finally, you may find that if $f$ is a contraction depends on the value of $e$.

(25 marks)

Q3. (Jacobi and Gauss-Seidel methods) Write a python code for solving a system of linear equations by Jacobi method and Gauss-Seidel method. Written in matrix form, a system of linear equations is expressed as $\mathbf{A}\mathbf{x} = \mathbf{b}$.

Define a function jacobi(A, b, x, eps, max_n), where A is the matrix $\mathbf{A}$, b is the vector $\mathbf{b}$, x is the initial guess of the solution, eps is the criterion for $\ell_2$-norm of residual (i.e., $\lVert \mathbf{x}_\text{current_step} – \mathbf{x}_\text{previous_step}\rVert_2$), and max_n is the maximum iteration steps allowed. This function solves the equations by Jacobi iteration. It returns the solution when $\lVert \mathbf{x}_\text{current_step} – \mathbf{x}_\text{previous_step}\rVert_2 \lt$ eps, or when the iteration step is larger than max_n.

Define a function gauss_seidel(A, b, x, eps, max_n), where A is the matrix $\mathbf{A}$, b is the vector $\mathbf{b}$, x is the initial guess of the solution, eps is the criterion for $\ell_2$-norm of residual (i.e., $\lVert \mathbf{x}_\text{current_step} – \mathbf{x}_\text{previous_step}\rVert_2$), and max_n is the maximum iteration steps allowed. This function solves the equations by Gauss-Seidel iteration. It returns the solution when $\lVert \mathbf{x}_\text{current_step} – \mathbf{x}_\text{previous_step}\rVert_2 \lt$ eps, or when the iteration step is larger than max_n.

Solve the following equations:
$$
\left(\begin{array}{cccc}
20 & 1.3 & -2 \\
1 & -13 & -7 \\
0.1 & 0.7 & 3
\end{array}\right)
\left(\begin{array}{c}
x_1 \\ x_2 \\ x_3
\end{array}\right)
=
\left(\begin{array}{c}
3.3 \\ 10 \\ 52
\end{array}\right)
$$
by three methods: (i) calling the function jacobi(A, b, x, eps, max_n) defined in Q3.1 with the initial guess x = $\mathbf{0}$, eps $=10^{-10}$ and max_n $=1000000$, (ii) calling the function gauss_seidel(A, b, x, eps, max_n) defined in Q3.2 with the initial guess x = $\mathbf{0}$, eps $=10^{-10}$ and max_n $=1000000$, and (iii) applying the direct method in SciPy scipy.linalg.solve().

(25 marks)

Q4. (GMRES and CG) Write a python code to do the following operations.

Define a function create_sparse_rand_mat(N, zero_frac) for creating (i) A: a sparse random symmetric matrix $\mathbf{A}$ of size $N\times N$ with the fraction of non-zero elements zero_frac stored in format of compressed sparse row matrix (CSR), (ii) A_arr: the same matrix $\mathbf{A}$ stored in format of numpy array, and (iii) b: a random matrix $\mathbf{b}$ of size $N\times 1$ by

A = scipy.sparse.random(N, N, zero_frac) # sparse random matrix in format of CSR
A = A.toarray() # convert CSR format to numpy array format
A_arr = A + np.transpose(A) + 4*np.eye(N) # addition of “4*np.eye(N)” guaranttes that A is well-conditioned
A = scipy.sparse.csr_matrix(A_arr) # convert numpy array format to CSR format
b = np.random.rand(N, 1)

Create the matrices $\mathbf{A}$ in both CSR format and array format and $\mathbf{b}$ by calling the function create_sparse_rand_mat(N, zero_frac) defined in Q4.1 with $N = 100$ and zero_frac $=0.005$. Plot the matrix $\mathbf{A}$ thus created by .imshow() in matplotlib.

Loop over the matrix size $N = 100, 150, 200, \cdots, 750, 800$. In each cycle with $N$, create random $\mathbf{A}$ and $\mathbf{b}$ by create_sparse_rand_mat(N, zero_frac) defined in Q4.1 with zero_frac $=0.005$. Count the times spent on runing scipy.sparse.linalg.gmres(A, b, x0, tol, maxiter), scipy.sparse.linalg.cg(A, b, x0, tol, maxiter), and scipy.linalg.solve(A, b). The parameters are x0 $=\mathbf{0}$, tol $=10^{-10}$ and maxiter $= 1000000$. You can learn how to count the time by time.time_ns() (in unit of nanosecond) from Page 13 of the note of lecture 2. For each cycle, append the values of $N$ and the three times mentioned above to four lists.

Plot the computation times spent on three methods vs. $N$. Show the legend to label the meaning of each curve.

(25 marks)

CS计算机代考程序代写 algorithm Midterm practice problems

Midterm practice problems
March 8, 2021
Due date: Just before the exam.
Remember: For any question you answer “I do not know” you get 20% of the grade associated with this question. A totally wrong answer gets 0.
All solutions should be typed.
Exercise 1 (20 points): k-diameters: Given a set X of n points in Rd and an integer k, find a partition of X into k clusters {X1,…,Xk} (that is, 􏰀ki=1Xi = X and Xi ∩Xj = ∅ for all i ̸= j) such that the objective
k
􏰁 diam(Xi) i=1
is minimized. Here diam(Z) = maxx∈Z,y∈Z ∥x − y∥1 is the diameter of the set Z ⊆ X.
1. Design an optimal polynomial-time algorithm for the k-diameters problem when d = 1. The running
time of your algorithm should be linear assuming that your data points are sorted. (10 points)
2. Prove that your algorithm is optimal. (10 points)
Exercise 2 (35 points) Assume you are given d-dimensional data points X = {x1, . . . , xn} and you want to partition them into k clusters, C1 , . . . , Ck , such that if ri is the representative of cluster Ci , then the error function
k E1(C1,…,Ck)=􏰁 􏰁 ||x−ri||1
i=1 x∈Ci
is minimized.
For this exercise, assume a clustering algorithm A that solves the above problem in time O(n2). Ad-
ditionally, assume that A is an α-approximation algorithm; that is it produces solutions with k clusters
C1′,…,Ck′ such that
where E∗ is the cost of the optimal solution.
E 1 ( C 1′ , . . . , C k′ ) ≤ α · E ∗ ,
1. Design a clustering algorithm that uses A as a subroutine and runs in time less than O(n2) and is
polynomial in k (15 points).
2. Show that your algorithm has a constant approximation bound. This approximation factor might
depend on α. Your answer should consist of both the value of the factor as well as a proof (20 points). 1

Exercise 3 (20 points): Consider a set of n data points x1, . . . , xn.
1. Assume a random number generator R() that generates values in the interval (0,1]. Let distance function dR between two points xi and xj with xi ̸= xj be dR(xi,xj) = R(). Also dR(xi,xi) = 0 for every xi. Prove or disprove that dR is a metric. (10 points)
2. Construct a graph G = (V,E), where a point xi is represented by node vi ∈ V. For every pair of nodes vi,vj, there exists an undirected edge in G with weight wij = dR(xi,xj). We define the distance function between two nodes vi and vj, denoted by dG(vi,vj), be the weight of the shortest path between vi and vj in graph G. Prove or disprove that dG is a metric. (15 points)
Exercise 4 (25 points): In class we have defined the Edit Distance between two strings x and y, of length n and m respectively to be the minimum (weighted) number of insertions, deletions and substitutions that transform string x to string y. We also demonstrated that assuming different deletion, insertion and substitution costs for every letter (or pairs of letters), the following dynamic-programming recursion computes the edit distance between x and y:
D(x(1…i−1),y(1…j))+delete(x[i]), 
D(x(1…i),y(1…j))=min D(x(1…i),y(1…j−1))+insert(y[j]),
D(x(1 . . . i − 1), y(1 . . . j − 1)) + substitute(x[i], y[j]).
In the above equation x(1…i) (resp. y(1…j)) is the substring of x (resp. of y) that consists of the first i (resp. j) symbols appearing in x (resp. y). Also, for symbol a, delete(a), insert(a) correspond to the cost of deleting or inserting a respectively. All these costs are greater than 0. Finally, for symbols a, b, substitute(a, b) corresponds to the cost of substituting symbol a with symbol b – the substitution cost is non-zero when a ̸= b and zero when a = b.
1. (15 points:) Prove or disprove that the edit distance function as defined above is a metric.
2. (10 points:) Find two instantiations of the edit-distance function that are metrics. An instantiation of the edit distance function is defined by a specific way of allocating costs to operations such as deletions, insertions and substitutions.
2

CS计算机代考程序代写 Advanced Analytics: Session 2.1 Data Presentation and Visualisation

Advanced Analytics: Session 2.1 Data Presentation and Visualisation

Sports Economics and Analytics:

Assessment Details
Professor Ian McHale

100% of module

Deadline = Friday 19th March 12 noon

Maximum length is 2,500 words

Discuss one of the following:
1

1. Competitive Balance
Competitive balance and uncertainty of outcome are considered central ideas of sports economics. Discuss measures of competitive balance and use one of these measures to analyse the level of, and changes to, competitive balance and uncertainty of outcome in a sports league or leagues of your choice.
2

2. Sports Analytics
Discuss the contribution sports analytics has made to an issue in sports and/or sports business. You may choose from any sport, or several sports. You ay provide a historical overview of the sports analytics landscape detailing key moments and dates in the evolution of sports analytics. Use examples where applicable. You may also comment on where, how and why you think analytics may be used in the future.

3

3. Integrity in sport
Consider the problem of integrity in sport. How are economic ideas and analytics used to identify, and reduce, match fixing in sport? Your essay might include a description of a forecasting model in a sport of your choice. Please consider issues such as: betting market efficiency; characteristics of sports, or individuals within a sport, more at risk of match fixing and corruption; examples of economics and analytics being used in real sports in the fight against match fixing.
4

4. Rating in sports
Rating competitors (teams and players) in sport is a key idea in sports analytics. Discuss why rating is important, its uses, and any advantages over other competitors that can be gained. Describe a ratings model of your choice, for players, or teams, in any sport you wish. You should critique the ratings model, identifying any strengths and weaknesses.
5

Style
Your report should be written in the style of a report for a consultancy.

Typically, this would include sections such as:
Background of the problem (including a motivation for writing on this topic),
Past Work (also known as a literature review),
Methods (presenting any technical know-how used to address the problem)
Results and Conclusions (including how the solution may be improved, how the issue might manifest itself in other sports/industries, and what general lessons can be learned from the report).

You must use citations and referencing.

There are many sources of information – be adventurous.
Google scholar, the library search system, Wikipedia, blogs, etc.
6

Some tips
Do:
Spellcheck (the red wavey lines mean something)
Grammar check (the blue wavey lines mean something!)
Presentation matters: numbered pages, numbered sections (clearly indicated), figures and tables sequentially numbered with captions explaining what is in the table or figure.
Consider the structure before you write: does is make sense to the reader? Is this the best way to get your point across?
Critique the literature (what is good and what is bad?)
Consider your objectives

Don’t:
Waffle – padding out a report is such an obvious trick. 2,500 is a maximum word count, each word, sentence and paragraph should be carefully chosen.
Use unrelated references to make it look like you have read lots.
Print pages and pages of data in an appendix. Offer an email address if the reader would like to obtain a copy of any data.

7

CS计算机代考程序代写 cache algorithm data structure assembly AI x86 concurrency compiler prolog Static Scheduling & VLIW 15-740

Static Scheduling & VLIW 15-740
Prof. Nathan Beckmann
(Original slides by Onur Mutlu, edited by Seth Goldstein)
Carnegie Mellon University

Reprise of dynamic scheduling
n
2

DO WE REALLY NEED ALL THIS COMPLEX HARDWARE?
HOW FAR CAN WE GET WITHOUT IT?
3

Key Questions
Q1. How do we find independent instructions to fetch/execute?
Q2. How do we enable more compiler optimizations?
e.g., common subexpression elimination, constant propagation, dead code elimination, redundancy elimination, …
Q3. How do we increase the instruction fetch rate?
i.e., have the ability to fetch more instructions per cycle
4

Key Questions
Q1. How do we find independent instructions to fetch/execute?
Q2. How do we enable more compiler optimizations?
e.g., common subexpression elimination, constant propagation, dead code elimination, redundancy elimination, …
Q3. How do we increase the instruction fetch rate?
i.e., have the ability to fetch more instructions per cycle
A: Enabling the compiler to optimize across a larger number of instructions that will be executed straight line (without branches getting in the way) eases all of the above
5

Very long instruction word – VLIW
n Compiler does the scheduling statically
n Simple hardware with multiple function units
q Reduced hardware complexity
q Little or no scheduling done in hardware, e.g., in-order q Hopefully, faster clock and less power
n Compiler required to group and schedule instructions (compare to OoO superscalar)
q Predicated instructions to help with scheduling (trace, etc.) q More registers (for software pipelining, etc.)
6

VLIW example – 2-cycle loads
n RISC code MUL R1, R3, 3
LD R4, 0(R1)
ADD R2, R2, R4
SUB R3, R3, 1
BNEZ R3, -4
n VLIW code MUL R1, R3, 3
LD R4, 0(R1)
NOP
ADD R2, R2, R4
SUB R3, R3, 1
NOP
NOP
BNEZ R3, -4
7

Very long instruction word – VLIW
n Compiler does the scheduling statically
n Simple hardware with multiple function units
n Compiler required to group and schedule instructions (compare to OoO superscalar)
n Example machines:
q Multiflow, Cydra 5 (8-16 ops per VLIW)
q IA-64 Itanium (3 ops per bundle, “EPIC” VLIW variant) q TMS32xxxx (5+ ops per VLIW, DSPs)
q Crusoe (4 ops per VLIW, JIT x86 è VLIW)
q Tilera TILEPro (3 ops per VLIW, 100+ cores)
8

Comparison between SS « VLIW
From Mark Smotherman, “Understanding EPIC Architectures and Implementations”

Comparison: CISC, RISC, VLIW
VLIW is a natural extension of RISC ideas to superscalar

VLIW relies on compiler for ILP
n
11

VLIW: Finding Independent Operations
n Within a basic block, there is limited instruction-level parallelism
n To find multiple instructions to be executed in parallel, the compiler needs to consider multiple basic blocks
n Problem: Moving an instruction above a branch is unsafe because instruction is not guaranteed to be executed
n Idea: Enlarge blocks at compile time by finding the frequently-executed paths
q Trace scheduling
q Superblock scheduling q Hyperblock scheduling q Software Pipelining
It’s all about the compiler and how to schedule the instructions to maximize parallelism
12

List Scheduling: For 1 basic block
n Idea:Assignprioritytoeachinstruction
n Initializereadylistthatholdsallreadyinstructions
n Choose one ready instruction I from ready list with the highest priority
n InsertIintoschedule
q Ensuring resources are available (structural hazards)
n Addthoseinstructionswhoseprecedenceconstraintsare now satisfied into the ready list
13

Data Precedence Graph
i1 i2 i5 i6 i7 i10 i11 i12
222222 i3 2 i8 2 i13
2
22
i4
i9
i14
4
4 i15
2 i16
14

Instruction Prioritization Heuristics
n Number of descendants in precedence graph
n Maximum latency from root node of precedence graph n Length of operation latency
n Ranking of paths based on importance
n Some combination of above
15

VLIW List Scheduling
n Assign priorities – critical path in this example
n Compute data ready list (DRL) – all operations whose predecessors
have been scheduled.
n Select from DRL in priority order while checking resource constraints n Add newly ready operations to DRL and repeat for next instruction
5
1
23334
23456
223
789
112
1011 12
1
4-wide VLIW
Data Ready List
1
{1}
6
3
4
5
{2,3,4,5,6}
9
2
7
8
{2,7,8,9}
12
10
11
{10,11,12}
13
{13}
13
16

VLIW List Scheduling
5
1 2233435364
4-wide VLIW
2
2
3
2
1
13
789
11
1011 12
17

VLIW List Scheduling
5
1 2233435364
4-wide VLIW
1
6
3
4
5
9
2
7
8
12
10
11
13
2
2
3
2
1
13
789
11
1011 12
18

VLIW List Scheduling+Structural Hazards
5
1 2233435364
4-wide VLIW
2
2
3
2
1011 12
1
13
789 11
19

VLIW List Scheduling+Structural Hazards
5
1 2233435364
4-wide VLIW
1
2
3
6
5
9
4
7
8
12
10
11
13
2
2
3
2
1011 12
1
13
789 11
20

Extending the scheduling domain
n Basic block is too small to get any real parallelism q Recall: 88% of OOO benefit from speculation è
larger scheduling window
n How to extend the basic block?
q Why do we have basic blocks in the first place?
q Loops
n Loop unrolling
n Software pipelining
q Non-loops
n Will almost always involve some speculation n Thus profiling may be very important
[MTZ’13]
21

Safety and Legality in Code Motion
n Two characteristics of speculative code motion:
q Safety: whether or not spurious exceptions may occur q Legality: whether or not result will be always correct
n Four possible types of code motion:
r4 = r1 …
r1 = …
r1 = r2 & r3
r1 = r2 & r3
(a) safe and legal
(b) illegal
r1 = …
r1 = load A
r4 = r1 …
r1 = load A
(c) unsafe
(d) unsafe and illegal
22

Code Movement Constraints
n Downward
q When moving an operation from a BB to one of its dest BB’s,
n all the other dest basic blocks should still be able to use the result of the operation
n the other source BB’s of the dest BB should not be disturbed n Upward
q When moving an operation from a BB to its source BB’s n register values required by the other dest BB’s must not be
destroyed
n the movement must not cause new exceptions
23

Trace Scheduling
n Trace: A frequently executed path in the control-flow graph (has multiple side entrances and multiple side exits)
n Idea: Find independent operations within a trace to pack into VLIW instructions.
q Traces determined via profiling
q Compiler adds fix-up code for correctness (if a side entrance or side exit of a trace is exercised at runtime, corresponding fix-up code is executed)
24

Trace Scheduling Idea
25

Trace Scheduling (II)
n There may be conditional branches from the middle of the trace (side exits) and transitions from other traces into the middle of the trace (side entrances).
n These control-flow transitions are ignored during trace scheduling.
n After scheduling, fix-up/bookkeeping code is inserted to ensure the correct execution of off-trace code.
n Fisher, “Trace scheduling: A technique for global microcode compaction,” IEEE TC 1981.
26

Trace Scheduling (III)
Instr 1
Instr 2
Instr 3
Instr 4
Instr 5
Instr 2
Instr 3
Instr 4
Instr 1
Instr 5
What bookkeeping is required when Instr 1
is moved below the side entrance in the trace?
27

Trace Scheduling (IV)
Instr 3
Instr 4
Instr 1
Instr 2
Instr 3
Instr 4
Instr 5
Instr 2
Instr 3
Instr 4
Instr 1
Instr 5
28

Trace Scheduling (V)
Instr 1
Instr 2
Instr 3
Instr 4
Instr 5
Instr 1
Instr 5
Instr 2
Instr 3
Instr 4
What bookkeeping is required when Instr 5 moves above the side entrance in the trace?
29

Trace Scheduling (VI)
Instr 5
Instr 1
Instr 2
Instr 3
Instr 4
Instr 5
Instr 1
Instr 5
Instr 2
Instr 3
Instr 4
30

Trace Scheduling Fixup Code Issues
n Sometimes need to copy instructions more than once to ensure correctness on all paths (see C below)
Original trace
A
B X C
D A’ B’ C’ Y Scheduled B
trace E DYA
C’’’ E’’ D’’
EC
BX Correctness C
B’’ X
DY
31

Trace Scheduling Overview
n Trace Selection
q select seed block (the highest frequency basic block) q extend trace (along the highest frequency edges)
forward (successor of the last block of the trace)
backward (predecessor of the first block of the trace) q don’t cross loop back edge
q bound max_trace_length heuristically
n Trace Scheduling
q build data precedence graph for a whole trace
q perform list scheduling and allocate registers
q add compensation code to maintain semantic correctness
n Speculative Code Motion (upward)
q move an instruction above a branch if safe
32

Trace Scheduling Example (I)
B1
9 stalls
B3
1 stall
fdiv f1, f2, f3 fadd f4, f1, f5 beq r1, $0
fdiv f1, f2, f3 fadd f4, f1, f5
beq r1, $0
990
ld r2, 0(r3)
990
800
800
10
r2 and f2 not live out
B3
f2 not live out
B6
B2
ld r2, 4(r3)
10
ld r2, 0(r3)
add r2, r2, 4 beq r2, $0
fsub f2, f2, f6 st.d f2, 0(r8)
add r3, r3, 4 add r8, r8, 4
add r2, r2, 4 beq r2, $0
B4
B5
200
fsub f2, f3, f7
200
B7
B6
1 stall
fsub f2, f2, f6 st.d f2, 0(r8)
add r3, r3, 4 add r8, r8, 4
33

Trace Scheduling Example (I)
9 stalls
fdiv f1, f2, f3 fadd f4, f1, f5 beq r1, $0
ld r2, 0(r3)
add r2, r2, 4 beq r2, $0
fsub f2, f2, f6 st.d f2, 0(r8)
B6
fdiv f1, f2, f3 beq r1, $0
ld r2, 0(r3) fsub f2, f2, f6
add r2, r2, 4 beq r2, $0
st.d f2, 0(r8)
add r3, r3, 4 add r8, r8, 4 fadd f4, f1, f5
1 stall
r2 and f2 not live out
B3 0 stall 0 stall
f2 not live out
1 stall
1 stall
add r3, r3, 4 add r8, r8, 4
B3 B6
34

Trace Scheduling Example (II)
fdiv f1, f2, f3 beq r1, $0
ld r2, 0(r3) fsub f2, f2, f6
add r2, r2, 4 beq r2, $0
st.d f2, 0(r8)
add r3, r3, 4
add r8,r8,4
fadd f4, f1, f5 fadd f4, f1, f5
B6
0 stall 0 stall
1 stall
fdiv f1, f2, f3 beq r1, $0
ld r2, 0(r3) fsub f2, f2, f6
add r2, r2, 4 beq r2, $0
st.d f2, 0(r8)
fadd f4, f1, f5
Split comp. code
fadd f4, f1, f5
add r3, r3, 4
B3 add r8,r8,4 B3
B6
35

Trace Scheduling Example (III)
fdiv f1, f2, f3 beq r1, $0
ld r2, 0(r3) fsub f2, f2, f6 add r2, r2, 4 beq r2, $0
st.d f2, 0(r8)
fadd f4, f1, f5
Split comp. code
fadd f4, f1, f5
add r3, r3, 4 add r8, r8, 4 fadd f4, f1, f5
B3
B6
add r3, r3, 4 add r8, r8, 4
Join comp. code
36

Trace Scheduling Example (IV)
fdiv f1, f2, f3 beq r1, $0
ld r2, 0(r3) fsub f2, f2, f6 add r2, r2, 4 beq r2, $0
st.d f2, 0(r8)
add r3, r3, 4 add r8, r8, 4 fadd f4, f1, f5
fadd f4, f1, f5
Split comp. code
fadd f4, f1, f5
B3
add r2, r2, 4 beq r2, $0
fsub f2, f2, f6 st.d f2, 0(r8)
add r3, r3, 4 add r8, r8, 4
B6
Copied split instructions
add r3, r3, 4 add r8, r8, 4
Join comp. code
37

Trace Scheduling Example (V)
ld r2, 0(r3) fsub f2, f2, f6 add r2, r2, 4
beq r2, $0
fdiv f1, f2, f3 beq r1, $0
fadd f4, f1, f5
ld r2, 4(r3)
add r2, r2, 4 beq r2, $0
B3
st.d f2, 0(r8) add r3, r3, 4 add r8, r8, 4 fadd f4, f1, f5
fadd f4, f1, f5
B6
fsub f2, f2, f6 st.d f2, 0(r8)
add r3, r3, 4 add r8, r8, 4
fsub f2, f3, f7
add r3, r3, 4 add r8, r8, 4
38

Trace Scheduling Tradeoffs
n Advantages
+ Enables the finding of more independent instructions à fewer
NOPs in a VLIW instruction
n Disadvantages
— Profile dependent
— What if dynamic path deviates from trace à lots of NOPs in the VLIW instructions
— Code bloat and additional fix-up code executed — Due to side entrances and side exits
— Infrequent paths interfere with the frequent path — Effectiveness depends on the bias of branches
— Unbiased branches à smaller traces à less opportunity for finding independent instructions
39

Superblock Scheduling
n Trace: multiple entry, multiple exit block
n Superblock: single-entry, multiple exit block
q A trace with side entrances are eliminated
q Infrequent paths do not interfere with the frequent path
+ More optimization/scheduling opportunity than traces
+ Eliminates “difficult” bookkeeping due to side entrances
Hwu+, “The Superblock: An Effective Technique for VLIW and superscalar compilation,” J of SC 1991.
40

Superblock example
opA: mul r1,r2,3
99
opC: mul r3,r2,3
opA: mul r1,r2,3
99 opB: add r2,r2,1
11
opB: add r2,r2,1
opC’: mul r3,r2,3
opC: mul r3,r2,3
1
Original Code
Code After Superblock Formation
opA: mul r1,r2,3
99
opC: mov r3,r1
1
opB: add r2,r2,1
opC’: mul r3,r2,3
Code After Common Subexpression Elimination
41

Superblock Scheduling Shortcomings
— Still profile-dependent
— No single frequently executed path if there is an unbiased
branch
— Reduces the size of superblocks
— Code bloat and additional fix-up code executed — Due to side exits
42

Hyperblock Scheduling
n Idea: Use predication support to eliminate unbiased branches and increase the size of superblocks
n Hyperblock: A single-entry, multiple-exit block with internal control flow eliminated using predication (if-conversion)
n Advantages
+ Reduces the effect of unbiased branches on scheduled block size
n Disadvantages
— Requires predicated execution support
— All disadvantages of predicated execution
43

Hyperblock Formation (I)
n Hyperblock formation 1. Block selection
2. Tail duplication
3. If-conversion
10
90 80
80 20
20
n Block selection
q Select subset of BBs for inclusion in HB
q Difficult problem 10
q Weighted cost/benefit function n Height overhead
n Resource overhead
n Dependency overhead
n Branch elimination benefit n Weighted by frequency
90 10
10
n Mahlke et al., “Effective Compiler Support for Predicated Execution Using the Hyperblock,” MICRO 1992.
BB1
BB2
BB3
BB4
BB5
BB6
44

Hyperblock Formation (II)
Tail duplication same as with Superblock formation 10
BB1
80
BB2
80
10
90 10
BB6
10
20
BB1
80
BB2
80
20
BB3
20
90
10
BB3
20
BB4
BB4
10
BB5
BB5
90
BB6
81 9
10 19
BB6’
45

Hyperblock Formation (III)
10
BB1
80
BB2
80
If-convert (predicate) intra-hyperblock branches 10
BB1 p1,p2 = CMPP
BB2 if p1
BB3 if p2
BB4
BB6
BB4
20
BB3
20
90
10
10
BB5
BB6
81 9
9
1
10
81
9
1
BB6’
BB5
BB6’
46

WHAT ABOUT MEMORY?
47

Non-Faulting Loads and Exception Propagation
ld.s r1=[a]
inst 1 inst 2 …. br
inst 1 inst 2 ….
br
unsafe code
motion
….
ld r1=[a]
use=r1
n ld.s fetches speculatively from memory
i.e. any exception due to ld.s is suppressed
n If ld.s r1 did not cause an exception then chk.s r1 is a NOP, else a branch is taken (to execute some compensation code)
….
chk.s r1
use=r1
ld r1=[a]
48

Non-Faulting Loads and Exception Propagation in IA-64
ld.s r1=[a]
inst 1 inst 2 use=r1 ….
br
inst 1 inst 2 ….
br br
unsafe code
motion
….
ld r1=[a] use=r1
n Load data can be speculatively consumed prior to check
n “speculation” status is propagated with speculated data
n Any instruction that uses a speculative result also becomes speculative itself (i.e. suppressed exceptions)
n chk.s checks the entire dataflow sequence for exceptions
….
chk.s use
ld r1=[a] use=r1
49

Aggressive ST-LD Reordering in IA-64
inst 1
inst 2
….
st [?]
st[?] ….
ld r1=[x]
use=r1
ld.a r1=[x]
inst 1
inst 2
….
st [?]
….
ld.c r1=[x] use=r1
potential aliasing
n ld.a starts the monitoring of any store to the same address as the advanced load
n If no aliasing has occurred since ld.a, ld.c is a NOP n If aliasing has occurred, ld.c re-loads from memory
50

Aggressive ST-LD Reordering in IA-64
inst 1
inst 2
….
st [?]
st[?] ….
ld r1=[x] use=r1
ld.a r1=[x]
inst 1 inst 2 use=r1 ….
st [?] …. chk.a X ….
potential aliasing
ld r1=[a] use=r1
51

Can We Do Better?
n Hyperblock still
q Profile dependent
q Requires fix-up code
q And, requires predication support
n Single-entry, single-exit enlarged blocks
q Block-structured ISA
n Optimizes multiple paths (can use predication to enlarge blocks) n No need for fix-up code (duplication instead of fixup)
52

Block Structured ISA
n Blocks (> instructions) are atomic (all-or-none) operations q Either all of the block is committed or none of it
n Compiler enlarges blocks by combining basic blocks with their control flow successors
q Branches within the enlarged block converted to “fault” operations à if the fault operation evaluates to true, the block is discarded and the target of fault is fetched
Melvin and Patt, “Enhancing Instruction Scheduling with a Block-Structured ISA,” IJPP 1995.
53

Block Structured ISA (II)
n Advantages:
+ Larger atomic blocks à larger units can be fetched from I-cache
+ Aggressive compiler optimizations (e.g. reordering) can be enabled within atomic blocks (no side entries or exits)
+ Can explicitly represent dependencies among operations within an enlarged block
n Disadvantages:
— “Fault operations” can lead to work to be wasted (atomicity)
— Code bloat (multiple copies of the same basic block exists in the binary and possibly in I-cache)
— Need to predict which enlarged block comes next n Optimizations
q Within an enlarged block, the compiler can perform optimizations that cannot normally/easily be performed across basic blocks
54

Block Structured ISA (III)
n Hao et al., “Increasing the instruction fetch rate via block- structured instruction set architectures,” MICRO 1996.
55

Superblock vs. BS-ISA
n Superblock
q Single-entry, multiple exit code block
q Not atomic
q Compiler inserts fix-up code on superblock side exit
n BS-ISA blocks
q Single-entry, single exit
q Atomic
q Need to roll back to the beginning of the block on fault
56

Superblock vs. BS-ISA
n Superblock
+ No ISA support needed
— Optimizes for only 1 frequently executed path
— Not good if dynamic path deviates from profiled path à missed opportunity to optimize another path
n Block Structured ISA
+ Enables optimization of multiple paths and their dynamic selection.
+ Dynamic prediction to choose the next enlarged block. Can dynamically adapt to changes in frequently executed paths at run- time
+ Atomicity can enable more aggressive code optimization — Code bloat becomes severe as more blocks are combined
— Requires “next enlarged block” prediction, ISA+HW support — More wasted work on “fault” due to atomicity requirement
57

Summary and Questions
n Trace, superblock, hyperblock, block-structured ISA
n How many entries, how many exits does each of them have?
q What are the corresponding benefits and downsides?
n What are the common benefits?
q Enable and enlarge the scope of code optimizations q Reduce fetch breaks; increase fetch rate
n What are the common downsides?
q Code bloat (code size increase)
q Wasted work if control flow deviates from enlarged block’s path
58

VLIW Summary
n Heavy reliance on compiler (push RISC to the extreme) q Compiler algorithms (e.g., software pipelining) have lasting
impact outside of VLIW
n Is there enough statically knowable parallelism?
q E.g., memory aliasing and branch bias n What about wasted FUs? Code bloat?
q Code size is already a big problem with x86 apps!
n Architecture joke: “VLIW is the architecture of the future,
and always will be.”
n Yet many DSPs are VLIW. Why?
59

WHAT ABOUT LOOPS?
60

The problem with loops
n Consider the following code:
for (int i = 0; i < N; i++) { b[i] = b[i] * b[i]; } 61 The problem with loops n Consider the following code: for (int i = 0; i < N; i++) { b[i] = b[i] * b[i]; } n RISC assembly (LD & MUL 2-cycles) LOOP: Useful work Loop overhead LD R1, 0(R3) MUL R2, R1, R1 ST R2, 0(R3) ADD R3, R3, 4 BLT R3, R4, LOOP è 7 cycles per iteration 62 The problem with loops n Consider the following code: for (int i = 0; i < N; i++) { b[i] = b[i] * b[i]; } n VLIW assembly (1 ALU, 1 LD/ST; 2-cycles) LOOP: è5 cycles per iteration NOP NOP LD R1, 0(R3) NOP NOP NOP Useful work MUL R2, R1, R1 ADD R,3 R,3 4 BLT R3, R4, LOOP ST R2, -4(R3) Loop overhead 63 Amortize overheads by unrolling loops n Key idea is to schedule the following code instead: for (int i = 0; i < N; i+=4) { b[i+0] = b[i+0] * b[i+0]; b[i+1] = b[i+1] * b[i+1]; b[i+2] = b[i+2] * b[i+2]; b[i+3] = b[i+3] * b[i+3]; } Loop unrolling è Larger scheduling block è Better schedule 64 Amortize overheads by unrolling loops n VLIW assembly LOOP: NOP NOP è 2 cycles per iteration LD R1, 0(R9) LD R3, 4(R9) LD R5, 8(R9) LD R7, 12(R9) ST R2, 0(R9) ST R4, 4(R9) ST R6, 8(R9) ST R8, -4(R9) MUL R2, R1, R1 MUL R4, R3, R3 MUL R6, R5, R5 MUL R8, R7, R7 ADD R9, R9, 16 BLT R9, R10 LOOP Loop overhead Useful work 65 Correctness of loop unrolling n Is this transformation legal? for (int i = 0; i < N; i++) { b[i] = b[i] * b[i]; } for (int i = 0; i < N; i+=4) { b[i+0] = b[i+0] * b[i+0]; b[i+1] = b[i+1] * b[i+1]; b[i+2] = b[i+2] * b[i+2]; b[i+3] = b[i+3] * b[i+3]; } 66 Correctness of loop unrolling n Instead, schedule the following code: int i; for (i = 0; i+3 < N; i+=4) { b[i+0] = b[i+0] * b[i+0]; b[i+1] = b[i+1] * b[i+1]; b[i+2] = b[i+2] * b[i+2]; b[i+3] = b[i+3] * b[i+3]; } for (; i < N; i++) { b[i] = b[i] * b[i]; } 67 Loop unrolling summary n Advantages q Reduces loop overhead q Improves code schedule within loop n (eg, hiding MUL latency in example) n Disadvantages q Increases code size q Less effective on loops with internal branches n Can use predication … more on this later 68 Loop Unrolling n Idea: Replicate loop body multiple times within an iteration + Reduces loop maintenance overhead q Induction variable increment or loop condition test + Enlarges basic block (and analysis scope) q Enables code optimization and scheduling opportunities — What if iteration count not a multiple of unroll factor? (need extra code to detect this) — Increases code size 69 Can we do better? n VLIW assembly (ALU + LD/ST) LOOP: NOP NOP LD R1, 0(R9) LD R3, 4(R9) LD R5, 8(R9) LD R7, 12(R9) ST R2, 0(R9) ST R4, 4(R9) ST R6, 8(R9) ST R8, -4(R9) MUL R2, R1, R1 MUL R4, R3, R3 MUL R6, R5, R5 MUL R8, R7, R7 ADD R9, R9, 16 BLT R9, R10 LOOP è 2 cycles per iteration Loop overhead Useful work Not in this case – LD/ST unit is at 100% utilization 70 Can we do better? n VLIW assembly (3-wide) LOOP: NOP NOP NOP NOP NOP LD R1, 0(R9) LD R3, 4(R9) LD R5, 8(R9) LD R7, 12(R9) MUL R2, R1, R1 MUL R4, R3, R3 MUL R6, R5, R5 MUL R8, R7, R7 ST R2, 0(R9) ST R4, 4(R9) ST R6, 8(R9) ST R8, 12(R9) NOP ADD R9, R9, 16 BLT R9, R10, LOOP NOP Loop overhead è 7/4 cycles per iteration Useful work 71 Can we do better? n VLIW assembly (3-wide) LOOP: NOP NOP NOP NOP NOP LD R1, 0(R31) LD R3, 4(R31) LD R5, 8(R31) LD R7, 12(R31) LD R9, 12(R31) LD R11, 12(R31) LD R13, 12(R31) LD R15, 12(R31) MUL R2, R1, R1 MUL R4, R3, R3 MUL R6, R3, R3 MUL R8, R3, R3 MUL R10, R3, R3 MUL R12, R3, R3 MUL R14, R5, R5 MUL R16, R7, R7 ST R2, 0(R31) ST R4, 4(R31) ST R6, 8(R31) ST R8, 12(R31) ST R10, 16(R31) ST R12, 20(R31) ST R14, 24(R31) ST R16, -4(R31) NOP ADD R31, R31, 32 BLT R31, R10, LOOP NOP è 11/8 cycles per iteration Loop overhead Useful work …but code & register bloat 72 Can we do better than unrolling? n VLIW assembly (3-wide) LOOP: NOP NOP Ramp-up & ramp-down overhead NOP Loop overhead Useful work NOP NOP NOP LD R1, 0(R31) LD R3, 4(R31) LD R5, 8(R31) LD R7, 12(R31) LD R9, 12(R31) LD R11, 12(R31) LD R13, 12(R31) LD R15, 12(R31) MUL R2, R1, R1 MUL R4, R3, R3 MUL R6, R3, R3 MUL R8, R3, R3 MUL R10, R3, R3 MUL R12, R3, R3 MUL R14, R5, R5 MUL R16, R7, R7 ST R2, 0(R31) ST R4, 4(R31) ST R6, 8(R31) ST R8, 12(R31) ST R10, 16(R31) ST R12, 20(R31) ST R14, 24(R31) ST R16, -4(R31) NOP ADD R31, R31, 32 BLT R31, R10, LOOP 73 Can we do better than unrolling? n VLIW assembly (3-wide) LOOP: NOP NOP Ramp-up & ramp-down overhead NOP NOP NOP LD R1, 0(R31) LD R3, 4(R31) LD R5, 8(R31) LD R7, 12(R31) LD R9, 12(R31) LD R11, 12(R31) LD R13, 12(R31) LD R15, 12(R31) MUL R2, R1, R1 MUL R4, R3, R3 MUL R6, R3, R3 MUL R8, R3, R3 MUL R10, R3, R3 MUL R12, R3, R3 MUL R14, R5, R5 MUL R16, R7, R7 ST R2, 0(R31) ST R4, 4(R31) ST R6, 8(R31) ST R8, 12(R31) ST R10, 16(R31) ST R12, 20(R31) ST R14, 24(R31) ST R16, -4(R31) NOP ADD R31, R31, 32 BLT R31, R10, LOOP Perfect efficiency NOP Loop overhead Useful work Goal: Maintain peak efficiency, w/out ramp-up/down 74 Software Pipelining n Idea: Move instructions across iterations of the loop q Very large improvements in running time are possible n E.g., 5-wide VLIW LOOP: { LD R1, 0(R9) NOP NOP NOP LD R1, 4(R9) MUL R2, R1, R1 ADD R9, R9, 4 NOP ADD R9, R9, 4 BGE R9, R10, END Current iteration One iteration ago Two iterations ago NOP SUB R10, R10, 4 LD R1, 0(R9) MUL R2, R1, R1 ST R2, -8(R9) ADD R9, R9, 4 BLT R9, R10, LOOP } NOP ST R2, -4(R9) NOP ST R2, 0(R9) END: NOP NOP NOP NOP NOP MUL R2, R1, R1 NOP NOP 15-745 © Seth Copen Goldstein 2000-5 75 Software Pipelining n Idea: Move instructions across iterations of the loop q Very large improvements in running time are possible n E.g., 5-wide VLIW LD R1, 0(R9) NOP NOP NOP LD R1, 4(R9) MUL R2, R1, R1 NOP ADD R9, R9, 4 NOP ADD R9, R9, 4 BGE R9, R10, END è 1 cycle per iteration } NOP ST R2, -4(R9) NOP ST R2, 0(R9) LOOP: Perfect efficiency END: { SUB R10, R10, 4 LD R1, 0(R9) NOP NOP NOP NOP NOP MUL R2, R1, R1 NOP NOP MUL R2, R1, R1 ST R2, -8(R9) ADD R9, R9, 4 BLT R9, R10, LOOP 15-745 © Seth Copen Goldstein 2000-5 76 Goal of SP n Increase distance between dependent operations by moving destination operation to a later iteration A:a¬ ld[d] B:b¬ a*a C: st [d], b D:d¬ d+4 Assume all have latency of 2 A B C D 15-745 © Seth Copen Goldstein 2000-5 77 Can we decrease the latency? n Lets unroll A: a¬ld[d] B: b¬a*a C: st [d], b D: d¬d+4 A1: a ¬ ld [d] B1: b ¬ a * a C1: st [d], b D1: d¬ d+4 A B C D A1 B1 C1 D1 15-745 © Seth Copen Goldstein 2000-5 78 Rename variables A: a¬ ld[d] B: b¬ a*a C: st [d], b D: d1¬d+4 A1: a1 ¬ ld [d1] B1: b1¬ a1*a1 C1: st [d1], b1 D1:d¬ d1+4 A B C D A1 B1 C1 D1 15-745 © Seth Copen Goldstein 2000-5 79 Schedule A: a¬ ld[d] B: b¬ a*a C: st [d], b D: d1¬d+4 A1: a1 ¬ ld [d1] B1: b1¬ a1*a1 C1: st [d1], b1 D1:d¬ d1+4 AD B A1 C B1 D1 C1 A B C D1 D A1 B1 C1 15-745 © Seth Copen Goldstein 2000-5 80 Unroll Some More A: a¬ B: b¬ C: D: d1¬ A1: a1 ¬ B1: b1¬ C1: D1: d2¬ A2: a2 ¬ B2: b2¬ C2: D2: d ¬ ld[d] a*a st [d], b d+4 ld [d1] a1*a1 st [d1], b1 d1+4 ld [d2] a2*a2 st [d2], b2 d2 + 4 AD B C A1 D1 B1 A2 C1 C2 D2 B2 A B C D2 D A1 B1 C1 D1 A2 B2 C2 15-745 © Seth Copen Goldstein 2000-5 81 Unroll Some More A: a ¬ B: b¬ C: D: d1¬ A1: a1 ¬ B1: b1¬ C1: D1: d2¬ A2: a2 ¬ B2: b2¬ C2: D2: d¬ ld [d] a*a st [d], b d+4 ld [d1] a1*a1 st [d1], b1 d1+4 ld [d2] a2*a2 st [d2], b2 d2+4 AD B C A1 D1 B1 A2 D2 C1 B2 A3 C2 B3 C3 A B C D3 D A1 B1 C1 D1 A2 B2 C2 D2 A3 B3 C3 D3 15-745 © Seth Copen Goldstein 2000-5 82 A B C D4 D A1 B1 C1 D1 A2 B2 C2 D2 A3 B3 C3 D3 A4 B4 C4 One More Time AD B C A1 D1 B1 A2 D2 C1 B2 A3 C2 B3 C3 D3 A4 B4 C4 15-745 © Seth Copen Goldstein 2000-5 83 A B C D4 D A1 B1 C1 D1 A2 B2 C2 D2 A3 B3 C3 D3 A4 B4 C4 Can Rearrange AD B C A1 D1 B1 A2 D2 C1 B2 A3 C2 B3 C3 D3 A4 B4 C4 15-745 © Seth Copen Goldstein 2000-5 84 Rearrange AD B C A1 D1 B1 A2 D2 C1 B2 A3 C2 B3 C3 A B C D3 D A1 B1 C1 D1 A2 B2 C2 D2 A3 B3 C3 D3 15-745 © Seth Copen Goldstein 2000-5 85 Rearrange AD B C A1 D1 B1 A2 D2 C1 B2 A3D3 C2 B3 C3 A B C D3 D A1 B1 C1 D1 A2 B2 C2 D2 A3 B3 C3 15-745 © Seth Copen Goldstein 2000-5 86 SP Loop A: a ¬ B: b¬ D: d1¬ A1: a1 ¬ D1: d2¬ C: B1: b1¬ A2: a2 ¬ D2: d¬ B2: b2¬ C1: D3: d2¬ C2: ld [d] a*a d+4 ld [d1] d1+4 st [d], b a1*a1 ld [d2] d2+4 a2*a2 st [d1], b1 d1+4 st [d2], b2 Prolog Body Epilog A B C C C D3 D A1 B1 B1 B1 C1 D1 A2 A2 A2 B2 C2 D2 D2 D2 15-745 © Seth Copen Goldstein 2000-5 87 Goal of Software Pipelining n Increase distance between dependent operations by moving destination operation to a later iteration A B C dependencies in initial loop A after SP B C i+1 i+2 iteration i 15-745 © Seth Copen Goldstein 2000-5 88 Goal of Software Pipelining n Discover ILP across iterations! A0 A0 A1 B0 A2 B1 C0 A3 B2 C1 B3 C2 C3 A1 B0 Ai Bi-1 Ci-2 Bi Ci-1 Ci 15-745 © Seth Copen Goldstein 2000-5 89 Example Assume operating on a infinite wide machine A0 A1 B0 Prolog loop body epilog Ai Bi-1 Ci-2 Bi Ci-1 Ci 15-745 © Seth Copen Goldstein 2000-5 90 Dealing with exit conditions for (i=0; i= N) goto done
A0
B0
if (i+1 == N) goto last i=1
A1
if (i+2 == N) goto epilog i=2
{
}
loop: Ai Ai
Bi Bi-1 Ci Ci-2
i++
if (i < N) goto loop epilog: Bi Ci-1 last: ci done: 15-745 © Seth Copen Goldstein 2000-5 91 Loop Unrolling vs. Software Pipelining For SuperScalar or VLIW n Loop Unrolling reduces loop overhead n Software Pipelining reduces fill/drain n Best is if you combine them Software Pipelining Loop Unrolling Time 15-745 © Seth Copen Goldstein 2000-5 92 TODO: More complicated SP example 93 Software Pipelining Approaches n The first serious approach to software pipelining was presented by Aiken & Nicolau. q Aiken’s 1988 Ph.D. thesis. q Impractical as it ignores resource hazards (focusing only on data-dependence constraints). n “Iterative Modulo Scheduling” Rau MICRO’94 q Addresses resource constraints q Iterate over different loop lengths until one schedules successfully q Compute loop lower/upper bounds from required & available resources 94 SYSTOLIC ARRAYS 95 Why Systolic Architectures? n Idea: Data flows from the computer memory in a rhythmic fashion, passing through many processing elements before it returns to memory n Similar to an assembly line q Different people work on the same car q Many cars are assembled simultaneously q Can be two-dimensional n Special purpose accelerators/architectures need q Simple, regular designs (keep # unique parts small and regular) q High concurrency à high performance q Balanced computation and I/O (memory access) 96 Systolic Architectures n H. T. Kung, “Why Systolic Architectures?,” IEEE Computer 1982. Memory: heart PEs: cells Memory pulses data through cells 97 Systolic Architectures n Basic principle: Replace a single PE with a regular array of PEs and carefully orchestrate flow of data between the PEs à achieve high throughput w/o increasing memory bandwidth requirements n Differences from pipelining: q Array structure can be non-linear and multi-dimensional q PE connections can be multidirectional (and different speed) q PEs can have local memory and execute kernels (rather than a piece of the instruction) 98 Systolic Computation Example n 99 Systolic Computation Example: Convolution 100 Systolic Computation: Convolution y1 w3 w2 x2 x1 y1 w1 w3 w2 w1 x3 x2 x1 x3 y1 w3 y1 w2 x2 y2 w1 y2 w3 w2 w1 x4 x3 x2 y2 y3 w3 w2 x4 x3 w1 101 Systolic Computation: Convolution y1 w3 w2 w1 x2 x1 102 Systolic Computation: Convolution y1 w3 w2 w1 x3 x2 x1 103 Systolic Computation: Convolution y1 y2 w3 w2 w1 x3 x2 104 Systolic Computation: Convolution y1 y2 w3 w2 w1 x4 x3 x2 105 Systolic Computation: Convolution y1 y2 y3 w3 w2 w1 x4 x3 106 Systolic Computation Example: Convolution n Worthwhile to implement adder and multiplier separately to allow overlapping of add/multiply executions 107 TODO: Example relating SP to systolic architecture for some computation (maybe the convolution) 108 More Programmability n Each PE in a systolic array q Can store multiple “weights” q Weights can be selected on the fly q Eases implementation of, e.g., adaptive filtering n Taken further q Each PE can have its own data and instruction memory q Data memory à to store partial/temporary results, constants q Leads to stream processing, pipeline parallelism n Moregenerally,stagedexecution 109 Pipeline Parallelism 110 File Compression Example 111 Why pipeline parallelism in software? n Pipeline parallelism vs data parallelism q Why split pipeline stages across PEs? q No cycle-time benefit like we got in hardware n Data movement patterns differ q Pipeline parallelism: move input data between PEs q Data parallelism: move task code/data between PEs n Tight feedback loops within single stage q E.g., compression or encryption n Appropriate design depends on application 112 Systolic Array Summary n Advantages q Makes multiple uses of each data item à reduce data fetches q High concurrency q Regular design (both data and control flow) n Disadvantages q Not good at exploiting irregular parallelism q Relatively special purpose à need software, programmer support to be a general purpose model 113 The WARP Computer n HT Kung, CMU, 1984-1988 n Linear array of 10 cells, each cell a 10 Mflop programmable processor n Attached to a general purpose host machine n High-level language and optimizing compiler to program the systolic array n Used extensively to accelerate vision and robotics tasks n Annaratone et al., “Warp Architecture and Implementation,” ISCA 1986. n Annaratone et al., “The Warp Computer: Architecture, Implementation, and Performance,” IEEE TC 1987. 114 The WARP Computer 115 The WARP Computer 116 Resource Constraints n Minimally indivisible sequences, i and j, can execute together if combined resources in a step do not exceed available resources. n R(i) is a resource configuration vector R(i) is the number of units of resource i n r(i) is a resource usage vector s.t. 0 £ r(i) £ R(i) n Each node in G has an associated r(i) 15-745 © Seth Copen Goldstein 2000-5 117 SCALAR REPLACEMENT 118 Scalar Replacement: Example DO I = 1, N DO J = 1, M A(I) = A(I) + B(J) ENDDO ENDDO DO I = 1, N T = A(I) DO J = 1, M T = T + B(J) ENDDO A(I) = T ENDDO n AllloadsandstorestoAintheinner loop have been saved n A(I)canbeleftinaregister throughout the inner loop n Superscalar+cachewillgetmostof n HighchanceofTbeingallocateda this, but not allocate A(I) to register register by compiler Scalar Replacement n Convert array reference to scalar reference n Approach is to use dependences to achieve these memory hierarchy transformations Dependence and Memory Hierarchy n True or Flow – save loads and cache miss n Anti – save cache miss? n Output – save stores n Input – save loads A(I) = … + B(I) … =A(I)+k A(I) = … … = B(I) Dependence and Memory Hierarchy n Loop Carried dependences – Consistent dependences most useful for memory management purposes n Consistent dependences – dependences with constant dependence distance Using Dependences n In the reduction example DO I = 1, N DO J = 1, M A(I) = A(I) + B(J) ENDDO ENDDO DO I = 1, N T = A(I) DO J = 1, M T = T + B(J) ENDDO A(I) = T ENDDO n True dependence – replace the references to A in the inner loop by scalar T n Output dependence – store can be moved outside the inner loop n Antidependence – load can be moved before the inner loop Scalar Replacement n Example: Scalar Replacement in case of loop independent dependence DO I = 1, N A(I) = B(I) + C X(I) = A(I)*Q ENDDO DO I = 1, N t = B(I) + C A(I) = t X(I) = t*Q ENDDO n One less load for each iteration for reference to A Scalar Replacement n Example: Scalar Replacement in case of loop carried dependence spanning single iteration DO I = 1, N A(I) = B(I-1) B(I) = A(I) + C(I) ENDDO tB = B(0) DO I = 1, N tA = tB A(I) = tA tB = tA + C(I) B(I) = tB ENDDO n One less load for each iter for ref to B which had a loop carried true dependence of 1 iter n Also one less load per iter for reference to A Scalar Replacement n Example: Scalar Replacement in case of loop carried dependence spanning multiple iterations DO I = 1, N A(I) = B(I-1) + B(I+1) ENDDO n t1 = B(0) t2 = B(1) DO I = 1, N t3 = B(I+1) A(I) = t1 + t3 t1 = t2 t2 = t3 ENDDO One less load for each iter for ref to B which had a loop carried input dependence of 2 iters Invariants maintained were t1=B(I-1); t2=B(I); t3=B(I+1) n Aiken/Nicolau Scheduling Step 1 Perform scalar replacement to eliminate memory references where possible. for i:=1 to N do a := j Å V[i-1] b := a Å f c := e Å j d := f Å c e := b Å d f := U[i] g: V[i] := b h: W[i] := d j := X[i] for i:=1 to N do a:=jÅb b:=aÅf c:=eÅj d:=fÅc e:=bÅd f := U[i] g: V[i] := b h: W[i] := d j := X[i] 15-745 © Seth Copen Goldstein 2000-5 127 Aiken/Nicolau Scheduling Step 2 Unroll the loop and compute the data-dependence graph (DDG). DDG for rolled loop: for i:=1 to N do a := j Å b b := a Å f c := e Å j d := f Å c e := b Å d f := U[i] g: V[i] := b h: W[i] := d j := X[i] ac j bd gf h e 15-745 © Seth Copen Goldstein 2000-5 128 Aiken/Nicolau Scheduling Step 2, cont’d DDG for unrolled loop: for i:=1 to N do a := j Å b b := a Å f c:=eÅj d:=fÅc e:=bÅd f:=U[i] g: V[i] := b h: W[i] := d j := X[i] a1 c1 b1 d1 g a j1 e h1 121 b f1 c 22 g2 a3 b 3 j2 d 2 f2 e2 c3 h1 15-745 © Seth Copen Goldstein 2000-5 129 Aiken/Nicolau Scheduling Step 3 Build a tableau of iteration number vs cycle time. ac 1 1 b d1 1j iteration 123456 1 acfjfjfjfjfjfj 2bd 3egha 1 h 4 cb g1a2 e115dga f 6ehb b1c 7 cga 2j28db 2d 9 ehga g2a3 2 10 cb fh11 dga 2e112 ehb b3 2 13 cg c 14 d 3 15 eh 15-745 © Seth Copen Goldstein 2000-5 130 cycle Aiken/Nicolau Scheduling Step 3 Build a tableau of iteration number vs cycle time. basically, you’re emulating iteration 1 1 123456 ac a superscalar with infinite 1 acfjfjfjfjfjfj 2 bd 3 egh a 4 cb 5 dga 6 eh b 7cga 8db 9 ehga resources, infinite register b d1 renaming, always predicting 1j the loop-back branch: h g a 1 e 1 121 thus, just pure f1 b2jc2 data dependency 2d g2a3 2 10 cb fh11 dga 2e112 ehb b3 2 13 cg c 14 d 3 15 eh 15-745 © Seth Copen Goldstein 2000-5 131 cycle Aiken/Nicolau Scheduling Step 4 Find repeating patterns of instructions. iteration 123456 1 acfjfjfjfjfjfj 2 bd 34 egh acb 5 dga 6 ehb 7 cg a 8db 9 10 11 12 13 14 15 eh g a cb dga ehb cg d eh 15-745 © Seth Copen Goldstein 2000-5 132 cycle Aiken/Nicolau Scheduling Step 4 Find repeating patterns of instructions. iteration 123456 1 acfjfjfjfjfjfj 2 bd 34 egh acb 5 dga 6 ehb 7 cg a 8db 9 10 11 12 13 14 15 eh g a cb dga ehb cg d eh 15-745 © Seth Copen Goldstein 2000-5 133 cycle Aiken/Nicolau Scheduling Step 4 Find repeating patterns of instructions. iteration 123456 1 acfjfjfjfjfjfj 2 bd 34 egh acb 5 dga 6 ehb 7 cg a 8db Go back and relate slopes to DDG 9 10 11 12 13 14 15 eh g a cb dga ehb cg d eh 15-745 © Seth Copen Goldstein 2000-5 134 cycle Aiken/Nicolau Scheduling Step 5 “Coalesce” the slopes. iteration iteration 123456 123456 1 acfjfjfjfjfjfj 2 bd 3 egh a 4 cb 1 acfj 2 bd 3 egh 4 fj a cb fj dg a eh b fj 5 dga 6 ehb 7 cg a 8db8db 9 10 11 12 13 14 15 eh g a cb dga ehb cg d eh cg a 9 eh g fj 5 6 7 10 ca 11 db 12 eh g 13 c 14 d 15 eh 15-745 © Seth Copen Goldstein 2000-5 135 cycle cycle Aiken/Nicolau Scheduling Step 6 Find the loop body and “reroll” the loop. iteration 123456 1 acfj 2bd 3 egh 4 fj a cb fj dga ehbfj 5 6 7 8db 9 10 11 12 13 14 15 ehg fj ca db eh g c d eh cg a 15-745 © Seth Copen Goldstein 2000-5 136 cycle Aiken/Nicolau Scheduling Step 6 Find the loop body and “reroll” the loop. Prologue/entry code Loop body Epilogue/exit code iteration 123456 1 acfj 2bd fj 3 egh a 4 cb fj 5 dga 6 ehbfj 78 cg a 910 11 db 12 eh g 13 c 14 d 15 eh db ehg fj ca 15-745 © Seth Copen Goldstein 2000-5 137 cycle Aiken/Nicolau Scheduling Step 7 Generate code. (Assume VLIW-like machine for this example. The instructions on each line should be issued in parallel.) L: bi+1 := ai Å fi W[i] := di ai+2 := ji+1 Å bi+1 i := i+1 bN :=aN Å fN-1 W[N-1] := dN-1 w[N]:=dN a1:=j0 Å b0 b1:=a1 Å f0 e1:=b1 Å d1 c2:=e1 Å j1 d2:=f1 Å c2 e2:=b2 Å d2 c3:=e2 Å j2 di:=fi-1Åci ei := bi Å di ci+1:=eiÅji c1:=e0 Å j0 d1:=f0 Å c1 V[1] := b1 b2:=a2 Å f1 V[2] := b2 W[2] := d2 V[3] := b3 f1:=U[1] f2:=U[2] W[1] := d1 f3 := U[3] a3:=j2 Å b3:=a3 Å a4:=j3 Å j1 := X[1] j2 := X[2] a2:=j1 Å b1 j3 := X[3] b2 f2 f4 := U[4] b3 i:=3 j4 := X[4] dN-1 :=fN-2 Å cN-1 eN-1 := bN-1 Å dN-1 cN := eN-1 Å jN-1 dN := fN-1 + cN eN :=bN Å dN V[i+1] := bi+1 v[N] := bN fi+2 := U[I+2] ji+2 := X[i+2] if i
n For an edge, u®v, we must have s(v)-s(u) 3 d(u,v)
15-745 © Seth Copen Goldstein 2000-5 149

Precedence Constraints
n Cyclic: constraint becomes a tuple: q p is the minimum iteration delay
(or the loop carried dependence distance) q d is the delay
n For an edge, u®v, we must have s(v)-s(u) 3 d(u,v)-s*p(u,v)
np30
n If data dependence is
q within an iteration, p=0
q loop-carried across p iter boundaries, p>0
15-745 © Seth Copen Goldstein 2000-5 150

Iterative Approach
n Finding minimum S that satisfies the constraints is NP- Complete.
n Heuristic:
q Find lower and upper bounds for S
q foreach s from lower to upper bound? n Schedule graph.
n If succeed, done
n Otherwise try again (with next higher s)
n Thus: “Iterative Modulo Scheduling” Rau MICRO’94
15-745 © Seth Copen Goldstein 2000-5 151

Iterative Approach
n Heuristic:
q Find lower and upper bounds for S
q foreach s from lower to upper bound
n Schedule graph.
n If succeed, done
n Otherwise try again (with next higher s)
n So the key difference:
q AN88 does not assume S when scheduling
q IMS must assume an S for each scheduling attempt to understand resource conflicts
15-745 © Seth Copen Goldstein 2000-5 152

Lower Bounds
n Resource Constraints: SR
maximum over all resources of # of uses divided by #
available…
n Precedence Constraints: SE
max delay over all cycles in dataflow graph
In practice, one is easy, other is hard.
Tim’s secret approach: just use SR as lower bound, then do binary search for best S
15-745 © Seth Copen Goldstein 2000-5 153

Acyclic Example
a <0,2>
<0,3> c
b <0,1>
Lower Bound: SR=2 Upper Bound: 5
15-745 © Seth Copen Goldstein 2000-5 154

Lower Bound on s
• Assume 1 ALU and 1 MU
• Assume latency Op or load is 1 cycle
a
c
for i:=1 to N do a := j Å b b := a Å f c := e Å j d:=fÅc e := b Å d
f := U[i]
g: V[i] := b
h: W[i] := d
j := X[i]
<1,1> <0,1>
<1,1> <0,1>
<1,1> <0,1>
j
b
h
f
d
g
<0,1>
<0,1>
<1,1>
Resources => 5 cycles Dependencies => 3 cycles
e
15-745 © Seth Copen Goldstein 2000-5 155

Scheduling data structures
To schedule for initiation interval s:
n Create a resource table with s rows and R columns
n Create a vector, s, of length N for n instructions in the loop
q s[n] = the time at which n is scheduled, or NONE
n Prioritize instructions by some heuristic q critical path (or cycle)
q resource critical
15-745 © Seth Copen Goldstein 2000-5 156

Scheduling algorithm
n Pick an instruction, n
n Calculate earliest time due to dependence constraints For all x=pred(n),
earliest = max(earliest, s(x)+d(x,n)-s.p(x,n)) n try and schedule n from earliest to (earliest+s-1)
s.t. resource constraints are obeyed.
q possible twist: deschedule a conflicting node to make way for n, maybe randomly, like sim anneal
n If we fail, then this schedule is faulty (i.e. give up on this s)
15-745 © Seth Copen Goldstein 2000-5 157

Scheduling algorithm – cont.
n We now schedule n at earliest, I.e., s(n) = earliest n Fix up schedule
q Successors, x, of n must be scheduled s.t.
s(x) >= s(n)+d(n,x)-s.p(n,x), otherwise they are removed
(descheduled) and put back on worklist.
n repeat this some number of times until either
q succeed, then register allocate q fail, then increase s
15-745 © Seth Copen Goldstein 2000-5 158

Simplest Example
for () {
a = b+c
b = a*a
c = a*194 }
<1,1>
<1,1> <0,1> <0,1>
b
Resources:
a
1
1
c
What is IIres? What is IIrec?
15-745 © Seth Copen Goldstein 2000-5 159

Simplest Example
for () { 0 a = b+c
b = a*a
c = a*194 1 }
a
Modulo Resource Table:
0 1
b
c
Try II = 2
1
15-745 © Seth Copen Goldstein 2000-5 160

Simplest Example
for () { 0 a = b+c
b = a*a
c = a*194 1 }
a
Try II = 2
Modulo Resource Table:
0 1
1
b
c
1
15-745 © Seth Copen Goldstein 2000-5 161

Simplest Example
for () { 0 a = b+c
b = a*a
c = a*194 1 }
Modulo Resource Table:
0 1
1
1
a
2
Try II = 2
b
c
1
15-745 © Seth Copen Goldstein 2000-5 162

Simplest Example
for () { 0 a = b+c
b = a*a
c = a*194 1 }
a
b
Try II = 2
Modulo Resource Table:
0 1
earliest a: sigma(c) + delay(c) – 2 = 2+1-2 = 1
1
c
2
1
15-745 © Seth Copen Goldstein 2000-5 163

Simplest Example
for () {
a = b+c
b = a*a
c = a*194
0 1 2
}
b
a
Try II = 2
Modulo Resource Table:
0 1
earliest b? scheduled b? what next?
1
c
1
15-745 © Seth Copen Goldstein 2000-5 164

Simplest Example
for () {
a = b+c
b = a*a
c = a*194
0 1 2 3
}
Try II = 2
Modulo Resource Table:
0 1
Lesson: lower bound may not be achievable
1
b
a
c
1
15-745 © Seth Copen Goldstein 2000-5 165

Example
for i:=1 to N do a := j Å b b := a Å f c := e Å j d := f Å c e := b Å d
f := U[i]
g: V[i] := b
h: W[i] := d
j := X[i]
Priorities: ?
<1,1> <0,1>
<1,1> <0,1>
<1,1> <0,1>
<0,1>
<0,1>
<1,1>
g
a
h
j
f
c
b
d
e
15-745 © Seth Copen Goldstein 2000-5 166

Example
for i:=1 to N do a := j Å b b := a Å f c := e Å j d := f Å c e := b Å d
f := U[i]
g: V[i] := b
h: W[i] := d
j := X[i]
Priorities: c,d,e,a,b,f,j,g,h
g
a
<1,1> <0,1>
<1,1> <0,1>
<1,1> <0,1>
<0,1>
<0,1>
<1,1>
h
f
c
j
b
d
e
15-745 © Seth Copen Goldstein 2000-5 167

for i:=1 to N do a := j Å b b := a Å f c := e Å j d := f Å c e := b Å d
f := U[i]
g: V[i] := b
h: W[i] := d
j := X[i]
s=5
instr
s
a
b
c
d
e
f
g
h
j
ALU
MU
Priorities: c,d,e,a,b,f,j,g,h
a
b
j
h
f
c
g
e
d
15-745 © Seth Copen Goldstein 2000-5 168

for i:=1 to N do a := j Å b b := a Å f c := e Å j d := f Å c e := b Å d
f := U[i]
g: V[i] := b
h: W[i] := d
j := X[i]
Priorities: a,b,f,j,g,h
s=5
instr
s
a
b
c
0
d
1
e
2
f
g
h
j
ALU
MU
c
d
e
j
a
c
f
b
d
g
h
e
15-745 © Seth Copen Goldstein 2000-5 169

for i:=1 to N do a := j Å b b := a Å f c := e Å j d := f Å c e := b Å d
f := U[i]
g: V[i] := b
h: W[i] := d
j := X[i]
Priorities: b,f,j,g,h
s=5
instr
s
a
3
b
c
0
d
1
e
2
f
g
h
j
ALU
MU
c
d
e
a
j
a
c
f
b
d
g
h
e
15-745 © Seth Copen Goldstein 2000-5 170

for i:=1 to N do a := j Å b b := a Å f c := e Å j d := f Å c e := b Å d
f := U[i]
g: V[i] := b
h: W[i] := d
j := X[i]
Priorities: b,f,j,g,h
s=5
instr
s
a
3
b
4
c
0
d
1
e
2
f
g
h
j
ALU
MU
c
d
e
a
b
j
a
c
f
b
d
g
h
e
15-745 © Seth Copen Goldstein 2000-5 171

for i:=1 to N do a := j Å b b := a Å f c := e Å j d := f Å c e := b Å d
f := U[i]
g: V[i] := b
h: W[i] := d
j := X[i]
Priorities: e,f,j,g,h
s=5
ALU
MU
c
d
a
b
j
a
c
f
b
d
h
g
instr
a
b
c
d
e
f
g
h
j
1
s
3
4
0
e
b causes b->e edge violation
15-745 © Seth Copen Goldstein 2000-5 172

for i:=1 to N do a := j Å b b := a Å f c := e Å j d := f Å c e := b Å d
f := U[i]
g: V[i] := b
h: W[i] := d
j := X[i]
Priorities: e,f,j,g,h
s=5
ALU
MU
c
d
e
a
b
j
a
c
f
b
d
h
g
instr
a
b
c
d
e
f
g
h
j
1
s
3
4
0
7
e
e causes e->c edge violation
15-745 © Seth Copen Goldstein 2000-5 173

for i:=1 to N do a := j Å b b := a Å f c := e Å j d := f Å c e := b Å d
f := U[i]
g: V[i] := b
h: W[i] := d
j := X[i]
Priorities: f,j,g,h
s=5
instr
s
a
3
b
4
c
5
d
6
e
7
f
0
g
h
j
ALU
MU
c
f
d
e
a
b
j
a
c
f
b
d
g
h
e
15-745 © Seth Copen Goldstein 2000-5 174

for i:=1 to N do a := j Å b b := a Å f c := e Å j d := f Å c e := b Å d
f := U[i]
g: V[i] := b
h: W[i] := d
j := X[i]
Priorities:j,g,h
s=5
instr
s
a
3
b
4
c
5
d
6
e
7
f
0
g
h
j
1
ALU
MU
c
f
d
j
e
a
b
j
a
c
f
b
d
g
h
e
15-745 © Seth Copen Goldstein 2000-5 175

for i:=1 to N do a := j Å b b := a Å f c := e Å j d := f Å c e := b Å d
f := U[i]
g: V[i] := b
h: W[i] := d
j := X[i]
Priorities:g,h
s=5
instr
s
a
3
b
4
c
5
d
6
e
7
f
0
g
7
h
8
j
1
ALU
MU
c
f
d
j
e
g
a
h
b
j
a
c
f
b
d
g
h
e
15-745 © Seth Copen Goldstein 2000-5 176

Creating the Loop
n Create the body from the schedule.
n Determine which iteration an instruction
falls into
q Mark its sources and dest as belonging to that iteration.
q Add Moves to update registers n Prolog fills in gaps at beginning
q For each move we will have an instruction in prolog, and we fill in dependent instructions
n Epilog fills in gaps at end
instr
s
a
3
b
4
c
5
d
6
e
7
f
0
g
7
h
8
j
1
15-745 © Seth Copen Goldstein 2000-5 177

f0 = U[0]; j0 = X[0];
FOR i = 0 to N f1 := U[i+1] j1 := X[i+1] nop
a := j0 ? b b := a ? f0 c := e ? j0 d := f0 ? c e := b ? d
h: W[i] := d f0 = f1
j0 = j1
g: V[i] := b
15-745 © Seth Copen Goldstein 2000-5 178

Conditionals
n What about internal control structure, I.e., conditionals n Three approaches
q Schedule both sides and use conditional moves
q Schedule each side, then make the body of the conditional a
macro op with appropriate resource vector q Trace schedule the loop
15-745 © Seth Copen Goldstein 2000-5 179

What to take away
n Architecture includes compiler!
n Dependence analysis is very important
(including alias analysis)
n Software pipelining crucial for statically scheduled, but also very useful for dynamically scheduled
15-745 © Seth Copen Goldstein 2000-5 180

Multiflow: An early VLIW architecture (1987)

EPIC – Intel IA-64 Architecture
n Gets rid of lock-step execution of instructions within a VLIW instruction
n Idea: More ISA support for static scheduling and parallelization q Specify dependencies within and between VLIW instructions
(explicitly parallel)
+ No lock-step execution
+ Static reordering of stores and loads + dynamic checking
— Hardware needs to perform dependency checking (albeit aided by software)
— Other disadvantages of VLIW still exist
n Huck et al., “Introducing the IA-64 Architecture,” IEEE Micro, Sep/Oct 2000.
183

IA-64 Instructions
n IA-64 “Bundle” (~EPIC Instruction)
q Total of 128 bits
q Contains three IA-64 instructions
q Template bits in each bundle specify dependencies within a bundle
\
n IA-64 Instruction
q Fixed-length 41 bits long
q Contains three 7-bit register specifiers
q Contains a 6-bit field for specifying one of the 64 one-bit predicate registers
184

IA-64 Instruction Bundles and Groups
n Groups of instructions can be executed safely in parallel
q Marked by “stop bits”
n Bundles are for packaging
q Groups can span multiple bundles
n Alleviates recompilation need somewhat
185

Template Bits
n Specify two things
q Stop information: Boundary of independent instructions
q Functional unit information: Where should each instruction be routed
186

CS计算机代考程序代写 algorithm file system database ,

,
School of Science
COSC2536/2537 Security in Computing and Information Technology
Assignment 1
Overview
The objective of Assignment 1 is evaluating your knowledge on the topics covered in Lecture 1-4. Topics include Basic Cryptographic Techniques, and Public-Key Cryptography (RSA, ElGamal and Paillier cryptosystems). Assignment 1 will focus on developing your abilities in application of knowledge, critical analysis and decision making. Assignment 1 contains several problems related to the topics mentioned above. You are required to prepare the solutions and upload them as a single PDF or Word document in CANVAS.
In this assignment, there are 5 (five) questions in total. The first question Q1 is on designing a cryptographic algorithm for a secure vault with a sophisticated digital keypad. In this question, a scenario is given that describes how a secret key for the digital keypad is generated and the digital keypad works. You need to design an algorithm that satisfies the requirements of the security of the digital keypad.
The second question Q2 is about designing an algorithm to perform cryptanalysis on a captured encrypted text. The term Cryptanalysis is used to breach cryptographic security systems and gain access to the contents of encrypted messages, even if the cryptographic key is unknown. Therefore, you are expected to apply cryptanalysis in order to obtain plaintext from the given ciphertext in Q2.
The second question Q3 is about the designing a Secure Online Property Auction System using the hash algorithm. In Q2, you are expected to design an Online Bidding System where an attacker cannot determine the bid values of participants and the hash algorithm based bidding would work.
The fourth question Q4 is related to breaking the RSA Encryption algorithm. In this question, you are expected to perform prime factorization to break RSA private-key d from the public-key (n, e). You should demonstrate the detail steps with explanations how the RSA encryption algorithm can be broken. Marks will be deducted if you fail to show the detail computation correctly, skip the computation steps, or do not provide explanations.
The fifth question Q5 is about a real-life application of Public-Key cryptography. In this question, you are expected to show how RSA and ElGamal Public-Key Cryptography techniques can be used in secure smart door. You should demonstrate the detail steps with explanations how the RSA and ElGamal encryption and decryption work in the given context. Marks will be deducted if you fail to show the detail computation correctly, skip the computation steps, or do not provide explanations.
Develop this assignment in an iterative fashion (as opposed to completing it in one sitting). You should be able to start preparing your answers immediately after the Lecture-1 (in Week-1). At the end of each week starting from Week-1 to Week-4, you should be able to solve at least one question.
Assessment Type: Individual assignment; no group work. Submit online via Canvas→Assignments→Assignment 1.
Marks awarded for meeting requirements as closely as possible. Clarifications/updates may be made via announcements/relevant discussion forums.
Due date: Week 4, Friday the 26th Mar 2021 11:59pm
Deadlines will not be advanced, but they may be extended. Please check Canvas→Syllabus or via
Canvas→Assignments→Assignment 1 for the most up to date information.
As this is a major assignment in which you demonstrate your understanding, a university standard late penalty of 10% per each working day applies for up to 5 working days late, unless special consideration has been granted.
Weighting: 35 marks (Contributes 35% of the total Grade)
Page 1 of 9

,
If there are questions, you must ask via the relevant Canvas discussion forums in a general manner.
Overall, you must follow the following special instructions:
• You must use the values provided in the questions.
• Hand-written answers are not allowed and will not be assessed. Compose your answers using any word processing software (e.g., MS Word).
• You are required to show all of the steps and intermediate results for each question.
• Please DO NOT provide codes as an answer. Only codes will not be assessed.
• Upload your solution as a single PDF or Word document in CANVAS.
Assessment Criteria
This assessment will determine your ability to:
• Follow requirements provided in this document and in the lessons.
• Independently solve a problem by using cryptography and cryptanalysis concepts taught over the first four weeks of the course.
• Meeting deadlines. Learning Outcomes
This assessment is relevant to the following Learning Outcomes:
1. CLO-1: explain the functioning of security services in computing environments and the security issues in networked applications.
2. CLO-2: discuss various types of data integrity and confidentiality mechanisms including public key cryptography.
3. CLO-3: describe basic system security mechanisms and protocols, such as those used in operating systems, file systems and computer networks.
Assessment details
Please ensure that you have read Section 1 to 3 of this document before going further. Assessment details (i.e. question Q1 to Q5) are provided in the next page.
Page 2 of 9

,
Q1. Designing Cryptographic Algorithm for Secure Nuclear Arsenal Control System (6 Marks)
A nuclear arsenal is a weapon, such as atomic bomb or hydrogen bomb, is made of nuclear energy and has a tremendous destructive power comes from the release of nuclear energy. If the nuclear bomb is activated accidently, it can cause many deaths. Therefore, a secure control system should be designed for authorized activation and deactivation of the nuclear arsenal.
Figure-1.1: Nuclear Arsenal
Figure-1.2: Master Key generation at keypad from three keys
If the secure nuclear arsenal control system is designed to work based on a single key, the security of the system would be less secure. Hence, a combination of multiple keys provided by multiple authorized participants for creating a single master key would increase the security of the system.
Say, all three authorized participants (the President, Secretary of Defense, and Security Adviser) must provide the approval to confirm the activation and deactivation of the nuclear arsenal via a very sophisticated and specially designed secure control system. The secure control system has a digital keypad (see Figure-1.2) which is used to enter secret password for activating or deactivating the nuclear arsenal. The keypad accepts three secret keys one after another. Each secret key is an integer number of 5 digits.
Page 3 of 9

,
When the keypad is initialized each participant enters individual secret key without anyone knowing that number. Once all three participants enter their secret numbers, the sophisticated logic in the keypad performs a mathematical operation and generates a master key by using the three numbers (see Figure-1.2). It then stores the master key in the memory and deletes the individual secret keys.
Once the digital keypad is initialized, the three authorized participants must come all at the same time and enter the secret keys one after another. Similar to the initialization phase, keypad performs a mathematical operation and generates a new master key by using the three numbers. The new master key is then compared with old master key saved in the keypad. If they are same, the nuclear arsenal activates or deactivates.
Explain the algorithm with an example to design the sophisticated keypad for the secured nuclear arsenal control system. You need to choose appropriate 5-digit numbers to show all the steps (i.e., initialization of master key and key verification process) of the algorithm.
Q2. Cryptanalysis with Missing Encrypted Text [8 Marks]
In January 1917, British cryptographers deciphered a telegram from German Foreign Minister Arthur Zimmermann to the German Minister to Mexico, Heinrich von Eckhardt. In that telegram, Zimmermann offered United States territory to Mexico in return for joining the German cause. This message helped draw the United States into the war and thus changed the course of history. The ciphertext and decoded message of Zimmermann is shown in Figure-2.1. The challenge was the encrypted message had many missing ciphertext. In spite of missing encrypted text, the British cryptographic office known as “Room 40” decoded the Zimmermann Telegram and handed it over to the United States in late- February 1917.
(a) Encoded Message (b) Decoded Message
Figure-2.1: Zimmermann Telegram
In this task, you have to decrypt an encrypted message. However, here we have encrypted a long English message a bit differently. Every single alphabet in the message has been substituted by another unique alphabet. While the encrypted message was captured, some of the alphabets were missing. A missing encrypted alphabet is marked as ‘_’. The encrypted message is shown below:
NZQLI XSFIXSVH ZIV ZG LWWH DRGS ZFGSLIRGRVH LEVI GSV ZHGIZAVMVXZ EZXXRMV, DRGS IVORTRLFH OVZWVIH GVOORMT KZIRHSRLMVIH GSVB ZIV VMGRGOVW GL IVJFVHG Z WRUUVIVMG QZY YFG GSV UVWVIZO TLEVIMNVMG HZBRMT NLHG KVLKOV DLM’G SZEV Z XSLRXV. IVORTRLFH XLMXVIMH ZYLFG GSV ZHGIZAVMVXZ EZ_XRMV ZIRHV UILN RGH FHV LU ZYLIGVW UVGZO XVOOH RM GSV WVEVOLKNVMG KILXVHH, DSRXS RH XLNNLM HXRVMGR_RX KIZXGRXV GSZG HLNV XSIR_GRZMH URMW LYQVXGRLMZYOV. GSV HGLFHS XLFOW UIFHGIZGV LI WVOZB ZGGVNKGH GL RMLXFOZGV GSV XLFMGIB ZTZRMHG UFIGSVI XLE_W-19 LFGYIVZPH ZMW OLXPWL_MH ZH ZFGSLIRGRVH KIVKZIV GL HGZI_ GSV EZXXRMV ILOOLFG OZGVI GSRH NLMGS.
Page 4 of 9

,
Figure-2.2: English Alphabet Frequency Count
Using frequency analysis technique, you need to show step-by-step processes to Decipher and find out the actual message. Use the English alphabet frequency count as shown in Figure 2.2. Please note that no marks will be given if only plaintext is shown without detailed steps.
Q3. Designing Secure Online Car Auction System using Hash Algorithm [7 Marks]
Covid-19 has changed the way we conduct business these days. This is true for car auctions as well. Currently, most of the car auctions are being operated online as physical auction is prohibited. Large number of sellers and car dealers are opting for online auctions. Based on an article published, we would like to highlight few facts about the current practice in online auctions:
• “Online car auctions run like a traditional auction, with buyers registering and placing bids while watching videos and online condition report of cars.”
• “Another method involves buyers sending off bids, similar to eBay, and the time allotted for the auction is extended by five minutes every time a bid is entered.”
Obviously, there are many issues with online auction, but one of the critical issues is trust – the way online bidding process is conducted. We want to make sure the online bidding process is trustworthy, and nobody can cheat to win.
Figure-3: Cryptographic Hash Function based Online Car Auction
Design a cheating-proof online car auction system using cryptographic hash function with the following requirements:
• A bidder can only bid with a hash value.
• The bidder can bid only once.
• Guessing the plaintext bid amount should be difficult.
Show step-by-step process with concrete examples.
[Will you be able to build a website for online bidding for you second assignment?]
https://www.manheim.com.au/passenger-vehicles/landing/Online-Auction-Only
[If you are interested to implement a broader version of this system as the Capstone project, please contact the Lecturer]
Page 5 of 9

,
Q4. Breaking RSA Key (6 Marks)
Recently, researchers have successfully decrypted the RSA ciphertext using one of the RSA cryptanalysis techniques, called prime factorization, without knowing the private key. Say, Trudy is an intelligent hacker who knows RSA encryption algorithm and prime factorization very well. Hence, she has been hired by someone who wants to know the secret message between Alice and Bob. Trudy uses her understanding on the prime factorization-based RSA cryptanalysis techniques for retrieving Alice and Bob’s secret message.
As you know from Lecture-3 and Tutorial-3, the public-key component (n) of the RSA cryptosystems is an integer that has two prime numbers. Assume that Bob’s RSA public-key is published as: n = 4599788129 and e = 3919. Trudy wants to find the private-key (d) for the above RSA public-key.
You need to perform the followings:
a) Show step by-step process how Trudy can find the private-key (i.e., RSA key breaking) using
prime factorization.
b) Show how the secret message (M) can be computed from the captured ciphertext C =
700100670.
c) HowcouldTrudytakeadvantageofmultiplecomputerstoperformtheintegerfactorizationtasks mentioned above for breaking RSA faster? Explain with detailed steps.
[https://www.quintessencelabs.com/blog/breaking-rsa-encryption-update-state-art/]
[If you are interested to implement a broader version of this system as the Capstone project, please contact the Lecturer]
Q5. Application of Public-Key Cryptography (Marks: 4+4 = 8)
Figure-5: Partial list of first 10000 Prime Numbers
Say, Alice wants to design a smart door lock for you using Public-Key Cryptography Algorithm. The smart door lock requires initialization of a pair of keys (public-key and private-key) with a Public-Key Cryptography Algorithm. Parameters of both public and private keys need to be saved in the smart door lock at the time of configuration. The algorithm is designed in a way that would allow you to save the key parameters at the time of configuration which can be changed later by you. At the time of locking the door, you need to enter a secret message (M) which will be saved in the smart door lock unless the door is opened. The smart door lock would generate a ciphertext (C) and send it to your mobile phone as an SMS. The smart door lock requires you to enter C at the time of opening the door.
Page 6 of 9

,
The smart door lock would decrypt C and generate a plaintext M’. If the generated M’ matches with the
stored M, then the door will be unlocked. An overview of the process in shown in Figure-5.
Assume that public and private keys will be generated using either RSA or ElGamal Public-Key Cryptography Algorithm. Also assume that you would use your student number as the secret message M. For example, if your student number is “S123456”, the secret message is: M = 123456.
Answer the following questions:
a) Consider that the smart door lock is using RSA Public-Key Cryptography Algorithm. With proper description, show detailed steps of key generation, generation of ciphertext C (i.e., encryption process), and generation of the plaintext M’ (i.e., decryption process). Use parameters: p = 8377 and q = 6673.
i. Choose a small public key parameter (e = 937) and show detailed steps to compute your public-key and private-key?
ii. How would the smart door lock encrypt message M = and produce the ciphertext C?
iii. How would the smart door lock decrypt C?
b) ConsiderthatthesmartdoorlockisusingElGamalPublic-KeyCryptographyAlgorithm.With proper description, show detailed steps of key generation, generation of ciphertext C (i.e., encryption process), and generation of the plaintext M’ (i.e., decryption process). Use parameters: p = 7719799, g = 2686, and x = 7718.
i. Show detailed steps to compute your public-key and private-key?
ii. The smart door lock chooses a random number r = 21445. How would the smart door lock encrypt message M = and produce the ciphertext
C?
iii. How would the smart door lock decrypt the encrypted message C?
Academic integrity and plagiarism (standard warning)
Academic integrity is about honest presentation of your academic work. It means acknowledging the work of others while developing your own insights, knowledge and ideas. You should take extreme care that you have:
• Acknowledged words, data, diagrams, models, frameworks and/or ideas of others you have quoted (i.e. directly copied), summarized, paraphrased, discussed or mentioned in your assessment through the appropriate referencing methods,
• Provided a reference list of the publication details so your reader can locate the source if necessary. This includes material taken from Internet sites.
If you do not acknowledge the sources of your material, you may be accused of plagiarism because you have passed off the work and ideas of another person without appropriate referencing, as if they were your own.
RMIT University treats plagiarism as a very serious offence constituting misconduct. Plagiarism covers a variety of inappropriate behaviors, including:
• Failure to properly document a source
• Copyright material from the internet or databases
• Collusion between students
For further information on our policies and procedures, please refer to the University website. Assessment declaration
When you submit work electronically, you agree to the assessment declaration.
Page 7 of 9

,
Rubric/assessment criteria for marking
All of the computations must be correct and only provided values must be used. Instructions must be followed.
Criteria
The characteristic or outcome that is being judged.
Total
Question 1
Designing Cryptographic Algorithm
The answer is correct and the explanation is up to the mark
6 Marks
The answer is correct, but the explanation is not up to the mark
4 Marks
The answer is partially correct and the explanation is not up to the mark
2 Marks
The question is attempted but the answer is not correct.
1 Marks
Not answered.
0 Marks
6 Marks
Question 2
Cryptanalysis
Plaintext is correct
Steps are shown in a systematic way and algorithm is presented.
8 Marks
Plaintext is correct
Steps are shown in a systematic way, but algorithm is not presented or incorrect.
4 Marks
Plaintext is partially correct
Or
Plaintext is correct. Steps are not shown in a systematic way and algorithm is not presented.
1 Marks
Not answered
0 Marks
8 Marks
Question 3
Cryptographic Hash Algorithm
The answer is correct, and the explanation is up to the mark
7 Marks
The answer is correct, but the explanation is not up to the mark
5 Marks
The answer is partially correct, and the explanation is not up to the mark
3 Marks
The question is attempted but the answer is not correct.
1 Marks
Not answered
0 Marks
7 Marks
Question 4
Breaking RSA Encryption algorithm
Step-by-step processes of finding secret key and decryption are shown correctly. The explanation of using multiple computers for breaking RSA is up to the mark.
6 Marks
Step-by-step processes of finding secret key and decryption are shown correctly but the explanation of using multiple computers for breaking RSA is not up to the mark.
OR,
Step-by-step processes of finding secret key and decryption are not shown correctly but the explanation of using multiple computers for breaking RSA is up to the mark.
4 Marks
Step-by-step processes of finding secret key and decryption are shown but not all of the computations are shown correctly or step-by- step processes are not shown as required.
The explanation of using multiple computers for breaking RSA is not up to the mark.
2 Marks
The process of finding secret key is partially correct and detailed steps are not provided. Decryption process is partially correct or not described as per the requirement.
The explanation of using multiple computers for breaking RSA is partially correct.
1 Marks
None of the steps are shown correctly
Or
Not answered
0 Marks
6 Marks
Page 8 of 9

,
Question 5(a)
Encryption with RSA Cryptography
Step-by-step processes of both encryption and decryption are shown
All of the computations are shown correctly in detail
4 Marks
Step-by-step processes of both encryption and decryption are shown
Not all of the computations are shown correctly in detail
3 Marks
Step-by-step processes of encryption are shown correctly
However, decryption steps are not shown or incorrectly shown
2 Mark
Step-by-step processes of encryption are shown that are partially correct/ completely wrong
Or
Only decryption steps are correct
1 Marks
None of the steps are shown correctly
Or
Calculations are not shown in detail
Or
Not answered
0 Marks
4 Marks
Question 5(b)
Encryption with ElGamal Cryptography
Step-by-step processes of both encryption and decryption are shown
All of the computations are shown correctly in detail
4 Marks
Step-by-step processes of both encryption and decryption are shown
Not all of the computations are shown correctly in detail
3 Marks
Step-by-step processes of encryption are shown correctly
However, decryption steps are not shown or incorrectly shown
2 Mark
Step-by-step processes of encryption are shown that are partially correct/ completely wrong
Or
Only decryption steps are correct
1 Marks
None of the steps are shown
correctly/
Calculations are not shown in detail/
Not answered
0 Marks
4 Marks
Page 9 of 9

CS计算机代考程序代写 computer architecture PowerPoint Presentation

PowerPoint Presentation

TU856-1 & TU858-1 Computer Architecture and Technology
Module Code: CMPU 1006
FROM BOOLEAN ALGEBRA TO LOGIC GATES
Presenter: Dr Art Sloan
Semester 1, Week 5
1

Presentation Outline
This presentation links the ideas of binary representation with the functionality of logic gates through the mathematical view of Boolean algebra and truth tables .
It describes the structure of logic gates and mentions their use in digital circuits and CMOS.

There are some example variations on the basic AND and OR gates, showing how they can be engineered for various numerical outcomes.
2

Presentation Content – including
The Algebra of Logic
George Boole
Using Boolean Algebra with Binary
Boolean Arithmetic
The Principle of Logic Gates
Gate Representation (AND, OR, NOT)
Inside Logic Gates
3
Logic Gates for Data Movement
Truth Tables
De Morgan’s Theorem
Logic Gates on CMOS
NOR Gates as ‘Universal’
Lecture Summary
Where to Next?

The Algebra of Logic
Boolean algebra, sometimes referred to as the algebra of logic, is a two-valued system of algebra that represents logical relationships and operations.

Historically, the principle of two-value algebra began with the Greek philosopher Aristotle’s bivalent (two-mode) definition of truth with four foundational laws of logic.
4

The Algebra of Logic (2)
Aristotle’s laws of two-value logic:
The Law of Identity (A is A),
The Law of Non-Contradiction (A is not non-A),
The Law of the Excluded Middle (either A or non-A)
The Law of Rational Inference.

These so-called Laws function within the scope of logic where a proposition is limited to one of two possible values.

5

George Boole
George Boole was a teacher and mathematician who lived between 1815 and 1864 – mostly in Lincoln, England. (He lectured in Cork for a few years when the university was called Queen’s College.)

Boole applied algebraic techniques to the logic processes for two-value systems.

He contended that any logical statement could be assigned a binary value, such as “true/false” or “yes/no.”
6

George Boole (2)
Boole discussed ways of reducing logical relationships to simple statements of equality, inequality, inclusion, and exclusion.
He then showed ways to express these statements symbolically using a binary (two-valued) code.
7

George Boole – 2 November 1815 – 8 December 1864

George Boole (3)
He stated the algebraic rules that governed these logical relationships. This system of mathematical logic came to be known as Boolean algebra.

The algebra is based on one or two variables with logic of AND, OR and NOT applied to them as combinations of inputs.

8

George Boole (4)
AND

A (False) AND B (False) -> False
A (True) AND B (False) -> False
A (False) AND B (True) -> False
A (True) AND B (True) -> True

9

George Boole (5)
OR

A (False) OR B (False) -> False
A (True) OR B (False) -> True
A (False) OR B (True) -> True
A (True) OR B (True) -> True
10

George Boole (6)
NOT

A (False) -> NOT A = True

A (True) -> NOT A = False

11

Claude Shannon Takes Boole’s Algebra
In the 1930s, a graduate student of Massachusetts Institute of Technology called Claude Shannon described the Boolean type of algebra as matching the effects of switch and relay-switching circuits, therefore effecting bi-state or bipolar machinery…. 1s and 0s…. Digital things!

(John Von Neumann used these principles in his architecture… more on that soon.)
12

Claude Shannon (2)
13
A (False) AND B (False) -> False
A (True) AND B (False) -> False
A (False) AND B (True) -> False
A (True) AND B (True) -> True
A (False) OR B (False) -> False
A (True) OR B (False) -> True
A (False) OR B (True) -> True
A (True) OR B (True) -> True
A (False) -> True (NOT A)
A (True) -> False (NOT A)
0 · 0 = 0
1 · 0 = 0
0 · 1 = 0
1 · 1 = 1
0 + 0 = 0
1 + 0 = 1
0 + 1 = 1
1 + 1 = 1
0 = 1
1 = 0

Using Boolean Algebra with Binary
The basic principle of Boolean algebra is that a logic variable ‘X’ can have only one of two possible values or states:
X = TRUE or X = FALSE

In binary notation, we can say:
X = TRUE = 1
X = FALSE = 0
This is called positive logic or high-true logic.
14

Using Boolean Algebra with Binary (2)
We might also say:
X = TRUE = 0
X = FALSE = 1
This is called negative logic or low-true logic.

Usually the positive logic convention is used.

15

Using Boolean Algebra with Binary (3)
Electrically, 1 is represented by a more positive voltage than zero and 0 is represented by zero volts.

Very often, on a microprocessor;
X = TRUE = 1 = 0.5 volts (variable up to 0.8)
X = FALSE = 0 = 0 volts (or a small bit over 0)
16

Boolean Arithmetic
With two states for input and output, it turns out that:
Addition WILL work,
Subtraction WILL NOT work,
Multiplication WILL work,
Division WILL NOT work.

N.B. See Two’s Complement from Week 4’s notes
17

Boolean Arithmetic (2)
Addition works as OR:
0 + 0 = 0
0 + 1 = 1
1 + 0 = 1
1 + 1 = 1

1 + 1 can not be 0 – nor can it be 2, so Boolean logic means it has to be 1.
18

Boolean Arithmetic (3)
Multiplication works as AND:
0 x 0 = 0
0 x 1 = 0
1 x 0 = 0
1 x 1 = 1

19

Boolean Arithmetic (4)
A + B reads, A OR B

A x B reads A AND B but since, in mathematics, generally, a dot (•) is used to show multiplication – or nothing at all – the notation of A•B or AB might appear for A AND B.
20

Boolean Arithmetic (5)
NOT:

The outputs in Boolean algebra can be affected by using NOT in the logic – i.e. you can invert inputs, A or B or the output, X by placing a NOT with the input before applying OR or AND, or just after the operation (OR or AND), thereby inverting the output.

21

Boolean Arithmetic (6)
The notation for NOT is a bar, a bubble or a ‘complement apostrophe’.

So, for example, if A = 0 then NOT A = 1 and it might appear as either:
Ā = 1
Å = 1
A’ = 1

22

Boolean Arithmetic (7) Truth Tables
23
A B X
0 0 0
0 1 1
1 0 1

1 1 1

A B X
0 0 0
0 1 0
1 0 0
1 1 1

The OR Truth Table
The NOT Truth Table
The AND Truth Table
A X
0 1
1 0

The Principles of Logic Gates
The binary language used in today’s computers reflects Boole’s binary logic.

Modern computers operate solely on the binary numbers “1” and “0.”

All the instructions that direct a computer’s operation exist as a sequence of such binary digits or bits (0s and 1s).
24

The Principles of Logic Gates (2)
Binary digits (or logical variables) are processed in the machine as distinct voltage states in tiny electronic circuits known as logic gates.

A logic gate only recognizes two varieties of input, high-voltage (value of 1 or TRUE) and low-voltage (value of 0 or FALSE).
25

The Principles of Logic Gates (3)
The truth variables of Boolean algebra use the functionality of the logical concepts of AND, OR and NOT.

It will come as no surprise, therefore, that the logic gates (as electronic mechanisms) are AND, OR, and NOT.

These gates, used in differing combinations, allow the computer to execute all its operations and/or store its data.
26

The Principles of Logic Gates (4)
Each logic gate of the types AND and OR takes in two or more bits in the form of such voltages, combines them according to a built-in rule, and produces a single high-voltage or low-voltage logical conclusion (output).

Note, though, that NOT gates usually have one input and one output.

27

Diagram of the Main Gates
28

Please note that an ‘inverting buffer’ is referred to and used to represent a NOT gate.

Digital Logic
Let A and B denote two propositions. (Here, propositions = declarative sentences that are either true or false but not both true and false at the same time).

If each proposition A and B is associated with a switch that will be closed if the proposition is “true” and open if the proposition is “false” then:
the combined proposition AB (A AND B) may be instantiated by connecting the switches in series.
Thus the output of one switch is the input of the other. Current can flow through the combined circuit if and only if both switches are closed, that is, if both A and B are “true.”
29

Gate Representations – AND
30

Digital Logic (2)
The combined proposition A + B (A OR B) may be instantiated by connecting the switches in parallel.

Thus, with the switches side by side – so that both contribute to the output simultaneously – they can be used to represent the statement, A + B (A OR B).
In this case current will flow if either switch is closed or if both switches are closed. That is, if A or B or both are “true.”
31

Gate Representations – OR
32

Gate Representations – OR (2)
33

Gate Representations – NOT
34

Inside an AND Gate with Truth Table
35

A B X
0 0 0
0 1 0
1 0 0
1 1 1

Inside an OR Gate with Truth Table
36
A B X
0 0 0
0 1 1
1 0 1
1 1 1

Inside an NOT Gate with Truth Table
37

A X
0 1
1 0

Logic Gates for Data Movement
Logic gates are the fundamental building blocks of all digital logic circuits – they are switching circuits that perform certain simple operations on binary signals.

These operations are chosen to facilitate the implementation of functions such as changing 1s to 0s or 0s to 1, filtering 1s only or 0s only, checking for combinations of 1s and/or 0s.

38

Logic Gates for Data Movement (2)
Why? Why are these filterings and conversions important to computer circuits?
That’s the question I asked myself when I first saw logic gates. The reason: these allow for bits and bytes to be shifted through circuits and/or converted in the CPU.
Those shifts and conversions are the fundamental elements of computation. All action associated with components like the BIOS and the CPU need to use logic and algebra.
39

NAND and NOR Gates
On (in) a silicone chip it is simpler to manufacture the combination NOT AND and NOT OR than it is to deal with AND, OR and NOT.

NOT AND becomes NAND.

NOT OR becomes NOR.

40

NAND
41

The NAND gate looks like an AND gate with a ‘NOT bubble’ right on its output.

The NAND Truth Table
42

The NAND Gate notation:
A NAND B = X
or
_____
A ● B = X

This reads as,
“A and B Bar equals X.”
A B X
0 0 1
0 1 1
1 0 1
1 1 0

NOR
43
The NOR gate looks like an OR gate with a ‘NOT bubble’ on its output.

The NOR Truth Table
44

The NOR Gate notation:
A NOR B = X
or
_____
A + B = X

This reads as,
“A or B Bar equals X.”
A B X
0 0 1
0 1 0
1 0 0
1 1 0

The EXCLUSIVE OR Truth Table
45
A B X
0 0 0
0 1 1
1 0 1
1 1 0

The Exclusive OR Gate notation:
A XOR B = X
or
A  B = X

This reads as,
“A x-or B equals X.”

De Morgan’s Theorem
The mathematician, De Morgan had a theorem of logic that, when applied to switching circuits years later, showed that one gate could be made to work like another by inverting inputs and outputs.

There are two parts to the theorem:
46
Augustus De Morgan – 27 June 1806 – 18 March 1871

De Morgan’s Theorem First Part
The complement of two or more variables ANDed is equivalent to the OR of the complements of the individual variables.
___ _ _
(A ● B) = A + B

(NOT A AND B = NOT A OR NOT B)
47

De Morgan’s Theorem Second Part
The complement of two or more variables ORed together is equivalent to the AND of the complements of the individual variables.
___ _ _
(A + B) = A ● B

(NOT A OR B = NOT A AND NOT B)

48

De Morgan’s Theorem (2)
De Morgan’s rules are about what is known as group complementation in Boolean algebra. 

The rules can be expressed as:
the negation of a disjunction is the conjunction of the negations
the negation of a conjunction is the disjunction of the negations

49

50

50

More on DeMorgan’s Theorem

0 0 0 1
0 1 0 1
1 0 0 1
1 1 1 0

0 0 1 1 1
0 1 1 0 1
1 0 0 1 1
1 1 0 0 0

Proof

The truth-tables are equal; therefore, the Boolean equations must be equal.

51

Overview & proof of DeMorgan’s Theorem #1
DeMorgan’s Theorems
Digital Electronics 
2,1 Introduction to AOI Logic
Project Lead The Way, Inc.
Copyright 2009
51

More on DeMorgan’s Theorem (2)

0 0 0 1
0 1 1 0
1 0 1 0
1 1 1 0

0 0 1 1 1
0 1 1 0 0
1 0 0 1 0
1 1 0 0 0

Proof

The truth-tables are equal; therefore, the Boolean equations must be equal.

52

Overview & proof of DeMorgan’s Theorem #2

DeMorgan’s Theorems
Digital Electronics 
2,1 Introduction to AOI Logic
Project Lead The Way, Inc.
Copyright 2009
52

A Comparison Between Boolean and DeMorgan’s Theorems

Commutative Law
Associative Law
Distributive Law
Consensus Theorem

DeMorgan’s
53

Updates of the Boolean Theorems with the addition of DeMorgan’s
DeMorgan’s Theorems
Digital Electronics 
2,1 Introduction to AOI Logic
Project Lead The Way, Inc.
Copyright 2009
53

The Theorem in Gates
54

NAND and NOR Gates on CMOS
NAND and NOR gates are most commonly etched on the die of a CMOS chip.
(CMOS – Complementary Metal-Oxide Semiconductor, the architecture of most computer CPUs and memory modules.)
Why?
55

(Intel)

A
B

A
B

NAND and NOR Gates on CMOS (2)
It is easier to implement groups of NAND and NOR gates on a chip than the AND, OR and NOT gates individually. (Overall, using NAND and NOR causes fewer inputs (a lower gate input count).

Also, they give a convenient conceptual representation.
56

(Intel)

A
B

A
B

NAND and NOR Gates on CMOS (3)
The NAND gate is the natural implementation for CMOS technology in terms of chip area and speed.
The NAND gate is a universal gate: a gate type that can implement any Boolean function:
NOT can be implemented with NAND
AND implemented with the NAND gate
OR using NAND (as in De Morgan’s Theorem)
57

NAND and NOR Gates on CMOS (4)
Similar to the NAND gate, the NOR gate is a universal gate.

With a NOR gate you can implement:
a NOT
an AND
an OR

58

NOR Gates as a ‘Universal’
59
NAND and NOR gates implementing NOT

NOR Gates as a ‘Universal’ (2)
60
NAND and NOR gates implementing AND

NOR Gates as a ‘Universal’ (3)
61
A NOR gate implementing OR

Gates in Circuits
In conclusion;
most of computer functionality is dependent upon (and contained as) electrical circuits. Specifically, electronic circuits.
The mathematical and logical functionality is dependent upon the circuits that are gates.
That functionality accounts for the execution of instructions and the movement/storage of data.
62

End of Boolean Algebra and Logic Gates
That describes the functionality of Boolean algebra representation and the introduction of logic gates. There is more to come on logic gates, but the structure of these circuits have been described in this ‘first part’.

Are there…
ANY QUESTIONS?
63

Where to Next?
NEXT WEEK:
The theme of the next lecture:
“Sequential Logic (Logic Gates)”
Next time we can take a look at sequential logic – an extension of the topics of Boolean algebra and logic gates. More examples of gate arrangements will give more clarity to the subject.
64

Thanks for your attentiveness.

See you here next time. Be safe and well in the meantime.
65

B
A
×

B
A
+

B

B
A
×

B
A
B
A
+
=
×

B
A
×

A

B

A

B

B
A
+

B
A
×

B

B
A
+

B
A
B
A
×
=
+

B
A
×

B
A
+

B
A
+

B
A
+

B
A
+

B
A
×

X

X

9)
1

X

X

8)
X

X

X

7)
1

1

X

6)
X

0

X

5)
0

X

X

4)
X

X

X

3)
X

1

X

2)
0

0

X

1)
=
=
+
=
+
=
+
=
+
=
×
=
×
=
×
=
×

(
)
(
)
(
)
(
)
(
)
(
)
(
)
Y

X

Y

X

14B)
Y

X

Y
X

14A)
Y
X
Y
X
X

13D)
Y
X
Y
X
X

13C)
Y
X
XY
X

13B)
Y
X
Y
X
X

13A)
YZ
YW
XZ
XW
Z
W
Y
X

12B)
XZ
XY
Z
Y
X

12A)
Z
Y
X
Z
Y

X

11B)
Z
XY
YZ
X

11A)
X

Y
Y
X

10B)
X

Y
Y
X

10A)
=
+
+
=
+
=
+
+
=
+
+
=
+
+
=
+
+
+
+
=
+
+
+
=
+
+
+
=
+
+
=
+
=
+
×
=
×

(IBM)

/docProps/thumbnail.jpeg

CS计算机代考程序代写 python University of Lincoln Assessment Framework Assessment Briefing Template 2020-2021

University of Lincoln Assessment Framework Assessment Briefing Template 2020-2021
NOTE: All Assessment Briefings should be made available prior to the commencement of the module, clearly signposted on the module Blackboard site as well as included in any module handbook or briefing document.
Module Code & Title: CMP3750M Cyber Security
Contribution to Final Module Mark: 40%
Description of Assessment Task and Purpose:
Scenario
You are employed by as a Network Security Analyst by an airline. The airline has been
criticised for high flight prices on the back of a £4m profit in the last financial year. The business has recently been under attack from threat actors who have initiated DoS attacks. These attacks have brought the business to a halt with customers not able to access their accounts or book flights. The business has not invested in network security and there is only one firewall between the main servers and the Internet.
Your line manager has asked you to undertake two tasks:
1. Provide create a tool that will identify when a DoS attack is happening. This tool must
be created in Python.
2. Provide a narrative of no more than 1500 words that explains:
a. how the tool operates, including installation and operation details;
b. why it is an important tool for the business;
c. the impact of a DoS attack;
d. and how it will help to protect the business.
Learning Outcomes Assessed:
LO1 critically evaluate the security of a computer system
LO2 differentiate between types of attacks and critically assess their impact
Knowledge & Skills Assessed:
Subject Specific Skills: Planning and design, Subject-specific knowledge,
Professional Graduate Skills: effective time management, working under pressure to meet deadlines, problem solving, logical thinking,
Emotional Intelligence: self-management
Career Focused Skills: network operations, planning and design, device configuration
Assessment Submission Instructions:
Please submit the following:
1. Narrative that explains:
a. how the tool operates, including installation and operation details; b. why it is an important tool for the business;
c. the impact of a DoS attack;
d. and how it will help to protect the business.
Your narrative should not exceed 1500 words.

Date for Return of Feedback:
Format for Assessment:
Please submit:
• A file that contains the narrative
• The tool you have created.
You do not need to submit any files containing the code.
Do not submit files to the supporting documents on Blackboard as these will not be read.
Please submit the narrative element to Blackboard using this format: Font size 11, 1.5 line spacing, 2cm margins, Times New Roman or Arial Please use Harvard referencing.
Feedback Format:
Feedback will be provided on Blackboard.
Additional Information for Completion of Assessment:
Assessment Support Information:
You will need to test this tool to make sure it works. Please make sure that you do this in a sandboxed environment – a VM such as Virtual Box.
Do not under any circumstances test on a live network.
Important Information on Dishonesty & Plagiarism:
University of Lincoln Regulations define plagiarism as ‘the passing off of another person’s thoughts, ideas, writings or images as one’s own…Examples of plagiarism include the unacknowledged use of another person’s material whether in original or summary form. Plagiarism also includes the copying of another student’s work’.
Plagiarism is a serious offence and is treated by the University as a form of academic dishonesty. Students are directed to the University Regulations for details of the procedures and penalties involved.
For further information, see www.plagiarism.org

CS计算机代考程序代写 computer architecture 228-1 Computer Architecture and Technology

228-1 Computer Architecture and Technology

TU856-1 and TU858-1
Computer Architecture and Technology

(Week 4) Tutorial 3

Questions and Answers ( 1 )
How can we consider the use of binary to represent all instructions and data on a computer system?

The two-state number system can be compared with ‘relay’ components and ‘switching’ – which makes up hardware architecture at its most fundamental levels.

Questions and Answers ( 2 )
A discursive investigation of Two’s Complement and how that works mathematically (as a method for subtraction and division).

Two’s Complement number conversion: 110
I.E. how do you find the decimal -1 in 8-bit binary?
0000 00012 (This is decimal 1 in 8-bit binary)
Flip 1s to 0s and 0s to 1s to get One’s Complement
1111 1110
Add a binary 1 to this to get Two’s Complement
1111 1111 (This can represent -12)

Two’s Complement again
Two’s Complement numbers can be used in calculations by the computer to have the effect of subtraction.

A few more examples on the right

Questions and Answers ( 3 )
What use has octal as a number base?

The use of three-place number systems.

Example
Older mainframe computers from 1950 and early 1960s used 12-bit, 24-bit or 36-bit words. Divisible by 3, so 3-bit octal labels fitted them for calculations.

Questions and Answers ( 4 )
What use has hexadecimal as a number base?

The use of four-place number systems.

Example
Computers from 1960s onward use 16-bit, 32-bit or 64-bit words. Divisible by 4, so 4-bit hexadecimal labels fit them for calculations.

Typical Exam (Sub) Question For This Content
Question
Give a brief description of binary, octal and hexadecimal number representations.

Sample solution
Binary
Computing machines operate on electrical current and so use two states. We view these states as the numbers 0 and 1.
This is the binary representation and is called ‘Base 2’.
Whether in decimal or binary, the position of numbers delineate their quantity.

Solution continued
Octal
Grouping binary digits, for example, 010010101110011 into threes looks like this:
9,58710 = |010|010|101|110|0112
The Octal notation for representing Binary numbers uses groups of three bits:
The symbols that are used to represent each group are the same as the integer value of each group. By using these Octal symbols (0 – 7), the number can be expressed in a more compact form:
9,58710 = (2|2|5|6|3) 8

Solution continued
 Hexadecimal
Suppose that we group the example binary digits 010010101110011 into fours. Then this might be written:
9,58710 = |0010|0101|0111|00112

Now the groups of four can be given different symbols.
There are 16 different combinations of four binary digits.
The symbols chosen are the common numerals (0 – 9) and the remaining six possible four-bit combinations are represented by the letters, A, B, C, D, E and F.

(10 marks)

/docProps/thumbnail.jpeg

CS计算机代考程序代写 ocaml B tree interpreter compiler data structure 3/13/2021 assignment3

3/13/2021 assignment3
Important notes about grading:
1. Compiler errors: Programs that cannot be compiled will receive an automatic zero. If you are having trouble getting your assignment to compile, please visit recitation or office hours.
2. Late assignments: Please carefully review the course website’s policy on late assignments, as all assignments handed in after the deadline will be considered late. Submitting an incorrect version before the deadline and realizing that you have done so after the deadline will be counted as a late submission.
Compile and run your code:
After downloading and unzipping the file for assignment3, in your command-line window, cd into the directory which contains src . You should write your code in src/assignment3.ml and src/eval.ml .
Use make to compile the code. Then, use ./imp to excute it.
Submission:
Please submit to Sakai with a zip file including assignment3.ml and eval.ml only.
https://ru-automated-reasoning-group.github.io/cs314_s21/hw/assignment3.html 1/9

3/13/2021 assignment3
Library functions:
You can use any library functions in the List module for this assignment (please do not use library functions in any other modules). You may find the following functions particularly useful, e.g., List.mapi ,
List.mem_assoc , and List.assoc .
Datatypes:
Problems 1-3 are about manipulating tree data structures.
In OCaml, you can define a tree data structure using datatype:
type ‘a tree = Leaf | Node of ‘a tree * ‘a * ‘a tree
We will assume binary search trees in this assignment and can define a bineary search tree insertion
function as the following:
let rec insert tree x =
match tree with
| Leaf -> Node(Leaf, x, Leaf) | Node(l, y, r) ->
if x = y then tree
else if x < y then Node(insert l x, y, r) else Node(l, y, insert r x) We can construct a binary search tree from a list: let construct l = List.fold_left (fun acc x -> insert acc x) Leaf l
Problem 1
We have seen the benefits of the ‘fold’ function for list data structures. In a similar fashion, write a function
fold_inorder : (‘a -> ‘b -> ‘a) -> ‘a -> ‘b tree -> ‘a That does a inorder fold of the tree. For example,
fold_inorder (fun acc x -> acc @ [x]) [] (Node (Node (Leaf,1,Leaf), 2, Node
(Leaf,3,Leaf))) = [1;2;3]
In [ ]:
In [ ]:
let rec fold_inorder f acc t = (* YOUR CODE HERE *)
assert (fold_inorder (fun acc x -> acc @ [x]) [] (Node (Node (Leaf,1,Leaf), 2, Node (Le af,3,Leaf))) = [1;2;3]);
assert (fold_inorder (fun acc x -> acc + x) 0 (Node (Node (Leaf,1,Leaf), 2, Node (Leaf, 3,Leaf))) = 6)
https://ru-automated-reasoning-group.github.io/cs314_s21/hw/assignment3.html 2/9

3/13/2021 assignment3
Problem 2
Given a binary tree, return the level order traversal of its nodes’ values. (i.e., from left to right, level by level).
levelOrder : ‘a tree -> ‘a list list Example: Given a tree t :
the returned result of levelOrder t should be a nested list [[41];[20;65];[11;29;50;91]; [32;72;99]] . The i th elment of the nested list is a list of tree elements at the i th level of the tree t from left to right You may need to define a recursive auxiliary function
In [ ]:
Problem 3
The usual recursive formulation of the function that computes the sum of a tree
let rec sum_tree t =
match t with
| Leaf -> 0
| Node (l, x, r) -> sum_tree l + x + sum_tree r
is not tail-recursive.
For a tail recursive implementation, you may need to use an auxiliary function of type
aux: int -> tree list -> int
Hint: aux acc ts takes as input an accumulator acc as usual and a list of sub-trees obtained from the input tree. The idea is that when the list is empty, acc is the sum of all input-tree elements. Implement a tail recursive function sumtailrec that uses this idea.
sumtailrec: int tree -> int
In [ ]:
In [ ]:
let levelOrder t =
(* YOUR CODE HERE *)
let sumtailrec t =
(* YOUR CODE HERE *)
let tree = construct [2;1;3] in
assert (sumtailrec tree = sum_tree tree)
https://ru-automated-reasoning-group.github.io/cs314_s21/hw/assignment3.html 3/9

3/13/2021 assignment3
Operational Semantics Problem 4
In this problem, you will implement a definitional interpreter (aka an evaluator) for Imp, an imperative language. The language contains variables, integer and boolean types, equality and comparison operations, math and boolean operations, and simple control flow.
Recall from class that a definitional interpreter is a function that takes a program’s abstract syntax tree (AST) and evaluates it to its final result. For Imp, this final result is an environment, which is a map from variables to their (current) values. Imp’s AST is defined by the type com (commands), and environments are defined by the type environment in the file ast.ml . Your job will be to implement the function
eval_command : com -> environment -> environment
which will evaluate the given command (first argument) starting with the given environment (second argument), producing the final environment. Since commands also include (arithmetic and other) expressions, you will also implement the function
eval_expr : exp -> environment -> value
This function evalutes the given expression to its final value in the given environment. Both of your functions
will go in the file eval.ml ; you should not modify any other files that are given to you. ast.ml: Abstract Syntax Trees, Environments, Values
As mentioned above, the Imp AST is defined by the type com in ast.ml . This type has the following definition:
type com =
| While of exp * com
|For ofexp*com
| Cond of exp * com * com
| Comp of com * com
| Assg of string * exp
| Declare of dtype * string | Print of exp
| Skip
Each part of this datatype corresponds to a kind of Imp command. For example, Skip is an empty command. Assg of string * exp corresponds to an assignment command, where the string part indicates a variable name, and the exp part indicates the expression whose value should be assigned to the variable. Comp of com * com is a command that is itself a pair of other commands, with the first executed before the second. We explain this AST in detail in the next section. Note that the types exp and
dtype are part of the Imp AST, too; they are referenced in com ‘s definition above.
In a full-fledged interpreter, a parser is used to convert a normal text file into its corresponding AST. In this project, we provide a parser for you. For example, consider the following input file:
https://ru-automated-reasoning-group.github.io/cs314_s21/hw/assignment3.html 4/9

3/13/2021 assignment3
int n;
int f;
n := 5;
f := 1;
while (n > 1) {
f := f * n;
n := n – 1; };
The parser will take this and produce the following com :
Comp(Declare (Int_Type, “n”), Comp(Declare (Int_Type, “f”),
Comp (Assg (“n”, Number 5), Comp (Assg (“f”, Number 1),
While (LT (Number 1, Var “n”),
Comp(Assg (“f”, Times (Var “f”, Var “n”)),
Assg (“n”, Minus (Var “n”, Number 1))))))))
This com uses Comp to string together each of the commands in the file, one for each line. The Declare(Int_Type, “n”) part corresponds to the first line int n . The Assg(“n”, Number 5)
corresponds to the third line n := 5 . And so on.
We suggest before coding you look at the Imp input examples in /programs . You should be able to get a
sense of how these examples correspond to the com type above.
Also in ast.ml are the definitions of type value and environment. The former is the result from evaluating an expression (i.e., something of type exp ). The latter is a map from variables to values; it keeps track of the current value assigned to a given variable. Here are their definitions:
type value =
| Int_Val of int
| Bool_Val of bool
type environment = (string * value) list
A value (the result of evaluating an expression) is either an integer or a boolean. An environment is a list of pairs, where the first element is a variable name and the second is its current value. This representation is called an association list – the first element of the pair is associated with the second. Elements earlier in the list override elements later in the list. The List module has many useful functions for working with association lists which you should consider using in your implementation.
eval.ml: The Evaluator
To implement the definitional interpreter (i.e., the evaluator) you must implement two functions, eval_expr and eval_com in the file eval.ml . This is the only file you should modify.
eval_com : com -> environment -> environment eval_expr : exp -> environment -> value
Each of these takes as an argument an environment. Evaluating a command might modify an environment (e.g., due to a variable assignment), so eval_com produces an environment as output. Evaluating an expression will never change the environment, and so eval_expr only produces the final value. Both take an environment as input in which the evaluator can look up current values of variables.
https://ru-automated-reasoning-group.github.io/cs314_s21/hw/assignment3.html 5/9

3/13/2021 assignment3
To see these functions in action, to give you a sense of what they do, consider some elements from our example Imp program above.
First, consider Declare (Int_Type, “n”) , a com that represents an Imp command. If we evaluate it in an empty environment [] , we should get an output environment [(“n”, Int_Val 0)] . That is,
eval_expr [] Declare (Int_Type, “n”) = [(“n”, Int_Val 0)] . An integer variable is initialized to 0 by the interpreter.
Now, consider Assgn(“n”, Number 5) , a com that represents an Imp command. We evaluate it in an environment [(“f”, Int_Val 0); (“n”, Int_Val 0)] . We will get an output environment where we have n mapped to Int_Val 5 . That is, eval_com Assign(“n”, Number 5) [(“f”, Int_Val 0); (“n”, Int_Val 0)] = [(“n”,Int_Val 5); (“f”, Int_Val 0); (“n”, Int_Val 0)] .
Now consider LT (Number 1, Var “n”) , an exp that represents an Imp expression. Evaluating it in the environment produced above, we would have eval_expr LT (Number 1, Var “n”) [(“n”,Int_Val 5); (“f”, Int_Val 0); (“n”, Int_Val 0)] = Bool_Val true ; i.e., 5 is indeed greater than 1.
In what follows, we will step through each of the variants of the exp and com types to say what your interpreter should do with them.
The interpreter concerns error cases as well such as addition between a boolean and an integer and therefore represent a stuck reduction. The expected behavior for these irreducible error cases can be boiled down to the following rules:
Any expression containing division by zero should raise a DivByZero error when evaluated.
Any expression or command that is applied to the wrong types should raise a TypeError exception when evaluated, for example, the expression 1 + true would result in TypeError .
An expression or command that assigns to an undefined variable, or reads from an undefined variable should raise a UndefinedVar when evaluated.
The above exceptions are defiend in eval.ml . Evaluation of subexpressions should be done from left to right in order to ensure that lines with multiple possible errors match up with our expected errors.
Function 1: eval_expr
eval_expr takes an expression e and an environment env , and it produces the result of evaluating e , which is something of type value ( Int_Val or Bool_Val ). Here’s what to do with each element of the expr type:
Number
Integer literals evaluate to a Int_Val of the same value.
True, False
Boolean literals evaluate to a Bool_Val of the same value.
Var
An identifier (variable) evaluates to whatever value it is mapped to by the environment. Should raise a UndefinedVar if the identifier has no binding.
Plus, Minus, Times, Div, and Mod
https://ru-automated-reasoning-group.github.io/cs314_s21/hw/assignment3.html 6/9

3/13/2021 assignment3
These mathematical operations operate only on integers and produce a Int_Val containing the result of the operation. All operators must work for all possible integer values, positive or negative, except for division, which will raise a DivByZeroError exception on an attempt to divide by zero. If either argument to one of these operators evaluates to a non-integer, a TypeError should be raised.
Or and And
These logical operations operate only on booleans and produce a Bool_Val containing the result of the operation. If either argument to one of these operators evaluates to a non-boolean, a TypeError should be raised.
Not
The unary not operator operates only on booleans and produces a Bool_Val containing the negated value of the contained expression. If the expression in the Not is not a boolean (and does not evaluate to a boolean), a TypeError should be raised.
Less (Lt), LessEqual (Leq)
These relational operators operate only on integers and produce a Bool_Val containing the result of the operation. If either argument to one of these operators evaluates to a non-integer, a TypeError should be raised.
Equal (Eq)
The equality operator operates both on integers and booleans, but both subexpressions must be of the same type. The operators produce a Bool_Val containing the result of the operation. If the two arguments to the operator do not evaluate to the same type (i.e. one boolean and one integer), a TypeError should be raised.
Function 2: eval_com
eval_com takes a command c and an environment env and produces an updated environment (defined in Types) as a result.
Skip
Skip is short for “no operation” and should do just that – nothing at all. The environment should be returned
unchanged when evaluating a Skip . Comp
The Comp sequencing command is used to compose whole programs as a series of commands. When evaluating Comp , evaluate the first subcommand under the environment env to create an updated environment env’ . Then, evaluate the second subcommand under env’ , returning the resulting environment.
Declare
The Declare command is used to create new variables in the environment. If the type being declared is Int_Type , a new binding to the value Int_Val(0) should be made in the environment. If the type being
declared is Bool_Type , a new binding to the value Bool_Val(false) should be made in the environment. The updated environment should be returned.
Assg
https://ru-automated-reasoning-group.github.io/cs314_s21/hw/assignment3.html 7/9

3/13/2021 assignment3
The Assg command assigns new values to already-declared variables. If the variable hasn’t been declared before assignment, a UndefinedVar should be raised. If the variable has been declared to a different type than the one being assigned into it, a TypeError should be raised. Otherwise, the environment should be updated to reflect the new value of the given variable, and an updated environment should be returned.
Cond
The Cond command consists of three components – a guard expression, an if-body command and an else- body command. The guard expression must evaluate to a boolean – if it does not, a TypeError should be raised. If it evaluates to true , the if-body should be evaluated. Otherwise, the else-body should be evaluated instead. The environment resulting from evaluating the correct body should be returned.
While
The While command consists of two components – a guard expression and a body command. The guard expression must evaluate to a boolean – if it does not, a TypeError should be raised. If it evaluates to true , the body should be evaluated to produce a new environment and the entire loop should then be
evaluated again under this new environment, returning the environment produced by the reevaluation. If the guard evaluates to false , the current environment should simply be returned.
For
The For loop command consists of two components – a guard expression and a body command. Upon entering the loop, the guard expression a is evaluated in the current environment, yielding an integer n . If the int value n is 0, the loop body is not executed at all. If n is greater than 0, then the loop body is executed n times. No action in the body of the loop, such as assigning to a variable, can change the number of times the loop is executed, nor does executing the body alone change the value of any variable except by explicit assignment.
Parser
A real interpreter has two parts: a parser, which takes text files and converts them into an AST, and an evaluator, which runs the AST to produce a final result. Your job has been to implement the evaluator. To help you test this evaluator, we are providing code that will do parsing for you. The function load in
assignment3.ml will take in a text file and produce an AST of type com . Our parser (called in load ) expects an input file whose contents is a command (or a sequence of commands, which will be parsed as a single Comp ). We suggest before coding you look at the Imp input examples in /programs . You should be able to get a sense of how these examples correspond to the com type above. Our code in the function
eval in assignment3.ml passes the parsed AST to your eval_command function with an empty environment.
In the following example, we parse a program from file and apply the eval function to obtain the evaluation result of the loaded program. We use a given function print_env_str to convert the output environment to string (after removing shadowed variables in the environment). The program is:
int x;
int y;
x := 5;
x := x + 5;
y := 5 + 5;
y := 5 + x;
Obviously, after evaluation, x = 10 and y = 15 .
https://ru-automated-reasoning-group.github.io/cs314_s21/hw/assignment3.html 8/9

3/13/2021 assignment3
In [ ]:
let parsed_ast = load (“programs/aexp-add.imp”) in let result = print_env_str(eval (parsed_ast)) in assert(result =
“- x => 10\n\
– y => 15\n”)
One thing you may need to pay attention to for programming the interpreter is nested pattern matching. You will need parentheses to clearly delimit scoping, e.g.,
match e with
| Plus (e1, e2) ->
let r1 = eval_expr e1 env in
let r2 = eval_expr e2 env in
(match r1, r2 with
| Int_Val i, Int_Val j -> …
| _ -> …) (* The parentheses are needed to delimit the scope of the two pa
ttern-matchings *)
| Minus (e1, e2) -> …
https://ru-automated-reasoning-group.github.io/cs314_s21/hw/assignment3.html 9/9

EEC173A/ECS152A-Computer Networks Winter 2021
Instructor: Chuah

Name: _____________________________

Student ID#: ________________________

24-hour take-home final assessment: March 17

Duration: 24 hours (March 17, 8:00am-March 18, 8:00am)
• This take-home final exam is open book, open notes. Discussing with others in any form is not allowed.
• Each problem may carry different number of points. Try to solve as many as you can.
• Show your reasoning clearly. If your reasoning is correct, but your final answer is wrong, you will receive partial credits. If you just show the answer without reasoning, and your answer is wrong, you may receive no points at all.
• Submit your multiple-choice and numerical answers for all questions on Canvas Quiz. At the end of the Canvas Quiz, there will be a question that allows you to upload a file to show your work. You can use this document to type your response and upload. Alternatively, you write your response on paper, scan it, and upload.
• Be sure to sign the statement of academic integrity on the second page.
• To avoid possible time out from Canvas Quiz or potential network connectivity problem, I highly recommend that you download the word or pdf version of the final so that you have an offline copy to work on either on your computer or with paper & pencil. Once you are confident about your answers, you can login to Canvas Quiz and enter your answers online, as well as upload a copy of your work (see #4 above).

Good luck!

Do not write in this box.
Problem Set
1
2
3
4
5
6
Total
Full Points
20
20
16
12
12
20
100
Points


Statement of academic integrity (please sign below)

As a student at UC Davis, I hold myself to a high standard of integrity, and by signing/accepting the statement below I reaffirm my pledge to act ethically by honoring the UC Davis Code of Academic Conduct. I will also encourage other students to avoid academic misconduct.

I acknowledge that the work I submit is my individual effort. I did not consult with or receive any help from any person or other source. I also did not provide help to others. I may work with others only if the instructor gave specific instructions, and only to the extent allowed by the instructor.

I understand that suspected misconduct on this assignment/exam will be reported to the Office of Student Support and Judicial Affairs and, if established, will result in disciplinary sanctions up through Dismissal from the University and a grade penalty up to a grade of “F” for the course.

I understand that if I fail to acknowledge or sign this statement, an instructor may not grade this work and may assign a grade of “0” of “F”.



Signature (sign/type above this line)
Problem Set 1. Multiple Choice Questions (20 points)
In each of the following questions, select the correct answer.
Q1. (4 pt.) Circuit switching has the advantage of
• Predictable performance
• Not requiring a signaling phase
• Very efficient allocation of shared resources
• None of the above

Q2. (4 pt.) Consider an IEEE 802.11 network that use RTS and CTS frames. Suppose that node A and node B are hidden from each other and both nodes want to transmit to C. Both A and B can hear node C, and node C can hear both A and B. Which of the following statement is FALSE?
• Both node A and node B will send RTS packets, and these RTS packets may collide.
• If the RTS packets from A and B collide, node C will not respond with a CTS.
• After sending an RTS packet, if node A does not hear CTS after time-out, it will attempt to resend RTS.
• Even if node C sends a CTS after receiving RTS from node A (without collision), subsequent data transmission from A can still collide with transmission from node B.
.

Q3. (4 pt.) Consider the following deployment scenario of a network address translation (NAT) box.
Host 128.119.171.186
Host 128.119.171.186
Suppose that the host with IP address 10.0.1.16 sends an IP datagram destined to host 128.119.171.186. The source port is 3394, and the destination port is 80. Consider the datagram at step 2, after it has been transmitted by the router. What is the source and destination IP addresses for this datagram?

• Src: 10.0.1.16; Dest: 128.119.171.186
• Src: 10.0.1.16; Dest: 135.12.197.207
• Src: 135.122.197.207; Dest: 128.119.171.186
• Src: 128.119.171.186; Dest: 135.122.197.207

Q4. (4 pt.) Consider the router and the two attached subnets below (A and B). The number of hosts for each subnet is also shown below. Assume that this organization has been allocated the following address space: 210.85.80.0/23.

Assign subnet addresses to each of the subnets (starting with A, then B) so that the amount of address space assigned is minimal, and at the same time leaving the largest possible contiguous address space available for assignment if a new subnet were to be added. What will be the CIDR entry for subnet A (which has 21 hosts)?
• 210.85.80.0/23
• 210.85.80.0/24
• 210.85.80.0/27
• 210.85.80.0/28

Q5. (4 pt.) Consider the following figure. Assume distant vector routing algorithm is used to find the shortest paths. Which of the following statement is FALSE?

• Each node learns about the global topology through link state packets from other nodes.
• Node u learns about the best paths to other nodes by exchanging distance vectors with v.
• Node u will announce to v that its path costs to w, x, and y is infinity.
• Node x will announce to w that its path costs to v and u is infinity.

Problem Set 2. End-to-end Delay, Web, and File Distribution (20 points)
Q6. (4 pt.) Consider a packet of length 1,000 bytes that begins at end system A and travels over three links to a destination end system. These three links are connected by two packet switches. Assume that the propagation speed on all three links is 2.5 x 108 m/s, the lengths of these links are 5,000km 4,000km, and 1000km, respectively. The transmission rates for the three links are 2Mbps, 10Mbps, and 1Mbps, respectively. The switch processing delay is 3msec. What is the end-to-end delay?

Q7. (4 pt.) Browser Caching. Consider an HTTP server and client as shown in the figure below. Suppose that the RTT delay between the client and server is 20 msecs; the time a server needs to transmit an object into its outgoing link is 0.75 msecs; and any other HTTP message not containing an object has a negligible (zero) transmission time. Suppose the client makes 40 requests, one after the other, waiting for a reply to a request before sending the next request.
Assume the client is using HTTP 1.1 (persistent HTTP) and the IF-MODIFIED-SINCE header line. Assume 60% of the objects requested have NOT changed since the client downloaded them (before these 40 download requests are performed).
How much time elapses (in milliseconds) between the client transmitting the first request, and the completion of the last request (ignoring TCP connection setup time)?

Q8. (4 pt.) Explain how the choice of TTL value affects the performance of DNS cache performance.

Q9. (4 pt.) Suppose that we need to distribute a file of size F =2Gbits to 5 peers. Suppose the server, s, has an upload rate of u=100Mbps.
The 5 peers have upload rates of: u1 = 20 Mbps, u2 = 25 Mbps, u3 = 14 Mbps, u4 = 10 Mbps, and u5 = 15 Mbps
The 5 peers have download rates of: d1 = 29 Mbps, d2 = 33 Mbps, d3 = 15 Mbps, d4 = 20 Mbps, and d5 = 27 Mbps
What is the minimum time needed to distribute this file using client-server model? What is the limiting factor that contributes to the downloading time using client-server model: the ‘server upload rate’, ‘specific client download rate’, or the ‘combined upload of the clients and the server’?

Q10. (4 pt.) Consider the same network connectivity in Question 9. Now assume this file is distributed using peer-to-peer download. What is the minimum time needed to distribute this file? What is the root cause that contributes to the downloading time? The ‘server upload rate’, ‘specific client download rate’, or the ‘combined upload of the clients and the server’?

Problem Set 3. TCP (16 points)
Imagine a TCP session over wireless where the congestion window is fixed at 5 segments (congestion control is turned off and no fast retransmits). Segments may get lost but are not reordered. Assume the sender has a long file (hence continuous byte stream) to send. The receiver has close-to-infinite buffer and it sends an acknowledgment as soon as it receives a segment, i.e., acknowledgments are not deferred. Similarly, sender transmits a segment as soon as it can. Each segment carries 1000 bytes and the time to transmit a segment is 2 ms. Assume that transmission of ACK takes negligible time. Note that the retransmission timer for a segment is started after the last bit of the segment is sent. Assume Go-Back-5, and accumulative ACK are used.
Q11. Suppose two data segments with byte sequence numbers 3000 and 15000 are lost once during the transmission. How many segments get retransmitted under each of the following conditions?
• (4 pt.) Round trip time = 100 ms, Timeout = 102 ms
• (4 pt.) Round trip time = 100 ms, Timeout = 152 ms

Q12. Suppose acknowledgments corresponding to the above data segments are lost instead of the data segments. How many segments get retransmitted under the above conditions?
• (4 pt.) Round trip time = 100 ms, Timeout = 102 ms
• (4 pt.) Round trip time = 100 ms, Timeout = 152 ms 














Problem Set 4. Network Layer (12 points)
Please justify your answers by qualitative or quantitative arguments or illustrations. 

Q13. (4 pt.) This question explores how to set the (configurable) link weights in link-state routing protocols like OSPF and IS-IS inside a single Autonomous System (AS). How should the network operators set the link weights if their goal is to minimize the number of hops each packet traverses to reach its destination? 



Q14. (4 pt.) Similar to the previous question, let us ponder how to assign link weights in link-state routing protocols like OSPF and IS-IS inside a single AS domain. How should the operators set the link weights to minimize the end-to-end delay the traffic experiences? Assume the network is lightly loaded, so queuing delay is insignificant.



Q15. (4 points) In the picture below, the nodes are routers, the edges are links, and the integers correspond to the link weight on each direction of the link. The arrows on the edges to show the shortest path from every node to the destination node D. Dotted lines are links that are not part of the shortest paths to node D.
A
B
C
D
E
F
G
H
I
5
5
5
5
5
5
9
9
10
10
11
11
A
B
C
D
E
F
G
H
I
5
5
5
5
5
5
9
9
10
10
11
11

Suppose the link F-I is overloaded with traffic. Identify a single weight change (on just one link) that would divert traffic from source I to destination D away from the F-I edge without affecting the path between any other source-destination pairs. Avoid any reliance on how routers choose between multiple paths with the same (smallest) cost.

Problem Set 5. Link-State Routing (12 points)
Q16-18. Consider the following network. The link cost represents the probability of failure for that link, e.g., pAC be the probability that link (A,C) fails = 0.01. Assume that failure events on different links are independent of one another. Consider how Dijkstra’s algorithm can be modified to find the most reliable path from node A to every other node on the graph, that is, path for which the probability that all its links stay intact during the connection’s lifetime is maximal.

Let Pk(A,B) be the probability that a path k from A to B = (A, …, B) remains intact (probability that the path does not fail). The goal is to find the best path k with maximum reliability, i.e., highest Pk(A,B). 

Q16. (4 pt.) What is the probability for the path from A to E via C (ACE) being intact?

Q17. (4 pt.) What is the probability for the path from A to E via D (ADE) being intact?

Q18. (4 pt.) What is the most reliable path from A to E? 


Problem Set 6. Link layer: Error Detection, MAC, & Self-Learning Switch (20 points)

Q19. (6 pt.) Suppose we want to transmit the message 10100101 and protect it from errors using the CRC polynomial x3+1. Encode the data bit sequence using the generator polynomial and give the code word.

Q20. (4 pt.) Interaction between MAC-layer and TCP in Wireless Environment. Wireless link-layer (MAC-layer) relies on local retransmission to recover packets lost due to varying channel conditions (e.g., fading, collisions, etc.). Typically, the MAC-layer will try retransmitting a packet N times before giving up. How does the local MAC-layer retransmission help enhance the throughput performance of the end-to-end TCP connection?


Q21. (4 pt.) If local retransmission at link layer is beneficial as described in Question #20 above, it seems that one should choose a large N to increase the chances of successful repair. Give two reasons or scenarios where it is *not* desirable to have a large N.

Q22. (6 points) Consider the operation of a learning switch in the context of a network in which 6 nodes labeled A through F are star connected into an Ethernet switch.
A
D
B
E
C
F
1
2
3
4
5
6
A
D
B
E
C
F
1
2
3
4
5
6

Suppose that (i) B sends a frame to E, (ii) E replies with a frame to B, (iii) A sends a frame to B, and (iv) B replies with a frame to A. The switch table is initially empty. Show the state of the switch table and after each of these events. For each of these events, identify the interfaces(s) on which the transmitted frame will be forwarded. If nothing happens, leave the table entry blank or enter ‘-‘. Please fill out the blanks of red brackets number. Enter your answers as follow:
(1) XXX
(2) XXX
(3) XXX
…
(10) XXX

Event
Switch Table
Interface(s) on which transmitted frame will be forwarded for each event.

MAC Address
Interface

(i)
B
2

(ii)

(iii)

(iv)