# CS代考 CS 15-213, Fall 2004 Final Exam – cscodehelp代写

Instructions:
Full Name:
CS 15-213, Fall 2004 Final Exam

December 16, 2004
• Make sure that your exam has 24 pages and is not missing any sheets, then write your full name and Andrew login ID on the front.
• The exam has a maximum score of 123 points.
• The problems are of varying difficulty. The point value of each problem is indicated. Pile up the easy
points quickly and then come back to the harder problems.
• This exam is OPEN BOOK. You may use any books or notes you like. You may not use a calculator,
laptop or any other electronic or wireless device. Good luck!
TOTAL (123):

Problem 1. (10 points):
Consider the following 16-bit floating point representation based on the IEEE floating point format:
• There is a sign bit in the most significant bit.
• The next k = 7 bits are the exponent. The exponent bias is 63. • The last n = 8 bits encode the significand.
Numeric values are encoded in this format as a value of the form V = (−1)s × M × 2E , where s is the sign bit, E is exponent after biasing, and M is the significand.
Answer the following problems using either decimal (e.g., 1.375) or fractional (e.g., 11/8) representations for numbers that are not integers.
For denormalized numbers:
A. What is the smallest value E of the exponent after biasing? B. What is the largest value M of the significand?
For normalized numbers:
A. What is the smallest value E of the exponent after biasing? B. What is the largest value E of the exponent after biasing? C. What is the smallest value M of the significand?
D. What is the largest value M of the significand?

Suppose we want to implement this representation in C. One way to do this would be with the following struct. The 16-bit float is stored as two unsigned chars (assumed to be 8 bits each).
typedef struct
unsigned char exps;
unsigned char frac;
} float16 ;
You are now asked to finish implementing a function that takes a float16 and divides it by 2. The value returned by div2 should follow the IEEE format restrictions that we described earlier for this 16-bit number. You may assume that the argument f is non-negative.
float16 div2(float16 f)
/* save copies so we can modify struct fields */
unsigned char frac = f.frac;
unsigned char exps = f.exps;
/* check for infinity and NaN */ if ( exps == )
/* in that case, division gives us same number */
f.frac = frac;
f.exps = exps;
/* check for denormalized numbers */ if ( exps == )
f.frac = ;
f.exps = ; }
/* check for a normalized number that becomes denormalized */
if ( exps {
return f; }
) /* remaining cases */

Problem 2. (12 points):
This problem concerns the way virtual addresses are translated into physical addresses. Imagine a system with the following parameters:
• Virtual addresses are 20 bits wide.
• Physical addresses are 18 bits wide.
• The page size is 4096 bytes.
• The TLB is 2-way set associative with 16 total entries.
The contents of the TLB and the first 32 entries of the page table are shown as follows. All numbers are given in hexadecimal.
1 2 3 4 5 6 7
Tag PPN Valid
16 13 1 1B 2D 1 100F1 0F1E0 1F011 111F0 032B1 1D230 06081 0F191 0A091 1F 20 1 02 13 0 18 12 1 0C 0B 0 1E 24 0
Page Table
VPN PPN Valid VPN PPN Valid
00 17 1 10 26 0 01 28 1 11 17 0 02141120E1 030B013101 04260142D0 05131151B0 060F116311 0710117120 081C018231 0925119040 0A 31 0 1A 0C 1 0B 16 1 1B 2B 1 0C 01 1 1C 1E 0 0D 15 1 1D 3E 1 0E 0C 0 1E 27 1 0F 14 0 1F 18 1

1. The diagram below shows the bits of a virtual address. Please indicate the locations of the following fields by placing an ’X’ in the corresponding boxes of that field’s row. For example, if the virtual page offset were computed from the 2 most significant bits of the virtual address, you would mark the ’O’ (offset) column as shown:
19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
O The virtual page offset N The virtual page number
I The TLB index T The TLB tag
19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 O
2. The diagram below shows the format of a physical address. Please indicate the locations of the following fields by placing an ’X’ in the corresponding boxes of that field’s row.
O The physical page offset N The physical page number
17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0 O

For the given virtual addresses, please indicate the TLB entry accessed and the physical address. Indicate whether the TLB misses and whether a page fault occurs. If there is a page fault, enter “-” for “PPN” and leave the physical address blank.
1. Virtual address (one bit per box)
19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
Parameter Value VPN 0x TLB Index 0x TLB Tag 0x
Parameter Value TLB Hit? (Y/N)
Page Fault? (Y/N)
3. Physical address(one bit per box)
17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
1. Virtual address (one bit per box)
19 18 17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0
Parameter Value VPN 0x TLB Index 0x TLB Tag 0x
Parameter Value TLB Hit? (Y/N)
Page Fault? (Y/N)
3. Physical address(one bit per box)
17 16 15 14 13 12 11 10 9 8 7 6 5 4 3 2 1 0

Problem 3. (14 points):
Short-answer questions, two points each. Write at most one line’s worth of text to each question. Make sure that your answer is legible: use the scratch space at the top until you’re sure of your answer. We will not grade your scratch space.
A. Briefly explain why a segregated list allocator is almost always faster than an equally well-written explicit freelist allocator.
B. Name the most important thing a process has that a thread lacks.
C. Name one advantage of increasing the page size for virtual memory. Name one disadvantage.
E. Normally,theexponentfieldofafloatingpointnumberisinterpretedasE−bias;butindenormalized numbers (when E = 0), it is interpreted as 0 − bias + 1. Explain why.
F. In network code using sockets, what condition would cause your process to receive a SIGPIPE?
G. I’m writing a code to run a large simulation. I have made changes which I hope will halve the running time, and I try to get timings on the program by running it on my laptop. After one hour I suspend it (which sends SIGSTOP to all programs) and go home, where some hours later I unsuspend it (sending SIGCONT); the program terminates a few minutes later. Name one timing measure which should still indicate a large speedup, and another which will be useless.

Problem 4. (6 points):
This problem tests your understanding of signals, setjmp() and longjmp().
#include
#include
#include
#include
struct list_s {
struct list_s *next;
static jmp_buf jmpEnv;
static void handleSEGV(int signal) {
longjmp(jmpEnv, 1);
struct list_s *pushlist(struct list_s *head, int value) {
struct list_s *ele;
ele = (struct list_s *)malloc(sizeof(struct list_s));
if(ele == NULL) {
ele->data = value;
return ele; }
volatile int sum = 0; /* ’sum’ is not kept in a register */
setjmp(jmpEnv);
if(sum) return sum;
while(1) {

int main() {
struct list_s *list = NULL;
signal(SIGSEGV, handleSEGV);
list = pushlist(list, 2);
list = pushlist(list, 4);
list = pushlist(list, 6);
printf(“%d
”, func(list));
return 0; }
A. What is printed by this program?
B. Explain briefly what happens if you call func(NULL).

Problem 5. (8 points):
A. You are given the following loop:
for(prod = 1, i = 0; i < 128; i++) { prod = prod * vec[i]; That compiles into the following assembly code: movl \$0x1, %eax movl \$0x0, %esi imul 0xfffffde8(%ebp,%esi,4),%eax cmpl \$0x80, %esi Calculate the CPE (cycles per element) of this loop assuming 4 cycles for a multiply instruction, 3 cycles for a memory load, and 1 cycle for all other instructions. The processor is fully pipelined, and can issue multiple instructions every cycle. For simplicity, assume there is an unbounded number of functional units. Also assume that the processor makes no branch mispredictions. B. Assume the loop is then unrolled as follows: for(prodA = 1, prodB = 1, prodC = 1, prodD = 1, i = 0; i < 126; i += 3) { prodA = prodA * vec[i]; prodB = prodB * vec[i+1]; prodC = prodC * vec[i+2]; prod = vec[i] * vec[i+1]; prod *= prodA * prodB * prodC; That compiles into the following assembly code: movl \$0x1, %eax movl \$0x1, %ebx movl \$0x1, %ecx movl \$0x0, %esi imul 0xfffffde8(%ebp,%esi,4),%eax imul 0xfffffdec(%ebp,%esi,4),%ebx imul 0xfffffdf0(%ebp,%esi,4),%ecx addl 0x3, %esi cmpl \$0x7D, %esi movl 0xfffffde8(%ebp,%esi,4),%edx imul 0xfffffdec(%ebp,%esi,4),%edx imul %ebx, %eax imul %ecx, %edx imul %edx, %eax Calculate the CPE of this unrolled loop using the same assumptions from part 1. Problem 6. (8 points): This problem deals with the bit representation of integer values. Assume that all values are 32 bits in length and stored in two’s complement form. A. Usingonlyconstantsandthe<> 16) | (x << 16); int b = a | (a >> 8) | (a << 8); int c = b | (b >> 4) | (b << 4); int d = c | (c >> 2) | (c << 2); int e = d | (d >> 1) | (d << 1); return e; } A. Whatistheoutputoffunc(12)? B. Whatistheoutputoffunc(0)? C. Usingonlyconstants,thefunc()function,andthe ̃and&operators, provide code for: x = (a == 0) ? 1 : 0. Problem 7. (8 points): Time to hand-optimize some code. dot(const struct vector *a, const struct vector *b) { int sum = 0; for(i = 0; i < N; i++) { sum += a->x[i] * b->x[i];
return sum;
#define N 15213
2 struct vector {
This code is computing the inner product of two vectors. Unroll the loop (doing two operations per iteration).
int dot(const struct vector *a, const struct vector *b) {

struct vector {
dot(const struct vector *a, const struct vector *b) { intsum=0;
assert(a->n == b->n);
for(i=0;in;i++){ sum += a->x[i] * b->x[i];
return sum;
This code again computes the inner product of the vector arguments, but this time the vectors have variable length. Find three loop-invariant computations and lift them out of the loop.
int dot(const struct vector *a, const struct vector *b) {
C. Iliftedthethreeloopinvariantcomputationsoutoftheloop,thentypedgcc-O2dot.c-odot but I didn’t notice any speedup. Why not?

Problem 8. (9 points):
This problem will test your understanding of x86 assembly code. Consider the following C code:
struct list {
struct list *next;
struct list *
get_ith_elem(struct list *head, int i) { /*assume i < length of list */ int j = 0; struct list *node = head; while(i != j) { node = node->next;
return node;
A. Using the following skeleton, fill in the assembly code that realizes the C function above. Do not use C variable names.
1 get_ith_elem:
2 push %ebp
3 movl %esp,%ebp
4 movl 0xc(%ebp),%ecx
5 movl 0x8(%ebp),%eax
6 movl \$0,%edx
7 cmpl %edx,%ecx
8 je _______
10 incl _______
11 cmpl %edx,_______
12 movl (_______),%eax
13 jne .L1
B. For each C variable, write the register with which it corresponds in the above assembly code.
i ________
j ________
node ________

C. Suppose instead we wanted to retrieve the ith node’s data field. That is, the final line of the C code would become return node->data. In assembly, this can be accomplished by adding one in- struction. What is that instruction and between what lines would it be placed?

Problem 9. (9 points):
This question will test your understanding of concurrent programming, specifically deadlocks. For these questions, assume each function is executed by a unique thread on a uniprocessor system.
A. Consider the following C code:
2 P(lock1);
3 P(lock2);
4 P(lock3);
6 /* do some work */
8 V(lock2);
9 V(lock3);
10 V(lock1);
13 P(lock1);
14 P(lock2);
15 P(lock3);
17 /* do some work */
19 V(lock1);
20 V(lock2);
21 V(lock3);
Does this code contain a deadlock? If so, write a sequence of line numbers that, when executed in that order, will cause the deadlock.
B. Consider the following C code:
2 P(lock1);
3 P(lock2);
4 P(lock3);
6 /* do some work */
8 V(lock3);
9 V(lock2);
10 V(lock1);
13 P(lock3);
14 P(lock2);
15 P(lock1);
17 /* do some work */
19 V(lock3);
20 V(lock2);
21 V(lock1);
Does this code contain a deadlock? If so, write a sequence of line numbers that, when executed in that order, will cause the deadlock.

C. Consider the following C code:
2 P(lock1);
3 P(lock2);
5 /* do some work */
7 V(lock2);
8 V(lock1);
20 P(lock2);
21 P(lock3);
23 /* do some work */
25 V(lock3);
26 V(lock2);
11 P(lock3);
12 P(lock1);
14 /* do some work */
16 V(lock1);
17 V(lock3);
Does this code contain a deadlock? If so, write a sequence of line numbers that, when executed in that order, will cause the deadlock.

Problem 10. (13 points):
The code shown below implements a simple client/server communication protocol. In this protocol, the server sends a block of data using the following format:
Content-Length:
where is the number of bytes, and is the set of bytes to be sent. The first line is terminated by a newline (’
’) character.
Here is an initial implementation of the function used by the server to send one block of data:
1 /* Send block of data to client */
2 void server_write(int clientfd, char *data, int len) 3{
4 char buf[MAXBUF];
5 sprintf(buf, “Content-Length: %d
%s”, len, data);
6 Rio_writen(clientfd, buf, strlen(buf));
Here is an initial implementation of the function used by the client to receive all the blocks of data from the server:
1 /* Read data sent by server */ 2 void client_read(int serverfd) 3{
4 rio_t rio;
5 char buf[MAXBUF];
6 int n, len;
8 while (Rio_readlineb(&rio, buf, MAXBUF) > 0) {
9 if (sscanf(buf, “Content-Length: %d”, &len) > 0) {
buf[n] = ’’; /* Terminate string */
”, buf);
printf(“Couldn’t determine content length
”);

As a test, the following function runs a session in which the server sends as data the strings “Hello” and “World!”:
1 void server_session(int clientfd) 2{
server_write(clientfd, words[i], strlen(words[i]));
sleep(1); /* Ensures that writes don’t get combined */
The call to sleep on line 7 of this function simply ensures that each call to server_write will generate a separate packet transmission from the server to the client. Otherwise, the operating system tries to coalesce the results of multiple writes into a single packet.
char *words = {“Hello”, “World!”}; inti;
for(i=0;i<2;i++){ When we run the server and client, the client prints the following result: Client received data: ’Conte’ Couldn’t determine content length Couldn’t determine content length A. Explain briefly why the session didn’t work the way it should. Include in your explanation why the first block of data received by the client was ’Conte’ rather than ’Hello’. B. Show how you could modify the client code to reliably process messages sent according to the pro- tocol. Describe your modification of function client_read, referring to the line numbers of the existing code. Give a brief explanation of how this change will make the session work properly. C. Suppose you have already distributed the client code to millions of customers. You can’t repair a buggy client, but you can modify the server code to compensate for the client’s shortcomings. De- scribe your modification of function server_write, referring to the line numbers of the existing code. Give a brief explanation of how this change will make the session work properly. D. What serious security flaw exists in the client function? Describe briefly how it could be exploited by a malicious server. Problem 11. (10 points): Suppose we have a simple in-order pipeline processor that blocks on data misses. The data cache of this processor has the following configurations: Cache size Cache line Cache associativity Cache latency Latency between cache and memory 4-way, uses LRU replacement policy H cycles Table 1: “Cache latency” is the time to fetch data from the cache to the CPU registers; “Latency between cache and memory” is the time to fetch a cache block from the memory into the cache. The following C code sequentially accesses D elements of array A for 100 times. To simplify your reasoning, assume the only memory accesses are to the entries of A; A starts at memory address 0; the cache is initially empty before the outer loop; sizeof(char)=1. char A[N]; /* N >> D */
int sum = 0;
for (i = 0; i < 100; i ++) { for (j = 0; j < D; j ++) { sum += A[j]; We run the above code with various values of D and get the following graph. The horizontal axis is the total number of bytes in the D elements and the vertical axis is the average latency to access an element in the array A. ZM Increasing values of D Determine the following values with the configuration parameters in Table 1 (C, B, 4, H, and P) A.Y=_______________ Z=_______________ B. Number of cache lines = _________________ C. Number of cache sets = _________________ D. M = _______________ E. What’s the hit rate when D > M (in the range of the second plateau)? hit rate = _________ F. X = _______________
Average access latency (cycles)

Problem 12. (6 points):
This question tests your understanding of file sharing on UNIX systems. Suppose “letter.txt” contains “abcdefg…xyz” and “number.txt” contains “0123…789”. Consider the following code. (We don’t check return values for space reasons. Assume all functions return normally.) Write down the values of “ch” in the comments (an example answer is given at line 13) and answer a short question.
1 int main(int argc, char *argv[]) 2{
3 int fd1, fd2, fd3, fd4;
4 char ch;
6 fd1 = open(“number.txt”, O_RDONLY);
7 fd2 = open(“number.txt”, O_RDONLY);
9 fd3 = open(“letter.txt”, O_RDONLY);
10 fd4 = dup(fd3); 11