CS 124 Midterm Review 3/7/20

You will have 75 minutes in class to complete the exam. The exam will have true/false questions, multiple choice, example/counterexample problems, run-this-algorithm problems, and problem set style present-and-prove problems.

The exam is difficult. In previous years, the median score has been around 50%.

2 Topics Covered

Copyright By cscodehelp代写 加微信 cscodehelp

This is a pretty comprehensive list of topics we have gone over this first half of the semester, but anything said in lecture is fair game for the exam.

2.1 Math Fundamentals

• Coin Flipping and basic statistics.

• Induction. If P(n) is a statement (“2n is even”), P(1) is true, and ∀n, P(n) → P(n + 1),

then ∀n, P(n) is true.

• Big-O Notation. o, O, ω, Ω, Θ and identifying which of these relations hold for two functions.

• Recurrence Relations. Solve simple ones by finding a pattern and proving them with induction. More complicated recurrences can be solved using the Master Theorem (must memorize)

2.2 Graph Search

• Representation. Adjacency list versus adjacency matrix

• Depth First Search (DFS). Uses a stack to process nodes, Pre and post order numbers, labels for the edges (tree, forward, back, cross).

Important Applications. Detecting cycles, topological sorting, finding strongly connected components

• Breadth First Search (BFS). Uses a queue to process nodes, can be used to find the shortest path when all edge weights are 1.

• Dijkstra’s Algorithm. Single source shortest path for non-negative edge weights. Uses a heap or priority queue to process nodes. Does not work when there are negative edge weights

• Heaps. binary heap implementation, operations: Peek, ExtractMin, Insert (Min-

Heapify), how they are used in Dijkstra’s algorithm

• Bellman-Ford Algorithm. Single source shortest path for general edge weights, edge

relaxation procedure (referred to as update in the lecture notes), detecting negative cycles

• Shortest Path in DAG. Can be done in linear time via dynamic programming regardless of edge weights.

2.3 Minimum Spanning Trees

• Basic Properties. Connected, acyclic, |E| = |V | − 1 (any two imply the third).

• Cut Property: The basis for why our MST algorithms are correct, allows us to greedily add

edges to a sub-MST.

• Prim’s Algorithm: implemented in a similar fashion as Dijkstra’s, builds MST from a single vertex.

• Kruskal’s Algorithm: implemented using a union-find data structure, builds MST from lightest edges of the graph

• Disjoint Forest Data Structure: maintain a bunch of sets that can be combined efficiently Operations: makeset(x), find(x), union(x,y)

Heuristics: Union by Rank, Path Compression

2.4 Greedy

• Main Idea: At each step, make a locally optimal choice in hope of reaching the globally optimal solution.

• Horn Formula: set all variables to false and greedily set ones to be true when forced to

• : find the best encoding by greedily combining the two least frequently

• Set Cover: greedy approximation algorithm with O(log n) performance ratio, showed on

problem set 3 that we can actually achieve Θ(log n) by example.

2.5 Divide and Conquer

• Main Idea: Divide the problem into smaller pieces, recursively solve those, and then combine them in the right way.

• Mergesort and Stoogesort.

• Integer Multiplication. perform 3 multiplications on n/2 digit numbers, and then do some

• Strassen’s Algorithm. multiplies two n × n matrices by doing 7 multiplications on n/2 × n/2 matrices.

2.6 Dynamic Programming

• Main Idea: Maintain a lookup table of correct solutions to sub-problems and build up this table in a particular order.

• Run-time and Space Analysis. How does one go about analyzing the time and space complexity of a DP algorithm?

• String Reconstruction. Tries to find where the first dictionary word ends by checking all possible breaking points.

• Edit Distance. Tries all possibilities of insert, delete, change on letters that are different.

• All Pairs Shortest Paths. Uses the idea that the shortest path to a node must have come

via one of the neighboring nodes.

• Traveling Salesman. DP can provide better exponential-time solutions to NP-hard problems.

3 Practice Problems

Exercise 1. Answer True or False for the following:

log5(n) = Ω(log3(n))

2logn=ω(n)

Suppose T (n) = 2T (n/b) + n. Then, as b → ∞, we eventually get T (n) = Θ(1). IfT(n)=T(√4 n)+1,thenT(n)=o(loglogn).

n log(n!) = O(n log n) n! = o(nn). from CLRS

2 2logn=o(n).fromCLRS

f(n)2 = ω(f(n)) for every positive f.

(c) F Take the limit:

Therefore, the ω relationship is false and the relationship should be o.

T Let’s take the limit:

2√n−n The limit is 0 and therefore the little o relationship holds.

lim n =lim n =lim2

n→∞ 2 n→∞ 2 n→∞

T Using our change of base formulas, we have:

log5n= log3n =log53·log3n≥0.1log3n

log3 5 and therefore the Ω relationship is true.

2 logn 2 logn =lim logn =0

n→∞ n n→∞2

F As b → ∞, it is true that T (n) becomes smaller because you are breaking the problem into more and more sub-parts. However, no matter how big b is, you always need to pay the fixed cost of n and so asymptotically, T(n) will never be constant. You can also see this through the Master Theorem.

F Perform a similar substitution as the one we did on the problem set. Let S(n) = T(2n). We get the recurrence: S(n) = S(n/4)+1 which solves to S(n) = log(n) and thus T (n) = log log(n). Therefore, the correct relationship should be Θ and not o.

(g) T Take the limit:

F log(n!) = O(n log n), so clearly n log(n!) = ω(n log n). The proof is by laws of logarithms: log(n!) = ni=1 logi ≤ ni=1 logn = nlogn.

limn!=limn i≤lim1=0 n→∞ nn n→∞i=1 n n→∞ n

√ √1 √2logn

T Since 2x=o(x),thenforlargex, 2x≤2x. Thenplugginginx=logn,2 ≤

1 logn logn 1/2 √2logn 22 =(2 ) forlargen. Thus,2

F Counterexample. f(n) = 1. Clearly, f(n)2 = 1 = Θ(f(n)), so it is not ω.

),andthuso(n).

Exercise 2. Run each of the following algorithms on the following graph, roughly describing what happens at each step.

(a) Kruskal’s for MST

(b) Prim’s for MST, beginning at A.

(c) Dijkstra’s for single-source shortest paths, beginning at A. A3B1C

(a) Sort edges in non-decreasing order. So we may choose (B, C), then (B, D), then (D, E), then discard (B, E) and (C, E) because they would form a cycle, then add (A, B), and we are done. The final MST contains edges {(A, B), (B, C), (B, D), (D, E)}. (It would also be correct to

break edge weight ties in a different order.)

(b) Initialize the value of A to be 0 and the rest to be infinity. Add A to the cut, so {B,C,D,E} are not in the cut. Then update the value of B to 3, and the value of D to 4; and B,D point to A. Add B to the cut (via the edge (A,B)), so {C,D,E} are not in the cut. Then update the value of D to 1, the value of E to 3, and the value of C to 1; and B,C,E point to B. Add D to the cut (via the edge (B,D)), so {C,E} are not in the cut. Update the value of E to 2, and it points to D. Add C to the cut (via the edge (B,C)), so E is not in the cut. Do not update the value of E, since it is not larger. Then add E to the cut (via the edge (D,E)). (It would also be correct to break value ties in a different order.)

Remark: In general, we might not get the exact same tree from Prim’s and Kruskal’s, but they both give some minimum spanning tree.

(c) The distance of A is 0, and everything else is infinity. Update the distance of B to 3, and the distance of D to 4. Then add B and D to the priority heap with distances as keys. Pop B from the heap. Update the distance of C to be dist(B) + 1 = 4, update the distance of E to be dist(B) + 3 = 6, and do not update the distances of A or D, since they are not shorter. Add C and E to the priority heap. Pop D from the priority heap. Update the distance of E to be dist(D) + 2 = 6, and do not update A or B. Add E to the priority heap. Pop C from the priority heap. Do not update the distances of B or E. Pop E from the priority heap. Do not update the distances of B,C,D. So the final distances are A : 0,B : 3,C : 4,D : 4,E : 6.

Exercise 3. (Problem Set 2) The input to 2SAT is a logical expression of a specific form: it is the conjunction (AND) of a set of clauses, where each clause is the disjunction (OR) of two literals. (A literal is either a Boolean variable or the negation of a Boolean variable.) For example, the following expression is an instance of 2SAT:

(x1 ∨x2)∧(x1 ∨x3)∧(x1 ∨x2)∧(x4 ∨x3)∧(x4 ∨x1)

A solution to an instance of a 2SAT formula is an assignment of the variables to the values T (true) and F (false) so that all the clauses are satisfied that is, there is at least one true literal in each clause. For example, the assignment x1 = T , x2 = F , x3 = F , x4 = T satisfies the 2SAT formula above.

Derive an algorithm that either finds a solution to a 2SAT formula, or returns that no solution exists.

Hint: Reduce to an appropriate problem. It may help to consider the following directed graph, given a formula I in 2SAT: the nodes of the graph are all the variables appearing in I and their negations. For each clause (α ∨ β) in I, we add a directed edge from α to β and a second directed edge from β to α. How can this be interpreted?

Each edge from the hint above represents an implication, i.e. if (u, v) is a directed edge in the graph, thenu=T =⇒ v=T.Thisapplieshere:ifαistrue,thenβmustbetruefortheclausetobe true; if β is true, then α must be true for the clause to be true.

In this graph, strongly connected components correspond to a set of literals that must simultaneously be true or false. This is because for any vertices/literals u, v in the same SCC, if u = T , then v = T

by following the implications along the path from u to v, and if u = F, assume for contradiction that v = T, then we must get u = T by following the implications along the path from v to u, contradiction. An immediate consequence of this is that if a strongly connected component contains both α and α, then the formula is not satisfiable, since it contains a contradiction.

It now suffices to prove that if no pair of literals α and α are in the same SCC, then the formula is satisfiable. We will attempt to construct a solution as follows. We topologically sort the graph of SCCs to obtain S1, S2, . . . , Sk, where Si are sets of variables that form an SCC, such that there is animplicationu→vforu∈Si andv∈Sj iffi≤j. Startingfromt=kandgoingbackwards,we set all literals in the SCC St to false if some u ∈ St has been assigned to false, and we set all literals in the SCC St to true otherwise. (Note that we always go with the latter case when k = t, since no literals have been set yet at that point.) We then assign α ̄ to the negation of what we assign to literals in St for each αinSt. Note that by our construction, α ̄ is either preset already to the desired value before we process St, or it has not been set yet and is in some SCC Sj such that j < t.
We show that there can be no contradiction in such assignment iff no literal and its negation are in the same SCC. The “only if” direction is obvious. For the “if” direction, assume that the first SCC during our algorithm where we encounter a contradicting assignment is St for some t < k, where u,v ∈ St, u ≠ v ̄ and u,v have been preset to T and F, respectively. By the skew-symmetric construction of the graph, since there is a path from u to v, there must be a path from v ̄ to u ̄, and since there is a path from v to u, there must be a path from u ̄ to v ̄. (Note the relationship between (α ̄, β) and (β ̄, α).) This implies that u ̄, v ̄ must be in the same SCC, and since they are what caused u and v to be preset with contradictory values, they are in some Sj with j > t. Since u ̄, v ̄ must be F and T, respectively, this contradicts the maximality of t. Therefore, our algorithm is correct.

To finish, we check the runtime of this algorithm. Converting the graph into an SCC graph G and running DFS on G′ takes O(n + m) time. Constructing the solution involves considering each literal a constant number of times, so it takes O(n) time. Therefore, this algorithm takes O(n + m), which is linear.

Exercise 4. Elections Ballotville is having an election for mayor. Voters in Ballotville submit ranked choice ballots, where they specify an ordering of the set of candidates C = c1, · · · , cn they prefer, such as c2, cn, c1, . . . , c2. Let Mi,j be the margin of votes between candidate i and j, so

Mi,j = #(voters who prefer i to j) − #(voters who prefer j to i)

A set S ⊂ C of candidates is called the Smith set if it is the smallest set S′ such that for all ci ∈ S′ and for all cj ∈ S′c, Mi,j > 0 (that is, every candidate in S′ would pairwise beat every candidate not in S′). Give an O(n2) algorithm for determining the Smith set. Hint: consider a graph where the candidates are vertices, and determine a suitable representation for the edges.

Let G = (V,E) be a graph with vertices v1,v2,…,vn corresponding to candidates c1,c2,…,cn. Let ei,j be a directed edge from i to j if Mi,j > 0. Note that ei,j,ej,i ∈ E iff Mi,j = 0.

Thus we want to find a set of vertices S such that for all vi ∈ S,vj ∈ Sc, there exists an edge ei,j. This is equivalent to finding the source SCC in a directed graph. Thus, we apply Tarjan’s algorithm

for finding the SCCs in a graph.

Correctness: We proved the correctness of Tarjan’s algorithm in class. It therefore suffices to show that the Smith set corresponds to the source SCC.

(Smith set implies source SCC) Suppose towards a contradiction that there is a candidate ci that is in the Smith set but vi is not in the source SCC of the graph. Since it is not in the source SCC, there must be a path from some vj (in the source SCC) to vi but not vice versa. Note that Mi,j = −Mj,i and there exists an edge ei,j iff Mi,j > 0, so it must be that either ei,j ∈ E or ej,i ∈ E

(or both).

Thus, one path from vj to vi is ej,i. It cannot be that ei,j also exists, as then every vertex in the source SCC would be reachable from vi (through ei,j). Thus, vj beats vi and vi does not beat vj, so vi is not in the Smith set, a contradiction.

(source SCC implies Smith set) Suppose towards a contradiction that there is a vertex vi in the source SCC but ci is not in the Smith set. Since ci is not in the Smith set, there exists a candidate cj in the Smith set such that ej,i exists and ei,j does not.

Moreover, there is no other “path” by which vj is reachable from vi, since vj is only reachable via candidates in the Smith set, which vi is not in. Therefore, there is a node vj not reachable from vi, a contradiction to vi’s membership in the source SCC.

Runtime: The runtime for Tarjan’s algorithm is linear, as shown in class. Here, O(|V | + |E|) = O(n + n2) = O(n2).

Exercise 5. A directed graph G = (V, E) is called semiconnected if for each pair of distinct vertices u,v∈V,thenthereisapathfromutov,apathfromvtou,orboth. Findanalgorithmto determine whether a directed graph is semiconnected. Hint: Look at the SCC graph.

Let G′ be the DAG on the strongly connected components of G. Number the nodes of G′ in some topological order. We claim that G is semiconnected iff there is always a path from the i-th node to thej-thnodeofG′ ifi