# COMP3004 — Computability and Complexity Part II, Complexity

With some answers to the exercises.

December 4, 2017

Definitely worth trying Wikipedia’s webpage http://en.wikipedia.org/wiki/Computational_complexity.


M Garey and D Johnson, “Computers and Intractability — a guide to the theory of NP-Completeness”, Freeman 1986. Fairly advanced. The ‘NP-completeness bible’.

M Sipser, “Introduction to the theory of Computation”, MIT press, International Edition, 2006. Covers exactly what we cover here, students like it.

C Papadimitriou, “Computational Complexity”, Addison-Wesley, 1995.

T Sudkamp, “Languages and Machines”, 2nd Ed., Addison-Wesley, 1997.

J Hopcroft, R Motwani and J Ullman, “An introduction to automata theory, languages, and computation”, Addison-Wesley, second edition, 2001.

V J Rayward-Smith, “A first course in computability”, Publications, 1986. An introductory text that covers much of both parts of this course. Watch out for errors. May be out of print.

Review of computability and scope of this part of the course

In the first part of this course you should have learnt what we mean by “a computable problem”. The issue with computability is whether a given problem can be solved algorithmically, at least in principle. So you should know that the halting problem is uncomputable (undecidable) whereas the problem of sorting a list is certainly computable.

This is all well and good, except that there are many problems in computer science which are decidable (in principle) but where the run-time of any algorithm that solves them seems to be ridiculously long. There are other algorithms which seem to use up vast amounts of memory. In practice we cannot solve these problems, unless we are restricted to very small cases.

The point of the study of complexity is to make a scientific classification of the resources needed to solve certain problems. By resources, we mean the amount of time (or the number of CPU cycles) and the amount of memory used.

In a previous course (Theory II), you already made a start on this subject by considering the so-called ‘O’ notation. So you should know what we mean if we say that an algorithm runs in cubic time or O(n^3). When we say O(n^3) we are ignoring the difference, say, between the functions n^3 and 27n^3 + 97n^2 + 1034 — both functions are O(n^3). In this course we will have an even coarser classification and group all algorithms that run in time O(n^2), O(n^3), . . . , O(n^k) together and classify them as polynomial time algorithms. We will use the concept of polynomial time reduction to examine other complexity classes, some of which can be far worse than polynomial time.

What you need to know before you start

You should know what a polynomial is. A polynomial is just a sum of powers in some variable, e.g. 2n^5 + 3n^4 − 7n^2 + 2035 is a polynomial. You should be able to add, subtract and multiply any two polynomials.

You should know about O notation. You should understand O(n^7), for example. A function f(n) is O(n^7) if there is a constant c and an integer N such that for all n > N we have f(n) ≤ c·n^7.
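As a sanity check, the definition can be tested empirically on sample values of n. This is only an illustrative sketch; the function `witnesses_big_o`, the example f, and the sample range are mine, not part of the notes.

```python
# Empirically check that the witnesses (c, N) work for membership in O(n^degree):
# f(n) <= c * n**degree must hold for every sampled n > N.
def witnesses_big_o(f, c, N, degree, samples=range(1, 2000)):
    return all(f(n) <= c * n ** degree for n in samples if n > N)

f = lambda n: 5 * n**7 + 100 * n**2    # example function that is O(n^7)
print(witnesses_big_o(f, 6, 10, 7))    # True: 100*n^2 <= n^7 once n >= 3
print(witnesses_big_o(f, 1, 10, 7))    # False: c = 1 is too small for any N
```

Of course a finite check cannot prove the inequality for all n; it only catches witnesses that fail.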

You should know what a Turing machine is. M = (Q,Σ,q0,δ,F) where

• Q is a finite set of states,

• Σ is the alphabet (used for input, output and intermediate computations),

• q0 ∈ Q is the initial state,

• δ is a (partial) function from Q × Σ to Q × Σ × {−1, 0, +1}. δ is the key part and can be thought of as a computer program. If M is in state q reading symbol s, and δ(q, s) = (q′, s′, m) (m ∈ {−1, 0, +1}), then M moves into state q′, writes the symbol s′ over s, and then moves according to m,

• F ⊆ Q is the set of final or accepting states.

You should know how a Turing machine runs and what it accepts. If x ∈ Σ∗ is a string then M accepts x if M ends up in a final state when it is started in state q0 with x on the input tape and the tape head in position 0. If M gets into a state q reading a symbol s such that δ(q, s) is not defined, then M halts; if the state it halts in is not final, M rejects x. The remaining possibility is that M does not halt at all.

Let M be a Turing machine and let w ∈ Σ∗ be a string. We define fM (w) to be the contents of the tape when M halts on input w if M halts, and fM (w) is undefined if M does not halt on input w.1
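The model just described can be sketched as a short simulator. This is a minimal sketch under assumptions of my own: δ is stored as a Python dict, the blank symbol is a space, and the helper names are illustrative, not from the notes.

```python
def run_tm(delta, q0, final, tape_input, max_steps=10_000):
    """Simulate the deterministic TM (Q, Sigma, q0, delta, F) described above.

    delta maps (state, symbol) -> (new_state, new_symbol, move), move in {-1, 0, +1}.
    The machine halts when delta is undefined on (state, symbol); it accepts
    iff the halting state is in `final`. Blank cells hold ' '.
    """
    tape = dict(enumerate(tape_input))   # position -> symbol
    q, pos = q0, 0
    for _ in range(max_steps):
        s = tape.get(pos, ' ')
        if (q, s) not in delta:          # M halts; accept iff q is final
            output = ''.join(tape[i] for i in sorted(tape)).strip()
            return q in final, output    # (accepted?, f_M(w))
        q, s2, m = delta[(q, s)]
        tape[pos] = s2
        pos += m
    raise RuntimeError('step budget exhausted; M may not halt on this input')

# A toy machine that scans right over a's and accepts at the first blank:
delta = {('q0', 'a'): ('q0', 'a', +1), ('q0', ' '): ('qf', ' ', 0)}
print(run_tm(delta, 'q0', {'qf'}, 'aaa'))   # (True, 'aaa')
```

The `max_steps` cut-off is only there so the sketch always returns; a real Turing machine has no such bound, which is exactly why f_M can be a partial function.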

You should know what the language recognized by a Turing machine is. It is the set of all strings (over the given alphabet) that are accepted by the machine. A set of strings recognized by a Turing machine is sometimes called recursively enumerable.

You should know what we mean by decidable and undecidable. A set of strings is decidable if there is a Turing machine that recognizes it and is sure to terminate on any string. A set of strings is called undecidable otherwise.

You should know some other things which I’ll remind you of when I think of them.

Polynomial time and Intractability

If you have tried your hand at programming you will know that some problems are easier than others. Sometimes if you think very hard about a problem for a long time you will eventually find a neat program that solves it without taking too much running time or memory. For certain other problems people have been trying for years and years to find efficient programs to solve them but they have still not managed to do this.

1This is a purely mathematical definition of the function fM. The fact that, for some Turing machines, we cannot even tell whether M halts on input w or not, is immaterial. M will either halt or not and if it does there will be a definite content to the tape, so fM is well-defined.

Figure 1: Various polynomial and non-polynomial functions

This can be a serious difficulty. Suppose you have a list of integers and you want to sort them in ascending order. One way of doing this is to try all possible permutations of the list and for each permutation check if it is in ascending order. It doesn’t take a computer long to check if a list is in the correct order (about n checks if there are n integers in the list), but unfortunately there are n! possible permutations.
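The brute-force method just described can be written out directly. This sketch is mine, not from the notes; it makes visible why the cost is roughly n! × n comparisons.

```python
from itertools import permutations

def permutation_sort(xs):
    # Brute force: try all n! orderings; checking one ordering costs
    # about n comparisons.
    for p in permutations(xs):
        if all(p[i] <= p[i + 1] for i in range(len(p) - 1)):
            return list(p)

print(permutation_sort([3, 1, 2]))   # [1, 2, 3]
```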

If the size of the list is small (say n ≤ 5) then this need not be a problem: 5! = 120 and a computer can check 120 cases quite fast. However for larger n this method becomes entirely impractical. To see this, note that n! > 2^n for n ≥ 4. Look at the graph in figure 1 to see how big 2^n can get.

If the size of the list is 5,000, for example, then the computer has to check 5,000! cases and each check involves about 5,000 comparisons, so altogether something like 10^16330 comparisons. Now if each comparison takes 1 millisecond, that will take about 10^16320 years. The age of the universe is around 1.4 × 10^10 years. Thus n! operations is too many for practical purposes unless n is very small. For the sorting problem, of course, there are much faster algorithms than just trying all possible permutations. But for other problems there simply doesn’t seem to be a fast algorithm that solves them.

Apart from the length of time involved in performing a calculation, the other resource that you have to think about on a computer is the amount of memory. Several of my programs have met the response “not enough heap space, need extra memory” or something like it. Here again for some problems you can avoid the problem by redesigning your program or by getting extra memory for your machine. However there are other problems that really seem to need a lot of memory and for non-trivial instances of the problem this may require much more memory than could ever, practically, be provided.

The idea in complexity theory is that we can approach the question of how much resource (whether time or space) is required by a given problem in a scientific way. The speed of the computer varies considerably from one machine to the next, but for high complexities (like exponential growth) these variations don’t make much difference to the outcome — 10^3889 years isn’t a lot better than 10^3890 years. Also, suppose we have an algorithm whose run time is proportional to 2^n time units, and that for n = 60 this comes to something like 10 billion years on an ordinary computer. A computer ten times faster might do it in just 1 billion years, but this advantage is completely lost if you run the same algorithm on a slightly bigger input, say n = 64. That is because 2^64 = 2^60 × 2^4 = 16 × 2^60, so even the faster computer needs 16 billion years. So speeding up the machine isn’t a lot of use.

To illustrate the difference between polynomial and exponential algorithms consider the following table.

| Run-time | max n in 1 second on C1 | on C2 (2^12 = 4096 times faster) | on C3 (10^12 times faster) |
|---|---|---|---|
| O(n) | 1000 | 1000 × 2^12 | 1000 × 10^12 |
| O(n^2) | 1000 | 1000 × 2^6 | 1000 × 10^6 |
| O(n^3) | 1000 | 1000 × 2^4 | 1000 × 10^4 |
| O(2^n) | 1000 | 1000 + 12 | 1000 + 40 |
| O(2^(2^n)) | 1000 | 1000.0 | 1000.0 |

The idea is that we start with an ordinary computer C1 which runs at a certain speed. We have various algorithms to run; their complexities are O(n), O(n^2), O(n^3), O(2^n) and O(2^(2^n)) (the last one is called double exponential time). Suppose, for standardisation, that the maximum input size n such that C1 can complete the algorithm in 1 second, for each algorithm, is 1000. So the first algorithm might run in time n/1000, the second in time n^2/1,000,000, the third in time n^3/10^9, the fourth in time 2^n/2^1000 = 2^(n−1000) and the last in time 2^(2^n − 2^1000). To help handle larger values of n, suppose we get hold of a faster computer C2 which runs at a speed 2^12 = 4096 times as fast as C1. Then we get hold of a truly superb computer C3 which runs at a speed 10^12 times the speed of C1. We cannot hope for much better than that. The table illustrates the increased value of n we get for each algorithm by using these faster computers. Generally, for polynomial time algorithms, the value of n is multiplied by a certain factor. The factor is lower for higher powers of n, but we still get a multiplicative effect. But for exponential functions like 2^n, even though C2 and C3 have speeds which are large multiples of the speed of C1, we only get to add a constant to the value of n. The extra speed makes very little impression on the maximum value of n we can handle. For double exponential time, the effect is even smaller — it did not show up at all on my calculator.
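The table entries follow from a small calculation: for run-time n^d a speed-up factor k multiplies the manageable input size by k^(1/d), while for 2^n it only adds log2 k. A sketch (the function name is mine, not from the notes):

```python
import math

def new_max_n(base_n, speedup, kind, degree=1):
    """Largest input finishing in the same wall-clock time on a faster machine."""
    if kind == 'poly':                       # n^degree: n scales by speedup^(1/degree)
        return base_n * speedup ** (1 / degree)
    if kind == 'exp':                        # 2^n: n only gains log2(speedup)
        return base_n + math.log2(speedup)

print(new_max_n(1000, 2**12, 'poly', 3))     # ~16000, i.e. 1000 * 2^4
print(new_max_n(1000, 2**12, 'exp'))         # 1012.0, i.e. 1000 + 12
```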

This motivates the following definition.

DEFINITION 1 A problem is called tractable if there is a (deterministic) algorithm which solves it and runs in time O(n^k), for some k. A problem is called intractable otherwise.

We say that an algorithm runs in polynomial-time or p-time if its run-time is O(n^k) for some k.

Note that undecidable problems are classified as intractable, according to this definition. In this course, all the tractable algorithms are lumped together.

When we want to be more formal and precise about complexity we will use Turing machines for algorithms. Recall Church’s thesis that every algorithm can be computed on a Turing machine.

4 Decision Problems and Construction Problems

In computing we have various types of problems. There are execution problems, like “sort this list” or “multiply these matrices”. We have optimization problems, like “find the best way of fitting these boxes into this space” or “find the shortest path from A to B” or “find the minimal spanning tree” or “find the shortest circuit in this graph”. And there are decision problems like “Is n a prime number?”, “Does the Turing machine M halt on input x?”, “Does graph G have a Hamiltonian circuit?”, or “Is this given circuit in this weighted graph the shortest possible?”.

DEFINITION 2 A decision problem is a problem where the output is always “yes” or “no”.

We focus entirely on decision problems, partly because they are easier to handle, partly because the definitions (later) only work for decision problems, and partly because we can usually convert other types of problems into decision problems.

Example Consider the Travelling Salesman Problem (TSP).

Instance: a weighted graph G (with non-negative integer weights).

Output: the length of a shortest circuit of G that covers all the nodes of G, i.e. a covering circuit whose total weight is the minimum possible.

Clearly this is an optimization problem. However, consider instead the Travelling salesman decision problem (TSDP).

Instance: a weighted graph G and a non-negative integer d.

Yes-instance: there is a circuit of G covering all its nodes with total weight ≤ d.

No-instance: otherwise.

This is clearly a decision problem. Any solution to TSDP could be used to solve TSP as follows.

For d = 0 to #(nodes) × max. edge weight
    if TSDP(G, d) = yes then halt and output d;

EXERCISE 1 Explain briefly why the algorithm above is sure to give the length of the shortest circuit of G. Why does the For loop terminate at #(nodes) × max. edge weight?

Answer: If the algorithm outputs k then it must be when d = k, and it must have failed to find a circuit in the previous iterations, so there is no circuit of total weight k − 1 or less, but there is one of total weight k. Hence k is indeed the minimum. The number of edges in a circuit is the number of nodes of the graph, so every circuit has weight no more than the number of nodes times the maximum edge weight. Hence the shortest circuit weighs no more than that, so the loop must output an answer by the time d reaches that bound.
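The For loop above can be phrased as a function that takes the decision procedure as a parameter. Here `tsdp` stands for any solver of TSDP; the stub used in the example is an assumption for illustration, not a real implementation.

```python
def tsp_length(graph, num_nodes, max_weight, tsdp):
    """Recover the optimal tour length from a TSDP decision procedure.

    tsdp(graph, d) must answer whether some circuit covering all nodes has
    total weight <= d. A covering circuit uses num_nodes edges, so the
    optimum is at most num_nodes * max_weight and the loop must terminate.
    """
    for d in range(num_nodes * max_weight + 1):
        if tsdp(graph, d):
            return d          # the first d that works is the minimum
    return None               # no covering circuit exists at all

# Stub oracle for illustration: pretend the optimal tour weight is 7.
print(tsp_length(None, 3, 5, lambda g, d: d >= 7))   # 7
```

Note the loop calls the decision procedure up to num_nodes × max_weight + 1 times, so if TSDP were solvable in p-time (in the size of the instance, including the weights), this would not automatically make TSP p-time; that subtlety is exactly the input-size issue discussed in the next section.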

Limitations of this approach

If you are not working with very large problem instances then you can see from the graphs in figure 1 that exponential functions can be better than polynomial functions, for moderate values of n.

This whole analysis is based on worst-case complexity. This is important if you want to be sure that your computations will terminate in good time. But often, average case complexity can be more useful. For example, in linear programming, the simplex algorithm has expo- nential worst-case complexity but its average complexity is only linear. However, very little theoretical analysis has been done for average case complexity.

From now on we are mostly interested in decision problems only.

In this part of the course we consider all the polynomial time problems as the easiest of all problems and we lump them together in a single complexity class called P. However, most computer scientists think that O(n) is fine, O(n^2) is tolerable but O(n^3) is pretty awful. Complexities like O(n^31), while counted as tractable here, are regarded as intractable by many computer scientists.

The size of the input

Consider the decision problem

Is n prime?

The algorithm in figure 2 appears to solve this problem in p-time. In the algorithm, calculating

answer = ‘yes’;
for i = 2 to n − 1
    if n (mod i) = 0 then answer = ‘no’;
return(answer);

Figure 2: An algorithm to test for primes

n (mod i) can be done quickly — say O(n^2). So solving “Is n prime?” takes n × O(n^2) = O(n^3). However, this argument is WRONG. Let’s go through it again more carefully, using a Turing machine. When we say that a Turing machine runs in time f(n), for some function f, we mean that the number of cycles of the machine will always be less than or equal to f(n) for any input whose size is no more than n. The size of the input to a Turing machine is the number of non-blank symbols written on the tape before the computation starts.

Now we want to take a number n and write it on the tape of the Turing machine that performs the prime-testing algorithm in figure 2. We have to decide what alphabet to use for the Turing machine and how to code the number n so that we can write it on the tape. An obvious solution is to make sure the alphabet includes 0 and 1 and to write n in binary notation. With this notation, if the input size is n, the integer could be as big as 2^n − 1. Thus, for an instance of size n, the loop in the algorithm could be executed 2^n − 1 times. This is not p-time.

We could equally well use a bigger alphabet and use the decimal notation to represent integers and the run-time would still be exponential.

But suppose instead that we use unary notation. This is just like a tally system. An integer n is represented by a string of n ‘1’s. The only non-blank symbol needed is a ‘1’. Then an instance of size n represents the number n. In this case, the algorithm in figure 2 really does run in p-time as the loop is executed just n times.
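The point about encodings can be made concrete. In this sketch (names mine, not from the notes) the trial-division loop of figure 2 runs about n times, which is linear in the unary input size but exponential in the binary input size.

```python
def is_prime_trial(n):
    # The loop body executes n - 2 times, exactly as in figure 2.
    if n < 2:
        return False
    for i in range(2, n):
        if n % i == 0:
            return False
    return True

n = 97
print(is_prime_trial(n))     # True
print(n.bit_length())        # 7  -> binary encoding: input size 7 symbols
print(len('1' * n))          # 97 -> unary encoding: input size 97 symbols
# ~n loop iterations is O(size) for unary input but ~2^size for binary input.
```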

Moral: be careful about how you code an instance of a problem. Different codings can yield different complexities. If the coding has “lots of padding” then the complexity of an algorithm can appear artificially low, as a function of the (rather large) input size.

7 Polynomial-time reduction

Remember what we are trying to do? No? We are trying to classify the decision problems in a way that distinguishes tractable and intractable problems. The principal tool used to make this classification is called poly-time reduction.

DEFINITION 3 Let A and B be any two decision problems and suppose their instances are coded as strings in some alphabet Σ. Let M be a (deterministic)2 Turing machine whose alphabet contains Σ. Suppose that for any instance w of A: (i) fM (w) is always defined (i.e. if w is put on the tape then M is sure to terminate); (ii) if w is a yes-instance of A then fM (w) is a yes-instance of B; and (iii) if w is a no-instance of A then fM (w) is a no-instance of B. [We don’t care about fM (w) if w is not an instance of A.] Then we say that A reduces to B. This is from the first part of the course.

But suppose further that M is guaranteed to terminate in time O(n^k), for some fixed k, where n = |w| is the size of an instance of A. In this case we say that A reduces to B in p-time and we write A ≤p B.

The idea of p-time reduction is that if A ≤p B then any solution to B can be converted quickly into a solution to A. Very roughly, we can think of this as meaning that “A is no harder than B”. More on this later.

7.1 Properties of p-time reduction

Reflexive For any decision problem A we have A ≤p A. This is easy: to do the reduction, let M be the TM that does nothing but halts immediately.

Transitive If A ≤p B and B ≤p C then A ≤p C (for any decision problems A, B, C).

Proof. Let M be a deterministic TM that reduces A to B and suppose M runs in time p(n) for some polynomial p, where n is the size of an instance of A. Similarly, let N be a deterministic TM that reduces B to C in time q(n) where q is another polynomial and n is the size of the instance of B.

To reduce A to C use a composite machine (M;N) formed by linking halting states of M to the start state of N. So this composite machine first executes M then executes N on the result.3 Clearly (M;N) reduces A to C. For the run-time, let w be an instance of A of size n. Then first M goes to work and finishes in time p(n). Then N takes over and does its work in time q(n). Total computation time equals p(n) + q(n), still a polynomial. Therefore the reduction was completed in p-time. Right? No, wrong!

The error concerns the calculation of the run-time of (M;N). Let w be an instance of A of size n. Then, sure enough, M completes its work in time p(n). Then we have fM (w) on the tape and N is about to set to work. But the size of fM (w) may not be n. It could be more. So the run-time of N could be more than q(n); it will in fact be bounded by q(|fM (w)|), but we are not sure how big |fM (w)| is. But don’t worry, we can fix this. The first machine, M, runs in time p(n) and in each unit of time the tape head can only move one space. Therefore, the size of fM (w) cannot be more than p(n) — it simply isn’t possible for M to write symbols outside the range 0–p(n) in the time available. So the second machine, N, must run in time q(p(n)). Now, a polynomial of a polynomial is still a polynomial. The total run-time for (M;N) is no more than p(n) + q(p(n)) — a polynomial. Therefore A ≤p C.
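The run-time argument can be restated numerically. A sketch with hypothetical bounds p and q (my own illustrative choices, not from the notes):

```python
# If M runs within p(n) and N within q(n), the composite machine (M;N) runs
# within p(n) + q(p(n)): M's output length is bounded by M's own run time p(n).
def compose_bound(p, q):
    return lambda n: p(n) + q(p(n))

p = lambda n: n ** 2           # hypothetical polynomial bound for M
q = lambda n: 3 * n + 1        # hypothetical polynomial bound for N
bound = compose_bound(p, q)
print(bound(10))               # 100 + (3*100 + 1) = 401
```

The composite bound n^2 + 3n^2 + 1 is again a polynomial, which is the whole point of the proof.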

Symmetry NO. Just because A ≤p B it does not follow that B ≤p A.

7.2 Example of p-time reduction

Consider the Hamiltonian Circuit Problem (HCP). An instance is a graph G. G is a yes-instance if it contains a Hamiltonian circuit. This is a sequence of distinct nodes n0, n1, . . . , nk of G such that (i) every node of G is in the sequence (so the circuit ‘covers’ G), (ii) each consecutive pair ni, ni+1 is joined by an edge of G, and (iii) there is an edge of G from nk back to n0.
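One classic way to reduce HCP to TSDP (defined earlier) is to weight the complete graph so that a cheap tour must use only edges of G. This is a sketch under assumptions of my own: nodes are given as a list, edges as pairs, and the weighting scheme is the standard textbook one, not necessarily the exact one these notes go on to use.

```python
from itertools import combinations

def hcp_to_tsdp(nodes, edges):
    """Map an HCP instance G to a TSDP instance (complete weighted graph, d).

    Edges of G get weight 1, non-edges weight 2. Then G has a Hamiltonian
    circuit iff the weighted graph has a covering circuit of total weight
    <= len(nodes). The mapping clearly runs in polynomial time.
    """
    edge_set = {frozenset(e) for e in edges}
    weights = {
        frozenset((u, v)): 1 if frozenset((u, v)) in edge_set else 2
        for u, v in combinations(nodes, 2)
    }
    return weights, len(nodes)

# Triangle graph: every pair is an edge, so every weight is 1 and d = 3.
w, d = hcp_to_tsdp([1, 2, 3], [(1, 2), (2, 3), (1, 3)])
print(d, sorted(w.values()))   # 3 [1, 1, 1]
```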
