Computational complexity theory

Computational complexity theory, a branch of the theory of computation in computer science, investigates the amounts of resources (such as execution time) required to run algorithms, and the inherent difficulty of providing efficient algorithms for specific computational problems.

Overview

A typical question of the theory is, "As the size of the input to an algorithm increases, how do the running time and memory requirements of the algorithm change and what are the implications and ramifications of that change?" In other words, the theory, among other things, investigates the scalability of computational problems and algorithms. In particular, the theory places practical limits on what computers can accomplish.

An axiomatic approach to computational complexity was developed by Manuel Blum. He introduced measures of complexity such as the size of a machine and axiomatic complexity measures of recursive functions. The time complexity and space complexity of algorithms are special cases of such axiomatic complexity measures. Blum's axiomatic approach makes it possible to study computational complexity in the most general setting.

An important aspect of the theory is to categorize computational problems and algorithms into complexity classes. The most important open question of complexity theory is whether the complexity class P is the same as the complexity class NP, or is a strict subset as is generally believed. Shortly after the question was first posed, it was realised that many important industry problems in the field of operations research are of an NP subclass called NP-complete. NP-complete problems have the property that solutions to these problems are quick to check, yet the current methods to find solutions are not "efficiently scalable" (as described more formally below). More importantly, if the NP class is larger than P, then no efficiently scalable solutions exist for these problems.

The openness of the P-NP problem prompts and justifies various research areas in the computational complexity theory, such as identification of efficiently solvable special cases of common computational problems, study of the computational complexity of finding approximate or heuristic solutions, research into the hierarchies of complexity classes, and the exploration of multivariate algorithmic approaches such as Parameterized complexity.

Since many practically relevant graph problems are NP-hard, implying that presumably any exact algorithm requires time exponential in the input size, the relatively new idea of parameterized complexity is to take a two-dimensional view: in addition to the input size n, we consider a parameter k; a typical parameter is the size of the solution. A problem is called fixed-parameter tractable (FPT) if an instance of size n can be solved in f(k)·n^O(1) time, where f is an arbitrary computable function that absorbs the exponential part of the running time. Thus, whenever the parameter turns out to be small in practice, we can expect good running times. Techniques for designing fixed-parameter algorithms include data reduction, iterative compression, and color coding.
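As an illustrative sketch (not drawn from the works cited here), consider the bounded search tree technique applied to Vertex Cover, parameterized by the solution size k: pick any uncovered edge and branch on which endpoint joins the cover. The Python sketch below runs in roughly O(2^k · (n + m)) time, which has the required f(k)·n^O(1) form.

```python
# A minimal sketch (not from the cited sources): the classic bounded search tree
# for Vertex Cover, parameterized by the solution size k.  Running time is
# roughly O(2^k * (n + m)), i.e. f(k) * n^O(1), so the problem is FPT.

def vertex_cover_at_most_k(edges, k):
    """Return True if the graph given by `edges` has a vertex cover of size <= k."""
    edges = [e for e in edges if e[0] != e[1]]   # ignore self-loops
    if not edges:
        return True                              # nothing left to cover
    if k == 0:
        return False                             # edges remain but the budget is spent
    u, v = edges[0]
    # Branch: any cover must contain u or v.
    return (vertex_cover_at_most_k([e for e in edges if u not in e], k - 1) or
            vertex_cover_at_most_k([e for e in edges if v not in e], k - 1))

if __name__ == "__main__":
    triangle_plus_pendant = [(1, 2), (2, 3), (1, 3), (3, 4)]
    print(vertex_cover_at_most_k(triangle_plus_pendant, 2))  # True, e.g. {2, 3}
    print(vertex_cover_at_most_k(triangle_plus_pendant, 1))  # False
```

Since only the parameter k appears in the exponent, instances with large n but small k remain tractable.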

History

The first publication on computational complexity appeared in 1956 (Trahtenbrot 1956). However, this publication in Russian was for a long time unknown to the Western reader. That is why the beginning of studies in computational complexity can be attributed not only to Trahtenbrot (see also Trahtenbrot 1967), but also to Rabin (Rabin 1959; Rabin 1960; Rabin 1963), Hartmanis, Lewis and Stearns (Hartmanis and Stearns 1964; Hartmanis and Stearns 1965; Hartmanis, Lewis and Stearns 1965). The paper (Hartmanis and Stearns 1965) became the most popular in this area. As a result, some researchers attribute the beginning of computational complexity theory only to Hartmanis, Lewis and Stearns (see, for example, Aho, Hopcroft and Ullman 1976).

Andrey Kolmogorov's research in the theory of algorithms influenced the development of computational complexity theory. A notable early discovery was the Karatsuba algorithm in 1960, for the multiplication of two numbers. This algorithm disproved Kolmogorov's 1956 conjecture that the fastest multiplication algorithm must be Ω(n²), and thus helped launch the study of algorithms in earnest. In 1967, Manuel Blum developed an axiomatic complexity theory based on his axioms and proved an important result, the so-called speed-up theorem.

The field was subsequently expanded by many other researchers.

Computational complexity theory topics

Time and space complexity

Complexity theory deals with the relative computational difficulty of computing functions and solving other problems. This differs from computability theory, which deals with whether a problem can be solved at all, regardless of the resources required.

A single "problem" is a complete set of related questions, where each question is a finite-length string. For example, the problem FACTORIZE is: given an integer written in binary, return all of the prime factors of that number. A particular question is called an instance. For example, "give the factors of the number 15" is one instance of the FACTORIZE problem.

The time complexity of a problem is the number of steps that it takes to solve an instance of the problem as a function of the size of the input (usually measured in bits), using the most efficient algorithm. To understand this intuitively, consider the example of an instance that is n bits long that can be solved in n² steps. In this example we say the problem has a time complexity of n². Of course, the exact number of steps will depend on exactly what machine or language is being used. To avoid that problem, the Big O notation is generally used (sometimes described as the "order" of the calculation, as in "on the order of"). If a problem has time complexity O(n²) on one typical computer, then it will also have complexity O(n²) on most other computers, so this notation allows us to generalize away from the details of a particular computer.

Example: Mowing grass has linear time complexity because it takes double the time to mow double the area. Looking up a word in a dictionary, however, has only logarithmic time complexity, because a dictionary of twice the size only needs to be opened one more time (open it exactly in the middle, and the problem size is reduced by half).
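The dictionary example can be made concrete with a short sketch (illustrative only, in Python): binary search over a sorted word list halves the remaining range at every step, so a list of n words is searched in about log2(n) steps, and doubling the list adds only one step.

```python
# Illustrative sketch: binary search over a sorted list of words.
# Doubling the list adds only one more "opening" (comparison), so the
# number of steps grows like log2(n) rather than n.

def dictionary_lookup(sorted_words, target):
    steps = 0
    lo, hi = 0, len(sorted_words) - 1
    while lo <= hi:
        steps += 1
        mid = (lo + hi) // 2          # open the dictionary in the middle
        if sorted_words[mid] == target:
            return mid, steps
        elif sorted_words[mid] < target:
            lo = mid + 1              # discard the left half
        else:
            hi = mid - 1              # discard the right half
    return None, steps

words = sorted(["apple", "banana", "cherry", "date", "fig", "grape", "kiwi", "lemon"])
print(dictionary_lookup(words, "fig"))    # found after at most log2(8) = 3 steps
```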

The space complexity of a problem is a related concept that measures the amount of space, or memory, required by the algorithm. An informal analogy is the amount of scratch paper needed while working out a problem with pen and paper. Space complexity is also measured with Big O notation.

There are other measures of computational complexity. For instance, communication complexity is a measure of complexity for distributed computations.

A different measure of problem complexity, which is useful in some cases, is circuit complexity. This is a measure of the size of a boolean circuit needed to compute the answer to a problem, in terms of the number of logic gates required to build the circuit. Such a measure is useful, for example, when designing hardware microchips to compute the function instead of software.
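For a rough illustration (a sketch under the editor's own conventions, not a standard library), a boolean circuit can be represented as a list of gates, with its size taken to be the number of gates; the example below computes the exclusive-or of two inputs using four gates.

```python
# A minimal sketch: a boolean circuit as a list of gates, with circuit size
# measured as the number of gates.  The example circuit computes XOR of two
# inputs as (x OR y) AND NOT (x AND y), using 4 gates.

def evaluate(circuit, inputs):
    """Each gate is (op, in1, in2); wires 0..len(inputs)-1 carry the inputs,
    later wires carry gate outputs; the last gate is the circuit's output."""
    wires = list(inputs)
    for op, a, b in circuit:
        if op == "AND":
            wires.append(wires[a] & wires[b])
        elif op == "OR":
            wires.append(wires[a] | wires[b])
        elif op == "NOT":
            wires.append(1 - wires[a])
    return wires[-1]

xor_circuit = [
    ("OR",  0, 1),    # wire 2 = x OR y
    ("AND", 0, 1),    # wire 3 = x AND y
    ("NOT", 3, None), # wire 4 = NOT (x AND y)
    ("AND", 2, 4),    # wire 5 = (x OR y) AND NOT (x AND y)  ==  x XOR y
]
print("circuit size:", len(xor_circuit))                                # 4 gates
print([evaluate(xor_circuit, (x, y)) for x in (0, 1) for y in (0, 1)])  # [0, 1, 1, 0]
```

The circuit complexity of a boolean function is then the size of the smallest such circuit that computes it.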

Decision problems

Much of complexity theory deals with decision problems. A decision problem is a problem where the answer is always yes or no. Complexity theory distinguishes between problems for which yes answers can be verified and problems for which no answers can be verified. A problem that reverses the yes and no answers of another problem is called the complement of that problem.

For example, the well-known decision problem IS-PRIME returns a yes answer when a given integer is prime and a no otherwise, while the problem IS-COMPOSITE determines whether a given integer is not prime (i.e., is a composite number). Whenever IS-PRIME returns a yes, IS-COMPOSITE returns a no, and vice versa. So IS-COMPOSITE is the complement of IS-PRIME, and similarly IS-PRIME is the complement of IS-COMPOSITE.
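A small sketch (illustrative only; the function names simply mirror the informal problem names above) shows the complement relationship: for any integer of at least 2, the two procedures give opposite answers.

```python
# Illustrative sketch of a decision problem and its complement: IS-COMPOSITE
# answers "yes" exactly when IS-PRIME answers "no" (restricting to inputs >= 2,
# since 0 and 1 are conventionally neither prime nor composite).

def is_prime(n):
    if n < 2:
        return False
    d = 2
    while d * d <= n:
        if n % d == 0:
            return False
        d += 1
    return True

def is_composite(n):
    # The complement: flip the yes/no answer of IS-PRIME.
    return n >= 2 and not is_prime(n)

for n in (2, 15, 17):
    print(n, is_prime(n), is_composite(n))   # 2: True False, 15: False True, 17: True False
```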

Decision problems are often considered because an arbitrary problem can always be reduced to some decision problem. For instance, the problem HAS-FACTOR is: given integers n and k written in binary, return whether n has any prime factors less than k. If we can solve HAS-FACTOR with a certain amount of resources, then we can use that solution to solve FACTORIZE without using many more resources. This is accomplished by doing a binary search on k until the smallest factor of n is found, then dividing out that factor and repeating until all the factors are found.
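The reduction just described can be sketched as follows. The HAS-FACTOR routine below is filled in with naive trial division purely so the sketch runs; the argument treats it as a black-box decision procedure, and the point is that FACTORIZE makes only polynomially many calls to it.

```python
# Sketch of the reduction from FACTORIZE to the decision problem HAS-FACTOR.
# has_factor(n, k) answers: does n have a (prime) factor strictly less than k?
# Implemented naively here so the sketch runs; the reduction only uses it as a
# black box.

def has_factor(n, k):
    return any(n % d == 0 for d in range(2, min(k, n)))

def smallest_factor(n):
    """Binary search on k for the smallest prime factor of n (n >= 2)."""
    lo, hi = 2, n                    # the smallest factor lies in [2, n]
    while lo < hi:
        mid = (lo + hi) // 2
        if has_factor(n, mid + 1):   # is there a factor <= mid ?
            hi = mid
        else:
            lo = mid + 1
    return lo

def factorize(n):
    factors = []
    while n > 1:
        p = smallest_factor(n)       # each call makes O(log n) HAS-FACTOR queries
        factors.append(p)
        n //= p                      # divide out the factor and repeat
    return factors

print(factorize(15))    # [3, 5]
print(factorize(360))   # [2, 2, 2, 3, 3, 5]
```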

An important result in complexity theory is the fact that no matter how hard a problem can get (i.e. how much time and space resources it requires), there will always be even harder problems. For time complexity, this is proven by the time hierarchy theorem. A similar space hierarchy theorem can also be derived.

Computational resources

Complexity theory analyzes the difficulty of computational problems in terms of many different computational resources. The same problem can be explained in terms of the necessary amounts of many different computational resources, including time, space, randomness, alternation, and other less-intuitive measures. A complexity class is the set of all of the computational problems which can be solved using a certain amount of a certain computational resource.

Perhaps the most well-studied computational resources are deterministic time (DTIME) and deterministic space (DSPACE). These resources represent the amount of computation time and memory space needed on a deterministic computer, like the computers that actually exist. These resources are of great practical interest, and are well-studied.

Some computational problems are easier to analyze in terms of more unusual resources. For example, a nondeterministic Turing machine is a computational model that is allowed to branch out to check many different possibilities at once. The nondeterministic Turing machine has very little to do with how we physically want to compute algorithms, but its branching exactly captures many of the mathematical models we want to analyze, so that nondeterministic time is a very important resource in analyzing computational problems.

Many more unusual computational resources have been used in complexity theory. Technically, any complexity measure can be viewed as a computational resource, and complexity measures are very broadly defined by the Blum complexity axioms.

Complexity classes

A complexity class is the set of all of the computational problems which can be solved using a certain amount of a certain computational resource.

The complexity class P is the set of decision problems that can be solved by a deterministic machine in polynomial time. This class corresponds to an intuitive idea of the problems which can be effectively solved in the worst cases.[1]

The complexity class NP is the set of decision problems that can be solved by a non-deterministic machine in polynomial time. This class contains many problems that people would like to be able to solve effectively, including the Boolean satisfiability problem, the Hamiltonian path problem and the vertex cover problem. All the problems in this class have the property that their solutions can be checked efficiently.[1]
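The property that solutions can be checked efficiently can be illustrated with a verifier (a sketch, using the vertex cover problem mentioned above): given a graph and a proposed cover of size at most k (the certificate), the check runs in time polynomial in the input size, even though no polynomial-time method of finding such a cover is known.

```python
# Illustrative sketch of an NP verifier: checking a proposed certificate for
# the vertex cover problem takes time polynomial in the input size, even if
# *finding* such a certificate is not known to be possible in polynomial time.

def verify_vertex_cover(edges, certificate, k):
    cover = set(certificate)
    if len(cover) > k:
        return False                        # certificate is too large
    return all(u in cover or v in cover for u, v in edges)

edges = [(1, 2), (2, 3), (1, 3), (3, 4)]
print(verify_vertex_cover(edges, {2, 3}, 2))   # True: every edge touches 2 or 3
print(verify_vertex_cover(edges, {1}, 2))      # False: edge (2, 3) is uncovered
```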

Many complexity classes can be characterized in terms of the mathematical logic needed to express them – this field is called descriptive complexity.

Open questions

The P = NP question

See also: Oracle machine

The question of whether NP is the same set as P (that is, whether problems that can be solved in non-deterministic polynomial time can also be solved in deterministic polynomial time) is one of the most important open questions in theoretical computer science, due to the wide implications a solution would have.[1] If the answer were yes, many important problems would be shown to have "efficient" solutions. These include various types of integer programming in operations research, many problems in logistics, protein structure prediction in biology, and the ability to find formal proofs of pure mathematics theorems efficiently using computers.[2][3] The P = NP problem is one of the Millennium Prize Problems proposed by the Clay Mathematics Institute, which offers a US$1,000,000 prize to the first person to provide a solution.[4]

Questions like this motivate the concepts of hard and complete. A set of problems X is hard for a set of problems Y if every problem in Y can be transformed "easily" into some problem in X that produces the same solution. The definition of "easily" differs between contexts. In the particular case of P versus NP, the relevant hard set is NP-hard: a set of problems that are not necessarily in NP themselves, but to which any problem in NP can be reduced in polynomial time.

Set X is complete for Y if it is hard for Y and is also a subset of Y. Thus, the set of NP-complete problems contains the most "difficult" problems in NP, in the sense that they are the ones most likely not to be in P. Because the question of whether P = NP remains unsolved, being able to reduce a known NP-complete problem to another problem indicates that there is no known polynomial-time solution for the latter. Conversely, because all NP problems can be reduced to the set, finding an NP-complete problem that can be solved in polynomial time would mean that P = NP.[1]
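As an illustrative example of such a polynomial-time transformation (a standard textbook reduction, not specific to the cited sources): a graph on n vertices has an independent set of size k exactly when the remaining n − k vertices form a vertex cover, so the independent set problem reduces to vertex cover by simply rewriting the size bound.

```python
# Illustrative sketch of a polynomial-time reduction: INDEPENDENT-SET reduces
# to VERTEX-COVER.  A graph on n vertices has an independent set of size k
# iff it has a vertex cover of size n - k; the transformation merely rewrites
# the size bound, which certainly takes polynomial time.
from itertools import combinations

def is_vertex_cover(cover, edges):
    return all(u in cover or v in cover for u, v in edges)

def has_vertex_cover(vertices, edges, k):
    # Brute-force stand-in for a VERTEX-COVER solver (exponential, illustration only).
    return any(is_vertex_cover(set(c), edges)
               for c in combinations(vertices, min(k, len(vertices))))

def has_independent_set(vertices, edges, k):
    # The reduction: same graph, complementary size bound.
    return has_vertex_cover(vertices, edges, len(vertices) - k)

triangle_vertices, triangle_edges = [1, 2, 3], [(1, 2), (2, 3), (1, 3)]
print(has_independent_set(triangle_vertices, triangle_edges, 1))   # True: any single vertex
print(has_independent_set(triangle_vertices, triangle_edges, 2))   # False: every pair shares an edge
```

The brute-force vertex cover routine here is exponential and serves only to make the sketch executable; the reduction itself is the trivial, polynomial-time part.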

Incomplete problems in NP

Diagram of complexity classes provided that P ≠ NP. The existence of problems in NP outside both P and NP-complete in this case was established by Ladner.[5]

Incomplete problems are problems in NP that are neither NP-complete nor in P. In other words, incomplete problems can neither be solved in polynomial time nor do they belong to the hardest problems of NP. It has been shown that, if P ≠ NP, then there exist NP-incomplete problems.[5][6]

An important open problem in this context is whether the graph isomorphism problem is in P, NP-complete, or NP-incomplete. The answer is not known, but there are strong hints that the problem is at least not NP-complete.[7] The graph isomorphism problem asks whether two given graphs are isomorphic.
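For illustration (not from the cited reference), the obvious brute-force test for graph isomorphism tries every bijection between the two vertex sets and checks that edges map exactly onto edges; it takes factorial time, far above polynomial, which is why the search for a faster algorithm is interesting.

```python
# Illustrative brute-force sketch of the graph isomorphism problem: try every
# bijection between the vertex sets (n! of them, far beyond polynomial time)
# and check whether it maps edges exactly onto edges.
from itertools import permutations

def are_isomorphic(vertices1, edges1, vertices2, edges2):
    if len(vertices1) != len(vertices2) or len(edges1) != len(edges2):
        return False
    e2 = {frozenset(e) for e in edges2}
    for perm in permutations(vertices2):
        mapping = dict(zip(vertices1, perm))
        if {frozenset((mapping[u], mapping[v])) for u, v in edges1} == e2:
            return True
    return False

path_a = ([1, 2, 3], [(1, 2), (2, 3)])                 # a path on three vertices
path_b = (["x", "y", "z"], [("y", "x"), ("y", "z")])   # the same shape, relabelled
triangle = ([1, 2, 3], [(1, 2), (2, 3), (1, 3)])
print(are_isomorphic(*path_a, *path_b))     # True
print(are_isomorphic(*path_a, *triangle))   # False
```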

NP = co-NP

Here co-NP is the set containing the complement problems (i.e., problems with the yes/no answers reversed) of NP problems. It is believed that the two classes are not equal; however, this has not yet been proven. It has been shown that if the two classes are not equal, then no NP-complete problem can be in co-NP and no co-NP-complete problem can be in NP.[6]

Intractability

See also: Combinatorial explosion

Problems that can be solved in principle, but not fast enough for the solution to be usable, are called intractable (Hopcroft et al. 2007: 368). Naive complexity theory assumes that problems lacking polynomial-time solutions are intractable for all but the smallest inputs. Problems that are known to be intractable in this sense include those that are EXPTIME-complete. If NP is not the same as P, then the NP-complete problems are also intractable in this sense. What this means in practice is open to debate.

To see why exponential-time solutions might be unusable in practice, consider a problem that requires 2^n operations to solve (where n is the size of the input). For a relatively small input size of n = 100, and assuming a computer that can perform 10^12 operations per second, a solution would take about 4×10^10 years, which is of the same order as the current age of the universe. On the other hand, a problem that requires n^15 operations would be in P, yet a solution would also take about 4×10^10 years for n = 100. And a problem that required 2^(0.0000001·n) operations would not be in P, but would be solvable for quite large cases.
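The arithmetic behind these estimates can be checked directly with a few lines (a quick sketch; the constants are rough):

```python
# Quick check of the arithmetic above: 2**100 and 100**15 operations at
# 10**12 operations per second, converted to years.
SECONDS_PER_YEAR = 3.15e7        # roughly 365 * 24 * 3600

def years_needed(operations, ops_per_second=1e12):
    return operations / ops_per_second / SECONDS_PER_YEAR

print(f"2^100 ops:  {years_needed(2**100):.1e} years")     # about 4.0e+10 years
print(f"100^15 ops: {years_needed(100**15):.1e} years")     # about 3.2e+10 years
```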

Finally, saying that a problem is not in P does not imply that all large cases of the problem are hard, or even that most of them are. For example, the decision problem in Presburger arithmetic has been shown not to be in P, yet algorithms have been written that solve the problem in reasonable times in most cases.

References

  1. Sipser, Michael (2006). "Time Complexity". Introduction to the Theory of Computation (2nd ed.). USA: Thomson Course Technology. ISBN 0534950973.
  2. Berger, Bonnie A.; Leighton, Terrance (1998). "Protein folding in the hydrophobic-hydrophilic (HP) model is NP-complete". Journal of Computational Biology 5 (1): 27–40. PMID 9541869.
  3. Cook, Stephen (April 2000). The P versus NP Problem. Clay Mathematics Institute. http://www.claymath.org/millennium/P_vs_NP/Official_Problem_Description.pdf. Retrieved 2006-10-18.
  4. Jaffe, Arthur M. (2006). "The Millennium Grand Challenge in Mathematics". Notices of the AMS 53 (6). http://www.ams.org/notices/200606/fea-jaffe.pdf. Retrieved 2006-10-18.
  5. Ladner, Richard E. (1975). "On the structure of polynomial time reducibility". Journal of the ACM 22 (1): 151–171. doi:10.1145/321864.321877.
  6. Du, Ding-Zhu; Ko, Ker-I (2000). Theory of Computational Complexity. John Wiley & Sons. ISBN 978-0-471-34506-0.
  7. Arvind, Vikraman; Kurur, Piyush P. (2006). "Graph isomorphism is in SPP". Information and Computation 204 (5): 835–852. doi:10.1016/j.ic.2006.02.002.
