In computer science, a kernelization is an efficient algorithm that preprocesses instances of decision problems by mapping them to equivalent instances with a guaranteed upper bound on the size of the output, called the kernel of the instance. Kernelization is often achieved by applying a set of reduction rules that cut away parts of the instance that are easy to handle. Since kernelization is a concept of parameterized complexity theory, it is natural to state the guarantee on the size of a kernel in terms of a parameter associated with the problem. Indeed, kernelization can be used as a technique for creating fixed-parameter tractable algorithms.
The standard example for a kernelization algorithm is the kernelization of the vertex cover problem by S. Buss.[1] In this problem, we are given a graph G and a number k, and we want to know whether G has a vertex cover of size k. This problem is NP-hard and cannot be solved in polynomial time unless P=NP. However, some easy reduction rules for this problem exist:
Rule 1. If v is a vertex of degree greater than k, remove v from the graph and decrease k by one.
Any vertex cover of size k must contain v, since otherwise all of its more than k neighbors would have to be picked to cover the incident edges.
Rule 2. If v is an isolated vertex, remove it.
Isolated vertices cannot cover any edges and are not required for finding a vertex cover.
Rule 3. If more than k^2 edges remain in the graph and neither of the previous two rules can be applied, then the graph cannot contain a vertex cover of size k.
To see this, consider a vertex cover C of size k. Since the degree of every vertex was reduced to at most k by exhaustive application of the first rule, at most k^2 edges are incident to C. On the other hand, all edges of the graph are incident to C since it is a vertex cover, so the graph indeed has at most k^2 edges. Furthermore, since every remaining vertex is either in C or adjacent to C, the number of vertices is at most k^2 + k. Thus, we either know that the instance is a no-instance or we end up with a graph of at most k^2 edges and k^2 + k vertices.
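The three reduction rules above can be sketched as follows; the function and variable names are illustrative, not taken from Buss's paper.

```python
def buss_kernel(edges, k):
    """Apply Buss's reduction rules to a vertex-cover instance.

    Returns (kernel_edges, k', forced_no), where forced_no is True when
    the rules already prove the instance is a no-instance.
    """
    edges = {frozenset(e) for e in edges}
    while True:
        # Degree of every vertex currently incident to at least one edge.
        degree = {}
        for e in edges:
            for v in e:
                degree[v] = degree.get(v, 0) + 1
        # Rule 1: a vertex of degree > k must be in every size-k cover.
        high = next((v for v, d in degree.items() if d > k), None)
        if high is None:
            break
        if k == 0:
            return set(), 0, True  # an edge remains but no budget is left
        edges = {e for e in edges if high not in e}
        k -= 1
    # Rule 2 is implicit: isolated vertices never appear in `edges`.
    # Rule 3: after exhaustive reduction, more than k^2 edges means "no".
    if len(edges) > k * k:
        return edges, k, True
    return edges, k, False
```

For example, a star with five leaves and k = 1 reduces to the empty graph (Rule 1 fires on the center), while a triangle with k = 1 is recognized as a no-instance.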
The most straightforward exhaustive search algorithm for the vertex cover problem runs in time roughly 2^n, where n is the number of vertices. This algorithm simply tries all subsets of the vertices and checks whether any subset of size at most k is a vertex cover.
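The exhaustive search described above might be sketched like this (a minimal illustration, not an optimized solver):

```python
from itertools import combinations

def has_vertex_cover(edges, k):
    """Exhaustive search: try all vertex subsets of size at most k
    and check whether one of them covers every edge."""
    vertices = {v for e in edges for v in e}
    for size in range(k + 1):
        for subset in combinations(vertices, size):
            chosen = set(subset)
            # A set covers the graph if every edge has an endpoint in it.
            if all(chosen & set(e) for e in edges):
                return True
    return False
```

On a path with edges (1,2), (2,3), (3,4), this reports a cover of size 2 (for example {2, 3}) and correctly rejects k = 1.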
To speed up this and more advanced exhaustive search algorithms, it is desirable to first apply an efficient kernelization algorithm to the instance and run the exhaustive search on the resulting kernel. The running time of the overall algorithm is then dominated by the number of vertices in the kernel, for which reason the goal is to come up with kernels that have as few vertices as possible. For vertex cover, kernelization algorithms are known that produce kernels with at most 2k vertices.
One kernelization algorithm that achieves the 2k vertex bound exploits the half-integrality of the linear program relaxation of vertex cover due to Nemhauser and Trotter.[2] Another kernelization algorithm achieving that bound is based on what is known as the crown reduction rule and uses alternating path arguments.[3]
The currently best known kernelization algorithm in terms of the number of vertices is due to Lampis (2011) and achieves 2k − c log k vertices for any fixed constant c.
Given the many kernelization algorithms that exist and provide successive improvements, it is natural to ask for lower bounds on kernel sizes. Here the kernel sizes can be measured in terms of the number of vertices and in terms of the number of edges. It is suspected that the above-mentioned kernelization algorithms for vertex cover are essentially optimal. In this case, kernels with (2 − ε)k vertices or with O(k^(2 − ε)) edges cannot be produced efficiently. Recently, it has been shown that the latter would imply that coNP ⊆ NP/poly,[4] which is a statement in complexity theory that is generally believed to be false.
In the literature, there is no clear consensus on how kernelization should be formally defined and there are subtle differences in the uses of that expression.
In the notation of Downey & Fellows (1999), a parameterized problem is a subset L ⊆ Σ^* × ℕ.
A kernelization for a parameterized problem L is an algorithm that takes an instance (x, k) and maps it in time polynomial in |x| and k to an instance (x', k') such that

- (x, k) is in L if and only if (x', k') is in L,
- the size of x' is bounded by a computable function f in k, and
- k' is bounded by a function in k.
The output (x', k') of kernelization is called a kernel. In this general context, the size of the string x' just refers to its length. Some authors prefer to use the number of vertices or the number of edges as the size measure in the context of graph problems.
In the notation of Flum & Grohe (2006, p. 437), a parameterized problem consists of a decision problem L ⊆ Σ^* and a function κ : Σ^* → ℕ, the parameterization. The parameter of an instance x is the number κ(x).
A kernelization for a parameterized problem L is an algorithm that takes an instance x with parameter k and maps it in polynomial time to an instance y such that

- x is in L if and only if y is in L, and
- the size of y is bounded by a computable function f in k.
Note that in this notation, the bound on the size of y implies that the parameter of y is also bounded by a function in k.
The function f is often referred to as the size of the kernel. If f(k) = k^O(1), it is said that L admits a polynomial kernel. Similarly, for f(k) = O(k), the problem admits a linear kernel.
A problem is fixed-parameter tractable if and only if it is kernelizable (with a computable size bound on the kernels) and decidable.
That a kernelizable and decidable problem is fixed-parameter tractable can be seen from the definition above: first run the kernelization algorithm, which takes time |x|^c for some constant c, and then solve the resulting kernel in time g(f(k)) for some function g, which is finite because the problem is decidable and the kernel has size at most f(k). The total running time is |x|^c + g(f(k)). The other direction, that a fixed-parameter tractable problem is kernelizable and decidable, is a bit more involved. We will assume that the question is non-trivial, so that there is at least one instance that is in the language, called I_yes, and at least one instance that is not in the language, called I_no. Assume the input is (x, k) and that there exists an algorithm that decides the problem in time f(k) · |x|^c. We now construct a kernelization algorithm by finding, in polynomial time, an equivalent instance whose size is bounded by a function of k. If the size of x is at most f(k), we can just return (x, k), which is already a kernel. Otherwise, run the algorithm that proves the problem is fixed-parameter tractable and, if the answer is yes, return I_yes, otherwise return I_no. The latter step is allowed because when |x| > f(k), the running time f(k) · |x|^c is at most |x|^(c+1), so the decision algorithm runs in polynomial time.
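The construction in the second direction can be sketched as follows; names such as fpt_solver and yes_instance are hypothetical placeholders standing in for the objects assumed in the proof.

```python
def kernelize_via_fpt(x, k, f, fpt_solver, yes_instance, no_instance):
    """Turn an FPT algorithm into a kernelization (proof sketch).

    Assumes fpt_solver decides (x, k) in time f(k) * |x|**c, and that
    yes_instance / no_instance are fixed instances known to be in /
    not in the language.
    """
    if len(x) <= f(k):
        return (x, k)           # the instance is already small: a kernel
    # Here |x| > f(k), so f(k) * |x|**c <= |x|**(c + 1): the FPT
    # algorithm runs in polynomial time, and we may solve the instance
    # outright and output a trivial equivalent instance.
    if fpt_solver(x, k):
        return yes_instance
    return no_instance
```

As a toy illustration, take the "language" of strings containing the letter a, with f(k) = 2^k: short inputs are returned unchanged, while long inputs are solved directly and replaced by a fixed yes- or no-instance.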