Chernoff bound

In probability theory, the Chernoff bound, named after Herman Chernoff, gives a lower bound on the probability of majority agreement among n independent events that each occur with the same probability. The Chernoff bound is a special case of Chernoff's inequality.

A simple motivating example is to consider a biased coin. One side is more likely to come up than the other, but you don't know which and would like to find out. The obvious solution is to flip it many times and then choose the side that comes up the most. But how many times do you have to flip it to be confident that you've chosen correctly?

In general, let E_1, ..., E_n be independent events, each having probability p > 1/2. Then, the Chernoff bound says that the probability that more than n/2 of the events E_k occur simultaneously is at least

1 - \exp\left( -2n \left( p - \frac{1}{2} \right)^2 \right).
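
A small simulation illustrates the bound. The following Python sketch (majority_prob is an illustrative helper, not part of the article) estimates the probability of a majority empirically and compares it with the Chernoff lower bound:

    import math
    import random

    def majority_prob(p, n, trials=100_000):
        # Empirical probability that more than n/2 of n independent
        # events, each occurring with probability p, occur simultaneously.
        wins = sum(
            sum(random.random() < p for _ in range(n)) > n / 2
            for _ in range(trials)
        )
        return wins / trials

    p, n = 0.6, 150
    print(majority_prob(p, n))                    # empirical value, about 0.99
    print(1 - math.exp(-2 * n * (p - 0.5) ** 2))  # Chernoff lower bound, about 0.95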

In our example, suppose that we want to ensure that we choose the wrong side with probability at most some small ε. Then, rearranging the above, we must have:

n \geq \frac{1}{(p -1/2)^2} \ln \frac{1}{\sqrt{\varepsilon}}.
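
This follows by requiring the failure probability from the bound above to be at most ε and solving for n:

\exp\left( -2n \left( p - \frac{1}{2} \right)^2 \right) \leq \varepsilon \quad\Longleftrightarrow\quad n \geq \frac{\ln(1/\varepsilon)}{2(p - 1/2)^2} = \frac{1}{(p - 1/2)^2} \ln \frac{1}{\sqrt{\varepsilon}}.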

If the coin is noticeably biased, say coming up on one side 60% of the time, then we can guess that side with 95% confidence after 150 flips. If it is 90% biased, then a mere 10 flips suffice. If the coin is only biased by a tiny amount, as most real coins are, the number of necessary flips becomes much larger.
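
These figures can be reproduced directly from the inequality. The following Python sketch (flips_needed is an illustrative helper) computes the smallest n the bound guarantees:

    import math

    def flips_needed(p, eps):
        # Smallest n with n >= (1 / (p - 1/2)^2) * ln(1 / sqrt(eps)),
        # as in the inequality above.
        return math.ceil(math.log(1 / math.sqrt(eps)) / (p - 0.5) ** 2)

    print(flips_needed(0.6, 0.05))   # 150 flips for a 60% coin
    print(flips_needed(0.9, 0.05))   # 10 flips for a 90% coin
    print(flips_needed(0.51, 0.05))  # 14979 flips for a barely biased coin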

More practically, the Chernoff bound is used in randomized algorithms (or in computational devices such as quantum computers) to determine a bound on the number of runs necessary to determine a value by majority agreement, up to a specified probability. For example, suppose an algorithm (or machine) A computes the correct value of a function f with probability at least p > 1/2. If we choose n satisfying the inequality above, the probability that a majority exists and is equal to the correct value is at least 1 − ε, which for small enough ε is quite reliable. If p and ε are constants, the entire right side is a constant, which is what makes algorithms in the complexity class BPP efficient.
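
As a sketch of this use, assuming a randomized procedure A that returns the correct value of f on input x with probability at least p (the names A, x, and n here are illustrative, not a fixed API):

    from collections import Counter

    def amplify(A, x, n):
        # Run the randomized procedure A on input x a total of n times
        # and return the most common answer. If n satisfies the
        # inequality above, the majority answer equals f(x) with
        # probability at least 1 - epsilon.
        votes = Counter(A(x) for _ in range(n))
        return votes.most_common(1)[0][0]

    # Usage (hypothetical): amplify(A, x, flips_needed(p, eps)), with
    # flips_needed as in the coin example above.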

Notice that if p is very close to 1/2, the necessary n can become very large. For example, if p = 1/2 + 1/2^m, as it might be in some PP algorithms, then (p - 1/2)^2 = 1/2^{2m}, and n is bounded below by an exponential function of m:

n \geq 2^{2m} \ln \frac{1}{\sqrt{\varepsilon}}.
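
For instance, with m = 10 and ε = 0.05, this requires n ≥ 2^{20} ln(1/√ε) ≈ 1.6 million runs.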
