Chernoff bound
In probability theory, the Chernoff bound, named after Herman Chernoff, gives a lower bound on the probability that a majority of n independent events, each occurring with the same probability p > 1/2, occur simultaneously. The Chernoff bound is a special case of Chernoff's inequality.
A simple motivating example is to consider a biased coin. One side is more likely to come up than the other, but you don't know which and would like to find out. The obvious solution is to flip it many times and then choose the side that comes up the most. But how many times do you have to flip it to be confident that you've chosen correctly?
In general, let E1, ..., En be independent events, each having probability p > 1/2. Then the Chernoff bound says that the probability of simultaneous occurrence of more than n/2 of the events Ek exceeds

1 - e^{-2n\left(p - \frac{1}{2}\right)^{2}}.
In our example, suppose that we want to ensure that we choose the wrong side with at most the small probability ε. Then, rearranging the above, we must have

n \ge \frac{1}{2\left(p - \frac{1}{2}\right)^{2}} \ln\frac{1}{\varepsilon}.
If the coin is noticeably biased, say coming up on one side 60% of the time, then we can guess that side with 95% accuracy after 150 flips. If it is 90% biased, then a mere 10 flips suffices. If the coin is only biased a tiny amount, like most real coins are, the number of necessary flips becomes much larger.
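As a rough numerical illustration of the bound above, the following Python sketch computes the number of flips required for a given bias and error probability. The function name flips_needed and the parameter choices are illustrative, not part of the original analysis.

```python
import math

def flips_needed(p, eps):
    # Smallest n allowed by the bound above, i.e. the smallest n with
    # exp(-2 * n * (p - 1/2)**2) <= eps. (Illustrative helper, not from the article.)
    return math.ceil(math.log(1 / eps) / (2 * (p - 0.5) ** 2))

print(flips_needed(0.6, 0.05))   # about 150 flips for a 60% coin, 95% confidence
print(flips_needed(0.9, 0.05))   # about 10 flips for a 90% coin
print(flips_needed(0.51, 0.05))  # a nearly fair coin needs roughly 15,000 flips
```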
More practically, the Chernoff bound is used in randomized algorithms (or in computational devices such as quantum computers) to determine a bound on the number of runs necessary to determine a value by majority agreement, up to a specified probability. For example, suppose an algorithm (or machine) A computes the correct value of a function f with probability at least p > 1/2. If we choose n satisfying the inequality above, the probability that a majority exists and is equal to the correct value is at least 1 − ε, which for small enough ε is quite reliable. If p is a constant, ε diminishes exponentially with growing n, which is what makes algorithms in the complexity class BPP efficient.
Notice that if p is very close to one half, the necessary n can become very large. For example, if p = 1/2 + 1/2^m, as it might be in some PP algorithms, the result is that n is bounded below by an exponential function in m:

n \ge 2^{2m-1} \ln\frac{1}{\varepsilon}.
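To make the exponential growth concrete, the following sketch (with an illustrative ε = 0.05) evaluates the required n for p = 1/2 + 1/2^m as m increases:

```python
import math

eps = 0.05                               # illustrative error probability
for m in range(1, 11):
    p = 0.5 + 2.0 ** (-m)                # advantage over 1/2 shrinks as 2**(-m)
    n = math.ceil(math.log(1 / eps) / (2 * (p - 0.5) ** 2))
    print(m, n)                          # n grows like 2**(2*m - 1) * ln(1/eps)
```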
Chernoff bounds
Herman Chernoff described a method for deriving tail bounds on sums of random variables. All the bounds derived using his method are now known as Chernoff bounds.
Chernoff's method centers on bounding a random variable X, which represents a sequence of random variables (typically their sum), by studying the random variable e^{tX} rather than X itself. There are many flavors of Chernoff bounds; we present a bound on the relative error for a series of experiments as well as one on the absolute error.
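The idea can be illustrated numerically: for a sum X of i.i.d. Bernoulli variables, Markov's inequality applied to e^{tX} gives Pr[X ≥ a] ≤ E[e^{tX}] e^{-ta} for every t > 0. The following Python sketch (all parameter choices are illustrative) compares this bound, for a few values of t, against a Monte Carlo estimate of the tail probability:

```python
import math, random

random.seed(0)
n, p, a = 100, 0.5, 60                   # illustrative: 100 fair Bernoulli trials, tail at 60

# Monte Carlo estimate of Pr[X >= a] for X = X_1 + ... + X_n.
trials = 20000
hits = sum(1 for _ in range(trials)
           if sum(random.random() < p for _ in range(n)) >= a)
print("empirical tail:", hits / trials)

# Chernoff-style bound: Pr[X >= a] <= E[exp(t*X)] * exp(-t*a), where
# E[exp(t*X)] = (p*e**t + 1 - p)**n for i.i.d. Bernoulli(p) summands.
# The bound holds for every t > 0; the tightest value comes from optimizing over t.
for t in (0.1, 0.2, 0.4, 0.8):
    bound = (p * math.exp(t) + 1 - p) ** n * math.exp(-t * a)
    print("t =", t, "bound =", round(bound, 4))
```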
Theorem (absolute error)
The following theorem is due to Wassily Hoeffding. Assume the random variables X_1, X_2, ..., X_n are i.i.d. Let p = E[X_i], X_i ∈ {0, 1}, and ε > 0. Then

\Pr\left[\frac{1}{n}\sum_{i=1}^{n} X_i \ge p + \varepsilon\right] \le \left(\left(\frac{p}{p+\varepsilon}\right)^{p+\varepsilon} \left(\frac{1-p}{1-p-\varepsilon}\right)^{1-p-\varepsilon}\right)^{n} = e^{-n\, D(p+\varepsilon \,\|\, p)}

and

\Pr\left[\frac{1}{n}\sum_{i=1}^{n} X_i \le p - \varepsilon\right] \le \left(\left(\frac{p}{p-\varepsilon}\right)^{p-\varepsilon} \left(\frac{1-p}{1-p+\varepsilon}\right)^{1-p+\varepsilon}\right)^{n} = e^{-n\, D(p-\varepsilon \,\|\, p)},

where

D(x \,\|\, y) = x \ln\frac{x}{y} + (1-x)\ln\frac{1-x}{1-y}

is the Kullback-Leibler divergence between Bernoulli distributed random variables with parameters x and y respectively.
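A direct way to use the theorem is to evaluate e^{-n D(p+ε‖p)} numerically. The sketch below (parameter values are illustrative) computes the Kullback-Leibler divergence and the resulting tail bounds:

```python
import math

def kl_bernoulli(x, y):
    # Kullback-Leibler divergence D(x || y) between Bernoulli(x) and Bernoulli(y).
    return x * math.log(x / y) + (1 - x) * math.log((1 - x) / (1 - y))

# Illustrative numbers: n = 100 samples of a Bernoulli(0.5) variable.
n, p, eps = 100, 0.5, 0.1
print(math.exp(-n * kl_bernoulli(p + eps, p)))   # about 0.13, bounds Pr[mean >= 0.6]
print(math.exp(-n * kl_bernoulli(p - eps, p)))   # same value here (p = 0.5 is symmetric), bounds Pr[mean <= 0.4]
```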
Proof
The proof of this result relies on Markov's inequality for positive-valued random variables. First, for ease of notation, set q = p + ε in our bound. Letting λ > 0 be an arbitrary positive real, we see that

\Pr\left[\frac{1}{n}\sum_{i=1}^{n} X_i \ge q\right] = \Pr\left[\lambda \sum_{i=1}^{n} X_i \ge \lambda n q\right] = \Pr\left[e^{\lambda \sum_{i} X_i} \ge e^{\lambda n q}\right].

Applying Markov's inequality to the last expression, we see that

\Pr\left[e^{\lambda \sum_{i} X_i} \ge e^{\lambda n q}\right] \le e^{-\lambda n q}\, \mathrm{E}\left[e^{\lambda \sum_{i} X_i}\right] = e^{-\lambda n q} \prod_{i=1}^{n} \mathrm{E}\left[e^{\lambda X_i}\right],

where the equality follows from the independence of the n X_i's. Now, knowing that Pr[X_i = 1] = p and Pr[X_i = 0] = 1 - p, we have

e^{-\lambda n q} \prod_{i=1}^{n} \mathrm{E}\left[e^{\lambda X_i}\right] = e^{-\lambda n q}\left(p e^{\lambda} + 1 - p\right)^{n}.

Because λ is arbitrary, we can minimize the above expression with respect to λ, which is easily done using calculus and some logarithms. Thus,

\frac{d}{d\lambda}\ln\left(e^{-\lambda n q}\left(p e^{\lambda} + 1 - p\right)^{n}\right) = -n q + n\,\frac{p e^{\lambda}}{p e^{\lambda} + 1 - p}.

Setting the last equation to zero and solving, we have

\frac{p e^{\lambda}}{p e^{\lambda} + 1 - p} = q,

so that p e^{\lambda}(1 - q) = q(1 - p). Thus,

\lambda = \ln\frac{q(1-p)}{p(1-q)}.

As q = p + ε > p, we see that λ > 0, so the requirement λ > 0 is satisfied. Having solved for λ, we can plug it back into the equations above to find that

\ln \Pr\left[\frac{1}{n}\sum_{i=1}^{n} X_i \ge q\right] \le -\lambda n q + n \ln\left(p e^{\lambda} + 1 - p\right) = -n\left(q \ln\frac{q}{p} + (1-q)\ln\frac{1-q}{1-p}\right) = -n\, D(q \,\|\, p).

We now have our desired result, that

\Pr\left[\frac{1}{n}\sum_{i=1}^{n} X_i \ge p + \varepsilon\right] \le e^{-n\, D(p+\varepsilon \,\|\, p)}.
To complete the proof for the symmetric case, we simply define the random variable Yi = 1 − Xi, apply the same proof, and plug into our bound.
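The optimization step in the proof can be checked numerically: the bound e^{-λnq}(pe^λ + 1 - p)^n is minimized at λ = ln(q(1-p)/(p(1-q))), and its minimum value equals e^{-n D(q‖p)}. The following sketch (with illustrative parameters) compares the closed-form minimizer against a simple grid search:

```python
import math

# Illustrative parameters for the bound exp(-lam*n*q) * (p*e**lam + 1 - p)**n.
n, p, eps = 50, 0.3, 0.1
q = p + eps

def bound(lam):
    return math.exp(-lam * n * q) * (p * math.exp(lam) + 1 - p) ** n

lam_star = math.log(q * (1 - p) / (p * (1 - q)))           # closed-form minimizer
grid_min = min(bound(k / 1000.0) for k in range(1, 3000))  # crude grid search over lambda
kl = q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))

print(bound(lam_star))        # value of the bound at the closed-form minimizer
print(grid_min)               # agrees closely with the grid-search minimum
print(math.exp(-n * kl))      # ... and both equal exp(-n * D(q || p))
```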
Simpler bounds
A simpler bound follows by relaxing the theorem using D(p + \varepsilon \,\|\, p) \ge 2\varepsilon^{2}, which follows from the convexity of D(p + \varepsilon \,\|\, p) in ε and the fact that

\frac{d^{2}}{d\varepsilon^{2}} D(p + \varepsilon \,\|\, p) = \frac{1}{(p+\varepsilon)(1-p-\varepsilon)} \ge 4 = \frac{d^{2}}{d\varepsilon^{2}}\left(2\varepsilon^{2}\right).

This results in a special case of Hoeffding's inequality. Sometimes, the bound D((1+x)p \,\|\, p) \ge \frac{x^{2} p}{4} for -\frac{1}{2} \le x \le \frac{1}{2}, which is stronger for p < 1/8, is also used.
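The relaxation D(p + ε‖p) ≥ 2ε² can be checked numerically; the following sketch (with illustrative parameter pairs) compares the exact divergence with the relaxed exponent:

```python
import math

def kl_bernoulli(x, y):
    # Kullback-Leibler divergence D(x || y) between Bernoulli(x) and Bernoulli(y).
    return x * math.log(x / y) + (1 - x) * math.log((1 - x) / (1 - y))

# Illustrative check that D(p + eps || p) >= 2 * eps**2 for a few parameter pairs.
for p, eps in [(0.5, 0.1), (0.3, 0.05), (0.05, 0.02)]:
    print(p, eps, round(kl_bernoulli(p + eps, p), 5), 2 * eps ** 2)
```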
Rudolf Ahlswede and Andreas Winter introduced a Chernoff bound for matrix-valued random variables.
Theorem (relative error)
Let X_1, X_2, ..., X_n be independent random variables taking on values 0 or 1. Further, assume that Pr[X_i = 1] = p_i. Then, if we let X = \sum_{i=1}^{n} X_i and μ be the expectation of X, for any δ > 0

\Pr\left[X > (1+\delta)\mu\right] < \left(\frac{e^{\delta}}{(1+\delta)^{1+\delta}}\right)^{\mu}.
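As an illustration of the theorem, the following sketch (with made-up success probabilities p_i) evaluates the bound (e^δ/(1+δ)^{1+δ})^μ and compares it with a Monte Carlo estimate of Pr[X > (1+δ)μ]:

```python
import math, random

random.seed(1)
ps = [0.1, 0.2, 0.3, 0.4, 0.5] * 10      # made-up success probabilities p_i
mu = sum(ps)                             # expectation of X, the sum of the indicators
delta = 0.5

bound = (math.exp(delta) / (1 + delta) ** (1 + delta)) ** mu
trials = 20000
hits = sum(1 for _ in range(trials)
           if sum(random.random() < p for p in ps) > (1 + delta) * mu)
print("bound:", round(bound, 4), "empirical:", hits / trials)
```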
Proof
For any t > 0, we have that

\Pr\left[X > (1+\delta)\mu\right] = \Pr\left[e^{tX} > e^{t(1+\delta)\mu}\right].

Applying Markov's inequality to the right-hand side of the previous formula (noting that e^{tX} is always a positive random variable), we have

\Pr\left[X > (1+\delta)\mu\right] \le \frac{\mathrm{E}\left[e^{tX}\right]}{e^{t(1+\delta)\mu}}.

Noting that X = X_1 + X_2 + \cdots + X_n, we can begin to bound \mathrm{E}\left[e^{tX}\right]. We have

\mathrm{E}\left[e^{tX}\right] = \mathrm{E}\left[e^{t(X_1 + X_2 + \cdots + X_n)}\right] = \mathrm{E}\left[\prod_{i=1}^{n} e^{t X_i}\right]
= \prod_{i=1}^{n} \mathrm{E}\left[e^{t X_i}\right]
= \prod_{i=1}^{n} \left(p_i e^{t} + (1 - p_i)\right).

The second line above follows because of the independence of the X_i's, and the third line follows because e^{t X_i} takes the value e^t with probability p_i and the value 1 with probability 1 − p_i. Re-writing p_i e^t + (1 − p_i) as p_i(e^t − 1) + 1 and recalling that 1 + x ≤ e^x (with strict inequality if x > 0), we set x = p_i(e^t − 1). Thus

\mathrm{E}\left[e^{tX}\right] < \prod_{i=1}^{n} e^{p_i\left(e^{t} - 1\right)} = e^{\left(e^{t} - 1\right)\sum_{i=1}^{n} p_i} = e^{\left(e^{t} - 1\right)\mu},

and hence

\Pr\left[X > (1+\delta)\mu\right] < \frac{e^{\left(e^{t} - 1\right)\mu}}{e^{t(1+\delta)\mu}}.

If we simply set t = ln(1 + δ), so that t > 0 for δ > 0, we can substitute and find

\Pr\left[X > (1+\delta)\mu\right] < \left(\frac{e^{\delta}}{(1+\delta)^{1+\delta}}\right)^{\mu}.

This proves the desired result. A similar proof strategy can be used to show that

\Pr\left[X < (1-\delta)\mu\right] < \left(\frac{e^{-\delta}}{(1-\delta)^{1-\delta}}\right)^{\mu}.
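The lower-tail bound can be illustrated the same way; the parameters in the following sketch are again made up for the example:

```python
import math, random

random.seed(2)
ps = [0.25] * 80                         # made-up example: 80 Bernoulli(0.25) indicators
mu = sum(ps)                             # expectation of X, here 20
delta = 0.4

bound = (math.exp(-delta) / (1 - delta) ** (1 - delta)) ** mu
trials = 20000
hits = sum(1 for _ in range(trials)
           if sum(random.random() < p for p in ps) < (1 - delta) * mu)
print("bound:", round(bound, 4), "empirical:", hits / trials)
```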
See also
- Chernoff's inequality (the Chernoff bound is a special case)
- Hoeffding's inequality
- Markov's inequality
- Chebyshev's inequality
References
- Herman Chernoff, A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations, Annals of Mathematical Statistics, vol. 23, pp. 493–507, 1952.
- Wassily Hoeffding, Probability inequalities for sums of bounded random variables, Journal of the American Statistical Association 58 (301): 13–30, March 1963. (JSTOR)
- Rudolf Ahlswede and Andreas Winter, Strong Converse for Identification via Quantum Channels, http://www.arxiv.org/abs/quant-ph/0012127