Chernoff bound
In probability theory, the Chernoff bound, named after Herman Chernoff, gives a lower bound on the probability that a majority of n independent events, each occurring with the same probability p > 1/2, occur simultaneously. The Chernoff bound is a special case of Chernoff's inequality.
A simple motivating example is to consider a biased coin. One side is more likely to come up than the other, but you don't know which and would like to find out. The obvious solution is to flip it many times and then choose the side that comes up the most. But how many times do you have to flip it to be confident that you've chosen correctly?
In general, let E1, ..., En be independent events, each having probability p > 1/2. Then the Chernoff bound says that the probability of simultaneous occurrence of more than n/2 of the events Ek exceeds

1 - e^{-2n\left(p - \frac{1}{2}\right)^{2}}.
In our example, suppose that we want to ensure that we choose the wrong side with at most the small probability ε. Then, rearranging the above, we must have

n \ge \frac{1}{2\left(p - \frac{1}{2}\right)^{2}} \ln\frac{1}{\varepsilon}.
If the coin is noticeably biased, say coming up on one side 60% of the time, then we can guess that side with 95% accuracy after 150 flips. If it is 90% biased, then a mere 10 flips suffices. If the coin is only biased a tiny amount, like most real coins are, the number of necessary flips becomes much larger.
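As a rough numerical illustration of the bound above, the following Python sketch computes the number of flips required for a given bias and error probability. The function name flips_needed and the parameter choices are illustrative, not part of the original analysis.

```python
import math

def flips_needed(p, eps):
    # Smallest n allowed by the bound above, i.e. the smallest n with
    # exp(-2 * n * (p - 1/2)**2) <= eps. (Illustrative helper, not from the article.)
    return math.ceil(math.log(1 / eps) / (2 * (p - 0.5) ** 2))

print(flips_needed(0.6, 0.05))   # about 150 flips for a 60% coin, 95% confidence
print(flips_needed(0.9, 0.05))   # about 10 flips for a 90% coin
print(flips_needed(0.51, 0.05))  # a nearly fair coin needs roughly 15,000 flips
```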
More practically, the Chernoff bound is used in randomized algorithms (or in computational devices such as quantum computers) to determine a bound on the number of runs necessary to determine a value by majority agreement, up to a specified probability. For example, suppose an algorithm (or machine) A computes the correct value of a function f with probability at least p > 1/2. If we choose n satisfying the inequality above, the probability that a majority exists and is equal to the correct value is at least 1 − ε, which for small enough ε is quite reliable. If p is a constant, ε diminishes exponentially with growing n, which is what makes algorithms in the complexity class BPP efficient.
Notice that if p is very close to one half, the necessary n can become very large. For example, if p = 1/2 + 1/2^m, as it might be in some PP algorithms, the result is that n is bounded below by an exponential function in m:

n \ge 2^{2m-1} \ln\frac{1}{\varepsilon}.
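To make the exponential growth concrete, the following sketch (with an illustrative ε = 0.05) evaluates the required n for p = 1/2 + 1/2^m as m increases:

```python
import math

eps = 0.05                               # illustrative error probability
for m in range(1, 11):
    p = 0.5 + 2.0 ** (-m)                # advantage over 1/2 shrinks as 2**(-m)
    n = math.ceil(math.log(1 / eps) / (2 * (p - 0.5) ** 2))
    print(m, n)                          # n grows like 2**(2*m - 1) * ln(1/eps)
```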
Chernoff bounds
Herman Chernoff described a method for deriving tail bounds on sums of random variables. All the bounds derived using his method are now known as Chernoff bounds.
Chernoff's method centers on bounding a random variable X, which represents a sequence of random variables (typically their sum), by studying the random variable e^{tX} rather than X itself. There are many flavors of Chernoff bounds; we present a bound on the relative error for a series of experiments as well as one on the absolute error.
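The idea can be illustrated numerically: for a sum X of i.i.d. Bernoulli variables, Markov's inequality applied to e^{tX} gives Pr[X ≥ a] ≤ E[e^{tX}] e^{-ta} for every t > 0. The following Python sketch (all parameter choices are illustrative) compares this bound, for a few values of t, against a Monte Carlo estimate of the tail probability:

```python
import math, random

random.seed(0)
n, p, a = 100, 0.5, 60                   # illustrative: 100 fair Bernoulli trials, tail at 60

# Monte Carlo estimate of Pr[X >= a] for X = X_1 + ... + X_n.
trials = 20000
hits = sum(1 for _ in range(trials)
           if sum(random.random() < p for _ in range(n)) >= a)
print("empirical tail:", hits / trials)

# Chernoff-style bound: Pr[X >= a] <= E[exp(t*X)] * exp(-t*a), where
# E[exp(t*X)] = (p*e**t + 1 - p)**n for i.i.d. Bernoulli(p) summands.
# The bound holds for every t > 0; the tightest value comes from optimizing over t.
for t in (0.1, 0.2, 0.4, 0.8):
    bound = (p * math.exp(t) + 1 - p) ** n * math.exp(-t * a)
    print("t =", t, "bound =", round(bound, 4))
```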
Theorem (absolute error)
The following theorem is due to Wassily Hoeffding. Assume the random variables X_1, X_2, ..., X_n are i.i.d. Let p = E[X_i], X_i ∈ {0, 1}, and ε > 0. Then

\Pr\left[\frac{1}{n}\sum_{i=1}^{n} X_i \ge p + \varepsilon\right] \le \left(\left(\frac{p}{p+\varepsilon}\right)^{p+\varepsilon} \left(\frac{1-p}{1-p-\varepsilon}\right)^{1-p-\varepsilon}\right)^{n} = e^{-n\, D(p+\varepsilon \,\|\, p)}

and

\Pr\left[\frac{1}{n}\sum_{i=1}^{n} X_i \le p - \varepsilon\right] \le \left(\left(\frac{p}{p-\varepsilon}\right)^{p-\varepsilon} \left(\frac{1-p}{1-p+\varepsilon}\right)^{1-p+\varepsilon}\right)^{n} = e^{-n\, D(p-\varepsilon \,\|\, p)},

where

D(x \,\|\, y) = x \ln\frac{x}{y} + (1-x)\ln\frac{1-x}{1-y}

is the Kullback-Leibler divergence between Bernoulli distributed random variables with parameters x and y respectively.
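A direct way to use the theorem is to evaluate e^{-n D(p+ε‖p)} numerically. The sketch below (parameter values are illustrative) computes the Kullback-Leibler divergence and the resulting tail bounds:

```python
import math

def kl_bernoulli(x, y):
    # Kullback-Leibler divergence D(x || y) between Bernoulli(x) and Bernoulli(y).
    return x * math.log(x / y) + (1 - x) * math.log((1 - x) / (1 - y))

# Illustrative numbers: n = 100 samples of a Bernoulli(0.5) variable.
n, p, eps = 100, 0.5, 0.1
print(math.exp(-n * kl_bernoulli(p + eps, p)))   # about 0.13, bounds Pr[mean >= 0.6]
print(math.exp(-n * kl_bernoulli(p - eps, p)))   # same value here (p = 0.5 is symmetric), bounds Pr[mean <= 0.4]
```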
Proof
The proof of this result relies on Markov's inequality for positive-valued random variables. First, for ease of notation, set q = p + ε in our bound. Letting λ > 0 be an arbitrary positive real, we see that

\Pr\left[\frac{1}{n}\sum_{i=1}^{n} X_i \ge q\right] = \Pr\left[\lambda \sum_{i=1}^{n} X_i \ge \lambda n q\right] = \Pr\left[e^{\lambda \sum_{i} X_i} \ge e^{\lambda n q}\right].

Applying Markov's inequality to the last expression, we see that

\Pr\left[e^{\lambda \sum_{i} X_i} \ge e^{\lambda n q}\right] \le e^{-\lambda n q}\, \mathrm{E}\left[e^{\lambda \sum_{i} X_i}\right] = e^{-\lambda n q} \prod_{i=1}^{n} \mathrm{E}\left[e^{\lambda X_i}\right],

where the equality follows from the independence of the n X_i's. Now, knowing that Pr[X_i = 1] = p and Pr[X_i = 0] = 1 - p, we have

e^{-\lambda n q} \prod_{i=1}^{n} \mathrm{E}\left[e^{\lambda X_i}\right] = e^{-\lambda n q}\left(p e^{\lambda} + 1 - p\right)^{n}.

Because λ is arbitrary, we can minimize the above expression with respect to λ, which is easily done using calculus and some logarithms. Thus,

\frac{d}{d\lambda}\ln\left(e^{-\lambda n q}\left(p e^{\lambda} + 1 - p\right)^{n}\right) = -n q + n\,\frac{p e^{\lambda}}{p e^{\lambda} + 1 - p}.

Setting the last equation to zero and solving, we have

\frac{p e^{\lambda}}{p e^{\lambda} + 1 - p} = q,

so that p e^{\lambda}(1 - q) = q(1 - p). Thus,

\lambda = \ln\frac{q(1-p)}{p(1-q)}.

As q = p + ε > p, we see that λ > 0, so the requirement λ > 0 is satisfied. Having solved for λ, we can plug it back into the equations above to find that

\ln \Pr\left[\frac{1}{n}\sum_{i=1}^{n} X_i \ge q\right] \le -\lambda n q + n \ln\left(p e^{\lambda} + 1 - p\right) = -n\left(q \ln\frac{q}{p} + (1-q)\ln\frac{1-q}{1-p}\right) = -n\, D(q \,\|\, p).

We now have our desired result, that

\Pr\left[\frac{1}{n}\sum_{i=1}^{n} X_i \ge p + \varepsilon\right] \le e^{-n\, D(p+\varepsilon \,\|\, p)}.
To complete the proof for the symmetric case, we simply define the random variable Yi = 1 − Xi, apply the same proof, and plug into our bound.
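The optimization step in the proof can be checked numerically: the bound e^{-λnq}(pe^λ + 1 - p)^n is minimized at λ = ln(q(1-p)/(p(1-q))), and its minimum value equals e^{-n D(q‖p)}. The following sketch (with illustrative parameters) compares the closed-form minimizer against a simple grid search:

```python
import math

# Illustrative parameters for the bound exp(-lam*n*q) * (p*e**lam + 1 - p)**n.
n, p, eps = 50, 0.3, 0.1
q = p + eps

def bound(lam):
    return math.exp(-lam * n * q) * (p * math.exp(lam) + 1 - p) ** n

lam_star = math.log(q * (1 - p) / (p * (1 - q)))           # closed-form minimizer
grid_min = min(bound(k / 1000.0) for k in range(1, 3000))  # crude grid search over lambda
kl = q * math.log(q / p) + (1 - q) * math.log((1 - q) / (1 - p))

print(bound(lam_star))        # value of the bound at the closed-form minimizer
print(grid_min)               # agrees closely with the grid-search minimum
print(math.exp(-n * kl))      # ... and both equal exp(-n * D(q || p))
```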
Simpler bounds
A simpler bound follows by relaxing the theorem using D(p + \varepsilon \,\|\, p) \ge 2\varepsilon^{2}, which follows from the convexity of D(p + \varepsilon \,\|\, p) in ε and the fact that

\frac{d^{2}}{d\varepsilon^{2}} D(p + \varepsilon \,\|\, p) = \frac{1}{(p+\varepsilon)(1-p-\varepsilon)} \ge 4 = \frac{d^{2}}{d\varepsilon^{2}}\left(2\varepsilon^{2}\right).

This results in a special case of Hoeffding's inequality. Sometimes, the bound D((1+x)p \,\|\, p) \ge \frac{x^{2} p}{4} for -\frac{1}{2} \le x \le \frac{1}{2}, which is stronger for p < 1/8, is also used.
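The relaxation D(p + ε‖p) ≥ 2ε² can be checked numerically; the following sketch (with illustrative parameter pairs) compares the exact divergence with the relaxed exponent:

```python
import math

def kl_bernoulli(x, y):
    # Kullback-Leibler divergence D(x || y) between Bernoulli(x) and Bernoulli(y).
    return x * math.log(x / y) + (1 - x) * math.log((1 - x) / (1 - y))

# Illustrative check that D(p + eps || p) >= 2 * eps**2 for a few parameter pairs.
for p, eps in [(0.5, 0.1), (0.3, 0.05), (0.05, 0.02)]:
    print(p, eps, round(kl_bernoulli(p + eps, p), 5), 2 * eps ** 2)
```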
Rudolf Ahlswede and Andreas Winter introduced a Chernoff bound for matrix-valued random variables.
Theorem (relative error)
Let X_1, X_2, ..., X_n be independent random variables taking on values 0 or 1. Further, assume that Pr[X_i = 1] = p_i. Then, if we let X = \sum_{i=1}^{n} X_i and μ be the expectation of X, for any δ > 0

\Pr\left[X > (1+\delta)\mu\right] < \left(\frac{e^{\delta}}{(1+\delta)^{1+\delta}}\right)^{\mu}.
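As an illustration of the theorem, the following sketch (with made-up success probabilities p_i) evaluates the bound (e^δ/(1+δ)^{1+δ})^μ and compares it with a Monte Carlo estimate of Pr[X > (1+δ)μ]:

```python
import math, random

random.seed(1)
ps = [0.1, 0.2, 0.3, 0.4, 0.5] * 10      # made-up success probabilities p_i
mu = sum(ps)                             # expectation of X, the sum of the indicators
delta = 0.5

bound = (math.exp(delta) / (1 + delta) ** (1 + delta)) ** mu
trials = 20000
hits = sum(1 for _ in range(trials)
           if sum(random.random() < p for p in ps) > (1 + delta) * mu)
print("bound:", round(bound, 4), "empirical:", hits / trials)
```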
Proof
For any t > 0, we have that

\Pr\left[X > (1+\delta)\mu\right] = \Pr\left[e^{tX} > e^{t(1+\delta)\mu}\right].

Applying Markov's inequality to the right-hand side of the previous formula (noting that e^{tX} is always a positive random variable), we have

\Pr\left[X > (1+\delta)\mu\right] \le \frac{\mathrm{E}\left[e^{tX}\right]}{e^{t(1+\delta)\mu}}.

Noting that X = X_1 + X_2 + \cdots + X_n, we can begin to bound \mathrm{E}\left[e^{tX}\right]. We have

\mathrm{E}\left[e^{tX}\right] = \mathrm{E}\left[e^{t(X_1 + X_2 + \cdots + X_n)}\right] = \mathrm{E}\left[\prod_{i=1}^{n} e^{t X_i}\right]
= \prod_{i=1}^{n} \mathrm{E}\left[e^{t X_i}\right]
= \prod_{i=1}^{n} \left(p_i e^{t} + (1 - p_i)\right).

The second line above follows because of the independence of the X_i's, and the third line follows because e^{t X_i} takes the value e^t with probability p_i and the value 1 with probability 1 − p_i. Re-writing p_i e^t + (1 − p_i) as p_i(e^t − 1) + 1 and recalling that 1 + x ≤ e^x (with strict inequality if x > 0), we set x = p_i(e^t − 1). Thus

\mathrm{E}\left[e^{tX}\right] < \prod_{i=1}^{n} e^{p_i\left(e^{t} - 1\right)} = e^{\left(e^{t} - 1\right)\sum_{i=1}^{n} p_i} = e^{\left(e^{t} - 1\right)\mu},

and hence

\Pr\left[X > (1+\delta)\mu\right] < \frac{e^{\left(e^{t} - 1\right)\mu}}{e^{t(1+\delta)\mu}}.

If we simply set t = ln(1 + δ), so that t > 0 for δ > 0, we can substitute and find

\Pr\left[X > (1+\delta)\mu\right] < \left(\frac{e^{\delta}}{(1+\delta)^{1+\delta}}\right)^{\mu}.

This proves the desired result. A similar proof strategy can be used to show that

\Pr\left[X < (1-\delta)\mu\right] < \left(\frac{e^{-\delta}}{(1-\delta)^{1-\delta}}\right)^{\mu}.
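The lower-tail bound can be illustrated the same way; the parameters in the following sketch are again made up for the example:

```python
import math, random

random.seed(2)
ps = [0.25] * 80                         # made-up example: 80 Bernoulli(0.25) indicators
mu = sum(ps)                             # expectation of X, here 20
delta = 0.4

bound = (math.exp(-delta) / (1 - delta) ** (1 - delta)) ** mu
trials = 20000
hits = sum(1 for _ in range(trials)
           if sum(random.random() < p for p in ps) < (1 - delta) * mu)
print("bound:", round(bound, 4), "empirical:", hits / trials)
```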
See also
- Chernoff's inequality (the Chernoff bound is a special case)
- Hoeffding's inequality
- Markov's inequality
- Chebyshev's inequality
References
- Herman Chernoff, A measure of asymptotic efficiency for tests of a hypothesis based on the sum of observations, Annals of Mathematical Statistics, vol. 23, pp. 493–507, 1952.
- Wassily Hoeffding, Probability inequalities for sums of bounded random variables, Journal of the American Statistical Association 58 (301): 13–30, March 1963. (JSTOR)
- Rudolf Ahlswede and Andreas Winter, Strong Converse for Identification via Quantum Channels, http://www.arxiv.org/abs/quant-ph/0012127