
Proofs

Jensen's inequality can be proved in several ways. Three different proofs are given here, each corresponding to one of the three statements above (the finite form, the inequality in measure-theoretic terminology, and the general inequality in probabilistic notation). The first proof establishes the finite form of the inequality and then uses a density argument; it should make clear how the inequality is derived. The second is the most common proof of Jensen's inequality and uses some basic ideas from nonsmooth analysis. The third is a generalization of the second, providing a proof of the general statement for vector-valued random variables. This last proof is the most compact, although it requires a more advanced mathematical background.

Proof 1 (using the finite form)

If \lambda_1,\,\lambda_2 are two arbitrary positive real numbers such that \lambda_1+\lambda_2=1, then convexity of \varphi implies \varphi(\lambda_1 x_1+\lambda_2 x_2)\leq \lambda_1\,\varphi(x_1)+\lambda_2\,\varphi(x_2) for any x_1,\,x_2. This can easily be generalized: if \lambda_1,\,\lambda_2,\ldots,\lambda_n are n positive real numbers such that \lambda_1+\lambda_2+\ldots+\lambda_n=1, then

\varphi(\lambda_1 x_1+\lambda_2 x_2+\ldots+\lambda_n x_n)\leq \lambda_1\,\varphi(x_1)+\lambda_2\,\varphi(x_2)+\ldots+\lambda_n\,\varphi(x_n),

for any x_1,\,x_2,\ldots,\,x_n. This finite form of Jensen's inequality can be proved by induction: by the convexity hypothesis, the statement is true for n = 2. Suppose it is true for some n; one needs to prove it for n + 1. Since the λi are positive and sum to 1, at least one of them, say λ1, is strictly smaller than 1, so that 1 - λ1 > 0; therefore, by the convexity inequality:

\varphi\left(\sum_{i=1}^{n+1}\lambda_i x_i\right)= \varphi\left(\lambda_1 x_1+(1-\lambda_1)\sum_{i=2}^{n+1} \frac{\lambda_i}{1-\lambda_1} x_i\right)\leq \lambda_1\,\varphi(x_1)+(1-\lambda_1)\,\varphi\left(\sum_{i=2}^{n+1} \frac{\lambda_i}{1-\lambda_1} x_i\right).

Since \sum_{i=2}^{n+1} \frac{\lambda_i}{1-\lambda_1} =1, one can apply the induction hypothesis to the last term in the previous formula to obtain the result, namely the finite form of Jensen's inequality.
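As an aside, the finite form can be checked numerically; the following minimal sketch (assuming NumPy is available, and choosing the convex function φ(x) = x² purely as an example) draws random positive weights summing to 1 and verifies the inequality.

import numpy as np

# Illustrative check of the finite form of Jensen's inequality
# for the convex function phi(x) = x**2 (chosen only as an example;
# any convex function would do).
phi = lambda x: x**2

rng = np.random.default_rng(0)
n = 5
x = rng.normal(size=n)               # arbitrary points x_1, ..., x_n
lam = rng.random(n)
lam /= lam.sum()                     # positive weights summing to 1

lhs = phi(np.dot(lam, x))            # phi(sum_i lambda_i x_i)
rhs = np.dot(lam, phi(x))            # sum_i lambda_i phi(x_i)
assert lhs <= rhs + 1e-12            # the finite form of Jensen's inequality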

In order to obtain the general inequality from this finite form, one needs to use a density argument. The finite form can be rewritten as:

\varphi\left(\int x\,d\mu_n(x) \right)\leq \int \varphi(x)\,d\mu_n(x),

where μn is a probability measure given by an arbitrary convex combination of Dirac deltas:

\mu_n=\sum_{i=1}^n \lambda_i \delta_{x_i}.

Since convex functions are continuous, and since convex combinations of Dirac deltas are weakly dense in the set of probability measures (as can easily be verified), the general statement is obtained simply by a limiting procedure.
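The density argument can also be illustrated concretely: the empirical measure of n samples is itself a convex combination of Dirac deltas with weights 1/n, and applying the finite form to it while n grows recovers the integral form. A minimal numerical sketch, assuming NumPy and choosing μ to be an exponential distribution with φ(x) = exp(x/2) purely as an example:

import numpy as np

# The empirical measure mu_n = (1/n) * sum_i delta_{X_i} is a convex
# combination of Dirac deltas, so the finite form applies to it.
# As n grows, both sides approach the integral form.
phi = lambda x: np.exp(x / 2)        # a convex function (example choice)

rng = np.random.default_rng(1)
for n in (10, 1000, 100000):
    samples = rng.exponential(size=n)        # X_i drawn from mu (here Exp(1))
    lhs = phi(samples.mean())                # phi( integral x d mu_n )
    rhs = phi(samples).mean()                # integral phi d mu_n
    print(n, lhs <= rhs, lhs, rhs)           # the inequality holds for every n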

Proof 2 (measure-theoretic notation)

Let g be a real-valued μ-integrable function on a probability space Ω, and let φ be a convex function on the real numbers. Define the right-hand derivative of φ at x as

\varphi^\prime(x):=\lim_{t\to 0^+}\frac{\varphi(x+t)-\varphi(x)}{t}.

Since φ is convex, the difference quotient on the right-hand side is decreasing as t approaches 0 from the right, and it is bounded below by any quotient of the form

\frac{\varphi(x+t)-\varphi(x)}{t}

where t < 0; therefore, the limit always exists.
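Numerically, the monotone behaviour of this difference quotient is easy to observe; the sketch below (assuming NumPy, with the convex function φ(x) = exp(x) at x = 0 chosen only as an example) shows the right-hand quotients decreasing toward φ′(0) = 1 while every left-hand quotient stays below them.

import numpy as np

# Difference quotients of the convex function phi(x) = exp(x) at x = 0.
# The right-hand quotients decrease as t decreases to 0 and are bounded
# below by every left-hand quotient, so the right-hand limit exists.
phi = np.exp
x = 0.0

ts = np.array([1.0, 0.1, 0.01, 0.001])
right_quotients = (phi(x + ts) - phi(x)) / ts          # decreasing toward phi'(0) = 1
left_quotients = (phi(x - ts) - phi(x)) / (-ts)        # each one is a lower bound

print(right_quotients)
print(left_quotients)
assert np.all(np.diff(right_quotients) <= 0)           # monotone as t shrinks
assert left_quotients.max() <= right_quotients.min()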

Now, let us define the following:

x_0:=\int_\Omega g\, d\mu,
a:=\varphi^\prime(x_0),
b:=\varphi(x_0)-x_0\varphi^\prime(x_0).

Then ax+b\leq\varphi(x) for all x. To see this, take x > x0 and set t = x − x0 > 0. Then,

\varphi^\prime(x_0)\leq\frac{\varphi(x_0+t)-\varphi(x_0)}{t}.

Therefore,

\varphi^\prime(x_0)(x-x_0)+\varphi(x_0)\leq\varphi(x)

as desired. The case for x < x0 is proven similarly, and clearly ax_0+b=\varphi(x_0).
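The supporting line ax + b can be checked on a concrete example; the sketch below (assuming NumPy, with φ(x) = exp(x) and an arbitrary choice of x0) verifies that ax + b ≤ φ(x) on a grid of points and that the line touches the graph at x0.

import numpy as np

# Supporting line of the convex function phi(x) = exp(x) at x0:
# a = phi'(x0), b = phi(x0) - x0 * phi'(x0), so that a*x + b <= phi(x).
phi = np.exp
dphi = np.exp                        # derivative of exp (the right derivative coincides)

x0 = 0.7                             # arbitrary point (example choice)
a = dphi(x0)
b = phi(x0) - x0 * dphi(x0)

xs = np.linspace(-5.0, 5.0, 1001)
assert np.all(a * xs + b <= phi(xs) + 1e-12)   # the line lies below the graph
assert np.isclose(a * x0 + b, phi(x0))         # and touches it at x0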

φ(x0) can then be rewritten as

ax_0+b=a\left(\int_\Omega g\,d\mu\right)+b.

Since μ(Ω) = 1, for every real number k we have

\int_\Omega k\,d\mu=k.

In particular,

a\left(\int_\Omega g\,d\mu\right)+b=\int_\Omega(ag+b)\,d\mu\leq\int_\Omega\varphi\circ g\,d\mu,

which is precisely the desired inequality \varphi\left(\int_\Omega g\,d\mu\right)\leq\int_\Omega\varphi\circ g\,d\mu.
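The whole chain of the argument can be traced on a small explicit probability space; in the sketch below (assuming NumPy, with Ω a three-point space and the weights, g, and φ(x) = x² all chosen arbitrarily for illustration), the equality φ(x0) = ∫(ag + b) dμ and the final inequality are both checked.

import numpy as np

# Explicit finite probability space Omega = {0, 1, 2} with weights mu,
# a mu-integrable function g, and the convex function phi(x) = x**2.
phi = lambda x: x**2
dphi = lambda x: 2 * x                     # derivative of x**2

mu = np.array([0.2, 0.5, 0.3])             # mu(Omega) = 1
g = np.array([-1.0, 0.5, 2.0])             # values of g on Omega

x0 = np.dot(mu, g)                         # x0 = integral of g d mu
a = dphi(x0)
b = phi(x0) - x0 * dphi(x0)

lhs = phi(x0)                              # phi( integral g d mu ) = a*x0 + b
middle = np.dot(mu, a * g + b)             # integral (a g + b) d mu
rhs = np.dot(mu, phi(g))                   # integral phi(g) d mu

assert np.isclose(lhs, middle)             # the equality step of the proof
assert middle <= rhs + 1e-12               # the inequality step, giving Jensen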

Proof 3 (general inequality in probabilistic notation)

Let X be a random variable that takes values in a real topological vector space T. Since \varphi:T \to \mathbb{R} is convex, for any x,y \in T the quantity

\frac{\varphi(x+\theta\,y)-\varphi(x)}{\theta},

is decreasing as θ decreases to 0. In particular, the subdifferential of \varphi evaluated at x in the direction y is well defined, being given by:

(D\varphi)(x)\cdot y:=\lim_{\theta \to 0^+} \frac{\varphi(x+\theta\,y)-\varphi(x)}{\theta}=\inf_{\theta > 0} \frac{\varphi(x+\theta\,y)-\varphi(x)}{\theta}.

It is easily seen that the subdifferential is linear in y, and, since the infimum on the right-hand side of the previous formula is not greater than the value of the same expression at θ = 1, one gets:

\varphi(x)\leq \varphi(x+y)-(D\varphi)(x)\cdot y.
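As a numerical illustration (with the convex function and the vectors chosen arbitrarily), this inequality can be checked in T = R² for the smooth convex function φ(x) = log(exp(x₁) + exp(x₂)), whose subdifferential in the direction y reduces to the gradient paired with y. A sketch assuming NumPy:

import numpy as np

# Directional inequality phi(x) <= phi(x + y) - (D phi)(x).y
# for the smooth convex function phi(x) = log(sum_i exp(x_i)) on R^2.
def phi(x):
    return np.log(np.exp(x).sum())

def dphi(x, y):
    # For a differentiable convex function the subdifferential in the
    # direction y is the gradient (here the softmax of x) paired with y.
    grad = np.exp(x) / np.exp(x).sum()
    return np.dot(grad, y)

rng = np.random.default_rng(2)
for _ in range(1000):
    x = rng.normal(size=2)
    y = rng.normal(size=2)
    assert phi(x) <= phi(x + y) - dphi(x, y) + 1e-12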

In particular, for an arbitrary sub-σ-algebra \mathfrak{G} we can evaluate the last inequality when x=\mathbb{E}\{X|\mathfrak{G}\},\,y=X-\mathbb{E}\{X|\mathfrak{G}\} to obtain:

\varphi(\mathbb{E}\{X|\mathfrak{G}\})\leq \varphi(X)-(D\varphi)(\mathbb{E}\{X|\mathfrak{G}\})\cdot (X-\mathbb{E}\{X|\mathfrak{G}\}).

Now, if we take the expectation conditioned on \mathfrak{G} on both sides of the previous expression, we get the result, since:

\mathbb{E}\{\left[(D\varphi)(\mathbb{E}\{X|\mathfrak{G}\})\cdot (X-\mathbb{E}\{X|\mathfrak{G}\})\right]|\mathfrak{G}\}=(D\varphi)(\mathbb{E}\{X|\mathfrak{G}\})\cdot \mathbb{E}\{ \left( X-\mathbb{E}\{X|\mathfrak{G}\} \right) |\mathfrak{G}\}=0,

by the linearity of the subdifferential in the y variable and by well-known properties of the conditional expectation. Since the left-hand side \varphi(\mathbb{E}\{X|\mathfrak{G}\}) is \mathfrak{G}-measurable, taking the conditional expectation of the previous inequality yields \varphi(\mathbb{E}\{X|\mathfrak{G}\})\leq \mathbb{E}\{\varphi(X)|\mathfrak{G}\}, which is the general statement.
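The conditional statement can be illustrated by simulation; in the sketch below (assuming NumPy, with φ(x) = x², a two-valued variable Z generating \mathfrak{G}, and X depending on Z, all chosen arbitrarily), φ of the within-group mean is compared with the within-group mean of φ(X).

import numpy as np

# Monte Carlo illustration of phi(E{X|G}) <= E{phi(X)|G} with phi(x) = x**2,
# where G is generated by a two-valued variable Z, so conditioning on G
# amounts to averaging within each of the two groups.
phi = lambda x: x**2

rng = np.random.default_rng(3)
n = 100000
Z = rng.integers(0, 2, size=n)                 # generates the sub-sigma-algebra
X = Z.astype(float) + rng.normal(size=n)       # X depends on Z

for z in (0, 1):
    group = X[Z == z]
    cond_mean = group.mean()                   # estimate of E{X | Z = z}
    lhs = phi(cond_mean)                       # phi(E{X|G}) on the event {Z = z}
    rhs = phi(group).mean()                    # estimate of E{phi(X) | Z = z}
    print(z, lhs <= rhs, lhs, rhs)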