Proof of Stein's example


Stein's example is an important result in decision theory. The following is an outline of its proof. The reader is referred to the main article for more information.

Sketched proof

The risk function of the decision rule d(\mathbf{x}) = \mathbf{x} is

R(\theta,d) = \mathbb{E}_\theta\left[ |\boldsymbol\theta - \mathbf{x}|^2 \right]
= \int (\boldsymbol\theta - \mathbf{x})^T(\boldsymbol\theta - \mathbf{x}) \left( \frac{1}{2\pi} \right)^{n/2} e^{-(1/2) (\mathbf{x} - \boldsymbol\theta)^T (\mathbf{x} - \boldsymbol\theta)} \, m(dx)
= n.
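Equivalently, since \mathbf{x} \sim N(\boldsymbol\theta, I_n) has independent coordinates with unit variance, the risk is just the sum of the coordinate variances:

\mathbb{E}_\theta\left[ |\boldsymbol\theta - \mathbf{x}|^2 \right] = \sum_{i=1}^n \mathbb{E}_\theta\left[ (\theta_i - x_i)^2 \right] = \sum_{i=1}^n \operatorname{Var}(x_i) = n.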

Now consider the decision rule

d'(\mathbf{x}) = \mathbf{x} - \frac{\alpha}{|\mathbf{x}|^2}\mathbf{x}

where α = n − 2. We will show that d' is a better decision rule than d. The risk function is

R(\theta,d') = \mathbb{E}_\theta\left[ \left|\boldsymbol\theta - \mathbf{x} + \frac{\alpha}{|\mathbf{x}|^2}\mathbf{x}\right|^2\right]
= \mathbb{E}_\theta\left[ |\boldsymbol\theta - \mathbf{x}|^2 + 2(\boldsymbol\theta - \mathbf{x})^T\frac{\alpha}{|\mathbf{x}|^2}\mathbf{x} + \frac{\alpha^2}{|\mathbf{x}|^4}|\mathbf{x}|^2 \right]
= \mathbb{E}_\theta\left[ |\boldsymbol\theta - \mathbf{x}|^2 \right] + 2\alpha\,\mathbb{E}_\theta\left[\frac{(\boldsymbol\theta - \mathbf{x})^T \mathbf{x}}{|\mathbf{x}|^2}\right] + \alpha^2\,\mathbb{E}_\theta\left[\frac{1}{|\mathbf{x}|^2} \right]

— a quadratic in α. We may simplify the middle term by considering a sufficiently well behaved function h : \mathbf{x} \mapsto h(\mathbf{x}) \in \mathbb{R} and using integration by parts. For any such h and for each 1 \leq i \leq n:

\mathbb{E}_\theta [ (\theta_i - x_i) h(\mathbf{x}) ] = \int (\theta_i - x_i) h(\mathbf{x}) \left( \frac{1}{2\pi} \right)^{n/2} e^{-(1/2)(\mathbf{x} - \boldsymbol\theta)^T (\mathbf{x} - \boldsymbol\theta)} m(dx)
= \left[ h(\mathbf{x}) \left( \frac{1}{2\pi} \right)^{n/2} e^{-(1/2) (\mathbf{x} - \boldsymbol\theta)^T (\mathbf{x} - \boldsymbol\theta)} \right]_{x_i = -\infty}^{x_i = \infty}
- \int \frac{\partial h}{\partial x_i} \left( \frac{1}{2\pi} \right)^{n/2} e^{-(1/2)(\mathbf{x} - \boldsymbol\theta)^T (\mathbf{x} - \boldsymbol\theta)} m(dx)
= - \mathbb{E}_\theta \left[ \frac{\partial h}{\partial x_i} \right].

Here the integration by parts is in the variable x_i: the factor (\theta_i - x_i) times the Gaussian density is exactly the partial derivative of that density with respect to x_i, and the boundary term vanishes for sufficiently well behaved h because the Gaussian factor decays rapidly.

(This result is known as Stein's lemma.)
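As a quick sanity check, in one dimension (n = 1) with h(x) = x the lemma reads

\mathbb{E}_\theta[(\theta - x)x] = \theta\,\mathbb{E}_\theta[x] - \mathbb{E}_\theta[x^2] = \theta^2 - (\theta^2 + 1) = -1 = -\mathbb{E}_\theta\left[\frac{\partial h}{\partial x}\right],

since \mathbb{E}_\theta[x^2] = \operatorname{Var}(x) + \theta^2 = 1 + \theta^2.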

Thus, if we set


h(\mathbf{x}) = \frac{x_i}{|\mathbf{x}|^2}

then assuming h meets the "well behaved" condition (see end of proof), we have

\frac{\partial h}{\partial x_i} = \frac{1}{|\mathbf{x}|^2} - \frac{2 x_i^2}{|\mathbf{x}|^4}
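This follows from the quotient rule, using \partial |\mathbf{x}|^2 / \partial x_i = 2 x_i:

\frac{\partial}{\partial x_i}\left(\frac{x_i}{|\mathbf{x}|^2}\right) = \frac{|\mathbf{x}|^2 \cdot 1 - x_i \cdot 2 x_i}{|\mathbf{x}|^4} = \frac{1}{|\mathbf{x}|^2} - \frac{2 x_i^2}{|\mathbf{x}|^4},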

and so


\mathbb{E}_\theta\left[\frac{(\boldsymbol\theta - \mathbf{x})^T \mathbf{x}}{|\mathbf{x}|^2}\right] = \sum_{i=1}^n \mathbb{E}_\theta \left[ (\theta_i - x_i) \frac{x_i}{|\mathbf{x}|^2} \right]
= - \sum_{i=1}^n \mathbb{E}_\theta \left[ \frac{1}{|\mathbf{x}|^2} - \frac{2 x_i^2}{|\mathbf{x}|^4} \right]
= -(n-2)\,\mathbb{E}_\theta \left[\frac{1}{|\mathbf{x}|^2}\right].
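The last equality uses \sum_{i=1}^n x_i^2 = |\mathbf{x}|^2, so that

\sum_{i=1}^n \mathbb{E}_\theta \left[ \frac{1}{|\mathbf{x}|^2} - \frac{2 x_i^2}{|\mathbf{x}|^4} \right] = n\,\mathbb{E}_\theta\left[\frac{1}{|\mathbf{x}|^2}\right] - 2\,\mathbb{E}_\theta\left[\frac{|\mathbf{x}|^2}{|\mathbf{x}|^4}\right] = (n-2)\,\mathbb{E}_\theta\left[\frac{1}{|\mathbf{x}|^2}\right].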

Then, returning to the risk function of d':

R(\theta,d') = n - 2\alpha(n-2)\,\mathbb{E}_\theta\left[\frac{1}{|\mathbf{x}|^2}\right] + \alpha^2\,\mathbb{E}_\theta\left[\frac{1}{|\mathbf{x}|^2} \right].

This quadratic in α is minimized at

\alpha = n-2,\,

giving

R(\theta,d') = R(\theta,d) - (n-2)^2\mathbb{E}_\theta\left[\frac{1}{|\mathbf{x}|^2} \right]

which, for n \geq 3 (so that \mathbb{E}_\theta[1/|\mathbf{x}|^2] is finite and strictly positive), satisfies

R(\theta,d') < R(\theta,d)

for every \theta, making d an inadmissible decision rule.
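This dominance can also be checked numerically. The following is a minimal Monte Carlo sketch (not part of the proof), assuming Python with NumPy; the dimension, mean vector and number of replications are arbitrary illustrative choices.

import numpy as np

rng = np.random.default_rng(0)
n, trials = 10, 200000              # dimension (must be at least 3) and number of replications
theta = rng.normal(size=n)          # an arbitrary fixed mean vector

x = rng.normal(loc=theta, size=(trials, n))          # each row is one draw x ~ N(theta, I_n)
norm2 = np.sum(x**2, axis=1, keepdims=True)          # |x|^2 for each draw
d_prime = (1 - (n - 2) / norm2) * x                  # the shrunken estimate d'(x)

risk_d = np.mean(np.sum((x - theta)**2, axis=1))               # empirical risk of d(x) = x, approximately n
risk_d_prime = np.mean(np.sum((d_prime - theta)**2, axis=1))   # empirical risk of d'(x)
bound = n - (n - 2)**2 * np.mean(1.0 / norm2)                  # estimate of n - (n-2)^2 E[1/|x|^2]
print(risk_d, risk_d_prime, bound)

With these settings the empirical risk of d' should come out close to the estimated bound in the last line and clearly below n, in line with the inequality above.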

It remains to justify the use of

h(\mathbf{x}) = \frac{x_i}{|\mathbf{x}|^2}.

This function is not in fact very "well behaved", since it is singular at \mathbf{x} = 0. However, the function


h(\mathbf{x}) = \frac{x_i}{\epsilon + |\mathbf{x}|^2}

is "well behaved", and after following the algebra through and letting \epsilon \to 0 one obtains the same result.