Delta method

The delta method is a technique for deriving an approximate probability distribution for a function of a statistical estimator from knowledge of the limiting distribution of that estimator. In many cases the limiting distribution of the initial estimator is a normal distribution with mean zero, so it suffices to obtain the variance of the function of this estimator. If B is an estimator for β, then the variance of a function h(B) is

\operatorname{Var}\left(h(B)\right) \approx \nabla h(\beta)^T \cdot \operatorname{Var}(B) \cdot \nabla h(\beta)
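For instance, in the univariate case with h(x) = e^x (an added illustration, not part of the original article), the gradient reduces to h′(β) = e^β and the formula gives

\operatorname{Var}\left(e^{B}\right) \approx e^{\beta} \cdot \operatorname{Var}(B) \cdot e^{\beta} = e^{2\beta} \cdot \operatorname{Var}(B)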

Derivation

We start from an estimator B that is asymptotically normal, that is, one whose centered and scaled version converges in distribution to a normal random variable:

\sqrt{n}\left(B-\beta\right) \,\xrightarrow{D}\, N\left(0, \operatorname{Var}(B) \right)

where n is the number of observations and \xrightarrow{D} denotes convergence in distribution. Suppose we want to approximate the variance of a function h of the estimator B. Keeping only the first two terms of the Taylor series of h about β, and using vector notation for the gradient, we can approximate h(B) as

h(B) \approx h(\beta) + \nabla h(\beta)^T \cdot (B-\beta)

which implies the variance of h(B) is approximately

\begin{align} \operatorname{Var}\left(h(B)\right) & \approx \operatorname{Var}\left(h(\beta) + \nabla h(\beta)^T \cdot (B-\beta)\right) \\   & = \operatorname{Var}\left(h(\beta) + \nabla h(\beta)^T \cdot B - \nabla h(\beta)^T \cdot \beta\right) \\   & = \operatorname{Var}\left(\nabla h(\beta)^T \cdot B\right) \\   & = \nabla h(\beta)^T \cdot \operatorname{Var}\left(B\right) \cdot \nabla h(\beta) \end{align}

where the last two lines follow by recalling, for a constant vector a, a constant scalar c, and a random vector X, the identity

\operatorname{Var}\left(a^T X + c\right) = a^T \cdot \operatorname{Var}(X) \cdot a,

and noting that β, and hence h(β) and ∇h(β)^T · β, are constants.
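As a concrete one-dimensional illustration (added here for exposition, not part of the original article), take h(x) = log x, so that ∇h(β) = 1/β. The Taylor expansion above reads log B ≈ log β + (B − β)/β, and the identity then gives

\operatorname{Var}\left(\log B\right) \approx \frac{1}{\beta} \cdot \operatorname{Var}(B) \cdot \frac{1}{\beta} = \frac{\operatorname{Var}(B)}{\beta^2}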

The delta method therefore implies that

\sqrt{n}\left(h(B)-h(\beta)\right) \,\xrightarrow{D}\, N\left(0, \nabla h(\beta)^T \cdot \operatorname{Var}(B) \cdot \nabla h(\beta) \right)

or in univariate terms,

\sqrt{n}\left(h(B)-h(\beta)\right) \,\xrightarrow{D}\, N\left(0, \operatorname{Var}(B) \cdot \left(h^\prime(\beta)\right)^2 \right).
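The univariate statement is easy to check by simulation. The following Python sketch (a minimal illustration added here, not part of the article; the distribution, sample sizes, and variable names are chosen only for the example) draws many replications of the sample mean B of exponential data and compares the empirical variance of h(B) = log B with the delta-method prediction (h′(β))² · Var(B):

import numpy as np

rng = np.random.default_rng(0)
n = 500            # observations per replication
reps = 20000       # Monte Carlo replications
beta = 2.0         # true mean of each observation
sigma2 = beta**2   # variance of an Exponential observation with mean beta

# Draw reps independent samples of size n and form the sample mean B of each.
samples = rng.exponential(scale=beta, size=(reps, n))
B = samples.mean(axis=1)

# Compare the observed variance of h(B) = log(B) with the delta-method
# prediction (h'(beta))^2 * Var(B) = (1/beta^2) * (sigma2 / n).
empirical = np.var(np.log(B))
predicted = sigma2 / (n * beta**2)
print(f"empirical: {empirical:.6f}   delta method: {predicted:.6f}")

With these settings both numbers should come out near 1/n = 0.002, and the agreement improves as n grows, reflecting the asymptotic nature of the approximation.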

Note

The delta method yields results nearly identical to the formulae presented in Klein (1953, p. 258):

\operatorname{Var} \left( h_r \right) = \sum_i    \left( \frac{ \partial h_r }{ \partial B_i } \right)^2   \operatorname{Var}\left( B_i \right) + \sum_i \sum_{j \neq i}    \left( \frac{ \partial h_r }{ \partial B_i } \right)   \left( \frac{ \partial h_r }{ \partial B_j } \right)   \operatorname{Cov}\left( B_i, B_j \right)
\operatorname{Cov}\left( h_r, h_s \right) = \sum_i    \left( \frac{ \partial h_r }{ \partial B_i } \right)   \left( \frac{ \partial h_s }{ \partial B_i } \right)   \operatorname{Var}\left( B_i \right) + \sum_i \sum_{j \neq i}    \left( \frac{ \partial h_r }{ \partial B_i } \right)   \left( \frac{ \partial h_s }{ \partial B_j } \right)   \operatorname{Cov}\left( B_i, B_j \right)

where h_r is the rth element of h(B) and B_i is the ith element of B. The only difference is that Klein stated these as identities, whereas they are actually approximations.
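In matrix notation (an equivalent restatement added here for compactness), both of Klein's formulas are entries of the single approximation

\operatorname{Cov}\left(h(B)\right) \approx J \cdot \operatorname{Var}(B) \cdot J^T

where J is the Jacobian matrix with entries J_{ri} = \partial h_r / \partial B_i and Var(B) is the covariance matrix of B; the variance formula is the rth diagonal entry, and the covariance formula is the (r, s) off-diagonal entry.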

References

Klein, L. R. (1953). A Textbook of Econometrics. p. 258.