Covariance matrix

From Wikipedia, the free encyclopedia

In statistics and probability theory, the covariance matrix is the matrix of covariances between the elements of a random vector. It is the natural generalization to higher dimensions of the variance of a scalar-valued random variable.

1 Definition
2 Conflicting nomenclatures and notations
3 Properties
4 Which matrices are covariance matrices
5 Complex random vectors
6 Estimation
7 External link
8 See also

Definition

If $X$ is a column vector with $n$ scalar random variable components, and $\mu_k$ is the expected value of the $k$th element of $X$, i.e., $\mu_k = \mathrm{E}(X_k)$, then the covariance matrix is defined as:

$\Sigma=\mathrm{E} \left[ \left( \textbf{X} - \mathrm{E}[\textbf{X}] \right) \left( \textbf{X} - \mathrm{E}[\textbf{X}] \right)^\top \right]$

$= \begin{bmatrix} \mathrm{E}[(X_1 - \mu_1)(X_1 - \mu_1)] & \mathrm{E}[(X_1 - \mu_1)(X_2 - \mu_2)] & \cdots & \mathrm{E}[(X_1 - \mu_1)(X_n - \mu_n)] \\ \\ \mathrm{E}[(X_2 - \mu_2)(X_1 - \mu_1)] & \mathrm{E}[(X_2 - \mu_2)(X_2 - \mu_2)] & \cdots & \mathrm{E}[(X_2 - \mu_2)(X_n - \mu_n)] \\ \\ \vdots & \vdots & \ddots & \vdots \\ \\ \mathrm{E}[(X_n - \mu_n)(X_1 - \mu_1)] & \mathrm{E}[(X_n - \mu_n)(X_2 - \mu_2)] & \cdots & \mathrm{E}[(X_n - \mu_n)(X_n - \mu_n)] \end{bmatrix}$

The $(i, j)$ element is the covariance between $X_i$ and $X_j$.

The covariance matrix generalizes to higher dimensions the variance of a scalar-valued random variable $X$, defined as

$\sigma^2 = \mathrm{var}(X) = \mathrm{E}[(X-\mu)^2] \,$

where $\mu = \mathrm{E}(X)$.
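The definition can be illustrated numerically by replacing each expectation with an average over observed draws. The following is a minimal sketch, assuming NumPy is available; the data, the random seed, and the names `samples`, `mu`, `centered` and `sigma` are hypothetical and chosen only for illustration. It yields a sample estimate of the covariance matrix, not the exact population quantity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: 1000 draws of a 3-component random vector, one draw per row.
samples = rng.normal(size=(1000, 3))
samples[:, 1] += 0.5 * samples[:, 0]  # make components 0 and 1 correlated

# Replace the expectations in the definition by sample averages.
mu = samples.mean(axis=0)                      # estimate of E[X]
centered = samples - mu                        # X - E[X], row by row
sigma = centered.T @ centered / len(samples)   # average of the outer products

# Cross-check against NumPy's estimator; bias=True divides by n instead of n - 1.
matches_numpy = np.allclose(sigma, np.cov(samples, rowvar=False, bias=True))
```

The diagonal of `sigma` holds the per-component variances, and the matrix is symmetric because the $(i, j)$ and $(j, i)$ entries average the same products.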

Conflicting nomenclatures and notations

Nomenclatures differ. Some statisticians, following the probabilist William Feller, call this matrix the variance of the random vector $X$, because it is the natural generalization to higher dimensions of the 1-dimensional variance. Others call it the covariance matrix, because it is the matrix of covariances between the scalar components of the vector $X$. Thus

$\operatorname{var}(\textbf{X}) = \operatorname{cov}(\textbf{X}) = \mathrm{E} \left[ (\textbf{X} - \mathrm{E} [\textbf{X}]) (\textbf{X} - \mathrm{E} [\textbf{X}])^\top \right]$

However, the notation for the "cross-covariance" between two vectors is standard:

$\operatorname{cov}(\textbf{X},\textbf{Y}) = \mathrm{E} \left[ (\textbf{X} - \mathrm{E}[\textbf{X}]) (\textbf{Y} - \mathrm{E}[\textbf{Y}])^\top \right]$

The $\operatorname{var}$ notation is found in William Feller's two-volume book An Introduction to Probability Theory and Its Applications, but both forms are quite standard and there is no ambiguity between them.

Properties

For $\Sigma=\mathrm{E} \left[ \left( \textbf{X} - \mathrm{E}[\textbf{X}] \right) \left( \textbf{X} - \mathrm{E}[\textbf{X}] \right)^\top \right]$ and $\mu = \mathrm{E}(\textbf{X})$ the following basic properties apply:

$\Sigma = \mathrm{E}(\mathbf{X X^\top}) - \mathbf{\mu}\mathbf{\mu^\top}$
$\operatorname{var}(\mathbf{a^\top}\mathbf{X}) = \mathbf{a^\top} \operatorname{var}(\mathbf{X}) \mathbf{a}$
$\mathbf{\Sigma}$ is positive semi-definite
$\operatorname{var}(\mathbf{A X} + \mathbf{a}) = \mathbf{A} \operatorname{var}(\mathbf{X}) \mathbf{A^\top}$
$\operatorname{cov}(\mathbf{X},\mathbf{Y}) = \operatorname{cov}(\mathbf{Y},\mathbf{X})^\top$
$\operatorname{cov}(\mathbf{X_1} + \mathbf{X_2},\mathbf{Y}) = \operatorname{cov}(\mathbf{X_1},\mathbf{Y}) + \operatorname{cov}(\mathbf{X_2}, \mathbf{Y})$
If $p = q$, then $\operatorname{var}(\mathbf{X} + \mathbf{Y}) = \operatorname{var}(\mathbf{X}) + \operatorname{cov}(\mathbf{X},\mathbf{Y}) + \operatorname{cov}(\mathbf{Y}, \mathbf{X}) + \operatorname{var}(\mathbf{Y})$
$\operatorname{cov}(\mathbf{AX}, \mathbf{BX}) = \mathbf{A} \operatorname{cov}(\mathbf{X}, \mathbf{X}) \mathbf{B}^\top$
If $\mathbf{X}$ and $\mathbf{Y}$ are independent, then $\operatorname{cov}(\mathbf{X}, \mathbf{Y}) = 0$

where $\mathbf{X}$, $\mathbf{X_1}$ and $\mathbf{X_2}$ are random $(p \times 1)$ vectors, $\mathbf{Y}$ is a random $(q \times 1)$ vector, $\mathbf{a}$ is a constant $(p \times 1)$ vector (a constant $(q \times 1)$ vector in the property for $\operatorname{var}(\mathbf{A X} + \mathbf{a})$, so that the sum is defined), and $\mathbf{A}$ and $\mathbf{B}$ are constant $(q \times p)$ matrices.
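Several of these identities also hold exactly for sample moments (with divisor $n$), which makes them easy to check numerically. The sketch below assumes NumPy and uses hypothetical data; the helper `cov`, the names `shift` and `a_vec`, and the dimensions are illustrative choices, and $\mathbf{A}$ is taken to be $q \times p$ so that $\mathbf{A X}$ is defined.

```python
import numpy as np

rng = np.random.default_rng(1)
p, q, n = 3, 2, 500

# Hypothetical data: n draws of a correlated p-vector, one draw per row.
X = rng.normal(size=(n, p)) @ rng.normal(size=(p, p))
A = rng.normal(size=(q, p))      # constant q-by-p matrix
shift = rng.normal(size=q)       # constant vector added in the affine property
a_vec = rng.normal(size=p)       # constant p-vector for the scalar projection

def cov(Z):
    """Sample covariance with divisor n; rows of Z are observations."""
    Zc = Z - Z.mean(axis=0)
    return Zc.T @ Zc / len(Z)

sigma = cov(X)
mu = X.mean(axis=0)

# Sigma = E[X X^T] - mu mu^T
check_moment = np.allclose(sigma, X.T @ X / n - np.outer(mu, mu))

# var(A X + a) = A var(X) A^T  (the constant shift drops out)
check_affine = np.allclose(cov(X @ A.T + shift), A @ sigma @ A.T)

# var(a^T X) = a^T var(X) a
check_scalar = np.isclose((X @ a_vec).var(), a_vec @ sigma @ a_vec)

# Sigma is positive semi-definite: no eigenvalue below zero (up to rounding).
check_psd = np.linalg.eigvalsh(sigma).min() >= -1e-12
```

All four flags should come out true; the checks are algebraic identities of the sample moments rather than statistical approximations.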

Although simple, the covariance matrix is a useful tool in many different areas. From it a transformation matrix can be derived that completely decorrelates the data or, from a different point of view, gives an optimal basis for representing the data in a compact way (see Rayleigh quotient for a formal proof and additional properties of covariance matrices). This is called principal components analysis (PCA) in statistics and the Karhunen-Loève transform (KL-transform) in image processing.
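One common way to obtain such a decorrelating transformation is to use the eigenvectors of the covariance matrix. The following is a minimal sketch of that idea, assuming NumPy; the two-dimensional data and the names `mixing`, `scores` and `sigma_scores` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical correlated 2-D data, one observation per row.
mixing = np.array([[2.0, 0.0],
                   [1.5, 0.5]])
data = rng.normal(size=(2000, 2)) @ mixing.T

centered = data - data.mean(axis=0)
sigma = centered.T @ centered / len(data)

# eigh handles symmetric matrices: eigenvalues in ascending order,
# orthonormal eigenvectors in the columns of the second result.
eigenvalues, eigenvectors = np.linalg.eigh(sigma)

# Express each observation in the eigenvector basis.
scores = centered @ eigenvectors

# Covariance of the transformed data: V^T Sigma V, a diagonal matrix
# with the eigenvalues of Sigma on the diagonal.
sigma_scores = scores.T @ scores / len(scores)
```

The off-diagonal entries of `sigma_scores` vanish up to rounding, so the new coordinates are uncorrelated, and the eigenvalues give the variance carried by each direction. Keeping only the directions with the largest eigenvalues gives the compact representation mentioned above.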

Which matrices are covariance matrices

From the identity

$\operatorname{var}(\mathbf{a^\top}\mathbf{X}) = \mathbf{a^\top} \operatorname{var}(\mathbf{X}) \mathbf{a}\,$

and the fact that the variance of any real-valued random variable is nonnegative, it follows immediately that only a nonnegative-definite matrix can be a covariance matrix. The converse question is whether every nonnegative-definite symmetric matrix is a covariance matrix. The answer is "yes". To see this, suppose $M$ is a $p \times p$ nonnegative-definite symmetric matrix. From the finite-dimensional case of the spectral theorem, it follows that $M$ has a nonnegative-definite symmetric square root, which we denote $M^{1/2}$. Let $\mathbf{X}$ be any $p \times 1$ column vector-valued random variable whose covariance matrix is the $p \times p$ identity matrix. Then

$\operatorname{var}(M^{1/2}\mathbf{X}) = M^{1/2} (\operatorname{var}(\mathbf{X})) M^{1/2} = M.\,$
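The construction can be tried out numerically. The sketch below assumes NumPy, builds the symmetric square root from the eigendecomposition given by the spectral theorem, and uses standard normal draws as one convenient choice of a vector with identity covariance; the matrix `M` and all variable names are hypothetical examples.

```python
import numpy as np

# Hypothetical target: a symmetric positive-definite 3-by-3 matrix.
M = np.array([[4.0, 2.0, 0.0],
              [2.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])

# Spectral theorem: M = V diag(w) V^T with orthonormal V and w >= 0,
# so M^{1/2} = V diag(sqrt(w)) V^T. The clip guards against tiny negative
# eigenvalues caused by rounding.
w, V = np.linalg.eigh(M)
sqrt_M = V @ np.diag(np.sqrt(np.clip(w, 0.0, None))) @ V.T

# Draws of a vector X with identity covariance, one draw per row.
rng = np.random.default_rng(3)
X = rng.standard_normal(size=(200_000, 3))

# Each row y^T = x^T M^{1/2} is a draw of M^{1/2} X (M^{1/2} is symmetric).
Y = X @ sqrt_M

# The sample covariance of Y approximates M; it is not exactly equal,
# because it is computed from a finite sample.
sample_cov = np.cov(Y, rowvar=False)
```

Here `sqrt_M @ sqrt_M` reproduces `M` up to rounding, while `sample_cov` agrees with `M` only approximately, with an error that shrinks as the number of draws grows.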

Complex random vectors

The variance of a complex scalar-valued random variable with expected value μ is conventionally defined using complex conjugation:

$\operatorname{var}(z) = \operatorname{E} \left[ (z-\mu)(z-\mu)^{*} \right]$

where the complex conjugate of a complex number $z$ is denoted $z^{*}$.

If $Z$ is a column-vector of complex-valued random variables, then we take the conjugate transpose by both transposing and conjugating, getting a square matrix:

$\operatorname{E} \left[ (Z-\mu)(Z-\mu)^{*} \right]$

where $Z^{*}$ denotes the conjugate transpose of $Z$; this is consistent with the scalar case, since the transpose of a scalar is still a scalar.
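A sample version of this matrix shows why the conjugation matters: with it, the result is Hermitian and has real, nonnegative diagonal entries. The following is a minimal sketch assuming NumPy, with hypothetical complex data and illustrative names `Z`, `Zc` and `sigma`.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 1000

# Hypothetical data: n draws of a 2-component complex vector, one draw per row.
Z = rng.normal(size=(n, 2)) + 1j * rng.normal(size=(n, 2))
Z[:, 1] += (0.5 - 0.25j) * Z[:, 0]   # make the two components correlated

mu = Z.mean(axis=0)
Zc = Z - mu

# Sample version of E[(Z - mu)(Z - mu)^*]: entry (i, j) averages
# (Z_i - mu_i) * conj(Z_j - mu_j) over the draws.
sigma = Zc.T @ Zc.conj() / n

is_hermitian = np.allclose(sigma, sigma.conj().T)
diagonal_is_real = np.allclose(sigma.diagonal().imag, 0.0)
```

Dropping the `.conj()` would give a complex symmetric matrix whose diagonal entries are in general not real, so they could not be read as variances.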

Estimation

The derivation of the maximum-likelihood estimator of the covariance matrix of a multivariate normal distribution is perhaps surprisingly subtle. It involves the spectral theorem and the reason why it can be better to view a scalar as the trace of a 1 × 1 matrix than as a mere scalar. See estimation of covariance matrices.

External link

Covariance Matrix at MathWorld

