Karhunen-Loève theorem

From Wikipedia, the free encyclopedia

In the theory of stochastic processes, the Karhunen-Loève theorem (named after Kari Karhunen and Michel Loève) is a representation of a stochastic process as an infinite linear combination of orthogonal functions, analogous to a Fourier series representation of a function on a bounded interval. In contrast to a Fourier series where the coefficients are real numbers and the expansion basis consists of sinusoidal functions (that is, sine and cosine functions), the coefficients in the Karhunen-Loève theorem are random variables and the expansion basis depends on the process. In fact, the orthogonal basis functions used in this representation are determined by the covariance function of the process. If we regard a stochastic process as a random function F, that is, one in which the random value is a function on an interval [a, b], then this theorem can be considered as a random orthonormal expansion of F.

A centered stochastic process {Xt}t ∈ [a, b] (where centered means that the expectations E(Xt) are defined and equal to 0 for all values of the parameter t in [a, b]) that satisfies a technical continuity condition admits a decomposition

 \mathbf{X}_t = \sum_{k=1}^\infty \mathbf{Z}_k e_k(t),

where the Zk are pairwise uncorrelated random variables and the functions ek are continuous real-valued functions on [a, b] that are pairwise orthogonal in L2[a, b]. The general case of a process that is not centered can be represented by expanding the expectation function (which is a non-random function) in the basis ek.

Moreover, if the process is Gaussian, then the random variables Zk are Gaussian and stochastically independent. This result generalizes the Karhunen-Loève transform. An important example of a centered real stochastic process on [0,1] is the Wiener process; the Karhunen-Loève theorem can be used to provide a canonical orthogonal representation for it. In this case the expansion consists of sinusoidal functions.

The above expansion into uncorrelated random variables is also known as the Karhunen-Loève expansion or Karhunen-Loève decomposition. The empirical version (i.e., with the coefficients computed from a sample) is known as principal component analysis (PCA), proper orthogonal decomposition (POD), or the Hotelling transform.
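As a rough illustration of the empirical version, the sketch below estimates the expansion from sampled paths by diagonalizing the sample covariance matrix. The function name `empirical_kl`, the random-walk test data, and the grid size are assumptions made for this example only.

```python
import numpy as np

def empirical_kl(samples):
    """Empirical Karhunen-Loeve (PCA) of paths given as rows of `samples`.

    samples has shape (n_paths, n_times). Returns the eigenvalues in
    decreasing order, the matching eigenvectors (as columns), and the
    coefficients of every path in that basis.
    """
    centered = samples - samples.mean(axis=0)            # remove the sample mean
    cov = centered.T @ centered / (samples.shape[0] - 1)  # sample covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)               # symmetric eigenproblem
    order = np.argsort(eigvals)[::-1]                    # largest variance first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = centered @ eigvecs                          # empirical coefficients Z_k
    return eigvals, eigvecs, scores

# Hypothetical test data: 500 random-walk paths observed at 50 time steps.
rng = np.random.default_rng(0)
paths = np.cumsum(rng.standard_normal((500, 50)), axis=1)
eigvals, eigvecs, scores = empirical_kl(paths)

# The empirical coefficients are uncorrelated: their covariance is diagonal.
score_cov = np.cov(scores, rowvar=False)
```

Here `score_cov` is diagonal up to rounding, with the eigenvalues on its diagonal, which is the sample analogue of the coefficients Zk being pairwise uncorrelated.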

Formulation

We will formulate the result in terms of complex-valued stochastic processes. The results apply to real-valued processes without modification by recognizing that the complex conjugate of a real number is the number itself.

If X and Y are random variables, the inner product is defined by

 \langle \mathbf{X}|\mathbf{Y} \rangle = \operatorname{E}(\mathbf{X^*}\mathbf{Y})

where * represents complex conjugation.

Second order statistics

The inner product is defined if both X and Y have finite second moments, or equivalently, if they are both square integrable. Note that the inner product is related to covariance and correlation. In particular, for random variables of mean zero, covariance and inner product coincide. The autocovariance function KXX is

 K_\mathrm{XX}(t,s) = \operatorname{Cov}[ X(t), X(s) ] = \langle \mathbf{X}_t - \mu_X(t) | \mathbf{X}_s - \mu_X(s) \rangle
 = \operatorname{E} \{ [ X(t)-\mu_X(t) ]^* [ X(s)-\mu_X(s) ] \}
 = \operatorname{E} \{ X^*(t) X(s) \} - \mu^*_X(t) \mu_X(s)
 = R_\mathrm{XX}(t,s) - \mu^*_X(t) \mu_X(s),

where μX(t) = E(X(t)) is the mean function and RXX(t,s) = E{X*(t) X(s)} is the autocorrelation function.

If {Xt}t is a centered process, then

\mu_X(t) = 0  \,

for all t. Thus, the autocovariance KXX is identical to the autocorrelation RXX:

 K_\mathrm{XX}(t,s) = R_\mathrm{XX}(t,s) . \,

Note that if {Xt}t is centered and t1 ≤ t2 ≤ ... ≤ tN are points in [a, b], then

 \sum_{k,\ell} \operatorname{Cov}_{\mathbf{X}}(t_k,t_\ell) = \operatorname{Var}\left(\sum_{k=1}^N \mathbf{X}_{t_k}\right) \geq 0.
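The inequality above can be checked numerically on a small example. The sketch below assumes the kernel min(s, t) (the Wiener covariance treated later in this article) and an arbitrary grid of ten points; both are choices made only for illustration.

```python
import numpy as np

# Hypothetical grid t_1 <= ... <= t_N in (0, 1].
t = np.linspace(0.1, 1.0, 10)

# Covariance matrix with entries Cov(t_k, t_l) = min(t_k, t_l).
K = np.minimum.outer(t, t)

# Sum over all pairs (k, l), i.e. the variance of X_{t_1} + ... + X_{t_N}.
ones = np.ones_like(t)
total = ones @ K @ ones

# The same argument with arbitrary weights shows K is positive semidefinite,
# so its smallest eigenvalue should not be negative (up to rounding).
min_eig = np.linalg.eigvalsh(K).min()
```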

Statement of the theorem

Theorem. Consider a centered stochastic process {Xt}t indexed by t in the interval [a, b] with covariance function CovX. Suppose the covariance function CovX(t,s) is jointly continuous in t, s. Then CovX can be regarded as a positive definite kernel and so by Mercer's theorem, the corresponding integral operator T on L2[a,b] (relative to Lebesgue measure on [a,b]) has an orthonormal basis of eigenvectors. Let {ei}i be the eigenvectors of T corresponding to non-zero eigenvalues and

 \mathbf{Z}_i = \int_a^b \mathbf{X}_t e_i(t) dt.

Then Zi are centered orthogonal random variables and

 \mathbf{X}_t = \sum_{i=1}^\infty e_i(t) \mathbf{Z}_i

where the convergence is in the mean and is uniform in t. Moreover

 \operatorname{Var}(\mathbf{Z}_i) = \operatorname{E}(|\mathbf{Z}_i|^2) = \lambda_i,

where λi is the eigenvalue corresponding to the eigenvector ei.
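In practice the eigenvalues and eigenfunctions of T are rarely available in closed form. One common approach, sketched here under the assumption of a midpoint quadrature rule (a Nyström-type discretization, not something prescribed by the theorem), replaces the integral operator by a matrix. The helper name `kl_eigenpairs` and the grid size are hypothetical.

```python
import numpy as np

def kl_eigenpairs(kernel, a, b, n):
    """Approximate eigenpairs of the covariance operator on [a, b].

    kernel(s, t) must accept broadcastable arrays. Uses n midpoint nodes,
    so the operator is approximated by the matrix kernel(t_i, t_j) * h.
    """
    h = (b - a) / n
    t = a + (np.arange(n) + 0.5) * h           # midpoint nodes
    K = kernel(t[:, None], t[None, :])         # kernel matrix on the grid
    eigvals, eigvecs = np.linalg.eigh(K * h)   # discretized operator
    order = np.argsort(eigvals)[::-1]          # largest eigenvalue first
    # Rescale so that sum_j e_i(t_j)^2 * h = 1 (unit norm in L2[a, b]).
    return t, eigvals[order], eigvecs[:, order] / np.sqrt(h)

# Test case: the kernel min(s, t) on [0, 1].
t, lam, e = kl_eigenpairs(np.minimum, 0.0, 1.0, 200)
```

For this kernel the leading value `lam[0]` should be close to 4/π², and `e[:, 0]` should agree with √2 sin(πt/2) up to an overall sign.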

Cauchy sums

In the statement of the theorem, the integral defining Zi can be defined as the limit in the mean of Cauchy sums of random variables:

 \sum_{k=0}^{\ell-1} \mathbf{X}_{\xi_k} e_i(\xi_k) (t_{k+1} - t_k),

where

 a = t_0 \leq \xi_0 \leq t_1 \leq \cdots \leq \xi_{\ell-1} \leq t_\ell = b.
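For a single fixed sample path, a Cauchy sum is an ordinary Riemann sum. The sketch below uses a deterministic stand-in for a path, built from two of the sinusoidal basis functions that appear for the Wiener process, with made-up coefficients 0.7 and -1.3. The function name `cauchy_sum` and all numerical values are assumptions for illustration.

```python
import numpy as np

def cauchy_sum(x, e, t, xi):
    """Return sum_k x(xi_k) * e(xi_k) * (t_{k+1} - t_k).

    t is the partition a = t_0 <= ... <= t_l = b and xi holds one
    tag xi_k in each subinterval [t_k, t_{k+1}].
    """
    return np.sum(x(xi) * e(xi) * np.diff(t))

# Two orthonormal functions on [0, 1].
e1 = lambda s: np.sqrt(2.0) * np.sin(0.5 * np.pi * s)
e2 = lambda s: np.sqrt(2.0) * np.sin(1.5 * np.pi * s)

# Stand-in path with known (hypothetical) coefficients 0.7 and -1.3.
path = lambda s: 0.7 * e1(s) - 1.3 * e2(s)

t = np.linspace(0.0, 1.0, 2001)   # partition of [0, 1]
xi = 0.5 * (t[:-1] + t[1:])       # midpoints chosen as tags
z1_hat = cauchy_sum(path, e1, t, xi)
z2_hat = cauchy_sum(path, e2, t, xi)
```

Because e1 and e2 are orthonormal, the two sums recover the coefficients 0.7 and -1.3 of the stand-in path, up to discretization error.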


Special case: Gaussian distribution

Since the limit in the mean of jointly Gaussian random variables is jointly Gaussian, and centered jointly Gaussian random variables are independent if and only if they are orthogonal, we can also conclude:

Theorem. The variables Zi have a joint Gaussian distribution and are stochastically independent if the original process {Xt}t is Gaussian.

In the Gaussian case, since the variables Zi are independent, we can say more:

 \lim_{N \rightarrow \infty} \sum_{i=1}^N e_i(t) \mathbf{Z}_i(\omega) = \mathbf{X}_t(\omega)

almost surely.

Note that by generalizations of Mercer's theorem we can replace the interval [a, b] with other compact spaces C and Lebesgue measure on [a, b] with a Borel measure whose support is C.

The Wiener process

There are numerous equivalent characterizations of the Wiener process, which is a mathematical formalization of Brownian motion. Here we regard it as the centered standard Gaussian process B(t) with covariance function

 \mathrm{K}_\mathrm{BB}(t,s)  = \operatorname{Cov}(B(t),B(s)) =  \min (s,t).

The eigenvectors of the covariance kernel are easily determined. These are

 e_k(t) = \sqrt{2} \sin \left( \left(k - \frac{1}{2}\right) \pi t \right)

and the corresponding eigenvalues are

 \lambda_k = \frac{4}{(2 k -1)^2 \pi^2}.
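The eigenpairs stated above can be verified directly on [0, 1]. As a sketch, write ω_k = (k − 1/2)π (a shorthand introduced only for this check) and split the integral at s = t:

 \int_0^1 \min(s,t) \sin(\omega_k s) \, ds = \int_0^t s \sin(\omega_k s) \, ds + t \int_t^1 \sin(\omega_k s) \, ds
 = \left[ -\frac{t \cos(\omega_k t)}{\omega_k} + \frac{\sin(\omega_k t)}{\omega_k^2} \right] + \frac{t \left( \cos(\omega_k t) - \cos \omega_k \right)}{\omega_k}
 = \frac{\sin(\omega_k t)}{\omega_k^2},

since cos ω_k = 0 for every positive integer k. Hence each function sin(ω_k t) is an eigenfunction with eigenvalue 1/ω_k² = 4/((2k − 1)²π²), and the factor √2 normalizes it in L2[0, 1].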

This gives the following representation of the Wiener process:

Theorem. There is a sequence {Wk}k of independent Gaussian random variables with mean zero and variance 1 such that

 \mathbf{B}_t = \sqrt{2} \sum_{k=1}^\infty \mathbf{W}_k \frac{\sin \left( \left(k - \frac{1}{2}\right) \pi t \right)}{ \left(k - \frac{1}{2}\right) \pi}.

Convergence is uniform in t and in the L2 norm, that is

 \operatorname{E}\left[ \left(\mathbf{B}_t - \sqrt{2} \sum_{k=1}^n \mathbf{W}_k \frac{\sin \left( \left(k - \frac{1}{2}\right) \pi t \right)}{ \left(k - \frac{1}{2}\right) \pi} \right)^2 \right] \rightarrow 0

uniformly in t.
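The truncated series gives a simple way to generate approximate Wiener paths. The sketch below is one possible implementation, assuming a truncation at 200 terms and a grid of 101 time points. The function names `wiener_kl_paths` and `truncated_variance` are hypothetical. Since the partial sum is uncorrelated with the remainder, the mean-square error at time t equals t minus the variance of the partial sum, which the second function evaluates exactly.

```python
import numpy as np

def wiener_kl_paths(t, n_terms, n_paths, rng):
    """Sample approximate Wiener paths from the truncated expansion."""
    omega = (np.arange(1, n_terms + 1) - 0.5) * np.pi
    # basis[j, k] = sqrt(2) * sin(omega_k * t_j) / omega_k
    basis = np.sqrt(2.0) * np.sin(np.outer(t, omega)) / omega
    W = rng.standard_normal((n_paths, n_terms))   # independent N(0, 1) coefficients
    return W @ basis.T                            # shape (n_paths, len(t))

def truncated_variance(t, n_terms):
    """Variance of the n_terms partial sum at each time in t."""
    omega = (np.arange(1, n_terms + 1) - 0.5) * np.pi
    return np.sum(2.0 * np.sin(np.outer(t, omega)) ** 2 / omega ** 2, axis=1)

t = np.linspace(0.0, 1.0, 101)
rng = np.random.default_rng(0)
paths = wiener_kl_paths(t, 200, 1000, rng)

# Mean-square truncation error E(B_t - S_n)^2 = t - Var(S_n), maximized over the grid.
max_gap = np.max(np.abs(t - truncated_variance(t, 200)))
```

The value `max_gap` is bounded by 2/(π² n) for n terms, so it shrinks uniformly in t as more terms are kept, consistent with the uniform convergence stated above.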

References

  • I. Guikhman, A. Skorokhod, Introduction à la Théorie des Processus Aléatoires, Éditions MIR, 1977
  • B. Simon, Functional Integration and Quantum Physics, Academic Press, 1979
  • K. Karhunen, Über lineare Methoden in der Wahrscheinlichkeitsrechnung, Ann. Acad. Sci. Fennicae. Ser. A. I. Math.-Phys., 1947, No. 37, 1–79
  • M. Loève, Probability Theory, Vol. II, 4th ed., Graduate Texts in Mathematics, Vol. 46, Springer-Verlag, 1978, ISBN 0-387-90262-7
