Rayleigh quotient


In mathematics, for a given complex Hermitian matrix A and nonzero vector x, the Rayleigh quotient R(A,x) is defined as:

R(A, x) = \frac{x^{*} A x}{x^{*} x}

For real matrices and vectors, the condition of being Hermitian reduces to that of being symmetric, and the conjugate transpose x^{*} to the usual transpose x'. Note that R(A, cx) = R(A, x) for any nonzero scalar c. Recall that a Hermitian (or real symmetric) matrix has real eigenvalues. It can be shown that the Rayleigh quotient reaches its minimum value \lambda_{\operatorname{min}} (the smallest eigenvalue of A) when x is v_{\operatorname{min}} (the corresponding eigenvector). Similarly, R(A, x) \leq \lambda_{\operatorname{max}} and R(A, v_{\operatorname{max}}) = \lambda_{\operatorname{max}}. The Rayleigh quotient is used in eigenvalue algorithms to obtain an eigenvalue approximation from an eigenvector approximation. Specifically, it is the basis for Rayleigh quotient iteration.
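To make the definition and the bounds concrete, here is a minimal NumPy sketch (an illustration, not part of the original article); the helper names rayleigh_quotient and rq_iteration, and the example matrix, are chosen purely for this example.

import numpy as np

def rayleigh_quotient(A, x):
    # R(A, x) = x* A x / (x* x) for Hermitian A and nonzero x.
    x = np.asarray(x)
    return (x.conj() @ A @ x) / (x.conj() @ x)

# Real symmetric example: the quotient lies between the extreme eigenvalues.
A = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 4.0]])
lam, V = np.linalg.eigh(A)                      # eigenvalues in ascending order
x = np.random.default_rng(0).standard_normal(3)
r = rayleigh_quotient(A, x)
assert lam[0] - 1e-12 <= r <= lam[-1] + 1e-12   # lambda_min <= R(A, x) <= lambda_max
assert np.isclose(rayleigh_quotient(A, V[:, -1]), lam[-1])  # R(A, v_max) = lambda_max

def rq_iteration(A, x0, iters=8):
    # Rayleigh quotient iteration: refine an approximate eigenpair.
    x = x0 / np.linalg.norm(x0)
    mu = rayleigh_quotient(A, x)
    for _ in range(iters):
        try:
            y = np.linalg.solve(A - mu * np.eye(len(A)), x)  # shifted solve
        except np.linalg.LinAlgError:
            break                                # shift hit an eigenvalue exactly
        x = y / np.linalg.norm(y)
        mu = rayleigh_quotient(A, x)
    return mu, x

mu, v = rq_iteration(A, x)
assert np.any(np.isclose(mu, lam))              # converged to one of the eigenvalues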

Special case of covariance matrices

A covariance matrix Σ can be represented as the product A'A. Its eigenvalues are real and non-negative:

\Sigma v_i = \lambda_i v_i
A' A v_i = \lambda_i v_i
v_i' A' A v_i = \lambda_i v_i' v_i
\left\| A v_i \right\|^2 = \lambda_i \left\| v_i \right\|^2
\lambda_i = \frac{\left\| A v_i \right\|^2}{\left\| v_i \right\|^2} \geq 0
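A quick numerical check of this step (an illustrative sketch, not from the article; the matrix A here is arbitrary):

import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((5, 3))          # any real matrix
S = A.T @ A                              # the matrix A'A from the derivation

lam, V = np.linalg.eigh(S)               # eigenpairs of the symmetric matrix
for lam_i, v_i in zip(lam, V.T):
    # lambda_i = ||A v_i||^2 / ||v_i||^2 >= 0
    ratio = np.linalg.norm(A @ v_i) ** 2 / np.linalg.norm(v_i) ** 2
    assert np.isclose(ratio, lam_i) and lam_i >= -1e-12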

The eigenvectors are orthogonal to one another:

A' A v_i = \lambda_i v_i
v_j' A' A v_i = \lambda_i v_j' v_i
(A' A v_j)' v_i = \lambda_i v_j' v_i
\lambda_j v_j' v_i = \lambda_i v_j' v_i
(\lambda_j - \lambda_i) v_j' v_i = 0
v_j' v_i = 0 (for different eigenvalues; in case of multiplicity, the basis can be orthogonalized)
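The orthogonality conclusion can be checked numerically in the same spirit (again only an illustrative sketch with an arbitrary A):

import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((6, 4))
S = A.T @ A

lam, V = np.linalg.eigh(S)               # columns of V are the eigenvectors v_i
# v_j' v_i = 0 for i != j; eigh returns an orthonormal basis even when
# eigenvalues repeat, matching the remark about multiplicity.
assert np.allclose(V.T @ V, np.eye(V.shape[1]))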

The Rayleigh quotient can be expressed as a function of the eigenvalues by decomposing any vector x in the basis of eigenvectors:

x = \sum _{i=1} ^n \alpha _i v_i
\rho = \frac{x' A' A x}{x' x}
\rho = \frac{(\sum _{j=1} ^n \alpha _j v_j)' A' A (\sum _{i=1} ^n \alpha _i v_i)}{(\sum _{j=1} ^n \alpha _j v_j)' (\sum _{i=1} ^n \alpha _i v_i)}

which, by the orthogonality of the eigenvectors, becomes:

\rho = \frac{\sum _{i=1} ^n \alpha _i ^2 \lambda _i}{\sum _{i=1} ^n \alpha _i ^2}
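This identity is easy to verify numerically (an illustrative sketch; the coefficients \alpha_i are obtained by projecting x onto the orthonormal eigenvectors):

import numpy as np

rng = np.random.default_rng(3)
A = rng.standard_normal((5, 3))
S = A.T @ A
lam, V = np.linalg.eigh(S)

x = rng.standard_normal(3)
alpha = V.T @ x                          # coordinates of x in the eigenvector basis
rho_direct = (x @ S @ x) / (x @ x)
rho_weighted = (alpha**2 @ lam) / np.sum(alpha**2)
assert np.isclose(rho_direct, rho_weighted)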

If a vector x maximizes ρ, then any scalar multiple kx (for k \ne 0) also maximizes it, so the problem can be reduced to the Lagrange problem of maximizing \sum _{i=1} ^n \alpha _i ^2 \lambda _i under the constraint that \sum _{i=1} ^n \alpha _i ^2 = 1.

Since all the eigenvalues are non-negative, writing \beta_i = \alpha_i^2 turns this into maximizing the linear function \sum _{i=1} ^n \beta_i \lambda_i over the simplex \beta_i \geq 0, \sum _{i=1} ^n \beta_i = 1; the maximum of a linear function over a simplex is attained at a vertex, namely when \alpha_1 = 1 and \forall i > 1, \alpha_i = 0 (when the eigenvalues are ordered in decreasing order), so the maximum of the Rayleigh quotient is the largest eigenvalue.

This property is the basis for principal components analysis and canonical correlation.
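As a closing illustration (a sketch using assumed synthetic data, not part of the article), the direction that maximizes the Rayleigh quotient of a sample covariance matrix is the first principal component, which can be cross-checked against the leading right singular vector of the centred data:

import numpy as np

rng = np.random.default_rng(4)
X = rng.standard_normal((200, 3)) @ np.diag([3.0, 1.0, 0.3])   # anisotropic data
Xc = X - X.mean(axis=0)                  # centre the data
S = Xc.T @ Xc / (len(Xc) - 1)            # sample covariance matrix

lam, V = np.linalg.eigh(S)
pc1 = V[:, -1]                           # maximizer of the Rayleigh quotient

# The leading right singular vector of the centred data spans the same direction.
_, _, Vt = np.linalg.svd(Xc, full_matrices=False)
assert np.isclose(abs(pc1 @ Vt[0]), 1.0)

# The Rayleigh quotient at pc1 equals the largest eigenvalue, i.e. the variance
# captured by the first principal component.
assert np.isclose((pc1 @ S @ pc1) / (pc1 @ pc1), lam[-1])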
