Cayley–Hamilton theorem
From Wikipedia, the free encyclopedia
In linear algebra, the Cayley–Hamilton theorem (named after the mathematicians Arthur Cayley and William Rowan Hamilton) states that every square matrix over the real or complex field satisfies its own characteristic equation. This means the following: if A is the given square n×n matrix and Iₙ is the n×n identity matrix, then the characteristic polynomial of A is defined as
- p(t) = det(t Iₙ − A),
where "det" is the determinant function. The Cayley–Hamilton theorem states that substituting the matrix A into the characteristic polynomial results in the zero matrix:
- p(A) = 0.
The Cayley–Hamilton theorem holds for square matrices over commutative rings as well.
An important corollary of the Cayley–Hamilton theorem is that the minimal polynomial of a given matrix is a divisor of its characteristic polynomial. This is very useful in finding the Jordan form of a matrix.
Example
Consider for example the matrix with first row (1, 2) and second row (3, 4):
- A = ( 1 2 ; 3 4 ).
The characteristic polynomial is given by
- p(t) = det(t I₂ − A) = (t − 1)(t − 4) − 2·3 = t² − 5t − 2.
The Cayley–Hamilton theorem then claims that
- A² − 5A − 2I₂ = 0
which one can quickly verify in this case.
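This verification can also be carried out mechanically. The sketch below assumes the example matrix A = [[1, 2], [3, 4]] (a matrix with trace 5 and determinant −2, hence characteristic polynomial t² − 5t − 2) and checks that p(A) is the zero matrix using plain integer arithmetic:

```python
# Check A^2 - 5A - 2*I = 0 for the assumed example A = [[1, 2], [3, 4]],
# whose characteristic polynomial is t^2 - 5t - 2.

def matmul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = [[1, 2], [3, 4]]
I2 = [[1, 0], [0, 1]]
A2 = matmul(A, A)

# p(A) = A^2 - 5A - 2*I2, computed entry by entry.
pA = [[A2[i][j] - 5 * A[i][j] - 2 * I2[i][j] for j in range(2)]
      for i in range(2)]
print(pA)  # -> [[0, 0], [0, 0]]
```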
As a result of this, the Cayley–Hamilton theorem allows us to calculate powers of matrices more simply than by direct multiplication.
Taking the result above
- A² − 5A − 2I₂ = 0
- A² = 5A + 2I₂.
Then, for example, to calculate A4, observe
- A³ = (5A + 2I₂)A = 5A² + 2A = 5(5A + 2I₂) + 2A = 27A + 10I₂
- A⁴ = A³A = (27A + 10I₂)A = 27A² + 10A = 27(5A + 2I₂) + 10A
- A⁴ = 145A + 54I₂.
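The reduction above can be turned into a small program: since A² = 5A + 2I₂, every power of A has the form x·A + y·I₂ for scalars x, y satisfying a two-term recurrence, so no full matrix product is ever needed. A sketch, again assuming the example matrix A = [[1, 2], [3, 4]]:

```python
# Powers of A via Cayley-Hamilton: if A^k = x*A + y*I, then
# A^(k+1) = x*A^2 + y*A = x*(5A + 2I) + y*A = (5x + y)*A + (2x)*I.

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = [[1, 2], [3, 4]]

x, y = 1, 0                    # A^1 = 1*A + 0*I
for _ in range(3):             # step from A^1 up to A^4
    x, y = 5 * x + y, 2 * x
print(x, y)                    # -> 145 54, i.e. A^4 = 145A + 54I

# Cross-check against direct multiplication.
A4_direct = matmul(matmul(matmul(A, A), A), A)
A4_ch = [[x * A[i][j] + (y if i == j else 0) for j in range(2)]
         for i in range(2)]
print(A4_direct == A4_ch)      # -> True
```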
The theorem is also an important tool in calculating eigenvectors.
Note that for a general n×n matrix A, when its inverse exists, i.e. when det(A) ≠ 0, A⁻¹ can be written as a finite sum of powers of A: the characteristic polynomial has the form
- p(t) = tⁿ + cₙ₋₁tⁿ⁻¹ + … + c₁t + c₀,
where c₀ = det(−A) = (−1)ⁿ det(A) ≠ 0, and multiplying both sides of p(A) = 0 by A⁻¹ gives
- A⁻¹ = −(1/c₀)(Aⁿ⁻¹ + cₙ₋₁Aⁿ⁻² + … + c₁Iₙ).
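For the 2×2 example this gives an explicit inverse: p(t) = t² − 5t − 2 has c₁ = −5 and c₀ = −2, so p(A) = 0 rearranges to A(A − 5I₂) = 2I₂ and hence A⁻¹ = (A − 5I₂)/2. A sketch using exact rational arithmetic, again assuming the example matrix A = [[1, 2], [3, 4]]:

```python
from fractions import Fraction

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = [[1, 2], [3, 4]]

# p(t) = t^2 - 5t - 2, so p(A) = 0 gives A(A - 5I) = 2I and hence
# A^-1 = (A - 5I) / 2 -- a polynomial in A, no Gaussian elimination needed.
A_inv = [[Fraction(A[i][j] - (5 if i == j else 0), 2) for j in range(2)]
         for i in range(2)]

print(matmul(A, A_inv))  # -> the 2x2 identity matrix
```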
Proving the Cayley–Hamilton theorem
The Cayley–Hamilton theorem may be viewed as a corollary of Cramer's rule from linear algebra. Cramer's rule states that if S is an n×n matrix, and Adj(S) denotes the adjugate matrix, which is the transpose of the matrix of cofactors of S, then S Adj(S) = det(S)Iₙ, where Iₙ is the n×n identity matrix.
There is a famous incorrect one-line proof of the Cayley–Hamilton theorem from this identity: take S = t In − A and substitute t = A into the identity. This argument is fallacious because, in the expressions Adj(t In − A) and det(t In − A), t is supposed to be a scalar, so it does not make sense to substitute an n×n matrix for it.
There are several ways to repair this proof, using the observations that the matrix identity
- (t Iₙ − A) Adj(t Iₙ − A) = det(t Iₙ − A) Iₙ = p(t)Iₙ (*)
holds for all scalars t and that both sides of the equality are matrices of polynomials (over the real or complex numbers) of degree n in t. Hence (since the real and complex numbers are infinite fields) the equation (*) is a polynomial identity, which may be viewed as an identity between polynomials (of degree n) in t with matrices as coefficients.
Claim: it is now legitimate to substitute t = A into this identity, and the Cayley–Hamilton theorem follows.
- One way to justify this claim is to write out p(t) and Adj(t Iₙ − A) as polynomials in t (the latter having matrix coefficients and degree n − 1), and equate coefficients of tʲ in (*). Multiplying the coefficient of tʲ by Aʲ (on the left) and summing over j then shows that p(A) = 0 as required.
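For a 2×2 matrix this coefficient matching can be checked directly: since Adj(S) = tr(S)I₂ − S for 2×2 matrices, Adj(t I₂ − A) = B₁t + B₀ with B₁ = I₂ and B₀ = A − tr(A)I₂. The sketch below assumes the example matrix A = [[1, 2], [3, 4]] and verifies both the coefficient identities and the telescoping sum:

```python
# Coefficient matching in (t*I - A) Adj(t*I - A) = p(t)*I for a 2x2 matrix.
# Writing Adj(t*I - A) = B1*t + B0, expanding the left side gives
#   B1*t^2 + (B0 - A@B1)*t + (-A@B0),
# so matching against p(t)*I = t^2*I + c1*t*I + c0*I forces
#   B1 = I,   B0 - A@B1 = c1*I,   -A@B0 = c0*I.
# Multiplying the t^j relation by A^j and summing telescopes to p(A) = 0.

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def scale(c, X):
    return [[c * x for x in row] for row in X]

A = [[1, 2], [3, 4]]              # assumed example; trace 5, determinant -2
I = [[1, 0], [0, 1]]
c1, c0 = -5, -2                   # p(t) = t^2 + c1*t + c0

B1 = I                            # Adj(t*I - A) = B1*t + B0, with
B0 = add(A, scale(-5, I))         # B0 = A - tr(A)*I  (2x2 adjugate formula)

# Coefficient identities, matched term by term:
print(add(B0, scale(-1, matmul(A, B1))) == scale(c1, I))   # -> True
print(scale(-1, matmul(A, B0)) == scale(c0, I))            # -> True

# Telescoping sum: A^2*B1 + A*(B0 - A*B1) + (-A*B0) equals p(A).
A2 = matmul(A, A)
pA = add(add(matmul(A2, B1),
             matmul(A, add(B0, scale(-1, matmul(A, B1))))),
         scale(-1, matmul(A, B0)))
print(pA)  # -> [[0, 0], [0, 0]]
```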
There is, however, a more abstract approach, which ultimately proves to be more fruitful.
Let k[t] be the polynomial ring over k, where k denotes the real or complex numbers, and let kⁿ[t] be the space of length-n column vectors whose entries are elements of k[t]. This is a k[t]-module: an element of kⁿ[t] can be multiplied by a polynomial in k[t] to yield another element of kⁿ[t].
The matrix A provides a homomorphism from k[t] to the ring of n×n matrices by evaluating at t = A. (For example, if f(t) = 3t² + t + 1 is in k[t], then the evaluation of f at t = A is f(A) = 3A² + A + Iₙ.)
There is also an evaluation map evA at t = A from kⁿ[t] to kⁿ: if v(t) = Σⱼ tʲ vⱼ for column vectors vⱼ in kⁿ, then evA(v(t)) is defined to be Σⱼ Aʲ vⱼ.
Proof of Cayley–Hamilton theorem. Apply identity (*) to an arbitrary column vector v in kⁿ to obtain the identity
- (t Iₙ − A) Adj(t Iₙ − A)v = p(t)v
in kⁿ[t]. Now apply the evaluation map evA to both sides. Notice that the left-hand side evaluates to
- evA((t Iₙ − A) Adj(t Iₙ − A) v) = (A Iₙ − A) evA(Adj(t Iₙ − A) v) = 0,
since evA(t·u(t)) = A·evA(u(t)) for any u(t) in kⁿ[t]. On the other hand, the right-hand side evaluates to p(A)v. Thus p(A)v = 0 for every column vector v, and therefore p(A) = 0.
Abstraction and generalizations
The preceding proof made use of the fact that Cramer's rule holds for matrices with entries in the ring of polynomials k[t], with k equal to the real or complex numbers. In fact, Cramer's rule remains true for matrices with entries in any commutative ring B. Using this, one can generalize the Cayley–Hamilton theorem to matrices with entries in any commutative ring R, by applying Cramer's rule for matrices with entries in B = R[t].
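As an illustration of the commutative-ring version, the sketch below checks the theorem for a hypothetical 3×3 matrix over the ring ℤ of integers, using only exact integer arithmetic. For a 3×3 matrix, p(t) = t³ − tr(A)t² + c₁t − det(A), where c₁ is the sum of the principal 2×2 minors:

```python
# Cayley-Hamilton over the commutative ring Z: pick an integer matrix,
# form its characteristic polynomial exactly, and check p(A) = 0.

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def det2(a, b, c, d):
    return a * d - b * c

A = [[2, 1, 0],
     [0, 3, 1],
     [1, 0, 1]]          # hypothetical example with entries in Z

trace = A[0][0] + A[1][1] + A[2][2]
# c1 = sum of the three principal 2x2 minors.
c1 = (det2(A[0][0], A[0][1], A[1][0], A[1][1]) +
      det2(A[0][0], A[0][2], A[2][0], A[2][2]) +
      det2(A[1][1], A[1][2], A[2][1], A[2][2]))
# Cofactor expansion of det(A) along the first row.
detA = (A[0][0] * det2(A[1][1], A[1][2], A[2][1], A[2][2])
        - A[0][1] * det2(A[1][0], A[1][2], A[2][0], A[2][2])
        + A[0][2] * det2(A[1][0], A[1][1], A[2][0], A[2][1]))
print(trace, c1, detA)  # -> 6 11 7, so p(t) = t^3 - 6t^2 + 11t - 7

A2 = matmul(A, A)
A3 = matmul(A2, A)
# p(A) = A^3 - tr(A)*A^2 + c1*A - det(A)*I, entry by entry.
pA = [[A3[i][j] - trace * A2[i][j] + c1 * A[i][j]
       - (detA if i == j else 0) for j in range(3)] for i in range(3)]
print(pA)  # -> the 3x3 zero matrix
```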
More abstractly, the Cayley–Hamilton theorem holds for finitely generated free modules M over a commutative ring R (for example, for finite-dimensional vector spaces V over any field k). If φ: M → M is an endomorphism of R-modules (for example, a linear map from a vector space V to itself), then φ has a characteristic polynomial p defined as follows. Let eᵢ for i = 1, 2, ..., n be a basis of generators of M. In terms of these generators, φ(eᵢ) = Σⱼ aⱼᵢ eⱼ for some square matrix A = (aᵢⱼ) with entries in R. Now define p(t) = det(t Iₙ − A): it can be shown that p is independent of the choice of basis of M. Similarly, one can define the adjugate endomorphism of an endomorphism of M, and establish an abstract version of Cramer's rule.
The Cayley–Hamilton theorem then states that p(φ) = 0. This more general version of the theorem is the source of the celebrated Nakayama lemma in commutative algebra and algebraic geometry.
Proof. Let M[t] be the R[t]-module of polynomials with coefficients in M. Now note that M can also be made into an R[t]-module by defining, for m ∈ M and f(t) = Σⱼ fⱼtʲ in R[t],
- f(t)·m = f(φ)(m) = Σⱼ fⱼ φʲ(m).
Evaluation at t = φ, which sends Σⱼ tʲ mⱼ to Σⱼ φʲ(mⱼ), defines a homomorphism evφ: M[t] → M of R[t]-modules, because the powers φʲ of φ commute with each other.
By Cramer's rule (over the commutative ring B = R[t]), for any m∈M,
- (t idM − φ) Adj(t idM − φ)m = p(t)m.
After an application of evφ to both sides of this equation, the left-hand side reduces to zero (because the composition of the evaluation map with t idM − φ is zero), whereas the right-hand side reduces to p(φ)m (since the evaluation map is a homomorphism of R[t]-modules). Thus p(φ)m = 0 for every m ∈ M, and therefore p(φ) = 0.