Cayley–Hamilton theorem
From Wikipedia, the free encyclopedia
In linear algebra, the Cayley–Hamilton theorem (named after the mathematicians Arthur Cayley and William Rowan Hamilton) states that every square matrix over the real or complex field satisfies its own characteristic equation. This means the following: if A is the given square n×n matrix and Iₙ is the n×n identity matrix, then the characteristic polynomial of A is defined as
- p(t) = det(t Iₙ − A),
where "det" is the determinant function. The Cayley–Hamilton theorem states that substituting the matrix A into the characteristic polynomial results in the zero matrix:
- p(A) = 0.
The Cayley–Hamilton theorem holds for square matrices over commutative rings as well.
An important corollary of the Cayley–Hamilton theorem is that the minimal polynomial of a given matrix is a divisor of its characteristic polynomial. This is very useful in finding the Jordan form of a matrix.
Example
Consider for example the matrix with first row (1, 2) and second row (3, 4):
- A = ( 1 2 ; 3 4 ).
The characteristic polynomial is given by
- p(t) = det(t I₂ − A) = (t − 1)(t − 4) − 2·3 = t² − 5t − 2.
The Cayley–Hamilton theorem then claims that
- A² − 5A − 2I₂ = 0
which one can quickly verify in this case.
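This verification can also be carried out mechanically. The sketch below assumes the example matrix A = [[1, 2], [3, 4]] (a matrix with trace 5 and determinant −2, hence characteristic polynomial t² − 5t − 2) and checks that p(A) is the zero matrix using plain integer arithmetic:

```python
# Check A^2 - 5A - 2*I = 0 for the assumed example A = [[1, 2], [3, 4]],
# whose characteristic polynomial is t^2 - 5t - 2.

def matmul(X, Y):
    """Multiply two square matrices given as lists of rows."""
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = [[1, 2], [3, 4]]
I2 = [[1, 0], [0, 1]]
A2 = matmul(A, A)

# p(A) = A^2 - 5A - 2*I2, computed entry by entry.
pA = [[A2[i][j] - 5 * A[i][j] - 2 * I2[i][j] for j in range(2)]
      for i in range(2)]
print(pA)  # -> [[0, 0], [0, 0]]
```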
As a result of this, the Cayley–Hamilton theorem allows us to calculate powers of matrices more simply than by direct multiplication.
Taking the result above
- A² − 5A − 2I₂ = 0
- A² = 5A + 2I₂.
Then, for example, to calculate A4, observe
- A³ = (5A + 2I₂)A = 5A² + 2A = 5(5A + 2I₂) + 2A = 27A + 10I₂
- A⁴ = A³A = (27A + 10I₂)A = 27A² + 10A = 27(5A + 2I₂) + 10A
- A⁴ = 145A + 54I₂.
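The reduction above can be turned into a small program: since A² = 5A + 2I₂, every power of A has the form x·A + y·I₂ for scalars x, y satisfying a two-term recurrence, so no full matrix product is ever needed. A sketch, again assuming the example matrix A = [[1, 2], [3, 4]]:

```python
# Powers of A via Cayley-Hamilton: if A^k = x*A + y*I, then
# A^(k+1) = x*A^2 + y*A = x*(5A + 2I) + y*A = (5x + y)*A + (2x)*I.

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = [[1, 2], [3, 4]]

x, y = 1, 0                    # A^1 = 1*A + 0*I
for _ in range(3):             # step from A^1 up to A^4
    x, y = 5 * x + y, 2 * x
print(x, y)                    # -> 145 54, i.e. A^4 = 145A + 54I

# Cross-check against direct multiplication.
A4_direct = matmul(matmul(matmul(A, A), A), A)
A4_ch = [[x * A[i][j] + (y if i == j else 0) for j in range(2)]
         for i in range(2)]
print(A4_direct == A4_ch)      # -> True
```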
The theorem is also an important tool in calculating eigenvectors.
Note that for a general n×n matrix A, when its inverse exists, i.e. when det(A) ≠ 0, A⁻¹ can be written as a finite sum of powers of A: the characteristic polynomial has the form
- p(t) = tⁿ + cₙ₋₁tⁿ⁻¹ + … + c₁t + c₀,
where c₀ = det(−A) = (−1)ⁿ det(A) ≠ 0, and multiplying both sides of p(A) = 0 by A⁻¹ gives
- A⁻¹ = −(1/c₀)(Aⁿ⁻¹ + cₙ₋₁Aⁿ⁻² + … + c₁Iₙ).
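For the 2×2 example this gives an explicit inverse: p(t) = t² − 5t − 2 has c₁ = −5 and c₀ = −2, so p(A) = 0 rearranges to A(A − 5I₂) = 2I₂ and hence A⁻¹ = (A − 5I₂)/2. A sketch using exact rational arithmetic, again assuming the example matrix A = [[1, 2], [3, 4]]:

```python
from fractions import Fraction

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

A = [[1, 2], [3, 4]]

# p(t) = t^2 - 5t - 2, so p(A) = 0 gives A(A - 5I) = 2I and hence
# A^-1 = (A - 5I) / 2 -- a polynomial in A, no Gaussian elimination needed.
A_inv = [[Fraction(A[i][j] - (5 if i == j else 0), 2) for j in range(2)]
         for i in range(2)]

print(matmul(A, A_inv))  # -> the 2x2 identity matrix
```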
Proving the Cayley–Hamilton theorem
The Cayley–Hamilton theorem may be viewed as a corollary of Cramer's rule from linear algebra. Cramer's rule states that if S is an n×n matrix, and Adj(S) denotes the adjugate matrix, which is the transpose of the matrix of cofactors of S, then S Adj(S) = det(S)Iₙ, where Iₙ is the n×n identity matrix.
There is a famous incorrect one-line proof of the Cayley–Hamilton theorem from this identity: take S = t In − A and substitute t = A into the identity. This argument is fallacious because, in the expressions Adj(t In − A) and det(t In − A), t is supposed to be a scalar, so it does not make sense to substitute an n×n matrix for it.
There are several ways to repair this proof, using the observations that the matrix identity
- (t Iₙ − A) Adj(t Iₙ − A) = det(t Iₙ − A) Iₙ = p(t)Iₙ (*)
holds for all scalars t and that both sides of the equality are matrices of polynomials (over the real or complex numbers) of degree n in t. Hence (since the real and complex numbers are infinite fields) the equation (*) is a polynomial identity, which may be viewed as an identity between polynomials (of degree n) in t with matrices as coefficients.
Claim: it is now legitimate to substitute t = A into this identity, and the Cayley–Hamilton theorem follows.
- One way to justify this claim is to write out p(t) and Adj(t Iₙ − A) as polynomials in t (the latter having matrix coefficients and degree n − 1), and equate coefficients of tʲ in (*). Multiplying the coefficient of tʲ by Aʲ (on the left) and summing over j then shows that p(A) = 0 as required.
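For a 2×2 matrix this coefficient matching can be checked directly: since Adj(S) = tr(S)I₂ − S for 2×2 matrices, Adj(t I₂ − A) = B₁t + B₀ with B₁ = I₂ and B₀ = A − tr(A)I₂. The sketch below assumes the example matrix A = [[1, 2], [3, 4]] and verifies both the coefficient identities and the telescoping sum:

```python
# Coefficient matching in (t*I - A) Adj(t*I - A) = p(t)*I for a 2x2 matrix.
# Writing Adj(t*I - A) = B1*t + B0, expanding the left side gives
#   B1*t^2 + (B0 - A@B1)*t + (-A@B0),
# so matching against p(t)*I = t^2*I + c1*t*I + c0*I forces
#   B1 = I,   B0 - A@B1 = c1*I,   -A@B0 = c0*I.
# Multiplying the t^j relation by A^j and summing telescopes to p(A) = 0.

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def add(X, Y):
    return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

def scale(c, X):
    return [[c * x for x in row] for row in X]

A = [[1, 2], [3, 4]]              # assumed example; trace 5, determinant -2
I = [[1, 0], [0, 1]]
c1, c0 = -5, -2                   # p(t) = t^2 + c1*t + c0

B1 = I                            # Adj(t*I - A) = B1*t + B0, with
B0 = add(A, scale(-5, I))         # B0 = A - tr(A)*I  (2x2 adjugate formula)

# Coefficient identities, matched term by term:
print(add(B0, scale(-1, matmul(A, B1))) == scale(c1, I))   # -> True
print(scale(-1, matmul(A, B0)) == scale(c0, I))            # -> True

# Telescoping sum: A^2*B1 + A*(B0 - A*B1) + (-A*B0) equals p(A).
A2 = matmul(A, A)
pA = add(add(matmul(A2, B1),
             matmul(A, add(B0, scale(-1, matmul(A, B1))))),
         scale(-1, matmul(A, B0)))
print(pA)  # -> [[0, 0], [0, 0]]
```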
There is, however, a more abstract approach, which ultimately proves to be more fruitful.
Let k[t] be the polynomial ring over k, where k denotes the real or complex numbers, and let kⁿ[t] be the space of length-n column vectors whose entries are elements of k[t]. This is a k[t]-module: an element of kⁿ[t] can be multiplied by a polynomial in k[t] to yield another element of kⁿ[t].
The matrix A provides a homomorphism from k[t] to the ring of n×n matrices by evaluating at t = A. (For example, if f(t) = 3t² + t + 1 is in k[t], then the evaluation of f at t = A is f(A) = 3A² + A + Iₙ.)
There is also an evaluation map evA at t = A from kⁿ[t] to kⁿ: if v(t) = Σⱼ tʲ vⱼ for column vectors vⱼ in kⁿ, then evA(v(t)) is defined to be Σⱼ Aʲ vⱼ.
Proof of Cayley–Hamilton theorem. Apply identity (*) to an arbitrary column vector v in kⁿ to obtain the identity
- (t Iₙ − A) Adj(t Iₙ − A)v = p(t)v
in kⁿ[t]. Now apply the evaluation map evA to both sides. Notice that the left-hand side evaluates to
- evA((t Iₙ − A) Adj(t Iₙ − A) v) = (A Iₙ − A) evA(Adj(t Iₙ − A) v) = 0,
since evA(t·u(t)) = A·evA(u(t)) for any u(t) in kⁿ[t]. On the other hand, the right-hand side evaluates to p(A)v. Thus p(A)v = 0 for every column vector v, and therefore p(A) = 0.
Abstraction and generalizations
The preceding proof made use of the fact that Cramer's rule holds for matrices with entries in the ring of polynomials k[t], with k equal to the real or complex numbers. In fact, Cramer's rule remains true for matrices with entries in any commutative ring B. Using this, one can generalize the Cayley–Hamilton theorem to matrices with entries in any commutative ring R, by applying Cramer's rule for matrices with entries in B = R[t].
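As an illustration of the commutative-ring version, the sketch below checks the theorem for a hypothetical 3×3 matrix over the ring ℤ of integers, using only exact integer arithmetic. For a 3×3 matrix, p(t) = t³ − tr(A)t² + c₁t − det(A), where c₁ is the sum of the principal 2×2 minors:

```python
# Cayley-Hamilton over the commutative ring Z: pick an integer matrix,
# form its characteristic polynomial exactly, and check p(A) = 0.

def matmul(X, Y):
    n = len(X)
    return [[sum(X[i][k] * Y[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def det2(a, b, c, d):
    return a * d - b * c

A = [[2, 1, 0],
     [0, 3, 1],
     [1, 0, 1]]          # hypothetical example with entries in Z

trace = A[0][0] + A[1][1] + A[2][2]
# c1 = sum of the three principal 2x2 minors.
c1 = (det2(A[0][0], A[0][1], A[1][0], A[1][1]) +
      det2(A[0][0], A[0][2], A[2][0], A[2][2]) +
      det2(A[1][1], A[1][2], A[2][1], A[2][2]))
# Cofactor expansion of det(A) along the first row.
detA = (A[0][0] * det2(A[1][1], A[1][2], A[2][1], A[2][2])
        - A[0][1] * det2(A[1][0], A[1][2], A[2][0], A[2][2])
        + A[0][2] * det2(A[1][0], A[1][1], A[2][0], A[2][1]))
print(trace, c1, detA)  # -> 6 11 7, so p(t) = t^3 - 6t^2 + 11t - 7

A2 = matmul(A, A)
A3 = matmul(A2, A)
# p(A) = A^3 - tr(A)*A^2 + c1*A - det(A)*I, entry by entry.
pA = [[A3[i][j] - trace * A2[i][j] + c1 * A[i][j]
       - (detA if i == j else 0) for j in range(3)] for i in range(3)]
print(pA)  # -> the 3x3 zero matrix
```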
More abstractly, the Cayley–Hamilton theorem holds for finitely generated free modules M over a commutative ring R (for example, for finite-dimensional vector spaces V over any field k). If φ: M → M is an endomorphism of R-modules (for example, a linear map from a vector space V to itself), then φ has a characteristic polynomial p defined as follows. Let eᵢ for i = 1, 2, ..., n be a basis of generators of M. In terms of these generators, φ(eᵢ) = Σⱼ aⱼᵢ eⱼ for some square matrix A = (aᵢⱼ) with entries in R. Now define p(t) = det(t Iₙ − A): it can be shown that p is independent of the choice of basis of M. Similarly, one can define the adjugate endomorphism of an endomorphism of M, and establish an abstract version of Cramer's rule.
The Cayley–Hamilton theorem then states that p(φ) = 0. This more general version of the theorem is the source of the celebrated Nakayama lemma in commutative algebra and algebraic geometry.
Proof. Let M[t] be the R[t]-module of polynomials with coefficients in M. Now note that M can also be made into an R[t]-module by defining, for m ∈ M and f(t) = Σⱼ fⱼtʲ in R[t],
- f(t)·m = f(φ)(m) = Σⱼ fⱼ φʲ(m).
Evaluation at t = φ, which sends Σⱼ tʲ mⱼ to Σⱼ φʲ(mⱼ), defines a homomorphism evφ: M[t] → M of R[t]-modules, because the powers φʲ of φ commute with each other.
By Cramer's rule (over the commutative ring B = R[t]), for any m∈M,
- (t idM − φ) Adj(t idM − φ)m = p(t)m.
After an application of evφ to both sides of this equation, the left-hand side reduces to zero (because the composition of the evaluation map with t idM − φ is zero), whereas the right-hand side reduces to p(φ)m (since the evaluation map is a homomorphism of R[t]-modules). Thus p(φ)m = 0 for every m ∈ M, and therefore p(φ) = 0.