Cayley–Hamilton theorem
In linear algebra, the Cayley–Hamilton theorem (named after the mathematicians Arthur Cayley and William Rowan Hamilton) states that every square matrix over the real or complex field satisfies its own characteristic equation.
More precisely: if A is the given square n×n matrix and I_n is the n×n identity matrix, then the characteristic polynomial of A is defined as
- p(λ) = det(λ I_n − A),
where "det" is the determinant function. The Cayley–Hamilton theorem states that substituting the matrix A for λ in the characteristic polynomial (which involves multiplying its constant term by I_n, since that is the "zero-th" power of A) results in the zero matrix:
- p(A) = 0.
The Cayley–Hamilton theorem also holds for square matrices over commutative rings and for N-partitioned matrices.[1]
An important corollary of the Cayley–Hamilton theorem is that the minimal polynomial of a given matrix is a divisor of its characteristic polynomial. This is very useful in finding the Jordan form of a matrix.
Example
For two-dimensional matrices, the theorem gives the identity
- A^2 − (tr A) A + (det A) I_2 = 0.
In three dimensions, the expression is
- A^3 − (tr A) A^2 + (1/2)((tr A)^2 − tr(A^2)) A − (det A) I_3 = 0,
and in four,
- A^4 − (tr A) A^3 + (1/2)((tr A)^2 − tr(A^2)) A^2 − (1/6)((tr A)^3 − 3 (tr A) tr(A^2) + 2 tr(A^3)) A + (det A) I_4 = 0.
Similar expressions can be obtained for the higher dimensional cases.
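These identities can be verified numerically; the following sketch (not part of the article; it assumes NumPy is available) checks the 2×2 and 3×3 cases for randomly generated matrices:

    # Sketch: numerically check the low-dimensional Cayley-Hamilton identities
    # for random matrices. Assumes NumPy; entries are arbitrary floats.
    import numpy as np

    rng = np.random.default_rng(0)

    # 2x2 case: A^2 - tr(A) A + det(A) I = 0
    A = rng.standard_normal((2, 2))
    lhs2 = A @ A - np.trace(A) * A + np.linalg.det(A) * np.eye(2)
    print(np.allclose(lhs2, 0))  # True (up to floating-point error)

    # 3x3 case: A^3 - tr(A) A^2 + ((tr A)^2 - tr(A^2))/2 A - det(A) I = 0
    B = rng.standard_normal((3, 3))
    lhs3 = (np.linalg.matrix_power(B, 3)
            - np.trace(B) * (B @ B)
            + 0.5 * (np.trace(B) ** 2 - np.trace(B @ B)) * B
            - np.linalg.det(B) * np.eye(3))
    print(np.allclose(lhs3, 0))  # True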
Consider for example the matrix
- A = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}.
The characteristic polynomial is given by
- p(λ) = det(λ I_2 − A) = (λ − 1)(λ − 4) − (−2)(−3) = λ^2 − 5λ − 2.
The Cayley–Hamilton theorem then claims that
- A^2 − 5A − 2 I_2 = 0,
which one can quickly verify in this case.
As a result of this, the Cayley–Hamilton theorem allows us to calculate powers of matrices more simply than by direct multiplication.
Taking the result above,
- A^2 = 5A + 2 I_2.
Then, for example, to calculate A^4, observe
- A^3 = (5A + 2 I_2) A = 5 A^2 + 2A = 5(5A + 2 I_2) + 2A = 27A + 10 I_2,
- A^4 = A^3 A = (27A + 10 I_2) A = 27 A^2 + 10A = 27(5A + 2 I_2) + 10A = 145A + 54 I_2.
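As a small computational illustration (a sketch, not part of the article, assuming NumPy and the 2×2 example matrix above), the reduction A^2 = 5A + 2 I_2 lets one reach A^4 without ever storing more than two scalar coefficients:

    # Sketch: use A^2 = 5A + 2I (from the characteristic polynomial above)
    # to express higher powers of A as c1*A + c0*I, and compare with
    # direct matrix multiplication. Assumes NumPy.
    import numpy as np

    A = np.array([[1, 2], [3, 4]])
    I = np.eye(2)

    # Represent A^k by the pair (c1, c0), meaning c1*A + c0*I.
    c1, c0 = 1, 0          # A^1 = 1*A + 0*I
    for _ in range(3):     # multiply by A three more times to reach A^4
        # (c1*A + c0*I) * A = c1*A^2 + c0*A = c1*(5A + 2I) + c0*A
        c1, c0 = 5 * c1 + c0, 2 * c1

    print(c1, c0)                                     # 145 54
    print(np.allclose(c1 * A + c0 * I,
                      np.linalg.matrix_power(A, 4)))  # True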
The theorem is also an important tool in calculating eigenvectors.
Note that for a general n×n matrix A, when its inverse exists, i.e. when det(A) ≠ 0, A^{-1} can be written as a finite sum of powers of A: the form of the characteristic polynomial is
- p(t) = t^n + p_{n-1} t^{n-1} + ... + p_1 t + p_0,
where p_0 = (−1)^n det(A) is nonzero precisely when A is invertible, and multiplying both sides of p(A) = 0 by A^{-1} gives
- A^{-1} = −(1/p_0) (A^{n-1} + p_{n-1} A^{n-2} + ... + p_1 I_n).
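To illustrate, the following sketch (not from the article; it assumes NumPy, whose np.poly returns the characteristic polynomial's coefficients, highest degree first, computed from the eigenvalues) assembles A^{-1} from powers of A by Horner evaluation:

    # Sketch: A^{-1} as a polynomial in A, using Cayley-Hamilton.
    # p(t) = t^n + c[1] t^{n-1} + ... + c[n]   (np.poly gives c, highest degree first)
    # p(A) = 0  =>  A^{-1} = -(A^{n-1} + c[1] A^{n-2} + ... + c[n-1] I) / c[n]
    import numpy as np

    A = np.array([[1.0, 2.0], [3.0, 4.0]])
    n = A.shape[0]
    c = np.poly(A)                 # characteristic polynomial coefficients

    # Horner-style evaluation of A^{n-1} + c[1] A^{n-2} + ... + c[n-1] I
    M = np.eye(n)
    for k in range(1, n):
        M = M @ A + c[k] * np.eye(n)

    A_inv = -M / c[n]
    print(np.allclose(A_inv, np.linalg.inv(A)))  # True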
Proving the Cayley–Hamilton theorem
Erroneous arguments
The biggest difficulty with the Cayley–Hamilton theorem is that there are many wrong ways to prove it; it is crucial to understand what is wrong about them before attempting a correct proof. In most cases the errors involve applying arguments that might be valid in a commutative context to the non-commutative setting of matrix arithmetic. The simplest erroneous argument is even cruder, however: it consists of taking the defining equation p(λ) = det(λ I_n − A) of the characteristic polynomial and simply substituting A for λ, giving p(A) = det(A I_n − A) = det(0) = 0. Here the error is of course that on the right hand side the substitution makes no sense: λ I_n is an n×n matrix with n scalars λ on the diagonal, so substituting A for λ would make A I_n a matrix with n matrices A on the diagonal, and whatever meaning (if any) is given to that, it is certainly not equal to the matrix A as the argument suggests (which in fact silently replaces the scalar multiplication in λ I_n by a matrix multiplication after the substitution). Note by the way that the conclusion of this argument equates p(A) to the scalar 0 rather than to the zero matrix, as it should.
It may be noted that the very statement of the theorem does something somewhat similar, by taking the matrix A and plugging it into the polynomial p with scalar coefficients; in this case however this is a precisely defined operation that replaces monomials t^i by A^i and any constant term c by c I_n, and evaluates the resulting linear combination of matrices in the vector space of matrices (or, more abstractly formulated, it takes the image of p(t) under the unique ring homomorphism that sends 1 to I_n and t to A).
A somewhat more involved but still erroneous argument starts with a statement that is indeed a key ingredient for a correct proof, namely that for any matrix S there exists an explicitly determined matrix Adj(S), its adjugate matrix, for which Adj(S) S = S Adj(S) = det(S) I_n holds (this is a form of Cramer's rule from linear algebra). Since this is true for matrices with entries in any commutative ring, we can apply it to the matrix t I_n − A, which has entries in the polynomial ring k[t]. This gives
- (t I_n − A) Adj(t I_n − A) = det(t I_n − A) I_n = p(t) I_n.    (*)
Now as in the first argument given, we may be tempted to substitute A for t, and observe that the first factor, and therefore the left hand side, becomes the null matrix, while the right hand side becomes p(A) I_n = p(A). Apart from the fact that we have now confused scalar and matrix multiplication on both sides of the equation, so that the result at least has matrices on both sides, the substitution of A for t in t I_n − A still makes as little sense as in the original argument.
A more sophisticated argument continues from (*), remarking that it can be viewed as an identity between polynomials (of degree n) in t with matrices as coefficients, in which setting the substitution of A for t (still reinterpreting scalar multiplications in terms of the form B t^i as matrix multiplications) would be justified. Considering polynomials over the non-commutative ring M_n(k) may seem strange, but it arises naturally here when writing a matrix with entries polynomial in t as a sum of terms, each consisting of the scalar multiplication of a constant matrix by a power of t, which is certainly possible. That point of view also dictates how such polynomials should be multiplied, the basic case being (B t^i)(C t^j) = (BC) t^{i+j} for any constant matrices B, C. However, a fundamental difference between such polynomials over a non-commutative ring M and polynomials over a commutative ring is that for an element B of M (a constant matrix in our case) there is no evaluation homomorphism from the ring M[t] to the ring M that sends t to B (unless B lies in the center of M). This is because the multiplication in M[t] implies that t commutes with all elements of M, and there is no way to define substitution of a non-central element for t that respects this (i.e., gives a ring homomorphism). Therefore the argument involving substitution of A for t is erroneous.
The following example illustrates what is going on in a concrete case. Take
- A = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix},
whose characteristic polynomial is easily seen to be p(t) = t^2, while the adjugate matrix of t I_2 − A is t I_2 + A; indeed we get as an instance of (*) the equation
- (t I_2 − A)(t I_2 + A) = t^2 I_2,
which is valid for any scalar t (since t I_2 A = A t I_2 and A^2 = 0). The given argument now claims that it is legitimate to substitute a matrix for t into this equation, interpreting t I_2 and t^2 I_2 as matrix products. Of course one cannot refute this claim by setting t = A (after all, the Cayley–Hamilton theorem is true), but by substituting some other matrix B for t instead, we would get (B − A)(B + A) = B^2, or after simplification BA − AB = 0, which is false in general (take for instance for B the transpose matrix of A). The upshot is that there is no way to justify any argument in which scalar multiplications are at some point simply replaced by matrix multiplications.
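This failure is easy to check directly; the following SymPy sketch (not part of the article) uses the nilpotent matrix A above and B = A^T:

    # Sketch: substituting a matrix B for t in (t*I - A)(t*I + A) = t^2*I
    # would force BA - AB = 0, which fails for B = A.T.
    import sympy as sp

    A = sp.Matrix([[0, 1], [0, 0]])
    B = A.T

    lhs = (B - A) * (B + A)    # what substituting B for t gives on the left
    rhs = B * B                # what t^2 * I_2 would become on the right
    print(lhs - rhs)           # equals B*A - A*B = Matrix([[-1, 0], [0, 1]]), not zero
    print(B * A - A * B)       # the same nonzero commutator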
Here is an even more abstract approach, which nevertheless is still erroneous.
Let k[t] be the polynomial ring over k, where k denotes the real or complex numbers, and let k[t]^n be the set of length-n column vectors whose entries are elements of k[t]. This is a k[t]-module: an element of k[t]^n can be multiplied by a polynomial in k[t] to yield an element of k[t]^n.
The matrix A provides a homomorphism from k[t] to the ring of n×n matrices by evaluating at t = A. (For example, if f(t) = 3t^2 + t + 1 is in k[t], then the evaluation of f at t = A is f(A) = 3A^2 + A + I_n.)
There is also an evaluation map ev_A at t = A from k[t]^n to k^n: if v(t) = \sum_j v_j t^j for column vectors v_j in k^n, then ev_A(v(t)) is defined to be \sum_j A^j v_j.
(Erroneous) Proof of Cayley–Hamilton theorem. Apply both sides of identity (*) to an arbitrary column vector v in k^n to obtain the identity
- (t I_n − A) Adj(t I_n − A) v = p(t) v,
which is an equation in k[t]^n, since applying a matrix with polynomial entries to a vector with scalar entries gives a vector with polynomial entries. Now apply the evaluation map ev_A to both sides. Notice that the left hand side evaluates to
- ev_A((t I_n − A) Adj(t I_n − A) v) = (A I_n − A) ev_A(Adj(t I_n − A) v) = 0.
On the other hand, the right hand side evaluates to p(A)v. Thus p(A)v = 0 for every column vector v, and therefore p(A) = 0, QED.
Notice where this argument succumbs to the temptation of replacing the scalar product t I_n by a matrix product A I_n after substitution. Note also that nothing was proved about the value of ev_A(M(t) v) or ev_A(M(t) N(t) v) where M(t) and N(t) are matrices with polynomial coefficients, yet some such rule was used in the computation. Confusingly, it is actually possible to define a second map ev_A, from M_n(k)[t] to M_n(k), that sends t I_n − A to the null matrix and such that for every M(t) in M_n(k)[t] and every v in k^n one has ev_A(M(t) v) = ev_A(M(t)) v, namely by ev_A(\sum_i M_i t^i) = \sum_i A^i M_i (note the order in the right hand side, where products are matrix multiplications). But calling this second map ev_A evaluation is misleading (as it is for the first one), since (as explained before) it is not a homomorphism of (non-commutative) rings. And similarly it is not true that ev_A(M(t) N(t)) = ev_A(M(t)) ev_A(N(t)) when M(t) and N(t) lie in M_n(k)[t]. Therefore the "improved" argument still fails.
A proof
Given that all attempts at a slick proof fail, one must do some real work somewhere. We start again with (*), and write
- Adj(t I_n − A) = \sum_{i=0}^{n-1} B_i t^i
with matrices B_i in M_n(k), and also
- p(t) I_n = \sum_{i=0}^{n} p_i t^i I_n
with p_n = 1. The left hand side of (*) can now be expanded by bilinearity of matrix multiplication, the terms with a fixed power t^i collected, and their (matrix) coefficient equated to the matrix p_i I_n on the right. The left hand side of the equation of the coefficients of t^i is then B_{n-1} for i = n, it is B_{i-1} − A B_i for n > i > 0, and it is −A B_0 for i = 0. Multiplying the equation of the coefficients of t^i from the left by A^i and adding up all the equations so formed, the left hand sides cancel out completely, and the right hand sides add up to
- \sum_{i=0}^{n} p_i A^i = p(A),
which proves the theorem.
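The coefficient relations used in this proof can be checked symbolically for a concrete matrix; the following SymPy sketch (not from the article; the 3×3 matrix is arbitrary) extracts the B_i from the adjugate of t I_n − A and verifies each step:

    # Sketch: for a sample 3x3 integer matrix, extract the coefficients B_i of
    # Adj(t*I - A) = B_0 + B_1 t + B_2 t^2, and check the relations used in the
    # proof: B_{n-1} = I, B_{i-1} - A*B_i = p_i I, -A*B_0 = p_0 I, and p(A) = 0.
    import sympy as sp

    t = sp.symbols('t')
    A = sp.Matrix([[2, -1, 0], [1, 3, 1], [0, 4, 5]])
    n = A.shape[0]
    I, Z = sp.eye(n), sp.zeros(n)

    p = (t * I - A).det().expand()
    pc = [p.coeff(t, i) for i in range(n + 1)]        # p_0, ..., p_n (p_n = 1)

    adj = (t * I - A).adjugate()
    B = [adj.applyfunc(lambda e: sp.expand(e).coeff(t, i)) for i in range(n)]

    print(B[n - 1] == I)                                                   # True
    print(all(B[i - 1] - A * B[i] - pc[i] * I == Z for i in range(1, n)))  # True
    print(-A * B[0] - pc[0] * I == Z)                                      # True
    print(sum((pc[i] * A**i for i in range(n + 1)), Z) == Z)               # True: p(A) = 0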
Some further observations about the proof
Although the given proof completely avoids the dubious practice of substituting a matrix for t in a non-commutative context, its manipulations can be interpreted as doing something rather close to just that, namely splitting the equation into components according to powers of t, left-multiplying each component by the same power of A, and then adding everything back together again. In fact the reasoning that the left hand sides cancel can be interpreted as a proof that the operation described above, which we shall now call left-evaluation at A to stress the fact that multiplication by t gets interpreted as left-multiplication by A, sends the left hand side of (*) to the null matrix. It does so by first working out the matrix product, thereby not relying on the (false) assumption that ev_A is a ring homomorphism, but it does not use any particular property of the second factor Adj(t I_n − A) of the product, indicating that something more general could be said about this.
In the setting of a ring M[t] of polynomials in a commuting indeterminate t over a non-commutative base ring M, there is a well defined notion of Euclidean left-division by a monic polynomial B: for every P in M[t] there exist Q, R in M[t] such that P = BQ + R and deg(R) < deg(B), and the pair (Q, R) is unique. As in the commutative case this is proved by induction on deg(P), with the coefficients of Q being determined by decreasing degree (when deg(P) ≥ deg(B), the leading coefficient of Q must be the same as that of P, and the remaining coefficients are determined by induction). When moreover deg(B) = 1, so that it is of the form B = t − A with A in M, the remainder R is a constant polynomial; in the commutative case it can be interpreted as the image of P under the homomorphism of evaluation at t = A, and in the non-commutative case it is still equal to the (non-homomorphic) left-evaluation ev_A(P). To see this, one may expand (t − A)Q and apply the definition of ev_A to show that it annihilates this product, basically as is done in the given proof of the theorem; or one may prefer to show first that, although in general one does not have ev_A(ST) = ev_A(S) ev_A(T) for S, T in M[t], this equation does hold whenever A commutes with (just) the coefficients of S, as is the case for S = t − A. (There is also an operation of Euclidean right-division, and the remainder of right-division by t − A is the similarly defined but generally different right-evaluation at t = A.)
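The following sketch (not from the article) makes this left-division concrete in NumPy, representing an element of M_n(k)[t] as a list of coefficient matrices, lowest degree first, and checking that the remainder of left-division by t − A equals the left-evaluation \sum_i A^i P_i:

    # Sketch: Euclidean left-division of P(t) by (t - A) in M_n(k)[t].
    # A polynomial is a list of n x n coefficient matrices, lowest degree first.
    # P = (t - A) Q + R with R constant, and R equals the left-evaluation
    # ev_A(P) = sum_i A^i P_i.
    import numpy as np

    def left_divide_by_t_minus_A(P, A):
        """Return (Q, R) with P = (t - A) Q + R and R constant."""
        n = A.shape[0]
        Q = [np.zeros((n, n)) for _ in range(max(len(P) - 1, 1))]
        R = [np.array(c, dtype=float) for c in P]     # working copy of P
        for i in range(len(P) - 1, 0, -1):            # peel off highest degrees
            Q[i - 1] = R[i]                           # coefficient of t^{i-1} in Q
            R[i - 1] = R[i - 1] + A @ R[i]            # correct for the -A part of (t - A)
            R[i] = np.zeros((n, n))
        return Q, R[0]

    def left_eval(P, A):
        """ev_A(P) = sum_i A^i P_i (the power of A multiplies from the left)."""
        return sum(np.linalg.matrix_power(A, i) @ Pi for i, Pi in enumerate(P))

    rng = np.random.default_rng(1)
    n = 3
    A = rng.standard_normal((n, n))
    P = [rng.standard_normal((n, n)) for _ in range(4)]   # a random cubic polynomial

    Q, R = left_divide_by_t_minus_A(P, A)
    print(np.allclose(R, left_eval(P, A)))                # True

In particular, applying the same routine to P(t) = p(t) I_n should return (up to rounding) the coefficients B_i of Adj(t I_n − A) as the quotient, with zero remainder, which connects this division with the observations below.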
These considerations do not really simplify the proof given, but allow us to make the following general statements that easily imply the theorem: P in M[t] is left-divisible by t − A if and only if ev_A(P) = 0, and if in a product ST the left factor S is annihilated by ev_A, then so is the product itself (since S, and therefore ST, are left-divisible by t − A). While not justifying the final erroneous proof above, this does show that it is not extremely far from something that is true.
One thing we may deduce from these remarks is that Adj(t I_n − A) is the quotient in the remainderless Euclidean left-division in M_n(k)[t] of p(t) I_n by t I_n − A. That division can actually be performed within the commutative subring k[A][t] of M_n(k)[t] (where k[A] denotes the k-linear span of the powers of A inside M_n(k)), in which case Euclidean left-division just becomes the usual Euclidean division of polynomials over a commutative ring. This in turn implies that the coefficients of Adj(t I_n − A) lie in k[A]; in particular this is true for its constant term, which up to a sign equals Adj(A). Thus the adjugate of any matrix A can be expressed as a polynomial in A (with coefficients that depend on A), something that is not easy to deduce directly from the definition of the adjugate matrix. In fact the Euclidean division can be performed in k[A][t] without knowing the value of A, and doing so shows that the coefficients of the polynomial giving the adjugate are essentially the coefficients p_i of the characteristic polynomial, but shifted down one degree: one has
- Adj(t I_n − A) = \sum_{i=0}^{n-1} \left( \sum_{j=i+1}^{n} p_j A^{j-i-1} \right) t^i,   where p_n = 1.
Note that from this identity we can obtain the statement of the Cayley–Hamilton theorem: its constant terms give Adj(−A) = p_1 I_n + p_2 A + ... + p_n A^{n-1}, and moving Adj(−A) to the right hand side and multiplying the resulting equation (on the left or on the right) by A yields p(A) = 0, since
- A Adj(−A) = −(−A) Adj(−A) = −det(−A) I_n = −p(0) I_n = −p_0 I_n.
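This shifted-coefficient formula, and the resulting expression of the adjugate as a polynomial in A, can be checked symbolically; the SymPy sketch below (not from the article; the sample matrix is arbitrary) uses the convention p_n = 1:

    # Sketch: check Adj(t*I - A) = sum_{i=0}^{n-1} ( sum_{j=i+1}^{n} p_j A^(j-i-1) ) t^i,
    # and, at t = 0, that Adj(-A) = p_1 I + p_2 A + ... + p_n A^(n-1).
    import sympy as sp

    t = sp.symbols('t')
    A = sp.Matrix([[1, 2, 0], [0, 1, -1], [3, 0, 2]])
    n = A.shape[0]
    I, Z = sp.eye(n), sp.zeros(n)

    p = (t * I - A).det().expand()
    pc = [p.coeff(t, i) for i in range(n + 1)]        # p_0, ..., p_n (p_n = 1)

    rhs = Z
    for i in range(n):
        B_i = sum((pc[j] * A**(j - i - 1) for j in range(i + 1, n + 1)), Z)
        rhs = rhs + B_i * t**i

    print(((t * I - A).adjugate() - rhs).expand() == Z)                              # True
    print((-A).adjugate() == sum((pc[j] * A**(j - 1) for j in range(1, n + 1)), Z))  # True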
Abstraction and generalizations
The preceding proof made use of the fact that Cramer's rule holds for matrices with entries in the ring of polynomials k[t] with k equal to the real or complex numbers. In fact, Cramer's rule remains true for matrices with entries in any commutative ring B. Using this, one can generalize the Cayley–Hamilton theorem to matrices with entries in any commutative ring R, using Cramer's rule for matrices with entries in B = R[t].
More abstractly, the Cayley–Hamilton theorem holds for finitely generated free modules M over a commutative ring R (for example, for vector spaces V over any field k). If φ : M → M is an endomorphism of R-modules (for example, a linear map from a vector space V to itself), then φ has a characteristic polynomial p defined as follows. Let e_i for i = 1, 2, ..., n be a basis of generators of M. In terms of these generators, φ(e_j) = \sum_{i=1}^{n} a_{ij} e_i for some square matrix A = (a_{ij}) with entries in R. Now define p(t) = det(t I_n − A); it can be shown that p is independent of the choice of basis of M. Similarly, one can define the adjugate endomorphism of an endomorphism of M, and establish a generalized version of Cramer's rule.
The Cayley–Hamilton theorem then states that p(φ) = 0. The proof given above applies without change, since it does not involve division by (nonzero) elements of k. This more general version of the theorem is the source of the celebrated Nakayama lemma in commutative algebra and algebraic geometry.
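As a small illustration of the commutative-ring version (a sketch, not from the article), the same computation can be carried out symbolically for a matrix with integer entries, or with entries in the polynomial ring Z[x]:

    # Sketch: Cayley-Hamilton over commutative rings other than R or C.
    # The same computation works for integer entries and for entries in Z[x].
    import sympy as sp

    t, x = sp.symbols('t x')

    for A in (sp.Matrix([[3, -2], [7, 5]]),              # entries in the ring Z
              sp.Matrix([[x, 1 + x**2], [2, x - 1]])):   # entries in the ring Z[x]
        n = A.shape[0]
        p = (t * sp.eye(n) - A).det().expand()           # characteristic polynomial in t
        pA = sum((p.coeff(t, i) * A**i for i in range(n + 1)), sp.zeros(n))
        print(pA.expand() == sp.zeros(n))                # True, True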
References
- ^ Warwick, Kevin, "Using the Cayley–Hamilton theorem with N partitioned matrices", IEEE Transactions on Automatic Control, vol. AC-28, no. 12, pp. 1127–1128, Dec. 1983.