Characteristic polynomial

In linear algebra, one associates a polynomial to every square matrix, its characteristic polynomial. This polynomial encodes several important properties of the matrix, most notably its eigenvalues, its determinant and its trace.

Motivation

Given a square matrix A, we want to find a polynomial whose roots are precisely the eigenvalues of A. For a diagonal matrix A, the characteristic polynomial is easy to define: if the diagonal entries are a1, a2, a3, etc., the characteristic polynomial is:

(t-a_{1})(t-a_{2})(t-a_{3})\dots

This works because the diagonal entries are also the eigenvalues of this matrix.

For a general matrix A, one can proceed as follows. If λ is an eigenvalue of A, then there is an eigenvector \mathbf{v} \neq 0 such that

A \mathbf{v} = \lambda \mathbf{v},

or

(\lambda I - A)\mathbf{v} = 0

(where I is the identity matrix). Since v is non-zero, this means that the matrix λI − A is singular, which in turn means that its determinant is 0. We have just shown that the roots of the function det(tI − A) are the eigenvalues of A. Since this function is a polynomial in t, we're done.

Formal definition

We start with a field K (such as the real or complex numbers) and an n×n matrix A over K. The characteristic polynomial of A, denoted by pA(t), is the polynomial defined by

pA(t) = det(tI − A)

where I denotes the n-by-n identity matrix and the determinant is taken in K(t), the field of rational functions in t. This is indeed a polynomial, since determinants are defined in terms of sums of products. (Some authors define the characteristic polynomial to be det(A − tI); the difference is immaterial, since the two polynomials differ only by the sign (−1)^n, i.e. they actually differ only when n is odd.)

Example

Suppose we want to compute the characteristic polynomial of the matrix

A=\begin{pmatrix}
2 & 1\\
-1& 0
\end{pmatrix}.

We have to compute the determinant of

t I-A = \begin{pmatrix}
t-2&-1\\
1&t
\end{pmatrix}

and this determinant is

(t-2)t - (-1)\cdot 1 = t^2-2t+1.\,\!

The latter is the characteristic polynomial of A.
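
As a minimal check of this example, one can compute the same polynomial symbolically. The sketch below uses SymPy, which is an assumption for illustration only (the library and its charpoly method are not part of the article); charpoly returns det(tI − A) in the same convention used here.

    import sympy as sp

    t = sp.symbols('t')
    A = sp.Matrix([[2, 1],
                   [-1, 0]])

    # charpoly(t) returns det(t*I - A) as a polynomial in t
    p = A.charpoly(t).as_expr()
    print(p)              # t**2 - 2*t + 1
    print(sp.factor(p))   # (t - 1)**2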

Properties

The polynomial pA(t) is monic (its leading coefficient is 1) and its degree is n. The most important fact about the characteristic polynomial was already mentioned in the motivational paragraph: the eigenvalues of A are precisely the roots of pA(t). The constant coefficient pA(0) is equal to (−1)^n times the determinant of A, and the coefficient of t^{n−1} is equal to −tr(A), the matrix trace of A. For a 2×2 matrix A, the characteristic polynomial is thus nicely expressed as

t^2 − tr(A)t + det(A).
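
These coefficient identities can be checked numerically. The sketch below uses NumPy and reuses the example matrix from above (the library choice is an assumption, not part of the article); np.poly applied to a square matrix returns the coefficients of det(tI − A), highest degree first, which here should be [1, −tr(A), det(A)].

    import numpy as np

    # The 2x2 matrix from the example section above.
    A = np.array([[2.0, 1.0],
                  [-1.0, 0.0]])

    # Coefficients of det(tI - A), highest degree first.
    print(np.poly(A))                            # [ 1. -2.  1.]
    print(1.0, -np.trace(A), np.linalg.det(A))   # 1.0 -2.0 ~1.0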

All real polynomials of odd degree have a real number as a root, so for odd n, every real matrix has at least one real eigenvalue. Many real polynomials of even degree do not have a real root, but the fundamental theorem of algebra states that every polynomial of degree n has n complex roots, counted with their multiplicities. The non-real roots of real polynomials, hence the non-real eigenvalues, come in conjugate pairs.

The Cayley–Hamilton theorem states that replacing t by A in the expression for pA(t) yields the zero matrix: pA(A) = 0. In other words, every matrix satisfies its own characteristic equation. As a consequence, one can show that the minimal polynomial of A divides the characteristic polynomial of A.
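
As a hedged illustration of the Cayley–Hamilton statement (again with NumPy and the same example matrix as above, neither of which comes from the article), one can evaluate pA(A) = A^2 − tr(A)A + det(A)I directly:

    import numpy as np

    A = np.array([[2.0, 1.0],
                  [-1.0, 0.0]])

    # For a 2x2 matrix: p_A(A) = A^2 - tr(A) A + det(A) I.
    pA_of_A = A @ A - np.trace(A) * A + np.linalg.det(A) * np.eye(2)
    print(pA_of_A)   # (numerically) the 2x2 zero matrix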

Two similar matrices have the same characteristic polynomial. The converse however is not true in general: two matrices with the same characteristic polynomial need not be similar.
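
A standard example (not taken from this article) illustrating the failure of the converse: the matrices

\begin{pmatrix} 0 & 0\\ 0 & 0 \end{pmatrix} \quad\text{and}\quad \begin{pmatrix} 0 & 1\\ 0 & 0 \end{pmatrix}

both have characteristic polynomial t^2, yet they are not similar, since the only matrix similar to the zero matrix is the zero matrix itself.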

The matrix A and its transpose have the same characteristic polynomial. A is similar to a triangular matrix if and only if its characteristic polynomial can be completely factored into linear factors over K. In fact, A is even similar to a matrix in Jordan normal form in this case.

Characteristic polynomial of a product of two matrices

If A and B are two square n×n matrices, then the characteristic polynomials of AB and BA coincide:

p_{AB}(t)=p_{BA}(t).\,

More generally, if A is an m×n matrix and B is an n×m matrix with m < n, then AB is an m×m matrix and BA is an n×n matrix. One has

 p_{BA}(t) = t^{n-m} p_{AB}(t).\,

To prove the first result, recognize that the equation to be proved, as a polynomial in t and in the entries of A and B, is a universal polynomial identity. It therefore suffices to check it on an open set of parameter values in the complex numbers. The tuples (A, B, t), where A is an invertible complex n×n matrix, B is any complex n×n matrix, and t is any complex number, form such an open set in complex space of dimension 2n^2 + 1. When A is non-singular, the result follows from the fact that AB and BA are similar:

BA = A^{-1} (AB) A.\,
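
Both identities can be illustrated numerically. The following is a small sketch using NumPy with random test matrices (the library, the seed, and the chosen dimensions m = 2, n = 3 are assumptions, not part of the article):

    import numpy as np

    rng = np.random.default_rng(0)

    # Square case: AB and BA have the same characteristic polynomial.
    A = rng.standard_normal((3, 3))
    B = rng.standard_normal((3, 3))
    print(np.allclose(np.poly(A @ B), np.poly(B @ A)))   # True

    # Rectangular case with m = 2, n = 3: p_BA(t) = t^(n-m) p_AB(t), so the
    # coefficient list of p_BA is that of p_AB with n - m zeros appended.
    A = rng.standard_normal((2, 3))
    B = rng.standard_normal((3, 2))
    print(np.allclose(np.poly(B @ A), np.append(np.poly(A @ B), [0.0])))   # True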

Types

Characteristic equation

In linear algebra, the characteristic equation (or secular equation) of a square matrix A is the equation in one variable λ

\det(A - \lambda I) = 0 \,

where det is the determinant and I is the identity matrix. The solutions of the characteristic equation are precisely the eigenvalues of the matrix A. The polynomial which results from evaluating the determinant is the characteristic polynomial of the matrix.

For example, the matrix

P = \begin{bmatrix} 19 & 3 \\ -2 & 26 \end{bmatrix}

has characteristic equation

\begin{align}
 0 &{}= \det(P - \lambda I) \\
   &{}= \det\begin{bmatrix} 19-\lambda & 3 \\ -2 & 26-\lambda \end{bmatrix} \\
   &{}= 500-45\lambda+\lambda^2 \\
   &{}= (25-\lambda)(20-\lambda) .
\end{align}

The eigenvalues of this matrix are therefore 20 and 25.
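
A quick numerical check of this example, using NumPy (an assumption for illustration; not part of the article):

    import numpy as np

    P = np.array([[19.0, 3.0],
                  [-2.0, 26.0]])

    # Coefficients of det(lambda*I - P) = lambda^2 - 45*lambda + 500,
    # and its roots, which are the eigenvalues of P.
    print(np.poly(P))                        # [   1.  -45.  500.]
    print(np.sort(np.linalg.eigvals(P)))     # [20. 25.]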

Some shortcuts exist for low-dimensional matrices. For a 2×2 matrix A, the characteristic polynomial can be found from its determinant and trace, tr(A), to be

\det(A)-{\operatorname{tr}}(A)\lambda+\lambda^2.

For a 3×3 matrix, we define c2 as the sum of the 2×2 principal minors of the matrix, and find the characteristic polynomial to be

\det(A)-c_2\lambda+{\operatorname{tr}}(A)\lambda^2-\lambda^3.
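
A hedged numerical check of the 3×3 shortcut, using NumPy and an arbitrary example matrix (both assumptions, not from the article). Note that np.poly returns the coefficients of det(λI − A) = λ^3 − tr(A)λ^2 + c2 λ − det(A), the negative of the expression above:

    import numpy as np

    A = np.array([[1.0, 2.0, 3.0],
                  [4.0, 5.0, 6.0],
                  [7.0, 8.0, 10.0]])

    def principal_minor_sum(M):
        """Sum of the 2x2 principal minors (delete row i and column i)."""
        n = M.shape[0]
        total = 0.0
        for i in range(n):
            idx = [j for j in range(n) if j != i]
            total += np.linalg.det(M[np.ix_(idx, idx)])
        return total

    c2 = principal_minor_sum(A)
    print(np.poly(A))                                      # [1, -tr(A), c2, -det(A)]
    print([1.0, -np.trace(A), c2, -np.linalg.det(A)])      # should agree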

The Cayley–Hamilton theorem states that every square matrix satisfies its own characteristic equation.

Discrete mathematics

In discrete mathematics, the characteristic equation is used when solving recurrence problems. One can specify a recurrence relation of the form

t_{n} = At_{n-1} + Bt_{n-2} \,\!

where the value of tn is dependent on the values of tn−1 and tn−2. When solving a recurrence relation, the goal is to eliminate this dependency and derive an equation of the form

t_{n} = c_{1}{r_{1}}^n + c_{2}{r_{2}}^n , \,\!

where c1 and c2 are constants and r1 and r2 are the (assumed distinct) roots of the characteristic equation

r^2 - Ar - B = 0 , \,\!

where A and B are the constants defined in the original recurrence relation.
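
As a hedged sketch of this procedure, take A = B = 1 with initial values t0 = 0 and t1 = 1 (the Fibonacci numbers; these particular constants are an illustrative assumption, not part of the article). The roots of r^2 − r − 1 = 0 together with the initial conditions determine c1 and c2:

    import numpy as np

    # Roots of the characteristic equation r^2 - A r - B = 0 with A = B = 1.
    # Both roots are real here, so we can safely drop any zero imaginary part.
    r1, r2 = np.roots([1.0, -1.0, -1.0]).real

    # Solve c1 + c2 = t_0 and c1*r1 + c2*r2 = t_1 for the constants.
    c1, c2 = np.linalg.solve(np.array([[1.0, 1.0], [r1, r2]]),
                             np.array([0.0, 1.0]))

    def t(n):
        return c1 * r1**n + c2 * r2**n

    print([int(round(t(n))) for n in range(10)])   # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]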

Secular function

The term secular function has been used for what mathematicians now call a characteristic function of a linear operator (in some literature the term secular function is still used). The term comes from the fact that these functions were used to calculate secular perturbations (on a time scale of a century, i.e. slow compared to annual motion) of planetary orbits, according to Lagrange's theory of oscillations.

In linear algebra, zeros of a secular function are the eigenvalues of a matrix. Characteristic polynomials also have eigenvalues as roots.

The characteristic polynomial is defined by the determinant of a shifted matrix; it has only zeros and no poles. Commonly, the term secular function is used to mean the characteristic polynomial, but in the strict sense the secular function has poles as well. These poles are located at the eigenvalues of its sub-matrices, so if information about the sub-matrices is available, the eigenvalues of the full matrix can be described in terms of it. Furthermore, by partitioning the matrix (as in matrix tearing or gluing), the eigenvalues can be computed recursively. Depending on the method of partitioning, different forms of the secular function arise, but they all take the form of a sum of simple rational functions whose poles are the eigenvalues of the partitioned matrices. For example, a secular function of this kind appears in the divide-and-conquer eigenvalue algorithm.

More recently, the secular function has been used in signal processing. Estimation problems with uncertainty lead to eigenvalue-type problems, as in bounded data uncertainty, total least squares, data least squares, partial least squares, and errors-in-variables models. Many of these cases have been solved using problem-specific secular equations, and work continues on finding a single secular equation that resolves a given uncertainty estimation problem.

On the numerical side, Newton's method is known to be delicate when finding the roots of a secular equation, and higher-order interpolation methods are recommended instead. Among them, simple rational approximation is a good choice, balancing stability against computational cost, since the secular equation itself is a sum of simple rational functions. Interpolation alone, however, cannot guarantee stability, so safeguarded searches such as bisection steps are still required for accuracy.
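
As an illustration of these points, here is a minimal sketch of safeguarded root-finding for one concrete secular function. The specific form below, f(x) = 1 + Σ z_i^2/(d_i − x), is an assumed example in the divide-and-conquer style (it is not a formula taken from this article), and the data values are arbitrary. Its roots interlace the poles d_i, so bisection between consecutive poles is robust:

    import numpy as np

    d = np.array([1.0, 2.0, 3.0])    # poles: eigenvalues of the sub-matrices
    z = np.array([0.5, 0.4, 0.3])    # coupling weights (arbitrary example data)

    def f(x):
        # An assumed secular function of divide-and-conquer type.
        return 1.0 + np.sum(z**2 / (d - x))

    def bisect_root(lo, hi, tol=1e-12):
        """Bisection between two points where f changes sign."""
        flo = f(lo)
        for _ in range(200):
            mid = 0.5 * (lo + hi)
            if f(mid) * flo > 0:
                lo, flo = mid, f(mid)
            else:
                hi = mid
            if hi - lo < tol:
                break
        return 0.5 * (lo + hi)

    # One root lies strictly inside each interval (d_i, d_{i+1}); the final
    # root above d_n is omitted here for brevity.
    eps = 1e-9
    print([bisect_root(d[i] + eps, d[i + 1] - eps) for i in range(len(d) - 1)])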

Secular equation

The term secular equation has several meanings.

In mathematics and numerical analysis, it means the characteristic equation. See also characteristic polynomial.

In another usage in numerical analysis, it refers to an extension of the characteristic equation. See also secular function.

In astronomy, it is the algebraic or numerical expression of the magnitude of the inequalities in a planet's motion that remain after the inequalities of a short period have been allowed for.[1]

In molecular orbital calculations relating to the energy of the electron and its wave function, it is also used instead of the characteristic equation.

See also