Centering matrix

From Wikipedia, the free encyclopedia

In mathematics and multivariate statistics, the centering matrix[1] is a symmetric and idempotent matrix, which when multiplied with a vector has the same effect as subtracting the mean of the components of the vector from every component.

Definition

The centering matrix of size n is defined as the n-by-n matrix

C_n = I_n - \frac{1}{n}\mathbf{1}\mathbf{1}'

where I_n\, is the identity matrix of size n, \mathbf{1} is the column vector of n ones, and {\,}' denotes the matrix transpose. For example,

C_1 = \begin{bmatrix} 0 \end{bmatrix}
,\ 
C_2 = \left[ \begin{array}{rr}
\frac{1}{2} & -\frac{1}{2} \\
-\frac{1}{2} & \frac{1}{2}
\end{array} \right]
,\ 
C_3 = \left[ \begin{array}{rrr}
\frac{2}{3} & -\frac{1}{3} & -\frac{1}{3} \\
-\frac{1}{3} & \frac{2}{3} & -\frac{1}{3} \\
-\frac{1}{3} & -\frac{1}{3} & \frac{2}{3}
\end{array} \right]
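
As a quick check of the definition, the following minimal sketch builds C_n entry by entry. The helper name centering_matrix is hypothetical, and the choice of exact rational arithmetic from Python's standard library (rather than floating point) is an assumption made so that the entries print exactly as in the examples above.

```python
from fractions import Fraction


def centering_matrix(n):
    """Return C_n = I_n - (1/n) 1 1' as a list of rows of exact Fractions."""
    off_diagonal = Fraction(-1, n)   # every entry of -(1/n) 1 1'
    diagonal = 1 + off_diagonal      # 1 - 1/n on the main diagonal
    return [[diagonal if i == j else off_diagonal for j in range(n)]
            for i in range(n)]


# Print C_3 row by row; the rows should match the example matrix C_3.
for row in centering_matrix(3):
    print([str(entry) for entry in row])
```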

Properties

Given a column vector \mathbf{v}\, of size n, the centering property of C_n\, can be expressed as

C_n\,\mathbf{v} = \mathbf{v}-(\frac{1}{n}\mathbf{1}'\mathbf{v})\mathbf{1}

where \frac{1}{n}\mathbf{1}'\mathbf{v} is the mean of the components of \mathbf{v}\,.
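
The centering property can be illustrated numerically. The sketch below is only an illustration: the helper names centering_matrix and center and the sample vector are hypothetical. It multiplies C_n by a vector and compares the result with direct subtraction of the mean.

```python
from fractions import Fraction


def centering_matrix(n):
    """Return C_n = I_n - (1/n) 1 1' as a list of rows of exact Fractions."""
    off_diagonal = Fraction(-1, n)
    diagonal = 1 + off_diagonal
    return [[diagonal if i == j else off_diagonal for j in range(n)]
            for i in range(n)]


def center(v):
    """Compute C_n v by explicit matrix-vector multiplication."""
    c = centering_matrix(len(v))
    return [sum(c_ij * v_j for c_ij, v_j in zip(row, v)) for row in c]


v = [2, 4, 9]                               # illustrative vector, mean 5
mean = Fraction(sum(v), len(v))             # (1/n) 1' v
assert center(v) == [x - mean for x in v]   # C_n v = v - mean * 1
print([int(x) for x in center(v)])          # [-3, -1, 4]
```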

C_n\, is symmetric positive semi-definite.

C_n\, is idempotent, so that C_n^k=C_n for k=1,2,\ldots. Once the mean has been removed, the mean of the result is zero, so removing it again has no effect.
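
Idempotence can also be checked algebraically from the definition. The following short derivation is a sketch that uses only the fact that \mathbf{1}'\mathbf{1} = n:

C_n^2 = \left(I_n - \frac{1}{n}\mathbf{1}\mathbf{1}'\right)^2 = I_n - \frac{2}{n}\mathbf{1}\mathbf{1}' + \frac{1}{n^2}\mathbf{1}(\mathbf{1}'\mathbf{1})\mathbf{1}' = I_n - \frac{2}{n}\mathbf{1}\mathbf{1}' + \frac{1}{n}\mathbf{1}\mathbf{1}' = C_n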

C_n\, is singular. The effects of applying the transformation C_n\,\mathbf{v} cannot be reversed.

C_n\, has the eigenvalue 1 of multiplicity n − 1 and 0 of multiplicity 1.

C_n\, has a nullspace of dimension 1, along the vector \mathbf{1}.

C_n\, is a projection matrix. That is, C_n\mathbf{v} is the orthogonal projection of \mathbf{v}\, onto the (n − 1)-dimensional subspace that is orthogonal to the nullspace, which is spanned by \mathbf{1}. (This is the subspace of all n-vectors whose components sum to zero.)
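
Several of the properties listed above can be spot-checked for a particular n. The sketch below is a numerical check for n = 4, not a proof; the helper names (centering_matrix, matmul, transpose) and the test vector are hypothetical. The trace check relies on the standard fact that the trace of an idempotent matrix equals its rank.

```python
from fractions import Fraction


def centering_matrix(n):
    """Return C_n = I_n - (1/n) 1 1' as a list of rows of exact Fractions."""
    off_diagonal = Fraction(-1, n)
    diagonal = 1 + off_diagonal
    return [[diagonal if i == j else off_diagonal for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(a):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]


n = 4
c = centering_matrix(n)
ones = [[1] for _ in range(n)]            # column vector of n ones
z = [[3], [-1], [-2], [0]]                # components sum to zero

assert transpose(c) == c                  # symmetric
assert matmul(c, c) == c                  # idempotent
assert matmul(c, ones) == [[0]] * n       # the vector of ones is in the nullspace
assert matmul(c, z) == z                  # zero-sum vectors are left unchanged
assert sum(c[i][i] for i in range(n)) == n - 1   # trace = rank = n - 1
```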

Application

Although multiplication by the centering matrix is not a computationally efficient way of removing the mean from a vector (the matrix-vector product takes on the order of n^2 operations, whereas computing and subtracting the mean directly takes on the order of n), it is an analytical tool that conveniently and succinctly expresses mean removal. It can be used to remove the mean not only of a single vector, but also of multiple vectors stored in the rows or columns of a matrix. For an m-by-n matrix X\,, the multiplication C_m\,X removes the means from each of the n columns, while X\,C_n removes the means from each of the m rows.
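
The difference between left and right multiplication can be seen on a small example. In this sketch the 2-by-3 matrix X and the helper names centering_matrix and matmul are hypothetical, chosen only for illustration.

```python
from fractions import Fraction


def centering_matrix(n):
    """Return C_n = I_n - (1/n) 1 1' as a list of rows of exact Fractions."""
    off_diagonal = Fraction(-1, n)
    diagonal = 1 + off_diagonal
    return [[diagonal if i == j else off_diagonal for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


X = [[1, 2, 3],
     [4, 5, 9]]                       # m = 2 rows, n = 3 columns
m, n = len(X), len(X[0])

col_centered = matmul(centering_matrix(m), X)   # C_m X
row_centered = matmul(X, centering_matrix(n))   # X C_n

# After left multiplication every column sums to zero ...
assert all(sum(col) == 0 for col in zip(*col_centered))
# ... and after right multiplication every row sums to zero.
assert all(sum(row) == 0 for row in row_centered)
print([[int(x) for x in row] for row in row_centered])  # [[-1, 0, 1], [-2, -1, 3]]
```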

In particular, the centering matrix provides a succinct way to express the scatter matrix S=(X-\mu\mathbf{1}')(X-\mu\mathbf{1}')' of a data sample X\, whose n columns are the observations, where \mu=\tfrac{1}{n}X\mathbf{1} is the sample mean. Because X\,C_n = X-\mu\mathbf{1}', the scatter matrix can be written more compactly as

S=X\,C_n(X\,C_n)'=X\,C_n\,C_n\,X\,'=X\,C_n\,X\,'.
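
The chain of equalities uses the symmetry and idempotence of C_n. As an illustrative check, the sketch below computes the scatter matrix both from the mean-subtracted data and from the compact form X C_n X'. The data matrix (two variables, three observations stored as columns) and the helper names are hypothetical.

```python
from fractions import Fraction


def centering_matrix(n):
    """Return C_n = I_n - (1/n) 1 1' as a list of rows of exact Fractions."""
    off_diagonal = Fraction(-1, n)
    diagonal = 1 + off_diagonal
    return [[diagonal if i == j else off_diagonal for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def transpose(a):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*a)]


X = [[1, 2, 6],
     [0, 4, 2]]                            # each column is one observation
n = len(X[0])

# Direct form: subtract the sample mean mu = (1/n) X 1 from every column.
mu = [Fraction(sum(row), n) for row in X]
deviations = [[x - mu_i for x in row] for row, mu_i in zip(X, mu)]  # X - mu 1'
S_direct = matmul(deviations, transpose(deviations))

# Compact form: S = X C_n X'.
S_compact = matmul(matmul(X, centering_matrix(n)), transpose(X))

assert S_direct == S_compact
print([[int(x) for x in row] for row in S_compact])  # [[14, 2], [2, 8]]
```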

References

  1. John I. Marden, Analyzing and Modeling Rank Data, Chapman & Hall, 1995, ISBN 0412995212, page 59.