Canonical analysis

From Wikipedia, the free encyclopedia

In statistics, canonical analysis (from Gk. κανων: bar, measuring rod, ruler) belongs to the family of regression methods for data analysis. Regression analysis quantifies a relationship between a predictor variable and a criterion variable by the coefficient of correlation r, the coefficient of determination r², and the standard regression coefficient β. Multiple regression analysis expresses a relationship between a set of predictor variables and a single criterion variable by the multiple correlation R, the multiple coefficient of determination R², and a set of standard partial regression weights β1, β2, etc. Canonical variate analysis captures a relationship between a set of predictor variables and a set of criterion variables by the canonical correlations ρ1, ρ2, ..., and by the sets of canonical weights C and D.
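As a rough illustration of the quantities named above, the following Python sketch computes r, r² and the standard coefficient β for a single predictor, and then R, R² and the standard partial weights for two predictors. The data are synthetic and all variable names are assumptions made for this example only; NumPy is assumed to be available.

```python
import numpy as np

# Synthetic data, invented purely for illustration.
rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 2.0 * x + rng.normal(size=200)

# Simple regression: correlation and coefficient of determination.
r = np.corrcoef(x, y)[0, 1]
r_squared = r ** 2

# Standardize both variables (z-scores); the least-squares slope of
# zy on zx is the standard regression coefficient beta.
zx = (x - x.mean()) / x.std(ddof=1)
zy = (y - y.mean()) / y.std(ddof=1)
beta = np.sum(zx * zy) / np.sum(zx ** 2)

# Multiple regression: add a second (here irrelevant) predictor.
P = np.column_stack([x, rng.normal(size=200)])
Zp = (P - P.mean(axis=0)) / P.std(axis=0, ddof=1)
betas, *_ = np.linalg.lstsq(Zp, zy, rcond=None)  # standard partial weights
fitted = Zp @ betas

# Multiple correlation R is the correlation between criterion and fit.
R = np.corrcoef(zy, fitted)[0, 1]
R_squared = 1.0 - np.sum((zy - fitted) ** 2) / np.sum(zy ** 2)

print(r, r_squared, beta)
print(R, R_squared, betas)
```

With a single predictor, the standard coefficient β coincides with r, which is why the two printed values agree.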

Canonical analysis

Canonical analysis belongs to a group of methods that involve solving the characteristic equation for its latent roots and vectors (eigenvalues and eigenvectors). It describes formal structures in hyperspace that are invariant with respect to the rotation of their coordinates. In this type of solution, rotation preserves many optimizing properties, provided it takes place in certain ways and within a subspace of the corresponding hyperspace. This rotation from the maximum intervariate correlation structure into a different, simpler and more meaningful structure increases the interpretability of the canonical weights C and D. In this respect, canonical analysis differs from Harold Hotelling’s (1936) canonical variate analysis (also called canonical correlation analysis), which is designed to obtain maximum (canonical) correlations between the predictor and criterion canonical variates. The difference between canonical variate analysis and canonical analysis is analogous to the difference between principal components analysis and factor analysis, each with its characteristic set of communalities, eigenvalues and eigenvectors.
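For orientation, one common way of writing the characteristic equation in the canonical correlation setting is given here. The covariance-matrix symbols are notation assumed for this illustration and do not appear in the text: Σ_XX and Σ_YY denote the covariance matrices within the predictor and criterion sets, and Σ_XY = Σ_YXᵀ the covariance between them.

```latex
% Latent roots: the squared canonical correlations \rho_i^2
\left| \Sigma_{XX}^{-1}\,\Sigma_{XY}\,\Sigma_{YY}^{-1}\,\Sigma_{YX} - \rho^{2} I \right| = 0

% Latent vectors: the columns c_i of C, with the matching d_i of D
\Sigma_{XX}^{-1}\,\Sigma_{XY}\,\Sigma_{YY}^{-1}\,\Sigma_{YX}\, c_i = \rho_i^{2}\, c_i ,
\qquad
d_i \propto \Sigma_{YY}^{-1}\,\Sigma_{YX}\, c_i
```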

Canonical analysis (simple)

Canonical analysis is a multivariate technique concerned with determining the relationships between groups of variables in a data set. The data set is split into two groups, call them X and Y, based on some common characteristics. The purpose of canonical analysis is then to find the relationship between X and Y, i.e. whether some form of X can represent Y. It works by finding the linear combination of the X variables (X1, X2, etc.) and the linear combination of the Y variables (Y1, Y2, etc.) that are most highly correlated. These two combinations are known as the "first canonical variates", usually denoted U1 and V1, and the pair U1 and V1 is called a "canonical function". The next canonical function, U2 and V2, is then restricted so that it is uncorrelated with U1 and V1. Each canonical variate is scaled so that its variance equals 1.
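The procedure just described can be sketched in Python with NumPy. This is a minimal illustration of the unrotated (Hotelling-style) solution, not a reference implementation: the function name `canonical_correlations`, the whitening-plus-SVD route, and the synthetic data are all assumptions made for this example, and the covariance matrices of X and Y are assumed to be non-singular.

```python
import numpy as np

def canonical_correlations(X, Y):
    """Return canonical correlations and weight matrices C (for X) and D (for Y).

    Hypothetical helper for illustration; rows are observations.
    """
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    n = X.shape[0]

    # Sample covariance blocks.
    Sxx = X.T @ X / (n - 1)
    Syy = Y.T @ Y / (n - 1)
    Sxy = X.T @ Y / (n - 1)

    # Cholesky factors: Sxx = Lx Lx^T, Syy = Ly Ly^T.
    Lx = np.linalg.cholesky(Sxx)
    Ly = np.linalg.cholesky(Syy)

    # Whitened cross-covariance K = Lx^{-1} Sxy Ly^{-T}.
    K = np.linalg.solve(Lx, Sxy)
    K = np.linalg.solve(Ly, K.T).T

    # Singular values of K are the canonical correlations (descending).
    A, rho, Bt = np.linalg.svd(K, full_matrices=False)

    # Map back to weights on the original variables, so that the
    # variates X @ C and Y @ D have unit variance.
    C = np.linalg.solve(Lx.T, A)
    D = np.linalg.solve(Ly.T, Bt.T)
    return rho, C, D

# Synthetic example: one shared latent variable links the two sets.
rng = np.random.default_rng(0)
n = 500
latent = rng.normal(size=(n, 1))
X = np.hstack([latent + rng.normal(size=(n, 1)), rng.normal(size=(n, 2))])
Y = np.hstack([latent + rng.normal(size=(n, 1)), rng.normal(size=(n, 1))])

rho, C, D = canonical_correlations(X, Y)
U = (X - X.mean(axis=0)) @ C  # columns are U1, U2
V = (Y - Y.mean(axis=0)) @ D  # columns are V1, V2
print(rho)
```

In this sketch the first columns of U and V play the role of U1 and V1, and the second columns (U2, V2) come out uncorrelated with the first pair by construction.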

References

  • Cliff, N. and Krus, D. J. (1976) Interpretation of canonical variate analysis: Rotated vs. unrotated solutions. Psychometrika, 41, 1, 35-42.
  • Hotelling, H. (1936) Relations between two sets of variates. Biometrika, 28, 321-377.
  • Krus, D.J., et al. (1976) Rotation in canonical analysis. Educational and Psychological Measurement, 36, 725-730.
  • Liang, K.H., Krus, D.J., & Webb, J.M. (1995) K-fold crossvalidation in canonical analysis. Multivariate Behavioral Research, 30, 539-545.