Chou's invariance theorem
Chou's invariance theorem, named after Kuo-Chen Chou, is a result in multivariate statistics that is used in bioinformatics and cheminformatics. When a distance that would ordinarily be defined as a Mahalanobis distance cannot be computed because the relevant covariance matrix is singular, one remedy is to reduce the dimension of the multivariate space until the covariance matrix becomes invertible. This can be achieved simply by omitting one or more of the original coordinates until a space of full rank is reached. Chou's invariance theorem states that it does not matter which coordinates are removed: the same distance values are obtained in every case.
Background
When the Mahalanobis distance or the covariant discriminant[1] is used to calculate the similarity of two proteins from their amino acid compositions, the normalization condition imposed on the 20 constituent components makes the covariance matrix singular, which causes the calculation to diverge. To avoid this divergence problem, a dimension-reducing operation is needed: one of the 20 components is left out, so that the remaining 19 components are no longer linearly dependent. But which of the 20 components should be removed, and does the result change if a different component is removed? The same questions arise when the calculation is based on the (20 + λ)-dimensional pseudo amino acid composition, where λ is an integer. More generally, calculating the Mahalanobis distance or covariant discriminant between two vectors that each have Ω normalized components always requires the dimension-reducing operation, and so always raises these questions. Chou's invariance theorem was developed in 1995 to address them.
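The origin of the divergence can be written out in a few lines. The notation below is an assumption made for illustration and is not taken from the cited papers: x_i denotes the i-th normalized component, x̄ the mean vector, and C the Ω × Ω covariance matrix of the components.

```latex
% Normalization forces every row of the covariance matrix to sum to zero.
\sum_{j=1}^{\Omega} x_j = 1
\;\Longrightarrow\;
\sum_{j=1}^{\Omega} C_{ij}
  = \operatorname{Cov}\!\Bigl(x_i,\ \sum_{j=1}^{\Omega} x_j\Bigr)
  = \operatorname{Cov}(x_i,\, 1) = 0
\qquad (i = 1, \dots, \Omega)

% Hence C has the all-ones vector in its null space and cannot be inverted,
% so the usual Mahalanobis distance is undefined in the full space.
C\,\mathbf{1} = \mathbf{0}
\;\Longrightarrow\;
\det C = 0,
\qquad
D^2 = (\mathbf{x} - \bar{\mathbf{x}})^{\mathsf T}\, C^{-1}\, (\mathbf{x} - \bar{\mathbf{x}})
\ \text{is not defined.}
```

Dropping any single component removes this linear dependence, which is why a 19-dimensional (or, in general, (Ω − 1)-dimensional) space is used.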
Essence
According to Chou's invariance theorem, the value of the Mahalanobis distance or covariant discriminant remains the same regardless of which one of the components is left out. Any one of the normalized components can therefore be omitted to overcome the divergence problem without changing the final result.
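The statement can be checked numerically. The following is a minimal sketch rather than the procedure used in the cited papers: it assumes NumPy, draws synthetic "compositions" from a Dirichlet distribution so that every vector sums to 1, and uses a hypothetical helper, reduced_mahalanobis_sq, introduced here only for illustration.

```python
import numpy as np


def reduced_mahalanobis_sq(x, samples, drop):
    """Squared Mahalanobis distance from x to the sample mean,
    computed after leaving out the component with index `drop`."""
    keep = [i for i in range(samples.shape[1]) if i != drop]
    reduced = samples[:, keep]
    diff = x[keep] - reduced.mean(axis=0)
    cov = np.cov(reduced, rowvar=False)  # (omega - 1) x (omega - 1), invertible
    return float(diff @ np.linalg.solve(cov, diff))


# Synthetic data (an assumption for illustration): 200 vectors with
# omega = 20 non-negative components that sum to 1, plus one query vector.
rng = np.random.default_rng(0)
omega = 20
samples = rng.dirichlet(np.ones(omega), size=200)
x = rng.dirichlet(np.ones(omega))

# The full covariance matrix is singular: each of its rows sums to zero.
full_cov = np.cov(samples, rowvar=False)
print("largest |row sum| of full covariance:",
      np.abs(full_cov @ np.ones(omega)).max())

# Leave out each component in turn and compare the resulting distances.
distances = np.array(
    [reduced_mahalanobis_sq(x, samples, k) for k in range(omega)]
)
print("spread of the 20 reduced distances:",
      distances.max() - distances.min())
```

Under these assumptions the 20 distances should agree up to floating-point rounding, whichever component is dropped.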
Proof
A rigorous mathematical proof of the theorem is given in the appendix of a paper by Chou.[2]
Applications
The theorem has been used in predicting protein subcellular localization,[3] identifying the subcellular location of apoptosis proteins,[4] predicting protein structural classes,[5][6] and identifying various other attributes of proteins.
References
1. Chou KC, Elrod DW (February 1999). "Protein subcellular location prediction". Protein Eng. 12 (2): 107–18. PMID 10195282. doi:10.1093/protein/12.2.107.
2. Chou KC (April 1995). "A novel approach to predicting protein structural classes in a (20-1)-D amino acid composition space". Proteins. 21 (4): 319–44. PMID 7567954. doi:10.1002/prot.340210406.
3. Pan YX, Zhang ZZ, Guo ZM, Feng GY, Huang ZD, He L (May 2003). "Application of pseudo amino acid composition for predicting protein subcellular location: stochastic signal processing approach". J. Protein Chem. 22 (4): 395–402. PMID 13678304. doi:10.1023/A:1025350409648.
4. Zhou GP, Doctor K (January 2003). "Subcellular location prediction of apoptosis proteins". Proteins. 50 (1): 44–8. PMID 12471598. doi:10.1002/prot.10251.
5. Zhou GP (November 1998). "An intriguing controversy over protein structural class prediction". J. Protein Chem. 17 (8): 729–38. PMID 9988519. doi:10.1023/A:1020713915365.
6. Zhou GP, Assa-Munt N (July 2001). "Some insights into protein structural class prediction". Proteins. 44 (1): 57–9. PMID 11354006. doi:10.1002/prot.1071.