Representative sequences
From Wikipedia, the free encyclopedia
Protein sequences can provide data about the biological function and evolution of proteins and protein domains. Grouping and interrelating protein sequences can therefore provide information about both human biological processes and the historical development of biological processes on Earth.
Such groups of related sequences, called sequence clusters, allow sequence space to be covered effectively.
Clustering can reduce a large database of sequences to a smaller set of "sequence representatives", each of which should represent its cluster at the sequence level.
Sequence representatives allow the original database to be covered effectively with fewer sequences. The database of sequence representatives is called "non-redundant" because similar (redundant) sequences have been removed at a chosen similarity threshold.
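The reduction to representatives can be illustrated with a minimal sketch. It assumes a simple greedy scheme (longest sequences first, each sequence joining the first representative it resembles closely enough), which is one common way to build such a set and not necessarily the method used by any particular database. The function names, the 0.9 threshold, and the example sequences are hypothetical, and Python's difflib ratio stands in for a real alignment-based identity score.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Crude stand-in for sequence identity (an assumption of this sketch).

    Returns difflib's similarity ratio in [0, 1]; a real pipeline would
    use an alignment-based identity measure instead.
    """
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def pick_representatives(sequences, threshold=0.9):
    """Greedily cluster sequences and return {representative: members}.

    Sequences are visited longest first. Each one joins the first existing
    representative whose similarity is at least `threshold`; otherwise it
    becomes the representative of a new cluster.
    """
    clusters = {}
    for seq in sorted(sequences, key=len, reverse=True):
        for rep in clusters:
            if similarity(rep, seq) >= threshold:
                clusters[rep].append(seq)
                break
        else:
            # No representative was similar enough: start a new cluster.
            clusters[seq] = [seq]
    return clusters


# Hypothetical toy database: two near-identical sequences and one unrelated.
example = [
    "MKTAYIAKQRQISFVKSHFSRQ",
    "MKTAYIAKQRQISFVKSHFSRA",
    "GGGGGGGGGGGGGGGGGGGGGG",
]
clusters = pick_representatives(example, threshold=0.9)
non_redundant = list(clusters)  # the keys are the sequence representatives
```

Lowering the threshold merges more sequences into each cluster and yields a smaller non-redundant set; raising it keeps more representatives.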