Consensus CDS Project

CCDS Project
Description: Consensus of protein-coding regions
Research centers: National Center for Biotechnology Information; European Bioinformatics Institute; University of California, Santa Cruz; Wellcome Trust Sanger Institute
Authors: Pruitt KD
Primary citation: Pruitt et al.[1]
Release date: 2009
Website: http://www.ncbi.nlm.nih.gov/projects/CCDS/CcdsBrowse.cgi

The Consensus Coding Sequence (CCDS) Project is a collaboration between the National Center for Biotechnology Information (NCBI), the European Bioinformatics Institute (EBI), the University of California, Santa Cruz (UCSC) and the Wellcome Trust Sanger Institute to agree upon a consistent set of protein-coding genes for human and mouse for public use.[1] The CCDS gene sets are arrived at by consensus among the partners[2] and consist of over 17,000 human and over 16,800 mouse genes.

The CCDS set is calculated following coordinated whole genome annotation updates carried out by the NCBI and Ensembl. Annotation updates represent genes that are defined by a mixture of manual curation and automated computational processing.

The general process flow for defining the CCDS gene set includes:

  1. Compare genome annotation results.
  2. Identify annotated coding regions that have identical location coordinates on the genome.
  3. Perform quality evaluation.
  4. Remove lower-quality CDSs from the core set pending additional review among the collaboration groups.
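The comparison and filtering steps described above can be sketched in Python. This is only an illustration under stated assumptions, not the project's actual pipeline: the `CDS` record, the `consensus_candidates` function, the `passes_qc` hook, and the sample coordinates are all hypothetical.

```python
from typing import Callable, Iterable, NamedTuple, Set, Tuple


class CDS(NamedTuple):
    """Hypothetical record for one annotated coding region."""
    chrom: str
    strand: str
    exons: Tuple[Tuple[int, int], ...]  # (start, end) of each coding segment


def consensus_candidates(
    set_a: Iterable[CDS],
    set_b: Iterable[CDS],
    passes_qc: Callable[[CDS], bool] = lambda cds: True,
) -> Set[CDS]:
    """Keep CDSs with identical coordinates in both sets that also pass QC."""
    # Identical chromosome, strand and exon coordinates compare equal,
    # so a set intersection finds the regions both annotations agree on.
    shared = set(set_a) & set(set_b)
    # Lower-quality CDSs are dropped (in the real project, pending review).
    return {cds for cds in shared if passes_qc(cds)}


# Made-up coordinates: the chr2 entries differ by a few bases at the 3' end.
annotation_a = [
    CDS("chr1", "+", ((100, 200), (300, 450))),
    CDS("chr2", "-", ((50, 120),)),
]
annotation_b = [
    CDS("chr1", "+", ((100, 200), (300, 450))),
    CDS("chr2", "-", ((50, 125),)),
]

shared = consensus_candidates(annotation_a, annotation_b)
```

Here only the chr1 region is retained, because the two chr2 annotations disagree on the end coordinate.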

The CCDS set includes coding regions that are annotated as full-length (with an initiating ATG and a valid stop codon), can be translated from the genome without frameshifts, and use consensus splice sites. The number and type of quality tests performed may be expanded in the future, but they include analyses to identify putative pseudogenes, retrotransposed genes, consensus splice sites, supporting transcripts, and protein homology.
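As a rough illustration of the structural criteria above, the following Python sketch checks a coding sequence for an initiating ATG, a terminal stop codon, and an uninterrupted reading frame, and checks introns for splice-site dinucleotides. The function names are hypothetical, and two assumptions are made that the text does not state: the stop codons are those of the standard genetic code, and "consensus splice sites" is taken to mean the canonical GT...AG intron boundaries.

```python
# Assumption: stop codons of the standard genetic code.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_full_length_cds(seq: str) -> bool:
    """True if seq starts with ATG, ends with a stop, and has no internal stop."""
    seq = seq.upper()
    # A length that is not a multiple of 3 cannot be translated in one frame.
    if len(seq) < 6 or len(seq) % 3 != 0:
        return False
    codons = [seq[i:i + 3] for i in range(0, len(seq), 3)]
    if codons[0] != "ATG" or codons[-1] not in STOP_CODONS:
        return False
    # A stop codon before the last codon would truncate the protein.
    return not any(codon in STOP_CODONS for codon in codons[1:-1])


def uses_consensus_splice_sites(introns) -> bool:
    """True if every intron (transcribed strand) starts with GT and ends with AG."""
    # Assumption: only the canonical GT...AG boundaries count as consensus.
    return all(
        intron.upper().startswith("GT") and intron.upper().endswith("AG")
        for intron in introns
    )


ok = is_full_length_cds("ATGAAATAA")                # ATG, AAA, TAA
internal_stop = is_full_length_cds("ATGTAAAAATAA")  # stop in second codon
```

The other tests mentioned (pseudogene and retrotransposed-gene detection, transcript support, protein homology) need external evidence and are not covered by this sketch.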

References

  1. Pruitt KD, Harrow J, Harte RA, Wallin C, Diekhans M, Maglott DR, Searle S, Farrell CM, Loveland JE, Ruef BJ, Hart E, Suner MM, Landrum MJ, Aken B, Ayling S, Baertsch R, Fernandez-Banet J, Cherry JL, Curwen V, Dicuccio M, Kellis M, Lee J, Lin MF, Schuster M, Shkeda A, Amid C, Brown G, Dukhanina O, Frankish A, Hart J, Maidak BL, Mudge J, Murphy MR, Murphy T, Rajan J, Rajput B, Riddick LD, Snow C, Steward C, Webb D, Weber JA, Wilming L, Wu W, Birney E, Haussler D, Hubbard T, Ostell J, Durbin R, Lipman D (2009). "The consensus coding sequence (CCDS) project: Identifying a common protein-coding gene set for the human and mouse genomes". Genome Res 19 (7): 1316–23. doi:10.1101/gr.080531.108. PMC 2704439. PMID 19498102.
  2. Harte RA, Farrell CM, Loveland JE, Suner MM, Wilming L, Aken B, Barrell D, Frankish A, Wallin C, Searle S, Diekhans M, Harrow J, Pruitt KD (2012). "Tracking and coordinating an international curation effort for the CCDS Project". Database 2012: bas008. doi:10.1093/database/bas008. PMC 3308164. PMID 22434842.
