Degenerate bases

From Wikipedia, the free encyclopedia

Degenerate base symbols in biochemistry are an IUPAC [1] representation for a position on a DNA sequence that can have multiple possible alternatives. They should not be confused with non-canonical bases: each particular sequence will in fact have one of the regular bases at that position. Degenerate symbols are used to encode the consensus sequence of a population of aligned sequences. For example, they are used in phylogenetic analysis to summarise multiple sequences into one, and in BLAST searches, even though BLAST masks IUPAC degenerate symbols because they do not code for a single base.
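
As a rough illustration of how a consensus sequence can be encoded, the sketch below walks through equal-length aligned DNA sequences one column at a time. For each column it emits the IUPAC symbol for the set of bases observed there. This is a minimal sketch that assumes ungapped input made only of A, C, G and T, and it applies no frequency threshold: every observed base counts. The names `consensus` and `IUPAC_FROM_SET` are hypothetical and do not come from any library.

```python
# Hypothetical helper; the symbol assignments follow the IUPAC table in this article.
IUPAC_FROM_SET = {
    frozenset("A"): "A", frozenset("C"): "C",
    frozenset("G"): "G", frozenset("T"): "T",
    frozenset("AT"): "W", frozenset("CG"): "S",
    frozenset("AC"): "M", frozenset("GT"): "K",
    frozenset("AG"): "R", frozenset("CT"): "Y",
    frozenset("CGT"): "B", frozenset("AGT"): "D",
    frozenset("ACT"): "H", frozenset("ACG"): "V",
    frozenset("ACGT"): "N",
}


def consensus(aligned: list[str]) -> str:
    """Return an IUPAC consensus for equal-length, ungapped DNA sequences."""
    if not aligned:
        raise ValueError("need at least one sequence")
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("sequences must all have the same aligned length")
    symbols = []
    # zip(*...) yields one tuple per alignment column.
    for i, column in enumerate(zip(*(s.upper() for s in aligned))):
        bases = frozenset(column)
        if not bases <= frozenset("ACGT"):
            raise ValueError(f"column {i} contains a symbol other than A, C, G or T")
        symbols.append(IUPAC_FROM_SET[bases])
    return "".join(symbols)


print(consensus(["ACGT", "ACGA", "GCGT"]))  # RCGW
```

The first column contains A and G, so it becomes R (purine). The last column contains T and A, so it becomes W (weak).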

Symbol [1]   Description            Bases represented
A            adenine                A
C            cytosine               C
G            guanine                G
T            thymine                T
U            uracil                 U
W            weak                   A T
S            strong                 C G
M            amino                  A C
K            keto                   G T
R            purine                 A G
Y            pyrimidine             C T
B            not A                  C G T
D            not C                  A G T
H            not G                  A C T
V            not T                  A C G
N            any base (not a gap)   A C G T
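
The table is effectively a lookup from each symbol to a set of bases. The sketch below uses that lookup for three common tasks: checking whether a concrete sequence is one of those a degenerate sequence (for example a primer) stands for, counting how many concrete sequences it represents, and listing them. The names `IUPAC_BASES`, `matches`, `degeneracy` and `expand` are hypothetical. The sketch assumes DNA input, and U maps only to itself exactly as it does in the table.

```python
from itertools import product
from math import prod

# Transcribed from the table: symbol -> the bases it stands for.
IUPAC_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T", "U": "U",
    "W": "AT", "S": "CG", "M": "AC", "K": "GT",
    "R": "AG", "Y": "CT",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG",
    "N": "ACGT",
}


def matches(pattern: str, sequence: str) -> bool:
    """True if `sequence` is one of the concrete sequences `pattern` encodes."""
    if len(pattern) != len(sequence):
        return False
    # Unknown pattern symbols raise KeyError; a gap in `sequence` never matches.
    return all(base in IUPAC_BASES[symbol]
               for symbol, base in zip(pattern.upper(), sequence.upper()))


def degeneracy(pattern: str) -> int:
    """Number of concrete sequences the degenerate pattern represents."""
    return prod(len(IUPAC_BASES[symbol]) for symbol in pattern.upper())


def expand(pattern: str) -> list[str]:
    """All concrete sequences the pattern represents (grows as up to 4**len)."""
    choices = (IUPAC_BASES[symbol] for symbol in pattern.upper())
    return ["".join(bases) for bases in product(*choices)]


print(degeneracy("ARY"))      # 4
print(expand("ARY"))          # ['AAC', 'AAT', 'AGC', 'AGT']
print(matches("ARY", "AGT"))  # True
```

The degeneracy is the product of the set sizes, so a pattern with many N positions grows quickly. For that reason, `matches` checks one position at a time instead of calling `expand`.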

References

  1. Nomenclature Committee of the International Union of Biochemistry (NC-IUB). Nomenclature for Incompletely Specified Bases in Nucleic Acid Sequences. Retrieved on 2008-02-04.