Z curve
The Z curve (or Z-curve) method is a bioinformatics algorithm for genome analysis. The Z curve is a three-dimensional curve that uniquely represents a DNA sequence: the curve and the sequence can each be reconstructed from the other.[1] The resulting curve has a zigzag shape, hence the name. The Z-curve method has been used in many areas of genome research, such as replication origin identification,[2][3][4][5] ab initio gene prediction,[6] isochore identification,[7] genomic island identification,[8] and comparative genomics.[9]
The Z-curve method was first introduced in 1994 as a way to visualize a DNA or RNA sequence. Properties of the Z curve, such as its symmetry and periodicity, can give unique information about the DNA sequence.[10] The Z curve is generated from a series of nodes P0, P1, …, PN with coordinates xn, yn, and zn (n = 0, 1, 2, …, N, where N is the length of the DNA sequence), and is drawn by connecting the nodes sequentially.[11] Information on the distribution of nucleotides in a DNA sequence can be determined from the Z curve. The four nucleotides are grouped pairwise into six categories, each defined by a shared chemical characteristic and designated by a letter.[12]
Purine | R = A, G | Amino | M = A, C | Weak Hydrogen Bonds | W = A, T |
Pyrimidine | Y = C, T | Keto | K = G, T | Strong Hydrogen Bonds | S = G, C |
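The table above can be written as a small lookup, which makes it easy to see that every base belongs to exactly three of the six categories. The following is a minimal sketch; the names `CATEGORIES` and `category_counts` are illustrative and do not come from the cited sources. The one-letter symbols coincide with the standard IUPAC ambiguity codes.

```python
# Category letter -> set of bases, copied from the table above.
CATEGORIES = {
    "R": {"A", "G"},  # purine
    "Y": {"C", "T"},  # pyrimidine
    "M": {"A", "C"},  # amino
    "K": {"G", "T"},  # keto
    "W": {"A", "T"},  # weak hydrogen bonds
    "S": {"G", "C"},  # strong hydrogen bonds
}


def category_counts(seq):
    """Count how many bases of seq fall into each of the six categories."""
    seq = seq.upper()
    return {
        letter: sum(1 for base in seq if base in members)
        for letter, members in CATEGORIES.items()
    }


if __name__ == "__main__":
    print(category_counts("AACG"))
```

Because the categories come in complementary pairs (R/Y, M/K, W/S), the two counts in each pair always sum to the sequence length when the sequence contains only A, C, G, and T.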
The x, y, and z components of the Z curve each display the distribution of one pair of these categories along the DNA sequence being studied. The x-component represents the distribution of purine versus pyrimidine bases (R/Y), the y-component the distribution of amino versus keto bases (M/K), and the z-component the distribution of strong versus weak hydrogen-bond bases (S/W).[13]
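The construction can be sketched in a few lines of Python. This is an illustration rather than the authors' implementation. It assumes the sign convention commonly given in the Z-curve literature, which the text above does not spell out: at position n, xn is the cumulative count of purines minus pyrimidines, yn of amino minus keto bases, and zn of weak minus strong hydrogen-bond bases, with P0 at the origin. The function names are hypothetical.

```python
# Step added to (x, y, z) by each base, under the assumed sign convention:
#   x: purine +1, pyrimidine -1
#   y: amino  +1, keto       -1
#   z: weak   +1, strong     -1
STEPS = {
    "A": (1, 1, 1),
    "C": (-1, 1, -1),
    "G": (1, -1, -1),
    "T": (-1, -1, 1),
}


def z_curve(seq):
    """Return the nodes P0..PN as (x, y, z) tuples, with P0 at the origin."""
    nodes = [(0, 0, 0)]
    for base in seq.upper():
        dx, dy, dz = STEPS[base]
        x, y, z = nodes[-1]
        nodes.append((x + dx, y + dy, z + dz))
    return nodes


def sequence_from_nodes(nodes):
    """Recover the sequence from the differences between consecutive nodes."""
    inverse = {step: base for base, step in STEPS.items()}
    return "".join(
        inverse[(x1 - x0, y1 - y0, z1 - z0)]
        for (x0, y0, z0), (x1, y1, z1) in zip(nodes, nodes[1:])
    )


if __name__ == "__main__":
    nodes = z_curve("ACGT")
    print(nodes)
    print(sequence_from_nodes(nodes))
```

Each base maps to a distinct step vector, so the differences between consecutive nodes identify the bases. This is why the curve and the sequence determine each other. The sketch raises a `KeyError` on characters other than A, C, G, and T; handling ambiguous bases is left out.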
References
- Zhang CT, Zhang R, Ou HY (2003). "The Z curve database: a graphic representation of genome sequences". Bioinformatics 19 (5): 593–99. doi:10.1093/bioinformatics/btg041. PMID 12651717.
- Zhang R, Zhang CT (2005). "Identification of replication origins in archaeal genomes based on the Z-curve method". Archaea 1 (5): 335–46. doi:10.1155/2005/509646. PMC 2685548. PMID 15876567.
- Zhang R, Zhang CT (2002). "Single replication origin of the archaeon Methanosarcina mazei revealed by the Z curve method". Biochem. Biophys. Res. Commun. 297 (2): 396–400. doi:10.1016/S0006-291X(02)02214-3. PMID 12237132.
- Zhang R, Zhang CT (2003). "Multiple replication origins of the archaeon Halobacterium species NRC-1". Biochem. Biophys. Res. Commun. 302 (4): 728–34. doi:10.1016/S0006-291X(03)00252-3. PMID 12646230.
- Worning P, Jensen LJ, Hallin PF, Staerfeldt HH, Ussery DW (2006). "Origin of replication in circular prokaryotic chromosomes". Environ. Microbiol. 8 (2): 353–61. doi:10.1111/j.1462-2920.2005.00917.x. PMID 16423021.
- Guo FB, Ou HY, Zhang CT (2003). "ZCURVE: a new system for recognizing protein-coding genes in bacterial and archaeal genomes". Nucleic Acids Research 31 (6): 1780–89. doi:10.1093/nar/gkg254. PMC 152858. PMID 12626720.
- Zhang CT, Zhang R (2004). "Isochore structures in the mouse genome". Genomics 83 (3): 384–94. doi:10.1016/j.ygeno.2003.09.011. PMID 14962664.
- Zhang R, Zhang CT (2004). "A systematic method to identify genomic islands and its applications in analyzing the genomes of Corynebacterium glutamicum and Vibrio vulnificus CMCP6 chromosome I". Bioinformatics 20 (5): 612–22. doi:10.1093/bioinformatics/btg453. PMID 15033867.
- Zhang R, Zhang CT (2003). "Identification of genomic islands in the genome of Bacillus cereus by comparative analysis with Bacillus anthracis". Physiological Genomics 16 (1): 19–23. doi:10.1152/physiolgenomics.00170.2003. PMID 14600214.
- Zhang R, Zhang CT (1994). "Z curves, an intutive [sic] tool for visualizing and analyzing the DNA sequences". Journal of Biomolecular Structure & Dynamics 11 (4): 767–82. doi:10.1080/07391102.1994.10508031. ISSN 0739-1102. PMID 8204213.
- Yu C, Deng M, Zheng L, He RL, Yang J, Yau SS-T (2014). "DFA7, a New Method to Distinguish between Intron-Containing and Intronless Genes". PLoS ONE 9 (7): e101363. doi:10.1371/journal.pone.0101363. PMC 4103774. PMID 25036549.
- Zhang R, Zhang CT (2014). "A Brief Review: The Z-curve Theory and its Application in Genome Analysis". Current Genomics 15 (2): 78–94. doi:10.2174/1389202915999140328162433. ISSN 1389-2029. PMC 4009844. PMID 24822026.
- Zhang CT (1997). "A symmetrical theory of DNA sequences and its applications". Journal of Theoretical Biology 187 (3): 297–306. doi:10.1006/jtbi.1997.0401. ISSN 0022-5193. PMID 9245572.
External links
- The Z curve database
- "Ori-Finder". Centre of Bioinformatics, Tianjin University (TUBIC). A free, web-based program for predicting origins of replication using Z curves.
- ENCODE threads explorer: Three-dimensional connections across the genome. Nature.
- ZCurve