Talk:DNA sequencing

Molecular and Cellular Biology WikiProject This article is within the scope of the Molecular and Cellular Biology WikiProject. To participate, visit the WikiProject for more information. The WikiProject's current monthly collaboration is focused on improving Restriction enzyme.
B This article has been rated as B-Class on the assessment scale.
High This article is on a subject of High-importance within molecular and cellular biology.

Article Grading: The article has been rated for quality and/or importance but has no comments yet. If appropriate, please review the article and then leave comments here to identify the strengths and weaknesses of the article and what work it will need.

This article may be too technical for a general audience.
Please help improve this article by providing more context and better explanations of technical details to make it more accessible, without removing technical details.

Hello cello 10:07, 18 June 2006 (UTC)


Whoa, where's the "layman's terms"?

Aww, I was hoping to learn how DNA was sequenced, but I reckon if you are an individual who understands "... initiated at a specific site on the template DNA by using a short oligonucleotide 'primer' complementary to the template at that region.", then you probably already know how DNA is sequenced. Come on, there's got to be someone out there who is talented enough with words and biology to make this topic accessible to anyone who tries. :) -Tom

I'll come back later and give it a try. I'll leave a note here so I'll remember to come back. Nbauman 19:15, 3 November 2006 (UTC)

This is a quandary. Since Wikipedia is an encyclopedia, it should be written at the level of a high school senior. But a senior should know what the four common DNA nucleotides are, that DNA has polarity, what a primer is, and that a primer annealing to a complementary strand can be used to initiate polymerization. Still, as I write this I can understand the complaint. To a molecular biologist, DNA sequencing is as straightforward as it gets. Trying to explain DNA sequencing all the way down to the basics of DNA polarity is a mind-boggling task.

I will try to find a link to the Lewis Thomas of DNA sequencing. MBCF 01:45, 8 December 2006 (UTC)

"Less tech" scratch-pad

Hmm, let's see:

"DNA sequencing is any method that reveals any part of the central building block pattern that makes up a cell."

Evolve the article forward from there. Also, we are going to need a recognizable and clear history section. More illustrations too. We need to work in something about why we sequence DNA: being unique like fingerprints, finding out how biology works, ... --Charles Gaudette 09:24, 18 July 2006 (UTC)

Other "Next Generation" technologies?

I have added a lot of material here. If you have some comments, please let me know. Cinnamon colbert 04:03, 29 March 2007 (UTC)

    • I agree this article needs to be edited to make it more accessible and we should also add some links to the other "Next generation" technologies, e.g. Solexa and 454's biggest near-term competitor - Applied Biosystems' "SOLiD" platform - horrible name, nice technology! Also some discussion of the longer term technologies being investigated, such as nanopore sequencing? P.S. I don't work for Applied Biosystems!
      • I tried reading the SOLiD PDF, and found it awfully complex, a la the Brenner/Lynx paper. Is this really going to fly in the real world?
    • Request: As I reviewed your next-generation sequencing entries, I noticed you missed a technology that is currently commercialized: Solexa's next-generation Sequencing By Synthesis platform. I'd like to request a link to our web page at solexa.com. Please let me know what I can provide to accomplish this. Glenn Powell Director, Marketing Solexa Inc. gpowell@solexa.com
      • After looking at the Illumina web site today (28 March 2007), I'm not sure I'd call the Solexa technology "commercialized" in the sense that there are SKUs and pricing and so forth. It looks a little custom at the moment. Cinnamon colbert 04:03, 29 March 2007 (UTC)

Since I’m close to the events described here, I’ll try to refrain from directly editing the article, but I can point out potential factual errors and peer-reviewed citations allowing other editors to decide.

  • [9] Nat Biotechnol. 2003 Jun;21(6):673-8. "Multiplexed genotyping with sequence-tagged molecular inversion probes" is a lovely paper but is not even close to being the correct reference for “Solid, which combines .. an oligo ligation strategy[9]”

Instead, the reference below is relevant (also one of the first next-generation sequencing papers): Shendure, J, et al. (2005) Accurate Multiplex Polony Sequencing of an Evolved Bacterial Genome. Science 309(5741):1728-32.

      • OK
  • “colony-based technique (from Mitra and Church at Harvard)” doesn’t really fit under “other proposals” but fits earlier since it is directly relevant to the Solexa and AB methods, also it might be worth pointing out that polymerase colonies do not involve bacterial colonies as used in previous sequencing methods.
    • fix it if you want

Mitra,RD, Shendure,J, Olejnik,J, Olejnik,EK, and Church,GM (2003) Fluorescent in situ Sequencing on Polymerase Colonies. Analyt. Biochem. 320:55-65

  • “commercialized by Integrated Genetics,[17]” This was not the company that commercialized it; it was Genome Therapeutics, and that reference [17] is not even close to the Church & Gilbert technology. Furthermore, the declaration that it was not a commercial success seems to merit reconsideration, considering that it was responsible for the first commercial genome sequence in 1994 and related projects worth over $100M in value (http://www.nature.com/ng/wilma/v13n1.867861436.html ) and led a few years later to other solid-phase, cyclic hybridization and imaging methods including the ‘next-generation’ methods cited above.
    • Genome Therapeutics was a spin-off of Integrated Genetics, to my memory; I meant to refer to the automated blot processor that IG/GT developed. Commercial success is an undefined term, but I think it includes sales, on a repeat basis, to many other unrelated entities, and it includes repeat ongoing revenue; a grant proposal, however large, is not commercial success in my mind.
  • In the section on “Major landmarks in DNA sequencing” Some of these don’t seem so major or relevant to sequencing.

1992 “Haseltine is shown dye-terminator sequencing technology” “2003 Cold Spring Harbor sponsors GeneSweep, a sweepstakes on the number of human genes.”

  • The statement “potential to produce 106-fold greater sequencing data” seems to cry out for an external reference. It would be safer to say 100-fold, but even for that it’s hard to find a peer-reviewed economic analysis. One such analysis is in the Science paper above.
  • “**First chromosome physical maps published:” This seems to be misformatted (should be on a new line), and it is also questionable whether 4 examples of maps are needed since they're not really (a) representative, (b) relevant to sequencing, (c) relevant to a broad audience.

--George Church 06:12, 27 May 2007 (UTC)

Wow - George Church here !!

Fellow wikians - quite an honor to have Dr. Church, although why he feels he can't contribute is a bit of a mystery. Cinnamon colbert 17:49, 9 June 2007 (UTC)

It's not too surprising if you check his discussion page -- he's just trying to avoid any conflict of interest issues. I'm glad you're implementing his suggestions, I was hesitant to do so myself for COI reasons, so I put it off and hoped someone else would. -- Madeleine 18:08, 9 June 2007 (UTC)

In near future, people's official personal name will be a babbled hash of their full DNA sequencing.

If and when it becomes possible to sequence the entire human DNA in a short time (minutes) using small equipment (desktop or portable), then a hash algorithm could be developed which gives a short unique identifier for any human on Earth, based on their full DNA sequence, just as short 160-bit SHA-1 hashes are used to positively identify gigabyte-sized files of digital data.

The hash number sequence could be converted to S-code (a vocabulary of artificial words) to make up an easy-to-remember sentence. This is already done in SSH Babble format for user-friendly host key fingerprints. Similarly, the DNA's spellable S-code hash would become your official name. I recommend using Chinese names and/or Native American names for the DNA's Babble dictionary. Even if you were John Doe Jr., your official genetic name would be something like "Pei Xio Li Bai Chung Zheng Kuo" or "Little Blue Cloud Sitting Wolf Three Arrows".

The results would be great. Personal identity cards would become unnecessary, as your DNA hash becomes your worldwide unique personal name and identity and it can be verified infallibly in situ from the blood or mouth swab using rapid desktop sequencers. Thus, it becomes impossible to live under false names. Murder victims can always be identified immediately. Murder suspects can be named immediately on national TV, even if they only left a drop of blood or spit behind.

Didn't you ever see Gattaca??? --Baldzac 20:53, 5 December 2006 (UTC)

Of course the hash forming algorithm needs to be designed carefully so that it is not possible to deduce private info e.g. about gender, racial identity or hereditary diseases just by knowing someone's hash. Such info shall be gained only from full sequencing.

I do not think this bright or menacing (see 666) future is very far off. About 97% of human DNA is invariably the same among all people, so only the remaining 3% needs to be sequenced and reduced to a unique hash. Sequencing technology is developing rapidly. 195.70.48.242 20:33, 5 December 2006 (UTC)
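
As a rough illustration of the proposal above, here is a minimal sketch of hashing a sequence and spelling part of the digest as words. It uses Python's standard `hashlib` SHA-1 (which yields a 160-bit, 20-byte digest). The 16-word vocabulary, the normalization rule, and the function names are assumptions made up for this example; this is not the SSH encoding and not a real identification scheme.

```python
import hashlib

# Hypothetical 16-word vocabulary: each word stands for one hex digit (4 bits).
WORDS = ["pei", "xio", "li", "bai", "chung", "zheng", "kuo", "little",
         "blue", "cloud", "sitting", "wolf", "three", "arrows", "river", "stone"]


def normalize(seq):
    """Remove whitespace and uppercase, so formatting does not change the hash."""
    return "".join(seq.split()).upper()


def dna_digest(seq):
    """Return the 20-byte SHA-1 digest of the normalized sequence."""
    return hashlib.sha1(normalize(seq).encode("ascii")).digest()


def spoken_name(seq, n_words=7):
    """Spell the first n_words hex digits of the digest as capitalized words."""
    hex_digits = dna_digest(seq).hex()[:n_words]
    return " ".join(WORDS[int(d, 16)].capitalize() for d in hex_digits)


if __name__ == "__main__":
    print(spoken_name("ACGT ACGT TTGA"))
```

Note that seven words from a 16-word vocabulary carry only 28 bits, which is about 268 million combinations and therefore too few to be unique across the world population; a usable scheme would need a larger vocabulary or more words. The sketch also ignores sequencing errors and somatic mutations, either of which would change the hash completely.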


ZS genetics - anyone else interested ?

Clearly, there is a dispute about the entry for ZS Genetics and Glover in the next generation section of the article.
Does anyone have an opinion? Should I request editor moderation? I think what is in there now amounts to commercial pumping, and therefore is inappropriate for Wikipedia. Cinnamon colbert 16:31, 1 April 2007 (UTC)

The information on Wm. Glover, and his novel methods using heavy-atom tagging and TEM for visualization, is correct, factual, and NPOV. The application which permits single-cell, single-molecule accuracy for gene expression has significant implications for cancer research, and related methods for DNA sequencing of strands of 20,000 bp and more in length are projected to provide a 2+ order of magnitude reduction in the time and cost of sequencing with digital accuracy. Credible researchers are following these developments with interest. If the judgement is that this development may not be reported in the Wikipedia until the method is well established, so be it; however the discussion of "next generation" technologies will be the poorer for it. Frankatca 17:26, 1 April 2007 (UTC)
The information provided on ZS Genetics is entirely NPOV. If the Wikipedia is to be cutting edge and useful it needs to responsibly report innovations as they occur and can be documented. See: In Sequence: The Inside Read on Genome Sequencing - February 20, 2007 http://www.in-sequence.com/issues/1_7/webreprints/139071-1.html

Further technical details in their published patent filings. See http://www.freepatentsonline.com/20070134699.html

The recent entry documents such an innovation. It is information; not advertising. Frankatca 15:45, 5 August 2007 (UTC)

I felt that Frank's re-addition of ZS Genetics looked like advertising, so I removed it. Then, concerned about the pro-academia bias I probably have, I re-added some old material on unproven methods in development which describes the technique without mentioning the company (but uses the company as a reference). I don't like wikipedia listing development by companies on unproven methods, it feels like advertising of unverifiable material to me, but if it's done I think it should simply mention the method without the company. Once something is proven & commercially produced there is an argument for mentioning companies themselves, but until then I feel that it is advertising and lacks sufficient notability. Madeleine 16:09, 5 August 2007 (UTC)
PS - Frank is on the board of directors for ZS Genetics, and I am a current member of George Church's lab. Madeleine 16:13, 5 August 2007 (UTC)

ZS Genetics was invited to present a technical progress report on their technology for DNA sequencing using halogen-labeled bases and automated transmission electron microscopy imaging for identification at Cambridge Healthtech Institute’s Second Annual conference on Next Generation Sequencing Platforms, Applications, and Case Studies in San Diego, CA, April 2008. During this conference it was announced that ZS Genetics is the seventh team to be accepted into the ten million dollar Archon X PRIZE for Genomics competition. Frankatca (talk) 12:57, 23 April 2008 (UTC)

WikiEthics, WikiPurpose, Etc.

So I am a bioinformatics tech about to enter a PhD program and I am wondering what my ethical responsibilities may or may not be in editing this article. I work with Solexa sequencing techniques right now, which have some pretty novel and exciting possibilities that are well worth mentioning, but I would be sort of tooting my own horn, which may be received poorly. Also, along the lines of the "layman's terms" point, to what extent can/should the easily comprehensible descriptions be supplemented by more arcane material that might be useful to other researchers like myself? I know that when I, personally, search for 454 or Solexa on the web I go straight to the companies' websites, which are not always clear or thorough. Can we offer a description that ascends through the high school level to the collegiate? And beyond? --WillJeck 01:48, 22 March 2007 (UTC)

WillJeck, your first instinct was right. Promoting your own product would specifically violate Wikipedia rules against self-promotion and advertising. If we allowed Wikipedia to include self-promotion, people on the payroll of companies would overwhelm volunteers.
The problem is that you can (and probably will) give undue emphasis to commercial companies -- if not to your own companies, to the commercial companies in general, as distinct from academic bjectively (although if someone from the company itself wrote the article, it would seem to violate Wikipedia rules on advertising and self-promotion). A link to the company web site would be legitimate too, if the site actually has useful information. But it would be a problem (and violate NPOV) if you choose commercial sites over non-commercial sites.
Why, for example, don't you link to the Nobel website, [1] which has lots of educational material on DNA sequencing written on the level of an intelligent high school student? There's a huge amount of free information on DNA sequencing, written by teachers and academics. [2]
I realize that the commercial companies have made important contributions and deserve a place in the article. But we have to strike a balance, and it's very easy for the commercial interests to take over. And of all the commercial inroads in Wikipedia, this article is much less of a problem than others. Nbauman 17:47, 24 March 2007 (UTC)


DNA Sequencing and the Human Genome - Hype and Promise

Let me know what you think of this paragraph, currently self-rated at Start status. The point I am trying to make is that the human genome has not yet, really, been sequenced, and that there is a great deal of hype in the supposedly scientific press. Cinnamon colbert 01:08, 4 April 2007 (UTC)

Good point, but under Wikipedia rules you can't say it yourself, you have to find a source to attribute it to. I think the part about the unsequenced segments is the interesting part. I remember reading in Science about the unsequenced areas.
It would also be good to find an example of hype to knock down. Most of what I've read about DNA sequencing has been fairly measured and restrained. In Science and the NEJM, anyway. Nbauman 02:12, 4 April 2007 (UTC)
The points have some validity, but one needs to be a bit more grown up about the whole issue. It is now well known that the benefits and results of sequencing the human genome have been overblown by the media (especially when it was a "hot" topic), and also at least by some scientists seeking funding for this enterprise. In addition, ignoring for a moment that its content is lacking NPOV, I do not think that this section really belongs here. This article describes the principles of available DNA sequencing techniques, not sequencing of the human genome in particular. So if it is to be retained, it would need a new home, or else a section that deals with difficult-to-sequence regions such as centromeres or telomeres, repeats, etc., and possible approaches to sequencing those. Malljaja 09:06, 4 April 2007 (UTC)

thanks for the feedback, cinnamon colbert

Removal of the section

I removed this section without making comment here, my apologies. Cinnamon colbert posted to my talk page, and I realize I should have said something here, so I'm pasting it here. Madeleine 15:06, 20 July 2007 (UTC)

limitations current technology in dna seq page
why did you delete this ? even by the std you cite, "complete = euchromatic" the "complete" chromosome seqs have gaps; I presume most of the gaps have been sized by southerns with pfge (which as you know was a schwartz cantor thing originally)
I feel we are doing a disservice to the non-professional community by failing to point out the tremendous gap between reality and hype. Cinnamon colbert 13:53, 20 July 2007 (UTC)
As I understand it, the general issue is not that the DNA wasn't sequenced, the issue is that we can't figure out how to stitch it together due to repetitiveness. As such, I don't think this warrants its own section, it belongs within the sequence assembly section. It's deceptive to tell the reader "it's a lie! they didn't sequence all of it!" when the stuff that wasn't "sequenced" almost certainly was — it was simply too repetitive to be assembled in a linear fashion. Like a jigsaw where all the pieces are the same shape and same color. Rather than having a high-level section whose sole purpose is to "cry foul", I think this belongs as a more NPOV clarification within the section on "large-scale sequencing". To that end, I have added two sentences to that section (and a reference & link to the publicly available paper) — do these sentences work for you? Madeleine 15:06, 20 July 2007 (UTC)
I concur with Madeleine--the heading Hype and promise always seemed out of place and somewhat detracted from the fact that this is an entry that is focused mainly on DNA sequencing technology. As per my above comment left shortly after this section was introduced, I think its content needs a new home, or, as MP just did, should be integrated in the technical description of limitations of current seq technologies. I haven't scoured Wikipedia for articles devoted to DNA sequence assembly & annotation, but I'd expect that it deserves its own entry. Malljaja 19:08, 20 July 2007 (UTC)
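
To make the jigsaw analogy above concrete, here is a minimal sketch of why reads shorter than a repeat cannot fix its length. The two toy "genomes" and the helper name `read_set` are invented for illustration; real assemblers also use read counts, paired reads and error models, none of which are modeled here.

```python
def read_set(genome, read_len):
    """Return every distinct substring of length read_len (idealized, error-free reads)."""
    return {genome[i:i + read_len] for i in range(len(genome) - read_len + 1)}


# Two toy genomes that differ only in how many times the unit "CA" is repeated.
genome_a = "GATTC" + "CA" * 6 + "TGGAC"
genome_b = "GATTC" + "CA" * 10 + "TGGAC"

# Reads shorter than the repeat: both genomes give exactly the same pieces.
short_a = read_set(genome_a, 6)
short_b = read_set(genome_b, 6)

# Reads long enough to span the shorter repeat plus both flanks tell them apart.
long_a = read_set(genome_a, 14)
long_b = read_set(genome_b, 14)

if __name__ == "__main__":
    print("6-base reads identical:", short_a == short_b)
    print("14-base reads identical:", long_a == long_b)
```

In this toy case every base of the repeat is covered by reads, yet the set of short reads alone cannot say whether the repeat has 6 or 10 units, which is the sense in which the region is "sequenced" but not assembled.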

New sequencing methods

I rewrote this section (and changed the title), trying to give more explanation of general tactics taken in pursuit of high-throughput sequencing. I'm posting here now because I noticed that someone with experience with Solexa got told not to edit this section. I need to make sure you know that I'm a member of George Church's lab. As I mentioned earlier on this talk page, I was hesitant to edit the section. In the end, I did, because I wanted to improve it. I tried very hard to make this NPOV, using the first reference as a guide for what technologies are "well-known" and trying my best to be even-handed about it. Madeleine 23:40, 20 July 2007 (UTC)

Of interest: U.S. patents: 7,291,467, 7,291,468 & 7,288,379 - Systems and methods of analyzing nucleic acid polymers and related components; issued to ZS Genetics on November 6, 2007. The patent claims cover: 1) methods for sequencing nucleic acid polymers, 2) methods for detecting, identifying and/or quantifying nucleic acid polymers and 3) computerized systems for such genetic analyses, advanced DNA sequencing using electron microscopes.

The significance of the new method is suggested by the following text from filing 7,288,379: "One advantage provided by certain systems and methods of the invention is that nucleic acid sequencing, detection and/or identification can be done at extremely high speeds. The high speeds and other features of the invention, such as reduced sample manipulation and reduced need for performing chemistry on samples, also can lead to significant reduction in the cost of such analysis. Thus, systems and methods of the invention may make practical obtaining complete or substantial portions of genomes of individual humans for clinical uses (e.g., pharmacogenomics, diagnostics such as disease susceptibility or prognosis) and research uses (e.g., pharmacological research, research into biological processes, and research into the biological process of diseases). Also, it may be possible using embodiments of the invention to perform nucleic acid assays, not just identifying nucleic acids, but also their quantities, with great precision, within individual cells of an organism. This will provide a detailed understanding of how distinct cells function differently.

"One example of the foregoing is the use of the methods described herein in conducting microarray-type analysis of gene expression. Similar to conventional microarrays, grids of oligonucleotides (i.e., probes for specific genes or alleles) are provided on a substrate as described above. Nucleic acids that are labeled as described herein are prepared and contacted with the oligonucleotide grid to capture labeled nucleic acid molecules having specific sequences. In these embodiments, the need for labeling each different nucleotide with a unique label is lessened or eliminated, because the data read is not necessarily concerned with sequence (which is specified by the oligonucleotides) but simply can be concerned with determining the number of molecules bound to a specific oligonucleotide probe (i.e., detecting nucleic acid polymers), and/or determining the length of the nucleic acid polymer to identify the nucleic acid polymer. The application of the methods of the invention to microarray-type analysis and quantification of gene expression yields improvements in speed and quantification. Also, due to the ability to count individual nucleic acid molecules bound to the substrate, the methods permit the use of less sample, without amplification, thereby providing a more accurate picture of gene expression levels."

Frankatca 17:29, 12 November 2007 (UTC)

Samples with external dna often discarded after sequencing

We may need to rethink that after this study. [3] Brian Pearson 01:32, 3 September 2007 (UTC)

new "Challenges" section

I feel this should simply be integrated into the section as a description of the resulting data, eliminate any attempts at comparison/contrast with the high-throughput methods.

  • Paragraph 1: true, but this isn't a challenge, it's simply a description of the output.
  • Paragraph 2: This isn't true, I sequence things all the time with a PCR primer, you only need to attach linkers for sequencing a library. In addition, you need to attach linkers for the high-throughput strategies too -- there's a PCR step, and there needs to be a common sequence for the sequencing primer. I believe in the traditional method you're putting them into cloning vectors to enable clonal isolation. As far as I understand it (and in my experience), filtering out these linker sequences is trivial, so I don't see how this information is notable.
  • Paragraph 3: Most of the high-throughput methods, with the notable exception of Helicos, also use PCR amplification; however, I think the problems described are probably more likely to arise in longer PCR products (high-throughput library molecules are very short). Again, this can be included as a description of the output.

Madeleine 13:45, 17 April 2008 (UTC)
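
On the point above that filtering out linker sequences is trivial, a minimal sketch of the idea, assuming the linker sequences are known and appear exactly at the ends of a read. The linker sequences and the function name are placeholders chosen for this example; real reads contain sequencing errors, so practical tools use approximate matching instead of the exact matching shown here.

```python
# Hypothetical linker sequences, chosen only for illustration.
LEFT_LINKER = "GGATCC"
RIGHT_LINKER = "GAATTC"


def trim_linkers(read, left=LEFT_LINKER, right=RIGHT_LINKER):
    """Strip an exact-match linker from each end of a read, if present."""
    read = read.upper()
    if left and read.startswith(left):
        read = read[len(left):]
    if right and read.endswith(right):
        read = read[:-len(right)]
    return read


if __name__ == "__main__":
    print(trim_linkers("GGATCCACGTACGTGAATTC"))
```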

early methods

Add work from Ray Wu at Cornell? This is a little outside my knowledge base - I don't know how important his work was. Cinnamon colbert (talk) 14:15, 20 April 2008 (UTC)