Talk:Human genome

Good articles Human genome (reviewed version) has been listed as a good article under the good-article criteria. If you can improve it further, please do.
If it no longer meets these criteria, you can delist it, or ask for a review.
Molecular and Cellular Biology WikiProject This article is within the scope of the Molecular and Cellular Biology WikiProject. To participate, visit the WikiProject for more information. The current monthly improvement drive is Signal transduction.
Good article GA This article has been rated as GA-Class on the assessment scale.
Top This article is on a subject of top-importance within molecular and cellular biology.

Article Grading: The article has been rated for quality and/or importance but has no comments yet. If appropriate, please review the article and then leave comments here to identify the strengths and weaknesses of the article and what work it will need.

WikiProject Medicine This article is within the scope of WikiProject Medicine. Please visit the project page for details or ask questions at the doctor's mess.
Good article GA rated as GA-Class on the assessment scale
High rated as high-importance on the assessment scale

This is the talk page for discussing improvements to the Human genome article.
This is not a forum for general discussion about the article's subject.

This article was selected on the Medicine portal as one of Wikipedia's best articles related to Medicine.
Other languages WikiProject Echo has identified Human genome as a foreign language featured article. You may be able to improve this article with information from the Spanish language Wikipedia.

Size of the human genome

I'm confused by the number of coding genes. HGP gives 30,000 CODING genes. Wiki gives 20,000-25,000 TOTAL genes with only between 1.5% and 2.0% coding. Can someone reconcile these discrepancies?

Thanks, Norm


The article says there are 3 billion base pairs. If each base pair has 4 possible values, at 2 bits per base pair, this adds up to about 715MB of digital data, just a bit more than is storable on a CD (you could probably store it on a CD if you WinZipped it). Are my calculations correct, and is this interesting/relevant enough (as a more comprehensible size reference) to put in the article?

Edit: correction, you might only need 1 bit per base pair (since there are only 2 possible values). Is this correct? If so, the human genome would be about 360MB.


Indeed, with a parsimonious encoding, one could store the human genome in well under 1GB. Moreover, it compresses very well, due to the high fraction of repetitive content. Conversely, PBS/Discovery Channel documentaries are always fond of the statement that if you printed out the human genome in telephone books, the stack would reach as high as the Washington Monument. The Human Genome Project cost a total of $4.6B or about $1.50 per base pair. These are all interesting tidbits of trivia, but I don't think they are really relevant to this scientifically-oriented article. But I could see a "trivia" section in Human Genome Project. --Mike Lin 04:11, 19 April 2006 (UTC)
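
For anyone who wants to check the storage arithmetic above, here is a minimal sketch. It assumes the round figure of 3 billion base pairs quoted from the article and a fixed-width encoding with no compression; the helper name `genome_storage_bytes` is made up for illustration. Because each position on a strand can be any of four bases (A, C, G, T), a fixed-width code needs log2(4) = 2 bits per base pair rather than 1.

```python
import math

def genome_storage_bytes(base_pairs: int, alphabet_size: int = 4) -> int:
    """Bytes needed for a fixed-width, uncompressed encoding (illustrative helper)."""
    # Four possible bases need log2(4) = 2 bits each.
    bits_per_base = math.ceil(math.log2(alphabet_size))
    total_bits = base_pairs * bits_per_base
    return math.ceil(total_bits / 8)

# Assumption: the round figure of 3 billion base pairs quoted in this thread.
GENOME_BP = 3_000_000_000

size_bytes = genome_storage_bytes(GENOME_BP)
print(f"{size_bytes / 10**6:.0f} MB (decimal)")  # prints "750 MB (decimal)"
print(f"{size_bytes / 2**20:.0f} MiB (binary)")  # prints "715 MiB (binary)"
```

Under these assumptions the 715 figure corresponds to binary megabytes (MiB); in decimal megabytes the same data is 750 MB. Compression would bring it lower, as noted above.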

I don't know why, but I think this is a neat little fact. The human genome (in document form) is 1/33 of a terabyte. In comparison, the whole of Wikipedia would only take up 1/113 of a terabyte. I think that fact puts it in perspective for us computer geeks out there.

Talking about its future

I think you can talk about what you can do with it in the future.

Good point!

Illustrations

The content of the page is pretty good and not too technical. I think that it could benefit from some illustrations. A nice chromosome painting picture would be best, at least for now. I think I'll go to NCBI and UCSC to see if I can make something using the genome browsers.--Plociam 18:05, 9 August 2005 (UTC)

That would be awesome. My efforts so far have been to create a solid technical/scientific base for the article. It certainly needs illustration and some more introductory/"pop science" information. BTW please vote for This Week's Improvement Drive :o) --Mike Lin 18:48, 9 August 2005 (UTC)

Expansion Ideas

  • The human genome and disease
  • The future of human genome research
    • $1000 genome sequencing

--Plociam 07:14, 10 August 2005 (UTC)

Qualifications

Plociam asked for clarification of the cop-out last part of this sentence:

Thus follows the popular statement that "all humans are at least 99% genetically identical", although this would be somewhat qualified by most geneticists.

This thought should definitely be expanded upon. What I really wanted to do here was to explain this excerpt from Bill Clinton's 2000 State of the Union address:

I just want to say one more thing about this, and I want every one of you to think about this the next time you get mad at one of your colleagues on the other side of the aisle. This fall, at the White House, Hillary had one of her millennium dinners, and we had this very distinguished scientist there, who is an expert in this whole work in the human genome. And he said that we are all, regardless of race, genetically 99.9 percent the same.
Now, you may find that uncomfortable when you look around here. (Laughter.) But it is worth remembering. We can laugh about this, but you think about it. Modern science has confirmed what ancient faiths has always taught: the most important fact of life is our common humanity. Therefore, we should do more than just tolerate our diversity -- we should honor it and celebrate it. (Applause.)

The "qualifications" that I think a geneticist would attach to this statement would go as follows: if you look at SNPs, they cover much less than 1% of the genome. But SNPs have a very specific, technical definition. We can get a lot more SNPs by saying that the base substitution has to be present in only 0.1% of the population instead of 1%. Going further, if you also look at repeats or heterochromatin, for example, actually you can have a fair bit of different stuff going on from person to person. But that stuff doesn't really seem to matter to the phenotype...basically, this is just recognizing that a statement like "we are all, regardless of race, genetically 99.9 percent the same" is a nice soundbite for John Q. Public, but there are a lot of technical caveats underneath in how you define that percentage.

There is a related issue, also in the article, in saying "the species ABC genome is XYZ% identical to the human genome". What does that mean? There are two parts to the "real" answer: how much of the ABC genome aligns to the human genome, and of those portions that align, what percentage of base identity do you have? But there are arbitrary cutoffs made in genome alignments, of whether something aligns or not...so the simplified statement is really just a rough approximation, and it should be presented as such.
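
To make the two-part answer above concrete, here is a rough sketch that reports both numbers separately for a toy pairwise alignment. The function name `alignment_summary` and the example sequences are invented for illustration; real genome alignments also involve the arbitrary cutoffs mentioned above, which this sketch does not model.

```python
def alignment_summary(ref: str, other: str) -> tuple[float, float]:
    """Return (aligned fraction of ref, base identity within aligned columns).

    Both inputs are rows of one pairwise alignment, with '-' marking a gap.
    """
    if len(ref) != len(other):
        raise ValueError("alignment rows must have equal length")
    ref_bases = aligned = matches = 0
    for a, b in zip(ref, other):
        if a == "-":
            continue  # column has no reference base
        ref_bases += 1
        if b != "-":
            aligned += 1  # reference base is paired with a base in the other genome
            if a == b:
                matches += 1
    if ref_bases == 0 or aligned == 0:
        raise ValueError("no aligned bases to summarise")
    return aligned / ref_bases, matches / aligned

# Toy example (made-up sequences, not real genomic data).
coverage, identity = alignment_summary("ACGTACGT--AC", "ACGAAC--TTAC")
print(f"aligned: {coverage:.1%}, identity in aligned part: {identity:.1%}")
# prints "aligned: 80.0%, identity in aligned part: 87.5%"
```

A single "XYZ% identical" figure collapses these two numbers into one, which is why it can only be a rough approximation.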

None of this is really captured in the article text because I worried about introducing distracting technical minutiae...but we should try to convey it somehow.

--Mike Lin 04:50, 16 August 2005 (UTC)

Mike Lin has a good point. While this is a pretty technical caveat, I agree that this article should not make the blanket statement that humans are "genetically 99.9% identical." For now, I suggest that the article qualify the above statement by replacing "although this would be somewhat qualified by most geneticists" with something simple but more specific, such as "however, this estimate depends on the precise definition of a SNP, which must underestimate the total variation within the genome." In the future, there may be a place for the complete explanation, perhaps in a "criticism" or "controversy" section, particularly if a citation to that viewpoint can be provided. In the meantime, let's leave the complete explanation in the talk section.

--Plociam 00:41, 19 August 2005 (UTC)

Someone should just look this up in the HAPMAP paper. I dunno if they reported on it yet, since it might be reserved for the later ENCODE paper on human variation, but they did complete resequencing in something like 5 Mb of human genome sequence for a number of populations, which should be more than enough to provide a reasonable estimate of true heterozygosity in humans and avoids the whole "definition of a SNP" debate. Anyway the data is published, so I'm sure they've at least said something about it... Graft 22:30, 27 January 2006 (UTC)

Genetic basis of human intelligence

Here's a rough draft of a section that I'm working on, although it is probably more appropriate for a broader article on Human genetics, which I'm also working on at a meta-level. That aside, any comments appreciated. --Mike Lin 13:34, 31 August 2005 (UTC)

The remarkable overall similarity of the human genome to those of other mammals has given rise to a scientific debate over the genetic basis of human intelligence. The central issue is whether the recent evolution of human intelligence was relatively typical or extraordinary. If it was typical (that is, if the relevant genes evolved at an average rate), then our capacity for abstract reasoning, arts, and science are most likely the result of relatively modest changes to a small number of genes, simply since so little evolutionary time has passed between humans and other primates. If, on the other hand, uniquely intense selective pressures led to extraordinarily fast evolution of the relevant portions of the human genome, then human intelligence could be the result of rapid evolution of many genes.

A recent study has found that more than one hundred key genes thought to govern the development of the brain have evolved significantly faster in humans than in other mammals, providing some evidence for extraordinary selective pressures and large numbers of genes governing intelligence. However, the issue remains far from settled, since the evolutionary context giving rise to such extraordinary selective pressure has not been convincingly explained.

This debate, which is likely to continue for some time, could ultimately have significant ethical and societal implications. If human intelligence is, in fact, guided by a small number of genes, then it is foreseeable that, in the reasonably near future, geneticists might be able to determine or even engineer a person's natural predisposition towards particular intellectual pursuits (such as mathematics or music) on the basis of their genes.

See also: Race and intelligence

Dorus, S. et al. Accelerated evolution of nervous system genes in the origin of Homo sapiens. Cell 119(7):1027-40, December 2004.

Why are we to "see also" Race and intelligence?
I do not have a copy of the referenced article by Dorus et al, I only read the abstract. I doubt that anyone has identified "more than one hundred key genes thought to govern the development of the brain" unless you take "key gene" to have a meaning like "is involved in". I'd also like to know the quantitative data that stand behind the claim of: "extraordinary selective pressures".
The main genetic changes that account for most of the differences in brain function between humans and chimps could have originally involved only a small number of key regulatory genes; for example, alterations in a few transcription factor genes might theoretically account for the greater post-natal brain growth in humans. After such "genetically small" initial changes, hundreds of other genes that have lesser roles in brain development and function could have been modified during subsequent evolution as a secondary response to the initial changes.
Even "If human intelligence is, in fact, guided by a small number of genes" it does not follow that "in the reasonably near future, geneticists might be able to determine or even engineer a person's natural predisposition towards particular intellectual pursuits".--JWSchmidt 15:35, 31 August 2005 (UTC)
Thanks. Most of your points are well taken. The quantitative measure used by Dorus et al. is Ka/Ks on human vs. macaque nervous system genes as compared to Ka/Ks on the mouse vs. rat orthologs. (Higher Ka/Ks suggests either faster evolution or loss of function, and presumably our nervous system genes aren't losing function -- yes, they make a better argument than that in the paper.) The set of "nervous system genes" was culled from a manual literature search and known nervous system disease genes. They find 1) significantly higher average Ka/Ks in humans compared to rodents; 2) significantly more individual genes with higher Ka/Ks in human than rodent than the other way around; 3) significantly more biased distribution of Ka/Ks values in human compared to rodents; 4) an even stronger bias in all of the above when the set of "nervous system genes" is whittled down to those specifically implicated in development; 5) (control) statistically indistinguishable Ka/Ks values between humans and rodents on housekeeping genes.
With respect to a few small changes leading to "hundreds" of secondary changes -- I think there are issues with the very short timespan since human/chimp divergence. Once we nailed the brain size (and remember we think brain size varied between neanderthalensis and sapiens), has there been enough time for selection on hundreds of secondary changes without something really unusual going on?
The part about determining genetic predisposition should be more heavily qualified (already it's only "foreseeable", "might be", and merely "natural predisposition"), but under the given assumptions and qualifications, it's reasonable to predict that it could be nailed down in the near future, meaning a few decades. I'm getting this from Weinberg [1]. --Mike Lin 17:28, 31 August 2005 (UTC)
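
For readers unfamiliar with the Ka/Ks measure discussed above, here is a simplified sketch of how such a ratio can be formed. Ka is the number of nonsynonymous substitutions per nonsynonymous site and Ks is the number of synonymous substitutions per synonymous site. This is not the pipeline used by Dorus et al.: it assumes the difference and site counts have already been tallied, applies a plain Jukes-Cantor correction, and uses made-up numbers. The names `jukes_cantor` and `ka_ks` are illustrative only.

```python
import math

def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected distance for a proportion p of differing sites."""
    if not 0 <= p < 0.75:
        raise ValueError("proportion must be in [0, 0.75) for this correction")
    return -0.75 * math.log(1 - 4 * p / 3)

def ka_ks(nonsyn_diffs: int, nonsyn_sites: float,
          syn_diffs: int, syn_sites: float) -> float:
    """Ratio of corrected nonsynonymous to synonymous substitution rates."""
    ka = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ks = jukes_cantor(syn_diffs / syn_sites)
    if ks == 0:
        raise ValueError("Ks is zero; the ratio is undefined")
    return ka / ks

# Made-up counts for two hypothetical lineage comparisons of the same gene.
primate_ratio = ka_ks(nonsyn_diffs=10, nonsyn_sites=1000, syn_diffs=20, syn_sites=500)
rodent_ratio = ka_ks(nonsyn_diffs=5, nonsyn_sites=1000, syn_diffs=20, syn_sites=500)
print(f"primate Ka/Ks = {primate_ratio:.3f}, rodent Ka/Ks = {rodent_ratio:.3f}")
```

A higher ratio in one lineage than in the other, for the same gene, is the kind of per-gene contrast the comparison above relies on; whether the difference is significant needs the statistics in the paper.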

version 2:

The remarkable overall similarity of the human genome to those of other mammals, especially primates, has given rise to a scientific debate over the genetic basis of human intelligence; that is, compared to other primates, how extensive are the genomic changes that give us the capacity for abstract reasoning, arts, and science? The central issue in answering this question is whether the recent evolution of whatever genes govern human intelligence (most likely by controlling nervous system development) was a typical or an extraordinary process. If it was typical (that is, if the relevant genes, whatever they are, evolved at a "normal" rate), then our intellectual capacities over other primates are most likely the result of relatively modest changes to a small number of genes, simply because too little time has passed for large-scale concerted evolution of many genes to have taken place.

If, on the other hand, some intense selective pressures led to extraordinarily fast evolution of the relevant portions of the human genome, then human intelligence could be the result of rapid evolution of many genes. A recent study has found that more than one hundred genes involved in the function of the nervous system, and especially some of those thought to control brain development, have evolved significantly faster in humans than in other mammals, providing some evidence for extraordinary selective pressures and large numbers of genes governing intelligence. However, the issue remains far from settled, since the evolutionary context that could give rise to such extraordinary pressure has not been convincingly explained.

This debate, still in its nascent stages and likely to continue for some time, could ultimately have significant ethical and societal implications. If human intelligence is, in fact, guided by a small number of genes, then it is foreseeable that, within the moderately near future, geneticists might be able to estimate a person's natural predisposition towards particular intellectual pursuits, such as mathematics or music, on the basis of their genes.

Entrez PubMed 15869325 "A scan for positively selected genes in the genomes of humans and chimpanzees." PLoS Biol. 2005 Jun;3(6):e170. Epub 2005 May 3. --JWSchmidt 12:47, 1 September 2005 (UTC)

"If, on the other hand, some intense selective pressures led to extraordinarily fast evolution of the relevant portions of the human genome, then human intelligence could be the result of rapid evolution of many genes." This seems circular to me: If selective pressure for a certain phenotype made the relevant parts of the human genome (i.e., genes) evolve fast, then the resulting phenotype is the result of genes evolving fast. See the problem? Evolver 13:51, 10 September 2005 (UTC)

Size over history

Is there anywhere that shows a graph or something similar to show the growth/shrinkage of the genome of the human ancestors? porges 06:34, 11 January 2006 (UTC)

First find your ancestors' DNA sample. Or do you mean comparing extant species? David D. (Talk) 07:09, 11 January 2006 (UTC)
I meant along the human lineage - of course, most of it would be projected. Is there anything along these lines or would it be purely speculative? porges 08:53, 11 January 2006 (UTC)
That's actually quite difficult to do, since we don't actually have that much complete genome sequence available. What you're proposing would require us to have, more or less, the complete genomes of the entire primate tree, so we could trace major duplication/deletion/fusion/etc. events and reconstruct ancestral genome sizes going up the human lineage. We have, to date, two complete primate genomes, human and chimp. Graft 22:13, 27 January 2006 (UTC)
Where complete should be in inverted commas: "complete". Genomes are usually declared complete when all but the most difficult sections have been successfully sequenced several times over (usually a number between 6 and 8 times). So it is a somewhat arbitrary consensus decision taken by the scientists involved. But this is a minor point. - Samsara contrib talk 14:11, 10 February 2006 (UTC)
I doubt that there would be that much difference in genome size in the human ancestors anyway, since there isn't a lot of difference between humans and primates. The principal differences would be changes in gene regulation, and there is no way that we could measure those kinds of things from bones.--nixie 22:50, 27 January 2006 (UTC)
I beg to differ as far as there being no important differences in size. Also, we can probably reconstruct a substantial fraction of ancestral genomes computationally. As to predicting gene expression computationally... well... maybe not yet, but it's not ridiculous. Graft 05:50, 28 January 2006 (UTC)


Population geneticists would predict that the human genome has grown, at least compared to the earliest primate ancestors, which probably had larger effective population sizes than we do today. Even if the number of genes had not changed, genomes are still thought to expand when there is less selection for them to remain small, by picking up functionally less efficient or useless (junk) DNA. Selection is always less efficient at smaller effective population sizes (all other things being equal). - Samsara contrib talk 14:15, 10 February 2006 (UTC)

Congratulations on SCOTW

You've succeeded. This article is now Science Collaboration of the Week. Now make it really, really good! - Samsara contrib talk 10:11, 27 January 2006 (UTC)

Moral, social and legal consequences

What do other editors feel about inserting a section devoted to the moral and political aspects of the human genome project? I'm gonna be kinda busy over the next few days, but I should be able to contribute something over the next week or so, if that suits? --Nicholas 10:14, 10 February 2006 (UTC)

That would be better suited to the article about the human genome project, no? - Samsara contrib talk 14:08, 10 February 2006 (UTC)

sugars

I have a concern about "There are an estimated 20,000-25,000 human protein-coding genes." Aren't there genes coding for various sugars, which is now a growth field?

You may be thinking about protein glycosylation, which can produce modified proteins. JWSchmidt 14:12, 10 February 2006 (UTC)
It's worth remembering that the number of proteins does not necessarily equal the number of genes. Apart from glycosylation, there is also alternative splicing to take account of. - Samsara contrib talk 14:17, 10 February 2006 (UTC)

Human genetics

There is a rather short article on Human genetics. I think it probably should be a stand-alone article, but it probably should be incorporated into this article somehow.--Peta 02:50, 18 May 2006 (UTC)

Number and distinctiveness of chromosomes

I was always taught that there were 46 chromosomes in most people's genetic complement, which is good because it is true. I changed the intro bit to note that most pairs of chromosomes contain two chromosomes that have similar structure, which may help those who wish to change the 24 to 23.

External Link

The National Office of Public Health Genomics offers insight into how human genomic discoveries can be used to improve health & prevent disease. http://www.cdc.gov/genomics/default.htm Lid6 17:35, 15 September 2006 (UTC)

Added. --apers0n 18:41, 15 September 2006 (UTC)

References

I took a stab at cleaning up the references using a nifty tool. One of the web links referenced a Science mag article, so I dug up the Science mag and used the press release from LBL as a summary link on the reference. Note that the tool I used creates HUGE lists of authors that I summarized into the relevant group that did the work. --Chrispounds 02:48, 26 September 2006 (UTC)

X inactivation

It seems rather random to put X inactivation in an introductory section to chromosomes. In fact, is it even needed in an article on genome? If so, shouldn't it be in a section on regulation/dosage compensation? TedTalk/Contributions 02:48, 30 September 2006 (UTC)

For me, the second option seems to be good. X inactivation is important, and it would need its own section. NCurse work 05:44, 30 September 2006 (UTC)
But is it important in an article on the human genome? There is discussion about regulatory sequences, but nothing about regulation itself. Before we talk about X-inactivation, we should discuss multiple-copy genes (such as rRNA genes, or even alpha hemoglobin) -- they have regulatory effects, but are at least genome-related. I don't believe it belongs in this article. TedTalk/Contributions 14:56, 1 October 2006 (UTC)

As for "female mosaic," that was a cutesy way of describing X-inactivation from the 1980s. Does anyone really use this anymore? The current use of "female mosaic" refers to mosaicism in the X-chromosome, such as XX/XO mosaics. TedTalk/Contributions 16:44, 1 October 2006 (UTC)

I have not heard of the mosaic reference being dropped with respect to females and the expression of their X genotype. What makes you think this? David D. (Talk) 03:29, 6 October 2006 (UTC)
References I see to mosaicism in medical/genetic sources are to XX/XO or XX/XY or some such. I used to see female mosaic referring to X-inactivation in Human Genetics textbooks, but I no longer see that. Where do you see it now? TedTalk/Contributions 03:38, 6 October 2006 (UTC)
"The Role of X Inactivation and Cellular Mosaicism in Women's Health and Sex-Specific Diseases", Barbara R. Migeon, MD, JAMA. 2006;295:1428-1433, was one I found with Google, and there were more examples. There are three examples that I had heard of before, all referred to as mosaics. The first is the sweat gland phenotype that is X-linked. Some patches of epidermis are normal, others are not. Likewise, women who have coloured sectors in their iris, or even two differently coloured irises, I have heard described as mosaic. The final common example I have read described as mosaicism is the rare case of tetrachromatic females. Most basic textbooks do not even cover X-inactivation. Which book are you referring to where it is conspicuously absent? Do they state the term is not used, or could it just be they didn't know the term has been used for X-inactivation too? David D. (Talk) 04:45, 6 October 2006 (UTC)
Thanks for the examples. All human genetics textbooks cover X-inactivation. I do not normally look at medical genetics textbooks, so it is possible they do not (once again pointing out the difference between human genetics and medical genetics). I have over a dozen human genetics textbooks on my shelf, with a target audience ranging from College Freshmen to Graduate students. They all mention X-inactivation and none of them talk about "female mosaicism". I have some older textbooks from the 1980s that I recall mentioning "female mosaicism", such as Hartl's book from that era. A Google scholar search picked up very few examples, and when it did, they normally talk about "mosaic pattern", which is an OK phrase (I used the search phrase "female mosaic"/"female mosaicism" along with various words associated with X-inactivation: X-inactivation, Barr, Lyonization, etc). It does bring up a good question about terminology. When there is a difference in terminology between professional use and common use, should common use be given any place? My opinion is that an encyclopedia should be "technically correct", although with a nod towards common usage. An article about X-inactivation could mention "female mosaic" in the proper context, but it should be avoided elsewhere. TedTalk/Contributions 11:46, 6 October 2006 (UTC)
Fascinating, I had no idea the usage was so different. I am referring to basic texts such as Snustad (upper level genetics) and below, down to Campbell biology. None have anything significant on X-inactivation. It sounds like your books might be more professional. I assume by common usage you mean usage outside science, but I wonder if this is actually a basic research vs medical professional difference? I will check my texts in more detail. I agree we need to give a nod to both usages. David D. (Talk) 12:49, 6 October 2006 (UTC)
I just checked Snustad & Simmons. It does mention "...female mammals are genetic mosaics [boldface theirs] containing two types of cell lineages." This usage doesn't fit the definition, "an individual composed of two or more cell lines of different genetic or chromosomal constitution...." (King and Stansfield, Dictionary of Genetics), and is still short of "female mosaicism". Common usage today does not include regulatory differences; if it did, the definition of mosaicism would be so broad as to be meaningless. I'm not sure it even includes such genetic differences as B & T cell diversity (which does involve DNA differences, so would strictly follow the definition). TedTalk/Contributions 14:01, 6 October 2006 (UTC)

Good article nomination

My suggestions:

  • In section Chromosomes, a citation is needed.
  • "The evolutionary branch between the human and mouse, for example, occurred 70-90 million years ago." (a source would be useful)
  • "Protein-coding sequences (specifically exons) comprise less than 1.5% of the human genome." (Source?)

Anyway, it seems good to me now. NCurse work 14:49, 1 October 2006 (UTC)

Two done (struck out). - Samsara (talkcontribs) 15:04, 1 October 2006 (UTC)
The citation for X-inactivation is an article dealing with imprinting and inactivation. While it is interesting work, it isn't relevant to the general concept of X-inactivation, and even less relevant to "female mosaicism." Lyon's original work (1961) would be better, but I'm working on getting the paragraph deleted as irrelevant. Or, if you want a more modern reference, then maybe a reference on the XIST gene, possibly referencing the general treatment of Therman from 1974, or, specifically, Brown in 1991 for the XIST gene itself. TedTalk/Contributions 16:31, 1 October 2006 (UTC)
I'm not bothered. You know more about it than I. - Samsara (talkcontribs) 23:16, 1 October 2006 (UTC)


The article "Estimation of divergence times from multiprotein sequences for a few mammalian species and several distantly related organisms" advocates a method that estimates the rodent-human divergence at 96 million years ago, and this article discusses other estimates that range from 82 MYBP to over 100 MYBP. --JWSchmidt 15:21, 1 October 2006 (UTC)

It meets good article standards, no question about that. Nearly every statement is referenced. Well done! NCurse work 16:29, 1 October 2006 (UTC)

New human gene map shows unexpected differences

Story here. Maybe it's interesting material for the article. —msikma <user_talk:msikma> 06:59, 23 November 2006 (UTC)

A better source would be the article which is located here. --WS 12:57, 23 November 2006 (UTC)

Peer-review of DNA

Hi there. I wondered if the contributors to this page might have some input to this article. TimVickers 22:40, 24 December 2006 (UTC)

Change to Chromosomes section

The statement under the Chromosomes section... "Somatic cells usually have one copy of chromosomes 1-22 from each parent, plus an X chromosome from the mother, and either an X or Y chromosome from the father, for a total of 46" does not make sense to me. I am not a biology guy, but I do consider myself capable of understanding things when I read them. And this statement doesn't make sense to me, perhaps a re-write would be useful.Thoughtbox 18:38, 27 February 2007 (UTC)

LINEs and SINEs both Retrotransposons and Interspersed repeats?

I tried to draw up a short summary of the different parts that make up the genome. As has become clear, LINEs and SINEs appear in both the Retrotransposons section and the Interspersed repeats section. Could this be true? I would appreciate it if somebody arranged the articles so that LINEs and SINEs are placed under only one type. Thank you. Mortsggah 18:13, 6 April 2007 (UTC)