Race and multilocus allele clusters

From Wikipedia, the free encyclopedia

Human population structure can be inferred from multilocus DNA sequence data. In Rosenberg et al. (2002, 2005), individuals from 52 populations were examined at 993 DNA markers. These data were used to partition individuals into K = 2, 3, 4, 5 or 6 clusters. In this figure, the average fractional membership of individuals from each population is represented by horizontal bars partitioned into K colored segments. The 2-cluster analysis separated Africa and Eurasia from East Asia, Oceania, and America; 3 clusters separated Africa from Eurasia; 4 clusters separated America; 5 clusters separated Oceania (green); and 6 clusters subdivided Native Americans.

Racial distinctions are generally made on the basis of skin color, facial features, inferred ancestry, national origin and self-identification. Ongoing debate exists over the merit of the concept of 'race', especially from the perspective of genetics. Many scientists argue that common racial classifications are insufficient, inaccurate, or biologically meaningless.[1] For example, Lewontin (1972) argues that there is no biological basis for race on the basis of research indicating that more genetic variation exists within such races than between them. However, some geneticists have claimed that many of these "well-intentioned"[2] statements are false and do not "derive from an objective scientific perspective."[3] They argue instead "that from both an objective and scientific (genetic and epidemiologic) perspective there is great validity in racial/ethnic self-categorizations, both from the research and public policy points of view."[3] It is well known that many alleles vary in frequency across human populations.

Distribution of variation

A thorough description of the differences in patterns of genetic variation between humans and other species awaits additional genetic studies of human populations and nonhuman species. But the data gathered to date suggest that human variation exhibits several distinctive characteristics. First, compared with many other mammalian species, humans are genetically less diverse—a counterintuitive finding, given our large population and worldwide distribution (Li and Sadler 1991; Kaessmann et al. 2001). For example, the chimpanzee subspecies living just in central and western Africa have higher levels of diversity than do humans (Ebersberger et al. 2002; Yu et al. 2003; Fischer et al. 2004).

Two random humans are expected to differ at approximately 1 in 1000 nucleotides, whereas two random chimpanzees differ at approximately 1 in 500 nucleotides. Therefore, with a genome of approximately 3 billion nucleotides, two humans differ on average at approximately 3 million nucleotides. Most of these single nucleotide polymorphisms (SNPs) are neutral, but some are functional and influence the phenotypic differences between humans. It is estimated that about 10 million SNPs exist in human populations, where the rarer SNP allele has a frequency of at least 1% (see International HapMap Project).
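
The arithmetic behind these figures can be checked with a short sketch. The genome size and the per-site rates are the approximate values quoted above; the function and variable names are illustrative only.

```python
# Approximate values quoted in the text above.
GENOME_SIZE = 3_000_000_000   # nucleotides in the human genome (approximate)
HUMAN_ONE_IN = 1000           # two random humans differ at about 1 in 1000 sites
CHIMP_ONE_IN = 500            # two random chimpanzees differ at about 1 in 500 sites


def expected_differences(genome_size: int, one_in: int) -> int:
    """Expected number of differing sites if 1 in `one_in` sites differ."""
    return genome_size // one_in


human_differences = expected_differences(GENOME_SIZE, HUMAN_ONE_IN)

# Ratio of per-site diversity (chimpanzee relative to human).
diversity_ratio = HUMAN_ONE_IN / CHIMP_ONE_IN

print(human_differences)  # 3000000
print(diversity_ratio)    # 2.0
```

On these rounded figures, chimpanzees carry about twice the per-site diversity of humans.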

The distribution of variants within and among human populations also differs from that of many other species. The details of this distribution are impossible to describe succinctly because of the difficulty of defining a "population," the clinal nature of variation, and heterogeneity across the genome (Long and Kittles 2003). In general, however, 5%–15% of genetic variation occurs between large groups living on different continents, with the remaining majority of the variation occurring within such groups (Lewontin 1972; Jorde et al. 2000a; Hinds et al. 2005). This distribution of genetic variation differs from the pattern seen in many other mammalian species, for which existing data suggest greater differentiation between groups (Templeton 1998; Kittles and Weiss 2003).
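
The share of variation that lies between groups is commonly summarized with the fixation index F_ST, which compares the expected heterozygosity of the pooled population with the average heterozygosity within groups. The following is a minimal sketch for a single biallelic locus, assuming equal-sized groups; the allele frequencies are hypothetical and chosen only for illustration, not taken from any of the cited studies.

```python
def heterozygosity(p: float) -> float:
    """Expected heterozygosity 2p(1 - p) at a biallelic locus."""
    return 2 * p * (1 - p)


def fst(group_freqs: list[float]) -> float:
    """F_ST = (H_T - H_S) / H_T for one locus, assuming equal-sized groups."""
    mean_p = sum(group_freqs) / len(group_freqs)
    h_total = heterozygosity(mean_p)
    if h_total == 0:
        raise ValueError("locus is not polymorphic in the pooled sample")
    h_within = sum(heterozygosity(p) for p in group_freqs) / len(group_freqs)
    return (h_total - h_within) / h_total


# Hypothetical allele frequencies in three continental groups.
example_freqs = [0.3, 0.5, 0.7]
between_group_share = fst(example_freqs)
print(f"{between_group_share:.3f}")  # 0.107
```

In this made-up example about 11% of the variation lies between groups and about 89% within them, which falls inside the 5%–15% range quoted above.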

Our history as a species also has left genetic signals in regional populations. For example, in addition to having higher levels of genetic diversity, populations in Africa tend to have lower amounts of linkage disequilibrium than do populations outside Africa, partly because of the larger size of human populations in Africa over the course of human history and partly because the number of modern humans who left Africa to colonize the rest of the world appears to have been relatively low (Gabriel et al. 2002). In contrast, populations that have undergone dramatic size reductions or rapid expansions in the past and populations formed by the mixture of previously separate ancestral groups can have unusually high levels of linkage disequilibrium (Nordborg and Tavare 2002).

In the field of population genetics, it is believed that the distribution of neutral polymorphisms among contemporary humans reflects human demographic history. Humans are thought to have passed through a population bottleneck before a rapid expansion coinciding with migrations out of Africa, leading to an African-Eurasian divergence around 100,000 years ago (ca. 5,000 generations), followed by a European-Asian divergence about 40,000 years ago (ca. 2,000 generations).

The rapid expansion of a previously small population has two important effects on the distribution of genetic variation. First, the so-called founder effect occurs when founder populations bring only a subset of the genetic variation from their ancestral population. Second, as founders become more geographically separated, the probability that two individuals from different founder populations will mate becomes smaller. The effect of this geographic restriction on mating is to reduce gene flow between geographical groups, and to increase the genetic distance between groups. The expansion of humans from Africa affected the distribution of genetic variation in two other ways. First, smaller (founder) populations experience greater genetic drift, because random fluctuations in the frequencies of neutral polymorphisms are larger in small populations. Second, new polymorphisms that arose in one group were less likely to be transmitted to other groups as gene flow was restricted.
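
The stronger effect of drift in small populations can be illustrated with the standard expectation for an idealized, randomly mating diploid population of constant size N: expected heterozygosity decays by a factor of (1 - 1/(2N)) per generation. The sketch below assumes no mutation, migration or selection. The population sizes and starting heterozygosity are hypothetical; only the span of 2,000 generations echoes a figure used in this article.

```python
def expected_heterozygosity(h0: float, pop_size: int, generations: int) -> float:
    """Expected heterozygosity after drift alone: H_t = H_0 * (1 - 1/(2N))**t."""
    return h0 * (1 - 1 / (2 * pop_size)) ** generations


H0 = 0.5            # hypothetical starting heterozygosity
GENERATIONS = 2000  # roughly the span quoted for the European-Asian divergence

small_pop = expected_heterozygosity(H0, pop_size=100, generations=GENERATIONS)
large_pop = expected_heterozygosity(H0, pop_size=10_000, generations=GENERATIONS)

print(f"N=100:    {small_pop:.6f}")
print(f"N=10000:  {large_pop:.6f}")
```

Under these assumptions the small population loses almost all of its heterozygosity over this period, while the large population retains roughly 90% of it.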

Many other geographic, climatic, and historical factors have contributed to the patterns of human genetic variation seen in the world today. For example, population processes associated with colonization, periods of geographic isolation, socially reinforced endogamy, and natural selection all have affected allele frequencies in certain populations (Jorde et al. 2000b; Bamshad and Wooding 2003). In general, however, the recency of our common ancestry and continual gene flow among human groups have limited genetic differentiation in our species.

Substructure in the human population

Triangle plot shows average admixture of five North American ethnic groups. Individuals that self-identify with each group can be found at many locations on the map, but on average groups tend to cluster differently.

New data on human genetic variation have reignited the debate surrounding race. Most of the controversy surrounds the question of how to interpret these new data, and whether conclusions based on existing data are sound. A large majority of researchers endorse the view that continental groups do not constitute different subspecies. However, other researchers still debate whether evolutionary lineages should rightly be called "races". These questions are particularly pressing for biomedicine, where self-described race is often used as an indicator of ancestry (see race in biomedicine below).

Although the genetic differences among human groups are relatively small, differences in certain genes, such as Duffy, ABCC11, and SLC24A5, known as ancestry-informative markers (AIMs), can nevertheless be used to reliably situate many individuals within broad, geographically based groupings or self-identified race. For example, computer analyses of hundreds of polymorphic loci sampled in globally distributed populations have revealed the existence of genetic clustering that roughly is associated with groups that historically have occupied large continental and subcontinental regions (Rosenberg et al. 2002; Bamshad et al. 2003).
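
To show how markers whose allele frequencies differ between groups can place an individual, here is a minimal sketch of a likelihood-based assignment. It assumes Hardy-Weinberg proportions and independent loci, and it is not the clustering model used in the cited studies. The two populations, their allele frequencies and the genotype are hypothetical.

```python
import math


def genotype_log_likelihood(genotype: list[int], freqs: list[float]) -> float:
    """Log-likelihood of a multilocus genotype under Hardy-Weinberg proportions.

    genotype: copies (0, 1 or 2) of the reference allele at each locus.
    freqs: reference-allele frequency at each locus, strictly between 0 and 1.
    """
    total = 0.0
    for copies, p in zip(genotype, freqs):
        q = 1 - p
        probability = {0: q * q, 1: 2 * p * q, 2: p * p}[copies]
        total += math.log(probability)
    return total


def assign(genotype: list[int], populations: dict[str, list[float]]) -> str:
    """Return the population under which the genotype is most likely."""
    return max(
        populations,
        key=lambda name: genotype_log_likelihood(genotype, populations[name]),
    )


# Hypothetical allele frequencies at three ancestry-informative markers.
populations = {
    "population_A": [0.9, 0.8, 0.2],
    "population_B": [0.2, 0.3, 0.7],
}
individual = [2, 1, 0]
best_match = assign(individual, populations)
print(best_match)  # population_A
```

With only a few markers, or with frequencies that differ little between groups, the two log-likelihoods are close and the assignment is uncertain.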

Some commentators have argued that these patterns of variation provide a biological justification for the use of traditional racial categories. They argue that the continental clusterings correspond roughly with the division of human beings into sub-Saharan Africans; Europeans, western Asians, and northern Africans; eastern Asians; Polynesians and other inhabitants of Oceania; and Native Americans (Risch et al. 2002). Other observers disagree, saying that the same data undercut traditional notions of racial groups (King and Motulsky 2002; Calafell 2003; Tishkoff and Kidd 2004). They point out, for example, that major populations considered races or subgroups within races do not necessarily form their own clusters. Thus, samples taken from India and Pakistan affiliate with Europeans or eastern Asians rather than separating into a distinct cluster.

Furthermore, because human genetic variation is clinal, many individuals affiliate with two or more continental groups. Thus, the genetically based "biogeographical ancestry" assigned to any given person generally will be broadly distributed and will be accompanied by sizable uncertainties (Pfaff et al. 2004).

In many parts of the world, groups have mixed in such a way that many individuals have relatively recent ancestors from widely separated regions. Although genetic analyses of large numbers of loci can produce estimates of the percentage of a person's ancestors coming from various continental populations (Shriver et al. 2003; Bamshad et al. 2004), these estimates may assume a false distinctiveness of the parental populations, since human groups have exchanged mates from local to continental scales throughout history (Cavalli-Sforza et al. 1994; Hoerder 2002). Even with large numbers of markers, information for estimating admixture proportions of individuals or groups is limited, and estimates typically will have wide confidence intervals (Pfaff et al. 2004).

Ancestry as a way of categorizing people

An alternative to the use of racial or ethnic categories is to categorize individuals in terms of ancestry. Ancestry may be defined geographically (e.g., Asian, sub-Saharan African, or northern European), geopolitically (e.g., Vietnamese, Zambian, or Norwegian), or culturally (e.g., Brahmin, Lemba, or Apache). The definition of ancestry may recognize a single predominant source or multiple sources. Ancestry can be ascribed to an individual by an observer, as was the case with the U.S. census prior to 1960; it can be identified by an individual from a list of possibilities or with use of terms drawn from that person's experience; or it can be calculated from genetic data by use of loci with allele frequencies that differ geographically, as described above. At least among those individuals who participate in biomedical research, genetic estimates of biogeographical ancestry generally agree with self-assessed ancestry (Tang et al. 2005), but in an unknown percentage of cases, they do not (Brodwin 2002; Kaplan 2003).

Genetic data can be used to infer population structure and assign individuals to groups that often correspond with their self-identified geographical ancestry. The inference of population structure from multilocus genotyping depends on the selection of a large number of informative genetic markers. These studies usually find that groups of humans living on the same continent are more similar to one another than to groups living on different continents. Many such studies are criticized for assigning group identity a priori. However, even if group labels are stripped and group identity is assigned a posteriori using only genetic data, population structure can still be inferred. For example, using 993 markers, Rosenberg et al. (2005) were able to assign 1,048 individuals from 52 populations around the globe to one of six genetic clusters, which correspond to major geographic regions.

However, in analyses that assign individuals to groups, it becomes less apparent that self-described racial groups are reliable indicators of ancestry. One cause of the reduced power of the assignment of individuals to groups is admixture. Some racial or ethnic groups, especially Hispanic groups, do not have homogeneous ancestry. For example, self-described African Americans tend to have a mix of West African and European ancestry. Shriver et al. (2003) found that on average African Americans have ~80% African ancestry. Likewise, many white Americans have mixed European and African ancestry: ~30% of whites have less than 90% European ancestry. In this context, it is becoming more commonplace to describe "race" as fractional ancestry. Without the use of genotyping, this has been approximated by the self-described ancestry of an individual's grandparents.
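
The grandparent-based approximation of fractional ancestry can be sketched as a simple tally. The ancestry labels and the example individual are hypothetical, and the sketch assumes each grandparent is described by a single label, which is itself a simplification.

```python
from collections import Counter
from fractions import Fraction


def fractional_ancestry(grandparents: list[str]) -> dict[str, Fraction]:
    """Share of grandparents reported under each self-described ancestry label."""
    counts = Counter(grandparents)
    total = len(grandparents)
    return {label: Fraction(count, total) for label, count in counts.items()}


# Hypothetical individual: three grandparents described as West African,
# one described as European.
reported = ["West African", "West African", "West African", "European"]
shares = fractional_ancestry(reported)
print(shares["West African"])  # 3/4
print(shares["European"])      # 1/4
```

Because each grandparent may also have mixed ancestry, this tally is coarser than a genotype-based estimate.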

Nevertheless, recent research indicates that self-described race is a near-perfect indicator of an individual's genetic profile, at least in the United States. Using 326 genetic markers, Tang et al. (2005) identified 4 genetic clusters among 3,636 individuals sampled from 15 locations in the United States, and were able to correctly assign individuals to groups that correspond with their self-described race (white, African American, East Asian, or Hispanic) for all but 5 individuals (an error rate of 0.14%). They conclude that ancient ancestry, which correlates tightly with self-described race and not current residence, is the major determinant of genetic structure in the U.S. population.
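
The quoted error rate follows directly from the counts reported above, as this short check shows. The variable names are illustrative.

```python
# Counts quoted in the text for Tang et al. (2005).
MISASSIGNED = 5
TOTAL_INDIVIDUALS = 3636

error_rate_percent = 100 * MISASSIGNED / TOTAL_INDIVIDUALS
print(f"{error_rate_percent:.2f}%")  # 0.14%
```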

Genetic techniques that distinguish ancestry between continents can also be used to describe ancestry within continents. However, the study of intra-continental ancestry may require a greater number of informative markers. Populations from neighboring geographic regions typically share more recent common ancestors. As a result, allele frequencies will be correlated between these groups. This phenomenon is often seen as a cline of allele frequencies. The existence of allelic clines has been offered as evidence that individuals cannot be allocated into genetic clusters (Kittles & Weiss 2003). However, others argue that low levels of differentiation between groups merely make the assignment to groups more difficult, not impossible (Bamshad et al. 2004).

Despite its seemingly objective nature, ancestry also has limitations as a way of categorizing people (Elliott and Brodwin 2002). When asked about the ancestry of their parents and grandparents, many people cannot provide accurate answers. In one series of focus groups in the state of Georgia, 40% of ∼100 respondents said they did not know one or more of their four grandparents well enough to be certain how that person(s) would identify racially (Condit et al. 2003). Misattributed paternity or adoption can separate biogeographical ancestry from socially defined ancestry. Furthermore, the exponentially increasing number of our ancestors makes ancestry a quantitative rather than qualitative trait—5 centuries (or 20 generations) ago, each person had a maximum of >1 million ancestors (Ohno 1996). To complicate matters further, recent analyses suggest that everyone living today has exactly the same set of genealogical ancestors who lived as recently as a few thousand years in the past, although we have received our genetic inheritance in different proportions from those ancestors (Rohde et al. 2004).
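
The ancestor count quoted above is a doubling calculation: each generation back doubles the number of ancestral slots in a pedigree. The sketch below reproduces it. The figure is an upper bound, since the same person can fill many slots in a pedigree.

```python
def max_ancestors(generations_back: int) -> int:
    """Maximum number of distinct ancestors a given number of generations back."""
    return 2 ** generations_back


YEARS = 500        # five centuries, as in the text
GENERATIONS = 20   # generation count quoted in the text

ancestors = max_ancestors(GENERATIONS)
years_per_generation = YEARS / GENERATIONS

print(ancestors)             # 1048576
print(years_per_generation)  # 25.0
```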

Opponents of racial groupings argue that a distinct difference is only one of the two conditions for racial classification; the second condition is a lack of significant gene flow between populations. Cultural anthropologists believe humans to be monotypic because they argue races gradually fade into one another in many parts of the world. Although there has historically been little or no gene flow between some human populations such as the aboriginal Australians and black Africans, they argue, one cannot assume there has been little interracial gene flow, as the interbreeding of locally adjacent populations may also produce common traits. Some researchers report enough such gene flow has occurred that the most recent common ancestor of all humans alive today has been estimated as living as recently as 3,500 years ago [2], although critics say this is not necessarily significant gene flow (Rohde et al. 2004). Intercontinental travel has caused increased gene flow between geographically distant human populations. In some regions, this has caused racial lines to fade or perhaps disappear, particularly in Latin America and parts of Southern Africa.

The delicacy of this definition has left the issue much in debate, especially among physical anthropologists, for if clines lead to large areas of near-homogeneity, such as Kenya, Sweden and China, then the people in these areas seem marked off by delimiters resembling nothing so much as the traditional physiological touchstones of "race". Currently, the question of whether human genetic variation is better described as clinal (i.e. no races) or cladistic (i.e. races are real) is largely fading.

A problem arises in distinguishing black Africans as a racial group: the grouping fails because it is a paraphyletic classification. In other words, under a phylogenetic classification, considering black Africans as a single racial group would require one to include every living person on Earth within that single African "race", because the genetic variation of the rest of the world represents essentially a single subtree within that of Africa. Also, it has long been known that groups such as the Khoisan were as different from other sub-Saharan groups as are Europeans and Asians (though even with the Khoisan the distinction is no longer so clear-cut, as a large amount of intermarriage with both Europeans and Bantu-language speakers has occurred over the last three centuries).

Rachel Caspari (2003) argued that clades are by definition monophyletic groups (a taxon that includes all descendants of a given ancestor); since races are not monophyletic, they cannot be clades.

In the end, the terms "race," "ethnicity," and "ancestry" all describe just a small part of the complex web of biological and social connections that link individuals and groups to each other.

Notes

  1. ^ Sternberg et al. 2005, Suzuki and Aronson 2005, Smedley and Smedley 2005, Helms et al. 2005, [1]
  2. ^ Collins 2004
  3. ^ a b Risch et al. 2002