Genetic diversity in India and the inference of Eurasian population expansion
© Xing et al.; licensee BioMed Central Ltd. 2010
Received: 21 July 2010
Accepted: 24 November 2010
Published: 24 November 2010
Genetic studies of populations from the Indian subcontinent are of great interest because of India's large population size, complex demographic history, and unique social structure. Despite recent large-scale efforts in discovering human genetic variation, India's vast reservoir of genetic diversity remains largely unexplored.
To analyze an unbiased sample of genetic diversity in India and to investigate human migration history in Eurasia, we resequenced one 100-kb ENCODE region in 92 samples collected from three castes and one tribal group from the state of Andhra Pradesh in south India. Analyses of the four Indian populations, along with eight HapMap populations (692 samples), showed that 30% of all SNPs in the south Indian populations are not seen in HapMap populations. Several Indian populations, such as the Yadava, Mala/Madiga, and Irula, have nucleotide diversity levels as high as those of HapMap African populations. Using unbiased allele-frequency spectra, we investigated the expansion of human populations into Eurasia. The divergence time estimates among the major population groups suggest that Eurasian populations in this study diverged from Africans during the same time frame (approximately 90 to 110 thousand years ago). The divergence among different Eurasian populations occurred more than 40,000 years after their divergence with Africans.
Our results show that Indian populations harbor large amounts of genetic variation that have not been surveyed adequately by public SNP discovery efforts. Our data also support a delayed expansion hypothesis in which an ancestral Eurasian founding population remained isolated long after the out-of-Africa diaspora, before expanding throughout Eurasia.
The Indian subcontinent is currently populated by more than one billion people who belong to thousands of linguistic and ethnic groups [1, 2]. Genetic and anthropological studies have shown that the peopling of the subcontinent is characterized by a complex history, with contributions from different ancestral populations [2–5]. Studies of maternal lineages by mitochondrial resequencing have shown that the two major mitochondrial lineages that emerged from Africa (haplogroups M and N, dating to approximately 60 thousand years ago (kya)) are both very diverse among Indian populations [6, 7]. Additional studies of mitochondrial haplogroups show that an early migration may have populated the Indian subcontinent, leaving 'relic' populations in present-day India represented by some Austroasiatic- and Dravidian-speaking tribal populations [7–10]. These results highlight that the initial peopling of the Indian subcontinent likely occurred early in the history of anatomically modern humans. Concordant with the mitochondrial DNA (mtDNA) data, paternal lineages within India also show high diversity based on short tandem repeat (STR) markers on the Y chromosome and support an early and continuous presence of populations on the subcontinent. Recent studies of autosomal SNPs and STRs also demonstrate a high degree of genetic differentiation among Indian ethnic and linguistic groups [12–14].
The high diversity and the deep mitochondrial lineages in India support the hypothesis that Eurasia was initially populated by two major out-of-Africa migration routes [3, 15–17]. Populations migrating along an early 'southern route' originated from the Horn of Africa, crossed the mouth of the Red Sea into the Arabian Peninsula, and subsequently migrated into India, Southeast Asia, and Australia. Later, populations migrated out of Africa along a 'northern route' from northern Africa into the Middle East and subsequently populated Eurasia. A recent study suggests that a population ancestral to all Eurasians had limited admixture with Neanderthals after the out-of-Africa migration event but prior to either of the two major Eurasian migrations. This scenario, which we termed the 'delayed expansion' hypothesis, predicts that the ancestral Eurasian population separated from African populations long before the expansion into Eurasia. However, the long-term existence of such an ancestral Eurasian population has never been documented. This hypothesis can be tested by using DNA sequence data to examine the demographic history of African populations and a diverse array of Eurasian populations, including previously under-represented samples from South Asia.
Recently, insights into population structure were gained from analyses of data from high-density SNP arrays [13, 19–26]. Although high-density SNP genotypes are useful for assessing population structure, quantitative analyses of demographic history depend critically on the patterns of variation represented not just by common SNPs (minor allele frequency ≥ 0.05) contained in genotyping SNP panels, but also by rare variants (minor allele frequency < 0.05) that have not been thoroughly characterized to date. Furthermore, most SNPs present on the high-density SNP genotyping platforms have been ascertained in an analytically intractable and ad hoc fashion. A lack of unbiased polymorphism data limits our ability to accurately estimate the genetic diversity level found in the Indian subcontinent and to correctly infer demographic parameters, such as effective population size, migration rate, and date of population origin and divergence. In addition, despite the large amount of genetic diversity suggested by Y-chromosome, mtDNA, and autosomal microarray analyses, Indian genetic diversity remains largely unexplored by previous large-scale human variant discovery efforts (for example, HapMap and PopRes).
To overcome the limitations and biases associated with SNP microarrays, we used the PCR-Sanger sequencing method to resequence a 100-kb ENCODE region in 92 Indian samples from four population groups (three castes and one tribal population) from the south Indian state of Andhra Pradesh and combined our results with eight HapMap populations that were resequenced for the same region. By examining the complete distribution of rare and common variants in several populations that are not included in HapMap/ENCODE studies, we assess the additional information that can be gained by sampling more diverse populations, especially in geographic regions with little or no coverage. Furthermore, using resequencing data from 12 populations covering Africa, Europe, India, and East Asia, we are able to obtain accurate estimates of parameters such as ancestral population sizes and divergence dates and to test the 'delayed expansion' hypothesis of Eurasian population history.
ENCODE region selection and SNP discoveries
To generate a comparable dataset, we applied the same SNP calling criteria on 722 HapMap individuals who were sequenced using the same protocol in the ENCODE3 project. We then merged these two datasets (four Indian populations and eight HapMap populations (CEU, CHB, CHD, GIH, JPT, LWK, TSI, and YRI)) to obtain a final data set that consists of 1,484 SNPs in 722 individuals from 12 populations (see Materials and methods for SNP merging and filtering details).
Among the 1,484 total SNPs, 234 (15.8%) are specific to Indian populations (four Andhra Pradesh populations and the HapMap northern Indian GIH; Figure 1b). For Indian individuals, the average number of specific SNPs per individual is 1.5. This number is lower than in HapMap African individuals (2.4 SNPs), but higher than both HapMap European (1.3 SNPs) and HapMap East Asian individuals (1.1 SNPs). This result suggests that higher autosomal genetic diversity is harbored in Indian samples compared to other HapMap Eurasian samples. Among the 453 SNPs in the four newly sequenced south Indian populations, 137 (30%) are not present in any HapMap populations (Figure 1c), including one novel non-synonymous singleton variant (Supplemental text in Additional file 1).
Genetic diversity in India
Genetic diversity in continental groups and populations
At the population level, π and H indicate that some Indian populations have diversity levels comparable to or even higher than those of HapMap African populations. Specifically, Mala/Madiga, Yadava, and Irula have the highest π among all populations (84.46 × 10⁻⁵, 88.94 × 10⁻⁵, and 82.77 × 10⁻⁵, respectively). In contrast, Brahmins and HapMap GIH have lower diversity levels, comparable to HapMap European and East Asian populations (Table 1). Due to small sample sizes, the confidence intervals of π for all populations overlap. However, at the continental level, Indians have significantly higher nucleotide diversity than Europeans and East Asians, although θ and haplotype diversity are similar among the three groups (Table 1). Removal of unconfirmed genotypes in Indian individuals does not change the results (Supplemental text and Supplemental Table S3 in Additional file 1).
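As an illustration of the statistic reported above, the following minimal sketch computes per-site nucleotide diversity (π) as the average number of pairwise differences per site. It is not the Population Genetics and Evolution Toolbox implementation used in this study; the function name, the input layout (one allele count per SNP), and the toy numbers are assumptions made for the example.

```python
def nucleotide_diversity(allele_counts, n_chromosomes, region_length):
    """Per-site nucleotide diversity (pi) for one population.

    allele_counts  -- for each SNP, the number of sampled chromosomes
                      carrying one of the two alleles (hypothetical input)
    n_chromosomes  -- number of sampled chromosomes (2 x diploid individuals)
    region_length  -- number of sequenced sites in the region
    """
    if n_chromosomes < 2:
        raise ValueError("at least two chromosomes are required")
    n_pairs = n_chromosomes * (n_chromosomes - 1) / 2
    total = 0.0
    for k in allele_counts:
        # k * (n - k) chromosome pairs differ at this SNP
        total += k * (n_chromosomes - k) / n_pairs
    return total / region_length


# Toy data (not from the study): three SNPs in 20 chromosomes over 1,000 sites
toy_pi = nucleotide_diversity([1, 5, 10], n_chromosomes=20, region_length=1000)
```

Because every SNP contributes in proportion to k(n − k), intermediate-frequency variants dominate π, whereas singletons add little. This is one reason π and θ can rank populations differently.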
Demographic history of Eurasian populations
Pairwise F ST values (%) between and among continental groups
Pairwise F ST values (%) between Indian and HapMap non-Indian populations
The complete sequence data allow us to obtain an accurate derived-allele frequency (DAF) spectrum. At both the continental and population levels, the DAF spectra in our dataset are characterized by a high proportion of low-frequency SNPs, as expected for sequencing data (Supplemental text and Supplemental Figure S3 in Additional file 1). Based on the DAF spectra, we are able to infer the parameters associated with Indian population history, such as the divergence time, effective size, and migration rate between populations using the program ∂a∂i (Diffusion Approximation for Demographic Inference).
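To make the DAF spectrum concrete, the sketch below tabulates a single-population spectrum from per-SNP derived-allele counts. It is a simplified illustration only: it is not the ∂a∂i data pipeline, and the function name and toy counts are assumptions.

```python
from collections import Counter


def derived_allele_spectrum(derived_counts, n_chromosomes):
    """Unfolded DAF spectrum for one population.

    Returns a list whose entry i - 1 is the number of SNPs at which the
    derived allele is carried by exactly i of the n sampled chromosomes,
    for i = 1 .. n - 1. Sites with 0 or n derived copies are not
    polymorphic in the sample and are excluded.
    """
    counts = Counter(derived_counts)
    return [counts.get(i, 0) for i in range(1, n_chromosomes)]


# Toy data (not from the study): seven sites typed in six chromosomes
toy_spectrum = derived_allele_spectrum([1, 1, 2, 5, 1, 0, 6], n_chromosomes=6)
```

A spectrum built from complete sequence data has a large first entry (singletons), which is the excess of low-frequency SNPs described above. Array-based genotypes typically under-represent that class.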
[Table: ∂a∂i-inferred parameters for the three-population out-of-Africa model. Parameters: T Af (kya), T B (kya), T 1-2 (kya)]
[Table: ∂a∂i-inferred parameters for the four-population out-of-Africa model. Column: East Asia first. Parameters: N Af, N B, T Af (kya), T B (kya), T C (kya), T 2-3 (kya)]
When individual populations are analyzed, the patterns are largely consistent with the results from continental groups (Supplemental text and Supplemental Table S4 in Additional file 1). The CIs around the parameters are generally larger, indicating a loss of power due to the smaller sample sizes of the individual populations compared to the continental groups.
India has served as a major passageway for the dispersal of modern humans, and Indian demographics have been influenced by multiple waves of human migrations [3, 9, 33]. Because of its long history of human settlement and its enormous social, linguistic, and cultural diversity, the population history of India has long intrigued anthropologists and human geneticists [3, 12–14, 20, 34, 35]. A better understanding of Indian genetic diversity and population history can provide new insights into early migration patterns that may have influenced the evolution of modern humans.
By sampling and resequencing 92 south Indian individuals, we found 137 novel SNPs in the 100-kb region. These new SNPs represent approximately 30% of the total SNPs in these individuals. This result is consistent with several previous studies that showed that genetic variants in Indian populations, especially the less common variants, are incompletely captured by HapMap populations [12, 29, 36]. More importantly, we found that genetic diversity varies substantially among Indian populations. At the continental level, the Indian continental group has significantly higher nucleotide diversity than both European and East Asian groups. Although the HapMap GIH and the Brahmin populations have genetic diversity values comparable to those of other HapMap Eurasian populations, diversity values (π and H) in the Irula, Mala/Madiga, and Yadava samples are higher than those of the HapMap African populations. The genetic diversity difference among Indian populations has been observed previously in mitochondrial, autosomal, and Y-chromosome studies. Even among geographically proximate populations, genetic diversity can vary greatly due to differences in effective population sizes, mating patterns, and population history among these populations. Our finding highlights the importance of including multiple Indian populations in the human genetic diversity discovery effort.
Although this Eurasian ancestral population would have been isolated from the sub-Saharan African populations in this study, the geographic location of this population is uncertain. The most plausible location is the Middle East and/or northern Africa. A Middle East location of this population could explain the admixture patterns between Neanderthals and non-African populations, although current archeological evidence does not support continuous occupation of the Middle East by modern humans prior to the Eurasian expansion. Alternatively, a north African location is more consistent with the archeological record but requires extreme population stratification within Africa. A more comprehensive sampling of African populations could help to pinpoint the location of this population.
Under the four-population out-of-Africa model, the divergence times among the three Eurasian continental groups are similar. The likelihood of the model with an earlier East Asian divergence is similar to that of the model with an earlier Indian divergence. This result appears to contradict the hypothesis that the Indian subcontinent was first populated by an early 'southern-route' migration through the Arabian Peninsula [3, 15–17]. Previous studies have identified unique mitochondrial M haplogroups in some tribal populations that are consistent with an older wave of migration [7–9]. For example, some Dravidian- and Austroasiatic-speaking Indian tribal populations share ancestral markers with Australian Aborigines on a mitochondrial M haplogroup (M42), which is dated to approximately 55 kya. However, because our samples of the Indian continental group are composed of three caste populations and one tribal Indian population, these populations are unlikely to effectively represent the descendants of the early 'southern-route' migration event. This sample collection might partially explain why we were unable to distinguish the 'East Asia first' model from the 'India first' model.
The between-population F ST estimates and divergence time estimates show that the Indian populations have different affinities to European and East Asian populations. South Indian Brahmin and northern Indian GIH have higher affinity to Europeans than to East Asians, while the tribal Irula generally have closer affinity to East Asian populations. The differential population affinities of Indian populations to other Eurasian populations have been observed previously using mtDNA, Y-chromosome, and autosomal markers. Regardless of caste affiliation, genetic distance estimates with mitochondrial markers showed a greater affinity of south Indian castes to East Asians, while distance estimates with Y-chromosome markers showed greater affinity of Indian castes to Europeans [14, 41, 42]. Distances estimated from autosomal STRs and SNPs also showed differential affinity of caste populations to European and East Asian populations [12–14, 20].
There are some limitations on our ability to infer demographic history in this study. First, our results are based on the sequence of a continuous 100-kb region. Therefore, these results reflect the history of a number of possibly co-segregating markers from a small portion of the genome. Our CIs around the parameter estimates, however, account for this co-segregation. Second, although we incorporated a number of parameters of population history, our demographic model is still a simplification of the true population history. Third, parameters estimated in our model are dependent on the estimate of the human mutation rate, which varies several-fold using different methods or datasets [43, 44]. Nevertheless, with appropriate caution, the sequence data allow us to explore demographic models in ways that are not possible with genotype data alone.
By sequencing a 100-kb autosomal region, we show that Indian populations harbor large amounts of genetic variation that have not been surveyed adequately by public SNP discovery efforts. In addition, our results strongly support the existence of an ancestral Eurasian population that remained separated from African populations for a long period of time before a major population expansion throughout Eurasia. With the rapid development of sequencing technologies, in the near future we will obtain exome and whole-genome data sets from many diverse populations, such as isolated Indian tribal groups who might better represent the descendants of a 'southern-route' migration event. These data will allow us to evaluate more complex models and refine the demographic history of the human Eurasian expansion.
Materials and methods
DNA samples, DNA sequencing and SNP calling
Ninety-four individuals from three caste groups and one tribal group from Andhra Pradesh, India were sampled (Figure 1a). All samples belong to the Dravidian language family and were collected as unrelated individuals as described previously [45, 46]. All studies of South Indian populations were performed with approval of the Institutional Review Board of the University of Utah and Andhra University, India. To sequence the ENCODE region ENr123, we used the same primer sets for PCR amplification and the same Sanger sequencing protocol as the ENCODE3 project. Next, we obtained the sequence of 722 HapMap individuals from the ENCODE3 project and performed SNP calling using the same SNP discovery pipeline. This experimental design allowed us to directly compare genetic variation patterns observed in these Indian populations with those observed in the HapMap populations studied by ENCODE3. The sequence traces of the Indian samples generated from this study can be accessed at the NCBI trace archive by submitting the query: center_project = 'RHIDZ'.
SNPs and individual selection
After the SNP-calling process, two individuals with less than 80% call rates were removed from the dataset (one Brahmin and one Yadava). The SNP calls from the remaining 92 samples that passed quality control were then combined with the SNP calls from eight HapMap non-admixed populations studied by ENCODE3, including individuals from the Centre d'Etude du Polymorphisme Humain collection in Utah, USA, with ancestry from Northern and Western Europe (CEU), Han Chinese in Beijing, China (CHB), Japanese in Tokyo, Japan (JPT), Yoruba in Ibadan, Nigeria (YRI), Chinese in Metropolitan Denver, CO, USA (CHD), Gujarati Indians in Houston, TX, USA (GIH), Luhya in Webuye, Kenya (LWK), and Toscani in Italy (TSI), to create a final dataset containing 722 individuals from 12 populations.
After merging the HapMap and the south Indian data sets, 112 loci that are fixed in all 12 populations were removed from the dataset. Thirteen tri-allelic SNPs were also removed because most analyses in this study are designed for bi-allelic SNPs. For SNPs that are fixed in certain populations, genotypes were filled-in using the hg18 reference allele because the reference allele information was used in the SNP calling process (that is, only genotypes that are different from the reference alleles are called as SNPs).
The Hardy-Weinberg equilibrium test was performed on each of the 12 populations, and P-values from each test were obtained and transformed to Z-scores. The twelve Z-scores were combined into a single Z-score, which was transformed to a single P-value for each SNP. Bonferroni correction was used, and 48 SNPs that failed the test at the 0.01 level (P < 0.01/1,532) were removed. The ancestral/derived allele states of each SNP were determined using the human/chimpanzee alignment obtained from the UCSC database (hg18 vs. panTro2). Minor alleles of 17 SNPs were assigned as the derived allele because the derived allele could not be determined by human-chimpanzee alignments. Genotypes of all samples in the final dataset are available as a supplemental file on our website under Published Data.
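The combination step described above can be sketched as follows. The text does not state how the twelve Z-scores were combined, so this example assumes an unweighted Stouffer combination (sum of Z-scores divided by the square root of their number); the function names and the clamping constant are likewise assumptions. Only the Python standard library (version 3.8 or later) is used.

```python
from math import sqrt
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def combine_p_values(p_values, eps=1e-12):
    """Combine per-population P-values into one P-value for a SNP.

    Assumption: unweighted Stouffer method. Each P-value is clamped to
    [eps, 1 - eps] so that the inverse normal CDF is defined.
    """
    z_scores = []
    for p in p_values:
        p = min(max(p, eps), 1.0 - eps)
        z_scores.append(_STD_NORMAL.inv_cdf(1.0 - p))
    combined_z = sum(z_scores) / sqrt(len(z_scores))
    return 1.0 - _STD_NORMAL.cdf(combined_z)


def fails_bonferroni(p_value, n_tests, alpha=0.01):
    """True if the P-value is below the Bonferroni-corrected threshold."""
    return p_value < alpha / n_tests
```

With the 1,532 SNPs tested at the 0.01 level, the corrected threshold is 0.01/1,532, or about 6.5 × 10⁻⁶. A call such as `fails_bonferroni(combine_p_values(per_population_p), 1532)` would then flag a SNP for removal, where `per_population_p` is a hypothetical list of twelve P-values.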
For the 137 SNPs that are specific to our samples (that is, not present in any HapMap populations), we performed a validation experiment using an independent platform (Roche 454). When the minor allele is present in more than five individuals at a given locus, five individuals with the heterozygous genotype were randomly selected for validation. Among the 137 SNPs, we successfully designed and assayed 119 SNPs in 211 individual experiments. For the validation pipeline, we used PCR to amplify regions around the variants using the same primers as those used in the initial variant detection pipeline. In order to make genotype calls on all experiments simultaneously and also to reduce the cost of Roche 454 sequencing, we pooled PCR reactions in ten different pools, and each pool was sequenced using a quarter of a Roche Titanium 454 sequencing run. The analysis was done using the Atlas-SNP2 pipeline available at the BCM-HGSC. Reads from the 454 runs were anchored using BLAT to a unique spot in the genome, followed by refined alignment using the cross_match program. To make a validation call, we required at least 50 reads mapped to the variant site and a variant-read fraction of >15% of all reads mapping to that site.
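The two validation thresholds stated above (at least 50 mapped reads, and more than 15% of them carrying the variant) can be expressed as a small predicate. This is an illustrative sketch rather than the Atlas-SNP2 logic; the function and parameter names are hypothetical.

```python
def is_validated(total_reads, variant_reads, min_reads=50, min_fraction=0.15):
    """Apply the read-depth and variant-fraction criteria to one site.

    total_reads   -- number of 454 reads mapped to the variant site
    variant_reads -- number of those reads carrying the variant allele
    """
    if total_reads < min_reads:
        return False  # too few reads to make a validation call
    # The criterion is strict: the fraction must exceed the threshold
    return variant_reads / total_reads > min_fraction
```

For example, a site covered by 100 reads needs at least 16 variant reads to pass, and a site covered by only 40 reads cannot be called regardless of its variant fraction.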
Sequence statistics, F ST estimates, and PCA
Sequence-analysis statistics (S, θ, π, H and Tajima's D), and the confidence intervals for θ and π were calculated using the Population Genetics and Evolution Toolbox in MATLAB (version r2009a). To assess haplotype diversity, the dataset was phased using fastPHASE (version 1.2) with imputation, and the phased dataset was separated into ten 10-kb non-overlapping windows. Haplotype heterozygosity was then calculated for each window, and the mean heterozygosity for each population/continental group was calculated. For the SNP heterozygosity/geographic distance correlation analysis, the great-circle distance between each population and Addis Ababa, Ethiopia, a proposed point of modern human origin, was calculated. For populations that were collected from places other than their origins, an approximate origin location was used, such as Beijing, China for CHD, and Gujarat, India for GIH. F ST estimates between populations were calculated by the method described by Weir and Cockerham. Nei's genetic distances between populations were estimated from allele-frequency data as implemented in the PHYLIP software package, and PCA was performed using MATLAB.
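The great-circle distance used in the heterozygosity/geographic distance analysis can be computed, for example, with the haversine formula. The text does not specify which formula or Earth radius was used, so the haversine form, the mean radius of 6,371 km, and the approximate city coordinates below are all assumptions made for illustration.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_KM = 6371.0  # mean Earth radius (assumed value)


def great_circle_km(lat1, lon1, lat2, lon2):
    """Haversine great-circle distance in km between two points.

    Latitudes and longitudes are in decimal degrees (north and east positive).
    """
    phi1, phi2 = radians(lat1), radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = radians(lon2 - lon1)
    a = sin(d_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * asin(sqrt(a))


# Approximate coordinates, for illustration only
ADDIS_ABABA = (9.03, 38.74)
BEIJING = (39.90, 116.41)

distance_to_beijing = great_circle_km(*ADDIS_ABABA, *BEIJING)
```

A straight great-circle path ignores likely migration corridors. Some analyses of this kind instead route distances through waypoints, and the example above makes no attempt to do so.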
Demographic history inference
Demographic history parameters were inferred using the program ∂a∂i (version 1.5.2). Using a diffusion approximation to the allele-frequency spectrum, ∂a∂i implements a series of methods to infer population history based on sequence data. We compared three different three-population out-of-Africa models and three four-population out-of-Africa models to test the effect of adding different parameters to the model (Supplemental text and Supplemental Figures S5 and S6 in Additional file 1). For the two models used in the final analysis, the Python programs that were used to estimate the parameters, including the function calls, grid sizes, initial parameters, and parameter boundaries, are shown in Supplemental Figures S7 and S8 in Additional file 1. To ensure that the algorithm identified the optimal parameters, ten independent runs were performed on each model, and the parameter set with the highest likelihood was selected as the final result. For each model, 500 bootstrap replicates were performed on the dataset to obtain the confidence intervals. The per-generation mutation rate was estimated based on the human-chimpanzee divergence in this region (1.2%) using a previously described method, with a generation time of 25 years, a human-chimpanzee speciation time of 6 million years ago, and a human-chimpanzee ancestral effective population size of 84,000 (averaged from the estimates from [60–62]).
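The mutation-rate calculation can be illustrated with a standard neutral approximation, in which the expected sequence divergence d between two species equals 2μ(t + 2N), where t is the speciation time in generations and N is the ancestral effective population size. Whether this is exactly the method cited above is an assumption, so the resulting value should be read as illustrative rather than as the rate used in the analysis. The function name is hypothetical.

```python
def per_generation_mutation_rate(divergence, split_years,
                                 generation_years, ancestral_ne):
    """Per-site, per-generation mutation rate under d = 2 * mu * (t + 2N).

    divergence       -- per-site sequence divergence between the species
    split_years      -- speciation time in years
    generation_years -- generation time in years
    ancestral_ne     -- effective size of the ancestral population
    """
    split_generations = split_years / generation_years
    # Two lineages accumulate mutations over the split time plus the
    # expected coalescence time (2N generations) in the ancestor
    return divergence / (2 * (split_generations + 2 * ancestral_ne))


# Inputs taken from the text: 1.2% divergence, 6 million years, 25 years, 84,000
mu = per_generation_mutation_rate(0.012, 6e6, 25, 84_000)
```

Under these assumptions the rate is on the order of 10⁻⁸ per site per generation. Because ∂a∂i reports times and sizes in units scaled by the mutation rate, any change in this value rescales the divergence times proportionally.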
kya: thousand years ago
PCA: principal components analysis
SNP: single nucleotide polymorphism
STR: short tandem repeat.
We thank BVR Prasad, JM Naidu, and B Baskara Rao for help in collecting samples in Andhra Pradesh, India. We thank Lora R Lewis, David Wheeler, and Kyle Chang for assistance with resequencing and pipeline analysis. We also thank two anonymous reviewers for their constructive comments. This study was funded by the National Human Genome Research Institute, National Institutes of Health (5U54HG003273 and 1U01HG005211-01 to RG), and National Institutes of Health (GM-59290 to LBJ). JX is supported by the National Human Genome Research Institute, National Institutes of Health (K99HG005846). CH is supported by the University of Luxembourg-Institute for Systems Biology Program and the Primary Children's Medical Center Foundation National Institute of Diabetes and Digestive and Kidney Diseases (DK069513). Part of the computation for the project was performed at the Center for High Performance Computing, University of Utah.
- Singh KS: People of India: An Introduction. 1992, Calcutta Anthropological Survey of IndiaGoogle Scholar
- Chaubey G, Metspalu M, Kivisild T, Villems R: Peopling of South Asia: investigating the caste-tribe continuum in India. Bioessays. 2007, 29: 91-100. 10.1002/bies.20525.PubMedView ArticleGoogle Scholar
- Cavalli-Sforza LL, Menozzi P, Piazza A: The History and Geography of Human Genes. 1994, Princeton: Princeton University PressGoogle Scholar
- Thapar R: Early India. 2002, Berkeley: University of California PressGoogle Scholar
- Majumder PP: The human genetic history of South Asia. Curr Biol. 2010, 20: R184-187. 10.1016/j.cub.2009.11.053.PubMedView ArticleGoogle Scholar
- Palanichamy MG, Sun C, Agrawal S, Bandelt HJ, Kong QP, Khan F, Wang CY, Chaudhuri TK, Palla V, Zhang YP: Phylogeny of mitochondrial DNA macrohaplogroup N in India, based on complete sequencing: implications for the peopling of South Asia. Am J Hum Genet. 2004, 75: 966-978. 10.1086/425871.PubMedPubMed CentralView ArticleGoogle Scholar
- Chandrasekar A, Kumar S, Sreenath J, Sarkar BN, Urade BP, Mallick S, Bandopadhyay SS, Barua P, Barik SS, Basu D, Kiran U, Gangopadhyay P, Sahani R, Prasad BV, Gangopadhyay S, Lakshmi GR, Ravuri RR, Padmaja K, Venugopal PN, Sharma MB, Rao VR: Updating phylogeny of mitochondrial DNA macrohaplogroup m in India: dispersal of modern human in South Asian corridor. PLoS One. 2009, 4: e7447-10.1371/journal.pone.0007447.PubMedPubMed CentralView ArticleGoogle Scholar
- Kumar S, Padmanabham PB, Ravuri RR, Uttaravalli K, Koneru P, Mukherjee PA, Das B, Kotal M, Xaviour D, Saheb SY, Rao VR: The earliest settlers' antiquity and evolutionary history of Indian populations: evidence from M2 mtDNA lineage. BMC Evol Biol. 2008, 8: 230-10.1186/1471-2148-8-230.PubMedPubMed CentralView ArticleGoogle Scholar
- Macaulay V, Hill C, Achilli A, Rengo C, Clarke D, Meehan W, Blackburn J, Semino O, Scozzari R, Cruciani F, Taha A, Shaari NK, Raja JM, Ismail P, Zainuddin Z, Goodwin W, Bulbeck D, Bandelt HJ, Oppenheimer S, Torroni A, Richards M: Single, rapid coastal settlement of Asia revealed by analysis of complete mitochondrial genomes. Science. 2005, 308: 1034-1036. 10.1126/science.1109792.PubMedView ArticleGoogle Scholar
- Thangaraj K, Chaubey G, Kivisild T, Reddy AG, Singh VK, Rasalkar AA, Singh L: Reconstructing the origin of Andaman Islanders. Science. 2005, 308: 996-10.1126/science.1109987.PubMedView ArticleGoogle Scholar
- Sengupta S, Zhivotovsky LA, King R, Mehdi SQ, Edmonds CA, Chow CE, Lin AA, Mitra M, Sil SK, Ramesh A, Usha Rani MV, Thakur CM, Cavalli-Sforza LL, Majumder PP, Underhill PA: Polarity and temporality of high-resolution y-chromosome distributions in India identify both indigenous and exogenous expansions and reveal minor genetic influence of central asian pastoralists. Am J Hum Genet. 2006, 78: 202-221. 10.1086/499411.PubMedPubMed CentralView ArticleGoogle Scholar
- Indian Genome Variation Consortium: Genetic landscape of the people of India: a canvas for disease gene exploration. J Genet. 2008, 87: 3-20. 10.1007/s12041-008-0002-x.View ArticleGoogle Scholar
- Reich D, Thangaraj K, Patterson N, Price AL, Singh L: Reconstructing Indian population history. Nature. 2009, 461: 489-494. 10.1038/nature08365.PubMedPubMed CentralView ArticleGoogle Scholar
- Watkins WS, Thara R, Mowry BJ, Zhang Y, Witherspoon DJ, Tolpinrud W, Bamshad MJ, Tiripati S, Padmavati R, Smith H, Nancarrow D, Filippich C, Jorde LB: Genetic variation in South Indian castes: evidence from Y-chromosome, mitochondrial, and autosomal polymorphisms. BMC Genet. 2008, 9: 86-10.1186/1471-2156-9-86.PubMedPubMed CentralView ArticleGoogle Scholar
- Lahr MM, Foley R: Multiple dispersals and modern human origins. Evol Anthropol. 1994, 3: 48-60. 10.1002/evan.1360030206.View ArticleGoogle Scholar
- Forster P, Matsumura S: Evolution. Did early humans go north or south?. Science. 2005, 308: 965-966. 10.1126/science.1113261.PubMedView ArticleGoogle Scholar
- Disotell TR: Human evolution: the southern route to Asia. Curr Biol. 1999, 9: R925-928. 10.1016/S0960-9822(00)80106-2.PubMedView ArticleGoogle Scholar
- Green RE, Krause J, Briggs AW, Maricic T, Stenzel U, Kircher M, Patterson N, Li H, Zhai W, Fritz MH, Hansen NF, Durand EY, Malaspinas AS, Jensen JD, Marques-Bonet T, Alkan C, Prufer K, Meyer M, Burbano HA, Good JM, Schultz R, Aximu-Petri A, Butthof A, Hober B, Hoffner B, Siegemund M, Weihmann A, Nusbaum C, Lander ES, Russ C, et al: A draft sequence of the Neandertal genome. Science. 2010, 328: 710-722. 10.1126/science.1188021.PubMedView ArticleGoogle Scholar
- Xing J, Watkins WS, Shlien A, Walker E, Huff CD, Witherspoon DJ, Zhang Y, Simonson TS, Weiss RB, Schiffman JD, Malkin D, Woodward SR, Jorde LB: Toward a more uniform sampling of human genetic diversity: A survey of worldwide populations by high-density genotyping. Genomics. 2010, 96: 199-210. 10.1016/j.ygeno.2010.07.004.PubMedPubMed CentralView ArticleGoogle Scholar
- Xing J, Watkins WS, Witherspoon DJ, Zhang Y, Guthery SL, Thara R, Mowry BJ, Bulayeva K, Weiss RB, Jorde LB: Fine-scaled human genetic structure revealed by SNP microarrays. Genome Res. 2009, 19: 815-825. 10.1101/gr.085589.108.PubMedPubMed CentralView ArticleGoogle Scholar
- Tian C, Plenge RM, Ransom M, Lee A, Villoslada P, Selmi C, Klareskog L, Pulver AE, Qi L, Gregersen PK, Seldin MF: Analysis and application of European genetic substructure using 300 K SNP information. PLoS Genet. 2008, 4: e4-10.1371/journal.pgen.0040004.PubMedPubMed CentralView ArticleGoogle Scholar
- Li JZ, Absher DM, Tang H, Southwick AM, Casto AM, Ramachandran S, Cann HM, Barsh GS, Feldman M, Cavalli-Sforza LL, Myers RM: Worldwide human relationships inferred from genome-wide patterns of variation. Science. 2008, 319: 1100-1104. 10.1126/science.1153717.PubMedView ArticleGoogle Scholar
- Jakobsson M, Scholz SW, Scheet P, Gibbs JR, VanLiere JM, Fung HC, Szpiech ZA, Degnan JH, Wang K, Guerreiro R, Bras JM, Schymick JC, Hernandez DG, Traynor BJ, Simon-Sanchez J, Matarin M, Britton A, van de Leemput J, Rafferty I, Bucan M, Cann HM, Hardy JA, Rosenberg NA, Singleton AB: Genotype, haplotype and copy-number variation in worldwide human populations. Nature. 2008, 451: 998-1003. 10.1038/nature06742.PubMedView ArticleGoogle Scholar
- Novembre J, Johnson T, Bryc K, Kutalik Z, Boyko AR, Auton A, Indap A, King KS, Bergmann S, Nelson MR, Stephens M, Bustamante CD: Genes mirror geography within Europe. Nature. 2008, 456: 98-101. 10.1038/nature07331.PubMedPubMed CentralView ArticleGoogle Scholar
- Abdulla MA, Ahmed I, Assawamakin A, Bhak J, Brahmachari SK, Calacal GC, Chaurasia A, Chen CH, Chen J, Chen YT, Chu J, Cutiongco-de la Paz EM, De Ungria MC, Delfin FC, Edo J, Fuchareon S, Ghang H, Gojobori T, Han J, Ho SF, Hoh BP, Huang W, Inoko H, Jha P, Jinam TA, Jin L, Jung J, Kangwanpong D, Kampuansai J, Kennedy GC, et al: Mapping human genetic diversity in Asia. Science. 2009, 326: 1541-1545. 10.1126/science.1177074.PubMedView ArticleGoogle Scholar
- Bryc K, Auton A, Nelson MR, Oksenberg JR, Hauser SL, Williams S, Froment A, Bodo JM, Wambebe C, Tishkoff SA, Bustamante CD: Genome-wide patterns of population structure and admixture in West Africans and African Americans. Proc Natl Acad Sci USA. 2010, 107: 786-791. 10.1073/pnas.0909559107.
- Wall JD, Cox MP, Mendez FL, Woerner A, Severson T, Hammer MF: A novel DNA sequence database for analyzing human demographic history. Genome Res. 2008, 18: 1354-1361. 10.1101/gr.075630.107.
- Clark AG, Hubisz MJ, Bustamante CD, Williamson SH, Nielsen R: Ascertainment bias in studies of human genome-wide polymorphism. Genome Res. 2005, 15: 1496-1502. 10.1101/gr.4107905.
- Altshuler DM, Gibbs RA, Peltonen L, Dermitzakis E, Schaffner SF, Yu F, Bonnen PE, de Bakker PI, Deloukas P, Gabriel SB, Gwilliam R, Hunt S, Inouye M, Jia X, Palotie A, Parkin M, Whittaker P, Chang K, Hawes A, Lewis LR, Ren Y, Wheeler D, Muzny DM, Barnes C, Darvishi K, Hurles M, Korn JM, Kristiansson K, Lee C, McCarrol SA, et al: Integrating common and rare genetic variation in diverse human populations. Nature. 2010, 467: 52-58. 10.1038/nature09298.
- Ramachandran S, Deshpande O, Roseman CC, Rosenberg NA, Feldman MW, Cavalli-Sforza LL: Support from the relationship of genetic and geographic distance in human populations for a serial founder effect originating in Africa. Proc Natl Acad Sci USA. 2005, 102: 15942-15947. 10.1073/pnas.0507611102.
- Metspalu M, Kivisild T, Metspalu E, Parik J, Hudjashov G, Kaldma K, Serk P, Karmin M, Behar DM, Gilbert MT, Endicott P, Mastana S, Papiha SS, Skorecki K, Torroni A, Villems R: Most of the extant mtDNA boundaries in south and southwest Asia were likely shaped during the initial settlement of Eurasia by anatomically modern humans. BMC Genet. 2004, 5: 26-10.1186/1471-2156-5-26.
- Gutenkunst RN, Hernandez RD, Williamson SH, Bustamante CD: Inferring the joint demographic history of multiple populations from multidimensional SNP frequency data. PLoS Genet. 2009, 5: e1000695-10.1371/journal.pgen.1000695.
- Majumder PP: Genomic inferences on peopling of south Asia. Curr Opin Genet Dev. 2008, 18: 280-284. 10.1016/j.gde.2008.07.003.
- Watkins WS, Prasad BV, Naidu JM, Rao BB, Bhanu BA, Ramachandran B, Das PK, Gai PB, Reddy PC, Reddy PG, Sethuraman M, Bamshad MJ, Jorde LB: Diversity and divergence among the tribal populations of India. Ann Hum Genet. 2005, 69: 680-692. 10.1046/j.1529-8817.2005.00200.x.
- Wooding S, Ostler C, Prasad BV, Watkins WS, Sung S, Bamshad M, Jorde LB: Directional migration in the Hindu castes: inferences from mitochondrial, autosomal and Y-chromosomal data. Hum Genet. 2004, 115: 221-229. 10.1007/s00439-004-1130-x.
- Xing J, Witherspoon DJ, Watkins WS, Zhang Y, Tolpinrud W, Jorde LB: HapMap tagSNP transferability in multiple populations: general guidelines. Genomics. 2008, 92: 41-51. 10.1016/j.ygeno.2008.03.011.
- Basu A, Mukherjee N, Roy S, Sengupta S, Banerjee S, Chakraborty M, Dey B, Roy M, Roy B, Bhattacharyya NP, Roychoudhury S, Majumder PP: Ethnic India: a genomic view, with special reference to peopling and structure. Genome Res. 2003, 13: 2277-2290. 10.1101/gr.1413403.
- Stringer CB, Andrews P: Genetic and fossil evidence for the origin of modern humans. Science. 1988, 239: 1263-1268. 10.1126/science.3125610.
- Hodgson JA, Bergey CM, Disotell TR: Neandertal genome: the ins and outs of African genetic diversity. Curr Biol. 2010, 20: R517-519. 10.1016/j.cub.2010.05.018.
- Kumar S, Ravuri RR, Koneru P, Urade BP, Sarkar BN, Chandrasekar A, Rao VR: Reconstructing Indian-Australian phylogenetic link. BMC Evol Biol. 2009, 9: 173-10.1186/1471-2148-9-173.
- Bamshad M, Kivisild T, Watkins WS, Dixon ME, Ricker CE, Rao BB, Naidu JM, Prasad BV, Reddy PG, Rasanayagam A, Papiha SS, Villems R, Redd AJ, Hammer MF, Nguyen SV, Carroll ML, Batzer MA, Jorde LB: Genetic evidence on the origins of Indian caste populations. Genome Res. 2001, 11: 994-1004. 10.1101/gr.GR-1733RR.
- Cordaux R, Aunger R, Bentley G, Nasidze I, Sirajuddin SM, Stoneking M: Independent origins of Indian caste and tribal paternal lineages. Curr Biol. 2004, 14: 231-235.
- Nachman MW, Crowell SL: Estimate of the mutation rate per nucleotide in humans. Genetics. 2000, 156: 297-304.
- Roach JC, Glusman G, Smit AF, Huff CD, Hubley R, Shannon PT, Rowen L, Pant KP, Goodman N, Bamshad M, Shendure J, Drmanac R, Jorde LB, Hood L, Galas DJ: Analysis of genetic inheritance in a family quartet by whole-genome sequencing. Science. 2010, 328: 636-639. 10.1126/science.1186802.
- Bamshad MJ, Watkins WS, Dixon ME, Jorde LB, Rao BB, Naidu JM, Prasad BV, Rasanayagam A, Hammer MF: Female gene flow stratifies Hindu castes. Nature. 1998, 395: 651-652. 10.1038/27103.
- Watkins WS, Bamshad M, Dixon ME, Bhaskara Rao B, Naidu JM, Reddy PG, Prasad BV, Das PK, Reddy PC, Gai PB, Bhanu A, Kusuma YS, Lum JK, Fischer P, Jorde LB: Multiple origins of the mtDNA 9-bp deletion in populations of South India. Am J Phys Anthropol. 1999, 109: 147-158. 10.1002/(SICI)1096-8644(199906)109:2<147::AID-AJPA1>3.0.CO;2-C.
- Zhang J, Wheeler DA, Yakub I, Wei S, Sood R, Rowe W, Liu PP, Gibbs RA, Buetow KH: SNPdetector: a software tool for sensitive and accurate SNP detection. PLoS Comput Biol. 2005, 1: e53-10.1371/journal.pcbi.0010053.
- NCBI Trace Archive. [http://www.ncbi.nlm.nih.gov/Traces/trace.cgi]
- UCSC database human-chimpanzee alignments. [http://hgdownload.cse.ucsc.edu/goldenPath/hg18/vsPanTro2/]
- Jorde Laboratory Website. [http://jorde-lab.genetics.utah.edu]
- Shen Y, Wan Z, Coarfa C, Drabek R, Chen L, Ostrowski EA, Liu Y, Weinstock GM, Wheeler DA, Gibbs RA, Yu F: A SNP discovery method to assess variant allele probability from next-generation resequencing data. Genome Res. 2010, 20: 273-280. 10.1101/gr.096388.109.
- Kent WJ: BLAT-the BLAST-like alignment tool. Genome Res. 2002, 12: 656-664.
- The cross_match program. [http://www.phrap.org]
- Cai JJ: PGEToolbox: A Matlab toolbox for population genetics and evolution. J Hered. 2008, 99: 438-440. 10.1093/jhered/esm127.
- Scheet P, Stephens M: A fast and flexible statistical model for large-scale population genotype data: applications to inferring missing genotypes and haplotypic phase. Am J Hum Genet. 2006, 78: 629-644. 10.1086/502802.
- White TD, Asfaw B, DeGusta D, Gilbert H, Richards GD, Suwa G, Howell FC: Pleistocene Homo sapiens from Middle Awash, Ethiopia. Nature. 2003, 423: 742-747. 10.1038/nature01669.
- Weir BS, Cockerham CC: Estimating F-statistics for the analysis of population structure. Evolution. 1984, 38: 1358-1370. 10.2307/2408641.
- Felsenstein J: PHYLIP (Phylogeny Inference Package) version 3.6. 2004, Department of Genome Sciences, University of Washington, Seattle
- The dadi program. [http://code.google.com/p/dadi]
- Burgess R, Yang Z: Estimation of hominoid ancestral population sizes under Bayesian coalescent models incorporating mutation rate variation and sequencing errors. Mol Biol Evol. 2008, 25: 1979-1994. 10.1093/molbev/msn148.
- Chen FC, Li WH: Genomic divergences between humans and other hominoids and the effective population size of the common ancestor of humans and chimpanzees. Am J Hum Genet. 2001, 68: 444-456. 10.1086/318206.
- Wall JD: Estimating ancestral population sizes and divergence times. Genetics. 2003, 163: 395-404.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.