Genetic disease risks can be misestimated across global populations

Background: Accurate assessment of health disparities requires unbiased knowledge of genetic risks in different populations. Unfortunately, most genome-wide association studies use genotyping arrays and European samples. Here, we integrate whole genome sequence data from global populations, results from thousands of genome-wide association studies (GWAS), and extensive computer simulations to identify how genetic disease risks can be misestimated.
Results: In contrast to null expectations, we find that risk allele frequencies at known disease loci are significantly different for African populations compared to other continents. Strikingly, ancestral risk alleles are found at 9.51% higher frequency in Africa, and derived risk alleles are found at 5.40% lower frequency in Africa. By simulating GWAS with different study populations, we find that non-African cohorts yield disease associations that have biased allele frequencies and that African cohorts yield disease associations that are relatively free of bias. We also find empirical evidence that genotyping arrays and SNP ascertainment bias contribute to continental differences in risk allele frequencies. Because of these causes, polygenic risk scores can be grossly misestimated for individuals of African descent. Importantly, continental differences in risk allele frequencies are only moderately reduced if GWAS use whole genome sequences and hundreds of thousands of cases and controls. Finally, comparisons between uncorrected and corrected genetic risk scores reveal the benefits of considering whether risk alleles are ancestral or derived.
Conclusions: Our results imply that caution must be taken when extrapolating GWAS results from one population to predict disease risks in another population.
Electronic supplementary material: The online version of this article (10.1186/s13059-018-1561-7) contains supplementary material, which is available to authorized users.


Background
In the past decade, over 3300 genome-wide association studies (GWAS) have successfully identified more than 58,000 genetic associations with common diseases and other traits [1,2]. However, the vast majority of published GWAS have used samples of European ancestry [3,4], and a looming challenge is to be able to generalize GWAS results across populations [5][6][7][8][9][10][11]. An additional complication is that existing GWAS use genotyping arrays, as opposed to whole genome sequencing (WGS). Each disease-associated locus has risk and protective alleles. Results from GWAS can be combined to generate polygenic risk scores to predict individual risks of disease [12][13][14]. These polygenic risk scores quantify hereditary disease burdens by summing the number of risk alleles in each individual's genome and sometimes weighting SNPs by effect size [15]. The "missing heritability" problem hampers genetic risk scores, as many causal variants remain undiscovered [16,17]. Diseases can also have different genetic architectures in different populations [18]. Because of these issues, genetic predictions of disease risk are not always accurate, and it is important to be able to distinguish between situations where genetic risks actually differ between populations and when genetic predictions of differences in disease risks are spurious.
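The scoring scheme described above (summing risk alleles, optionally weighted by effect size) can be sketched as follows. This is a minimal illustration, not the pipeline used in this study; the function name, the genotype encoding (0, 1, or 2 risk alleles per locus), and the example effect sizes are assumptions made for the example.

```python
from typing import Optional, Sequence


def polygenic_risk_score(risk_allele_counts: Sequence[int],
                         effect_sizes: Optional[Sequence[float]] = None) -> float:
    """Sum risk-allele counts (0, 1, or 2 per locus), optionally weighted."""
    if effect_sizes is None:
        # Unweighted score: simply the number of risk alleles carried.
        return float(sum(risk_allele_counts))
    if len(effect_sizes) != len(risk_allele_counts):
        raise ValueError("one effect size is required per locus")
    # Weighted score: each locus contributes count * effect size.
    return float(sum(c * w for c, w in zip(risk_allele_counts, effect_sizes)))


# Hypothetical individual genotyped at four disease-associated loci.
counts = [2, 0, 1, 1]
weights = [0.10, 0.25, 0.05, 0.20]  # made-up per-allele effect sizes
unweighted = polygenic_risk_score(counts)
weighted = polygenic_risk_score(counts, weights)
```

The unweighted form treats every locus equally, which is why systematic allele frequency differences between populations translate directly into shifted scores.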
Although health disparities are often due to access to healthcare and socio-economic factors [19,20], genetic differences in disease risks arise when allele frequencies at disease-associated loci differ across populations [15]. Populations that share recent ancestry have similar allele frequencies and hereditary disease risks, while populations that diverged in the deep past can have large allele frequency differences at disease-associated loci [21,22]. These differences are magnified by population bottlenecks and founder effects, including elevated risks of cystic fibrosis among the Québécois [23] and cardiovascular disease among the descendants of the HMS Bounty mutineers [24]. However, many common diseases are polygenic [25,26], and allele frequency differences at individual loci tend to average out. Because of this, the overall burden of hereditary disease is expected to be similar across the globe [27], with the possible exception of reduced genetic load in African populations [28]. For polygenic diseases, the null expectation is that individuals from different populations will have similar counts of risk alleles.
The genetic ancestry of study participants can cause hereditary disease risks to be misestimated. Indeed, genetic risk scores generated from different study cohorts have been shown to vary across populations [5]. As of 2016, the ancestry of 81% of all GWAS samples was European and 14% was Asian [3], and this is likely to cause the set of known disease associations to be enriched for alleles that are polymorphic or intermediate frequency in Europe or Asia, but not Africa. Inequity in genetic studies parallels what is observed in social science research; most samples are from Western, educated, industrialized, rich and democratic (WEIRD) societies [29,30]. For disease associations to be detected, loci need to be polymorphic in the study population. Because of this, disease loci with allele frequencies that are zero or one in European populations are likely to be missed (i.e., the "known unknowns" [31]), and some of these disease loci will have intermediate frequencies in other populations. Disease associations found in one population can over- or underestimate genetic disease risks in other populations. One partial solution to this problem is to perform multiethnic GWAS that include individuals from multiple populations [32].
Commonly used genotyping arrays can also cause predictions of hereditary disease risks to be misestimated. One issue is that SNPs on genotyping arrays tend to have large minor allele frequencies [33][34][35]. These older SNPs often have large allele frequency differences between populations [36,37]. Systematic biases can also arise because commercially available genotyping arrays tend to use SNPs that were originally ascertained in European populations. This SNP ascertainment bias can be particularly problematic if it yields disease loci with risk allele frequencies that are high for one population and low for another population.
Demographic history also affects whether known disease-associated loci have biased allele frequencies.
Consider a scenario where disease-associated alleles are initially found at the same frequency in two populations, i.e., prior to divergence (Fig. 1a). Note that risk alleles can be ancestral (shared with other primates) or derived (due to new mutations) and that ancestral alleles tend to be high frequency while derived alleles tend to be low frequency [38,39]. Over time, allele frequencies at each locus diverge between daughter populations. Importantly, bottlenecked non-African populations have experienced greater amounts of genetic drift than African populations [40] (Fig. 1b). This asymmetry, coupled with statistical power being maximized at intermediate allele frequencies [41], can cause known disease-associated loci to have biased allele frequencies. Specifically, we predict that non-African GWAS will catch disease loci that have higher ancestral risk allele frequencies (and lower derived allele frequencies) in Africa (Fig. 1c). By contrast, we predict that African GWAS will catch a relatively unbiased set of disease-associated loci (Fig. 1d). Although continental differences in ancestral and derived risk allele frequencies have been observed for prostate cancer loci [42], these biases have yet to be studied in a comprehensive way.
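The claim that bottlenecked populations experience more drift can be illustrated with the standard Wright-Fisher result for the variance of allele frequency after t generations of pure drift, Var(p_t) = p0(1 − p0)[1 − (1 − 1/(2N))^t]. The sketch below assumes an idealized population of constant size; the population sizes and number of generations are hypothetical values chosen for illustration and are not the demographic parameters used in this study.

```python
def drift_variance(p0: float, effective_size: float, generations: int) -> float:
    """Expected variance in allele frequency after `generations` of pure drift
    in an idealized Wright-Fisher population of constant diploid size."""
    # Heterozygosity decays by a factor (1 - 1/(2N)) each generation.
    decay = (1.0 - 1.0 / (2.0 * effective_size)) ** generations
    return p0 * (1.0 - p0) * (1.0 - decay)


p0 = 0.5            # shared starting frequency before divergence
generations = 2000  # hypothetical time since divergence
var_large = drift_variance(p0, effective_size=10_000, generations=generations)
var_small = drift_variance(p0, effective_size=2_000, generations=generations)
```

With these inputs the smaller population accumulates several times more variance in allele frequency, which is the asymmetry that the scenario in Fig. 1b depicts.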
At present, it is unknown how much the set of known disease associations hinders precision medicine and personal genomics. To bridge this knowledge gap, we integrated whole genome sequence data from global populations with results from thousands of GWAS and ran extensive computer simulations. These analyses (1) revealed novel empirical patterns at disease-associated loci, (2) identified multiple causes of how disease risks can be misestimated in global populations, and (3) examined different solutions to this problem (including alternative GWAS study designs and building genetic risk scores that correct for major sources of bias).

African risk allele frequencies differ from other continents
We tested whether there are any systematic biases in genetic estimates of disease risk by analyzing allele frequencies at 3036 GWAS loci for each continental population in the 1000 Genomes Project. Contrary to null expectations, mean risk allele frequencies are not the same for each population (Fig. 2a). Overall, African populations have significantly higher risk allele frequencies compared to non-African populations (mean difference + 1.15%, p value = 0.0213, paired Wilcoxon signed-rank test). Population-level differences in risk allele frequencies persist when disease associations are binned into seven different categories. Compared to other populations, African populations have the highest risk allele frequency for metabolic (p value = 0.0055), morphological (p value = 0.0949), cancer (p value = 0.1169), neurological (p value = 0.0995), and miscellaneous (p value = 0.3865, paired Wilcoxon signed-rank tests) diseases. African populations have intermediate frequencies of risk alleles at the loci that are associated with GI or liver diseases (p value = 0.6965) and lower frequencies of risk alleles at the loci that are associated with cardiovascular disease (p value = 0.0140, paired Wilcoxon signed-rank tests). These statistical comparisons reflect allele frequency differences at individual SNPs. Among non-African populations there is no underlying trend. Some of the continental patterns described here are at odds with clinical data (e.g., health disparities involving cardiovascular disease in African-Americans [43]). This discrepancy between clinical data and allele frequencies suggests that genetic disease risks may be misestimated for individuals with African ancestry.
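For readers unfamiliar with the paired comparison used above, the following sketch computes the rank sums that underlie a Wilcoxon signed-rank test for per-locus frequencies in two groups. It is a simplified illustration: it does not compute a p value, the function name is an assumption, and the four frequency pairs are made-up numbers rather than data from this study.

```python
from typing import List, Sequence, Tuple


def signed_rank_sums(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Return (W_plus, W_minus) for paired samples x and y."""
    if len(x) != len(y):
        raise ValueError("paired samples must have equal length")
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences are dropped
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks: List[float] = [0.0] * len(diffs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied absolute differences.
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2.0 + 1.0  # ranks are 1-based; ties share the mean rank
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    return w_plus, w_minus


# Hypothetical risk allele frequencies at four loci (not data from this study).
afr = [0.50, 0.30, 0.70, 0.45]
non_afr = [0.40, 0.35, 0.50, 0.30]
w_plus, w_minus = signed_rank_sums(afr, non_afr)
```

A large imbalance between the two rank sums indicates that one group tends to have higher frequencies at the same loci; a full test would convert the smaller sum into a p value.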
Disease categories that have a larger proportion of ancestral alleles tend to have elevated risk allele frequencies in Africa (Fig. 2b). After binning GWAS loci by disease category, we find that the differences in the mean frequency of risk alleles between African and non-African populations are highly correlated with the proportion of risk alleles that are ancestral (r² = 0.842). Accurate estimation of genetic disease risks across global populations may hinge upon knowledge of whether risk-increasing alleles are ancestral or derived.

Ancestral and derived alleles yield different patterns of genetic disease risk
For loci that are not associated with any disease, the null expectation is that ancestral and derived allele frequencies will be broadly similar across global populations. Just because Homo sapiens emerged in Africa does not mean that African genomes have an excess of ancestral alleles; all human populations share the same evolutionary distance to chimpanzees. Due to the out-of-Africa bottleneck, African genomes are more likely to be heterozygous for derived alleles, and non-African genomes are more likely to be homozygous for derived alleles. Examining WGS data from the 1000 Genomes Project, we find that derived allele frequencies (DAF) are similar for each population (Fig. 3a). However, disease-associated loci need not exhibit the same pattern.
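Because the analyses that follow condition on whether the risk allele is ancestral or derived, it helps to make the relationship between DAF and risk allele frequency explicit. The sketch below assumes a biallelic SNP, so that the ancestral allele frequency is one minus the DAF; the function name is hypothetical.

```python
def risk_allele_frequency(derived_allele_freq: float, risk_is_derived: bool) -> float:
    """Frequency of the risk allele at a biallelic SNP."""
    if not 0.0 <= derived_allele_freq <= 1.0:
        raise ValueError("allele frequencies must lie in [0, 1]")
    # If the risk allele is ancestral, its frequency is the complement of the DAF.
    return derived_allele_freq if risk_is_derived else 1.0 - derived_allele_freq


# A SNP with DAF = 0.25 whose risk allele is the ancestral allele.
raf = risk_allele_frequency(0.25, risk_is_derived=False)
```

The same DAF therefore implies a high risk allele frequency when the risk allele is ancestral and a low one when it is derived.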
Fig. 1 b Non-African populations experience greater amounts of genetic drift. Diffusion of allele frequencies following divergence is indicated by red and blue shading. c European GWAS are predicted to catch derived risk alleles that have higher frequencies in Europe and ancestral risk alleles that have higher frequencies in Africa. d African GWAS are predicted to catch a relatively unbiased set of risk alleles
The joint site frequency spectrum (SFS) enables the frequencies of individual risk alleles to be compared between African and non-African populations. Similar numbers of disease associations are found above and below the diagonal in Fig. 3b. However, conditioning on whether risk alleles are ancestral or derived reveals a striking pattern: 69.2% of ancestral risk alleles are found at higher frequency in African populations (red dots below the diagonal), and 64.5% of derived risk alleles are found at higher frequency in non-African populations (blue dots above the diagonal). The magnitudes of allele frequency differences between populations also vary for ancestral and derived risk alleles. We find that ancestral risk alleles are found at much higher frequencies in Africa, and derived risk alleles are found at moderately lower frequencies in Africa (Fig. 3c). Specifically, the mean difference in ancestral risk allele frequencies between African and pooled non-African populations is +9.51%, and the mean difference in derived risk allele frequencies between African and pooled non-African populations is −5.40% (p value < 2.2 × 10⁻¹⁶ for both comparisons, Wilcoxon signed-rank tests). The overall continental difference in risk allele frequencies of +1.15% arises because 44% of presently known disease-associated loci have ancestral risk alleles.
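The last step of this argument is a weighted average, which can be checked directly from the figures quoted above. The sketch below uses only the reported values (44% ancestral, +9.51%, −5.40%); the function name is an assumption.

```python
def overall_difference(prop_ancestral: float,
                       diff_ancestral: float,
                       diff_derived: float) -> float:
    """Weighted mean of the ancestral and derived frequency differences."""
    return prop_ancestral * diff_ancestral + (1.0 - prop_ancestral) * diff_derived


# 44% of loci have ancestral risk alleles (+9.51%); the rest are derived (-5.40%).
overall = overall_difference(0.44, 9.51, -5.40)
```

This gives about +1.16%, close to the reported +1.15%; the small gap is plausibly due to rounding of the 44% figure, which is an assumption on our part.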
Derived allele frequencies serve as proxies for SNP age [44], and we find that older disease-associated loci are more likely to have large differences in continental allele frequencies. For each 20% DAF bin (pooled data), we calculated the difference in risk allele frequencies between African and non-African populations. In sharp contrast to other DAF bins, published disease loci with DAF ≤ 0.2 exhibit only a small amount of bias (Fig. 3d). This pattern occurs regardless of whether risk alleles are ancestral or derived. Note that SNPs with DAF ≤ 0.2 tend to be younger than 125,000 years old, assuming an effective population size of 10,000 individuals and generation times of 25 years [44].
Fig. 2 Known disease associations lead to misestimates of genetic disease risks. a Risk allele frequencies at published disease-associated loci from the NHGRI-EBI GWAS Catalog vary by population. "*" indicates a statistically significant allele frequency difference between African and non-African populations (p values < 0.05, paired Wilcoxon signed-rank tests). n = number of disease-associated loci per disease category. b Proportion of disease-associated loci where the risk allele is ancestral, as opposed to derived

Choice of study population contributes to misestimates of genetic disease risk
Most disease associations have been discovered in study cohorts with European ancestry, and this can bias the estimation of genetic disease risks in diverse global populations. Empirical data reveal the effects of GWAS study populations; many disease-associated alleles segregate at intermediate frequencies in non-African populations but are found at extremely low or high frequencies in Africa (compare the vertical and horizontal borders of Fig. 3b). This occurs because statistical power is maximized at intermediate frequencies, and most disease-associated loci have been discovered in non-African populations. Existing GWAS have discovered relatively few disease alleles that segregate only in African populations.
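The dependence of statistical power on allele frequency can be made concrete with the expected chi-square statistic of a 2 × 2 allelic case-control test. The sketch below is not the simulation framework used in this study: the odds ratio of 1.2 and the two allele frequencies are hypothetical, the function name is an assumption, and expected (not sampled) allele counts are used so that the result is deterministic.

```python
def expected_allelic_chi2(control_freq: float, odds_ratio: float,
                          n_cases: int, n_controls: int) -> float:
    """Expected 1-df chi-square for an allelic test, using expected allele counts."""
    # The allelic odds in cases equal the odds in controls times the odds ratio.
    case_freq = odds_ratio * control_freq / (1.0 - control_freq + odds_ratio * control_freq)
    a = 2 * n_cases * case_freq                # risk alleles in cases
    b = 2 * n_cases * (1.0 - case_freq)        # other alleles in cases
    c = 2 * n_controls * control_freq          # risk alleles in controls
    d = 2 * n_controls * (1.0 - control_freq)  # other alleles in controls
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


# Same hypothetical effect size, 3500 cases and 3500 controls, two allele frequencies.
chi2_intermediate = expected_allelic_chi2(0.50, 1.2, 3500, 3500)
chi2_rare = expected_allelic_chi2(0.02, 1.2, 3500, 3500)
```

Under these assumptions the expected statistic is roughly 29 at a frequency of 0.5 and roughly 2.5 at a frequency of 0.02, so an allele that is intermediate in the study population but rare elsewhere is far more likely to be reported from that study population.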
To further isolate the effects of different study populations, we simulated a large number of GWAS results, varying the continental ancestry of each study cohort. Importantly, our GWAS simulations did not assume that there are any underlying differences in hereditary disease risks across populations. We find that computer simulations recapitulate empirical patterns at known disease loci and that GWAS of bottlenecked non-African populations yield different results than GWAS of African populations (Fig. 4). Simulated GWAS that use an African (AFR) cohort yield similar risk allele frequencies across each of the five continental populations. However, simulated GWAS that use American (AMR), East Asian (EAS), European (EUR), or South Asian (SAS) cohorts produce a set of disease-associated loci with elevated frequencies of ancestral risk alleles in Africa (Fig. 4a) and reduced frequencies of derived risk alleles in Africa (Fig. 4b). These simulation results indicate that systematic allele frequency differences between populations need not be due to any underlying difference in risk (recall that our simulations did not assume the existence of any underlying differences in disease risks across populations). The effects of European study cohorts are still seen when GWAS simulations use data from WGS, as opposed to genotyping arrays (Table 1). We also find that continental differences in risk allele frequencies occur if GWAS simulations use a more stringent p value filter, or simulations assume different modes of inheritance, including dominant or recessive disease alleles (Additional file 1: Table S1 and Additional file 2: Table S2).
Additionally, GWAS simulations of study cohorts that contain a mixture of individuals from different populations still yield disease-associated loci with continental biases in risk allele frequencies (MIX in Fig. 4). These results suggest that pooling samples with different ancestries is unlikely to completely alleviate the problem of misestimating genetic disease risks. Regardless of the choice of study cohort, allele frequencies are similar for each non-African population, reflecting the relatively recent divergence times between these populations. We also examined the effects of genotype-by-environment (GxE) interactions by allowing effect sizes to vary by population in our GWAS simulations. In general, results from these simulations mirror the results of other simulations; ancestral risk allele frequencies are higher in African populations than non-African populations, and derived risk allele frequencies are lower in African populations than non-African populations (Additional file 3: Figure S1). Compared to African study cohorts, European study cohorts magnify these allele frequency differences between populations. Choice of study cohort imposes a filter on effect sizes, as SNPs with very small effect sizes do not yield detectable associations (compare gray pre-GWAS effect sizes to red and blue post-GWAS effect sizes in Additional file 3: Figures S1-S3). Large effect sizes enable high-frequency ancestral alleles and low-frequency derived alleles to be detected in a GWAS. The results described above are also robust to systematic biases in effect sizes, i.e., scenarios where pre-GWAS European effect sizes tend to be larger than African effect sizes or vice versa (Additional file 3: Figures S2 and S3).

Genotyping arrays and SNP ascertainment bias cause disease risks to be misestimated
Many commonly used genotyping arrays contain SNPs that were ascertained in a relatively small number of European individuals. This ascertainment bias results in allele frequency distributions that vary by genotyping platform. Compared to WGS data, derived allele frequencies are higher for SNPs on the Affymetrix Genome-Wide Human SNP Array 6.0 and the Illumina Omni 5M microarray. SNPs on genotyping arrays also exhibit continental biases (Fig. 3a). Specifically, we find that derived allele frequencies in African populations are markedly lower than derived allele frequencies in non-African populations (p value < 2.2 × 10⁻¹⁶ for both arrays, Wilcoxon signed-rank tests).
The joint SFS of non-African and African populations further reveals the effects of SNP ascertainment bias. Examining WGS data, we find that similar numbers of SNPs have elevated derived allele frequencies in non-African and African populations (Additional file 3: Figure S4a). By contrast, the Affymetrix Genome-Wide Human SNP Array 6.0 and the Illumina Omni 5M microarray are enriched for SNPs with higher derived allele frequencies outside of Africa (i.e., SNPs above the diagonal in Additional file 3: Figure S4b and Additional file 3: Figure S4c). Importantly, this pattern mirrors what is seen for empirical GWAS data (Additional file 3: Figure S4d), which suggests that genotyping arrays contribute to continental differences in risk allele frequencies at known disease-associated loci.
Because many disease associations involve imputed SNPs, we also tested whether continental differences in risk allele frequencies persist for disease-associated loci that are not on the Affymetrix Genome-Wide Human SNP Array 6.0. For this empirical set of disease-associated loci, we find that sites with ancestral risk alleles have higher allele frequencies in Africa (+8.63% on average) and that SNPs with derived risk alleles have lower allele frequencies in Africa (−4.83% on average). This suggests that biases persist even for imputed SNPs.

Continental differences in allele frequencies persist even if whole genome sequencing and large sample sizes are used
Simulations of GWAS results were used to infer the extent to which misestimates of disease risks depend upon genotyping technology (Table 1). Here, simulations assume European ancestry for each study cohort and sample sizes of 3500 cases and 3500 controls. We find that different genotyping arrays yield similar results: the Affymetrix Genome-Wide Human SNP Array 6.0 and the Illumina Omni 5M microarray yield ancestral risk allele frequencies that are 10.7% and 11.0% higher in Africa and derived risk alleles that are 8.0% and 8.2% higher in Europe, respectively. Somewhat surprisingly, continental differences in allele frequencies also occur for GWAS simulations that use WGS data. Focusing on WGS GWAS simulations, ancestral risk allele frequencies are 9.7% higher in Africa, and derived risk alleles are 7.2% higher in Europe. These patterns arise because of our choice of study cohort and because sample sizes of 3500 cases and 3500 controls have relatively little power to catch rare disease alleles.
Continental biases in risk allele frequencies occur even if GWAS use large sample sizes. Simulated GWAS with fewer than 10,000 European cases and controls yield large differences in African and non-African allele frequencies (Fig. 5). This occurs regardless of whether simulations use SNPs from the genotyping arrays or WGS. Increasing GWAS sample sizes increases the statistical power to detect associations with rare alleles. However, our simulations reveal that there are diminishing returns for increasing sample sizes, especially if GWAS use genotyping arrays. Well-powered studies with hundreds of thousands of cases and controls still yield notable differences in continental allele frequencies, even if WGS data are used (Fig. 5). These results indicate that WGS is unable to completely mitigate the effects of different study populations.

Correcting for ancestral and derived risk alleles leads to improved genetic risk scores
Standardized genetic risk scores (GRS) were generated for 2504 individuals and 7 different disease categories. This involved integrating a curated list of disease-associated loci from the NHGRI-EBI GWAS Catalog with individual-level genotype data from the 1000 Genomes Project. Positive GRS values indicate genomes that contain more risk alleles than the global mean, and negative GRS values indicate genomes that contain fewer risk alleles than the global mean. Standardized GRS are scaled in terms of standard deviations from the mean, i.e., they are Z-scores. In general, different populations have GRS distributions that mirror what is seen for allele frequency data (compare Fig. 6 to Fig. 2a). We find that African individuals have uncorrected GRS that differ from other populations (p value = 0.0037 for GI or liver diseases and p value < 2.2 × 10⁻¹⁶ for all other disease categories, Mann-Whitney U tests). These differences are larger for metabolic, cancer, and cardiovascular disease risks. There is a substantial amount of overlap between the GRS distributions of each non-African population, and this pattern occurs for all disease categories. Within each population, there is also a large range of GRS values. Also note that admixed genomes from the Americas (AMR in Fig. 6) have GRS that are broadly similar to other non-African genomes. Although GRS reflect an individual's genetic propensity for different disease categories, we caution against over-interpreting these results. This is because GRS have been built from a biased set of disease-associated loci. GRS corrections reduce some, but not all, of the population-level differences in predicted disease risks. Here, we compensate for continental differences in ancestral and derived risk allele frequencies by generating corrected GRS for African genomes.
We find that African individuals have corrected GRS that are similar to other populations for metabolic (p value = 0.8080), morphological (p value = 0.0671), and neurological (p value = 0.7116, Mann-Whitney U tests) disease risks. By contrast, African individuals have corrected GRS that are different from other populations for GI or liver, cancer, miscellaneous, and cardiovascular disease risks (p value < 2.2 × 10⁻¹⁶ for each disease category, Mann-Whitney U tests). Corrections involve a leftward shift in the GRS of African genomes, the magnitude of which depends on the proportion of ancestral risk alleles for each disease category (compare the size of arrows in Fig. 6). We observe three different outcomes: minimal effects, over-correction, and reduction of bias. Cardiovascular risk predictions for African genomes were largely unchanged (i.e., GRS still appear to underestimate the risks of cardiovascular disease in individuals of African descent). Two disease categories (GI or liver and miscellaneous diseases) have corrected GRS distributions that differ more between African and non-African populations than uncorrected GRS distributions. The remaining four disease categories (metabolic, morphological, cancer, and neurological diseases) have corrected GRS distributions that overlap heavily with other populations.
Although the correction method used here alleviates some forms of bias, our results suggest that GRS can be further improved by considering additional parameters.
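The standardization step described in this section (expressing each genome's risk allele count as a Z-score relative to the global mean) can be sketched as follows. This is only an illustration of Z-scoring, not the correction method itself; the function name is hypothetical, the three raw counts are made up, and the use of the population standard deviation is an assumption, since the text does not specify it.

```python
from statistics import mean, pstdev
from typing import List, Sequence


def standardize(scores: Sequence[float]) -> List[float]:
    """Convert raw risk-allele counts into Z-scores relative to the global mean."""
    mu = mean(scores)
    sigma = pstdev(scores)  # assumption: population (not sample) standard deviation
    if sigma == 0:
        raise ValueError("scores have zero variance")
    return [(s - mu) / sigma for s in scores]


# Hypothetical raw risk-allele counts for three genomes.
raw_counts = [10, 12, 14]
z_scores = standardize(raw_counts)
```

Genomes with more risk alleles than the global mean receive positive values and genomes with fewer receive negative values, matching the sign convention used for Fig. 6.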

Discussion
The biased set of disease associations that are presently known leads to misestimates of hereditary disease risks. Specifically, African populations tend to have higher frequencies of ancestral risk alleles and lower frequencies of derived risk alleles at existing GWAS loci. Considering the magnitude of these differences and the proportion of disease-associated alleles that are ancestral, as opposed to derived, yields risk allele frequencies that are 1.15% higher in Africa. Elevated risk allele frequencies in African populations are the opposite of what one expects to see given human demographic history. Due to population bottlenecks, non-African populations are expected to have greater amounts of genetic load [28]. This discrepancy arises because GWAS rely on European study cohorts and data from genotyping arrays. Systematic allele frequency biases can be mistaken for directional selection, hindering tests of polygenic selection acting on GWAS traits [45]. Continental differences in allele frequencies also have important ramifications for precision medicine and personal genomics; disease risks are likely to be misestimated if GWAS results are naively used to calculate genetic risk scores (Fig. 6). This can obscure the existing health disparities that are due to socio-cultural factors including access to medical care [46,47].
Here, we are concerned with the limitations of using disease associations discovered in one population to predict disease risks in another population, as opposed to whether GWAS findings can be successfully replicated across multiple populations. The effects of different study cohorts are asymmetric. Non-African GWAS results can be used to predict disease risks in other non-African populations, but these disease associations generalize poorly to African populations (Fig. 4). By contrast, African GWAS results can be used to predict disease risks in a relatively unbiased way across all global populations. This asymmetry arises as a by-product of demographic history and the out-of-Africa migration (Fig. 1) and because GWAS use arrays that suffer from SNP ascertainment bias (Fig. 3a). Our results suggest that there may be additional benefits to including a large number of African individuals in multiethnic GWAS. We note that difficulties can arise when transferring GWAS results from one non-African population to another non-African population. This is due to both the existence of private risk alleles and divergence times that can exceed 30,000 years. Regardless of the study cohort used to generate genetic risk scores, it is impossible to fully correct for missing risk alleles from understudied populations. Problems generalizing GWAS results cannot be solved by only using WGS and large sample sizes (Fig. 5). Furthermore, many variants discovered by WGS are rare and population-specific. That said, genetic risk scores generated from WGS data are expected to be less biased than genetic risk scores generated from array data, especially when sample sizes are large.
Fig. 6 Genetic risk scores (GRS) before and after correcting for ancestral and derived risk alleles. GRS probability densities for each continental population are shown (solid lines, uncorrected GRS; dashed lines, corrected GRS for African genomes). n = number of disease-associated loci per disease category. Arrows indicate the shift in African GRS after correcting for whether risk alleles are ancestral or derived. "*" indicates uncorrected African GRS that are significantly different from non-African GRS, and "©" indicates corrected African GRS that are significantly different from non-African GRS (p values < 0.05, Mann-Whitney U tests)
Although this paper focuses on risk allele frequency differences across populations, we note that many disease loci remain undetected, and this also contributes to misestimates of disease risks. These missing disease loci are particularly important when risk alleles are population-specific. This underscores the need for genetic epidemiology studies to include samples from a diverse set of populations.
Our study demonstrates the benefits of adopting an evolutionary perspective towards health and disease [48,49]. Important empirical patterns would not have been noticed without considering ancestral vs. derived states of alleles. Continental differences in allele frequencies also depend upon SNP age. An evolutionary perspective is also valuable for understanding how genetic disease risks can be misestimated across populations. Specifically, we find that it matters whether populations have experienced a history of bottlenecks and founder effects. Knowing whether individual disease loci have experienced a history of natural selection can lead to additional insights [42,50,51].
Recently, Martin et al. found that polygenic risk scores yield inaccurate predictions of height and schizophrenia and that GRS for type II diabetes depend upon the choice of study cohort [5]. Using coalescent simulations, they also found that the proportion of heritability that can be explained decreases with distance to the GWAS study population. Using complementary approaches, our study resulted in novel discoveries. We find that ancestral and derived states of risk alleles play a central role in the estimation of genetic disease risks across multiple populations, something missed by prior studies that examine the generalizability of GWAS results. We also find that important asymmetries exist when extrapolating the results between African and non-African populations and that population bottlenecks play a key role (i.e., generalizability of results depends on much more than the evolutionary distance between populations). By explicitly testing the effects of different genotyping technologies and sample sizes, we were able to discover that WGS of hundreds of thousands of cases and controls still yields biased GWAS results. Martin et al. also advocate mean-centering GRS for each population [5], but this solution can be problematic if hereditary disease risks actually differ between populations.
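The caveat about mean-centering can be seen in a toy example. The sketch below is not an implementation from Martin et al. or from this study; the function name, the population labels, and the scores are hypothetical, and the scores are constructed so that the two populations genuinely differ by a fixed amount.

```python
from statistics import mean
from typing import Dict, List


def mean_center_by_population(scores: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Subtract each population's own mean from its members' scores."""
    centered = {}
    for pop, values in scores.items():
        mu = mean(values)
        centered[pop] = [v - mu for v in values]
    return centered


# Hypothetical scores in which POP_B truly has higher risk than POP_A.
scores = {"POP_A": [1.0, 2.0, 3.0], "POP_B": [3.0, 4.0, 5.0]}
centered = mean_center_by_population(scores)
gap_before = mean(scores["POP_B"]) - mean(scores["POP_A"])
gap_after = mean(centered["POP_B"]) - mean(centered["POP_A"])
```

After centering, every population has a mean of zero by construction, so a real between-population difference is removed along with any spurious one.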
Our GRS calculations illustrate how misestimation of genetic risks can obscure whether there are any real differences in disease risks across populations (Fig. 6). Two types of error are possible: (1) The underlying risk of a particular disease may actually be the same for different populations, yet GRS distributions show little overlap. (2) The underlying risk of a particular disease may actually differ for populations, yet GRS distributions show extensive overlap. Accurate GRS corrections are needed to exclude either of these two possibilities. Environmental effects and genotype-by-environment interactions also contribute to disease phenotypes [52]. Studies of immigrants, admixed families, and adopted individuals may prove to be particularly informative with respect to genetics and health inequities [53-56]. PCA information can be used to improve GRS for admixed genomes [57]. Corrected GRS for admixed genomes may also benefit from local ancestry painting tools like RFMix [58] or ELAI [59].

Conclusions
Going forward, multiple approaches can be used to extend the benefits of precision medicine and personal genomics to a wide range of global populations. One option is to assume that disease associations can be generalized across populations without any complications. However, this approach is flawed because only a biased set of disease loci is known at present. A second option is to require that genetic risk scores only use disease associations discovered in the same population (i.e., avoid generalizing results across populations). However, this is infeasible from a logistical standpoint, as it would require repeating every GWAS in every global population. A third option is to use whole genome sequencing and large African study cohorts to generate sets of disease-associated loci that are relatively free of bias and can therefore be generalized. On a more practical side, genetic risk scores can be generated that correct for existing biases. This requires understanding how risk allele frequencies differ between populations (as shown here) and leveraging linkage disequilibrium information to infer the effect sizes of risk alleles in non-study populations [60,61]. Finally, we note that the gold standard for evaluating genetic risk scores involves testing how well they predict disease phenotypes in diverse populations, something that requires individual-level phenotype data. Only by understanding population genetics and the effects of SNP ascertainment bias can accurate predictive models of genetic disease risks be built.

Population genetic data
Allele frequencies were obtained for each of the five continental populations of the 1000 Genomes Project: Africa (AFR), Americas (AMR), East Asia (EAS), Europe (EUR), and South Asia (SAS) [21]. These frequencies were used to generate risk allele frequencies and derived allele frequencies at disease-associated loci from the NHGRI-EBI GWAS Catalog and simulated datasets. Ancestral and derived states in phase 3 1000 Genomes Project VCF files were used (these ancestral states were inferred via the EPO pipeline from Ensembl). We found that derived allele frequencies in all populations were elevated across a large portion of chromosome 8, which is indicative of misidentified ancestral states. To compensate for this, we masked SNPs found in the chr8:89,000,000-146,364,022 region (hg19). Individuals in phase 3 of the 1000 Genomes Project were genotyped using WGS. Allele frequencies of SNPs on the Affymetrix Genome-Wide Human SNP Array 6.0 and the Illumina Omni 5M microarray were found by merging data from the 1000 Genomes Project with lists of SNP IDs obtained from the Affymetrix and Illumina websites.
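The masking step and the conversion from derived allele frequency to risk allele frequency can be illustrated with a minimal sketch. The function names are hypothetical, the interval endpoints are treated as inclusive (an assumption), and SNPs are assumed to be biallelic so that the ancestral allele frequency equals one minus the derived allele frequency.

```python
# Illustrative sketch only; helper names are hypothetical, not from the study's scripts.
MASK_CHROM = "8"
MASK_START = 89_000_000   # hg19 coordinates, as given in the text
MASK_END = 146_364_022


def is_masked(chrom: str, pos: int) -> bool:
    """Return True if a SNP lies in the masked chr8 region (endpoints assumed inclusive)."""
    name = chrom[3:] if chrom.startswith("chr") else chrom
    return name == MASK_CHROM and MASK_START <= pos <= MASK_END


def risk_allele_frequency(derived_freq: float, risk_is_derived: bool) -> float:
    """Convert a derived allele frequency to a risk allele frequency.

    Assumes a biallelic SNP, so the ancestral allele frequency is 1 - derived_freq.
    """
    return derived_freq if risk_is_derived else 1.0 - derived_freq
```

For example, a SNP at chr8:100,000,000 would be excluded, and an ancestral risk allele at a locus with a derived allele frequency of 0.3 would have a risk allele frequency of 0.7.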

Identification of disease-associated variants
Using the NHGRI-EBI GWAS Catalog [1], Berens et al. generated a curated set of 3180 disease-associated loci [62]. This involved filtering out SNPs that were not associated with a disease, eliminating SNPs lacking risk allele or odds ratio information, and LD-pruning. Here, we further constrained the set of disease-associated loci from [62] by requiring knowledge of whether risk alleles are ancestral or derived. After excluding 144 SNPs with unknown ancestral states, we were left with a focal set of 3036 disease-associated loci (Additional file 4: Table S3). We classified these 3036 disease-associated loci into 7 non-overlapping categories: gastrointestinal/liver, metabolic, morphological, cancer, neurological, miscellaneous, and cardiovascular. Wilcoxon signed-rank tests were used to compare disease allele frequencies between African and non-African populations. Disease-associated loci were binned by derived allele frequency (DAF), averaging across all 1000 Genomes Project populations. Allele ages were estimated as per Eq. 4 in [44] (assuming N = 10,000 and a generation time of 25 years).

GWAS simulations
Computer simulations were used to test whether SNP ascertainment bias alone can produce what appears to be genetic differences in disease risks across populations.
The goal here was to generate simulated datasets comparable to the set of 3036 disease-associated loci from the NHGRI-EBI GWAS Catalog. These simulations assume that the underlying risks of disease are the same across the globe. Two general types of simulations were run: simulations with ancestral risk alleles and simulations with derived risk alleles. Simulations involved randomly drawing a test SNP from a list of known genetic variants ascertained via WGS or found on commercial genotyping arrays. Conditioning on whether risk alleles are ancestral or derived, the risk allele frequency of the test SNP was found in the study population. We then used a Perl script based on the GAS/CaTS power calculator [41] to determine the probability of detecting a successful genetic association at the test SNP. The GAS power calculator leverages information about the number of cases and controls, p value threshold, disease model, prevalence, disease allele frequency, and genotype relative risk (http://csg.sph.umich.edu/abecasis/cats/gas_power_calculator/). For each test SNP, we generated a uniformly distributed random number between 0 and 1. The test SNP was retained if the random number was less than the power to detect a genetic association, and the test SNP was rejected if the random number was greater than this power. This process was repeated until a set of 3036 successful disease associations was detected. At each of these 3036 SNPs, we obtained simulated risk allele frequencies for five populations in the 1000 Genomes Project dataset (AFR, AMR, EAS, EUR, SAS). Our default parameters were as follows: genotyping technology = Affymetrix Genome-Wide Human SNP Array 6.0, study population = Europe (EUR), sample size = 3500 cases and 3500 controls, genetic model = additive, p value threshold = 10^−5, prevalence = 0.1, and genotype relative risk = 1.211.
These parameter values were chosen to be representative of the empirical data found in the NHGRI-EBI GWAS Catalog.
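The retain-or-reject step described above can be sketched as a simple acceptance loop. This is an illustration under stated assumptions, not the Perl script used in the study: `simulate_gwas` and `toy_power` are hypothetical names, SNPs are drawn with replacement (the text does not specify), and `toy_power` is an arbitrary placeholder rather than the GAS/CaTS power calculation.

```python
import random


def toy_power(snp):
    """Placeholder power function (NOT the GAS calculator): peaks at frequency 0.5."""
    raf = snp["raf"]  # risk allele frequency in the study population
    return 4.0 * raf * (1.0 - raf)


def simulate_gwas(candidate_snps, power_fn, n_hits, rng=None):
    """Collect n_hits SNPs, retaining each drawn SNP with probability power_fn(snp).

    Assumes at least one candidate has non-zero power; otherwise the loop never ends.
    """
    rng = rng or random.Random()
    hits = []
    while len(hits) < n_hits:
        snp = rng.choice(candidate_snps)  # draw a test SNP at random (with replacement)
        u = rng.random()                  # uniform random number in [0, 1)
        if u < power_fn(snp):             # retain only if u falls below the power
            hits.append(snp)
    return hits
```

In the study, the loop would run until 3036 associations are retained, with the power computed from the study population's risk allele frequency and the default parameters listed above.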
Our default model was modified to test which aspects of SNP ascertainment bias contribute the most to continental differences in risk allele frequencies. This involved varying the following simulation parameters: genotyping technology, sample size, mode of inheritance, and the p value threshold required for association detection. To examine the effects of different study populations, simulated risk allele frequencies were chosen from one of the five different populations (AFR, AMR, EAS, EUR, or SAS) or from an equal mixture of all five populations (MIX). The effects of different sample sizes were simulated by varying the number of cases and controls from three to six on a log10 scale at intervals of 0.1 (i.e., between 1000 and 1,000,000 cases and controls). The effects of different genotyping technologies were simulated by drawing random SNPs from either the Affymetrix Genome-Wide Human SNP Array 6.0, the Illumina Omni 5M microarray, or WGS data from the 1000 Genomes Project. Three genetic modes of inheritance were simulated: dominant, additive, and recessive. Two different p value thresholds were simulated: 1 × 10^−5 and 5 × 10^−8.
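The sample-size grid described above (exponents from 3 to 6 in steps of 0.1) can be written out as a short sketch. The function name is hypothetical, and rounding each value to a whole number of cases and controls is an assumption.

```python
def sample_size_grid(lo_tenths=30, hi_tenths=60):
    """Sample sizes 10**3.0, 10**3.1, ..., 10**6.0, rounded to integers (an assumption).

    Exponents are tracked in integer tenths to avoid accumulating floating-point error.
    """
    return [round(10 ** (t / 10)) for t in range(lo_tenths, hi_tenths + 1)]
```

This yields 31 sample sizes, running from 1000 to 1,000,000 cases and controls.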
We also simulated the results of GWAS when effect sizes vary between populations. Simulations examined three different effect size distributions (symmetric, larger effect sizes in Europe, and larger effect sizes in Africa), two different types of risk alleles (ancestral and derived), and two different study cohorts (European and African). In each simulation run, 3036 disease-associated loci were obtained using the power calculator described above.
Simulations were repeated 1000 times per combination of parameters. Symmetric effect sizes were generated by drawing locus-specific genotype relative risks for each test SNP from a gamma distribution (shape = 1.24, scale = 0.85). These parameter values were chosen to give a distribution of effect sizes that is comparable to loci in the NHGRI-EBI GWAS Catalog. We allowed genotype relative risks for each test SNP to vary by population by adding random noise (normally distributed, mean = 0, standard deviation = 0.5). Simulated genotype relative risks < 1 were set equal to 1. Larger European effect sizes were generated by drawing locus-specific genotype relative risks from a gamma distribution that was shifted 0.5 upwards (Additional file 3: Figure S3). Larger African effect sizes were generated by drawing locus-specific genotype relative risks from a gamma distribution shifted 0.5 to the right (Additional file 3: Figure S4). A representative dataset from GWAS simulations is included in Additional file 5: Table S4.
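One way to read the effect-size scheme above is sketched below. Several details are assumptions: the gamma draw is used directly as the genotype relative risk, the population-specific shift is added to that draw before the noise, and the noise is drawn independently for each population. The names `draw_relative_risks` and `shift_by_pop` are hypothetical.

```python
import random

POPULATIONS = ["AFR", "AMR", "EAS", "EUR", "SAS"]


def draw_relative_risks(rng, shift_by_pop=None):
    """Draw population-specific genotype relative risks for one test SNP.

    shift_by_pop maps population codes to an additive shift (e.g. {"EUR": 0.5});
    how the shift enters the model is an assumption of this sketch.
    """
    shift_by_pop = shift_by_pop or {}
    base = rng.gammavariate(1.24, 0.85)  # locus-specific draw: shape = 1.24, scale = 0.85
    grr = {}
    for pop in POPULATIONS:
        noise = rng.gauss(0.0, 0.5)      # normally distributed, mean 0, SD 0.5
        value = base + shift_by_pop.get(pop, 0.0) + noise
        grr[pop] = max(1.0, value)       # relative risks below 1 are set to 1
    return grr
```

Under this reading, the symmetric case uses no shift, while `{"EUR": 0.5}` or `{"AFR": 0.5}` would correspond to larger European or larger African effect sizes.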

GRS corrections
Genetic risk scores (GRS) for 2504 individuals were built using genotypes at a curated set of 3036 disease-associated loci from the NHGRI-EBI GWAS Catalog. Note that genetic risk scores are sometimes called polygenic risk scores (PRS). For each disease locus, we counted whether an individual has 0, 1, or 2 copies of the risk allele. Because each disease category includes a heterogeneous set of diseases and phenotypes, we did not incorporate odds ratio and/or effect size information into our GRS calculations. Counts of risk alleles were then summed across all loci that belong to a particular disease category, yielding a raw GRS for each individual. Standardized GRS values were calculated for each combination of individual and disease category by finding the mean and standard deviation of raw GRS values across all 2504 individuals in our global dataset. Given our empirical results (Fig. 3c), diploid African genomes tend to have 0.1902 (2 × 9.51%) additional copies of each ancestral risk allele and 0.1082 (2 × 5.41%) fewer copies of each derived risk allele compared to non-African genomes. Because of this, our correction method considered the state of the risk alleles (ancestral or derived). Uncorrected African GRS use counts of 0, 1, or 2 risk alleles at each disease locus. Corrected African GRS use counts of −0.1902, 0.8098, and 1.8098 "effective risk alleles" for ancestral alleles and 0.1082, 1.1082, and 2.1082 "effective risk alleles" for derived alleles. The same mapping of raw GRS to standardized GRS was used for uncorrected and corrected African GRS.
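The counting, correction, and standardization steps can be illustrated with a minimal sketch. The function names are hypothetical, and the use of the population (rather than sample) standard deviation is an assumption, since the text does not specify which was used.

```python
from statistics import mean, pstdev

ANCESTRAL_SHIFT = -0.1902  # 2 x 9.51%: subtracted from counts of ancestral risk alleles
DERIVED_SHIFT = 0.1082     # 2 x 5.41%: added to counts of derived risk alleles


def raw_grs(counts, risk_is_derived, correct=False):
    """Sum risk allele counts (0, 1, or 2 per locus) across loci in one disease category.

    With correct=True, each count is replaced by its "effective risk allele" count,
    which in this sketch is meant for African genomes only.
    """
    total = 0.0
    for count, derived in zip(counts, risk_is_derived):
        if correct:
            count += DERIVED_SHIFT if derived else ANCESTRAL_SHIFT
        total += count
    return total


def standardize(score, reference_scores):
    """Standardize a raw GRS against the global set of raw scores.

    Uses the population standard deviation (an assumption of this sketch).
    """
    return (score - mean(reference_scores)) / pstdev(reference_scores)
```

For an individual with counts 0, 1, and 2 at an ancestral, a derived, and an ancestral risk locus, the uncorrected raw GRS is 3 and the corrected raw GRS is 3 − 0.1902 + 0.1082 − 0.1902 = 2.7278. Both would then be standardized with the same mean and standard deviation, consistent with the shared mapping described above.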