How genetic disease risks can be misestimated across global populations

Background: Accurate assessment of health disparities requires unbiased knowledge of genetic risks in different populations. Unfortunately, most genome-wide association studies use genotyping arrays and European samples. Here, we integrate whole genome sequence data from global populations, results from thousands of GWAS, and extensive computer simulations to identify how genetic disease risks can be misestimated.

Results: In contrast to null expectations, we find that risk allele frequencies at known disease loci are significantly different for African populations compared to other continents. Strikingly, ancestral risk alleles are found at 9.51% higher frequency in Africa and derived risk alleles are found at 5.40% lower frequency in Africa. By simulating GWAS with different study populations, we find that non-African cohorts yield disease associations that have biased allele frequencies and that African cohorts yield disease associations that are relatively free of bias. We also find empirical evidence that genotyping arrays and SNP ascertainment bias contribute to continental differences in risk allele frequencies. Because of these causes, polygenic risk scores can be grossly misestimated for individuals of African descent. Importantly, continental differences in risk allele frequencies are only moderately reduced if GWAS use whole genome sequences and hundreds of thousands of cases and controls. Finally, comparisons between uncorrected and corrected genetic risk scores reveal the benefits of considering whether risk alleles are ancestral or derived.

Conclusions: Our results imply that caution must be taken when extrapolating GWAS results from one population to predict disease risks in another population.


Background
In the past decade, over 3,500 genome-wide association studies (GWAS) have successfully identified more than 68,000 genetic associations with common diseases and other traits [1,2]. However, the vast majority of published GWAS have used samples of European ancestry [3,4], and a looming challenge is to be able to generalize GWAS results across populations [5][6][7][8][9][10][11]. An additional complication is that existing GWAS use genotyping arrays, as opposed to whole genome sequencing (WGS). Each disease-associated locus has risk and protective alleles, and results from GWAS can be combined to generate polygenic risk scores to predict individual risks of disease [12][13][14]. These polygenic risk scores quantify hereditary disease burdens by summing the number of risk alleles in each individual's genome and sometimes weighting SNPs by effect size [15].
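The scoring rule described above (count risk alleles per locus, optionally weight each locus by effect size) can be sketched as follows. This is a minimal illustration, not the scoring code of any cited study: the function name, the genotypes, and the weights are all hypothetical.

```python
from typing import Optional, Sequence


def genetic_risk_score(
    risk_allele_counts: Sequence[int],
    effect_sizes: Optional[Sequence[float]] = None,
) -> float:
    """Sum risk-allele counts (0, 1, or 2 per locus), optionally weighted."""
    if any(c not in (0, 1, 2) for c in risk_allele_counts):
        raise ValueError("each locus must carry 0, 1, or 2 risk alleles")
    if effect_sizes is None:
        # Unweighted score: a plain count of risk alleles across loci.
        return float(sum(risk_allele_counts))
    if len(effect_sizes) != len(risk_allele_counts):
        raise ValueError("need one effect size per locus")
    # Weighted score: each locus contributes count * effect size
    # (for example a log odds ratio; these weights are hypothetical).
    return float(sum(c * w for c, w in zip(risk_allele_counts, effect_sizes)))


counts = [0, 1, 2, 1]               # hypothetical genotypes at four loci
weights = [0.10, 0.25, 0.05, 0.40]  # hypothetical per-allele effect sizes
unweighted = genetic_risk_score(counts)
weighted = genetic_risk_score(counts, weights)
```

In the unweighted form, two individuals with the same number of risk alleles receive the same score regardless of which loci carry them. This is why systematic frequency differences at the scored loci translate directly into population-level score differences.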
The "missing heritability" problem hampers genetic risk scores, as many causal variants remain undiscovered [16,17]. Diseases can also have different genetic architectures in different populations [18]. Because of these issues, genetic predictions of disease risk are not always accurate, and it is important to be able to distinguish between situations where genetic risks actually differ between populations and when genetic predictions of differences in disease risks are spurious.
Although health disparities are often due to access to healthcare and socioeconomic factors [19,20], genetic differences in disease risks arise when allele frequencies at disease-associated loci differ across populations [15]. Populations that share recent ancestry have similar allele frequencies and hereditary disease risks, while populations that diverged in the deep past can have large allele frequency differences at disease-associated loci [21,22]. These differences are magnified by population bottlenecks and founder effects, including elevated risks of cystic fibrosis among the Québécois [23] and cardiovascular disease among the descendants of the HMS Bounty mutineers [24]. However, many common diseases are polygenic [25,26], and allele frequency differences at individual loci tend to average out. Because of this, the overall burden of hereditary disease is expected to be similar across the globe [27], with the possible exception of reduced genetic load in African populations [28]. For polygenic diseases, the null expectation is that individuals from different populations will have similar counts of risk alleles.
The genetic ancestry of study participants can cause hereditary disease risks to be misestimated. Indeed, genetic risk scores generated from different study cohorts have been shown to vary across populations [5]. As of 2016, the ancestry of 81% of all GWAS samples was European and 14% was Asian [3], and this is likely to cause the set of known disease associations to be enriched for alleles that are polymorphic or intermediate frequency in Europe or Asia, but not Africa. Inequities in genetic studies parallel what is observed in social science research: most samples are from Western, educated, industrialized, rich and democratic (WEIRD) societies [29,30]. For disease associations to be detected, loci need to be polymorphic in the study population. Because of this, disease loci with allele frequencies that are zero or one in European populations are likely to be missed (i.e. the "known unknowns" [31]), and some of these disease loci will have intermediate frequencies in other populations. Disease associations found in one population can over- or underestimate genetic disease risks in other populations. One partial solution to this problem is to perform multiethnic GWAS that include individuals from multiple populations [32].
Commonly used genotyping arrays can also cause predictions of hereditary disease risks to be misestimated. One issue is that SNPs on genotyping arrays tend to have large minor allele frequencies [33][34][35]. These older SNPs often have large allele frequency differences between populations [36,37]. Systematic biases can also arise because commercially available genotyping arrays tend to use SNPs that were originally ascertained in European populations. This SNP ascertainment bias can be particularly problematic if it yields disease loci with risk allele frequencies that are high for one population and low for another population.
Demographic history also affects whether known disease-associated loci have biased allele frequencies. Consider disease-associated alleles that are initially found at the same frequency in two populations, i.e. prior to divergence (Figure 1a). Note that risk alleles can be ancestral (shared with other primates) or derived (due to new mutations) and that ancestral alleles tend to be high frequency while derived alleles tend to be low frequency [38,39]. Over time, allele frequencies at each locus diverge between daughter populations. Importantly, bottlenecked non-African populations have experienced greater amounts of genetic drift than African populations [40] (Figure 1b). This asymmetry, coupled with statistical power being maximized at intermediate allele frequencies [41], can cause known disease-associated loci to have biased allele frequencies. Specifically, we predict that non-African GWAS will catch disease loci that have higher ancestral risk allele frequencies (and lower derived allele frequencies) in Africa (Figure 1c). By contrast, we predict that African GWAS will catch a relatively unbiased set of disease-associated loci (Figure 1d). Although continental differences in ancestral and derived risk allele frequencies have been observed for prostate cancer loci [42], these biases have yet to be studied in a comprehensive way.
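The claim that smaller (bottlenecked) populations experience more genetic drift can be illustrated with a toy Wright-Fisher simulation. This is a sketch under simplifying assumptions (constant population size, no selection, mutation, or migration). The population sizes, generation count, and function names are arbitrary illustrative choices, not estimates of human demographic history and not the simulations used in this study.

```python
import random
from statistics import pvariance


def wright_fisher(p0: float, n_diploid: int, generations: int,
                  rng: random.Random) -> float:
    """Allele frequency after repeated binomial resampling of 2N chromosomes."""
    p = p0
    n_chrom = 2 * n_diploid
    for _gen in range(generations):
        # Each chromosome in the next generation copies the allele with prob. p.
        copies = sum(1 for _chrom in range(n_chrom) if rng.random() < p)
        p = copies / n_chrom
    return p


def drift_variance(p0: float, n_diploid: int, generations: int,
                   replicates: int, seed: int) -> float:
    """Variance of final allele frequencies across independent replicate loci."""
    rng = random.Random(seed)
    finals = [wright_fisher(p0, n_diploid, generations, rng)
              for _rep in range(replicates)]
    return pvariance(finals)


# Same starting frequency, different (hypothetical) population sizes.
small = drift_variance(0.5, n_diploid=50, generations=10, replicates=200, seed=1)
large = drift_variance(0.5, n_diploid=500, generations=10, replicates=200, seed=1)
```

Here `small` should exceed `large`: loci in the smaller population wander further from their shared starting frequency, which is the asymmetry invoked for bottlenecked non-African populations.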
At present, it is unknown how much the set of known disease associations hinders precision medicine and personal genomics. To bridge this knowledge gap, we integrated whole genome sequence data from global populations with results from thousands of GWAS and ran extensive computer simulations. These analyses: 1) revealed novel empirical patterns at disease-associated loci, 2) identified multiple causes of how disease risks can be misestimated in global populations, and 3) examined different solutions to this problem (including alternative GWAS study designs and building genetic risk scores that correct for major sources of bias).

African risk allele frequencies differ from other continents
We tested whether there are any systematic biases in genetic estimates of disease risk by analyzing allele frequencies at 3036 GWAS loci for each continental population in the 1000 Genomes Project. Contrary to null expectations, mean risk allele frequencies are not the same for each population (Figure 2a). Disease categories that have a larger proportion of ancestral alleles tend to have elevated risk allele frequencies in Africa (Figure 2b). After binning GWAS loci by disease category, we find that the differences in the mean frequency of risk alleles between African and non-African populations are highly correlated with the proportion of risk alleles that are ancestral (r² = 0.842). Accurate estimation of genetic disease risks across global populations may hinge upon knowledge of whether risk-increasing alleles are ancestral or derived.

Ancestral and derived alleles yield different patterns of genetic disease risk
For loci that are not associated with any disease, the null expectation is that ancestral and derived allele frequencies will be broadly similar across global populations. Just because Homo sapiens emerged in Africa does not mean that African genomes have an excess of ancestral alleles: all human populations share the same evolutionary distance to chimpanzees. Due to the out-of-Africa bottleneck, African genomes are more likely to be heterozygous for derived alleles and non-African genomes are more likely to be homozygous for derived alleles. Examining WGS data from the 1000 Genomes Project, we find that derived allele frequencies (DAF) are similar for each population (Figure 3a).
However, disease-associated loci need not exhibit the same pattern.
The joint site frequency spectrum (SFS) enables the frequencies of individual risk alleles to be compared between African and non-African populations. Similar numbers of disease associations are found above and below the diagonal in Figure 3b. However, conditioning on whether risk alleles are ancestral or derived reveals a striking pattern: 69.2% of ancestral risk alleles are found at higher frequency in African populations (red dots below the diagonal), and 64.5% of derived risk alleles are found at higher frequency in non-African populations (blue dots above the diagonal). The magnitudes of allele frequency differences between populations also vary for ancestral and derived risk alleles.
We find that ancestral risk alleles are found at much higher frequencies in Africa and derived risk alleles are found at moderately lower frequencies in Africa (Figure 3c).
Specifically, the mean difference in ancestral risk allele frequencies between African and pooled non-African populations is +9.51%, and the mean difference in derived risk allele frequencies between African and pooled non-African populations is -5.40% (p-value < 2.2 × 10^-16 for both comparisons, Wilcoxon signed-rank tests). The overall continental difference in risk allele frequencies of +1.15% arises because 44% of presently known disease-associated loci have ancestral risk alleles.

To further isolate the effects of different study populations, we simulated a large number of GWAS results, varying the continental ancestry of each study cohort. Importantly, our GWAS simulations did not assume that there are any underlying differences in hereditary disease risks across populations. We find that computer simulations recapitulate empirical patterns at known disease loci, and that GWAS of [...] (Figure 4b). These simulation results indicate that systematic allele frequency differences between populations need not be due to any underlying difference in risk. The effects of European study cohorts are still seen when GWAS simulations use data from WGS, as opposed to genotyping arrays (Table 1). We also find that continental differences in risk allele frequencies occur if GWAS simulations use a more stringent p-value filter or if simulations assume different modes of inheritance, including dominant or recessive disease alleles (Table S1 and Table S2). Additionally, GWAS simulations of study cohorts that contain a mixture of individuals from different populations still yield disease-associated loci with continental biases in risk allele frequencies (MIX in Figure 4). These results suggest that pooling samples with different ancestries is unlikely to completely alleviate the problem of misestimating genetic disease risks.
Regardless of the choice of study cohort, allele frequencies are similar for each non-African population, reflecting the relatively recent divergence times between these populations.
We also examined the effects of genotype-by-environment (GxE) interactions by allowing effect sizes to vary by population in our GWAS simulations. In general, results from these simulations mirror results from other simulations: ancestral risk allele frequencies are higher in African populations than non-African populations, and derived risk allele frequencies are lower in African populations than non-African populations (Figure S2).
Compared to African study cohorts, European study cohorts magnify these allele frequency differences between populations. Choice of study cohort imposes a filter on effect sizes, as SNPs with very small effect sizes do not yield detectable associations (compare gray pre-GWAS effect sizes to red and blue post-GWAS effect sizes in Figures S2-S4). Large effect sizes enable high frequency ancestral alleles and low frequency derived alleles to be detected in a GWAS. The results described above are also robust to systematic biases in effect sizes, i.e. scenarios where pre-GWAS European effect sizes tend to be larger than African effect sizes or vice versa (Figures S3 and S4).

Genotyping arrays and SNP ascertainment bias cause disease risks to be misestimated
Many commonly used genotyping arrays contain SNPs that were ascertained in a relatively small number of European individuals. This ascertainment bias results in allele frequency distributions that vary by genotyping platform. Compared to WGS data, derived allele frequencies are higher for SNPs on the Affymetrix Genome-Wide Human SNP Array 6.0 and the Illumina Omni 5M microarray. SNPs on genotyping arrays also exhibit continental biases (Figure 3a). Specifically, we find that derived allele frequencies in African populations are markedly lower than derived allele frequencies in non-African populations (p-value < 2.2 × 10^-16 for both arrays, Wilcoxon signed-rank tests).
The joint SFS of non-African and African populations further reveals the effects of SNP ascertainment bias. Examining WGS data, we find that similar numbers of SNPs have elevated derived allele frequencies in non-African and African populations (Figure S1a).
By contrast, the Affymetrix Genome-Wide Human SNP Array 6.0 and the Illumina Omni 5M microarray are enriched for SNPs with higher derived allele frequencies outside of Africa (i.e. SNPs above the diagonal in Figure S1b and Figure S1c). Importantly, this pattern mirrors what is seen for empirical GWAS data (Figure S1d), which suggests that genotyping arrays contribute to continental differences in risk allele frequencies at known disease-associated loci.
Because many disease associations involve imputed SNPs, we also tested whether continental differences in risk allele frequencies persist for disease-associated loci that are not on the Affymetrix Genome-Wide Human SNP 6.0 Array. For this empirical set of disease-associated loci, we find that sites with ancestral risk alleles have higher allele frequencies in Africa (+8.63% on average) and that SNPs with derived risk alleles have lower allele frequencies in Africa (-4.83% on average). This suggests that biases persist even for imputed SNPs.

Continental differences in allele frequencies persist even if whole genome sequencing and large sample sizes are used
Simulations of GWAS results were used to infer the extent to which misestimates of disease risks depend upon genotyping technology (Table 1). Here, simulations assume European ancestry for each study cohort and sample sizes of 3500 cases and 3500 controls. We find that different genotyping arrays yield similar results: the Affymetrix Genome-Wide Human SNP Array 6.0 and the Illumina Omni 5M microarray yield ancestral risk allele frequencies that are 10.7% and 11.0% higher in Africa and derived risk allele frequencies that are 8.0% and 8.2% higher in Europe, respectively. Somewhat surprisingly, continental differences in allele frequencies also occur for GWAS simulations that use WGS data.
Focusing on WGS GWAS simulations, ancestral risk allele frequencies are 9.7% higher in Africa and derived risk alleles are 7.2% higher in Europe. These patterns arise because of our choice of study cohort and because sample sizes of 3500 cases and 3500 controls have relatively little power to catch rare disease alleles.
Continental biases in risk allele frequencies occur even if GWAS use large sample sizes. Simulated GWAS with fewer than 10,000 European cases and controls yield large differences in African and non-African allele frequencies (Figure 5). This occurs regardless of whether simulations use SNPs from the genotyping arrays or WGS.
Increasing GWAS sample sizes increases the statistical power to detect associations with rare alleles. However, our simulations reveal that there are diminishing returns for increasing sample sizes, especially if GWAS use genotyping arrays. Well-powered studies with hundreds of thousands of cases and controls still yield notable differences in continental allele frequencies, even if WGS data are used (Figure 5). These results indicate that WGS is unable to completely mitigate the effects of different study populations.

Correcting for ancestral and derived risk alleles leads to improved genetic risk scores
Also note that admixed genomes from the Americas (AMR in Figure 6) have GRS that are broadly similar to other non-African genomes. Although GRS reflect an individual's genetic propensity for different disease categories, we caution against over-interpreting these results. This is because GRS have been built from a biased set of disease-associated loci.
GRS corrections reduce some, but not all, of the population-level differences in predicted disease risks. Here, we compensate for continental differences in ancestral and derived risk allele frequencies by generating corrected GRS for African genomes. We find that African individuals have corrected GRS that are similar to other populations for metabolic (p-value = 0.8080), morphological (p-value = 0.0671), and neurological disease risks (p-value = 0.7116, Mann-Whitney U tests). By contrast, African individuals have corrected GRS that are different from other populations for GI or liver, cancer, miscellaneous, and cardiovascular disease risks (p-value < 2.2 × 10^-16 for each disease category, Mann-Whitney U tests). Corrections involve a leftward shift in the GRS of African genomes, the magnitude of which depends on the proportion of ancestral risk alleles for each disease category (compare the size of arrows in Figure 6). We observe three different outcomes: minimal effects, over-correction, and reduction of bias.
Cardiovascular risk predictions for African genomes were largely unchanged (i.e. GRS still appear to underestimate the risks of cardiovascular disease in individuals of African descent). Two disease categories (GI or liver and miscellaneous diseases) have corrected GRS distributions that differ more between African and non-African populations than uncorrected GRS distributions. The remaining four disease categories (metabolic, morphological, cancer, and neurological diseases) have corrected GRS distributions that overlap heavily with other populations. Although the correction method used here alleviates some forms of bias, our results suggest that GRS can be further improved by considering additional parameters.

Discussion
The biased set of disease associations that are presently known causes hereditary disease risks to be misestimated. Specifically, African populations tend to have higher frequencies of ancestral risk alleles and lower frequencies of derived risk alleles at existing GWAS loci.
Considering the magnitude of these differences and the proportion of disease-associated alleles that are ancestral, as opposed to derived, yields risk allele frequencies that are 1.15% higher in Africa. Elevated risk allele frequencies in African populations are the opposite of what one expects to see given human demographic history. Due to population bottlenecks, non-African populations are expected to have greater amounts of genetic load [28]. This discrepancy arises because GWAS rely on European study cohorts and data from genotyping arrays. Continental differences in allele frequencies also have important ramifications for precision medicine and personal genomics: disease risks are likely to be misestimated if GWAS results are naively used to calculate genetic risk scores (Figure 6). This can obscure existing health disparities that are due to socio-cultural factors including access to medical care [45,46]. High risk individuals may have genetic profiles that lull them into a false sense of security, and low risk individuals may have genetic risk profiles that lead to an undue amount of worry.
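As a rough consistency check on the 1.15% figure quoted in this paragraph, the overall difference can be written as a weighted average of the ancestral and derived differences. This back-of-the-envelope sketch assumes that every risk allele is classified as either ancestral or derived, and it uses the rounded 44% proportion of ancestral risk alleles:

```latex
\[
\Delta_{\text{overall}}
  \approx 0.44 \times (+9.51\%) + 0.56 \times (-5.40\%)
  \approx 4.18\% - 3.02\%
  = +1.16\%
\]
```

This is close to the reported +1.15%. The small gap is plausibly explained by rounding of the 44% proportion.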
Here, we are concerned with the limitations of using disease associations discovered in one population to predict disease risks in another population, as opposed to whether GWAS findings can be successfully replicated across multiple populations. The effects of different study cohorts are asymmetric. Non-African GWAS results can be used to predict disease risks in other non-African populations, but these disease associations generalize poorly to African populations (Figure 4). By contrast, African GWAS results can be used to predict disease risks in a relatively unbiased way across all global populations.
This asymmetry arises as a by-product of demographic history and the out-of-Africa migration (Figure 1) and because GWAS use arrays that suffer from SNP ascertainment bias (Figure 3a). Our results suggest that there may be additional benefits to including a large number of African individuals in multiethnic GWAS. We note that difficulties can still arise when transferring GWAS results from one non-African population to another non-African population. This is due to both the existence of private risk alleles and divergence times that can exceed 30,000 years. Regardless of the study cohort used to generate genetic risk scores, it is impossible to fully correct for missing risk alleles from understudied populations. Problems generalizing GWAS results cannot be solved by only using WGS and large sample sizes (Figure 5). Furthermore, many variants discovered by WGS are rare and population-specific. That said, genetic risk scores generated from WGS data are expected to be less biased than genetic risk scores generated from array data, especially when sample sizes are large.
Although this paper focuses on risk allele frequency differences across populations, we note that many disease loci remain undetected, and this also contributes to misestimates of disease risks. These missing disease loci are particularly important when risk alleles are population-specific. This underscores the need for genetic epidemiology studies to include samples from a diverse set of populations.
Our study demonstrates the benefits of adopting an evolutionary perspective towards health and disease [47,48]. Important empirical patterns would not have been noticed without considering ancestral vs. derived states of alleles. Continental differences in allele frequencies also depend upon SNP age. An evolutionary perspective is also valuable for understanding how genetic disease risks can be misestimated across populations. Specifically, we find that it matters whether populations have experienced a history of bottlenecks and founder effects. Knowing whether individual disease loci have experienced a history of natural selection can lead to additional insights [42,49,50]. That said, systematic allele frequency biases can be mistaken for directional selection, hindering tests of polygenic selection acting on GWAS traits [51].
Recently, Martin et al. found that polygenic risk scores yield inaccurate predictions of height and schizophrenia, and that GRS for Type II Diabetes depend upon the choice of study cohort [5]. Using coalescent simulations, they also found that the proportion of heritability that can be explained decreases with distance to the GWAS study population.
Using complementary approaches, our study resulted in novel discoveries. We find that ancestral and derived states of risk alleles play a central role in the estimation of genetic disease risks across multiple populations, something missed by prior studies that examine the generalizability of GWAS results. We also find that important asymmetries exist when extrapolating results between African and non-African populations, and that population bottlenecks play a key role (i.e. generalizability of results depends on more than the evolutionary distance between populations). By explicitly testing the effects of different genotyping technologies and sample sizes, we were able to discover that WGS of hundreds of thousands of cases and controls still yields biased GWAS results. Martin et al. also advocate mean-centering GRS for each population [5], but this solution can be problematic if hereditary disease risks actually differ between populations.
Our GRS calculations illustrate how misestimation of genetic risks can obscure whether there are any real differences in disease risks across populations (Figure 6). Genotype-by-environment (GxE) interactions also contribute to disease phenotypes [52]. Studies of immigrants, admixed families, and adopted individuals may prove to be particularly informative with respect to genetics and health inequities [53][54][55][56]. Corrected GRS for admixed genomes may also benefit from the use of local ancestry painting tools like RFMix [57] or ELAI [58].

Conclusions
Going forward, multiple approaches can be used to extend the benefits of precision medicine and personal genomics to a wide range of global populations. One option is to assume that disease associations can be generalized across populations without any complications. However, this approach is flawed because only a biased set of disease loci are known at present. A second option is to require that genetic risk scores only use disease associations discovered in the same population (i.e. avoid generalizing results across populations). However, this is unfeasible from a logistical standpoint, as it would require repeating every GWAS in every global population. A third option is to use whole genome sequencing and large African study cohorts to generate sets of disease-associated loci that can be generalized relatively free of bias. On a more practical side, genetic risk scores can be generated that correct for existing biases. This requires understanding how risk allele frequencies differ between populations (as shown here) and leveraging linkage disequilibrium information to infer the effect sizes of risk alleles in non-study populations [59,60]. Finally, we note that the gold standard for evaluating genetic risk scores involves testing how well they predict disease phenotypes in diverse populations, something that requires individual-level phenotype data. Only by understanding population genetics and the effects of SNP ascertainment bias can accurate predictive models of genetic disease risks be built.

GWAS simulations
Computer simulations were used to test whether SNP ascertainment bias alone can produce what appears to be genetic differences in disease risks across populations. The goal here was to generate simulated datasets comparable to the set of 3036 disease-associated loci from the NHGRI-EBI GWAS Catalog. These simulations assume that the underlying risks of disease are the same across the globe. Two general types of simulations were run: simulations with ancestral risk alleles and simulations with derived risk alleles. Simulations involved randomly drawing a test SNP from a list of known genetic variants ascertained via WGS or found on commercial genotyping arrays. Conditioning on whether risk alleles are ancestral or derived, we obtained the risk allele frequency of the test SNP in the study population. We then used a Perl script based on the GAS/CaTS power calculator [41] to determine the probability of detecting a successful genetic association at the test SNP. The GAS power calculator leverages information about the number of cases and controls, p-value threshold, disease model, prevalence, disease allele frequency, and genotype relative risk (http://csg.sph.umich.edu/abecasis/cats/gas_power_calculator/).
For each test SNP, we generated a uniformly distributed random number between 0 and 1.
The test SNP was retained if the random number was less than the power to successfully detect a genetic association, and the test SNP was rejected if the random number was greater than the probability of detection. This process was repeated until a set of 3036 successful disease associations were detected. At each of these 3036 SNPs, we obtained [...]

We also simulated the results of GWAS where effect sizes vary between populations. Simulations examined three different effect size distributions (symmetric, larger effect sizes in Europe, and larger effect sizes in Africa), two different types of risk alleles (ancestral and derived), and two different study cohorts (European and African). In each simulation run, 3036 disease-associated loci were obtained using the power calculator described above. Simulations were repeated 1000 times per combination of parameters. Symmetric effect sizes were generated by drawing locus-specific genotype relative risks for each test SNP from a gamma distribution (shape = 1.24, scale = 0.85).
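The accept/reject procedure and the gamma-distributed effect sizes described in this section can be sketched in Python as follows. This is a hedged illustration, not the authors' Perl script. The function `allelic_test_power` is a textbook normal approximation for an allelic test under a multiplicative model and only stands in for the GAS/CaTS calculator. The threshold `alpha=5e-8`, the candidate frequencies, and all function names are assumptions. The population-specific noise (standard deviation 0.5) and the floor at 1 follow the description given in this section.

```python
import random
from statistics import NormalDist

STD_NORMAL = NormalDist()


def allelic_test_power(risk_freq, grr, n_cases, n_controls, alpha=5e-8):
    """Approximate power of a two-sided allelic z-test (multiplicative model).

    Assumption: controls carry the population allele frequency, which is
    only reasonable for a rare disease. Not the GAS/CaTS implementation.
    """
    if not 0.0 < risk_freq < 1.0:
        return 0.0  # monomorphic SNPs cannot yield an association
    case_freq = grr * risk_freq / (grr * risk_freq + (1.0 - risk_freq))
    ctrl_freq = risk_freq
    se = (case_freq * (1.0 - case_freq) / (2 * n_cases)
          + ctrl_freq * (1.0 - ctrl_freq) / (2 * n_controls)) ** 0.5
    z = (case_freq - ctrl_freq) / se
    z_crit = STD_NORMAL.inv_cdf(1.0 - alpha / 2.0)
    return STD_NORMAL.cdf(z - z_crit) + STD_NORMAL.cdf(-z - z_crit)


def draw_grr(rng, shape=1.24, scale=0.85, noise_sd=0.5):
    """Genotype relative risk: gamma draw plus normal noise, floored at 1."""
    base = rng.gammavariate(shape, scale)  # second argument is the scale
    return max(1.0, base + rng.gauss(0.0, noise_sd))


def simulate_gwas_hits(study_freqs, n_hits, n_cases=3500, n_controls=3500,
                       alpha=5e-8, seed=0):
    """Draw test SNPs until n_hits of them pass the accept/reject step."""
    rng = random.Random(seed)
    hits = []
    while len(hits) < n_hits:
        idx = rng.randrange(len(study_freqs))  # randomly drawn test SNP
        grr = draw_grr(rng)
        power = allelic_test_power(study_freqs[idx], grr,
                                   n_cases, n_controls, alpha)
        if rng.random() < power:  # retain with probability equal to power
            hits.append((idx, grr))
    return hits


# Hypothetical risk allele frequencies in the study population.
candidate_freqs = [0.01, 0.05, 0.2, 0.5, 0.8, 0.95]
hits = simulate_gwas_hits(candidate_freqs, n_hits=200, seed=42)
```

Because acceptance is proportional to power, SNPs at intermediate frequency in the study population and SNPs with large relative risks are over-represented among the retained hits. This is the filtering effect the simulations are designed to expose.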
These parameter values were chosen to give a distribution of effect sizes that is comparable to loci in the NHGRI-EBI GWAS Catalog. We allowed genotype relative risks for each test SNP to vary by population by adding random noise (normally distributed, mean = 0, standard deviation = 0.5). Simulated genotype relative risks <1 were set equal to 1. Larger European effect sizes were generated by drawing locus-specific genotype relative risks from a gamma distribution that was shifted 0.5 upwards (Figure S3). Larger African effect sizes were generated by drawing locus-specific genotype relative risks from a gamma distribution shifted 0.5 to the right (Figure S4). A representative dataset from GWAS simulations is included in Table S4.

GRS corrections
Genetic risk scores (GRS) for 2504 individuals were built using genotypes at a curated set of 3036 disease-associated loci from the NHGRI-EBI GWAS Catalog. Note that genetic risk scores are sometimes called polygenic risk scores (PRS). For each disease locus we counted whether an individual has 0, 1, or 2 copies of the risk allele. Because each disease category includes a heterogeneous set of diseases and phenotypes, we did not incorporate odds ratio and/or effect size information into our GRS calculations. Counts of risk alleles were then summed across all loci that belong to a particular disease category [...] Uncorrected African GRS use counts of 0, 1, or 2 risk alleles at each disease locus.
The same mapping of raw GRS to standardized GRS was used for uncorrected and corrected African GRS.

Table S1. Effects of different p-value thresholds for GWAS simulations.
Table S2. GWAS simulations of dominant, additive, and recessive disease alleles.
Table S3. Curated set of 3036 disease-associated loci from the NHGRI-EBI GWAS Catalog.
Table S4. Representative set of loci from GWAS simulations.
Figure S1. Joint site frequency spectra for multiple genotyping technologies.
Figure S2. GWAS simulations that allow effect sizes to vary by population.