Unequal representation of genetic variation across ancestry groups creates healthcare inequality in the application of precision medicine
© The Author(s). 2016
Published: 14 July 2016
An important application of modern genomics is diagnosing genetic disorders. We use the largest publicly available exome sequence database to show that this key clinical service can currently be performed much more effectively in individuals of European genetic ancestry.
It has long been argued that concentrating large-scale genomic data generation on individuals of European ancestry can contribute to healthcare inequalities [1, 2]. In the current diagnostic sequencing paradigm, much of the search for a genetic diagnosis focuses on candidate variants in known disease-associated genes that are either absent or sufficiently rare in the available control reference cohorts; each such variant is then considered carefully as a possible explanation for the patient's presentation. Need and Goldstein specifically argued in 2009 that, as sequencing becomes clinically routine, our ability to filter variants effectively to identify pathogenic ones would differ greatly among ancestry groups unless our knowledge of genetic variation is made more equal across them [1]. Unfortunately, now that clinical sequencing is becoming routine, this fear has clearly been realized. The common experience is that when this clinical service is performed today in patients of European ancestry, the number of candidate variants is significantly smaller than in patients from other geographic ancestry groups.
When searching for genetic aberrations responsible for Mendelian disorders, the expectation that pathogenic genotypes will be under strong negative selection instructs us to focus on genotypes at low or unobserved frequencies in the general population [3–5]. As population reference cohorts increase in size, we capture lower allele frequencies with improved resolution [6]. The recently released Exome Aggregation Consortium (ExAC) dataset [7, 8], which contains aggregated exome sequence data from 60,252 individuals with an assigned geographic ancestry, allows allele frequencies to be resolved down to values approximately sixfold lower than was possible with the combination of two pre-existing datasets, the Exome Sequencing Project (ESP) and the 1000 Genomes Project. Approximately 60.9 % of the samples in this ExAC reference cohort are of European ancestry, compared with 13.7 % of South Asian ancestry, 9.6 % of Latino ethnicity, 8.6 % of African (African American) ancestry, and 7.2 % of East Asian ancestry.
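To make the link between cohort size and frequency resolution concrete, the following sketch derives the approximate number of ExAC samples in each ancestry group from the percentages above, together with the lowest non-zero allele frequency observable in that group (one allele copy among 2N chromosomes). It assumes diploid autosomal sites that are fully covered in every sample, which real exome data will not always satisfy, and the function and variable names are illustrative only.

```python
# Total ExAC samples with an assigned geographic ancestry (from the text).
EXAC_TOTAL = 60_252

# Ancestry-group shares of the ExAC cohort, in percent (from the text).
EXAC_SHARE_PCT = {
    "European": 60.9,
    "South Asian": 13.7,
    "Latino": 9.6,
    "African (African American)": 8.6,
    "East Asian": 7.2,
}


def min_observable_af(n_individuals: int) -> float:
    """Lowest non-zero allele frequency in a cohort of diploid individuals.

    Assumption: a single observed allele copy among 2 * n chromosomes.
    """
    if n_individuals <= 0:
        raise ValueError("cohort must contain at least one individual")
    return 1.0 / (2 * n_individuals)


def group_resolution(total: int, share_pct: dict) -> dict:
    """Map each group to (approximate sample count, frequency floor)."""
    result = {}
    for group, pct in share_pct.items():
        # Counts are approximate because the published shares are rounded.
        n = round(total * pct / 100)
        result[group] = (n, min_observable_af(n))
    return result


if __name__ == "__main__":
    for group, (n, af) in group_resolution(EXAC_TOTAL, EXAC_SHARE_PCT).items():
        print(f"{group:28s} n ~ {n:6d}   min AF ~ {af:.2e}")
```

Under these assumptions the frequency floor for the European subset is roughly eight times lower than for the East Asian subset, simply because the European subset is roughly eight times larger.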
Geographic ancestry, rare variants, and disease-associated genes
We previously described “narrative potential” [10] as the opportunity to construct variant-disease narratives given that every genome will contain rare variants predicted to be damaging by in silico tools. To illustrate the value of ancestry-matched controls, we generated rare variant distributions for the different ancestry groups. The distributions reflect the number of rare non-synonymous variants found among the 3393 current disease-associated genes from the Online Mendelian Inheritance in Man (OMIM) database.
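A per-sample count of this kind could be generated along the following lines. This is a minimal sketch rather than the pipeline used here: the `Variant` record layout, the `"nonsynonymous"` label, the gene names in the usage example, and the frequency cutoff are all hypothetical choices made for illustration.

```python
from collections import Counter
from typing import Iterable, NamedTuple, Set


class Variant(NamedTuple):
    """Hypothetical record for one variant call in one sample."""
    sample_id: str
    gene: str
    effect: str           # e.g. "nonsynonymous" or "synonymous"
    reference_maf: float  # frequency in the reference cohort; 0.0 if unseen


def rare_variant_counts(
    variants: Iterable[Variant],
    disease_genes: Set[str],
    maf_cutoff: float,
) -> Counter:
    """Count rare non-synonymous variants in disease genes, per sample."""
    counts: Counter = Counter()
    for v in variants:
        in_gene_set = v.gene in disease_genes
        is_nonsynonymous = v.effect == "nonsynonymous"
        is_rare = v.reference_maf <= maf_cutoff
        if in_gene_set and is_nonsynonymous and is_rare:
            counts[v.sample_id] += 1
    return counts


if __name__ == "__main__":
    # Toy calls, invented for illustration only.
    calls = [
        Variant("s1", "GENE_A", "nonsynonymous", 0.0),
        Variant("s1", "GENE_B", "synonymous", 0.0),
        Variant("s2", "GENE_A", "nonsynonymous", 0.02),
        Variant("s2", "GENE_C", "nonsynonymous", 0.0),
    ]
    genes = {"GENE_A", "GENE_B"}
    print(rare_variant_counts(calls, genes, maf_cutoff=0.0005))
```

Samples with no qualifying variant are absent from the returned `Counter`, which reports 0 for them on lookup. The resulting counts form the per-ancestry distributions, and a less complete reference cohort leaves more variants under the cutoff.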
The first assessment (Fig. 1b) compares the European (blue) and non-European (red) distributions of the number of singleton non-synonymous variants that each sample carries among OMIM disease-associated genes (Additional file 1). The minor allele frequency (MAF) is based on the internal database of 5965 Institute for Genomic Medicine (IGM) samples. When the distributions for European and non-European ancestries are compared, we find longer candidate lists among non-Europeans, reflecting the reduced access to ethnically matched controls (Mann–Whitney U test p < 1 × 10⁻³²⁰).
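For readers unfamiliar with the test, the sketch below shows one way to compare two sets of per-sample candidate counts with a Mann–Whitney U test. It is a plain-Python illustration that uses midranks for ties and a tie-corrected normal approximation without continuity correction; it is not the software used for the analysis above, and the counts in the usage example are synthetic.

```python
import math
from typing import Sequence, Tuple


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Mann-Whitney U test via the normal approximation.

    Returns (U statistic for x, approximate two-sided p-value).
    """
    n1, n2 = len(x), len(y)
    if n1 == 0 or n2 == 0:
        raise ValueError("both groups must be non-empty")

    # Pool the observations, remembering which group each came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n = n1 + n2

    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        # Find the run of equal values starting at position i.
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        t = j - i                    # size of this tie group
        midrank = (i + 1 + j) / 2.0  # average of ranks i+1 .. j
        in_x = sum(1 for k in range(i, j) if pooled[k][1] == 0)
        rank_sum_x += midrank * in_x
        tie_term += t ** 3 - t
        i = j

    u_x = rank_sum_x - n1 * (n1 + 1) / 2.0
    mean_u = n1 * n2 / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u_x, 1.0  # every pooled value is identical
    z = (u_x - mean_u) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return u_x, p


if __name__ == "__main__":
    # Synthetic singleton counts, invented for illustration only.
    european = [3, 4, 2, 5, 3, 4, 3, 2]
    non_european = [9, 7, 11, 8, 10, 6, 12, 9]
    u, p = mann_whitney_u(european, non_european)
    print(f"U = {u}, approximate p = {p:.3g}")
```

With thousands of samples per group the normal approximation is reasonable, but a p-value as small as the one reported above lies at the edge of double-precision range, so a direct computation like this one may simply return 0.0.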
Group summaries for the number of singleton non-synonymous candidate variants in OMIM disease-associated genes among IGM’s 5965 samples
Geographic ancestry / ethnic group | Number of individuals | Number of singletons using internal reference cohort (n = 5,965) | Number of singletons using internal and ExAC reference cohorts (n = 66,217)
African (African American)
These analyses illustrate how unequal representation of genetic variation can negatively affect present genomic interpretation in individuals of non-European ancestry. While the results are unsurprising given our understanding of population genetics, there are still important lessons. Firstly, these data show that it is instructive to assess the allele frequencies of non-European cases in their matched ancestry group(s). Secondly, increasing diversity of geographic ancestry and sample size among sequenced reference cohorts greatly ameliorates the problem (Fig. 1).
Given that sample sizes are about to explode with the US national initiative and other large-scale international sequencing studies, it is vital that we ensure the most equitable distribution of the generation of genomic data possible. Enriching our knowledge of genetic variation in different ancestry groups remains the most effective solution to this problem. With initiatives like the recently announced Precision Medicine Initiative (PMI) Cohort Program, this must be recognized as a high priority for the field as we move towards an era where precision medicine is a reality. If not, genomics could further contribute to healthcare inequalities.
Abbreviations
ExAC, Exome Aggregation Consortium; IGM, Institute for Genomic Medicine; OMIM, Online Mendelian Inheritance in Man; PC, principal component
Acknowledgements
The authors would like to thank the Exome Aggregation Consortium and the groups that provided exome variant data for comparison. A full list of contributing groups can be found at http://exac.broadinstitute.org/about. SP is a National Health and Medical Research Council of Australia (NHMRC) CJ Martin Early Career Fellow.
Authors' contributions
SP and DBG conceived and designed the study. SP and DBG drafted the manuscript. SP and DBG read and approved the final manuscript.
Competing interests
The authors declare that they have no competing interests.
Ethics approval and consent to participate
IGM participants provided written informed consent.
Open Access
This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
References
1. Need AC, Goldstein DB. Next generation disparities in human genomics: concerns and remedies. Trends Genet. 2009;25:489–94.
2. Bustamante CD, Burchard EG, De la Vega FM. Genomics for the world. Nature. 2011;475:163–5.
3. Bustamante CD, Fledel-Alon A, Williamson S, Nielsen R, Hubisz MT, Glanowski S, Tanenbaum DM, White TJ, Sninsky JJ, Hernandez RD, et al. Natural selection on protein-coding genes in the human genome. Nature. 2005;437:1153–7.
4. Kimura M. The Neutral Theory of Molecular Evolution. 1st ed. Cambridge: Cambridge University Press; 1983. http://dx.doi.org/10.1017/CBO9780511623486.
5. Carmi S, Hui KY, Kochav E, Liu X, Xue J, Grady F, Guha S, Upadhyay K, Ben-Avraham D, Mukherjee S, et al. Sequencing an Ashkenazi reference panel supports population-targeted personal genomics and illuminates Jewish and European origins. Nat Commun. 2014;5:4835.
6. Gravel S, Henn BM, Gutenkunst RN, Indap AR, Marth GT, Clark AG, Yu F, Gibbs RA, 1000 Genomes Project, Bustamante CD. Demographic history and rare allele sharing among human populations. Proc Natl Acad Sci U S A. 2011;108:11983–8.
7. Lek M, Karczewski K, Minikel E, Samocha K, Banks E, Fennell T, O'Donnell-Luria A, Ware J, Hill A, Cummings B, et al. Analysis of protein-coding genetic variation in 60,706 humans. bioRxiv. 2015. http://dx.doi.org/10.1101/030338.
8. ExAC: Exome Aggregation Consortium. http://exac.broadinstitute.org/. Accessed July 2015.
9. Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D. Principal components analysis corrects for stratification in genome-wide association studies. Nat Genet. 2006;38:904–9.
10. Goldstein DB, Allen A, Keebler J, Margulies EH, Petrou S, Petrovski S, Sunyaev S. Sequencing studies in human genetics: design and interpretation. Nat Rev Genet. 2013;14:460–70.