Table 1 Open problems

From: Open problems in human trait genetics

Category

#

Open problem

Brief explanation

Why it is important

Related open problems

Selected references

General

1

Population structure

Genetic studies are confounded by the ancestries of participants. Mounting evidence points to residual population structure that remains unaccounted for, while overcorrection can obscure genuine genetic signal.

Without resolving this, it will be difficult to trust the results of genetic studies.

4, 6, 7, 12, 16

[10, 11]

2

Non-additive and epistatic genetic effects (GxG)

The assumption that phenotypes can be approximated by summing separate genetic effects is ubiquitous in genetic studies. If this assumption is incorrect, many results could be undermined. It is also unclear how to identify and quantify epistatic effects.

Our ultimate goal is an accurate genetic model of human traits, linear or not.

11, 14

 

3

Gene-environment interactions (GxE)

Genetic effects may be contingent on environmental conditions. Such interactions are difficult to discover, and their overall contribution to phenotypic variance is not clear. Substantial GxE interactions would also undermine many methods.

GxE interactions are potentially an important piece in the genetic puzzle, which can highlight the mechanism of genetic associations and inform interventions.

11, 13

[12]

Data

4

Rare variants

Most genetic studies of complex traits deal only with common variants, even though the strongest effects are expected in rare variants. In aggregate, rare variants may contribute substantially to heritability. Key challenges are limited statistical power and genotyping.

Rare variants may be important to many complex traits. Neglecting them would leave us with an incomplete understanding of the genetic variation underlying these traits.

1, 5, 11

 

5

Non-standard genetic variation

Routine pipelines are optimized for simple variants (i.e., single-nucleotide variants and small indels), while commonly overlooking more complex genetic variation, including structural variants, copy number variation, repetitive regions and variants on the X, Y or MT chromosomes.

These types of variants contribute substantially to many traits.

4, 11

 

6

Family-based vs. population-based cohorts

Family-based study designs naturally overcome many challenges of cohort studies, specifically with respect to population structure, environmental biases, and direct vs. indirect genetic effects. However, family-based genetic resources are scarce, and there are not enough methods to analyze them.

Family-based genetic data could play an important role in studying genetic effects, especially when causality is sought.

1, 12

[13]

7

Ancestry diversity

Individuals of non-European ancestry are heavily underrepresented in genetic datasets, leading to inequality in access to medical knowledge. More diversity would also help deal with population structure and establish the causality of genetic associations.

We are interested in understanding the genetics of all of humanity, and we cannot afford to discard such a powerful tool.

1, 12, 16

[14]

8

Phenotype definition

Studied traits are often not well defined, and the phenotyping process is often noisy (mostly for binary phenotypes).

Noisy and biased data hinder progress.

 

[15]

9

Selection bias

Genetic associations may reflect people’s decision to participate rather than the studied phenotype.

This is potentially a major source of bias.

  

Heritability

10

Heritability estimate interpretation

It is not entirely clear what the “correct” way to define and measure heritability is and how heritability estimates should be interpreted. For example, do they provide an upper bound on the predictive power of polygenic risk scores?

Heritability estimates provide a lot of insight and guide our progress, and they could be even more useful if we reached a consensus on what they mean exactly.

11, 14

 

11

Missing heritability

This is a classic problem, asking why detected associations explain only a small part of the heritability in most complex traits, and why there is a large gap between heritability estimates obtained from SNP-based and twin-based methods. Despite a lot of progress in suggesting solutions and collecting evidence, the problem is still not fully resolved.

As long as this is not fully resolved, there will be lingering concerns that our understanding of genetic effects is flawed in some fundamental way.

2, 3, 4, 5, 10, 14

[16, 17]

Association studies

12

From association to causality

Most genetic associations implicate entire genomic regions, and pinpointing the exact causal variants is considered a hard problem. It is also important to rule out confounding and other statistical biases.

If we want to learn from genetic associations, we need to be able to detect causal variants and genes.

1, 6, 7

[18]

13

From causality to mechanism

Even after the causality of genetic elements is established, understanding the molecular mechanisms behind them is a grand challenge. To date, only a very small fraction of genetic discoveries are understood at that level.

Without an understanding of their mechanisms, genetic associations provide only limited biological and medical insight.

3

 

Polygenic risk scores

14

Genotype-to-phenotype prediction performance

Our ability to make accurate phenotypic predictions from genetic data is still very limited, even in highly heritable traits. Other than increasing sample sizes, we do not have very effective strategies to improve predictions.

Accurate genotype-to-phenotype predictions have enormous clinical potential.

2, 10, 11, 15

[5]

15

The clinical utility of polygenic risk scores

The use of polygenic risk scores in the clinic remains limited. To be clinically useful, predictive models need to be proven robust and reliable.

If successfully implemented in the clinic, these models have the potential to revolutionize healthcare and usher in the era of personalized medicine.

14, 16

[7, 19]

16

Model transferability

Polygenic risk scores trained in one setting generally do not generalize well to other settings, including different ancestries or genotyping technologies.

This is critical for ensuring the robustness of these models, allowing them to be used in the clinic, and ensuring that their benefits reach all groups.

1, 7, 15
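
Problem 10 asks whether heritability estimates bound the predictive power of polygenic risk scores. As a minimal sketch, assuming the purely additive model that problem 2 calls into question (phenotype = genetic value + independent noise, both Gaussian, total phenotypic variance 1), the simulation below suggests that a score equal to the true genetic value reaches an R² close to h², while a noisier score falls short of it. The function names, the parameters h2, n, score_noise and seed, and the Gaussian assumptions are illustrative and do not come from the source.

```python
import random
import statistics


def r_squared(x, y):
    """Squared Pearson correlation between two equal-length sequences."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)


def simulate(h2=0.5, n=20000, score_noise=1.0, seed=0):
    """Simulate a purely additive trait (an assumption, not the source's model).

    h2          -- assumed narrow-sense heritability (variance of genetic value)
    score_noise -- standard deviation of the error in the hypothetical noisy score
    Returns (R² of the true genetic value, R² of the noisy score).
    """
    rng = random.Random(seed)
    # True genetic values with variance h2.
    g = [rng.gauss(0.0, h2 ** 0.5) for _ in range(n)]
    # Phenotype = genetic value + environmental noise with variance 1 - h2.
    y = [gi + rng.gauss(0.0, (1.0 - h2) ** 0.5) for gi in g]
    # An imperfect polygenic score: the genetic value plus estimation error.
    score = [gi + rng.gauss(0.0, score_noise) for gi in g]
    return r_squared(g, y), r_squared(score, y)


r2_true, r2_noisy = simulate()
print(f"R² of true genetic value: {r2_true:.3f}")
print(f"R² of noisy score:        {r2_noisy:.3f}")
```

Under these assumptions the first value should be close to h2 and the second close to h2² / (h2 + score_noise²). Non-additive effects, GxE interactions or population structure (problems 1 to 3) would break this simple picture, which is part of why the interpretation of heritability estimates remains open.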