Mining data from 1000 genomes to identify the causal variant in regions under positive selection

Grossman, Shari; Shlyakhter, Ilya; Karlsson, Elinor K; Tabrizi, Shervin; Andersen, Kristian; Rinn, John; Lander, Eric; Schaffner, Steve; Sabeti, Pardis C

doi:10.1186/gb-2010-11-s1-i22

Volume 11 Supplement 1

Beyond the Genome: The true gene count, human evolution and disease genomics

Invited speaker presentation
Published: 11 October 2010

Shari Grossman^1,2,3,
Ilya Shlyakhter^1,2,
Elinor K Karlsson^1,2,
Shervin Tabrizi^1,2,
Kristian Andersen^1,2,
John Rinn^2,
Eric Lander^2,
Steve Schaffner^2,
Pardis C Sabeti^1,2 &
The 1000 Genomes Project

Genome Biology volume 11, Article number: I22 (2010)

The human genome contains hundreds of regions in which the patterns of genetic variation indicate recent positive natural selection, yet for most of these the underlying gene and the advantageous mutation remain unknown. We recently reported the development of a method, Composite of Multiple Signals (CMS), that combines tests for multiple signals of natural selection and increases resolution by up to 100-fold.
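
As a rough illustration of how several selection tests might be combined into one score per variant, the sketch below multiplies the per-test posterior probabilities of being selected, in the style of a naive Bayes classifier. This is an assumption made for illustration, not a statement of the exact CMS formula; the function name `composite_score`, the `prior` parameter, the variant labels, and all likelihood values are hypothetical.

```python
from math import prod


def composite_score(likelihoods, prior):
    """Combine several test results for one variant into a single score.

    likelihoods: list of (p_if_selected, p_if_neutral) pairs, one per test,
                 giving the probability of the observed test statistic under
                 each hypothesis (hypothetical inputs, e.g. from simulations).
    prior:       assumed prior probability that the variant is the selected one.
    """
    if not 0.0 < prior < 1.0:
        raise ValueError("prior must lie strictly between 0 and 1")
    factors = []
    for p_selected, p_neutral in likelihoods:
        numerator = p_selected * prior
        denominator = numerator + p_neutral * (1.0 - prior)
        # Posterior probability of selection given this one test alone.
        factors.append(numerator / denominator if denominator > 0 else 0.0)
    # Treat the tests as independent and multiply their posteriors.
    return prod(factors)


# Hypothetical region with three candidate variants and three tests each.
variants = {
    "variant_a": [(0.30, 0.40), (0.20, 0.50), (0.35, 0.30)],
    "variant_b": [(0.80, 0.10), (0.70, 0.20), (0.90, 0.15)],
    "variant_c": [(0.50, 0.50), (0.45, 0.40), (0.40, 0.45)],
}
prior = 1.0 / len(variants)  # uniform prior over the candidates (assumption)

scores = {name: composite_score(tests, prior) for name, tests in variants.items()}
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked)
```

Multiplying the posteriors treats the tests as independent, which is a simplification; strongly correlated tests would overstate the evidence for a variant.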

Applying CMS to candidate selected regions from the International Haplotype Map, we localized several hundred signals to ~50-100 kb, identifying individual gene and polymorphism targets of selection. These regions included genes involved in processes known to be targets of selection, such as infectious disease, skin pigmentation, metabolism, and hair and sweat. We further identified many candidates that are similar to regulatory elements. In several regions, we identified variants that are significantly associated with the expression of nearby genes in the selected population. Moreover, nearly half of the ~200 regions we examined localized to regions with no genes. Thirty of the regions contain long non-coding RNAs that have been shown to often regulate nearby genes, suggesting that variation within the RNAs might have functional consequences.

With preliminary data now available from the 1000 Genomes Project, we are beginning to explore full sequence data, which should contain most if not all of the causal selected polymorphisms. We extended the CMS method to the preliminary data set, validating our previously identified candidates and identifying many new intriguing coding and regulatory variants.

Author information

Authors and Affiliations

Center for Systems Biology and Department of Organismic and Evolutionary Biology, Cambridge, MA, 02138, USA
Shari Grossman, Ilya Shlyakhter, Elinor K Karlsson, Shervin Tabrizi, Kristian Andersen & Pardis C Sabeti
Broad Institute of Harvard and MIT, Cambridge, MA, 02139, USA
Shari Grossman, Ilya Shlyakhter, Elinor K Karlsson, Shervin Tabrizi, Kristian Andersen, John Rinn, Eric Lander, Steve Schaffner & Pardis C Sabeti
Harvard Medical School, Boston, MA, 0211, USA
Shari Grossman

Consortia

The 1000 Genomes Project

Additional information

Shari Grossman and Ilya Shlyakhter contributed equally to this work.

About this article

Cite this article

Grossman, S., Shlyakhter, I., Karlsson, E.K. et al. Mining data from 1000 genomes to identify the causal variant in regions under positive selection. Genome Biol 11 (Suppl 1), I22 (2010). https://doi.org/10.1186/gb-2010-11-s1-i22
