Open Access

Mining data from 1000 genomes to identify the causal variant in regions under positive selection

  • Shari Grossman1, 2, 3,
  • Ilya Shlyakhter1, 2,
  • Elinor K Karlsson1, 2,
  • Shervin Tabrizi1, 2,
  • Kristian Andersen1, 2,
  • John Rinn2,
  • Eric Lander2,
  • Steve Schaffner2,
  • Pardis C Sabeti1, 2 and
  • The 1000 Genomes Project
Contributed equally
Genome Biology 2010, 11(Suppl 1):I22

https://doi.org/10.1186/gb-2010-11-s1-i22

Published: 11 October 2010

The human genome contains hundreds of regions in which patterns of genetic variation indicate recent positive natural selection, yet for most of these regions the underlying gene and the advantageous mutation remain unknown. We recently reported a method, Composite of Multiple Signals (CMS), that combines tests for multiple signals of natural selection and increases the resolution with which selected regions can be localized by up to 100-fold.
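The abstract does not spell out how CMS combines its component tests. As a hedged illustration only, the sketch below assumes that each test contributes a posterior probability that a SNP is the causal variant, computed from the likelihood of the observed test statistic under selection and under neutrality with a uniform prior across the region's SNPs. It also assumes that the composite score is the product of these per-test posteriors. The function name `cms_score` and all input likelihood values are hypothetical.

```python
from typing import Sequence, Tuple


def cms_score(per_test: Sequence[Tuple[float, float]], n_snps: int) -> float:
    """Combine per-test evidence for one SNP into a single composite score.

    per_test: one (p_sel, p_neut) pair per selection test, giving the
        probability (or density) of the SNP's observed statistic if the SNP
        is the causal selected variant (p_sel) or if it is not (p_neut).
        In practice these would come from simulated distributions (assumed).
    n_snps: number of SNPs in the candidate region. The prior that any one
        SNP is causal is assumed uniform, 1 / n_snps.
    """
    if n_snps < 1:
        raise ValueError("n_snps must be at least 1")
    prior = 1.0 / n_snps
    score = 1.0
    for p_sel, p_neut in per_test:
        denom = p_sel * prior + p_neut * (1.0 - prior)
        if denom == 0:
            raise ValueError("p_sel and p_neut cannot both be zero")
        # Posterior probability that this SNP is causal, given this test alone
        posterior = p_sel * prior / denom
        # Multiplying posteriors treats the tests as independent (an approximation)
        score *= posterior
    return score


if __name__ == "__main__":
    # Hypothetical per-test likelihoods for two SNPs in a 100-SNP region
    region = {
        "snp_a": [(0.9, 0.1), (0.7, 0.3), (0.8, 0.2)],
        "snp_b": [(0.2, 0.8), (0.5, 0.5), (0.4, 0.6)],
    }
    ranked = sorted(region, key=lambda s: cms_score(region[s], 100), reverse=True)
    print(ranked)  # prints ['snp_a', 'snp_b']
```

Multiplying per-test posteriors rewards a SNP that looks causal on several signals at once. This is one way a composite can separate the causal variant from linked neighbors better than any single test can. Correlation between the tests is ignored in this sketch.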

Applying CMS to candidate selected regions from the International Haplotype Map (HapMap), we localized several hundred signals to ~50-100 kb, identifying individual genes and polymorphisms as targets of selection. These regions included genes involved in processes known to be targets of selection, such as infectious disease, skin pigmentation, metabolism, and hair and sweat. We also identified many candidates that resemble regulatory elements. In several regions, we identified variants that are significantly associated with the expression of nearby genes in the selected population. Moreover, nearly half of the ~200 regions we examined localized to intervals containing no genes. Thirty of the regions contain long non-coding RNAs, which have been shown often to regulate nearby genes, suggesting that variation within these RNAs might have functional consequences.
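The abstract reports signals localized to ~50-100 kb but does not state the rule used to define a localized interval. One simple rule, assumed here purely for illustration, is to report the interval spanned by the SNPs whose composite score passes a chosen threshold. The function `localize_signal`, the threshold, and the positions and scores are all hypothetical.

```python
from typing import List, Optional, Tuple


def localize_signal(
    snps: List[Tuple[int, float]], threshold: float
) -> Optional[Tuple[int, int]]:
    """Return the (start, end) interval, in bp, spanned by SNPs scoring >= threshold.

    snps: (position_bp, composite_score) pairs for one candidate region.
    Returns None if no SNP passes the threshold.
    """
    hits = [pos for pos, score in snps if score >= threshold]
    if not hits:
        return None
    return min(hits), max(hits)


if __name__ == "__main__":
    # Hypothetical composite scores across a ~1 Mb candidate region
    region = [(100_000, 0.01), (450_000, 0.30), (480_000, 0.50),
              (520_000, 0.20), (900_000, 0.00)]
    interval = localize_signal(region, threshold=0.1)
    start, end = interval
    print(interval, (end - start) / 1000, "kb")  # prints (450000, 520000) 70.0 kb
```

A stricter threshold narrows the interval at the risk of excluding the causal variant. The choice of threshold here is arbitrary and stands in for whatever calibration a real analysis would use.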

With preliminary data now available from the 1000 Genomes Project, we are beginning to explore full sequence data, which should contain most, if not all, of the causal selected polymorphisms. We extended CMS to the preliminary data set, validating our previously identified candidates and identifying many intriguing new coding and regulatory variants.

Notes

Authors’ Affiliations

(1)
Center for Systems Biology and Department of Organismic and Evolutionary Biology
(2)
Broad Institute of Harvard and MIT
(3)
Harvard Medical School

Copyright

© Grossman et al; licensee BioMed Central Ltd. 2010

This article is published under license to BioMed Central Ltd.
