- Invited speaker presentation
- Open Access
Mining data from 1000 genomes to identify the causal variant in regions under positive selection
- Shari Grossman†1, 2, 3,
- Ilya Shlyakhter†1, 2,
- Elinor K Karlsson1, 2,
- Shervin Tabrizi1, 2,
- Kristian Andersen1, 2,
- John Rinn2,
- Eric Lander2,
- Steve Schaffner2,
- Pardis C Sabeti1, 2 and
- The 1000 Genomes Project
https://doi.org/10.1186/gb-2010-11-s1-i22
© Grossman et al; licensee BioMed Central Ltd. 2010
- Published: 11 October 2010
Keywords
- Natural Selection
- Mining Data
- Regulatory Variant
- Regulatory Element
- Preliminary Data
The human genome contains hundreds of regions in which the patterns of genetic variation indicate recent positive natural selection, yet for most of these the underlying gene and the advantageous mutation remain unknown. We recently reported the development of a method, Composite of Multiple Signals (CMS), that combines tests for multiple signals of natural selection and increases resolution by up to 100-fold.
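As a rough illustration of what "combining tests for multiple signals" can look like, the sketch below scores each variant by multiplying per-test posterior probabilities of selection. The abstract does not describe the combination rule, so everything here is an assumption: the naive-Bayes-style product (which treats the tests as independent), the uniform prior of one causal variant per region, the function names `composite_score` and `rank_variants`, and the made-up likelihood values.

```python
from math import prod


def composite_score(likelihoods, prior):
    """Combine several selection tests into one score for a single variant.

    likelihoods: list of (p_selected, p_neutral) pairs, one per test, giving the
        probability of the observed test statistic under each scenario.
    prior: assumed prior probability that this variant is the selected one.

    Assumption: tests are treated as independent, so the per-test posteriors
    are simply multiplied together.
    """
    posteriors = []
    for p_selected, p_neutral in likelihoods:
        numerator = p_selected * prior
        denominator = numerator + p_neutral * (1.0 - prior)
        posteriors.append(numerator / denominator)
    return prod(posteriors)


def rank_variants(region):
    """Return variant names ordered from highest to lowest composite score.

    region: dict mapping a variant name to its list of per-test likelihood pairs.
    Assumption: exactly one causal variant per region, hence a uniform prior.
    """
    prior = 1.0 / len(region)
    scores = {name: composite_score(tests, prior) for name, tests in region.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Illustrative, invented likelihoods for three hypothetical variants, two tests each.
region = {
    "snp_a": [(0.2, 0.5), (0.3, 0.4)],
    "snp_b": [(0.8, 0.1), (0.7, 0.2)],
    "snp_c": [(0.4, 0.4), (0.5, 0.5)],
}
ranking = rank_variants(region)
print(ranking)
```

In this toy setup, a variant stands out only when several tests agree. That is one way a composite score could narrow a broad candidate region down to a few variants.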
Applying CMS to candidate selected regions from the International Haplotype Map, we localized several hundred signals to ~50-100 kb and identified individual genes and polymorphisms as targets of selection. These regions included genes involved in processes known to be targets of selection, such as infectious disease, skin pigmentation, metabolism, and hair and sweat. We further identified many candidates that are similar to regulatory elements. In several regions, we identified variants that are significantly associated with the expression of nearby genes in the selected population. Moreover, nearly half of the ~200 regions we examined localized to regions with no genes. Thirty of the regions contain long non-coding RNAs that have been shown to often regulate nearby genes, suggesting that variation within the RNAs might have functional consequences.
With preliminary data now available from the 1000 Genomes Project, we are beginning to explore full sequence data, which should contain most if not all of the causal selected polymorphisms. We extended the CMS method to the preliminary data set, validating our previously identified candidates and identifying many intriguing new coding and regulatory variants.
Copyright
This article is published under license to BioMed Central Ltd.