Mining data from 1000 genomes to identify the causal variant in regions under positive selection

The human genome contains hundreds of regions in which the patterns of genetic variation indicate recent positive natural selection, yet for most of these the underlying gene and the advantageous mutation remain unknown. We recently reported the development of a method, Composite of Multiple Signals (CMS), that combines tests for multiple signals of natural selection and increases resolution by up to 100-fold.
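As a rough illustration of how several per-variant test results might be combined into one composite score, the sketch below multiplies per-test posterior probabilities in a naive-Bayes style. This is an assumption made for illustration only, not the published CMS formula: the function name `composite_score`, the prior value, and the example likelihoods are all hypothetical.

```python
from math import prod


def composite_score(likelihoods, prior=0.5):
    """Combine several selection tests into one score for a single variant.

    likelihoods: list of (p_selected, p_neutral) pairs, one pair per test,
        giving the probability of the observed test value under a
        "selected" and a "neutral" model (hypothetical inputs).
    prior: assumed prior probability that the variant is the selected one.
    """
    posteriors = []
    for p_sel, p_neu in likelihoods:
        numerator = p_sel * prior
        denominator = numerator + p_neu * (1 - prior)
        posteriors.append(numerator / denominator)
    # Treat the tests as independent and multiply their posteriors.
    return prod(posteriors)


# Hypothetical likelihoods for two variants across three tests.
variants = {
    "snp_a": [(0.8, 0.1), (0.7, 0.2), (0.9, 0.1)],
    "snp_b": [(0.3, 0.4), (0.5, 0.5), (0.2, 0.6)],
}
ranked = sorted(variants, key=lambda v: composite_score(variants[v]), reverse=True)
```

Under these made-up inputs, a variant that looks unusual on every test ranks above one that looks unusual on none, which is the intuition behind combining signals to narrow a candidate region.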

Applying CMS to candidate selected regions from the International Haplotype Map, we localized several hundred signals to ~50-100 kb, identifying individual genes and polymorphisms as targets of selection. These regions included genes involved in processes known to be targets of selection, such as infectious disease, skin pigmentation, metabolism, and hair and sweat. We further identified many candidates that resemble regulatory elements. In several regions, we identified variants that are significantly associated with the expression of nearby genes in the selected population. Moreover, nearly half of the ~200 regions we examined localized to intervals containing no genes. Thirty of the regions contain long non-coding RNAs, which have been shown to frequently regulate nearby genes, suggesting that variation within the RNAs might have functional consequences.

With preliminary data now available from the 1000 Genomes Project, we are beginning to explore full sequence data, which should contain most if not all of the causal selected polymorphisms. We extended the CMS method to the preliminary data set, validating our previously identified candidates and identifying many intriguing new coding and regulatory variants.

Additional information

Shari Grossman and Ilya Shlyakhter contributed equally to this work.

Cite this article

Grossman, S., Shlyakhter, I., Karlsson, E.K. et al. Mining data from 1000 genomes to identify the causal variant in regions under positive selection. Genome Biol 11 (Suppl 1), I22 (2010).