Effective detection of rare variants in pooled DNA samples using cross-pool tail-curve analysis

Niranjan, Tejasvi S; Adamczyk, Abby; Bravo, Hector Corrada; Taub, Margaret A; Wheelan, Sarah J; Irizarry, Rafael; Wang, Tao

doi:10.1186/gb-2011-12-s1-p45

Volume 12 Supplement 1

Beyond the Genome 2011

Poster presentation
Published: 19 September 2011

Genome Biology volume 12, Article number: P45 (2011)

Rare genetic variants of large effect may confer substantial genetic risk for common diseases and complex traits. There is considerable interest in sequencing limited genomic regions, such as candidate genes and target regions identified by genetic linkage and/or association studies. Next-generation sequencing of pooled DNA samples is an efficient way to identify rare variants in large sample sets. Although sample pooling can reduce the labor and cost of sequencing, it also reduces the sensitivity and specificity with which rare variants can be reliably identified, and these problems remain difficult to solve with the available computational genomics tools. We have developed an effective Illumina-based sequencing strategy using pooled samples and have optimized a novel base-calling algorithm, Srfim, and a variant-calling algorithm, SERVIC⁴E (Sensitive Rare Variant Identification by Cross-pool Cluster, Continuity & Tail-Curve Evaluation). SERVIC⁴E analyzes base composition by cycle (tail-curves) across sample pools and applies multiple filtering strategies, including quality and continuity cluster analysis, average quality filtering, tail-curve filtering and error proximity filtering, to identify rare sequence variants accurately. We validated these algorithms using two independent Illumina sequence datasets generated with different pool sizes, read lengths and sequencing chemistries. Using these programs, we identified 32 coding variants, 14 of them present only once, across 24 exon-containing regions in one sample cohort (n = 480), and 41 coding variants, 16 of them present only once, in the same regions in an unrelated cohort (n = 480). Sanger sequencing validation of these variants showed high sensitivity (97.8% and 96.4%) and specificity (84.9% and 93.8%) for variant detection in pooled samples from the two cohorts, respectively.
Data from these studies showed that our algorithms compare favorably with the available programs, including SAMtools, SNPSeeker, CRISP and Syzygy, for the effective and reliable detection of rare variants in pooled samples.
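The abstract names the filters but not their internals. Pooling matters because, in a pool of m diploid individuals, a single heterozygous carrier contributes only 1/(2m) of the alleles at a site, so a true singleton can sit close to the background sequencing error rate. The Python sketch below illustrates, under our own assumptions, how two of the named ideas might be applied to per-cycle ("tail-curve") counts: a cross-pool check that the candidate base is enriched in one pool relative to the others, and a tail-curve check that its supporting bases are not concentrated in the final sequencing cycles. `PoolCounts`, `passes_cross_pool`, `passes_tail_curve`, the thresholds (`min_ratio`, `min_alt`, `tail_cycles`, `max_enrichment`) and the use of the median as background are hypothetical illustration choices, not the SERVIC⁴E implementation.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class PoolCounts:
    """Hypothetical per-pool counts at one genomic position, indexed by cycle.

    alt[c]:   reads showing the candidate non-reference base at cycle c
    depth[c]: all reads covering the position at cycle c
    """
    alt: list[int]
    depth: list[int]

    def alt_fraction(self) -> float:
        total = sum(self.depth)
        return sum(self.alt) / total if total else 0.0


def passes_cross_pool(pools: list[PoolCounts], i: int,
                      min_ratio: float = 5.0, min_alt: int = 3) -> bool:
    """Keep pool i's call only if it clearly exceeds the other pools' background."""
    if sum(pools[i].alt) < min_alt:
        return False
    others = [p.alt_fraction() for j, p in enumerate(pools) if j != i]
    # Systematic errors tend to appear at a similar low rate in every pool,
    # so the median of the other pools serves as a background estimate.
    background = median(others) if others else 0.0
    return pools[i].alt_fraction() > min_ratio * background


def passes_tail_curve(pool: PoolCounts, tail_cycles: int = 5,
                      max_enrichment: float = 2.0) -> bool:
    """Reject calls whose supporting bases pile up in the last cycles."""
    alt_total, depth_total = sum(pool.alt), sum(pool.depth)
    if alt_total == 0 or depth_total == 0:
        return False
    alt_tail = sum(pool.alt[-tail_cycles:]) / alt_total
    depth_tail = sum(pool.depth[-tail_cycles:]) / depth_total
    # A real variant should be spread over cycles roughly like coverage is;
    # errors are enriched toward the read tail, where base quality declines.
    return alt_tail <= max_enrichment * depth_tail


# Toy example (made-up counts): 4 pools, 20 cycles; only pool 0 carries the variant.
cycles = 20
carrier = PoolCounts(alt=[1] * cycles, depth=[50] * cycles)     # 20/1000 alt reads, even spread
noise = PoolCounts(alt=[0] * 18 + [1, 1], depth=[50] * cycles)  # 2/1000 alt reads, tail only
pools = [carrier, noise, noise, noise]

print(passes_cross_pool(pools, 0), passes_tail_curve(carrier))  # True True
print(passes_cross_pool(pools, 1), passes_tail_curve(noise))    # False False
```

Comparing each pool against the other pools, rather than against a fixed error rate, is what makes the check cross-pool in this sketch: an error that recurs in every pool raises the background, while a variant confined to one pool stands out.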
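The sensitivity and specificity figures are reported without the underlying counts. As a reminder of the arithmetic, the sketch below computes them from a Sanger-validated call set using the standard definitions, sensitivity = TP/(TP + FN) and specificity = TN/(TN + FP), together with precision = TP/(TP + FP). The abstract does not say which of these formulas produced its "specificity", and the `Confusion` class and the counts used here are hypothetical placeholders, not data from this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Confusion:
    """Hypothetical tally of pooled calls checked against Sanger sequencing."""
    tp: int  # called in the pools and confirmed by Sanger
    fp: int  # called in the pools but not confirmed
    fn: int  # missed in the pools but found by Sanger
    tn: int  # neither called nor found

    def sensitivity(self) -> float:
        return self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return self.tn / (self.tn + self.fp)

    def precision(self) -> float:
        return self.tp / (self.tp + self.fp)


# Placeholder counts for illustration only.
c = Confusion(tp=40, fp=4, fn=2, tn=954)
print(f"{c.sensitivity():.3f} {c.specificity():.3f} {c.precision():.3f}")  # prints 0.952 0.996 0.909
```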

Author information

Hector Corrada Bravo
Present address: Center for Bioinformatics and Computational Biology, Department of Computer Science, University of Maryland, College Park, MD, 20742, USA

Authors and Affiliations

McKusick-Nathans Institute of Genetic Medicine and Department of Pediatrics, Johns Hopkins University School of Medicine, Baltimore, MD, 21205, USA
Tejasvi S Niranjan, Abby Adamczyk & Tao Wang
Predoctoral Training Program in Human Genetics, Johns Hopkins University School of Medicine, Baltimore, MD, 21205, USA
Tejasvi S Niranjan
Center for Bioinformatics and Computational Biology, Department of Computer Science, University of Maryland, College Park, MD, 20742, USA
Hector Corrada Bravo
Department of Biostatistics, Bloomberg School of Public Health, Johns Hopkins University, Baltimore, MD, 21205, USA
Margaret A Taub, Sarah J Wheelan & Rafael Irizarry
Department of Oncology, Johns Hopkins University School of Medicine, Baltimore, MD, 21287, USA
Sarah J Wheelan

Additional information

Tejasvi S Niranjan, Abby Adamczyk and Hector Corrada Bravo contributed equally to this work.

About this article

Cite this article

Niranjan, T.S., Adamczyk, A., Bravo, H.C. et al. Effective detection of rare variants in pooled DNA samples using cross-pool tail-curve analysis. Genome Biol 12 (Suppl 1), P45 (2011). https://doi.org/10.1186/gb-2011-12-s1-p45
