Dissemination of scientific software with Galaxy ToolShed
© Blankenberg et al.; licensee BioMed Central Ltd. 2014
Published: 20 February 2014
The proliferation of web-based integrative analysis frameworks has enabled users to perform complex analyses directly through the web. Unfortunately, it has also taken away the freedom to easily select the most appropriate tools. To address this, we have developed Galaxy ToolShed.
Previously, our group investigated the persistence of mitochondrial variants (heteroplasmies) through mother-child transmissions [1]. Many disease-causing mitochondrial variants are heteroplasmic, and their clinical manifestations depend on the relative proportion of normal to mutant alleles [2–4]. Because almost all of the mitochondrial genome is transcribed [5], the next important question is whether the relative frequencies of heteroplasmic alleles are maintained in transcripts. We turned to published studies to find an appropriate dataset with matched genomic and transcriptomic data. The initial analysis of DNA/RNA differences by Li et al. [6] omitted the mitochondrial transcriptome, and a much more comprehensive dataset by Chen et al. [7] has since become available. The latter contains both whole-genome and RNA sequencing data from a single individual and is therefore ideally suited for our purpose. To perform this analysis, we started with a ‘clean’ Galaxy Amazon EC2 instance [8–10], mapped the reads against the latest version of the human genome, retained properly mapped pairs, removed reads mapping to multiple locations, added read group information, and combined all results into a single dataset in the binary version of the sequence alignment/map format (BAM) for further analysis (Additional file 1) [11].
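The read post-processing steps listed above (keeping properly paired reads, discarding reads that map to multiple locations, attaching read group information and merging everything into one dataset) can be sketched in plain Python over SAM text records. This is a minimal illustration under stated assumptions, not the Galaxy tools actually used in the analysis: the function names `keep_record`, `tag_read_group` and `combine_runs` are hypothetical, reads are assumed to be already aligned, "mapping to multiple locations" is approximated by a MAPQ threshold (aligners such as BWA commonly report MAPQ 0 for reads with several equally good placements), and each input run is given its own read group. Converting the merged SAM text to binary BAM is left to a dedicated tool such as samtools.

```python
from typing import Dict, List, Sequence

# FLAG bits as defined in the SAM format specification.
PROPER_PAIR = 0x2      # each segment properly aligned according to the aligner
UNMAPPED = 0x4         # segment unmapped
SECONDARY = 0x100      # secondary alignment
SUPPLEMENTARY = 0x800  # supplementary alignment


def keep_record(fields: Sequence[str], min_mapq: int) -> bool:
    """Keep primary, properly paired alignments whose MAPQ reaches min_mapq.

    Assumption: a low MAPQ serves as a proxy for "maps to multiple locations".
    """
    flag, mapq = int(fields[1]), int(fields[4])
    if not flag & PROPER_PAIR:
        return False
    if flag & (UNMAPPED | SECONDARY | SUPPLEMENTARY):
        return False
    return mapq >= min_mapq


def tag_read_group(fields: Sequence[str], rg_id: str) -> List[str]:
    """Return the record with any existing RG tag replaced by RG:Z:<rg_id>."""
    optional = [f for f in fields[11:] if not f.startswith("RG:Z:")]
    return list(fields[:11]) + optional + [f"RG:Z:{rg_id}"]


def combine_runs(runs: Dict[str, List[str]], sample: str,
                 min_mapq: int = 1) -> List[str]:
    """Filter each run, tag it with its own read group, merge into one SAM."""
    header: List[str] = []
    records: List[str] = []
    for i, (rg_id, lines) in enumerate(runs.items()):
        for raw in lines:
            line = raw.rstrip("\n")
            if not line:
                continue
            if line.startswith("@"):
                # Assumes all runs share one reference, so only the first
                # run's header (minus any old @RG lines) is kept.
                if i == 0 and not line.startswith("@RG"):
                    header.append(line)
                continue
            fields = line.split("\t")
            if keep_record(fields, min_mapq):
                records.append("\t".join(tag_read_group(fields, rg_id)))
    # One @RG header line per input run, all from the same sample.
    read_groups = [f"@RG\tID:{rg_id}\tSM:{sample}" for rg_id in runs]
    return header + read_groups + records
```

For example, `combine_runs({"lane1": run1_lines, "lane2": run2_lines}, sample="sample1")` returns the first run's header, one `@RG` line per run and the surviving records, each ending in its own `RG:Z:` tag. Because mates are filtered independently in this sketch, a pair can lose one mate to the MAPQ filter; a production pipeline would typically re-check pairing afterwards.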
This short example illustrates that the ToolShed behaves as a de facto AppStore: when users need an analysis tool that is not present in a given Galaxy instance, it can be easily fetched and installed. Just like a brand-new iPad, Galaxy comes with a small number of preinstalled applications providing basic functionality. Additional tools can then be installed from the ToolShed to create a ‘flavor’ of Galaxy suited to a particular analysis. An expanded discussion of the ToolShed can be found in the online supplement.
BAM: Binary version of the sequence alignment/map format
NVC: Naïve Variant Caller
VCF: Variant call format
The efforts of the Galaxy Team (Dave Clements, Nate Coraor, Carl Eberhard, Dorine Francheteau, Jeremy Goecks, Sam Guerler and Jennifer Jackson) were instrumental in making this work happen. Special thanks to the members of the ToolShed oversight committee (Ira Cooke, Jim Johnson, Ed Kirton, Peter Cock, Brad Chapman, Björn Grüning, Ross Lazarus) for their continuing review of tools submitted to the ToolShed. This project was supported through grant number HG005542 from the National Human Genome Research Institute, National Institutes of Health, as well as grants HG005133, HG004909 and HG006620 and NSF grant DBI 0543285. Additional funding is provided by the Huck Institutes of the Life Sciences at Penn State and, in part, under a grant with the Pennsylvania Department of Health using Tobacco Settlement Funds. The Department specifically disclaims responsibility for any analyses, interpretations or conclusions.
- Goto H, Dickins B, Afgan E, Paul IM, Taylor J, Makova KD, Nekrutenko A: Dynamics of mitochondrial heteroplasmy in three families investigated via a repeatable re-sequencing study. Genome Biol. 2011, 12: R59. 10.1186/gb-2011-12-6-r59.
- Chinnery PF, Thorburn DR, Samuels DC, White SL, Dahl HM, Turnbull DM, Lightowlers RN, Howell N: The inheritance of mitochondrial DNA heteroplasmy: random drift, selection or both?. Trends Genet. 2000, 16: 500-505. 10.1016/S0168-9525(00)02120-X.
- Jacobs HT: Making mitochondrial mutants. Trends Genet. 2001, 17: 653-660. 10.1016/S0168-9525(01)02480-5.
- DiMauro S: Mitochondrial diseases. Biochim Biophys Acta. 2004, 1658: 80-88. 10.1016/j.bbabio.2004.03.014.
- Mercer TR, Neph S, Dinger ME, Crawford J, Smith MA, Shearwood A-MJ, Haugen E, Bracken CP, Rackham O, Stamatoyannopoulos JA, Filipovska A, Mattick JS: The human mitochondrial transcriptome. Cell. 2011, 146: 645-658. 10.1016/j.cell.2011.06.051.
- Li M, Wang IX, Li Y, Bruzel A, Richards AL, Toung JM, Cheung VG: Widespread RNA and DNA sequence differences in the human transcriptome. Science. 2011, 333: 53-58. 10.1126/science.1207018.
- Chen R, Mias GI, Li-Pook-Than J, Jiang L, Lam HYK, Chen R, Miriami E, Karczewski KJ, Hariharan M, Dewey FE, Cheng Y, Clark MJ, Im H, Habegger L, Balasubramanian S, O'Huallachain M, Dudley JT, Hillenmeyer S, Haraksingh R, Sharon D, Euskirchen G, Lacroute P, Bettinger K, Boyle AP, Kasowski M, Grubert F, Seki S, Garcia M, Whirl-Carrillo M, Gallardo M, et al: Personal omics profiling reveals dynamic molecular and medical phenotypes. Cell. 2012, 148: 1293-1307. 10.1016/j.cell.2012.02.009.
- Blankenberg D, Taylor J, Schenck I, He J, Zhang Y, Ghent M, Veeraraghavan N, Albert I, Miller W, Makova KD, Hardison RC, Nekrutenko A: A framework for collaborative analysis of ENCODE data: making large-scale analyses biologist-friendly. Genome Res. 2007, 17: 960-964. 10.1101/gr.5578007.
- Goecks J, Nekrutenko A, Taylor J: Galaxy: a comprehensive approach for supporting accessible, reproducible, and transparent computational research in the life sciences. Genome Biol. 2010, 11: R86. 10.1186/gb-2010-11-8-r86.
- Afgan E, Baker D, Coraor N, Goto H, Paul IM, Makova KD, Nekrutenko A, Taylor J: Harnessing cloud computing with Galaxy Cloud. Nat Biotechnol. 2011, 29: 972-974. 10.1038/nbt.2028.
- Introduction to Galaxy ToolShed 1. [http://vimeo.com/73458993]
- Marth GT, Korf I, Yandell MD, Yeh RT, Gu Z, Zakeri H, Stitziel NO, Hillier L, Kwok PY, Gish WR: A general approach to single-nucleotide polymorphism discovery. Nat Genet. 1999, 23: 452-456. 10.1038/70570.
- Introduction to Galaxy ToolShed 2. [http://vimeo.com/73460697]
- Li M, Schonberg A, Schaefer M, Schroeder R, Nasidze I, Stoneking M: Detecting heteroplasmy from high-throughput sequencing of complete human mitochondrial DNA genomes. Am J Hum Genet. 2010, 87: 237-249. 10.1016/j.ajhg.2010.07.014.
- Bar-Yaacov D, Avital G, Levin L, Richards A, Hachen N, Rebolledo Jaramillo B, Nekrutenko A, Zarivach R, Mishmar D: RNA-DNA differences in human mitochondria restore ancestral form of 16S ribosomal RNA. Genome Res. 2013, 23: 1789-1796. 10.1101/gr.161265.113.
- Nekrutenko A, Taylor J: Next-generation sequencing data interpretation: enhancing reproducibility and accessibility. Nat Rev Genet. 2012, 13: 667-672.
- Danecek P, Auton A, Abecasis G, Albers CA, Banks E, Depristo MA, Handsaker RE, Lunter G, Marth GT, Sherry ST, McVean G, Durbin R, 1000 Genomes Project Analysis Group: The variant call format and VCFtools. Bioinformatics. 2011, 27: 2156-2158. 10.1093/bioinformatics/btr330.
- Introduction to Galaxy ToolShed 3. [https://vimeo.com/73462389]
This article is published under license to BioMed Central Ltd. The licensee has exclusive rights to distribute this article, in any medium, for 12 months following its publication. After this time, the article is available under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.