ScaffViz: visualizing metagenome assemblies
© BioMed Central Ltd 2011
Published: 19 September 2011
Metagenomics has allowed the study of a wide range of microbial communities, from those within the sea [1, 2] to those of the human body [3]. Increasingly, de novo assembly is the first step in the analysis of these metagenomic samples. As the targets have increased in complexity, computational tools have started to emerge [4, 5] to address the challenges presented by the assembly of these datasets. Although the targets and analyses have become more complex, the means of presenting the results has remained the same: a multi-FASTA text file. This presentation hides the variation that is present in the sampled biological community. The ability to navigate and view the complexity of a genomic sample may help drive novel biological insights. Here, we present a graphical visualization tool that allows the visual inspection of genome assembly graphs and the characterization of the genomic variation that is present in these graphs (that is, the differences between two or more related haplotypes commonly found in metagenomes or higher eukaryotes).
Our software, ScaffViz [6], is open source and was developed as a plug-in for the Cytoscape graph viewer package [7, 8]. Our assembly view represents assembly metadata within node/edge attributes. For example, node height corresponds to coverage (the amount of oversampling of a sequence), and node width is proportional to the length of the sequence. We support assemblies from Celera Assembler [9], Newbler [10], Bambus 2 and MetAMOS. The creation and initialization of Cytoscape objects are abstracted so that a developer can add new assembly result formats without knowledge of Cytoscape’s API. We developed a layout algorithm based on information from the assembler on node position, orientation and length. ScaffViz allows users to show (or hide) an arbitrary subset of nodes. The viewer can also output the genome sequence that corresponds to any subset of the graph, including all alternative sequences present in all selected subpaths. We believe that this representation may prove to be instrumental in finding and characterizing structural variants such as alternative genes, alternative regulatory units or mobile genomic elements.
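The mapping from assembly metadata to node appearance described above can be illustrated with a minimal Python sketch. It is not ScaffViz's code (the plug-in itself targets Cytoscape): the `Contig` and `NodeStyle` types, the `style_for` function, the scaling factors and the minimum node size are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Contig:
    """Hypothetical record for one assembled sequence (one graph node)."""
    name: str
    length: int       # sequence length in base pairs
    coverage: float   # amount of oversampling of the sequence


@dataclass
class NodeStyle:
    """Hypothetical visual attributes for a node, in pixels."""
    name: str
    width: float
    height: float


def style_for(contig: Contig,
              px_per_kbp: float = 10.0,
              px_per_cov: float = 2.0,
              min_size: float = 5.0) -> NodeStyle:
    """Map length to width and coverage to height.

    The scaling factors are assumptions. The min_size clamp (also an
    assumption) keeps very short or low-coverage contigs visible, so
    the mapping is only proportional above that threshold.
    """
    width = max(min_size, contig.length / 1000 * px_per_kbp)
    height = max(min_size, contig.coverage * px_per_cov)
    return NodeStyle(contig.name, width, height)


if __name__ == "__main__":
    for c in (Contig("contig_1", 5000, 12.0), Contig("contig_2", 100, 1.0)):
        print(style_for(c))
```

With these assumed factors, a 5 kbp contig at 12x coverage is drawn 50 pixels wide and 24 pixels high, while a 100 bp contig at 1x coverage falls back to the minimum size in both dimensions.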
We evaluated the performance of ScaffViz on seven datasets of varying size and complexity. Run time is approximately linear in the number of elements in the graph (nodes + edges), and memory usage scales linearly with the number of nodes. Extrapolating from these trends, a graph of 250,000 contigs can be opened in approximately 2 minutes using approximately 2.5 GB of memory. ScaffViz therefore scales to large graphs and can be run on a laptop.
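As a back-of-the-envelope reading of the memory figure, the short Python sketch below scales the reported point (250,000 contigs, about 2.5 GB) proportionally to other graph sizes. It assumes memory is strictly proportional to the number of contigs with no fixed overhead and takes 1 GB as 10^9 bytes; neither assumption is stated in the abstract, and the function name is hypothetical. Run time is not extrapolated here because it depends on nodes plus edges, and no edge counts are given.

```python
# Figures reported in the abstract (both approximate).
REPORTED_CONTIGS = 250_000
REPORTED_MEMORY_GB = 2.5


def estimate_memory_gb(n_contigs: int) -> float:
    """Scale the reported point proportionally.

    Assumes a zero intercept (no fixed overhead), which is a
    simplification rather than a measured property of ScaffViz.
    """
    return REPORTED_MEMORY_GB * n_contigs / REPORTED_CONTIGS


# Implied memory per contig, taking 1 GB = 1e9 bytes (assumption).
bytes_per_contig = REPORTED_MEMORY_GB * 1e9 / REPORTED_CONTIGS

if __name__ == "__main__":
    print(f"~{bytes_per_contig:.0f} bytes per contig")
    print(f"~{estimate_memory_gb(100_000):.1f} GB for 100,000 contigs")
```

Under these assumptions the reported figure works out to roughly 10 kB of memory per contig.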
We have developed a novel open-source assembly graph viewer, ScaffViz, as a plug-in for Cytoscape. ScaffViz supports the output of several popular assembly programs and is scalable to large metagenomic assemblies on a laptop.
- Venter J, Remington K, Heidelberg JF, Halpern AL, Rusch D, Eisen JA, Wu D, Paulsen I, Nelson KE, Nelson W, Fouts DE, Levy S, Knap AH, Lomas MW, Nealson K, White O, Peterson J, Hoffman J, Parsons R, Baden-Tillson H, Pfannkoch C, Rogers YH, Smith HO: Environmental genome shotgun sequencing of the Sargasso Sea. Science 2004, 304:66–74.
- Rusch DB, Halpern AL, Sutton G, Heidelberg KB, Williamson S, Yooseph S, Wu D, Eisen JA, Hoffman JM, Remington K, Beeson K, Tran B, Smith H, Baden-Tillson H, Stewart C, Thorpe J, Freeman J, Andrews-Pfannkoch C, Venter JE, Li K, Kravitz S, Heidelberg JF, Utterback T, Rogers YH, Falcón LI, Souza V, Bonilla-Rosso G, Eguiarte LE, Karl DM, Sathyendranath S, et al.: The Sorcerer II Global Ocean Sampling expedition: northwest Atlantic through eastern tropical Pacific. PLoS Biol 2007, 5:e77.
- Qin J, Li R, Raes J, Arumugam M, Burgdorf KS, Manichanh C, Nielsen T, Pons N, Levenez F, Yamada T, Mende DR, Li J, Xu J, Li S, Li D, Cao J, Wang B, Liang H, Zheng H, Xie Y, Tap J, Lepage P, Bertalan M, Batto JM, Hansen T, Le Paslier D, Linneberg A, Nielsen HB, Pelletier E, Renault P, et al.: A human gut microbial gene catalogue established by metagenomic sequencing. Nature 2010, 464:59–65.
- Laserson J, Jojic V, Koller D: Genovo: de novo assembly for metagenomes. J Comput Biol 2011, 18:429–443.
- Peng Y, Leung HC, Yiu SM, Chin FY: Meta-IDBA: a de novo assembler for metagenomic data. Bioinformatics 2011, 27:i94–i101.
- ScaffViz Project [http://code.google.com/p/scaffold-viewer/]
- Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T: Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 2003, 13:2498–2504.
- Smoot M, Ono K, Ruscheinski J, Wang P, Ideker T: Cytoscape 2.8: new features for data integration and network visualization. Bioinformatics 2011, 27:431–432.
- Miller JR, Delcher AL, Koren S, Venter E, Walenz BP, Brownley A, Johnson J, Li K, Mobarry C, Sutton G: Aggressive assembly of pyrosequencing reads with mates. Bioinformatics 2008, 24:2818–2824.
- Margulies M, Egholm M, Altman WE, Attiya S, Bader JS, Bemben LA, Berka J, Braverman MS, Chen YJ, Chen Z, Dewell SB, Du L, Fierro JM, Gomes XV, Godwin BC, He W, Helgesen S, Ho CH, Irzyk GP, Jando SC, Alenquer ML, Jarvie TP, Jirage KB, Kim JB, Knight JR, Lanza JR, Leamon JH, Lefkowitz SM, Lei M, Li J, et al.: Genome sequencing in microfabricated high-density picolitre reactors. Nature 2005, 437:376–380.