- Poster presentation
- Open Access
ScaffViz: visualizing metagenome assemblies
Genome Biology volume 12, Article number: P8 (2011)
Background
Metagenomics has allowed the study of a wide range of microbial communities, from those within the sea [1, 2] to those of the human body [3]. Increasingly, de novo assembly is the first step in the analysis of these metagenomic samples. As the targets have increased in complexity, computational tools have started to emerge [4, 5] to address the challenges presented by the assembly of these datasets. Although the targets and analyses have become more complex, the means of presenting the results has remained the same: a multi-FASTA text file. This presentation hides the variation that is present in the sampled biological community. The ability to navigate and view the complexity of a genomic sample may help drive novel biological insights. Here, we present a graphical visualization tool that allows the visual inspection of genome assembly graphs and the characterization of the genomic variation that is present in these graphs (that is, the differences between two or more related haplotypes commonly found in metagenomes or higher eukaryotes).
Methods
Our software, ScaffViz [6], is open source and was developed as a plug-in for the Cytoscape graph viewer package [7, 8]. Our assembly view represents assembly metadata within node/edge attributes. For example, node height corresponds to coverage (the amount of oversampling of a sequence), and node width is proportional to the length of the sequence. We support assemblies from Celera Assembler [9], Newbler [10], Bambus 2 and MetAMOS. The creation and initialization of Cytoscape objects is abstracted to allow a developer to easily add new assembly result formats without knowledge of Cytoscape’s API. We developed a layout algorithm based on information from the assembler on node position, orientation and length. ScaffViz allows users to show (or hide) an arbitrary subset of nodes. The viewer can also output genome sequence that corresponds to any subset of the graph, including all alternative sequences present in all selected subpaths. We believe that this representation may prove to be instrumental in finding and characterizing structural variants such as alternative genes, alternative regulatory units or mobile genomic elements.
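As an illustration of two ideas from this section (node sizes driven by assembly metadata, and the output of every alternative sequence through a selected part of the graph), the following Python sketch may be helpful. It is a minimal stand-in and not ScaffViz's own code, which is a Cytoscape plug-in. The `Contig` class, the `node_size`, `enumerate_paths` and `path_sequences` helpers and the pixel scaling factors are hypothetical names chosen for this example. The sketch also assumes forward-oriented contigs joined end to end, so overlaps, gaps and orientation are ignored.

```python
from dataclasses import dataclass


@dataclass
class Contig:
    """Hypothetical container for the per-node assembly metadata."""
    name: str
    sequence: str
    coverage: float  # amount of oversampling of the sequence


def node_size(contig, px_per_base=0.01, px_per_coverage=2.0):
    """Return (width, height): width follows length, height follows coverage.

    The scaling factors are illustrative assumptions, not ScaffViz defaults.
    """
    width = len(contig.sequence) * px_per_base
    height = contig.coverage * px_per_coverage
    return width, height


def enumerate_paths(edges, start, end):
    """Yield every simple path from start to end using a depth-first search.

    `edges` maps a node name to the list of its successor node names.
    """
    stack = [(start, [start])]
    while stack:
        node, path = stack.pop()
        if node == end:
            yield path
            continue
        for nxt in edges.get(node, []):
            if nxt not in path:  # skip nodes already on the path (no cycles)
                stack.append((nxt, path + [nxt]))


def path_sequences(contigs, edges, start, end):
    """Return the sorted sequences spelled by all paths from start to end."""
    return sorted(
        "".join(contigs[name].sequence for name in path)
        for path in enumerate_paths(edges, start, end)
    )


if __name__ == "__main__":
    # A small "bubble": two alternative contigs (B1, B2) between A and C.
    contigs = {
        "A": Contig("A", "ACGT", 10.0),
        "B1": Contig("B1", "GG", 6.0),
        "B2": Contig("B2", "TT", 4.0),
        "C": Contig("C", "CATG", 10.0),
    }
    edges = {"A": ["B1", "B2"], "B1": ["C"], "B2": ["C"]}
    for seq in path_sequences(contigs, edges, "A", "C"):
        print(seq)
```

In this toy graph the two variants share their flanking contigs and differ only in the middle node, which is the kind of alternative sequence the viewer is described as being able to export.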
Results
We evaluated the performance of ScaffViz on seven datasets of varying size and complexity. Run time is approximately linear in the number of elements in the graph (nodes + edges), and memory usage scales linearly with the number of nodes. Extrapolating from these trends, a graph of 250,000 contigs can be opened in approximately 2 minutes using approximately 2.5 GB of memory. ScaffViz therefore scales to large graphs and can be run on a laptop.
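The arithmetic behind such an extrapolation can be sketched in a few lines of Python. The only inputs taken from the text are the quoted reference point (250,000 contigs, about 2 minutes, about 2.5 GB). Everything else is an assumption made for this example: the function names are hypothetical, 1 GB is taken as 10^9 bytes, the linear relations are assumed to pass through the origin, and the number of edges is assumed to grow in proportion to the number of contigs so that run time can be scaled by contig count alone.

```python
REF_CONTIGS = 250_000   # reference graph size quoted in the text
REF_SECONDS = 120.0     # "approximately 2 minutes"
REF_BYTES = 2.5e9       # "approximately 2.5 GB", assuming 1 GB = 10**9 bytes


def per_contig_memory_bytes(total_bytes=REF_BYTES, n_contigs=REF_CONTIGS):
    """Average memory per contig implied by the reference point."""
    return total_bytes / n_contigs


def extrapolate(n_contigs):
    """Scale the reference point proportionally to another graph size.

    Returns (seconds, bytes). This assumes zero-intercept linear scaling and
    an edge count proportional to the contig count; both are assumptions of
    this sketch rather than results reported in the text.
    """
    scale = n_contigs / REF_CONTIGS
    return REF_SECONDS * scale, REF_BYTES * scale


if __name__ == "__main__":
    print(per_contig_memory_bytes())  # bytes per contig at the reference point
    print(extrapolate(500_000))       # a graph twice the reference size
```

Under these assumptions the reference point corresponds to roughly 10 KB of memory per contig, and a graph with twice as many contigs would need about twice the time and memory.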
Conclusions
We have developed a novel open-source assembly graph viewer, ScaffViz, as a plug-in for Cytoscape. ScaffViz supports the output of several popular assembly programs and is scalable to large metagenomic assemblies on a laptop.
References
Venter J, Remington K, Heidelberg JF, Halpern AL, Rusch D, Eisen JA, Wu D, Paulsen I, Nelson KE, Nelson W, Fouts DE, Levy S, Knap AH, Lomas MW, Nealson K, White O, Peterson J, Hoffman J, Parsons R, Baden-Tillson H, Pfannkoch C, Rogers YH, Smith HO: Environmental genome shotgun sequencing of the Sargasso Sea. Science 2004, 304:66–74.
Rusch DB, Halpern AL, Sutton G, Heidelberg KB, Williamson S, Yooseph S, Wu D, Eisen JA, Hoffman JM, Remington K, Beeson K, Tran B, Smith H, Baden-Tillson H, Stewart C, Thorpe J, Freeman J, Andrews-Pfannkoch C, Venter JE, Li K, Kravitz S, Heidelberg JF, Utterback T, Rogers YH, Falcón LI, Souza V, Bonilla-Rosso G, Eguiarte LE, Karl DM, Sathyendranath S, et al.: The Sorcerer II Global Ocean Sampling expedition: northwest Atlantic through eastern tropical Pacific. PLoS Biol 2007, 5:e77.
Qin J, Li R, Raes J, Arumugam M, Burgdorf KS, Manichanh C, Nielsen T, Pons N, Levenez F, Yamada T, Mende DR, Li J, Xu J, Li S, Li D, Cao J, Wang B, Liang H, Zheng H, Xie Y, Tap J, Lepage P, Bertalan M, Batto JM, Hansen T, Le Paslier D, Linneberg A, Nielsen HB, Pelletier E, Renault P, et al.: A human gut microbial gene catalogue established by metagenomic sequencing. Nature 2010, 464:59–65.
Laserson J, Jojic V, Koller D: Genovo: de novo assembly for metagenomes. J Comput Biol 2011, 18:429–443.
Peng Y, Leung HC, Yiu SM, Chin FY: Meta-IDBA: a de novo assembler for metagenomic data. Bioinformatics 2011, 27:i94–i101.
ScaffViz Project [http://code.google.com/p/scaffold-viewer/]
Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T: Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res 2003, 13:2498–2504.
Smoot M, Ono K, Ruscheinski J, Wang P, Ideker T: Cytoscape 2.8: new features for data integration and network visualization. Bioinformatics 2011, 27:431–432.
Miller JR, Delcher AL, Koren S, Venter E, Walenz BP, Brownley A, Johnson J, Li K, Mobarry C, Sutton G: Aggressive assembly of pyrosequencing reads with mates. Bioinformatics 2008, 24:2818–2824.
Margulies M, Egholm M, Altman WE, Attiya S, Bader JS, Bemben LA, Berka J, Braverman MS, Chen YJ, Chen Z, Dewell SB, Du L, Fierro JM, Gomes XV, Godwin BC, He W, Helgesen S, Ho CH, Irzyk GP, Jando SC, Alenquer ML, Jarvie TP, Jirage KB, Kim JB, Knight JR, Lanza JR, Leamon JH, Lefkowitz SM, Lei M, Li J, et al.: Genome sequencing in microfabricated high-density picolitre reactors. Nature 2005, 437:376–380.
Cite this article
Koren, S., Treangen, T. & Pop, M. ScaffViz: visualizing metagenome assemblies. Genome Biol 12 (Suppl 1), P8 (2011). https://doi.org/10.1186/1465-6906-12-S1-P8
DOI: https://doi.org/10.1186/1465-6906-12-S1-P8
Keywords
- Graphical Visualization
- Alternative Sequence
- Viewer Package
- Graph Viewer
- Assembly Result