Volume 12 Supplement 1

Beyond the Genome 2011

Open Access

ScaffViz: visualizing metagenome assemblies

  • Sergey Koren (1, 2),
  • Todd Treangen (2, 3) and
  • Mihai Pop (1, 2)
Genome Biology 2011, 12(Suppl 1):P8

https://doi.org/10.1186/gb-2011-12-s1-p8

Published: 19 September 2011

Background

Metagenomics has enabled the study of a wide range of microbial communities, from those in the sea [1, 2] to those of the human body [3]. Increasingly, de novo assembly is the first step in the analysis of these metagenomic samples. As the targets have grown in complexity, computational tools have started to emerge [4, 5] to address the challenges of assembling these datasets. Although the targets and analyses have become more complex, the means of presenting the results has remained the same: a multi-FASTA text file. This presentation hides the variation present in the sampled biological community. The ability to navigate and view the complexity of a genomic sample may help drive novel biological insights. Here, we present a graphical tool for visually inspecting genome assembly graphs and characterizing the genomic variation present in them (that is, the differences between two or more related haplotypes commonly found in metagenomes or higher eukaryotes).

Methods

Our software, ScaffViz [6], is open source and was developed as a plug-in for the Cytoscape graph viewer package [7, 8]. Our assembly view encodes assembly metadata as node and edge attributes. For example, node height corresponds to coverage (the amount of oversampling of a sequence), and node width is proportional to the length of the sequence. We support assemblies from Celera Assembler [9], Newbler [10], Bambus 2 and MetAMOS. The creation and initialization of Cytoscape objects are abstracted so that a developer can add new assembly result formats without knowledge of Cytoscape’s API. We developed a layout algorithm based on information from the assembler on node position, orientation and length. ScaffViz allows users to show (or hide) an arbitrary subset of nodes. The viewer can also output the genome sequence that corresponds to any subset of the graph, including all alternative sequences present in all selected subpaths. We believe that this representation may prove instrumental in finding and characterizing structural variants such as alternative genes, alternative regulatory units or mobile genomic elements.
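
The attribute mapping described above can be illustrated with a minimal Python sketch. It is not ScaffViz code and does not use the Cytoscape API: the `Contig` class, the `node_size` and `selected_fasta` helpers, and the pixel scaling constants are all hypothetical names and values chosen for illustration. It only shows the idea that node width follows sequence length, node height follows coverage, and a selected subset of nodes can be written out as multi-FASTA text. It does not enumerate alternative subpaths.

```python
from dataclasses import dataclass


@dataclass
class Contig:
    """Hypothetical stand-in for one assembly graph node."""
    name: str
    length: int        # sequence length in bases
    coverage: float    # average depth of coverage
    sequence: str = ""


def node_size(contig, px_per_kb=10.0, px_per_fold=2.0, min_px=5.0):
    """Return (width, height) in pixels for a contig node.

    Width is proportional to length and height to coverage.
    The scaling constants are illustrative assumptions only.
    """
    width = max(min_px, contig.length / 1000.0 * px_per_kb)
    height = max(min_px, contig.coverage * px_per_fold)
    return width, height


def selected_fasta(contigs, selected):
    """Format the contigs whose names are in `selected` as multi-FASTA text."""
    records = [f">{c.name}\n{c.sequence}" for c in contigs if c.name in selected]
    return "\n".join(records) + ("\n" if records else "")


contigs = [
    Contig("ctg1", 12000, 30.0, "ACGT"),
    Contig("ctg2", 3000, 8.5, "GGCC"),
]

for c in contigs:
    print(c.name, node_size(c))

# Write out only the nodes the user has selected.
print(selected_fasta(contigs, {"ctg2"}), end="")
```

A minimum pixel size is used here so that very short or low-coverage contigs remain visible; whether ScaffViz applies a similar floor is not stated in the abstract.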

Results

We evaluated the performance of ScaffViz on seven datasets of varying size and complexity. Run time is approximately linear in the number of elements in the graph (nodes + edges), and memory usage scales linearly with the number of nodes. Extrapolating from these trends, a graph of 250,000 contigs can be opened in approximately 2 minutes using approximately 2.5 GB of memory. ScaffViz is scalable to large graphs and can be run on a laptop.
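
As a back-of-envelope illustration of the linear memory scaling reported above, the following Python sketch scales the single extrapolated figure (about 2.5 GB for 250,000 contigs) to other graph sizes. It assumes memory is strictly proportional to the number of nodes with no fixed overhead, which is a simplification not stated in the abstract. The function name `estimate_memory_gb` is hypothetical. Run time is not estimated because the abstract does not give the edge count.

```python
# Figures taken from the extrapolation reported in the Results.
REPORTED_CONTIGS = 250_000
REPORTED_MEMORY_GB = 2.5


def estimate_memory_gb(num_contigs):
    """Scale the reported memory figure linearly with node count.

    Assumes zero fixed overhead, which is an illustrative simplification.
    """
    return REPORTED_MEMORY_GB * num_contigs / REPORTED_CONTIGS


for n in (50_000, 250_000, 1_000_000):
    print(f"{n:>9,} contigs -> ~{estimate_memory_gb(n):.1f} GB")
```

Under this assumption the reported figure corresponds to roughly 10 KB of memory per contig.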

Conclusions

We have developed a novel open-source assembly graph viewer, ScaffViz, as a plug-in for Cytoscape. ScaffViz supports the output of several popular assembly programs and scales to large metagenomic assemblies on a laptop.

Authors’ Affiliations

(1) Department of Computer Science, University of Maryland
(2) Center for Bioinformatics and Computational Biology, University of Maryland
(3) The McKusick-Nathans Institute for Genetic Medicine, The Johns Hopkins University School of Medicine

References

  1. Venter J, Remington K, Heidelberg JF, Halpern AL, Rusch D, Eisen JA, Wu D, Paulsen I, Nelson KE, Nelson W, Fouts DE, Levy S, Knap AH, Lomas MW, Nealson K, White O, Peterson J, Hoffman J, Parsons R, Baden-Tillson H, Pfannkoch C, Rogers YH, Smith HO: Environmental genome shotgun sequencing of the Sargasso Sea. Science. 2004, 304: 66-74. 10.1126/science.1093857.
  2. Rusch DB, Halpern AL, Sutton G, Heidelberg KB, Williamson S, Yooseph S, Wu D, Eisen JA, Hoffman JM, Remington K, Beeson K, Tran B, Smith H, Baden-Tillson H, Stewart C, Thorpe J, Freeman J, Andrews-Pfannkoch C, Venter JE, Li K, Kravitz S, Heidelberg JF, Utterback T, Rogers YH, Falcón LI, Souza V, Bonilla-Rosso G, Eguiarte LE, Karl DM, Sathyendranath S, et al: The Sorcerer II Global Ocean Sampling expedition: northwest Atlantic through eastern tropical Pacific. PLoS Biol. 2007, 5: e77. 10.1371/journal.pbio.0050077.
  3. Qin J, Li R, Raes J, Arumugam M, Burgdorf KS, Manichanh C, Nielsen T, Pons N, Levenez F, Yamada T, Mende DR, Li J, Xu J, Li S, Li D, Cao J, Wang B, Liang H, Zheng H, Xie Y, Tap J, Lepage P, Bertalan M, Batto JM, Hansen T, Le Paslier D, Linneberg A, Nielsen HB, Pelletier E, Renault P, et al: A human gut microbial gene catalogue established by metagenomic sequencing. Nature. 2010, 464: 59-65. 10.1038/nature08821.
  4. Laserson J, Jojic V, Koller D: Genovo: de novo assembly for metagenomes. J Comput Biol. 2011, 18: 429-443. 10.1089/cmb.2010.0244.
  5. Peng Y, Leung HC, Yiu SM, Chin FY: Meta-IDBA: a de novo assembler for metagenomic data. Bioinformatics. 2011, 27: i94-i101. 10.1093/bioinformatics/btr216.
  6. ScaffViz Project. [http://code.google.com/p/scaffold-viewer/]
  7. Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T: Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 2003, 13: 2498-2504. 10.1101/gr.1239303.
  8. Smoot M, Ono K, Ruscheinski J, Wang P, Ideker T: Cytoscape 2.8: new features for data integration and network visualization. Bioinformatics. 2011, 27: 431-432. 10.1093/bioinformatics/btq675.
  9. Miller JR, Delcher AL, Koren S, Venter E, Walenz BP, Brownley A, Johnson J, Li K, Mobarry C, Sutton G: Aggressive assembly of pyrosequencing reads with mates. Bioinformatics. 2008, 24: 2818-2824. 10.1093/bioinformatics/btn548.
  10. Margulies M, Egholm M, Altman WE, Attiya S, Bader JS, Bemben LA, Berka J, Braverman MS, Chen YJ, Chen Z, Dewell SB, Du L, Fierro JM, Gomes XV, Godwin BC, He W, Helgesen S, Ho CH, Irzyk GP, Jando SC, Alenquer ML, Jarvie TP, Jirage KB, Kim JB, Knight JR, Lanza JR, Leamon JH, Lefkowitz SM, Lei M, Li J, et al: Genome sequencing in microfabricated high-density picolitre reactors. Nature. 2005, 437: 376-380.

Copyright

© Koren et al; licensee BioMed Central Ltd. 2011

This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
