MetAMOS: a metagenomic assembly and analysis pipeline for AMOS

Treangen, Todd J; Koren, Sergey; Astrovskaya, Irina; Sommer, Dan; Liu, Bo; Pop, Mihai

doi:10.1186/gb-2011-12-s1-p25

Volume 12 Supplement 1

Beyond the Genome 2011

Poster presentation
Published: 19 September 2011

Todd J Treangen^1,2,
Sergey Koren^1,3,
Irina Astrovskaya^1,
Dan Sommer^1,
Bo Liu^1,3 &
Mihai Pop^1,3

Genome Biology volume 12, Article number: P25 (2011)

Background

Metagenomics has opened the door to unprecedented comparative and ecological studies of microbial communities, ranging from the sea [1] to the soil (the terragenome) to within the human body [2, 3]. Most analyses begin with assembly, because the short reads that characterize most datasets severely limit the ability to classify the data taxonomically [4–7] and make comparative analyses (such as BLAST against public databases) computationally expensive. In addition, given that many sequences are likely to come from novel organisms, classification methods that rely on databases fail to recognize most of the novel species present in a dataset. In an attempt to move away from reference-based analysis, computational tools based on promising algorithmic and statistical methods for metagenomic de novo assembly have recently started to emerge [8, 9]. To date, however, these tools are either ill-suited to large datasets or have yet to offer significant improvements over existing genome assemblers that were not designed for metagenomic assembly.

Methods

Here, we describe MetAMOS [10], an open-source, modular assembly pipeline built upon AMOS and tailored specifically for metagenomic next-generation sequencing data. MetAMOS is the first step toward a fully automated assembly and analysis pipeline, from mated reads (Illumina and 454) to scaffolds and ORFs. Currently, MetAMOS supports four assemblers (SOAPdenovo [11], Newbler, CABOG and Minimus [12]), three annotation methods (BLAST, PhymmBL and MetaPhyler), two metagenomic gene prediction tools (MetaGeneMark and Glimmer-MG) and one unitig scaffolder engineered specifically for metagenomic data (Bambus 2). We also provide a novel graph-based algorithm that rapidly propagates annotations to all contigs in an assembly, starting from, for example, only the largest contigs or the contigs with high-confidence classifications. MetAMOS has three principal outputs: subdirectories containing the FASTA sequences of the contigs, scaffolds and variant motifs belonging to a specified taxonomic level; a collection of all unclassified, potentially novel contigs contained in the assembly; and an HTML report with detailed assembly statistics and summary charts.
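
The abstract does not give the details of the graph-based propagation algorithm, so the following is only a minimal sketch of the general idea, not the MetAMOS implementation. It assumes that contigs are nodes, that links between contigs (for example, scaffold links) are undirected edges, and that a few trusted contigs carry seed labels; a breadth-first traversal then copies each seed label to its unlabeled neighbours. The function name `propagate_annotations`, the contig identifiers and the taxon labels are all hypothetical.

```python
from collections import deque


def propagate_annotations(edges, seed_labels):
    """Spread seed labels over an undirected contig graph, breadth first.

    edges: iterable of (contig_a, contig_b) pairs (assumed contig links).
    seed_labels: dict mapping trusted contigs to a taxonomic label.
    Returns a dict with a label for every contig reachable from a seed.
    """
    # Build an undirected adjacency map from the edge list.
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    labels = dict(seed_labels)
    queue = deque(sorted(seed_labels))  # sorted for a deterministic order
    while queue:
        contig = queue.popleft()
        for neighbour in sorted(adjacency.get(contig, ())):
            # The first label to reach a contig wins; seeds are never overwritten.
            if neighbour not in labels:
                labels[neighbour] = labels[contig]
                queue.append(neighbour)
    return labels


# Hypothetical toy graph: two labeled components and one unlabeled component.
edges = [("c1", "c2"), ("c2", "c3"), ("c4", "c5"), ("c6", "c7")]
seeds = {"c1": "Bacteroides", "c4": "Prevotella"}
result = propagate_annotations(edges, seeds)
```

In this toy graph, contigs `c6` and `c7` are not connected to any seed and therefore stay unlabeled, which mirrors the idea of a separate collection of unclassified contigs.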

Results and conclusions

We compared MetAMOS with other metagenomic assembly tools (Meta-IDBA and Genovo) and with genome assemblers that have previously been used with metagenomic data (CA-met and SOAPdenovo). We used both a mock/artificial dataset generated for the Human Microbiome Project (HMP) and real metagenomic samples from the HMP and its European counterpart (MetaHIT). On the mock dataset, MetAMOS compares favorably to existing metagenomic and genomic assemblers with respect to several validation metrics that take into account contig accuracy in addition to size. On the real samples, MetAMOS also outperforms the existing software. These improvements can largely be attributed to heavy reliance on Bambus 2 and to assembly verification techniques that help identify and remove potentially chimeric contigs while the pipeline is running.
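
The abstract does not name the validation metrics used in this comparison. Purely as an illustration of a size-based statistic, the sketch below computes N50, a widely used assembly measure: the length of the contig at which the running total, taken from longest to shortest, first reaches half of the assembly size. N50 says nothing about correctness, which is why size alone is an incomplete basis for comparison. The function name and the example lengths are hypothetical.

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths (0 for an empty list)."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        # The first contig that brings the running total to half the
        # assembly size defines N50.
        if running >= half_total:
            return length
    return 0


# Hypothetical contig lengths in base pairs.
contig_lengths = [100, 200, 300, 400, 500]
assembly_n50 = n50(contig_lengths)
```

A misassembled (chimeric) contig can inflate N50, so a size statistic of this kind is usually read alongside an accuracy measure.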

In terms of biology, we were able to report several novel variant motifs that would be challenging at best to identify and extract from the output of other methods. In addition, much emphasis was placed on making MetAMOS compatible with a variety of next-generation sequencing technologies, genome assemblers and annotation methods, making the pipeline highly customizable for the beginner and advanced bioinformatics user alike.

References

1. Yooseph S, Sutton G, Rusch DB, Halpern AL, Williamson SJ, Remington K, Eisen JA, Heidelberg KB, Manning G, Li W, Jaroszewski L, Cieplak P, Miller CS, Li H, Mashiyama ST, Joachimiak MP, van Belle C, Chandonia JM, Soergel DA, Zhai Y, Natarajan K, Lee S, Raphael BJ, Bafna V, Friedman R, Brenner SE, Godzik A, Eisenberg D, Dixon JE, Taylor SS, et al: The Sorcerer II Global Ocean Sampling expedition: expanding the universe of protein families. PLoS Biol. 2007, 5: e16. 10.1371/journal.pbio.0050016.
2. Qin J, Li R, Raes J, Arumugam M, Burgdorf KS, Manichanh C, Nielsen T, Pons N, Levenez F, Yamada T, Mende DR, Li J, Xu J, Li S, Li D, Cao J, Wang B, Liang H, Zheng H, Xie Y, Tap J, Lepage P, Bertalan M, Batto JM, Hansen T, Le Paslier D, Linneberg A, Nielsen HB, Pelletier E, Renault P, et al: A human gut microbial gene catalogue established by metagenomic sequencing. Nature. 2010, 464: 59-65. 10.1038/nature08821.
3. Turnbaugh PJ, Ley RE, Hamady M, Fraser-Liggett CM, Knight R, Gordon JI: The Human Microbiome Project. Nature. 2007, 449: 804-810. 10.1038/nature06244.
4. Brady A, Salzberg SL: Phymm and PhymmBL: metagenomic phylogenetic classification with interpolated Markov models. Nat Methods. 2009, 6: 673-676. 10.1038/nmeth.1358.
5. McHardy AC, Martin HG, Tsirigos A, Hugenholtz P, Rigoutsos I: Accurate phylogenetic classification of variable-length DNA fragments. Nat Methods. 2007, 4: 63-72. 10.1038/nmeth976.
6. Nalbantoglu OU, Way SF, Hinrichs SH, Sayood K: RAIphy: phylogenetic classification of metagenomics samples using iterative refinement of relative abundance index profiles. BMC Bioinformatics. 2011, 12: 41. 10.1186/1471-2105-12-41.
7. Patil KR, Haider P, Pope PB, Turnbaugh PJ, Morrison M, Scheffer T, McHardy AC: Taxonomic metagenome sequence assignment with structured output models. Nat Methods. 2011, 8: 191-192. 10.1038/nmeth0311-191.
8. Laserson J, Jojic V, Koller D: Genovo: de novo assembly for metagenomes. J Comput Biol. 2011, 18: 429-443. 10.1089/cmb.2010.0244.
9. Peng Y, Leung HC, Yiu SM, Chin FY: Meta-IDBA: a de novo assembler for metagenomic data. Bioinformatics. 2011, 27: i94-i101. 10.1093/bioinformatics/btr216.
10. MetAMOS Source Code. [https://github.com/treangen/metAMOS]
11. Li Y, Hu Y, Bolund L, Wang J: State of the art de novo assembly of human genomes from massively parallel sequencing data. Hum Genomics. 2010, 4: 271-277.
12. Sommer DD, Delcher AL, Salzberg SL, Pop M: Minimus: a fast, lightweight genome assembler. BMC Bioinformatics. 2007, 8: 64. 10.1186/1471-2105-8-64.

Author information

Authors and Affiliations

Center for Bioinformatics and Computational Biology, University of Maryland, College Park, MD, 20742, USA
Todd J Treangen, Sergey Koren, Irina Astrovskaya, Dan Sommer, Bo Liu & Mihai Pop
The McKusick-Nathans Institute for Genetic Medicine, The Johns Hopkins University School of Medicine, Baltimore, MD, 21205, USA
Todd J Treangen
Department of Computer Science, University of Maryland, College Park, MD, 20742, USA
Sergey Koren, Bo Liu & Mihai Pop

About this article

Cite this article

Treangen, T.J., Koren, S., Astrovskaya, I. et al. MetAMOS: a metagenomic assembly and analysis pipeline for AMOS. Genome Biol 12 (Suppl 1), P25 (2011). https://doi.org/10.1186/gb-2011-12-s1-p25

