MetAMOS: a metagenomic assembly and analysis pipeline for AMOS
© Treangen et al; licensee BioMed Central Ltd. 2011
Published: 19 September 2011
Metagenomics has opened the door to unprecedented comparative and ecological studies of microbial communities, ranging from the sea [1] to the soil (the terragenome) to within the human body [2, 3]. Most analyses begin with assembly, because the short reads characteristic of most datasets severely limit the ability to classify the data taxonomically [4–7] and make comparative analyses (such as BLAST against public databases) computationally expensive. In addition, because many sequences are likely to come from novel organisms, classification methods that rely on databases fail to recognize most of the novel species present in a dataset. In an attempt to move away from reference-based analysis, computational tools based on promising algorithmic and statistical methods for metagenomic de novo assembly have recently started to emerge [8, 9]. To date, however, they are either ill-suited to large datasets or have yet to offer significant improvements over existing genome assemblers that were not designed for metagenomic assembly.
Here, we describe MetAMOS [10], an open-source, modular assembly pipeline built upon AMOS and tailored specifically for metagenomic next-generation sequencing data. MetAMOS is the first step toward a fully automated assembly and analysis pipeline, from mated reads (Illumina and 454) to scaffolds and ORFs. Currently, MetAMOS supports four assemblers (SOAPdenovo [11], Newbler, CABOG and Minimus [12]), three annotation methods (BLAST, PhymmBL and MetaPhyler), two metagenomic gene prediction tools (MetaGeneMark and Glimmer-MG) and one unitig scaffolder engineered specifically for metagenomic data (Bambus 2). We also provide a novel graph-based algorithm that rapidly propagates annotations to all contigs in an assembly, starting from, for example, only the largest contigs or the contigs with high-confidence classifications. MetAMOS has three principal outputs: subdirectories containing the FASTA sequences of the contigs, scaffolds and variant motifs belonging to a specified taxonomic level; a collection of all unclassified, potentially novel contigs contained in the assembly; and an HTML report with detailed assembly statistics and summary charts.
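The general idea of propagating annotations through a contig graph can be illustrated with a minimal Python sketch. This is not the MetAMOS algorithm, whose details are not given here; it assumes a simple nearest-seed rule, in which each unlabeled contig inherits the label of the closest already-classified contig, and it assumes the graph is supplied as a list of contig-to-contig links. The function name `propagate_labels`, the contig identifiers and the taxon labels are all hypothetical.

```python
from collections import deque


def propagate_labels(edges, seeds):
    """Spread seed labels over a contig graph with a multi-source BFS.

    edges: iterable of (contig_a, contig_b) links (hypothetical input format).
    seeds: dict mapping a few confidently classified contigs to a label.
    Returns a dict with a label for every contig reachable from a seed.
    """
    # Build an undirected adjacency map from the link list.
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)

    labels = dict(seeds)
    # Start from all seeds at once; sorting keeps tie-breaking deterministic.
    queue = deque(sorted(seeds))
    while queue:
        contig = queue.popleft()
        for neighbor in sorted(adjacency.get(contig, ())):
            if neighbor not in labels:
                # Assumed rule: inherit the label of the nearest seed.
                labels[neighbor] = labels[contig]
                queue.append(neighbor)
    return labels


# Toy graph: two labeled components and one component with no seed.
edges = [("c1", "c2"), ("c2", "c3"), ("c4", "c5"), ("c6", "c7")]
seeds = {"c1": "Bacteroides", "c5": "Escherichia"}
labels = propagate_labels(edges, seeds)
```

In this toy example, contigs `c6` and `c7` are not connected to any seed and therefore stay unlabeled, which loosely mirrors the idea of collecting unclassified, potentially novel contigs separately.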
Results and conclusions
We compared MetAMOS with other metagenomic assembly tools (Meta-IDBA and Genovo) and with genome assemblers that have previously been used with metagenomic data (CA-met and SOAPdenovo). We used both a mock (artificial) dataset generated for the Human Microbiome Project (HMP) and real metagenomic samples from the HMP and its European counterpart (MetaHIT). On the mock dataset, MetAMOS compares favorably to existing metagenomic and genomic assemblers with respect to several validation metrics that take into account contig accuracy in addition to size. On the real samples, MetAMOS also outperforms the existing software. These improvements can largely be attributed to heavy reliance on Bambus 2 and to assembly verification techniques that help identify and remove potentially chimeric contigs while running the pipeline.
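The specific validation metrics are not listed here. As an illustration of how a size metric can be made sensitive to accuracy, the sketch below computes N50, a widely used contig size statistic, and then a "corrected" variant in which contigs are first split at known misassembly positions. Both the choice of N50 and the breakpoint-splitting rule are assumptions made for this example, and the contig lengths and breakpoints are invented.

```python
def n50(lengths):
    """Return the N50: the length L such that contigs of length >= L
    together cover at least half of the total assembled bases."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("no contigs given")
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


def split_at_breakpoints(length, breakpoints):
    """Split one contig into pieces at misassembly positions.

    Assumes every breakpoint satisfies 0 < position < length.
    """
    pieces, start = [], 0
    for position in sorted(breakpoints):
        pieces.append(position - start)
        start = position
    pieces.append(length - start)
    return pieces


# Hypothetical assembly: (contig length, misassembly breakpoints).
contigs = [(100, [30]), (80, []), (60, []), (40, []), (20, [])]

raw_n50 = n50([length for length, _ in contigs])

corrected_lengths = []
for length, breakpoints in contigs:
    corrected_lengths.extend(split_at_breakpoints(length, breakpoints))
corrected_n50 = n50(corrected_lengths)
```

With these invented numbers the raw N50 is 80 and the corrected N50 is 70, showing how a chimeric contig inflates a purely size-based score.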
In terms of biology, we were able to report several novel variant motifs that would be challenging at best to identify and extract from the output of other methods. In addition, much emphasis was placed on making MetAMOS compatible with a variety of next-generation sequencing technologies, genome assemblers and annotation methods, making the pipeline highly customizable for the beginner and advanced bioinformatics user alike.
- Yooseph S, Sutton G, Rusch DB, Halpern AL, Williamson SJ, Remington K, Eisen JA, Heidelberg KB, Manning G, Li W, Jaroszewski L, Cieplak P, Miller CS, Li H, Mashiyama ST, Joachimiak MP, van Belle C, Chandonia JM, Soergel DA, Zhai Y, Natarajan K, Lee S, Raphael BJ, Bafna V, Friedman R, Brenner SE, Godzik A, Eisenberg D, Dixon JE, Taylor SS, et al: The Sorcerer II Global Ocean Sampling expedition: expanding the universe of protein families. PLoS Biol. 2007, 5: e16. 10.1371/journal.pbio.0050016.
- Qin J, Li R, Raes J, Arumugam M, Burgdorf KS, Manichanh C, Nielsen T, Pons N, Levenez F, Yamada T, Mende DR, Li J, Xu J, Li S, Li D, Cao J, Wang B, Liang H, Zheng H, Xie Y, Tap J, Lepage P, Bertalan M, Batto JM, Hansen T, Le Paslier D, Linneberg A, Nielsen HB, Pelletier E, Renault P, et al: A human gut microbial gene catalogue established by metagenomic sequencing. Nature. 2010, 464: 59-65. 10.1038/nature08821.
- Turnbaugh PJ, Ley RE, Hamady M, Fraser-Liggett CM, Knight R, Gordon JI: The Human Microbiome Project. Nature. 2007, 449: 804-810. 10.1038/nature06244.
- Brady A, Salzberg SL: Phymm and PhymmBL: metagenomic phylogenetic classification with interpolated Markov models. Nat Methods. 2009, 6: 673-676. 10.1038/nmeth.1358.
- McHardy AC, Martin HG, Tsirigos A, Hugenholtz P, Rigoutsos I: Accurate phylogenetic classification of variable-length DNA fragments. Nat Methods. 2007, 4: 63-72. 10.1038/nmeth976.
- Nalbantoglu OU, Way SF, Hinrichs SH, Sayood K: RAIphy: phylogenetic classification of metagenomics samples using iterative refinement of relative abundance index profiles. BMC Bioinformatics. 2011, 12: 41. 10.1186/1471-2105-12-41.
- Patil KR, Haider P, Pope PB, Turnbaugh PJ, Morrison M, Scheffer T, McHardy AC: Taxonomic metagenome sequence assignment with structured output models. Nat Methods. 2011, 8: 191-192. 10.1038/nmeth0311-191.
- Laserson J, Jojic V, Koller D: Genovo: de novo assembly for metagenomes. J Comput Biol. 2011, 18: 429-443. 10.1089/cmb.2010.0244.
- Peng Y, Leung HC, Yiu SM, Chin FY: Meta-IDBA: a de novo assembler for metagenomic data. Bioinformatics. 2011, 27: i94-i101. 10.1093/bioinformatics/btr216.
- MetAMOS Source Code. [https://github.com/treangen/metAMOS]
- Li Y, Hu Y, Bolund L, Wang J: State of the art de novo assembly of human genomes from massively parallel sequencing data. Hum Genomics. 2010, 4: 271-277.
- Sommer DD, Delcher AL, Salzberg SL, Pop M: Minimus: a fast, lightweight genome assembler. BMC Bioinformatics. 2007, 8: 64. 10.1186/1471-2105-8-64.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.