Skip to main content

Volume 12 Supplement 1

Beyond the Genome 2011

  • Poster presentation
  • Published:

MetAMOS: a metagenomic assembly and analysis pipeline for AMOS

Background

Metagenomics has opened the door to unprecedented comparative and ecological studies of microbial communities, ranging from the sea [1] to the soil (the terragenome) to within the human body [2, 3]. Most analyses begin with assembly, as the short reads that are characteristic of most datasets severely limit the ability to classify the data taxonomically [4–7] and require considerable computational resources to perform comparative analyses (such as BLAST against public databases). In addition, given that many sequences are likely to be from novel organisms, classification methods relying on databases fail to acknowledge most of the novel species present in the dataset. In an attempt to move away from reference-based analysis, computational tools based on promising algorithmic and statistical methods for metagenomic de novo assembly have recently started to emerge [8, 9]. However, to date, they either are ill-suited to large datasets or have yet to offer significant improvements over existing genome assemblers that were not designed for metagenomic assembly.

Methods

Here, we describe MetAMOS [10], an open-source, modular assembly pipeline built upon AMOS and tailored specifically for metagenomic next-generation sequencing data. MetAMOS is the first step toward a fully automated assembly and analysis pipeline, from mated reads (Illumina and 454) to scaffolds and ORFs. Currently, MetAMOS has support for four assemblers (SOAPdenovo [11], Newbler, CABOG and Minimus [12]), three annotation methods (BLAST, PhymmBL and MetaPhyler), two metagenomic gene prediction tools (MetaGeneMark and Glimmer-MG) and one unitig scaffolder engineered specifically for metagenomic data (Bambus 2). We also provide a novel graph-based algorithm to propagate annotations rapidly to all contigs in an assembly using, for example, only the largest contigs or contigs with high-confidence classification. MetAMOS has three principal outputs: subdirectories containing FASTA sequence of the contigs/scaffolds/ variant motifs belonging to a specified taxonomic level, a collection of all unclassified/potentially novel contigs contained in the assembly, and an HTML report with detailed assembly statistics and summary charts.

Results and conclusions

We compared MetAMOS with other metagenomic assembly tools (Meta-IDBA and Genovo) and with genome assemblers that have previously been used with metagenomic data (CA-met and SOAPdenovo). We used both a mock/artificial dataset generated for the Human Microbiome Project (HMP) project and real metagenomic samples from the HMP and its European counterpart (MetaHIT). On the mock dataset, MetAMOS compares favorably to existing metagenomic and genomic assemblers with respect to several validation metrics that take into account contig accuracy in addition to size. On the real dataset, MetAMOS also outperforms the existing software. These improvements can largely be attributed to heavy reliance on Bambus 2 and to assembly verification techniques that help identify and remove potentially chimeric contigs while running the pipeline.

In terms of biology, we were able to report several novel variant motifs that would be challenging at best to identify and extract from the output of other methods. In addition, much emphasis was placed on making MetAMOS compatible with a variety of next-generation sequencing technologies, genome assemblers and annotation methods, making the pipeline highly customizable for the beginner and advanced bioinformatics user alike.

References

  1. Yooseph S, Sutton G, Rusch DB, Halpern AL, Williamson SJ, Remington K, Eisen JA, Heidelberg KB, Manning G, Li W, Jaroszewski L, Cieplak P, Miller CS, Li H, Mashiyama ST, Joachimiak MP, van Belle C, Chandonia JM, Soergel DA, Zhai Y, Natarajan K, Lee S, Raphael BJ, Bafna V, Friedman R, Brenner SE, Godzik A, Eisenberg D, Dixon JE, Taylor SS, et al: The Sorcerer II Global Ocean Sampling expedition: expanding the universe of protein families. PLoS Biol. 2007, 5: e16-10.1371/journal.pbio.0050016.

    Article  PubMed  PubMed Central  Google Scholar 

  2. Qin J, Li R, Raes J, Arumugam M, Burgdorf KS, Manichanh C, Nielsen T, Pons N, Levenez F, Yamada T, Mende DR, Li J, Xu J, Li S, Li D, Cao J, Wang B, Liang H, Zheng H, Xie Y, Tap J, Lepage P, Bertalan M, Batto JM, Hansen T, Le Paslier D, Linneberg A, Nielsen HB, Pelletier E, Renault P, et al: A human gut microbial gene catalogue established by metagenomic sequencing. Nature. 2010, 464: 59-65. 10.1038/nature08821.

    Article  PubMed  CAS  PubMed Central  Google Scholar 

  3. Turnbaugh PJ, Ley RE, Hamady M, Fraser-Liggett CM, Knight R, Gordon JI: The Human Microbiome Project. Nature. 2007, 449: 804-810. 10.1038/nature06244.

    Article  PubMed  CAS  PubMed Central  Google Scholar 

  4. Brady A, Salzberg SL: Phymm and PhymmBL: metagenomic phylogenetic classification with interpolated Markov models. Nat Methods. 2009, 6: 673-676. 10.1038/nmeth.1358.

    Article  PubMed  CAS  PubMed Central  Google Scholar 

  5. McHardy AC, Martin HG, Tsirigos A, Hugenholtz P, Rigoutsos I: Accurate phylogenetic classification of variable-length DNA fragments. Nat Methods. 2007, 4: 63-72. 10.1038/nmeth976.

    Article  PubMed  CAS  Google Scholar 

  6. Nalbantoglu OU, Way SF, Hinrichs SH, Sayood K: RAIphy: phylogenetic classification of metagenomics samples using iterative refinement of relative abundance index profiles. BMC Bioinformatics. 2011, 12: 41-10.1186/1471-2105-12-41.

    Article  PubMed  PubMed Central  Google Scholar 

  7. Patil KR, Haider P, Pope PB, Turnbaugh PJ, Morrison M, Scheffer T, McHardy AC: Taxonomic metagenome sequence assignment with structured output models. Nat Methods. 2011, 8: 191-192. 10.1038/nmeth0311-191.

    Article  PubMed  CAS  PubMed Central  Google Scholar 

  8. Laserson J, Jojic V, Koller D: Genovo: de novo assembly for metagenomes. J Comput Biol. 2011, 18: 429-443. 10.1089/cmb.2010.0244.

    Article  PubMed  CAS  Google Scholar 

  9. Peng Y, Leung HC, Yiu SM, Chin FY: Meta-IDBA: a de novo assembler for metagenomic data. Bioinformatics. 2011, 27: i94-i101. 10.1093/bioinformatics/btr216.

    Article  PubMed  CAS  PubMed Central  Google Scholar 

  10. MetAMOS Source Code. [https://github.com/treangen/metAMOS]

  11. Li Y, Hu Y, Bolund L, Wang J: State of the art de novo assembly of human genomes from massively parallel sequencing data. Hum Genomics. 2010, 4: 271-277.

    Article  PubMed  CAS  PubMed Central  Google Scholar 

  12. Sommer DD, Delcher AL, Salzberg SL, Pop M: Minimus: a fast, lightweight genome assembler. BMC Bioinformatics. 2007, 8: 64-10.1186/1471-2105-8-64.

    Article  PubMed  PubMed Central  Google Scholar 

Download references

Author information

Authors and Affiliations

Authors

Rights and permissions

Reprints and permissions

About this article

Cite this article

Treangen, T.J., Koren, S., Astrovskaya, I. et al. MetAMOS: a metagenomic assembly and analysis pipeline for AMOS. Genome Biol 12 (Suppl 1), P25 (2011). https://doi.org/10.1186/gb-2011-12-s1-p25

Download citation

  • Published:

  • DOI: https://doi.org/10.1186/gb-2011-12-s1-p25

Keywords