MetAMOS: a modular and open source metagenomic assembly and analysis pipeline

Treangen, Todd J; Koren, Sergey; Sommer, Daniel D; Liu, Bo; Astrovskaya, Irina; Ondov, Brian; Darling, Aaron E; Phillippy, Adam M; Pop, Mihai

doi:10.1186/gb-2013-14-1-r2

Table 2 Performance comparison of metagenomic annotation of reads versus contigs

Dataset	Assembler	Run time (speedup)	Number unclassified (class level, pre-propagate)	Number correctly classified (class level, pre-propagate)	Number incorrectly classified (class level, pre-propagate)	Number unclassified (class level, post-propagate)	Number correctly classified (class level, post-propagate)	Number incorrectly classified (class level, post-propagate)
mockE	None	84.2 h (-)	11,116,265	3,920,471	681,801	NA	NA	NA
mockE	SOAPdenovo_MA	33.0 h (2.6×)	634,091	14,852,561	231,885	612,517	14,874,157	231,863
mockE	Velvet_MA	29.4 h (2.9×)	870,073	14,611,333	237,130	854,554	14,626,870	237,112
mockE	MetaVelvet_MA	29.9 h (2.8×)	709,938	14,800,318	208,281	693,142	14,811,333	214,062
mockE	MetaIDBA_MA	37.8 h (2.2×)	1,700,699	13,652,114	365,724	1,676,319	13,676,524	365,724
mockS	None	167.1 h (-)	18,081,508	5,200,170	849,672	NA	NA	NA
mockS	SOAPdenovo_MA	72.3 h (2.3×)	1,971,900	21,772,125	387,325	1,850,541	21,884,121	386,688
mockS	Velvet_MA	71.8 h (2.3×)	2,392,898	21,313,998	424,454	2,250,852	21,456,487	424,011
mockS	MetaVelvet_MA	54.4 h (3.1×)	2,301,985	21,449,129	380,236	2,134,599	21,614,171	382,580
mockS	MetaIDBA_MA	53.8 h (3.1×)	2,576,941	21,316,513	237,896	2,210,972	21,681,036	239,342

Datasets: mockE (mock Even) and mockS (mock Staggered). As ground truth, 15,718,537 of 22,735,802 sequences (69.14%) in the mockE dataset and 24,131,350 of 39,918,454 sequences (60.45%) in the mockS dataset could be unambiguously mapped using Bowtie. Classifications of reads with no known truth were neither penalized nor rewarded.

Assembler: each assembler was run within MetAMOS, and the output contigs were classified using FCP. In the None rows, the read sequences were classified by FCP directly, without assembly.

Run time: the time, in CPU hours, required either to run FCP on the reads (None) or to run the Preprocessing, Assembly (with the given assembler), Annotate and Propagate steps within MetAMOS. The speedup factor is the FCP run time divided by the MetAMOS run time. All experiments were performed on a 64-bit Linux server equipped with eight 2.8 GHz dual-core processors and 128 GB RAM.

Number unclassified, Number correctly classified and Number incorrectly classified (pre-propagate): the total number of sequences that were unclassified, correctly classified or incorrectly classified at the class taxonomic level. Compared with the unassembled results, classification within MetAMOS yields at least a three-fold increase in correctly classified sequences and a two-fold reduction in incorrectly classified sequences.

Number unclassified, Number correctly classified and Number incorrectly classified (post-propagate): the MetAMOS Propagate step was used to transfer the annotations along the assembly graph. The number of correctly classified sequences increases slightly in all cases, without significantly increasing the number of incorrectly classified sequences. The full classification at each taxonomic level is given in Table S1 in Additional file 1.

NA, not applicable.
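As a sketch of how the speedup factor and the fold changes described in the footnote follow from the table, the Python snippet below recomputes them from the pre-propagate columns. The numbers are transcribed by hand from Table 2; the names ROWS and summarize, and the row layout, are hypothetical and illustrative only, not part of MetAMOS or FCP.

```python
"""Recompute speedups and fold changes from Table 2 (class level, pre-propagate)."""

# Hypothetical row layout, transcribed from Table 2:
# (dataset, assembler, CPU hours, unclassified, correctly classified, incorrectly classified)
# "None" is the baseline where FCP classifies the unassembled reads.
ROWS = [
    ("mockE", "None", 84.2, 11_116_265, 3_920_471, 681_801),
    ("mockE", "SOAPdenovo_MA", 33.0, 634_091, 14_852_561, 231_885),
    ("mockE", "Velvet_MA", 29.4, 870_073, 14_611_333, 237_130),
    ("mockE", "MetaVelvet_MA", 29.9, 709_938, 14_800_318, 208_281),
    ("mockE", "MetaIDBA_MA", 37.8, 1_700_699, 13_652_114, 365_724),
    ("mockS", "None", 167.1, 18_081_508, 5_200_170, 849_672),
    ("mockS", "SOAPdenovo_MA", 72.3, 1_971_900, 21_772_125, 387_325),
    ("mockS", "Velvet_MA", 71.8, 2_392_898, 21_313_998, 424_454),
    ("mockS", "MetaVelvet_MA", 54.4, 2_301_985, 21_449_129, 380_236),
    ("mockS", "MetaIDBA_MA", 53.8, 2_576_941, 21_316_513, 237_896),
]


def summarize(rows):
    """Compare each assembler row with the read-level ("None") row of the same dataset."""
    baseline = {row[0]: row for row in rows if row[1] == "None"}
    summary = {}
    for dataset, assembler, hours, _unclassified, correct, incorrect in rows:
        if assembler == "None":
            continue  # the baseline is not compared with itself
        _, _, base_hours, _, base_correct, base_incorrect = baseline[dataset]
        summary[(dataset, assembler)] = {
            # speedup factor = FCP run time on reads / MetAMOS run time, 1 decimal place
            "speedup": round(base_hours / hours, 1),
            # fold increase in correctly classified sequences
            "correct_fold": correct / base_correct,
            # fold reduction in incorrectly classified sequences
            "incorrect_fold_reduction": base_incorrect / incorrect,
        }
    return summary


if __name__ == "__main__":
    for (dataset, assembler), stats in summarize(ROWS).items():
        print(
            f"{dataset}\t{assembler}\t{stats['speedup']}x\t"
            f"correct x{stats['correct_fold']:.2f}\t"
            f"incorrect /{stats['incorrect_fold_reduction']:.2f}"
        )
```

Rounded to one decimal place, the recomputed speedups match the values in the Run time column; the two fold-change values per row can be compared directly with the footnote's statements about correctly and incorrectly classified sequences.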

ISSN: 1474-760X

Genome Biology
