
Table 2 Performance comparison of metagenomic annotation of reads versus contigs

From: MetAMOS: a modular and open source metagenomic assembly and analysis pipeline

                                           Class level (pre-propagate)               Class level (post-propagate)
                                           ----------------------------------------  ----------------------------------------
Dataset Assembler      Run time (speedup)  Number        Number        Number        Number        Number        Number
                                           unclassified  correctly     incorrectly   unclassified  correctly     incorrectly
                                                         classified    classified                  classified    classified
mockE   None           84.2 h (-)          11,116,265    3,920,471     681,801       NA            NA            NA
mockE   SOAPdenovo_MA  33.0 h (2.6×)       634,091       14,852,561    231,885       612,517       14,874,157    231,863
mockE   Velvet_MA      29.4 h (2.9×)       870,073       14,611,333    237,130       854,554       14,626,870    237,112
mockE   MetaVelvet_MA  29.9 h (2.8×)       709,938       14,800,318    208,281       693,142       14,811,333    214,062
mockE   MetaIDBA_MA    37.8 h (2.2×)       1,700,699     13,652,114    365,724       1,676,319     13,676,524    365,724
mockS   None           167.1 h (-)         18,081,508    5,200,170     849,672       NA            NA            NA
mockS   SOAPdenovo_MA  72.3 h (2.3×)       1,971,900     21,772,125    387,325       1,850,541     21,884,121    386,688
mockS   Velvet_MA      71.8 h (2.3×)       2,392,898     21,313,998    424,454       2,250,852     21,456,487    424,011
mockS   MetaVelvet_MA  54.4 h (3.1×)       2,301,985     21,449,129    380,236       2,134,599     21,614,171    382,580
mockS   MetaIDBA_MA    53.8 h (3.1×)       2,576,941     21,316,513    237,896       2,210,972     21,681,036    239,342
  1. Datasets: mockE (mock Even) and mockS (mock Staggered). As ground truth, 15,718,537 of 22,735,802 sequences (69.14%) in the mockE dataset and 24,131,350 of 39,918,454 sequences (60.45%) in the mockS dataset could be unambiguously mapped using Bowtie. Classifications of reads with no known truth were neither penalized nor rewarded.

Assembler: each assembler was run within MetAMOS, and the output contigs were classified using FCP. In the None case, the read sequences were classified by FCP prior to assembly.

Run time: the CPU hours required either to run FCP on the reads or to run the Preprocessing, Assembly (for the given assembler), Annotate and Propagate steps within MetAMOS. The speedup factor is the FCP run time divided by the MetAMOS run time. All experiments were performed on a 64-bit Linux server equipped with eight 2.8 GHz dual-core processors and 128 GB RAM.

Number unclassified, Number correctly classified, and Number incorrectly classified (pre-propagate): the total count of sequences that were unclassified, correctly classified, or incorrectly classified at the class taxonomic level. Compared with the unassembled results, classification within MetAMOS yields at least a three-fold increase in correctly classified sequences and a roughly two-fold reduction in incorrectly classified sequences.

Number unclassified, Number correctly classified, and Number incorrectly classified (post-propagate): the MetAMOS Propagate step was used to transfer the annotations using the assembly graph. The number of correctly classified sequences increases slightly in all cases, while the number of incorrectly classified sequences does not increase significantly.

The full classification at each taxonomic level is given in Table S1 in Additional file 1. NA, not applicable.
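The derived values in this table (the speedup factor, the mapped-read percentages and the fold changes relative to read-level classification) are simple ratios of the raw columns. The Python sketch below recomputes them from numbers transcribed by hand from the pre-propagate columns. It is an illustrative check only, not part of MetAMOS, FCP or Bowtie, and the names `MAPPED`, `RESULTS`, `mapped_percent` and `derived` are hypothetical.

```python
"""Recompute the derived quantities in Table 2 from its raw columns.

Illustrative sketch only: values are transcribed by hand from the
pre-propagate columns of the table, and all names here are hypothetical.
Nothing here calls MetAMOS, FCP or Bowtie.
"""

# dataset -> (reads with a known true class via Bowtie, all reads)
MAPPED = {
    "mockE": (15_718_537, 22_735_802),
    "mockS": (24_131_350, 39_918_454),
}

# (dataset, assembler) -> (CPU hours, correctly classified, incorrectly classified)
# "None" is the read-level FCP baseline (no assembly).
RESULTS = {
    ("mockE", "None"): (84.2, 3_920_471, 681_801),
    ("mockE", "SOAPdenovo_MA"): (33.0, 14_852_561, 231_885),
    ("mockE", "Velvet_MA"): (29.4, 14_611_333, 237_130),
    ("mockE", "MetaVelvet_MA"): (29.9, 14_800_318, 208_281),
    ("mockE", "MetaIDBA_MA"): (37.8, 13_652_114, 365_724),
    ("mockS", "None"): (167.1, 5_200_170, 849_672),
    ("mockS", "SOAPdenovo_MA"): (72.3, 21_772_125, 387_325),
    ("mockS", "Velvet_MA"): (71.8, 21_313_998, 424_454),
    ("mockS", "MetaVelvet_MA"): (54.4, 21_449_129, 380_236),
    ("mockS", "MetaIDBA_MA"): (53.8, 21_316_513, 237_896),
}


def mapped_percent(dataset):
    """Percentage of reads that have a ground-truth class."""
    mapped, total = MAPPED[dataset]
    return 100.0 * mapped / total


def derived(dataset, assembler):
    """Speedup and fold changes of an assembler run versus read-level FCP."""
    base_hours, base_correct, base_incorrect = RESULTS[(dataset, "None")]
    hours, correct, incorrect = RESULTS[(dataset, assembler)]
    return {
        "speedup": base_hours / hours,                 # FCP time / MetAMOS time
        "correct_gain": correct / base_correct,        # fold increase in correct calls
        "incorrect_drop": base_incorrect / incorrect,  # fold reduction in wrong calls
    }


if __name__ == "__main__":
    for dataset in MAPPED:
        print(f"{dataset}: {mapped_percent(dataset):.2f}% of reads mapped")
    for dataset, assembler in RESULTS:
        if assembler == "None":
            continue
        d = derived(dataset, assembler)
        print(f"{dataset} {assembler:14s} speedup {d['speedup']:.1f}x  "
              f"correct x{d['correct_gain']:.2f}  incorrect /{d['incorrect_drop']:.2f}")
```

Run as a script, it prints the mapped percentages (69.14% and 60.45%) and one line per assembler run. The recomputed speedups round to the values in the table; the mockE/MetaIDBA_MA row, for instance, gives a 3.48-fold gain in correctly classified sequences and a 1.86-fold reduction in incorrectly classified ones, the smallest reduction among the eight assembler runs.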