Table 1 Comparison of assembly statistics

From: MetAMOS: a modular and open source metagenomic assembly and analysis pipeline

| Dataset | Assembler | #ctgs/scfs | Good Ctgs/scfs | Total aln (Mbp) | Slt | Hvy | Ch | Size @ 10 Mbp | #@ 10 Mbp | Max ctg size | Err per Mbp |
|---|---|---|---|---|---|---|---|---|---|---|---|
| mockE | SOAPdenovo | 63,014 | 99.3% | 51 | 167 | 131 | 1 | 28,208 | 195 | 249,819 | 5.9 |
| mockE | SOAPdenovo_MA | 63,107 | 99.3% | 51 | 166 | 131 | 1 | 28,208 | 195 | 249,819 | 5.8 |
| mockE | Velvet | 12,381 | 96.0% | 41 | 269 | 106 | 2 | 46,122 | 128 | 183,815 | 9.2 |
| mockE | Velvet_MA | 12,830 | 96.2% | 41 | 256 | 100 | 2 | 42,269 | 137 | 179,673 | 8.7 |
| mockE | MetaVelvet | 23,323 | 96.7% | 49 | 474 | 160 | 5 | 62,131 | 93 | 367,458 | 13.0 |
| mockE | MetaVelvet_MA | 22,772 | 96.8% | 49 | 462 | 156 | 4 | 62,138 | 91 | 367,458 | 12.7 |
| mockE | Meta-IDBA | 22,064 | 95.3% | 47 | 362 | 151 | 3 | 26,141 | 223 | 249,069 | 11.0 |
| mockE | Meta-IDBA_MA | 22,032 | 95.4% | 47 | 362 | 151 | 3 | 26,141 | 223 | 249,069 | 11.0 |
| mockS | SOAPdenovo | 45,251 | 98.8% | 28 | 135 | 99 | 0 | 5,672 | 626 | 186,064 | 8.4 |
| mockS | SOAPdenovo_MA | 44,928 | 98.8% | 28 | 135 | 98 | 0 | 5,672 | 626 | 186,064 | 8.3 |
| mockS | Velvet | 20,981 | 95.6% | 28 | 498 | 127 | 1 | 6,134 | 770 | 119,120 | 22.4 |
| mockS | Velvet_MA | 21,050 | 95.8% | 28 | 485 | 115 | 1 | 6,060 | 775 | 119,120 | 21.5 |
| mockS | MetaVelvet | 19,649 | 94.5% | 28 | 518 | 158 | 2 | 13,028 | 351 | 217,330 | 24.2 |
| mockS | MetaVelvet_MA | 20,551 | 95.3% | 28 | 517 | 143 | 3 | 6,685 | 622 | 217,330 | 20.1 |
| mockS | Meta-IDBA | 4,573 | 92.3% | 18 | 101 | 83 | 0 | 13,150 | 368 | 119,604 | 10.2 |
| mockS | Meta-IDBA_MA | 4,559 | 92.5% | 18 | 101 | 83 | 0 | 13,150 | 368 | 119,604 | 10.2 |
| HMP | SOAPdenovo | 39,028 | 89.9% | 11 | 1,138 | 2,686 | 0 | 9,881 | 514 | 116,204 | 347.6 |
| HMP | SOAPdenovo_MA | 35,230 | 89.1% | 11 | 1,138 | 2,618 | 0 | 11,359 | 426 | 238,051 | 341.5 |
| HMP | Meta-IDBA | 25,861 | 88.9% | 7 | 718 | 2,102 | 0 | 4,215 | 1,144 | 59,188 | 402.8 |
| HMP | Meta-IDBA_MA | 25,698 | 88.7% | 7 | 710 | 2,087 | 0 | 4,215 | 1,144 | 59,188 | 399.6 |
| HMPscf | SOAPdenovo | 31,673 | 99.9% | 11 | - | - | 10 | 9,906 | 510 | 116,181 | 0.9 |
| HMPscf | SOAPdenovo_MA | 27,231 | 99.9% | 11 | - | - | 10 | 11,359 | 426 | 238,051 | 0.9 |
| HMPscf | Meta-IDBA | 20,352 | 99.9% | 7 | - | - | 10 | 4,946 | 939 | 59,188 | 1.4 |
| HMPscf | Meta-IDBA_MA | 22,886 | 99.9% | 7 | - | - | 9 | 22,304 | 238 | 66,401 | 1.3 |

Datasets are mockE (mock Even), mockS (mock Staggered), HMP (Tongue dorsum, contig-level analysis), HMPscf (Tongue dorsum, scaffold-level analysis). All analyses other than HMPscf were done at the contig level. If necessary, contigs were extracted from scaffolds by splitting at three consecutive Ns. Assemblers with suffix _MA indicate the results produced by running MetAMOS on contigs produced by the corresponding assembler.

#ctgs/scfs: total number of contigs/scaffolds in the assembly.
Good Ctgs/scfs: fraction of contigs/scaffolds that mapped without errors to reference genomes. For the HMP dataset (Tongue dorsum contigs) alignments were only made to a small set of genomes estimated by the HMP project to match the genomes in this sample. For the HMPscf dataset good scaffolds are those without chimeric errors.
Total aln: total amount of sequence that can be aligned to the reference genomes (in Mbp).
Slt: slight mis-assemblies, determined by alignments that cover 80% or more of the aligned contig in a single match.
Hvy: heavy mis-assemblies, determined by alignments that cover less than 80% of the aligned contig in a single match or have two or more matches to a single reference.
Ch: chimeras, which are contigs with matches to two distinct reference genomes. Neither heavy mis-assemblies nor chimeras count towards reference coverage.
Size @ 10 Mbp: the size of the largest contig c such that the sum of all contigs larger than c is more than 10 Mbp (similar to the commonly used N50 size).
#@ 10 Mbp: smallest number of contigs whose cumulative size adds up to more than 10 Mbp.
Max ctg size: size of the largest contig in the assembly.
Err per Mbp: average number of errors per Mbp.

Numbers in bold represent the best value for the specific dataset.
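
The scaffold-splitting rule and the "Size @ 10 Mbp" / "#@ 10 Mbp" definitions above can be illustrated with a minimal Python sketch. This is not the code used to produce the table. The function names `split_scaffold` and `size_at` are hypothetical, and two points are assumptions: "three consecutive Ns" is read as a run of three or more Ns, and the cumulative sum includes the contig c itself (the usual N50 convention).

```python
import re
from typing import Iterable, List, Optional, Tuple


def split_scaffold(seq: str, min_gap: int = 3) -> List[str]:
    """Split a scaffold into contigs at runs of min_gap or more Ns.

    Assumption: a gap is any run of at least `min_gap` consecutive N/n
    characters; shorter runs of N are kept inside the contig.
    """
    pieces = re.split(r"[Nn]{%d,}" % min_gap, seq)
    # Leading or trailing gaps produce empty strings; drop them.
    return [p for p in pieces if p]


def size_at(lengths: Iterable[int],
            threshold: int = 10_000_000) -> Optional[Tuple[int, int]]:
    """Return (size, count) at the given cumulative-length threshold.

    Contigs are visited from largest to smallest. `size` is the length of
    the contig at which the running total first exceeds `threshold`, and
    `count` is how many contigs were needed to get there. Returns None if
    the assembly is too small to exceed the threshold.
    """
    total = 0
    for count, length in enumerate(sorted(lengths, reverse=True), start=1):
        total += length
        if total > threshold:
            return length, count
    return None


if __name__ == "__main__":
    # Toy example with a small threshold instead of 10 Mbp.
    contigs = split_scaffold("ACGTANNNACGTNNNNNACG")
    print(contigs)                                   # ['ACGTA', 'ACGT', 'ACG']
    print(size_at([len(c) for c in contigs], threshold=8))  # (4, 2)
```

Unlike N50, which depends on the total assembly size, a fixed threshold makes the values comparable across assemblers that produce different amounts of sequence for the same dataset.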