Table 1 Core-genome SNP accuracy for simulated E. coli datasets

From: The Harvest suite for rapid core-genome alignment and visualization of thousands of intraspecific microbial genomes

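| Method | Description^a | FP Low | FN Low | FP Med | FN Med | FP High | FN High | TPR | FDR |
|---|---|---|---|---|---|---|---|---|---|
| Mauve | WGA | 148 | 318 | 198 | 2,877 | 100 | 30,378 | 0.974 | 0.0004 |
| Mauve (c) | WGA | 0 | 0 | 2 | 38 | 6 | 649 | 0.999 | 0 |
| Mugsy | WGA | 1,261^b | 395 | 1,928 | 3,371 | 1,335 | 34,923 | 0.970 | 0.0036 |
| Mugsy (c) | WGA | 2 | 0 | 2 | 0 | 1 | 81 | 0.999 | 0 |
| Parsnp | CGA | 23 | 423 | 45 | 3,494 | 7 | 35,466 | 0.970 | 0.0001 |
| Parsnp (c) | CGA | 0 | 24 | 0 | 603 | 0 | 10,989 | 0.992 | 0 |
| kSNP | KMER | 259 | 600 | 908 | 19,730 | 1,968 | 916,127 | 0.280 | 0.0086 |
| Smalt | MAP | 33 | 110 | 0 | 1,307 | 55 | 22,957 | 0.981 | 0.0001 |
| BWA | MAP | 0 | 168 | 16 | 1,947 | 27 | 27,091 | 0.9775 | 0.0000 |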
  1. Data shown are performance metrics of the evaluated methods on the three simulated E. coli datasets (low, medium, and high mutation rates). Method: tool used.
  2. (c) indicates that the aligner was run on closed genomes rather than draft assemblies.
  3. False positive (FP) and false negative (FN) counts are given for the three mutation rates (low, med, and high). True positive rate (TPR): TP/(TP + FN). False discovery rate (FDR): FP/(TP + FP). A total of 1,299,178 SNPs were introduced into the 32-genome dataset across all three mutation rates.
  4. ^a Paradigm employed by each method.
  5. ^b Mugsy’s lower precision was traced to a paralog misalignment that resulted in many false-positive SNPs.
  6. CGA: core-genome alignment; FN: number of truth SNP calls not detected; FP: number of SNP calls that are not in the truth set; KMER: k-mer-based SNP calls; MAP: read mapping; TP: number of SNP calls that agreed with the truth; WGA: whole-genome alignment.
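
The TPR and FDR definitions in the footnotes can be checked against the per-dataset counts with a short calculation. The sketch below is illustrative only. It assumes that each method is scored against all 1,299,178 introduced SNPs, so that TP = total − FN summed over the three datasets. The table does not state this, and published values for some rows may have been derived or rounded differently. The function name `tpr_fdr` is hypothetical.

```python
# Total SNPs introduced across the low, medium, and high datasets (from the footnote).
TOTAL_SNPS = 1_299_178

def tpr_fdr(fp_counts, fn_counts, total=TOTAL_SNPS):
    """Return (TPR, FDR) from per-dataset FP and FN counts.

    Assumption (not stated in the table): TP = total - sum(FN).
    """
    fp = sum(fp_counts)
    fn = sum(fn_counts)
    tp = total - fn
    tpr = tp / (tp + fn)   # TPR = TP / (TP + FN)
    fdr = fp / (tp + fp)   # FDR = FP / (TP + FP)
    return tpr, fdr

# Counts in the order low, med, high, taken from the Mauve and Mugsy rows.
mauve = tpr_fdr([148, 198, 100], [318, 2_877, 30_378])
mugsy = tpr_fdr([1_261, 1_928, 1_335], [395, 3_371, 34_923])

print(f"Mauve TPR={mauve[0]:.3f} FDR={mauve[1]:.4f}")
print(f"Mugsy TPR={mugsy[0]:.3f} FDR={mugsy[1]:.4f}")
```

Under that assumption, the Mauve and Mugsy rows reproduce the tabulated TPR and FDR after rounding to the precision shown.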