Table 1 Core-genome SNP accuracy for simulated E. coli datasets

From: The Harvest suite for rapid core-genome alignment and visualization of thousands of intraspecific microbial genomes

Method      Description^a  FP Low  FN Low  FP Med  FN Med  FP High  FN High  TPR     FDR
Mauve       WGA            148     318     198     2,877   100      30,378   0.974   0.0004
Mauve (c)   WGA            0       0       2       38      6        649      0.999   0
Mugsy       WGA            1,261^b 395     1,928   3,371   1,335    34,923   0.970   0.0036
Mugsy (c)   WGA            2       0       2       0       1        81       0.999   0
Parsnp      CGA            23      423     45      3,494   7        35,466   0.970   0.0001
Parsnp (c)  CGA            0       24      0       603     0        10,989   0.992   0
kSNP        KMER           259     600     908     19,730  1,968    916,127  0.280   0.0086
Smalt       MAP            33      110     0       1,307   55       22,957   0.981   0.0001
BWA         MAP            0       168     16      1,947   27       27,091   0.9775  0.0000

  1. Data shown indicate performance metrics of the evaluated methods on the three simulated E. coli datasets (low, medium, and high mutation rates). Method: tool used.
  2. (c) indicates that the aligner was run on closed genomes rather than draft assemblies.
  3. False positive (FP) and false negative (FN) counts are given for each of the three mutation rates (low, med, and high). True positive rate (TPR): TP/(TP + FN). False discovery rate (FDR): FP/(TP + FP). A total of 1,299,178 SNPs were introduced into the 32-genome dataset across all three mutation rates.
  4. a: Paradigm employed by each method.
  5. b: Mugsy’s lower precision was traced to a paralog misalignment that resulted in many false-positive SNPs.
  6. CGA: core genome alignment; FN: number of truth SNP calls not detected; FP: number of SNP calls that are not in the truth set; KMER: k-mer based SNP calls; MAP: read mapping; TP: number of SNP calls that agreed with the truth; WGA: whole-genome alignment.
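
The TPR and FDR definitions in the footnotes can be checked against the tabulated counts with a short calculation. The following is a minimal sketch, not the authors' evaluation code. It assumes that TP can be recovered as the total number of introduced SNPs (1,299,178) minus the summed FN counts, and that the reported TPR and FDR pool all three mutation rates. The function name `tpr_fdr` is hypothetical.

```python
# Total SNPs introduced across all three mutation rates (from the table footnotes).
TOTAL_SNPS = 1_299_178


def tpr_fdr(fp_counts, fn_counts, total=TOTAL_SNPS):
    """Return (TPR, FDR) from per-dataset FP and FN counts.

    Assumption: TP = total truth SNPs - FN, pooled over the low, medium,
    and high datasets. This is inferred from the footnote definitions,
    not taken from the authors' code.
    """
    fp = sum(fp_counts)
    fn = sum(fn_counts)
    tp = total - fn
    tpr = tp / (tp + fn)   # true positive rate: TP / (TP + FN)
    fdr = fp / (tp + fp)   # false discovery rate: FP / (TP + FP)
    return tpr, fdr


# Counts copied from the Mauve and Parsnp rows (draft assemblies), in low, med, high order.
mauve_tpr, mauve_fdr = tpr_fdr([148, 198, 100], [318, 2877, 30378])
parsnp_tpr, parsnp_fdr = tpr_fdr([23, 45, 7], [423, 3494, 35466])

print(f"Mauve:  TPR={mauve_tpr:.3f}  FDR={mauve_fdr:.4f}")
print(f"Parsnp: TPR={parsnp_tpr:.3f}  FDR={parsnp_fdr:.4f}")
```

Under these assumptions the Mauve and Parsnp rows reproduce the tabulated values (TPR 0.974 and 0.970, FDR 0.0004 and 0.0001) after rounding.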