The Harvest suite for rapid core-genome alignment and visualization of thousands of intraspecific microbial genomes

Table 1 Core-genome SNP accuracy for simulated E. coli datasets

Method	Description ^a	FP Low	FN Low	FP Med	FN Med	FP High	FN High	TPR	FDR
Mauve	WGA	148	318	198	2,877	100	30,378	0.974	0.0004
Mauve (c)	WGA	0	0	2	38	6	649	0.999	0
Mugsy	WGA	1,261^b	395	1,928	3,371	1,335	34,923	0.970	0.0036
Mugsy (c)	WGA	2	0	2	0	1	81	0.999	0
Parsnp	CGA	23	423	45	3,494	7	35,466	0.970	0.0001
Parsnp (c)	CGA	0	24	0	603	0	10,989	0.992	0
kSNP	KMER	259	600	908	19,730	1,968	916,127	0.280	0.0086
Smalt	MAP	33	110	0	1,307	55	22,957	0.981	0.0001
BWA	MAP	0	168	16	1,947	27	27,091	0.9775	0.0000

Data shown indicate the performance of the evaluated methods on the three simulated E. coli datasets (low, medium, and high mutation rates). Method: tool used.
(c) indicates that the aligner was run on closed genomes rather than draft assemblies.
False positive (FP) and false negative (FN) counts are given for each of the three mutation rates (low, med, and high). True positive rate (TPR): TP/(TP + FN). False discovery rate (FDR): FP/(TP + FP). A total of 1,299,178 SNPs were introduced into the 32-genome dataset across all three mutation rates.
^aParadigm employed by each method.
^bMugsy’s lower precision was traced to a paralog misalignment that resulted in many false-positive SNPs.
CGA: core-genome alignment; FN: number of truth SNP calls not detected; FP: number of SNP calls that are not in the truth set; KMER: k-mer-based SNP calls; MAP: read mapping; TP: number of SNP calls that agreed with the truth; WGA: whole-genome alignment.
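
The TPR and FDR columns can be approximately reproduced from the per-dataset FP and FN counts. The following is a minimal sketch, not the authors' evaluation code. It assumes that TP equals the total number of introduced SNPs (1,299,178) minus the summed FN counts, and that FP and FN are pooled across the three mutation rates. The names `TOTAL_SNPS`, `COUNTS`, and `tpr_fdr` are hypothetical and chosen only for illustration.

```python
# Total SNPs introduced across the three mutation rates (from the table notes).
TOTAL_SNPS = 1_299_178

# (low, med, high) FP and FN counts copied from Table 1 for three of the methods.
COUNTS = {
    "Mauve":  {"fp": (148, 198, 100), "fn": (318, 2_877, 30_378)},
    "Parsnp": {"fp": (23, 45, 7),     "fn": (423, 3_494, 35_466)},
    "BWA":    {"fp": (0, 16, 27),     "fn": (168, 1_947, 27_091)},
}


def tpr_fdr(fp, fn, total=TOTAL_SNPS):
    """Return (TPR, FDR) pooled over the three datasets.

    Assumption (not stated in the source): TP = total - sum(FN).
    """
    fp_sum, fn_sum = sum(fp), sum(fn)
    tp = total - fn_sum
    tpr = tp / (tp + fn_sum)   # TPR = TP / (TP + FN)
    fdr = fp_sum / (tp + fp_sum)  # FDR = FP / (TP + FP)
    return tpr, fdr


for method, c in COUNTS.items():
    tpr, fdr = tpr_fdr(c["fp"], c["fn"])
    print(f"{method}\t{tpr:.4f}\t{fdr:.4f}")
```

Under these assumptions the Mauve, Parsnp, and BWA rows agree with the tabulated TPR and FDR values after rounding. Rows for closed genomes may rest on a different truth set, so the same total should not be assumed to apply to them.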

ISSN: 1474-760X