Fig. 3 | Genome Biology

Fig. 3

From: Maast: genotyping thousands of microbial strains efficiently

Evaluation of Maast computational performance and accuracy. a Comparison of genome genotyping speed between Maast, Maast without redundancy collapsing (Maast + NRC), kSNP3, and Parsnp. All methods were run on 500, 1000, and 5000 simulated A. rectalis genomes. b Comparison of short-read genotyping speed between Maast, Snippy, and SPAdes. All three methods were run on 63 strains of B. uniformis, whose whole-genome sequencing reads (~150 million) were downloaded from the Culturable Genome Reference (CGR) study. The y-axis in both a and b shows elapsed run time in seconds on a log scale. Fewer elapsed seconds indicate better performance (faster processing). c–f Comparison of Maast and Snippy genotyping accuracy at non-reference alleles of SNPs in the Maast SNP panel, based on short reads that were c, d simulated from isolate genomes with sequencing error (15× coverage) or e, f downloaded from isolate whole-genome sequencing projects. Both Maast and Snippy were run with default settings. c, e Positive predictive value (PPV; 1 − false discovery rate) comparison, where false discoveries are genotype calls that do not match the genome. d, f Sensitivity for the simulated reads in c and the downloaded reads in e. Sensitivity is the probability of detecting genotypes present in the genome. The color of points in c–f indicates whether the data come from tag genomes (black) or not (red). Samples colored in red are regarded as novel to Maast databases. g, h Maast genotype concordance between g genome and short reads or h genome and long reads. In g, the strain population structure of H. pylori was reconstructed using SNPs from 473 strains. Each strain has a whole genome sequence (WG) and a short-read sample (SRA), as indicated in the stacked color rings. In h, the population structure of H. pylori was reconstructed from 4 strains with whole genome sequence (WG) and long reads (SRA).
* kSNP3 and Parsnp ran for more than 48 h on 5000 genomes and were manually terminated; we plot a runtime of 48 h, noting that no output was produced
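
The PPV and sensitivity definitions in panels c–f can be illustrated with a small calculation. The sketch below is not taken from Maast or Snippy: the function name `ppv_and_sensitivity`, the dictionary layout (SNP site mapped to non-reference allele), and the toy data are all assumptions made for illustration.

```python
def ppv_and_sensitivity(called, truth):
    """Compute PPV and sensitivity for non-reference genotype calls.

    called: dict mapping SNP site -> allele called from reads (hypothetical layout)
    truth:  dict mapping SNP site -> allele present in the genome
    """
    # A call is correct only if the site exists in the genome's set of
    # non-reference alleles and the allele matches.
    true_pos = sum(1 for site, allele in called.items()
                   if truth.get(site) == allele)
    # PPV = 1 - false discovery rate: fraction of calls that match the genome.
    ppv = true_pos / len(called) if called else float("nan")
    # Sensitivity: fraction of genotypes in the genome that were detected.
    sensitivity = true_pos / len(truth) if truth else float("nan")
    return ppv, sensitivity


# Toy data (invented): five true non-reference alleles, four calls.
truth = {101: "A", 250: "T", 377: "G", 512: "C", 640: "A"}
called = {101: "A", 250: "T", 377: "C", 900: "G"}

ppv, sens = ppv_and_sensitivity(called, truth)
print(f"PPV = {ppv:.2f}, sensitivity = {sens:.2f}")
```

Here two of the four calls match the genome, and two of the five true genotypes are detected, so a wrong allele at a real site (377) lowers both metrics while a call at a site absent from the genome (900) lowers only PPV.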
