
Table 3 Simulating a novel pathogen. Mash dist and PathoScope were run on pathogen sequences and their near neighbors, with the corresponding truth species removed from the respective databases, to simulate classifying a novel pathogen that is absent from the database. SRA is the SRA ID of the sample; True Organism is the actual bacterial strain or species; Mash dist gives the Mash result for each sample (with the truth organism species or strain removed from its sketch database); and PathoScope gives the PathoScope result for each sample (with the truth organism species or strain removed from its database). In three cases, C. sporogenes, C. botulinum, and S. pyogenes, Mash dist classified the organism as its near neighbor: C. botulinum, C. sporogenes, and S. dysgalactiae, respectively. S. dysgalactiae was classified as S. sp. NCTC 11567, whereas the commensal E. coli K12 and the pathogenic E. coli O157:H7 were classified as E. coli O16:H48 and E. coli 2009C-3554, respectively. PathoScope classified only two pathogens, C. sporogenes and C. botulinum, as their nearest-neighbor counterparts. S. dysgalactiae was classified as S. intermedius, whereas S. pyogenes was classified as S. infantarius. E. coli K12 was classified only at the species level, while the pathogenic strain E. coli O157:H7 was classified as E. coli Xuzhou21.

From: SeqScreen: accurate and sensitive functional screening of pathogenic sequences via ensemble learning

| SRA | True Organism | Mash dist | PathoScope |
| --- | --- | --- | --- |
| DRR198806 | E. coli K12 MG1655 | E. coli O16:H48 | E. coli |
| DRR198804 | E. coli O157:H7 | E. coli 2009C-3554 | E. coli Xuzhou21 |
| SRR8758382 | C. sporogenes | C. botulinum | C. botulinum |
| SRR8981313 | C. botulinum | C. sporogenes | C. sporogenes |
| SRR12825903 | S. dysgalactiae | S. sp. NCTC 11567 | S. intermedius |
| ERR1735064 | S. pyogenes | S. dysgalactiae | S. infantarius |
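The hold-out procedure described in the caption (remove the truth species from the database, then report the nearest remaining reference) can be illustrated with a minimal sketch. This is not the authors' pipeline and it does not call Mash or PathoScope. It assumes exact k-mer Jaccard similarity on short toy strings, whereas Mash estimates the Jaccard index from MinHash sketches, and it converts that similarity to a distance with the Mash formula D = -(1/k) ln(2j / (1 + j)). The function names, species labels, and sequences are all hypothetical.

```python
import math

def kmers(seq, k):
    """Return the set of all k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

def mash_distance(a, b, k):
    """Mash-style distance from the exact Jaccard index j of two k-mer sets.

    Assumption: real Mash estimates j from MinHash sketches; here j is exact.
    """
    ka, kb = kmers(a, k), kmers(b, k)
    j = len(ka & kb) / len(ka | kb)
    if j == 0:
        return 1.0  # no shared k-mers: report the maximum distance
    return -math.log(2 * j / (1 + j)) / k

def classify_held_out(query, truth_species, references, k=4):
    """Return the nearest reference after dropping every entry of truth_species.

    references maps a reference name to (species label, sequence).
    Passing truth_species=None keeps the full database.
    """
    reduced = {name: seq for name, (species, seq) in references.items()
               if species != truth_species}
    return min(reduced, key=lambda name: mash_distance(query, reduced[name], k))

# Hypothetical toy database: species A, a near neighbor B, and a distant species C.
references = {
    "A strain 1": ("A", "ACGTTGCAAGGCTTAACCGGTA"),
    "B strain 1": ("B", "ACGTTGCAAGGCTTAACCGATA"),
    "C strain 1": ("C", "TTTTGGGGCCCCAAAATTTTGG"),
}
query = "ACGTTGCAAGGCTTAACCGGTT"  # toy read set drawn from species A

print(classify_held_out(query, None, references))  # full database
print(classify_held_out(query, "A", references))   # species A held out
```

With the full database the query matches its own species; with species A held out, the best remaining hit is the near neighbor B, which mirrors the kind of outcome the table reports for C. sporogenes and C. botulinum.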