Table 3 Sequencing runs and assemblies searched against the Mash RefSeq database

From: Mash: fast genome and metagenome distance estimation using MinHash

| Organism | Tech | Type | NCBI accession | Size (Mbp) | Time (CPU s) | LCA | Best hit |
|---|---|---|---|---|---|---|---|
| E. coli K12 MG1655 | MiSeq | Assembly | (SPAdes) | 4.6 | 2.45 | Entero. | E. coli K12 MG1655 |
| E. coli K12 MG1655 | PacBio | Assembly | GCA_000801205 | 4.6 | 2.66 | Entero. | E. coli K12 MG1655 |
| E. coli DH1 | ABI 3730 | Reads | (Trace Archive) | 60 | 17.08 | Entero. | E. coli DH1 |
| E. coli K12 MG1655 | 454 | Reads | SRR797242 | 233 | 57.12 | Entero. | E. coli K12 MG1655 |
| E. coli K12 MG1655 | Ion PGM | Reads | SRR515925 | 407 | 72.01 | E. coli | E. coli K12 1655 |
| E. coli K12 MG1655 | MiSeq | Reads | SRR1770413 | 387 | 72.01 | Entero. | E. coli KLY |
| E. coli K12 MT203 | HiSeq | Reads | SRR490124 | 2155 | 369.86 | E. coli | E. coli GCF_000833635 |
| E. coli K12 MG1655 | PacBio | Reads | SRR1284073 | 397 | 77.96 | E. coli | E. coli XH140A GCF_000226585 |
| E. coli K12 MG1655 | MinION | 1D | ERR764952..55 | 248 | 55.52 | Entero. | E. coli O113 H21 |
| E. coli K12 MG1655 | MinION | 2D | ERR764952..55 | 134 | 27.82 | E. coli | E. coli GCF_000953515 |
| B. anthracis Ames | MinION | 1D + 2D | SRR2671867 | 210 | 44.66 | B. anthracis | B. anthracis str. Carbosap |
| B. cereus ATCC 10987 | MinION | 1D + 2D | SRR2671868 | 266 | 76.85 | B. cereus ATCC 10987 | B. cereus ATCC 10987 |
| Zaire ebolavirus | MinION | 1D + 2D | ERR1050070 | 8.7 | 2.06 | Zaire ebolavirus | Zaire ebolavirus Mayinga |
In all cases, Mash search required 21 MB of RAM for genome assemblies and 209 MB of RAM for sequencing runs (due to the additional Bloom filter overhead). Organism: source strain. Tech: sequencing technology (ABI 3730, 454 GS FLX, Illumina MiSeq, Illumina HiSeq, Ion PGM, PacBio RSII, or Oxford Nanopore MinION). Type: assembly, reads, or 1D and 2D nanopore reads. NCBI accession: NCBI accession of the dataset or reads; the SPAdes [63] assembly was derived from the MiSeq reads. Size: total dataset size in Mbp. Time: search time in CPU seconds. LCA: lowest common ancestor classification, based on the NCBI taxonomy, of all hits within a significance tolerance of the best hit. In several cases, the LCA is at the family level (Enterobacteriaceae, abbreviated "Entero.") because of significant Mash hits to both E. coli and S. sonnei. This is a known species naming conflict within the NCBI taxonomy, with some genomes of these two species sharing an average nucleotide identity (ANI) >98 %. Best hit: the reference genome with the smallest significant distance.
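The LCA column can be illustrated with a minimal Python sketch. It is not Mash's implementation: the lineages, the distances, and the rule "keep every hit whose distance is within `tolerance` of the best" are assumptions made for illustration, and the names `lowest_common_ancestor`, `classify`, and `TOY_LINEAGES` are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def lowest_common_ancestor(lineages: Sequence[Sequence[str]]) -> str:
    """Return the deepest taxon shared by every root-to-leaf lineage."""
    if not lineages:
        raise ValueError("need at least one lineage")
    shared = "root"
    # Walk all lineages rank by rank and stop at the first disagreement.
    for ranks in zip(*lineages):
        if len(set(ranks)) != 1:
            break
        shared = ranks[0]
    return shared


def classify(
    hits: List[Tuple[str, float]],
    lineage_of: Dict[str, List[str]],
    tolerance: float,
) -> Tuple[str, str]:
    """Return (best hit, LCA of all hits within `tolerance` of the best).

    Assumption: the tolerance is applied to the distance itself; the
    source only says "within a significance tolerance of the best".
    """
    best_name, best_dist = min(hits, key=lambda hit: hit[1])
    near = [name for name, dist in hits if dist - best_dist <= tolerance]
    return best_name, lowest_common_ancestor([lineage_of[n] for n in near])


# Toy, simplified lineages (not a real NCBI taxonomy dump).
TOY_LINEAGES = {
    "E. coli K12 MG1655": ["Bacteria", "Enterobacteriaceae", "Escherichia", "Escherichia coli"],
    "E. coli DH1": ["Bacteria", "Enterobacteriaceae", "Escherichia", "Escherichia coli"],
    "S. sonnei": ["Bacteria", "Enterobacteriaceae", "Shigella", "Shigella sonnei"],
}

# Made-up distances, chosen only to show the effect of the tolerance.
toy_hits = [("E. coli K12 MG1655", 0.001), ("S. sonnei", 0.002), ("E. coli DH1", 0.0015)]

wide = classify(toy_hits, TOY_LINEAGES, tolerance=0.005)
narrow = classify(toy_hits, TOY_LINEAGES, tolerance=0.0006)
print(wide)    # best hit is E. coli, but the LCA falls back to the family
print(narrow)  # the S. sonnei hit is excluded, so the LCA stays at species level
```

With the wide tolerance the S. sonnei hit is retained, so the shared ancestor is the family, which mirrors the "Entero." entries in the table; with the narrow tolerance only E. coli hits remain and the classification stays at the species level.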