Table 3 Sequencing runs and assemblies searched against the Mash RefSeq database

From: Mash: fast genome and metagenome distance estimation using MinHash

Organism | Tech | Type | NCBI accession | Size (Mbp) | Time (CPU s) | LCA | Best hit
E. coli K12 MG1655 | MiSeq | Assembly | (SPAdes) | 4.6 | 2.45 | Entero. | E. coli K12 MG1655
E. coli K12 MG1655 | PacBio | Assembly | GCA_000801205 | 4.6 | 2.66 | Entero. | E. coli K12 MG1655
E. coli DH1 | ABI 3730 | Reads | (Trace Archive) | 60 | 17.08 | Entero. | E. coli DH1
E. coli K12 MG1655 | 454 | Reads | SRR797242 | 233 | 57.12 | Entero. | E. coli K12 MG1655
E. coli K12 MG1655 | Ion PGM | Reads | SRR515925 | 407 | 72.01 | E. coli | E. coli K12 1655
E. coli K12 MG1655 | MiSeq | Reads | SRR1770413 | 387 | 72.01 | Entero. | E. coli KLY
E. coli K12 MT203 | HiSeq | Reads | SRR490124 | 2155 | 369.86 | E. coli | E. coli GCF_000833635
E. coli K12 MG1655 | PacBio | Reads | SRR1284073 | 397 | 77.96 | E. coli | E. coli XH140A GCF_000226585
E. coli K12 MG1655 | MinION | 1D | ERR764952..55 | 248 | 55.52 | Entero. | E. coli O113 H21
E. coli K12 MG1655 | MinION | 2D | ERR764952..55 | 134 | 27.82 | E. coli | E. coli GCF_000953515
B. anthracis Ames | MinION | 1D + 2D | SRR2671867 | 210 | 44.66 | B. anthracis | B. anthracis str. Carbosap
B. cereus ATCC 10987 | MinION | 1D + 2D | SRR2671868 | 266 | 76.85 | B. cereus ATCC 10987 | B. cereus ATCC 10987
Zaire ebolavirus | MinION | 1D + 2D | ERR1050070 | 8.7 | 2.06 | Zaire ebolavirus | Zaire ebolavirus Mayinga
  1. In all cases, Mash search required 21 MB of RAM for genome assemblies and 209 MB of RAM for sequencing runs (due to the additional Bloom filter overhead). Organism: source strain. Tech: sequencing technology (ABI 3730, 454 GS FLX, Illumina MiSeq, Illumina HiSeq, Ion PGM, PacBio RSII, or Oxford Nanopore MinION). Type: assembly, reads, or 1D and 2D nanopore reads. NCBI accession: NCBI accession of the dataset or reads. The SPAdes [63] assembly was derived from the MiSeq reads. Size: total dataset size in Mbp. Time: search time in CPU seconds. LCA: lowest common ancestor, in the NCBI taxonomy, of all hits within a significance tolerance of the best hit. In several cases, the LCA is at the family level (Enterobacteriaceae, abbreviated Entero.) because there are significant Mash hits to both E. coli and S. sonnei. This is a known species naming conflict within the NCBI taxonomy, with some genomes of these two species sharing ANI >98 %. Best hit: the hit with the smallest significant distance.
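
The LCA and Best hit columns described in the footnote can be illustrated with a minimal Python sketch. This is not Mash's implementation. The lineages and distances below are made-up toy values, the names `LINEAGES`, `lowest_common_ancestor` and `classify` are hypothetical, and a plain distance cutoff stands in for the "significance tolerance", which the footnote does not define.

```python
from typing import Dict, List, Tuple

# Hypothetical toy lineages, ordered root -> leaf. Real lineages would be
# looked up in the NCBI taxonomy; these are illustrative only.
LINEAGES: Dict[str, List[str]] = {
    "E. coli K12 MG1655": ["Bacteria", "Enterobacteriaceae", "Escherichia",
                           "E. coli", "E. coli K12 MG1655"],
    "E. coli DH1": ["Bacteria", "Enterobacteriaceae", "Escherichia",
                    "E. coli", "E. coli DH1"],
    "S. sonnei Ss046": ["Bacteria", "Enterobacteriaceae", "Shigella",
                        "S. sonnei", "S. sonnei Ss046"],
}


def lowest_common_ancestor(lineages: List[List[str]]) -> str:
    """Return the deepest taxon shared by every root-to-leaf lineage."""
    common = None
    for ranks in zip(*lineages):
        if all(rank == ranks[0] for rank in ranks):
            common = ranks[0]
        else:
            break
    if common is None:
        raise ValueError("lineages share no common ancestor")
    return common


def classify(hits: List[Tuple[str, float]], tolerance: float) -> Tuple[str, str]:
    """Return (LCA, best hit) for a list of (reference name, distance) hits.

    Assumption: a hit counts as "within tolerance" when its distance exceeds
    the best distance by at most `tolerance`.
    """
    best_name, best_dist = min(hits, key=lambda hit: hit[1])
    near_best = [name for name, dist in hits if dist - best_dist <= tolerance]
    lca = lowest_common_ancestor([LINEAGES[name] for name in near_best])
    return lca, best_name


# Made-up distances, for illustration only.
example_hits = [
    ("E. coli K12 MG1655", 0.001),
    ("E. coli DH1", 0.002),
    ("S. sonnei Ss046", 0.015),
]
```

With a tight tolerance only the E. coli hits are retained and the LCA stays at the species level. A looser tolerance also admits the S. sonnei hit, which pushes the LCA up to the family, mirroring the Entero. rows in the table.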