Table 2. Mash runtime and output size for the all-pairs RefSeq computation at various sketch sizes and k-mer sizes

From: Mash: fast genome and metagenome distance estimation using MinHash

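| Sketch size | k = 16 Sketch (CPU h) | k = 16 Dist (CPU h) | k = 16 Size (MB) | k = 16 gzip (MB) | k = 21 Sketch (CPU h) | k = 21 Dist (CPU h) | k = 21 Size (MB) | k = 21 gzip (MB) |
|---:|---:|---:|---:|---:|---:|---:|---:|---:|
| 500 | 26.4 | 8.4 | 120.1 | 89.7 | 31.3 | 9.0 | 229.8 | 201.8 |
| 1000 | 27.7 | 15.9 | 224.9 | 179.7 | 31.3 | 17.4 | 439.2 | 399.6 |
| 5000 | 26.4 | 74.5 | 1022.5 | 873.8 | 31.6 | 83.6 | 2034.5 | 1924.6 |
| 10,000 | 26.8 | 146.9 | 1961.8 | 1691.1 | 31.7 | 164.0 | 3913.0 | 3696.2 |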
Sketch: CPU hours required for the Mash sketch operation on all 54,118 RefSeq genomes. Dist: CPU hours required for the Mash dist table operation on all pairs of sketches. Size: combined size of the resulting sketches, in megabytes. gzip: combined size of the resulting sketches after gzip compression, in megabytes.
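Two trends in the table can be checked with simple arithmetic on the reported values. This is a minimal sketch that does not run Mash at all: the numbers are transcribed from Table 2, and the helper names are hypothetical. Dividing Dist time by sketch size (in thousands of hashes) gives a roughly constant rate, about 15 to 18 CPU h per 1,000 hashes. This suggests that Dist time grows close to linearly with sketch size, while Sketch time stays nearly flat. The k = 21 sketches are roughly twice the size of the k = 16 sketches (ratios of about 1.91 to 1.99). As we understand Mash's defaults, one plausible explanation is that it stores 32-bit hashes for k ≤ 16 and 64-bit hashes for larger k. That explanation is an assumption, not something the table itself states.

```python
# Values transcribed from Table 2 (all 54,118 RefSeq genomes).
SKETCH_SIZES = [500, 1000, 5000, 10_000]
DIST_CPU_H = {16: [8.4, 15.9, 74.5, 146.9], 21: [9.0, 17.4, 83.6, 164.0]}
SIZE_MB = {16: [120.1, 224.9, 1022.5, 1961.8], 21: [229.8, 439.2, 2034.5, 3913.0]}


def dist_hours_per_1000_hashes(k: int) -> list[float]:
    """Hypothetical helper: Dist CPU hours divided by sketch size in thousands of hashes."""
    return [hours / (size / 1000) for hours, size in zip(DIST_CPU_H[k], SKETCH_SIZES)]


def size_ratio_k21_over_k16() -> list[float]:
    """Hypothetical helper: total sketch size at k = 21 divided by that at k = 16."""
    return [b / a for a, b in zip(SIZE_MB[16], SIZE_MB[21])]


if __name__ == "__main__":
    for k in (16, 21):
        rates = dist_hours_per_1000_hashes(k)
        print(f"k={k}: Dist CPU h per 1,000 hashes:", [round(r, 1) for r in rates])
    ratios = size_ratio_k21_over_k16()
    print("k=21 / k=16 sketch size ratio:", [round(r, 2) for r in ratios])
```

The near-constant per-hash rate is consistent with each pairwise comparison doing work proportional to the number of hashes in a sketch. The slight downward drift at larger sketch sizes may reflect fixed per-comparison overhead being amortized, but the table alone cannot confirm this.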