
Table 2 Mash runtime and output size for all-pairs RefSeq computation using various sketch and k-mer sizes

From: Mash: fast genome and metagenome distance estimation using MinHash

             k = 16                                              k = 21
Sketch size  Sketch (CPU h)  Dist (CPU h)  Size (Mb)  gzip (Mb)  Sketch (CPU h)  Dist (CPU h)  Size (Mb)  gzip (Mb)
500          26.4            8.4           120.1      89.7       31.3            9.0           229.8      201.8
1000         27.7            15.9          224.9      179.7      31.3            17.4          439.2      399.6
5000         26.4            74.5          1022.5     873.8      31.6            83.6          2034.5     1924.6
10,000       26.8            146.9         1961.8     1691.1     31.7            164.0         3913.0     3696.2

Sketch: CPU hours (CPU h) required for the Mash sketch operation on all 54,118 RefSeq genomes. Dist: CPU hours required for the Mash dist table operation on all pairs of sketches. Size: combined size of the resulting sketches, in megabytes (Mb). gzip: combined size of the resulting sketches after gzip compression, in megabytes.
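The table columns support a quick sanity check of how cost scales with sketch size and k. The following is a minimal Python sketch using values transcribed by hand from the table; the helper names (`bytes_per_hash`, `dist_hours_per_1000_hashes`, `K16`, `K21`) are hypothetical and introduced only for illustration. It assumes 1 Mb = 10^6 bytes and that every genome contributes a full sketch, and it reports average bytes per stored hash, the k = 21 to k = 16 size ratio, and dist CPU hours normalised to a sketch size of 1,000.

```python
"""Derived quantities from Table 2 (values transcribed by hand from the table)."""

N_GENOMES = 54_118  # RefSeq genomes sketched, per the table note

# sketch size -> (dist CPU h, size Mb), one table per k-mer size
K16 = {500: (8.4, 120.1), 1000: (15.9, 224.9), 5000: (74.5, 1022.5), 10_000: (146.9, 1961.8)}
K21 = {500: (9.0, 229.8), 1000: (17.4, 439.2), 5000: (83.6, 2034.5), 10_000: (164.0, 3913.0)}


def bytes_per_hash(size_mb: float, sketch_size: int) -> float:
    """Average stored bytes per hash, ASSUMING 1 Mb = 10**6 bytes and that
    every genome contributes a full sketch of `sketch_size` hashes."""
    return size_mb * 1e6 / (sketch_size * N_GENOMES)


def dist_hours_per_1000_hashes(dist_h: float, sketch_size: int) -> float:
    """Dist CPU hours normalised to a sketch size of 1000; roughly constant
    values indicate that dist cost grows about linearly with sketch size."""
    return dist_h * 1000 / sketch_size


size_ratios = {}
for s in K16:
    d16, m16 = K16[s]
    d21, m21 = K21[s]
    size_ratios[s] = m21 / m16  # how much larger the k=21 sketches are
    print(
        f"s={s:>6}  bytes/hash k16={bytes_per_hash(m16, s):.2f} k21={bytes_per_hash(m21, s):.2f}"
        f"  size ratio k21/k16={size_ratios[s]:.2f}"
        f"  dist CPU h per 1000 hashes k16={dist_hours_per_1000_hashes(d16, s):.1f}"
        f" k21={dist_hours_per_1000_hashes(d21, s):.1f}"
    )
```

With these numbers the k = 21 sketches come out at roughly 1.9 to 2.0 times the size of the k = 16 sketches, which would be consistent with k = 16 storing 32-bit hashes and k = 21 storing 64-bit hashes (an interpretation, not something the table itself states). Dist time per 1,000 hashes stays within about 15 to 18 CPU h, so dist cost is roughly linear in sketch size, while sketch time barely changes across sketch sizes.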