
Table 1 Example Mash error bounds for a k-mer size of 21 and increasing sketch sizes

From: Mash: fast genome and metagenome distance estimation using MinHash

Sketch size   Mash distance
              0.05    0.10    0.15    0.20    0.25    0.30    0.35    0.40
100           0.0271  0.0868
500           0.0098  0.0245  0.0473
1,000         0.0068  0.0158  0.0323  0.0630
5,000         0.0029  0.0065  0.0124  0.0235  0.0460
10,000        0.0020  0.0046  0.0086  0.0159  0.0300  0.0726
50,000        0.0009  0.0020  0.0037  0.0065  0.0116  0.0219  0.0396  0.0822
100,000       0.0006  0.0014  0.0026  0.0046  0.0081  0.0143  0.0250  0.0492
500,000       0.0003  0.0006  0.0011  0.0020  0.0035  0.0060  0.0105  0.0187
1,000,000     0.0002  0.0004  0.0008  0.0014  0.0024  0.0042  0.0072  0.0128

1. For a given sketch size and Mash distance, the Mash estimation error will be less than the given value with probability 0.99, as calculated using the inverse cumulative distribution function of the binomial distribution. Missing values indicate that the estimate is unbounded, i.e. there is a chance that no matching k-mers will be found, in which case the Mash distance is undefined. Plots of the upper and lower error bounds for k = 16 and k = 21 are given in Additional file 1: Figure S2.
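The calculation described in the footnote can be sketched in Python. This is a minimal illustration under stated assumptions, not Mash's own implementation: it assumes the paper's Mash distance D = -(1/k) ln(2j/(1+j)) for a Jaccard index j, that the Jaccard estimate is x/s for x matching hashes in a sketch of size s (so x ~ Binomial(s, j)), and that the 0.99 level corresponds to a two-sided interval with 0.005 in each tail, of which the larger deviation from D is reported. All function names are hypothetical.

```python
import math
from typing import Optional, Tuple


def distance_to_jaccard(dist: float, k: int) -> float:
    """Invert D = -(1/k) * ln(2j / (1 + j)) to recover the Jaccard index j."""
    return 1.0 / (2.0 * math.exp(k * dist) - 1.0)


def jaccard_to_distance(j: float, k: int) -> float:
    """Mash distance for a Jaccard estimate j (infinite when j == 0)."""
    if j <= 0.0:
        return math.inf
    return -math.log(2.0 * j / (1.0 + j)) / k


def binom_logpmf(x: int, n: int, p: float) -> float:
    """Log of the Binomial(n, p) probability mass at x (requires 0 < p < 1)."""
    return (math.lgamma(n + 1) - math.lgamma(x + 1) - math.lgamma(n - x + 1)
            + x * math.log(p) + (n - x) * math.log1p(-p))


def binom_quantiles(n: int, p: float, lo_q: float, hi_q: float) -> Tuple[int, int]:
    """Inverse CDF: smallest x with CDF(x) >= lo_q, and likewise for hi_q."""
    cdf = 0.0
    lo: Optional[int] = None
    for x in range(n + 1):
        # Far-tail terms underflow harmlessly to 0.0.
        cdf += math.exp(binom_logpmf(x, n, p))
        if lo is None and cdf >= lo_q:
            lo = x
        if cdf >= hi_q:
            return (lo if lo is not None else x), x
    return (lo if lo is not None else n), n


def mash_error_bound(k: int, sketch_size: int, dist: float,
                     prob: float = 0.99) -> Optional[float]:
    """Hypothetical error bound for a Mash distance dist > 0; None if unbounded."""
    j = distance_to_jaccard(dist, k)
    tail = (1.0 - prob) / 2.0  # assumed two-sided interval
    x_lo, x_hi = binom_quantiles(sketch_size, j, tail, 1.0 - tail)
    if x_lo == 0:
        # Zero shared hashes lies inside the interval: distance may be undefined.
        return None
    d_far = jaccard_to_distance(x_lo / sketch_size, k)   # few matches -> larger distance
    d_near = jaccard_to_distance(x_hi / sketch_size, k)  # many matches -> smaller distance
    return max(d_far - dist, dist - d_near)
```

Under these assumptions, `mash_error_bound(21, 100, 0.05)` comes out at about 0.0271 and `mash_error_bound(21, 100, 0.15)` returns `None`, in line with the first row of the table; other entries have not been checked. Summing the binomial probability mass directly is chosen for clarity rather than speed.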