
Table 1 Example Mash error bounds for a k-mer size of 21 and increasing sketch sizes

From: Mash: fast genome and metagenome distance estimation using MinHash

                Mash distance
Sketch size     0.05    0.10    0.15    0.20    0.25    0.30    0.35    0.40
100             0.0271  0.0868  -       -       -       -       -       -
500             0.0098  0.0245  0.0473  -       -       -       -       -
1000            0.0068  0.0158  0.0323  0.0630  -       -       -       -
5000            0.0029  0.0065  0.0124  0.0235  0.0460  -       -       -
10,000          0.0020  0.0046  0.0086  0.0159  0.0300  0.0726  -       -
50,000          0.0009  0.0020  0.0037  0.0065  0.0116  0.0219  0.0396  0.0822
100,000         0.0006  0.0014  0.0026  0.0046  0.0081  0.0143  0.0250  0.0492
500,000         0.0003  0.0006  0.0011  0.0020  0.0035  0.0060  0.0105  0.0187
1,000,000       0.0002  0.0004  0.0008  0.0014  0.0024  0.0042  0.0072  0.0128
  1. For a given sketch size and Mash distance, the Mash estimation error will be less than the given value with probability 0.99, as calculated by the binomial inverse cumulative distribution function. Missing values indicate that the estimate is unbounded, that is, there is a chance that no matching k-mers will be found and the Mash distance will be undefined. Plots of the upper and lower error bounds for k = 16 and k = 21 are given in Additional file 1: Figure S2.
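
The footnote only names the tool (the binomial inverse cumulative distribution function), so the following is a minimal sketch of how such bounds could be computed, not the authors' code. It assumes the Mash distance formula D = -(1/k) ln(2j / (1 + j)) for a Jaccard index j, models the number of shared k-mers in a sketch of size s as Binomial(s, j), and takes the 0.005 and 0.995 quantiles as a two-sided 0.99 interval. The function names (`binom_ppf`, `mash_error_bound`) and the two-sided split are assumptions made for illustration, and the true distance d is assumed to be greater than 0.

```python
import math

def binom_ppf(q, n, p):
    """Smallest x with P(X <= x) >= q for X ~ Binomial(n, p), 0 < p < 1."""
    log_p, log_1mp = math.log(p), math.log1p(-p)
    log_n_fact = math.lgamma(n + 1)
    cdf = 0.0
    for x in range(n + 1):
        # Log-space pmf avoids underflow for large sketch sizes.
        log_pmf = (log_n_fact - math.lgamma(x + 1) - math.lgamma(n - x + 1)
                   + x * log_p + (n - x) * log_1mp)
        cdf += math.exp(log_pmf)
        if cdf >= q:
            return x
    return n

def jaccard_from_distance(d, k):
    """Invert D = -(1/k) ln(2j / (1 + j)) to get the Jaccard index j."""
    return 1.0 / (2.0 * math.exp(k * d) - 1.0)

def distance_from_jaccard(j, k):
    """Mash distance for Jaccard index j and k-mer size k (requires j > 0)."""
    return -math.log(2.0 * j / (1.0 + j)) / k

def mash_error_bound(sketch_size, d, k=21, prob=0.99):
    """Hypothetical helper: largest deviation from d inside the central
    `prob` interval, or None when the interval includes zero shared k-mers."""
    j = jaccard_from_distance(d, k)
    tail = (1.0 - prob) / 2.0                 # assumed two-sided split
    lo = binom_ppf(tail, sketch_size, j)      # fewest shared k-mers
    hi = binom_ppf(1.0 - tail, sketch_size, j)  # most shared k-mers
    if lo == 0:
        return None                           # distance undefined: unbounded
    d_max = distance_from_jaccard(lo / sketch_size, k)
    d_min = distance_from_jaccard(hi / sketch_size, k)
    return max(d_max - d, d - d_min)

if __name__ == "__main__":
    for s in (100, 500, 1000):
        print(s, [mash_error_bound(s, d) for d in (0.05, 0.10, 0.15)])
```

Under these assumptions, the sketch-size-100 row should come out close to the tabulated 0.0271 and 0.0868, with no bound at a distance of 0.15. The quantile search is linear in the sketch size, which is adequate for the sizes in the table but could be replaced by a library routine such as SciPy's `scipy.stats.binom.ppf` if that dependency is available.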