Table 2 Evaluation of Mash distance and AAF distance on the Simulate-Bact dataset with distance threshold 0.001

From: RabbitTClust: enabling fast clustering analysis of millions of bacteria genomes with MinHash sketches

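| Method | Sketch size | Time (s) | Memory (GB) | NOC^a | NMI |
|---|---|---|---|---|---|
| Mash distance with fixed sketch size | 1000 | 2.220 | 1.31 | 216 | 0.568 |
| | 5000 | 2.594 | 1.32 | 212 | 0.569 |
| | 10,000 | 3.135 | 1.34 | 211 | 0.570 |
| AAF distance with variable sketch size | \(\text{length} \times (1/10{,}000)\) | 2.057 | 1.27 | 31 | 0.899 |
| | \(\text{length} \times P_{d}\)^b | 2.199 | 1.30 | 26 | 0.911 |
| | \(\text{length} \times (1/1{,}000)\) | 2.655 | 1.32 | 12 | 0.983 |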

  1. ^a NOC: number of clusters
  2. ^b \(P_d\): default sampling proportion, which is 1/6969 on this dataset
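The variable-size rows set each genome's sketch size to its length multiplied by a sampling proportion, whereas the fixed-size rows give every genome the same number of hashes. A minimal sketch of the variable rule follows, assuming that "length" means genome length in bases and that the product is rounded down with a floor of one hash; the function name `variable_sketch_size` and the two example genome lengths are hypothetical and are not taken from RabbitTClust.

```python
from fractions import Fraction

# Sampling proportions from the table; 1/6969 is the default P_d
# reported for the Simulate-Bact dataset.
PROPORTIONS = {
    "1/10,000": Fraction(1, 10_000),
    "P_d": Fraction(1, 6_969),
    "1/1,000": Fraction(1, 1_000),
}


def variable_sketch_size(genome_length: int, proportion: Fraction) -> int:
    """Number of hashes kept for one genome under the variable-size rule.

    Assumption (not from the source): length * proportion is rounded down,
    with a minimum of one hash so that very short sequences still get a sketch.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    # Fraction arithmetic avoids floating-point error; int() truncates,
    # which equals floor for positive values.
    return max(1, int(genome_length * proportion))


if __name__ == "__main__":
    # Hypothetical genome lengths (bases), chosen only for illustration.
    for length in (1_000_000, 5_000_000):
        sizes = {name: variable_sketch_size(length, p) for name, p in PROPORTIONS.items()}
        print(length, sizes)
```

Under the fixed strategy, both example genomes would receive the same 1000, 5000, or 10,000 hashes; under the variable rule, the 5 Mbp genome receives five times as many hashes as the 1 Mbp genome (for example 500 versus 100 at a proportion of 1/10,000).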