Table 1 Performance of read count and unique k-mer thresholds at genus and species rank on 10 biological and 21 simulated datasets against the three databases ‘orig’, ‘std’ and ‘nt’

From: KrakenUniq: confident and fast metagenomics classification using unique k-mer counts

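| Data Type | Rank | Statistic | orig reads | orig k-mers | orig %diff | std reads | std k-mers | std %diff | nt reads | nt k-mers | nt %diff |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Bio | Genus | Recall | 0.90 | 0.93 | +4.0% | 0.89 | 0.94 | +6.2% | 0.91 | 0.99 | +8.9% |
| Bio | Genus | F1 | 0.95 | 0.96 | +1.8% | 0.95 | 0.97 | +2.6% | 0.96 | 0.99 | +3.4% |
| Bio | Species | Recall | 0.85 | 0.87 | +2.6% | 0.70 | 0.78 | +11.8% | 0.95 | 0.98 | +3.1% |
| Bio | Species | F1 | 0.94 | 0.94 | +0.7% | 0.90 | 0.92 | +2.5% | 0.97 | 0.99 | +1.6% |
| Sim | Genus | Recall | 0.96 | 0.94 | -2.1% | 0.95 | 0.97 | +2.5% | 0.98 | 0.99 | +0.8% |
| Sim | Genus | F1 | 0.98 | 0.98 | -0.0% | 0.98 | 0.98 | +0.3% | 0.99 | 0.99 | +0.3% |
| Sim | Species | Recall | 0.92 | 0.93 | +0.6% | 0.88 | 0.88 | +0.3% | 0.90 | 0.90 | -0.1% |
| Sim | Species | F1 | 0.97 | 0.97 | +0.3% | 0.94 | 0.94 | +0.5% | 0.96 | 0.96 | -0.1% |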

Bold values indicate better performance by at least 1% difference in the test statistic, shown in the third column (%diff) for each database. Unique k-mer count thresholds give up to 10% better recall and F1 scores, particularly for the biological datasets.
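
The %diff column and the 1% rule in the note can be sketched in a few lines of Python. This is only an illustration, under the assumption that %diff is the relative change of the k-mer score over the read-count score; the table does not state the formula. The function names `percent_diff` and `better_threshold` are hypothetical. Because the table reports scores rounded to two decimals, values recomputed this way can differ slightly from the printed %diff (for example about 8.8 here versus the reported +8.9% for Bio, genus, recall on 'nt').

```python
from typing import Optional


def percent_diff(reads_score: float, kmer_score: float) -> float:
    """Relative change (in percent) of the k-mer score over the read-count score.

    Assumption: this is how %diff is defined; the source table does not say.
    """
    return (kmer_score - reads_score) / reads_score * 100.0


def better_threshold(reads_score: float, kmer_score: float,
                     min_diff: float = 1.0) -> Optional[str]:
    """Return which threshold would be highlighted under the 1% rule, if any."""
    diff = percent_diff(reads_score, kmer_score)
    if diff >= min_diff:
        return "k-mers"
    if diff <= -min_diff:
        return "reads"
    return None  # difference below 1%: neither value is highlighted


# Bio, genus, recall on the 'nt' database (rounded values from the table).
print(f"{percent_diff(0.91, 0.99):+.1f}%")
print(better_threshold(0.91, 0.99))
```

Applied to the Sim, genus, recall row on 'orig' (0.96 versus 0.94), the same rule would highlight the read-count value instead, matching the negative %diff in that cell.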