Table 1 Performance of read count and unique k-mer thresholds at genus and species rank on 10 biological and 21 simulated datasets against the three databases ‘orig’, ‘std’ and ‘nt’

From: KrakenUniq: confident and fast metagenomics classification using unique k-mer counts

Data Type  Rank     Statistic  orig                   std                    nt
                               reads  k-mers  %diff   reads  k-mers  %diff   reads  k-mers  %diff
Bio        Genus    Recall     0.90   0.93    +4.0%   0.89   0.94    +6.2%   0.91   0.99    +8.9%
                    F1         0.95   0.96    +1.8%   0.95   0.97    +2.6%   0.96   0.99    +3.4%
           Species  Recall     0.85   0.87    +2.6%   0.70   0.78    +11.8%  0.95   0.98    +3.1%
                    F1         0.94   0.94    +0.7%   0.90   0.92    +2.5%   0.97   0.99    +1.6%
Sim        Genus    Recall     0.96   0.94    -2.1%   0.95   0.97    +2.5%   0.98   0.99    +0.8%
                    F1         0.98   0.98    -0.0%   0.98   0.98    +0.3%   0.99   0.99    +0.3%
           Species  Recall     0.92   0.93    +0.6%   0.88   0.88    +0.3%   0.90   0.90    -0.1%
                    F1         0.97   0.97    +0.3%   0.94   0.94    +0.5%   0.96   0.96    -0.1%
Bold values indicate better performance by at least 1% difference in the test statistic, shown in the %diff column for each database. Unique k-mer count thresholds give up to 10% better recall and F1 scores, particularly for the biological datasets.
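
As a rough illustration of the statistics in this table, the following Python sketch computes recall, F1 and a percentage difference between two thresholds. It is not the authors' evaluation code: the function names are hypothetical, recall and F1 use their standard definitions, and treating %diff as the relative change from the read count value to the k-mer value is an assumption. Applied to the rounded table entries, this gives values close to, but not exactly matching, the published %diff column (for example about +11.4% rather than +11.8% for 0.70 versus 0.78), presumably because the published figures were computed before rounding.

```python
# Minimal sketch; all names are hypothetical and not taken from KrakenUniq.

def precision(tp: int, fp: int) -> float:
    """Fraction of reported taxa that are truly present."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """Fraction of truly present taxa that were reported."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(tp: int, fp: int, fn: int) -> float:
    """Harmonic mean of precision and recall."""
    p = precision(tp, fp)
    r = recall(tp, fn)
    return 2 * p * r / (p + r) if (p + r) else 0.0


def pct_diff(reads_value: float, kmers_value: float) -> float:
    """Relative change (in percent) from the read count threshold
    to the unique k-mer threshold. Assumed definition of %diff."""
    return 100.0 * (kmers_value - reads_value) / reads_value


if __name__ == "__main__":
    # Rounded species-level recall on biological data, 'std' database.
    print(f"{pct_diff(0.70, 0.78):+.1f}%")
```
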