KrakenUniq: confident and fast metagenomics classification using unique k-mer counts

Table 3 Using cardinality estimates does not decrease classification performance on the test dataset. KrakenUniq in the default mode (HyperLogLog cardinality estimation with precision 14) classifies reads as accurately as KrakenUniq with exact counting, at both the species and genus levels. Only genus-level results are shown in the table, which also includes Kraken’s performance for comparison. We tested two versions of exact counting. Version 1 implements exact counting with the C++ standard library’s unordered_set; most of its time is spent merging counters at the end for report generation. Version 2 implements exact counting with khash from klib (https://github.com/attractivechaos/klib/). KrakenUniq uses version 2. Both the unordered sets and the hash map require heap allocations when they are updated, which can incur a significant runtime cost because of global locks. Wall clock time for KrakenUniq includes report generation (which takes an additional 2m33s for Kraken).

	Kraken	KrakenUniq
		Default	Exact(1)	Exact(2)
Computational performance
Wall clock time³	17m38s	14m18s	3h30m6s	45m30s
Speed [Mbp/min]	478.4	595.4	95.9	377.8
Memory [GB]	167.1	168.2	466.2	272.4
Minor page faults × 10⁶	203.5	192.2	272.5	904.6
Classification performance
Recall	0.827	0.888	0.888	0.888
F1 score	0.922	0.935	0.935	0.935
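
The trade-off in Table 3 between estimated and exact distinct counting can be illustrated with a minimal Python sketch. It assumes the classic HyperLogLog estimator with the small-range (linear counting) correction, 2^14 one-byte registers, and a 64-bit BLAKE2b hash. The names HyperLogLog and compare_counts are hypothetical, and this is not KrakenUniq's implementation, which is written in C++ and may use a different hash function and estimator.

```python
import hashlib
import math


class HyperLogLog:
    """Illustrative HyperLogLog counter (assumption: classic raw estimator)."""

    def __init__(self, p=14):
        self.p = p                      # precision: number of index bits
        self.m = 1 << p                 # number of registers (2^14 = 16384)
        self.registers = bytearray(self.m)

    def add(self, item: bytes) -> None:
        # 64-bit hash of the item (hash choice is an assumption of this sketch)
        h = int.from_bytes(hashlib.blake2b(item, digest_size=8).digest(), "big")
        idx = h >> (64 - self.p)                    # top p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)       # remaining 64 - p bits
        # rank = 1-based position of the leftmost 1 bit in the remaining bits
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def estimate(self) -> float:
        m = self.m
        alpha = 0.7213 / (1 + 1.079 / m)            # bias constant for m >= 128
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros:
            # small-range correction: fall back to linear counting
            return m * math.log(m / zeros)
        return raw


def compare_counts(items):
    """Return (exact distinct count, HyperLogLog estimate) for an iterable of bytes."""
    hll = HyperLogLog(p=14)
    exact = set()                       # exact counting: memory grows with input
    for item in items:
        hll.add(item)
        exact.add(item)
    return len(exact), hll.estimate()
```

With precision 14, the sketch holds 16,384 registers (16 KiB in this representation) no matter how many items are added, and the expected relative standard error is about 1.04/sqrt(2^14), roughly 0.8%. The exact set, by contrast, grows with the number of distinct items, which is consistent with the higher memory use of the exact-counting columns in the table.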

ISSN: 1474-760X