Table 3 Using cardinality estimates does not decrease classification performance on the test dataset. KrakenUniq in the default mode, using HyperLogLog cardinality estimation with precision 14, classifies reads as accurately as KrakenUniq using exact counting, on both the species and genus level. (Only the genus level is shown in the table, which also shows Kraken’s performance for comparison.) Note that we tested two versions of exact counting. In version 1, we implemented exact counting using the C++ standard library’s unordered_set; most of the time is spent merging counters at the end for report generation. In version 2, we implemented exact counting using khash from klib (https://github.com/attractivechaos/klib/). KrakenUniq uses version 2. Both the unordered sets and the hash map require heap allocations when they are updated, which can incur a significant runtime cost because of global locks. Wall clock time for KrakenUniq includes report generation (which takes an additional 2m33s for Kraken).

From: KrakenUniq: confident and fast metagenomics classification using unique k-mer counts

| | Kraken | KrakenUniq Default | KrakenUniq Exact(1) | KrakenUniq Exact(2) |
|---|---|---|---|---|
| Computational performance | | | | |
| Wall clock time3 | 17m38s | 14m18s | 3h30m6s | 45m30s |
| Speed [Mbp/m] | 478.4 | 595.4 | 95.9 | 377.8 |
| Memory [GB] | 167.1 | 168.2 | 466.2 | 272.4 |
| Minor page faults × 10^6 | 203.5 | 192.2 | 272.5 | 904.6 |
| Classification performance | | | | |
| Recall | 0.827 | 0.888 | 0.888 | 0.888 |
| F1 score | 0.922 | 0.935 | 0.935 | 0.935 |
  1. Bold values indicate the highest or lowest values in each row
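
The two counting strategies compared in the caption can be illustrated with a minimal C++ sketch. This is not KrakenUniq's code: `ExactCounter`, `HyperLogLog`, and `mix64` are hypothetical names, the exact counter follows the unordered_set idea of version 1 (not the khash-based version 2), and the estimator is the classic HyperLogLog formula with a small-range correction, which may differ from the estimator KrakenUniq actually uses. K-mers are assumed to be already encoded as 64-bit integers.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <unordered_set>
#include <vector>

// Exact counting: store every distinct k-mer (assumed encoded as uint64_t).
struct ExactCounter {
    std::unordered_set<std::uint64_t> kmers;
    void add(std::uint64_t kmer) { kmers.insert(kmer); }  // may allocate on the heap
    void merge(const ExactCounter& other) {               // cost grows with set size
        kmers.insert(other.kmers.begin(), other.kmers.end());
    }
    std::size_t count() const { return kmers.size(); }
};

// 64-bit finalizer from MurmurHash3, used here only to scramble k-mer codes.
inline std::uint64_t mix64(std::uint64_t x) {
    x ^= x >> 33; x *= 0xff51afd7ed558ccdULL;
    x ^= x >> 33; x *= 0xc4ceb9fe1a85ec53ULL;
    x ^= x >> 33;
    return x;
}

// Classic HyperLogLog with 2^p one-byte registers (p = 14 gives 16 KiB).
class HyperLogLog {
public:
    explicit HyperLogLog(unsigned precision = 14)
        : p_(precision), regs_(std::size_t{1} << precision, std::uint8_t{0}) {
        assert(precision >= 7 && precision <= 18);
    }
    void add(std::uint64_t kmer) {
        const std::uint64_t h = mix64(kmer);
        const std::size_t idx = static_cast<std::size_t>(h >> (64 - p_));  // first p bits
        std::uint64_t rest = h << p_;                                      // remaining bits
        std::uint8_t rank = 1;  // position of the first 1-bit in the remaining bits
        while (rank <= 64 - p_ && (rest >> 63) == 0) { rest <<= 1; ++rank; }
        regs_[idx] = std::max(regs_[idx], rank);
    }
    void merge(const HyperLogLog& other) {  // fixed cost: one pass over the registers
        assert(p_ == other.p_);
        for (std::size_t i = 0; i < regs_.size(); ++i)
            regs_[i] = std::max(regs_[i], other.regs_[i]);
    }
    double estimate() const {
        const double m = static_cast<double>(regs_.size());
        double sum = 0.0;
        std::size_t zeros = 0;
        for (std::uint8_t r : regs_) {
            sum += std::ldexp(1.0, -static_cast<int>(r));  // 2^(-r)
            if (r == 0) ++zeros;
        }
        const double alpha = 0.7213 / (1.0 + 1.079 / m);
        double e = alpha * m * m / sum;
        if (e <= 2.5 * m && zeros > 0)  // small-range correction (linear counting)
            e = m * std::log(m / static_cast<double>(zeros));
        return e;
    }
private:
    unsigned p_;
    std::vector<std::uint8_t> regs_;
};
```

In this sketch the HyperLogLog counter never allocates after construction and merges in a single pass over its registers, whereas the exact counter grows with the number of distinct k-mers and pays for every insertion during a merge. With precision 14, the expected relative standard error of the classic estimator is about 1.04/√(2^14), or roughly 0.8%.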