
Table 3 Merqury runtime, memory, and disk requirements for QV estimation in a human genome

From: Merqury: reference-free quality, completeness, and phasing assessment for genome assemblies

 

| | Merqury: Count | Merqury: Union-sum | Merqury: QV | Mash: QV |
|---|---|---|---|---|
| CPUs × nodes | 32 × 24 | 48 × 1 | 24 × 1 | 24 × 1 |
| Wall clock time | 6 m 52 s/node | 7 m 43 s | 14 m 13 s | 3 h 36 m 17 s |
| CPU time | 9.1 h | 4.7 h | 1.1 h | 19.0 h |
| Memory | 21.2 G | 7.0 G | 10.56 G | 2.6 G |
| Storage | 90 G (fastq.gz) | N/A | 48 G | 90 G (fastq.gz) |
| Intermediates | 1.8 G × 24 | 48 G | 25.5 G | 23.1 M |
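
The time rows of Table 3 mix units (hours, minutes, seconds), so comparing them is easier after converting to a common unit. The following is a minimal sketch of that arithmetic, using only values copied from the table. The helper name `to_seconds` and the dictionary names are hypothetical, chosen for illustration. Note that the Merqury QV step depends on the Count and Union-sum steps having been run first, so the QV wall-clock times alone are not a complete end-to-end comparison.

```python
import re

# Seconds per unit, matching the table's abbreviations (h, m, s).
UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}


def to_seconds(duration: str) -> int:
    """Convert a duration such as '3 h 36 m 17 s' into seconds."""
    parts = re.findall(r"(\d+)\s*([hms])", duration)
    if not parts:
        raise ValueError(f"unrecognized duration: {duration!r}")
    return sum(int(value) * UNIT_SECONDS[unit] for value, unit in parts)


# CPU time in hours, copied from the table.
cpu_hours = {
    "Merqury Count": 9.1,
    "Merqury Union-sum": 4.7,
    "Merqury QV": 1.1,
    "Mash QV": 19.0,
}

# Wall clock time for the two QV steps, copied from the table.
qv_wall_clock = {
    "Merqury QV": "14 m 13 s",
    "Mash QV": "3 h 36 m 17 s",
}

# Total CPU time over all three Merqury steps.
merqury_cpu_hours = sum(
    hours for step, hours in cpu_hours.items() if step.startswith("Merqury")
)

qv_wall_seconds = {step: to_seconds(t) for step, t in qv_wall_clock.items()}

print(f"Merqury total CPU time: {merqury_cpu_hours:.1f} h")
print(f"Mash QV CPU time: {cpu_hours['Mash QV']:.1f} h")
for step, seconds in qv_wall_seconds.items():
    print(f"{step} wall clock: {seconds} s")
```

Summed this way, the three Merqury steps use 14.9 CPU hours in total, compared with 19.0 CPU hours for the Mash QV estimate.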

  1. All statistics are for the diploid assembly (maternal, paternal, and combined) of the human genome NA12878, and results are totaled over the three QV operations (maternal, paternal, and combined). Merqury QV estimates use exact k-mer counts from the full k-mer databases: the Count and Union-sum steps count all k-mers in the reads, and the QV step counts k-mers in the assembly and compares them with the read counts. Mash QV estimates are generated with Mash Screen, which builds a MinHash k-mer sketch of the assembly and streams all reads against it. Runtimes were measured on an Intel(R) Xeon(R) Gold 6140 CPU at 2.30 GHz. Storage requirements represent gzipped FASTQ files for counting and QV (Mash), and a binary database for QV (Meryl).
  2. h hours, m minutes, s seconds, G gigabytes, M megabytes
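
The footnote describes the Merqury QV step as counting k-mers in the assembly and comparing them with the read counts. A minimal sketch of how such a comparison can be turned into a QV is given below. It assumes the k-mer survival model from the Merqury paper's methods, which is not restated in this table: a k-mer found in the assembly but absent from the reads is treated as evidence of at least one erroneous base. The function name, argument names, and example counts are hypothetical and are not taken from the Merqury code base.

```python
import math


def kmer_qv(asm_only_kmers: int, total_asm_kmers: int, k: int) -> float:
    """Estimate a Phred-scaled consensus quality value (QV) from k-mer counts.

    asm_only_kmers:  k-mers found in the assembly but not in the reads
    total_asm_kmers: all k-mers found in the assembly
    k:               k-mer size used for counting
    """
    if k <= 0:
        raise ValueError("k must be positive")
    if total_asm_kmers <= 0 or not 0 <= asm_only_kmers <= total_asm_kmers:
        raise ValueError("need 0 <= asm_only_kmers <= total_asm_kmers, total > 0")
    if asm_only_kmers == 0:
        # No unsupported k-mers: the error estimate is zero, so QV is unbounded.
        return math.inf

    # Probability that a single base is correct, assuming a k-mer is
    # supported by the reads only if all k of its bases are correct.
    p_base_correct = (1 - asm_only_kmers / total_asm_kmers) ** (1 / k)
    error_rate = 1 - p_base_correct

    # Phred scaling: QV = -10 * log10(error rate).
    return -10 * math.log10(error_rate)


# Hypothetical counts, for illustration only (not data from the table).
example_qv = kmer_qv(asm_only_kmers=21, total_asm_kmers=1_000_000, k=21)
print(f"Estimated QV: {example_qv:.2f}")
```

Because every k-mer must be looked up exactly, this kind of estimate needs the full k-mer databases listed under Storage and Intermediates, whereas a MinHash sketch trades exactness for a much smaller intermediate file.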