Table 4 Time- and memory-performance of CUTTLEFISH 2 for constructing the compacted de Bruijn graph from the human read set NIST HG004, and some corresponding metrics of the output maximal unitigs, over a range of k-mer sizes

From: Scalable, ultra-fast, and low-memory construction of compacted de Bruijn graphs with Cuttlefish 2

  

| k | k-mer count | Performance: default memory | Performance: unrestricted memory | Unitigs: count | Unitigs: avg. length (bp) | Unitigs: max. length (bp) | Unitigs: N50 (bp) | Unitigs: NGA50 (bp) |
|---|---|---|---|---|---|---|---|---|
| 27 | 2,547,479,119 | 1 h 12 min (3.19) | 54 min (11.29) | 80,465,421 | 58 | 20,648 | 62 | 425 |
| 41 | 2,771,918,177 | 2 h 19 min (3.48) | 1 h 05 min (11.26) | 44,768,246 | 102 | 29,381 | 186 | 769 |
| 55 | 2,900,387,834 | 2 h 12 min (3.54) | 1 h 04 min (11.28) | 28,510,532 | 156 | 32,725 | 386 | 1,030 |
| 69 | 2,978,629,926 | 2 h 42 min (3.66) | 1 h 11 min (19.49) | 20,361,009 | 214 | 45,495 | 552 | 1,256 |
| 83 | 3,029,739,673 | 2 h 39 min (3.68) | 1 h 04 min (22.34) | 16,220,627 | 269 | 45,359 | 645 | 1,435 |
| 97 | 3,066,350,056 | 3 h 05 min (3.78) | 1 h 06 min (30.57) | 13,938,567 | 316 | 57,338 | 675 | 1,543 |
| 111 | 3,093,353,953 | 2 h 53 min (3.75) | 1 h 08 min (32.18) | 12,683,849 | 354 | 57,402 | 660 | 1,596 |
| 125 | 3,111,450,986 | 3 h 01 min (3.80) | 1 h 16 min (42.18) | 11,855,026 | 386 | 57,416 | 634 | 1,617 |

  1. In the performance-metrics columns, running times are wall-clock times, and maximum memory usages, in gigabytes, are given in parentheses. The frequency threshold f_0 for the (k+1)-mers is fixed at 5. All executions use 8 threads. The execution modes (default-memory and unrestricted-memory) for CUTTLEFISH 2 are set as described in Table 1. NGA50 is calculated with the tool abyss-samtobreak, after aligning the output contigs to the reference genome GRCh38 using BWA-MEM [69].
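For readers who want to compare the two execution modes numerically, the performance cells can be split into a wall-clock time and a peak-memory figure. The following is a minimal sketch, assuming each cell follows the "H h MM min (GB)" layout used in the table; the helper name `parse_cell` and the derived ratios are illustrative only and are not part of CUTTLEFISH 2 or the original analysis.

```python
import re

# Assumed cell layout, taken from the table: an optional "H h" part,
# a mandatory "MM min" part, and the peak memory in GB in parentheses.
CELL = re.compile(r"^(?:(\d+)\s*h\s*)?(\d+)\s*min\s*\(([\d.]+)\)$")


def parse_cell(cell):
    """Return (wall-clock minutes, peak memory in GB) for one table cell.

    Hypothetical helper written for this illustration.
    """
    match = CELL.match(cell.strip())
    if match is None:
        raise ValueError(f"unrecognized cell: {cell!r}")
    hours, minutes, gigabytes = match.groups()
    return int(hours or 0) * 60 + int(minutes), float(gigabytes)


# Values for k = 27, copied from the table.
default_min, default_gb = parse_cell("1 h 12 min (3.19)")
unrestricted_min, unrestricted_gb = parse_cell("54 min (11.29)")

# How much faster the unrestricted-memory mode ran, and at what memory cost.
speedup = default_min / unrestricted_min
memory_ratio = unrestricted_gb / default_gb
print(f"k=27: {speedup:.2f}x faster using {memory_ratio:.2f}x the memory")
```

Applying the same helper to every row gives a per-k view of the time and memory trade-off between the default-memory and unrestricted-memory modes.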