
Table 2 Time- and memory-performance results for constructing compacted de Bruijn graphs from whole-genome reference collections

From: Scalable, ultra-fast, and low-memory construction of compacted de Bruijn graphs with Cuttlefish 2

   

| Dataset (genome count) | k | Thread-count | Bifrost | deGSM | BCALM 2 | Cuttlefish 2 (default memory) | Cuttlefish 2 (unrestricted memory) |
|---|---|---|---|---|---|---|---|
| Human gut (30K) | 27 | 8 | 06 h (155.1) | Δ | 10 h 06 min (21.5) | 01 h 39 min (15.2) | 01 h 39 min (32.5) |
| Human gut (30K) | 27 | 16 | 05 h 30 min (155.1) | Δ | 09 h 05 min (22.0) | 01 h 01 min (15.5) | 59 min (32.5) |
| Human gut (30K) | 55 | 8 | 08 h 47 min (279.2) | Δ | 11 h 49 min (18.6) | 04 h 14 min (20.6) | 03 h 42 min (44.4) |
| Human gut (30K) | 55 | 16 | 08 h 20 min (279.2) | Δ | 09 h 45 min (19.2) | 03 h 50 min (20.9) | 03 h 10 min (44.3) |
| Human (100) | 27 | 8 | 35 h 45 min (355.9) | 19 h 23 min (235.8) | ‡ | 04 h 32 min (27.7) | 04 h 09 min (59.7) |
| Human (100) | 27 | 16 | 32 h 14 min (355.9) | 14 h 07 min (235.8) | ‡ | 03 h 19 min (28.1) | 02 h 49 min (59.7) |
| Human (100) | 55 | 8 | ∗ | † | 2 days 23 h 31 min (302.9) | 15 h 08 min (56.0) | 13 h 47 min (121.8) |
| Human (100) | 55 | 16 | ∗ | † | ∗ | 12 h (56.2) | 11 h 33 min (121.8) |
| Bacterial archive (661K) | 27 | 16 | X | X | ‡ | 16 h 38 min (48.7) | 16 h 24 min (104.9) |
| Bacterial archive (661K) | 55 | 16 | X | X | 4 days 10 h 11 min (63.3) | 22 h 44 min (59.9) | 22 h 20 min (129.5) |

  1. Each cell gives the wall-clock running time, followed by the maximum memory usage in gigabytes in parentheses. Since all the inputs are genomic sequences, the frequency threshold f₀ is set to 1 for all the tools. The relevant execution details, i.e., the policy for setting the maximum memory usage (and the maximum disk usage, if applicable) for deGSM, BCALM 2, and Cuttlefish 2, are the same as described in Table 1.
  2. The best performance with respect to each metric in each row is highlighted; for Cuttlefish 2, only the default-memory mode is considered for this purpose. The ∗’s and the †’s denote executions that failed to complete due to a hardware shortage of memory and disk space, respectively. The ‡’s in the BCALM 2 cells denote abnormal terminations that reported a logic error. The Δ in the deGSM cells for the human gut genomes dataset indicates that the deGSM executions were stuck in an intermediate stage indefinitely; they were allowed to run for at least 2 days before being explicitly terminated. For the bacterial archive, we did not execute Bifrost and deGSM (denoted with the X’s), as we anticipated that insufficient resources would be available for these executions, given their resource usage on the smaller datasets. Additional file 1: Table S3 also reports the intermediate disk usage incurred by the tools, besides time and memory.
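
For readers who want to compare cells numerically, the cell format described in note 1 (wall-clock time, then peak memory in gigabytes in parentheses) can be parsed with a small helper. This is a minimal sketch and not part of the paper: the function name `parse_cell` and the regular expression are assumptions introduced here, and the failure markers (∗, †, ‡, Δ, X) are simply mapped to `None`.

```python
import re

# Hypothetical helper (not from the paper): matches cells such as
# "2 days 23 h 31 min (302.9)", "06 h (155.1)", or "59 min (32.5)".
_CELL = re.compile(
    r"^(?:(?P<d>\d+)\s*days?\s*)?"   # optional day count
    r"(?:(?P<h>\d+)\s*h\s*)?"        # optional hour count
    r"(?:(?P<m>\d+)\s*min\s*)?"      # optional minute count
    r"\((?P<gb>\d+(?:\.\d+)?)\)$"    # peak memory in gigabytes
)


def parse_cell(cell):
    """Return (wall-clock minutes, peak memory in GB), or None for a failure marker."""
    match = _CELL.match(cell.strip())
    # Reject cells with no time component at all (e.g. a bare marker).
    if match is None or not any(match.group(g) for g in ("d", "h", "m")):
        return None
    days = int(match.group("d") or 0)
    hours = int(match.group("h") or 0)
    minutes = int(match.group("m") or 0)
    return days * 24 * 60 + hours * 60 + minutes, float(match.group("gb"))
```

For example, in the human gut row with k = 27 and 8 threads, `parse_cell("10 h 06 min (21.5)")` and `parse_cell("01 h 39 min (15.2)")` give 606 and 99 minutes respectively, so BCALM 2 took roughly 6.1 times as long as Cuttlefish 2 in its default-memory mode for that configuration.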