Table 2 Statistics of nonhuman assemblies

From: NextDenovo: an efficient error correction and accurate assembly tool for noisy long reads

| Sample | Software | Assembly size (Mb) | NG50 (Mb)/LG50 | NGA50 (Mb)/LGA50 | No. of misassemblies | QV | Gene completeness (%) | Wall clock time (hour) |
|---|---|---|---|---|---|---|---|---|
| A. thaliana (452X) | NextDenovo | 128.37 | 15.18/5 | 15.18/5 | 19 | 33.25 | 99.20 | 6.83 |
| | Necat | 124.55 | 15.01/5 | 14.98/5 | 44 | 31.93 | 99.20 | 6.82 |
| | Canu | 138.29 | 9.31/5 | 9.31/6 | 430 | 25.09 | 99.20 | 312.13 |
| | Flye | 121.16 | 14.63/5 | 14.63/5 | 17 | 35.65 | 99.20 | 12.00 |
| | Wtdbg2 | 157.75 | 2.68/14 | 1.87/19 | 326 | 19.78 | 94.80 | 2.10 |
| D. melanogaster (62X) | NextDenovo | 134.34 | 18.11/4 | 15.68/4 | 196 | 30.99 | 98.70 | 1.07 |
| | Necat | 144.01 | 19.55/4 | 15.90/4 | 1,200 | 25.86 | 98.70 | 2.45 |
| | Canu | 154.94 | 8.58/6 | 5.68/7 | 1,738 | 23.53 | 98.80 | 45.55 |
| | Flye | 135.82 | 18.89/4 | 17.32/4 | 335 | 29.97 | 98.80 | 1.58 |
| | Wtdbg2 | 137.49 | 6.32/7 | 5.33/9 | 919 | 26.07 | 97.20 | 0.57 |
| O. sativa (230X) | NextDenovo | 392.56 | 30.55/6 | 18.00/9 | 81 | 26.45 | 98.60 | 13.05 |
| | Necat | 394.40 | 25.44/7 | 17.86/9 | 183 | 25.83 | 98.70 | 10.85 |
| | Canu | 395.23 | 11.57/13 | 9.41/15 | 204 | 24.94 | 98.70 | 728.78 |
| | Flye | 403.45 | 11.10/14 | 7.84/18 | 115 | 24.76 | 98.70 | 25.02 |
| | Wtdbg2 | 488.33 | 0.96/88 | 0.81/95 | 553 | 17.90 | 94.10 | 5.85 |
| Z. mays (51X) | NextDenovo | 2,118.82 | 44.44/17 | 37.90/21 | 700 | 20.74 | 98.20 | 75.90 |
| | Necat | 2,171.54 | 22.76/32 | 17.71/38 | 3,307 | 20.41 | 98.20 | 87.87 |
| | Canu | 2,240.87 | 0.65/950 | 0.62/995 | 6,284 | 19.14 | 98.10 | 1,741.77 |
| | Flye | 2,122.73 | 2.87/222 | 2.59/242 | 863 | 20.63 | 98.20 | - |
| | Wtdbg2 | 4,068.86 | 0.07/11298 | 0.05/13848 | 22,258 | 14.07 | 97.00 | - |

  1. NG50 is the length N such that contigs of length ≥ N cover 50% of the reference genome. LG50 is the number of contigs with length ≥ NG50. NGA50 is the NG50 of the aligned blocks obtained by breaking contigs at misassembly events and removing all unaligned bases. LGA50 is the number of aligned blocks with length ≥ NGA50. Misassemblies and QV were evaluated by QUAST, where QV is defined as \(-10\times \log_{10}\left(\frac{\#\,\text{mismatches per 100 kbp} + \#\,\text{indels per 100 kbp}}{100\ \text{kbp}}\right)\). Gene completeness is represented by the complete BUSCO values. QV and gene completeness were evaluated using the polished assemblies; all other metrics were evaluated using the raw assemblies. The genomes of A. thaliana, D. melanogaster, and O. sativa were assembled on the same computer with 60 CPUs and 504 GB of RAM. The Z. mays genome was assembled by NextDenovo, Necat, and Canu on a computer cluster of 7 nodes, each with 32 CPUs and 256 GB of RAM, and by Flye and Wtdbg2 on a fat computer node. Best results for each metric are highlighted in bold
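
The metric definitions in the footnote can be illustrated with a minimal Python sketch. It is not the QUAST implementation: the function names `ng50_lg50` and `qv_from_error_rates` are hypothetical, contig lengths and the reference genome size are assumed to be given in base pairs, and LG50 follows the footnote's wording literally (the count of contigs with length ≥ NG50), which on ties can differ from counting only up to the contig where 50% coverage is first reached.

```python
import math
from typing import Iterable, Tuple


def ng50_lg50(contig_lengths: Iterable[int], genome_size: int) -> Tuple[int, int]:
    """Return (NG50, LG50) following the footnote's definitions.

    NG50: the length N such that contigs of length >= N cover 50% of
    the reference genome. LG50: the number of contigs with length >= NG50.
    Returns (0, 0) when the contigs cover less than half of the genome.
    """
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    lengths = sorted(contig_lengths, reverse=True)
    target = genome_size / 2
    covered = 0
    for length in lengths:
        covered += length
        if covered >= target:
            # Literal footnote definition: count every contig >= NG50.
            lg50 = sum(1 for other in lengths if other >= length)
            return length, lg50
    return 0, 0


def qv_from_error_rates(mismatches_per_100kbp: float,
                        indels_per_100kbp: float) -> float:
    """QV = -10 * log10((mismatches + indels per 100 kbp) / 100 kbp)."""
    errors = mismatches_per_100kbp + indels_per_100kbp
    if errors <= 0:
        # Assumption of this sketch: no observed errors maps to infinity.
        return math.inf
    return -10 * math.log10(errors / 100_000)


if __name__ == "__main__":
    # Toy numbers for illustration only; they are not taken from Table 2.
    print(ng50_lg50([50, 40, 30, 20, 10], genome_size=200))
    print(qv_from_error_rates(5, 5))
```

Passing the lengths of aligned blocks (contigs broken at misassemblies, with unaligned bases removed) instead of raw contig lengths to the same function would give NGA50 and LGA50 under the same assumptions.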