Table 1 Gap closure results obtained on the bacterial datasets

From: Toward almost closed genomes with GapFiller

| Method | Original | IMAGE | SOAPdenovo | GapFiller | GapFiller-LC |
| --- | --- | --- | --- | --- | --- |
| Escherichia coli | | | | | |
| Genome size (bp) | 4,478,287 | 4,530,961 | 4,490,973 | 4,490,638 | |
| Scaffolds | 179 | 179 | 179 | 179 | |
| Gap count | 544 | 291 | 16 | 11 | |
| Total gap length (bp) | 12,516 | 2,861 | 16 | 130 | |
| Errors (SNPs) | 12 | 40 | 33 | 22 | |
| Errors (indels) | 4 | 17 | 25 | 9 | |
| Errors (misjoins) | 1 | 1 | 1 | 1 | |
| N50 | 50,557 | 50,558 | 50,558 | 50,558 | |
| Streptomyces coelicolor | | | | | |
| Genome size (bp) | 8,558,275 | 8,576,331 | 8,557,720 | 8,558,333 | |
| Scaffolds | 115 | 115 | 115 | 115 | |
| Gap count | 158 | 63 | 60 | 23 | |
| Total gap length (bp) | 9,221 | 4,009 | 1,288 | 806 | |
| Errors (SNPs) | 299 | 423 | 406 | 280 | |
| Errors (indels) | 664 | 677 | 769 | 686 | |
| Errors (misjoins) | 12 | 17 | 18 | 18 | |
| N50 | 173,822 | 173,822 | 173,822 | 173,822 | |
| Staphylococcus aureus | | | | | |
| Genome size (bp) | 2,880,676 | | 2,880,926 | 2,881,756 | 2,883,448 |
| Scaffolds | 19 | | 19 | 19 | 19 |
| Gap count | 48 | | 27 | 27 | 22 |
| Total gap length (bp) | 9,900 | | 1,547 | 5,508 | 1,861 |
| Errors (SNPs) | 79 | | 260 | 98 | 173 |
| Errors (indels) | 16 | | 53 | 26 | 37 |
| Errors (misjoins) | 4 | | 13 | 7 | 5 |
| N50 | 1,091,731 | | 1,091,333 | 1,092,281 | 1,092,421 |
| Rhodobacter sphaeroides | | | | | |
| Genome size (bp) | 4,609,785 | | 4,609,466 | 4,609,596 | 4,610,796 |
| Scaffolds | 38 | | 38 | 38 | 38 |
| Gap count | 170 | | 163 | 161 | 139 |
| Total gap length (bp) | 21,409 | | 14,166 | 20,667 | 17,625 |
| Errors (SNPs) | 218 | | 410 | 230 | 300 |
| Errors (indels) | 187 | | 294 | 190 | 199 |
| Errors (misjoins) | 6 | | 10 | 6 | 7 |
| N50 | 3,192,334 | | 3,192,075 | 3,192,215 | 3,192,974 |

Gap closure results obtained on four bacterial datasets show that the GapFiller strategy yields the most accurate finished genomes and a lower gap count than the other methods. The IMAGE method significantly underperforms on all quality measures and would therefore not be the preferred method. Differences between GapFiller and SOAPdenovo are smaller. Interestingly, whereas the gap count after closure is generally lower for GapFiller, SOAPdenovo yields a shorter total gap length in three of the four cases, which suggests that SOAPdenovo is able to close larger gaps. Strikingly, however, the number of errors is significantly higher for SOAPdenovo regardless of the error type (SNPs, indels and misjoins). Even when less strict settings are applied for GapFiller (GapFiller-LC: minimum coverage o = 1, ratio r = 0.5) to shorten the total gap length, our method still yields significantly fewer errors.
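The comparison of gap count and total gap length can be made concrete with a little arithmetic. The following is a minimal sketch, not part of the original analysis: it copies the gap counts and total gap lengths from Table 1 and expresses each method's result as a percentage reduction relative to the original scaffolds. The names `TABLE`, `percent_reduction`, `summarize` and `shorter_gap_length_cases` are hypothetical helpers introduced only for illustration.

```python
# Gap count and total gap length (bp) copied from Table 1.
# Each value is a tuple: (gap count, total gap length in bp).
# Methods with no reported values for a dataset are simply omitted.
TABLE = {
    "Escherichia coli": {
        "Original": (544, 12516),
        "IMAGE": (291, 2861),
        "SOAPdenovo": (16, 16),
        "GapFiller": (11, 130),
    },
    "Streptomyces coelicolor": {
        "Original": (158, 9221),
        "IMAGE": (63, 4009),
        "SOAPdenovo": (60, 1288),
        "GapFiller": (23, 806),
    },
    "Staphylococcus aureus": {
        "Original": (48, 9900),
        "SOAPdenovo": (27, 1547),
        "GapFiller": (27, 5508),
        "GapFiller-LC": (22, 1861),
    },
    "Rhodobacter sphaeroides": {
        "Original": (170, 21409),
        "SOAPdenovo": (163, 14166),
        "GapFiller": (161, 20667),
        "GapFiller-LC": (139, 17625),
    },
}


def percent_reduction(before, after):
    """Percentage by which `after` is lower than `before`."""
    return 100.0 * (before - after) / before


def summarize(table):
    """Map dataset -> method -> (% gaps closed, % gap length removed)."""
    summary = {}
    for dataset, methods in table.items():
        orig_gaps, orig_len = methods["Original"]
        summary[dataset] = {
            method: (
                percent_reduction(orig_gaps, gaps),
                percent_reduction(orig_len, length),
            )
            for method, (gaps, length) in methods.items()
            if method != "Original"
        }
    return summary


def shorter_gap_length_cases(table, a="SOAPdenovo", b="GapFiller"):
    """Datasets where method `a` leaves a shorter total gap length than `b`."""
    return [d for d, m in table.items() if m[a][1] < m[b][1]]


if __name__ == "__main__":
    for dataset, methods in summarize(TABLE).items():
        print(dataset)
        for method, (gaps_pct, len_pct) in methods.items():
            print(f"  {method:<13} gaps closed: {gaps_pct:5.1f}%  "
                  f"gap length removed: {len_pct:5.1f}%")
    print("SOAPdenovo shorter than GapFiller in:",
          shorter_gap_length_cases(TABLE))
```

Running the script prints one summary line per method and dataset, followed by the datasets in which SOAPdenovo leaves a shorter total gap length than GapFiller. Error counts are deliberately left out of this sketch; they would need to be weighed separately when comparing the methods.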