Table 1 Gap closure results obtained on the bacterial datasets

From: Toward almost closed genomes with GapFiller

| Method | Original | IMAGE | SOAPdenovo | GapFiller | GapFiller-LC |
|---|---|---|---|---|---|
| Escherichia coli | | | | | |
| Genome size (bp) | 4,478,287 | 4,530,961 | 4,490,973 | 4,490,638 | |
| Scaffolds | 179 | 179 | 179 | 179 | |
| Gap count | 544 | 291 | 16 | 11 | |
| Total gap length (bp) | 12,516 | 2,861 | 16 | 130 | |
| Errors (SNPs) | 12 | 40 | 33 | 22 | |
| Errors (indels) | 4 | 17 | 25 | 9 | |
| Errors (misjoins) | 1 | 1 | 1 | 1 | |
| N50 (bp) | 50,557 | 50,558 | 50,558 | 50,558 | |
| Streptomyces coelicolor | | | | | |
| Genome size (bp) | 8,558,275 | 8,576,331 | 8,557,720 | 8,558,333 | |
| Scaffolds | 115 | 115 | 115 | 115 | |
| Gap count | 158 | 63 | 60 | 23 | |
| Total gap length (bp) | 9,221 | 4,009 | 1,288 | 806 | |
| Errors (SNPs) | 299 | 423 | 406 | 280 | |
| Errors (indels) | 664 | 677 | 769 | 686 | |
| Errors (misjoins) | 12 | 17 | 18 | 18 | |
| N50 (bp) | 173,822 | 173,822 | 173,822 | 173,822 | |
| Staphylococcus aureus | | | | | |
| Genome size (bp) | 2,880,676 | | 2,880,926 | 2,881,756 | 2,883,448 |
| Scaffolds | 19 | | 19 | 19 | 19 |
| Gap count | 48 | | 27 | 27 | 22 |
| Total gap length (bp) | 9,900 | | 1,547 | 5,508 | 1,861 |
| Errors (SNPs) | 79 | | 260 | 98 | 173 |
| Errors (indels) | 16 | | 53 | 26 | 37 |
| Errors (misjoins) | 4 | | 13 | 7 | 5 |
| N50 (bp) | 1,091,731 | | 1,091,333 | 1,092,281 | 1,092,421 |
| Rhodobacter sphaeroides | | | | | |
| Genome size (bp) | 4,609,785 | | 4,609,466 | 4,609,596 | 4,610,796 |
| Scaffolds | 38 | | 38 | 38 | 38 |
| Gap count | 170 | | 163 | 161 | 139 |
| Total gap length (bp) | 21,409 | | 14,166 | 20,667 | 17,625 |
| Errors (SNPs) | 218 | | 410 | 230 | 300 |
| Errors (indels) | 187 | | 294 | 190 | 199 |
| Errors (misjoins) | 6 | | 10 | 6 | 7 |
| N50 (bp) | 3,192,334 | | 3,192,075 | 3,192,215 | 3,192,974 |
Gap closure results obtained on four bacterial datasets show that the GapFiller strategy yields the most accurate finished genomes. Its gap count is also lower than that of the other methods. The IMAGE method underperforms significantly on all quality measures and would therefore not be the preferred method. Differences between GapFiller and SOAPdenovo are smaller. Interestingly, whereas the gap count after closure is generally lower for GapFiller, SOAPdenovo yields a shorter total gap length in three of the four cases, which suggests that SOAPdenovo is able to close larger gaps. Strikingly, however, the number of errors is significantly higher for SOAPdenovo, regardless of the source (SNPs, indels or misjoins). Even when less strict settings are applied to GapFiller (GapFiller-LC: minimum coverage o = 1, ratio r = 0.5) to shorten the total gap length, our method still yields significantly fewer errors.
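
The comparison between SOAPdenovo and GapFiller described above can be checked with a few lines of arithmetic. The following Python sketch is not part of the original analysis. It copies the gap and error values from Table 1, assuming the column assignment Original, IMAGE, SOAPdenovo, GapFiller, GapFiller-LC as read from the table layout. The names `Row`, `TABLE`, `closed_fraction`, `lower` and `compare` are hypothetical helpers introduced for illustration only.

```python
from typing import Dict, List, NamedTuple


class Row(NamedTuple):
    """One method's results for one genome, copied from Table 1."""
    gap_count: int
    gap_length: int  # total gap length in bp
    snps: int
    indels: int
    misjoins: int

    @property
    def total_errors(self) -> int:
        # Sum of the three error classes reported in the table.
        return self.snps + self.indels + self.misjoins


# Only the columns available for all four genomes are included here.
TABLE: Dict[str, Dict[str, Row]] = {
    "Escherichia coli": {
        "Original": Row(544, 12516, 12, 4, 1),
        "SOAPdenovo": Row(16, 16, 33, 25, 1),
        "GapFiller": Row(11, 130, 22, 9, 1),
    },
    "Streptomyces coelicolor": {
        "Original": Row(158, 9221, 299, 664, 12),
        "SOAPdenovo": Row(60, 1288, 406, 769, 18),
        "GapFiller": Row(23, 806, 280, 686, 18),
    },
    "Staphylococcus aureus": {
        "Original": Row(48, 9900, 79, 16, 4),
        "SOAPdenovo": Row(27, 1547, 260, 53, 13),
        "GapFiller": Row(27, 5508, 98, 26, 7),
    },
    "Rhodobacter sphaeroides": {
        "Original": Row(170, 21409, 218, 187, 6),
        "SOAPdenovo": Row(163, 14166, 410, 294, 10),
        "GapFiller": Row(161, 20667, 230, 190, 6),
    },
}


def closed_fraction(original: Row, after: Row) -> float:
    """Fraction of the original total gap length that was closed."""
    return 1.0 - after.gap_length / original.gap_length


def lower(soap_value: int, gapfiller_value: int) -> str:
    """Name the method with the lower value, or 'tie' if they are equal."""
    if soap_value < gapfiller_value:
        return "SOAPdenovo"
    if gapfiller_value < soap_value:
        return "GapFiller"
    return "tie"


def compare(table: Dict[str, Dict[str, Row]]) -> List[dict]:
    """Per-genome comparison of SOAPdenovo and GapFiller."""
    results = []
    for genome, rows in table.items():
        soap, gf, orig = rows["SOAPdenovo"], rows["GapFiller"], rows["Original"]
        results.append({
            "genome": genome,
            "fewer_gaps": lower(soap.gap_count, gf.gap_count),
            "shorter_gap_length": lower(soap.gap_length, gf.gap_length),
            "fewer_errors": lower(soap.total_errors, gf.total_errors),
            "soap_closed": closed_fraction(orig, soap),
            "gapfiller_closed": closed_fraction(orig, gf),
        })
    return results


if __name__ == "__main__":
    for r in compare(TABLE):
        print(
            f"{r['genome']}: fewer gaps={r['fewer_gaps']}, "
            f"shorter gap length={r['shorter_gap_length']}, "
            f"fewer errors={r['fewer_errors']}, "
            f"closed SOAPdenovo={r['soap_closed']:.1%}, "
            f"closed GapFiller={r['gapfiller_closed']:.1%}"
        )
```

With the values above, SOAPdenovo has the shorter total gap length for three of the four genomes, while GapFiller has the lower summed error count (SNPs, indels and misjoins) for all four. Summing the three error classes into a single total is a simplification made for this sketch; the table reports them separately.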