Table 1 Benchmark results for correcting simulated PacBio CLR reads. Indels/100 kbp: average number of insertion or deletion errors per 100,000 aligned bases. Mismatches/100 kbp: average number of mismatch errors per 100,000 aligned bases. GF (genome fraction): how much of each of the strain-specific genomes is covered by the corrected reads. N/100 kbp: average number of uncalled bases (Ns) per 100,000 bases in the reads. MC: fraction of misassembled contigs.

From: Hybrid-hybrid correction of errors in long reads with HERO

Correction                   Indels/100 kbp  Mismatches/100 kbp  GF (%)  N/100 kbp  MC (%)

3 Salmonella strains
     PacBio CLR reads (10X)         6053.09             1564.71   99.96       0.00    0.00
     Ratatosk                        110.75              128.23   99.96       6.53    0.29
     R-HERO                           11.77               93.90   99.96       0.76    0.20
     FMLRC                            83.26              215.29   99.96       0.00    0.11
     F-HERO                           17.47              194.45   99.96       0.00    0.13
     LoRDEC                           59.00              197.74   99.95       0.00    0.00
     L-HERO                           11.58              181.06   99.96       0.00    0.01

20 bacteria strains
     PacBio CLR reads (10X)         6123.04             1594.90   99.32       0.14    0.00
     Ratatosk                        402.71              171.31   99.02       3.34    0.06
     R-HERO                           26.54              103.51   98.47       0.43    0.13
     FMLRC                           314.82              325.05   96.09       0.02    0.99
     F-HERO                           72.40              275.34   95.46       0.00    1.11
     LoRDEC                          229.92              267.60   97.15       0.00   0.015
     L-HERO                           34.25              230.43   96.51       0.00   0.182

100 bacteria strains
     PacBio CLR reads (10X)         6194.76             1644.52   98.07       0.00   0.001
     Ratatosk                        523.07              267.42   99.45       6.30    0.29
     R-HERO                           46.10              250.16   98.42       1.04    0.85
     FMLRC                           413.23              561.24   94.27       0.00    2.31
     F-HERO                           97.84              511.03   94.79       0.00    3.47
     LoRDEC                          392.21              517.00   96.36       0.00    0.14
     L-HERO                           63.56              470.06   96.07       0.00    1.10

  1. Boldface is meant to indicate the best performing tool in the respective category
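
The per-100-kbp metrics defined in the caption are simple normalizations of raw event counts by the number of aligned bases. The following is a minimal Python sketch of that arithmetic, together with a helper for comparing a base tool against its HERO-augmented variant. The function names `per_100kbp` and `fold_reduction` and the raw counts in the usage lines are hypothetical and chosen only for illustration; this is not the evaluation code behind the table.

```python
def per_100kbp(event_count: int, aligned_bases: int) -> float:
    """Normalize an event count (indels, mismatches, or Ns) to a rate
    per 100,000 bases, as in the table's "/100 kbp" columns."""
    if aligned_bases <= 0:
        raise ValueError("aligned_bases must be positive")
    return event_count * 100_000 / aligned_bases


def fold_reduction(before: float, after: float) -> float:
    """Return how many times smaller `after` is than `before`
    (for example, an error rate before and after a correction step)."""
    if after <= 0:
        raise ValueError("after must be positive")
    return before / after


# Hypothetical raw counts: 1,177 indels observed over 10 Mbp of aligned bases.
indel_rate = per_100kbp(1_177, 10_000_000)

# Indel rates taken from the 3 Salmonella strains rows: Ratatosk vs. R-HERO.
ratio = fold_reduction(110.75, 11.77)

print(f"indels/100 kbp: {indel_rate:.2f}")
print(f"Ratatosk -> R-HERO indel reduction: {ratio:.1f}x")
```

Expressing the comparison as a ratio makes rows with very different baselines easier to read side by side, since the absolute rates span more than two orders of magnitude between uncorrected and corrected reads.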