
Table 1 Mean alignment results across all 10 hold-out samples, aligned to (1) the reference genome H37Rv (H37Rv columns) and (2) the graph of 400 MTB genomes (graph columns), using CHOP/BWA and vg with and without haplotyping to align the reads (note that when aligning only to H37Rv, CHOP is not used)

From: CHOP: haplotype-aware path indexing in population graphs

 

All TB hold-out samples, read length = 101

| Alignment criteria | BWA (H37Rv) | CHOP/BWA (Graph, n = 400) | vg (H37Rv) | vg (Graph, n = 400) | vg + GBWT (Graph, n = 400) | HiSat2 (H37Rv) | HiSat2 (Graph, n = 400) |
|---|---|---|---|---|---|---|---|
| Reads aligned | 6,160,920 | 6,162,033 (+0.018%) | 6,241,270 | 6,245,907 (+0.074%) | 6,244,004 (+0.044%) | 5,536,194 | 5,489,149 (−0.850%) |
| Reads unaligned | 360,236 | 359,123 (−0.309%) | 279,886 | 275,249 (−1.657%) | 277,152 (−0.977%) | 984,962 | 1,032,007 (+4.776%) |
| Reads perfectly aligned | 4,048,774 | 4,142,052 (+2.304%) | 4,048,774 | 4,153,217 (+2.580%) | 4,153,124 (+2.577%) | 4,056,850 | 4,113,818 (+1.404%) |
| Bases aligned | 596,380,132 | 596,611,260 (+0.039%) | 599,244,753 | 599,601,399 (+0.060%) | 599,528,267 (+0.047%) | 553,338,901 | 548,757,802 (−0.828%) |
| Bases unaligned | 62,191,423 | 61,960,355 (−0.372%) | 59,307,655 | 58,949,429 (−0.604%) | 59,023,102 (−0.480%) | 105,271,191 | 109,852,201 (+4.352%) |
| Bases unaligned from clipped reads | 22,349,569 | 22,380,472 (+0.138%) | 27,442,533 | 27,690,552 (+0.904%) | 27,575,464 (+0.484%) | 3,625,336 | 3,589,573 (−0.986%) |
| Bases mismatched | 3,458,029 | 3,308,480 (−4.325%) | 3,596,667 | 3,458,707 (−3.836%) | 3,455,296 (−3.931%) | 2,164,703 | 2,029,911 (−6.227%) |
| Bases inserted | 65,210 | 65,151 (−0.090%) | 84,358 | 85,938 (+1.874%) | 85,397 (+1.232%) | 26,674 | 26,763 (+0.334%) |
| Bases deleted | 52,272 | 51,165 (−2.118%) | 70,324 | 72,082 (+2.500%) | 70,347 (+0.033%) | 11,793 | 11,756 (−0.317%) |
| Non-primary alignments | 246,092 | 246,540 (+0.182%) | 539,309 | 724,904 (+34.414%) | 724,613 (+34.360%) | 969,452 | 755,436 (−22.076%) |
| Time (s) | 533 | 721 | 10,711 | 4,457 | 4,540 | 312 | 517 |
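The percentages in parentheses in Table 1 appear to be the change of each graph-based value relative to the same aligner's H37Rv value. The following is a minimal sketch of that arithmetic, using two "Reads aligned" entries from the table. The function name `relative_change_pct` is hypothetical and the formula is an assumption inferred from the numbers, not something stated in the source.

```python
def relative_change_pct(reference: int, graph: int) -> float:
    """Percentage change of the graph value relative to the H37Rv value.

    Assumed formula (inferred from Table 1, not stated by the authors):
    (graph - reference) / reference * 100
    """
    return (graph - reference) / reference * 100.0


# "Reads aligned": BWA on H37Rv vs CHOP/BWA on the 400-genome graph
bwa_change = relative_change_pct(6_160_920, 6_162_033)

# "Reads aligned": vg on H37Rv vs vg on the 400-genome graph
vg_change = relative_change_pct(6_241_270, 6_245_907)

print(f"CHOP/BWA vs BWA: {bwa_change:+.3f}%")  # +0.018%
print(f"vg graph vs vg:  {vg_change:+.3f}%")   # +0.074%
```

Both results reproduce the values printed in the table to three decimal places, which supports reading each parenthesised figure as a within-aligner comparison between the graph and the linear reference.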