Table 1 Long-read validation rates for each tool relative to randomly permuted data

From: LUMPY: a probabilistic framework for structural variant discovery

| Method | Total calls | Observed validations (fraction) | Expected validations (fraction) |
|---|---|---|---|
| 50X coverage | | | |
| LUMPY (pe + sr) | 4,347 | 2,653 (0.61) | 37.9 ± 1.2 (0.009) |
| LUMPY (pe + sr + prior) | 4,809 | 2,706 (0.563) | 41.1 ± 1.3 (0.009) |
| LUMPY trio (pe + sr) | 5,108 | 2,660 (0.521) | 31.5 ± 1.1 (0.006) |
| LUMPY (pe + sr&rd) | 1,355 | 1,114 (0.822) | 5.4 ± 0.5 (0.001) |
| GASVPro | 3,929 | 2,249 (0.572) | 61.1 ± 1.5 (0.016) |
| DELLY | 12,272 | 3,127 (0.255) | 219.2 ± 2.9 (0.018) |
| Pindel | 7,219 | 2,208 (0.306) | 0.7 ± 0.2 (~0) |
| 5X coverage | | | |
| LUMPY (pe + sr) | 643 | 619 (0.963) | 4.9 ± 0.4 (0.008) |
| LUMPY (pe + sr + prior) | 840 | 785 (0.935) | 4.3 ± 0.4 (0.005) |
| LUMPY trio (pe + sr) | 1,006 | 958 (0.952) | 4.1 ± 0.4 (0.004) |
| LUMPY (pe + sr&rd) | 73 | 66 (0.904) | 0.01 ± 0.02 (~0) |
| GASVPro | 356 | 338 (0.949) | 10.2 ± 0.6 (0.029) |
| DELLY | 798 | 698 (0.875) | 4.5 ± 0.4 (0.006) |
| Pindel | 640 | 521 (0.814) | 0.04 ± 0.04 (~0) |

  1. Monte Carlo simulations were performed to assess the rate at which false positive SV calls are validated purely by chance using split-read mapping analysis of PacBio and Moleculo data. For each NA12878 deletion callset shown in Figures 5 and 6, deletion coordinates were shuffled 100 times (retaining the breakpoint interval sizes and total span of each deletion call), and validation experiments were conducted exactly as for the real data. For each callset, we show the total number of deletion calls, the number of validated calls (with the fraction validated in parentheses), and the number of validations expected by chance with its 95% confidence interval (with the expected fraction in parentheses), based on the Monte Carlo simulations. pe, paired-end; rd, read-depth; sr, split-read.
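
The shuffling procedure described in the footnote can be illustrated with a minimal sketch. This is not the authors' pipeline: it assumes a single chromosome of length `chrom_len`, replaces the long-read split-read validation with a toy overlap check against a list of `supported` intervals, and assumes the reported interval is a normal-approximation 95% confidence interval on the mean. All function and parameter names here are hypothetical.

```python
import random
import statistics


def shuffle_calls(calls, chrom_len, rng):
    """Move each (start, end) call to a random position, keeping its span."""
    shuffled = []
    for start, end in calls:
        span = end - start
        new_start = rng.randrange(0, chrom_len - span + 1)
        shuffled.append((new_start, new_start + span))
    return shuffled


def count_validated(calls, supported):
    """Count calls overlapping at least one supported interval.

    Toy stand-in (an assumption) for the split-read validation of
    PacBio and Moleculo data used in the paper.
    """
    return sum(
        any(s < sup_end and sup_start < e for sup_start, sup_end in supported)
        for s, e in calls
    )


def expected_by_chance(calls, supported, chrom_len, n_shuffles=100, seed=0):
    """Return (mean validations, 95% CI half-width, expected fraction)."""
    rng = random.Random(seed)
    counts = [
        count_validated(shuffle_calls(calls, chrom_len, rng), supported)
        for _ in range(n_shuffles)
    ]
    mean = statistics.mean(counts)
    # Normal-approximation 95% CI on the mean (an assumption about the paper).
    half_width = 1.96 * statistics.stdev(counts) / len(counts) ** 0.5
    return mean, half_width, mean / len(calls)
```

Comparing the observed validation count for the real calls with the value returned by `expected_by_chance` gives the kind of observed-versus-expected contrast shown in the table.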