LUMPY: a probabilistic framework for structural variant discovery

Table 1 Long-read validation rates for each tool relative to randomly permuted data

Method	Total calls	Observed validations (fraction)	Expected validations (fraction)
50X coverage
LUMPY (pe + sr)	4,347	2,653 (0.61)	37.9 ± 1.2 (0.009)
LUMPY (pe + sr + prior)	4,809	2,706 (0.563)	41.1 ± 1.3 (0.009)
LUMPY trio (pe + sr)	5,108	2,660 (0.521)	31.5 ± 1.1 (0.006)
LUMPY (pe + sr&rd)	1,355	1,114 (0.822)	5.4 ± 0.5 (0.001)
GASVPro	3,929	2,249 (0.572)	61.1 ± 1.5 (0.016)
DELLY	12,272	3,127 (0.255)	219.2 ± 2.9 (0.018)
Pindel	7,219	2,208 (0.306)	0.7 ± 0.2 (~0)
5X coverage
LUMPY (pe + sr)	643	619 (0.963)	4.9 ± 0.4 (0.008)
LUMPY (pe + sr + prior)	840	785 (0.935)	4.3 ± 0.4 (0.005)
LUMPY trio (pe + sr)	1,006	958 (0.952)	4.1 ± 0.4 (0.004)
LUMPY (pe + sr&rd)	73	66 (0.904)	0.01 ± 0.02 (~0)
GASVPro	356	338 (0.949)	10.2 ± 0.6 (0.029)
DELLY	798	698 (0.875)	4.5 ± 0.4 (0.006)
Pindel	640	521 (0.814)	0.04 ± 0.04 (~0)

Monte Carlo simulations were performed to assess the rate at which false positive SV calls are validated purely by chance using split-read mapping analysis of PacBio and Moleculo data. For each NA12878 deletion callset shown in Figures 5 and 6, deletion coordinates were shuffled 100 times (retaining the breakpoint interval sizes and total span of each deletion call), and validation experiments were conducted precisely as for real data. For each callset, we show the total number of deletion calls, the number of validated calls with the fraction validation in parentheses, and the number of validations expected by chance and the 95% confidence interval (with the expected fraction in parentheses) based on Monte Carlo simulations. pe, paired-end; rd, read-depth; sr, split-read.

ISSN: 1474-760X