Fig. 6 | Genome Biology

Fig. 6

From: NCBoost classifies pathogenic non-coding variants in Mendelian diseases through supervised learning on purifying selection signals in humans

Prioritization of non-coding pathogenic variants within individual genomes. The median, across the 100 simulated disease genomes, of the within-individual rank percentile of a variant (y-axis) is shown for the 37 recently reported pathogenic variants evaluated (red dots) and their corresponding 37 randomly sampled negative common SNVs mapping within the same non-coding region of the associated genes (blue dots). Boxplots represent the distributions for the different evaluated scores (x-axis). NCBoost scores offered the highest within-individual rank percentiles of pathogenic variants (median 97.04%), with a statistically significant difference (one-sided paired Wilcoxon test p value) as compared to all evaluated reference methods: ReMM (median 96.34%; p value = 3.75e−2), Eigen (median 96.00%; p value = 9.55e−2), DeepSEA (median 95.3%; p value = 1.96e−3), CADD (median 93.50%; p value = 4.03e−3), and FunSeq2 (median 89.92%; p value = 5.88e−4). All scores showed a statistically significant difference between the median rank percentile distributions of pathogenic variants and their internal negative control variants (one-sided paired Wilcoxon test): NCBoost p value = 8.04E−07, ReMM p value = 9.71E−05, Eigen p value = 2.28E−05, DeepSEA p value = 7.51E−06, CADD p value = 3.88E−04, and FunSeq2 p value = 6.13E−03. Complete details are provided in Additional file 6: Table S5. An alternative graphical representation pairing each pathogenic variant with its corresponding negative variant is provided in Additional file 2: Figure S12.
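The quantity plotted on the y-axis can be illustrated with a minimal sketch. It assumes that the within-individual rank percentile is the percentage of variants in one genome whose score is strictly lower than that of the variant of interest; the caption does not state the exact definition or tie handling, so this is an assumption. The function names and the toy scores are hypothetical and are not taken from the NCBoost code or data.

```python
from statistics import median


def rank_percentile(target_score, genome_scores):
    """Percentage of variants in one genome scoring strictly below target_score.

    Assumed definition: the source does not specify how ties are handled.
    """
    if not genome_scores:
        raise ValueError("genome_scores must not be empty")
    below = sum(1 for score in genome_scores if score < target_score)
    return 100.0 * below / len(genome_scores)


def median_rank_percentile(target_score, genomes):
    """Median of the within-individual rank percentile across several genomes."""
    return median(rank_percentile(target_score, g) for g in genomes)


# Hypothetical toy data: three "genomes", each a list of variant scores.
toy_genomes = [
    [0.1, 0.2, 0.3, 0.4],
    [0.1, 0.2, 0.6, 0.9],
    [0.7, 0.8, 0.9, 1.0],
]
toy_result = median_rank_percentile(0.5, toy_genomes)
```

In the figure, each dot would correspond to one such median, computed over the 100 simulated genomes for one variant and one scoring method. The one-sided paired Wilcoxon comparison is a separate step that is not shown here; in Python it could be run with `scipy.stats.wilcoxon` and `alternative="greater"`, although the source does not say which software the authors used.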
