Fig. 7 | Genome Biology

Fig. 7

From: NCBoost classifies pathogenic non-coding variants in Mendelian diseases through supervised learning on purifying selection signals in humans

The figure shows the number of monogenic Mendelian disease genes (MMDGs, in blue, y-axis) bearing at least one (solid curves) or ten (dashed curves) potentially pathogenic non-coding variants as a function of the number of top NCBoost-scoring positions considered (x-axis), ranked from left (more pathogenic) to right (less pathogenic). A total of 857,825,085 positions overlapping intronic, 5′UTR, 3′UTR, upstream, and downstream regions, collectively associated with 18,404 protein-coding genes, were used as the reference background. For ease of visualization, the x-axis was cut at the 10 million top-scoring genomic positions. Vertical bars mark thresholds: positions to the left of each bar have NCBoost scores above the cutoff set by the corresponding top percentage of the high-confidence pathogenic variants curated in this work. The top 5%, 10%, 15%, and 20% thresholds are shown. The horizontal dotted lines indicate the totals of 3223 MMDGs (in blue) and 18,404 protein-coding genes (in black) for which NCBoost scores were obtained. Genes bearing potentially pathogenic non-coding variants within the top 5% and the top 10% thresholds are highly enriched in MMDGs compared with the background of protein-coding genes used (see text).
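The counting behind each curve can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `genes_reached`, the toy gene labels, and the simplification that every ranked position maps to exactly one gene are assumptions made here for illustration only.

```python
from collections import Counter


def genes_reached(ranked_genes, gene_set, top_n, min_hits=1):
    """Count genes from `gene_set` that have at least `min_hits` positions
    among the first `top_n` entries of `ranked_genes`.

    `ranked_genes` lists the gene associated with each position, sorted from
    highest to lowest score (assumption: one gene per position).
    """
    hits = Counter(g for g in ranked_genes[:top_n] if g in gene_set)
    return sum(1 for count in hits.values() if count >= min_hits)


# Hypothetical toy data: six scored positions and two disease genes.
ranked_genes = ["G1", "G2", "G1", "G3", "G1", "G2"]
mmdgs = {"G1", "G3"}

# One point of the curve per cutoff on the x-axis.
solid_curve = [genes_reached(ranked_genes, mmdgs, n, min_hits=1) for n in (1, 3, 4, 6)]
print(solid_curve)
```

Raising `min_hits` to 10 would correspond to the dashed curves, and passing the full set of protein-coding genes as `gene_set` would give the background count.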
