Fig. 1 | Genome Biology

Fig. 1

From: NCBoost classifies pathogenic non-coding variants in Mendelian diseases through supervised learning on purifying selection signals in humans

Curation of high-confidence pathogenic non-coding SNVs associated with monogenic Mendelian disease genes. a Number of high-confidence pathogenic non-coding SNVs obtained from the Human Gene Mutation Database [37] (HGMD-DM), ClinVar [38], and Smedley’2016 [20], after filtering out SNVs overlapping exonic and splice sites of protein-coding genes and SNVs associated with non-coding RNAs (“Methods” section). Only high-confidence pathogenic non-coding variants associated with the same protein-coding gene by both the original resource and the annotation process performed in this work (depicted in orange) were retained for downstream analysis. b Retained variants in a were further classified according to the OMIM category of the associated gene, i.e., non-Mendelian disease gene, Mendelian disease gene associated with a disease phenotype differing from the one reported in the original resource (i.e., presenting a conflicting disease description), complex Mendelian disease genes, and monogenic Mendelian disease genes. Only high-confidence pathogenic non-coding SNVs associated with monogenic Mendelian diseases and with no homozygous individuals in the gnomAD database [35] (depicted in green) were finally retained for downstream analyses. c Distribution of the high-confidence pathogenic non-coding SNVs associated with monogenic Mendelian disease genes according to the type of gene region they overlap: intronic, 5′UTR, 3′UTR, upstream, downstream, and intergenic regions. d Distribution of the high-confidence pathogenic non-coding SNVs associated with monogenic Mendelian disease genes according to the original annotation source, i.e., HGMD-DM, ClinVar, and Smedley’2016. e Corresponding number of monogenic Mendelian disease genes collectively involved by the SNVs in d. The number of SNVs in each category is indicated inside the barplots and Venn diagrams together with the number of genes collectively involved (in parentheses in a–c; totals are reported above each barplot).
