Fig. 2 | Genome Biology

Fig. 2

From: NCBoost classifies pathogenic non-coding variants in Mendelian diseases through supervised learning on purifying selection signals in humans

Performance of individual features mined in this work in distinguishing a high-confidence set of n = 737 pathogenic non-coding SNVs associated with monogenic Mendelian disease genes from a negative set of n = 7370 randomly sampled common SNVs without clinical assertions, matched by region. The area under the receiver operating characteristic curve (AUROC; left panel) and the area under the precision-recall curve (AUPRC; right panel) are shown for each feature. Features are grouped into five categories (A–E; “Methods” section) and ranked within each category by decreasing AUROC and AUPRC. AUROC values < 0.5 (anti-classifiers) were transformed into 1 − AUROC values for the purpose of this figure and are marked with an asterisk (*). Accordingly, AUPRC values for anti-classifiers were computed after multiplying the feature values by −1. Of note, population-specific gnomAD MAFs (“Methods” section) are not shown for simplicity. One-hot-encoded SNV region features (i.e., “intronic,” “UTR5,” “UTR3,” “upstream,” “downstream,” and “intergenic”) are grouped into a single feature labeled “region”.
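The anti-classifier handling described in the caption can be illustrated with a minimal sketch. It assumes AUROC is computed as the Mann-Whitney probability that a random positive outranks a random negative, and that AUPRC is estimated by average precision; the caption does not state which estimator was used, and the function names (`auroc`, `average_precision`, `oriented_metrics`) are hypothetical. The point it shows is that flipping a feature's sign turns AUROC into exactly 1 − AUROC, whereas AUPRC has no such identity and must be recomputed on the negated values.

```python
def auroc(scores, labels):
    """AUROC as P(random positive scores higher than random negative); ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def average_precision(scores, labels):
    """AUPRC estimated as average precision (assumed estimator).

    Precision is taken at the rank of each positive, ranking by decreasing score;
    ties keep their input order, which is a simplifying assumption.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_pos = sum(labels)
    tp = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            tp += 1
            total += tp / rank
    return total / n_pos


def oriented_metrics(scores, labels):
    """Return (auroc, auprc, flipped) for one feature (hypothetical helper).

    If the raw AUROC is below 0.5, the feature is treated as an anti-classifier
    and its values are multiplied by -1 before both metrics are computed.
    """
    raw = auroc(scores, labels)
    if raw < 0.5:
        flipped_scores = [-s for s in scores]
        return auroc(flipped_scores, labels), average_precision(flipped_scores, labels), True
    return raw, average_precision(scores, labels), False


# Toy feature where pathogenic SNVs (label 1) tend to have LOWER values.
scores = [0.3, 0.1, 0.4, 0.2, 0.5]
labels = [1, 1, 0, 0, 0]
print(auroc(scores, labels))             # raw AUROC = 1/6, an anti-classifier
print(oriented_metrics(scores, labels))  # AUROC and AUPRC both 5/6 after flipping
```

With the class sizes used here (737 positives, 7,370 negatives), a random ranking has an expected AUPRC close to the positive fraction, 737/8107 ≈ 0.091, which is a useful reference point when reading the right panel, whereas the random baseline for AUROC is 0.5.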