Fig. 2 | Genome Biology

Fig. 2

From: Predicting RNA splicing from DNA sequence using Pangolin

Application of Pangolin to a variety of prediction tasks. a Cumulative density plot of the log10 sQTL p-value fold difference between the SNP predicted to affect splicing and that of the lead sQTL SNP for the top 500 sQTLs identified in DGN (All predictions), or for the 100 predictions with the largest predicted effects (inset). b Example of a splice site that shows a large inter-species difference in usage. A single-nucleotide difference between chimp (T) and human (C) is predicted to strongly decrease (resp. increase) usage of a chimp (resp. human) splice site (dashed vertical line indicates the human site). The T (resp. C) difference likely disrupts (resp. creates) a 3’ canonical splice site in chimp (resp. human). c Locations and effects of SNVs ±50 bp from a splice site predicted to underlie inter-species differences in splice site usage for 71 3’ and 74 5’ sites. A large fraction, but not all, of splice-altering variants are located near the canonical splice sites. d Survival function plots of BRCA1 variants in splice regions as a function of their predicted effects on splicing. The variants are separated by their classification as loss-of-function (LOF, blue), intermediate effect (INT, orange), or functional (FUNC, green). We observe a strong enrichment of LOF variants among variants with large predicted splicing effects. e Precision-recall curves for different variant types representing the precision and recall for distinguishing LOF variants from functional variants. Pangolin achieves a high AUPRC for variants in extended splice regions (note that this excludes canonical splice variants). See Additional file 1: Fig. S8 for variants from additional annotation bins. f Predicted splicing effects of mutations in or flanking 4 BRCA1 exons from Findlay et al. [12]. Mutations identified to be LOF or to have intermediate phenotypes, as well as missense, nonsense, and canonical splice site mutations are annotated. See Additional file 1: Fig. S9 for all 13 exons with predictions. g Precision-recall curves representing the precision and recall for distinguishing variants annotated as pathogenic from variants annotated as benign in ClinVar. The blue (resp. orange) line represents the PRC for variants excluding (resp. including) variants in annotated splice sites. Missense and nonsense variants are excluded.
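For readers unfamiliar with the summaries used in panels d, e, and g, the following is a minimal sketch of how a survival function and a precision-recall curve (with its area, AUPRC) can be computed from per-variant scores. It is not Pangolin's code or the authors' evaluation pipeline. The function names and the toy scores and labels are hypothetical, and AUPRC is approximated here by average precision, which may differ from the estimator used for the figure.

```python
# Illustrative only: hypothetical helper names and toy data, not the
# authors' evaluation code.

def survival_function(scores, thresholds):
    """Fraction of variants whose predicted effect exceeds each threshold."""
    n = len(scores)
    return [sum(s > t for s in scores) / n for t in thresholds]


def precision_recall(scores, labels):
    """Precision and recall as the score cutoff is lowered one variant at a time.

    labels: 1 for the positive class (e.g. LOF), 0 for the negative class
    (e.g. functional). Tied scores are not grouped in this sketch.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("at least one positive label is required")
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    tp = fp = 0
    precision, recall = [], []
    for i in order:
        if labels[i]:
            tp += 1
        else:
            fp += 1
        precision.append(tp / (tp + fp))
        recall.append(tp / total_pos)
    return precision, recall


def average_precision(scores, labels):
    """Step-wise area under the precision-recall curve."""
    precision, recall = precision_recall(scores, labels)
    area, prev_recall = 0.0, 0.0
    for p, r in zip(precision, recall):
        area += p * (r - prev_recall)
        prev_recall = r
    return area


# Toy example: five variants with made-up predicted effects and labels.
toy_scores = [0.9, 0.8, 0.3, 0.2, 0.05]
toy_labels = [1, 1, 0, 1, 0]
toy_survival = survival_function(toy_scores, [0.0, 0.25, 0.85])
toy_ap = average_precision(toy_scores, toy_labels)
```

In this sketch, a curve that stays high in the survival plot for one class and drops quickly for another corresponds to the separation between LOF and functional variants described in panel d.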
