Fig. 5 | Genome Biology

Fig. 5

From: The genetic and biochemical determinants of mRNA degradation rates in mammals

State-of-the-art prediction of half-lives and genetic variant functional effects using a sequence-based deep learning model. a A hybrid convolutional/recurrent neural network architecture to predict half-life from an input of the RNA sequence, an encoding of the first frame of each codon, and 5′ splice site junction(s). The deep learning model, called Saluki, was jointly trained on mouse and human half-life data to predict species-specific half-lives. b Performance of the trained Saluki models on each of 10 held-out folds of data, relative to the corresponding performances of our best genetic (i.e., “BC3MS” for human and “BC3MSD” for mouse) and biochemical (i.e., “BEeM”) lasso regression models. An improvement relative to another model was evaluated with a two-sided, paired t-test. c Final predictions after concatenating the observations for all 10 folds of held-out data. Also indicated are the Pearson (r) and Spearman (rho) correlation values. d Metagene plot of in silico mutagenesis (ISM) scores across all mRNAs for percentiles along the 5′ UTR, ORF, and 3′ UTR. mRNAs were grouped into one of 4 bins according to their predicted half-lives. For the set of mRNAs within each bin, we plotted the average of the absolute value of the mean predicted effect size (i.e., the mean over the three possible alternative mutations at each position). e ISM results for two 3′ UTR segments from TUBGCP3 and PI4K2B. Partial matches to the AU-rich element (ARE, or “UAUUUAU”) and Pumilio/FBF (PUF, or “UGUAHAUA”) binding element consensus sequences are boxed. For each motif, single point mutations resulting in particularly severe or opposite phenotypes are shown alongside annotations reflecting the corresponding ARE and PUF consensus gain or loss events. f Insertional analysis of motifs discovered by TF-MoDISco [84]. Each motif was inserted into one of 50 positional bins along the 5′ UTR, ORF, and 3′ UTR of each mRNA. Indicated is the average predicted change in half-life for each bin, plotted along a metagene. g Same as panel f, except that each of the 61 sense codons (i.e., excluding the 3 stop codons) was inserted into the first reading frame along the length of the ORF. Selected codons are colored, with the rest shown in gray. h Scatter plot showing the relationship between the mean influence of each codon along the length of the ORF, as predicted by Saluki in panel g, and the mean codon stability coefficient over a set of cell types as observed previously [26]. Also indicated are the Pearson (r) and Spearman (rho) correlation values.
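The per-position score behind a metagene plot of this kind (absolute value of the mean predicted effect over the three alternative nucleotides, then averaged within positional bins) can be sketched as follows. This is a minimal illustration, not the authors' code: `toy_predict` is a hypothetical stand-in for a trained model such as Saluki, and the function names, the binning helper, and the example sequence are assumptions made only for this sketch.

```python
import numpy as np

NUCLEOTIDES = "ACGU"


def toy_predict(seq: str) -> float:
    """Hypothetical stand-in for a trained half-life model (assumption).

    Each ARE-like "UAUUUAU" match lowers the predicted half-life by 1.
    """
    return -float(seq.count("UAUUUAU"))


def ism_scores(seq: str, predict=toy_predict) -> np.ndarray:
    """Predicted change in half-life for every single-nucleotide substitution.

    Returns an array of shape (len(seq), 3): one row per position,
    one column per alternative nucleotide.
    """
    reference = predict(seq)
    scores = np.zeros((len(seq), 3))
    for i, ref_nt in enumerate(seq):
        alternatives = [nt for nt in NUCLEOTIDES if nt != ref_nt]
        for j, alt in enumerate(alternatives):
            mutant = seq[:i] + alt + seq[i + 1:]
            scores[i, j] = predict(mutant) - reference
    return scores


def position_importance(scores: np.ndarray) -> np.ndarray:
    """Absolute value of the mean effect over the three alternative mutations."""
    return np.abs(scores.mean(axis=1))


def bin_profile(importance: np.ndarray, n_bins: int) -> np.ndarray:
    """Average importance within n_bins consecutive positional bins.

    Assumes len(importance) >= n_bins so that no bin is empty.
    """
    return np.array([chunk.mean() for chunk in np.array_split(importance, n_bins)])


if __name__ == "__main__":
    seq = "GGGUAUUUAUGGG"  # toy sequence with one ARE-like motif at positions 3-9
    importance = position_importance(ism_scores(seq))
    print(importance)
```

In this toy setting, any substitution inside the motif abolishes the match, so only the seven motif positions receive a nonzero score. Averaging such binned profiles across the mRNAs of one half-life group would give one curve of the metagene plot.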
