Fig. 3 | Genome Biology

From: The genetic and biochemical determinants of mRNA degradation rates in mammals

Prediction of human half-lives using sequence-encoded features. a Performance of trained lasso regression models on each of 10 held-out folds of data. Each comparison is between a pair of nested models, in which successive models consider progressively larger numbers of features. Each model is labeled with a code indicating the features it considers; the codes are explained in the key, with the corresponding number of features listed in parentheses. Whether a more complex model improved on a simpler one was evaluated with a one-sided, paired t-test, with a Bonferroni correction for the total number of hypothesis tests. Features that were ultimately found to improve performance are colored; those that did not improve the model are shown in black. b Final predictions of the optimal model (i.e., BC3MS), obtained by concatenating the observations from all 10 folds of held-out data. The Pearson (r) and Spearman (rho) correlation values are also indicated. c The top 30 ranked model coefficients of the BC3MS model trained on the full dataset. Features are colored according to the key in panel a. d Pearson correlation matrix between all top 30 features from c (rows) and the other features that share a Pearson correlation of either ≤ −0.8 or ≥ 0.8 with at least one of them (columns). Feature names are colored according to the origin of each feature, following the key in panel a. Hierarchical clustering was used to group features exhibiting similar correlation patterns.
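The nested-model comparison in panel a can be sketched as follows. This is a minimal, dependency-free Python illustration, assuming each model is summarized by one held-out performance score per fold (the caption does not name the metric) and that "improvement" means the more complex model scores higher. The helper names (`t_sf`, `paired_one_sided_p`, `bonferroni`) are hypothetical, the t-distribution tail is obtained by numerical integration rather than from a statistics library, the lasso fitting itself is not shown, and the example scores are made up rather than taken from the figure.

```python
import math
import statistics


def t_sf(t: float, df: int, steps: int = 2000) -> float:
    """Upper-tail probability P(T > t) for Student's t with `df` degrees of freedom.

    Integrates the t density from 0 to |t| with composite Simpson's rule
    (`steps` must be even) and uses the symmetry of the distribution about 0.
    """
    # Log of the density's normalizing constant, computed with lgamma for stability.
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x: float) -> float:
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    a = abs(t)
    h = a / steps
    total = pdf(0.0) + pdf(a)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    area = total * h / 3  # P(0 < T < |t|)
    return 0.5 - area if t >= 0 else 0.5 + area


def paired_one_sided_p(complex_scores, simple_scores) -> float:
    """One-sided paired t-test of H1: the complex model scores higher per fold."""
    if len(complex_scores) != len(simple_scores):
        raise ValueError("scores must be paired fold by fold")
    diffs = [c - s for c, s in zip(complex_scores, simple_scores)]
    n = len(diffs)
    sd = statistics.stdev(diffs)  # sample SD (n - 1 denominator)
    if sd == 0:
        raise ValueError("per-fold differences have zero variance")
    t = statistics.mean(diffs) / (sd / math.sqrt(n))
    return t_sf(t, n - 1)


def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capping at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


if __name__ == "__main__":
    # Hypothetical per-fold held-out scores for three nested models (illustrative only).
    base = [0.30, 0.32, 0.29, 0.31, 0.33, 0.30, 0.28, 0.31, 0.32, 0.30]
    plus = [0.35, 0.36, 0.33, 0.36, 0.37, 0.34, 0.33, 0.35, 0.37, 0.34]
    more = [0.35, 0.37, 0.32, 0.36, 0.38, 0.33, 0.33, 0.36, 0.36, 0.34]
    raw = [paired_one_sided_p(plus, base), paired_one_sided_p(more, plus)]
    for label, p in zip(["plus vs base", "more vs plus"], bonferroni(raw)):
        print(f"{label}: adjusted p = {p:.3g}, improved = {p < 0.05}")
```

In practice, a library routine such as SciPy's `scipy.stats.ttest_rel(a, b, alternative="greater")` would replace the hand-rolled tail integration; it is written out here only to keep the sketch self-contained.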