From: RNA G-quadruplexes at upstream open reading frames cause DHX36- and DHX9-dependent translation of human mRNAs

RNA G-quadruplexes are determinants of 5′-UTR translation. a Reduced predicted folding energies of rG4 secondary structures are associated with translated uORFs, i.e., high-ORFscore uORFs (ORFscore ≥ 6). Folding energies are expressed as the minimum free energies normalized by the length of the uORFs. b rG4 structure potential downstream of the start codons of translated uORFs (upper panel). Folding energies were calculated per position using a sliding window of 35 nt, and the lines represent the average of the values over 10 nt. Filled points are the identified local minima for high-ORFscore uORFs. The dotted lines represent the size of 80S ribosomes (40 nt) phased downstream of the start codon. The periodograms obtained from the positions of rG4s within the uORFs are reported in the bottom panel and highlight a 41 nt periodicity in high-ORFscore uORFs. The cartoon above the plot depicts a “queue” of ribosomes stretching back to the uORF initiator codon. c An excess of RPFs within the 5′-UTR is associated with inefficient translation. The graph reports RPFdist, a proxy of 5′-UTR translation, expressed as a z-score, for human mRNAs binned according to their TE (first to fourth quartile of the TE distribution). d Principal component analysis of human transcripts using features describing mRNA abundance, 5′-UTR secondary structures, 5′-UTR and mRNA length, and 5′-UTR sequence composition statistics. The first two principal components, explaining ~50% of the variance, separate features describing rG4 structures (red quadrant) from features describing dsRNA structures (see also Figure S5a–b in Additional file 1 and Supplementary Information in Additional file 2). e A statistical model with as few as 32 predictors explains 65% of the RPFdist variation observed in the rG4-containing subset of transcripts (see also Supplementary Information in Additional file 2). f Performance of models selected using a subset of predictors on either all transcripts or the rG4-containing subset of transcripts.
A model selected using only rG4-based predictors can account for 32.1 ± 8.4% of the observed RPFdist variance in the rG4-containing subset of transcripts, making rG4-based predictors as informative as uORF-based predictors (32.0 ± 7.4%, mean ± s.d. over 10 resampling steps). Data in a and c are means ± s.e.m.; P values were assessed using one-tailed Mann–Whitney nonparametric tests and compare the reported condition to either the background or the rest of the population. Central black lines in f represent the medians and the other black lines represent quartile boundaries. P values were assessed using Kolmogorov–Smirnov nonparametric tests. ns: not significant, *P < 0.05, **P < 0.01, ***P < 0.001.
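
As a rough illustration of the signal processing described for panel b, the following Python sketch smooths a per-position profile with a 10 nt running average and estimates its dominant periodicity with a plain discrete Fourier transform. It is not the authors' pipeline: the function names, the synthetic cosine profile, and the choice of a simple DFT periodogram are all assumptions made for illustration, and no real folding energies are computed.

```python
import math


def moving_average(values, width=10):
    """Average each run of `width` consecutive values (hypothetical helper)."""
    if width <= 0 or width > len(values):
        raise ValueError("width must be between 1 and len(values)")
    running = sum(values[:width])
    smoothed = [running / width]
    for i in range(width, len(values)):
        # Slide the window one position: add the new value, drop the oldest.
        running += values[i] - values[i - width]
        smoothed.append(running / width)
    return smoothed


def periodogram(signal):
    """Return (period_in_nt, power) pairs from a direct DFT of the signal."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]  # remove the zero-frequency offset
    spectrum = []
    for k in range(1, n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * t / n) for t, x in enumerate(centered))
        im = sum(x * math.sin(2 * math.pi * k * t / n) for t, x in enumerate(centered))
        spectrum.append((n / k, (re * re + im * im) / n))
    return spectrum


def dominant_period(signal):
    """Period (in nt) carrying the highest spectral power."""
    return max(periodogram(signal), key=lambda pair: pair[1])[0]


def synthetic_profile(period=41, n_cycles=10):
    """Toy profile with a known periodicity, standing in for real data."""
    return [math.cos(2 * math.pi * t / period) for t in range(period * n_cycles)]


if __name__ == "__main__":
    profile = synthetic_profile()
    print("dominant period (nt):", dominant_period(profile))
    print("first smoothed values:", moving_average(profile, width=10)[:3])
```

With real data, the input would be the per-position folding energies (or rG4 positions) relative to the uORF start codon; the direct DFT is quadratic in the profile length, which is acceptable for profiles of a few hundred nucleotides.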

