Learning the language of post-transcriptional gene regulation
© BioMed Central Ltd 2013
Published: 23 August 2013
A large-scale RNA in vitro selection study systematically identified RNA recognition elements for 205 RNA-binding proteins belonging to families conserved in most eukaryotes.
Messenger RNAs (mRNAs) are regulated at every stage of their life cycle. All cellular RNA, including mRNA, is packaged into distinct ribonucleoprotein (RNP) complexes to orchestrate RNA maturation and turnover processes summarized as post-transcriptional gene regulation. The most relevant processes involving mRNAs include pre-mRNA splicing, 5' and 3' end modification, editing, transport, translation and degradation. Among the challenges for decoding post-transcriptional gene regulation is the elucidation of the mRNP composition, which changes as mRNAs mature or are translated. This is a prerequisite for understanding the consequences of dysregulation and/or mutation of RNA-binding proteins (RBPs) and/or their target RNA-binding sites in disease.
The human genome encodes 1,500 RBPs and 600 microRNAs targeting mRNAs. Most RBPs contain at least one RNA-binding domain (RBD), and frequently a combination of several distinct RBDs. At least 800 distinct RBDs are known; among the most frequent in humans are the single-stranded-RNA-binding RRM, KH, zf-CCCH and zf-CCHC domains, and the double-stranded-RNA-binding DSRM domain. Recent proteomic analysis consolidated the number of mRBPs to 700 proteins and revealed at least 20 previously unknown RBDs [1, 3].
Determining the composition of mRNPs is followed, or accompanied, by the identification of the precise binding site(s) of RBPs within their mRNA targets and the derivation of the underlying RNA recognition element(s) (RRE(s)). This task is non-trivial considering that RBDs generally recognize short and degenerate sequences of three to eight nucleotides, sometimes involving additional RNA secondary structure. In addition, in vivo binding is modulated by competition with other RBPs for the same or adjacent sites. Since the implementation of high-throughput methods in RNA biology, various protocols for the experimental identification of RBP binding sites have been developed. A recent study by Ray et al. used a single-cycle RNA in vitro selection approach to characterize the binding specificities for 205 recombinant RBPs and, in doing so, has brought us an important step closer to solving the post-transcriptional RBP regulatory code.
To increase throughput and identify the highest-affinity RREs, Ray et al. [4, 8] introduced an in vitro selection method termed RNAcompete (Figure 1). In contrast to the random sequence pools used in SELEX, which contain up to 10^14 different molecules of 20- to 80-nucleotide random sequence flanked by constant primer binding sites, RNAcompete pools were designed to contain only 240,000 different sequences of 30 to 40 nucleotides in length. These RNA sequences were predicted to be only weakly structured, with each possible 9-mer represented at least 16 times in the pool. To prepare the pool, oligodeoxynucleotides printed on a microarray were amplified and transcribed into RNA, which was then incubated with a recombinantly expressed, affinity-tagged RBP of interest. The RNA pool was present at a 75-fold molar excess over protein to ensure efficient competition between the various sequences during binding, so that at equilibrium the proportion of each sequence bound to the RBP reflected its affinity. The protein was then recovered together with its bound RNA, and the enrichment of bound RNAs over the initial pool RNA was quantified on microarrays. In contrast to SELEX, the bound RNA was analyzed directly after the first competitive binding reaction, without further cycles of amplification and mutagenesis. The RRE for the protein was inferred by combining the calculated Z- and E-values for each possible 7-mer.
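The idea of scoring every 7-mer from probe-level enrichment can be illustrated with a minimal sketch. This is not the scoring procedure of Ray et al.: the function name `kmer_z_scores`, the use of a per-k-mer mean, and the toy signals below are all assumptions made for illustration, and the rank-based E-value is not reproduced here.

```python
from collections import defaultdict
from statistics import mean, pstdev


def kmer_z_scores(probes, k=7):
    """Return a Z-score per k-mer from probe-level enrichment signals.

    probes: dict mapping an RNA probe sequence to its enrichment signal
            (for example, a log ratio of bound RNA over input pool).
    Assumption: a k-mer's raw score is the mean signal of all probes
    that contain it; scores are then standardized across k-mers.
    """
    per_kmer = defaultdict(list)
    for seq, signal in probes.items():
        # Count each k-mer once per probe, even if it occurs several times.
        for kmer in {seq[i:i + k] for i in range(len(seq) - k + 1)}:
            per_kmer[kmer].append(signal)

    kmer_mean = {kmer: mean(vals) for kmer, vals in per_kmer.items()}
    values = list(kmer_mean.values())
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return {kmer: 0.0 for kmer in kmer_mean}
    return {kmer: (m - mu) / sigma for kmer, m in kmer_mean.items()}


# Toy data (invented signals, short probes, k=4 to keep the example small).
toy_probes = {
    "AAUGUAAA": 3.0,
    "CCUGUACC": 2.8,
    "GGGCGCGG": 0.1,
    "ACACACAC": -0.2,
}
toy_z = kmer_z_scores(toy_probes, k=4)
top_kmers = sorted(toy_z, key=toy_z.get, reverse=True)[:3]
```

In this toy set, k-mers shared by the two strongly enriched probes (such as UGUA) receive higher Z-scores than k-mers found only in weakly enriched probes. With a real pool in which every 9-mer occurs at least 16 times, each 7-mer score would be supported by many independent probes.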
In their recent study, Ray et al. applied RNAcompete to determine RREs for a collection of 205 different RBPs distributed across 24 species and representing approximately 60 conserved families of RBPs. The parallel processing of samples using a single method facilitated comparison of the RREs and specificities of various RBPs. Most RBPs were expressed in truncated forms comprising all constituent RBD(s) with 30 to 50 flanking amino acid residues to enhance solubility. The selected RBPs contained at least one of nine well-characterized RBDs (RRM, KH, S1, YTH, Pumilio repeats (PUF), zf-CCCH, zf-CCHC, zf-RanBP and SAM), and the majority contained multiple RBDs. Approximately 90% of the RBPs tested recognized sequence motifs five to seven nucleotides long and did not require structured RNA for binding, which is expected given that predominantly single-strand-specific RBDs were included in this study.
For 52 proteins, RNAcompete RREs were compared with RREs previously determined by CLIP or other methods. Of these, 35 were highly similar, 6 matched partially and 11 were dissimilar to the RNAcompete RRE. For example, for PUM1/2 or ELAVL1/HuR the RREs agreed perfectly, while for proteins such as FMR1 only one of two established RREs was identified. The discrepancies may mirror technical differences between the methods or differences between in vivo and in vitro specificities of RBPs. Enrichment of an RRE by RNAcompete depends on affinity. For multi-RBD proteins, the affinities of individual RBDs for RNA can vary by orders of magnitude, so contributions of weaker-binding RBDs, which can be detected in in vivo data, may be overlooked. In addition, in vivo, the highest-affinity sites may not always be accessible due to competition with other RBPs, the cell-type- and subcellular-compartment-dependent concentration of RBP and RNA targets, modulation of RNA affinities by protein cofactors, and the secondary structure of RNA.
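Why weaker-binding sites can drop out of a competitive selection can be illustrated with a textbook equilibrium model. The sketch below is an assumption-laden illustration, not an analysis from the study: it treats the protein as having a single binding site, assumes RNA is in large excess so that free RNA concentrations are approximately equal to total concentrations, and uses invented concentrations and dissociation constants.

```python
def bound_shares(concentrations, kds):
    """Fraction of protein occupied by each RNA species at equilibrium.

    Assumes one binding site per protein and RNA in large excess, so the
    free concentration of each RNA is approximately its total concentration.
    """
    weights = [c / kd for c, kd in zip(concentrations, kds)]
    denom = 1.0 + sum(weights)  # the 1.0 term accounts for unbound protein
    return [w / denom for w in weights]


def enrichment(concentrations, kds):
    """Share of each species among bound RNA divided by its share of the pool."""
    shares = bound_shares(concentrations, kds)
    total_bound = sum(shares)
    total_pool = sum(concentrations)
    return [
        (s / total_bound) / (c / total_pool)
        for s, c in zip(shares, concentrations)
    ]


# Hypothetical values in nM: an equimolar pool of three sequences whose
# dissociation constants span two orders of magnitude.
conc = [10.0, 10.0, 10.0]
kd = [1.0, 10.0, 100.0]
enr = enrichment(conc, kd)
```

Under these assumptions the enrichment of each sequence is proportional to 1/Kd, so a site bound 100-fold more weakly is enriched 100-fold less and can easily fall below the detection threshold of a single-round experiment.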
RNAcompete-derived RREs demonstrated predictive power for anticipating regulatory functions of RBPs. Evolutionary conservation analysis showed that sequence elements containing these RREs were frequently under positive selection pressure in 5' UTRs, coding regions, 3' UTRs and intronic regions flanking alternative exons. The location of conserved RREs correlated well with previously elucidated RBP binding patterns, with a few surprising twists; for example, conserved RREs for several splicing factors were unexpectedly frequent in the 3' UTR of mRNAs. RNA sequencing experiments from diverse cell lines and tissues with different RBP expression levels allowed correlation of RBP levels with predicted target RNA levels or splicing patterns. This analysis confirmed known RBP functions in some cases (ELAVL1/HuR, RBM4), but also hinted at unanticipated roles for others (PUM1/2, RBFOX1). A study of RNA knockdown data confirmed that RBFOX1, a splicing regulator, also had a positive effect on RNA stability of putative targets with predicted RBFOX1 sites in the 3' UTR, in line with previous reports that some RBPs may have multiple functions in post-transcriptional gene regulation.
Some of the regulatory effects predicted by the evolutionary conservation analysis of RNAcompete RREs, however, are difficult to reconcile with other available data, such as the implied negative effect of the FMR1 protein on target mRNA levels. An effect of FMR1 on RNA abundance was explicitly ruled out in two recent studies, although FMR1 was shown at the same time to negatively regulate protein abundance of targeted mRNAs [9, 10]. As discussed above, these discrepancies may reflect differences between in vivo and in vitro preferences of multi-RBD proteins, including FMR1. Analysis of CLIP-derived motifs showed that the FMR1 RG-rich region bound WGGA with higher affinity than its KH domains bound ACUK. The RNAcompete motifs GACAAG and ANGGAC more likely reflected contributions of the RG-rich region to binding. The implicit assumption that the highest-affinity RRE also reflects the optimal in vivo RRE may prove inaccurate in some cases, because of varying accessibility of a motif.
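Motifs such as WGGA, ACUK and ANGGAC are written with IUPAC ambiguity codes (W = A or U, K = G or U, N = any nucleotide). A minimal sketch of how such degenerate motifs could be located in an RNA sequence follows; the helper names `motif_to_regex` and `find_sites` and the example sequence are hypothetical, and a match only marks a candidate site, not demonstrated binding.

```python
import re

# IUPAC nucleotide ambiguity codes, written for RNA (U instead of T).
IUPAC_RNA = {
    "A": "A", "C": "C", "G": "G", "U": "U",
    "R": "[AG]", "Y": "[CU]", "S": "[CG]", "W": "[AU]",
    "K": "[GU]", "M": "[AC]", "B": "[CGU]", "D": "[AGU]",
    "H": "[ACU]", "V": "[ACG]", "N": "[ACGU]",
}


def motif_to_regex(motif):
    """Compile a degenerate motif into a regex that finds overlapping hits."""
    body = "".join(IUPAC_RNA[base] for base in motif.upper())
    # A lookahead lets consecutive matches overlap.
    return re.compile("(?=(" + body + "))")


def find_sites(motif, rna):
    """Return 0-based start positions of all matches of motif in rna."""
    return [m.start() for m in motif_to_regex(motif).finditer(rna.upper())]


# Arbitrary example sequence, not taken from any real transcript.
example_rna = "GGACUGAGGACAUGGAACUU"
wgga_sites = find_sites("WGGA", example_rna)
acuk_sites = find_sites("ACUK", example_rna)
```

Because short degenerate motifs occur frequently by chance, hits from a scan like this would normally be filtered further, for example by conservation or by in vivo binding data.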
The systematic analysis and identification of RREs, together with in vivo RNA targets of regulatory proteins, will remain one of the main focuses in post-transcriptional gene regulation research. Ray et al. have compiled the largest catalog of experimentally derived RREs at present and this resource may be used to understand evolutionary relationships between RBPs. It also allows researchers to find putative binding sites for RBPs of interest and gives computational biologists the opportunity to integrate RREs as predictors into statistical learning methods to model, in concert with microRNA binding sites, transcription factor recognition elements and epigenetic marks, the transcriptional and post-transcriptional control of gene expression.
To capture the physiological role of RBPs, we still need to dissect the target gene network for each RBP individually in various cellular contexts, and then integrate this knowledge into computational approaches that can quantitatively recapitulate the regulatory effects of RBPs. This includes understanding protein and target RNA levels in different cell types and tissues, gaining insight into RRE occupancy and competition among RBPs, and accounting for redundancies in protein families or regulatory pathways. SELEX- and CLIP-based methods continue to expand the compendium of RREs and contribute to the goal of characterizing post-transcriptional regulation comprehensively.
CLIP: crosslinking and immunoprecipitation
SELEX: systematic evolution of ligands by exponential enrichment.