SRSF3 and SRSF4 bind distinct RNAs
We used the iCLIP method [15] to identify SRSF3 and SRSF4 binding sites genome-wide in mouse P19 cells. SRSF3 and SRSF4 were immunopurified via the EGFP tag encoded on stable transgenes to allow direct comparison of the binding profiles of the two SR proteins [13]. Previous analyses showed that the EGFP-tagged SR proteins recapitulate interactions with nascent RNA and functionally rescue the endogenous proteins [5, 13]. Both SRSF3-EGFP and SRSF4-EGFP were specifically and efficiently immunopurified from cell extracts, and SR protein-RNA complexes were isolated after in vivo UV crosslinking (Figure S1a, b in Additional file 1). No RNA-protein complexes were detected in cells expressing only nuclear EGFP (EGFP-nuclear localization signal) or in the absence of UV crosslinking (Figure S1b in Additional file 1). In each replicate experiment, SRSF4 showed weaker signal intensity than SRSF3 (Figure S1b in Additional file 1), indicating either lower crosslinking efficiency or fewer RNA targets.
Crosslinked, immunopurified RNA was digested to lengths of 40 to 100 nucleotides, reverse transcribed and prepared for next-generation sequencing [15] (Figure S1c in Additional file 1). The resulting reads, referred to as CLIP-tags throughout the manuscript, were aligned to the mouse mm9 genome assembly. In total, iCLIP produced 1,212,480 and 243,501 unique CLIP-tags for SRSF3 and SRSF4, respectively (Table S1 in Additional file 1). SRSF4 reproducibly yielded fewer sequence reads, in agreement with the lower crosslinking levels observed (Figure S1b in Additional file 1). The EGFP-nuclear localization signal control iCLIP experiments performed in parallel did not produce any detectable PCR products and yielded a total of only 2,611 CLIP-tags mapping to the mouse genome. Because the SRSF3 and SRSF4 iCLIPs generated approximately 460-fold and 90-fold more CLIP-tags than the control iCLIP, respectively, only about 0.2% (SRSF3) and 1% (SRSF4) of the detected CLIP-tags could be attributable to nonspecific crosslinking.
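The background estimate above is simple arithmetic on the reported CLIP-tag totals. The sketch below reproduces it under the assumption, ours rather than the authors' stated procedure, that the control CLIP-tag count serves as an upper bound on nonspecific tags in each experiment; `background_summary` is a hypothetical helper name.

```python
# Unique CLIP-tag totals reported in the text (Table S1).
tag_counts = {"SRSF3": 1_212_480, "SRSF4": 243_501}
control_tags = 2_611  # EGFP-NLS control iCLIP

def background_summary(counts, control):
    """Return (fold excess over control, control as % of sample) per protein.

    Assumes the control tag count is an upper bound for nonspecific tags.
    """
    summary = {}
    for protein, n in counts.items():
        fold = n / control
        background_pct = 100 * control / n
        summary[protein] = (round(fold, 1), round(background_pct, 2))
    return summary

for protein, (fold, pct) in background_summary(tag_counts, control_tags).items():
    print(f"{protein}: {fold}-fold over control, background <= {pct}%")
```

This prints roughly 464-fold (0.22%) for SRSF3 and 93-fold (1.07%) for SRSF4.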
As a first step towards analyzing the RNAs and RNA regions bound by SRSF3 and SRSF4, each crosslink site was assigned to the nucleotide immediately upstream of the start of the corresponding CLIP-tag, as previously described [15]. We determined statistically significant SRSF3 and SRSF4 crosslink sites (33,458 and 10,393, respectively) and identified CLIP-tag clusters, in which crosslink sites are spaced no more than 15 nucleotides apart and the CLIP-tag count is significantly higher than for randomized positions (false discovery rate < 0.05) [15–17]. To test whether iCLIP captured only the most highly expressed genes, we compared the density of CLIP-tags to our global gene expression data in P19 cells [13]. There was a slight positive correlation between gene expression level and CLIP-tag density within the gene, yet CLIP-tags were identified in genes across the whole range of expression levels (Figure S1d in Additional file 1).
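The crosslink-site assignment and the 15-nucleotide clustering rule can be sketched as follows. This is a minimal illustration, not the published pipeline [15–17]: it assumes 0-based, end-exclusive read coordinates, interprets "maximum spacing" as the gap between consecutive crosslink sites, and omits the FDR test against randomized positions. `ClipTag`, `crosslink_site` and `cluster_sites` are hypothetical names.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class ClipTag:
    chrom: str
    start: int   # 0-based leftmost aligned coordinate
    end: int     # 0-based, exclusive
    strand: str  # "+" or "-"

def crosslink_site(tag: ClipTag) -> tuple:
    """Nucleotide immediately upstream of the CLIP-tag 5' end (strand-aware)."""
    if tag.strand == "+":
        return (tag.chrom, tag.strand, tag.start - 1)
    # On the minus strand the read 5' end is at end - 1, so upstream is end.
    return (tag.chrom, tag.strand, tag.end)

def cluster_sites(tags, max_gap=15):
    """Merge crosslink sites on the same chromosome/strand when consecutive
    sites are at most max_gap nt apart.

    Returns (chrom, strand, first_site, last_site, n_tags) tuples.
    No significance filtering is applied here.
    """
    counts = Counter(crosslink_site(t) for t in tags)
    clusters = []
    for (chrom, strand, pos), n in sorted(counts.items()):
        if clusters:
            c_chrom, c_strand, first, last, total = clusters[-1]
            if (c_chrom, c_strand) == (chrom, strand) and pos - last <= max_gap:
                clusters[-1] = (c_chrom, c_strand, first, pos, total + n)
                continue
        clusters.append((chrom, strand, pos, pos, n))
    return clusters

# Hypothetical toy reads.
tags = [
    ClipTag("chr11", 100, 140, "+"),
    ClipTag("chr11", 100, 150, "+"),
    ClipTag("chr11", 110, 160, "+"),
    ClipTag("chr11", 200, 240, "+"),
    ClipTag("chr11", 300, 340, "-"),
]
print(cluster_sites(tags))
```

In a full analysis, each candidate cluster's tag count would then be compared with counts obtained after randomizing crosslink positions within the same gene.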
Examination of SRSF3 and SRSF4 CLIP-tag clusters showed that multiple reads accumulated in discrete RNA regions. The same transcript could be crosslinked to both SR proteins, albeit in different regions, as exemplified by the NPM1 gene, which contained SRSF3 and SRSF4 CLIP-tag clusters mapping to distinct exons (Figure 1a). At the chromosome level, too, a large proportion of the CLIP-tags and clusters were non-overlapping (Figure 1a; Figure S2 in Additional file 1). Significant crosslink sites were detected in 2,304 genes for SRSF3 and 1,055 genes for SRSF4, of which 83.3% and 83.2%, respectively, were protein-coding. A list of genes with significant crosslink sites is provided in Additional file 2. These numbers are likely to be underestimates because our sequencing did not reach saturation. In agreement with our recent analysis showing that SRSF3 and SRSF4 associate with distinct mRNAs [13], the target RNAs bound by SRSF3 and SRSF4 overlapped only partially (Figure 1b). The overlap was even smaller at the level of CLIP-tag clusters than at the level of genes (compare Figure 1b and 1c), strongly suggesting that SRSF3 and SRSF4 have different RNA-binding specificities.
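The distinction between gene-level and cluster-level overlap can be made concrete with a small sketch. Clusters are represented here as hypothetical `(gene, start, end)` tuples with inclusive coordinates, and any overlap of at least one nucleotide counts as shared. These are our assumptions; the text does not specify the overlap criterion behind Figure 1c.

```python
def gene_level_overlap(clusters_a, clusters_b):
    """Number of genes carrying at least one cluster for both proteins."""
    return len({g for g, _, _ in clusters_a} & {g for g, _, _ in clusters_b})

def cluster_level_overlap(clusters_a, clusters_b):
    """Number of clusters in A that overlap (>= 1 nt) a cluster in B
    on the same gene; coordinates are inclusive."""
    return sum(
        any(g == gene and start <= e and s <= end for g, s, e in clusters_b)
        for gene, start, end in clusters_a
    )

# Hypothetical clusters: two shared genes, but only one shared cluster.
srsf3 = [("geneA", 100, 130), ("geneA", 900, 920), ("geneB", 50, 80)]
srsf4 = [("geneA", 500, 530), ("geneB", 70, 95), ("geneC", 1, 20)]
print(gene_level_overlap(srsf3, srsf4), cluster_level_overlap(srsf3, srsf4))
```

As in the example, two proteins can share a target gene while binding different regions of it, so cluster-level overlap is necessarily at most as large as gene-level overlap.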
Consensus binding motifs of SRSF3 and SRSF4
The in vivo binding specificities of SRSF3 and SRSF4 have not been determined. The differences in CLIP-tag cluster sites suggested that each of the two SR proteins binds a distinct RNA sequence. To address this directly, we derived in vivo binding motifs for SRSF3 and SRSF4 by analyzing pentamer sequences enriched around the crosslink sites. To calculate a Z-score for each pentamer, iCLIP positions were randomized within the same regions. The pentamer enrichment analysis showed that SRSF3 and SRSF4 recognize distinct sequence motifs (Figure 2). The top five pentamers for SRSF3 (Figure 2a) were in close agreement with the core SELEX (systematic evolution of ligands by exponential enrichment) motif determined in vitro [18, 19]. SELEX has not been performed for SRSF4; interestingly, the top five SRSF4 pentamers (Figure 2b) resembled a sequence (GAAGGA) previously shown to be an SRSF4 binding site in bovine papillomavirus pre-mRNA [20]. The SRSF3 binding motif was CU-rich and lacked Gs, whereas SRSF4 bound GA-rich sequences lacking Cs (Figure 2d). These results are consistent with the largely non-overlapping SRSF3 and SRSF4 crosslink sites and clusters (Figure 2c, and see above).
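The pentamer Z-score analysis can be sketched as follows. The window of ±20 nucleotides around each crosslink site, uniform randomization within a single region, 100 randomizations, and the toy sequence are all illustrative assumptions, since the text does not state these parameters; `kmer_counts` and `pentamer_zscores` are hypothetical names.

```python
import random
import statistics
from collections import Counter

K = 5  # pentamers

def kmer_counts(seq, positions, flank):
    """Count all pentamers in windows of +/- flank nt around each position."""
    counts = Counter()
    for p in positions:
        window = seq[max(0, p - flank): p + flank + 1]
        for i in range(len(window) - K + 1):
            counts[window[i:i + K]] += 1
    return counts

def pentamer_zscores(seq, positions, flank=20, n_random=100, seed=0):
    """Z-score of each observed pentamer relative to crosslink positions
    randomized uniformly within the same region (assumed scheme)."""
    rng = random.Random(seed)
    observed = kmer_counts(seq, positions, flank)
    background = [
        kmer_counts(seq, [rng.randrange(len(seq)) for _ in positions], flank)
        for _ in range(n_random)
    ]
    zscores = {}
    for kmer, obs in observed.items():
        values = [bg[kmer] for bg in background]
        mu, sd = statistics.mean(values), statistics.pstdev(values)
        if sd > 0:
            zscores[kmer] = (obs - mu) / sd
        else:  # never varies in background: treat any excess as maximal
            zscores[kmer] = float("inf") if obs > mu else 0.0
    return zscores

# Hypothetical region: a CU-rich stretch flanked by GA-rich sequence.
region = "AAGGA" * 40 + "CU" * 21 + "AAGGA" * 40
crosslinks = [220, 221]  # hypothetical sites inside the CU stretch
z = pentamer_zscores(region, crosslinks)
print(sorted(z.items(), key=lambda kv: kv[1], reverse=True))
```

Randomizing within the same region, rather than genome-wide, controls for the base composition of the bound transcripts, so enrichment reflects preference within targets rather than target choice.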
SRSF3 and SRSF4 bind to coding and non-coding RNAs
Which categories of RNA and which functional RNA regions are bound by SR proteins? Analysis of the frequency with which SRSF3 and SRSF4 CLIP-tags mapped to genes and gene regions revealed that both proteins bind exons and introns of protein-coding genes (Figure 3a; Table S3 in Additional file 1). The high proportion of intronic CLIP-tags reflects the fact that mammalian introns are much longer than exons; when the frequency of CLIP-tags was normalized to the length of the RNA region (Figure 3b), both SRSF3 and SRSF4 CLIP-tags were more highly enriched in exons than in introns. SR protein interactions with exons could reflect activities either in pre-mRNA splicing or in mRNPs after splicing (see below).
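Length normalization turns raw CLIP-tag counts per region into a density, here CLIP-tags per kilobase of region. The counts and lengths below are hypothetical and chosen only to show how a region holding most of the tags can still have the lower density. `tags_per_kb` is a hypothetical helper name.

```python
def tags_per_kb(tag_counts, region_lengths_nt):
    """CLIP-tags per kilobase for each region type."""
    return {
        region: 1000 * n / region_lengths_nt[region]
        for region, n in tag_counts.items()
    }

# Hypothetical numbers: introns hold 80% of the tags but are 20x longer.
counts = {"exon": 2_000, "intron": 8_000}
lengths = {"exon": 200_000, "intron": 4_000_000}
density = tags_per_kb(counts, lengths)
print(density)
```

Here introns hold 80% of the tags, yet exons have a five-fold higher density (10 versus 2 tags per kb).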
The highest density of CLIP-tags was detected in ncRNAs (Figure 3b). Overall, 319 and 141 ncRNAs had SRSF3 and SRSF4 CLIP-tag clusters, respectively. The most abundant ncRNA classes with CLIP-tags were long intergenic ncRNAs (lincRNAs) and small nucleolar RNAs (snoRNAs) (Figure 4a). Similar to SRSF1 and TDP-43 [12, 21], SRSF3 and SRSF4 crosslinked to the lincRNA MALAT1 (also known as NEAT2; Figure S3a in Additional file 1), which is enriched in nuclear speckles [22]. In addition, another speckle-localized ncRNA, 7SK [23], had abundant SRSF3 and SRSF4 CLIP-tag clusters (data not shown). An especially large proportion of ncRNAs with SRSF3 and SRSF4 crosslink sites were snoRNAs, a class of small RNAs that guide RNA-modifying enzymes [24]. Intriguingly, small Cajal body-specific RNAs (scaRNAs), a subclass of snoRNAs, were enriched in SRSF4 CLIP-tag clusters. SR protein binding could not be attributed to known elements within scaRNAs because the scaRNAs identified included those with H/ACA boxes alone, C/D boxes alone, and a combination of H/ACA and C/D boxes. The specificity of SR protein binding to this group of scaRNAs was investigated in two ways. First, we asked whether binding was biased towards any particular region of the scaRNAs; binding sites were localized near scaRNA 3' ends (Figure 4b; Figure S3a in Additional file 1). Second, the CLIP-tag clusters within the scaRNAs were used to determine a consensus binding motif independently of the global pentamer analysis. Multiple alignment of the CLIP-tag cluster regions using the MEME (Multiple Em for Motif Elicitation) algorithm identified a consensus sequence element (Figure S3c in Additional file 1) that was found in all scaRNAs with SRSF4 CLIP-tag clusters. The motif was GA-rich, similar to the pentamer motif determined for all crosslink sites, except that Cs were occasionally observed.
This independent derivation of a binding sequence similar to the globally derived consensus indicates that SRSF4 binding to scaRNAs is specific.
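Asking whether binding is biased towards a particular region of the scaRNAs amounts to converting each crosslink site into a fractional position along its transcript and binning those fractions. The sketch assumes 0-based inclusive transcript coordinates and uses hypothetical transcripts and sites; it does not reproduce the MEME step. `relative_position` and `bin_positions` are hypothetical names.

```python
def relative_position(site, tx_start, tx_end, strand):
    """Fractional position of a crosslink site along a transcript:
    0.0 = 5' end, 1.0 = 3' end (0-based, inclusive coordinates)."""
    if not tx_start <= site <= tx_end:
        raise ValueError("site lies outside the transcript")
    if tx_end == tx_start:
        return 0.0
    frac = (site - tx_start) / (tx_end - tx_start)
    return frac if strand == "+" else 1.0 - frac

def bin_positions(fractions, n_bins=10):
    """Histogram of fractional positions; 1.0 falls in the last bin."""
    bins = [0] * n_bins
    for f in fractions:
        bins[min(int(f * n_bins), n_bins - 1)] += 1
    return bins

# Hypothetical scaRNAs: one on each strand.
fracs = [relative_position(s, 1000, 1130, "+") for s in (1110, 1120, 1125)]
fracs.append(relative_position(5005, 5000, 5100, "-"))
print(bin_positions(fracs))
```

With these toy sites, all four positions fall in the last two bins, which is the kind of 3'-end bias summarized in Figure 4b.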
SRSF3 and SRSF4 bind to intronless histone mRNAs
SRSF3 and SRSF4 binding sites were found in intronless protein-coding genes, likely reflecting SRSF3 and SRSF4 participation in regulatory events other than splicing. In particular, SRSF3 and SRSF4 CLIP-tag clusters were detected within histone genes: 73.8% of the mouse histone genes annotated in [25] had SRSF3 clusters and 47.7% had SRSF4 clusters (Figure 5a; Figure S4a in Additional file 1). This was also reflected in the enriched Gene Ontology (GO) terms, which included categories related to chromatin and nucleosome assembly (Table S4 in Additional file 1). The SRSF3 and SRSF4 CLIP-tag clusters were located at the boundary between the ORF and the 3' UTR and/or within the 3' UTR of histone mRNAs (Figure 5b). The CLIP-tag clusters were located just upstream of conserved stem-loops that occur 14 to 50 nucleotides downstream of the ORF (Figure 5a); these stem-loops specify the sites of endonucleolytic cleavage of replication-dependent histone mRNAs and therefore define their 3' ends [26].
SRSF3 was previously shown to promote the export of histone H2A reporter mRNAs via a 22-nucleotide transport element within the H2A coding region, which SRSF3 binds to recruit the mRNA export factor TAP [27, 28]. In our study, however, most SRSF3 and SRSF4 CLIP-tag clusters in histone H2A family mRNAs lay outside this 22-nucleotide transport element (Figure 5a, b; Figure S4a in Additional file 1). Furthermore, most SRSF3 and SRSF4 crosslink sites were present in mRNAs of histone families other than H2A, which do not contain the transport element (Additional file 2). Interestingly, the SRSF3 and SRSF4 binding sites identified here are similar to those reported in another study that characterized export factor-binding sites in histone mRNAs [29].
SR proteins also promote polyadenylation in some contexts [30, 31]. We found this intriguing in the context of the histone mRNA targets because several recent studies have shown that a significant pool of histone mRNAs undergoes polyadenylation instead of 3' end cleavage [32–36]. To validate the association of SRSF3 and SRSF4 with histone mRNAs and to investigate polyadenylation, we adopted an RNA immunoprecipitation (RIP) assay using UV crosslinked cell extracts (UV-RIP). The immunoprecipitation was carried out from a cytoplasmic fraction to avoid contamination by genomic DNA, which would otherwise confound measurements by reverse transcription quantitative PCR (RT-qPCR) (Figure S4b in Additional file 1). Both total and polyadenylated histone mRNA levels were measured in the SRSF3 and SRSF4 immunoprecipitates by priming reverse transcription with either random hexamers or oligo-dT. Figure 5c shows that both SR proteins immunoprecipitated histone mRNAs significantly above mock immunoprecipitates, irrespective of which reverse transcription primer was used. Relative to input, histone mRNAs were detected more robustly with oligo-dT primers, suggesting that SRSF3 and SRSF4 preferentially bind polyadenylated histone mRNAs. The detection of SRSF3 and SRSF4 bound to polyadenylated histone mRNAs in the cytoplasmic fraction suggests that both SR proteins may be involved in histone mRNA 3' end formation, export and/or translation.
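UV-RIP RT-qPCR data are commonly expressed as percent of input and as fold enrichment over the mock immunoprecipitate. The sketch below uses that widely used Ct-based calculation; it is not necessarily the exact normalization used for Figure 5c. It assumes roughly 100% amplification efficiency (one doubling per cycle), and the Ct values and the 1% input fraction are hypothetical.

```python
def percent_input(ct_ip, ct_input, input_fraction):
    """Percent of input RNA recovered in the IP.

    input_fraction: share of the IP starting material kept as input
    (e.g. 0.01 for a 1% input). Assumes ~100% PCR efficiency.
    """
    return 100 * input_fraction * 2 ** (ct_input - ct_ip)

def fold_over_mock(ct_ip, ct_mock):
    """Enrichment of the specific IP over the mock IP for one transcript."""
    return 2 ** (ct_mock - ct_ip)

# Hypothetical Ct values for one histone mRNA with one RT primer type.
print(percent_input(ct_ip=25.0, ct_input=22.0, input_fraction=0.01))  # 0.125
print(fold_over_mock(ct_ip=25.0, ct_mock=30.0))                       # 32.0
```

Computing percent input separately for oligo-dT-primed and random hexamer-primed cDNA, and then comparing the two values, is one way to express the preferential recovery of polyadenylated histone mRNAs.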
SRSF3 and SRSF4 make diverse contacts with exons and introns
Because SR proteins are known to regulate pre-mRNA splicing, we asked whether the crosslink sites were correlated with particular locations within introns and/or exons. Data from in vitro studies suggest that SR proteins bind pre-mRNAs primarily within exons and thereby recruit spliceosomal components to adjacent 5' and 3' splice sites [37]. Therefore, crosslink sites were mapped relative to exon-intron and intron-exon boundaries. Because exon and intron lengths vary genome-wide, CLIP-tags appear artificially abundant close to the junctions (Figure S5a in Additional file 1); we therefore derived a normalization factor from the length distribution of exons and introns to correct for these differences (Figure S5b in Additional file 1). Mapping of normalized crosslink sites showed exonic enrichment of SRSF3 and SRSF4 crosslink sites, which was most pronounced within 100 nucleotides of both 5' and 3' splice sites (Figure 6a). Peaks of SRSF3 and SRSF4 binding approximately 70 nucleotides upstream of 5' splice sites were more prominent than peaks downstream of 3' splice sites. Note that we did not map reads spanning exon-exon junctions, which explains the drop in crosslinking immediately upstream of 5' splice sites. Because SR proteins bind mRNA as well as pre-mRNA, exon sequences would be expected to be overrepresented in the experimental data relative to intron sequences. However, similar patterns of exonic enrichment were observed when the pentamer motifs alone were considered (Figure 2; Figure S5c in Additional file 1), suggesting that the observed exon bias reflects the distribution of binding sequences within target RNAs. Interestingly, we noticed a peak of crosslink sites approximately 30 nucleotides upstream of 3' splice sites (Figure 6a), which corresponds to the approximate position of branch points in mammalian introns.
However, the position of the branch point relative to the 3' splice site varies, with distances of up to 400 nucleotides observed [38]. Therefore, crosslink sites were also mapped relative to predicted mouse branch points [39]. This mapping indicated that SRSF3 and SRSF4 bind at or slightly downstream of the branch point nucleotide (Figure 6b). In conclusion, SRSF3 and SRSF4 preferentially contact exonic sequences, especially upstream of 5' splice sites; they also interact with branch points, as suggested by two previous studies [7, 40], consistent with the model that SR proteins regulate splicing by contacting pre-mRNA in different functional regions.
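The length correction for splice-site profiles can be sketched as follows. At each offset from a 5' splice site, crosslink counts are divided by the number of exons long enough to reach that offset, so short exons no longer inflate the signal near junctions. This is one plausible form of the normalization factor, not necessarily the one behind Figure S5b. The sketch covers only the exonic side, assumes plus-strand, 0-based inclusive coordinates, and uses hypothetical exons and sites.

```python
from collections import Counter

def profile_upstream_of_end(sites, regions, window=100):
    """Length-normalized crosslink density upstream of region ends.

    regions: (start, end) inclusive, plus strand; the region end is the
    anchor (e.g. the last exonic nucleotide before a 5' splice site).
    Returns {offset: crosslinks / regions covering offset}, where
    offset 1 is the last nucleotide of each region.
    """
    site_counts = Counter(sites)
    hits, coverage = Counter(), Counter()
    for start, end in regions:
        for offset in range(1, window + 1):
            pos = end - offset + 1
            if pos < start:  # region too short to reach this offset
                break
            coverage[offset] += 1
            hits[offset] += site_counts[pos]
    return {o: hits[o] / coverage[o] for o in coverage}

# Hypothetical exons (10 nt and 100 nt) and crosslink sites.
prof = profile_upstream_of_end([9, 199, 150], [(0, 9), (100, 199)])
print(prof[1], prof[2], prof[50])
```

Without normalization, offset 1 would receive two crosslinks and offset 50 only one; after dividing by the number of covering exons, both have a density of 1.0. The same function, anchored on predicted branch point nucleotides instead of exon ends, gives a profile of the kind shown in Figure 6b.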
SRSF3: a regulator of splicing factors
The notion that different splicing factors might regulate transcripts with related functions, creating expression modules regulated by splicing, has intrigued the field for decades. We therefore examined the functional identity of SRSF3 and SRSF4 protein-coding targets. Similar to our previous findings by RIP-chip [13], GO analysis of the protein-coding genes with significant SRSF3 and SRSF4 crosslink sites revealed functions related to nucleic acid binding and RNA processing as the most enriched GO terms for both SRSF3 and SRSF4 (Table S4 in Additional file 1). SRSF3 binding sites were especially enriched within genes encoding components of RNP complexes, including splicing factors (Table S5 in Additional file 1). SRSF3 crosslink sites were found within the genes encoding other SR proteins, as well as heterogeneous nuclear ribonucleoproteins and components of the core splicing machinery. SRSF3 is known to tightly regulate its own expression through inclusion of a cassette exon containing a premature termination codon (PTC); this exon is referred to as a 'poison cassette exon' because it targets the transcript for degradation by nonsense-mediated decay (NMD) [13, 41]. Such NMD-coupled splicing events occur in all SR protein family members and are ultraconserved among species [42, 43]; in every case, inclusion of the alternative cassette exon or retention of an intron introduces a PTC into the SR protein mRNA. Indeed, SRSF3 and SRSF4 CLIP-tag clusters were detected in the SRSF3 and SRSF4 autoregulatory cassette exons, respectively (Figure 7a, top panel; Figure S6, bottom panel, in Additional file 1).
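GO term enrichment is often assessed with a one-sided hypergeometric test. The sketch below computes that tail probability exactly from binomial coefficients. The test choice and all counts are illustrative assumptions: the text does not name the GO tool or statistic used, and multiple-testing correction is omitted.

```python
from math import comb

def hypergeom_sf(k, N, K, n):
    """P(X >= k) for X ~ Hypergeometric(population N, K annotated genes,
    n genes drawn).

    Here n is the number of genes with crosslink sites and k the number
    of those carrying the GO term.
    """
    upper = min(K, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)

# Hypothetical: 20 background genes, 5 annotated "RNA splicing",
# 4 genes bound, all 4 annotated.
print(hypergeom_sf(4, 20, 5, 4))  # 5 / 4845, about 0.00103
```

Python's integer arithmetic keeps the binomial coefficients exact, so the function also works for genome-scale background sets, although dedicated statistics libraries are faster.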
To date, it has been assumed that poison cassette exons are recognized by the gene's own protein product, in an auto-regulatory feedback loop (see above). Intriguingly, the SRSF3 CLIP-tag clusters were also found in the NMD-associated exons or introns of three heterologous SR protein-encoding genes, SRSF2, SRSF5 and SRSF7 (Figure 7a; Figure S6 in Additional file 1). In contrast, SRSF4 CLIP-tag clusters were found only in the poison cassette exon of its own pre-mRNA. We sought to validate the specificity of these interactions by UV-RIP. SRSF3 specifically immunoprecipitated SRSF2, SRSF3, SRSF5 and SRSF7 (pre-)mRNAs, whereas SRSF4 only immunoprecipitated significant levels of its own (pre-)mRNA (Figure 7b). These data validate the specificity of SRSF3 interactions with heterologous transcripts encoding SR protein family members in the manner indicated by iCLIP; note that low recovery of some transcripts may be due to the short half-lives of the bound, PTC-containing messages.
The presence of SRSF3 CLIP-tag clusters in heterologous SR protein-encoding transcripts could indicate that SRSF3 either positively or negatively regulates poison cassette exon usage. If so, SRSF3 levels in cells should affect the alternative splicing and ultimately the expression levels of the three target SR protein transcripts identified. To test this directly, minigenes including the genomic regions around SRSF3 CLIP-tag clusters were constructed for SRSF2, SRSF3, SRSF5 and SRSF7 (Figure 8a; Figure S7c in Additional file 1). Efficient SRSF3 or SRSF4 protein over-expression and knockdown were achieved by transfection of cDNA expression constructs and by RNA interference, respectively (Figure S7a in Additional file 1). Under these conditions, the splicing patterns of the minigene-encoded transcripts were analyzed by RT-PCR using vector-specific primers. Figure 8a shows that over-expression of SRSF3 led to a marked increase in poison cassette exon inclusion for both the SRSF3 and SRSF7 minigenes. Upon SRSF3 knockdown, this pattern was reversed (Figure S7b in Additional file 1). Similarly, SRSF3 over-expression led to alternative splicing changes for the SRSF2 and SRSF5 minigenes, resulting in increased poison cassette exon usage and/or intron retention (Figure S7c in Additional file 1). Importantly, SRSF4 over-expression or knockdown did not detectably alter splicing patterns (Figure 8a; Figure S7b, c in Additional file 1).
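Minigene RT-PCR results of this kind are often quantified as percent spliced in (PSI), the inclusion isoform as a percentage of inclusion plus skipping isoforms. The sketch computes PSI from band intensities. It optionally divides by product length first, assuming that stain signal scales with DNA length, as for intercalating dyes. This quantification and the intensities are illustrative assumptions, not values from Figure 8a.

```python
def percent_spliced_in(incl, skip, incl_len_bp=None, skip_len_bp=None):
    """PSI (%) from inclusion and skipping band intensities.

    If both product lengths are given, intensities are converted to
    approximate molar amounts by dividing by length (assumes signal
    proportional to DNA length).
    """
    if incl_len_bp and skip_len_bp:
        incl, skip = incl / incl_len_bp, skip / skip_len_bp
    total = incl + skip
    if total == 0:
        raise ValueError("no signal in either band")
    return 100 * incl / total

# Hypothetical band intensities for a poison-exon minigene.
control = percent_spliced_in(30, 70)
srsf3_oe = percent_spliced_in(60, 40, incl_len_bp=300, skip_len_bp=200)
print(control, srsf3_oe, srsf3_oe - control)  # delta PSI
```

For intron retention events, the retained-intron product takes the place of the inclusion band in the same calculation.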
The SRSF3-regulated alternative splicing events documented above predict that the affected transcripts (SRSF3 itself as well as SRSF2, SRSF5 and SRSF7) will undergo degradation through NMD when SRSF3 is over-expressed. To test this, NMD was inhibited by treating the cells with the translation inhibitor cycloheximide (CHX) [44]. The use of CHX also enabled us to investigate the alternative splicing outcome for endogenous transcripts. Figure 8b shows that CHX treatment allows detection of the otherwise highly unstable endogenous SR protein transcripts containing poison cassette exons, which increase in abundance upon SRSF3 over-expression. A further prediction is that the steady-state levels of heterologous SR protein transcripts depend on SRSF3 levels. Measurement of target mRNA levels by RT-qPCR showed that SRSF5 and SRSF7 mRNA levels decreased significantly in cells over-expressing SRSF3 (Figure 8c). Upon CHX treatment, mRNA levels recovered to those of the control (Figure S7d in Additional file 1). Taken together, the data indicate that SRSF3 specifically binds not only its own transcripts but also those of other SR proteins, and that this binding promotes alternative splicing changes that introduce PTCs, which in turn target the transcripts for degradation through the NMD pathway. Thus, SRSF3 regulates the expression of its own mRNA and the mRNAs encoding three other SR protein family members (Figure 8d). This cross-regulation by SRSF3, together with the observation that many other RNA-binding proteins may be similarly regulated by SRSF3 (Table S5 in Additional file 1), raises the possibility that SRSF3 is a master regulator of the transcriptome acting through a network of feedback mechanisms.
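Relative mRNA levels from RT-qPCR, such as the SRSF5 and SRSF7 decreases upon SRSF3 over-expression, are commonly computed with the comparative 2^-ΔΔCt method. The sketch below shows that calculation. It assumes near-100% amplification efficiency and a stably expressed reference gene. Neither the reference gene nor the Ct values come from the text; both are hypothetical.

```python
def relative_expression(ct_target_test, ct_ref_test,
                        ct_target_ctrl, ct_ref_ctrl):
    """Fold change of a target mRNA in test vs control cells (2^-ddCt).

    Each sample's target Ct is first normalized to a reference gene
    measured in the same sample.
    """
    d_ct_test = ct_target_test - ct_ref_test
    d_ct_ctrl = ct_target_ctrl - ct_ref_ctrl
    return 2 ** -(d_ct_test - d_ct_ctrl)

# Hypothetical: target Ct rises by one cycle on SRSF3 over-expression
# while the reference gene is unchanged, i.e. about half the mRNA.
print(relative_expression(25.0, 18.0, 24.0, 18.0))  # 0.5
```

Recovery to control levels after CHX treatment would appear in this calculation as a fold change close to 1.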