Dynamics of alternative splicing in C/EBPα-enhanced B cell reprogramming occur independently from gene expression changes
To study the dynamics of changes in alternative splicing (AS) during cell reprogramming, primary mouse pre-B cells (hereafter referred to as “B cells”) were reprogrammed as previously described [18,19,20]. Briefly, B cells were isolated from bone marrow of reprogrammable mice, containing a doxycycline-inducible OSKM cassette and an OCT4-GFP reporter. These cells were infected with a retroviral construct containing an inducible version of C/EBPα fused to the estrogen receptor ligand-binding domain (ER). Infected cells were selected and re-plated on a feeder layer of inactivated MEFs. This was followed by a 18 h-long pulse of β-estradiol, triggering the translocation of C/EBPα-ER to the nucleus and poising the B cells for efficient and homogeneous reprogramming [18, 19]. After washout of β-estradiol, reprogramming was induced by growing the cells for 8 days in reprogramming medium containing doxycycline (see “Methods” and [20]). RNA was isolated every other day from duplicates and subjected to paired-end sequencing (RNA-seq), resulting in high coverage (more than 100 million reads per sample) (Fig. 1A). As positive controls, mouse embryonic stem (ES) cells and induced pluripotent stem (iPS) cells were also sequenced.
AS analysis was performed using vast-tools [21], an event-based software that quantifies percent spliced in (PSI) values of annotated AS events and constitutive exons in all samples. These analyses revealed more than 14,000 AS changes during the entire reprogramming time course (for any possible pair of conditions: minimum absolute ∆PSI of 10 between PSI averages and minimum difference of 5 between any individual replicates across conditions) and a gradual increase in the number of differentially spliced events when comparing B cells to progressive stages of somatic cell reprogramming (Fig. 1B and Additional file 1: Figure S1A). Different classes of AS events were detected, with similar relative proportions at the various time points: 31–47% cassette exons, 5–11% alternative 3′ splice sites and 6–11% alternative 5′ splice sites, and 33–57% retained introns (Fig. 1B).
To classify the dynamics of AS changes during reprogramming, we selected cassette exon (CEx) events differentially spliced in at least one comparison (4556 exons) and performed a fuzzy c-means clustering analysis on their scaled PSI values (Mfuzz; [22]). This analysis revealed diverse kinetics of exon inclusion occurring during B cell reprogramming (Fig. 1C, Additional file 1: Figure S1B-C and Additional file 2: Table S1). Six major clusters of AS dynamic profiles were detected: (1) exons that become included already after the C/EBPα pulse; (2) and (3) exons that are regulated either towards increased inclusion or skipping early after OSKM induction (day 2); (4) and (5) exons that display changes in inclusion at middle stages of reprogramming (day 4 onwards); (6) a cluster of exons only included at the latest steps of reprogramming (day 8 onwards) (Fig. 1C). Each of these clusters consisted of 300–500 exons. The inclusion levels of examples of exons belonging to different cluster types were validated by semi-quantitative RT-PCR (Fig. 1D). For reference, changes in the PSI values of Grhl1 exon 6 and Dnmt3B exon 10, previously described to be associated with reprogramming and pluripotency [17, 23, 24] were also quantified and found to follow similar inclusion patterns in B cell reprogramming, compared to the ones previously described in other systems (Fig. 1E).
We next sought to compare general AS and gene expression dynamics during reprogramming. Gene expression was analyzed using the edgeR package [25] and the level of similarity between each pair of conditions was estimated using a Pearson correlation coefficient on the cpm (counts per million) values of the most variable genes (3rd quartile coefficient of variation, n = 2961). In addition, Pearson correlation coefficient was calculated based on the PSI values from the most variable cassette exons (3rd quartile coefficient of variation, n = 1140). Both analyses showed pronounced switches between days 4 and 6 post-OSKM induction (Fig. 1F). Overall, however, most clusters displayed matching profiles in gene expression and AS of any included exon in less than 10% of the genes, reaching a maximum of 20% in a subset of AS clusters (Fig. 1G and Additional file 1: Figures S1D-E). These results argue, as observed before in a variety of other systems (e.g., [26]), that global programs of regulation of gene expression and AS are uncoupled from each other.
Interestingly, a larger proportion of exons included (or skipped) at early stages of reprogramming are predicted to disrupt the open reading frame (ORF) of the transcripts upon exon skipping (or inclusion, respectively), compared to middle/late exons and to the general distribution of mapped alternative exons (Fig. 1H, classification as described in [21]). This suggests a higher impact of AS-mediated on/off regulation of the corresponding protein expression via nonsense-mediated decay (NMD), and a switch to expression of full-length proteins during early steps of reprogramming. Middle clusters, instead, contain more exons predicted to preserve the coding potential of their transcripts, implying modulation of the functions of their encoded protein isoforms rather than on/off switches (Fig. 1H). Consistent with these concepts, while PSI values of cassette exons tend to increase throughout reprogramming (Fig. 1B, blue bars), intron retention—generally leading to NMD—tends to decrease in the course of reprogramming (Fig. 1B, grey bars).
AS changes at intermediate reprogramming stages show commonalities with MEF reprogramming
As a first step to identify key AS events and potential regulators important for reprogramming, we compared our transcriptome analysis of B cell reprogramming with that of MEF reprogramming [17]. The two datasets differ in the experimental design and time frame (compare Figs. 1A and 2A). Specifically, the MEF system required sorting of cells undergoing reprogramming using the SSEA1 surface marker, while this was not necessary for B cell reprogramming due to its efficiency. To facilitate the comparison between the two transcriptome datasets, vast-tools analysis was applied to the MEF dataset [17], which yielded 843 cassette exons differentially spliced (|∆PSI| ≥ 10, range ≥ 5) between any pair of samples of the MEF reprogramming dataset. Despite differences in the experimental setup, 79% of these exons (669 out of 843) were also found to be differentially spliced in B cell reprogramming (Additional file 1: Figure S2A and Additional file 3: Table S2). A similar level of overlap was also observed at the level of other AS events (72% of AS events in general, Additional file 1: Figure S2A). The overlap between differentially spliced exons in the two systems was higher for middle and late AS clusters than for early clusters of B cell reprogramming (Fig. 2B), as might be expected from the convergence on a common program of AS related to pluripotency.
To further compare the two datasets, we performed a Principal Component Analysis (PCA) on the gene expression of the most variable genes in both datasets (3rd quartile coefficient of variation, n = 2679) separating the stages into four distinct groups by k-means clustering (Fig. 2C). These groups clearly distinguish between starting cells, early and late stages of reprogramming and pluripotent cells. The PCA allowed us to outline a “reprogramming pseudotime” which was subsequently used to select AS exons and regulators for functional characterization. It also further highlighted the transition between days 4 and 6 in B cell reprogramming, juxtaposing them with days 7/10 in MEF reprogramming. A heatmap displaying the scaled PSI values of the 669 common differentially spliced exons of the two datasets at the different steps of the reprogramming process shows substantial similarities in exon inclusion (Fig. 2D).
In contrast to the similarities in AS profiles between the two reprogramming systems, we observed that expression of RNA-binding proteins (RBPs) previously associated with pluripotency, somatic cell reprogramming, and/or development [7, 9, 14, 17] and presumably mechanistically linked to different aspects of post-transcriptional regulation, differed significantly between the two datasets (Fig. 2E). Despite these more divergent profiles, hierarchical clustering captured three known functional groups, with cluster (a) containing factors with higher expression in iPS/ES samples (fold change > 0 and high rank score according to 7, right panel) and known to promote pluripotency/reprogramming, such as U2af1 or Srsf2/3 [9, 14]. In contrast, cluster (b) contains factors with higher expression in the starting somatic cells, which includes known repressors of reprogramming such as Mbnl1/2, Celf2, and Zcchc24 [7, 11, 17]. Finally, cluster (c) contains factors with more variegated expression patterns at early and intermediate reprogramming steps (and mildly repressed in iPS/ES cells), including Esrp1/2 [16, 17] (Fig. 2E).
Taken together, our analyses revealed widespread AS changes during B cell reprogramming, which significantly overlapped with those of MEF reprogramming, especially at intermediate phases of the process.
Predicted regulators of alternative splicing during somatic cell reprogramming
To infer potential regulators of exons differentially spliced during B cell reprogramming, we extracted gene expression profiles of 507 RBPs (as annotated in the Uniprot database), which were detectably expressed (cpm ≥ 5 in at least 33% of samples) and featured a minimum of variation across the B cell reprogramming dataset (coefficient of variation ≥ 0.2). Using the membership function of the Mfuzz package, we correlated (positively or negatively) the scaled gene expression profile of each RBP to each AS cluster centroid. This allowed us to derive a list of potential regulators whose changes in levels of expression correlate (or anti-correlate) with the profiles of AS changes in each cluster (membership > 0.3) (Fig. 3, Additional file 1: Figure S3 and Additional file 4: Table S3). In line with previous work [7, 11, 14], our analysis identified known AS regulators involved in the induction or repression of pluripotency/developmental decisions, such as Mbnl1/2, Celf2 (both potential negative regulators of pulse/late clusters 1/6), and U2af1 (potential positive regulator of middle cluster 4) (Fig. 3). Importantly, additional RBPs and splicing factors without previously known functions in reprogramming emerged as possible regulators.
To further define such regulators, we focused on RBPs that change their expression after the C/EBPα pulse or at early stages of reprogramming for functional validation during the induction of pluripotency, as we speculated that these could drive the inclusion/skipping of both early and intermediate AS exons during the reprogramming of C/EBPα-poised B cells. We selected CPSF3 as a putative positive regulator of very early events because its profile of expression increases during B cell and MEF reprogramming in parallel with centroid of AS changes in cluster 1 (Fig. 3A). While CPSF3 was originally described as part of cleavage / polyadenylation complexes [27], more recent work implicated components of this complex on alternative splicing regulation, including numerous internal exons not linked to the selection of alternative polyadenylation sites [28, 29]. Following a similar logic, we chose two putative negative regulators of cluster 1, namely TIA1, a well-described AS regulator with roles in cell proliferation and development (see below), and hnRNP UL1, a member of the heterogeneous ribonucleoprotein (hnRNP) family whose involvement in splicing regulation is largely unexplored (Fig. 3B). Due to experimental difficulties of genetic manipulation of the B cell reprogramming system (see “Discussion”), we modulated their expression at early stages of MEF reprogramming [30] and examined the consequences on the dynamics of pluripotency induction and on relevant AS alterations.
Knockdowns of CPSF3 or hnRNP UL1 repress MEF reprogramming
The Cleavage and Polyadenylation Specificity Factor (CPSF) complex is primarily involved in mRNA polyadenylation, but a number of its components, including CPSF3, has been shown to also play a role in splicing [28, 29, 31,32,33]. Cpsf3 expression increased early during reprogramming of both B cells and MEFs (Figs. 3A and 4A). The expression of Hnrnpul1, an hnRNP whose function in RNA metabolism is poorly understood, decreases in B cells after the C/EBPα pulse and then stabilizes at levels similar to those observed throughout MEF reprogramming (Figs. 3B and 4B).
To test their effects on MEF reprogramming, two different short hairpin RNAs (shRNAs) for each factor were cloned into lentiviral vectors containing a GFP reporter. The protocol used is summarized in Additional file 1: Figure S4A. Briefly, early passage MEFs isolated from reprogrammable mice were transduced with constructs bearing the shRNAs and the cells treated with doxycycline to activate OSKM (day 0). GFP+ cells were FACS-sorted 48 h post-infection and seeded on inactivated MEFs serving as feeder layers, to proceed with reprogramming for up to 14 days. Cells were harvested every other day and samples analyzed by RT-qPCR for the expression of the shRNAs target and of pluripotency markers, and by flow cytometry to quantify the proportion of cells expressing the early pluripotency cell surface marker SSEA1 (appearing around days 5–6) and the later pluripotency marker EPCAM1 (appearing around day 8, [34]). At day 14, 2 days after removing doxycycline, cultures were stained for alkaline phosphatase (AP) activity to identify iPS colonies and to assess the efficiency of reprogramming.
Cells infected with shRNAs targeting Cpsf3 and Hnrnpul1 were compared with non-infected (NI) cells, as well as with cells transduced with a scrambled control shRNA (shSCR). Knockdown efficiency, quantified by RT-qPCR (Fig. 4C, D), showed a 51% and 58% reduction in Cpsf3 at day 6 post-infection with shCPSF3#1 and #2, respectively, becoming slightly less efficient during reprogramming (Fig. 4C). Similarly, knockdown of Hnrnpul1 resulted in 81% and 76% reduction of mRNA levels at day 6 post-infection with shUL1#1 and #2, respectively (Fig. 4D). The increase in the mRNA levels of endogenous Pou5f1 (encoding OCT4) and Nanog pluripotency markers was delayed in both Cpsf3 and Hnrnpul1 knockdown cells (compare for example values at day 8 in Fig. 4E and Additional file 1: Figure S4B), suggesting slower reprogramming kinetics. Consistent with these observations, knockdown of Cpsf3 and Hnrnpul1 reduced the percentage of cells expressing SSEA1 at day 6 (Fig. 4F) and cells expressing both SSEA1 and EPCAM1 at days 10–12 (Fig. 4G and Additional file 1: Figure S4C). Survival of cells during reprogramming did not seem to be affected in the knockdowns because no significant increase in the proportion of DAPI-stained cells was observed throughout reprogramming (Additional file 1: Figure S4D). Finally, we counted the number of AP positive colonies at day 14 and found that the amount was significantly reduced in cells infected with either the shRNAs targeting Cpsf3 or those targeting Hnrnpul1 (Fig. 4H). We finally tested the effects of overexpression of a T7-tagged version of CPSF3 from the beginning of reprogramming (Additional file 1: Figure S4A and E). A trend towards increased expression of Pou5f1 and Nanog at late times of reprogramming upon T7-Cpsf3 overexpression was observed (Additional file 1: Figure S4F). However, this was not accompanied by enhanced reprogramming efficiency or redistribution of reprogramming intermediates (Additional file 1: Figure S4G-I), likely because of additional rate-limiting steps required to increase the efficiency of this complex process or because of suboptimal timing or levels of overexpression achieved in the experiment.
Taken together, these data show that the knockdown of Cpsf3 or Hnrnpul1 reduces the MEF to iPS reprogramming efficiency, therefore arguing that both RBPs contribute to the induction of pluripotency.
Overexpression of TIA1 represses MEF reprogramming
TIA1 is an RBP and AS regulator [35,36,37]. Tia1 mRNA levels decrease early during B cell reprogramming (Figs. 3B and 5A), compatible with a potential role as a repressor of cell reprogramming in this system. Because depletion of TIA1 induces mouse embryonic lethality and the protein is important for MEF proliferation, cell cycle progression, autophagy, and numerous signaling pathways [38], we decided to test the effects of TIA1 overexpression during MEF reprogramming. For this purpose, primary MEFs were infected at day 0 (concomitantly with the induction of reprogramming), with retroviral constructs containing a T7-tagged Tia1 cDNA and a GFP reporter to allow sorting of the transduced cells (Additional file 1: Figure S4A). Levels of Tia1 were quantified by RT-qPCR (24-fold increase compared to cells transduced with an empty vector, Fig. 5B). Consistent with our prediction from its expression profile in B cell reprogramming, overexpression of Tia1 repressed the induction of endogenous Pou5f1 and Nanog genes (Fig. 5C and Additional file 1: Figure S5A) and led to a reduction of early SSEA1+EPCAM1− cells and of late reprogramming intermediates (SSEA1+EPCAM1+ cells) compared to empty vector and non-infected controls (Fig. 5D, E and Additional file 1: Figure S5B). In addition, it significantly reduced the count of AP+ colonies at day 14 post-OSKM induction compared to controls (Fig. 5F), without substantially affecting the viability of reprogramming cells (Additional file 1: Figure S5C). Of note, overexpression of Tia1 delayed the gradual skipping of Lef1 exon 6, a conserved AS event between B cell and MEF reprogramming (Fig. 5G), which might partially explain the observed reduction in reprogramming efficiency.
Taken together, the observed reduced expression of TIA1 during B cell reprogramming and its impairment of MEF reprogramming when overexpressed suggest that it functions as a general repressor of pluripotency induction.
CPSF3, hnRNP UL1, and TIA1 regulate alternative splicing during reprogramming
To assess the effects of CPSF3, hnRNP UL1, and TIA1 manipulation on AS during reprogramming, RNA-seq analyses were carried out with RNAs isolated from MEFs at 0 and at 12 days post-OSKM induction, comparing the effects of Cpsf3/Hnrnpul1 knockdown (two different shRNAs for each factor) or Tia1 overexpression with those of the corresponding shSCR/empty vector controls (Fig. 6A). Quantification of gene expression of Tia1, Cpsf3, and Hnrnpul1 at the two timepoints is shown in Additional file 1: Figures S6A-B.
To determine the impact of these perturbations on AS, we calculated differentially spliced events between day 0 and day 12 in each condition as described before (|∆PSI(day12 − day0)| ≥ 10, range ≥ 5). TIA1-dependent events were defined as those changing during reprogramming only in the control or the overexpression condition, with |∆∆PSI(TIA1 − Empty)| ≥ 10 (colored dots in Fig. 6B). Similarly, CPSF3-dependent and UL1-dependent events were defined as those with |∆∆PSI(average shRNAs − shSCR)| ≥ 10 (only events selected by both specific shRNAs were considered, 54% and 60% overlap respectively, colored dots in Fig. 6C and Additional file 1: Figure S6C). For each RBP, control events, changing during reprogramming but not affected by TIA1, CPSF3, or hnRNP UL1 manipulation, were classified as those differentially spliced between days 0 and 12, either in control or knockdown/overexpression conditions (|∆PSI(day12 − day0)| ≥ 10, range ≥ 5), that display minimal differences between the ∆PSI values in the two conditions (e.g., |∆∆PSI(TIA1 − Empty)| < 2) (grey dots in Fig. 6B, C and Additional file 1: Figure S6C). All sets are summarized in Additional file 5: Table S4.
We thus identified 387 TIA1-dependent events and 558 TIA1-independent events. Similarly, we identified 357 CPSF3-dependent and 298 UL1-dependent events (as well as 429 CPSF3-independent and 662 UL1-independent events). Interestingly, CPSF3-dependent and UL1-dependent events showed a significant overlap (e.g., 70% of UL1-dependent events were also CPSF3-dependent events, of which 98% changed in the same direction) whereas little overlap was observed with TIA1-dependent events (Fig. 6D, left panel). As we found no strong evidence for cross-regulation between CPSF3 and hnRNP UL1 factors (either in AS or in gene expression, Additional file 1: Figure S6B), these data suggest that CPSF3 and hnRNP UL1 contribute to a common program of AS changes relevant for cell reprogramming. Moreover, TIA1-, CPSF3-, and UL1-dependent events detected in MEFs showed high overlap with events that were also differentially spliced during B cell reprogramming, suggesting that these factors regulate AS events that generally contribute to the induction of pluripotency (Fig. 6D, right panel).
Focusing on cassette exon events, TIA1-dependent exons (but not TIA1-independent exons) displayed significantly higher PSI values at day 12 compared to day 0 of reprogramming, as well as a relative increase in exons displaying intermediate PSI values (Fig. 6E, F and Additional file 1: Figure S6E). These effects are altered by TIA1 overexpression (Fig. 6E), suggesting that in this system, the typical role of TIA1 is to repress exon inclusion.
Analysis of sequence features associated with TIA1-regulated exons performed using Matt [39] revealed weaker 5′ splice sites, shorter median length of the flanking downstream introns and a larger difference in GC content between the alternative exons and their flanking upstream introns (Fig. 6G), suggestive of a strong dependence of these exons on the process of exon definition [40]. Notably, an enrichment in putative TIA1 binding motifs was detected about 100 nucleotides upstream of the distal 3′ splice site (Fig. 6H), suggesting that binding of TIA1 to this region might repress inclusion of the alternative exon (or enhance pairing between the distal splice sites), leading to exon skipping.
Gene Ontology (GO) enrichment analysis (GOrilla; [41]) on the set of genes containing TIA1-dependent AS events compared to a background list of all genes containing mapped AS events (n = 11132) showed an enrichment in functions related to peptide/hormone secretion, T-helper cell functions, and embryonic development (Fig. 6I and Additional file 6: Table S5).
CPSF3- and UL1-dependent exons showed similar inclusion patterns: an increased proportion of intermediate PSIs was observed at day 12 in CPSF3- and UL1-dependent but not in CPSF3- and UL1-independent events (Fig. 6J, K and Additional file 1: Figure S6E). CPSF3- and UL1-dependent exons also showed higher PSI values at day 12 compared to day 0, and an enrichment in exons displaying intermediate PSI values at the end of the reprogramming process (Fig. 6J, K and Additional file 1: Figure S6E). These effects were attenuated, specifically for CPSF3- and UL1-dependent exons, upon knockdown of either of these factors (Fig. 6J, K and Additional file 1: Figure S6E), supporting the notion that their regulatory programs overlap. Both CPSF3- and UL1-dependent exons displayed weaker 3′ and 5′ splice sites (Fig. 6L). GO analyses revealed that CPSF3- and UL1-dependent events belong to genes enriched in functional categories related to the regulation of cell morphogenesis, cell-substrate adhesion, and cytoskeleton organization (Fig. 6M and Additional file 6: Table S5). Changes in inclusion levels of selected CPSF3- and UL1-dependent events were validated by semi-quantitative RT-PCR (Additional file 1: Figure S6F).
As CPSF3 was originally described as a cleavage / polyadenylation factor, we asked whether the CPSF3-regulated exons could be the result of changes in alternative polyadenylation (APA) sites. Not unexpectedly, quantification of poly-A usage performed with QAPA [42] revealed hundreds of genes with APA events affected by Cpsf3 knockdown during reprogramming (277 genes) and, interestingly, also many genes with APA events affected by Hnrnpul1 knockdown (237 genes) (|∆∆PAU(shRNA–shSCR) ≥ 10). However, the overlap between the genes / exons whose AS or APA were affected by Cpsf3 or Hnrnpul1 knockdown was very limited (Additional file 1: Figures S6G-H), suggesting that the mechanisms by which CPSF3 and hnRNP UL1 regulate alternative splicing and 3′ end processing are distinct and independent.
Taken together, these results suggest that the AS regulators TIA1, CPSF3, and hnRNP UL1 function in cell reprogramming through their activities on genes relevant for cell fate decisions. Moreover, CPSF3 and hnRNP UL1 act through a highly overlapping program of splicing changes, while TIA1 affects a different set of AS events. Distinct programs of AS and APA were associated with CPSF3 and hnRNP UL1.