- Open Access
Alternative splicing links histone modifications to stem cell fate decision
Genome Biology volume 19, Article number: 133 (2018)
Understanding the embryonic stem cell (ESC) fate decision between self-renewal and proper differentiation is important for developmental biology and regenerative medicine. Attention has focused on mechanisms involving histone modifications, alternative pre-messenger RNA splicing, and cell-cycle progression. However, their intricate interrelations and joint contributions to ESC fate decision remain unclear.
We analyze the transcriptomes and epigenomes of human ESC and five types of differentiated cells. We identify thousands of alternatively spliced exons and reveal their development and lineage-dependent characterizations. Several histone modifications show dynamic changes in alternatively spliced exons and three are strongly associated with 52.8% of alternative splicing events upon hESC differentiation. The histone modification-associated alternatively spliced genes predominantly function in G2/M phases and ATM/ATR-mediated DNA damage response pathway for cell differentiation, whereas other alternatively spliced genes are enriched in the G1 phase and pathways for self-renewal. These results imply a potential epigenetic mechanism by which some histone modifications contribute to ESC fate decision through the regulation of alternative splicing in specific pathways and cell-cycle genes. Supported by experimental validations and extended datasets from Roadmap/ENCODE projects, we exemplify this mechanism by a cell-cycle-related transcription factor, PBX1, which regulates the pluripotency regulatory network by binding to NANOG. We suggest that the isoform switch from PBX1a to PBX1b links H3K36me3 to hESC fate determination through the PSIP1/SRSF1 adaptor, which results in the exon skipping of PBX1.
We reveal the mechanism by which alternative splicing links histone modifications to stem cell fate decision.
Embryonic stem cells (ESCs), the pluripotent stem cells derived from the inner cell mass of a blastocyst, provide a vital tool for studying the regulation of early embryonic development and cell fate decision and hold the promise for regenerative medicine . The past few years have witnessed remarkable progress in understanding the ESC fate decision, i.e. either pluripotency maintenance (self-renewal) or proper differentiation . The underlying mechanisms have been largely expanded from the core pluripotent transcription factors (TFs) , signaling pathways [4,5,6,7,8,9], specific microRNAs [10, 11], and long non-coding RNAs  to alternative pre-messenger RNA (mRNA) splicing (AS) [13, 14], histone modifications (HMs) [15,16,17,18,19], and cell-cycle machinery . These emerging mechanisms suggest their intricate interrelations and potential joint contributions to ESC pluripotency and differentiation, which, however, remain unknown.
Alternative splicing (AS) is one of the most important pre-mRNA processing to increase the diversity of transcriptome and proteome in tissue-dependent and development-dependent manners . The estimates based on RNA-sequencing (RNA-seq) revealed that up to 94%, 60%, and 25% of genes in human, Drosophila melanogaster, and Caenorhabditis elegans, respectively, undergo AS [21,22,23,24,25]. AS also provides a powerful mechanism to control the developmental decision in ESCs [26,27,28]. Specific isoforms are necessary to maintain both the identity and activity of stem cells and switching to different isoforms ensures proper differentiation . In particular, the AS of TFs plays major roles in ESC fate determination, such as FGF4  and FOXP1  for hESC, and Tcf3  and Sall4  for mouse ESCs (mESCs). Understanding the precise regulations on AS would contribute to the elucidation of ESC fate decision and has attracted extensive efforts . For many years, studies aiming to shed light on this process focused on the RNA level, characterizing the manner by which splicing factors (SFs) and auxiliary proteins interact with splicing signals, thereby enabling, facilitating, and regulating RNA splicing. These cis-acting RNA elements and trans-acting SFs have been assembled into splicing code , revealing a number of AS regulators critical for ESC differentiation, such as MBNL  and SON . However, these genetic controls are far from sufficient to explain the faithful regulation of AS , especially in some cases that tissue-specific AS patterns exist despite the identity in sequences and ubiquitous expression of involved SFs [36, 37], indicating additional regulatory layers leading to specific AS patterns. As expected, we are increasingly aware that splicing is not an isolated process; rather, it occurs co-transcriptionally and is presumably also regulated by transcription-related processes. Emerging provocative studies have unveiled that AS is subject to extensive controls not only from genetic but also epigenetic mechanisms due to its co-transcriptional occurrence . The epigenetic mechanisms, such as HMs, benefit ESCs by providing an epigenetic memory for splicing decisions so that the splicing pattern could be passed on during self-renewal and be modified during differentiation without the requirement of establishing new AS rules .
HMs have long been thought to play crucial roles in ESC maintenance and differentiation by determining what parts of the genome are expressed. Specific genomic regulatory regions, such as enhancers and promoters, undergo dynamic changes in HMs during ESC differentiation to transcriptionally permit or repress the expression of genes required for cell fate decision . For example, the co-occurrence of the active (H3K4me3) and repressive (H3K27me3) HMs at the promoters of developmentally regulated genes defines the bivalent domains, resulting in the poised states of these genes . These poised states will be dissolved upon differentiation to allow these genes to be active or more stably repressed depending on the lineage being specified, which enables the ESCs to change their identities . In addition to above roles in determining transcripts abundance, HMs are emerging as major regulators to define the transcripts structure by determining how the genome is spliced when being transcribed, adding another layer of regulatory complexity beyond the genetic splicing code . A number of HMs, such as H3K4me3 , H3K9me3 , H3K36me3 [44, 45], and hyperacetylation of H3 and H4 [46,47,48,49,50], have been proven to regulate AS by either directly recruiting chromatin-associated factors and SFs or indirectly modulating transcriptional elongation rate . Together, these studies reveal that HMs determine not only what parts of the genome are expressed, but also how they are spliced. However, few studies focused on the detailed mechanisms, i.e. epigenetic regulations on AS in the context of cell fate decision.
Additionally, cell-cycle machinery dominates the mechanisms underlying ESC pluripotency and differentiation [20, 51]. Changes of cell fates require going through the cell-cycle progression. Studies in mESCs  and hESCs [53, 54] found that the cell fate specification starts in the G1 phase when ESCs can sense differentiation signals. Cell fate commitment is only achieved in G2/M phases when pluripotency is dissolved through cell-cycle-dependent mechanisms. However, whether the HMs and AS and their interrelations are involved in these cell-cycle-dependent mechanisms remains unclear. Therefore, it is intuitive to expect that HMs could contribute to ESC pluripotency and differentiation by regulating the AS of genes required for specific processes, such cell-cycle progression. Nevertheless, we do not even have a comprehensive view of how HMs relate to AS outcome at a genome-wide level during ESC differentiation. Therefore, further studies are required to elucidate the extent to which the HMs are associated with specific splicing repertoire and their joint contributions to ESC fate decision between self-renewal and proper differentiation.
To address these gaps in current knowledge, we performed genome-wide association studies between transcriptome and epigenome of the differentiation from the hESCs (H1 cell line) to five differentiated cell types . These cells cover three germ layers for embryogenesis, adult stem cells, and adult somatic cells, representing multiple lineages of different developmental levels (Additional file 1: Figure S1A). This carefully selected dataset enabled our understanding of AS epigenetic regulations in the context of cell fate decision. First, we identified several thousands of AS events that are differentially spliced between the hESCs and differentiated cells, including 3513 mutually exclusive exons (MXE) and 3678 skipped exons (SE) which were used for further analyses. These hESC differentiation-related AS events involve ~ 20% of expressed genes and characterize the multiple lineage differentiation. Second, we profiled 16 HMs with chromatin immunoprecipitation sequencing (ChIP-seq) data available for all six cell types, including nine types of acetylation and seven types of methylation. Following the observation that the dynamic changes of most HMs are enriched in AS exons and significantly different between inclusion-gain and inclusion-loss exons, we found that three of the 16 investigated HMs (H3K36me3, H3K27ac, and H4K8ac) are strongly associated with 52.8% of hESC differentiation-related AS exons. We then linked the association between HMs and AS to cell-cycle progression based on the additional discovery that the AS genes predominantly function in cell-cycle progression. More intriguingly, we found that HMs and AS are associated in G2/M phases and involved in ESC fate decision through promoting pluripotency state dissolution, repressing self-renewal, or both. In particular, with experimental valuations, we demonstrated an H3K36me3-regulated isoform switch from PBX1a to PBX1b, which is implicated in hESC differentiation by attenuating the activity of the pluripotency regulatory network. Collectively, we presented a mechanism conveying the HM information into cell fate decision through the regulation of AS, which will drive extensive studies on the involvements of HMs in cell fate decision via determining the transcript structure rather than only the transcript abundance.
AS characterizes hESC differentiation
The role of AS in the regulation of ES cell fates adds another notable regulatory layer to the known mechanisms that govern stemness and differentiation . To screen the AS events associated with ES cell fate decision, we investigated a panel of RNA-seq data during hESC (H1 cell line) differentiation . We considered four cell types directly differentiated from H1 cells, including trophoblast-like cells (TBL), mesendoderm (ME), neural progenitor cells (NPC), and mesenchymal stem cells (MSC). We also considered IMR90, a cell line for primary human fetal lung fibroblast, as an example of terminally differentiated cells. These cells represent five cell lineages of different developmental levels (Additional file 1: Figure S1A). We identified thousands of AS events of all types with their changes of “per spliced in” (ΔPSIs) are > 0.1 (inclusion-loss) or < − 0.1 (inclusion-gain), and with the false discovery rates (FDRs) are < 0.05 based on the measurement used by rMATS  (Additional file 1: Figure S1B and Table S1, see “Methods”). We implemented further analyses only on the most common AS events, including 3513 MXEs and 3678 SEs, which are referred to as hESC differentiation-associated AS exons (Additional file 1: Figure S1C and Additional file 2: Table S2).
These hESC differentiation-related AS exons possess typical properties, as previously described [57, 58], as follows: (1) most of their hosting genes are not differentially expressed between hESCs and differentiated cells (Additional file 1: Figure S1D); (2) they tend to be shorter with much longer flanking introns compared to the average length of all exons and introns (RefSeq annotation), respectively (Additional file 1: Figure S2A, B); (3) the arrangement of shorter AS exons surrounded by longer introns is consistent across cell lineages and AS types (Additional file 1: Figure S2C, D); and (4) the lengths of AS exons are more often divisible by three to preserve the reading frame (Additional file 1: Figure S2E).
During hESC differentiation, about 20% of expressed genes undergo AS (2257 genes for SE and 2489 genes for MXE), including previously known ESC-specific AS genes, such as the pluripotency factor FOXP1  (Fig. 1a) and the Wnt/β-catenin signalling component CTNND1  (Fig. 1b). These hESC differentiation-related AS genes include many TFs, transcriptional co-factors, chromatin remodelling factors, housekeeping genes, and bivalent domain genes implicated in ESC pluripotency and development  (Fig. 1c and Additional file 1: Figure S1C). Enrichment analysis based on a stemness gene set  also shows that hESC differentiation-related AS genes are enriched in the regulators or markers that are most significantly associated with stemness signatures of ESCs (Additional file 1: Figure S3A, see “Methods”).
Clustering on AS events across cell lineages show lineage-dependent splicing patterns (Fig. 1d). Upon hESC differentiation, the SE exons tend to lose their inclusion levels (inclusion-loss), while the upstream exons of MXE events are likely to gain their inclusion levels (inclusion-gain) (Fisher’s exact test, p = 3.83E-107). The numbers of AS events increase accordingly with the developmental level following hESC differentiation (Fig. 1c). For example, the differentiation to ME involves the fewest AS events and ME presents the most stem-cell-like AS profiles, while the IMR90 has the most AS events and exhibits the most similar AS profiles to adult cells (Fig. 1c, d). Inter-lineage comparisons show, on average, that 42.0% of SE and 56.4% of MXE events (Fig. 1c, d and Additional file 1: Figure S3B, C), involved in 29.6% and 38.6% of AS hosting genes (Fig. 1e, f and Additional file 1: Figure S3D, E), are lineage-specific. In contrast, only 0.65% of SE and 0.14% of MEX events (Additional file 1: Figure S3B, C), involved in 0.49% and 1.52% of AS hosting genes, are shared by all lineages (Fig. 1e, f and Additional file 1: Figure S3D, E). Similar trends are observed from pairwise comparisons (Additional file 1: Figure S3F). Furthermore, one-third of AS genes (n = 881) have both MXE and SE events (Additional file 1: Figure S3G). Only four genes are common across all cell lineages and AS types, of which the AS events of Ctnnd1 and Mbd1 have been reported to regulate mESC differentiation . Together, these results demonstrate that AS depicts lineage-dependent and developmental level-dependent characterizations of hESC differentiation.
Dynamic changes of HMs predominantly occur in AS exons
In ESCs, epigenetic mechanisms contribute mainly to maintaining the expression of pluripotency genes and the repression of lineage-specific genes in order to avoid exiting from stemness. Upon differentiation, epigenetic mechanisms orchestrate the expression of developmental programs spatiotemporally to ensure the heritability of existing or newly acquired phenotypic states. Though epigenetic signatures are mainly found to be enriched in promoters and enhancers, it has become increasingly clear that they are also present in gene bodies, especially in exon regions, implying a potential link of epigenetic regulation to pre-mRNA splicing [60, 61]. Consistent with previous reports [36, 37, 62], we also observed that few involved SFs are differentially expressed during H1 cells differentiation (Additional file 1: Figure S3H, see “Methods”), which confirms the existence of an additional layer of epigenetic regulations on AS. However, the extents to which the AS is epigenetically regulated and how these AS genes contribute to the cell fate decision are poorly understood. We focused on 16 HMs, including nine histone acetylation and seven histone methylation that have available data in all six cell types (see “Methods”) and aimed to reveal their associations with AS genes during hESC differentiation.
To investigate whether the dynamic changes of these HMs upon cell differentiation prefer the AS exons consistently (Fig. 2a, b), we profiled the differential HM patterns of around the hESC differentiation-associated AS exons and the same number of randomly selected constitutive splicing (CS) exons of the same AS genes for each differentiation lineage. We compared the changes of ChIP-seq reads count (normalized Δ reads count, see “Methods”) in ± 150-bp regions around the splice sites upon hESC differentiation (Fig. 2c and Additional file 1: Figure S4, see “Methods”). Except for a small part of cases (with black dots or boxes in Fig. 2d), most HMs changed more significantly around AS exons than around constitutive exons upon hESC differentiation (Mann–Whitney–Wilcoxon test, p ≤ 0.05, Fig. 2d and Additional file 1: Figure S4). Nevertheless, some HMs displayed strong links to AS, such as H3K79me1 and H3K36me3, while others only had weak link strengths, such as H3K27me3 and H3K9me3 (Fig. 2d). This result is consistent with the fact that the former are involved in active expression and AS regulation [38, 44, 63], while the latter are the epigenetic marks of repressed regions and heterochromatin . The link strengths are presented as the -log10 p values to test whether the HM changes consistently prefer the AS exons across different cell lineages and AS types (Fig. 2d sidebar graph, see “Methods”). Taken together, these results, from a global view, revealed a potential regulatory link from HMs to RNA splicing, of which some are strong while the others are weak.
Three HMs are significantly associated with AS upon hESC differentiation
To quantitatively associate the HMs with AS, all ChIP-seq data were processed for narrow peak calling using MACS2 . For each AS exon of each differentiation lineage, we then quantified the differential inclusion levels, i.e. the changes of “percent splice in” (ΔPSIs, Additional file 1: Figure S1B), and the differential HMs signals, i.e. the changes of normalized narrow peak height of ChIP-seq (ΔHMs, Additional file 1: Figure S5A, see “Methods”) between H1 and differentiated cells. We observed significant differences in all HM profiles (except H3K27me3, Additional file 1: Figure S5B) between the inclusion-gain and inclusion-loss exons across cell lineages and AS types (Mann–Whitney–Wilcoxon test, p ≤ 0.05) (Fig. 3a and Additional file 1: Figure S5B). However, three independent correlation tests showed only weak global quantitative associations between the ΔPSIs and ΔHMs for some HMs (Fig. 3c and Additional file 1: Figure S5C), including eight HMs for MXE AS exons and eight HMs for SE AS exons. The weak associations may indicate that only subsets of AS exons are strongly associated with HMs and vice versa, which is consistent with a recent report .
To explore the subsets of highly associated AS exons and corresponding HMs, we performed k-means clustering on the sets of inclusion-gain and inclusion-loss exons of SE and MXE events, separately, taking the ΔHMs of eight identified HMs as epigenetic features (Fig. 3c and Additional file 1: Figures S5D and S6, see “Methods”). We obtained three subsets of HM-associated SE exons and three subsets of HM-associated MXE exons (Additional file 3: Table S3). The three HM-associated SE subsets include 180, 664, and 1062 exons and are negatively associated with H4K8ac (Additional file 1: Figure S6), negatively associated with H3K36me3 (Fig. 3c), and positively associated with H3K36me3 (Additional file 1: Figure S6), respectively. The three HM-associated MXE subsets include 99, 821, and 971 exons and are positively associated with H3K27ac (Fig. 3d), negatively associated with H3K36me3 (Additional file 1: Figure S6), and positively associated with H3K36me3 (Additional file 1: Figure S6), respectively. The exons of each subset show significant correlations between their ΔPSIs and ΔHMs upon hESC differentiation (Fig. 3d). These HM-associated AS exons account for an average of 52.8% of hESC differentiation-related AS events, on average (Additional file 1: Figure S5E).
Of the three AS-associated HMs, H3K36me3 has both positive and negative correlations with AS exons. This is consistent with the fact that H3K36me3 has dual functions in regulating AS through two different chromatin-adapter systems, PSIP1/SRSF1  and MRG15/PTBP1 . The former increases the inclusion levels of targeting AS exons, whereas the latter decreases the inclusion levels . As expected, 139 and 11 of our identified H3K36me3-associated AS genes have been reported to be regulated by SRSF1 [67, 68] (Additional file 1: Figure S5F) and PTBP1  (Additional file 1: Figure S5G), respectively. Taken together, our analysis showed that more than half (52.8%) of hESC differentiation-associated AS events are significantly associated with three of 16 HMs during hESC differentiation, including H3K36me3, H3K27ac, and H4K8ac.
HM-associated AS genes predominantly function in G2/M phases to facilitate hESC differentiation
Epigenetic mechanisms have been proposed to be dynamic and play crucial roles in human ESC differentiation [15, 16]. Given the aforementioned associations between HMs and AS, and the well-established links between AS and hESC differentiation, we hypothesized that the three HMs (H3K36me3, H3K27ac, and H4K8ac) may contribute to stem cell differentiation through their associated AS events. To test our hypothesis and gain more insights into the differences between the HM-associated and HM-unassociated AS events, we performed comparative function analyses between their hosting genes, revealing that HMs are involved in alternatively splicing the core components of cell-cycle machinery and related pathways to regulate stem cell pluripotency and differentiation.
We found that HMs prefer to be associated with even shorter AS exons (Additional file 1: Figure S7A, p < 0.001, Student’s t-test), though AS exons are shorter than the average length of all exons (Additional file 1: Figure S2A). HM-associated genes (n = 2125) show more lineage specificity, i.e. more genes (49.76% vs 29.6% of MXE or 38.6% of SE genes) are lineage-specific (Additional file 1: Figures S7B and S3D, E), regardless of whether IMR90 is included or not (Additional file 1: Figure S7C). Only a few HM-associated genes are shared by different cell lineages, even in pairwise comparisons (Additional file 1: Figure S7D); the most common shared genes are lineage-independent housekeeping genes (Additional file 1: Figure S7E). These suggest that HM-associated AS genes contribute more to lineage specificity. In addition, the HM-associated AS genes (966 of 2125) are more enriched in stemness signatures than unassociated AS genes (429 of 1057) (Fig. 4a). TF binding enrichment analysis shows that HM-associated AS genes are likely to be regulated by TFs involved in cell differentiation, whereas HM-unassociated AS genes are likely to be regulated by TFs involved in cell proliferation and growth (Fig. 4b). All these results suggest that HM-associated and HM-unassociated AS genes function differently during hESC differentiation.
Gene Ontology (GO) enrichment analysis shows that more than half of the HM-associated AS genes (1120 of 2125) function in cell-cycle progression and exhibit more significant enrichment than do HM-unassociated AS genes (376 of 1057, Fig. 4c, d and Additional file 1: Figure S8A). The significance of the top enriched GO term (GO:0007049, cell cycle) is consistent across cell lineages, although HM-associated AS genes exhibit more lineage specificity and few of them are shared among lineages (Additional file 1: Figures S7B–D and S8B). These results suggest the involvement of HMs in AS regulation of the cell-cycle machinery that has been reported to be exploited by stem cells to control their cell fate decision .
Further study of the top enriched cell-cycle AS genes (Fig. 4d and Additional file 1: Figure S8A) shows that HM-associated (n = 282) and HM-unassociated AS genes (n = 150) play roles in different cell-cycle phases and related pathways. The former is prone to function in G2/M phases and DNA damage response (Fig. 4e, f). This indicates that HMs contribute to cell differentiation, at least partially, via AS regulations in these phases, which is consistent with the fact that inheritance of HMs in daughter cells occurs during the G2 phases . The latter play roles in G1 phase, cell-cycle arrest, and Wnt/β-catenin signalling (Additional file 1: Figure S8C, D). Since cell fate choices seem to occur or at least be initiated during G1/S transition , while cell fate commitment is achieved in G2/M , it could be rational for stem cells to change their identity during the G2 phase when HMs are reprogrammed .
Intriguingly, the top enriched pathway of HM-associated AS genes is “ATM/ATR-mediated DNA damage response,” which is activated in S/G2 phases and has been recently reported as a gatekeeper of the pluripotency state dissolution (PSD) that participates in allowing hESC differentiation . Together with our previous results , it suggests the presence of a combinational mechanism involving HMs and AS, wherein HMs facilitate the PSD and cell fate commitment by alternatively splicing the key components of the ATM/ATR pathway. Additionally, many cell-cycle TF genes are involved in the top enriched HM-associated AS gene set. The pre-B-cell leukaemia transcription factor 1 (PBX1) is one of these genes that contribute to cell-cycle progression and is discussed later in next section. Taken together, we suggest that three of 16 HMs function in positive or negative ways affect the AS of subsets of genes and further contribute to hESC differentiation in a cell-cycle phase-dependent manner. The results suggest a potential mechanistic model connecting the HMs, AS regulations, and cell-cycle progression with the cell fate decision.
Splicing of PBX1 links H3K36me3 to hESC fate decision
The past few years have identified key factors required for maintaining the pluripotent state of hESCs [70, 71], including NANOG, OCT4 (POU5F1), SOX2, KLF4, and c-MYC, the combination of which was called Yamanaka factors and sufficient to reprogram somatic cells into induced pluripotent stem cells (iPSCs) . These factors appear to activate a transcriptional network that endows cells with pluripotency . The above integrative analyses showed strong links between three HMs and RNA splicing, revealing a group of epigenetic regulated AS genes involved in cell-cycle machinery. PBX1 was one of the genes that their ASs are positively associated with H3K36me3 (Fig. 5a, b). Its protein is a member of the TALE (three-amino acid loop extension) family homeodomain transcription factors [74, 75] and well-known for its functions in lymphoblastic leukaemia [76,77,78,79] and several cancers [80,81,82,83,84,85,86,87,88,89]. PBX1 also plays roles in regulating developmental gene expression , maintaining stemness and self-renewal [80, 91, 92], and promoting the cell-cycle transition to the S phase . Additionally, multiple lines of evidence obtained from in vivo and in vitro highlighted its functions as a pioneer factor [86, 94]. However, few studies have distinguished the distinct functions of its different isoforms.
PBX1 has three isoforms , including PBX1a, PBX1b, and PBX1c (Fig. 5c and Additional file 1: Figure S9A). PBX1a and PBX1b are produced by the AS of exon 7 (Fig. 5a) and attract most of the research attention of PBX1. PBX1b retains the same DNA-binding homeodomain as PBX1a, but changes 14 amino acids (from 334 to 347) and truncates the remaining 83 amino acids at the C-terminus of PBX1a (Fig. 5c and Additional file 1: Figure S9A). This C-terminal alteration of PBX1a has been reported to affect its cooperative interactions with HOX partners , which may impart different functions to these two isoforms. We here revealed its H3K36me3-regulated isoform switch between PBX1a and PBX1b, which functions at the upstream of pluripotency transcriptional network to link H3K36me3 with ESC fate decision.
We first observed differential transcript expressions of these two isoforms between the hESCs and differentiated cells, wherein PBX1a was predominantly transcribed in hESCs, while PBX1b was predominantly induced in differentiated cells (Fig. 5a and Additional file 1: Figure S9B). The same trend was also observed in an extended dataset of 56 human cell lines/tissues (Fig. 5d) from the Roadmap  and ENCODE  projects (Additional file 4: Table S4). Additionally, we did not observe significantly different expression of the total PBX1 and three other PBX family members across cell types (Additional file 1: Figure S9C, fold change < 2), indicating that the isoform switch of PBX1, rather than the differential expression of its family members, plays more important roles during hESC differentiation. To further test the possible mechanism by which PBX1b contributes to stem cell differentiation, we investigated the transcription levels of Yamanaka factors. Of these TFs, the NANOG is activated by direct promoter binding of PBX1 and KLF4, which is essential for stemness maintenance [91, 99]. Consistently, all these core factors are repressed in differentiated cells where PBX1b is highly expressed (Additional file 1: Figure S9D–G), even though the PBX1a is expressed. Based on the 56 human cell lines/tissues, we also observed significant negative correlations between expression of most important pluripotent factors (NANOG and OCT4) and PBX1b (Fig. 5e), as well as positive correlations between these two factors and PBX1a (or inclusion level of exon 7, Additional file 1: Figure S10A, B). Consistent with previous reports showing that the PBX1a and PBX1b differ in their ability to activate or repress the expression of reporter genes [100, 101], we hypothesize that PBX1a promotes the activity of the pluripotent regulatory network by promoting the expression of NANOG, whereas PBX1b may attenuate this activity by competitively binding and regulating the same target gene, since PBX1b retains the same DNA-binding domain as PBX1a. These observations are strongly suggestive that the switch from PBX1a to PBX1b is a mechanism by which PBX1 contributes to hESC differentiation via regulating the pluripotency regulatory network.
Exon 7 of PBX1 shows significantly positive correlations between its inclusion levels (PSIs) and the surrounding epigenetic signals of H3K36me3 in hESCs and differentiated cells (Fig. 5b). It suggests a potential role of H3K36me3 in regulating the isoform switch between PBX1a and PBX1b. To investigate the regulatory role of H3K36me3, we focused on two previously proved chromatin-adaptor complexes, MRG15/PTBP1  and PSIP1/SRSF1 , which endow dual functions to H3K36me3 in AS regulation . Based on the 56 cell lines/tissues from the Roadmap/ENCODE projects, we first found significant positive correlations between the expressions of PBX1a (or inclusion level of exon 7) and PSIP1/SRSF1 (Fig. 5f), but not with MRG15/PTBP1 (Additional file 1: Figure S10C, D). This result suggests that the AS of PBX1 is epigenetic regulated by H3K36me3 through the PSIP1/SRSF1 adaptor system, which was strongly supported by a recent report using the HeLa cell lines . The overexpression of SRSF1 in Hela cells introduces a PSI increase of 0.18 for exon 7 of PBX1 (chr1: 164789308–164,789,421 based on NCBI37/hg19 genome assembly) based on the RNA-seq (Table S1 of ). Additionally, this exon was one of the 104 AS exons that were further validated using radioactive reverse transcription polymerase chain reaction (RT-PCR) (Table S2 of ). Their results showed that exon 7 of PBX1 is indeed a splicing target of SRSF1, supporting our conclusions.
We then validated the above hypotheses on MSCs and IM90 cells, since these two cells types show the most significant difference from H1 cells regarding our hypotheses (Fig. 5b). We cultured H1 cells, IMR90 cells, and induced H1 cells to differentiate into MSCs (H1-MSCs, see “Methods” for details). Additionally, we also included other two sources of MSCs, including one derived from human bone marrow (hBM-MSCs) and the other derived from adipose tissue (hAT-MSCs) (see “Methods” for details). Consistent with the results from RNA-seq, the same expression patterns of Yamanaka factors in H1, MSCs, and IMR90 cells were observed using quantitative RT-PCR (qRT-PCR) and western blot (Fig. 6a), which confirmed the pluripotent state of H1 cells and the differentiated states of other cell types. We then detected the isoform switch from PBX1a to PBX1b in our cultured cells, which are consistent both in mRNA and protein levels (Fig. 6b and Additional file 1: Figure S10E) and further confirmed by the western blot using PBX1b-specific antibody (anti-PBX1b) (Fig. 6b bottom and Additional file 1: Figure S10E iii). These results have verified that the PBX1b was significantly induced in differentiated cells, where the PBX1a was significantly reduced.
We also validated the mechanism by which the splicing of PBX1 links H3K36me3 to stem cell fate decision. We first confirmed that PBX1b also binds to the promoter of NANOG at the same region where PBX1a binds to and the binding signals (ChIP-PCR) were high in the differentiated cells but very low in H1 stem cells (Fig. 6c i and Additional file 1: Figure S10F i). Consistent with the results from ChIP-seq, we also observed reduced H3K36 tri-methylation around exon 7 of PBX1 based on ChIP-PCR assay (Fig. 6c ii and Additional file 1: Figure S10F ii). Furthermore, the chromatin factor PSIP1 only showed high binding signal in H1 stem cells (Fig. 6c iii and Additional file 1: Figure S10F iii), which recruit the SF SRSF1 to the PBX1 exclusively in H1 stem cells (Fig. 6d and Additional file 1: Figure S10G) even though the physical binding between these two factors were universally detected in all cell types (Fig. 6e). All these experimental results suggested that, upon differentiation, stem cells reduced the H3K36 tri-methylation and may attenuate the recruitment of PSIP1/SRSF1 adaptor around exon 7 of PBX1, leading to the exclusion of exon 7 and highly expressed PBX1b in differentiated cells. High expression of PBX1b may attenuate the activity of PBX1a in promoting the pluripotency regulatory network.
Taken together, we suggested that H3K36me3 regulates the AS of PBX1 via the PSIP1/SRSF1 adaptor system, leading the isoform switch from PBX1a to PBX1b during hESC differentiation. Subsequently, PBX1b competitively binds to NANOG and abolishes the bindings of PBX1a. This competitive binding attenuates the pluripotency regulatory network to repress self-renewal and consequently facilitate differentiation (Fig. 6f). These findings revealed how the PBX1 contributes to cell fate decision and exemplify the mechanism by which AS links HMs to stem cell fate decision.
ESCs provide a vital tool for studying the regulation of early embryonic development and cell fate decision. . In addition to the core pluripotency regulatory network, emerging evidence revealed other processes regulating ESC pluripotency and differentiation, including HMs, AS, cell-cycle machinery, and signalling pathways . Here, we connected these previously separate avenues of investigations, beginning with the discovery that three of 16 HMs are significantly associated with more than half of AS events upon hESC differentiation. Further analyses implicated the association of HMs, AS regulation, and cell-cycle progression with hESC fate decision. Specifically, HMs orchestrate a subset of AS outcomes that play critical roles in cell-cycle progression via the related pathways, such as ATM/ATR-mediated DNA response , and TFs, such as PBX1 (Additional file 1: Figure S10H). In this way, HMs, AS regulation, and signalling pathways are converged into the cell-cycle machinery that has been claimed to rule the pluripotency .
Although epigenetic signatures, such as HMs, are mainly enriched in promoters and other intergenic regions, it has become increasingly clear that they are also present in the gene body, especially in exon regions. This indicates a potential link between epigenetic regulation and the predominantly co-transcriptional occurrence of AS. Thus far, H3K36me3 [44, 45], H3K4me3 , H3K9me3 , and the acetylation of H3 and H4 [46,47,48,49,50] have been revealed to regulate AS, either via the chromatin-adapter systems or by altering Pol II elongation rate. Here, we investigated the extent to which the HMs could associate with AS by integrative analyses on both transcriptome and epigenome data during hESC differentiation. We found that three HMs are significantly associated with about half of AS events. By contrast, a recent report showed that only about 4% of differentially regulated exons among five human cell lines are positively associated with three promoter-like epigenetic signatures, including H3K9ac, H3K27ac, and H3K4m3 . Like that report, we also found a positive association of H3K27ac with a subset of AS events. However, our results differ regarding the other two HMs that we identified to be associated with AS.
In our study, H3K36me3 is associated with the most identified HM-associated AS events, either positively or negatively. It is reasonable since H3K36me3 is a mark for actively expressed genomes  and it has been reported to have dual roles in AS regulations through two different chromatin-adapter systems, PSIP1/SRSF1  and MRG15/PTBP1 . SRSF1 is a SF which will increase the inclusion of targeted AS exons and PTBP1 will decrease the inclusion levels of the regulated AS exons. Therefore, the exons that are regulated by the PSIP1/SRSF1 adapter system will show positive correlations with the H3K36me3, while the exons regulated by MRG15/PTBP1 will show negative correlations. Our results are consistent with this fact and show both direction correlations between different sets of AS events and H3K36me3. Many of these AS events in our study have been validated by other studies (Additional file 1: Figure S5F, G).
H4K8ac is associated with the fewest number of AS events in our results. Although rarely studied, H4K8ac is known to act as a transcriptional activator in both the promoters and transcribed regions . Its negative association with AS is supported by the finding that it recruits proteins involved in increasing the Pol II elongation rate . This suggests that H4K8ac may function in AS regulation by altering the Pol II elongation rate, rather than via the chromatin-adaptor systems. However, further studies are required. Collectively, possible reasons for the different results between this study and others  could be that: (1) we considered differentiation from hESCs to five different cell types, which covered more inter-lineage AS events than the previous report; or (2) different sets of epigenetic signatures were considered, which may lead to relatively biased results in both studies. Obviously, the inclusion of more cell lineages and epigenetic signatures may reduce this bias. Therefore, an extended dataset of 56 cell lines/tissues was included in our study and the observations support our results.
Our study also extended the understanding that HMs contribute to cell fate decision via determining not only what parts of the genome are expressed, but also how they are spliced . We demonstrated that the HM-associated AS events have a significant impact on cell fate decision in a cell-cycle-dependent manner. The most intriguing discovery is that the HM-associated genes are enriched in G2/M phases and predominantly function in ATM/ATR-mediated DNA response. Evidentially, the ATM/ATR-mediated checkpoint has been recently revealed to attenuate pluripotency state dissolution and serves as a gatekeeper of the pluripotency state through the cell cycle . The cell cycle has been considered the hub machinery for cell fate decision  since all commitments will go through the cell-cycle progression. Our study expanded such machinery by linking the HMs and AS regulation to cell-cycle pathways and TFs, which, together, contribute to cell fate decision (Additional file 1: Figure S10H).
We also exemplified our hypothesized mechanism by an H3K36me3-regulated isoforms switch of PBX1. In addition to its well-known functions in lymphoblastic leukaemia [76,77,78,79] and a number of cancers [80,81,82,83,84,85,86,87,88,89], PBX1 was also found to promote hESC self-renewal by corporately binding to the regulatory elements of NANOG with KLF4 . We found that the transcriptions of two isoforms of PBX1, PBX1a and PBX1b, are regulated by H3K36me3 during hESC differentiation. Their protein isoforms competitively bind NANOG and the binding of PBX1b will abolish the binding of PBX1a, which further attenuates the activity of the core pluripotency regulatory network composed of Yamanaka factors. The switch from PBX1a to PBX1b is modulated by H3K36me3 via the PSIP1/SRSF1 adapter system . Our results were also supported by an extended dataset of 56 cell lines/tissues from the Roadmap/ENCODE projects. Collectively, our findings expanded understanding of the core transcriptional network by adding a regulatory layer of HM-associated AS (Fig. 6f).
A very recent report showed that the switch in Pbx1 isoforms was regulated by Ptbp1 during neuronal differentiation in mice , indicating a contradiction that the AS of PBX1 should be negatively regulated by H3K36me3 via the MRG15/PTBP1 . Our study also included the neuronal lineage and showed that differentiation to NPC is an exception, distinct from other lineages. If NPC is considered separately, the results are consistent with the recent report  showing that NPCs and mature neurons express increasing levels of PBX1a rather than PBX1b (Additional file 1: Figure S9B). Another recent report showed that PBX1 was a splicing target of SRSF1 in the HeLa cell line , which strongly supports our findings. Taken together, these evidence suggests that there are two parallel mechanisms regulating PBX1 isoforms in embryonic development, in which neuronal differentiation adopts a mechanism that is different from other lineages.
Finally, it is worth noting that both our work and other studies  reported that HMs cannot explain all AS events identified either during ESC differentiation or based on pairwise comparisons between cell types. Moreover, bidirectional communication between HMs and AS has been widely reported. For instance, the AS can enhance the recruitment of H3K36 methyltransferase HYPB/Set2, resulting in significant differences in H3K36me3 around the AS exons . These findings increased the complexity of defining the cause and effect between HMs and AS. Nevertheless, our findings suggest that at least a subset of AS events are regulated epigenetically, similar to the way that epigenetic states around the transcription start sites define what parts of the genome are expressed. Additionally, as we described in our previous study, the AS outcomes may be estimated more precisely by combining splicing cis-elements and trans-factors (i.e. genetic splicing code) and HMs (i.e. epigenetic splicing code), as an “extended splicing code” . Taken together, we presented a mechanism conveying the HM information into cell fate decision through the AS of cell-cycle factors or the core components of pathways that controlling cell-cycle progression (Additional file 1: Figure S10H).
We performed integrative analyses on transcriptome and epigenome data of the hESCs, H1 cell line, and five differentiated cell types, demonstrating that three of 16 HMs were strongly associated with half of AS events upon hESC differentiation. We proposed a potential epigenetic mechanism by which some HMs contribute to ESC fate decision through the AS regulation of specific pathways and cell-cycle genes. We experimentally validated one cell-cycle-related transcription factor, PBX1, which demonstrated that AS provides a mechanism conveying the HM information into the regulation of cell fate decisions (Fig. 6f). Our study will have a broad impact on the field of epigenetic reprogramming in mammalian development involving splicing regulations and cell-cycle machinery.
Identification of AS exons upon hESC differentiation
Aligned BAM files (hg18) for all six cell types (H1, ME, TBL, MSC, NPC, and IMR90) were downloaded from the source provided in reference. Two BAM files (replicates) of each cell type were analyzed using rMATS (version 3.0.9)  and MISO (version 0.4.6) . The rMATS was used to identify AS exons based on the differential PSI (Ψ) values between each differentiated cell type and H1 cells (Additional file 1: Figure S1B). The splicing changes (ΔPSIs or ΔΨ) are used to identify the AS events between H1 and other cell types. A higher cutoff is always helpful in reducing the false positive while compromising the sensitivity. The cutoff, |ΔΨ| ≥ 0.1 or |ΔΨ| ≥ 10%, is widely accepted and used in AS identification [25, 106,107,108]. Many other studies even used 0.05 as the cutoff [109,110,111,112,113]. We did additional correlation analyses based on different ΔPSI cutoffs (0.1, 0.2, 0.3, 0.4, and 0.5). With the increase of the cutoffs, the number of AS events was significantly reduced (Additional file 1: Figure S11A); however, the correlations ware only slightly increased between AS and some HMs (Additional file 1: Figure S11B, C, upper panels), i.e. no consistent impacts of the cutoffs on the correlations were observed. Similarly, the correlation significances were also not consistently affected (Additional file 1: Figure S11B, C, lower panels). Therefore, in our study, only AS exons which hold the |ΔPSI| ≥ 0.1, p value ≤ 0.01, and FDR ≤ 5%, were considered as final hESC differentiation-related AS exons.
All identified AS event types are summarized in Additional file 1: Table S1. Finally, two types of AS exons, namely SE and MXE, which are most common and account for major AS events, were used for subsequent analysis (Additional file 2: Table S2). MISO was used to estimate the confidence interval of each PSI and generate Sashimi graphs  (see Figs. 1a, b and 5a). To match the ChIP-seq analysis, genomic coordinates of identified AS events were lifted over to hg19 using LiftOver tool downloaded from UCSC.
ChIP-seq data process and HM profiling
ChIP-seq data (aligned BAM files, hg19) were downloaded from Gene Expression Omnibus (GEO, accession ID: GSE16256). This dataset includes the ChIP-seq reads of up to 24 types of HMs for six cell types (H1, ME, TBL, MSC, NPC, and IMR90). Among these, nine histone acetylation modifications (H2AK5ac, H2BK120ac, H2BK5ac, H3K18ac, H3K23ac, H3K27ac, H3K4ac, H3K9ac, and H4K8ac) and seven histone methylation modifications (H3K27me3, H3K36me3, H3K4me1, H3K4me2, H3K4me3, H3K79me1, and H3K9me3) are available for all six cell types and therefore were used for our analyses.
To generate global differential profiles of HM changes between AS exons and constitutive exons upon hESC differentiation, for each MXE and SE AS events, we first randomly selected the constitutive splicing (CS) exons from the same genes, composing a set of CS exons. We then considered the HM changes in a ± 150-bp region flanking both splice sites of each AS and CS exon, i.e. a 300-bp exon-intron boundary region. Each region was 15 bp-binned. Alternatively, for a few cases where the exon or intron is < 150 bps, the entire exonic or intronic region was evenly divided into 10 bins. This scaling allows combining all regions of different lengths to generate uniform profiles of HM changes around the splice sites (see Fig. 2c and Additional file 1: Figure S4). To this end, we calculated the sequencing depth-normalized Δ reads number for each binned region between H1 cells and differentiated cells as follows:
Each region is assigned a value representing the average Δ reads number between H1 cells and differentiated cells for each HM. We also compared HM profiles between the inclusion-gain and inclusion-loss exons (Fig. 3a and Additional file 1: Figure S5B) using the same strategy. The statistical results (Fig. 2c and Additional file 1: Figure S5B) are presented as the p values based on Mann–Whitney–Wilcoxon tests (using the R package).
To quantitatively link HMs to AS upon hESC differentiation, the ChIP-seq data were further processed by narrow peak calling. For each histone ChIP-seq dataset, the MACS v2.0.10 peak caller (https://github.com/taoliu/MACS/) [65, 115] was used to compare ChIP-seq signal to a corresponding whole cell extract (WCE) sequenced control to identify narrow regions of enrichment (narrow peak) that pass a Poisson p value threshold of 0.01. All other parameters (options) were set as default. We then compared the HM signals between H1 cells and differentiated cells. We defined the “differential HM signals (ΔHMs)” as the difference of the normalized peak signals (i.e. the heights of the narrow peaks) between H1 and the differentiated cells. Because the 3′ splice sites (3′ ends of the introns) determine the inclusion of the downstream AS exons  and the distances from the peaks to their target sites affect their regulatory effects , we normalize the peak signals against the distance (in kb) between the peak summits and 3′ splice sites (Additional file 1: Figure S5A). Since there is no evidence showing that distal HMs could regulate the AS, we only considered local peaks with at least 1 bp overlapping on either side of the AS exon. For exons without overlapping peaks, peak signals of these exons were set to zero. For the exons there are more than one overlapping peaks, the peak signals of these exons were set to the greater ones. For MXE events, only the upstream AS exons were considered due to their exclusiveness in inclusion level between these two exons, i.e. the sum of the PSIs for 2 exons of an MXE event is always 1.
Association studies and k-means clustering
To quantitatively estimate associations between HMs and AS, we first used three independent correlation tests, including Pearson correlation (PC), multiple linear regression (MLR), and logistic regression (LLR), to test global correlations between AS events and each of 16 HMs based on differential inclusions (ΔPSIs) and differential HM signals (ΔHMs). PC was performed using the R package (stats, cor.test(),method = ‘pearson’). MLR and LLR were calculated using Weka 3 , wherein the ΔHMs are independent variables and ΔPSIs are response variables. The results show that only some HMs correlate with AS, and most correlations are weak (Additional file 1: Figure S5C). HMs that have significant correlations (p ≤ 0.05) with AS were used for further clustering analysis, through which we identified six subsets of HM-associated AS events (Additional file 3: Table S3).
K-means clustering was performed separately on inclusion-gain and inclusion-loss AS events of MXE and SE, based on the selected HM signatures (Additional file 1: Figure S5C, checked with a “√”). K was set to 6 for all clustering processes (Additional file 1: Figure S5D), which produced the minimal root mean square error (RMSE) for most cases based on a series of clustering with k in the range of 2–8 (data not shown). Then the two clusters that generate mirror patterns, of which one was from inclusion-gain events and one was from inclusion-loss events, were combined to be considered as a subset of HM-associated AS events (Additional file 1: Figure S6). Finally, we identified six subsets of HM-associated AS events displaying significantly positive or negative correlations with three HMs, respectively.
Gene expression quantification
For each cell type, two aligned BAM files (replicates) were used to estimate the expression level of genes using Cufflinks . Default parameters (options) were used. The expression level of each gene was presented as FPKM for each cell type. Differentially expressed genes (DEGs) were defined as those genes whose fold changes between the differentiated cells and hESCs are > 2. Specifically for DEG analysis of SF genes, we collected a list of 47 ubiquitously expressed SFs with “RNA splicing” in their GO annotations from AmiGO 2 . The enrichment significances in Additional file 1: Figures S1D and S3H are shown as the p values based on hypergeometric tests, using the DEGs of all expressed genes (with the FPKM ≥ 1 in at least hESCs or one of the differentiated cell types) as background. We found that the AS genes are generally not differentially expressed between hESCs and differentiated cells, indicating that they function in hESC differentiation via isoform switches rather than expression changes. Few SF genes show differential expression between hESCs and differentiated cells, indicating the existence of epigenetic control of AS, rather than the direct control on SFs expression.
Since the RNA-seq reads (BAM) files and ChIP-seq read (BAM) files downloaded from the public sources were mapped to different human genome assemblies, NCBI36/hg18 (Mar. 2006) and GRCh37/hg19 (February 2009), respectively, we downloaded two version of gene annotations (in GTF formats) from the UCSC Table Browser . The hg18 GTF file was used for rMATS and MISO to identify AS during the differentiation from H1 ESCs into five differentiated cells. The hg19 GTF file was used to define the genome coordinates of AS exons and further for ChIP-seq profile analysis (Figs. 2a–c, 3a, and Additional file 1: Figures S4 and S5A). We compared exonic and intronic lengths based on hg18 annotation (Additional file 1: Figure S2).
Gene Ontology enrichment analysis
The GO enrichment analysis was performed using ToppGene  by searching the HGNC Symbol database under default parameters (p value method: probability density function). Overrepresented GO terms for the GO domain “biological process” were used to generate data shown in Fig. 4c–e and Additional file 1: Figure S8A–C using either the FDR (0.05) adjusted p value or the enriched gene numbers (Additional file 1: Figure S8A).
Canonic pathway enrichment analysis
Both the HM-associated (n = 282) and HM-unassociated (150) AS genes from the top enriched GO term (GO:0007049) were used to perform canonic pathway enrichment (Fig. 4f and Additional file 1: Figure S8D) analysis through Ingenuity Pathway Analysis (IPA, http://www.ingenuity.com/products/ipa).
Stemness signature and TF binding enrichment analysis
StemChecker , a web-based tool to discover and explore stemness signatures in gene sets, was used to calculate the enrichment of AS genes in stemness signatures. Significances were tested (hypergeometric test) using the gene set from human (Homo sapiens) of this database as the background. For all AS genes (n = 3865), 2979 genes were found in this dataset. Of 2979 genes, 1395 were found in the stemness signature gene set, most of which (n = 813) are ESC signature genes. Additional file 1: Figure S3A shows the enrichment significance as the -log10 p values (Bonferroni adjusted). For HM-associated AS genes (n = 2125), 1992 genes were found in this dataset. Of 1992 genes, 966 were found in the stemness signature gene set, most of which (n = 562) are ESC signature genes. For HM-unassociated genes (n = 1057), 987 genes were found in this dataset. Of 987 genes, 429 were found in the stemness signature gene set, most of which (n = 251) are ESC signature genes. The significances are shown as -log10 p values (Bonferroni adjusted) in Fig. 4a.
FunRich (v2.1.2) [123, 124], a stand-alone functional enrichment analysis tool, was used for TF binding enrichment analysis to get the enriched TFs that may regulate the query genes. The top six enriched TFs of HM-associated and HM-unassociated AS genes are presented and shown as the proportion of enriched AS genes. It shows that HM-associated AS genes are more likely to be regulated by TFs involved in cell differentiation and development, while the HM-unassociated AS genes are more likely to be regulated by TFs involved in cell proliferation and renewal (Fig. 4b).
Roadmap and ENCODE data analysis
All raw data are available from the GEO accession IDs GSE18927 and GSE16256. The individual sources of RNA-seq data for 56 cell lines/tissues from Roadmap/ENCODE projects are listed in Additional file 4: Table S4. The RNA-seq data (BAM files) were used to calculate the PSI of exon 7 for PBX1 in each cell line/tissue and to estimate the expression levels of all gene (FPKM), based on aforementioned strategies. The relative expression levels of PBX1a and PBX1b shown in Fig. 5 and Additional file 1: Figure S10 were calculated as the individual PFKM value of each divided by their total FPKM values.
Statistical analyses and tests
Levels of significance were calculated with the Mann–Whitney–Wilcoxon test for Figs. 2c, d, 3a, and Additional file 1: Figure S4 and S5B, with Fisher’s exact test for Figs. 1d, 2d, and Additional file 1: Figure S5B, with Student’s t-test for Fig. 5d and Additional file 1: Figure S7A, and with a hypergeometric test for Additional file 1: Figures S1D and S3H. Levels of correlation significance were calculated with PC, MLR, and LLR for Fig. 3c, d, and Additional file 1: Figure S5C. MLR and LLR were performed using Weka 3 , whereas all other tests were performed using R packages. The p values for the enrichment analyses (Fig. 4, Additional file 1: Figures S3A and S8) were adjusted either by PDR or Bonferroni (refer to the corresponding method sections for details). The statistical analyses of the ChIP, RIP, and western blotting assays were shown in Additional file 1: Figure S10E–G. Three replicates were conducted for each assay and the quantitatively statistical analyses were performed on the relative band intensities normalized by control genes (ß-Actin) or input signals. ImageJ (https://imagej.nih.gov/ij/) was used to quantify the band intensities. ANOVA was used for statistical significance tests.
Cell culture and MSC induction
Human embryonic stem cell (H1cell line) line was purchased from the WiCell Research Institute (product ID: WA01). H1 cells were cultured on the matrigel-coated six-well culture plates (BD Bioscience) in the defined mTeSR1 culture medium (STEMCELL Technologies). The culture medium was refreshed daily. The H1-derived mesenchymal stem cells (H1-MSC) were differentiated from H1 cells as described previously . Briefly, small H1 cell aggregates were cultured on a monolayer of mouse OP9 bone-marrow stromal cell line (ATCC) for nine days. After depleting the OP9 cell layer, the cells were then cultured in semi-solid colony-forming serum-free medium supplemented with fibroblast growth factor 2 and platelet-derived growth factor BB for two weeks. The mesodermal colonies were selected and cultured in mesenchymal serum-free expansion medium with FGF2 to generate and expand H1-MSCs. hAT-MSCs were derived from the subcutaneous fats provided by the National Disease Research Interchange (NDRI) using the protocol described previously . Briefly, the adipose tissue was mechanically minced and digested with collagenase Type II (Worthington Bio, Lakewood, NJ, USA). The resulted single cell suspension was cultured in α-minimal essential medium with 5% human platelet lysate (Cook Regentec, Indianapolis, IN, USA), 10 μg/mL gentamicin (Life Technologies, CA, USA), and 1× Glutamax (Life Technologies). After reaching ~ 70% confluence, the adherent cells were harvested at the passage 1 (P1) hAT-MSC/AEY239. hBM-MSCs were isolated from commercially available fresh human bone marrow (hBM) aspirates (AllCells, Emeryville, CA, USA) and expanded following a standard protocol . Briefly, hBM-MSC were cultured in α-minimal essential medium supplemented with 17% fetal bovine serum (FBS; lot-selected for rapid growth of MSC; Atlanta Biologicals, Norcross, GA, USA), 100 units/mL penicillin (Life Technologies, CA, USA), 100 μg/mL streptomycin (Life Technologies), and 2 mM L-glutamine (Life Technologies). After reaching ~ 70% confluence, the adherent cells were harvested at the passage 1 (P1) hBM-MSC-5204(G). The human IMR-90 cell line was purchased from the American Tissue Type Culture Collection (Manassas, VA, USA) and cultured in Eagle’s Minimum Essential Medium (Thermo Scientific, Logan, UT, USA) supplemented 10% fetal calf serum, 2 mM L-glutamine, 100 U/mL penicillin, and 100 μg/mL streptomycin at 37 °C in humidified 5% CO2.
RNA isolation and RT-PCR
Total RNA was extracted from cells using the RNeasy Mini plus kit (Qiagen, Valencia, CA, USA) according to the manufacturer’s instructions. RT reactions for first-strand complementary DNA (cDNA) synthesis were performed with 2 μg of total RNA in 25 uL reaction mixture containing 5 μL of 5× first-strand synthesis buffer, 0.5 mM dNTP, 0.5 μg oligo(dT)12-18mer (Invitrogen), 200 units of M-MLV reverse transcriptase (Promega, Madison, WI, USA), and 25 units of RNase inhibitor (Invitrogen). The mixture was then incubated at 42 °C for 50 min and 52 °C for 10 min. The reaction was stopped by heating at 70 °C for 15 min. The PCR amplifications were carried out in 50 μL reaction solution containing 1 μL of RT product, 5 μL of 10× PCR buffer, 0.15 mM MgCl2, 0.2 mM dNTP, 0.2 mM sense and antisense primers, and 2.5 U Taq polymerase (Bohringer, Mannheim, Germany). The sequences for the upstream and downstream primers of PBX1 and β-actin are listed in Additional file 1: Table S5. PCR reaction solution was denatured initially at 95 °C for 3 min, followed by 35 cycles at 95 °C for 1 min, 55 °C for 40 s, and 72 °C 40 s. The final extension step was at 72 °C for 10 min. The PCR products were resolved in a 2% ethidium bromide-containing agarose gel and visualized using ChemiDoc MP Imager (Bio-Rad).
The qPCR amplification was done in a 20-uL reaction mixture containing 100 ng of cDNA, 10 uL 2× All-in-One™ qPCR mix (GeneCopoeia, Rockville, MD, USA), 0.3 mM of upstream and downstream primers, and nuclear-free water. The PCR reaction was conducted with 1 cycle at 95 °C for 10 min, 40 cycles at 95 °C for 15 s, 40 °C for 30 s, and 60 °C for 1 min, followed by dissociation curve analysis distinguishing PCR products. The expression level of a gene was normalized with the endogenous control gene β-actin. The relative expressions of genes were calculated using the 2-ΔΔCT method, normalized by β-actin and presented as mean ± SD (n = 3) (Fig. 6a). The sequences of the paired sense and antisense primers for human SOX2, NANOG, OCT4, c-MYC, KLF4, and β-actin are listed in Additional file 1: Table S5.
Cells were lysed with 1× RIPA buffer supplemented with protease and phosphatase inhibitor cocktail (Roche Applied Science, Indianapolis, IN, USA) and stored in aliquots at − 20 °C until use. Twenty micrograms of cell lysates were mixed with an equal volume of Laemmli sample buffer, denatured by boiling, and separated by SDS-PAGE. The separated proteins were then transferred to PVDF membranes (Bio-Rad, Hercules, CA, USA). The membranes were blocked using 5% BSA for 1 h at room temperature and incubated with the first antibodies against 1% BSA overnight. SOX2, OCT4, NANOG, KLF4, c-MYC, and β-actin antibodies were from Cell Signalling Technology (Beverly, MA, USA). PBX1 antibody was from Abcam (Cambridge, MA, USA) and PBX1b antibody from Santa Cruz Technology (Dallas, TX, USA). After incubation with IgG horseradish peroxidase-conjugated secondary antibodies (Cell Signalling) for 2 h at room temperature, the immunoblots were developed using SuperSignal West Pico PLUS Chemiluminescent reagent (Thermo Fisher Scientific, Waltham, MA, USA) and imaged using ChemiDoc MP Imager (Bio-Rad).
Co-IP was used to validate the PSIP1-SRSF1 interaction (Fig. 6e). Cells were lysed with ice-cold non-denaturing lysis buffer containing 20 mM Tris HCl (pH 8), 137 mM NaCl, 1% Nonidet P-40, 2 mM EDTA, and proteinase inhibitor cocktails. A total of 200 μg protein were pre-cleared with 30 uL protein G magnetic beads (Cell Signalling) for 30 min at 4 °C on a rotator to reduce non-specific protein binding. The pre-cleared lysate was immunoprecipitated with the 5 μL anti-SRSF1 antibody (Invitrogen) overnight and then incubated with protein G magnetic beads for 4 h at 4 °C. Beads without anti-SRSF1 antibody were used as an IP control. The protein G magnetic beads were washed five times with lysis buffer and the precipitated protein collected. The SRSF1-bound PSIP1 protein (also known as LEDGF) level was determined with the PSIP1 antibody (Human LEDGF Antibody, R&D Systems) using western blot as described above. Three specific bands (one for p75 and two for p52) were detected (Fig. 6e) as indicated on the R&D Systems website (https://www.rndsystems.com/products/human-ledgf-antibody-762815_mab3468).
Chromatin immunoprecipitation (ChIP)
ChIP assay was performed using a SimpleChIP Enzymatic Chromatin IP Kit (Cell Signaling Technology) according to the manufacturer’s instructions. Briefly, 2 × 107 cells were cross-linked with 1% formaldehyde and lysed with lysis buffer. Chromatin was fragmented by partial digestion with Micrococcal Nuclease Chromatin. The protein–DNA complex was precipitated with ChIP-Grade Protein G Magnetic Beads (Cell Signalling) and ChIP-validated antibodies against H3K36me3 (Abcam), PSIP1 (Novus), and PBX1b (Santa Cruz). Normal mouse IgG and normal rabbit IgG were used as negative controls. After reversal of protein–DNA cross-links, the DNA was purified using DNA purification spin columns. The purified DNA fragments were then amplified with the appropriate primers on T100 thermal cycler (Bio-Rad). The primer pairs used for PCR are listed in Additional file 1: Table S5. The H3K36me3-immunoprecipitated and PSIP1-immunoprecipitated DNA fragments surrounding exon 7 of PBX1b and the promoter region of NANOG in the PBX1b-immunoprecipitated DNA fragments were PCR-amplified. The ChIP-PCR products were revealed by electrophoresis on a 2% agarose gel (Fig. 6c).
RNA immunoprecipitation (RIP)
RIP assay was performed using Magna RIP™ RNA-Binding Protein Immunoprecipitation Kit (Millipore Sigma) following the manufacturer’s instructions. Briefly, the cells were lysed with RIP lysis buffer with RNase and protease inhibitors. In total, 500 μg of total protein was precleared with protein G magnetic beads for 30 min. The protein G magnetic beads were preincubated with 5 μg of mouse monoclonal anti-SRSP1 antibody or normal mouse IgG for 2 h at 4 °C. The antibody-coated beads were then incubated with precleared cell lysates at 4 °C overnight with rotation. The RNA/protein/beads conjugates were washed five times with RIP was buffer and the RNA–protein complexes were eluted from the protein G magnetic beads on the magnetic separator. The SRSP1-bound RNA was extracted using acid phenol-chloroform and precipitated with ethanol. The RNA was then reverse-transcribed and the expression levels of PBX1a and PBX1b in the immunoprecipitated and non-immunoprecipitated (input) samples were analyzed using RT-PCR (Fig. 6d).
Alternative (pre-mRNA) splicing
Embryonic stem cell
H2A acetylated lysine 5
H2B acetylated lysine120
H2B acetylated lysine 5
H3 acetylated lysine 18
H3 acetylated lysine 23
H3 acetylated lysine 27
H3 tri-methylated lysine 27
H3 tri-methylated lysine 36
H3 acetylated lysine 4
H3 mono-methylated lysine 4
H3 di-methylated lysine 4
H3 tri-methylated lysine 4
H3 mono-methylated lysine 79
H3 acetylated lysine 9
H3 tri-methylated lysine 9
H4 acetylated lysine 8
Mutually excluded exon
Percent splice in
Thomson JA, Itskovitz-Eldor J, Shapiro SS, Waknitz MA, Swiergiel JJ, Marshall VS, et al. Embryonic stem cell lines derived from human blastocysts. Science. 1998;282:1145–7.
Graveley BR. Splicing up pluripotency. Cell. 2011;147:22–4.
Boyer LA, Lee TI, Cole MF, Johnstone SE, Levine SS, Zucker JR, et al. Core transcriptional regulatory circuitry in human embryonic stem cells. Cell. 2005;122:947–56.
Lian XJ, Zhang JH, Azarin SM, Zhu KX, Hazeltine LB, Bao XP, et al. Directed cardiomyocyte differentiation from human pluripotent stem cells by modulating Wnt/beta-catenin signaling under fully defined conditions. Nat Protoc. 2013;8:162–75.
Ogaki S, Shiraki N, Kume K, Kume S. Wnt and Notch Signals Guide Embryonic Stem Cell Differentiation into the Intestinal Lineages. Stem Cells. 2013;31:1086–96.
Clevers H, Loh KM, Nusse R. An integral program for tissue renewal and regeneration: Wnt signaling and stem cell control. Science. 2014;346:1248012.
Itoh F, Watabe T, Miyazono K. Roles of TGF-beta family signals in the fate determination of pluripotent stem cells. Semin Cell Dev Biol. 2014;32:98–106.
Sakaki-Yumoto M, Katsuno Y, Derynck R. TGF-beta family signaling in stem cells. Biochimica Et Biophysica Acta-General Subjects. 1830;2013:2280–96.
Watabe T, Miyazono K. Roles of TGF-beta family signaling in stem cell renewal and differentiation. Cell Res. 2009;19:103–15.
Tay Y, Zhang JQ, Thomson AM, Lim B, Rigoutsos I. MicroRNAs to Nanog, Oct4 and Sox2 coding regions modulate embryonic stem cell differentiation. Nature. 2008;455:1124–8.
Shenoy A, Blelloch RH. Regulation of microRNA function in somatic stem cell proliferation and differentiation. Nat Rev Mol Cell Biol. 2014;15:565–76.
Fatica A, Bozzoni I. Long non-coding RNAs: new players in cell differentiation and development. Nat Rev Genet. 2014;15:7–21.
Gabut M, Samavarchi-Tehrani P, Wang XC, Slobodeniuc V, O'Hanlon D, Sung HK, et al. An Alternative Splicing Switch Regulates Embryonic Stem Cell Pluripotency and Reprogramming. Cell. 2011;147:132–46.
Salomonis N, Schlieve CR, Pereira L, Wahlquist C, Colas A, Zambon AC, et al. Alternative splicing regulates mouse embryonic stem cell pluripotency and differentiation. Proc Natl Acad Sci U S A. 2010;107:10514–9.
Xie W, Schultz MD, Lister R, Hou ZG, Rajagopal N, Ray P, et al. Epigenomic Analysis of Multilineage Differentiation of Human Embryonic Stem Cells. Cell. 2013;153:1134–48.
Gifford CA, Ziller MJ, Gu HC, Trapnell C, Donaghey J, Tsankov A, et al. Transcriptional and Epigenetic Dynamics during Specification of Human Embryonic Stem Cells. Cell. 2013;153:1149–63.
Hawkins RD, Hon GC, Lee LK, Ngo Q, Lister R, Pelizzola M, et al. Distinct Epigenomic Landscapes of Pluripotent and Lineage-Committed Human Cells. Cell Stem Cell. 2010;6:479–91.
Boland MJ, Nazor KL, Loring JF. Epigenetic Regulation of Pluripotency and Differentiation. Circ Res. 2014;115:311–24.
Xu Y, Wang Y, Luo J, Zhao W, Zhou X. Deep learning of the splicing (epi)genetic code reveals a novel candidate mechanism linking histone modifications to ESC fate decision. Nucleic Acids Res. 2017;45:12100–12.
Vallier L. Cell Cycle Rules Pluripotency. Cell Stem Cell. 2015;17:131–2.
Pan Q, Shai O, Lee LJ, Frey J, Blencowe BJ. Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing. Nat Genet. 2008;40:1413–5.
Gerstein MB, Lu ZJ, Van Nostrand EL, Cheng C, Arshinoff BI, Liu T, et al. Integrative Analysis of the Caenorhabditis elegans Genome by the modENCODE Project. Science. 2010;330:1775–1787.
Graveley BR, Brooks AN, Carlson J, Duff MO, Landolin JM, Yang L, et al. The developmental transcriptome of Drosophila melanogaster. Nature. 2011;471:473–9.
Ramani AK, Calarco JA, Pan Q, Mavandadi S, Wang Y, Nelson AC, et al. Genome-wide analysis of alternative splicing in Caenorhabditis elegans. Genome Res. 2011;21:342–8.
Wang ET, Sandberg R, Luo SJ, Khrebtukova I, Zhang L, Mayr C, et al. Alternative isoform regulation in human tissue transcriptomes. Nature. 2008;456:470–476.
Yeo GW, Xu XD, Liang TY, Muotri AR, Carson CT, Coufal NG, et al. Alternative splicing events identified in human embryonic stem cells and neural progenitors. PLoS Comput Biol. 2007;3:1951–67.
Salomonis N, Nelson B, Vranizan K, Pico AR, Hanspers K, Kuchinsky A, et al. Alternative Splicing in the Differentiation of Human Embryonic Stem Cells into Cardiac Precursors. PLoS Comput Biol. 2009;5:e1000553.
Lu XY, Goke J, Sachs F, Jacques PE, Liang HQ, Feng B, et al. SON connects the splicing-regulatory network with pluripotency in human embryonic stem cells. Nat Cell Biol. 2013;15:1141–52.
Kalsotra A, Cooper TA. Functional consequences of developmentally regulated alternative splicing. Nat Rev Genet. 2011;12:715–29.
Mayshar Y, Rom E, Chumakov I, Kronman A, Yayon A, Benvenisty N. Fibroblast growth factor 4 and its novel splice isoform have opposing effects on the maintenance of human embryonic stem cell self-renewal. Stem Cells. 2008;26:767–74.
Rao S, Zhen S, Roumiantsev S, McDonald LT, Yuan GC, Orkin SH. Differential Roles of Sall4 Isoforms in Embryonic Stem Cell Pluripotency. Mol Cell Biol. 2010;30:5364–80.
Fiszbein A, Kornblihtt AR. Alternative splicing switches: Important players in cell differentiation. Bioessays. 2017;39
Barash Y, Calarco JA, Gao WJ, Pan Q, Wang XC, Shai O, et al. Deciphering the splicing code. Nature. 2010;465:53–9.
Han H, Irimia M, Ross PJ, Sung HK, Alipanahi B, David L, et al. MBNL proteins repress ES-cell-specific alternative splicing and reprogramming. Nature. 2013;498:241–5.
Schwartz S, Meshorer E, Ast G. Chromatin organization marks exon-intron structure. Nat Struct Mol Biol. 2009;16:990–5.
Kornblihtt AR, De la Mata M, Fededa JP, Munoz MJ, Nogues G. Multiple links between transcription and splicing. Rna-a Publication of the Rna Society. 2004;10:1489–98.
Wang GS, Cooper TA. Splicing in disease: disruption of the splicing code and the decoding machinery. Nat Rev Genet. 2007;8:749–61.
Luco RF, Allo M, Schor IE, Kornblihtt AR, Misteli T. Epigenetics in Alternative Pre-mRNA Splicing. Cell. 2011;144:16–26.
Sharov AA, Ko MSH. Human ES cell profiling broadens the reach of bivalent domains. Cell Stem Cell. 2007;1:237–8.
Singh AM, Sun YH, Li L, Zhang WJ, Wu TM, Zhao SY, et al. Cell-Cycle Control of Bivalent Epigenetic Domains Regulates the Exit from Pluripotency. Stem Cell Reports. 2015;5:323–36.
Kouzarides T. Chromatin modifications and their function. Cell. 2007;128:693–705.
Sims RJ, Millhouse S, Chen CF, Lewis BA, Erdjument-Bromage H, Tempst P, et al. Recognition of trimethylated histone h3 lysine 4 facilitates the recruitment of transcription postinitiation factors and pre-mRNA splicing. Mol Cell. 2007;28:665–76.
Piacentini L, Fanti L, Negri R, Del Vescovo V, Fatica A, Altieri F, et al. Heterochromatin Protein 1 (HP1a) Positively Regulates Euchromatic Gene Expression through RNA Transcript Association and Interaction with hnRNPs in Drosophila. PLoS Genet. 2009;5:e1000670.
Luco RF, Pan Q, Tominaga K, Blencowe BJ, Pereira-Smith OM, Misteli T. Regulation of alternative splicing by histone modifications. Science. 2010;327:996–1000.
Pradeepa MM, Sutherland HG, Ule J, Grimes GR, Bickmore WA. Psip1/Ledgf p52 Binds Methylated Histone H3K36 and Splicing Factors and Contributes to the Regulation of Alternative Splicing. PLoS Genet. 2012;8:e1002717.
Gunderson FQ, Johnson TL. Acetylation by the Transcriptional Coactivator Gcn5 Plays a Novel Role in Co-Transcriptional Spliceosome Assembly. PLoS Genet. 2009;5:e1000682.
Schor IE, Rascovan N, Pelisch F, Allo M, Kornblihtt AR. Neuronal cell depolarization induces intragenic chromatin modifications affecting NCAM alternative splicing. Proc Natl Acad Sci U S A. 2009;106:4325–30.
Schor IE, Kornblihtt AR. Playing inside the genes: Intragenic histone acetylation after membrane depolarization of neural cells opens a path for alternative splicing regulation. Commun Integr Biol. 2009;2:341–3.
Zhou HL, Hinman MN, Barron VA, Geng C, Zhou G, Luo G, et al. Hu proteins regulate alternative splicing by inducing localized histone hyperacetylation in an RNA-dependent manner. Proc Natl Acad Sci U S A. 2011;108:E627–35.
Sharma A, Nguyen H, Cai L, Lou H. Histone hyperacetylation and exon skipping: a calcium-mediated dynamic regulation in cardiomyocytes. Nucleus. 2015;6:273–8.
Dalton S. Linking the Cell Cycle to Cell Fate Decisions. Trends Cell Biol. 2015;25:592–600.
Coronado D, Godet M, Bourillot PY, Tapponnier Y, Bernat A, Petit M, et al. A short G1 phase is an intrinsic determinant of naive embryonic stem cell pluripotency. Stem Cell Res. 2013;10:118–31.
Pauklin S, Vallier L. The Cell-Cycle State of Stem Cells Determines Cell Fate Propensity. Cell. 2013;155:135–47.
Gonzales KAU, Liang H, Lim YS, Chan YS, Yeo JC, Tan CP, et al. Deterministic Restriction on Pluripotent State Dissolution by Cell-Cycle Pathways. Cell. 2015;162:564–79.
Aaronson Y, Meshorer E. STEM CELLS Regulation by alternative splicing. Nature. 2013;498:176–7.
Shen S, Park JW, Lu ZX, Lin L, Henry MD, Wu YN, et al. rMATS: robust and flexible detection of differential alternative splicing from replicate RNA-Seq data. Proc Natl Acad Sci U S A. 2014;111:E5593–601.
Magen A, Ast G. The importance of being divisible by three in alternative splicing. Nucleic Acids Res. 2005;33:5574–82.
Zheng CL, Fu XD, Gribskov M. Characteristics and regulatory elements defining constitutive splicing and different modes of alternative splicing in human and mouse. Rna. 2005;11:1777–87.
Pinto JP, Kalathur RK, Oliveira DV, Barata T, Machado RSR, Machado S, et al. StemChecker: a web-based tool to discover and explore stemness signatures in gene sets. Nucleic Acids Res. 2015;43:W72–7.
Barski A, Cuddapah S, Cui K, Roh TY, Schones DE, Wang Z, et al. High-resolution profiling of histone methylations in the human genome. Cell. 2007;129:823–37.
Kornblihtt AR, Schor IE, Allo M, Blencowe BJ. When chromatin meets splicing. Nat Struct Mol Biol. 2009;16:902–3.
Mabon SA, Misteli T. Differential recruitment of pre-mRNA splicing factors to alternatively spliced transcripts in vivo. PLoS Biol. 2005;3:e374.
Kolasinska-Zwierz P, Down T, Latorre I, Liu T, Liu XS, Ahringer J. Differential chromatin marking of introns and expressed exons by H3K36me3. Nat Genet. 2009;41:376–81.
Mikkelsen TS, Ku MC, Jaffe DB, Issac B, Lieberman E, Giannoukos G, et al. Genome-wide maps of chromatin state in pluripotent and lineage-committed cells. Nature. 2007;448:553–60.
Feng JX, Liu T, Qin B, Zhang Y, Liu XS. Identifying ChIP-seq enrichment using MACS. Nat Protoc. 2012;7:1728–40.
Curado J, Iannone C, Tilgner H, Valcarcel J, Guigo R. Promoter-like epigenetic signatures in exons displaying cell type-specific splicing. Genome Biol. 2015;16:236.
Anczukow O, Akerman M, Clery A, Wu J, Shen C, Shirole NH, et al. SRSF1-Regulated Alternative Splicing in Breast Cancer. Mol Cell. 2015;60:105–17.
Pandit S, Zhou Y, Shiue L, Coutinho-Mansfield G, Li HR, Qiu JS, et al. Genome-wide Analysis Reveals SR Protein Cooperation and Competition in Regulated Splicing. Mol Cell. 2013;50:223–35.
Xue YC, Zhou Y, Wu TB, Zhu T, Ji X, Kwon YS, et al. Genome-wide Analysis of PTB-RNA Interactions Reveals a Strategy Used by the General Splicing Repressor to Modulate Exon Inclusion or Skipping. Mol Cell. 2009;36:996–1006.
Chen X, Xu H, Yuan P, Fang F, Huss M, Vega VB, et al. Integration of external signaling pathways with the core transcriptional network in embryonic stem cells. Cell. 2008;133:1106–17.
Silva J, Nichols J, Theunissen TW, Guo G, van Oosten AL, Barrandon O, et al. Nanog Is the Gateway to the Pluripotent Ground State. Cell. 2009;138:722–37.
Takahashi K, Tanabe K, Ohnuki M, Narita M, Ichisaka T, Tomoda K, et al. Induction of pluripotent stem cells from adult human fibroblasts by defined factors. Cell. 2007;131:861–72.
Kim JW, Chu JL, Shen XH, Wang JL, Orkin SH. An extended transcriptional network for pluripotency of embryonic stem cells. Cell. 2008;132:1049–61.
Chang CP, Jacobs Y, Nakamura T, Jenkins NA, Copeland NG, Cleary ML. Meis proteins are major in vivo DNA binding partners for wild-type but not chimeric Pbx proteins. Mol Cell Biol. 1997;17:5679–87.
Merabet S, Mann RS. To Be Specific or Not: The Critical Relationship Between Hox And TALE Proteins. Trends Genet. 2016;32:334–47.
Melendez CLC, Rosales L, Herrera M, Granados L, Tinti D, Villagran S, et al. Frequency of the ETV6-RUNX1, BCR-ABL1, TCF3-PBX1 and MLL-AFF1 fusion genes in Guatemalan Pediatric Acute Lymphoblastic Leukemia Patients and Its Ethnic Associations. Clin Lymphoma Myeloma Leuk. 2015;15:S170.
Tesanovic T, Badura S, Oellerich T, Doering C, Ruthardt M, Ottmann OG. Mechanisms of Antileukemic Activity of the Multikinase Inhibitors Dasatinib and Ponatinib in Acute Lymphoblastic Leukemia (ALL) Harboring the E2A-PBX1 Fusion Gene. Blood. 2014;124
Foa R, Vitale A, Cuneo A, Mecucci C, Mancini H, Cimino G, et al. E2A-PBX1 fusion in adult acute lymphoblastic leukemia (ALL) with t(1;19) translocation: Biologic and clinical features. Blood. 2000;96:189b.
Kamps MP, Baltimore D. E2a-Pbx1, the T(1, 19) Translocation Protein of Human Pre-B-Cell Acute Lymphocytic-Leukemia, Causes Acute Myeloid-Leukemia in Mice. Mol Cell Biol. 1993;13:351–7.
Jung JG, Kim TH, Gerry E, Kuan JC, Ayhan A, Davidson B, et al. PBX1, a transcriptional regulator, promotes stemness and chemoresistance in ovarian cancer. Clin Cancer Res. 2016;22
Magnani L, Patten DK, Nguyen VTM, Hong SP, Steel JH, Patel N, et al. The pioneer factor PBX1 is a novel driver of metastatic progression in ER'-positive breast cancer. Oncotarget. 2015;6:21878–91.
Jung JG, Park JT, Stoeck A, Hussain T, Wu RC, Shih IM, et al. Targeting PBX1 signaling to sensitize carboplatin cytotoxicity in ovarian cancer. Clin Cancer Res. 2015;21
Feng Y, Li L, Zhang X, Zhang Y, Liang Y, Lv J, et al. Hematopoietic pre-B cell leukemia transcription factor interacting protein is overexpressed in gastric cancer and promotes gastric cancer cell proliferation, migration, and invasion. Cancer Sci. 2015;106:1313–22.
Li WH, Huang K, Guo HZ, Cui GH, Zhao S. Inhibition of non-small-cell lung cancer cell proliferation by Pbx1. Chin J Cancer Res. 2014;26:573–8.
Thiaville MM, Stoeck A, Chen L, Wu RC, Magnani L, Oidtman J, et al. Identification of PBX1 Target Genes in Cancer Cells by Global Mapping of PBX1 Binding Sites. PLoS One. 2012;7
Magnani L, Ballantyne EB, Zhang X, Lupien M. PBX1 genomic pioneer function drives ERalpha signaling underlying progression in breast cancer. PLoS Genet. 2011;7:e1002368.
Park JT, Shih Ie M, Wang TL. Identification of Pbx1, a potential oncogene, as a Notch3 target gene in ovarian cancer. Cancer Res. 2008;68:8852–60.
Qiu Y, Tomita Y, Zhang B, Nakamichi I, Morii E, Aozasa K. Pre-B-cell leukemia transcription factor 1 regulates expression of valosin-containing protein, a gene involved in cancer growth. Am J Pathol. 2007;170:152–9.
Berge V, Ramberg H, Eide T, Svindland A, Tasken KA. Expression of Pbx1 and UGT2B7 in prostate cancer cells. Eur Urol Suppl. 2006;5:794.
Kim SK, Selleri L, Lee JS, Zhang AY, Gu XY, Jacobs Y, et al. Pbx1 inactivation disrupts pancreas development and in Ipf1-deficient mice promotes diabetes mellitus. Nat Genet. 2002;30:430–5.
Ficara F, Murphy MJ, Lin M, Cleary ML. Pbx1 regulates self-renewal of long-term hematopoietic stem cells by maintaining their quiescence. Cell Stem Cell. 2008;2:484–96.
Xu B, Cai L, Butler JM, Chen D, Lu X, Allison DF, et al. The Chromatin Remodeler BPTF Activates a Stemness Gene-Expression Program Essential for the Maintenance of Adult Hematopoietic Stem Cells. Stem Cell Reports. 2018;10:675–83.
Koss M, Bolze A, Brendolan A, Saggese M, Capellini TD, Bojilova E, et al. Congenital Asplenia in Mice and Humans with Mutations in a Pbx/Nkx2-5/p15 Module. Dev Cell. 2012;22:913–26.
Grebbin BM, Schulte D. PBX1 as Pioneer Factor: A Case Still Open. Front Cell Dev Biol. 2017;5:9.
Pruitt KD, Brown GR, Hiatt SM, Thibaud-Nissen F, Astashyn A, Ermolaeva O, et al. RefSeq: an update on mammalian reference sequences. Nucleic Acids Res. 2014;42:D756–63.
Burglin TR, Ruvkun G. New Motif in Pbx Genes. Nat Genet. 1992;1:319–20.
Bernstein BE, Stamatoyannopoulos JA, Costello JF, Ren B, Milosavljevic A, Meissner A, et al. The NIH Roadmap Epigenomics Mapping Consortium. Nat Biotechnol. 2010;28:1045–8.
Thomas DJ, Rosenbloom KR, Clawson H, Hinrichs AS, Trumbower H, Raney BJ, et al. The ENCODE project at UC Santa Cruz. Nucleic Acids Res. 2007;35:D663–7.
Chan KKK, Zhang J, Chia NY, Chan YS, Sim HS, Tan KS, et al. KLF4 and PBX1 Directly Regulate NANOG Expression in Human Embryonic Stem Cells. Stem Cells. 2009;27:2114–25.
Asahara H, Dutta S, Kao HY, Evans RM, Montminy M. Pbx-hox heterodimers recruit coactivator-corepressor complexes in an isoform-specific manner. Mol Cell Biol. 1999;19:8219–25.
DiRocco G, Mavilio F, Zappavigna V. Functional dissection of a transcriptionally active, target-specific Hox-Pbx complex. EMBO J. 1997;16:3644–54.
Wang ZB, Zang CZ, Rosenfeld JA, Schones DE, Barski A, Cuddapah S, et al. Combinatorial patterns of histone acetylations and methylations in the human genome. Nat Genet. 2008;40:897–903.
Sim RJ, Belotserkovskaya R, Reinberg D. Elongation by RNA polymerase II: the short and long of it. Genes Dev. 2004;18:2437–68.
Linares AJ, Lin CH, Damianov A, Adams KL, Novitch BG, Black DL. The splicing regulator PTBP1 controls the activity of the transcription factor Pbx1 during neuronal differentiation. Elife. 2015;4
Katz Y, Wang ET, Airoldi EM, Burge CB. Analysis and design of RNA sequencing experiments for identifying isoform regulation. Nat Methods. 2010;7:1009–15.
Wang ET, Cody NA, Jog S, Biancolella M, Wang TT, Treacy DJ, et al. Transcriptome-wide regulation of pre-mRNA splicing and mRNA localization by muscleblind proteins. Cell. 2012;150:710–24.
Tapial J, Ha KCH, Sterne-Weiler T, Gohr A, Braunschweig U, Hermoso-Pulido A, et al. An atlas of alternative splicing profiles and functional associations reveals new regulatory programs and genes that simultaneously express multiple major isoforms. Genome Res. 2017;27:1759–68.
Singh B, Trincado JL, Tatlow PJ, Piccolo SR, Eyras E. Genome Sequencing and RNA-Motif Analysis Reveal Novel Damaging Noncoding Mutations in Human Tumors. Mol Cancer Res. 2018;16:1112–24.
Lin L, Park JW, Ramachandran S, Zhang Y, Tseng YT, Shen S, et al. Transcriptome sequencing reveals aberrant alternative splicing in Huntington's disease. Hum Mol Genet. 2016;25:3454–66.
Ji X, Park JW, Bahrami-Samani E, Lin L, Duncan-Lewis C, Pherribo G, et al. alphaCP binding to a cytosine-rich subset of polypyrimidine tracts drives a novel pathway of cassette exon splicing in the mammalian transcriptome. Nucleic Acids Res. 2016;44:2283–97.
Bebee TW, Park JW, Sheridan KI, Warzecha CC, Cieply BW, Rohacek AM, et al. The splicing regulators Esrp1 and Esrp2 direct an epithelial splicing program essential for mammalian development. Elife. 2015;4
Melikishvili M, Chariker JH, Rouchka EC, Fondufe-Mittendorf YN. Transcriptome-wide identification of the RNA-binding landscape of the chromatin-associated protein PARP1 reveals functions in RNA biogenesis. Cell Discov. 2017;3:17043.
Humphrey J, Emmett W, Fratta P, Isaacs AM, Plagnol V. Quantitative analysis of cryptic splicing associated with TDP-43 depletion. BMC Med Genet. 2017;10:38.
Katz Y, Wang ET, Silterra J, Schwartz S, Wong B, Mesirov JP, et al. Sashimi plots: quantitative visualization of RNA sequencing read alignments. arXiv preprint arXiv:13063466 2013.
Zhang Y, Liu T, Meyer CA, Eeckhoute J, Johnson DS, Bernstein BE, et al. Model-based Analysis of ChIP-Seq (MACS). Genome Biol. 2008;9:R137.
Chiara MD, Reed R. A two-step mechanism for 5′ and 3′ splice-site pairing. Nature. 1995;375:510–3.
Liu L, Jin G, Zhou X. Modeling the relationship of epigenetic modifications to transcription factor binding. Nucleic Acids Res. 2015;43:3873–85.
Hall M, Frank E, Holmes G, Pfahringer B, Reutemann P, Witten IH. The WEKA data mining software: an update. ACM SIGKDD Explor Newslett. 2009;11:10–8.
Trapnell C, Roberts A, Goff L, Pertea G, Kim D, Kelley DR, et al. Differential gene and transcript expression analysis of RNA-seq experiments with TopHat and Cufflinks. Nat Protoc. 2012;7:562–78.
Carbon S, Ireland A, Mungall CJ, Shu S, Marshall B, Lewis S, et al. AmiGO: online access to ontology and annotation data. Bioinformatics. 2009;25:288–9.
Karolchik D, Hinrichs AS, Furey TS, Roskin KM, Sugnet CW, Haussler D, et al. The UCSC Table Browser data retrieval tool. Nucleic Acids Res. 2004;32:D493–6.
Chen J, Bardes EE, Aronow BJ, Jegga AG. ToppGene Suite for gene list enrichment analysis and candidate gene prioritization. Nucleic Acids Res. 2009;37:W305–11.
Pathan M, Keerthikumar S, Ang CS, Gangoda L, Quek CY, Williamson NA, et al. FunRich: An open access standalone functional enrichment and interaction network analysis tool. Proteomics. 2015;15:2597–601.
Pathan M, Keerthikumar S, Chisanga D, Alessandro R, Ang CS, Askenase P, et al. A novel community driven software for functional enrichment analysis of extracellular vesicles data. J Extracell Vesicles. 2017;6:1321455.
Vodyanik MA, Yu J, Zhang X, Tian S, Stewart R, Thomson JA, et al. A mesoderm-derived precursor for mesenchymal stem and endothelial cells. Cell Stem Cell. 2010;7:718–29.
Meyerrose TE, De Ugarte DA, Hofling AA, Herrbrich PE, Cordonnier TD, Shultz LD, et al. In vivo distribution of human adipose-derived mesenchymal stem cells in novel xenotransplantation models. Stem Cells. 2007;25:220–7.
Sekiya I, Larson BL, Smith JR, Pochampally R, Cui JG, Prockop DJ. Expansion of human adult stem cells from bone marrow stroma: conditions that maximize the yields of early progenitors and evaluate their quality. Stem Cells. 2002;20:530–41.
Stamatoyannopoulos J. University of Washington Human Reference Epigenome Mapping Project. NCBI GEO 2010, GSE18927. https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE18927. Nov 2015.
Lister R PM, Dowen RH, Hawkins RD, Hon G, Tonti-Filippini J, Nery JR, et al. UCSD Human Reference Epigenome Mapping Project. NCBI GEO 2009. https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE16256. Nov 2015.
The authors thank Drs Guangxu Jin, Liang Liu, and Dongmin Guo from the Wake Forest School of Medicine, who gave a lot of comments and suggestions on the data analyses and interpretations. They also acknowledge the editorial assistance of Karen Klein, MA, from the Wake Forest Clinical and Translational Science Institute (UL1 TR001420; PI: McClain).
This work was funded by the National Institutes of Health (NIH) [1R01GM123037 and AR069395].
Availability of data and materials
All RNA-seq and 16 HMs ChIP-seq data of H1 and five other differentiated cells are available in Gene Expression Omnibus (GEO) under accession number GSE16256 . The BAM files of the RNA-seq data (two replicates for each, aligned to human genome hg18) are alternatively available at http://renlab.sdsc.edu/differentiation/download.html. Both RNA-seq and ChIP-seq data of 56 cell lines/tissues from the Roadmap/ENCODE projects [97, 98] are available on their official website (RoadMap: ftp://ftp.ncbi.nlm.nih.gov/pub/geo/DATA/roadmapepigenomics/by_sample/; ENCODE: ftp://hgdownload.cse.ucsc.edu/goldenPath/hg19/encodeDCC/) and all raw files are also available at GEO under the accession numbers GSE18927  and GSE16256 . Additional file 4: Table S4 provides the detailed information of these data.
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Figure S1. Identifying hESC differentiation-related AS exons. Figure S2. The hESC differentiation-related AS exons possess the typical properties of AS exons. Figure S3. AS profiles upon hESC differentiation show lineage-specific splicing pattern. Figure S4. HMs change significantly around the alternatively spliced exons upon hESC differentiation. Figure S5. A subset of AS events are significantly associated with some HMs upon hESC differentiation. Figure S6. K-means clustering based on selected epigenetic features of eight HMs for MXE and SE AS exons. Figure S7. HM-associated AS genes are more lineage-specific. Figure S8. HM-unassociated AS genes are enriched in G1 cell-cycle phase and pathways for self-renewal. Figure S9. Isoform switch from PBX1a and PBX1b during hESC differentiation. Figure S10. Isoform switch of PBX1 links H3K36me3 to hESC fate decision. Figure S11. The effect of ΔPSI cutoffs for AS-HM correlations. Table S1. The number of all AS events identified during hESC differentiation. Table S5. The PCR primers used in this study. (PDF 1917 kb)
Table S2. AS events (AS exons) during the differentiation from H1 cells to differentiated cells. (XLSX 1852 kb)
Table S3. HM-associated AS exons based on k-means clustering. (XLSX 1088 kb)
Table S4. 56 cell lines/tissues and their corresponding RNA-seq data sources from ENCODE and Roadmap projects. (XLSX 14 kb)