Integrated transcriptome and epigenome analyses identify alternative splicing as a novel candidate linking histone modifications to embryonic stem cell fate decision

Understanding embryonic stem cell (ESC) fate decision between self-renewal and proper differentiation is important for developmental biology and regenerative medicine. Attentions have focused on underlying mechanisms involving histone modifications (HMs), alternative pre-mRNA splicing (AS), and cell-cycle progression. However, their intricate interrelations and joint contributions to ESC fate decision remain unclear. We performed integrative analyses on transcriptomic and epigenomic data derived from human ESC (hESC) and five types of differentiated cells. We identified thousands of AS exons and revealed their development- and lineage-dependent characterizations. Following the observation that dynamic HM changes predominantly occur in AS exons upon hESCs differentiation, we identified 3 of 16 investigated HMs (H3K36me3, H3K27ac, and H4K8ac) that are strongly associated with 52.8% of hESC differentiation-related AS events. Further analyses showed that the HM-associated AS genes predominantly function in G2/M phases and ATM/ATR-mediated DNA damage response pathway for cell differentiation, whereas HM-unassociated AS genes enrich in G1 phase and pathways for self-renewal. These results imply a potential epigenetic mechanism by which some HMs contribute to ESC fate decision through the AS regulation of specific pathways and cell-cycle genes. We exemplified the potential mechanism by a cell cycle-related transcription factor, PBX1, which regulates the pluripotency regulatory network by binding to NANOG. We found that the isoform switch between PBX1a and PBX1b is strongly associated with H3K36me3 alteration and implicated in hESC fate determination. Supported by extended dataset from Roadmap/ENCODE projects, we identified the alternative splicing of PBX1 as a novel candidate linking H3K36me3 to embryonic stem cell fate decision.


Introduction
Embryonic stem cells (ESCs), the pluripotent stem cells derived from the inner cell mass of a blastocyst, provide a vital tool for studying the regulation of early embryonic development and cell-fate decision, and hold the promise for regenerative medicine (Thomson et al. 1998). The past few years have witnessed remarkable progress in understanding the ESC fate decision, i.e. either pluripotency maintenance (self-renewal) or proper differentiation (Graveley 2011). The underlying mechanisms have been largely expanded from the core pluripotent transcription factors (TFs) (Boyer et al. 2005), signaling pathways (Watabe and Miyazono 2009;Lian et al. 2013;Ogaki et al. 2013;Sakaki-Yumoto et al. 2013;Clevers et al. 2014;Itoh et al. 2014), specific microRNAs (Tay et al. 2008;Shenoy and Blelloch 2014), and long non-coding RNAs (Fatica and Bozzoni 2014) to alternative pre-mRNA splicing (AS) (Salomonis et al. 2010;Gabut et al. 2011), histone modifications (HMs) (Hawkins et al. 2010;Gifford et al. 2013;Xie et al. 2013b;Boland et al. 2014), and cell-cycle machinery (Vallier 2015). These emerging mechanisms suggest their intricate interrelations and potential joint contributions to ESC pluripotency and differentiation, which, however, remain unknown.
Alternative splicing (AS) is a most important pre-mRNA processing to increase the diversity of transcriptome and proteome in tissue-and development-dependent manners (Pan et al. 2008).
The estimates based on RNA-seq revealed that up to 94%, 60% and 25% of genes in human, Drosophila melanogaster and Caenorhabdities elegans, respectively, undergo AS (Pan et al. 2008;Wang et al. 2008a;Gerstein et al. 2010;Graveley et al. 2011;Ramani et al. 2011). AS also provides a powerful mechanism to control developmental decision in ESCs (Yeo et al. 2007;Salomonis et al. 2009;Lu et al. 2013). Specific isoforms are necessary to maintain both the identity and activity of stem cells, and switching to different isoforms ensures proper differentiation (Kalsotra and Cooper 2011). Especially, the AS of TFs plays major roles in ESC fate determination, such as FGF4 (Mayshar et al. 2008) and FOXP1 (Gabut et al. 2011) for hESC, Tcf3 (Salomonis et al. 2010) and Sall4 (Rao et al. 2010) for mouse ESCs (mESCs). Understanding the precise regulations on AS would contribute to elucidation of ESC fate decision and has attracted extensive efforts. For many years, studies aiming to shed light on this process focused on the RNA level, characterizing the manner by which splicing factors and auxiliary proteins interact with splicing signals, thereby enabling, facilitating and regulating RNA splicing. These cis-acting RNA elements and trans-acting splicing factors (SFs) have been assembled into splicing code (Barash et al. 2010), revealing a number of AS regulators critical for ESC differentiation, such as MBNL (Han et al. 2013) and SON (Lu et al. 2013). However, these genetic controls are far from sufficient to explain the faithful regulation of AS (Schwartz et al. 2009). Especially in some cases that tissuespecific AS patterns exist despite the identity in sequences and ubiquitously expression of involved SFs (Kornblihtt et al. 2004;Wang and Cooper 2007), indicating additional regulatory layers leading to specific AS patterns. As Expected, we increasingly understand that splicing is not an isolated process; rather it occurs co-transcriptionally and is presumably also regulated by transcription-related processes. Emerging provocative studies have unveiled that AS is subject to extensive controls not only from genetic but also epigenetic mechanisms due to its cotranscriptional occurrence (Luco et al. 2011). The epigenetic mechanisms, such as HMs, benefits ESCs by providing an epigenetic memory for splicing decisions, so that the splicing pattern could be passed on during self-renewal and be modified during differentiation without the requirement of establishing new AS rules (Luco et al. 2011).
HMs have long been thought to play crucial roles in ESCs maintenance and differentiation by determining what parts of the genome are expressed. Specific genomic regulatory regions, such as enhancers and promoters, undergo dynamic changes in HMs during ESC differentiation to transcriptionally permit or repress the expression of genes required for cell fate decision (Xie et al. 2013a). For example, the co-occurrence of the active (H3K4me3) and repressive (H3K27me3) HMs at the promoters of developmentally regulated genes defines the bivalent domains, resulting in the poised states of these genes (Sharov and Ko 2007). These poised states will be dissolved upon differentiation to allow these genes to be active or more stably repressed depending on the lineage being specified, which enables the ESCs to change their identities (Singh et al. 2015). In addition to above roles in determining transcripts abundance, HMs are emerging as major regulators to define the transcripts structure by determining how the genome is spliced when being transcribed, adding another layer of regulatory complexity beyond the genetic splicing code (Kouzarides 2007). A number of HMs, such as H3K4me3 (Sims et al. 2007), H3K9me3 (Piacentini et al. 2009), H3K36me3 (Luco et al. 2010;Pradeepa et al. 2012), and acetylation of H3 (Gunderson and Johnson 2009), have been proven to regulate AS by either directly recruiting chromatin-associated factors and SFs or indirectly modulating transcriptional elongation rate (Luco et al. 2011). However, few studies focused on epigenetic regulations on AS in the context of cell fate decision.
Additionally, cell-cycle machinery dominates the mechanisms underlying ESC pluripotency and differentiation (Dalton 2015;Vallier 2015). Changes of cell fates require going through cell-cycle progression. Studies in mESCs (Coronado et al. 2013) and hESCs (Pauklin and Vallier 2013;Gonzales et al. 2015) found that the cell fate specification starts in G1 phase, when ESCs can sense differentiation signals. Cell fate commitment is only achieved in G2/M phases, when pluripotency is dissolved through cell-cycle dependent mechanisms. However, whether the HMs and AS, and their interrelations, are involved in these cell cycle-dependent mechanisms remains unclear. Therefore, it is intuitive to expect that HMs could contribute to ESC pluripotency and differentiation by regulating the AS of genes required for specific processes, such cell-cycle progression. Nevertheless, we do not even have a comprehensive view of how HMs relate to AS outcome at genome-wide during ESC differentiation. Therefore, further studies are required to elucidate the extent to which the HMs are associated with specific splicing repertoire and their joint contributions to ESC fate decision between self-renewal and proper differentiation.
To address these gaps in current knowledge, we performed genome-wide association studies between transcriptome and epigenome of the differentiations from the hESCs (H1 cell line) to five differentiated cell types, representing multiple lineages of different developmental levels (Xie et al. 2013b). First, we used RNA-seq data to identify exons that are alternatively spliced during hESC differentiation. We identified 3,513 mutually exclusive exons (MXE) and 3,678 skipped exons (SE) that are differentially spliced between the hESCs and differentiated cells. These hESC differentiation-related AS events involve ~20% of expressed genes and characterize the multiple lineage differentiations. Second, we profiled 16 HMs with ChIP-seq data available for all six cell types, including nine types of acetylation and seven types of methylation. Following the observation that the dynamic changes of most HMs are enriched in AS exons and significantly different between inclusion-gain and inclusion-loss exons, we found that 3 of the 16 investigated HMs (H3K36me3, H3K27ac, and H4K8ac) are strongly associated with 52.8% of hESC differentiation-related AS exons. We then linked the association between HMs and AS to cellcycle progression based on the additional discovery that the AS genes predominantly function in cell-cycle progression. More intriguingly, we found that HMs and AS are associated in G2/M phases and involved in ESC fate decision through promoting pluripotency state dissolution, repressing self-renewal, or both. Specially, we suggested a candidate of the H3K36me3associated isoform switch from PBX1a to PBX1b that is implicated in hESC differentiation by repressing the pluripotency regulatory network. Collectively, we presented a potential mechanism conveying the HM information into cell fate decision through the regulation of AS, which will drive further mechanistic and experimental studies on the involvements of HMs in cell fate decision via determining the transcript structure rather than only the transcript abundance.

Alternative splicing characterizes hESC differentiation
The role of AS in regulation of ES cell fates adds another notable regulatory layer to the know mechanisms that govern stemness and differentiation (Aaronson and Meshorer 2013). To screen the AS events associated with ES cell fate decision, we investigated a panel of RNA-seq data during hESC (H1 cell line) differentiation (Xie et al. 2013b). We considered four cell types directly differentiated from H1 cells, including trophoblast-like cells (TBL), mesendoderm (ME), neural progenitor cells (NPC), and mesenchymal stem cells (MSC) ( Figure S1A). We also considered IMR90, a cell line for primary human fetal lung fibroblast, as an example of terminally differentiated cells ( Figure S1A). Comparisons between the differentiated cell types and hESCs represent five cell lineages of different developmental levels ( Figure S1A). We identified two types of AS exons with the changes of 'per spliced in' (ΔPSIs) are more than 0.1 (inclusion-loss) or less than -0.1 (inclusion-gain), and with the FDRs are less than 0.05 based on the measurement used by rMATS (Shen et al. 2014) ( Figure S1B, see Methods). They are referred to as hESC differentiation-related AS exons, including 3,513 mutually exclusive exons (MXE) and 3,678 skipped exons (SE) ( Figure   S1C, Supplemental Table S1).
These hESC differentiation-related AS exons possess the typical properties as previously described (Magen and Ast 2005;Zheng et al. 2005): (1)   During hESC differentiation, about 20% of expressed genes undergo AS (2,257 for SE and 2,489 for MXE), including previously known ESC-specific AS genes, such as the pluripotency factor FOXP1 (Gabut et al. 2011) ( Figure 1A) and the Wnt/β-catenin signaling component CTNND1 (Salomonis et al. 2010) ( Figure 1B). These hESC differentiation-related AS genes contain many transcription factors, transcriptional co-factors, chromatin remodeling factors, housekeeping genes, and bivalent domain genes implicated in ESC pluripotency and development (Sharov and Ko 2007) (Figures 1C and S1C). Enrichment analysis based on a stemness gene set (Pinto et al. 2015) also shows that hESC differentiation-related AS genes are enriched in regulators or markers most significantly associated with stemness of ESCs ( Figure S3A, see Methods).
Clustering on AS events across cell lineages shows lineage-dependent splicing patterns ( Figure   1D). Upon hESC differentiation, the SE exons tend to lose their inclusion levels (inclusion-loss), while the upstream exons of MXE events are likely to gain their inclusion levels (inclusion-gain) (Fisher's exact test, p = 3.83E-107). The numbers of AS events increase accordingly with the developmental level following hESC differentiation ( Figure 1C). For example, the differentiation to ME involves the fewest AS events and ME presents the most stem-cell-like AS profiles, while the IMR90 has most AS events and exhibits AS profiles most similar to adult cells ( Figure 1C, D).  Figure S3F).
Furthermore, one third of AS genes (n = 881) have both MXE and SE events ( Figure S3G). Only four genes are common across all cell lineages and AS types; of these, the AS events of Ctnnd1 and Mbd1 have been reported to regulate mESC differentiation (Salomonis et al. 2010). Together, these results demonstrate that alternative splicing depicts lineage-and developmental leveldependent characterizations of hESC differentiation.

Dynamic changes of histone modification predominantly occur in AS exons
In ESCs, epigenetic mechanisms contribute mainly to maintaining the expression of pluripotency genes and the repression of lineage-specific genes in order to avoid exiting from stemness. Upon differentiation, epigenetic mechanisms orchestrate the expression of developmental programs spatiotemporally to ensure heritability of existing or newly acquired phenotypic states. Though epigenetic signatures are mainly found to be enriched in promoters, it has become increasingly clear that they are also present in gene bodies, especially in exon regions, implying a potential link of epigenetic regulation to pre-mRNA splicing (Barski et al. 2007;Kornblihtt et al. 2009).
Consistent with previous reports (Kornblihtt et al. 2004;Wang and Cooper 2007), we also observed that few involved SF genes are differentially expressed during H1 cells differentiation ( Figure S3H, see Methods), which confirms the existence of additional layer of epigenetic regulations on AS. However, the extents to which the AS is epigenetically regulated and how these AS genes contribute to cell fate decision are poorly understood. We focused on 16 histone modifications, including nine histone acetylation and seven histone methylation that have available data in all six cell types (see Methods), and aimed to reveal their regulated AS genes during hESC differentiation.
To investigate whether the dynamic changes of these HMs upon cell differentiation prefer AS exons consistently (Figures 2A, B), we profiled the differential HM patterns of around the hESC differentiation-associated AS exons and the same number of randomly selected constitutive splicing (CS) exons of the same AS genes for each differentiation lineage. We compared the changes of ChIP-seq reads count (Δ reads number) in ±150bp regions around the splice sites upon hESC differentiation ( Figure 2C, see Methods). Except a small part of cases (with black dots or boxes in Figure 2D), most HMs change more significantly around AS exons than around constitutive exons upon hESC differentiation (Mann-Whitney-Wilcoxon test, p ≤ 0.05, Figures 2D and S4). Nevertheless, some HMs have strong links to AS, such as H3K79me1 and H3K36me3, while others only have weak link strengths, such as H3K27me3 and H3K9me3 ( Figure 2D). This result is consistent with the fact that the former are involved in active expression and AS regulation (Kolasinska-Zwierz et al. 2009;Luco et al. 2010;Luco et al. 2011), while the latter are the epigenetic marks of repressed regions and heterochromatins (Mikkelsen et al. 2007). The link strengths are shown as the -log10 p-values to test whether the HM changes consistently prefer the AS exons across different cell lineages and AS types ( Figure 2D side bar graph, see Methods).
Taken together, these results, from a global view, revealed a potential regulatory link from HMs to RNA splicing, of which some are strong while the others are weak.

Three histone modifications are associated with alternative splicing upon hESC differentiation
To quantitatively associate the HMs with AS ( Figure 3A), all ChIP-seq data were processed for narrow peak calling using MACS2 (Feng et al. 2012). For each AS exon of each differentiation lineage, we then quantified the differential inclusion levels, i.e., the changes of 'percent splice in' (ΔPSIs, Figure S1B), and the differential HMs signals, i.e., the changes of normalized narrow peak height of ChIP-seq (ΔHMs, Figure S5A, see Methods) between H1 and differentiated cells.
We observed significant differences in all HM profiles (except H3K27me3, Figure S5B The weak associations probably indicate that only subsets of AS exons are strongly associated with HMs, and vice versa, consistent with a recent report (Curado et al. 2015).

It shows that HM changes are significantly different between inclusion-gain and inclusion-loss AS exons (p-values,
Mann-Whitney-Wilcoxon test). Figure S5B provides the whole significances of all HMs across AS types and cell lineages. (C) Pearson correlation test between differential HM signals (ΔHMs) and differential inclusion levels (ΔPSIs), taking H3k36me3 as an example. Figure  To explore the subsets of highly associated AS exons and corresponding HMs, we performed kmeans clustering on the sets of inclusion-gain and inclusion-loss exons of SE and MXE events, separately, taking the ΔHMs of eight identified HMs as epigenetic features ( Figures 3D, S5D, and S6, see Methods). We obtained three subsets of HM-associated SE exons and three subsets of HM-associated MXE exons (Supplemental Table S2). The three HM-associated SE subsets include 180, 664, and 1,062 exons, and are negatively associated with H4K8ac ( Figure S6), negatively associated with H3K36me3 ( Figure 3D), and positively associated with H3K36me3 ( Figure S6), respectively. The three HM-associated MXE subsets include 99, 821, and 971 exons, and are positively associated with H3K27ac ( Figure 3E), negatively associated with H3K36me3 ( Figure S6), and positively associated with H3K36me3 ( Figure S6), respectively. The exons of each subset show significant correlations between their ΔPSIs and ΔHMs upon hESC differentiation ( Figure 3E). These HM-associated AS exons account for 52.8% of all hESC differentiation-related AS events, on average ( Figure S5E).
Of the three AS-associated HMs, H3K36me3 has both positive and negative correlations with AS exons. This is consistent with the fact that H3K36me3 has dual functions in regulating AS through two different chromatin-adapter systems, PSIP1/SRSF1 (Pradeepa et al. 2012) and MRG15/PTBP1 (Luco et al. 2010). The former increases the inclusion levels of targeting AS exons, whereas the latter decreases the inclusion levels (Luco et al. 2011). As expected, 139 and 11 of our identified H3K36me3-associated AS genes have been reported to be regulated by SRSF1 (Pandit et al. 2013;Anczukow et al. 2015) ( Figure S5F) and PTBP1 (Xue et al. 2009) ( Figure   S5G), respectively. Taken together, we showed that more than half (52.8%) of hESC differentiation-related AS events are significantly associated with 3 of 16 HMs during hESC differentiation, including H3K36me3, H3K27ac, and H4K8ac.

HM-associated AS genes predominantly function in G2/M phases to facilitate hESC differentiation
Epigenetic mechanisms have been proposed to be dynamic and play crucial roles in human ESC differentiation (Gifford et al. 2013;Xie et al. 2013b). Given the aforementioned associations between HMs and AS, and the well-established links between AS and hESC differentiation, we hypothesized that the three HMs (H3K36me3, H3K27ac, and H4K8ac) may contribute to stem cell differentiation through their associated AS events. To test our hypothesis and gain more insights into the differences between the HM-associated and -unassociated AS events, we performed comparative function analyses between their hosting genes, revealing that HMs are involved in alternatively splicing the core components of cell-cycle machinery and related pathways to regulate stem cell pluripotency and differentiation.
We found that HMs prefer to be associated with much shorter AS exons ( Figure S7A, p < 0.001, Student's t-test), though AS exons are shorter than average length of all exons ( Figure S2A). HMassociated genes (n = 2125) show more lineage specificity, i.e., more genes (49.76% versus 29.6% of MXE or 38.6% of SE genes) are lineage-specific ( Figures S7B and S3D, E), regardless of whether IMR90 is included or not ( Figure S7C). Only a few HM-associated genes are shared by different cell lineages, even in pairwise comparisons ( Figure S7D); the most common shared genes are lineage-independent housekeeping genes ( Figure S7E). These suggest that HMassociated AS genes contribute more to lineage specificity. Additionally, the HM-associated AS genes (966 of 2,125) are more enriched in stemness signatures than do unassociated AS genes (429 of 1,057) ( Figure 4A). TF binding enrichment analysis shows that HM-associated AS genes are likely to be regulated by TFs involved in cell differentiation, whereas HM-unassociated AS genes are likely to be regulated by TFs involved in cell proliferation and growth ( Figure 4B). All these results suggest that HM-associated and -unassociated AS genes function differently during hESC differentiation.
Gene ontology (GO) enrichment analysis shows that more than half of the HM-associated AS genes (1,120 of 2,125) function in cell-cycle progression and exhibit more significant enrichment than do HM-unassociated AS genes (376 of 1,057, Figures 4C and S8A). The significance of the top enriched GO term (GO:0007049, cell cycle) is consistent across cell lineages, although HMassociated AS genes exhibit more lineage specificity and few of them are shared among lineages ( Figures 4D, S7B-D, and S8B). These results suggest the involvement of HMs in AS regulation of the cell-cycle machinery that has been reported to be exploited by stem cells to control their cell fate decision (Vallier 2015).
Further study of the top enriched cell-cycle AS genes ( Figures 4D and S8A) shows that HMassociated (n = 282) and -unassociated AS genes (n = 150) play roles in different cell-cycle phases and related pathways. The former are prone to function in G2/M phases and DNA damage response ( Figures 4E, F). This indicates that HMs contribute to cell differentiation, at least partially, via AS regulations in these phases, which is consistent with the fact that inheritance of HMs in daughter cells occurs during the G2 phases (Vallier 2015). The latter play roles in G1 phase, cell cycle arrest, and Wnt/β-catenin signaling (Figures S8C, D). Since cell fate choices seem to occur or at least be initiated during G1/S transition (Pauklin and Vallier 2013), while cell fate commitment is achieved in G2/M (Gonzales et al. 2015), it could be rational for stem cells to change their identity during G2 phase when HMs are reprogrammed (Vallier 2015). Intriguingly, the top enriched pathway of HM-associated AS genes is 'ATM/ATR-mediated DNA damage response' (Figures 4F, S9A and S10). This pathway is activated in S/G2 phases and has been recently reported as a gatekeeper of the pluripotency state dissolution (PSD) that participates in allowing hESC differentiation (Gonzales et al. 2015). This enrichment is consistent regardless of whether all lineages are pooled together or considered separately ( Figure S9C). A number of enriched cell-cycle AS genes are shared by multiple lineages (Figures S9B and S10).
Our results suggest the presence of a combinational mechanism involving HMs and AS, wherein HMs facilitate the PSD and cell fate commitment by alternatively splicing the key components of the ATM/ATR pathway ( Figure S10). Additionally, many cell-cycle TF genes are involved in the top enriched HM-associated AS gene set. The pre-B-cell leukemia transcription factor 1 (PBX1) is one of these genes that contribute to cell cycle progression and is discussed later in next section.
Taken together, we suggest that 3 of 16 HMs function in positive or negative ways to affect the AS of subsets of genes, and further contribute to hESC differentiation in a cell-cycle phasedependent manner. The results suggest a potential mechanistic model connecting the HMs, AS regulations, and cell-cycle progression with the cell fate decision.

H3K36me3-associated isoform switch of PBX1 contributes to hESC fate decision
The past few years have identified key factors required for maintaining the pluripotent state of hESCs (Chen et al. 2008;Silva et al. 2009), including NANOG, OCT4 (POU5F1), SOX2, KLF4, and c-MYC, the combination of which was called Yamanaka factors and sufficient to reprogram somatic cells into induced pluripotent stem cells (iPSCs) (Takahashi et al. 2007). These factors appear to activate a transcriptional network that endows cells with pluripotency (Kim et al. 2008).
We identified a putatively H3K36me3-regulated AS "switch" at the upstream of pluripotency transcriptional network, strongly suggesting a potential mechanistic connection between HMs and ESC fate decision through the AS, The above integrative analyses showed strong links between three HMs and RNA splicing, revealing a group of epigenetic regulated AS genes involved in cell cycle machinery. PBX1 was one of the genes that their ASs are positively associated with H3K36me3 ( Figures 5A, B). Its protein, PBX1, is a member of the TALE (three-amino acid loop extension) family homeodomain transcription factors (Chang et al. 1997;Merabet and Mann 2016) and well known for its functions in lymphoblastic leukemia (Kamps and Baltimore 1993;Foa et al. 2000;Tesanovic et al. 2014;Melendez et al. 2015) and several cancers (Berge et al. 2006;Qiu et al. 2007;Park et al. 2008;Magnani et al. 2011;Thiaville et al. 2012;Li et al. 2014;Feng et al. 2015;Jung et al. 2015;Magnani et al. 2015;Jung et al. 2016). PBX1 also plays roles in regulating developmental gene expression (Kim et al. 2002) and maintaining stemness and self-renewal (Ficara et al. 2008;Jung et al. 2016), and in cell-cycle progression by repressing the cell-cycle inhibitor CDKN2B to promote transition to S phase (Koss et al. 2012). However, few studies have distinguished its different isoforms, and we suggested that the splicing variation of PBX1 is likely implicated in ESC fate decision through a distinct mechanism. PBX1 has three isoforms (Pruitt et al. 2014), PBX1a, PBX1b and PBX1c (Figures 5C and S11A).
The first two are produced by the alternative splicing of exon 7 ( Figure 5A) and attract most of the research attentions of PBX1. PBX1b retains the same DNA-binding homeodomain as PBX1a, but changes 14 amino acids (from 334 to 347) and truncates the remaining 83 amino acids at the Cterminus of PBX1a ( Figures 5C and S11A). This C-terminal alteration of PBX1a has been reported to affect its cooperative interactions with HOX partners (Burglin and Ruvkun 1992), which may impart different functions to these two isoforms. We found that the switch from PBX1a to PBX1b is H3K36me3-associated, and these two isoforms may function competitively during ESC differentiation.
We first observed differential transcript expressions of these two isoforms between the hESCs and differentiated cells. Specifically, PBX1a was predominantly transcribed in hESCs, while PBX1b was exclusively induced in differentiated cells ( Figure 5D) except ME and NPC. The exception of ME is due to its low differentiation level, demonstrated as the most stem-cell-like splicing patterns (Figures 1A-D) and expression profiles of pluripotent factors (Figures 5E-G).
Previous studies showed that neuronal development in mice required high expression of Pbx1a rather than Pbx1b (Paraguison et al. 2007;Golonzhka et al. 2015;Linares et al. 2015), supporting the finding of no PBX1b expression in NPCs. Additionally, we did not observe significantly different expression of the total PBX1 and three other PBX family members across cell types ( Figure 5H, fold change < 2). Together with previous reports showing that the PBX1a and PBX1b differ in their ability to activate or repress the expression of reporter genes (DiRocco et al. 1997;Asahara et al. 1999), these observations suggest that the isoform switch of PBX1, rather than the differential expression of its family members play roles that are more critical for hESC differentiation. To further test whether and how PBX1b contributes to stem cell differentiation, we investigated the transcription levels of core pluripotent TFs, including c-MYC, NANOG, and OCT4. Of these TFs, the NANOG is activated by direct promoter binding of PBX1 and KLF4, which is essential for stem cell maintenance (Ficara et al. 2008;Chan et al. 2009). In our observations, all these core factors are repressed in differentiated cells where PBX1b is highly expressed (Figures 5E-G), even though the PBX1a is expressed. As aforementioned, NPC is an exception, in which these stemness factors are also not expressed, although very low PBX1b expression is observed.
The reason for this exception has been suggested that the expression of stemness factors, especially for Nanog, was co-regulated by Pbx1a and Klf4 (Chan et al. 2009), while KLF4 was not expressed in NPCs ( Figure 5I), leading to the repression of stemness regulators in the absence of PBX1b (Figures 5E-G). Additionally, a PBX1a-repressed gene, CDKN2B (Koss et al. 2012), is expressed in cells (TBL, MSC, and IMR90) where PBX1b is expressed ( Figure 5J), regardless of whether or not PBX1a is expressed. The presence of CDKN2B will arrest the cell cycle in G1 phase, which is important for cell differentiation (Pauklin and Vallier 2013). It suggests that PBX1b may promote hESC differentiation by abolishing the PBX1a-induced repression of CDKN2B. Collectively, we hypothesize that PBX1b abolishes both the active and repressive functions of PBX1a by competitively binding to the same target genes during hESC differentiation.
The transcript switches between PBX1a and PBX1b were also observed in an extended dataset of 56 human cell lines/tissues from the Roadmap (Bernstein et al. 2010) and ENCODE (Thomas et al. 2007) projects (supplemental Table S3). Their relative expressions are apparently reversed from stem cells to the differentiated cells and adult tissues ( Figure 6A), which confirms that the isoform switch from PBX1a to PBX1b plays critical roles in hESC differentiation. Based on the 56 human cell lines/tissues, we also observed significant negative correlations between expression of most important pluripotent factors (NANOG and OCT4) and PBX1b ( Figures 6B, C), as well as positive correlations between the two factors and PBX1a (Figures S11B, C). These results suggest distinct functions of PBX1a and PBX1b, wherein the former promotes the activity of the pluripotent regulatory network, whereas the latter attenuates this activity by competitively binding and regulating the same target genes, such as NANOG, since PBX1b retains the same DNAbinding domain as PBX1a. These observations are strongly suggestive that the switch from PBX1a to PBX1b is an additional mechanism by which PBX1 contributes to hESC differentiation via regulating the pluripotency regulatory network. The mechanism by which H3K36me3 is involved in cell fate decision by expanding the transcriptional regulatory network for pluripotency, wherein PBX1b may attenuate the bindings of KLF4 and PBX1a to NANOG by competitively bindings to the same sites. Also see Figure S11.
The exon 7 of PBX1 shows significantly positive correlations between its inclusion levels and the surrounding epigenetic signals of H3K36me3 in hESCs and differentiated cells ( Figure 5B). It suggests a potential role of H3K36me3 in regulating the isoform switch between PBX1a and PBX1b. To investigate the regulatory role of H3K36me3, we focused on two previously proved chromatin-adaptor complexes, MRG15/PTBP1 (Luco et al. 2010) and PSIP1/SRSF1 (Pradeepa et al. 2012), which endow dual functions to H3K36me3 in AS regulation (Luco et al. 2011). Based on the 56 cell lines/tissues from the Roadmap/ENCODE projects, we found significantly positive correlations between the expressions of PBX1a and PSIP1/SRSF1 ( Figures 6D and S11D), but not with MRG15/PTBP1 (Figures 6E and S11D). This result suggests that the alternative splicing of PBX1 is epigenetic regulated by H3K36me3 through the PSIP1/SRSF1 adaptor system. This conclusion is strongly supported by a recent report using the HeLa cell line, which showed that PBX1 is indeed a splicing target of SRSF1 (Anczukow et al. 2015).
Taken together, we strongly suggested that H3K36me3 modulates the alternative splicing of PBX1 via the PSIP1/SRSF1 adaptor system, leading the isoform switch between PBX1a and PBX1b during hESC differentiation. Subsequently, PBX1b competitively binds to NANOG and abolishes the bindings of PBX1a and KLF4. This competitive binding attenuates the pluripotency regulatory network to repress self-renewal and consequently facilitate differentiation ( Figure 6F).
These findings reveal an alternative mechanism by which PBX1 contribute to the ESC fate decision through modulating the pluripotency regulatory network in addition to controlling the cellcycle progression. We also exemplified a potential mechanism by which HMs contribute to stem cell fate decision by regulating the alternative splicing of specific genes and adding a new regulatory layer upstream from the canonic pluripotency regulatory network.

Discussion
ESCs provide a vital tool for studying the regulation of early embryonic development and cell-fate decision. (Thomson et al. 1998). In addition to the core pluripotency regulatory TF network, emerging evidences revealed other processes regulating ESC pluripotency and differentiation, including HMs, AS, cell-cycle machinery, and signaling pathways (Gonzales et al. 2015). Here, we connected these previously separate avenues of investigations, beginning with the discovery that 3 of 16 HMs are significantly associated with more than half of AS events upon hESC differentiation. Further findings have profound implications linking HMs, AS regulation, and cellcycle progression to hESC fate decision. Specifically, HMs orchestrate a subset of AS outcomes that play critical roles in cell-cycle progression via related pathways, such as ATM/ATR-mediated DNA response, and TFs, such as PBX1 ( Figure S11E). In this way, HMs, AS regulation, and signaling pathways are converged into the cell-cycle machinery that has been claimed to rule the pluripotency (Vallier 2015).
Although epigenetic signatures, such as HMs, are mainly enriched in promoters and other intergenic regions, it has become increasingly clear that they are also present in the gene body, especially in exon regions. This indicates a potential link between epigenetic regulation and the predominantly co-transcriptional occurrence of AS. Thus far, H3K36me3 (Luco et al. 2010;Pradeepa et al. 2012), H3K4me3 (Sims et al. 2007), H3K9me3 (Piacentini et al. 2009), andH3acetyl (Gunderson andJohnson 2009) have been revealed to regulate AS, either via the chromatin-adapter systems or by altering Pol II elongation rate. Here, we investigated the extent to which the HMs could associate with AS by integrative analyses on both transcriptome and epigenome data during hESC differentiation. We found that three HMs are significantly associated with about half of AS events. By contrast, a recent report showed that only about 4% of differentially regulated exons among five human cell lines are positively associated with three promoter-like epigenetic signatures, including H3K9ac, H3K27ac and H3K4m3 (Curado et al. 2015). Like that report, we also found a positive association between H3K27ac and a subset of AS events. However, our results differ regarding the other two HMs that we identified as associated with AS.
In our study, H3K36me3 is associated with the most identified HM-associated AS events, either positively or negatively. It is reasonable since H3K36me3 is a mark for actively expressed genomes (Kolasinska-Zwierz et al. 2009), and it has been reported to regulate AS via two chromatin-adaptor systems (Luco et al. 2010;Pradeepa et al. 2012). H4K8ac is associated with the fewest number of AS events in our results. Although little studied, H4K8ac is known to act as a transcriptional activator in both the promoters and transcribed regions (Wang et al. 2008b). Its negative association with AS is supported by the finding that it recruits proteins involved in increasing the Pol II elongation rate (Sim et al. 2004). This suggests that H4K8ac may function in AS regulation by altering the Pol II elongation rate, rather than via the chromatin-adaptor systems.
However, further studies are required. Collectively, possible reasons for the different results between this study and others (Curado et al. 2015) could be that (1) we considered differentiation from hESCs to five different cell types, which covered more inter-lineage AS events than the previous report; or (2) different sets of epigenetic signatures were considered, which may lead to relatively biased results in both studies. Obviously, the inclusion of more cell lineages and epigenetic signatures may reduce this bias. Therefore, an extended dataset of 56 cell lines/tissues was included in our study and the observations support our results.
Our study also extended the understanding that HMs contribute to cell fate decision via determining not only what parts of the genome are expressed, but also how they are spliced (Luco et al. 2011). We demonstrated that the strongly associated HMs and AS events jointly affect cell fate decision in a cell-cycle-dependent manner. The most intriguing discovery is that the HMassociated genes are enriched in G2/M phases and predominantly function in ATM/ATRmediated DNA response. Evidentially, the ATM/ATR-mediated checkpoint has been recently revealed to attenuate PSD and serves as a gatekeeper of the pluripotency state through the cell cycle (Gonzales et al. 2015). The cell cycle has been considered the hub machinery for cell fate decision (Vallier 2015), since all commitments will go through cell-cycle progression. Our study expanded such machinery by linking the HMs and AS regulation to cell-cycle pathways and TFs, which together, contribute to cell fate decision ( Figure 11E).
We also exemplified our hypothesized mechanism by an H3K36me3-associated isoforms switch of PBX1. In addition to its well-known functions in lymphoblastic leukemia (Kamps and Baltimore 1993;Foa et al. 2000;Tesanovic et al. 2014;Melendez et al. 2015) and a number of cancers (Berge et al. 2006;Qiu et al. 2007;Park et al. 2008;Magnani et al. 2011;Thiaville et al. 2012;Li et al. 2014;Feng et al. 2015;Jung et al. 2015;Magnani et al. 2015;Jung et al. 2016), PBX1 was also found to promote hESC self-renewal by corporately binding to the regulatory elements of NANOG with KLF4 (Chan et al. 2009). We found that the transcriptions of two isoforms of PBX1, PBX1a and PBX1b, are putatively regulated by H3K36me3 during hESC differentiation. Their protein isoforms competitively bind NANOG and the binding of PBX1b will abolish the binding of PBX1a, which further attenuates activity of the core pluripotency regulatory network composed of NAOG, OCT4, and SOX2. The switch from PBX1a to PBX1b is modulated by H3K36me3 via the PSIP1/SRSF1 adapter system (Pradeepa et al. 2012). Our results were also supported by an extended dataset of 56 cell lines/tissues from the Roadmap/ENCODE projects. Collectively, our findings expanded understanding of the core transcriptional network by adding a regulatory layer of HM-associated AS ( Figure 6F).
However, a very recent report showed that the switch in Pbx1 isoforms was regulated by Ptbp1 during neuronal differentiation in mice (Linares et al. 2015), indicating a contradiction that the AS of PBX1 should be negatively regulated by H3K36me3 via the MRG15/PTBP1 (Luco et al. 2010).
Our study also included the neuronal lineage and showed that differentiation to NPC is an exception, distinct from other lineages. If NPC is considered separately, the results are consistent with the recent report (Linares et al. 2015) showing that NPCs and mature neurons express increasing levels of PBX1a rather than PBX1b. Another very recent report showed that PBX1 was a splicing target of SRSF1 in the HeLa cell line (Anczukow et al. 2015), which strongly supports our findings. Taken together, these evidences suggest the presence of two parallel mechanisms regulating PBX1 isoforms in embryonic development, of which neuronal differentiation takes the one different from other lineages.
Finally, it is worth noting that both our work and other studies (Curado et al. 2015) reported that HMs cannot explain all AS events identified either during ESC differentiations or based on pairwise comparisons between cell types. Moreover, bidirectional communication between HMs and AS has been widely reported. For instance, the AS can enhance recruitment of H3K36 methyltransferase HYPB/Set2, resulting in significant differences in H3K36me3 around the AS exons (Curado et al. 2015). These findings increased the complexity of defining the cause and effect between HMs and AS. Nevertheless, our findings suggest that at least a subset of AS events are regulated epigenetically, similar to the way that epigenetic states around the transcription start sites define what parts of the genome are expressed. Taken together, AS outcomes may be estimated more precisely by combining splicing cis-elements and trans-factors (i.e., genetic splicing code) and HMs (i.e., epigenetic splicing code), as an 'extended splicing code'.

Identification of alternative splicing exons upon hESC differentiation
Aligned BAM files (hg18) for all six cell types (H1, ME, TBL, MSC, NPC, and IMR90) were downloaded from the source provided in reference (Xie et al. 2013a). Two BAM files (replicates) of each cell type were analyzed using rMATS (version 3.0.9) (Shen et al. 2014) and MISO (version 0.4.6) (Katz et al. 2010). The rMATS was used to identify AS exons based on the differential 'percent splice in' (PSI, Ψ) values between each differentiated cell type and H1 cells ( Figure S1B).
Only AS exons, which hold the |ΔPSI| ≥ 0.1, p-value ≤ 0.01, and FDR ≤ 5%, were considered as final hESC differentiation-related AS exons. Finally, two types of AS exons, namely skipped exons (SE) and mutually excluded exons (MXE), which account for most AS events, were used for subsequent analysis. MISO was used to estimate the confidence interval of each PSI and generate Sashimi graphs (Katz et al. 2013) (see Figures 1A, B, 3A, and 5A). To match the ChIPseq analysis, genomic coordinates of identified AS events were lifted over to hg19 using LiftOver tool downloaded from UCSC. Supplemental Table S1 provides the full list of hESC differentiationrelated AS exons for all cell lineages.
To generate global differential profiles of histone modification changes between AS exons and constitutive exons upon hESC differentiation, for each MXE and SE AS events, we first randomly selected the constitutive splicing (CS) exons from the same genes, composing a set of CS exons.
We then considered the histone modification changes in a ±150bp region flanking the both splice sites of each AS and CS exon, i.e. a 300bp exon-intron boundary region. Each region was 15bp-binned. Alternatively, for a few cases where the exon or intron is shorter than 150 bps, the entire exonic or intronic region was evenly divided into 10 bins. This scaling allows combining all regions of different lengths to generate uniform profiles of histone modification changes around the splice sites (see Figures 2C and S4). To this end, we calculated the sequencing depth-normalized Δ reads number for each binned region between H1 cells and differentiated cells as follows:

= reads number of H1 cells − reads number of differentiated cells bin size in bps
Each region is assigned a value representing the average Δ reads number between H1 cells and differentiated cells for each histone modification. We also compared histone modification profiles between the inclusion-gain and -loss exons ( Figures 3B and S5B To quantitatively link HMs to alternative splicing upon hESC differentiation, the ChIP-seq data were further processed by narrow peak calling. For each histone ChIP-seq dataset, the MACS v2.0.10 peak caller (https://github.com/taoliu/MACS/) Feng et al. 2012) was used to compare ChIP-seq signal to a corresponding whole cell extract (WCE) sequenced control to identify narrow regions of enrichment (narrow peak) that pass a Poisson p-value threshold of 0.01. All other parameters (options) were set as default. We then compared the histone modification signals between H1 cells and differentiated cells. We defined the 'differential HM signals (ΔHMs)' as the difference of the normalized peak signals (i.e., the heights of the narrow peaks) between H1 and the differentiated cells. The normalization was performed based on the distance (in kb) between the peak summits and 3' splice sites (upstream) of the AS exons ( Figure   S5A). Since there is no evidence showing that distal HMs could regulate the AS, we considered only peaks with at least one bp overlapping on either side of the AS exon. For exons without overlapping peaks, peak signals of these exons were set to zero. For the exons there are more than one overlapping peaks, the peak signals of these exons were set to the greater ones. For MXE events, only the upstream AS exons were considered due to their exclusiveness in inclusion level between these two exons, i.e., the sum of the PSIs for two exons of an MXE event is always one.

Association studies and k-means clustering
To quantitatively estimate associations between HMs and AS, we first used three independent correlation tests, including Pearson correlation (PC), multiple linear regression (MLL), and logistic regression (LLR), to test global correlations between AS events and each of 16 HMs based on differential inclusions (ΔPSIs) and differential HM signals (ΔHMs). PC was performed using the R package (stats, cor.test(),method = 'pearson'). MLL and LLR were calculated using Weka 3 (Hall et al. 2009), wherein the ΔHMs are independent variables and ΔPSIs are response variables.
The results show that only some HMs correlate with AS, and most correlations are weak ( Figure   S5C). HMs that have significant correlations (p ≤ 0.05) with AS were used for further clustering analysis, through which we identified six subsets of HM-associated AS events (Supplemental Table S2).
K-means clustering was performed separately on inclusion-gain and inclusion-loss AS events of MXE and SE, based on the selected HM signatures ( Figure S5C). K was set to six for all clustering processes ( Figure S5D), which produced the minimal RSME (root mean square error) for most cases based on a series of clustering with k varying from two to eight (data not shown). Then the two clusters that generate mirror patterns, of which one from inclusion-gain events and one from inclusion-loss events, were combined to be considered as a subset of HM-associated AS events ( Figure S6). Finally, we identified six subsets of HM-associated AS events displaying significantly positive or negative correlations with three HMs.

Gene expression quantification
For each cell type, two aligned BAM files (replicates) were used to estimate the expression level of genes using Cufflinks (Trapnell et al. 2012). Default parameters (options) were used. The expression level of each gene was presented as FPKM for each cell type. Differentially expressed genes (DEGs) were defined as those genes whose fold changes between the differentiated cells and hESCs are greater than 2. Specifically for DEG analysis of SF genes, we collected a list of 47 ubiquitously expressed SFs with "RNA splicing" in their GO annotations from AmiGO 2 (Carbon et al. 2009). The enrichment significances in Figure S1D and S3H are shown as the p-values based on hypergeometric tests, using the DEGs of all expressed genes (with the FPKM ≥ 1 in at least hESCs or one of differentiated cell types) as background. We found that the AS genes are generally not differentially expressed between hESCs and differentiated cells, indicating that they function in hESC differentiation via isoform switches rather than expression changes. Few SF genes show differential expression between hESCs and differentiated cells, indicating the existence of epigenetic control of AS.

Genome annotations
Since the RNA-seq reads (BAM) files and ChIP-seq read (BAM) files downloaded from the public sources are mapped to different human genome assembles, NCBI36/hg18 (Mar. 2006) and GRCh37/hg19 (Feb. 2009), respectively, we downloaded two version of gene annotations (in GTF formats) from the UCSC Table Browser (Karolchik et al. 2004). The hg18 GTF file was used for rMATS and MISO to identify AS during the differentiation from H1 ESCs to five differentiated cells.
The hg19 GTF file was used to define the genome coordinates of AS exons and further for ChIPseq profile analysis ( Figures 2C, 3B, S4, and S5A). We compared exonic and intronic lengths based on hg18 annotation ( Figure S2).

Gene ontology enrichment analysis
The Gene ontology (GO) enrichment analysis was performed using ToppGene (Chen et al. 2009) by searching the HGNC Symbol database under default parameters (p-value method: probability density function). Overrepresented GO terms for the GO domain 'biological process' were used to generate data shown in Figures 4C-E and Figures S8A-C using either the FDR (0.05) adjusted p-value or the enriched gene numbers ( Figure S8A).

Stemness signature and TF binding enrichment analysis
StemChecker (Pinto et al. 2015), a web-based tool to discover and explore stemness signatures in gene sets, was used to calculate enrichment of AS genes in stemness signatures. Significances was tested (hypergeometric test) using the gene set from human (Homo sapiens) of this database as the background. For all AS genes (n = 3,865), 2,979 genes were found in this data set. 1,395 of 2,979 genes were found in the stemness signature gene set, most of which (n = 813) are ESC signature genes. Figure S3A shows the enrichment significance as the -log10 p-values. For HMassociated AS genes (n = 2,125), 1,992 genes were found in this data set. 966 of 1,992 genes were found in the stemness signature gene set, most of which (n = 562) are ESC signature genes.
For HM-unassociated genes (n = 1,057), 987 genes were found in this data set. 429 of 987 genes were found in the stemness signature gene set, most of which (n = 251) are ESC signature genes.
FunRich (v2.1.2), a stand-alone functional enrichment analysis tool, was used for TF binding enrichment analysis. The top six enriched TFs of HM-associated and -unassociated AS genes are presented and shown as the proportion of enriched AS genes. It shows that HM-associated AS genes are more likely to be regulated by TFs involved in cell differentiation and development, while the HM-unassociated AS genes are more likely to be regulated by TFs involved in cell proliferation and renewal ( Figure 4B).

Roadmap and ENCODE data analysis
The sources of RNA-seq data for 56 cell lines/tissues from Roadmap/ENCODE projects were listed in supplemental Table S3. The RNA-seq data (BAM files) were used to calculate the PSI of the exon 7 for PBX1 in each cell line/tissue and to estimate the expression levels of all gene (FPKM), based on aforementioned strategies. The relative expression levels of PBX1a and PBX1b shown in Figure 6 and Figure S11 were calculated as the individual PFKM value of each divided by their total FPKM values.  (Hall et al. 2009), whereas all other tests were performed using R packages.

Data availability
The RNA-seq data of H1 and five other differentiated data (two replicates for each, BAM files) are available at http://epigenome.ucsd.edu/differentiation/download.html. All ChIP-seq data of 16 HMs are available in Gene Expression Omnibus (GEO) under accession number GSE16256.
Both RNA-seq and ChIP-seq data of 56 cell lines/tissues from the Roadmap/ENCODE projects are available in their official website. Supplemental Table 3 provides the details on accession to these data. H3 mono-methylated lysine 79; H3K9me3: H3 tri-methylated lysine 9.