Synchronizing three distinct cell-fate transitions in yeast
To investigate how transcript diversity is regulated during cell-state transitions, we profiled different cell-fate transitions in yeast covering the gametogenesis program and re-entry into the mitotic cell cycle (Fig. 1a). A major benefit of the yeast model is that we can synchronize yeast gametogenesis and re-entry into the mitotic cell cycle, allowing for precise cell stage-specific measurements and minimizing effects caused by asynchronous cell populations. To obtain high synchrony across three distinct cell-fate transitions, we used an engineered yeast strain that expressed the master regulatory transcription factor for entry into gametogenesis, IME1, from an inducible promoter (pCUP-IME1) [41]. The same strain also harbored the transcription factor NDT80, which controls meiotic divisions and spore formation, under control of a different inducible promoter (pGAL-NDT80, Gal4-ER) [40]. We designed a master time course with periodic sampling across three distinct cell-fate transitions: (Transition 1) gametogenesis up until meiotic prophase by inducing Ime1 expression (pCUP1-IME1, +Cu), (Transition 2) meiotic divisions followed by spore formation by inducing Ndt80 expression (pGAL1-NDT80, Gal4-ER, + β-estradiol), and (Transition 3) re-entry into the mitotic cell division cycle (from meiotic prophase, 6 hours (h) in sporulation medium (SPO), to nutrient-rich medium (YPD)). We defined these three cell-fate transitions as T1, T2, and T3, respectively (Fig. 1a).
To determine that cells underwent T1, T2, and T3 synchronously, we measured the synchrony of pre-meiotic DNA replication (T1), meiotic divisions (T2), and budding (T3) (Fig. 1b). Indeed, we found that most cells duplicated their DNA at 4 h in SPO (2 h after Ime1 induction), completed meiotic divisions at 9 h in SPO (2 h after Ndt80 induction), and displayed newly formed buds 2 h after shifting to rich nutrient conditions. These data confirm that our experimental system allows for synchronous progression through three distinct cell-state transitions.
Quantitative profiling of transcript heterogeneity across multiple cell-fate transitions
We generated quantitative datasets of TSS and TES usage levels over multiple time points for each cell-fate transition (T1, T2, and T3). Specifically, we adopted high-throughput sequencing approaches to measure usage levels of TSSs (TSS-seq) and TESs (TES-seq) (Fig. 1c) [20, 43,44,45]. In addition, we measured mRNA expression at matching time points (mRNA-seq). To complement the TSS-seq and TES-seq datasets, we also used an orthogonal method called transcript isoform sequencing (TIF-seq) on pooled samples of matching time points: prior to meiosis (pm), T1, T2, and T3, respectively [3]. The main utility of TIF-seq is that it matches the start and end of transcripts by sequencing the junction of circularized cDNAs spanning the 5′ and 3′ ends of transcripts, and can therefore precisely identify transcript isoforms [3, 46]. Finally, we determined nucleosome positions by micrococcal nuclease digestion of chromatin followed by high-throughput sequencing (MNase-seq) at selected time points across all three transitions [47]. The MNase-seq dataset allowed us to determine how the chromatin state is altered during each cell-fate transition. The combination of these methods provides a high-resolution view of transcript isoform diversity and chromatin states over multiple distinct cell-fate transitions (Fig. 1a, see materials and methods for how TSSs/TESs were filtered and assigned to genes).
At the single nucleotide level, we found that on average 38 TSSs and 39 TESs per gene were located within one kilobase (kb) upstream or downstream of the ORF in the sense orientation, which was in the range of what a previous study had shown [46] (Fig. 1d and Additional file 1: Fig. S1a). Individual TSSs and TESs often clustered together. For example, we detected about 256 TSS and 95 TES sites at single nucleotide resolution at the RAD16 locus, but most of them clustered within a few narrow regions (Fig. 1e). Therefore, we applied a computational method to define these TSS/TES clusters and identified 11,685 distinct TSS and 13,380 distinct TES clusters, with approximately half of the genes harboring two or more TSS (or TES) clusters (Fig. 1f and Additional file 1: Fig. S1b). Per time point, we identified between 7320 and 9412 TSS clusters and between 8437 and 11,382 TES clusters (Additional file 1: Fig. S1c). There was a good overlap between the TSS-seq/TES-seq and TIF-seq datasets (Additional file 1: Fig. S1e). At least 50% of TSS and TES clusters were detected in the TIF-seq dataset, even though TIF-seq was sequenced with lower read depth and displayed an over-representation of shorter fragments (Additional file 1: Fig. S1d). For the TSS/TES clusters with high expression (Tags Per Million reads (TPM) ≥ 10), more than 90% were supported by TIF-seq (Fig. 1g). Additionally, the three independent biological replicates in this study correlated highly with each other (Additional file 2). The TSS-seq and TES-seq datasets correlated well with the RNA-seq dataset (TSS-seq vs RNA-seq and TES-seq vs RNA-seq) (Fig. 1h and Additional file 1: Fig. S1f, S1g and Additional files 3, 4 and 5). However, it is worth noting that genes with relatively low expression levels in the RNA-seq dataset correlated less well with TSS-seq and TES-seq, which could be due to noise in the data. Nevertheless, these data indicate that our TSS-seq and TES-seq datasets can largely be used for quantitative estimates of steady-state levels of TSS and TES usage.
Alternative TSSs and TESs are highly regulated by cell-fate-specific transcription factors
The terms main and alternative TSS and TES have been used in various ways. To avoid ambiguity, we defined these terms as the following. The main TSS or TES is the most highly expressed cluster prior to a cell-fate transition (PT, Fig. 2a). For T1, this was the most highly expressed TSS or TES cluster at the 2 h SPO time point, and for T2 or T3, the 6 h SPO time point. Alternative TSSs or TESs are the TSS or TES clusters expressed prior to or during cell-fate transitions, different from the main TSS/TES. Our definitions were fixed within each individual cell-fate transition (T1, T2, or T3).
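As an illustration only, the definitions above can be written as a short Python sketch. The function name, the dictionary layout, and the toy expression values are hypothetical and are not part of the analysis pipeline used in this study. The sketch assumes that cluster expression is available per gene and per time point, and it labels the most highly expressed cluster at the reference time point (2 h SPO for T1, 6 h SPO for T2 and T3) as the main cluster.

```python
from collections import defaultdict


def assign_main_and_alternative(cluster_expr, reference_timepoint):
    """Label one main cluster per gene; all other clusters are alternatives.

    cluster_expr: {(gene, cluster_id): {timepoint: expression}}
    reference_timepoint: time point prior to the transition (assumed key).
    """
    by_gene = defaultdict(list)
    for (gene, cluster_id), series in cluster_expr.items():
        by_gene[gene].append((cluster_id, series.get(reference_timepoint, 0.0)))

    labels = {}
    for gene, clusters in by_gene.items():
        # Main = most highly expressed cluster prior to the transition.
        main_id = max(clusters, key=lambda c: c[1])[0]
        alternatives = sorted(cid for cid, _ in clusters if cid != main_id)
        labels[gene] = {"main": main_id, "alternative": alternatives}
    return labels


# Hypothetical toy values (TPM), not taken from the study.
toy = {
    ("GENE_A", "tss1"): {"2h": 50.0, "6h": 10.0},
    ("GENE_A", "tss2"): {"2h": 5.0, "6h": 80.0},
}
labels_t1 = assign_main_and_alternative(toy, "2h")  # reference for T1
labels_t2 = assign_main_and_alternative(toy, "6h")  # reference for T2/T3
```

Because the label is fixed within each transition, the same cluster can be the main TSS in T1 and an alternative TSS in T2, as in the toy example.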
Our analysis across three cell-fate transitions revealed widespread usage of alternative TSS and TES clusters. For each cell-fate transition, we observed ~ 5800 alternative and main TESs, and ~ 3500 alternative and ~ 5800 main TSSs (Fig. 2b). Most alternative TSSs were expressed upstream of annotated ORFs, but a subset of genes harbored TSSs within the gene body (Additional file 1: Fig. S2, left panel). The median position of main TSSs was 75 bp (T1) and 77 bp (T2 and T3) upstream of the AUG of the matching ORF, while that of the alternative TSSs upregulated (2-fold or more) during cell-fate transitions was 170 bp (T1), 173 bp (T2), and 112 bp (T3), respectively (Fig. 2c, Additional file 1: Fig. S2, right panel). A similar trend was observed for alternative TESs, suggesting that increased 5′ and 3′ UTR lengths are characteristic of most alternative transcript isoforms expressed during cell-fate transitions.
Alternative TSSs and TESs were highly regulated across the three cell-fate transitions. Weighted gene correlation network analysis (WGCNA), which identifies the gene expression network based on expression correlation among genes across different time points of the master time course, revealed 13 co-expression TSS modules and 15 TES modules, each consisting of at least one hundred genes (Additional file 1: Fig. S3a and S3b) [48]. The top three co-expression modules of TSSs and TESs were specifically upregulated during T1, T2, and T3, respectively (Additional file 1: Fig. S3a). Alternative TSSs and TESs were well represented in each expression module. In line with this observation, we found that the transcription factors involved in regulating alternative and main TSSs were similar. The Ume6 binding motif was detected near main and alternative TSSs that were upregulated in T1, which is in line with the function of Ume6 together with Ime1 in activating transcription of the so-called “early” meiotic genes during yeast gametogenesis [49]. The binding motif of Ndt80, a transcription factor essential for activation of the middle and late meiotic genes, was enriched for T2 TSSs [50]. Given that expression of Ime1 and Ndt80 was controlled from heterologous promoters for T1 and T2 in order to obtain a highly synchronous cell population, there is a possibility that this leads to mis-regulation of a subset of transcripts. However, both synchronization methods have been used to study gene regulation during gametogenesis, and gave rise to viable spores that were indistinguishable from the wild-type [40, 41, 51]. Lastly, motifs of the transcriptional repressor Tod6 and the transcriptional activator Sfp1 were proximal to the T3 TSSs (Additional file 1: Fig. S3c). Both transcription factors are known to regulate transcription of ribosomal protein gene promoters, and their activities are controlled by nutrient-sensing kinases [52,53,54].
To test whether transcription factors directly control alternative TSS/TES usage, we compared TSS and TES changes between T1- and T2-induced cells (3H and 7H) and mock-treated cells of the matching time point (3M and 7M). We found that the vast majority of alternative TSSs and TESs were expressed in Ime1 and Ndt80-induced cells but not in the mock-treated cells for the same time period (3H versus 3M, and 7H versus 7M) (Additional file 1: Fig. S3d). We conclude that main and alternative TSSs/TESs are widely expressed through the action of cell-fate specific transcription factors.
Increased main to alternative TSS usage is a common feature of cell-fate transitions
Next, we determined how alternative TSS and TES usage contributed to gene expression. Specifically, we computed the relative TSS and TES usage levels by taking the ratio of alternative versus total TSS/TES levels of the same gene at the same time point (Fig. 2d). An increased ratio means elevated relative alternative TSS or TES usage, while a lower ratio indicates a decrease in relative usage. Proportional increases in expression from both alternative and main TSSs (e.g., if TSSs were not regulated independently) would result in an invariant ratio. Strikingly, we found that alternative TSS usage increased significantly during T1, T2, and T3 (Fig. 2d, e). For example, approximately 200–300 genes had alternative TSSs whose usage increased by at least 50% in each of T1, T2, and T3. In contrast, alternative TES usage changed by a smaller magnitude across the three transitions (Fig. 2e). Only 100–150 genes had alternative TESs whose usage increased 50% more than the main TES. The extent of the increase for TSSs was also significantly larger than that for TESs (Fig. 2f, Wilcoxon rank sum test, p < 0.05). These increases were not seen for uninduced cells (Fig. 2e, see time points 3M and 7M). We conclude that there is a large shift from main to alternative TSS usage during cell-fate transitions. For the remainder of the manuscript, we decided to focus on this remarkable observation.
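The relative usage ratio described above can be illustrated with a minimal Python sketch. The function name and the toy values are hypothetical, and the sketch assumes that the ratio is taken per alternative cluster over the summed signal of all clusters of the same gene at one time point. It shows that a proportional increase of main and alternative expression leaves the ratio unchanged, whereas a selective increase of the alternative TSS raises it.

```python
def relative_usage(level, all_levels):
    """Fraction of a gene's total TSS (or TES) signal contributed by one cluster."""
    total = sum(all_levels)
    if total <= 0:
        return None  # gene not detected at this time point
    return level / total


# Hypothetical toy values for one gene, given as [main, alternative].
before = relative_usage(10.0, [90.0, 10.0])         # prior to the transition
proportional = relative_usage(20.0, [180.0, 20.0])  # both clusters doubled
shifted = relative_usage(90.0, [90.0, 90.0])        # only the alternative increased
```

In this toy case the ratio stays at 0.1 when both clusters double and rises to 0.5 when only the alternative TSS is induced.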
Increased upstream alternative TSS usage is linked with a range of outcomes on main TSS usage
During yeast gametogenesis, many noncoding RNAs and mRNA isoforms are expressed. A class of transcripts called long undecodable transcript isoforms (LUTIs) initiate upstream of canonical promoters and are widely expressed [17,18,19]. A well-studied gene regulated by a LUTI is the kinetochore component NDC80 [17, 18]. During early gametogenesis, transcription from the main NDC80 TSS is repressed by transcription through the NDC80 promoter, which initiates from the upstream alternative TSS (NDC80LUTI). Additionally, many other examples have been reported in which transcription of intergenic noncoding RNAs or 5′ extended mRNA isoforms represses downstream promoters of protein-coding genes [17, 18, 26, 28, 32, 55, 56]. While many LUTIs and noncoding RNAs have been functionally dissected and characterized, a more systematic analysis of how transcription of upstream alternative transcript isoforms influences gene expression has been lacking. Close interrogation of our high-resolution time course allowed us to capture these regulatory events.
Our TSS-seq data was consistent with our previous work on the NDC80 locus, indicating that we can identify these regulatory events genome wide [17, 18] (Fig. 3a). During T1, the NDC80 upstream alternative TSS was strongly upregulated and concomitantly the main TSS in the NDC80 promoter was downregulated, while in T2 and T3, the TSS switching effects were reversed. Cells that were not induced for T1 and T2 but exposed to sporulation medium for the same time (Fig. 3a, “no T main” and “no T UA”, 3 h in SPO for T1 and 7 h in SPO for T2, respectively) did not display TSS usage changes at the NDC80 locus, demonstrating that these effects are cell fate specific.
Having established that we could capture gene regulation events accompanying alternative TSS expression, we next examined how increased alternative TSS usage corresponded with expression changes from the matching downstream main TSS. We selected genes that showed upregulation (2-fold or more) of an upstream alternative TSS for at least one time point during the cell-fate transition. Surprisingly, expression from main TSSs changed with various outcomes in response to increased expression from an alternative TSS (Fig. 3b). For example, in T1 at 3 h in SPO, 153 genes were downregulated, 87 genes were upregulated, and 184 genes did not change significantly. Genes in T2 and T3 showed a similar trend, but a slightly greater proportion of them were upregulated in expression (Fig. 3b). Genes with co-upregulated main-alternative TSS pairs in T1 and T3 were enriched for cell-fate transition specific biological processes (e.g., “double-strand break repair” during meiotic prophase (T1) and “ribosome biogenesis” during vegetative growth (T3)) (Additional file 8: Table S2). Downregulation of the main TSS in the presence of increased upstream alternative usage was not generally linked to specific biological processes. In contrast to increased alternative TSS usage, downregulation (2-fold or more) of some alternative TSSs was accompanied by downregulation of the main TSS, which suggests that some of these pairs of main and alternative TSSs could be co-regulated (Additional file 1: Fig. S4a). Additionally, at closely spaced tandem pairs of genes (< 200 bp apart), there was no clear effect of increased expression of upstream adjacent genes on expression of the main TSS of the downstream gene (Additional file 1: Fig. S4b). We conclude that upstream alternative TSS usage correlates with a range of effects on main TSS usage, from gene activation to gene repression. While our analysis does not establish a direct causative effect, our data suggest that expression from the main TSS for many genes is influenced by transcription from upstream alternative TSSs, as reported in single-locus studies.
TSS switching events are linked to various gene regulatory outcomes
Switching between main and alternative TSSs is reminiscent of the regulation we previously described at the NDC80 locus (Fig. 3a, T1), where the alternative upstream transcript becomes the dominantly expressed isoform. To profile the effect on expression from the main TSS during such TSS shifts, we defined TSS switching by selecting TSS pairs where the alternative TSS is upregulated (2-fold or more) and its expression level is equal to or greater than that of the main TSS (Fig. 3c). Across all transitions, we identified 109, 93, and 86 genes with TSS switching events in T1, T2, and T3, respectively. TSS switching events were linked to various degrees of downregulation of the main TSS. For example, the main TSS of NDC80 displayed a 5-fold decrease in the presence of expression from the alternative upstream TSS, while the majority of main TSSs displayed a marginal decrease (2-fold or less).
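The two criteria that define TSS switching can be sketched in a few lines of Python. The function name, the pseudocount, and the toy values are hypothetical choices made for illustration; in particular, the sketch assumes that the alternative and main TSS levels are compared at the same time point during the transition.

```python
def is_tss_switch(alt_before, alt_during, main_during,
                  min_fold=2.0, pseudocount=1.0):
    """Return True if an alternative/main TSS pair meets both switching criteria.

    Criterion 1: the alternative TSS is upregulated at least `min_fold`.
    Criterion 2: the alternative TSS level is equal to or greater than the
                 main TSS level during the transition.
    The pseudocount (an assumption) avoids division by zero for silent TSSs.
    """
    fold = (alt_during + pseudocount) / (alt_before + pseudocount)
    return fold >= min_fold and alt_during >= main_during


# Hypothetical toy values (TPM), not taken from the study.
switching = is_tss_switch(alt_before=5.0, alt_during=40.0, main_during=20.0)
induced_only = is_tss_switch(alt_before=5.0, alt_during=40.0, main_during=100.0)
dominant_only = is_tss_switch(alt_before=30.0, alt_during=40.0, main_during=20.0)
```

Only the first toy pair satisfies both criteria; the second is induced but remains below the main TSS, and the third is dominant but not induced 2-fold.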
We also assessed how previously reported LUTI-regulated genes (380 genes in total) behaved in our dataset (Fig. 3d) [19]. For a large fraction of LUTI-regulated genes (109 out of 380 genes), we did not detect an alternative upstream TSS. It is possible that some alternative TSSs were not measured in our dataset because of technical limitations. For example, initiation of transcription from alternative TSSs could be spread over a large region in promoters, making them harder to detect by TSS-seq. Surprisingly, the majority of previously defined LUTI-regulated genes (189 genes) that harbored an alternative TSS in our dataset displayed no TSS switching (Fig. 3d). This suggests that most LUTI-regulated genes do not switch expression from the protein-coding TSS to the LUTI TSS. As a caveat, we cannot rule out the impact of noise in the TSS-seq data for the examples in which we observed little to no change in main TSS signals in the presence of increased upstream alternative TSS expression. Nevertheless, our analysis indicates that increased expression from upstream alternative TSSs is linked to various outcomes on the expression of the matching main TSSs and is not always associated with gene repression.
TSS usage changes are dynamic and temporal
Gene regulation via expression from alternative TSSs is dynamic and cell-fate transition specific. Like NDC80, SWI4 and POP4 exhibited TSS switching in T1 (Fig. 3e, Additional file 1: Fig. S4c and S4d). At these loci, an upstream alternative TSS was upregulated, and the main TSS was downregulated concomitantly during T1. In T2 and T3, SWI4 and POP4 TSS switching was rapidly reversed. In comparison, the RAD16 and CLB2 genes showed a different switching pattern. Predominance of the alternative TSS after switching was maintained till the end of T2, indicating that expression from the alternative TSS could persist over multiple cell-fate transitions. We observed T2-specific switching for ORC1 and RAD2, while POP1 and PCL1 showed strong switching events during T3 (Fig. 3e and Additional file 1: Fig. S4c). These examples illustrate that TSS switching not only occurs in a stage-specific manner but can also be spread across multiple stages (e.g., RAD16 and CLB2) or controlled within a tight developmental window (e.g., POP1 and PCL1).
Co-regulation of isoforms also occurs in a stage-specific manner (Fig. 3f, Additional file 1: Fig. S4c and Fig. S4d). Representative examples are the MCM2 and BDF2 genes where the main and alternative TSSs were both upregulated in T1 and then downregulated in T2. A similar pattern was observed for the SPO75 and SWD1 genes except that the alternative and matching main TSSs were co-upregulated in T2 (Fig. 3f, Additional file 1: Fig. S4c and S4d). At the SUM1 locus, the expression of the main and alternative TSSs followed each other throughout all three fate transitions. Thus, the regulation of alternative-main TSS pairs is dynamic and can be coupled to shape gene expression at specific time points.
Cell-fate transitions feature increased TSS usage within gene bodies
TSS switching events were not limited to “conventional” promoter regions but also occurred in regions downstream of the main TSS (Additional file 1: Fig. S2, labeled “internal”). We observed a subset of genes that displayed expression of a TSS within the coding sequence. Among these, about 30 to 40 genes showed transition-specific TSS switching, where the internal TSS was expressed prior to the transition but decreased during the transition, while expression of the upstream TSS encoding the full-length transcript concomitantly increased (e.g., ECM10, TRZ1, and SPO22) (Fig. 4a, b and Additional file 1: Fig. S5a). We also identified examples of genes where an internal TSS was upregulated during cell-fate transitions (e.g., SSP1 and DUS3), and dynamic switching occurred between the full-length transcript and the internal TSS in a cell-fate-specific manner (Fig. 4c).
The production of truncated transcripts and protein isoforms via internal transcription during T2 was also reported previously [14]. To systematically dissect how internal TSSs are regulated across the different cell-fate transitions, we classified internal TSSs with relaxed (2-fold or more upregulated) and stringent cutoffs for each fate transition (Fig. 4d). The stringent cutoff was met when the expression level was at least one third that of the full-length mRNA at the same time point, and the matching truncated transcript isoform contained an ORF that was more than 300 bp long (e.g., VPS41) (Fig. 4a). Nearly 500 internal TSSs were induced per transition, of which a substantial fraction remained after applying the stringent cutoff (Fig. 4d). The expression of internal TSSs was also supported by our TIF-seq dataset (Fig. 4e). Additionally, a subset of truncated transcript isoforms overlapped with coding sequences of specific protein domains, suggesting that the encoded truncated proteins may have specialized cellular functions (Fig. 4d, labeled “domain”).
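The relaxed and stringent cutoffs can be summarized as a small decision function. This is a sketch only: the function name, the category labels, and the toy values are hypothetical, and it assumes that the stringent class is a subset of the relaxed class (i.e., that the 2-fold induction is required in both cases).

```python
def classify_internal_tss(fold_change, internal_level, full_length_level,
                          orf_length_bp):
    """Classify an internal TSS as 'not_induced', 'relaxed' or 'stringent'.

    Relaxed:   upregulated 2-fold or more during the transition.
    Stringent: relaxed, plus expression at least one third of the full-length
               mRNA at the same time point, plus a truncated-isoform ORF
               longer than 300 bp.
    """
    if fold_change < 2.0:
        return "not_induced"
    passes_level = internal_level >= full_length_level / 3.0
    passes_orf = orf_length_bp > 300
    return "stringent" if (passes_level and passes_orf) else "relaxed"


# Hypothetical toy values, not taken from the study.
strong = classify_internal_tss(3.0, 40.0, 90.0, 450)    # meets all criteria
weak = classify_internal_tss(3.0, 10.0, 90.0, 450)      # below one third of full length
short_orf = classify_internal_tss(2.0, 30.0, 90.0, 300) # ORF not longer than 300 bp
flat = classify_internal_tss(1.5, 40.0, 90.0, 450)      # not induced 2-fold
```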
Several studies in yeast have shown that cryptic promoters exist within gene bodies and drive expression of short transcript isoforms that can encode truncated proteins [14, 26, 57,58,59,60,61,62,63]. In our dataset, we found that many transcripts emanating from internal TSSs harbored an in-frame AUG not far from the internal TSS (Fig. 4f). Ribosomes were associated with truncated transcript isoforms initiating from internal TSSs when we compared a ribosome profiling dataset that covered the T1 and T2 cell-fate transitions with our dataset (Fig. 4g) [21]. For example, SAS4 and TEL1 showed clear ribosome footprint signals in the same region and at the same time when the truncated transcript isoform was expressed (Fig. 4h). Interestingly, the truncated transcript isoform of TEL1 solely covered the FATC domain of the Tel1 protein, a domain that is important for mediating protein-protein interactions (Fig. 4h). Like TEL1, TOR1, which encodes the catalytic subunit of TORC1 and TORC2 in yeast, also showed expression of a truncated transcript isoform solely encoding the Tor1 FATC domain (data not shown).
Promoters controlling transcription from internal TSSs shared features with canonical promoters. We observed nucleosome-free regions (NFR) aligned with the internal TSS, and nucleosome periodicity (+ 1, + 2 nucleosomes and so on) downstream of the internal TSS (Additional file 1: Fig. S5b). Like the co-expression modules for T1 and T2, we found that the transcription factors Ume6 and Ndt80 were enriched upstream of internal TSSs (Additional file 1: Fig. S5c). Importantly, the expression of internal TSSs relied on the induction of Ime1 and Ndt80 expression, indicating that these internal transcripts are directly regulated by these transcription factors (Additional file 1: Fig. S3d). The promoter sequences of internal TSSs upregulated in T3 were enriched for the Sfp1 motif, suggesting that this transcription factor regulates truncated mRNA isoforms during return to the mitotic cell cycle (T3). Thus, similar to transcription upstream of canonical TSSs, the expression of transcripts with internal TSSs is also dynamically controlled, possibly by the same transcription factors that regulate the former. The short transcript isoforms, in turn, have the potential to be translated into truncated protein isoforms, diversifying the proteome during cell-state changes.
Determinants of gene regulation via the use of alternative TSSs
Our analysis showed that increased upstream TSS usage is associated with a range of effects on expression of the downstream main TSS (Figs. 3b and 4b). Are there features in the dataset that can explain these outcomes? To examine this systematically, we aggregated the data obtained from three pairs of comparisons representing each cell-fate transition: T1 (6 h vs 2 h SPO), T2 (8 h vs 6 h SPO), and T3 (60 min YPD vs 6 h SPO). We focussed on two features in the dataset: alternative TSS levels and the distance between alternative-main TSS pairs, as both features have been described to affect gene expression in multiple studies [34, 42, 64].
We found that main and alternative TSSs in close proximity were more likely to be co-regulated. For closely spaced alternative and main TSS pairs of less than 80 bp apart, increased alternative TSS usage correlated with increased main TSS usage, while expression of more widely spaced alternative and main TSS pairs correlated inversely instead (Fig. 5a, b). Moreover, genes which had shorter distances (< 80 bp) between the tandem TSSs displayed a positive correlation between alternative TSS expression levels and main TSS expression changes (Additional file 1: Fig. S6a). This correlation was strengthened at genes with relatively low main TSS expression prior to transition (≤ 50th percentile), and a relatively high alternative TSS expression during transition (≥ 50th percentile) (Fig. 5c and Additional file 1: Fig. S6a). The positive trend was weakened or even absent when we relaxed the criteria for the alternative TSS and main TSS expression levels (Additional file 1: Fig. S6a). This suggests that co-regulation between closely spaced TSSs occurs mostly when the expression from the alternative TSS is relatively high.
Second, we observed that expression from upstream alternative TSSs was linked to repression of the main TSS at genes when the distance between main and upstream alternative TSS was relatively large (≥ 80 bp) (Fig. 5d, Additional file 1: Fig. S6a and S6b). A stronger negative correlation between alternative TSS expression levels and main TSS expression changes was observed when we subsetted for genes with relatively low main TSS expression (≤ 50th percentile) and high alternative TSS expression (≥ 50th percentile) than without subsetting (Fig. 5d, left graph and Additional file 1: Fig. S6b). For this subgroup of genes, increasing distances between main TSS and alternative TSS were correlated with repression of the main TSS expression (Fig. 5d, right panel). The negative relationship weakened when we relaxed the alternative and main TSS expression levels and was absent when upstream alternative and main TSS were spaced 80 bp or less (Additional file 1: Fig. S6b and Fig. S6c). Our analysis suggests that the distance between main and alternative TSS, and alternative TSS expression levels are key determinants that influence how the main TSS responds when upstream TSS usage is increased.
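The distance-dependent comparison described in the two paragraphs above can be sketched as follows. The sketch splits alternative-main TSS pairs at the 80 bp threshold and computes, per group, the correlation between alternative TSS expression and the change in main TSS expression. The function names, the use of a Pearson coefficient, and the toy values are assumptions made for illustration and do not reproduce the statistics reported in Fig. 5.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def correlation_by_distance(pairs, threshold_bp=80):
    """pairs: iterable of (distance_bp, alt_level, main_log2_fold_change)."""
    groups = {"proximal": [], "distal": []}
    for distance_bp, alt_level, main_change in pairs:
        key = "proximal" if distance_bp < threshold_bp else "distal"
        groups[key].append((alt_level, main_change))
    result = {}
    for key, values in groups.items():
        if len(values) < 3:
            result[key] = None  # too few pairs for a meaningful coefficient
        else:
            result[key] = pearson([a for a, _ in values],
                                  [m for _, m in values])
    return result


# Hypothetical toy pairs: (distance in bp, alternative level, main log2 change).
toy_pairs = [
    (40, 1.0, 0.2), (60, 2.0, 0.5), (70, 3.0, 0.9),        # closely spaced
    (120, 1.0, -0.1), (300, 2.0, -0.6), (500, 3.0, -1.2),  # widely spaced
]
by_distance = correlation_by_distance(toy_pairs)
```

In the toy data the proximal group yields a positive coefficient and the distal group a negative one, mirroring the direction of the trends described above.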
The distance between TSSs was also reflected in the chromatin structure and transcription factor binding. Genes with an alternative TSS less than 80 bp upstream of the main TSS tended to have a wider NFR and a defined peak for transcription factor binding (Additional file 1: Fig. S7a). On the other hand, genes with an alternative TSS more than 80 bp upstream of the main TSS tended to have narrower NFRs around the main TSS, a second NFR near the upstream TSS, and a broader peak for transcription factor binding. It is worth noting that upstream alternative TSSs did not show a clear enrichment for the TATA binding sequence, suggesting that their transcription is regulated via TATA-less promoters.
A linear model for gene regulation by upstream alternative TSSs
At genes with a relatively large distance between alternative and main TSS, we observed that increased alternative TSS usage and a further increase in distance between main and alternative TSS were linked to repression of the main TSS (Fig. 5d). Transcription initiating upstream of canonical promoters is known to alter the chromatin state and repress promoters in numerous examples [18, 26, 28, 29, 65]. We found that changes in nucleosome occupancy in the region between the main and alternative TSS were consistent with decreased main TSS expression (Additional file 1: Fig. S7b). To dissect these different variables, we performed multiple regression analysis that accounts for relationships between different explanatory variables. This allowed us to delineate the semi-partial correlations (sr) between the response variable (main TSS levels) and a specific explanatory variable (e.g., nucleosome occupancy) (Fig. 5e).
We identified a negative correlation between main TSS expression changes and upstream expression levels (first variable). The distance between the two promoters (second variable) also correlated negatively with our response variable, likely because we had already subsetted for genes with relatively large (≥ 80 bp) distances (Fig. 5d). The combined model revealed that increased nucleosome occupancy (third variable) was negatively correlated with changes in expression of the main TSS (Fig. 5e). Collectively, these three variables explained a significant part of the variation (adjusted R-squared = 0.27) in main TSS responses across the three cell-state transitions. We propose that a balance between expression levels of different TSSs of the same gene, the distance between tandem TSSs, and chromatin structure are key determinants for the regulation of gene expression via transcription of upstream alternative TSSs.
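To make the notion of a semi-partial correlation concrete, the following Python sketch fits an ordinary least squares model and derives, for each explanatory variable, the signed square root of the drop in R-squared when that variable is left out. This is a generic textbook formulation and not necessarily the exact procedure used for Fig. 5e. The function names, variable names, and the synthetic toy data are hypothetical, and the sketch assumes that the explanatory variables are not collinear.

```python
def _solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(columns, y):
    """Least squares fit with intercept; returns (coefficients, R-squared)."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(design[0])
    xtx = [[sum(row[p] * row[q] for row in design) for q in range(k)]
           for p in range(k)]
    xty = [sum(row[p] * y[i] for i, row in enumerate(design)) for p in range(k)]
    beta = _solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in design]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return beta, 1.0 - ss_res / ss_tot


def semipartial_correlations(predictors, y):
    """sr_i = sign(beta_i) * sqrt(R2_full - R2_without_i)."""
    names = list(predictors)
    beta, r2_full = ols([predictors[name] for name in names], y)
    sr = {}
    for i, name in enumerate(names):
        others = [predictors[other] for other in names if other != name]
        _, r2_reduced = ols(others, y)
        unique = max(r2_full - r2_reduced, 0.0)  # guard against rounding below zero
        sign = 1.0 if beta[i + 1] >= 0 else -1.0
        sr[name] = sign * unique ** 0.5
    return sr, r2_full


# Synthetic toy data (a balanced design with one unexplained component);
# the values are hypothetical and only serve to exercise the functions.
toy_predictors = {
    "upstream_level": [-1, 1, -1, 1, -1, 1, -1, 1],
    "distance": [-1, -1, 1, 1, -1, -1, 1, 1],
    "nucleosome_occupancy": [-1, -1, -1, -1, 1, 1, 1, 1],
}
toy_response = [3, 1, 3, -3, 3, -3, -1, -3]  # change in main TSS expression
sr, r2 = semipartial_correlations(toy_predictors, toy_response)
```

In this synthetic example all three semi-partial correlations are negative, and their squares add up to the full-model R-squared because the toy predictors are uncorrelated. With correlated predictors, as in real data, the squared semi-partial correlations capture only the variance uniquely attributable to each variable.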
Transcriptional repression via upstream alternative TSSs requires specific chromatin regulators
Our analysis and modeling did not establish causative relationships between upstream transcription, repression of transcription from the main TSS, and chromatin state. If the repression of the main TSS and the changes in chromatin structure were the consequence of upstream transcription, certain chromatin factors are likely required for mediating gene repression. Disrupting chromatin factors may therefore affect the extent of repression driven by transcription from upstream alternative TSSs. Indeed, several chromatin regulators have been described to facilitate repression via transcription of upstream noncoding RNAs or 5′ extended transcript isoforms [18, 26, 28, 29, 31, 66]. These include Set2-directed histone lysine 36 methylation, histone deacetylation directed by SET3C, and chromatin assembly by FACT.
To test whether the chromatin state contributes to repression of transcription of the main TSS in the presence of upstream transcription, we generated deletion and depletion mutants and measured TSS usage and chromatin state (MNase-seq) during T1 (6 h SPO) (Fig. 6a and Additional file 6). Importantly, cells harboring set2Δ and set3Δ single or double deletions entered meiosis and underwent premeiotic DNA replication, allowing for T1-specific transcriptome measurements (Additional file 1: Fig. S8a). Since FACT (Spt16) is essential for cellular growth, we depleted Spt16 using the auxin-induced degron (SPT16-AID) (Additional file 1: Fig. S8b). Importantly, these cells underwent premeiotic DNA replication even though Spt16 was depleted during entry into T1 (Additional file 1: Fig. S8c-e) [67].
To better capture locus-specific changes in gene expression in a backdrop of globally altered transcription in these chromatin mutants, we calculated the relative main TSS usage levels for each gene by dividing the main TSS signal over the sum of all TSSs associated with the same gene during T1 (6h SPO). Approximately 200 genes displayed increased main TSS expression in each of the deletion (set2Δ, set3Δ, and set2Δset3Δ) and depletion (Spt16) mutants compared to WT (Fig. 6b). A subset of these genes showed significant de-repression of expression from the main TSS, indicating that chromatin regulators (Set2, Set3, and FACT) were required for mediating repression. We observed a good overlap between genes de-repressed after Spt16 depletion and genes affected by set2Δ and set3Δ single and double mutants (Fig. 6c). As expected, main TSS usage was significantly higher in T1 (6h SPO) among the genes identified as “de-repressed” in mutants compared to the control (Fig. 6d). Importantly, these differences were not observed prior to T1 (0h SPO) (Fig. 6d). Therefore, these de-repression events in these mutants occurred in the context of transition-specific gene regulatory programs. We posit that the failure to establish a repressive chromatin in the presence of upstream transcription results in leaky or aberrant expression from the main TSS.
We examined chromatin structure and observed a wider NFR near the main TSS during T1 in set2Δ and set3Δ single and double mutants compared to the control at gene promoters that showed de-repression. This phenomenon was not observed at a matching number of randomly selected promoters. Depletion of Spt16 had a pronounced effect on chromatin structure at gene promoters that were de-repressed (Fig. 6e). We observed a wider NFR and the loss of regularly spaced nucleosome arrays flanking the main TSS, while the chromatin structure of randomly selected genes was disrupted to a lesser extent and regular nucleosome arrays were still visible. We further found that about 30% of de-repressed genes showed a significant change in nucleosome occupancy at the main TSS and in the 0.5 kb region upstream. In conclusion, disrupting chromatin factors (Set2, Set3, and FACT) that mediate transcription-coupled chromatin changes affected the repression directed by transcription from upstream alternative TSSs, indicating that the effect of upstream transcription on repression of main TSS usage is direct.
Messenger RNAs originating from upstream alternative TSSs have a variety of translation efficiencies
While there was a good overlap with genes expressing LUTIs and our TSS-seq dataset, we also found many genes that displayed expression from an upstream alternative TSS which were not identified as expressing LUTIs (Fig. 3). LUTIs are typically translationally inert due to the presence of small ORFs in their 5′ leader sequence [17, 19, 42]. Perhaps, a subset of transcripts produced from upstream TSSs have protein coding potential.
To examine how upstream alternative transcripts are translationally controlled, we selected genes that underwent TSS switching, which ensures that the dominantly expressed transcript at these genes is initiated from the upstream alternative TSS (Fig. 3c). Subsequently, we examined translation efficiency using a published ribosome profiling dataset [21]. Consistent with previous work, we found a set of genes showing a decrease in translation efficiency as defined by ribosome footprinting (Fig. 7a) [19]. Many of these genes expressed LUTIs (Fig. 7a, genes marked with asterisks). This was particularly clear for the T1 transition, but less so for the T2 transition. A subset of genes showed no reduction in translation efficiency, suggesting that the upstream alternative transcript was translated (Fig. 7a). Interestingly, alternative TSSs that were more distal to the main TSS (≥ 80 bp) displayed decreased translational efficiency for T1, while proximal alternative TSSs showed no decrease (Fig. 7b). Perhaps, alternative TSSs proximal to the main TSS are less likely to harbor a small ORF in the 5′ leading sequence, while longer 5′ leading sequences reduce translation efficiency because of the presence of upstream small ORFs in the leading sequences which repress full-length protein production [42]. Interestingly, for T2, many genes showed no decrease in translational efficiency and there was little difference in translation efficiency between proximal and distal TSSs (Additional file 1: Fig. S8e). Our data suggest that the transcript isoforms emanating from upstream alternative TSSs possess different translation efficiencies.