Skip to main content

3 -5 crosstalk contributes to transcriptional bursting



Transcription in mammalian cells is a complex stochastic process involving shuttling of polymerase between genes and phase-separated liquid condensates. It occurs in bursts, which results in vastly different numbers of an mRNA species in isogenic cell populations. Several factors contributing to transcriptional bursting have been identified, usually classified as intrinsic, in other words local to single genes, or extrinsic, relating to the macroscopic state of the cell. However, some possible contributors have not been explored yet. Here, we focus on processes at the 3 and 5 ends of a gene that enable reinitiation of transcription upon termination.


Using Bayesian methodology, we measure the transcriptional bursting in inducible transgenes, showing that perturbation of polymerase shuttling typically reduces burst size, increases burst frequency, and thus limits transcriptional noise. Analysis based on paired-end tag sequencing (PolII ChIA-PET) suggests that this effect is genome wide. The observed noise patterns are also reproduced by a generative model that captures major characteristics of the polymerase flux between the ends of a gene and a phase-separated compartment.


Interactions between the 3 and 5 ends of a gene, which facilitate polymerase recycling, are major contributors to transcriptional noise.


In many cellular systems, mRNAs appear to be produced in burst-like fashion. This is directly observed in real-time experimental studies [13] and also agrees with theoretical analyses of steady-state mRNA distributions among single cells [4, 5]. Such bursty dynamics are thought to be the signature of gene regulation and are often described in terms of transcriptional “noise” [5, 6]. Due to the central role of transcription in cellular functions, it is important to understand the mechanisms from which the bursting originates [7].

The microscopic dynamics underlying transcription are not yet well understood. Various factors have been found to influence transcriptional dynamics, mostly by modulating bursting parameters such as the size or frequency of bursts [3, 5]. These factors are often classified as either intrinsic or extrinsic, although this distinction is blurred in many cases. This classification originally derives from the observation that fluctuations in expression levels are partially correlated across multiple genes [8], thus suggesting common, extrinsic causes, while the remaining, independent fluctuations are intrinsic to each gene. Typical major extrinsic noise sources are the cell cycle [911] and cell-size fluctuations [12], the latter partially due to the former. Numerous additional factors such as neighbouring cells, cell morphology, and others have been found to affect transcription to varying degrees [13]. Intrinsic factors include non-linear transcription factor interactions [4, 5, 8], changing chromatin status [14, 15], promoter architecture [3], transcription factor diffusion [16], and several others [1719].

It is unclear how these phenomena relate to the local environment at transcribing genes. These are associated to clusters of RNA polymerase II (PolII), which have been interpreted as “transcription factories” [20] and suggested to modulate the temporal patterns of transcription [21, 22]. More recently, it has been found that, in proximity to active genes, the PolIIs are incorporated in membrane-less droplets, maintained by liquid-liquid phase separation (LLPS) from the rest of the nucleus, with the net effect of locally increasing the population of factors involved in initiation; when PolII is liberated from this domain, transcription can be initiated [2328]. LLPS also provides an explanation for the hitherto enigmatic action-at-a-distance type of gene regulation by distal enhancers, as the nuclear condensates are indeed able to restructure the genome, albeit results on LLPS are relatively preliminary at this stage [29, 30].

While a comprehensive description of the interactions between PolIIs, other factors, and the chromatin within these niches is missing, several observations suggest that termination is linked to reinitiation; these include the presence of the same factor species at both ends of a gene, the reduction of initiation upon perturbation of 3 processes, and protein interactions that have been suggested to juxtapose the promoter and the terminator DNA, forming a structure that has been referred to as a “gene loop” [31, 32]. Importantly, it has been demonstrated that 3 -end processing favours transcription initiation; the presence of such 3 - 5 crosstalk in a gene increases its mean expression level [33]. The concept of LLPS appears highly important in this regard, as PolII undergoes a sequence of post-translational modifications on its C-terminal domain during transcription, while integration into phase-separated domains and reinitiation requires it to be unmodified [24]. In line with this, recent studies suggest that LLPS is also involved in 3 -end transcriptional processes [34]. We generically refer to the shuttling of PolIIs from 5 to 3 , potentially passing through the LLPS compartment, as the recycling. It has been suggested that a repetitive cycle of reinitiation and termination due to these mechanisms is likely to produce a rapid succession of mRNA creation events, thus potentially contributing to the transcriptional bursts [35], but to the best of our knowledge, an experimental verification is as yet lacking.

In this paper, we investigate the interplay between bursty expression and 3 - 5 interactions using an interdisciplinary approach. We first consider two integrated genes that permit studying transcription upon perturbation of their 3 - 5 processes at different induction levels; we demonstrate that these interactions strikingly influence the transcription kinetics and typically elicit the transcriptional noise, by decreasing burst frequency and increasing burst size. We then focus on genome-wide 3 - 5 interactions involved in transcription by means of PolII ChIA-PET sequencing data, showing that they are related to the gene-expression parameters similarly to the transgenes’ results. This scenario is well described by a microscopic stochastic model of gene expression, where tuning a single parameter—corresponding to the probability of local polymerase recycling—naturally yields the observed expression patterns, without involving extrinsic-noise contributors or alternative intrinsic mechanisms.


Cell lines as model systems for PolII recycling

We utilised two HEK293 cell lines which contain on their genomes copies of the genes β-globin (HBB) [33] and a modified version of HIV-1-env [36], respectively, driven by inducible CMV promoters (Fig. 1a, b).

Fig. 1

Characteristics of transgenes used in this study. a Schematic gene structure (top) of WT HBB including CMV promoter, Tet operator, pA signal, and exons (black blocks) as indicated. Total RNA-seq confirms Tet-inducible expression (bottom). b As a, for env. c, d Mutant versions of HBB and env, respectively. Point mutations in pA sites (“x”) and read-through transcription are indicated. Positions are relative to TSS; blue and orange shades correspond to WT and mutant versions, respectively; and light and dark shades correspond to 250 and 0 ng mL −1 Tet, respectively. Coverages by sequencing reads are shown. e Kernel density estimates of the flow-FISH single-cell readings corresponding to the abundances of HBB transcripts, WT (blue), mutant (orange) variants, and control (grey) cells, from replicate k=1, at different induction levels (Tet concentrations in unit of ng mL −1, shades of colours, as indicated on the left). Gene expression increases and saturates upon increasing Tet concentration, and mutant-cell expression is lower than the WT; a.u., arbitrary units; y-axes not to scale

This transgene approach allowed us to exploit very well-characterised model systems for recycling perturbation, which achieve mono-allelic expression and, most importantly, allow precise control of expression levels with inducers [33]. The first gene, HBB, is an example for long-range chromosomal interactions in its native genomic neighbourhood. Its expression involves spatial proximity between the promoter and a locus control region (LCR) over 50 Kb away [37]. The LCR has been studied extensively in murine and human cells (see, for example, references [38, 39]) and jointly regulates expression of several β-globin-like genes at the locus, likely involving LLPS [40]. A recent study demonstrates burst-like expression of murine HBB and suggests that interactions between the LCR and the HBB promoter modulate the bursting parameters [9]. Our cell line features an ectopic insertion of human HBB under control of a tetracycline (Tet) responsive promoter. A previous study of this system has provided a substantial number of results suggesting that 3 mRNA processing contributes to reinitiation of transcription [33]. This notion is based on several findings relating to the introduction of a single point mutation in the SV40 late poly-adenylation (pA) site (Fig. 1c). This includes decreased average mRNA expression levels, while “read-through” transcription downstream of the pA site is increased. Furthermore, the mutation leads to a decrease of PolII, TBP, and TFIIB levels at the promoter shortly after gene induction, and to an accumulation at the “read-through” region instead. Reduced transcription initiation compared to wild-type (WT) cells was also supported by nuclear run on assays and by a changed profile of post-translational modifications of PolII. Noticeably, TFIIB has been demonstrated to be functionally involved in linking 3 and 5 transcriptional activities [41], while post-translational modifications of PolII are in part carried out by Ssu72, which is associated with gene-loop formation in yeast [42] and appears to have similar roles in vertebrates [43]. A further recent study that utilised the ectopic HBB system reports direct detection of gene loops based on a 3C assay in the WT cell line, but not the mutant [44].

The second cell line, containing a Tet-inducible version of HIV-1-env, was previously studied in similar fashion to the HBB constructs. Results using a mutated version of the pA site (Fig. 1d) mirrored those obtained with HBB, suggesting extensive 3 - 5 crosstalk and recycling of factors including polymerase [33, 45]. The env construct uses a BGH, not an SV40 pA site, which suggests that the findings are independent of the type of pA site. Notably, expression of the HIV-1 gene using its native long terminal repeat (LTR) promoter exhibits bursting dynamics [6].

We used these cell lines and their mutant versions as a model system for mammalian gene expression in the presence and absence of 3 - 5 crosstalk. We confirmed by total RNA-seq that HBB and env mRNAs are expressed inducibly in all cell lines (Fig. 1a–d). At high Tet concentration (250 ng mL −1), the fold changes over the un-induced samples were ≈16 and ≈26 for HBB and env, respectively. The mutants were expressed at lower levels and featured read-through transcription as described, with intact transcript sequences, i.e. not subject to splicing defects (Fig. 1c, d) [4654]. This indicated specificity of the pA site mutations.

In order to detect transcripts at the single molecule level, we designed probes for single molecule RNA-FISH (smFISH) and confirmed detection of large transcript numbers upon Tet stimulation of the cells, while the expression of a control gene, AKT1, remained constant (Additional file 1: Section S1 and Figure S3). Microscopy-based smFISH is not ideal for HEK293 cells, since they tend to overlap and form aggregates when growing. We therefore decided to record the smFISH signal by adapting a flow-FISH technique based on flow cytometry [55]; this also resolves extrinsic-noise contributors such as cell size, morphology, and cycle, and, thanks to its high throughput, permits recording vast numbers of cells to analyse overall population structures (Additional file 1: Sections S1 and S7).

While the flow-cytometer fluorescence signal from stained cells serves as a proxy for the mRNA abundance, it is returned in arbitrary units (a.u.) rather than in absolute counts. We thus used microscope imaging and nCounter data to calibrate the flow-FISH fluorescence readings of HBB and env cells, respectively. Applying the clustering algorithm of [5658] to the flow-FISH recordings allowed us to select single-cell readings against those from cell clumps, doublets, and debris (Additional file 1: Section S1 and Figure S1).

Flow-FISH data demonstrate Tet-dose dependent expression of HBB and env, indicating specific detection of transcripts above background noise. The stationary expression levels appeared to reach saturation at 80 ng mL −1 Tet (Fig. 1e and Additional file 1: Section S1 and Figure S2). Staining for the DNA content demonstrates a mild increase of HBB and env expression with increasing cell cycle stage. We found that the contribution to the total variability, measured as the squared coefficient of variation (CV2) of the mRNA population, due to the cell cycle and size was minor (Additional file 1: Section S6) and therefore focused on local genic mechanisms to investigate the observed noise pattern. The measured signal includes a background of unspecific staining and auto-fluorescence of the cells, which is subtracted from the total signal [59]. To gauge this background, we deleted the env gene from its host cell line with Cas9 [60] and performed the staining procedure as before. The resulting control cells had low fluorescence intensity that remained virtually unchanged upon maximal Tet stimulation, thus confirming specificity of our system and validating the use of this control to estimate the background (Additional file 1: Section S1 Table S1). Nuclear RNA export was largely unaltered by the mutations (Wilcoxon rank sum test on nuclear/cytoplasmic ratios from 83 HBB cells at 250ng mL −1 Tet, P = 0.85; for 203 env cells P = 6·10−9, but the ratios differed only by 10%). Note that flow-FISH and its analysis/interpretation are unaffected by nuclear export issues.

Increased transcriptional bursting upon 3 - 5 crosstalk

In order to gain insights into the transcriptional dynamics driving WT and mutant expression of HBB and env, we employed a Markov chain Monte Carlo (MCMC) sampling approach to fit statistical models to the flow-FISH data (Fig. 2). Importantly, Bayesian modelling permitted using microscope and nCounter data to estimate informative prior distributions that calibrate the absolute mRNA quantification, while retaining flexibility in this respect. We further incorporated the background signal in the Bayesian framework based on the estimates from the Tet-stimulated control cells (“Materials and methods” section and Additional file 1: Sections S2-S3).

Fig. 2

Relations between transcriptional mechanisms, mRNA distributions, model fitting, and parameter estimates. The timing of transcription events at a gene is important. If a population of isogenic cells is induced to express a certain gene, then the resulting mRNA numbers in each cell will reflect the dynamic properties of the process of transcription. While higher frequencies of transcription events unsurprisingly will boost mRNA numbers, more complex patterns can be identified; if transcription occurs clustered in time, forming “bursts” of mRNA production, then the variability of mRNA numbers among individual cells will increase. The precise nature of these relations depends on the mRNA half-life and other dynamical parameters. In fact, these parameters in general shape the distribution of mRNAs among cells in characteristic ways. We can exploit this by testing which parameter values and models are compatible with experimentally derived mRNA distributions and thus infer the dynamics of the underlying transcription process. In combination with experimental perturbations, this produces mechanistic insights into transcription. We applied this approach to our datasets and test their agreement with three models of nested complexity, in line with Occam’s razor. The most complex model corresponds to the transcriptional bursting described above and predicts intricate mRNA distributions, subject to several parameters. Estimating koff and kon for instance allows us to determine the average times (as their inverse) the genes spend transcribing and non-transcribing, respectively (Model 1). The second model restricts the duration of the bursts and has fewer parameters (Model 2), while the third assumes that the transcriptional events are homogeneous over time (Model 3). These models generate mRNA counts X and, in turn, fluorescence intensity, which also depends on the scaling factor κ and the measurement noise ε. To determine which models and parameters best explain our data, we used a Bayesian approach. Broadly speaking, this makes use of the so-called Bayes’ theorem to determine the probability of a hypothesis conditional to experimental data. The power of this approach is that it allows the construction of very complex settings from conditional and prior probabilities, which can be computationally explored by means of Markov chain Monte Carlo (MCMC) sampling and produce results which again are probabilities. In general, prior probabilities refer to general assumptions that are taken into account independently of the experimental data, while posterior probabilities result as informed output of the Bayesian inference procedure. The latter correspond to probability distributions of the model parameters, which thus permit excellent assessment of the uncertainties associated with the results. For our study, Bayesian inference was ideal; it allowed us to embed in a single probabilistic framework the data for multiple independent replicates, the measurement precision of our calibration experiments, and the data transformation introduced by the flow cytometer. At the same time, it produced posterior distributions that are highly informative

Our strategy requires flexible models to represent the absolute mRNA abundance. We considered three stochastic models of gene expression to capture the phenomenology of the transcription process (Fig. 2 and “Materials and methods” section). According to the first model, the gene can stay in an “on” state, in which transcription occurs at rate \(\tilde {\alpha }\), or in an “off” state, in which no transcription occurs. The gene switches from “off” to “on” and “on” to “off” at rates \(\tilde {k}_{\text {on}}\) and \(\tilde {k}_{\text {off}}\), respectively. Assuming that the mRNA degrades at constant rate \(\tilde {d}\), this model corresponds to a Poisson-beta mixture distribution for the stationary per-cell mRNA population, which can be expressed in terms of the dimensionless rates α,kon, and koff (Additional file 1: Section S2) [4, 61]. The second model is a simplified version of the former two-state model, where α and koff approach infinity, while the ratio α/koff, which is referred to as the average burst size [62] and incorporated as a single parameter, is held finite; this model gives rise to a negative-binomial stationary mRNA distribution and allows much more efficient MCMC sampling than the Poisson-beta model (Additional file 1: Sections S3-S4). The third model is the most naïve as it assumes that transcription events of individual mRNAs occur independently at constant rate \(\mu _{X} \cdot \tilde {d}\), where μX is the mean mRNA population, thus yielding a Poisson distributed mRNA population at equilibrium which is thought to characterise genes with unregulated expression [5]. Noise levels consistent with the Poisson model [63, 64] or higher [4, 13] have both been reported in the literature. Estimates of the degradation rates \(\tilde {d}\) for both mutant and WT transgenes are listed in Additional file 1: Section S5 [65].

We obtained better fits for the Poisson-beta and the negative-binomial models than the Poisson model (Additional file 1: Sections S4 and S6) for all the replicates. In the Poisson-beta case, the MCMC traces of the rates koff and α had a strong correlation; this revealed that most of the information about these two parameters is encoded in the ratio α/koff (Additional file 1: Section S6 and Figure S10), which is more straightforwardly inferred by means of the negative-binomial model. In fact, for our data, these two models give consistent results in terms of CV2, average burst size α/koff, and burst frequency \(\tilde {k}_{\text {on}}\). To study the transcriptional noise, we obtained the CV2 of the mRNA abundance (which we refer to as \(\text {CV}^{2}_{X}\)) from the estimated parameters (Additional file 1: Sections S2 and S6), and plotted it against the estimated mean expression levels μX (Fig. 3a–c). These reveal a trend observed before in other systems [6, 6668], i.e. the transcriptional noise decreases as μX increases, with the data of each experiment well fitted by a curve of the form \(\text {CV}^{2}_{X}=A/\mu _{\mathrm {X}}+B\), and seems to approach a lower limit beyond which it does not further decrease. Such a limit is known as the noise floor [6973]. Strikingly, the presence of the mutation alters the noise trends, thus suggesting that PolII recycling indeed contributes to the noise. The transcriptional noise at intermediate expression levels is significantly higher in WT than mutant cells. For the HBB gene, this pattern extends throughout the range of all induction levels. Env shows less pronounced differences between WT and mutant cells for the highest expression levels but resembles HBB otherwise. In all these cases, the noise clearly appears higher than postulated by the Poisson prediction curve \(\text {CV}^{2}_{X} = 1/\mu _{X}\) (solid lines in Fig. 3a–c).

Fig. 3

Bayesian parameter estimates. Noise plots of HBB (a) and HIV (b) gene expressions, obtained from the Poisson-beta model for both WT (blue) and mutant (orange) gene variants. Different colour intensities correspond to replicates. Mutation changes the balance between noise and average expression level. c Results from replicates are aggregated into consensus estimates (Additional file 1: Section S4) for HBB and HIV (inset). Solid lines are orthogonal-distance regression curves \(\text {CV}^{2}_{X}=A/\mu _{X}+B\). d, e Consensus estimates of Poisson-beta model parameters \(\mu _{X}, \tilde {k}_{\text {on}}, \tilde {k}_{\text {off}}\), and α/koff for HBB (d) and HIV (e). WT (blue) and mutant (orange) show different patterns, with WT genes having highest average burst size and lower burst frequency than mutant at intermediate expression levels. Single-replicate estimates, and negative binomial and Poisson model results are in Additional file 1: Section S6. Points and error bars correspond to medians and 90% HPD CIs of the posterior distributions

Using the DNA content and the forward scatter signal (FSC-A) as proxies of the cell cycle progression and the cell size, respectively, we heuristically selected populations corresponding to G1, S, and G2 phases of three different sizes each from 40 ng mL −1 Tet-induced cells (Fig. 4a–c); we fitted the negative-binomial model to their mRNA-expression reads, and estimated kinetic parameters and noise for each population, separately. Based on this, we found that the cell cycle and size, which typically are major extrinsic-noise contributors, only account for less than 20% of total mRNA variability for the transgenes (Fig. 4d–f), in contrast with [9, 10]; for further details, see Additional file 1: Section S7.

Fig. 4

Extrinsic and intrinsic noise. ac Scatter plots from flow-FISH signals for the HBB gene, replicate k=3; cells from G1, S, and G2 phase highlighted with red-, green-, and blue-scale colours, respectively; each cell-phase cluster is split into three subsets of different average size with breakpoints at 0.33th and 0.99th quantiles of their FSC-A signals; cell-phase and size are extrinsic variables. d Extrinsic and intrinsic contributions to WT HBB and env genes’ expression noise, SE error bars obtained via bootstrap. e, f Cell cycle analysis; consensus estimates of the negative-binomial model parameters for the same genes; points are medians, error bars comprise 90% HPD CIs

Modulation of rates

The overall rate estimates obtained from our fits are largely in agreement with previous findings from similar systems [3]. In fact, estimated values of \(\tilde {k}_{\text {off}}\) ranged up to ≈2.5 events per minute, with \(\tilde {k}_{\text {on}}\) roughly an order of magnitude lower. Increasing the Tet concentration boosts transcription by increasing the average burst size and the frequency \(\tilde {k}_{\text {on}}\) (Fig. 3d), thus shortening the average “off” state duration (\(1/\tilde {k}_{\text {on}}\)). Intriguingly, for the HBB gene, \(\tilde {k}_{\text {on}}\) is higher in mutant than WT cells in all cases, while the average burst size is lower in mutant cells in all cases. These patterns are less definite for the env gene but appear to support the conclusions from the HBB gene (Fig. 3e and Additional file 1: Section S6). In other words, the 3 - 5 crosstalk imposes a constraint on the transcriptional dynamics whose removal can cause bursts to be more frequent and smaller than in the WT gene.

PolII-mediated 3 - 5 interactions by ChIA-PET

To jointly study the expression of a gene and its 3 - 5 interactions, we analysed publicly available datasets for the human cell line K562, obtained from chromatin-interaction analysis by paired-end tag sequencing (ChIA-PET) [74] and single-cell RNA-seq data (scRNAseq) [64]. We chose to use ChIA-PET against PolII to target chromatin interactions that are involved in transcription. We generated HiC-style interaction matrices (whose entries correspond to 2-Kb regions) from the ChIA-PET data using CHIA-PET2 [75]. We filtered the list of genes from the RefGene database with the hg19 reference genome to only contain those with unique gene symbols on chromosomes 1–22 and X, thus excluding alternatively spliced genes. As a proxy of the 3 - 5 interaction of a gene, we first aggregated the reads corresponding to the interaction between the bins that include its transcription start site (TSS) and transcription end site (TES). The resulting metrics depend on the gene length, which we addressed by dividing the number of reads for each gene by the average read number from 104 genomic intervals of the same length as the gene, randomly sampled across the chromosome. We then applied the \(\text {arcsinh}\sqrt {x+0.5}\) transformation to obtain a variance-stable interaction score [76]. Note that 5 to 3 interaction scores correlate with those for 5 to gene body interactions; this appears unsurprising, given that spatial proximity at one location will favour interaction signals at neighbouring regions, and is tangential to our analyses. We also discarded genes that are shorter than the resolution of our interaction matrices.

Fitting a negative-binomial distribution to the scRNA-seq UMI counts data of [64] allows us to conveniently classify expressed genes (sample UMI mean >0.05) based on the estimated noise \(\text {CV}_{X}^{2}\), the burst frequency kon, and the average burst size α/koff (“Materials and methods” section, see also [7, 77, 78]). These are plotted against the mean expression μX in Fig. 5a–c. It is worth noting that burst frequency averaged over all the genes, \(\bar {k}_{\text {on}}\), seems to determine the average trends of \(\text {CV}^{2}_{X}\) and α/koff. The noise trend appears to be explained by the curve \(\text {CV}^{2}_{X}=1/\mu _{X}+1/\bar {k}_{\text {on}}\) (derived under the negative-binomial assumption, see Additional file 1: Section S2), which in fact separates the genes whose noise levels are higher than the mean predicts (blue and orange markers in Fig. 5) from those whose noise is lower than the prediction (yellow markers). As a measure of the deviation from this prediction, for each gene, we calculated the vertical distance ν of its expression noise to the curve \(\text {CV}^{2}_{X}=1/\mu _{X}+1/\bar {k}_{\text {on}}\) in logarithmic scale, further separating noisy genes for which ν>ν1 (blue makers in Fig. 5) from those for which 0<ν<ν1 (orange makers). The interaction score of the high-noise genes is significantly higher than the score of the intermediate group, which in turn is higher than the low-noise genes’ (Mann-Whitney U test, P <2.2·10−16).

Fig. 5

Genome-wide estimates of transcription kinetics and 3 - 5 interactions. ac Scatter points correspond to genes, axes are medians of posterior distributions for expression parameters μX and \(\text {CV}^{2}_{X}, k_{\text {on}}\), and α/koff, respectively, obtained by Bayesian model fitting. Solid lines correspond to the predictions obtained by assuming that all genes have burst frequency equal to the sample average \(\overline {k}_{\text {on}}\). Genes are divided into three groups corresponding to low-, intermediate-, and high-noise levels (yellow, orange, and blue markers, respectively). Dashed line is obtained by setting ν1=4.5 (equation inset in a) to separate intermediate- and high-noise genes. d, e 3 - 5 interaction scores against expression noise (measured as distance ν from the solid-line prediction of a) and burst frequency kon. f Partitioning the genes by ν shows that the interaction score is significantly higher in higher-noise genes than in lower-noise genes (Mann-Whitney U test, P <2.2·10−16)

There is a significant positive correlation between the distance ν and the interaction score (P <2.2·10−16, lm), thus showing that the noise level of genes with high interaction score is typically higher than the mean predicts; we also observe a significant negative correlation between the interaction score and the burst frequency kon (P <2.2·10−16, lm) and a significant positive correlation between the interaction score and the burst size (P <2.2·10−16, lm), consistent with the results on the transgenes. Filtering out zero-count genes, for which there is little statistical information, increases the P values above to 2.0·10−7,2.0·10−3, and 1.68·10−5, respectively, due to smaller sample sizes, and yields the scatter plots of Fig. 5d, e and the boxplot of Fig. 5f for the three groups. These results agree with those obtained from different ChIA-PET biological repeats and different bin resolutions (1 Kb and 7 Kb; Additional file 1: Section S8 and Figure S16).

Microscopic model

To shed further light on the biological mechanisms involved and test whether PolII shuttling can a priori alter the transcriptional noise as seen in the previous section, we constructed and simulated a more complex stochastic model that captures the most important features of our expression system, i.e. induction, polymerase flux between the LLPS droplet (or, more generically, a cluster of PolII [20]) and the gene, transcription, and decay, while stripping away non-essential details (Fig. 6a). Its precise formulation, along with additional details, is illustrated in the “Materials and methods” section and Additional file 1: Section S9. The model is designed around the idea that each PolII waits in a compartment until the transcription occurs [22], where the compartment represents an LLPS droplet (Fig. 6a). This is immersed in its nuclear environment, which adds and removes PolIIs at rates γ and δ, respectively. In addition to this, by transcribing at rate β, the PolIIs leave the compartment with probability 1−l or are re-injected otherwise. This latter reaction represents the crosstalk between the 3 -end processing and the transcription initiation and helps to sustain the compartment population despite the presence of initiation, which on average contributes to depleting it. Consistently with the two genes integrated in our cell lines, the model encodes a Tet-repressor binding site downstream of the TSS which binds to the TetR factor, present at concentration n. Such a binding event interrupts the transcription; therefore, tuning n allows us to control the blocking rate λoff. The model parameters l and n are akin to the pA mutation and the Tet concentration, respectively, in the experimental settings. We assume that the pA mutation hinders but does not completely block PolII flux back to the compartment (which can also be facilitated by diffusion, see for instance [16, 24]); therefore, the parameter l is assumed to be small but still strictly positive even in the presence of pA mutation. During a TetR blockade, PolIIs cannot transcribe and accumulate in the compartment. When the blockade is released, the transcription occurs at a rate directly proportional to the available PolII (consistently with the law of mass action and experimental observations [22, 73]); therefore, at the end of the TetR blockade, the compartment is highly populated and the transcription occurs repeatedly while the PolII population quickly drops. As the simulation results demonstrate, the model is able to reproduce an increase of transcriptional bursting upon increasing the recycling probability l (Fig. 6). This behaviour is conserved under a broad range of different parameter settings, demonstrating that this is a generic result of our model. Fitting a negative-binomial distribution with vague prior distributions to an ensemble of mRNA abundances, simulated from this microscopic model, shows patterns consistent with those obtained from the experimental data (Fig. 6c and Additional file 1: Section S9).

Fig. 6

Microscopic model of transcription in Tet-inducible genes. a PolIIs (blue) are stored in a compartment (dashed circle) in the proximity of the TSS. With rate β, each PolII leaves the compartment to transcribe mRNA and is re-injected with probability l. When TetR (tetracycline repressor) binds to the TetO 2 operator downstream of the TSS (this occurs at rate λoff), transcription is interrupted and PolIIs accumulate in the compartment. At rate λon, TetR unbinds, thus releasing the large amount of PolIIs accumulated in the compartment to cause bursts, which can be phenomenologically described in terms of the rates \(\tilde {\alpha }, \tilde {k}_{\text {on}}\), and \(\tilde {k}_{\text {off}}\). The compartment also exchanges PolIIs with the nuclear environment (at rates δ and γ). The transcription rate is directly proportional to the abundance of PolIIs, which fluctuates in time and in turn elicits transcriptional noise. Similarly to our experimental system, here, we can simulate different Tet concentrations and the recycling probability by tuning the “off”-switch rate λoff and l, respectively. b Noise plots of simulated mRNA abundances. Setting λoff=nKλ and λon=Kλ, we imitate the effect of different TetR concentration values by tuning n. As Tet presence prevents TetR-TetO 2 binding, small values of n correspond to high Tet-induction levels. For extremely small values of n, the gene can be thought of as being always in “on” state, CV2 becomes very low, and the limiting value of μX can be analytically obtained (vertical lines, see also SI Appendix, section S8). n ranges from 0.1 to 100, and values of the other parameters are (γ,β,d,δ,Kλ)=(10,10,0.01,1,0.01). Inset: same scatter plot, axes in linear scale. At intermediate expression levels, \(\text {CV}^{2}_{X}\) always increases with l. Dashed lines are orthogonal-distance regression curves \(\text {CV}^{2}_{X} = A/\mu _{X}+B\), and solid line is Poisson-noise curve \(\text {CV}^{2}_{X} = 1/\mu _{X}\). c Simulated mRNA-population traces; the two parameter combinations yield almost identical average expressions (sample means 71.3±0.7 and 70.4±0.6 over 104 realisations, respectively, SEs obtained via bootstrap), but different biological noise (sample CV2s 0.78±0.01 and 1.07±0.02, respectively). d Negative-binomial model fit to 500 mRNA abundances simulated from the microscopic model with λoff=0.5,1,1.5,2,2.5,3,2.5,3.5,4,4.5, values of other parameters as in b

While actual transcriptional mechanisms are more complex than our idealised model, the latter provides a significant step towards a mechanistic explanation of our observations. In fact, it captures the essential features of the two gene constructs, and naturally reproduces the observed pattern by tuning only the shuttling probability l and the factor abundance n. Notably, our results demonstrate a minor role for extrinsic contributions to noise (Fig. 6b); in fact, intrinsic factors suffice to yield the noise floor for a wide range of λoff and μX, which contrasts with several other studies [6973].

Alternative model settings

The pA site mutations in HBB and env transgenes cause termination defects which in turn affect the mRNA degradation rate (Additional file 1: Section S5, and [33]). To establish whether the observed noise patterns are ascribable to this, we considered both single-cell expression data and numerical simulations. We analysed human genes in the publicly available dataset of [79], which includes scRNA-seq UMI count data from both influenza-infected and uninfected human A594 cells. Influenza infection causes termination defects in human genes, where transcription can continue for tens of kilobases after the pA site [80, 81]. Native elongation transcript sequencing (NET-seq) also shows that infected cells do not have a difference in initiation of transcription [81]. As suggested in [79], we assumed that a cell is infected if it has at least 0.02% of transcripts coming from influenza genes after 6 h from virus inoculation; otherwise, it is assumed to be uninfected. We then computed the mean expression levels μX,inf. and μX,uninf. and the noise levels νinf. and νuninf. for all human genes (where the subscripts “inf.” and “uninf.” indicate infected and uninfected conditions, respectively). The presence of the termination defect increases the transcript degradation rate, which lowers the UMI counts; we found indeed that μX,inf.<μX,uninf. for the overwhelming majority of genes. We also found an overall increase in noise with infection, i.e. νinf.>νuninf. for many genes, as illustrated in Fig. 7a. A similar scenario is obtained simulating our model with increasing values of mRNA degradation rate d and with recycling rate l held fixed (Fig. 7b): increasing d lowers the average amount of in silico mRNA and increments its CV2. This scenario does not fit the experimental transgene observations, where pA mutation equally lowers μX but decreases \(\text {CV}^{2}_{X}\), and therefore, it is not a plausible representation of their true biological mechanisms.

Fig. 7

Alternative model settings. a Upon influenza infection, termination is altered genome wide, thus affecting observed mRNA counts; μX,uninf. (μX,inf.) and νuninf. (νinf.) are the mean expression and noise levels in uninfected (infected) cells (measured as the distance from solid-line prediction as in Fig. 5), respectively, computed from the scRNA-seq data of [79]. The majority of genes (58%) reports an increase in noise and independently a decrease in mean expression upon infection (upper-left quadrant in scatter plot is the most populated). b Noise plots of mRNA abundances simulated according to the model of Fig. 6 with same parameter values except l and d as in legend; incrementing the mRNA degradation rate d suppresses μX for both values of l, while increasing \(\text {CV}^{2}_{X}\). c Noise plots of mRNA abundances in a variant of the looping model; recycling is allowed but PolIIs cannot pile up before initiation, with noise being virtually unaffected when l is tuned from 0.95 to 0; other model parameters are as in Fig. 6b except (γ,β)=(50,100). The settings of b and c cannot explain the noise patterns observed in HBB and env transgenes

Further, we considered a variant of our model where PolIIs are not allowed to condensate in a compartment before the transcription begins. The importance of particle condensation to fluctuations in the presence of an on-off switch has been mathematically described [82]. In the modified in silico model, indeed, increasing the recycling rate does not increase the noise (Fig. 7c), thus suggesting that a reservoir of PolIIs may be a crucial component of gene regulation.


The wealth of existing results strongly suggests the occurrence of 3 - 5 crosstalk in the WT variants of our transgene systems, involving physical interaction between factors at either gene end and recycling of polymerases, which can be disrupted or strongly reduced upon a point mutation. Similarly, information of the interactions between the ends of genes involved in transcription can be accessed genome wide by means of PolII ChIA-PET sequencing.

Based on both an in-depth analysis of the transgene systems (which provide a controlled experimental setting) and an observational study of ChIA-PET sequencing data (which provide a genome-wide view of chromatin interactions involved in transcription), we present results to suggest that PolII-mediated 3 - 5 interactions are major contributors to transcriptional noise.

Building on standard phenomenological models, transcription parameters, such as average burst size and frequency, are consistently inferred across the different conditions using a Bayesian methodology, to demonstrate the presence of association between 3 - 5 interactions and transcription kinetics. Modelling transcription requires abstraction and simplification due to the complexity of the molecular processes involved and the inadequacy of current experimental methodologies to dynamically resolve structural interactions at individual loci. Furthermore, the Bayesian estimates of the kinetic parameters reflect the incomplete quantitative information available on the experimental device. Also note that our transgenes might not exactly represent the average endogenous gene. Nevertheless, our setting is sufficient to resolve specific patterns, which can be reproduced by an ab initio mechanistic model, thus supporting our conclusions.

The analysis suggests that recycling of the polymerase typically increases noise at a given expression level, while an alternative symmetric interpretation is possible, viz., that recycling permits higher expression at a given noise level. These relations either are a byproduct of the construction of the transcriptional machinery or were selected for. It will be interesting to further explore our findings from an evolutionary perspective. In particular, many studies show how selection of noisy expression can be critical by contributing to cell fate diversity [83, 84] and by favouring their long-term survival in adverse environments [85]. This could also have implications in synthetic biology, where the optimisation of gene expression and the control of its noise are desirable features [86, 87]. Our work provides an important contribution to the field of systems biology by identifying a single base, and thus a genetic determinant, that modulates the balance between the average expression level and its variation.

Materials and methods

Measurement equation and Monte Carlo estimation

We assume that the measured fluorescence Yi of cell i is proportional to the true mRNA abundance Xi and therefore can be expressed as \( Y^{(k)}_{i} = \epsilon ^{(k)}_{i} + \kappa ^{(k)} X^{(k)}_{i}\) where (k) indexes the replicate, κ(k) can be thought of as a scale, and \(\epsilon ^{(k)}_{i}\) is the zero of such a scale, also corresponding to the background of unspecific staining and auto-fluorescence of the ith cell [59]. The background noise is measured, for each replicate k, by means of control cells whose gene of interest has been deleted. These are used to define informative priors for \(\epsilon ^{(k)}_{i}\). Our choice is \(\epsilon ^{(k)}_{i} \sim \text {SN} \left (a^{(k)},\mu _{\epsilon }^{(k)},\sigma ^{(k)}_{\epsilon }\right)\), i.e. the control-cell fluorescence y is supposed to have Azzalini’s skew-normal distribution

$$\begin{aligned} &f_{\epsilon}\left(y|a^{(k)},\mu^{(k)}_{\epsilon},\sigma^{(k)}_{\epsilon}\right) = 2 \Phi\left(\left(y - \mu^{(k)}_{\epsilon}\right) \, \sigma^{(k)}_{\epsilon} \, a^{(k)}\right) \, \phi\left(y|\mu^{(k)}_{\epsilon},\sigma^{(k)}_{\epsilon}\right), \end{aligned} $$

where Φ and ϕ are the standard normal CDF and normal PDF, respectively, while the mean \(\mu ^{(k)}_{\epsilon }\), the standard deviation \(\sigma ^{(k)}_{\epsilon }\), and the skewness parameter a(k) are point estimates from the control datasets. Prior distributions for κ(k) are chosen based on the regression coefficients of gamma generalised linear model fits with identity link. For the remaining parameters, we assume vague gamma priors with mean 1 and variance 103. Adaptive Metropolis-Hastings samplers for model fitting were implemented (Additional file 1: Section S4) [88].

Phenomenological two-state gene-expression models

The transcriptional bursting is fully characterised by the rates \(\tilde {\alpha }, \tilde {k}_{\text {on}},\) and \(\tilde {k}_{\text {off}}\) in units of min−1. It is convenient to express the rates in units of the inverse of the mean mRNA life-time \(\tilde {d}\), i.e. \(\tilde {k}_{\text {off}} = k_{\text {off}} \, \tilde {d}, \tilde {k}_{\text {on}} = k_{\text {on}} \, \tilde {d}, \tilde {\alpha } = \alpha \, \tilde {d}\). It can be shown that the stationary mRNA abundance X for this model is Poisson beta with probability density function (PDF)

$$ f_{X}\left(x|\alpha, k_{\text{on}}, k_{\text{off}}\right) = \int_{0}^{1} f_{\text{Poi}}(x|\alpha p) f_{\text{Be}} (p | k_{\text{on}}, k_{\text{off}})\, \mathrm{d} p, $$

where fPoi(x|α)=αxeα/x! and \( f_{\text {Be}}(p | k_{\text {on}}, k_{\text {off}}) = {p^{k_{\text {on}}-1} (1-p)^{k_{\text {off}}-1}} {\Gamma (k_{\text {on}}+k_{\text {off}})} \left ({\Gamma (k_{\text {off}}) \Gamma (k_{\text {on}})}\right)^{-1}\) are PDFs of Poisson and beta random variables (RVs), respectively. This expresses the hierarchy

$$ X | \alpha, P \sim \text{Poi}(\alpha P), \qquad P|k_{\text{on}}, k_{\text{off}} \sim \text{Beta}(k_{\text{on}}, k_{\text{off}}). $$

It is convenient to reparametrise the Poisson-beta PDF in terms of its mean μX=αkon/(koff+kon), to get

$$\begin{array}{*{20}l} X | \mu_{X}, k_{\text{on}}, k_{\text{off}}, P & \sim \text{Poi}(\mu_{X} P \,({k_{\text{off}}+k_{\text{on}}})/{k_{\text{on}}}),\\ f_{X}(x|\alpha, k_{\text{on}}, k_{\text{off}}) & =: f'_{X}(x|\mu_{X}, k_{\text{on}}, k_{\text{off}}). \end{array} $$

In fact, this allows us to exploit knowledge on μX in the form of informative priors and infer the dimensionless rates α,koff, and kon. These are converted to min−1 by using \(\tilde {d}\) estimated from data (Additional file 1: Section S5). In the limit as koff,α, with their ratio α/koff held finite, the population mean satisfies μX=konα/koff, while the PDF of X approaches the negative-binomial distribution

$$\begin{aligned} &f^{\prime\prime}_{X}\left(x| k_{\text{on}}, k_{\text{off}}/\alpha\right) = \int_{0}^{\infty} f_{\text{Poi}}(x| \lambda) f_{\text{Gamma}} \left(\lambda | k_{\text{on}}, k_{\text{off}}/\alpha\right)\, \mathrm{d} \lambda, \end{aligned} $$

where fGamma(x|kon,koff/α) is the density of a Gamma RV with mean μX and variance μXkoff/α; when this RV concentrates near the mean as kon and koff/α→0, X is Poisson with PDF fPoi(x|μX).

Microscopic model

The microscopic model is defined by means of the following chemical reaction scheme:

$$\begin{array}{*{20}l} &\mathrm{DNA_{on}} + \text{PolII} \overset{l \,\beta}{\longrightarrow} \text{mRNA} + \mathrm{DNA_{on}} + \text{PolII},\\ &\mathrm{DNA_{on}} + \text{PolII} \overset{(1-l)\,\beta}{\longrightarrow} \text{mRNA} + \mathrm{DNA_{on}},\\ &\mathrm{DNA_{on}} \overset{\lambda_{\text{off}}}{\longrightarrow} \mathrm{DNA_{off}}, \quad \mathrm{DNA_{off}} \overset{\lambda_{\text{on}}}{\longrightarrow} \mathrm{DNA_{on}},\\ &\text{mRNA} \overset{d}{\rightarrow} \varnothing, \quad \varnothing \overset{\gamma}{\longrightarrow} \text{PolII}, \quad \text{PolII} \overset{\delta}{\rightarrow} \varnothing. \end{array} $$

By the law of mass action, λoff=nKλ,λon=Kλ, where Kλ and n represent the chemical affinity and concentration of TetR homodimers that bind to the TetO 2 operators downstream of the TSS, respectively. When such a binding event occurs, the transcription is inhibited as elongation is impeded and the resulting locked DNA configuration is represented by the species DNAoff. The switch to DNAon corresponds to the release of the lock. A variant of this model that does not allow PolII to accumulate before transcription is obtained with γ>0 when the PolII compartment is empty and γ=0 otherwise.

Availability of data and materials

Custom scripts have been made available at GNU-GPLv3.0 licence [89]. Data that support the findings of this study have been deposited in Zenodo [90] and in the National Center for Biotechnology Information Gene Expression Omnibus with accession number GSE124682 [91].


  1. 1

    Golding I, Paulsson J, Zawilski SM, Cox EC. Real-time kinetics of gene activity in individual bacteria. Cell. 2005; 123(6):1025–36.

    CAS  PubMed  Article  PubMed Central  Google Scholar 

  2. 2

    Chubb JR, Trcek T, Shenoy SM, Singer RH. Transcriptional pulsing of a developmental gene. Curr Biol. 2006; 16(10):1018–25.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  3. 3

    Suter DM, Molina N, Gatfield D, Schneider K, Schibler U, Naef F. Mammalian genes are transcribed with widely different bursting kinetics. Science. 2011; 332(6028):472–4.

    CAS  PubMed  Article  PubMed Central  Google Scholar 

  4. 4

    Raj A, Peskin CS, Tranchina D, Vargas DY, Tyagi S. Stochastic mRNA synthesis in mammalian cells. PLoS Biol. 2006; 4(10):309.

    Article  CAS  Google Scholar 

  5. 5

    Munsky B, Neuert G, van Oudenaarden A. Using gene expression noise to understand gene regulation. Science. 2012; 336(6078):183–7.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  6. 6

    Singh A, Razooky B, Cox CD, Simpson ML, Weinberger LS. Transcriptional bursting from the HIV-1 promoter is a significant source of stochastic noise in hiv-1 gene expression. Biophys J. 2010; 98(8):32–34.

    Article  CAS  Google Scholar 

  7. 7

    Larsson AJM, Johnsson P, Hagemann-Jensen M, Hartmanis L, Faridani OR, Reinius B, Segerstolpe Å, Rivera CM, Ren B, Sandberg R. Genomic encoding of transcriptional burst kinetics. Nature. 2019; 565(7738):251–4.

    CAS  PubMed  Article  PubMed Central  Google Scholar 

  8. 8

    Elowitz MB, Levine AJ, Siggia ED, Swain PS. Stochastic gene expression in a single cell. Science. 2002; 297(5584):1183–6.

    CAS  PubMed  Article  PubMed Central  Google Scholar 

  9. 9

    Zopf CJ, Quinn K, Zeidman J, Maheshri N. Cell-cycle dependence of transcription dominates noise in gene expression. PLoS Comput Biol. 2013; 9(7):1003161.

    Article  CAS  Google Scholar 

  10. 10

    Sherman MS, Lorenz K, Lanier MH, Cohen BA. Cell-to-cell variability in the propensity to transcribe explains correlated fluctuations in gene expression. Cell Syst. 2015; 1(5):315–25.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  11. 11

    Skinner SO, Xu H, Nagarkar-Jaiswal S, Freire PR, Zwaka TP, Golding I. Single-cell analysis of transcription kinetics across the cell cycle. eLife. 2016; 5:e12175.

    PubMed  PubMed Central  Article  CAS  Google Scholar 

  12. 12

    Padovan-Merhar O, Nair G, Biaesch A, Mayer A, Scarfone S, Foley S, Wu A, Churchman LS, Singh A, Raj A. Single mammalian cells compensate for differences in cellular volume and DNA copy number through independent global transcriptional mechanisms. Mol Cell. 2015; 58(2):339–52.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  13. 13

    Battich N, Stoeger T, Pelkmans L. Control of transcript variability in single mammalian cells. Cell. 2015; 163(7):1596–610.

    CAS  PubMed  Article  Google Scholar 

  14. 14

    Raser JM, O’Shea EK. Control of stochasticity in eukaryotic gene expression. Science. 2004; 304(5678):1811–4.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  15. 15

    Weinberger L, Voichek Y, Tirosh I, Hornung G, Amit I, Barkai N. Expression noise and acetylation profiles distinguish HDAC functions. Mol Cell. 2012; 47(2):193–202.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  16. 16

    van Zon JS, Morelli MJ, Tanase-Nicola S, ten Wolde PR. Diffusion of transcription factors can drastically enhance the noise in gene expression. Biophys J. 2006; 91(12):4350–67.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  17. 17

    Chong S, Chen C, Ge H, Xie XS. Mechanism of transcriptional bursting in bacteria. Cell. 2014; 158(2):314–26.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  18. 18

    Fukaya T, Lim B, Levine M. Enhancer control of transcriptional bursting. Cell. 2016; 166(2):358–68.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  19. 19

    Bartman C, Hsu S, Hsiung C-S, Raj A, Blobel G. Enhancer regulation of transcriptional bursting parameters revealed by forced chromatin looping. Mol Cell. 2016; 62(2):237–47.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  20. 20

    Papantonis A, Cook PR. Transcription factories: Genome organization and gene regulation. Chem Rev. 2013; 113(11):8683–705.

    CAS  PubMed  Article  Google Scholar 

  21. 21

    Cisse II, Izeddin I, Causse SZ, Boudarene L, Senecal A, Muresan L, Dugast-Darzacq C, Hajj B, Dahan M, Darzacq X. Real-time dynamics of RNA polymerase II clustering in live human cells. Science. 2013; 341(6146):664–7.

    CAS  PubMed  Article  Google Scholar 

  22. 22

    Cho WK, Jayanth N, English BP, Inoue T, Andrews JO, Conway W, Grimm JB, Spille JH, Lavis LD, Lionnet T, Cisse II. RNA Polymerase II cluster dynamics predict mRNA output in living cells. eLife. 2016; 5:e13617.

    PubMed  PubMed Central  Article  CAS  Google Scholar 

  23. 23

    Plys AJ, Kingston RE. Dynamic condensates activate transcription. Science. 2018; 361(6400):329–30.

    CAS  PubMed  Article  Google Scholar 

  24. 24

    Boehning M, Dugast-Darzacq C, Rankovic M, Hansen AS, Yu T, Marie-Nelly H, McSwiggen DT, Kokic G, Dailey GM, Cramer P, Darzacq X, Zweckstetter M. RNA polymerase II clustering through carboxy-terminal domain phase separation. Nat Struct Mol Biol. 2018; 25(9):833–40.

    CAS  PubMed  Article  Google Scholar 

  25. 25

    Chong S, Dugast-Darzacq C, Liu Z, Dong P, Dailey GM, Cattoglio C, Heckert A, Banala S, Lavis L, Darzacq X, Tjian R. Imaging dynamic and selective low-complexity domain interactions that control gene transcription. Science. 2018; 361(6400):2555.

    Article  CAS  Google Scholar 

  26. 26

    Sabari BR, Dall’Agnese A, Boija A, Klein IA, Coffey EL, Shrinivas K, Abraham BJ, Hannett NM, Zamudio AV, Manteiga JC, Li CH, Guo YE, Day DS, Schuijers J, Vasile E, Malik S, Hnisz D, Lee TI, Cisse II, Roeder RG, Sharp PA, Chakraborty AK, Young RA. Coactivator condensation at super-enhancers links phase separation and gene control. Science. 2018; 361(6400):3958.

    Article  CAS  Google Scholar 

  27. 27

    Cho WK, Spille JH, Hecht M, Lee C, Li C, Grube V, Cisse II. Mediator and RNA polymerase II clusters associate in transcription-dependent condensates. Science. 2018; 361(6400):412–5.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  28. 28

    Cramer P. Organization and regulation of gene transcription. Nature. 2019; 573(7772):45–54.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  29. 29

    Shin Y, Chang YC, Lee DSW, Berry J, Sanders DW, Ronceray P, Wingreen NS, Haataja M, Brangwynne CP. Liquid nuclear condensates mechanically sense and restructure the genome. Cell. 2018; 175(6):1481–9113.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  30. 30

    Wei MT, Chang YC, Shimobayashi SF, Shin Y, Strom AR, Brangwynne CP. Nucleated transcriptional condensates amplify gene expression. Nat Cell Biol. 2020; 22(10):1187–96.

    CAS  PubMed  Article  PubMed Central  Google Scholar 

  31. 31

    Moore MJ, Proudfoot NJ. Pre-mRNA processing reaches back to transcription and ahead to translation. Cell. 2009; 136(4):688–700.

    CAS  PubMed  Article  PubMed Central  Google Scholar 

  32. 32

    Kuehner JN, Pearson EL, Moore C. Unravelling the means to an end: RNA polymerase II transcription termination. Nat Rev Mol Cell Biol. 2011; 12(5):283–94.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  33. 33

    Mapendano CK, Lykke-Andersen S, Kjems J, Bertrand E, Jensen TH. Crosstalk between mRNA 3’ end processing and transcription initiation. Mol Cell. 2010; 40(3):410–22.

    CAS  PubMed  Article  PubMed Central  Google Scholar 

  34. 34

    Fang X, Wang L, Ishikawa R, Li Y, Fiedler M, Liu F, Calder G, Rowan B, Weigel D, Li P, Dean C. Arabidopsis FLL2 promotes liquid–liquid phase separation of polyadenylation complexes. Nature. 2019; 569(7755):265–9.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  35. 35

    Hebenstreit D. Are gene loops the cause of transcriptional noise?Trends Genet. 2013; 29(6):333–8.

    CAS  PubMed  Article  PubMed Central  Google Scholar 

  36. 36

    Damgaard CK, Kahns S, Lykke-Andersen S, Nielsen AL, Jensen TH, Kjems J. A 5’ splice site enhances the recruitment of basal transcription initiation factors in vivo. Mol Cell. 2008; 29(2):271–8.

    CAS  PubMed  Article  PubMed Central  Google Scholar 

  37. 37

    Carter D, Chakalova L, Osborne CS, Dai YF, Fraser P. Long-range chromatin regulatory interactions in vivo. Nat Genet. 2002; 32(4):623–6.

    CAS  PubMed  Article  PubMed Central  Google Scholar 

  38. 38

    Reik A, Telling A, Zitnik G, Cimbora D, Epner E, Groudine M. The locus control region is necessary for gene expression in the human beta-globin locus but not the maintenance of an open chromatin structure in erythroid cells,. Mol Cell Biol. 1998; 18(10):5992–6000.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  39. 39

    Tolhuis B, Palstra RJ, Splinter E, Grosveld F, de Laat W. Looping and interaction between hypersensitive sites in the active beta-globin locus. Mol Cell. 2002; 10(6):1453–65.

    CAS  PubMed  Article  PubMed Central  Google Scholar 

  40. 40

    Hnisz D, Shrinivas K, Young RA, Chakraborty AK, Sharp PA. A phase separation model for transcriptional control. Cell. 2017; 169(1):13–23.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  41. 41

    Wang Y, Fairley JA, Roberts SGE. Phosphorylation of TFIIB links transcription initiation and termination. Curr Biol CB. 2010; 20(6):548–53.

    CAS  PubMed  Article  Google Scholar 

  42. 42

    Ansari A, Hampsey M. A role for the CPF 3’-end processing machinery in RNAP II-dependent gene looping. Genes Dev. 2005; 19(24):2969–78.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  43. 43

    Wani S, Yuda M, Fujiwara Y, Yamamoto M, Harada F, Ohkuma Y, Hirose Y. Vertebrate Ssu72 regulates and coordinates 3’-end formation of RNAs transcribed by RNA polymerase II. PLoS ONE. 2014; 9(8):106040.

    Article  Google Scholar 

  44. 44

    Tan-Wong SM, Zaugg JB, Camblong J, Xu Z, Zhang DW, Mischo HE, Ansari AZ, Luscombe NM, Steinmetz LM, Proudfoot NJ. Gene loops enhance transcriptional directionality. Science. 2012; 338(6107):671–5.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  45. 45

    Perkins KJ, Lusic M, Mitar I, Giacca M, Proudfoot NJ. Transcription-dependent gene looping of the HIV-1 provirus is dictated by recognition of pre-mRNA processing signals. Mol Cell. 2008; 29(1):56–68.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  46. 46

    Martin M. Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet J. 2011; 17(1):10.

    Article  Google Scholar 

  47. 47

    Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010; 26(6):841–2.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  48. 48

    Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, Gingeras TR. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013; 29(1):15–21.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  49. 49

    Dyer NP, Shahrezaei V, Hebenstreit D. LiBiNorm: an htseq-count analogue with improved normalisation of Smart-seq2 data and library preparation diagnostics. PeerJ. 2019; 7:6222.

    Article  CAS  Google Scholar 

  50. 50

    Anders S, Pyl PT, Huber W. HTSeq–a Python framework to work with high-throughput sequencing data. Bioinformatics. 2015; 31(2):166–9.

    CAS  Article  Google Scholar 

  51. 51

    Ramírez F, Dündar F, Diehl S, Grüning BA, Manke T. deepTools: a flexible platform for exploring deep-sequencing data. Nucleic Acids Res. 2014; 42(Web Server issue):187–91.

    Article  CAS  Google Scholar 

  52. 52

    Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014; 15(12):550.

    PubMed  PubMed Central  Article  CAS  Google Scholar 

  53. 53

    Goldstein LD, Cao Y, Pau G, Lawrence M, Wu TD, Seshagiri S, Gentleman R. Prediction and quantification of splice events from RNA-Seq data. PLOS ONE. 2016; 11(5):0156132.

    Article  CAS  Google Scholar 

  54. 54

    Liao Y, Smyth GK, Shi W. The R package Rsubread is easier, faster, cheaper and better for alignment and quantification of RNA sequencing reads. Nucleic Acids Res. 2019; 47(8):47.

    Article  CAS  Google Scholar 

  55. 55

    Shapiro HM. Practical flow cytometry. Hoboken, NJ, USA: John Wiley & Sons, Inc.; 2003.

    Google Scholar 

  56. 56

    Lo K, Brinkman RR, Gottardo R. Automated gating of flow cytometry data via robust model-based clustering. Cytom Part A. 2008; 73A(4):321–32.

    Article  Google Scholar 

  57. 57

    Hahne F, LeMeur N, Brinkman RR, Ellis B, Haaland P, Sarkar D, Spidlen J, Strain E, Gentleman R. flowCore: a Bioconductor package for high throughput flow cytometry,. BMC Bioinformatics. 2009; 10:106.

    PubMed  PubMed Central  Article  CAS  Google Scholar 

  58. 58

    Lo K, Hahne F, Brinkman RR, Gottardo R. flowClust: a Bioconductor package for automated gating of flow cytometry data. BMC Bioinformatics. 2009; 10(1):145.

    PubMed  PubMed Central  Article  CAS  Google Scholar 

  59. 59

    Tiberi S, Walsh M, Cavallaro M, Hebenstreit D, Finkenstädt B. Bayesian inference on stochastic gene transcription from flow cytometry data. Bioinformatics. 2018; 34(17):647–55.

    Article  CAS  Google Scholar 

  60. 60

    Ran FA, Hsu PD, Wright J, Agarwala V, Scott DA, Zhang F. Genome engineering using the CRISPR-Cas9 system. Nat Protoc. 2013; 8(11):2281–308.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  61. 61

    Kim J, Marioni JC. Inferring the kinetics of stochastic gene expression from single-cell RNA-sequencing data. Genome Biol. 2013; 14(1):7.

    Article  Google Scholar 

  62. 62

    Dobrzynski M, Bruggeman FJ. Elongation dynamics shape bursty transcription and translation. Proc Natl Acad Sci. 2009; 106(8):2583–8.

    CAS  PubMed  Article  Google Scholar 

  63. 63

    Zenklusen D, Larson DR, Singer RH. Single-RNA counting reveals alternative modes of gene expression in yeast. Nat Struct Mol Biol. 2008; 15(12):1263–71.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  64. 64

    Klein AM, Mazutis L, Akartuna I, Tallapragada N, Veres A, Li V, Peshkin L, Weitz DA, Kirschner MW. Droplet barcoding for single-cell transcriptomics applied to embryonic stem cells. Cell. 2015; 161(5):1187–201.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  65. 65

    Spiess A. qpcR: modelling and analysis of real-time PCR data. 2013. Accessed 2018.

  66. 66

    Dar RD, Shaffer SM, Singh A, Razooky BS, Simpson ML, Raj A, Weinberger LS. Transcriptional bursting explains the noise-versus-mean relationship in mRNA and protein levels. PLoS ONE. 2016; 11(7):0158298.

    Article  CAS  Google Scholar 

  67. 67

    Dar RD, Razooky BS, Weinberger LS, Cox CD, Simpson ML. The low noise limit in gene expression. PLoS ONE. 2015; 10(10):0140969.

    Article  CAS  Google Scholar 

  68. 68

    Soltani M, Vargas-Garcia CA, Antunes D, Singh A. Intercellular variability in protein levels from stochastic expression and noisy cell cycle processes. PLoS Comput Biol. 2016; 12(8):1004972.

    Article  CAS  Google Scholar 

  69. 69

    Bar-Even A, Paulsson J, Maheshri N, Carmi M, O’Shea E, Pilpel Y, Barkai N. Noise in protein expression scales with natural protein abundance. Nat Genet. 2006; 38(6):636–43.

    CAS  PubMed  Article  Google Scholar 

  70. 70

    Newman JRS, Ghaemmaghami S, Ihmels J, Breslow DK, Noble M, DeRisi JL, Weissman JS. Single-cell proteomic analysis of S. cerevisiae reveals the architecture of biological noise. Nature. 2006; 441(7095):840–6.

    CAS  PubMed  Article  PubMed Central  Google Scholar 

  71. 71

    Taniguchi Y, Choi PJ, Li G-W, Chen H, Babu M, Hearn J, Emili A, Xie XS. Quantifying E. coli proteome and transcriptome with single-molecule sensitivity in single cells. Science. 2010; 329(5991):533–8.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  72. 72

    Stewart-Ornstein J, Weissman JS, El-Samad H. Cellular noise regulons underlie fluctuations in Saccharomyces cerevisiae. Mol Cell. 2012; 45(4):483–93.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  73. 73

    Yang S, Kim S, Rim Lim Y, Kim C, An HJ, Kim J-H, Sung J, Lee NK. Contribution of RNA polymerase concentration variation to protein expression noise. Nat Commun. 2014; 5(1):4761.

    CAS  PubMed  Article  PubMed Central  Google Scholar 

  74. 74

    Li G, Ruan X, Auerbach RK, Sandhu KS, Zheng M, Wang P, Poh HM, Goh Y, Lim J, Zhang J, Sim HS, Peh SQ, Mulawadi FH, Ong CT, Orlov YL, Hong S, Zhang Z, Landt S, Raha D, Euskirchen G, Wei C-L, Ge W, Wang H, Davis C, Fisher-Aylor KI, Mortazavi A, Gerstein M, Gingeras T, Wold B, Sun Y, Fullwood MJ, Cheung E, Liu E, Sung W-K, Snyder M, Ruan Y. Extensive promoter-centered chromatin interactions provide a topological basis for transcription regulation,. Cell. 2012; 148(1-2):84–98.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  75. 75

    Li G, Chen Y, Snyder MP, Zhang MQ. ChIA-PET2: a versatile and flexible pipeline for ChIA-PET data analysis. Nucleic Acids Res. 2017; 45(1):4.

    Article  CAS  Google Scholar 

  76. 76

    Bartlett MS. The use of transformations. Biometrics. 1947; 3(1):39–52.

    CAS  PubMed  Article  PubMed Central  Google Scholar 

  77. 77

    Svensson V. Droplet scRNA-seq is not zero-inflated. Nat Biotechnol. 2020; 38(2):147–50.

    CAS  PubMed  Article  PubMed Central  Google Scholar 

  78. 78

    Townes FW, Hicks SC, Aryee MJ, Irizarry RA. Feature selection and dimension reduction for single-cell RNA-Seq based on a multinomial model. Genome Biol. 2019; 20(1):295.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  79. 79

    Russell AB, Trapnell C, Bloom JD. Extreme heterogeneity of influenza virus infection in single cells. eLife. 2018; 7:e32303.

    PubMed  PubMed Central  Article  Google Scholar 

  80. 80

    Zhao N, Sebastiano V, Moshkina N, Mena N, Hultquist J, Jimenez-Morales D, Ma Y, Rialdi A, Albrecht R, Fenouil R, Sánchez-Aparicio MT, Ayllon J, Ravisankar S, Haddad B, Ho JSY, Low D, Jin J, Yurchenko V, Prinjha RK, Tarakhovsky A, Squatrito M, Pinto D, Allette K, Byun M, Smith ML, Sebra R, Guccione E, Tumpey T, Krogan N, Greenbaum B, van Bakel H, García-Sastre A, Marazzi I. Influenza virus infection causes global RNAPII termination defects. Nat Struct Mol Biol. 2018; 25(9):885–93.

    CAS  PubMed  Article  Google Scholar 

  81. 81

    Bauer DLV, Tellier M, Martínez-Alonso M, Nojima T, Proudfoot NJ, Murphy S, Fodor E. Influenza virus mounts a two-pronged attack on host RNA polymerase II transcription. Cell Rep. 2018; 23(7):2119–29.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  82. 82

    Cavallaro M, Mondragón RJ, Harris RJ. Temporally correlated zero-range process with open boundaries: steady state and fluctuations. Phys Rev E. 2015; 92(2):022137.

    Article  CAS  Google Scholar 

  83. 83

    Kussell E, Leibler S. Phenotypic diversity, population growth, and information in fluctuating environments. Science. 2005; 309(5743):2075–8.

    CAS  PubMed  Article  Google Scholar 

  84. 84

    Eldar A, Elowitz MB. Functional roles for noise in genetic circuits. Nature. 2010; 467(7312):167–73.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  85. 85

    Beaumont HJE, Gallie J, Kost C, Ferguson GC, Rainey PB. Experimental evolution of bet hedging. Nature. 2009; 462(7269):90–93.

    CAS  PubMed  Article  Google Scholar 

  86. 86

    Murphy KF, Adams RM, Wang X, Balázsi G, Collins JJ. Tuning and controlling gene expression noise in synthetic gene networks. Nucleic Acids Res. 2010; 38(8):2712–26.

    CAS  PubMed  PubMed Central  Article  Google Scholar 

  87. 87

    Bandiera L, Furini S, Giordano E. Phenotypic variability in synthetic biology applications: dealing with noise in microbial gene expression. Front Microbiol. 2016; 7:479.

    PubMed  PubMed Central  Article  Google Scholar 

  88. 88

    Patil A, Huard D, Fonnesbeck C. PyMC: Bayesian stochastic modelling in Python. J Stat Softw. 2010; 35(4):1–81.

    PubMed  PubMed Central  Article  Google Scholar 

  89. 89

    Cavallaro M, Walsh MD, Jones M, Teahan J, Tiberi S, Finkenstädt B, Hebenstreit D. Supporting software to “3’-5’ crosstalk contributes to transcriptional bursting”. GitHub. 2020.

  90. 90

    Cavallaro M, Walsh MD, Jones M, Teahan J, Tiberi S, Finkenstädt B, Hebenstreit D. Supporting data to “3’-5’ crosstalk contributes to transcriptional bursting”. Zenodo. 2020.

  91. 91

    Cavallaro M, Walsh MD, Jones M, Teahan J, Tiberi S, Finkenstädt B, Hebenstreit D. Supporting RNA-seq data to “3’-5’ crosstalk contributes to transcriptional bursting”. Gene Expression Omnibus. 2020. Accessed 2019.

Download references


We thank Louise Dyson, Matt Moores, Lucy Ternent, and Jonathan Keith for valuable discussions, and Sharon Collier and Charlotte Petersen for minor contributions. pSpCas9(BB)-2A-GFP (PX458) was a gift from Feng Zhang (Addgene plasmid #48138;; RRID:Addgene 48138). We also thank Søren Lykke-Andersen and Torben Heick Jensen for providing cell lines.

Review history

The review history is available as Additional file 2.


The research was supported by BBSRC grant BB/L006340/1 and utilised WISB computational and experimental facilities (grant ref: BB/M017982/1) funded under the UK Research Councils’ Synthetic Biology for Growth programme.

Author information




M.C., S.T., B.F., and D.H. designed the research; M.C., M.D.W., M.J., J.T., and D.H. performed the research; M.C., M.D.W., M.J., and D.H. analysed the data; M.C., M.D.W., and D.H. wrote the paper with input from all authors. All authors read and approved the final manuscript.

Corresponding authors

Correspondence to Massimo Cavallaro or Daniel Hebenstreit.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1

“Supporting Information to: “3’-5’ crosstalk contributes to transcriptional bursting”.

Additional file 2

Review history.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and Permissions

About this article

Verify currency and authenticity via CrossMark

Cite this article

Cavallaro, M., Walsh, M.D., Jones, M. et al. 3 -5 crosstalk contributes to transcriptional bursting. Genome Biol 22, 56 (2021).

Download citation


  • Gene expression
  • Parameter inference
  • Mathematical modelling
  • Gene looping
  • Biological noise
  • Liquid-liquid phase separation