Hi-D: nanoscale mapping of nuclear dynamics in single living cells

Bulk chromatin motion has not been analyzed at high resolution. We present Hi-D, a method to quantitatively map dynamics of chromatin and abundant nuclear proteins for every pixel simultaneously over the entire nucleus from fluorescence image series. Hi-D combines reconstruction of chromatin motion and classification of local diffusion processes by Bayesian inference. We show that DNA dynamics in the nuclear interior are spatially partitioned into 0.3–3-μm domains in a mosaic-like pattern, uncoupled from chromatin compaction. This pattern was remodeled in response to transcriptional activity. Hi-D can be applied to any dense and bulk structures opening new perspectives towards understanding motion of nuclear molecules. Electronic supplementary material Supplementary information accompanies this paper at 10.1186/s13059-020-02002-6.


Introduction
Spatial organization and dynamics of chromatin correlate with cell function and fate [1]. On the coarsest level, chromosomes occupy territories in mammalian cells [2]. The relative proportion of dense heterochromatin and open euchromatin regions reflect cellular activity [3]. Transitions within and between eu-and heterochromatin involve the remodeling of multiple hierarchical levels of chromatin organization from domain folding and long-range looping to nucleosome density to adapt to and enable DNA processing [4]. Structural models derived from contact and crosslinking frequencies [5][6][7] are consistent with the view that the genome is partitioned into functional compartments and sub-compartments [8]. It is now becoming increasingly clear that such nuclear compartments are also dynamic entities whose conformational changes impact mechanisms and function of genome folding [9]. Tracking of labeled single DNA loci [10][11][12][13][14] or chromatin domains [15,16] demonstrated that chromatin motion is highly heterogeneous at short time intervals. Sparse loci, however, are difficult to place in the context of global chromatin organization [17] and locally restrained genomic processes can also not be inferred from quantities averaged over the entire nucleus [18,19]. A first study analyzing bulk chromatin motion at nanoscale resolution revealed dynamic partitioning of chromatin into a number of nuclear sub-regions with correlated motion of chromatin in the micrometer range [20]. However, the cause and/or effect of (correlated) chromatin dynamics is not yet clear. Likewise, whether compaction of chromatin determines its spatial coherence or whether chromatin dynamics are distinct in open and closed chromatin is still a matter of debate.
To tackle this need, we developed a new approach called high-resolution diffusion mapping (Hi-D) that overcomes the limitations of sparse and ensemble approaches. Hi-D combines a dense optical flow reconstruction to first quantify the local motion of chromatin and other abundant nuclear constituents at sub-pixel accuracy within a series of images, and a Bayesian inference approach in the second step to precisely classify local types of diffusion. Biophysical properties such as diffusion constants and anomalous exponents are determined for each pixel to create two-dimensional maps of chromatin dynamics at single pixel resolution in living single cells. Hi-D created spatially resolved maps show that DNA compaction and dynamics do not necessarily correlate. Instead, the maps suggest that chromatin dynamics are dictated by DNA-DNA contacts and protein binding to DNA, rather than chromatin density.

Hi-D maps genome dynamic properties at nanoscale resolution in living cells
Motion of densely distributed fluorescent molecules was quantitatively reconstructed from a series of conventional confocal fluorescence microscopy images by a dense optical flow method [20]. By integrating the resulting flow fields, a trajectory was computed for each pixel ( Fig. 1a; Additional file 1: Note S1; Additional file 1: Fig. S1). The type of diffusion characterizing each pixel's chromatin motion was chosen in an unbiased manner using a Bayesian inference from a set of five common models to fit each trajectory's mean squared displacement (MSD) [21] (Fig. 1b, left panel; Additional file 1: Fig. S2). The best fitting models were directly mapped onto the nucleus ( Fig. 1b; right panel) ("Methods" section). We found that only a small fraction of trajectories displayed directed diffusion (Fig. 1b), while the bulk of chromatin exhibited sub-diffusive behavior. Distinguishing between the comparable cases of anomalous and confined diffusion is a challenging task, given the limited duration of the experiment. After examination of a range of parameters governing these different types of diffusion, our results suggest that chromatin diffusion in human U2OS cells can be adequately described as anomalous to avoid misinterpretation (Additional file 1: Note S2; Additional file 1: Fig. S3). The resulting biophysical parameters calculated for each pixel by Hi-D (diffusion constant D, anomalous exponent α, and drift velocity V) are presented in color-coded 2D heatmaps ( Fig. 1c) ("Methods" section). They are distributed in a mosaic of irregular shape and dimensions of similar values (Fig. 1c). These parameter maps clearly demonstrate that chromatin dynamics are spatially heterogeneous and partitioned. These maps also illustrate the notion that chromatin dynamics are spatially correlated in the micrometer range [18,20]. To further characterize this heterogeneous distribution, the parameter distributions were deconvolved into discrete subpopulations using a general mixture model (GMM) ( Fig. 1c; "Methods"; and Additional file 1: Fig. S4). The GMM identified three populations of chromatin mobility referred to as slow, intermediate, and fast ("Methods"; exemplary in Fig. 1c), irrespective of the parameter under consideration (diffusion constant or anomalous exponent) or transcriptional state of the cell. We found that chromatin dynamics characterized by directed motion involving a drift velocity (V) was less present than free and anomalous diffusion and provided significantly less data for V than for the other two parameters (Fig. 1c). Hence, drift velocity was not retained for further analysis.

Validation of the Hi-D approach in simulation and experiment
In order to examine the suitability of calculated trajectories and associated diffusion constants by Hi-D to whole-chromatin imaging conditions, we compared Hi-D to dynamic multiple-target tracing (MTT), a single-particle tracking (SPT) method which is commonly used for dense molecule tracking [22] (Fig. 2a,b; Additional file 1: Note S3; Additional file 1: Fig. S5; Additional file 1: Fig. S7). While the SPT method outperforms the Hi-D approach in scenarios of sparsely labeled molecules (Fig 2a), Hi-D analysis provided considerably more accurate estimates of local diffusion constants than SPT in scenarios of densely labeled molecules or structures of heterogeneous label density such as chromatin (Fig. 2b). Hi-D therefore constitutes an approach to extract dynamic parameters of biomolecules with dense labeling where SPT is unsuitable. One should, however, keep in mind that SPT and Hi-D are meant to analyze images from drastically Fig. 1 Hi-D enables spatially resolved mapping of genome dynamic properties at nanoscale resolution in living cells. Workflow: a A series of N = 150 confocal microscopy images acquired at 5 fps (left) (here SiR-DNA stained living U2OS cells). Dense optical flow was applied to define (N-1) flow fields of the images (center, color coded) based on fluorescence intensity of each pixel (size = 65 nm). Individual trajectories are reconstructed over the duration of acquisition (right). b MSD model selection (left): Trajectories of a 3 × 3 neighborhood of every pixel are used to calculate a mean MSD curve and its corresponding covariance matrix. By a Bayesian inference approach, the type of diffusion fitting each individual curve is chosen (free diffusion (D), anomalous diffusion (DA), directed motion (V), or a combination (DV) or (DAV). The spatial distribution of the selected models for each pixel is shown as a color map. c Maps of biophysical parameters (D, α, and V) extracted from the best describing model per pixel reveal local dynamic behavior of DNA in large domains. The distribution is deconvolved using a general mixture model Fig. 2 Experimental validation of the Hi-D approach. a Exemplary frame of a simulated time series with low density (0.001/px 3 ) of emitters undergoing Brownian motion convolved by a typical point spread function (left). The time series is subject to Hi-D and single-particle tracking estimating the trajectories of emitters. From the estimated trajectories, the MSD is computed and compared to the ground truth diffusion constant. The relative error in the determined diffusion constant is shown. b High density (0.02/px 3 ) of emitters with patches of super-high density (0.035/px 3 ) encircled for visualization, imitating regions of densely packed chromatin. Dashed lines show the optimal value, i.e., perfect agreement between estimation and ground truth. Red lines indicate the median value. Data from 10 independent simulations. Statistical significance assessed by a two-sample Kolmogorov-Smirnov test (***: p < 0.001). c MSD curves computed in fixed (n = 13) and living quiescent (− serum; n = 13) and serum stimulated (+ serum; n = 14) U2OS cells. Diffusion constants for the three average curves were derived by regression yielding D = (0.87 ± 0.1) · 10 −3 μm 2 /s for quiescent, D = (2.6 ± 0.1) · 10 −4 μm 2 /s for stimulated, and D = (6.1 ± 0.1) · 10 −6 μm 2 /s for fixed cells. MSD curves show considerably higher MSD values for living cells and diffusion constants are two orders of magnitude higher for living cells thus confirming the detection of motion well above noise background. d Diffusion constants derived from a nucleus corrupted with varying levels of signal-to-noise ratio. Results are consistent up to a lower bound of~20 dB. e Map of diffusion constants computed by Hi-D (left) and iMSD (right). Diffusion constants are color coded from their minimum to their maximum value (blue to yellow; for absolute values see f). Red arrows indicate regions of high mobility detected by both methods. f Diffusion constants shown in e and g corresponding values of the anomalous exponent computed by Hi-D (blue) and iMSD (red) different labeling conditions and should thus refrain from a direct comparison between single-locus dynamics analyzed by SPT and local bulk chromatin dynamics by Hi-D (Additional file 1: Note S4; Additional file 2: Movie S1).
To ensure that the calculated dynamics are not a consequence of imaging noise, we experimentally validated the sensitivity of the approach by calculating the MSD for formaldehyde fixed and living U2OS cells labeled by SiR-DNA in quiescence (− serum) or normal growth (+ serum). Diffusion constants derived from the MSD curves by Bayesian inference were about two orders of magnitude greater in living cells than in fixed cells (Fig. 2c) confirming that Hi-D enables quantifying DNA dynamics well above the noise background. To confirm the robustness of extracted parameter values with respect to varying levels of imaging noise, Hi-D was applied to nuclei to which noise was artificially added. The signal-to-noise ratio (SNR) of the original nuclei were about 26 dB and subsequently reduced stepwise down to 16 dB. The distributions of computed diffusion constants were consistent up to a lower limit of about 20 dB, below which the distribution is considerably biased towards larger values and broadens (Fig. 2d, Additional file 1: Fig. S8). Likewise, features of the spatial map of diffusion constants were equally conserved for SNR values as low as 20 dB, demonstrating the robustness of Hi-D for varying imaging noise (Additional file 1: Fig. S8). In analogy to the robustness to varying SNR levels, Hi-D is thus robust to photobleaching effects (if SNR ≥ 20 dB) since flow fields are only estimated between consecutive images, for which illumination changes due to photobleaching are usually negligible. Furthermore, Hi-D was also shown to be robust to small variations in time intervals of acquired time series as long as the expected motion between frames was in the order of the pixel size (Additional file 1: S9). We further validated Hi-D against iMSD, a well-established method to extract dynamic information of dense molecules, based on the spatial correlation function of intensity fluctuations caused by diffusing molecules, which are recorded using camera-based systems [23]. Using successive calculations of iMSD to overlap regions of interest, we computed a diffusion map similar to Hi-Dderived maps (Additional file 1: Note S5). Quantitatively, both methods yield diffusion constants of the same order of magnitude (Hi-D (1.6 ± 0.8) · 10 −3 μm 2 /s , iMSD (2.2 ± 4.5) · 10 −3 μm 2 /s, mean ± standard deviation), which are consistent with reported values using SPT and correlation spectroscopy methods applied to interphase chromatin [14,18]. However, the distribution of values derived by iMSD was considerably broader than the distribution revealed by Hi-D (Fig. 2f). The distribution of anomalous exponents computed by iMSD showed many spurious values at the limit of the scale, while Hi-D consistently returns reasonable values (Fig. 2g). We thus conclude that Hi-D reveals dynamic parameters of the same order of magnitude as iMSD but is advantageous in the estimation of multiple parameters simultaneously, by virtue of the featured Bayesian model selection. Hi-D is thus an accurate, robust, fast, and easy to use tool to determine dynamics of macromolecules nucleus-wide.

Single-cell biophysical property maps of genome conformation and behavior
To concomitantly monitor position and distribution of the DNA mobility populations under different biological conditions, we determined Hi-D maps of the same serumstarved and then stimulated cell (Fig. 3a). Transcription is largely inhibited in cell-cycle arrested cells grown in a serum-free medium (Additional file 1: Fig. S10). Adding serum to the medium stimulates mRNA production through transcriptional activity [20,[24][25][26]. As above, diffusion constants of DNA motion were calculated for each pixel based on the model selected by Bayesian inference. Small diffusion constants characterized motion of chromatin prominently located at the nuclear envelope (dark blue). Plotting the average diffusion constants versus the distance from the nuclear periphery showed that the mobility within a rim of 1 μm from the periphery increases linearly before adopting a nearly constant value in the inner volume of the nucleus (Additional file 1: Fig. S11). At numerous sites across the remaining nuclear volume, fast diffusive areas of irregular dimensions spanning 0.3 − 3 μm in diameter (yellow areas in Fig. 3a) are embedded in the bulk of moderately dynamic chromatin. Areas of different parameter values seamlessly transition into one another without clearly defined boundaries, reminiscent of spatially correlated chromatin dynamics [20]. Upon serum stimulation, the spatial distribution of high and low diffusion constants was largely conserved (compare the presence of yellow regions in the quiescent and serum-stimulated cell in Fig. 3a), but the diffusion constant was globally strongly reduced by nearly one order of magnitude. Deconvolution of the distribution of diffusion constants and labelling of pixels according to the mobility population determined by the GMM (Fig. 3b; slow-red, intermediate-orange, and fast-yellow) yields a map in agreement with this observation. Deconvolution hence classifies regions according to the values of a given parameter compared to other regions within the same nucleus. When nuclear activity is modulated, changes in this classification can be measured. In particular, the fast diffusing population with respect to the bulk chromatin in quiescent cells is reduced upon serum stimulation (Fig. 3c) and re-classified as intermediate population. Connected areas with high mobility appear eroded (Fig. 3b). In contrast, the slow population occupying~6% of the nuclear area, which is almost exclusively located at the nuclear periphery, was invariant to transcriptional changes (Fig. 3b, c). Despite considerable reorganization of the relative distribution of mobility populations and overall reduced intensity of motion, the type of diffusion governing the nuclear parameter maps showed only moderate changes upon stimulation of transcriptional activity (Fig. 3d).
Anomalous diffusion dominated across the entire nucleus (0.3 ≤ α ≤ 0.73) forming a mosaic-like pattern, which underwent, compared to maps of diffusion constants, considerable remodeling upon transcriptional activation (Fig. 3e, f). Within this pattern, patches of super-diffusive (red: α > 1) motion segregated into distinct islands which became more fragmented upon serum stimulation. Random contacts or re-distribution of existing contacts of the chromatin with itself may give rise to such variations in anomalous exponent upon serum stimulation [27]. Because the diffusion constant of chromatin fibers appears unaffected for moderate degrees of crosslinking [28], we expect that association of proteins with DNA upon serum stimulation could favor global decrease of mobility in vivo. Hi-D reveals high-resolution spatial changes in mobility and in anomaly of chromatin diffusion in single cells. Further investigation may tell us if all or a subset of visible physical domains correspond to the ones determined using chromosome conformation capture (Hi-C).

Transcription modulates chromatin and RNA polymerase II motion
To further explore the relationship between global chromatin dynamics and transcriptional activity, we examined the dynamics of RNA polymerase II (RPB1-Dendra2; RNA Pol II) in live U2OS cell nuclei (Fig. 4a) at different transcriptional states. Hi-D analysis resolved three mobility populations of RNA Pol II (Fig. 4b), which is consistent with the existence of three kinetically different groups of RNA Pol II based on the half-life of chromatin-binding [29,30]. Diffusion constants of the three dynamic populations in actively transcribing cells (grown in normal condition) were significantly greater compared to transcriptionally less-active cells (serum-starved cells) (Fig. 4b). In quiescent cells, the fraction of quickly diffusing RNA Pol II complexes was reduced compared to actively transcribing cells. Upon elongation inhibition using 5,6-dichloro-1-β-D-ribofuranosylbenzimidazole (DRB), the slowly diffusing fraction was greater than in untreated cells, indicating tenacious immobilization of RNA Pol II on the DNA template after initiation (Fig. 4c). The average diffusion constants in serum-starved and DRB-treated cells stayed roughly unchanged in all three populations, suggesting that RNA Pol II is unbound in the absence of serum [25].
We then compared the effect of transcriptional activity on chromatin dynamics in serum-starved and serum-stimulated cells. In contrast to RNA Pol II mobility, the average diffusion constant of DNA in serum-starved U2OS nuclei decreased by nearly one order of magnitude for all three populations upon addition of serum. Arresting RNA Pol II before elongation did not change the observed diffusion constants, compared to undisturbed  (Fig. 4d). These cell-population results are consistent with results from single-cell analyses (Fig. 3) and strengthen our hypothesis that nuclear processes considerably hamper diffusion of chromatin. In a quiescent state, only essential nuclear processes are maintained. Fewer protein complexes acting upon DNA in a could facilitate motion of the chromatin fiber. Upon serum stimulation, binding of transcription factor complexes and other proteins to DNA increase crowding and reduce the freedom to move and hence the apparent chromatin dynamics, at least in a subset of domains. Increased DNA-protein interactions and interchromatin contacts also enhance spatial correlation of chromatin dynamics in serum supplemented compared to quiescent cells [20]. Serum addition to starved cells likely stimulated RNA Pol II binding to DNA. When inhibiting elongation, transcription factories are still present [24] and, in agreement with chromatin coherence, DNA mobility remains constrained [20].
Independently of the culture conditions, a ground-state Rouse-like behavior characterizes chromatin in the examined nuclei (MSD fit with α close to 0.5) [31,32]. Upon serum stimulation of starved cells, anomalous diffusion became predominant and its value (α~0.33) is indicative of entangled polymers [33]. This behavior was also determined for a single labeled site next to an actively transcribing gene [13]. Entanglement could stem from random DNA-protein contacts, a model coherent with polymer simulations inspired by chromosomal capture data [34]. Hindered motion of chromatin and RNA Pol II is thus a direct consequence of forming transcription "hubs" or factories to which chromatin is tethered [25].

Chromatin dynamics is uncoupled from compaction
We next asked if chromatin dynamics are influenced by the compaction of chromatin since heterochromatin is widely believed to be less dynamic than euchromatin [16]. Euand heterochromatin domains were determined in serum-starved and serum-stimulated cells by quantifying fluorescence intensity as described in [35] (Fig. 5a). We found that the average flow magnitude between successive frames was independent of the compaction state of chromatin (Fig. 5b). Likewise, the distribution of diffusion constants did not correlate with chromatin density or euchromatin and heterochromatin (Fig. 5c). Peripheral heterochromatin overlapped with the slow motion domain at the nuclear rim (Fig. 5d) consistent with previous findings [16]. In the inner nuclear volume however, we did not observe any tendency of heterochromatin being associated with a specific mobility population. Instead, we observed that mobility populations were distributed randomly among euchromatin and heterochromatin regions (Fig. 5e) with the exception that in serumstarved cells, the heterochromatin fraction was slightly enriched in the slow diffusing population at the nuclear periphery. Furthermore, we found that regions characterized by a specific anomalous exponent did not preferentially overlap with either eu-or heterochromatin (Fig. 5f, g). These results also hold for MCF7 cells and different fluorescent markers for chromatin (Additional file 1: Fig. S12). These findings were confirmed in NIH3T3 cells expressing GFP-HP1α, a well-established marker for heterochromatin (Additional file 1: Fig. S13). In addition, Hi-D analysis of HP1α hints towards previously proposed liquid phase separation [36]. Our results thus suggest that chromatin undergoes diffusion processes which are, in general, unrelated to the compaction level of chromatin. However, compact chromatin is characterized by increased contact frequency of the chromatin fiber with itself, which could enhance the extent of coherent chromatin motion. To test this hypothesis, we calculated Moran's Index of Spatial Autocorrelation [37] for the flow magnitude assessed at different time lags in eu-or heterochromatin (Fig. 5h). We found that heterochromatin exhibits enhanced spatial autocorrelation compared to euchromatin across all accessible time lags. Furthermore, the spatial autocorrelation decreases with increasing time lags in serum-starved cells, while in serum-stimulated cells, autocorrelation is enhanced in the long-time limit (over 30 s). This finding points to active processes establishing spatial coherence in the long term [18,20] while random processes such as thermal fluctuations decrease autocorrelation at time scales greater than 10 s in serum-starved cells.

Discussion
Hi-D enables analyzing the dynamics of dense structures such as chromatin and RNA Pol II directly in single cells without losing active fluorophore density and with no need for prior experience in sophisticated labelling preparations or advanced microscopy [38]. We show that Hi-D is an accurate and robust read-out of chromatin dynamics and that the information gain through image analysis afforded by Hi-D alleviates the incompatibility of conventional microscopy for nanoscale mapping of properties of nuclear dynamics in living cells.
Hi-D analysis revealed that DNA dynamics can be classified into three subpopulations within the mammalian nucleus. The use of three distinct populations does not reflect real, likely fluctuating transitions between the populations, but is a handy means to characterize the highly complex and heterogeneous dynamic landscape of chromatin. The first population, a slow mobility fraction prominently located at the nuclear rim, is reminiscent of laminaassociated domains (LADs) [39]. The dynamic response of this population to transcriptional stimuli supports the hypothesis that LADs play an important role in attenuating transcription activity and in controlling gene expression [40]. Although less mobile chromatin at the nuclear periphery largely overlapped with long known perinuclear heterochromatin [41,42], Hi-D analysis remarkably points to an overall absence of correlation between chromatin compaction and mobility. Indeed, two non-peripheral chromatin subpopulations which display intermediate and highly diffusive regimes are distributed in a mosaic-like pattern throughout the nucleus and include sections of intranuclear heterochromatin. Heterochromatin therefore does not exhibit low mobility in nuclear space in general, but may be divided into a more viscous component with reduced mobility due to anchoring to a nuclear lamina and a more rigid LAD component tethered to nuclear structures [39,[43][44][45]. These results are coherent with the hypothesis that heterochromatin domains are formed by a liquid phase-separation mechanism characteristic of mobility reduction across phase boundaries [36]. The extent of the third, highly mobile fraction, which dominated in the quiescent state, decreased dramatically when cells were serum stimulated. This switch in chromatin mobility is suggestive of altered DNA-protein interactions and accrued local concentration of proteins in a highly transcribing nucleus [25].
Heterogeneous chromatin motion arises due to irregular protein binding along the chromosomes and can lead to thermodynamic or electrostatic self-organization of nuclear compartments [46,47]. Local patches of large anomalous exponents indicate super-diffusive behavior of chromatin which may result, among others, from active noise acting on the chromatin fiber [48,49], even in quiescent cells. Chromatin patches with α < 0.3 and α > 1 respectively correspond in size to one or a few DNA loops [50]. These two types of patches are present as islands within the general chromatin fraction governed by an anomalous exponent 0.3 < α < 1. This organization is also in good agreement with the chromosome territory-interchromatin compartment model [2]. In conclusion, the combination of diffusion constant and anomalous exponent maps provides an integrated view on chromatin dynamics and yields insights into possible mechanisms driving dynamics as well as the local chromosomal organization and reorganization during genomic processes.
Our results support the hypothesis that a short treatment of DRB is sufficient to halt most of a cell's transcription [51], but it is believed to have a limited effect on chromatin configuration of promoters, or on upstream events and assembly dynamics of the preinitiation complex (PIC) [52]. Our results suggest that a few minutes of DRB treatment affects only the assembly dynamics of the RNA Pol II itself. In contrast, longer treatment with DRB as well as prolonged serum starvation is expected to result in massive PIC disassembly. We believe therefore that increased chromatin mobility upon serum starvation correlates with PIC breakdown. Conversely, when serum is present in the medium, activity of RNA Pol II and of PIC enzymes is restored and PIC will assemble onto DNA, both pervasively across the genome at background levels (pervasive transcription) and more stably at genes [53,54]. Reduced mobility of chromatin in the presence of serum suggests that stable PIC binding may serve as an anchoring function for individual chromatin fibers, on top of its essential function in transcription initiation.
Hi-D can be applied to real-time imaging of any abundant fluorescent molecule to obtain comprehensive maps of their dynamic behavior in response to stimuli, inhibitors, or disruptors of nuclear functions and integrity.

DNA staining
U2OS and MCF-7 cell lines were labeled by SiR-DNA (SiR-Hoechst) kit (Spirochrome AG). DNA was labeled as described in [56]. Briefly, we diluted 1 mM stock solution in cell culture medium to 2 μM and vortexed briefly. On the day of the imaging, the culture medium was changed to medium containing SiR-fluorophores and incubated at 37°C for 30-60 min. Before imaging, the medium was changed to L-15 medium (Liebovitz's, Gibco) for live imaging.

Cell starvation and stimulation
For starvation mode, cells were incubated for 24 h at 37°C before imaging with serumfree medium (DMEM, Glutamax containing 50 μg/ml gentamicin, 1 mM sodium pyruvate, and G418 0.5 mg/ml). Just before imaging, cells were mounted in L-15 medium. For stimulation, 10% FBS was added to the L-15 medium for 10-15 min.

Flow cytometry analysis with Hoechst and Pyronin Y staining
To differentiate cells in G0 versus G1, double staining of Hoechst 33324 and Pyronin Y was used, as previously described [26,57]. Briefly, U2OS cells representing each condition (with serum, without serum during 24 h, and without serum during 24 h + 15 min serum) were trypsinized, and cells were collected into phosphate-buffer saline (PBS) at a concentration of 2 × 10 6 cells/ml, then added to a fixative of ice-cold 70% ethanol. Cells were fixed for at least 2 h and washed twice with FACS buffer (1× PBS supplemented with 2% (v/v) heat-inactivated, sterile-filtered fatal bovine serum, 1 mM EDTA).
Cells were then incubated in a water bath pre-adjusted to 37°C for 45 min in the dark with 2 μg/ml of Hoechst 33324 (Invitrogen, H3570) diluted in FACS buffer, then incubated to 37°C for 30 min in the dark with 4 μg/ml of Pyronin Y (abcam, ab146350), diluted in FACS buffer. The samples were kept in the dark at 4°C until analyzed using a LSR II flow cytometer (BD Biosciences). Hoechst 33342 and Pyronin Y staining were measured with UV (350 nm) and yellow green (561 nm) lasers, respectively. DNA content was determined by Hoechst 33342 and RNA content was determined by Pyronin Y. Cells in G0 were identified as the population with 2N DNA content and RNA content lower than the level of cells assigned to the G1 phase. Medians of Pyronin Y staining for each condition were compared to assess the increase in mRNA production after serum addition.

DRB treatment
The U2OS cells for both DNA and RNA Pol II images were treated with 100 μM 5,6 dichlorobenzimidazole 1-β-D-ribofuranoside (DRB; Sigma-Aldrich) for transcription inhibition prior to live-cell image acquisition. DRB was diluted in the L-15 (Leibovitz) imaging medium that was supplemented with 10% FBS, DMEM, Glutamax containing 50 μg/ml gentamicin, 1 mM sodium pyruvate, and G418 0.5 mg/ml. The imaging medium was changed with fresh L-15 medium containing DRB and incubated under the microscope for 15 min before imaging.

Cell fixation
U2OS cells were washed with a pre-warmed (37°C) phosphate buffered saline (PBS) and followed by fixation with 4% (vol/vol) paraformaldehyde in PBS for 10-20 min at room temperature. Images were recorded at room temperature in PBS, after washing the cells with PBS (three times, 5 min each).

DNA live-cell imaging
Cells were placed in a 37°C humid incubator by controlling the temperature and CO 2 flow using H201-couple with temperature and CO 2 units. Live chromatin imaging was performed using a DMI8 inverted automated microscope (Leica Microsystems) featuring a confocal spinning disk unit (CSU-X1-M1N, Yokogawa). An integrated laser engine (ILE 400, Andor) was used for excitation with a selected wavelength of 647 nm and 140 mW as excitation power. A 100× oil immersion objective (Leica HCX-PL-APO) with a 1.4 NA was chosen for a high-resolution imaging. Fluorescence emission of the SiR-Hoechst was filtered by a single-band bandpass filter (FF01-650/13-25, Semrock, Inc.). Image series of 150 frames (5 fps), with exposure time of 150 ms per frame, were acquired using Metamorph software (Molecular Devices) and detected using sCMOS cameras (ORCA-Flash4.0 V2) and 1 × 1 binning, with sample pixel size of 65 nm. All series were recorded at 37°C.

RNA pol II live-cell imaging
Image series of 150 frames were recorded with an exposure time of 200 ms using a Nipkow-disk confocal system (Revolution, Andor) featuring a confocal spinning disk unit (CSU22, Yokogawa). A diode-pumped solid-state laser with a single wavelength of a 488 nm (Coherent) at 5-10% laser power was used for excitation of RPB1 fused to Dendra2. A × 100 oil immersion objective (Plan Apo 1.42, Nikon) was used for imaging. The fluorescent emission signal was filtered through an emission filter (ET525/30-25, Semrock, Inc.) and detected at 512/18 nm on a cooled electron multiplying chargecoupled device camera (iXon Ultra 888), with sample pixel size of 88 nm.

Image processing
Denoising Raw images were denoised using non-iterative bilateral filtering [58]. While Gaussian blurring only accounts for the spatial distance of a pixel and its neighborhood, bilateral filtering additionally takes the difference in intensity values into account and is therefore an edge-preserving method. Abrupt transitions from high-to low-intensity regions (e.g., heterochromatin to euchromatin) are not over-smoothed. Images of varying noise levels were treated with a bilateral filter with half-size of the Gaussian bilateral filter window of 5 pixels, the spatial-domain standard deviation value was set to 5 pixels, and the intensity-domain standard deviation was varied from 0.3 to 0.8 for decreasing levels of the signal-to-noise ratio from 26 to 16 dB (SNR = 10log 10 (I 2 /σ 2 ), with the signal power I 2 and the noise variance σ 2 ).

MSD analysis and model selection by using Bayesian inference
In order to carry out a MSD analysis locally, the spatial dependency of the mean squared displacement (MSD) can be written explicitly: where ξ ! r 0 ðtÞ is the position at time t of a virtual particle with initial position r ! 0 , τ = {Δt, 2Δt, …, (N − 1)Δt} are time lags where Δt is the time difference between subsequent images and the average < · > t is taken over time. The resulting MSD is a function of the initial position r ! 0 and the time lag τ.

MSD models
The MSD can be expressed analytically for anomalous diffusion (DA), confined diffusion (DR), and directed motion (V) in two dimensions as where D α is the diffusion constant in units of μm 2 /s α , α is its anomalous exponent, v [μm/s] is its velocity, and R C [μm] is the radius of a sphere within the particle is confined [59]. The case α = 1 is known as free diffusion, 0 < α < 1 corresponds to anomalous diffusion and 1 < α ≤ 2 corresponds to superdiffusion. Strictly speaking, each generalized diffusion constant D α has different units, corresponding to the specific value of α. However, we refer to it as the diffusion constant D throughout the text for simplicity. Additionally to Eqs. (1)-(3), different types of motion can appear overlaying, resulting in a linear combination of the equations above. For example, anomalous motion can be superimposed on an underlying drift and the resulting MSD reads MSD DAV (τ) = MSD DA (τ) + MSD V (τ). We found that anomalous and confined diffusion appears very similar in experimental data and therefore decided in favor for anomalous diffusion to describe our data (Additional file 1: Note S3). The abbreviations used in this study are summarized in Table 1. As experimental data is usually subject to noise, a constant offset ο is added to every model.

MSD model selection
The MSD is calculated for every pixel independently, resulting in a space-and time lagdependent MSD. It is known that living cells can behave largely heterogeneous [3,60]. Ad hoc, it is not known which diffusion model is appropriate. Fitting an MSD curve with a wrong model might result in poor fits and highly inaccurate determination of the mentioned parameters. For this reason, we use a Bayesian inference approach to test different models for any given MSD curve as proposed by Monnier et al. [21]. Given the data Y = {Y 1 , …, Y n } and K model candidates M = {M 1 , …, M K }, each with its own (multidimensional) parameter set θ = {θ 1 , …, θ K }, we want to find the model M k (Y, θ k ) such that the probability that M k (Y, θ k ) describes the data, given the set of models to test, is maximal. By Bayes' theorem, the probability for each model is given by If there is no reason to prefer one model over the other, the prior probability of each model P(M k ) is equal. The parameter set which is used to describe the data, given a fixed model, strongly influences the probability. Therefore, it is crucial to estimate the optimal parameters for every model in order to calculate the model probabilities. The probability that the data Y is observed, given the model M k described by the model function M k (x; θ k ) and any parameter set θ k , is approximated by a general multivariate Gaussian function [61].
where C is the empirical covariance matrix of the data and the prefactor is a normalizing factor. This equation has an intuitive meaning. Assume we test a model M k parametrized by θ k to find out if it describes the data Y. The exponential function consists of the term [Y − M k (x; θ k )], i.e., the residuals of the data and the describing model. If the residuals are small, i.e., the model describes the data well, the exponent is small and the probability P(Y| θ k , M k ) seeks 1. On the other hand, the worse the fit, the greater Table 1 Overview over possible mean squared displacement models the resulting residuals and the probability seeks asymptotically to 0. The factor C −1 accounts for the covariance in the data. The covariance matrix for a set of MSD curves normally shows large values for large time lags as the uncertainty increases and MSD curves diverge. The covariance matrix implicitly introduces a weight to the data, which is small for large variances and large where the data spreads little. This fact avoids cutting of the MSD curve after a specific number of time lags, but instead includes all available time lags weighted by the covariance matrix. The approach is illustrated in Additional file 1: Fig. S2b with the covariance matrix exemplary shown in the inset. In case of uncorrelated errors, non-diagonal elements are zero, but the approach keeps its validity [62] and follows an ordinary least-squares regression.
Given the best estimate of the parameter set for a model, the model and its corresponding parameters are chosen so that their probability to describe the data is max- PðY jθ k ; M k Þ. It has to be stressed that values of the anomalous exponent scatter around 1, but do not assume the value 1 (e.g., Figure 1c, middle panel). This is due to the model selection procedure, selecting the simplest model which is consistent with the data. In the case that the underlying motion is well described by free diffusion, α is inherently set to 1 and classified as free diffusion rather than anomalous diffusion. The descriptions of free diffusion or anomalous diffusion with α = 1 are equivalent, but the free diffusion model contains one parameter less and is therefore preferred leading to "missing" α values close to 1 in the parameter maps and histograms. To carry out the MSD analysis locally, we choose to take the 3 × 3 neighborhood of a pixel, detect possible outliers therein by the interquartile range criterion [63], and calculate the error covariance matrix of the data within the pixel's neighborhood. The restriction to a single pixel and its neighborhood allows us to carry out the MSD analysis of trajectories locally, in contrast to an ensemble MSD in previous studies [18], revealing only average information over many trajectories. The choice of a 3 × 3 window is reasonable with regard to the equivalently chosen filter size in the optical flow estimation. The flow field in this region is therefore assumed to be sufficiently smooth. All calculations, except for the general mixture model analysis, were carried out using MATLAB (MATLAB Release 2017a, The MathWorks, Inc., Natick, Massachusetts, USA) on a 64-bit Intel Xeon CPUE5-2609 1.90 GHz workstation with 64 GB RAM and running Microsoft Windows 10 Professional.

Deconvolution of subpopulations
Regarding the distribution of diffusion constants, an analytical expression can be found assuming that the diffusion constant was calculated from a freely diffusing particle (α = 1) [64]. However, we find anomalous diffusion to a large extent in our data (e.g., Figure 1c, Fig. 2g, Fig. 3f, Fig. 4e, and Fig. 5g), and to our knowledge, an analytical expression cannot be found for distributions of anomalous exponent, radius of confinement and drift velocity. We therefore deconvolved the parameter sets in a rather general manner, for which we use a general mixture model (GMM), a probabilistic model composed of multiple distributions and corresponding weights. We describe each data point as a portion of a normal or log-normal distribution described by respectively. The logarithmic mean m and standard deviation s are related to the mean and standard deviation of the normal distribution via [65].
We consider up to three subpopulations to be found in our data and model the total density estimate as a superposition of one, two or three subpopulations, i.e., the mixture model reads for both normal and log-normal distributions, where to sum goes to 1, 2, or 3, respectively. The variable w k describes the weights for each population (or component), which satisfy 0 ≤ w k ≤ 1 and sum up to unity. The weights of each component are directly proportional to the area of the histogram covered by this component and therefore its existence in the data set.

General mixture model analysis
Let Y = {Y 1 , …, Y n } denote n data points. For the scope of this description, assume Y to be a one-dimensional variable. Further assume that the data cannot be described by a single distribution, but by a mixture of distributions. A deconvolution of the data into subpopulations faces the following problem: Given a label for each data point, denoting the affiliation to a population, one could group corresponding data points and find the parameters of each population separately using a maximum likelihood estimation or other methods. On the other hand, if we had given the model parameters for each population, labels could in principle be inferred from the likelihood of a data point being described by a population or another. The problem can be formulated by Bayes' rule (M indicates model, D indicates data) Here, P(M| D) is the posterior probability of the model given the data, which is the aim to calculate. We assign a data point to the component, which maximizes P(M| D). The probability to observe the data given a model is described by P(D| M), i.e., the likelihood function. P(M) is the prior for the models to be chosen from. In our case, we have no prior beliefs on the models (all models are equally likely) such that P(M) is uniform. Lastly, the probability P(D) does not depend on the models and can therefore be dropped.
Neither labels, that is P(M| D), nor model parameters and weights are known a priori. The problem can be approached by an expectation-maximization (EM) scheme: Without any prior beliefs about the data distribution, one starts with a simple estimate of model parameters, e.g., a k-means clustering estimate and iterates subsequently between the two following steps until convergence: Expectation step: Calculation of the probability that the component with the current parameter estimate generated the sample, i.e., P(D| M).
Maximization step: Update the current parameter estimate for each component by means of a weighted maximum likelihood estimate, where the weight is the probability that the component generated the sample.
We illustrate the results of the EM algorithm exemplary in Additional file 1: Fig. S4. From the input data (Additional file 1: Fig. S4a), represented as histogram, both the likelihood P(D| M) (Additional file 1: Fig. S4b) and the posterior (Additional file 1: Fig. S4c) are obtained. The sum of subpopulations corresponds to the overall probability distribution (shown in black) with different model parameters and weights found by maximizing the likelihood function. The posterior describes the probability of data points to fall under each population, i.e., ∑ k P(M k | D) = 1. The data points are assigned to those population, for which P(M k | D) is maximum, resulting in labeled data. The labels are subsequently mapped in two dimensions, visualizing spatial correspondence of slow, intermediate and fast subpopulations (Additional file 1: Fig. S4d). The GMM analysis is carried out using the pomegranate machine learning package for probabilistic modeling in Python [66].

Selection of subpopulations by the Bayesian information criterion (BIC)
A priori, it is not unambiguously clear from how many populations the data is sampled and which form the subpopulations take. We therefore assess the suitability of each model by means of the Bayesian Information Criterion (BIC), which is calculated by [67].
whereL is the maximum likelihood of the maximum likelihood estimation (MLE) estimate, p denotes the number of parameters in the model and n is the number of data points used for the fit. Among a family of models, the one with the lowest BIC is considered to describe the data best, taking into account competing complexity of models. A large likelihood of a model favors it to describe the data well, while on the other hand the model is penalized if many parameters are involved in the model by the second term in Eq. (4). Therefore, the BIC prevents overfitting. In order to judge which model is appropriate for our data, we tested all considered models for each histogram and assessed the optimal model by means of the BIC. The fraction of all histograms which described best by one of the six models considered is given in Table 2. Based on the objective judgment of the fit using the BIC, we chose for each parameter the model which best describes the largest fraction of histograms ( Table 2, bold cells).