RNAs competing for microRNAs mutually influence their fluctuations in a highly non-linear microRNA-dependent manner in single cells

Background Distinct RNA species may compete for binding to microRNAs (miRNAs). This competition creates an indirect interaction between miRNA targets, which behave as miRNA sponges and eventually influence each other’s expression levels. Theoretical predictions suggest that not only the mean expression levels of targets but also the fluctuations around the means are coupled through miRNAs. This may result in striking effects on a broad range of cellular processes, such as cell differentiation and proliferation. Although several studies have reported the functional relevance of this mechanism of interaction, detailed experiments are lacking that study this phenomenon in controlled conditions by mimicking a physiological range. Results We used an experimental design based on two bidirectional plasmids and flow cytometry measurements of cotransfected mammalian cells. We validated a stochastic gene interaction model that describes how mRNAs can influence each other’s fluctuations in a miRNA-dependent manner in single cells. We show that miRNA–target correlations eventually lead to either bimodal cell population distributions with high and low target expression states, or correlated fluctuations across targets when the pool of unbound targets and miRNAs are in near-equimolar concentration. We found that there is an optimal range of conditions for the onset of cross-regulation, which is compatible with 10–1000 copies of targets per cell. Conclusions Our results are summarized in a phase diagram for miRNA-mediated cross-regulation that links experimentally measured quantities and effective model parameters. This phase diagram can be applied to in vivo studies of RNAs that are in competition for miRNA binding. Electronic supplementary material The online version of this article (doi:10.1186/s13059-017-1162-x) contains supplementary material, which is available to authorized users.


Absolute quantification of mean number of exogenous molecules per cell
In the following we describe step by step the procedure we followed to obtain the estimate of the mean number of exogenous RNA molecules per cell in the cases with and without pre-miR20a. The entire procedure was repeated three times, for three different biological replicates (that is, three different cotransfection experiments with and without pre-miR20a). In Table 1 in the main text we reported the means and errors over these biological replicates.

Cell sorting
We sorted the cotransfected cells into three groups depending on the level of eYFP fluorescence. To define the three intervals of fluorescence we identified the threshold at the cell sorter plotting mCherry vs eYFP for the cells cotransfected with the pre-miR20a. We then sorted the cells for eYFP below threshold (i.e. eYFP-low sample), around threshold (i.e. eYFP-medium sample) and above threshold (eYFP-high sample), see Figure S5 for an example. For the same eYFP intervals, we sorted as well the cells without pre-miR20a.

RNA extraction and qRT-PCR
Total RNA was extracted from each sorted cell subpopulation using Trizol reagent (Ambion Life Technologies, USA) in combination with Pure Link RNA Mini Kit (Ambion). 1µg of total RNA was treated with RQ1 DNAse (Promega) and reverse transcribed, after heat-inactivation of the enzyme, using M-MLV reverse transcriptase and random primers (Life Technologies). Quantitative RT-PCR was performed on a 7300 Real Time PCR System (Applied Biosystems) using specific primers for eYFP, mCherry, mCerulean and mKOrange. 18S probe (Life Technologies) was used as internal loading control. In particular, qRT-PCR was performed (i) on cDNA of the sorted cotransfected cells with primers for the exogenous targets and for the endogenous ribosomal RNA 18S and (ii) on DNA standards spiked into untransfected cell cDNAs (same protocol used in [1]). The DNA standards were different dilutions of amplicons for the exogenous targets (eYFP, mCherry, etc...), each with its corresponding pair of primers. The use of these standards allowed the definition of a calibration curve for each fluorophore, directly linking threshold cycles and mean number of molecules per cell [2].

Standard curves
DNA standards were used at amounts ranging from 0.5 pg to 0.005 f g per reaction.
For each pair of primers (and cDNA sample) we had three technical replicates (so, three technical replicates for the eYFP-low cDNA sample with mCherry primers, three technical replicates for the eYFP-low cDNA sample with eYFP primers, etc...). The different dilutions of amplicons allowed the definition of a standard curve for each amplicon that directly links the mass pre-PCR of the amplicon and the threshold cycle (C T ) of the qRT-PCR reactions [2]. The C T were in the exponential phase of PCR amplification (see [2,3]). In Figure S6 we show an example of standard curve for the mCherry amplicon.

Internal control and C T correction
We used the 18S rRNA amplification as an internal loading control, to correct for the possible differences in total RNA input and retro-transcription efficiency from one cDNA sample to the other. As previously described in [3], we first checked the stability of this control in our setting by comparing the mean 18S C T (as 2 −C T ) between the experiment without pre-miR20a (1.8719 × 10 −5 ) and the experiment with pre-miR20a (1.7374 × 10 −5 ). The fold-change between the internal controls is 0.93 ∼ 1, thus confirming that the transfection of pre-miR20a does not change the expression of the 18S. For each experiment we chose the lowest 18S C T as a reference. We then evaluated the difference between the lowest 18S C T and the other 18S C T s. Such differences are the correction terms we add to the C T s ("corrected C T ") of the samples we want to quantify (with primers for mCherry, eYFP, etc...). The choice of the lowest 18S C T is to avoid underestimating the average number of molecules/cell. However, using the highest or the mean among all the 18S C T s did not change significantly the order of magnitude of the results, indicating homogeneous loading and retro-transcription efficiency across our samples.

Final quantification
After obtaining corrected C T s for all the samples, we use the calibration curves to know the corresponding pre-PCR mass for the amplicons [2,1]. This is the mass corresponding to 1µg of RNA. Since we know the total mass of RNA extracted, the molecular weight of the amplicons and the number of cells per sample, it is then straightforward to evaluate the mean amount of exogenous RNA molecules per cell: where m is the RNA pre-PCR mass in 1µg, d is the dilution factor of the cDNA sample, µ are the corrected total µg of extracted RNA, M W is the molecular weight of the amplicon (the product of all these quantities is the number of moles of that particular amplicon in the sample), N A is the Avogadro number and N is the number of cells in the original sorted subpopulation of cells. µ is defined as µ = C × total RNA µg, where C is the ratio (total RNA µg/N ) max total RNA µg/N .
C is thus the ratio between the mean amount of total RNA per cell in the sample where this quantity is the highest and the other samples. This correction takes into account possible technical pipetting errors (anyhow small, since C ∼ 1 for all the samples). Here below in Table 1 we show as example of the estimate of the mCherry mean amount of molecules per cell in the "eYFP medium" sample for one biological replicate (see also Figure S6). Consider however, that mean values and standard deviations presented in Table 1 in the main text are computed over the different biological replicates (that is, over different co-transfection experiments of the same constructs). Indeed for each experiment, in which the cells have been sorted in the intervals "eYFP low", "eYFP medium" and "eYFP high", we have three technical replicates per sample. The technical errors, as can be noticed from the small dispersion of the three estimated number of molecules per cell in Table 1 in SI (here above), is much smaller than the biological one (that is, the error we reported in Table 1 of the main text). In the following we report a summary table of the exogenous transcripts quantification for the biological replicates (mediated over the technical ones). 2 Absolute quantification of mean number of miR-20a molecules per cell To quantify the mean number of mature endogenous miR-20a molecules per cell we adopted a spike-in procedure following the guidelines described in [4]. In particular, we lysed independent samples containing a fixed amount of cells (∼ 2 × 10 6 ) with QIAzol lysis reagent (QIAGEN) and added to lysates increasing amounts of synthetic mature miR-20a (UAAAGUGCUUAUAGUGCAGGUAG). Total RNA was then extracted from each sample using miRNeasy mini kit (QIAGEN). Using both TaqMan MicroRNA Assay for hsa-miR-20a (Thermofisher) and RNU44 TaqMan MicroRNA control Assay (Thermofisher), we reverse transcribed 10ng of total RNA per sample and performed quantitative RT-PCR on a 7300 Real Time PCR System (Applied Biosystems). The RNU44 control was used as internal loading control to correct for the possible differences in total RNA input and retro-transcription efficiency from one cDNA sample to the other. This control was used exactly as we did with S18 for the quantification of exogenous target molecules. The corrected C T s for all the 10 samples were then directly linked to the mean amount of miR-20a molecules per cell (see Figure S7). The changing in slope of the resulting curve corresponds to the amount of endogenous miR-20a molecules per cell (1250 molecules per cell in our case). We performed as well qRT-PCR for two samples of cell, one transfected with 100 nM of pre-miR20a 48 hours before the RNA extraction and the other untransfected. We then used the untransfected sample as reference to measure the δδC T between its C T and the C T of the pre-miR transfected sample. The pre-miR transfected sample shows a 163 fold increase in the mature miR-20a with respect to the untransfected one.

Cotransfection experiments in HeLa TeT-ON cell line
To show that our results are unbiased with respect to the cell line we repeated part of the experiments in HeLa cells. We performed cotransfection experiments with and without 100 nM of pre-miR20a for the unregulated case (0 MRE on both mCherry and mCerulean) and for reporter with 4 MRE on mCherry combined with mCerulean with 0,1,4,7 MRE. Cells were plated in 6 well multiwells and induced with doxicycline 24 hours before transfection. We used Effectene (QIAGEN) as transfection reagent and measured the florescence on a CyanADP (Beckman Coulter) flow cytometer.
Results, shown in Figure S8 and S9, summarize what we found in HEK 293 Tet-OFF cell line. That is, microRNA-mediated crossregulation is context dependent but there is an optimal range of parameters for which crossregulation is maximal among targets. In particular, it arises (i) at the level of mean protein concentrations (see the threshold effect in Figure S8a, the modulation of the threshold when increasing the miR20a pool via premiR transfection in Figure S8b and the fold repression in both cases in Figure  S8c); (ii) at the level of fluctuations (see Figure S9ab, where the mCherry total noise is modulated via the expression of mCerulean with an increasing number of MRE) and correlations (the ratio between the Pearson correlation coefficient in presence of miRNA mediated regulation and the unregulated case is maximal for eYFP values close to the threshold, see Figure S9c).

Model definition
We describe with a stochastic model the miRNA-target interactions. The system can be described by 5 interacting variables (1 miRNA, 2 mRNAs, 2 proteins) indicated respectively as s, r 1 , r 2 , p 1 , p 2 , which represent the (integer) copy number of molecules present in the cell at any given time t. Using this notation, the probability P of finding in a cell exactly s, r 1 , r 2 , p 1 , p 2 molecules at any time t is governed by the following master equation: where P := P r1,r2,p1,p2,s and, for example P p2+1 is a short hand notation for P r1,r2,p1,p2+1,s . In Eq. (2) k ri , k s and k pi (with i = 1, 2) are the transcription rates of mRNAs r i and miRNA s and the translation rates for proteins p i respectively. g ri , g s and g pi (with i = 1, 2) are their degradation rates. g i (with i = 1, 2) are the effective association rates for the miRNA s and the mRNA r i . Finally the parameter α measures the catalyticity of the interaction, i.e. the fraction of miRNA molecules that are recycled after the interaction with their targets. This master equation is not amenable for analytic solutions and approximate methods have been proposed [5,6] to obtain accurate quantitative predictions. Following previous work [6] we obtained the approximated expression for mean values, standard deviations and Pearson correlation coefficients (sketched in Figure 1b-d in the main text), which we describe in the next paragraphs. The model can be easily generalized for a system of M miRNAs s 1 , . . . , s M , N mRNAs r 1 , . . . , r N , and N proteins p 1 , . . . , p N . In the following, we will assume that each mRNA is coding for a distinct protein. However, within this framework, we could easily describe, for instance, non-coding RNAs. In other terms we are not bounded to have the number of proteins N equal to that of the different mRNA species. This generalized model reads: Provided that the values of all parameters of the model (e.g. rates of transcription, degradation, etc.) are known, it is in principle possible to deal with a large number of chemical species, both in terms of stochastic simulation a la Gillespie, or in term of controlled mathematical approximations, that we are going to outline in the following. A proof of the principle of how the model behaves in the more realistic case of 10 mRNA targets regulated by 10 microRNa in blocks overlapping blocks, has been already extensively discussed (see in particular Figure 4 in [6] and the related discussion in the main text). In the following we will also give a network about the competitive titration mechanism at large scale.
In the following analysis, unless otherwise stated, we will just consider the specific case N = 2, M = 1 which provides a reasonable model for experimental setup, but, again, more complicated mRNA-microRNA interaction architectures could be analyzed using the same techniques.

Independent molecular-species approximation
As long as one is interested in mean values of the observables at steady state, a good approximation is the so-called independent molecular species approximation, also known as mean-field approximation which amounts to assume that the multivariate probability distribution P is factorized among the different chemical species: P ind (r 1 , r 2 , p 1 , p 2 , s) := P r1 (r 1 )P r2 (r 2 )P p1 (p 1 )P p2 (p 1 )P s (s) .
Plugging this factorized functional form into Eq. (2), and computing the first moments at steady state (i.e. in the limit t → ∞), one obtains a system of second order equations in the five variables which is easily solved numerically for any value of the model parameters. The main limitation of the factorized ansatz in Eq. (4) is that, although giving a fairly accurate prediction of the mean values of the different chemical species across a wide range of parameters, the very simple structural form of Eq. (4) cannot predict their statistical correlations. For example the Pearson correlation coefficients under the independent chemical species approximation are always zero by definition. We used the independent chemical species approximation to fit the model parameters through the following functional law: where x is the transcriptional activity (in terms of eYFP fluorescent value), θ = k S /α and λ = g R g S /(αg), as it can be deduced by the detailed derivation in [6]. The further parameter m is introduced here to take into account the fact that the intrinsic brightness depends on the fluorescent protein type in such a way that, for instance, even for the construct with no miRNA binding sites the straight line (c.f. Figure  2 in the main text) has a slope different than one. It can be shown perturbatively that in the limit of interaction strengths of the same order of magnitude, the same functional form displayed in equation (5) holds also in the multiple target case. From the functional form of Eq. (5), it is also clear that f (0|λ, θ, m) = 0 for all value of the parameters. We thus manually rescaled the fluorescent values so that each curve passes through the origin. Interestingly the same constant values for the rescaling hold for any combination of the constructs. Of the three fit parameters defined in Eq. (5), the last parameter m has no biological interpretation, and accounts only for the different intrinsic brightness of the different fluorescent proteins. The parameter θ = k S /α is the ratio between miR20a transcription rate and the turnover α. Unfortunately for both parameters very little is known in the context of our experiment (293-HEK cell line) and the result of our fit is given in terms of rescaled fluorescent units, rather than in terms of rate of transcription activity. The second parameter λ = g R g S /(αg), with g = k + γ/(k − + γ), involves a combination of biochemical parameters for miRNA/mRNA complex formation (association and dissociation rates k + , k − and degradation rate γ) which are known only for some miRNAs in-vitro. Moreover, the validity of these estimates in-vivo is often debated.

Gaussian Approximation
To overcome the above mentioned limitations and to take under control correlations across chemical species, a very simple yet accurate approximation scheme is the socalled Gaussian one [6]. Note that, following this approximation scheme, copy numbers will not be bound to be integer numbers as it was for the master equation defined in Eq. (2). As we will see in the following, and has already been extensively discussed in [6], this does not affect the good quality of the approximation. Let us denote with X the five-dimensional vector of components r 1 , r 2 , p 1 , p 2 , s respectively. We can thus make the following multivariate Gaussian ansatz for the probability distribution function of X: which in our 5−dimensional case depends on 20 parameters: 5 numbers specify the mean µ and 15 the covariance matrix C (which is symmetric). The key property that makes Eq.
(2) very difficult to solve analytically is that, as shown in detail in [6], it generates a whole hierarchy of moments such that the lower moments are expressed in terms of higher order moments and no moment-closure scheme can be utilized. Multivariate Gauss distributions, on the other hand, have the useful property that all moments can be expressed as a linear combination of just the first and the second moments. As an illustrative example, defining µ i = E(X i ) and E(X i X j )−E(X i )E(X j ) for i = j, we could consider the generic third moment of the distribution defined in Eq.
where C xy is the x-row y-column element of the covariance matrix C.
A systematic procedure to compute µ and C requires to define the time dependent moment generating function: Plugging the above equation in the master equation we get the following second order partial differential equation: The moment generating function has the following properties: By inserting the previous definitions and imposing the Gaussian marginalization conditions mentioned above, we obtain at the steady state a system of 20 equations in 20 unknown that we can numerically solve to get the values for µ and C. As already shown in [6], this approximation turns out to reproduce fairly accurately both noise (in terms of coefficient of variation CV) of single targets and Pearson correlation coefficient between targets, when compared with the numerical values obtained through Gillespie algorithm.

A network perspective
Considering that typically a given miRNA might have hundreds endogenous targets, a puzzling issue of miRNA mediated crosstalk is to what extent a reasonable fold change (say a two to seven fold increase) of one of the targets could produce an observable effect on any other target. Such a situation is observed in our experiment where the transcripts coded by our exogenous constructs are actually competing with the rest of the endogenous targets. This issue was already quantitatively addressed in [6], and here we just report the main conclusions. Let us consider a system where one miRNA s is interacting with 100 targets t 1 , . . . , t 100 . We are interested in the effect on the system at steady-state of a k-fold increase of transcription rate of target t 1 on a randomly chosen target, say t 2 . The overall scenario is dictated by the relative stoichiometry of the system, and in particular on the share of free miRNA s (i.e. number of miRNA molecules which are not bound to their targets). Three scenarios are in principle possible: (i) The number of free targets is larger then the total number of miRNA (s = 0, all miRNAs are bound to a target and there is a part of the targets which is free).
(ii) All targets t i are bound to a miRNA but there is still a share of free miRNA (s 0).
(iii) A marginal quasi-equimolarity condition where the number of targets equals on average the total number of miRNAs (s ∼ 0).
Without resorting to mathematical modeling, one can understand that, for different reasons, under scenarios (i) and (ii) very little or null crosstalk between targets is to be expected. In the first case (excess of targets) an increase in concentration of the first target will indeed be neutral to the second as, being all miRNAs entirely sequestered, the increase will not able to release further miRNAs from the second one. This simple intuition can be made more quantitative by observing Figure S10, where we display the mean number of molecules for t 1 , t 2 and s as a function of k r1 (i.e. the transcription rate of target 1) obtained from direct integration of Eq.(2). One can see how the value of t 2 is saturating to its constitutive expression value (i.e. the value of the transcript in the absence of miRNAs). Given the convexity of the two curves, it is also clear that, well above threshold, an increase δ 1 on t 1 will induce a much smaller δ 2 on t 2 .
In the second case (excess of miRNAs), the system is so repressed that, well below threshold, any fluctuations of any of the targets would be immediately down-regulated by the excess amount of miRNAs. Note also that, in the case of our experiment, the (unbounded) share of target t 1 and t 2 can be considered as a proxy for the quantity of the fluorescent protein they code for. Under condition (ii), as shown in the leftmost part of Figure S10, virtually no protein can be translated, and therefore no crosstalk effect could be observed.
Things become more interesting in the quasi-equimolarity condition (iii), where due to the strong non-linear behavior of our system at threshold, crosstalk becomes indeed possible. As shown in [6] it turns out that a 2-fold increase in the transcription rate of t 1 causes a 1.3-fold increase on t 2 , whereas a 7-fold increase on t 1 would cause a 4.5-fold increase on t 2 (but of course the same would be true for any other target).

Distribution of targets with multiple MREs
In our two reporters system, we have considered, in analogy with [1], different numbers of MREs (0,1,4,7) for the same miRNA family as a proxy of the miRNA-target interaction strength (encoded in the rates g 1 , g 2 in our model). From now on we define as MRE multiplicity, the number of times an MRE of the same miRNA family appears on the same transcript. A natural question is how typical is for endogenous target molecules to host MREs with high multiplicity (e.g. with multiplicity between 4 and 7 as in our experimental setup).
To answer to this question we analyzed the database Target Scan Human (Predicted Conserved Targets) [7] and we simply counted on a per-target base the multiplicity of MREs. A summary of the results is displayed in Figure S11 where we show that although a large majority of targets have multiplicity 1 (295.675 MREs out of 369.988), MREs with with larger multiplicity are not so rare (for instance 5500 targets turn out to show a multiplicity between 4 and 7). Note also that typically targets have MREs with different multiplicities for different miRNA families. For instance the MRE with the larger multiplicity in the database is hosted by transcript ONECUT2 and has 20 distinct binding sites for miR-8 and 10 distinct binding sites for miR-320/320abcd.
These results should be taken, as it is often the case with bioinformatic predictions, cum grano salis as not all MREs might be simultaneously accessible depending on the biological context (e.g. cell type in which the transcript is expressed) or the state of occupancy of neighboring MREs.     Figure S5: Example of cell sorting for cells co-transfected with pre-miR20a. The scatter plot (eYFP on the x-axis and mCherry on the y-axis) shows how the cells were sorted chosing three intervals based on the eYFP fluorescence level. Each dot is a single cell. The three eYFP intervals (low, medium and high) have been chosen on the basis of the threshold location (below threshold, around threshold and above threshold respectively).

CT1
CT2 CT3 m1 m2 m3 Figure S6: mCherry standard curve obtained from different dilution of the mCherry amplicon. DNA standards were used at amounts ranging from 0.5 pg to 0.005 f g per reaction. The different dilutions of amplicons allowed the definition of a standard curve that directly links the mCherry mass pre-PCR and the threshold cycle (CT ) of the qRT-PCR reactions. CT 1, CT 2 and CT 3 are the corrected CT s for the three technical replicates reported in Table 1 in SI together with their corresponding pre-PCR mass m1, m2 and m3.   Purple and cyan circles in legends represent the plasmids coding for mCherry and mCerulean fluorophores, respectively. In (b) are shown experiments performed with 100 nM of pre-miR-20a transfected together with the exogenous targets. (c) The Pearson ratio is measured for three different values of eYFP basal expression (below, around and above threshold) both in absence and presence of pre-miR. As predicted by the model, the correlation is maximal around the threshold and in this case could be even 6 fold higher than in the unregulated case. Blue-delimited areas are regions whose Pearson ratio (i.e. ratio of Pearson coefficient between mCherry and mCerulean possessing different MRE to the same measure in the absence of MRE) is statistically relevant (p-values < 0.01) with respect to the corresponding unregulated case. the mean number of molecules of targets t1, t2 and miRNA s as a function of the target t1 transcription rate. We display the trend of the system upon varying the transcription rate of t1 keeping fixed all remaining parameters of the system. In right tail of t1 and t2 curves, we show that an increase δ1 on t1 will induce a much smaller increase δ2 on t2. Curves are obtained by direct integration of Eq.(2).