SCALE: modeling allele-specific gene expression by single-cell RNA sequencing

Allele-specific expression is traditionally studied by bulk RNA sequencing, which measures average expression across cells. Single-cell RNA sequencing allows the comparison of expression distribution between the two alleles of a diploid organism and the characterization of allele-specific bursting. Here, we propose SCALE to analyze genome-wide allele-specific bursting, with adjustment of technical variability. SCALE detects genes exhibiting allelic differences in bursting parameters and genes whose alleles burst non-independently. We apply SCALE to mouse blastocyst and human fibroblast cells and find that cis control in gene expression overwhelmingly manifests as differences in burst frequency. Electronic supplementary material The online version of this article (doi:10.1186/s13059-017-1200-8) contains supplementary material, which is available to authorized users.


Background
In diploid organisms, two copies of each autosomal gene are available for transcription, and differences in gene expression level between the two alleles are widespread in tissues [1][2][3][4][5][6][7]. Allele-specific expression (ASE), in its extreme, is found in genomic imprinting, where the allele from one parent is uniformly silenced across cells, and in random X-chromosome inactivation, where one of the two X-chromosomes in females is randomly silenced. During the past decade, using single-nucleotide polymorphism (SNP)-sensitive microarrays and bulk RNA sequencing (RNA-seq), more subtle expression differences between the two alleles were found, mostly in the form of allelic imbalance of varying magnitudes in mean expression across cells [8][9][10][11]. In some cases such expression differences between alleles can lead to phenotypic consequences and result in disease [3,[12][13][14]. These studies, though revelatory, were at the bulk tissue level, where one could only observe average expression across a possibly heterogeneous mixture of cells.
Recent developments in single-cell RNA sequencing (scRNA-seq) have made possible the better characterization of the nature of allelic differences in gene expression across individual cells [6,15,16]. For example, recent scRNA-seq studies estimated that 12-24% of the expressed genes are monoallelically expressed during mouse preimplantation development [2] and that 76.4% of the heterozygous loci across all cells express only one allele [17]. These ongoing efforts have improved our understanding of gene regulation and enriched our vocabulary in describing gene expression at the allelic level with single-cell resolution.
Despite this rapid progress, much of the potential offered by scRNA-seq data remains untapped. ASE, in the setting of bulk RNA-seq data, is usually quantified by comparing the mean expression level of the two alleles. However, due to the inherent stochasticity of gene expression across cells, the characterization of ASE using scRNA-seq data should look beyond mean expression. A fundamental property of gene expression is transcriptional bursting, in which transcription from DNA to RNA occurs in bursts, depending on whether the gene's promoter is activated (Fig. 1a) [18,19]. Transcriptional bursting is a widespread phenomenon that has been observed across many species, including bacteria [20], yeast [21], Drosophila embryos [22], and mammalian cells [23,24], and is one of the primary sources of expression variability in single cells. Figure 1b illustrates the expression across time of the two alleles of a gene. Under the assumption of ergodicity, each cell in a scRNA-seq sample pool is at a different time in this process, implying that, for each allele, some cells might be in the transcriptional "ON" state, whereas other cells are in the "OFF" state. While in the ON state, the magnitude and length of the burst can also vary across cells, further complicating analysis. For each expressed heterozygous site, a scRNA-seq experiment gives us the bivariate distribution of the expression of its two alleles across cells, allowing us to compare the alleles not only in their mean, but also in their distribution. In this study, we use scRNA-seq data to characterize transcriptional bursting in an allele-specific manner and detect genes with allelic differences in the parameters of this process. Kim and Marioni [25] first studied bursting kinetics of stochastic gene expression from scRNA-seq data, using a Beta-Poisson model, and estimated the kinetic parameters via a Gibbs sampler. In this early attempt, they assumed shared bursting kinetics between the two alleles and modeled total expression of a gene instead of ASE. Current scRNA-seq protocols often introduce substantial technical noise (e.g., gene dropouts, amplification and sequencing bias; Additional file 1: Figure S1) [26][27][28][29][30] and this is largely ignored in Kim and Marioni [25] and another recent scRNAseq study by Borel et al. [17], where, in particular, gene dropout may have led to overestimation of the pervasiveness of monoallelic expression (ME). Realizing this, Kim et al. [31] incorporated measurements of technical noise from external spike-in molecules in the identification of stochastic ASE (defined as excessive variability in allelic ratios among cells) and concluded that more than 80% of stochastic ASE in mouse embryonic stem cells is due to scRNA-seq technical noise. Kim et al.'s analysis was restricted to the identification of random ME and did not consider more general patterns of ASE, such as allele-specific transcriptional bursting.
scRNA-seq also enables us to quantify the degree of dependence between the expression of the two alleles. A previous RNA fluorescence in situ hybridization (FISH) experiment fluorescently labeled 20 genes in an allelespecific manner and showed that there was no significant deviation from independent bursting between the two alleles [32]. A recent scRNA-seq study of mouse cells through embryonic development [2] produced similar conclusions on the genome-wide level: they modeled transcript loss by splitting each cell's lysate into two fractions of equal volume and controlling for false discoveries by diluting bulk RNA down to the single-cell level. Their results suggest that, on the genome-wide scale, assuming both alleles share the same bursting kinetics, the two alleles of most genes burst independently. Deviation from the theoretical curve in Deng et al. [2] for independent bursting with shared allele-specific kinetics, however, can be due to not only dependent bursting, but also different bursting kinetics.
In this study, we develop SCALE (Single-Cell ALlelic Expression), a systematic statistical framework to study ASE in single cells by examining allele-specific transcriptional bursting kinetics. Our main goal is to detect and characterize differences between the two alleles in their expression distribution across cells. As a by-product, we also quantify the degree of dependence between the expression of the two alleles. SCALE is comprised of three steps. First, an empirical Bayes method determines, for each gene, whether it is silent, monoallelically expressed, or biallelically expressed based on its allele-specific counts across cells (Fig. 1c). Next, for genes determined Fig. 1 Allele-specific transcriptional bursting and gene categorization by single-cell ASE. a Transcription from DNA to RNA occurs in bursts, where genes switch between the "ON" and the "OFF" states. k on , k off , s, and d are activation, deactivation, transcription, and mRNA decay rate in the kinetic model, respectively. b Transcriptional bursting of the two alleles of a gene give rise to cells expressing neither, one, or both alleles of a gene, sampled as vertical snapshots along the time axis. Partially adapted from Reinius and Sandberg [6]. c Empirical Bayes framework that categorizes each gene as silent, monoallelic and biallelic (biallelic bursty, one-allele constitutive, and both-alleles constitutive) based on ASE data with single-cell resolution to be biallelic bursty (i.e., both alleles have zero expression level in some but not all cells), a Poisson-Beta hierarchical model is used to estimate allele-specific transcriptional kinetics while accounting for technical noise and cell size differences. Finally, resampling-based testing procedures are developed to detect allelic differences in transcriptional burst size or burst frequency and identify genes whose alleles exhibit non-independent transcription.
In silico simulations are conducted to investigate estimation accuracy and testing power. The stringency of model assumptions, and the robustness of the proposed procedures to the violation of these assumptions, will be discussed as they are introduced. Using SCALE, we reanalyze the scRNA-seq data for 122 mouse blastocyst cells [2] and 104 human fibroblast cells [17]. The mouse blastocyst study initially found abundant random ME generated by independent and stochastic allelic transcription [2]; the human fibroblast study reported that 76.4% of the heterozygous loci displayed patterns of ME [17]. Through proper modeling of technical noise, our re-analysis of these two datasets brings forth new insights: While for 90% of the bursty genes there are no significant deviations from the assumption of independent allelic bursting and shared bursting kinetics, the remaining bursty genes show different burst frequencies by a cis-effect and/or non-independent bursting with an enrichment in coordinated bursting. Collectively, we present a genome-wide approach to systematically analyze expression variation in an allele-specific manner with single-cell resolution.
SCALE is an open-source R package available at https://github.com/yuchaojiang/SCALE. Figure 2 shows an overview of the analysis pipeline of SCALE. We start with allele-specific read counts of endogenous RNAs across all profiled single cells. An empirical Bayes method is adopted to classify expression of genes into monoallelic, biallelic, and silent states based on ASE data across cells. SCALE then estimates allele-specific transcriptional bursting parameters via a hierarchical Poisson-Beta model, while adjusting for technical variabilities and cell size differences. Statistical testing procedures are then performed to identify genes whose two alleles have different bursting parameters or burst non-independently. We describe each of these steps in turn.

Methods overview
Gene classification by ASE data across cells SCALE first determines for each gene whether its expression is silent, paternal/maternal monoallelic, or biallelic. Figure 1c outlines this categorization scheme. Briefly, for each gene, each cell is assigned to one of four categories corresponding to scenarios where both alleles are off (∅), only the A allele is expressed ( A ), only the B allele is expressed ( B ), and both alleles are expressed ( AB ). An expectation-maximization (EM) algorithm is implemented for parameter estimation. This classification accounts for both sequencing depth variation and sequencing errors. The assignment of the gene is then determined based on the posterior assignments of all cells. For example, if all cells are assigned to ∅ f g, the gene is silent; if all cells are assigned to either ∅ f g or A f g, the gene has ME of the A allele; if all cells are assigned to either ∅ f g or B f g, the gene has ME of the B allele; if both the A and B allele are expressed in the cell pool, then the gene is biallelically expressed. Refer to "Methods" for detailed statistical methods and the EM algorithm.
Through simulation studies (see the "Assessment of estimation accuracy and testing power" section), we show that bursting parameters can only be stably estimated for bursty genes, that is, genes that are silent in a non-zero proportion of cells. Therefore, for biallelic bursty genes, allele-specific transcriptional kinetics are modeled through a Poisson-Beta distribution with adjustment for technical noise, see next section. For silent, monoallelically expressed, or constitutively expressed genes, there is no way nor need to estimate bursting kinetics for both alleles. Overview of analysis pipeline of SCALE. SCALE takes as input allele-specific read counts at heterozygous loci and carries out three major steps: (i) gene classification using an empirical Bayes method, (ii) estimation of allele-specific transcriptional kinetics using a Poisson-Beta hierarchical model with adjustment of technical variability and cell size, (iii) testing of the two alleles of a gene to determine if they have different bursting kinetics and/or nonindependent firing using a hypothesis testing framework

Allele-specific transcriptional bursting
When studying ASE in single cells, it is critical to consider transcriptional bursting due to its pervasiveness in various organisms [20][21][22][23][24]. We adopt a Poisson-Beta hierarchical model to quantify allele-specific transcriptional kinetics while accounting for dropout events and amplification and sequencing bias. Here, we start by reviewing the relevant literature with regard to transcriptional bursting at the single-cell level.
A two-state model for gene transcription is shown in Fig. 1a, where genes switch between the ON and OFF states with activation and deactivation rates k on and k off . When the gene is in the ON state, DNA is transcribed into RNA at rate s while RNA decays at rate d . A Poisson-Beta stochastic model was first proposed by Kepler and Elston [33]: where Y is the number of mRNA molecules and p is the fraction of time that the gene spends in the active state, the latter having mean k on = k on þ k off À Á . Under this model, 1=k on and 1=k off are the average waiting times in the inactive and active states, respectively. Burst size, defined as the average number of synthesized mRNA per burst episode, is given by s=k off , and burst frequency is given by k on . Kepler and Elston [33] gave detailed analytical solutions via differential equations. Raj et al. [23] offered empirical support for this model via singlemolecule FISH on reporter genes. Since the kinetic parameters are measured in units of time and only the stationary distribution is assumed to be observed (e.g., when cells are killed for sequencing and fixed for FISH), the rate of decay d is set to one [15]. This is equivalent to having three kinetic parameters fs; k on ; k of f g, each normalized by the decay rate d. Kim and Marioni [25] applied this Poisson-Beta model to total gene-level transcript counts from scRNA-seq data of mouse embryonic stem cells. While they found that the inferred kinetic parameters are correlated with RNA polymerase II occupancy and histone modification [25], they didn't address the issue of technical noise, especially the dropout events, introduced by scRNAseq. Failure to account for gene dropouts may lead to biased estimation of bursting kinetics.
Furthermore, since the transitions between active and inactive states occur separately for the two alleles, when ASE data are available, it seems more appropriate to model transcriptional bursting in an allele-specific manner. The fact that transcriptional bursting occurs independently for the two alleles has been supported by empirical evidence: case studies based on imaging methods have suggested that the two alleles of genes are transcribed in an independent fashion [34,35]; using scRNA-seq data, Deng et al. [2] showed that the two alleles of most genes tend to fire independently with the assumption that both alleles share the same set of kinetic parameters. These findings, although limited in scale or relying on strong assumptions, emphasize the need to study transcriptional bursting in an allele-specific manner.
Technical noise in scRNA-seq and other complicating factors Additional file 1: Figure S1 outlines the major steps of the scRNA-seq protocols and the sources of bias that are introduced during library preparation and sequencing. After the cells are captured and lysed, exogenous spike-ins are added as internal controls, which have fixed and known concentrations and can thus be used to convert the number of sequenced transcripts into actual abundances. During the reverse transcription, pre-amplification, and library preparation steps, lowly expressed transcripts might be lost, in which case they will not be detected during sequencing. This leads to so-called "dropout" events. Since spike-ins undergo the same experimental procedure as endogenous RNAs in a cell, amplification and sequencing bias can be captured and estimated through the spike-in molecules.
Here we adopt the statistical model in TASC (Toolkit for Analysis of Single Cell data, unpublished), which explicitly models the technical noise through spike-ins. TASC's model is based on the key observation that the probability of a gene being a dropout depends on its true expression in the cell, with lowly expressed genes more likely to drop out. Specifically, let Q cg and Y cg be, respectively, the observed and true expression levels of gene g in cell c. The hierarchical mixture model used to model dropout, amplification, and sequencing bias is: where Z cg is a Bernoulli random variable indicating that gene g is detected in cell c, that is, a dropout event has not occurred. The success probability π cg ¼ P Z cg ¼ 1 À Á depends on logðY cg Þ , the logarithm of the true underlying expression. Cell-specific parameter α c models the capture and sequencing efficiency; β c models the amplification bias; κ c and τ c characterize whether a transcript is successfully captured in the library. This model will later be used to adjust for technical noise in ASE.
As input to SCALE, we recommend scRNA-seq data from cells of the same type. Unwanted heterogeneity, however, still persists as the cells may differ in size or may be in different phases of the cell cycle. Through a series of single-cell FISH experiments, Padovan-Merhar et al. [36] showed how gene transcription depends on these exogenous factors: burst size is independent of cell cycle but is kept proportional to cell size by a trans mechanism; burst frequency is independent of cell size but is reduced approximately by half, through a cis mechanism, between G1 and G2 phases to compensate for the doubling of DNA content. Additional file 1: Figure S2 illustrates how burst size and burst frequency change with cell size and cell cycle phase. Note that while the burst frequency from each DNA copy is halved when the amount of DNA is doubled, the total burst frequency remains roughly constant through the cell cycle. Thus, SCALE adjusts for variation in cell size through modulation of burst size and does not adjust for variation in cell cycle phase. Details will be given below.
Cell size can be measured in multiple ways. Padovan-Merhar et al. [36] proposed using the expression level of GAPDH as a cell size marker. When spike-ins are available, we use the ratio of the total number of endogenous RNA reads over the total number of spike-in reads as a measure (Additional file 1: Figure S2) of the total RNA volume, which was shown to be a good proxy for cell size [28]. SCALE allows the user to input the cell sizes ϕ c if these are available through other means.

Modeling transcriptional bursting with adjustment for technical and cell-size variation
We are now ready to formulate the allele-specific bursting model for scRNA-seq data. For genes that are categorized as biallelic bursty (with the proportion of cells expressing each allele between 5 and 95% from the Bayes framework), SCALE proceeds to estimate the allelespecific bursting parameters using a hierarchical model: where Y A cg and Y B cg are the true ASE for gene g in cell c. The two alleles of each gene are modeled by separate Poisson-Beta distributions with kinetic parameters that are gene-and allele-specific. These two Poisson-Beta distributions share the same cell size factor ϕ c , which affects burst size. The true ASE Y A cg and Y B cg are not directly observable. The observed allele-specific read counts Q A cg and Q B cg are confounded by technical noise and follow the Poisson mixture model outlined in the previous section: How to generate input for SCALE for both endogenous RNAs and exogenous spike-ins is included in "Methods" and Additional file 1: Supplementary methods. For parameter estimation, we developed a new "histogram-repiling" method to obtain the distribution of Y cg from the observed distribution of Q cg . The bursting parameters are then derived from the distribution of Y cg by moment estimators. Standard errors and confidence intervals of the parameters are obtained using nonparametric bootstrapping. The details are shown in "Methods".

Hypothesis testing
For biallelic bursty genes, we use nonparametric bootstrapping to test the null hypothesis that the burst frequency and burst size of the two alleles are the against the alternative hypothesis that either or both parameters differ between alleles. For each gene, we also a perform chi-square test to determine if the transcription of each of the two alleles is independent by comparing the observed proportions of cells from the gene categorization framework against the expected proportions under independence. For genes where the proportion of cells expressing both alleles is significantly higher than expected, we define their bursting as coordinated; for genes where the proportion of cells expressing only one allele is significantly higher than expected, we define their bursting as repulsed (Fig. 2). We use false discovery rate (FDR) to adjust for multiple comparisons. Details of the testing procedures are outlined in "Methods".

Analysis of scRNA-seq dataset of mouse cells during preimplantation development
We re-analyzed the scRNA-seq dataset of mouse blastocyst cells dissociated from in vivo F1 embryos (CAST/female x C57/male) from Deng et al. [2]. Transcriptomic profiles of each individual cell were generated using the Smart-seq [37] protocol. For 22,958 genes, reads per kilobase per million reads (RPKM) and total number of read counts across all cells are available. Parental allelespecific read counts are also available at heterozygous loci (Additional file 1: Figure S3). Principal component analysis was performed on cells from oocyte to blastocyst stages of mouse preimplantation development and showed that the first three principal components separate well the early-stage cells from the blastocyst cells (Additional file 1: Figure S4). The clusters of early-, mid-, and late-blastocyst cells are combined to gain a sufficient sample size. In the "Discussion", we give further insights into the potential effects of cell subtype confounding. A quality control procedure was used to remove outliers in library size, mean, and standard deviation of allelic read counts/proportions. We applied SCALE to this dataset of 122 mouse blastocyst cells, with a focus on addressing the issue of technical variability and modeling of transcriptional bursting.
Eight exogenous RNAs with known serial dilutions were added to late blastocyst cells (Additional file 2: Table S1) and used to estimate the technical noiseassociated parameters (Additional file 1: Figure S5a). We applied the Bayes gene classification framework to these cells to get the genome-wide distribution of gene categories. Specifically, out of the 22,958 genes profiled across all cells,~43% are biallelically expressed (~33% of the total are biallelic bursty, and~10% of the total are biallelic non-bursty),~7% are monoallelically expressed, and~50% are silent. Our empirical Bayes categorization results show that, on the genome-wide scale, the two alleles of most biallelic bursty genes share the same bursting kinetics and burst independently (Additional file 1: Figure S6a), as has been reported by Deng et al. [2].
For the 7486 genes that are categorized as biallelic bursty, we applied SCALE to identify genes whose alleles have different bursting kinetic parameters by the bootstrap-based hypothesis tests as previously described. After FDR control, we identified 425 genes whose two alleles have significantly different burst frequencies (Fig. 3a) and two genes whose two alleles have significantly different burst sizes (Fig. 3b). Figure 4 shows the allelic read counts of a gene that has different burst frequencies (Btf3l4) and a gene that has different burst sizes (Fdps). The two genes with significantly different allelic burst sizes (Fdps and Atp6ap2) are also significant in having different burst frequencies between the two alleles. P values from differential burst frequency testing have a spike below the significance level after FDR control (Fig. 3a), while those from differential burst size testing are roughly uniformly distributed (Fig. 3b).
At the whole genome level, these results show that allelic differences in the expression of bursty genes during embryo development are achieved through differential modulation of burst frequency rather than burst size. This seems to agree with intuition, since allelic differences must be caused by factors that act in cis to regulate gene expression, and cis factors are likely to change burst frequency by affecting promoter accessibility [36,[38][39][40]. On the contrary, while it is plausible for cis factors to affect allelic burst size through, for example, the efficiency of RNA polymerase II recruitment or the speed  [36]. Furthermore, previous studies have shown that the kinetic parameter that varies the most-along the cell cycle [36], between different genes [41], between different growth conditions [42], or under regulation by a transcription factor [43]-is the probabilistic rate of switching to the active state k on , while the rates of gene inactivation k off and of transcription s vary much less. Our analysis includes 107 male cells (X A Y) and 15 female cells (X A X B ) and this allows us to use those bursty X-chromosome genes as positive controls. As a result of this gender mixture, more cells express the maternal X A allele compared to the paternal X B allele. As shown in Fig. 3, SCALE successfully detects these bursty Xchromosome genes with significant difference in allelic burst frequencies but not in allelic burst sizes. If we keep only the 107 male cells, these X-chromosome genes are correctly categorized as monoallelically expressed-the bursting kinetics for the paternal X B allele are not estimable-and in this case there is no longer a cluster of significant X-chromosome genes separated from the autosomal genes (Additional file 1: Figure S8).
For biallelic bursty genes, we also used a simple binomial test to determine if the mean allelic coverage across cells is biased towards either allele. This is comparable to existing tests of allelic imbalance in bulk tissue, although the total coverage across cells in this dataset is much higher than standard bulk tissue RNA-seq data. After multiple hypothesis testing correction, we identified 417 genes with significant allelic imbalance, out of which 238 overlap with the significant genes from the testing of differential bursting kinetics (Fig. 5a). Inspection of the estimated bursting kinetic parameters in Fig. 5a shows that, when the burst size and burst frequency of the two alleles change in the same direction (e.g., gene Gprc5a in Fig. 5b), testing of allelic imbalance can detect more significant genes with higher power. This is not unexpected-a small insignificant increase in burst size adds on top of an insignificant increase in burst frequency, resulting in a significant increase in overall expression levels between the two alleles. However, for genes shown in red in the top left and bottom right quadrants of Fig. 5a, the test for differential bursting kinetics detects more genes than the allelic imbalance test. This is due to the fact that when burst size and burst frequency change in opposite directions (e.g., gene Dhrs7 in Fig. 5b), their effects cancel out when looking at the mean expression. Furthermore, even when the burst size does not change, if the change in burst frequency is small, by using a more specific model SCALE has higher power to detect it compared to an analysis based on mean allelic imbalance. Overall, the allelic imbalance test and differential bursting test report overlapping but substantially different sets of genes, with each test having its benefits. Compared to the allelic imbalance test, SCALE gives more detailed characterization of the nature of the difference by attributing the change in mean expression to a change in the burst frequency and/or burst size.
It is also noticeable that in Fig. 5a the vertical axis, Δfreq , has a 50% wider range than the horizontal axis, Δsize . Therefore, while it is visually not obvious from this scatter plot, much more genes have large absolute Δfreq than large absolute Δsize . Although the standard errors of these estimated differences are not reflected in the plot, given our testing results, those genes with large estimated differences in Δsize also have large standard Further chi-squared test of the null hypothesis of independence (Fig. 4c) shows that 424 genes have two alleles that fire in a significantly non-independent fashion. We find that all significant genes have higher proportions of cells expressing both alleles than expected, indicating coordinated expression between the two alleles. In this dataset, there are no significant genes with repulsed bursting between the two alleles. Repulsed bursting, in the extreme case where at most one allele is expressed in any cell, is also referred to as stochastic ME [31]. Our testing results indicate that, in mouse embryo development, all cases of stochastic ME (i.e., repulsion between the two alleles) can be explained by independent and infrequent stochastic bursting. The burst synchronization in the 424 significant genes is not unexpected and is possibly due to a shared trans factor between the two alleles (e.g., co-activation of both alleles by a shared enhancer). This result is concordant with the findings from a mouse embryonic stem cell scRNA-seq study by Kim et al. [31], which reported that the two alleles of a gene show correlated allelic expression across cells more often than expected by chance, potentially suggesting regulation by extrinsic factors [31]. We further discuss the sharing of such extrinsic factors under the context of cell population admixtures in the "Discussion".
In summary, our results using SCALE suggest that: (i) the two alleles from 10% of the bursty genes show either significant deviations from independent firing or significant differences in bursting kinetic parameters; (ii) for genes whose alleles differ in their bursting kinetic parameters, the difference is found mostly in the burst frequency instead of the burst size; (iii) for genes whose alleles violate independence, their expression tends to be coordinated. Refer to Additional file 3: Table S2 for genome-wide output from SCALE.

Analysis of scRNA-seq dataset of human fibroblast cells
To further examine our findings in a dataset without potential confounding of cell type admixtures, we applied SCALE to a scRNA-seq dataset of 104 cells from female human newborn primary fibroblast culture from Borel et al. [17]. The cells were captured by Fluidigm C1 with 22 PCR cycles and were sequenced with, on average, 36 million reads (100 bp, paired end) per cell. Bulk-tissue whole-genome sequencing was performed on two different lanes with 26-fold coverage on average and was used to identify heterozygous loci in coding regions. After quality control procedures, 9016 heterozygous loci from 9016 genes were identified (if multiple loci coexist in the same gene, we picked the one with the highest mean depth of coverage). At each locus, we used SAMtools [44] mpileup to obtain allelic read counts in each single cell from scRNA-seq, which are further used as input for SCALE. Ninety-two ERCC synthesized RNAs were added in the lysis buffer of 12 fibroblast cells with a final dilution of 1:40,000. The true concentrations and the observed number of reads for all spike-ins were used as baselines to estimate technical variability (Additional file 4: Table S3; Additional (A) (B) Fig. 5 Testing of bursting kinetics by scRNA-seq and testing mean difference by bulk-tissue sequencing. a Genes that are significant from testing of shared burst frequency and allelic imbalance. *Also includes the two genes that are significant from testing of shared burst size. Change in burst frequency and burst size in the same direction leads to higher detection power of allelic imbalance; change in different directions leads to allelic imbalance testing being underpowered. b Gene Dhrs7, whose two alleles have bursting kinetics in different directions, and gene Gprc5a, whose two alleles have bursting kinetics in the same direction. Dhrs7 is significant from testing of differential allelic bursting kinetics; Gprc5a is significant from the testing of mean difference between the two alleles file 1: Figure S5b). Refer to Additional file 1: Supplementary methods for details on the bioinformatic pipeline. We applied the gene categorization framework using SCALE and found that out of the 9016 genes, the proportions of monoallelically expressed, biallelically expressed, and silent genes are 11.5, 45.7, and 42.8%, respectively. For the 2277 genes that are categorized as biallelic bursty, we estimated their allele-specific bursting kinetic parameters and found that the correlations between the estimated burst frequency and burst size between the two alleles are 0.859 and 0.692 (Fig. 6). We then carried out hypothesis testing on differential allelic bursting kinetics. After FDR correction, we identified 26 genes with significantly different burst frequencies between the two alleles (Fig. 6a) and one gene Nfx1 with significantly different burst sizes between the two alleles, which is also significant in burst frequency testing (Fig. 6b). We further carried out testing of non-independent bursting between the two alleles and identified 35 significant genes after FDR correction (Additional file 1: Figure S6b). Out of the 35 significant genes, 27 showed patterns of coordinated bursting while the other eight showed repulsed patterns. Refer to Additional file 5: Table S4 for detailed output from SCALE across all tested genes.
We also carried out pairwise correlation analysis between the estimated allelic bursting kinetics, the proportion of unit time that the gene stays in the active state k on = k on þ k off À Á for each allele, as well as the overall ASE levels (taken as the sum across all cells at the heterozygous locus). Notably, we found that the overall ASE correlates strongly with the burst frequency and the proportion of time that the gene stays active, but not with the burst size (Additional file 1: Figure S9), in concordance with Kim and Marioni [25]. This further supports our previous conclusion that ASE at the single-cell level manifests as differences in burst frequency in a cis-manner.

Assessment of estimation accuracy and testing power
First, we investigated the accuracy of the moment estimators for the bursting parameters under four different scenarios in the Poisson-Beta transcription model: (i) small k on and small k off , which we call bursty and leads to relatively few transitions between the ON and OFF states with a bimodal mRNA distribution across cells (Additional file 1: Figure S10a); (ii) large k on and small k off , which leads to long durations in the ON state and resembles constitutive expression with the mRNA Fig. 6 Allele-specific transcriptional kinetics of 2277 genes from 104 human fibroblast cells. a Burst frequency of the two alleles has a correlation of 0.859; 26 genes show significant allelic difference in burst frequency after FDR. b Burst size of the two alleles has a correlation of 0.692. One gene has significant allelic difference in burst size. The results are concordant with the findings from the mouse embryonic development study having a Poisson-like distribution (Additional file 1: Figure S10B); (iii) small k on and large k off , which leads to most cells being silent (Additional file 1: Figure S10c); (iv) and large k on and large k off , which leads to constitutive expression (Additional file 1: Figure S10d).
We generated simulated data for 100 cells from the four cases above and started with no technical noise or cell size confounding. Within each case, we vary k on , k off , and s and use relative absolute error θ À θ j j=θ as a measurement of accuracy (Additional file 1: Figure S11). Our results show that genes with large k on and small k off (shown as the black curves in Additional file 1: Figure S11) have the largest estimation errors of the bursting parameters. Statistically it is hard to distinguish these constitutively expressed genes from genes with large k on and large k off and thus the kinetic parameters in this case cannot be accurately estimated, which has been previously reported [25,45]. Furthermore, the estimation errors are large for genes with small k on , large k off , and small s (shown as red curves in Additional file 1: Figure S11) due to lack of cells with non-zero expression. The standard errors and confidence intervals of the estimated kinetics from bootstrap resampling further confirm the underperformance for the above two classes (Additional file 1: Table S5). This emphasizes the need to adopt the Bayes categorization framework as a first step so that kinetic parameters are stably estimated only for genes whose both alleles are bursty. For genes whose alleles are perpetually silent or constitutively expressed across cells, there is no good method, nor any need, to estimate their bursting parameters.
Importantly, we see that the estimation bias in transcription rate s and deactivation rate k off cancel-over/ underestimation of s is compensated by over/underestimation of k off -and as a consequence the burst size s=k off can be more stably estimated than either parameter alone, especially when k on ≪k off (shown as red curves in Additional file 1: Figure S11). This is further confirmed by empirical results that allelic burst size has much higher correlation (0.746 from the mouse blastocyst dataset and 0.692 from the human fibroblast dataset) than allelic transcription and deactivation rate (0.464 and 0.265 for mouse blastocyst, and 0.458 and 0.33 for human fibroblast) (Additional file 1: Figure S12). For this reason, all of our results on real data are based on s=k off and we do not consider s and k off separately.
We further carried out power analysis of the testing of differential burst frequency and burst size between the two alleles. The null hypothesis is both alleles sharing the same bursting kinetics (k A on ¼ k B on ¼ 0:2; k A off ¼ k B off ¼ 0:2; s A ¼ s B ¼ 50 ), while the alternative hypotheses with differential burst frequency or burst size are shown in the legends in Additional file 1: Figure S13. The detailed setup of the simulation procedures is as follows. (i) Simulate the true allele-specific read counts Y A and Y B across 100 cells from the Poisson-Beta model under the alternative hypothesis. Technical noise is then added based on the noise model described earlier with technical noise parameters α; β; κ; τ f gestimated from the mouse blastocyst cell dataset. (ii) Apply SCALE to the observed expression level Q A and Q B , which returns a p value for testing differential burst size or burst frequency. If the p value is less than the significance level, we reject the null hypothesis. Our results indicate that the testing of burst frequency and burst size have similar power overall with relatively reduced power if the difference in allelic burst size is due to a difference in the deactivation rate k off .
We then simulated allele-specific counts from the full model including technical noise as well as variations in cell size with the ground truth k (bursty with small activation and deactivation rate). For parameters quantifying the degree of technical noise, we used the estimates from the mouse blastocyst cells (Additional file 1: Figure S5a) as well as the human fibroblast cells (Additional file 1: Figure S5b). Cell sizes were simulated from a normal distribution with mean 0 and standard deviation 0.1 and 0.01. We ran SCALE under four different settings: (i) in its default setting, (ii) without accounting for cell size, (iii) without adjusting for technical variability, (iv) not in an allele-specific fashion but using total coverage as input. Each was repeated 5000 times with a sample size of 100 and 400 cells, respectively. Relative estimation errors of burst size and burst frequency were summarized across all simulation runs. Our results show that SCALE in its default setting has the smallest estimation errors for both burst size and burst frequency (Additional file 1: Figure S14 and S15). Not surprisingly, cell size has a larger effect on burst size estimation than burst frequency estimation, while technical variability leads to biased estimation of both burst frequency and burst size. The estimates taking total expression instead of ASE as input are completely off. Furthermore, the estimation accuracy improved as the number of cells increased. These results indicate the necessity to profile transcriptional kinetics in an allele-specific fashion with adjustment for technical variability and cell size.

Discussion
We propose SCALE, a statistical framework to study ASE using scRNA-seq data. The input data to SCALE are allele-specific read counts at heterozygous loci across all cells. In the two datasets that we analyzed, we used F1 mouse crossing and bulk-tissue sequencing to profile the true heterozygous loci. When these are not available, scRNA-seq itself can be used to retrieve ASE and, more specifically, haplotype information, as described in Edsgard et al. [46]. SCALE estimates parameters that characterize allele-specific transcriptional bursting after accounting for technical biases in scRNA-seq and size differences between cells. This allows us to detect genes that exhibit allelic differences in burst frequency and burst size and genes whose alleles show coordinated or repulsed bursting patterns. Differences in mean expression between two alleles have long been observed in bulk RNA-seq. By scRNA-seq, we now move beyond the mean and characterize the difference in expression distributions between the two alleles, specifically in terms of their transcriptional bursting parameters.
Transcriptional bursting is a fundamental property of gene expression, yet its global patterns in the genome have not been well characterized, and most studies consider bursting at the gene level by ignoring the allelic origin of transcription. In this paper, we reanalyzed the Deng et al. [2] and Borel et al. [17] data. We confirmed the findings from Levesque and Raj [32] and Deng et al. [2] that, for most genes across the genome, there is no sufficient evidence against the assumption of independent bursting with shared bursting kinetics between the two alleles. For genes where significant deviations are observed, SCALE allows us to attribute the deviation to differential bursting kinetics and/or non-independent bursting between the two alleles.
More specifically, for genes that are transcribed in a bursty fashion, we compared the burst frequency and burst size between their two alleles. For both scRNA-seq datasets, we identified a significant number of genes whose allele-specific bursting differs according to burst frequency but not burst size. Our findings provide evidence that burst frequency, which represents the rate of gene activation, is modified in cis and that burst size, which represents the ratio of transcription rate to gene inactivation rate, is less likely to be modulated in cis. Although our testing framework may have slightly reduced power in detecting the differential deactivation rate (Additional file 1: Figure S13), regulation of burst size can result from either a global trans factor or extrinsic factors that act upon both alleles. Similar findings have been previously reported, from different perspectives and on different scales, using various technologies, platforms, and model organisms [31,36,[41][42][43].
It is worth noting that the bursting parameters estimated by SCALE are normalized by the decay rate, where the inverse 1=d denotes the average lifetime of an mRNA molecule. Here we implicitly make the assumptions that, for each allele, the gene-specific decay rates (d A g and d B g ) are constant, and thus the estimated allelic burst frequencies are the ratio of true burst frequency over decay rate (that is k A on;g =d A g and k B on;g =d B g ). The decay rates, however, cancel out in the numerator and denominator in the allelic burst sizes, s A g =k A off ;g and s B g =k B off ;g . Therefore, the differences that we observe in the allelic burst frequencies can also potentially be due to different decay rates between the two alleles, which has been previously reported to be regulated by microRNAs [47].
It is also important to note that 44% of the genes found to be significant for differential burst frequency are not significant in the allelic imbalance test based on mean expression across cells. This suggests that expression quantitative trait loci (eQTL) affecting gene expression through modulation of bursting kinetics are likely to escape detection in existing eQTL studies by bulk sequencing, especially when burst size and burst frequency change in different directions. This is further underscored by the study of Wills et al. [48], which measured the expression of 92 genes affected by Wnt signaling in 1440 single cells from 15 individuals and then correlated SNPs with various gene-expression phenotypes. They found bursting kinetics as characterized by burst size and burst frequency to be heritable, thus suggesting the existence of bursting QTLs. Taken together, these results should further motivate more large scale genome-wide studies to systematically characterize the impact of eQTLs on various aspects of transcriptional bursting.
Kim et al. [31] described a statistical framework to quantify the extent of stochastic ASE in scRNA-seq data by using spike-ins, where stochastic ASE is defined as excessive variability in the ratio of the expression level of the paternal (or maternal) allele between cells after controlling for mean allelic expression levels. While they attributed 18% of the stochastic ASE to biological variability, they did not examine what biological factors lead to this stochastic ASE. In this study, we attribute the observed stochastic ASE to differences in allelic bursting kinetics. By studying bursting kinetics in an allelespecific manner, we can compare the transcriptional differences between the two alleles at a finer scale.
Kim and Marioni [25] described a procedure to estimate bursting kinetic parameters using scRNA-seq data. Our method differs from that of Kim and Marioni [25] in several ways. First, our model is an allele-specific model that infers kinetic parameters for each allele separately, thus allowing comparisons between alleles. Second, we infer kinetic parameters based on the distribution of "true expression" rather than the distribution of observed expression. We are able to do this through the use of a simple and novel deconvolution approach, which allows us to eliminate the impact of technical noise when making inference on the kinetic parameters. Appropriate modeling of technical noise, particularly gene dropouts, is critical in this context, as failing to do so could lead to the overestimation of k off . Third, we employ a gene categorization procedure prior to fitting the bursting model. This is important because the bursting parameters can only be reliably estimated for genes that have sufficient expression and that are bursty.
As a by-product, SCALE also allows us to rigorously test, for scRNA-seq data, whether the paternal and maternal alleles of a gene are independently expressed. In both scRNA-seq datasets we analyzed, we identified more genes whose allele-specific bursting is in a coordinated fashion than those for which it is in a repulsed fashion. The tendency towards coordination is not surprising, since the two alleles of a gene share the same nuclear environment and thus the same ensemble of transcription factors. We are aware that this degree of coordination can also arise from the mixture of non-homogeneous cell populations, e.g., different lineages of cells during mouse embryonic development, as we combine the early-, mid-, and late-blastocyst cells to gain a large enough sample size. While it is possible that this might lead to false positives in identifying coordinated bursting events, it will result in a decrease in power for the testing of differential bursting kinetics. Given the amount of stochasticity that is observed in the ASE data, how to define cell sub-types and how to quantify betweencell heterogeneity need further investigation.

Conclusions
We have developed SCALE, a statistical framework for systematic characterization of ASE using data generated from scRNA-seq experiments. Our approach allows us to profile allele-specific bursting kinetics while accounting for technical variability and cell size difference. For genes that are classified as biallelic bursty through a Bayes categorization framework, we further examine whether transcription of the paternal and maternal alleles are independent and whether there are any kinetic differences, as represented by burst frequency and burst size, between the two alleles. Our results from the re-analysis of Deng et al. [2] and Borel et al. [17] provide insights into the extent of differences, coordination, and repulsion between alleles in transcriptional bursting.

Input for endogenous RNAs and exogenous spike-ins
For endogenous RNAs, SCALE takes as input the observed allele-specific read counts at heterozygous loci Q A cg and Q B cg , with adjustment by library size factor: In addition, for spike-ins, SCALE takes as input the true concentrations of the spike-in molecules, the lengths of the molecules, as well as the depths of coverage for each spike-in sequence across all cells (Additional file 2: Table S1; Additional file 4: Table S3). The true concentration of each spike-in molecule is calculated according to the known concentration (denoted as C attomoles/μL) and the dilution factor (×40,000): C Â 10 −18 moles=μL Â 6:02214 Â 10 23 mole −1 Avogadro constant ð Þ 40000 dilution factor ð Þ

:
The observed number of reads for each spike-in is calculated by adjusting for the library size factor, the read length, and the length of the spike-in RNA. The bioinformatic pipeline to generate the input for SCALE is included in Additional file 1: Supplementary methods. assumption in that most genes have balanced ASE on average and the use of Beta distribution allows variability of allelic ratio across cells. We adopt an EM algorithm for estimation, with Z being the missing variables: if cell c belongs to category k otherwise : The complete-data log-likelihood is given as: Z ck log f k n A c ; n B c j; a; b À Á Â Ã : For each cell, we assign the state that has the maximum posterior probability and only keep a cell if its maximum posterior probability is greater than 0.8. Let N ∅ , N A , N B , and N AB be the number of cells in state ∅ f g, A f g, B f g, and AB f g, respectively. We then assign a gene to be: Þ ≤0:95 and 0:05≤ investigate the estimation accuracy and robustness under different settings.

Hypothesis testing framework
We carry out a nonparametric bootstrap hypothesis testing procedure with the null hypothesis that the two alleles of a gene share the same kinetic parameters (Fig. 4a, b). The procedures are as follow.
( Iterate this N times. (iii) Compute the p values: We use a binomial test of allelic imbalance with the null hypothesis that the allelic ratio of the mean expression across all cells is 0.5. A chi-square test of independence is further performed to test whether the two alleles of a gene fire independently (Fig. 4c). The observed number of cells is from the direct output of the Bayes gene categorization framework. For all hypothesis testing, we adopt FDR to adjust for multiple comparisons.