 Method
 Open Access
 Published:
Modeling group heteroscedasticity in singlecell RNAseq pseudobulk data
Genome Biology volume 24, Article number: 107 (2023)
Abstract
Group heteroscedasticity is commonly observed in pseudobulk singlecell RNAseq datasets and its presence can hamper the detection of differentially expressed genes. Since most bulk RNAseq methods assume equal group variances, we introduce two new approaches that account for heteroscedastic groups, namely voomByGroup and voomWithQualityWeights using a blocked design (voomQWB). Compared to current goldstandard methods that do not account for group heteroscedasticity, we show results from simulations and various experiments that demonstrate the superior performance of voomByGroup and voomQWB in terms of error control and power when group variances in pseudobulk singlecell RNAseq data are unequal.
Background
Singlecell RNA sequencing (scRNAseq) allows the quantification of transcript profiles across individual cells and has become widely adopted over the past few years. A major advantage of scRNAseq is the high resolution it offers, enabling researchers to study molecular responses to different biological perturbations at the cellularlevel [1] rather than the populationlevel as surveyed by bulk RNAseq approaches. Many statistical tools and methods have been developed to make use of these highresolution data, such as methods for trajectory analysis [2], celltocell interactions [3], and differential expression (DE) analysis [4, 5].
Early DE analysis of this data type aimed to fully leverage information from individual cells, whereby each cell in comparison is treated as an independent biological unit (or “replicate”). To achieve this, a number of studies used established methods developed for bulk RNAseq data [6]. However, due to the sparsity of the gene count matrix, which is a major point of difference between singlecell and bulk data [4, 5], other researchers modeled scRNAseq data as either zeroinflated or multimodal in distribution and developed tailored DE analysis methods for scRNAseq data (e.g., MAST [4], BPSC [7], and DEsingle [8]). To guide the analysts’ choice, various evaluation studies have assessed the performance of bulk and tailored scRNAseq analysis methods, although their findings have varied. Some showed that bulk methods are unsuitable when directly applied to scRNAseq data [9, 10], while others found bulk methods were comparable to tailored scRNAseq methods [11]. Another analysis strategy performs DE analysis on pseudobulk samples that are created by cell aggregation [12]. This strategy was pointed out to perform better than singlecell methods that treat each cell as an independent replicate in the analysis in two independent studies [13, 14]. Through the use of an aggregation approach, dependencies between cells from the same sample are avoided [15] so that the intrinsic variability of biological replicates is wellestimated leading to fewer false discoveries compared to methods that fail to account for this [14]. Although a generalized linear mixed model with a random effect to take care of zeros and correlation structure within a sample provides slightly more power compared to pseudobulk aggregation methods [13, 16], it brings a much heavier computational burden [13, 14].
Most of the DE analysis methods applied on pseudobulk data in the literature are “goldstandard” bulk DE analysis methods, including limmavoom, limmatrend [17, 18], edgeR [19], and DESeq2 [20]. Limma was developed for microarray data, assumes the logtransformed expression values are normally distributed, and employs linear modeling and empirical Bayes shrinkage to improve the stability and power of statistical tests. For RNAseq data, voom and limmatrend were subsequently developed based on the assumption of normality of the logtransformed counts, using different strategies to address the dependence that is observed between the mean and the variance (referred to as heteroscedasticity) in this kind of data. Voom models the relationship between the mean and variance across all observations using a fitted LOWESS trend and calculates precision weights based on the estimated trend for use in linear modeling. On the other hand, edgeR employs empirical Bayes shrinkage and was developed assuming genelevel counts follow a negative binomial distribution.
Due to limited sample numbers, most bulk DE analysis methods including the aforementioned goldstandard methods borrow information between genes to estimate the variance [19, 21,22,23] and assume equal variances between experimental groups (also referred to as “homoscedasticity”). However, there are cases where the variability observed is distinct for different groups (“heteroscedasticity”). Here we use “group” as a general term that covers common experimental variables or conditions such as treatment (drug A, drug B, vehicle control), genotype (wildtype, knockout), and sex (male, female). In scRNAseq analysis, DE methods can be used to find marker genes as well [24], in which case, the concept of “group” can extend to different cell types or clusters.
Heteroscedasticity has been frequently observed in microarray gene expression data [25, 26], for instance, Demissie et al. showed that a moderated Welch test performs better than the moderated ttest when group variances are unequal [26]. In largescale bulk RNAseq data, under the scenarios of heteroscedasticity, Ran et al. pointed out that voom was unable to model the variability appropriately and they noted that the weighting strategy used in voomWithQualityWeights (voomQW) may be more helpful [27] on account of its joint modeling of variability at the observational and samplelevel. Chen et al. noted an unequal group variance in singlecell data as well, stating that unequal variance tests are underused [28]. They made use of the large sample sizes available when each cell is considered as a replicate and estimated groupspecific dispersions for each gene separately.
In this article, we examine whether groupspecific variances are homoscedastic (equal) or heteroscedastic (unequal) in pseudobulk scRNAseq data. We show that heteroscedastic groups are frequently observed in the data and that the application of current DE analysis methods has variable performance. Importantly, goldstandard methods that do not model grouplevel variability can both under and overestimate variances leading to poor error control or reduced power to detect DE genes. We demonstrate that methods that account for heteroscedastic groups, namely voomByGroup and voomQW using a blocked design, have superior performance in this regard when group variances are unequal.
Results
Observing heteroscedasticity in scRNAseq pseudobulk data
To study whether group variances are equal or unequal in scRNAseq pseudobulk data, we explored pseudobulk scRNAseq datasets generated with cells from specific cell types obtained from various sample types ranging from experimental replicates of mice to human samples (see the “Methods” section). We examined three things: (1) multidimensional scaling (MDS) plots, (2) common biological coefficient of variation (BCV) of groups, and (3) meanvariance trends derived from individual groups (we refer to these as “groupspecific voom trends”). Larger distances between samples in a group on the MDS plots indicate more withingroup variation. BCV is a measure of the biological variability in gene expression between biological replicates and is frequently used to estimate the variance of gene expression in RNAseq data [29]. Higher common BCV values correspond to increased biological variation between samples across genes based on the assumption of the negative binomial (NB) distribution in edgeR. For the groupspecific voom trends, we are interested in observing where the curves sit relative to other groups in the same study, as well as the shape of the curve. The shape and “height” of the curves reflect the total variation within groups—both technical and biological.
In studies of mouse lung tissue [30] and Xenopus tail [31], we observed some minor differences in groupspecific voom trends, with the curves sitting close together and mostly overlapping one another (Fig. 1ab). Common BCV values for these studies ranged from 0.197 to 0.240 across 2 groups in mouse lungs, and 0.226 to 0.295 across 5 groups in Xenopus tails.
In human datasets, the differences in group variances were greater. For human peripheral blood mononuclear cells (PBMCs) [32], healthy controls unsurprisingly exhibited lower variability than the 3 other patient groups to which they were compared (Fig. 1c). This was evident from groupspecific voom trends—although the curves had similar shapes, the curve of the healthy controls sat distinctly below the curves of other groups. Common BCV values for the PBMCs ranged from 0.154 to 0.241, with the lowest for healthy controls and the highest for asymptomatic patients.
A separate study on human macrophages collected from lung tissues [33] showed even higher levels of heteroscedasticity, where common BCV values ranged from 0.338 to 0.495 (Fig. 1d). Groupspecific voom trends had distinct shapes and were well separated from one another along the vertical axis (with the exception of IPF and control groups which were quite similar). Moreover, the plateauing of voom trends at higher expression values that are commonly observed in many datasets was not observed here. This might be on account of the complexity of regions in the lung where samples are collected, the diverse causes of lung fibrosis, and limited patient numbers for some groups. High levels of biological variation are reflected in the large BCV values in this dataset.
While we have not commented specifically on MDS plots, these plots (or other similar plots e.g., principle components analysis) provide a useful first glance of the data and are already part of many analysis pipelines (Fig. 1). For example, in the study of mouse lung tissue, the 3month (3m) samples are less spread out across dimensions 1 and 2 than the 24month (24m) samples, indicating that the 3m group has lower variability than the 24m group. This is confirmed by the groups’ BCV values and voom trends.
In conclusion, we observe unequal group variability across multiple scRNAseq pseudobulk datasets. At this stage, it is unclear whether goldstandard bulk DE analysis methods are robust against heteroscedasticity, and how different group variances need to be before it affects their performance. We test this in later sections of this article, using three goldstandard methods that do not account for heteroscedasticity and two novel methods that do.
Novel use of voomWithQualityWeights using a block design (voomQWB)
The first method that accounts for grouplevel variability makes novel use of the existing voomQW method. The standard use of voomQW assigns a different quality weight to each sample, which then adjusts the sample’s variance estimate—a strategy used to tackle individual outliers in the dataset. Rather than adjusting the variance of individual samples, we adjust the variance of whole groups by specifying sample group information via the var.group argument in the voomWithQualityWeights function. This produces quality weights as “blocks” within groups (identical weights for samples in the same group) and adjusts each group’s variance estimate—we refer to this method as “voomQW using a block design,” or simply voomQWB.
Figure 2a shows the estimated groupspecific weights from voomQWB for a study comparing healthy controls to COVID19 patients that are moderately sick and those that are asymptomatic [32]. Samples of moderately sick and asymptomatic patients have similar weights, just under 1; while the weights for healthy controls are above 1 (1.27). The sample weights are combined with observationlevel weights derived from the overall meanvariance trend from voom (Fig. 2b). What this achieves in practice is an upshift of the voom trend for groups with sample weights below 1 (Fig. 2c pink and green curves), resulting in a higher variance estimate and a smaller precision weight for statistical modeling (see the “Methods” section). On the other hand, groups with sample weights that are greater than 1 have a downshifted voom trend (purple curve), resulting in lower variance estimates and larger precision weights. There are a couple of things to note here: (1) the groupspecific voom trends from voomQWB (Fig. 1c) are roughly parallel to the single voom trend (Fig. 1b), and 2) the groupspecific trends shown here are created manually, not as an output of the voomWithQualityWeights function.
voomByGroup: modeling observationlevel variance in individual groups
As mentioned above, voomQWB models groupwise meanvariance relationships via roughly parallel trendlines, which has the disadvantage of not being able to capture more complicated shapes observed in different datasets (Fig. 1). The second method we describe here, called voomByGroup, can account for such grouplevel variability with greater flexibility. voomByGroup achieves this by subsetting the data and estimating separate voom trends for each group. In other words, while voomQWB can shift the same voom trend up and down for each group, voomByGroup estimates distinct groupspecific trends that can also allow up and downshifts for different groups.
For example, on the PBMC1 dataset, the mean and variance are calculated for the log_{2}countspermillion (logCPM) of each gene in “Group 1 (Moderate symptoms)” and a curve, or trend, is fitted to these values from which precision weights are then calculated (Fig. 2d). Similarly, a curve is fitted separately to each of the other groups in the dataset (Fig. 2e and f). This results in 3 nonparallel curves as shown in the summary plot (Fig. 2g) which includes all 3 trends. In theory, the voomByGroup method gives more robust estimates of variability since the trend for each group can take on a different “shape”—we test how this works in practice in the following sections.
Group variance methods provide a balance between power and error control
Using simulated data, we test the performance of 3 goldstandard methods against the 2 new methods that account for heteroscedastic groups. The goldstandard methods are voom, edgeR using a likelihoodratio test (edgeR LRT), and edgeR quasilikelihood (edgeR QL). The methods that account for group heteroscedasticity (“group variance methods”) are voomQWB and voomByGroup. Using simulations of pseudobulk data, we can examine the effects of unequal group variation while controlling other factors. Specifically, group variation can be divided into biological variation between RNA samples and technical variation caused by sequencing technologies.
In the first scenario (scenario 1), we looked into unequal group variation as a result of biological variation. To obtain pseudobulk data, we simulated singlecell genewise read counts that followed a correlated negative binomial distribution and aggregated the reads from each sample (see the “Methods” section and Additional file 1: Fig. S1). Each simulation consists of 4 groups with 3 samples in each group—a total of 50 such simulations were generated. We generated varying groupspecific common BCVs for the 4 groups that are well within the range of BCV values observed in experimental datasets (Figs. 1 and 3a)—the BCV values averaged over 50 simulations were 0.2, 0.22, 0.26, and 0.28 (the values in Fig. 3a are for one such simulation).
The meanvariance trends generated for the different groups appear as expected, with a typical decreasing “voomtrend” with increasing gene expression and the curves ordered correctly from those with the most biological variation at the top of the plot (group 4 in Fig. 3a) to the group with the least biological variation at the bottom of the plot (group 1). The lefthand side of these meanvariance trends is primarily driven by technical variation—as expected, here they mostly overlap each other since groups were generated to have the same technical variation in these simulations. These groupspecific meanvariance plots generated by the voomByGroup function provide a useful “first glance” of the data before any formal testing was carried out.
We then performed differential gene expression analysis for all pairwise group comparisons, which gave a total of 6 comparisons. In these simulations, 50 genes were generated to be upregulated in each group, such that 100 genes are differentially expressed in each pairwise comparison (see the “Methods” section). We noticed that the number of differentially expressed genes varied from method to method and calculated the false discovery rate (FDR) of each method which was averaged over the 50 simulations. The FDR, or type I error rate, is calculated as the number of genes that were incorrectly identified as differentially expressed out of the total number of genes that were identified as differentially expressed at a particular adjusted pvalue cutoff. We observed that none of the methods controlled type I error for all comparisons across the 3 cutoffs we examined: adjusted pvalue cutoff of 0.01, 0.05, and 0.10 (Fig. 3b, Additional file 1: Fig. S2); such that the methods detected false discoveries at a higher rate than expected. voomByGroup outperformed other methods by controlling type I error at the 0.01 and 0.10 cutoffs, and only exceeding the threshold marginally for 2 out of 6 comparisons at the 0.05 cutoff (FDR of 0.054 and 0.056).
A closer look at these plots revealed that the goldstandard methods had FDR values that spanned a broad range, with some comparisons having FDR values that were well under the threshold, and others that exceeded the threshold by 2 or 3fold. This means that it could be difficult to gauge whether the DE results are too conservative, too liberal, or perhaps “just right” for a given comparison in real datasets when applying these methods to heteroscedastic groups. The range of FDR values is broadest for edgeR LRT, followed by edgeR QL, then voom. In comparison, the group variance methods, though not perfect in terms of type I error control, had a substantially tighter range of FDR values, and the comparisons that exceeded the FDR threshold only exceed it by a small margin.
To understand how heteroscedasticity influences DE analysis in more detail, we focused on results obtained using a 0.05 adjusted pvalue cutoff. Across the 6 comparisons, group variance methods tend to detect similar numbers of differentially expressed genes, the same goes for goldstandard methods (Fig. 3cd, Additional file 1: Fig. S3). There is some variation between the goldstandard and group variance modeling methods, with some comparisons having quite similar results, while others produce results that are very different. A closer examination of the comparison between group 3 and group 4 (which we refer to as “High vs High” in terms of biological variation) and group 1 and group 2 (“Low vs Low”) shows where the 2 classes of methods differ.
In the High vs High comparison, goldstandard methods detect more DE genes than group variance methods. However, the DE genes contain a much higher proportion of false discoveries than it was controlled for. It is not as though goldstandard methods were prioritizing falsepositive genes in terms of significance—it had similar numbers of true and falsepositive genes when looking at topranked genes (Additional file 1: Fig. S4). Rather, goldstandard methods had smaller adjusted pvalues than group variance methods, allowing more genes detected at a certain cutoff (Additional file 1: Fig. S5a). By pooling variance estimates across all 4 groups, goldstandard methods underestimate the variances for groups 3 and 4, when in fact those groups have relatively high biological variation, resulting in poor type I error control. voomByGroup and voomQWB are more robust in their estimation of individual group variances, allowing them to maintain good type I error control.
In the Low vs Low comparison, all methods have good type I error control, with group variance methods detecting substantially more DE genes than goldstandard methods. Although topranked genes were yet again very similar (Additional file 1: Fig. S4), this time, goldstandard methods had larger adjusted pvalues than group variance methods (Additional file 1: Fig. S5b), meaning that fewer genes were selected at given threshold. Here, the pooled variance estimates used by goldstandard methods resulted in an overestimation of variances for the two groups with relatively low biological variation (groups 1 and 2). In consequence, goldstandard methods suffered from a loss of power.
In the Low vs High comparison, there were no significant benefits in applying the group variance methods even though these methods provide more accurate variance estimates than the standard methods. When using voom, the overall variance trend that is applied to all samples and groups would underestimate the variability of high variance groups, and overestimate this for low variance groups. When pairwise comparisons are made between High and Low groups, the over and underestimated values balance each other out, giving results that are similar to the more precise estimates from voomByGroup and voomQWB.
These simulations demonstrated that in the presence of group heteroscedasticity, group variance methods have a good balance between controlling type I error and the power to detect DE genes. To ensure that the superior performance of group variance methods was due to group heteroscedasticity in the data, we separately simulated 50 null simulations (scenario 2) where all groups had equal underlying biological variation (see the “Methods” section). We observed similar numbers of true positives and false discoveries between goldstandard and group variance methods (Additional file 1: Fig. S6).
voomByGroup captures both biological and technical variation well in individual groups
Systematic differences are commonly observed in the sequenced libraries of scRNAseq data [34]. For example, gene counts vary between cells on account of limited starting material per cell and variations in technical efficiency. In addition, the number of cells detected in each sample is also variable. After aggregating cells, pseudobulk samples have library sizes that are more variable than that of bulk RNAseq data—contributing to a major source of technical variation in pseudobulk data.
We explore the influence of unequal library sizes by varying the number of cells in each sample for a new set of simulations. Keeping the underlying biological variation constant between groups (homoscedasticity), we first vary the library sizes of samples. The provided number of cells are 250, 250, and 250 in group 1, 250, 200, and 200 in group 2, 250, 500, and 500 in group 3, and 250, 750, and 750 in group 4 (Fig. 4a). The expected library size for each cell remains constant (see the “Methods” section).
Under this scenario (scenario 3), the meanvariance trends generated appear to mostly overlap each other on the righthand side as expected on account of equal group dispersions, while on the lefthand side, slight differences appear (Fig. 4b). DE analysis was then performed over 50 simulations and averaged FDR rates and numbers of DE genes were compared. We observed much tighter ranges of FDR values compared to those from scenario 1 (Fig. 4c, where the same yaxis range from Fig. 3b was used) and a similar number of true positive genes compared to the null simulation (Additional file 1: Fig. S7). These suggest that simulated technical variation does not have a significant influence on the DE results. To compare the influence of technical and biological variation in the simulations, we carried out another separate 50 simulations (scenario 4) with both aspects of variation incorporated (see the “Methods” section). For these results, we observed expected location trends that differed on both the left and right sides (Additional file 1: Fig. S8a). FDR results were rather similar to what was observed in the scenario where biological variation was unequal (Additional file 1: Fig. S8b, Fig. 3b), indicating that biological variation is the major source of variation that influences the DE results.
However, closer inspection of the FDR plot from scenario 3 where only the number of cells differed between groups revealed that among those comparisons, voom and voomQWB performed similarly, as a result of the weighting strategy used in voomQWB only adjusting the groupwise weight in an overall manner. While voomByGroup is more flexible, we observed that for groupwise meanvariance trends, regardless of the overlapping trends on the righthand side, on the lefthand side, groups with fewer cells (group1 and group2) exhibit slightly more variation and sit at the top, while the group with the largest number of cells (group4) is at the bottom (Fig. 4b). Because of the wellcaptured meanvariance relationship, voomByGroup delivered wellcontrolled FDR compared to those from voom and voomQWB, especially when comparing between group1 versus group2 in scenario 3, where slightly higher technical variation is present (Fig. 4c).
Immune responses in asymptomatic COVID19 patients
While the simulations allowed us to assess the performance of gold standard methods and group variance methods based on known truth, these results have no biological interest. Moreover, no matter how carefully thoughtout and welldesigned our simulations are, these data will inevitably miss some features from experimental data. Thus, we also examined the performance of methods on human scRNAseq data.
Zhao et al. [32] investigated PBMCs from COVID19 patients of varying severity alongside healthy controls (HCs), with a focus on the comparison between asymptomatic individuals and HCs. The study found that interferongamma played an important role in differentiating asymptomatic individuals and HCs, such that it was more highly expressed in natural killer (NK) cells of asymptomatic individuals [32]. In their data, the expression of IFNG was observed to be upregulated in asymptomatic individuals; however, the difference was not statistically significant when analyzed with edgeR QL. We reanalysed this dataset (PBMC1, see the “Methods” section) to see whether group variance methods could offer improved results.
We aggregated CD56^{dim} CD16^{+} NK cells from each sample to create pseudobulk samples and then filtered out samples with fewer than 50 cells. A first glance at the data via MDS and groupspecific meanvariance plots shows that HCs have a distinct meanvariance trend and less biological variation (Fig. 1c). By accounting for the relatively low variance in the HC group, we found that group variance methods outperformed goldstandard methods in terms of statistical power, such that they detected more DE genes for the comparison between HCs and asymptomatic individuals (Fig. 5a)—this is consistent with our simulation results when comparing groups with low variance (Fig. 3b). voomByGroup detected the most DE genes, followed by voomQWB: 880 and 719 genes respectively. The goldstandard methods, edgeR LRT, voom, and edgeR QL, detected 664, 453, and 403 DE genes respectively.
To understand our results further, we looked at the consistency at which genes were detected as DE between methods (Additional file 1: Fig. S9a). We excluded edgeR LRT from our Venn diagram since the inclusion of all 5 methods greatly increased the complexity of the plot, and edgeR LRT was of less interest to us since we previously demonstrated that it performed poorly in the control of type I error. Although group variance methods detected almost double the number of DE genes as compared to voom and edgeR QL, most genes were detected by all methods (356 genes). Both of the voomvariants, voomQWB and voomByGroup, detected all of the genes that were also detected by voom. With the exception of 1 gene, voomByGroup also detected all of the genes that were detected by voomQWB. From voom to voomQWB then voomByGroup, the methods increase in their level of groupspecific variance modeling. The overlap between these methods and the extra DE genes reflects the hierarchy in variance modeling for these methods and demonstrates the potential gain in statistical power when capturing group variance more accurately.
Next, we turned to Gene Ontology (GO) enrichment analysis to study the biological processes that play a role in COVID19. We looked for any enrichment in GO terms for significantly upregulated genes in asymptomatic patients for each of the methods under examination. voomByGroup was the only method to detect the “interferongammamediated signaling pathway” as significant using a pvalue cutoff of 0.01 (Fig. 5b). None of the other methods found any of these 5 genes as significant—they had much higher pvalues as compared to that of voomByGroup (Additional file 1: Fig. S9b). To confirm the role of interferongamma in asymptomatic patients, we analyzed data from a separate study also involving CD56^{dim} CD16^{+} NK cells in COVID19 patients of varying severity [35]. The original study did not look into the role of interferongamma. Reanalyzing these data (PBMC2, see the “Methods” section), we found that in this second dataset the “interferongammamediated signaling pathway” was enriched using any of the DE methods under examination, and those groupspecific variances were similar between all groups (Additional file 1: Fig. S10).
Taking the two COVID19 datasets into consideration, we noticed a few things: (1) group variances can change between one dataset and another, even for studies on similar cell types and similar subjects—this perhaps has to do with how samples are processed (technical variation) and/or the “grouping” criterion plus the individual subjects involved (biological variation); (2) when variance trends are not too distinct from one another, all methods perform similarly, as observed in the second dataset (PBMC2); (3) when variance trends are distinct, group variance methods may benefit from a gain in statistical power, as observed in the first dataset (PBMC1); and (4) by modeling groupspecific variances closely, voomByGroup was the only method that obtained statistically significant results for the biological process of interest in both datasets. These two datasets are used here to highlight how results of biological interest may be “missed” if heteroscedasticity is not carefully considered.
Discussion
We have shown that modeling the meanvariance relationship at the grouplevel and the use of groupwise precision weights enhances DE analysis results when there is group heteroscedasticity. Simulations demonstrated that voomQWB and voomByGroup have a good balance between controlling type I errors and the power to detect DE genes. Additionally, voomByGroup performs better at capturing technical variation in the meanvariance trends. The analysis of PBMC data agreed with our simulation results whereby methods that model groupspecific variation provide more DE genes when lowBCV groups are included in the comparison, with statistically significant results obtained for key biological processes of interest with voomByGroup. Null simulations confirmed that established goldstandard methods and approaches that model groupspecific variation performed similarly when there were no distinct differences in variability between groups. Consistent results were presented by Chen et al. on scRNAseq data [28] where methods that accounted for heteroscedasticity performed as well as methods that do not account for heteroscedasticity when there is equal group variation, which indicates there is potential for groupvariation methods to be more broadly used.
In this article, we demonstrate that group variance modeling methods outperform goldstandard methods for DE analysis of pseudobulk scRNAseq data. Specifically, voomByGroup has the best performance in terms of balancing type I error control and power. This is because voomByGroup models the meanvariance relationships for different groups more flexibly to better capture the distinct trends that may be present in the data. voomQWB also performs very well; with better results than goldstandard methods in the presence of group heteroscedasticity. Its performance is similar, and second only, to voomByGroup. Relative to voomByGroup, voomQWB lacks the flexibility to capture the distinct shapes of groupspecific meanvariance trends, which could explain some of the differences in performance.
We recommend the use of either voomByGroup or voomQWB over goldstandard methods in scRNAseq pseudobulk analysis in datasets that exhibit heteroscedastic variation across different experimental groups. The voomByGroup software provides useful diagnostic plots that can help guide the choice of method, with code that is easy to run, taking similar inputs to the widely used and wellestablished voom approach (see the “Methods” section). Running voomByGroup first can allow the analyst to determine the level of heteroscedasticity in a given dataset. For example, if the meanvariance trends per group are mostly overlapping each other, then group variance methods are likely to offer very similar results to current goldstandard methods (Fig. 4bc). In this case, method choice will not affect the results much, and one may prefer to choose a method that is simpler, based on fewer assumptions, such as voom. If voomByGroup meanvariance plots show distinct trends in one or more groups, then the variance for that group can be more closely modeled using voomByGroup or voomQWB (e.g., “ST40_3” in Fig. 1b, “HC” in Fig. 1c, and “NSIP” and “cHP” in Fig. 1d). In such cases, methods that explicitly model groupspecific variability are highly recommended over standard methods that do not. Moreover, the common BCV values that are automatically generated and displayed on these plots provide summary information about differences in mean dispersion for different groups calculated across all genes.
Between the two group variance methods, voomByGroup outperforms voomQWB slightly. It also provides groupspecific meanvariance plots that are a useful diagnostic in exploratory data analysis. voomByGroup, however, has some limitations related to its use of a subset of the design matrix and data—a necessary step to obtain distinct groupspecific shapes for the meanvariance trends. This means that in practice, the use of voomByGroup is most suitable for simple block designs with a single group factor only [36]. When there are additional explanatory variables, voomByGroup may not estimate covariates accurately or may run out of degrees of freedom when estimating coefficients for additional factors. For these complex experimental designs, voomQWB is ideal since it can handle the same complexity as goldstandard methods such as voom, but with the additional safeguard against heteroscedastic groups. One such example of this includes datasets that are collected over several batches. voomQWB can properly accommodate biological groups of interest and batch information into the linear modeling, while handling differences in group variability.
For experiments with very small group sizes, voomByGroup offers the option of applying the overall voom trend to specific groups rather than using the default groupspecific trend—this is specified in the dynamic argument of the function. Since voomByGroup estimates group variances using only the relevant samples, groupspecific meanvariance trends could be unstable when modeled using a limited number of samples. We recommend the application of an overall voom trend to groups of size 1 or 2.
In situations where there are individual samples with higher variability (outliers), the voomQWB and voomByGroup methods may work less well, with the inclusion of highly variable samples increases the estimated group variation, which may decrease power. In these situations, regular samplespecific modeling of variability (i.e., voomWithQualityWeights without specifying the var.group option) would be more appropriate. In our study, we did not explore datasets with outlier samples and leave such investigations as future work.
In this study, we also observed that edgeRbased methods returned relatively different results compared to voombased methods (Fig. 5a). One major source of this is the different distributional assumptions between methods. Due to the mathematical intractability of the NB distribution (basic distribution in edgeR) compared to the normal distribution, methods were first developed for modeling group heterogeneity in limma (e.g., voomWithQualityWeights), which assumes normally distributed data. When modeling data using a NBGLM, modeling groupwise variation is more challenging. An example of weighted regression in this context comes from Zhao et al. [37], who used observational weights to account for outlier observations.
In our article, we focus on DE analysis of scRNAseq pseudobulk data because recent benchmark studies have shown that it gives better results relative to analyzing scRNAseq data in its original nonaggregated form [13, 14, 38]. However, it is worth noting that by aggregating singlecell data to obtain pseudobulk samples, the variance between cells of the same sample is masked. Thus, it may be useful to check celllevel gene expression and its variability, especially for any genes that are detected as significant. To account for this, Zimmerman et al. modeled the correlation structure between cells using a generalized mixed model where individuals were assigned as a random effect [16]. A similar approach was taken in Crowell et al. [13]. In a similar way, linear mixed modeling may also be accessible by using the voomQWB method together with the duplicateCorrelation function in the limma package.
Whilst we apply group variance methods on pseudobulk samples in this article, the idea of modeling group variances more closely can in theory be extended to DE analysis of other data types such as bulk RNAseq data, pseudobulk of spatial scRNAseq data, and surface protein data from CITEseq. Moreover, the “groups” that are used by voomQWB or voomByGroup can be extended to cell types or clusters to find marker genes.
Conclusion
In the presence of group heteroscedasticity, the voomQWB and voomByGroup methods have superior performance to approaches that do not account for distinct groupspecific variation. These methods offer a better balance between false discovery control and the power to detect DE genes on pseudobulk scRNAseq datasets. We recommend both group variance modeling methods, with voomByGroup having accurate variance estimation for simple designs, and voomQWB capable of modeling data with more complex study designs. To guide an analyst’s choice of the most appropriate variance modeling method to apply to a given dataset, we recommend checking the relevant diagnostic plot to assess whether the model assumptions are met.
Methods
Revisiting variance modeling with voom
The group variance methods presented in this article, voomQWB and voomByGroup, are adaptations of voom method by Law et al. [18]. Briefly, the voom method fits a linear model to each gene using a design matrix with full column rank, X, such that
where \(y_{g}\) is a vector of logCPM values for gene g, and \(\beta\) is a vector of regression coefficients for gene g. The fitted model allows us to calculate residual standard deviations \(s_g\). Squareroot standard deviations \(\sqrt{s_g}\) are plotted against the average log count of each gene, and a LOWESS curve [39] is fitted to the points—this creates the voomstyle meanvariance plots seen throughout this article (Fig. 2b). Precision weights \(w_{gi}\) for gene g and sample i are then calculated as a function of the fitted counts \(\hat{\lambda }_{gi}\) using the LOWESS curve, such that \(w_{gi}=\text {lo}(\hat{\lambda }_{gi})^{4}\). The weights \(w_{gi}\) are then associated with logCPM values \(y_{gi}\) in the standard limma pipeline, which uses these in weighted least squares regression.
Group variance modeling with voomQWB
Written with outlier sample detection in mind, Liu et al. [40] combined samplespecific weights with the voom precision weights in their voomWithQualityWeights method. The combined weights, denoted as \(w^{*}_{gi}\), can be described as \(w^{*}_{gi}=w_{gi}/\)exp\(\hat{\gamma _{i}}\), where 1/exp\(\hat{\gamma _{i}}\) represents the samplespecific weights. The standard use of voomWithQualityWeights calculates samplespecific weights based on the similarity of gene expression profiles within groups, such that any sample that is dissimilar to the rest of the samples in the same group gets downweighted. In other words, samples within the same group can be assigned varying weights.
Instead, we force samples within the same group to have the same weight by exploiting the var.group argument in the voomWithQualityWeights function. A factor representing the groups group is assigned to var.group to obtain “blocked” weights for the samples. Visually, what this achieves is an adjustment of the standard voom curve to separate curves for each of the groups, where the adjustment is based on the blocked samplespecific weights (Fig. 2c). In practice, the blocked samplespecific weights are used to adjust the precision weights fed into the standard limma pipeline, such that \(w^{*}_{gi}\) rather than \(w_{gi}\) is used.
Group variance modeling with voomByGroup
Our second group variance method, voomByGroup, tackles heteroscedastic groups from a different angle. voomByGroup subsets the gene expression data and design matrix X for each group, such that a LOWESS curve is created using only the data from specific samples. The LOWESS curve is then used to obtain precision weights \(w_{gic}\) for gene g and sample i in a group (or condition) c. As a result, each group has its meanvariance curve and set of weights (Fig. 2dg). The groupspecific weights are combined across all groups, \(c = 1, 2, ... C\), to get \(w^{\#}_{gi}\) which replaces \(w_{gi}\) in the standard limma pipeline. Since voomByGroup subsets the data and the design matrix X to obtain precision weights, any additional covariates are estimated using only a subset of the data.
Additionally, voomByGroup offers an option to make the usage of overall voom trend instead of groupspecific trends, which is specified in the argument dynamic with the input as a vector of BOOLEAN variables. The dynamic is recommended to be turned on for groups with small group sizes, e.g., 2 or fewer samples in a group. For groups with relatively more samples (3 or more than 3), the dynamic can remain FALSE, which means that to estimate the variance, the groupspecific trends are still used.
Running variations of voom
The voom, voomByGroup, and voomQWB methods are run in R using the following functions:
voom(y, design=design, ...)
voomByGroup(y, design=design, group=group, ...)
voomWithQualityWeights(y, design=design, var.group=group, ...)
All functions are run similarly, with 2 common arguments and an additional argument for voomByGroup and voomWithQualityWeights. y represents pseudobulk count data with N samples and G genes. design is the design matrix with N rows matching the number of samples and P model parameters. group is a factor vector that is of length N.
Groupspecific meanvariance plots are produced in the voomByGroup function, by specifying plot=“separate” to get individual meanvariance plots for each group (Fig. 2df) or plot=“combine” to show all meanvariance curves in a single plot (Fig. 2g) which makes relative differences between the curves easier to spot. The common BCV values calculated using estimateCommonDisp function in edgeR are automatically added to the plots.
DE analysis with edgeR
Besides the standard voom method, two further options for DE analysis using edgeR, namely edgeR LRT (likelihoodratio test) [29] and edgeR QL (quasilikelihood) [41], were also evaluated. To run edgeR LRT, glmFit was used with default settings, and only the count matrix and design matrix were provided, followed by glmLRT. To run edgeR QL, glmQLFit was used with default settings, and only the count matrix and design matrix were provided, followed by glmQLFTest. All genes with associated pvalues from the DE test used were then extracted with the topTag function.
Simulated scRNAseq data
Singlecell genewise read counts were generated to follow correlated negative binomial distributions (Additional file 1: Fig. S2). Baseline expression frequencies were generated by the function edgeR::goodTuringProportions on reference data [42] (iTreg cells were used, and genes with UMI counts > 200 were kept). The expected library size for each cell is estimated using a lognormal distribution [43]. Parameters (location mu and scale sigma) are estimated based on the reference data as well, and they are then used to calculate expected library sizes. Then baseline proportions were multiplied by expected library sizes to generate expected read counts.
Read counts from the same subject were generated to be correlated using a copulamultivariate normal strategy. First, multivariate normal deviates were generated with the specified intrasubject correlation. Then, the normal deviates were transformed to gamma random variables by quantiletoquantile transformations to represent the “true” expression levels of each gene in each cell. Then, Poisson counts were generated with expected values specified by the gamma variates. Here the gamma deviates represent biological variation between subjects and cells while the Poisson counts represent technical variation associated with sequencing [29]. This process ensured that the counts follow marginal negative binomial distributions and that counts for each subject are correlated. Importantly, the intrasubject correlation affects only the biological part of the variation whereas the technical variation remains independent.
The relationship of the dispersion of aggregated cells to the dispersion of single cells is approximate:
where \(\phi _{agg}\) is the dispersion of aggregated pseudobulk data, correlation is the intrasubject correlation, and \(\phi _{sc}\) is the dispersion of singlecell data. N is calculated by \(N=\frac{(\sum L_{i})^{2}}{\sum L_{i}^{2}}\), where \(L_{i}\) is the cellwise library size. In the current study, intrablock correlation is set as 0.1 for all simulations.
Simulation scenarios
In this study, we simulate data for 4 scenarios: (1) groups having different biological variation, (2) no biological or technical variation between samples or groups, (3) samples having different cell numbers (this induces differences in technical variation by having unequal sample sizes), and (4) samples having different cell numbers and groups having different levels of biological variation. We generate 50 simulations for each scenario, with each simulation involving 12 samples (4 groups, with triplicate samples in each) and 10,000 genes. To induce differentially expressed genes in the datasets, 50 genes are randomly selected to be upregulated in each group with a true log\(_2\)foldchange of 2. This means that for every pairwise comparison, there are 100 true DE genes. For each simulated dataset, genes with fewer than 30 reads across all pseudobulk samples were filtered out before DE analysis.
In the first scenario (scenario 1), we keep the expected library size of each sample consistent by generating 250 cells for each sample. By varying the dispersion in our simulation, we obtain common BCV values that are variable between groups – 0.20, 0.22, 0.26, and 0.28.
The second scenario (scenario 2) generates homoscedastic groups, such that there should be no true differences (biological or technical) in group variability. We use the data here to confirm whether the methods behave in the way we would expect—that voomQWB and voomByGroup perform similarly to voom if there are no group differences. Here, each sample contains 250 cells and groups have BCV values of \(\approx\)0.22.
In the third scenario (scenario 3), the biological variation is consistent between groups (common BCV\(=\sim\)0.22), but the number of cells varies for each sample. With the baseline number of cells set as 250, the samples are adjusted by these proportions: 1:1:1, 1:0.8,0.8, 1:2:2, and 1:3:3. The expected library for each cell remains constant, such that a sample with more cells is expected to have a library size that is proportional to its cell number.
The fourth scenario (scenario 4) combines elements from the first and third scenarios. Biological variation is adjusted such that group 1 and group 2 have less biological variation (common BCV=\(\sim\)0.22), and group 3 and group 4 have relatively more biological variation (common BCV\(=\sim\)0.24). Samples have variable cell numbers—generated using the same baseline cell number and proportions as for our third simulation.
scRNAseq datasets
Publicly available scRNAseq datasets that were examined in this article in Fig. 1 include:

Whole lung tissue from 3month and 24monthold mice [30]. Pseudobulk data from type 2 pneumocytes were created. These data are available from GEO under accession number GSE124872.

Xenopus tail from regenerationcompetent and incompetent tadpoles, 1–3 days postamputation [31]. Pseudobulk data from goblet cells were created. The data is available in the scRNAseq package [44].

PBMCs from healthy controls and COVID19 patients of varying severity (asymptomatic, moderate, or severe) [32]. Pseudobulk data from CD56^{dim} CD16^{+} NK cells were created. These data are available from the CNGB Nucleotide Sequence Archive (CNSA) under accession number CNP0001250.

Human lung tissue from nonfibrotic and pulmonary fibrosis lungs [33]. Pseudobulk data from macrophage cells were created. These data are available from GEO under accession number GSE135893.
COVID19 datasets: PBMC1 and PBMC2
We examined two separate scRNAseq datasets that sequenced PBMCs from COVID19 patients with varying severity (asymptomatic, moderate, and severe) and healthy controls. We refer to the first dataset described above as “PBMC1”. The second dataset, which we refer to as “PBMC2” [35], is available from https://covid19cellatlas.org/.
Filtering, data normalization, and downstream analysis
Prior to creating pseudobulk samples, we performed filtering at the gene and celllevel. We then selected one cell type per dataset to create pseudobulk samples. We filtered out pseudobulk samples with relatively fewer cells or smaller library sizes before performing DE analysis (see Additional file 2: Table S1 for further details).
Normalization was then performed for each dataset using the trimmed mean of M values (TMM) method [45] using the calcNormFactors function.
The goana function in limma was used to carry out Gene Ontology (GO) analyses on DE results from the COVID19 NK cells, using the org.Hs.eg.db annotation package (version 3.14.0) [46].
Software
Results were generated using R version 4.1.3 [47], and software packages limma version 3.50.0, edgeR version 3.36.0, and ggplot version 3.3.5 [48].
Availability of data and materials
Datasets of whole lung tissue from 3month and 24monthold mice [30] are available from GEO under accession number GSE124872. Datasets of Xenopus tail from regenerationcompetent and incompetent tadpoles are available in the scRNAseq package [44]. Datasets of PBMCs from healthy controls and COVID19 patients of varying severity are available from the CNGB Nucleotide Sequence Archive (CNSA) under accession number CNP0001250 [32], and https://covid19cellatlas.org/ [35]. Human lung tissue from nonfibrotic and pulmonary fibrosis lungs [33] are available from GEO under accession number GSE135893.
Code for the voombased methods, simulations, and comparisons of DE analysis methods on the various datasets explored are available from GitHub at https://github.com/YOUk/voomByGroup [49]. A Zenodo repository for the source codes is available under doi: 10.5281/zenodo.7847793 [50]. The source code is licensed under an MIT License.
Change history
12 May 2023
A Correction to this paper has been published: https://doi.org/10.1186/s13059023029652
References
Chen X, Teichmann SA, Meyer KB. From tissues to cell types and back: singlecell gene expression analysis of tissue architecture. Ann Rev Biomed Data Sci. 2018;1:29–51.
Street K, Risso D, Fletcher RB, Das D, Ngai J, Yosef N, et al. Slingshot: cell lineage and pseudotime inference for singlecell transcriptomics. BMC Genomics. 2018;19(1):1–16.
Hou R, Denisenko E, Ong HT, Ramilowski JA, Forrest AR. Predicting celltocell communication networks using NATMI. Nat Commun. 2020;11(1):1–11.
Finak G, McDavid A, Yajima M, Deng J, Gersuk V, Shalek AK, et al. MAST: a flexible statistical framework for assessing transcriptional changes and characterizing heterogeneity in singlecell RNA sequencing data. Genome Biol. 2015;16(1):1–13.
Bacher R, Kendziorski C. Design and computational analysis of singlecell RNAsequencing experiments. Genome Biol. 2016;17(1):1–14.
Bartoschek M, Oskolkov N, Bocci M, Lövrot J, Larsson C, Sommarin M, et al. Spatially and functionally distinct subclasses of breast cancerassociated fibroblasts revealed by single cell RNA sequencing. Nat Commun. 2018;9(1):1–13.
Vu TN, Wills QF, Kalari KR, Niu N, Wang L, Rantalainen M, et al. BetaPoisson model for singlecell RNAseq data analyses. Bioinformatics. 2016;32(14):2128–35.
Miao Z, Deng K, Wang X, Zhang X. DEsingle for detecting three types of differential expression in singlecell RNAseq data. Bioinformatics. 2018;34(18):3223–4.
Mou T, Deng W, Gu F, Pawitan Y, Vu TN. Reproducibility of methods to detect differentially expressed genes from singlecell RNA sequencing. Front Genet. 2020;10:1331.
Jaakkola MK, Seyednasrollah F, Mehmood A, Elo LL. Comparison of methods to detect differentially expressed genes between singlecell populations. Brief Bioinforma. 2017;18(5):735–43.
Soneson C, Robinson MD. Bias, robustness and scalability in singlecell differential expression analysis. Nat Methods. 2018;15(4):255–61.
Tung PY, Blischak JD, Hsiao CJ, Knowles DA, Burnett JE, Pritchard JK, et al. Batch effects and the effective design of singlecell gene expression studies. Sci Rep. 2017;7(1):1–15.
Crowell HL, Soneson C, Germain PL, Calini D, Collin L, Raposo C, et al. Muscat detects subpopulationspecific state transitions from multisample multicondition singlecell transcriptomics data. Nat Commun. 2020;11(1):1–12.
Squair JW, Gautier M, Kathe C, Anderson MA, James ND, Hutson TH, et al. Confronting false discoveries in singlecell differential expression. Nat Commun. 2021;12(1):1–15.
Lun AT, Marioni JC. Overcoming confounding plate effects in differential expression analyses of singlecell RNAseq data. Biostatistics. 2017;18(3):451–64.
Zimmerman KD, Espeland MA, Langefeld CD. A practical solution to pseudoreplication bias in singlecell studies. Nat Commun. 2021;12(1):1–9.
Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W, et al. limma powers differential expression analyses for RNAsequencing and microarray studies. Nucleic Acids Res. 2015;43(7):e47–e47.
Law CW, Chen Y, Shi W, Smyth GK. voom: Precision weights unlock linear model analysis tools for RNAseq read counts. Genome Biol. 2014;15(2):1–17.
Robinson MD, McCarthy DJ, Smyth GK. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010;26(1):139–40.
Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNAseq data with DESeq2. Genome Biology. 2014;15(12):1–21.
Smyth GK. Linear models and empirical bayes methods for assessing differential expression in microarray experiments. Statistical applications in genetics and molecular biology. 2004;3(1):1–25.
Phipson B, Lee S, Majewski IJ, Alexander WS, Smyth GK. Robust hyperparameter estimation protects against hypervariable genes and improves power to detect differential expression. Ann Appl Stat. 2016;10(2):946–63.
Anders S, Huber W. Differential expression analysis for sequence count data. Genome Biol. 2010;11:R106.
Pal B, Chen Y, Vaillant F, Capaldo BD, Joyce R, Song X, et al. A singlecell RNA expression atlas of normal, preneoplastic and tumorigenic states in the human breast. EMBO J. 2021;40(11):e107333.
Yang K, Li J, Gao H. The impact of sample imbalance on identifying differentially expressed genes. BMC Bioinformatics. 2006;7(4):1–13.
Demissie M, Mascialino B, Calza S, Pawitan Y. Unequal group variances in microarray data analyses. Bioinformatics. 2008;24(9):1168–74.
Ran D, Daye ZJ. Gene expression variability and the analysis of largescale RNAseq studies with the MDSeq. Nucleic Acids Res. 2017;45(13):e127–e127.
Chen W, Li Y, Easton J, Finkelstein D, Wu G, Chen X. UMIcount modeling and differential expression analysis for singlecell RNA sequencing. Genome Biol. 2018;19(1):1–17.
McCarthy DJ, Chen Y, Smyth GK. Differential expression analysis of multifactor RNASeq experiments with respect to biological variation. Nucleic Acids Res. 2012;40(10):4288–97.
Angelidis I, Simon LM, Fernandez IE, Strunz M, Mayr CH, Greiffo FR, et al. An atlas of the aging lung mapped by single cell transcriptomics and deep tissue proteomics. Nat Commun. 2019;10(1):1–17.
Aztekin C, Hiscock T, Marioni J, Gurdon J, Simons B, Jullien J. Identification of a regenerationorganizing cell in the Xenopus tail. Science. 2019;364(6441):653–8.
Zhao XN, You Y, Cui XM, Gao HX, Wang GL, Zhang SB, et al. Singlecell immune profiling reveals distinct immune response in asymptomatic COVID19 patients. Signal Transduct Target Ther. 2021;6(1):1–11.
Habermann AC, Gutierrez AJ, Bui LT, Yahn SL, Winters NI, Calvi CL, et al. Singlecell RNA sequencing reveals profibrotic roles of distinct epithelial and mesenchymal lineages in pulmonary fibrosis. Sci Adv. 2020;6(28):eaba1972.
Stegle O, Teichmann SA, Marioni JC. Computational and analytical challenges in singlecell transcriptomics. Nat Rev Genet. 2015;16(3):133–45.
Stephenson E, Reynolds G, Botting RA, CaleroNieto FJ, Morgan MD, Tuong ZK, et al. Singlecell multiomics analysis of the immune response in COVID19. Nat Med. 2021;27(5):904–16.
Law CW, Zeglinski K, Dong X, Alhamdoosh M, Smyth GK, Ritchie ME. A guide to creating design matrices for gene expression experiments. F1000Research. 2020;9:1444.
Zhou X, Lindsay H, Robinson MD. Robustly detecting differential expression in RNA sequencing data using observation weights. Nucleic Acids Research. 2014;42(11);91.
Murphy AE, Skene NG. A balanced measure shows superior performance of pseudobulk methods in singlecell RNAsequencing analysis. Nat Commun. 2022;13(1):7851.
Cleveland WS. Robust locally weighted regression and smoothing scatterplots. J Am Stat Assoc. 1979;74(368):829–36.
Liu R, Holik AZ, Su S, Jansz N, Chen K, Leong HS, et al. Why weight? Modelling sample and observational level variability improves power in RNAseq analyses. Nucleic Acids Res. 2015;43(15):e97–e97.
Lund SP, Nettleton D, McCarthy DJ, Smyth GK. Detecting differential expression in RNAsequence data using quasilikelihood with shrunken dispersion estimates. Stat Appl Genet Mol Biol. 2012;11(5):1–44.
CanoGamez E, Soskic B, Roumeliotis TI, So E, Smyth DJ, Baldrighi M, et al. Singlecell transcriptomics identifies an effectorness gradient shaping the response of CD4+ T cells to cytokines. Nat Commun. 2020;11(1):1–15.
Zappia L, Phipson B, Oshlack A. Splatter: simulation of singlecell RNA sequencing data. Genome Biol. 2017;18(1):1–15.
Risso D, Cole M. scRNAseq: Collection of Public SingleCell RNASeq Datasets. 2020. R package version 2.14.0. https://bioconductor.org/packages/scRNAseq.
Robinson MD, Oshlack A. A scaling normalization method for differential expression analysis of RNAseq data. Genome Biol. 2010;11(3):1–9.
Carlson M. org.Hs.eg.db: Genome wide annotation for Human. 2021. R package version 3.14.0. https://bioconductor.org/packages/org.Hs.eg.db/.
R Core Team. R: A Language and Environment for Statistical Computing. Vienna: 2022. https://www.Rproject.org/. Accessed 3 May 2023.
Wickham H. ggplot2: Elegant Graphics for Data Analysis. :New York SpringerVerlag; 2016. https://ggplot2.tidyverse.org. Accessed 3 May 2023.
You Y. Scripts from the Modeling heteroscedastic groups in singlecell RNAseq data. 2023. Github. https://github.com/YOUk/voomByGroup. Accessed 3 May 2023.
You Y. Source code of function voomByGroup and associated scripts. 2023. Zenodo. https://doi.org/10.5281/zenodo.7847793. Accessed 3 May 2023.
Peer review information
Kevin Pang was the primary editor of this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.
Review history
The review history is available as Additional file 3.
Funding
This work was supported by the Chan Zuckerberg Initiative Essential Open Source Software for Science Program (Grant no. 2019207283 to G.K.S and C.W.L. and Grant no. 2021237445 to G.K.S) and the Chan Zuckerberg Initiative DAF, an advised fund of Silicon Valley Community Foundation (Grant No. 2019002443 to M.E.R.) and an Australian National Health and Medical Research Council (NHMRC) Investigator Grant (2017257 to M.E.R).
Author information
Authors and Affiliations
Contributions
Y.Y. performed the data analysis, generated all figures, and wrote the manuscript. C.W.L., X.D., and Y.Y. created new analysis methods and code. Y.K.W., M.J.M., and M.A. provided relevant human datasets and feedback on analysis results. G.K.S. created the framework for data simulation and advised on methods development. P.F.H., M.E.R., and C.W.L. designed the study. G.K.S., P.F.H., M.E.R., and C.W.L. supervised the analysis and wrote the manuscript. All authors read and approved the final manuscript.
Corresponding authors
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
The original version of this article was revised: You, Ritchie and Law are all corresponding authors for this article. The affiliation for Xueyi Dong has been corrected.
Supplementary information
Additional file 1:
Supplementary Figures.
Additional file 2:
Table S1. Summary of filtering steps applied to each dataset.
Additional file 3:
Review history.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
You, Y., Dong, X., Wee, Y.K. et al. Modeling group heteroscedasticity in singlecell RNAseq pseudobulk data. Genome Biol 24, 107 (2023). https://doi.org/10.1186/s13059023029492
Received:
Accepted:
Published:
DOI: https://doi.org/10.1186/s13059023029492
Keywords
 Pseudobulk scRNAseq
 Differential expression analysis
 Group heteroscedasticity