 Method
 Open Access
 Published:
SCEPTRE improves calibration and sensitivity in singlecell CRISPR screen analysis
Genome Biology volume 22, Article number: 344 (2021)
Abstract
Singlecell CRISPR screens are a promising biotechnology for mapping regulatory elements to target genes at genomewide scale. However, technical factors like sequencing depth impact not only expression measurement but also perturbation detection, creating a confounding effect. We demonstrate on two singlecell CRISPR screens how these challenges cause calibration issues. We propose SCEPTRE: analysis of singlecell perturbation screens via conditional resampling, which infers associations between perturbations and expression by resampling the former according to a working model for perturbation detection probability in each cell. SCEPTRE demonstrates very good calibration and sensitivity on CRISPR screen data, yielding hundreds of new regulatory relationships supported by orthogonal biological evidence.
Background
The noncoding genome plays a crucial role in human development and homeostasis: over 90% of loci implicated by GWAS in diseases lie in regions outside proteincoding exons [1]. Enhancers and silencers, segments of DNA that modulate the expression of a gene or genes in cis, harbor many or most of these noncoding trait loci. While millions of cisregulatory elements (CREs) have been nominated through biochemical annotations, the functional role of these CREs, including the genes that they target, remain essentially unknown [2]. A central challenge over the coming decade, therefore, is to unravel the cisregulatory landscape of the genome across various cell types and diseases.
Singlecell CRISPR screens (implemented by Perturbseq [3, 4], CROPseq [5], ECCITEseq [6], and other protocols) are among the most promising technologies for mapping CREs to their target genes at genomewide scale. Singlecell CRISPR screens pair CRISPR perturbations with singlecell sequencing to survey the effects of perturbations on cellular phenotypes, including the transcriptome. High multiplicity of infection (MOI) screens deliver dozens perturbations to each cell [7–9], enabling the interrogation of hundreds or thousands of CREs in a single experiment. Singlecell screens overcome the limitations of previous technologies for mapping CREs [9]: unlike eQTLs, singlecell screens are highresolution and can target rare variants, and unlike bulk screens, singlecell screens measure the impact of perturbations on the entire transcriptome.
Despite their promise, highMOI singlecell CRISPR screens pose significant statistical challenges. In particular, researchers have encountered substantial difficulties in calibrating tests of association between a CRISPR perturbation and the expression of a gene. Gasperini et al. [9] found considerable inflation in their negative binomial regressionbased pvalues for negative control perturbations. Similarly, Xie et al. [8] found an excess of falsepositive hits in their rankbased Virtual FACS analysis. Finally, Yang et al. [10] found that their permutationbased scMAGeCKRRA method deems almost all geneenhancer pairs significant in a reanalysis of the Gasperini et al. data. These works propose ad hoc fixes to improve calibration, but we argue that these adjustments are insufficient to address the issue. Miscalibrated pvalues can adversely impact the reliability of data analysis conclusions by creating excesses of falsepositive and falsenegative discoveries.
In this work, we make two contributions. We (i) elucidate core statistical challenges at play in highMOI singlecell CRISPR screens and (ii) present a novel analysis methodology to address them. We identify a key challenge that sets singlecell CRISPR screens apart from traditional differential expression experiments: the “treatment”—in this case the presence of a CRISPR perturbation in a given cell—is subject to measurement error [3, 11, 12]. In fact, underlying this measurement error are the same technical factors contributing to errors in the measurement of gene expression, including sequencing depth and batch effects. These technical factors therefore act as confounders, invalidating traditional nonparametric calibration approaches. On the other hand, parametric modeling of singlecell expression data is also fraught with unresolved difficulties.
To address these challenges, we propose SCEPTRE (analysis of SingleCEll PerTurbation screens via conditional REsampling; pronounced “scepter”). SCEPTRE is based on the conditional randomization test [13], a powerful and intuitive statistical methodology that, like parametric methods, enables simple confounder adjustment, and like nonparametric methods, is robust to expression model misspecification. We used SCEPTRE to analyze two recent, largescale, highMOI singlecell CRISPR screen experiments. SCEPTRE demonstrated excellent calibration and sensitivity on the data and revealed hundreds of new regulatory relationships, validated using a variety of orthogonal functional assays. In the “Discussion” section, we describe an independent work conducted in parallel to the current study in which we leveraged biobankscale GWAS data, singlecell CRISPR screens, and SCEPTRE to dissect the cis and trans effects of noncoding variants associated with blood diseases [14]. This work highlights what we see as a primary application of SCEPTRE: dissecting regulatory mechanisms underlying GWAS associations.
Results
Analysis challenges
We examined two recent singlecell CRISPR screen datasets — one produced by Gasperini et al. [9] and the other by Xie et al. [8] — that exemplify several of the analysis challenges in highMOI singlecell CRISPR screens. Gasperini et al. and Xie et al. used CRISPRi to perturb putative enhancers at high MOI in K562 cells. They sequenced polyadenylated gRNAs alongside the whole transcriptome and assigned perturbation identities to cells by thresholding the resulting gRNA UMI counts.
Both Gasperini et al. and Xie et al. encountered substantial difficulties in calibrating tests of association between candidate enhancers and genes. Gasperini et al. computed pvalues using a DESeq2 [15]inspired negative binomial regression analysis implemented in Monocle2 [16], and Xie et al. computed pvalues using Virtual FACS, a nonparametric method proposed by these authors. Gasperini et al. assessed calibration by pairing each of 50 nontargeting (or negative) control gRNAs with each proteincoding gene. These “null” pvalues exhibited inflation, deviating substantially from the expected uniform distribution (Fig. 1a, red). To assess the calibration of Virtual FACS in a similar manner, we constructed a set of in silico negative control pairs of genes and gRNAs on the Xie et al. data (see the “Methods” section). The resulting pvalues were likewise miscalibrated, with some pairs exhibiting strong conservative bias and others strong liberal bias (Fig. 1a, graygreen).
A core challenge in the analysis of singlecell CRISPR screens is the presence of confounders, technical factors that impact both gRNA detection probability and gene expression. The total number of gRNAs detected in a cell increases with the total number of mRNA UMIs detected in a cell (ρ=0.35,p<10^{−15} in Gasperini et al. data; ρ=0.25,p<10^{−15} in Xie et al. data; Figs. 1b, c). Technical covariates, such as sequencing depth and batch, induce a correlation between gRNA detection probability and gene expression, even in the absence of a regulatory relationship (Fig. 1d). This confounding effect can lead to severe test miscalibration and is especially problematic for traditional nonparametric approaches, which implicitly (and incorrectly) treat cells symmetrically with respect to confounders.
Parametric regression approaches, like negative binomial regression, are the most straightforward way to adjust for confounders. However, parametric methods rely heavily on correct model specification, a challenge in singlecell analysis given the heterogeneity and complexity of the count data. We hypothesized that inaccurate estimation of the negative binomial dispersion parameter was (in part) responsible for the pvalue inflation observed by Gasperini et al. Monocle2 estimates a raw dispersion for each gene, fits a parametric meandispersion relationship across genes, and finally collapses raw dispersion estimates onto this fitted line (Fig. 1e). We computed the deviation from uniformity of the negative control pvalues for each gene using the KolmogorovSmirnov (KS) test, represented by the color of each point in Fig. 1e. Circled genes had significantly miscalibrated pvalues based on a Bonferroni correction at level α=0.05. Genes significantly above the curve showed marked signs of pvalue inflation, suggesting model misspecification. Analysis challenges are summarized in Table 1.
Gasperini et al. and Xie et al. incorporated ad hoc adjustments into their analyses to remedy the observed calibration issues. On closer inspection, however, these efforts were not satisfactory to ensure reliability of the results. Gasperini et al. attempted to calibrate pvalues against the distribution negative control pvalues instead of the more standard uniform distribution. This adjustment lead to overcorrection for some geneenhancer pairs (false negatives) and undercorrection for others (false positives) (Fig. S1). Along similar lines Xie et al. compared their Virtual FACS pvalues to genespecific simulated null pvalues to produce “significance scores” that were used to determine significance. These significance scores were challenging to interpret and could not be subjected to multiple hypothesis testing correction procedures, as they are not pvalues.
Improvements to the negative binomial approach
We attempted to alleviate the miscalibration within the negative binomial regression framework by following the recommendations of Hafemeister and Satija, who recently proposed a strategy for parametric modeling of singlecell RNAseq data [17]. First, we abandoned the DESeq2style size factors of Monocle2 and instead corrected for sequencing depth by including it as a covariate in the negative binomial regression model. Second, we adopted a more flexible dispersion estimation procedure: we (i) computed raw dispersion estimates for each gene, (ii) regressed the raw dispersion estimates onto the mean gene expressions via kernel regression, and (iii) projected the raw dispersion estimates onto the fitted nonparametric regression curve.
We reanalyzed the Gasperini et al. and Xie et al. negative control data using the improved negative binomial regression approach. In addition to sequencing depth, we included as covariates in the regression model the total number of expressed genes per cell and the technical factors accounted for in the original analysis (total number of gRNAs detected per cell, percentage of transcripts mapped to mitochondrial genes, and sequencing batch). Improved negative binomial regression exhibited better calibration than Monocle regression on both Gasperini et al. and Xie et al. datasets. Still, improved negative binomial regression demonstrated clear pvalue inflation. We concluded that parametric count models likely are challenging to calibrate to highMOI singlecell CRISPR screen data.
SCEPTRE: analysis of singlecell perturbation screens via conditional resampling
To address the challenges identified above, we propose SCEPTRE, a methodology for singlecell CRISPR screen analysis based on the simple and powerful conditional randomization test [13] (Fig. 2). To test the association between a given gRNA and gene, we first fit the improved negative binomial statistic described above. This yields a zvalue, which typically would be compared to a standard normal null distribution based on the parametric negative binomial model. Instead, we build a null distribution for this statistic via conditional resampling. First, we estimate the probability that the gRNA will be detected in a given cell based on the cell’s technical factors, such as sequencing depth and batch. Next, we resample a large number of “null” datasets, holding gene expression and technical factors constant while redrawing gRNA assignment independently for each cell based on its fitted probability. We compute a negative binomial zvalue for each resampled dataset, resulting in an empirical null distribution (gray histogram in Fig. 2). Finally, we compute a left, right, or twotailed probability of the original zvalue under the empirical null distribution, yielding a wellcalibrated pvalue. This pvalue can deviate substantially from that obtained based on the standard normal (Fig. 2, Fig. S2). While we used a negative binomial regression test statistic for this work, SCEPTRE in principle is compatible with any test statistic that reasonably tracks the expression data, including, for example, statistics based on machine learning algorithms.
We leverage several computational accelerations to enable SCEPTRE to scale to large singlecell CRISPR screen datasets. First, we approximate the null histogram of the resampled test statistics using a skewt distribution to obtain precise pvalues based on a limited number of resamples (500 in the current implementation). Second, we employ statistical shortcuts that reduce the cost of each resample by a factor of about 100 (see the “Methods” section). Finally, we implement the method so that it can run in parallel on hundreds or thousands of processors on a computer cluster. (We used this approach in our independent study of noncoding blood trait GWAS loci [14].) We estimate that SCEPTRE can analyze 2.5 million genegRNA pairs on a dataset of 200,000 cells in a single day using 500 processors.
SCEPTRE demonstrates good calibration and sensitivity on real and simulated data
First, we investigated the calibration of SCEPTRE in a small, proofofconcept simulation study (Fig. 3a). We considered a class of negative binomial regression models with fixed dispersion and two technical covariates (sequencing depth and batch). We simulated expression data for a single gene in 1000 cells using four models selected from this class: the first with dispersion =1, the second with dispersion =0.2, the third with dispersion =5, and the last with dispersion =1, but with 25% zeroinflation. We also simulated negative control gRNA data using a logistic regression model with the same covariates as the gene expression model. We assessed the calibration of SCEPTRE and negative binomial regression across the four simulated datasets. To explore the impact of model misspecification on SCEPTRE and the negative binomial method (on which SCEPTRE relies), we fixed the dispersion of the negative binomial method to 1. The negative binomial method worked as expected when the model was correctly specified. However, negative binomial regression broke down in all three cases of model misspecification. SCEPTRE demonstrated good calibration in all settings.
Next, to assess the calibration of SCEPTRE on real data, we applied SCEPTRE to test the association between negative control gRNAs and genes in the Gasperini et al. data (Fig. 3b) and Xie et al. data (Fig. 3c). We compared SCEPTRE to Monocle regression and the improved negative binomial method. For the Xie et al. data, we also compared to Virtual FACS, the method originally applied to the data. SCEPTRE showed good calibration on both datasets; by contrast, Monocle regression and improved negative binomial regression demonstrated signs of severe pvalue inflation, while Virtual FACS exhibited a bimodal pvalue distribution peaked at 0 and 1.
SCEPTRE demonstrated modestly better calibration on the Gasperini et al. data than on the Xie et al. data. This likely is because the Gasperini et al. negative control pairs — which consisted of real, nontargeting gRNAs — were higherquality than the Xie et al. negative control pairs — which were constructed in silico using enhancertargeting gRNAs (see the “Methods” section). We reasoned that the Xie et al. negative controls carried mild regulatory signal, resulting in slight inflation of the SCEPTRE pvalues on these data relative to the Gasperini et al. data.
To assess the sensitivity of SCEPTRE, we applied SCEPTRE to test the 381 positive control pairs of genes and TSStargeting gRNAs assayed by Gasperini et al. (Fig. 3d). Allowing for the fact that the empirical correction employed by Gasperini et al. limited the accuracy of pvalues to about 10^{−6}, the SCEPTRE pvalues for the positive controls were highly significant, and in particular, almost always more significant than the original empirical pvalues, indicating greater sensitivity. Finally, we assessed the sensitivity of SCEPTRE on the Xie et al. data. Xie et al. conducted an arrayed CRISPR screen with bulk RNAseq readout of ARL15enh, a putative enhancer of gene ARL15. Both SCEPTRE and the bulk RNAseq differential expression analysis rejected ARL15 at an FDR of 0.1 after a BenjaminiHochberg (BH) correction, increasing our confidence in the calibration and sensitivity of SCEPTRE (Fig. 3e).
Analysis of candidate cisregulatory pairs
We applied SCEPTRE to test all candidate cisregulatory pairs in the Gasperini et al. (n=84,595) and Xie et al. (n=5,209) data. A given gene and gRNA were considered a “candidate pair” if the gRNA targeted a site within one Mb the gene’s TSS. SCEPTRE discovered 563 and 139 geneenhancer links at an FDR of 0.1 on the Gasperini et al. and Xie et al. data, respectively. We used several orthogonal assays to quantify the enrichment of SCEPTRE’s discovery set for regulatory biological signals, and we compared the SCEPTRE results to those of other methods.
SCEPTRE’s discovery set on the Gasperini et al. data was highly biologically plausible, and in particular, more enriched for biological signals of regulation than the original discovery set. Gasperini et al. discovered 470 genegRNA pairs at a reported FDR of 0.1. The SCEPTRE pvalues and original empirical pvalues diverged substantially: of the 670 geneenhancer pairs discovered by either method, SCEPTRE and the original method agreed on only 363, or 54% (Fig. 4a). Geneenhancer pairs discovered by SCEPTRE were physically closer (mean distance =65 kb) to each other than those discovered by the original method (mean distance =81 kb; Fig. 4b). Furthermore, SCEPTRE’s geneenhancer pairs fell within the same topologically associating domain (TAD) at a higher frequency (74%) than the original pairs (71%). Pairs within the same TAD showed similar levels of HIC interaction frequency across methods, despite the fact that SCEPTRE discovered 85 more sameTAD pairs (Fig. 4c). Finally, enhancers discovered by SCEPTRE showed improved enrichment across all eight celltype relevant ChIPseq targets reported by Gasperini et al. (Fig. 4d, Fig. S5a).
When we compared discoveries unique to SCEPTRE (n=200) against those unique to the original method (n=107), the disparities became more extreme (Fig. S3). For example, only 57% of pairs unique to the original method fell within the same TAD, compared to 73% unique to SCEPTRE. We concluded that many pairs in the Gasperini et al. discovery set likely were false positives. Finally, when we compared SCEPTRE to the improved negative binomial method (n = 824 discoveries), we observed even greater differences in discovery set quality in favor of SCEPTRE (Figs. 4b–d).
We highlight several especially interesting geneenhancer pairs discovered by SCEPTRE. Five discoveries (Fig. 4a, labels 1–5; Fig. 4e) were nominated as probable geneenhancer links by eQTL [18] and eRNA [19] pvalues in relevant tissue types. The SCEPTRE pvalues for these pairs were 1–2 orders of magnitude smaller than the original empirical pvalues, hinting at SCEPTRE’s greater sensitivity. Additionally, six pairs (Fig. 4a, blue triangles) were discovered by SCEPTRE but discarded as outliers by the original analysis, underscoring SCEPTRE’s ability to handle genes with arbitrary expression distributions.
We repeated the same orthogonal analyses for the SCEPTRE discoveries on the Xie et al. data, comparing SCEPTRE’s results to those of Xie et al. Xie et al.’s analysis method, Virtual FACS, outputted “significance scores” rather than pvalues (see the “Analysis challenges” section). Because significance scores cannot be subjected to multiple hypothesis testing correction procedures (like BH), we compared the top 139 Virtual FACS pairs (ranked by significance score) against the set of 139 (FDR =0.1) SCEPTRE discoveries (Fig. 5a; see the “Methods” section). Of the 195 pairs in either set, SCEPTRE and Virtual FACS agreed on only 83, or 43%. The SCEPTRE discoveries were more biologically plausible: compared to the Virtual FACS pairs, the SCEPTRE pairs were (i) physically closer (Fig. 5b), (ii) more likely to fall within the same TAD (Fig. 5c), (iii) more likely to interact when in the same TAD (Fig. 5c), and (iv) more enriched for all eight celltype relevant ChIPseq targets (Fig. 5d, Fig. S5b). When we examined the symmetric difference of the discovery sets, these differences became more pronounced (Fig. S4).
We additionally compared SCEPTRE to Monocle regression (n=180 discoveries) and improved negative binomial regression (n=156 discoveries) on the Xie et al. data. SCEPTRE uniformly dominated Monocle: SCEPTRE pairs were physically closer to one another (median distance = 44kb versus 110kb; Fig. 5b); SCEPTRE pairs interacted more frequently and were more likely to fall within the same TAD (68% versus 61%; Fig. 5c); and SCEPTRE pairs were more enriched for seven of eight cell typerelevant ChIPseq targets (one target, DP22, was a tie; Fig. 5d). Improved negative binomial regression was more competitive than Monocle across metrics (Fig. 5b–d). However, as noted earlier, improved negative binomial regression exhibited severe miscalibration on the negative control pairs (Fig. 3d), rendering its discovery set less reliable than that of SCEPTRE.
Gene expression level and sensitivity
To better understand when SCEPTRE works best, we investigated the impact of gene expression level on the sensitivity of SCEPTRE. We binned candidate geneenhancer pairs into nonoverlapping categories based on mean expression level of the gene. On both the Gasperini et al. and Xie et al. data, we found that candidate pairs containing a highly expressed gene were more likely to be rejected than candidate pairs containing a lowlyexpressed gene (Tables S1, S2), indicating SCEPTRE’s greater sensitivity for highly expressed genes. We observed similar trends for the other methods (not shown), consistent with the intuition that highlyexpressed genes carry more information.
Discussion
In this work we illustrated a variety of statistical challenges arising in the analysis of highMOI singlecell CRISPR screens, leaving existing methods (based on parametric expression models, permutations, or negative control data) vulnerable to miscalibration. To address these challenges, we developed SCEPTRE, a resampling method based on modeling the probability a given gRNA will be detected in a given cell, based on that cell’s technical factors. We found that SCEPTRE exhibited very good calibration despite the presence of confounding technical factors and misspecification of singlecell gene expression models. We implemented computational accelerations to bring the cost of the resamplingbased methodology down to well within an order of magnitude of the traditional negative binomial parametric approach, making it quite feasible to apply for largescale data. We used SCEPTRE to reanalyze the Gasperini et al. and Xie et al. data. While our analysis replicated many of their findings, it also clarified other relationships, removing a large set (>20% for Gasperini) of pairs that exhibited a weak relationship and adding an even larger set (>40% for Gasperini) of new, biologically plausible geneenhancer relationships. These links were supported by orthogonal evidence from eQTL, enhancer RNA coexpression, ChIPseq, and HIC data.
As an example application of SCEPTRE, we highlight STINGseq, a platform that we developed in parallel to the current work in an independent study [14]. STINGseq leverages biobankscale GWAS data and singlecell CRISPR screens to map noncoding, diseaseassociated variants at scale. First, we used statistical finemapping to identify a set of 88 putatively causal variants from 56 loci associated with quantitative blood traits. We perturbed the selected variants at high MOI in K562 cells using an improved CRISPRi platform and sequenced gRNAs and transcriptomes in individual cells using ECCITEseq [6], a protocol that enables the profiling of multiple modalities and the direct capture of gRNAs.
We then applied SCEPTRE to quantify associations between perturbations and changes in gene expression in cis (within 500 kb) and trans. SCEPTRE confidently mapped 37 noncoding variants to their cis target genes, in some cases identifying a causal variant among a set of candidate variants in strong LD. Nine variants were found to regulate a gene other than the closest gene, and four variants were found to regulate multiple genes, an apparent example of pleiotropy. Several perturbations lead to widespread changes in gene expression, illuminating transeffects networks. For example, two variants that were found to regulate the transcription factor GFI1B in cis altered the expression of hundreds of genes in trans upon perturbation; these differentially expressed genes were strongly enriched for GFI1B binding sites and blood disease GWAS hits. We concluded on the basis of this study SCEPTRE can power the systematic dissection regulatory networks underlying GWAS associations.
Despite these exciting results, key challenges remain in the analysis of singlecell CRISPR screens. Currently, SCEPTRE does not estimate the effect sizes of perturbations, disentangle interactions among perturbed regulatory elements [20, 21], or leverage information from offtargeting gRNAs to improve power. Such extensions could be implemented by harnessing more sophisticated, multivariate models of gRNA detection or applying methods for estimating variable importance in the presence of possibly misspecified models [22]. The statistical challenges that we identified in this study — specifying an accurate expression model and accounting for technical factors — and the solutions that we proposed — conditional resampling and massively parallel association testing — will help guide the development of future versions of SCEPTRE.
Conclusions
Singlecell CRISPR screens will play a key role in unraveling the regulatory architecture of the noncoding genome [23]. Technological improvements and methodological innovations will increase the scope, scale, and variety of theses screens over the coming years. For example, screens of candidate CREs could be extended to different, diseaserelevant cell types and tissues (although this remains a challenge); new combinatorial indexing strategies, such as scifiRNAseq, could enable the scalingup of such screens to millions of cells [24]; different CRISPR technologies, such as CRISPRa, could enable the activation, rather than repression, of candidate CREs, yielding new insights; and informationrich, multimodal singlecell readouts could strengthen conclusions drawn about regulatory relationships [25]. SCEPTRE is a flexible, robust, and efficient method: it has now successfully been applied to three singlecell CRISPR screen datasets, across two technologies (CROPseq and ECCITEseq), to map regulatory relationships both in cis and in trans. We expect SCEPTRE to facilitate the analysis of future singlecell screens of the noncoding genome, advancing understading of CREs and enabling the detailed interpretation of GWAS results.
Methods
Gasperini et al. and Xie et al. data
Gasperini et al. used CROPseq [5, 11] to transduce a library of CRISPR guide RNAs (gRNAs) into a population of 207,324 K562 cells expressing the Cas9KRAB repressive complex at a high multiplicity of infection. Each cell received an average of 28 perturbations. The gRNA library targeted 5779 candidate enhancers, 50 negative controls, and 381 TSStargeting positive controls. Xie et al. used Mosaicseq [7, 8] to perturb at a high multiplicity of infection 518 putative enhancers in a population of 106,670 Cas9KRABexpressing K562 cells. Each putative enhancer was perturbed in an average of 1276 cells.
Cis and in silico negative control pairs for Xie et al. data
We generated the set of candidate cis geneenhancer relationships on the Xie et al. data by pairing each proteincoding gene with each gRNA targeting a site within 1 Mb of the TSS of the gene. This procedure resulted in 3553 candidate cis geneenhancer links that we tested using SCEPTRE and Virtual FACS.
To generate the set of in silico negative control pairs for calibration assessment, we (i) identified gRNAs that targeted sites far (>1 Mb) from the TSSs of known transcription factor genes and (ii) paired these gRNAs with genes located on other chromosomes. We excluded all pairs containing genes known to be transcription factors, and we downsampled the pairs so that each gRNA was matched to 500 genes. The final in silico negative control set consisted of 84,500 pairs, the elements of which were not expected to exhibit a regulatory relationship.
Conditional randomization test
Consider a particular genegRNA pair. For each cell \(i = 1, \dots, n\), let X_{i}∈{0,1} indicate whether the gRNA was present in the cell, let \(Y_{i} \in \{0,1,2,\dots \}\) be the gene expression in the cell, defined as the number of unique molecular identifiers (UMIs) from this gene, and let \(Z_{i} \in \mathbb R^{d}\) be a list of celllevel technical factors. Letting \((X, Y, Z) = \{(X_{i}, Y_{i}, Z_{i})\}_{i = 1}^{n}\), consider any test statistic T(X,Y,Z) measuring the effect of the gRNA on the expression of the gene. The conditional randomization test [13] is based on resampling the gRNA indicators independently for each cell. Letting \(\pi _{i} = \mathbb P[X_{i} = 1  Z_{i}]\), define random variables
Then, the CRT pvalue is given by
This translates to repeatedly sampling \(\widetilde X\) from the distribution (1), recomputing the test statistic with X replaced by \(\widetilde X\), and defining the pvalue as the probability the resampled test statistic exceeds the original. Under the null hypothesis that the gRNA perturbation does not impact the cell (adjusting for technical factors), i.e., Y⊥ ⊥X∣Z, we obtain a valid pvalue (2), regardless of the expression distribution YX,Zand regardless of the test statistic T. We choose as a test statistic T the zscore of X_{i} obtained from a negative binomial regression of Y_{i} on X_{i} and Z_{i}:
where α is the dispersion. Following Hafemeister and Satjia [17], we estimate α by pooling dispersion information across genes, and we include sequencing depth as an entry in the vector of technical factors Z_{i} (see the “Improvements to the negative binomial approach” section).
Accelerations to the conditional randomization test
We implemented computational accelerations to the conditional randomization test. First, we employed the recently proposed [26] distillation technique to accelerate the recomputation of the negative binomial regression for each resample. The idea is to use a slightly modified test statistic, consisting of two steps:

1
Fit \((\widehat \beta _{0}, \widehat \gamma)\) from the negative binomial regression (3) except without the gRNA term:
$$ Y_{i} \overset{\text{ind}}\sim \text{NegBin}(\mu_{i}, \alpha); \quad \log(\mu_{i}) = \beta_{0} + Z_{i}^{T} \gamma. $$(4) 
2
Fit \(\widehat \beta \) from a negative binomial regression with the estimated contributions of Z_{i} from step 1 as offsets:
$$ Y_{i} \overset{\text{ind}}\sim \text{NegBin}(\mu_{i}, \alpha); \quad \log(\mu_{i}) = X_{i} \beta + \widehat \beta_{0} + Z_{i}^{T} \widehat \gamma. $$(5)
Conditional randomization testing with this twostep test statistic, which is nearly identical to the full negative binomial regression (3), is much faster. Indeed, since the first step is not a function of X_{i}, it remains the same for each resampled triple \((\widetilde X, Y, Z)\). Therefore, only the second step must be recomputed with each resample, and this step is faster because it involves only a univariate regression.
Next, we accelerated the second step above using the sparsity of the binary vector \((X_{1}, \dots, X_{n})\) (or a resample of it). To do so, we wrote the loglikelihood of the reduced negative binomial regression (5) as follows, denoting by ℓ(Y_{i}, log(μ_{i})) the negative binomial loglikelihood:
This simple calculation shows that, up to a constant that does not depend on β, the negative binomial loglikelihood corresponding to the model (5) is the same as that corresponding to the model with only intercept and offset term for those cells with a gRNA:
The above negative binomial regression is therefore equivalent to Eq. 5, but much faster to compute, because it involves much fewer cells. For example, in the Gasperini et al. data, each gRNA is observed in only about 1000 of the 200,000 total cells.
SCEPTRE methodology
In practice, we must estimate the gRNA probabilities π_{i} as well as the pvalue p_{CRT}. This is because usually we do not know the distribution XZ and cannot compute the conditional probability in Eq. 2 exactly. We propose to estimate π_{i} via logistic regression of X on Z, and to estimate p_{CRT} by resampling \(\widetilde X\) a large number of times and then fitting a skewt distribution to the resampling null distribution \(T(\widetilde X, Y, Z)X,Y,Z\). We outline SCEPTRE below:

1
Fit technical factor effects \((\widehat \beta _{0},\widehat \gamma)\) on gene expression using the negative binomial regression (4).

2
Extract a zscore z(X,Y,Z) from the reduced negative binomial regression (6).

3
Assume that
$$ X_{i} \overset{\text{ind}}\sim \text{Ber}(\pi_{i}); \quad \log\left(\frac{\pi_{i}}{1\pi_{i}}\right) = \tau_{0} + Z_{i}^{T} \tau $$(7)for \(\tau _{0} \in \mathbb R\) and \(\tau \in \mathbb R^{d}\), and fit \((\widehat \tau _{0}, \widehat \tau)\) via logistic regression of X on Z. Then, extract the fitted probabilities \(\widehat \pi _{i} = (1+\exp ((\widehat \tau _{0} + Z_{i}^{T} \widehat \tau)))^{1}\).

4
For \(b = 1, \dots, B\),

5
Fit a skewt distribution \(\widehat F_{\text {null}}\) to the resampled zscores \(\{z(\widetilde X^{b}, Y, Z)\}_{b = 1}^{B}\).

6
Return the pvalue \(\widehat p_{\text {SCEPTRE}} = \mathbb P[\widehat F_{\text {null}} \leq z(X,Y,Z)]\).
In our data analysis, we used B=500 resamples.
Numerical simulation to assess calibration
We simulated one gene Y_{i}, one gRNA X_{i}, and two confounders Z_{i1},Z_{i2} in n=1000 cells. We generated the confounders Z_{i1} and Z_{i2} by sampling with replacement the batch IDs and logtransformed sequencing depths of the cells in the Gasperini dataset. The batch ID confounder Z_{i1} was a binary variable, as the Gasperni data included two batches. Next, we drew the gRNA indicators X_{i} i.i.d. from the logistic regression model (7), with τ_{0}=−7,τ_{1}=−2, and τ_{2}=0.5. We selected these parameters to make the probability of gRNA occurrence about 0.04 across cells. Finally, we drew the gene expression Y_{i} from the following zeroinflated negative binomial model:
Note that gRNA presence or absence does not impact gene expression in this model. We set β_{0}=−2.5,β_{1}=−2,β_{2}=0.5 to make the average gene expression about 4 across cells. We generated the four datasets shown in Fig. 3a by setting the dispersion parameter α and the zero inflation rate parameter λ equal to the following values:
For the first, the negative binomial model is correctly specified. For the second and third, the dispersion estimate of 1 is too small and too large, respectively. The last setting exhibits zero inflation. We applied SCEPTRE and negative binomial regression to the four problem settings, each with n_{sim}=500 repetitions. The negative binomial method, and in turn SCEPTRE, was based on the z statistic from the Hafemeisterinspired negative binomial model (3) with α=1. We used B=500 resamples for SCEPTRE, the default choice.
scMAGeCK
scMAGeCKLR [10] is a method for high MOI singlecell CRISPR screen analysis. (A complimentary method, scMAGeCKRRA, is designed for the lowMOI setting.) scMAGeCKLR (henceforth scMAGeCK) uses a permutation test with ridge regression test statistic to compute pvalues for pairs of genes and gRNAs. Unfortunately, we were unable to apply scMAGeCK to the real data. First, we were unable to understand the documentation of the scMAGeCK software well enough to confidently apply the method. Second, scMAGeCK is prohibitively expensive to apply atscale. The authors of the original scMAGeCK study applied their method only to a small subset of pairs in the Gasperini et al. data. We could not meaningfully compare scMAGeCK to SCEPTRE on calibration and sensitivity metrics without applying scMAGeCK to the full set of gRNAgene pairs, which, to our knowledge, never has been done (and likely is infeasible).
To enable a simple comparison to scMAGeCK on the simulated data, we implemented a custom, inhouse version of scMAGeCK based on a careful examination of the scMAGeCK codebase and a close reading of the original paper. We view this custom implementation as a faithful interpretation of the method in the specialized onegene to oneNTC setting. We applied our implementation of scMAGeCK to the simulated data, using B=1,000 permutations, the default option. To reduce confusion, we reported the results of the scMAGeCK simulation study in the supplementary materials (Fig. S6) rather than the “Results” section. We could not apply our custom implementation of scMAGeCK to the real data, because the real data are significantly more complex than the simulated data. For example, the real data consist of many genes and gRNAs, and the gRNAs are differently typed (e.g., negative control, positive control, enhancertargeting, etc.), complicating the analysis considerably.
Definition of Gasperini et al. discovery set
Gasperini et al. reported a total of 664 geneenhancer pairs, identifying 470 of these as “highconfidence.” We chose to use the latter set, rather than the former, for all our comparisons. Gasperini et al. carried out their ChIPseq and HIC enrichment analyses only on the highconfidence discoveries, so for those comparisons we did the same. Furthermore, the 664 total geneenhancer pairs reported in the original analysis were the result of a BH FDR correction that included not only the candidate enhancers but also hundreds of positive controls. While Bonferroni corrections can only become more conservative when including more hypotheses, BH corrections are known to become anticonservative when extra positive controls are included [27]. To avoid this extra risk of false positives, we chose to use the “highconfidence” set throughout.
Xie et al. significance scores and discovery set
Xie et al. reported a local (or cis) discovery set, which consisted of genegRNA pairs with a significance score of greater than zero (see original manuscript for definition of “significance score” [8]; cutoff of zero arbitrary). This discovery set was not directly comparable to the SCEPTRE discovery set. First, the candidate set of cis genegRNA pairs tested by Xie et al. consisted of gRNAs within two Mb of a proteincoding gene or longnoncoding RNA. Our candidate cis set, by contrast, consisted of gRNAs within one Mb of a proteincoding gene. We defined our candidate cis set differently than Xie et al. to maintain consistency with Gasperini et al. Second, Xie et al. appear to have used a significantly more conservative threshold than Gasperini et al. in defining their discovery set, but this was challenging to ascertain given the impossibility of FDR correction on the significance scores. To enable a meaningful comparison between Virtual FACS and SCEPTRE, we therefore ranked the Virtual FACS pairs by their significance score and selected the top n pairs, where n was the size of the SCEPTRE discovery set at FDR 0.1.
ChIPseq, HIC enrichment analyses
ChIPseq and HIC enrichment analyses on the Gasperini et al. data (see Figs. 4e, f and S5a) were carried out almost exactly following Gasperini et al. The only change we made is in our quantification of the ChIPseq enrichment (Fig. 4f). We used the odds ratio of a candidate enhancer being paired to a gene, comparing the top and bottom ChIPseq quintiles. On the Xie et al. data, we binned the candidate enhancers into two (rather than five) quantiles due to the fewer number of candidate cis pairs. We computed odds ratios by comparing enhancers in the upper quantile to those that did not intersect a ChIPseq peak at all (Fig. S5b).
Availability of data and materials
Analysis results are available online at https://upenn.box.com/v/sceptrefilesv8. All analysis was performed on publicly available data. The Gasperini et al. CRISPR screen data [9] are available at www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE120861. The Xie et al. singlecell and bulk CRISPR screen data are available at www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE129837. The ChIPseq data are taken from the ENCODE project [30] and are available at www.encodeproject.org/. The HIC enrichment analysis is based on the data from Rao et al. [31], available at www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE63525. The eQTL and eRNA coexpression pvalues are taken from the GeneHancer, database [32] available as part of GeneCards (www.genecards.org/).
The sceptre R package is available at https://github.com/KatsevichLab/sceptre. Vignettes and tutorials are available at https://katsevichlab.github.io/sceptre/. The scripts used to run the analyses reported in this paper are available at https://github.com/KatsevichLab/sceptremanuscript. (permanent version deposited at Zenodo; DOI 10.5281/zenodo.5643541 [33]). All software is released under an MIT license.
References
 1
Gallagher MD, ChenPlotkin AS. The postGWAS era: from association to function. Am J Hum Genet. 2018; 102(5):717–30.
 2
Gasperini M, Tome JM, Shendure J. Towards a comprehensive catalogue of validated and targetlinked human enhancers. Nat Rev Genet. 2020; 21(5):292–310.
 3
Dixit A, Parnas O, Li B, Chen J, Fulco CP, JerbyArnon L, Marjanovic ND, Dionne D, Burks T, Raychowdhury R, Adamson B, Norman TM, Lander ES, Weissman JS, Friedman N, Regev A. PerturbSeq: dissecting molecular circuits with scalable singlecell RNA profiling of pooled genetic screens. Cell. 2016; 167:1853–66.
 4
Adamson B, Norman TM, Jost M, Cho MY, Nuñez JK, Chen Y, Villalta JE, Gilbert LA, Horlbeck MA, Hein MY, Pak RA, Gray AN, Gross CA, Dixit A, Parnas O, Regev A, Weissman JS. A multiplexed singlecell CRISPR screening platform enables systematic dissection of the unfolded protein response. Cell. 2016; 167(7):1867–82.
 5
Datlinger P, Rendeiro AF, Schmidl C, Krausgruber T, Traxler P, Klughammer J, Schuster LC, Kuchler A, Alpar D, Bock C. Pooled CRISPR screening with singlecell transcriptome readout. Nat Methods. 2017; 14(3):297–301.
 6
Mimitou EP, Cheng A, Montalbano A, Hao S, Stoeckius M, Legut M, Roush T, Herrera A, Papalexi E, Ouyang Z, Satija R, Sanjana NE, Koralov SB, Smibert P. Multiplexed detection of proteins, transcriptomes, clonotypes and CRISPR perturbations in single cells. Nat Methods. 2019; 16(5):409–12.
 7
Xie S, Duan J, Li B, Zhou P, Hon GC. Multiplexed engineering and analysis of combinatorial enhancer activity in single cells. Mol Cell. 2017; 66:85–299.
 8
Xie S, Armendariz D, Zhou P, Duan J, Hon GC. Global analysis of enhancer targets reveals convergent enhancerdriven regulatory modules. Cell Rep. 2019; 29(9):2570–8.
 9
Gasperini M, Hill AJ, McFalineFigueroa JL, Martin B, Kim S, Zhang MD, Jackson D, Leith A, Schreiber J, Noble WS, Trapnell C, Ahituv N, Shendure J. A genomewide framework for mapping gene regulation via cellular genetic screens. Cell. 2019; 176(12):377–90.
 10
Yang L, Zhu Y, Yu H, Chen S, Chu Y, Huang H, Zhang J, Li W. Linking genotypes with multiple phenotypes in singlecell CRISPR screens. Genome Biol. 2020;21(19).
 11
Hill AJ, McFalineFigueroa JL, Starita LM, Gasperini MJ, Matreyek KA, Packer J, Jackson D, Shendure J, Trapnell C. On the design of CRISPRbased singlecell molecular screens. Nat Methods. 2018; 15(4):271–4.
 12
Replogle JM, Norman TM, Xu A, Hussmann JA, Chen J, Cogan JZ, Meer EJ, Terry JM, Riordan DP, Srinivas N, Fiddes IT, Arthur JG, Alvarado LJ, Pfeiffer KA, Mikkelsen TS, Weissman JS, Adamson B. Combinatorial singlecell CRISPR screens by direct guide RNA capture and targeted sequencing. Nat Biotechnol. 2020; 38(8):954–61.
 13
Candes E, Fan Y, Janson L, Lv J. Panning for gold: ‘modelX’ knockoffs for high dimensional controlled variable selection. J R Stat Soc Ser B Stat Methodol. 2018; 80(3):551–77.
 14
Morris JA, Daniloski Z, Domingo J, Barry T, Ziosi M, Glinos DA, Hao S, Mimitou E, Smibert P, Roeder K, et al.Discovery of target genes and pathways of blood trait loci using pooled crispr screens and single cell rna sequencing. bioRxiv. 2021.
 15
Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNAseq data with DESeq2. Genome Biol. 2014; 15:550.
 16
Qiu X, Hill A, Packer J, Lin D, Ma YA, Trapnell C. Singlecell mRNA quantification and differential analysis with Census. Nat Methods. 2017; 14(3):309–15.
 17
Hafemeister C, Satija R. Normalization and variance stabilization of singlecell RNAseq data using regularized negative binomial regression. Genome Biol. 2019; 20(1):1–15.
 18
Ardlie KG, Deluca DS, Segrè AV, Sullivan TJ, Young TR, Gelfand ET, Trowbridge CA, Maller JB, Tukiainen T, et al. The GenotypeTissue Expression (GTEx) pilot analysis: multitissue gene regulation in humans. Science. 2015; 348(6235):648–60.
 19
Andersson R, Gebhard C, MiguelEscalada I, Hoof I, Bornholdt J, Boyd M, Chen Y, Zhao X, Schmidl C, Suzuki T, Ntini E, Arner E, Valen E, Li K, Schwarzfischer L, Glatz D, Raithel J, Lilje B, Rapin N, Bagger FO, Jørgensen M, Andersen PR, Bertin N, Rackham O, Burroughs AM, Baillie JK, Ishizu Y, Shimizu Y, Furuhata E, Maeda S, Negishi Y, Mungall CJ, Meehan TF, Lassmann T, Itoh M, Kawaji H, Kondo N, Kawai J, Lennartsson A, Daub CO, Heutink P, Hume DA, Jensen TH, Suzuki H, Hayashizaki Y, Müller F, Forrest ARR, Carninci P, Rehli M, Sandelin A. An atlas of active enhancers across human cell types and tissues. Nature. 2014; 507(7493):455–61.
 20
Zamanighomi M, Jain SS, Ito T, Pal D, Daley TP, Sellers WR. GEMINI: a variational Bayesian approach to identify genetic interactions from combinatorial CRISPR screens. Genome Biol. 2019; 20(1):1–10.
 21
Norman TM, Horlbeck MA, Replogle JM, Alex YG, Xu A, Jost M, Gilbert LA, Weissman JS. Exploring genetic interaction manifolds constructed from rich singlecell phenotypes. Science. 2019; 365(6455):786–93.
 22
Zhang L, Janson L. Floodgate : inference for modelfree variable importance. arXiv. 2020:1–67.
 23
Przybyla L, Gilbert LA. A new era in functional genomics screens. Nat Rev Genet. 2021:1–15.
 24
Datlinger P, Rendeiro AF, Boenke T, Senekowitsch M, Krausgruber T, Barreca D, Bock C. Ultrahighthroughput singlecell rna sequencing and perturbation screening with combinatorial fluidic indexing. Nat Methods. 2021; 18(6):635–42.
 25
Pierce SE, Granja JM, Greenleaf WJ. Highthroughput singlecell chromatin accessibility CRISPR screens enable unbiased identification of regulatory networks in cancer. Nat Commun. 2021;12(2969).
 26
Liu M, Katsevich E, Janson L, Ramdas A. Fast and Powerful Conditional Randomization Testing via Distillation. Biometrika. 2021;:asab039.
 27
Finner H, Roters M. On the false discovery rate and expected type I errors. Biom J. 2001; 43(8):985–1005.
 28
Towns J, Cockerill T, Dahan M, Foster I, Gaither K, Grimshaw A, Hazlewood V, Lathrop S, Lifka D, Peterson GD, Roskies R, Scott J, WilkinsDiehr N. XSEDE: accelerating scientific discovery. Comput Sci Eng. 2014; 16(05):62–74.
 29
Nystrom NA, Levine MJ, Roskies RZ, Scott JR. Bridges: a uniquely flexible HPC resource for new communities and data analytics. New York: ACM; 2015. pp. 1–8.
 30
Dunham I, Kundaje A, Aldred SF, Collins PJ, Davis CA, Doyle F, Epstein CB, Frietze S, Harrow J, Kaul R, Khatun J, et al. An integrated encyclopedia of DNA elements in the human genome. Nature. 2012; 489(7414):57–74.
 31
Rao SSP, Huntley MH, Durand NC, Stamenova EK, Bochkov ID, Robinson JT, Sanborn AL, Machol I, Omer AD, Lander ES, Aiden EL. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell. 2014; 159(7):1665–80.
 32
Fishilevich S, Nudel R, Rappaport N, Hadar R, Plaschkes I, Iny Stein T, Rosen N, Kohn A, Twik M, Safran M, Lancet D, Cohen D. GeneHancer: genomewide integration of enhancers and target genes in GeneCards. Database: J Biol Databases Curation. 2017; 2017:1–17.
 33
Barry T, Wang X, Morris JA, Roeder K, Katsevich E. Sceptre improves calibration and sensitivity in singlecell crispr screen analysis. Github. 2021. https://doi.org/katsevichlab.github.io/sceptre.
Acknowledgements
We are indebted to Molly Gasperini, Jacob Tome, and Andrew Hill for clarifying several aspects of their data analysis [9] and to the Shendure lab for providing extensive feedback on an earlier draft of this paper. We thank Shiqi Xie for providing guidance on using the Xie et al. 2019 data and Wei Li for a helpful discussion on scMAGeCK. Finally, we thank Tom Norman, Atray Dixit, and Wesley Tansey for useful discussions on singlecell CRISPR screens. This work was supported, in part, by National Institute of Mental Health (NIMH) grant R01MH123184 as well as SFARI Grant 575547. Part of the data analysis used the Extreme Science and Engineering Discovery Environment (XSEDE) [28], which is supported by National Science Foundation grant number ACI1548562. Specifically, it used the Bridges system [29], which is supported by NSF award number ACI1445606, at the Pittsburgh Supercomputing Center (PSC).
Peer review information
Kevin Pang was the primary editor of this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.
Review history
The review history is available as Additional file 2.
Author information
Affiliations
Contributions
Authors’ contributions
GK and KR identified the problem, conceived the research, and provided supervision. GK developed the method with input from TB and KR. TB and GK implemented the method with assistance from JM. TB, XW, and GK performed the analyses. JM provided guidance on interpretation and communication of results. TB and GK wrote the manuscript with input from all authors. The authors read and approved the final manuscript.
Authors’ information
Twitter handle: @EugeneKatsevich (Eugene Katsevich)
Corresponding author
Ethics declarations
Ethics approval and consent to participate
Ethical approval not applicable.
Competing interests
The authors declare that they have no competing financial interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
Additional file 1
Supplementary figures and tables
Additional file 2
Review history
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Barry, T., Wang, X., Morris, J.A. et al. SCEPTRE improves calibration and sensitivity in singlecell CRISPR screen analysis. Genome Biol 22, 344 (2021). https://doi.org/10.1186/s13059021025452
Received:
Accepted:
Published:
DOI: https://doi.org/10.1186/s13059021025452