ACE: a probabilistic model for characterizing gene-level essentiality in CRISPR screens

High-throughput CRISPR-Cas9 knockout screens are widely used to evaluate gene essentiality in cancer research. Here we introduce a probabilistic modeling framework, Analysis of CRISPR-based Essentiality (ACE), that accounts for multiple sources of variation in CRISPR-Cas9 screens and enables new statistical tests for essentiality. We show using simulations that ACE is effective at predicting both absolute and differential essentiality. When applied to publicly available data, ACE identifies known and novel candidates for genotype-specific essentiality, including RNA m6-A methyltransferases that exhibit enhanced essentiality in the presence of inactivating TP53 mutations. ACE provides a robust framework for identifying genes responsive to subtype-specific therapeutic targeting. Supplementary Information The online version contains supplementary material available at (10.1186/s13059-021-02491-z).

CRISPR-Cas9 Negative Selection Screen and Probabilistic Graphical Model. a The initial pool of Cas9-expressing cells is infected with a lentiviral sgRNA master library. The initial sgRNA abundances are measured by sequencing either the newly infected population of cells, the master library, or both. Cells are allowed to grow, and final sgRNA abundances are obtained by sequencing the surviving cells. b Graphical model illustrating how information is pooled across samples (s) and sgRNAs targeting each gene (g ∈ G) to infer each gene's essentiality, φ G . From top-left to bottom-right, m g represents the fraction of the master library consisting of sgRNA g; c s represents the number of cells in sample s (estimated by the user); n sg represents the number of cells initially infected with sgRNA g (such that n sg ≤ c s ); d sg represents the corresponding number of surviving cells after cell growth; and x sg and y sg represent the read counts obtained by sequencing g in the initial and surviving cells, respectively. The numbers of infected cells prior to (n sg ) and following (d sg ) growth are assumed to be related by the deterministic function d sg = f (n sg ; φ G , g ) = n sg (1 − g φ G ) (indicated by factor-graph notation). The efficiency g is determined by logistic regression and the gene essentiality values φ G are estimated by maximum likelihood (see Methods). The prior distribution for n sg and the sampling distributions for x sg and y sg are assumed to be Poisson. The scaling factors γ s and γ s accommodate global properties of sequencing depth and cell growth, and are estimated in pre-processing (see the " Methods" section). Shaded nodes represent variables observed in the data, unfilled nodes represent latent variables, and smaller solid circles represent free parameters The resulting genetic lesion often disrupts the function of the targeted region. The genetically modified cell is allowed to proliferate for a series of divisions, after which the effects on cell growth are measured by sequencing the sgRNA present in the final surviving population of cells. Lethal sgRNA constructs that have disrupted sites critical for normal cell growth will be underrepresented in the final population in contrast to "nonessential" control sgRNAs. Moreover, differential essentiality can be detected from different levels of sgRNA depletion between samples. Hundreds of different cell lines have now been assayed in publicly available CRISPR knockout screens, offering an excellent opportunity to compare gene essentiality between panels of cancer cell lines according to their oncogenic mutations.
Within the general paradigm of a CRISPR-Cas9 screen, several variations of experimental design are possible. For example, many experiments evaluate the depletion of cells infected with lethal sgRNA by sequencing both the initial infected and the final population of cells [14][15][16][17]. In contrast, other experiments streamline the process by sequencing only the initial pool of sgRNA constructs prior to infection [9,18,19], and using the sgRNA abundances in this "master library" as a proxy for the initial frequencies. However, the sgRNA abundances in the master library typically vary by several orders of magnitude, and each biological replicate represents a separate infection into a new pool of cells, making this proxy imperfect. Overall, the variation in experimental design necessitates a flexible analytical framework that can robustly account for different sources of experimental error.
Through deliberate selection of target cell lines, a CRISPR-Cas9 screen can reveal genes that are essential only in a specific oncogenic context, for example, in the presence of a loss-of-function mutation of an important tumor suppressor such as TP53. These context-dependent essential genes -sometimes called "non-oncogene addictions" -are of particular interest because they may offer druggable therapeutic targets even when irreversible loss-of-function mutations to oncogenes are present [20]. A better understanding of genotype-specific gene essentiality has the potential to enable patient-specific therapeutics that are tailored to mutations present at the time of diagnosis. Non-oncogene addiction has recently achieved prominence, for example, in the first round of chemical inhibitors for RNA epigenetic modifiers soon to enter phase I trials [21].
However, statistical tests for identifying such context-dependent essentiality require special care. Not only are they subject to reduced power because they require contrasting one subset of the data with another, but they can yield spurious results if the "test" and "control" subsets are not adequately matched in other respects. Thus, rigorous testing frameworks are needed that efficiently make use of the data and control for such biases.
Many of the statistical tests that have been applied to CRISPR-Cas9 screens rely on summary statistics such as average read counts across samples, or log-fold changes of read counts between the initial and final populations of cells. However, these summary statistics are incomplete descriptions of the raw data, leading to inevitable loss of power. In addition, these tests can be difficult to adapt to alternative experimental designs. For example, methods such as BAGEL, CRISPhieRmix, and JACKS rely upon the log-fold change of read abundances [1, 2, 8-10, 22, 23], but these log-fold changes may be calculated with respect to read counts obtained from either the initial infected population of cells or the master library, with potentially substantial impacts on the power of the tests. Another issue is that some methods reduce the estimation of gene essentiality to a binary classification of 'essential' or 'nonessential' genes [2,4,24], which can blur the differences between weak and strong effects. Finally, most methods do not test directly for differential essentiality within sample subtypes (see Additional file 1: Figure S1 for a comparison of methods).
In this paper, we introduce the Analysis of CRISPR-based Essentiality (ACE), a flexible probabilistic method that directly estimates gene-level essentiality values by maximum likelihood, accommodates variable experimental designs, and enables rigorous likelihood-based tests of both absolute and differential essentiality. The ACE model describes each experimental phase of the CRISPR screen -the master library infection, and the initial and final sequencing -as a separate probabilistic process forming a hierarchical model. The modularity of our framework permits the analysis of a variety of CRISPR-screen data sets while accounting for differences in experimental design. We validate ACE using both simulations and published sets of essential and nonessential genes. We then apply the method to publicly available data from Achilles DepMap [9,25] and demonstrate several compelling examples of differential essentiality in non-small cell lung cancer, including both known and novel cases.

Model overview
ACE makes use of a hierarchical model design to integrate information at the sample, gene, and individual sgRNA levels, with separate parameters at each level (see Fig. 1). The model is structured to account separately for the major sources of uncertainty at each stage of the experimental process. In addition, if samples are partitioned into "test" and "control" groups, ACE can estimate gene essentiality within each sample group separately, allowing for the detection of differential essentiality (discussed further below).
As illustrated in Fig. 1b, the structure of the ACE model reflects the experimental design of a CRISPR-Cas9 screen. The initial number of infected cells n sg , for each sgRNA g and sample (replicate) s, is assumed to be Poisson-distributed with an expected value given by the relative frequency of guide g in the master library, m g , scaled by the number of cells, c s , infected in the assay. Estimates of both m g and c s are provided by the user. The final number of infected cells, d sg , is then assumed to be given by a scaled version of the initial number, d sg = n sg (1 − g φ G ), where g ∈[ 0, 1] denotes the editing efficiency of sgRNA g and φ G ≤ 1 denotes the essentiality of gene G. These initial and final numbers of infected cells are not observed directly (they are latent variables) but are indirectly sampled via the sequencing process, resulting in observed read counts of x sg and y sg , respectively, which are provided by the user as inputs to the inference procedure. We assume Poisson-distributed read counts x sg and y sg , and sum over possible values of the latent variable n sg in the likelihood calculation (see the "Methods" section). Notably, the combination of the Poisson prior and the Poisson sampling distribution adequately captures the observed overdispersion in the read count data (see the "Discussion" section and Additional file 1: Figure S2). Independence is assumed across samples and replicates. The free parameters of the model are estimated by numerical maximization of the likelihood function (see Methods). The ACE software reports estimates of not only the essentiality of each gene, φ G , but also the efficiency of each sgRNA, g .
A value of φ G = 1 represents complete essentiality (driving d sg = n sg (1 − g φ G ) to zero in the case of efficient editing, g = 1), whereas a value of φ G = 0 represents complete nonessentiality (allowing d sg = n sg even when g > 0). Notably, however, our definition of φ G also allows for negative values, reflecting increased cell growth after inactivation of gene G. Such an increase might occur, for example, if a tumor suppressor gene were deactivated by guide RNAs. In practice, negative estimates of φ G are fairly frequently observed, either by random chance or from true increases in cell growth.
To pool information across sgRNAs and ensure identifiability of g and φ G , we determine g by logistic regression based on a user-defined set of features, and treat the regression coefficients only as free parameters (see the "Methods" section). The efficiency of a class of sgRNA has been shown to be strongly correlated with a number of genomic features, including local GC content, proximity to heterochromatin, number of base mismatches, and abundance of off-target sites [18,[26][27][28][29]. However, for simplicity and efficiency, we have used only the GC content of the sgRNA (one of the strongest predictors of efficiency) in our initial implementation. Nevertheless, ACE can easily be extended to consider additional features.
The two sample-specific scaling factors, γ s and γ s , determine the relationships between the numbers of infected cells and the expected sgRNA counts in the initial and final data sets, respectively. These parameters apply globally to all sgRNAs and all genes. Rather than including these parameters in the numerical optimization of the likelihood function, we find it adequate to estimate them in preprocessing based on a median-of-ratios normalization, as used in the widely used differential expression analysis tool, DESeq2 [5].
Notably, γ s is primarily a reflection of sequencing read depth, whereas γ s is determined by both read depth and sample-specific factors in cell growth (see the "Methods" section). In the case where the bulk distribution is strongly influenced by essential genes, as when a library has a large number of essential targets, γ s can alternatively be estimated from designated negative controls.

Differential essentiality
In the case of differential essentiality, ACE compares two (log) likelihoods for each gene G: a likelihood maximized under the constraint that the essentiality, φ G , is the same in designated "test" and "control" sample categories, and a likelihood that allows for different values of φ G in these two categories. These sample categories can be selected by the user to examine differences, for example, between cell lines with and without mutations in KRAS, or between lung-and liver-derived sample panels. The two per-gene likelihoods are then compared in a likelihood ratio test, with significance assessed by comparison with an empirical null distribution based on simulated data (see the "Methods" section). In this way, ACE performs a rigorous test of the alternative hypothesis that gene G exhibits different degrees of essentiality in the test and control categories against a null hypothesis of equal essentiality.

Simulation study
To evaluate the ability of ACE to infer sgRNA-, sample-, and gene-specific parameters, we devised a simulation method to generate artificial CRISPR-screen data sets. To ensure that our simulated data resemble data from real CRISPR screens as closely as possible, our simulator, called empiriCRISPR, mimics the variation in read counts observed in a reference experiment and makes minimal assumptions about the processes by which the data are generated (see the "Methods" section and Additional file 1: Figure S3). Importantly, empiriCRISPR is fully independent of the probabilistic model and parameter inference used in ACE.

Benchmarking of absolute essentiality predictions
Using empiriCRISPR, we generated data sets of 5250 genes in which the essentiality of each gene ranged from complete nonessentiality (φ G = 0) to complete essentiality (φ G = 1). To ensure a fair comparison with methods that rely on a median-based normalization method, we took care to include an adequate number of nonessential genes in each data set, simulating 3,150 genes with φ G = 0 (60% of all genes). The remaining 40% of genes consisted of 300 genes at each of several intermediate values of φ G and 600 genes at φ G = 0.99 (see Fig. 2A). For each choice of simulation parameters, three replicates were simulated with four sgRNAs targeting each gene and an average of 300 reads per sgRNA. As a template data set for empiriCRISPR, we used a collection of previously published genome-wide CRISPR screens in which both the master library and initial infected sgRNA abundances were sequenced along with the depleted samples [16]. The master library was generated through random selection of sgRNA from the template screen, followed by simulation of read counts according to the patterns of variation observed in the template.
We first compared ACE's estimates of essentiality with an estimator based on the average fold change (AFC) in read counts (see the "Methods" section), which has been used in several previous studies [17,30] (see Fig. 2A). We found that the ACE estimates are for 300 genes simulated in three replicates at each essentiality value shown (φ). Simulated data sets also included 3150 nonessential genes (φ G = 0; not shown), 300 of which were provided to each method as negative controls. Black horizontal bars indicate true values ( φ = φ). B Performance in binary classification (essential vs. nonessential) of simulations of 300 "essential" genes at various values of φ G and 300 "nonessential" genes. Performance reported as the Area under the Receiver Operating Characteristic (ROC) curve. Only initial and final read counts from the simulation were used for all methods. C Full ROC curves for each of the simulated essentiality values. "ACE" -our probabilistic method (thresholded on log likelihood ratios), "AFC" -method based on average fold changes in sgRNA abundance (thresholded on z-scores; see the "Methods" section). "BAGEL" -Bayesian Analysis of Gene Essentiality [2] (thresholded on reported Bayes Factors); "JACKS" -Joint analysis of CRISPR/Cas9 knockout Screens [8] (thresholded on reported p-values). "CRISPhieRmix"hierarchical mixture model [23] (thresholded on reported essential gene FDR) approximately unbiased and exhibit relatively low variance (see also Additional file 1: Figure S4). By contrast, the AFC-based estimates display a downward bias, particularly at low simulated values of φ G , and show substantially greater variance. Notably, the simple AFC-based method often yields negative estimates of φ G even when the true φ G is positive.
Many methods do not provide a numerical estimate of the essentiality of a gene, but instead focus on a binary classification of genes into "essential" and "nonessential" categories. To compare ACE with these methods, we simulated data sets containing 300 genes at each of several values of φ G , together with 3150 nonessential genes. For each of these experiments, we then compared the binary classification performance of ACE with BAGEL [1], JACKS [8], CRISPhieRmix [23], and a classifier based on simply thresholding the AFC statistic. We summarized the performance of all methods using receiver operating characteristic (ROC) curves and the area under the ROC curve (AUC) metric ( Fig. 2B and C). We found that most methods performed similarly well at high values of φ G , but in the harder scenarios where φ G takes moderate to low values (φ G < 0.5), JACKS and ACE performed significantly better than the others. JACKS, however, slightly outperformed ACE on this task, apparently because it makes particularly efficient use of the negative controls in defining a null distribution. CRISPhieRmix performed substantially worse than the other methods in this test.

Benchmarking of differential essentiality predictions
To test the detection of differential essentiality, we simulated CRISPR-screen data in two samples ("test" and "control") that included 300 genes essential only in the "test" sample with several different choices of φ G test (see the "Methods" section). To represent uniformly essential genes, both samples also included 300 genes at strong (φ G = 0.99) and 300 genes at moderate (φ G = 0.5) essentiality, as well as a large collection (3150) of nonessential genes (φ G = 0). Similar to the previous tests, we compared the performance of ACE with BAGEL, JACKS, CRISPhieRmix, and a simple AFC-based method in the detection of the differentially essential genes. Because the other methods do not support direct tests for differential essentiality, we devised heuristic test statistics based on the results of two separate tests for essentiality. After some experimentation, we settled on using the ratio of p-values for the test and control panels for JACKS, as there is only one "sample type" in our simulation. For CRISPhieRmix, we used the ratio of predicted false discovery rates between test and control panels, and for BAGEL, we used the absolute difference in the Bayes Factors for the test and control panels (see the "Methods" section).
We found that ACE's likelihood ratio test for differential essentiality did result in substantially better power than the competing methods, on average ( Fig. 3). Most methods had reasonably good power when φ G test was large (≥ 0.8) with JACKS and CRISPhieRmix performing somewhat more poorly than the others, but at values of φ G test ≤ 0.6, ACE showed a clear advantage, except for the weakest signal of differential essentiality (φ G test = 0.2), at which it was surpassed by CRISPhieRmix. Even when φ G test = 0.2, ACE was able to detect 67% of differentially essential genes (Fig. 3B).

Identification of essential genes
We next sought to evaluate the performance of ACE in detecting previously identified essential genes from real data. We focused on 688 genes that have been classified as essential in at least two previous studies [1][2][3], based on both RNAi and CRISPR screens [19]. For negative controls, we used 634 genes that are not widely expressed according to public RNA-seq data (see Methods). We used ACE to evaluate the essentiality of both gene sets using the CRISPR-screen data from Achilles DepMap. Notably, this data set includes sequence data representing only the sgRNA master library and the post-depletion sgRNA abundances. To ensure a similar genetic background between samples, we focused on 92 Fig. 3 Detection of differential essentiality in simulated data. A Performance in binary classification of differential essentiality for 300 genes that were "essential" (at various levels of simulated essentiality φ) in "test" samples and "nonessential" (φ = 0) in "control" samples. Both samples also included 300 genes at strong (φ G = 0.99) and 300 genes at moderate (φ G = 0.5) essentiality, as well as a large collection (3150) of nonessential genes (φ G = 0). 300 genes from each of these uniformly essential sets were used as negative controls. B Fractions of differentially essential genes correctly detected by ACE (at a Bonferroni-corrected empirical p < 0.05) at various values of simulated essentiality φ. C Full ROC curves for each of the simulated differential essentiality levels. In all cases, three replicates of the control and test samples were simulated for each value of φ. Methods compared are as described in Fig. 2 unique cell lines that all originate from lung adenocarcinoma, a subtype of non-small cell lung cancer (NSCLC).
Considering the many differences in methods and data sets, we found that ACE's characterization of these genes was fairly concordant with previous results. The genes previously identified as essential (positive controls) yielded both clearly elevated estimates of essentiality ( φ) and clearly elevated log likelihood ratios (LLRs) in a test for non-zero essentiality relative to the negative controls (Fig. 4A). At a LLR threshold defined such that only 5% of the negative controls exceeded it (i.e., with empirical p = 0.05), 90.5% of Fig. 4 Identification of known genotype-specific essentiality. A Estimates of essentiality per gene ( φ; x-axis) and essentiality log likelihood ratios (Ess. LLR; y-axis) from ACE based on genome-wide CRISPR-screen data from the Achilles DepMap Project. Results are shown for 688 "essential" genes (Pos. Controls; red) identified in references [1][2][3] and 634 nonessential genes (Neg. Controls; blue; see the "Methods" section for details). The LLR metrics are relative to a null hypothesis of φ = 0. B Estimated essentiality per gene in wild-type TP53 (φ; x-axis) vs. log likelihood ratio (LLR) for differential essentiality in wild-type vs. mutant TP53 based on DepMap data for NSCLC Adenocarcinoma [9]. Several high-scoring genes are highlighted with labels, including MDM2 (see text). C Top 10 genes identified as differentially essential in NSCLC samples having wildtype TP53, ordered by differential essentiality log likelihood ratio (Diff. Ess. LLR). D Top 10 genes identified as differentially essential in NSCLC samples having mutant KRAS. In panels A-D, the horizontal dashed lines indicate Bonferroni-corrected empirical p-values of 0.05. Genes highlighted in red in panels C and D are discussed in the text. E Ranking of positive (CDKN1A) and negative (MDM2) regulators of TP53 by differential essentiality test statistic across all prediction methods. Ranking is relative to 1171 uniformly essential or nonessential genes from our previously described negative and positive control sets; a subset of 50 from each of these sets were provided to methods as annotated controls. An illustration of MDM2 alone is shown in Additional file 1: Fig. S9. F Median ranking of 19 differentially expressed genes and their 14 associated genetic backgrounds across five tissue types from DepMap's Project Achilles. Ranking is relative to 500 uniformly essential and 500 uniformly nonessential genes from our previously described negative and positive control sets. Method performance by gene and tissue type is shown in Additional file 1: Figs. S11 and S12. Results from a one-sided test using expected direction of essentiality are shown in Additional file 1: Fig. S13 the positive controls were predicted to be essential. In an ROC analysis based on the same data, ACE was able to distinguish the positive and negative controls with high accuracy (AUC = 0.97; see Additional file 1: Figure S5). The outliers in the negative control population appear to be mis-classified as nonessential due to low expression levels, despite that they may actually have essential (e.g., DUX4) or growth-suppressing activity (e.g., POU1F1 [31]).

Identification of differentially essential genes
To evaluate ACE's ability to identify differential essentiality with real data, we applied it to genes evaluated in liver and lung tissue in the DepMap database. For our set of uniformly essential genes, we selected both a "uniformly active" set of 698 genes robustly expressed across lung-and liver-derived samples, and a "uniformly silent" set of 594 genes showing little or no evidence of expression across samples (see the "Methods" section). For the target set of differentially essential genes, we used as a proxy genes that exhibited a substantial difference in median expression (≥1.5 TPM) between tissue panels, excluding genes that were highly expressed (>5 TPM) in both panels (Additional file 1: Figure S6). We then predicted differential essentiality between the liver and lung samples using each of the computational methods under evaluation ( Figures S7 and S8).
We found that ACE performed roughly as well as the best methods on this task (AUC = 0.701, better than all but CRISPhieRmix [0.709] and the AFC-based method [0.705]). However, all methods performed, at best, only modestly well on this benchmark, likely owing to the disparity between gene expression levels and gene essentiality as evaluated by a knockout phenotype. The main outlier in this analysis was CERES, which showed considerably poorer performance than the other methods, possibly because it was optimized for uniformly essential and nonessential genes across all DepMap samples.

Detection of known cases of genotype-specific essentiality
One possible application of ACE's test for differential essentiality is to identify significant differences in essentiality between distinct genetic backgrounds. For example, cases of synthetic lethality may appear as genes that are essential in test samples in which another gene is mutated and nonessential in control samples in which that gene is intact. Conversely, other genes may be found to be essential in the presence of the wildtype but not the mutant genotype. To explore this possibility, we contrasted cell lines annotated by the Cancer Cell Line Encyclopedia (CCLE) with variants of two well-known cancer-related genes, the tumor suppressor TP53 and the proto-oncogene KRAS (see the "Methods" section) [32,33].
We restricted ourselves to NSCLC samples in these cases, to avoid spurious signals unrelated to the mutations of interest. Moreover, we structured our test specifically to identify genes that were significantly more (rather than less) essential in the test samples than in the control samples. As in the previous analysis, we used data from Achilles DepMap [9,25].
In the case of TP53, we first defined the test samples as ones with "wildtype" copies of TP53, with no annotated mutations, and the control samples as ones in which TP53 has "damaging" mutations (denoted here as TP53; see the "Methods" section). This test for a wildtype-dependent increase in essentiality yielded a clear outlier: the gene MDM2 ( Fig. 4B and C). Cell lines with wildtype TP53 require an intact copy of MDM2 to negatively regulate the TP53 protein and allow for cell proliferation [34,35]. The dependence of cell lines with wildtype TP on functional MDM2 observed by ACE therefore provides support for the biological validity of our test for differential essentiality.
In the case of KRAS, which becomes oncogenic in the presence of gain-of-function mutations and is known to drive aggressive tumorigenesis [36], we defined the samples in the opposite way: with test samples containing a missense mutation in KRAS ( KRAS), the majority of which are activating gain-of-function mutations [37], and control samples with no annotated KRAS SNP. Samples containing wildtype TP53 were additionally excluded to remove the signal from MDM2. As predicted, the KRAS gene itself emerged as having, by far, the strongest signal for a mutation-dependent increase in essentiality (Fig. 4D), further validating our method.
To compare computational methods in this setting, we applied all methods to MDM2 and CDKN1A, a negative and a positive regulator of TP53, respectively [38]. Again, we tested test samples having "wildtype" TP53 against control samples having TP53. Moreover, we examined the performance of all methods as a function of the number of samples per panel, quantifying performance using the ranking of each gene relative to 1171 uniformly positive and negative control genes. ACE consistently ranked both genes highly for differential essentiality, both in the low-sample panel regime, and as more samples were included ( Fig. 4E and Additional file 1: Figure S9). The other methods consistently ranked MDM2 highly, but most methods performed more poorly in the case of CDKN1A, likely because they are not designed to detect the 'negative essentiality' associated with the loss of positive regulator.
We found this trend holds across multiple tissues and genes (Fig. 4F), with ACE consistently ranking known differentially essential genes higher than uniform controls. In this case, we analyzed 19 known instances of differentially essential genes in cell-line panels derived from five different tissue types (see Additional file 1: Figures S10-S13). Even in the relatively noisy regime of three samples per panel, ACE was able to detect additional signals of differential essentiality.

Test for non-oncogene addictions in TP53 mutants
Finally, we swapped the "test" and "control" samples for TP53 -using TP53 samples for the "test" and wildtype samples for the "control" -with the goal of discovering novel nononcogene addictions. As discussed above, such addictions may suggest new drug targets in cases of undruggable loss-of-function mutations. This test of increased essentiality in the presence of damaging TP53 mutations identified several candidate genes ( Fig. 5A and Additional file 1: Figure S14).
Notably, the top candidates included two genes encoding RNA m 6 -A methyltransferases, METTL14, and METTL16. A third gene, METTL3, encoding the heterodimeric partner of METTL14 also scored highly for differential essentiality (Fig. 5B). When examined in CRISPR knockout screens performed by the Sanger Institute's Project Score, using a different master sgRNA library, both METTL14 and METTL3 were found to be essential only in the presence of "damaged" TP53 in lung samples (Fig. 5C). All three methyltransferases are expressed at a moderate to high level across the TP53-mutant cell line panel, according to CCLE expression data ( Fig. 5D and Additional file 1: Fig. S15) [25,32,33].
Several studies have noted CRISPR sgRNA cutting in multiple regions can reduce cell proliferation, regardless of the essentiality of the sites targeted. To ensure that the  ). B Estimated essentiality of RNA methyltransferase genes in TP53 samples (x-axis) vs. log likelihood ratio in TP53 vs. wild-type TP53 samples (y-axis). C Evaluation of methyltransferase gene essentiality in Project Score CRISPR screens. Only two unique lung-tissue-derived cell lines (3 replicates each) carry wildtype TP53 in the Project Score database, so only two cell lines with a "damaging" mutation in TP53 were selected for comparison using ACE. The cell lines chosen were based on our analysis in the Broad's Project Achilles, but the data used is generated by screening with a different CRISPR library by the Sanger Institute. The dashed line indicates the empirical p-value cutoff for p<0.05 based on negative controls. D Gene expression data from the CCLE, processed by the Achilles DepMap, for the TP53 cell line panel highlighting the differentially essential methyltransferase genes from panel (B). MDM2 is shown as a positive control and nonessential ZNF366 is shown as a negative control. The distribution of all genes is shown in gray signatures of essentiality in METTL3, METTL14, and METTL16 are not merely reflective of changes in copy number, we compared gene duplications between mutant and wildtype TP53 sample panels (Additional file 1: Fig. S16). Copy number variation was similar between the two panels, suggesting that differences in gene essentiality are driven by biological activity of the genes themselves. In addition, flanking genomic regions of METTL14 and METTL16 do not contain known oncogenic drivers, implying that CRISPR-induced lesions are unlikely to be simultaneously influencing a nearby essential gene (Additional file 1: Fig. S17). An additional analysis of Project Achilles liver samples revealed no significant differential essentiality, with little difference in absolute essentiality observed between samples with "damaged" or wildtype TP53 (Additional file 1: Fig.  S18). Overall, our findings suggest that METTL3, METTL14, and METTL16 are promising candidates for non-oncogene addiction in the absence of functional TP53 in lung NSCLC Adenocarcinoma.

Discussion
In this paper, we have introduced a new probabilistic modeling and inference method, called ACE, for detecting essential genes and estimating levels of essentiality from CRISPR negative selection screens. As we have shown, ACE uses a hierarchical model to account for the uncertainty associated with each major stage of a CRISPR screen and enable maximum-likelihood estimation of gene-level essentiality. It supports likelihood ratio tests for both absolute and differential essentiality. In comparison with other available computational methods, ACE appears to be particularly advantageous in cases of differential essentiality, moderate-to-weak essentiality, or limited numbers of samples. In addition, while we have not highlighted this feature in our analyses, the hierarchical structure of the ACE model makes it particularly well suited for integrating data across experiments, or across groups of genes. We anticipate that ACE's performance can be further improved by incorporating state-of-the-art models for sgRNA editing efficiency into the method.
While the number of known essential genes is growing, simulations remain critical in providing an objective gold standard to evaluate the performance of computational methods. To test ACE and compare it with other available methods, we developed a new simulator, called empiriCRISPR. Our strategy with empiriCRISPR was not to generate simulated data according to a process-based model -which would tend not to account for all of the sources of noise in real data, and would be difficult to define without heavily overlapping the assumptions of ACE itself -but instead to mimic the empirical properties of real data sets as closely as possible. In this way, we were able to generate 'authentic-looking' data sets of arbitrary size with known sets of essential genes at known levels of essentiality. We showed that ACE performs quite well on these data sets despite their realistic 'noise' characteristics.
To our knowledge, ACE is the first method to support a direct likelihood-based test for differential essentiality between designated 'test' and 'control' sample panels. This test can be useful in a variety of applications, including tests for differences in essentiality in the presence of mutant and wild-type versions of known cancer genes. In comparison to indirect tests for differential essentiality -derived from separate analyses of "test" and "control" panels -ACE appears to offer improved power according to our simulation study (see Fig. 3). Notably, it is straightforward to define one-sided variants of this test to detect increased or decreased essentiality in the "test" relative to "control" panels. The framework also extends naturally to tests of more complex hypotheses (e.g., test essentiality > 0 and control essentiality ≤ 0). These should all be well-powered tests, provided they are applied to sufficiently abundant data for the desirable asymptotic properties of likelihood ratio tests to become apparent.
Whereas most current statistical models for sequencing read counts allow for overdispersed counts, typically through the use of a negative binomial distribution, we settled on the use of the simpler Poisson distribution in this work. We initially experimented with a negative binomial distribution for read counts, allowing the gene-wise dispersion parameter to vary as a function of the mean, as in DEseq2 [5]. However, we found this version of the model computationally expensive to optimize and prone to over-fitting. The Poisson version of the model is simpler, more robust, and easier to fit to data. Moreover, our integration over possible values of the Poisson mean appears to be adequate to account for the major effects of overdispersion. In the end, we observe comparable performance between ACE and methods that use more sophisticated variance estimation methods.
As illustrated by the major revisions of "core essential gene" sets over the past 10 years, the definition of "essential genes" is inevitably somewhat subjective and arbitrary. In reality, essentiality occurs on a spectrum and depends on the particular genetic background, cell-type, and conditions tested. As a result, candidate "essential" genes require carefully designed experiments for validation, and researchers should be cautious about relying upon published sets of genes labeled "essential". Estimates of φ G in ACE are also limited in that they reflect an "average" essentiality across all samples analyzed, so users must take care to choose a panel of samples that ensures a level of diversity appropriate for the biological question of interest. For example, a diverse collection of tissue samples or cell lines may be most suitable for the identification of universally essential genes, whereas a set of patient-derived tissue samples may be best for personalized tumor therapeutics. Notably, because it is a relative metric, differential essentiality may in some cases generalize better than absolute essentiality. As illustrated in Additional file 1: Fig. S7, the imperfect correlation of gene expression with its knockout phenotype restricts the identification of truly differential genes to data such as CRISPR screens. We anticipate that findings of differential essentiality may be of interest to researchers exploring genetic vulnerabilities even in other tissues and cancer subtypes.
We have focused in this work on differential essentiality between previously defined groups of samples, but it is worth noting that ACE could be adapted to identify sample subgroups of interest de novo on the basis of differential essentiality. For example, one could iterate over all possible bipartitions of a set of samples and identify subgroups exhibiting the strongest signals of differential essentiality. If this approach proved too computationally intensive, candidate bipartitions could be identified based on precomputed individual gene estimates of essentiality. We anticipate that subtype discovery may be a promising area for future applications of ACE.
Among the genotype-dependent differentially essential genes identified by ACE, we found the RNA methyltransferase genes METTL3, METTL14, and METTL16 to be particularly interesting. RNA m 6 -A modifications have recently been found to be critical in the maintenance of leukemic cell growth [39]. These genes have been previously implicated in other cancer types, and RNA m 6 -A methyltransferases in general have been a topic of recent interest in the new field of cancer epitranscriptomics, with several phase I trials targeting METTL3 due to begin in 2021 [21]. METTL3, in particular, was recently identified as interacting with TP53 mRNA and inducing drug resistance in a biochemical assay in SW48 colon cancer cells [40]. Another study found METTL3 expression to be essential in NSCLC adenocarcinoma, although only two cell types were examined [41]. METTL3 mRNA knockdown with siRNA has also been shown, by M6A-seq, to result in an enrichment of methylation changes in genes in the p53-signaling pathway [42]. Finally, patients with genetic alterations of m 6 -A regulatory genes have been found to have an increase in TP53 mutations, albeit only with sample sizes of n = 5 [43]. Thus, there is abundant indirect evidence suggesting that these genes deserve further investigation as possible candidates for addiction in the absence of functional TP53. More generally, we anticipate that improved tests for differential essentiality, such as the ones introduced here, combined with rapidly growing CRISPR-screen data sets, will enable the detection of many new cases of genotype-dependent differences in essentiality, including both synthetic lethality and non-oncogene addiction.

Conclusions
In this paper, we have introduced the ACE modeling framework for CRISPR/Cas9 screens. We show that ACE is competitive with the best available methods in predicting essentiality, and in addition, supports a powerful new test for differential essentiality. In addition, we show using real data how this test for differential essentiality can uncover unique genetic dependencies that might otherwise be overlooked, focusing in particular on the case of a TP53-linked dependence on RNA methyltransferases in NSCLC lung samples. The ACE software is freely available for use by the scientific community.

Initial infection
The initial abundance n sg of each sgRNA g within sample s is assumed to be given by a Poisson distribution with mean μ 1 = c s m g , where c s is the total number of cells infected in the screen and m g is the relative frequency of g in the master library: P n sg |c s , m g = Pois n sg |μ 1 = c s m g .
( 1 ) The value m g for each sgRNA g is estimated as the fraction of sequence reads corresponding to g in the sequence data for the master library. If multiple replicates are available, an average is taken. The parameter c s is either provided by the user or a default value of 1000 is assumed.

Initial read counts
The sgRNA sequencing counts x sg from the initial infected population n sg are assumed to follow a Poisson distribution with mean μ 2 = γ s n sg , where γ s is a scaling parameter that captures the relationship between sequencing depth and the number of cells infected: P x sg |γ s , n sg = Pois x sg |μ 2 = γ s n sg .

Final read counts
Following several rounds of cell division, the final abundance of each sgRNA g is assumed to be d sg = n sg (1 − g φ G ), where φ G is the essentiality of the gene G that is targeted by g, and g is the efficiency of g. This process is assumed to be deterministic; we assume that the number of infected cells is sufficiently large and sufficiently many cell divisions have occurred that stochastic effects can be ignored, and uncertainty in the value of d sg can be absorbed by the distributions for n sg and y sg . The final read counts y sg are then assumed to follow a Poisson distribution with mean μ 3 = γ s d sg , where γ s is the final read count scaling factor: P y sg |d sg , γ s , g φ G = Pois y sg |μ 3 = γ s d sg = Pois y sg |μ 3 = γ s n sg (1 − g φ G ) .
The scaling factor γ s not only captures the relationship between sequencing depth and the number of cells that remain after cell growth, but also can accommodate samplespecific differences in the growth rate of all cells (independent of sgRNA).

Guide-specific effects
The sgRNA-specific efficiency g is derived from a general feature vector by logistic regression: where F g is a feature vector and ω is a corresponding vector of real-valued coefficients.
In this paper, we used a ten-dimensional vector F g whose elements are indicators (via "one-hot" encoding) for the decile of GC content of each sgRNA's template guide. The corresponding 10-dimensional weight vector ω was treated as a free parameter and estimated by maximum likelihood. In this way, we effectively estimate a separate efficiency value for each GC decile, based on pooled information from all guides within that decile. As noted in the text, users may instead wish to use richer feature vectors such as those described in other recent studies [26,[44][45][46][47][48].

Estimation of sample scaling parameters
Following the "median-of-ratios" approach used by DESeq2 [5], the sample scaling parameters γ s and γ s are not treated as free parameters in the maximization of the likelihood, but instead are set in a preprocessing step based on the raw read counts x sg and y sg for negative controls. Specifically, the scaling parameter is set equal to the median of the ratios of each sgRNA's read count to a guide-specific reference value, which in this case is the expected number of infected cells, c s m g : where g ∈ G neg indicates all sgRNAs targeting genes in the designated set of negative controls. If no master library is available, the initial abundance of sgRNAḡ is approximated from the initial read counts as the average across samples: where S is the total number of samples. The user may choose to use the median ratio across all genes rather than only the negative controls, if desired. This heuristic method of estimation may prove insufficient for some experimental cases, such as misclassified negative controls or strong, sample-specific multipliers of essentiality affecting all genes in experiments with few samples, though we found these estimates satisfactory for the analyses and benchmarking presented in this work.

Full model likelihood
Substituting from Eqs. 1 to 3, the full likelihood function can be expressed as: Pois n sg |c s m g Pois x sg |γ s n sg × Pois y sg |γ s n sg (1 − g φ G ) where g is implicitly a function of ω as defined by Eq. 4. Here and below we use "dot" notation to indicate sets of all relevant indices (e.g., x .. = {x sg | for all samples s and sgRNAs g}).
The likelihood function must sum over the number of infected cells n sg , which cannot be directly observed. This summation step can be accelerated by an approximation that bundles summands together, for example, into groups of 10 cells.
When ω is held fixed, the full likelihood decomposes by gene, such that each φ G can be estimated separately, and similarly, when the φ G values are held fixed, the likelihood decomposes by the efficiency features f, owing to our "one-hot" design for the feature vector. Thus, we estimate each φ G and component of ω f iteratively, to enable parallelization by gene and feature category. Specifically, we repeatedly carry out the following optimizations until convergence (for each gene G and feature f, respectively): where φ ¬G denotes the set {φ g | g = G}, ω ¬f denotes the set {ω f | f = f }, and θ denotes the remaining parameter set. We initialize ω to treat all sgRNA as 100% effective.
Finally, to evaluate whether an estimated value of φ G is significantly different from zero, we compare the component of the maximized likelihood function for that gene to the corresponding component when φ G is held fixed at zero. Specifically, we compute a test statistic T as, where L(φ G ) represents the component of the likelihood corresponding to gene G.
We compute an empirical p-value for T based on the equivalent test statistics for the designated negative control genes.

Adaptation for alternative experimental designs
The likelihood function can easily be adapted to settings in which data is unavailable for either the master library or the initial sgRNA abundances. In the case where the master library information is missing, we simply use a uniform prior for values of n sg , P(n sg |c s ) = 1 c s . When the initial infected sgRNA abundances are unavailable, we remove the corresponding portion of the likelihood function, effectively treating x .. as missing data: Pois n sg |c s m g × Pois y sg |γ s n sg (

ACE test for differential essentiality
To identify genes with significant differences in essentiality between a "test" set of samples, S t , and a "control" set, S c , we employ the following likelihood ratio test. Let L( φ G ) represent the component of the likelihood function relevant to gene G that has been maximized in the standard way, with one version of the φ G parameter shared across both S t and S c . By contrast, let L( φ t G )L( φ c G ) represent the corresponding component of the likelihood optimized such that separate versions of the essentiality parameter, φ t G and φ c G , are used for samples S t and S c , respectively. The likelihood ratio test statistic T is then given by, . (11) For this test, all samples are used to calculate the sample-specific parameters γ s and γ s , and the sgRNA-specific parameter g ; only φ t G and φ c G are re-estimated to compute T . As with the test for absolute essentiality, an empirical p-value is computed for T based on the distribution of values computed for the negative controls. In some analyses, we performed a one-sided test by further requiring that φ t G > φ c G and otherwise setting T to 0.

Essential and nonessential controls
The set of essential genes used for validation was defined as those genes that were included in at least two of three previously published sets of essential genes [1][2][3]. The set of nonessential genes was obtained from 604 total RNA-seq datasets for 128 cell lines, in vitro differentiated cells, primary cells, and tissues accessed from the Encyclopedia of DNA Elements (ENCODE) data portal on August 29, 2019 (see Additional file 2 for accession info of all cell lines used). For the nonessential set, we chose 594 genes expressed at ≤1 TPM in all cell lines.

Data from cancer dependency map (DepMap)
For our analysis of real data from genome-wide CRISPR knockout screens, we used two large databases available from the Cancer Dependency Map (DepMap) Consortium: Project Achilles, produced by the Broad Institute [9], and Project Score, produced by the Sanger Institute [49,50]. The Achilles Project dataset uses the Avana library and includes 1374 publicly available screens in 625 unique cell lines at the time of writing. Only the pre-infection master library and the post-depletion sgRNAs were sequenced, with one to four replicates per cell line available. Following the Achilles DepMap convention [51], we assume a gene has lost function (indicated as ) in all CCLE cell lines annotated as containing one or more of the following: single nucleotide polymorphisms (SNPs), deletions, or insertions in the start codon; deletions or insertions that introduce a frameshift mutation; mutations that introduce a premature stop codon or a de novo frameshifted start codon; or mutations that disrupt a splice site [32,33]. For benchmarking with differential essential genes, we took three different approaches. For the first, we relied upon gene expression from the previously described ENCODE RNA-seq data sets to nominate our differentially essential gene set. The median expression of each gene was evaluated separately across the 206 unique lung samples and the 24 unique liver samples, with a total of 42 liver replicates and 42 lung replicates subsampled for data in the final analysis. 494 genes with a median expression difference ≥1.5 TPM were chosen to represent differentially essential genes (see Additional file 1: Fig.  S6). Genes with high levels of expression in both tissue panels (>5 TPM) were excluded in an effort to isolate genes essential to only one tissue type. No genes were "perfectly" differential, with no expression in all samples of one panel, and detectable expression in all samples of the other. Consequently, by the imperfect proxy of gene expression, there is overlap in the by-sample essentiality of our "differentially essential" gene sets.
The uniformly active gene set is composed of 698 genes with expression levels >5 TPM in all lung-and liver-derived samples evaluated, according to previously published gene expression data for these cell lines. The uniformly silent gene set contains 594 genes with expression <1 TPM across all samples; 42 of these genes were randomly selected and provided as annotated negative controls to all essentiality inference methods save CERES, for which by-sample precomputed scores were used (see below).
For our second benchmarking analysis in real data, we focused on two known differentially essential genes, CDKN1A and MDM2, and randomly selected the various numbers of samples compared. To retain a similar genetic background between sample panels, samples were compared from the same tissue type and lineage, specifically cell lines originating from lung non-small cell lung cancer (NSCLC) adenocarcinoma. The Achilles Project dataset includes 220 replicates of 92 unique samples of these cell lines.
For our third benchmarking analysis in real data, we randomly selected three samples for each panel from cell lines across five tissues of origin. In this case, no replicates were used. We chose 19 differentially essential genes and their associated genetic backgrounds according to the RNAi-based findings of Project Drive [38], assuming differential essentiality to be universal across all cell lines. We subsampled 500 negative and positive controls from the previously mentioned control gene sets and contrasted them with the rankings of the differentially essential genes according to their test statistics. In addition, we randomly selected 50 examples from each control set and provided them as annotated controls to all essentiality inference methods save CERES, for which by-sample precomputed scores were used (see below).
The Project Score dataset has assayed 323 human cancer cell lines using the Human CRISPR Library [19]. For our test of RNA methyltransferase differential essentiality, we compared the essentiality signals of the six available screens in cell lines having wildtype TP53 (cell lines LU998 and H322M) with those from six screens from two of the cell lines with non-functional TP53 (LXF289 and H1650).

empiriCRISPR simulation of CRISPR knockout screens
We designed the empiriCRISPR screen simulator to mimic the empirical properties of a template data set as closely as possible, while minimizing shared properties with our ACE inference model. empiriCRISPR simulates fractional sgRNA abundance sampled from a template screen and uses a series of gamma distributions to generate initial and final screen counts, x . . and y .. , as follows: x .. = X . Gamma(α .. , β .. ), where X . and Y . are each sample's target total read counts in the initial and final sequencing and m . is the abundance of each guide in the simulation's template master library. The parameters . and φ . are the sgRNA efficiency and gene essentiality chosen for the simulation. empiriCRISPR simulates each sgRNA independently using α and β parameters derived from empirical estimates of variation from infection, initial, and final sequencing (σ m , σ x , and σ y , respectively), fit across replicates of the provided template screen. empiri-CRISPR calculates the gamma distribution parameters by minimizing the adjusted mean squared error of the summary statistics of simulated data with the median of 5 summary statistics of the template screens, shown in Additional file 1: Fig. S3. The actual variation of sgRNA abundances within a sample is wholly dependent on the number of sgRNA and template master library chosen.
This implementation of empiriCRISPR is independent of the probabilistic model and parameter inference used in ACE, save for the general structure of the hierarchical model based on CRISPR screens. While the read counts in empiriCRISPR are derived from flexible gamma distributions designed to capture as much of the biological variation of template CRISPR screens as possible, ACE makes no attempt to infer specific variation parameters. CRISPR knockout screens from [16] were used as the empiriCRISPR template, and initial master library abundances were selected by sampling the template master sgRNA library. Unless otherwise noted, simulations used 300 genes per essentiality value, four sgRNA per gene, and three samples, and assumed a constant essentiality φ G across all sgRNA and samples for gene G.

Software package
ACE is implemented as an R package (R 3.5.0, [52]) that uses data.table [53] and R6 [54]. Optimization is performed using R's optim function with the "Brent" method. Both the ACE and empiriCRISPR github repositories can be found at https://github.com/ CshlSiepelLab. ACE can be run across all genes in parallel. Optimization takes several seconds per gene (see Additional file 1: Fig. S19). The run-time of the software scales approximately linearly with the number of samples included.

Estimation of essentiality from mean log-fold change
The "average fold change" ("AFC") score for each gene G was calculated as follows: where a pseudocount of 0.5 was added to both initial (x sg ) and final (y sg ) read counts to prevent undefined values. Each sample was normalized by λ s to bring the total read count of each sample to a total of 10 million reads, and final fold changes were adjusted according to τ neg , the mean fold change in all negative controls G neg . Consistent with our ACE notation, an AFC value of zero has no impact on cell growth or proliferation.

Estimation of essentiality using BAGEL
We applied the BAGEL (Bayesian Analysis of Gene Essentiality) method [1] using software version 0.91 (last modified 09/2015). Simulated data was analyzed using 4% of genes as positive and negative controls, with simulated essentiality scores of 0.99 (99% depletion) and 0.05 (5% depletion) respectively. Differential essentiality was calculated as the difference of Bayes Factors between separate analysis of test and control sample panels, as performed in [1]. For the analysis of real data, we used all genes identified as essential or nonessential in our earlier analysis.

Estimation of essentiality using JACKS
We applied the JACKS (Joint Analysis of CRISPR/Cas9 Knock-out Screens) method [8] version 0.2 (downloaded 11/2019). Differential essentiality was evaluated by running JACKS separately for the test and control sample panels, using the built-in hierarchical prior option and the same negative control genes provided to other methods for both real and simulated data. The test statistic for differential essentiality was indicated by the ratio in sample-specific p-value in the case of only one sample type (as in simulated data in Figs. 2 and 3), and by a Student's t-test in the case of panels containing multiple sample types (as in Fig. 4E). While JACKS also reports gene-knockout effects, they are not normalized by sample [49], and the p-values used reflect the sample-dependent null distributions inferred by JACKS. For Fig. 4E and Additional file 1: Fig. S9, a one-sided t-test of the appropriate direction was separately performed for the statistics of MDM2 and CDKN1A, as CDKN1A is a tumor suppressor with a "negative" gene essentiality. In the case of bulk gene analysis in Additional file 1: Figs. S7 and S8, a two-sided t-test was used.

Estimation of essentiality using CRISPhieRmix
We used the CRISPhieRmix software version 0.1.0, using 100 points of integration (parameter "nMesh") and the "BIMODAL" testing option. While the software is optimized for use in large pooled CRISPRi and CRISPRa screens, it may also be applied to knockout screens as demonstrated by the authors in [23]. Differential essentiality was estimated by separately running CRISPhieRmix on test and control sample panels, and the test statistic determined by the ratio of the gene essentiality false discovery rates between the two panels.

Estimation of essentiality using CERES
Previously published CERES-based p-values were obtained for each Achilles DepMap sample used in the methods comparison. As in ref. [9], the CERES differential essentiality test statistic was calculated using a Student's t-test to compare the by-sample log fold change effect sizes of the test and control sample panels. For Fig. 4E and Additional file 1: Fig. S9, a one-sided t-test of the appropriate direction was separately performed for the statistics of MDM2 and CDKN1A, as CDKN1A is a tumor suppressor with a "negative" gene essentiality. In the case of bulk gene analysis in Additional file 1: Figs. S7 and S8, a two-sided t-test was used.

Additional file 1: Supplemental figures and analysis.
Additional file 2: Accession numbers of RNAseq assays used in nonessential and tissue-specific gene set curation.
Additional file 3: Review history.