DeCAF: a novel method to identify cell-type specific regulatory variants and their role in cancer risk

Here, we propose DeCAF (DEconvoluted cell type Allele specific Function), a new method to identify cell-fraction (cf) QTLs in tumors by leveraging both allelic and total expression information. Applying DeCAF to RNA-seq data from TCGA, we identify 3664 genes with cfQTLs (at 10% FDR) in 14 cell types, a 5.63× increase in discovery over conventional interaction-eQTL mapping. cfQTLs replicated in external cell-type-specific eQTL data are more enriched for cancer risk than conventional eQTLs. Our new method, DeCAF, empowers the discovery of biologically meaningful cfQTLs from bulk RNA-seq data in moderately sized studies.


Background
Genome-wide association studies (GWAS) have been instrumental in identifying a large number of genetic variants associated with risk for many diseases including cancer [1][2][3][4][5][6][7]. However, as the majority of GWAS associations are non-coding variants without clear function, the mechanism of action is typically unknown. Expression quantitative trait loci (eQTLs) have been instrumental in linking genetic variation to effects in gene expression [8][9][10][11], recently identifying putative susceptibility genes for ovarian cancer [12], prostate cancer [13], and breast cancer [14]. Recently, it has been observed that some eQTL effects can be observed only in specific contexts and often vary across tissue and cell types [15][16][17][18]. In the context of cancer, the cell types in the tumor/microenvironment can have substantially different functions [19,20]: with CD4 and CD8 T cells driving cytotoxic anticancer immunity [19,21], while regulatory T cells are associated with immune suppression and homeostasis [22,23]. Identifying and quantifying cell-type-specific eQTLs in tumors is thus a critical step to understanding germline cancer mechanisms and germline-somatic interactions.
To date, cell-type-specific studies have been limited in size due to cost and labor associated with selecting a pure subset (i.e., cell sorting) and therefore have weak power to identify QTLs [24]. While emerging single-cell technologies have the potential to precisely measure expression in specific cell populations, this approach remains too expensive to measure across hundreds of individuals and exhibits very sparse expression, with most genes unexpressed in a given cell. This has led to the development of bulk RNA-seq deconvolution methods which generally work by defining gene sets in reference data from a pure population and ranking the expression of these in the target sample [25,26], or modeling the target sample as a mixture of signatures [27][28][29][30]. These cell fraction estimates can additionally be incorporated into eQTL analyses to identify cell-fractionspecific effects (cfQTLs) [31][32][33][34][35]. However, by testing for an interaction effect on a very noisy outcome, such studies typically require sample sizes in the thousands to achieve adequate power (particularly for cell types present at low frequency) [33].
In addition to conventional eQTLs, genetic effects on expression can be measured by quantifying the ratio of RNA-seq reads at heterozygous variants in exons. A significant departure from a 50% allelic ratio is indicative of a cis-genetic effect on expression, and referred to as allelic imbalance (AI). This approach benefits from being able to control for trans-variation, and, because it measures allelic effects within individuals and not between individuals, can harness power from read depth [36][37][38][39][40][41][42][43][44]. AI has also been leveraged to identify genes undergoing gene-environment interactions [45,46], tumor/normal regulatory differences [47] and, recently, cell type specificity [48]. Methods exist and have been proven to work well, which jointly model AI and eQTL effect sizes, including TReCASE [49], RASQUAL [50] and BaseQTL [51]. However, the integration of total expression and AI to detect cfQTLs has been largely unexplored.
Here we propose DeCAF (DEconvoluted cell type Allele specific Function), a method that increases power to identify cfQTLs in bulk data by combining AI and total expression signals. We use DeCAF to identify thousands of allelic cfQTLs in multiple cell types as well as tumor cells in renal cell carcinoma, which we replicate using external cell/tissue type data and link to GWAS risk variants. DeCAF identified many cfQTLs that were not observed in a conventional bulk eQTL analysis, suggesting that this framework may greatly enrich our understanding of cis regulatory mechanisms.

Overview of DeCAF for cfQTL discovery
We sought to identify cfQTL variants whose effect on gene expression varied across bulk samples according to cell type fraction (Fig. 1a) (not to be confused with having a variant that increases proliferation of a certain cell-type in a heterogeneous tissue). As in previous work [52][53][54][55][56], such variants can be identified through a linear interaction model with deconvoluted (individual-level) cell fraction, where a significant interaction between genotype and cell fraction is indicative of an effect size that differs with differing cellfractions. We introduce an analog of the interaction model for AI data, which we combine with the conventional interaction (ieQTL) model. The ieQTL model is a linear regression of y ∼ μ + βx + f + β f x * f , where y is total expression, μ is the expression mean, x is the genotype encoding the number of alternative alleles, β is the standard eQTL effect (expected to be zero under the null), f is the individual-level cell fraction, and β f measures the cell fraction specificity. Our proposed AI model relates the number of reference and alternative allelic reads in heterozygous individuals to their cell fraction as a regression of REF, ALT ∼ μ a + α f f , where μ a is the mean allelic fraction across individuals and Fig. 1 Identifying cfQTLs. A Depiction of AI changing across increasing cell fraction. B Flowchart showing the data entering into identifying DeCAF QTLs. Dark blue boxes represent outside data and light blue boxes represent internal calculated data. C Comparing cfQTL testing methods for simulated data. Plot depicting the power (y-axis) for varying number of individuals (x-axis) and either ieQTL (green), iAI (pink), or DeCAF combined method (blue). Each column represents a cell type (denoting simulations using real deconvoluted cell fractions from TIMER) and each row is a different allelic fraction (effect size of AI) α f is the modifying effect of the cell-fraction f and our primary effect of interest. Under the null μ a is 0.5 and deviations from 0.5 indicate an AI effect in the bulk population; likewise, a significantly non-zero α f indicates the cell fraction modifies the allelic fraction effect and is indicative of a cfQTL. The parameters of the AI model can be estimated by binomial/logistic regression, but we additionally implement beta-binomial regression to accommodate read overdispersion that is common in RNA-seq data [57] (see the "Methods" section). Finally, the β f and α f estimates (which are both oriented towards the alternative allele) are combined by meta-analysis to form the complete DeCAF test statistic. In practice, we additionally include a tumor purity term in both models to jointly estimate tumor-specific effects (tsQTLs) [31] and determine whether a variant is acting in the tumor or the microenvironment. We note that, in contrast to some analyses of AI in individual samples, our model always tests for consistent effects in the population at a given variant, polarized to the variant allele. We tested causal variants outside the gene by using haplotype phasing (heterozygous variants were assigned the reads on their corresponding haplotype as in [47]).
Somatic copy number alterations can lead to extra variation and non-genetically driven AI. Indeed, the earliest applications of allelic imbalance in cancer were to identify copy number changes from sequenced DNA. As CNVs have been estimated in TCGA data from tumor/normal genotyping, the local CNV estimate from each individual was included in the model as a fixed effect covariate, to account for an offset in the expression due to carrying a CNV. Accounting for CNV as a covariate was well calibrated and led to only a 1.12× reduction in power (Additional file 2: Fig. S3). In the real data, we additionally conservatively masked out any individuals with deep CNVs (>0.1 segment mean, or Log2 ratios of the tumor copy number to the normal copy number) for a given test.

Evaluation of DeCAF in simulation
We performed wide-ranging simulations reflecting conditions found in real data and evaluated the power of the interaction eQTL, interaction AI, and DeCAF tests (see the "Methods" section). These included allelic effect size, minor allele frequency, sequencing depth, CNV, number of individuals, and read overdispersion. In particular, we sampled cell fractions from the real TIMER-inferred fractions in TCGA across 6 cell-types, to investigate the performance of this approach under realistic cell proportions. Under the null hypothesis where there is no effect, all of the methods were well calibrated and produced null results (Additional file 2: Figs. S1a and S2). Under the alternative hypothesis, DeCAF consistently met or outperformed the power of the conventional interaction eQTL test both under cell fractions generated from a uniform distribution (Additional file 2: S1b) and real cell fractions from TIMER inference in the TCGA KIRC data (Fig. 1c). requiring 0.56× fewer samples on average to reach >75% power (Fig. 1c). In 42% of the simulated models, DeCAF achieved >75% power at <600 individuals (the typical TCGA study size) while the eQTL test did not (Fig. 1c).

Identifying a multitude of QTLs
We applied DeCAF to genotype and RNA-seq data from 503 RCC tumors in TCGA ( Fig. 1b) using cell fractions from xCell [26] and TIMER [30] as well as tumor purity (previously estimated by [58]) to account for differences in purity and identify tsQTLs [31] ( Fig. 1b; see the "Methods" section). The eQTL and AI effects were highly and significantly correlated in the marginal DeCAF results (ρ = 0.8, p value < 2.2 −16 , Additional file 2: Fig. S6). To confirm that DeCAF assumptions (AI and QTL effects are shared and AI effects are not unexpectedly noisy) hold in real data, we tested whether DeCAF would identify more significant effects than the interaction eQTL test. Across all tested cell types, DeCAF identified 5.63× more genes with significant cfQTLs than a conventional interaction QTL analysis. By cell fraction type, DeCAF identified 1753 (xCell), 1220 (TIMER) cfQTL genes, and 691 (purity) tsQTL genes at 10% FDR (defined as unique genes containing a significant corresponding QTL; Additional file 1: Table S1). For comparison, conventional interaction eQTL mapping identified 310 (xCell), 218 (TIMER), and 186 (purity) tsQTL genes (Fig. 2). Applying DeCAF without a cell fraction term identified a total of 3654 marginal eQTL genes, a 1.7× increase over the 2150 eQTL genes identified with the conventional eQTL linear model, suggesting that incorporating AI particularly increases power for interaction analyses where the effective sample sizes are lower. Across the cfQTL genes, we observed 42 genes with significant effects in all cell types while the rest of genes were largely observed in a single cell type (270 genes, Fig. 2b) and were not substantially shared across the xCell and TIMER results. We thus focus on cfQTLs from both methods moving forward.
Of the significant DeCAF cfQTL genes, 56% were also significant DeCAF marginal eQTL genes and 41% were also significant DeCAF tsQTL genes. In contrast, of the significant DeCAF tsQTL genes, 82% were also DeCAF marginal eQTL genes, suggesting that tumor-specific effects often manifest as apparent bulk eQTLs (although they make up a minority of all marginal eQTLs, as previously observed [31]). Remarkably, while we saw a large portion of shared eGenes between cfQTLs, tsQTLs, and marginal eQTLs, the SNPs associated with each eGene were mostly not shared (Fig. 2d). Of the significant DeCAF cfQTLs, 1% were also significant DeCAF marginal eQTLs and 1% were also significant DeCAF tsQTLs. Most strikingly, only 3 SNP:eGene pairs are shared between cfQTLs, tsQTLs, and marginal eQTLs. While differences in the lead eQTL SNP may be due to statistical noise, we further observed that for genes with an eQTL and cfQTL, 92% of SNP pairs had r 2 < 0.5 (LD), suggesting that DeCAF identified largely independent genetic signals that were not detectable from conventional bulk analyses.
To provide visual intuition for the DeCAF approach, we identified genes with a significant (Spearman) correlation between cell fraction and individual allelic fraction of their corresponding cfQTL (Fig. 3c). We highlight the cfQTL effect of rs117689555 on the expression of GOLGA6L1 in the context of neutrophils (ρ = 0.92, p value = 0.001). A striking linear relationship can be observed between the neutrophil fraction in each heterozygous sample and the allele fraction, shifting from imbalance for the reference allele to imbalance for the alternative allele (i.e., a complete reversal of effect-size direction). We note that this correlation-based approach was statistically much weaker than DeCAF because it did not incorporate read count or read overdispersion, and so is used here only to provide intuition.
In principle, somatic variants will introduce additional noise into the analysis by sporadically knocking out or activating driver genes, but they would not be expected to lead to false positives in the absence of germline-somatic interactions. Analyzing somatic SNV calling data from TCGA whole-exome sequencing, we found that a total of 96 genes had >10 somatic mutation carriers (sufficient to influence population-scale QTL discovery), which overlapped only 9/3654 of the significant marginal DeCAF eGenes we previously identified. For these 9 genes that were significant marginal DeCAF eGenes and had >10 SNV carriers, we tested each lead SNP for a germline-somatic interaction by adding somatic carrier status as a binary interaction term (similar to how purity is included as an interaction term for tsQTL analysis). None of the 9 interaction terms were statistically significant after Bonferroni correction, indicating no significant differences in AI/QTL effect size between carriers and non-carriers. In sum, we find that recurrent SNVs are unlikely to influence the overall DeCAF results due to the generally low number of highly recurrently mutated eGenes, and a formal interaction test of these genes did not modify the associations.
DeCAF can also identify cell-fraction effects in normal tissue. As a demonstration, we applied DeCAF to matched kidney normal RNA-seq data from the same TCGA KIRC cohort (N = 70). This much smaller cohort still yielded a total of 287 significant cfQTL eGenes identified by DeCAF, compared to 201 cfQTL eGenes by conventional ieQTL  (FDR 10%). A larger proportion of the identified effects were shared (71% shared cfQTLs for normal versus 32% for tumor; Additional file 2: Fig. S7), likely due to the much lower sample size and lower number of heterozygous individuals for the AI component as can be seen in simulations (Additional file 2: Fig. S1b).

Replication of DeCAF cfQTLs in external studies
Reasoning that the DeCAF cfQTLs would capture components of the tumor immune microenvironment, we sought to replicate them in external non-cancer data. First, we turned to eQTLs in the GTEx (v8 [59]), detected in bulk RNA-seq from whole blood (N = 670), lymphoblastoid cells (LCLs, N = 147), fibroblasts (N = 483), kidney(N = 73), and breast (N = 396) tissues -selected as a sampling of potential cells/tissues in the RCC microenvironment. Across all DeCAF cfQTLs combined, we observed substantial replication across multiple GTEx tissues, with the highest replication rates in fibroblasts: π 1 = 0.67 and π 1 = 0.73 for xCell and TIMER-based results respectively (see Methods, Fig. 4). These replication rates were comparable or higher than the replication of marginal eQTLs into GTEx data, where π 1 ranged from 0.59 (kidney) to 0.77 (breast), demonstrating that cfQTLs have similar true positive rates when evaluated in the appropriate target tissue. Moreover, GTEx breast, kidney, and LCLs exhibited the lowest cfQTL replication rates (all π 1 ≤ 0.25), demonstrating that cfQTLs were also specific to a target tissue (in this case fibroblast and blood), in contrast to marginal eQTLs that generally replicated in all target tissues (all π 1 > 0.59). Taken together, these results suggest that cfQTLs may be capturing genetic effects enriched in cancer-associated fibroblasts and depleted in organ tissues or B-lymphocytes (though we note that replication rates will be biased by statistical power in the target tissue, which is a function of sample size and data quality).
Within the individual cell types tested by DeCAF, we found a wide range of replication rates in GTEx tissues with no clear cell-tissue relationships, likely due to the relatively low number of cfQTLs within a given cell type or heterogeneity in the target GTEx samples (Additional file 2: Fig. S10). We note that while GTEx did calculate AI signal [60], it was not used in the identification of these population-level eQTLs, further underscoring the DeCAF replications given the different statistical models employed.
We carried out a second replication using cell-type-specific eQTL data from the BLUEPRINT consortium [24]. Notably, BLUEPRINT data (but not eQTLs) were used to train the xCell deconvolution scores and thus provide an apples-to-apples comparison of pure versus deconvoluted eQTLs in independent data. Across all DeCAF cfQTLs combined, replication rates were notably lower than in GTEx, with π 1 in the 0.30-0.40 range (Additional file 2: Fig. S9). As in GTEx, we again did not observe clear trends between the DeCAF cfQTL cell type and the target eQTL cell type (for example, the strongest replication was from Mesenchymal stem cells (MSCs) to CD4 T cells, with π 1 = 0.60). The lower replication of BLUEPRINT versus GTEx may be explained by the relatively small sample size of the BLUEPRINT, and thus lower power to detect weak, cell-type-specific effects. Overall, these replications were generally comparable to the replication π 1 range of 0.3-0.67 observed in a recent GTEx ieQTL analysis of similarly sized target studies [35], suggesting that replicating cfQTLs may be generally more challenging than marginal eQTLs.
Turning to tsQTLs, we were surprised to find substantial replication into both external eQTL studies (Fig. 3). The strongest replication in GTEx was in blood (π 1 = 0.65) and the strongest replication in BLUEPRINT was in neutrophils and monocytes (π 1 = 0.70). Strikingly, DeCAF tsQTL replication was higher than DeCAF cfQTL in GTEx breast, LCL, kidney, and blood as well as all BLUEPRINT tissues, and was comparable to that of marginal DeCAF QTLs (Additional file 2: Fig. S9). Given the highest replication was observed in the BLUEPRINT immune cell types, we hypothesize that tsQTLs may be capturing genetic effects from or due to tumor infiltrating lymphocytes (TILs), or a mix of TILs and the tumor cell of origin. These enrichment patterns were broadly similar when testing the smaller number of conventional tumor interaction eQTLs, underscoring the robustness of this effect to detection methodology.

Enrichment of DeCAF cfQTLs in GWAS
We next sought to quantify the relationship between cfQTLs and genome-wide association study (GWAS) variants, which may shed light on the downstream phenotypic mechanisms of cfQTLs. We analyzed the GWAS effect-size (squared Z-score) enrichments across 7 common cancer GWAS [1][2][3][4][5][6][7]61]. The enrichments compared significant cfQTLs to random non-significant cfQTLs as a background to account for the frequency and LD of tested variants, with significance quantified through background sampling (see the "Methods" section). Every cancer GWAS had at least one significant cfQTL enrichment (Fig. 4b) and most cfQTL cell types exhibited enrichment in cancer GWAS (mean 1.16× s.e. 0.04). Notably, cfQTLs consistently showed higher enrichments than marginal DeCAF eQTLs (mean 1.10× s.e. 0.02). Compared to cfQTLs, tsQTLs generally had weaker enrichments, but were still significant in 4/7 cancers. For example, all eight cell types with significant enrichment in prostate cancer GWAS (mean 1.84× s.e. 0.12) had a higher magnitude of enrichment than the corresponding tsQTLs (mean 1.10×). As a comparison set of phenotypes, we expanded our analyses to 12 non-cancer GWAS from recent large-scale biobank studies [61,62]. We observed broadly lower enrichments in these non-cancer GWAS (mean cfQTL enrichment 0.86× s.e. 0.01), with only 5/12 non-cancer GWAS traits having any significant (p value<0.05) cfQTL enrichments (compared to 7/7 cancer traits), covering a broad range of trait types (Fig. 4b). Overall, cancer GWAS traits had a stronger mean enrichment for every cell type than non-cancer GWAS (Fig. 4c). In sum, cfQTLs are more directly relevant to cancer GWAS mechanisms than eQTLs across a wide range of cancers.
We next focused on enrichments specific to the RCC GWAS [1] where we expected more biologically plausible cell-type-specific cfQTL enrichments. In the previous analysis comparing average effect sizes to non-imbalanced cfQTLs, only tsQTLs were significantly enriched for the RCC GWAS. We thus turned to a more sensitive analysis based on the enrichment of significant versus non-significant effects evaluated by Fisher's Exact Test (see the "Methods" section). After testing multiple DeCAF FDR and GWAS pvalue thresholds we found that a more relaxed threshold (see the "Methods" section) produced the most confident results (smallest 95% confidence intervals) while identifying the most commonly seen significant cell types (Additional file 2: Fig. S11a). The core results were relatively stable across thresholds. We identified significant (Bonferoni p value<0.05) enrichment for cfQTLs in (TIMER) dendritic cells (OR=7.90, p value=1.86 −4 ) and all combined TIMER cfQTLs (OR=2.89, p value=1.58 −3 ) (Fig. 5a, Additional file 2: Fig. S11b), with nominal significance for cfQTLs in (TIMER) B cells (OR=4.30,p value=0.02), tsQTLs (OR=2.27, p value=0.01), and marginal eQTLs (OR=1.80, p value=0.02). Again, marginal eQTLs were less enriched than other significant cfQTLs. Interestingly, the RCC GWAS enrichment was generally stronger for cfQTLs in individual cell types than across all cell types, suggesting that RCC GWAS variants may be enriched for specific cell types rather than generic cell-type-specific variants (although the statistical power was too low to identify significant differences between cell types in this analysis).

Biological interpretation of cfQTL examples
Lastly, we highlight two specific examples of cfQTLs and tsQTLs that showcase the utility of this framework.
We identified a highly significant cfQTL at rs26481 and ERAP2 in epithelial cells (FDR = 5.5 −69 , Fig. 3b). Intriguingly, rs26481 was clearly an independent signal from the top marginal eQTL for ERAP2 (rs2927610; FDR = 0), with an r 2 of 0.0008 between the two variants. The marginal eQTL signal also coincided with a highly significant tsQTL, indicative of two independent genetic mechanisms (microenvironment and tumor) operating through ERAP2. Interestingly, a previous study found that ERAP2 expression was limited to epithelial cells in normal tissue, and loss of this enzyme was found in 86% of tumors [63]. This loss of ERAP2 was also significantly linked to lack of HLA class I molecules [63]. In renal cancers, ERAP2 was found to be frequently downregulated compared to normal tissue, and potentially involved in immune escape mechanisms [64]. The low expression of ERAP2 has also been associated with poor cancer prognosis [65].
Intersecting the DeCAF results with kidney cancer GWAS data revealed a tsQTL at rs4765621 (FDR = 6.02 −8 ) that colocalized with a genome-wide significant risk variant (p value = 3.33 −12 ) for SCARB1 (Fig. 5a). The same variant was also a marginal eQTL for SCARB1 but is not an eQTL in the normal [47], strongly supporting its tumor-specific effect (Fig. 5). Previous studies have found that inhibiting SCARB1 (a HDL cholesterol receptor) could kill and stop proliferation of clear cell renal cell carcinoma (ccRCC) cells [66]. A previous study [47] thoroughly validated the effect of rs4765623 on SCARB1, a SNP in perfect LD with the variant identified as a tsQTL in our study. rs4765623/SCARB1 exhibited tumor-specific enhancer activity and was validated in a 786-O cell line, with the C risk allele enriched for chromatin activity, consistent with its expression increasing effect in our study.

Discussion
We presented DeCAF, a novel method for identifying cell-type-specific QTLs by harnessing signals from both total and allelic expression. No methods currently exist that leverage AI for cell-type-specific discovery, relying solely on QTL signals. By drawing additional power from individual reads, as well as the internal control offered by withinindividual variation, DeCAF enables well-powered cfQTL analyses of medium-size QTL studies that would not have sufficient power for conventional tests. DeCAF directly models multiple potential cancer-specific confounders including tumor purity and CNVs, and can distinguish whether a QTL is functional in the tumor, microenvironment cell type, or both.
We applied DeCAF to TCGA data to perform the first cfQTL mapping effort in tumors. DeCAF identified 3664 significant cfQTL genes across all cell types, 5.63× more than was found using conventional ieQTL mapping (consistent with simulations). DeCAF similarly identified 3.72× more tsQTLs than the conventional approach, thus being a powerful method for identifying tumor-specific effects even in the absence of deconvoluted data. Surprisingly, DeCAF tsQTLs had the higher replication levels in independent eQTL data. This high replication, particularly in sorted immune cells, suggests tsQTLs may capture QTL effects in the context of tumor immune infiltration. Though we caution that the tsQTL analysis was substantially better powered than any other individual cfQTL analysis due to the higher variance and abundance of pure tumors, which likely impacts the relative replication performance. As little is known about the relevance of cell types in the tumor microenvironment to cancer risk, we integrated DeCAF data with cancer GWAS variants. We observed significant and cell-type-specific enrichment of cfQTLs with GWAS risk variants across all cancers evaluated; which was substantially higher than in non-cancer GWAS. By looking at the overlap of GWAS variants with cfQTLs and cancer QTLs, we were able to characterize their function and connect them to target genes in a specific cellular context. In sum, we found that DeCAF can identify thousands of cfQTLs from bulk RNA-seq, these cfQTLs replicated significantly in independent eQTL data (particularly tsQTLs), and were more enriched for GWAS risk than conventional eQTLs.
Other methods to detect cell-type-specific eQTL effects from bulk tissue data have recently been applied. GTEx [35] and Westra et al. [33] took the approach of identifying ieQTLs using a standard linear model with a cell fraction -genotype interaction term. The GTEx study evaluated concordance with allelic fractions in the same individuals as a replication, but did not utilize the full power of cell-type AI for discovery. As such, these approaches were generally underpowered for typical, moderately sized eQTL studies. For example, Westra et al. showed very low power below N=1000 (N<1000 represents the vast majority of QTL studies), and the GTEx analysis identified <80 ieQTL genes per cell type-tissue pair on average. Our approach also shares some similarities to recently proposed methods that leverage AI in other contexts. EAGLE [45] is a method to identify gene-environment interactions through AI mapping, which in principle could be extended to quantitative cell-fractions as "environments". However, EAGLE only identifies a gene level effect rather than specific QTLs, and is thus challenging to integrate with external data such as GWAS. The recent method BSCET [48] leveraged AI with deconvoluted cell types, but is similarly intended for analysis of transcribed SNPs rather than cfQTLs, and does not model the count nature of allelic data nor incorporate total expression. DeCAF is thus the first method to integrate total and allelic expression together for powerful cfQTL discovery at small to moderate sample sizes.
While DeCAF has made great strides in identifying cfQTLs, it has some limitations. First, DeCAF has limited power to detect cfQTLs in rare cell types (i.e., low cell fractions or cell fraction variance). Lower power for rare cell types is also a limitation of direct single-cell QTL analyses and motivates larger reference and target studies. Second, DeCAF is inherently dependent on the quality of the deconvolution and cannot test cell types that are not in the reference data. The majority of existing deconvolution methods (including TIMER and xCell) calculate cell fraction scores and not direct percentages. As a consequence, DeCAF cfQTL effect sizes should be interpreted with caution, because they will be influenced by (a) the scale of the deconvoluted score; (b) the differing uncertainty in the deconvolution of different cell types; (c) and the power to detect an effect in rare cell types. While most deconvolution methods have been shown to replicate well in pure bulk cell types [25][26][27][28][29][30], they continue to be in active development. DeCAF can be applied to any deconvolution framework or score and so will benefit from improved methodologies in the future. Using DeCAF to improve the cell type deconvolution itself (i.e., by maximizing cfQTL discovery) is thus a compelling future direction. Third, we elected to take the conservative approach and remove individuals with high CNV values from the analysis. In principle, DeCAF could be extended to model both cell-typeand CNV-specific interactions using matched allelic data from DNA sequencing, but this remains an open problem. Fourth, AI cannot be used for trans-QTL discovery, as the tested variant must be in phase with the allelic variants, and thus DeCAF's power advantages are limited to the study of cis-eQTLs. Fifth, any other clinical factor that's correlated with the cell fraction could be the causal mediator (for example -tumor stage, site of sampling and BMI), which could be further investigated by incorporating additional relevant interaction terms. Sixth, AI can be the consequence of many mechanisms such as nonsense-mediated decay, imprinting, post-translational effects and alternative splicing [67]. By assuming consistent AI/QTL effects in DeCAF, our approach may lose power when these deviating effects are present. However, prior work has found that, on average, AI and QTL effect sizes are highly concordant [60] and are thus likely capturing the same mechanisms in general. Our real data analyses also show that the combined DeCAF model achieves the largest power (with comparable external replication rate), further supporting these assumptions. Lastly, although DeCAF cfQTLs broadly replicated in external data at comparable rates to prior studies, we generally did not find that a given DeCAF cell type replicated best in the matching target cell type. This lack of cell type consistent replication may be explained by many factors including: differences in cell fractions influencing DeCAF power, differences in the target study sizes influencing replication power, and potential differences between AI signals in the discovery and eQTL signals in the replication. Understanding the relationship between cfQTLs detected in heterogeneous cell populations and cell-type-specific QTLs detected in pure cell types thus continues to be an open question of great interest.
Although DeCAF was applied to primarliy tumor RNA-seq data in this work, it is directly amenable to identifying cis-regulatory effects in other molecular phenotypes and contexts where sequencing read data is available. For example, DeCAF could be applied to identify cfQTLs in larger normal tissue datasets such as in GTEx. By developing a method in tumor tissue, we have addressed a more difficult scenario first. In the case of emerging single-cell QTL studies, DeCAF can be applied to identify cell-type-specific QTLs using the directly estimated cell fractions. Likewise, other measures of cell state, which may span multiple cell types, could be incorporated as interaction terms in the DeCAF framework. Beyond RNA-seq, cfQTLs can be investigated via DeCAF in other molecular measurements such as tumor ATAC-seq data from TCGA [68], which would shed light on cell-type-and tumor-specific mechanisms of DNA accessibility. DeCAF could be extended to estimate effects in multiple cell types jointly, through the inclusion of multiple interaction terms, although the potential for over-fitting and autocorrelation across cell fractions needs to be carefully considered. Beyond cell fraction and tumor purity, any continuous score can be used in DeCAF, including polygenic risk scores capturing germline risk burden [69], integrated clustering of molecular cancer subtypes, and emerging immune subtype definitions [70]. DeCAF thus provides a framework for broad analyses of context-specific QTLs in tumor and non-tumor data.

Conclusions
Rich availlability of bulk RNA-sequencing studies has lead to the development of cellfraction deconvolution methods for large scale cell-type-specific expression analysis. Here, we present DeCAF as the first to use these analyses to study both cell-type-specific and tumor-specific allelic expression in cancer. By using a combination of AI and eQTL mapping, we gained considerable power over previous studies that considered eQTL mapping alone. Through identifying both microenvironment cell-type and tumor effects we were able to explore effects unique to each and link them back to disease. This high preforming method provides a framework to identify cell-type allelic effects in other cancer types, normal tissue, and a multitude of other continuous traits such as open chromatic from ATAC-seq.

Statistical model to detect cfQTLs
The conventional QTL-based approach defines y as a per-individual vector of total expression, f as the corresponding cell fraction estimate for a given cell type, and x as the genotype; and the models y = μ + βx + β f x * f . The cell fraction ieQTLeffect, β f , is then estimated by typical linear regression [52][53][54][55][56]. To test cell-type-specific AI, we first restrict to individuals that are heterozygous for a variant in the target gene for which reads have been allelically assigned (we refer to this as the "functional SNP"). For this sub-population, f is again defined as the vector of cell fraction estimates, but instead of x, we introduce π, the population-level allelic fraction for heterozygous carriers of the functional SNP. Unlike conventional allelic imbalance tests that evaluate one individual at a time internally, we test for a consistent trend of cell fraction and imbalance across the population. We then model π = π f f + π 0 (1 − f ) where π f is the allelic fraction in the focal cell type and π 0 is the allelic fraction in the rest of the cell types. While we do not observe the π value, for each individual we see REF, ALT read counts that are sampled from their π. The cell fraction iAI effect is then estimated by binomial regression of REF, ALT ∼ μ a +β f f where μ a captures the mean allelic fraction in the population and β f captures the additional fraction-specific effect. When μ a is significantly different from 0.5 this variant is exhibiting population-level AI, and when β f is significantly different from 0, this variant is also exhibiting cell-type-specific AI. Functional (read carrying) SNPs were tested directly. For distal SNPs, the read counts for each allele were computed as the sum of functional SNP reads along the respective haplotype (as shown in previous work [50,57]). To account for overdispersion that is common in molecular data [57], we leveraged a beta-binomial regression, with overdispersion estimated for each individual across all heterozygous reads (see below for tumors). Finally, we combined the cell fraction ieQTL and iAI tests by Stouffer's method (these tests are independent and so can be combined); we refer to the combined estimate as the DeCAF test statistic. Stouffer's method is the sum of the Z scores (here the test statistics derived from the QTL and AI tests), divided by the square of the number of values input. This combined statistic is equivalent to an inverse-variance weighted meta-analysis between the two associations and thus assumes a shared underlying effect (i.e., effects in opposite directions will become less significant when combined). Our assumption that the total and allelic signals are independent is the same as that made by prior approaches for combining QTL/AI data, including TReCASE [49], RASQUAL [50] and BaseQTL [51]. Even though expression is used from the same individuals, the tests are independent because the tested independent variable is independent: eQTLs use the variance between the 0/1/2 genotypes, whereas AI uses the variance within the 1 genotype (between alleles). We note that this is further supported by our simulations, which simulate consistent allelic effects in the same individuals and show that the test is well controlled under the null (Additional file 2: Fig. S2). In principle, AI and QTL effects could be aggregated with a statistic that ignores sign, such as a Fisher's combined test, but prior work has shown that AI and QTL effects are highly consistent [71].

Statistical model to account for tumor variation
The basic model described above allows for estimation of cfQTLs in normal expression, and we additionally extend this model to account for potential biases due tumor heterogeneity and somatic alterations. First, tumors are a mix of normal and cancer cells which introduces additional variance into the expression and could create a false relationship with a given cell type if it is correlated with the tumor fraction. To account for this, we introduce an additional term corresponding to tumor purity into the AI model, and an interaction term corresponding to the SNP-purity interaction into the eQTL model (as previously proposed in ref. [31]). This extension improves power for cfQTLs by accounting for tumor-specific variance. In addition, tumor-specific QTLs [31] can be inferred by testing for a non-zero effect size on the purity term.
Second, somatic copy number alterations can lead to extra variation and non-genetically driven AI. Indeed, the earliest applications of allelic imbalance in cancer were to identify copy number alterations (CNVs) from sequenced DNA. In the AI component of the DeCAF model, we estimate the beta-binomial overdispersion parameter for each CNV region in an individual separately, to account for CNV-specific variance. We additionally investigated three approaches to account for extra variance due to a significant CNV: (i) excluding variants in a CNV in an individual from the analysis entirely; (ii) including the per-individual CNV estimate in the model as a fixed effect covariate, to account for an offset in the expression due to carrying a CNV; (iii) including a random effect term for all CNV carriers to account for extra variance in expression without a consistent direction. These models make different assumptions about the CNV architecture and we selected the best performing model empirically based on the number of cfQTLs identified and their reproducibility in external data.

Simulation framework
We investigated multiple AI-based approaches and modeling assumptions in simulation. Parameters tested for impact on the performance of these tests included: minor allele frequency (MAF); baseline (0.5) and cell-type-specific effect size (AF); read depth (D, sampled from a Poisson with fixed mean); number of individuals (N); and cell fraction percentages (f, sampled from a uniform distribution or from real data). To generate a single AI individual, a π is defined as 0.5(1 − f ) + (π f * f ), ALT reads were drawn from X ∼ BetaBin(π, N) with fixed overdispersion parameter, and REF reads were computed as D − ALT. Hardy-Weinberg equations were then used to generate the expected number of heterozygous AI individuals based on the MAF. To simulate CNVs, a fraction of individuals were sampled as being carriers and additional variance terms are introduced. In the fixed-effect model, being a CNV carrier adds a constant factor to the π (i.e., one allele is always amplified), whereas in the random effect model, one allele is randomly amplified or deleted for computing the π. Finally, to simulate total expression for the conventional eQTL models, quantitative phenotypes were sampled from a linear model y = σ NF * NF(σ effect * SNP * f ) + σ CNV * CNV + ε. The overall simulation was performed 500 times and power for each test was defined as the number of simulations in which that test produces an association with p<0.05/20,000 (where 20,000 is approximately the number of genes to be tested).
Wherever possible, we identified simulation parameters from the real data: mean overdispersion was estimated as 0.0263 from tumor RNA-seq in TCGA; QTL effect sizes are set to 0.04, matching the average eQTL variance explained in real data [72]; normal fraction (NF) effect sizes were set to 0.0269, matching the average variance in expression explained by NF from real data; σ CNV was set to 0.012 to match the variance in CNVs in real data. For cell fraction estimates, as a uniform distribution is an optimistic case (the majority of cell fractions are in the lower range and are variable (Additional file 2: Fig.  S4), in addition to running simulations on generated uniform cell fractions we also ran simulations using the real cell fractions identified by TIMER [30].

Deconvoluted cell fractions in TCGA data
TCGA is a rich resource for tumor RNA-seq data which has been deconvoluted into cell types by multiple published methods: xCell [26] (64 cell types), which defines gene sets in a pure population and ranks the expression of these in the sample; TIMER [30] (six tumorinfiltrating immune cell types including B cells, CD4 T cells, CD8 T cells, neutrophils, macrophages, and dendritic cells), which uses a signature gene matrix from bulk expression; and others [73][74][75]. We downloaded xCell [26] and TIMER [30] cell fraction data for individuals with KIRC data from TCGA. To further improve power for testing in the xCell results, we focus on using cell types with a cell fraction score with an interquartile range >0.1 and cell types expected to be found in kidney tissue (Additional file 2: Fig. S4). Previous estimates of tumor purity [58] were additionally used to account for differences in purity and identify tsQTLs [31] (Fig. 1b).

Allele-specific quantification
We applied DeCAF to genotype and RNA-seq data from 503 TCGA RCC tumors from the KIRC study (Fig. 1b). Germline genotype data and RNA-seq BAMs were downloaded from the NCI Genomic Data Commons. Germline genotype data was imputed and phased through the Michigan Imputation Server. Total quantified and normalized gene expression (upper-quartile normalized count estimates) was accessed as previously described [76]. RNA-seq reads were integrated with the genotype data and run through the WASP allelic mapping pipeline (using BWA alignment) to account for potential mapping biases (Additional file 2: Fig. S5). In brief, for each read containing a polymorphism a new "proxy" read was generated with the alternative allele and mapped to the genome; any reads with proxies that did not map to the same location were then excluded from the analysis. This conservative approach removes reads with potential allele-specific mapping biases. Finally, allelic counts at individual variants were quantified using GATK [77] ASE quantification and assigned to the phased haplotypes. For each individual, allelic reads from all heterozygous variants within the tested gene are aggregated using haplotype information, under the assumption that the regulatory effect is consistent across the haplotype [47,50,57]. To process the TCGA data efficiently, we implemented DeCAF in a self-contained and reproducible pipeline on the Cancer Genomics Cloud, creating a workflow that can then be run on any data source (see Web Resources).

cfQTL discovery
For QTL discovery, for each expressed gene and cell type, we tested each common (MAF greater than 1%) germline SNP in the cis-locus of the gene (100kb around the TSS) using each described method. Each SNP is tested individually and separately for eQTL and AI. For the eQTL test, reference/alternate allele genotype dosage is used as the independent variable. For the AI test, heterozygous individuals are assigned the aggregated allelic read counts (see above) corresponding to their reference/alternate haplotypes. For both signals only a single distal variant is evaluated in a given test, but RNA-seq reads are aggregated across the gene. Each SNP then has eQTL and AI effects combined via Stouffer's method (see above) to define the final DeCAF statistic. For the marginal association without cell fraction interactions and tumor variation adjustment (see above), DeCAF estimates the same underlying quantity as the previously proposed WASP and RASQUAL tests [50,57], with minor differences in how the likelihoods are maximized (regression versus grid search).
Significant cfQTLs in each cell type were determined by identifying the most significant variant for each cell type and gene, applying a genel-level Bonferroni correction. At this point a single SNP per gene and cell-type is then selected (the most significant after bonferroni correction). This set of SNPs then is further BH (Benjamini-Hochberg False Discovery Rate) corrected across genes, per cell-type. The same testing and multiple test correction procedure was applied to identify significant marginal eQTLs in the bulk data. While base results are reported at 10% FDR threshold, using a more stringent threshold did not greatly reduce the number of significant cfQTLs (Additional file 2: Fig. S8).

Testing for variant enrichment
We calculated multiple enrichment and replication estimators for cfQTLs. First, we estimated π 1 replication statistics (i.e., the fraction of associations that are expected to be "non-null") using the qvalue package [78] and (FDR<5%) for the significance threshold for the DeCAF QTL input. Secondly, we conduct a Fisher's exact test for enrichment as follows. A contingency table was constructed where positive-positive was defined as the number of SNPs with significance in both DeCAF (FDR<20%) and GWAS (p-value<0.001). Adjustment was based on the Bonferroni method. Finally, the average Z 2 was calculated as follows. We calculated the average Z 2 (Z score of the SNPs in the category of interest) for significant DeCAF cfQTLs overlapping the category of interest. We then calculated the average Z 2 of a randomly sampled (100×) null and took the ratio of the significant average Z 2 over the null as the measure of enrichment.