Fig. 1 | Genome Biology

From: Cancer expression quantitative trait loci (eQTLs) can be determined from heterogeneous tumor gene expression data by modeling variation in tumor purity

The interaction model can accurately attribute eQTLs to cancer using bulk tumor gene expression in simulated data. a Scatterplot of the eQTL effect size recovered by a conventional analysis of bulk tumor expression data (y-axis) against the known normal-cell eQTL effect size set in the simulation (x-axis), for the 100 eQTLs simulated to have an effect in normal cells but not in cancer cells. Points are colored red if the conventional model called them significant at FDR < 0.05. The effects recovered by the conventional model (y-axis) are heavily influenced by the eQTL effects in tumor-associated normal cells. b Scatterplot of the cancer eQTL effect size estimated by the interaction model (y-axis) against the known normal-cell eQTL effect size set in the simulation (x-axis), for the same 100 eQTLs as in (a). Points are colored red if the interaction model called them significant at FDR < 0.05. The recovered effects (y-axis) are no longer driven by the eQTL effects in tumor-associated normal cells, and in general these eQTLs have not been misattributed to cancer. c Strip chart of one simulated eQTL in tumor expression data, whose effect size was simulated as 0 in cancer cells (i.e., no eQTL) and 0.48 in tumor-associated normal cells. The conventional model misattributed this eQTL to cancer. d The same eQTL as in (c), with the effect size calculated in five bins (black points) grouped by the proportion of tumor-associated normal cells. The effect size decreases as the proportion of cancer cells increases. The effect size in cancer cells, extrapolated by the interaction model, is shown in red; the effect size recovered from the bulk tumor by the conventional model is shown in green. Whiskers represent 95% confidence intervals. The interaction model has not misattributed this eQTL to cancer cells.
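To illustrate panels c and d, here is a minimal sketch of an eQTL that acts only in tumor-associated normal cells, fitted with a conventional model and with an interaction model. It assumes (this is not taken from the caption) that bulk expression is a linear mixture of cancer-cell and normal-cell expression weighted by their proportions, and that the interaction model has the form expression ~ genotype + p_normal + genotype × p_normal, so that the genotype main effect is the effect extrapolated to a sample of pure cancer cells. The helper names `fit_ols` and `simulate_normal_only_eqtl`, the purity range, allele frequency, and noise level are all hypothetical.

```python
import numpy as np


def fit_ols(X, y):
    """Ordinary least squares fit; returns coefficients and standard errors."""
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = X.shape[0] - X.shape[1]
    sigma2 = resid @ resid / dof
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta, np.sqrt(np.diag(cov))


def simulate_normal_only_eqtl(n=500, normal_effect=0.48, seed=0):
    """Hypothetical simulation of an eQTL that acts only in normal cells.

    Bulk expression is modeled as a linear mixture of cancer-cell and
    normal-cell expression weighted by their proportions, plus noise.
    """
    rng = np.random.default_rng(seed)
    genotype = rng.binomial(2, 0.3, size=n).astype(float)  # allele count 0/1/2
    p_cancer = rng.uniform(0.4, 0.95, size=n)             # hypothetical purity range
    p_normal = 1.0 - p_cancer
    expr_cancer = np.ones(n)                              # no eQTL in cancer cells
    expr_normal = 1.0 + normal_effect * genotype          # eQTL in normal cells
    noise = rng.normal(0.0, 0.1, size=n)
    expr = p_cancer * expr_cancer + p_normal * expr_normal + noise
    return genotype, p_normal, expr


genotype, p_normal, expr = simulate_normal_only_eqtl()
ones = np.ones_like(expr)

# Conventional model: expr ~ genotype (ignores tumor purity)
beta_conv, se_conv = fit_ols(np.column_stack([ones, genotype]), expr)

# Interaction model: expr ~ genotype + p_normal + genotype:p_normal.
# The genotype main effect is the effect extrapolated to p_normal = 0,
# i.e. to a sample made entirely of cancer cells.
X_int = np.column_stack([ones, genotype, p_normal, genotype * p_normal])
beta_int, se_int = fit_ols(X_int, expr)

conventional_effect = beta_conv[1]
cancer_effect = beta_int[1]
# 95% intervals use a normal approximation (1.96 standard errors).
print(f"conventional bulk effect: {conventional_effect:.3f} "
      f"(95% CI half-width {1.96 * se_conv[1]:.3f})")
print(f"interaction-model cancer effect: {cancer_effect:.3f} "
      f"(95% CI half-width {1.96 * se_int[1]:.3f})")
```

In this toy setup the conventional estimate is clearly non-zero (the normal-cell effect diluted by the average normal-cell proportion), while the interaction model's cancer-cell estimate stays near 0. Because the data are generated from exactly the assumed linear mixture, the interaction model is correctly specified here; on real tumors it would only be an approximation.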
e Sensitivity, specificity, and FDR achieved by the interaction model as increasing noise is added to the measured proportion of cancer cells. The x-axis shows the Pearson correlation between the known simulated proportions and the “measured” proportions after noise is added (see Methods). The dashed red line marks 0.05, the level at which the FDR was controlled for these tests using the Benjamini and Hochberg method. The interaction model keeps the FDR well controlled even as the correlation between the true and measured (noise-added) proportions decreases toward 0.5. Note: if the cancer cell proportions are completely randomized, the true FDR is 22% (at the 5% threshold). As before, these true FDRs were calculated by treating the known simulated set of cancer eQTLs as the ground truth.
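The evaluation in panel e can be sketched as three small pieces: the Benjamini and Hochberg step-up procedure, sensitivity/specificity/observed FDR against a known ground truth, and noise added to purity estimates to reach a target Pearson correlation. The noise model here (additive Gaussian noise with standard deviation chosen so the expected correlation equals `target_r`) is a hypothetical stand-in, not the procedure described in the paper's Methods; the function names and the toy p-values are also illustrative assumptions.

```python
import numpy as np


def benjamini_hochberg(pvals, alpha=0.05):
    """Return a boolean mask of hypotheses rejected by the BH step-up procedure."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m
    passes = p[order] <= thresholds
    rejected = np.zeros(m, dtype=bool)
    if passes.any():
        k = np.nonzero(passes)[0].max()  # largest rank with p_(k) <= k * alpha / m
        rejected[order[: k + 1]] = True
    return rejected


def classification_metrics(called, truth):
    """Sensitivity, specificity and observed FDR against a known ground truth."""
    called = np.asarray(called, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = np.sum(called & truth)
    fp = np.sum(called & ~truth)
    fn = np.sum(~called & truth)
    tn = np.sum(~called & ~truth)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    fdr = fp / (tp + fp) if tp + fp > 0 else 0.0
    return sensitivity, specificity, fdr


def add_noise_to_proportions(p_true, target_r, rng):
    """Hypothetical noise model: additive Gaussian noise whose variance makes the
    expected Pearson correlation between true and measured values equal target_r."""
    noise_sd = np.sqrt(np.var(p_true) * (1.0 / target_r**2 - 1.0))
    return p_true + rng.normal(0.0, noise_sd, size=p_true.shape)


# Toy p-values, with the first three hypotheses treated as true eQTLs.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
truth = [True, True, True] + [False] * 7
called = benjamini_hochberg(pvals, alpha=0.05)
sensitivity, specificity, fdr = classification_metrics(called, truth)

# Degrade simulated purities until they correlate at about r = 0.5 with the truth.
rng = np.random.default_rng(1)
p_true = rng.uniform(0.4, 0.95, size=10_000)
p_measured = add_noise_to_proportions(p_true, target_r=0.5, rng=rng)
r = np.corrcoef(p_true, p_measured)[0, 1]
```

With these toy p-values, BH at 0.05 rejects only the first two hypotheses, giving a sensitivity of 2/3, a specificity of 1, and an observed FDR of 0. The noisy proportions are deliberately not clipped to [0, 1], because clipping would change the correlation the noise was calibrated to produce.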
