Fig. 2 | Genome Biology

Fig. 2

From: Data normalization considerations for digital tumor dissection

Impact of data normalization on in silico tumor-infiltrating leukocyte profiling. a Tumor purity inferred by ABSOLUTE [29] versus immune content inferred by ESTIMATE [30], compared across 11 TCGA (The Cancer Genome Atlas) cancer types (ABSOLUTE data were obtained from [26]). b Bottom: heat map showing Pearson correlations comparing overall leukocyte content, inferred by ESTIMATE, with immune subset abundance, inferred by TIMER, across 23 TCGA tumor types. Cancers are ordered from left to right by the mean correlation coefficient calculated across the six immune cell types. Top: mean cross-correlation coefficient of the six immune subsets compared with each other, omitting self-comparisons. Cancer types are vertically aligned, and correlation coefficients are expressed as mean ± SEM. c TIMER results are shown for four representative TCGA cancer types, along with immune content inferred by ESTIMATE. Overall leukocyte content and estimates of individual tumor-infiltrating leukocyte (TIL) subsets are normalized from 0 to 1 within each cancer type, and ordered from left to right by decreasing immune content. Regression lines (shown in black) were calculated by cubic splines. d Same as panel b, but after normalizing inferred levels of the six leukocyte subsets to one in each patient. e Cross-correlation matrix of CIBERSORT results before and after adjustment by total leukocyte content. Results are shown for lung squamous cell carcinoma (LUSC) microarrays profiled by TCGA (n = 130 tumor samples). ESTIMATE was used to infer total leukocyte content, denoted immune score. f Average representation of the six immune subsets inferred by TIMER across 23 TCGA cancer types. g Impact of source datasets on tumor gene expression levels following batch correction. Li et al. applied ComBat [17] to merge expression profiles of bulk tumors with a reference database containing six immune cell types with variable representation. Here, the number of dendritic cell (DC) samples in the authors’ reference database (n = 88) was randomly sampled from 1 to 88 while the remaining immune subsets were left unchanged. For each iteration, ComBat was applied to merge the reference immune profiles with RNA-Seq data from LUSC, which we used as a representative TCGA cancer type (n = 555 tumors). The median expression level of each DC marker gene (used in Li et al. and originally obtained from [31]) was determined across the LUSC cohort; markers are represented as medians, quartiles, and 10th and 90th percentiles. h Analysis of the number of immune reference samples versus the relative fraction of each immune subset inferred by TIMER across TCGA (colored as in panel f).
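The contrast between panels b (top) and d can be illustrated with a minimal Python sketch. It is not the authors' code, and it rests on stated assumptions: "normalizing to one in each patient" is read as dividing each subset estimate by the per-patient sum, the SEM is taken over the pairwise correlation coefficients, and the three subset names and all values are hypothetical toy data. The function names `pearson`, `normalize_to_one`, and `mean_cross_correlation` are likewise introduced here for illustration only.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)


def normalize_to_one(patient):
    """Rescale one patient's subset estimates so they sum to 1 (assumed reading of panel d)."""
    total = sum(patient.values())
    return {subset: value / total for subset, value in patient.items()}


def mean_cross_correlation(patients):
    """Mean and SEM of pairwise subset correlations, omitting self-comparisons."""
    subsets = sorted(patients[0])
    columns = {s: [p[s] for p in patients] for s in subsets}
    # combinations() yields each unordered pair once, so no subset is compared with itself
    rs = [pearson(columns[a], columns[b]) for a, b in combinations(subsets, 2)]
    n = len(rs)
    mean = sum(rs) / n
    sem = sqrt(sum((r - mean) ** 2 for r in rs) / (n - 1)) / sqrt(n)
    return mean, sem


# Hypothetical toy cohort: every subset rises with overall immune content.
raw = [
    {"B cell": 1.0, "T cell": 2.0, "Macrophage": 3.0},
    {"B cell": 2.0, "T cell": 5.0, "Macrophage": 5.0},
    {"B cell": 4.0, "T cell": 6.0, "Macrophage": 8.0},
    {"B cell": 4.0, "T cell": 9.0, "Macrophage": 11.0},
]
normalized = [normalize_to_one(p) for p in raw]

print("raw        mean, SEM:", mean_cross_correlation(raw))
print("normalized mean, SEM:", mean_cross_correlation(normalized))
```

In this toy cohort the unnormalized subsets are strongly positively correlated with one another because they all track total immune content, whereas the per-patient fractions are not, which is the kind of shift the comparison between panels b and d is designed to expose.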
