A human lung tumor microenvironment interactome identifies clinically relevant cell-type cross-talk

Background Tumors comprise a complex microenvironment of interacting malignant and stromal cell types. Much of our understanding of the tumor microenvironment comes from in vitro studies isolating the interactions between malignant cells and a single stromal cell type, often along a single pathway. Result To develop a deeper understanding of the interactions between cells within human lung tumors, we perform RNA-seq profiling of flow-sorted malignant cells, endothelial cells, immune cells, fibroblasts, and bulk cells from freshly resected human primary non-small-cell lung tumors. We map the cell-specific differential expression of prognostically associated secreted factors and cell surface genes, and computationally reconstruct cross-talk between these cell types to generate a novel resource called the Lung Tumor Microenvironment Interactome (LTMI). Using this resource, we identify and validate a prognostically unfavorable influence of Gremlin-1 production by fibroblasts on proliferation of malignant lung adenocarcinoma cells. We also find a prognostically favorable association between infiltration of mast cells and less aggressive tumor cell behavior. Conclusion These results illustrate the utility of the LTMI as a resource for generating hypotheses concerning tumor-microenvironment interactions that may have prognostic and therapeutic relevance. Supplementary information Supplementary information accompanies this paper at 10.1186/s13059-020-02019-x.


Introduction
Non-small cell lung carcinoma (NSCLC) accounts for~80% of all lung tumors and is comprised of two major histologic subtypes: adenocarcinoma (~60%) and squamous cell carcinoma (~40%). Despite significant therapeutic efforts, overall 5-year survival for NSCLC remains a dismal 18% [1]. While therapies that target malignant cells, such as cisplatinum-based chemotherapy and EGFR inhibitors, have led to improvements in outcomes, new therapeutic strategies are urgently needed in order to significantly improve survival of most lung cancer patients. Recent advances in tumor immunotherapy highlight the importance of targeting interactions between malignant and immune cells, with much effort focusing on the T cell suppressive PD1/CTLA4 axes [2]. Significant evidence points to additional complex interactions between malignant cells and other cell types comprising the tumor microenvironment. Experimental and clinical studies suggest that immune cells as well as endothelial cells and tumor-infiltrating fibroblasts play significant roles in lung cancer development and progression [3][4][5][6][7]. The molecular mechanisms underlying these observations are only beginning to be understood. Rapid progress has been hindered by the reality that these cell subtypes interact in a complex network (i.e., interactome) consisting of intra-and intercellular communication via juxtacrine, autocrine, and paracrine signaling. Elucidating the nature of interactions between lung cancer cells and cells comprising the tumor microenvironment could guide the development of novel therapeutic interventions.
Examples of important stromal players in NSCLC include tumor-associated macrophages (TAMs) which are a major component of the immune cell infiltrate seen in solid tumors [8]. Macrophage-tumor cell interactions lead to release of macrophagederived cytokines, chemokines, and growth/motility factors which in turn recruit additional inflammatory cells to the microenvironment [9,10]. Other immune cells commonly infiltrating lung tumors that play important roles in tumor biology include T, B, and NK cells [11][12][13]. Cancer-associated fibroblasts (CAFs) represent another class of stromal cells that interact with the malignant cell compartment in lung cancers [14][15][16][17]. Although several studies have found functionally important interactions between CAFs and lung cancer cells, a comprehensive understanding of their precise role in lung tumorigenesis remains lacking. These specialized fibroblasts can enhance tumor progression via multiple pathways, including synthesis of support matrices, production of promalignancy growth factors, promotion of angiogenesis, secretion of ECM proteases and pro-invasion factors such as hepatocyte growth factor, and production of immune suppressive cytokines [15][16][17].
The most common approach to studying tumor microenvironment gene expression has been to profile bulk tumors and look for cell-type-specific gene expression "clusters" in the resulting data. Interpretation becomes difficult when genes are expressed in multiple cell types. Co-expression of genes in multiple cell types within tumors occurs frequently. For example, subsets of malignant cells have been found to express genes such as vimentin and fibronectin-1 that are also expressed by fibroblasts [18]. More recently, it has become possible to perform single-cell RNAseq (scRNAseq) for hundreds to thousands of cells from a tumor sample [19]. However, the cost is still prohibitive for large cohorts, and transcriptome coverage is not complete.
Here, we dissociated primary human lung tumor samples directly after surgery, sorted individual cell subtypes based on the expression of surface markers, and performed RNA-seq analysis. We computationally identified cross-talk between different cell types in the lung tumor microenvironment, with a specific focus on prognostically relevant associations. Through the combination of cell purification from primary tumors and gene expression profiling, we have constructed a novel resource for identifying functional interactions between human lung cancer cells and their stroma: The Lung Tumor Microenvironment Interactome (LTMI; https:// lungtmi.stanford.edu).

Results
Forty human primary NSCLC tumors were acquired directly from the operating room, dissociated and sorted based on CD45 + EPCAM − (pan-immune), CD31 + CD45 − EPCAM − (endothelial cells), EPCAM + CD45 − CD31 − (malignant cells), and CD10 + EP-CAM − CD45 − CD31 − (fibroblasts), via our previously published flow cytometry strategy (Fig. 1a) [20]. We performed RNA-seq profiling on 185 samples from 36 tumors for which good-quality RNA could be obtained (Additional file 1: Table S1), including unsorted bulk RNA from the majority of cases, along with six reference samples distributed between experimental batches (Stratagene Universal Reference Human RNA). This yielded an average of 48.6 × 10 6 fragments per sample (range 3.8-85.4 × 10 6 ), with mean effective mapping rate of 87.2% (range 26.6-97.1%). Out of a total of 28,034 expressed protein-coding genes across all cell types, 5790 (21%) were expressed in all four and 9918 (35%) were expressed in only one (Additional file 14: Figure S1), indicating that a large fraction of genes are expressed in specific TMI subpopulations. All patients were treatment-naïve, and clinical characteristics of the cohort are shown in Additional file 2: Table S2. Sample SNP profiles were compared to verify identities (Additional file 14: Figure S2) [21]. After data normalization and summarization of expression at the gene level, we performed batch correction to remove differences between flow cells and observed that this generally improved concordance between transcriptomes from replicates (n = 11; Additional file 14: Figure S3). Multidimensional scaling (MDS) analysis of the 1000 most variable genes across sorted populations showed separation of the malignant, fibroblast, immune, and endothelial cells (Fig. 1b). There was no separation between individual populations isolated from adenocarcinoma versus squamous cell carcinoma by MDS. We next performed unsupervised hierarchical clustering analysis on the same 1000 genes and again observed clear separation of profiles from the different sorted cell types (Fig. 1c). With one exception (T29 CD31 + ), replicates were immediately adjacent to each other in the sample-wise dendrogram.
Within the malignant population, adenocarcinomas clustered apart from SCC as expected. The one tumor in our cohort that was called NSCLC NOS (Not Otherwise Specified) based on histopathology clustered with adenocarcinoma, whereas three other tumors T9 (adeno-squamous), T23 (fetal), and T37 (invasive mucinous adenocarcinoma) clustered with SCC. In supervised analysis, 1168 genes were differentially expression between malignant cells from adenocarcinoma versus SCC tumors (local FDR < 1%) with 931 being more highly expressed in adenocarcinoma, and 237 being lower (Additional file 3: Table S3). Distinguishing genes included classic basal keratins (KRT5, KRT6A, KRT6B, KRT13, KRT14) that along with TP63 were more highly expressed in SCC (Fig. 1d). Conversely, NKX2-1 and mucins (MUC1, MUC5B) were more highly expressed in adenocarcinoma, as were ROS1 and CLDN3.
One potential limitation of experimental strategies that involve dissociation and sorting of cells is that these procedures could distort their transcriptomes prior to RNAseq profiling. To permit analysis of this phenomenon, we also performed RNA-seq on bulk tissues that were frozen immediately after surgical dissection (Additional file 1: Table S1). We then compared the "ground truth" bulk transcriptional profile of each sample to the computationally reconstructed one defined by combining the profiles of individual populations weighted according to their relative abundance in the tumor. The latter was defined by deconvolving the bulk transcriptomes using CIBERSORT [22, 23], with the sorted sample transcriptomes used to construct a signature matrix (Additional file 14: Figure S4 and Additional files 4 and 5: Tables S4 and S5). Reconstructed profiles largely recapitulated bulk profiles (R = 0.97, Fig. 1e, and Additional file 6: Table  S6). We further explored these differences by ranking all genes by their difference between bulk and reconstructed profiles, and compared these to an atlas of body tissues using the Nextbio Correlation Engine [24]. Off-diagonal genes higher in bulk were ones archetypally expressed on cell populations that were not isolated by sorting, including muscle-and nerve-related genes (Fig. 1e, Additional file 6: Table S6). To further isolate the effects of dissociation and sorting, we applied CIBERSORT to the bulk tumors and the sorted immune samples using our previously validated signature matrix of 22 immune cell types (LM22; [22]) and compared deconvolution results. Relative proportions of infiltrating leukocytes were similar across adenocarcinoma and SCC in both our RNA-seq data and previously published microarray studies (Additional file 14: Supplementary Figure 5). Direct comparison of deconvolution results based on bulk vs sorted immune cells showed that some immune subtypes had higher or lower inferred proportions in sorted immune cells, suggesting that they were more sensitive to dissociation and/or sorting (Fig. 1f,g). These included lower than expected levels of plasma cells in immune sorted populations, and higher levels of activated mast cells and eosinophils. Plasma cells are systematically lost during flow sorting, whereas activation/degranulation of mast cells might be triggered by sorting. Taken together, these results suggest that our experimental strategy left the transcriptomes of most populations largely intact, but identify specific populations that are sensitive to flow sorting. These findings are likely also relevant for scRNA-seq analyses.

The LTMI reveals a complex transcriptional landscape of secreted ligands and their receptors across NSCLC tumor sub-populations
To identify avenues for cross-talk between cell types in adenocarcinoma and SCC, we integrated the LTMI data with the FANTOM5 resource of ligand-receptor interactions, and the PRECOG resource of prognostic associations between bulk gene expression and overall survival (Fig. 2a) [25,26]. We examined the potential complexity of cell-cell interactions by comparing the number of populations in which a ligand was expressed with the number of populations in which its cognate receptor was expressed, using TPM >10 as a threshold, to be consistent with the criteria used by FANTOM5 ( Fig. 2b and Additional file 7: Table S7). In both NSCLC histologies, the most frequent pattern was many-to-one, where all four sorted populations expressed a ligand, but only one population expressed its receptor; however, this pattern was not statistically significantly more prevalent than others (p = 0.11 and p = 0.18 respectively in adenocarcinoma and SCC by chi-squared test). In general, both ligands and receptors could be seen to be uniquely or ubiquitously expressed, suggesting that transcriptional regulation of cellular crosstalk is occurring at the level of both ligand and receptor activity.
We identified genes that were significantly differentially expressed in specific cell types relative to others, separately in adenocarcinoma and SCC, focusing on those that were over-expressed in a single cell type relative to all others, i.e., uniquely differentially expressed genes (uDEGs), at FDR < 1% with a minimum twofold difference in expression ("Materials and methods"). We intersected these with genes coding for putative secreted factors and cell surface proteins. Expression levels of these genes are frequently associated with survival outcomes in lung cancer; however, the cell type producing these factors is often unknown (Additional file 8: Table S8). In both adenocarcinoma and SCC, the most prolific expressors of uDEG ligands were fibroblasts (Fig. 2c, d and Additional file 9: Table S9). Their corresponding uDEG receptors were most commonly expressed by endothelial and malignant cells (Fig. 2c, d). Malignant cells also highly expressed many ligands, but their receptors were most frequently also expressed in malignant cells, suggesting autocrine signaling. In adenocarcinoma, receptor uDEGs for ligands differentially expressed in immune cells were most often also on immune cells; with a similar pattern seen in endothelial cells. In contrast, in Fig. 2 a The Lung Tumor Microenvironment Interactome (LTMI) integrates data generated in this study, the FANTOM5 resource of ligand-receptor pairs, and PRECOG for prognostic associations of genes in bulk tumor samples. b Potential complexity of inter-cell-type signaling via secreted factors. Ligands or receptors were defined as significantly expressed in a cell type if they had TPM > 10, as in the FANTOM5 study. c, d Potential cross-talk between cell types in adenocarcinoma (c) and squamous cell carcinoma (d). Shown are the number of ligand-receptor pairs where each is a uniquely differentially expressed gene (uDEG) in the indicated cell type. Arrows X->Y indicate that the ligand is a uDEG in cell type X, while the corresponding receptor is a uDEG in cell type Y. e ANGPT1 and ANGPT2 compete antagonistically for receptor binding and have opposite prognostic associations in NSCLC. They are expressed on fibroblasts and endothelial cells respectively, with expression of their known receptors being predominantly in endothelial cells. (f) Expression patterns of ligands and receptors pair that are highly expressed (TPM > 10) in single-cell types (corresponding to the 1-1 entries for adenocarcinoma and SCC in panel (b). Pink indicates ligand whereas blue indicates receptor. Three rightmost panels are expanded views of the groups indicated in first panel SCC, we did not observe a bias towards expression of both ligand and receptor within immune and endothelial populations.
We noted that the vascular growth signaling angiopoietin genes ANGPT1 and ANGPT2 were expressed by fibroblasts and endothelial cells respectively whereas the cognate receptors encoded by TEK and TIE1 were only expressed on endothelial cells. In bulk tumors, high ANGPT1 expression is associated with good overall survival while high ANGPT2 is associated with poor survival. Their products function as a rheostat competing for receptor binding, and the LTMI suggests that this occurs in an intracell-type fashion (Fig. 2e). We further examined the pattern of expression of ligandreceptor pairs where each was highly expressed in a single cell type (Fig. 2f). One major group (Group 3) showed a pattern where fibroblasts were prolific expressors of ligands whose receptors were expressed on every possible cell type, suggesting autocrine and paracrine signaling. This included BMP (bone morphogenic protein) signaling pathways involving BMP3 and BMP2 which promote cell growth. Group 2 displayed a prominent enrichment for NOTCH-related signaling within the endothelial compartments of both adenocarcinoma and SCC. An enrichment for immune compartment expression of ligands/receptors dominated Group 1, with autocrine and paracrine cross-talk potential. The latter was mainly with endothelial cells (in both adenocarcinoma and SCC) or malignant cells (in adenocarcinoma only).
Overall, our results indicated potential for highly complex inter-population communication via ligand-receptor signaling, particularly initiating from fibroblasts to endothelial or malignant cells, and autocrine influences within the malignant compartment.
Identification of clinically relevant cell-type cross-talk using the LTMI We sought to identify and validate cross-talk between cell types in the LTMI that had potential clinical relevance (Fig. 2a). To this end, we focused on genes that were associated with patient survival and that were expressed in a specific cell type as assessed by RNA-seq. Among the resulting potential interactions, we selected two (GREM1 and TPSAB1, encoding mast cell tryptase MCT) for experimental validation. These represented cross-talk between fibroblasts and malignant cells (GREM1), and between immune and malignant cells (TPSAB1).

High expression of Gremlin-1 by fibroblasts correlates with proliferation of lung adenocarcinoma cells
High expression of GREM1, encoding for the secreted factor Gremlin-1, is associated with poor overall survival in lung adenocarcinoma but not squamous cell carcinoma (PRECOG meta-Z: + 4.11 in adenocarcinoma vs − 0.75 in SCC). Our data showed it to be expressed strongly in fibroblasts from both histologies, but not in other cell types (Fig. 3a). GREM1 inhibits bone morphogenetic protein (BMP) signaling by binding BMP ligands and preventing their interaction with their receptors [27]. Additionally, GREM1 has been shown to bind and activate the vascular endothelial growth factor (VEGF) receptor Kinase Insert Domain Receptor (KDR, also known as VEGFR2, one of two receptors for VEGF), which is expressed in endothelial cells of both adenocarcinoma and SCC. Interestingly, KDR is also expressed in the malignant compartment in We next sought evidence for a role for GREM1 in cross-talk between fibroblasts and malignant cells by using the LTMI to correlate gene expression levels in malignant cells from adenocarcinoma with the level of GREM1 in fibroblasts from the same tumors. Expression levels of genes involved in translation initiation, ribosomal biogenesis, and invasiveness in malignant cells were positively correlated with GREM1 expression in fibroblasts from the same patient in adenocarcinoma but not in SCC ( Fig. 3b; see also Additional file 10: Table S10). Genes related to cellular transformation and hypoxia were also higher when GREM1 was higher in adenocarcinoma, but not SCC. Additionally, higher adenocarcinoma fibroblast GREM1 correlated with lower malignant cell glucocorticoid metabolism gene expression. Together, these observations suggested that GREM1 production by fibroblasts might induce a more aggressive malignant cell behavior in adenocarcinoma but not squamous cell carcinoma. To further test this, we evaluated the relationship between fibroblast content and overall survival in TCGA adenocarcinoma and SCC tumors with CIBERSORT using the signature matrix defined by our purified cell populations (Additional file 5: Table S5). Patients with a higher inferred proportion of fibroblasts had worse overall survival in adenocarcinoma (p = 0.01 as a continuous variable, likelihood ratio test) but not in SCC (p = 0.83, not shown). An optimal dichotomization of adenocarcinoma into patients with fibroblast proportion higher or lower than 17% robustly separated survival curves (p = 0.0004, log-rank test; Fig. 3c).
Based on the results from the LTMI, we sought to functionally test if GREM1 can alter behavior of lung cancer cells. Lung cancer cell lines express GREM1 at varying levels, with~5500-fold range across SCC lines and nearly 13,000-fold across adenocarcinomas as measured in the Cancer Cell Line Encyclopedia (Additional file 11: Table  S11) [28]. To test a positive causal association of GREM1 with malignant cell behavior, we treated adenocarcinoma cell lines with low intrinsic expression of the gene (HCC78 and SW1573) with recombinant GREM1. Treatment with GREM1 increased both 2D colony and 3D tumorsphere formation by approximately twofold (Fig. 3d, e). Additionally, GREM1 treatment resulted in significantly higher migratory potential using (See figure on previous page.) Fig. 3 a GREM1 (encoding the secreted factor Gremlin-1) is highly expressed on fibroblasts in adenocarcinoma and SCC. Its receptor KDR is highly expressed in endothelial cells of both adenocarcinoma and SCC, and also in malignant cells from adenocarcinoma but not SCC. b Expression of GREM1 in fibroblasts is positively correlated with expression of proliferation and invasiveness related genes in malignant cells in adenocarcinoma (all adjusted p values < 0.05), but not in SCC. c High levels of fibroblasts inferred in adenocarcinoma from TCGA are associated with less favorable overall survival. d-f Treatment of low GREM1-expressing adenocarcinoma cell lines HCC78 and SW1573 with recombinant Gremlin-1 protein resulted in increased number of clones (red), sphere formation in 3-D culture (yellow), and invasion as evaluated by in vitro trans-well migration assays (magenta). g si-RNA knockdown resulted in decreased GREM1 expression in both H1755 and H1792 adenocarcinoma cell lines, which normally express it highly. h Knockdown of GREM1 expression reduced survival in both cell lines that highly express it. i Representative stain for GREM1 RNA shows expression confined to fibroblasts, that spatially colocate preferentially with leading edge of malignant cell nests. Malignant cells are highlighted in green. Black bars show closest malignant cell to each GREM1+ fibroblast. j Western blots showing (left) Gremlin-1 protein levels in CAFs from primary human NSCLC with low vs high GREM1 RNA levels (alpha-Tubulin control also shown), and levels of KDR and pKDR at baseline vs after co-culture with GREM1 low (+) and high (+++) CAFs. k Flow cytometry assessment of KI67 status of malignant cells before and after co-culture with CAFs expressing different Gremlin-1 protein levels in vitro trans-well migration assays (Fig. 3f). Thus, exogenous GREM1 increases aggressiveness of lung cancer cells in vitro.
As noted above, some lung cancer cell lines express high levels of GREM1, suggesting a potential tumor-promoting autocrine role in a subset of lung cancers. Consistent with this, we observed a range of GREM1 expression in the malignant cells from human tumors with a small number of outliers expressing significant levels of GREM1 (Fig. 3a). To test if GREM1 may have an autocrine function in these cells, we knocked down the transcript in high GREM1 expressing H1755 (which does not express the KDR receptor) and H1792 (which does express KDR) adenocarcinoma cells using siRNA. Knockdown reduced GREM1 transcript levels by 85% in H1755 and 54% in H1792 (Fig. 3g) and reduced survival of both cell lines by up to 50% after 8 days (Fig. 3h).

GREM1-expressing fibroblasts are preferentially spatially located adjacent to malignant cells
We further verified that GREM1 expression was confined to fibroblasts using in situ RNA hybridization on tumor tissues (Fig. 3i). Interestingly, visual inspection indicated that fibroblasts expressing GREM1 clustered around nests of cancer cells, suggesting a potential juxtacrine interaction between these cell types mediated by this pathway. In order to assess the spatial distribution of GREM1-positive cells, we stained and digitally imaged four tissue samples from tumors corresponding to very low, low, medium, or high GREM1 expression. We developed an automated image processing pipeline ("Materials and methods") to detect nuclei and classify cells as GREM1 positive vs. negative. We used this pipeline to quantitatively evaluate our qualitative observation that GREM1+ cells tend to be physically closer to tumor cells than other stromal cells. We manually annotated tumor regions in the four images and calculated the distance to the nearest tumor cell for every stromal cell, both GREM1+ and GREM1−. We then compared the distribution of these distances for GREM1+ vs. GREM1− using a Mann-Whitney U test for difference in the mean. For all three samples with GREM1 expression, the GREM1+ cells were significantly closer on average to malignant cells than GREM1− cells (p = 3 × 10 − 16 , 1 × 10 − 7 , and 1 × 10 − 10 for the low, medium, and high tissue samples respectively-no GREM1+ cells were detected in the very low GREM1 expression image). To further confirm this result, we performed a simulation study, repeatedly resampling the stroma nuclei as being GREM1+ vs. GREM1− while maintaining the same number of GREM1+ cells. We used the median distance of the GREM1+ cells to the nearest tumor cells as a test statistic, T. For all three samples with GREM1 expression, out of 10 5 simulations, T was never as small as for the observed configuration, implying a p value of < 1 × 10 − 5 in each case.

Co-culturing of malignant NSCLC cells with GREM1-producing fibroblasts engages KDR receptor and increases their proliferation
Exogenous GREM1 protein increased the proliferation of adenocarcinoma cell lines, but might be an indirect effect rather than mechanistic. To better validate the potential interaction, we co-cultured adenocarcinoma cell lines with primary CAFs expressing high or low amounts of GREM1. CAFs were obtained from new human NSCLC biopsies that were not part of the LTMI cohort, and subjected to RNA-seq analysis ("Materials and methods"). We selected CAFs that showed the lowest and highest amounts of GREM1 expression (Fig. 3j). We stained malignant cells with e-Cadherin (to guard against cross-contamination from other cell types) and the proliferation marker KI67. Proliferation was unchanged in malignant cells co-cultured with low-GREM1 CAFs (14.25% vs 15.8%; Fig. 3k); however, the proportion of KI67+ cells increased from 15.82 to 34.16% in malignant cells co-cultured with high-GREM1 CAFs. To further test for a causal connection, we evaluated the phosphorylation of the KDR receptor in malignant cells under these co-culture conditions, via Western blot with anti-Tyr1175 [29]. Phospho-KDR was not detected in baseline malignant cells, or when they were cultured with GREM1-low CAFs, but was present when they were cultured with GREM1-high CAFs (Fig. 3j and Additional file 1: Figure S6). Taken together, these results support the potential of GREM1 produced by NSCLC CAFs to engage and phosphorylate the cognate KDR receptor on malignant cells and to induce their proliferation as assessed by KI67 positivity.

Levels of infiltrating mast cells negatively correlate with tumor proliferation in adenocarcinoma and SCC
To further demonstrate of the utility of the LTMI, we investigated potential associations between the immune and malignant compartments. We noted that TPSAB1 (Tryptase Alpha/Beta 1) was highly expressed in sorted immune cells from both adenocarcinoma and SCC (p < 2.2 × 10 − 16 by ANOVA; Fig. 4a) and is favorably prognostic in both histologies across multiple datasets in PRECOG [25]. Among a panel of 22 different immune cell types, TPSAB1 expression was nearly 30-fold more highly expressed on mast cells (Additional file 14: Figure S7). We performed cross-population enrichment analysis by ranking genes in malignant cells by their correlation to TPSAB1 expression in immune cells across the cohort (Additional file 12: Table S12). In adenocarcinoma, high TPSAB1 expression in immune cells correlated with reduced malignant cell expression of proliferation and cell cycle genes as well as of genes related to metastasis (Fig. 4b). Few gene sets positively correlated with TPSAB1 expression, but included olfactory receptor genes and genes downregulated in gefitinib-resistant NSCLC. In SCC, we again observed negative association of immune TPSAB1 expression with metastasis and extracellular matrix genes in malignant cells, as well as VEGF and EGF signaling pathway genes (Fig. 4c). However, interestingly, proliferation genes in SCC malignant cells positively correlated with immune TPSAB1, in contrast to adenocarcinoma.
We validated the prognostic relevance of mast cells in NSCLC by immunohistochemical (IHC) staining of a lung tumor tissue microarray (TMA) for MCT (mast cell tryptase, encoded by TPSAB1). The lung TMA (n = 389 samples) was stained for MCT, and each core was scored for the number of mast cells by a pathologist. Mast cell infiltration was similar across NSCLC histologies, but higher in adenocarcinoma in situ relative to other types (Additional file 14: Figure S8). Within adenocarcinoma, mast cell counts were significantly higher in Stage 1 vs Stage 3 (p = 0.006 by t-test) but not in Stage 1 vs Stage 2 or Stage 2 vs Stage 3 (Additional file 14: Figure S8). There was no difference in mast cell levels across stages of SCC, though the modest sample size (n = 66) limited the statistical power of this analysis.
Mast cell counts were converted to levels of "high", "intermediate", "low", and "negative" ("Materials and methods"). In order to validate the relationship of mast cell levels to tumor proliferation, the same TMA was stained for the proliferation marker KI67 (Fig. 4d, see also Additional file 14: Figures S9 and S10). The proportion of KI67positive malignant adenocarcinoma cells was lower in tumors with high vs low/intermediate numbers of mast cells (Fig. 4e; p = 0.003, ANOVA F-test), consistent with the gene set-based analysis of our sorted RNA-seq data. Negative, low, and intermediate levels of mast cells all conferred worse overall survival than high mast-cell levels . f, g High numbers of mast cells in both primary adenocarcinomas (f) and SCC (g), assessed by tissue microarray staining for mast cell tryptase (MCT), were associated with better overall survival. Mast cell counts were assigned to pre-defined "none," "low," "medium," and "high" categories by pathologist whether considered across only non-squamous NSCLC (n = 214, Fig. 4f) or SCC (n = 66, Fig. 4g). Multivariable analysis indicated that mast cell levels carried prognostic information independent of stage (Additional file 13: Table S13), and Kaplan-Meier analysis within stages I, II, and III separately confirmed that the level of mast cell infiltration was prognostic across NSCLC and within adenocarcinoma (Additional file 14: Figure S11).

The LTMI: a resource for exploring lung tumor microenvironment interactions
To facilitate investigation of relationships between transcriptional profiles within the lung TMI, we developed an online resource, the Lung Tumor Microenvironment Interactome (https://lungtmi.stanford.edu). Users can select from sets of genes that are prognostic, expressed in a specific population, and/or encode for secreted or surface factors. Alternatively, a list of genes of specific interest can be entered manually. Given this set of genes, the LTMI interface can display differential expression between different sub-populations in adenocarcinoma and SCC. Correlations can be extracted between expression levels of these genes in a cell type of interest compared to other cell types. Gene set enrichment analysis, as performed in this study, can be applied to the resulting correlative output. A tutorial in the resource is available to recapitulate the results described here relating GREM1 in fibroblasts to malignant cell transcriptional programs. During the review process, a new comprehensive resource for ligandreceptor pairs, CellphoneDB, became available that we also make available within the LTMI interface [30]. This will be updated in our resource as new versions are released, while preserving the initial sets of pairs used in the analysis presented here.

Discussion
As with other malignancies, most research efforts on lung cancer have focused on the transformed cells themselves. This has led to the identification of important pathways and individual genes involved in oncogenesis such as EGFR, KRAS, and ALK [31][32][33][34]. Significantly less attention has been directed at investigating possible contributions of the tumor microenvironment to cancer formation, progression, and treatment response, although this is a burgeoning area of interest. Here, we developed a unique resource by profiling human primary lung tumors that were dissociated and sorted directly after surgical resection.
Prior applications of computationally derived regulatory networks have used whole tumor high-throughput data to gain insight into mechanisms underlying hematological cancers [35][36][37][38][39]. More recently, such computational approaches have been extended to solid tumors [40][41][42]. Previous work on profiling the tumor microenvironment has often been accomplished through the use of laser capture microdissection (LCM) in a variety of tumors, including those of the breast and lung [43,44]. However, it is difficult to separate endothelial cells, fibroblasts, and infiltrating immune cells using LCM and these are therefore usually lumped as one stromal sample in such studies. Our approach for gene expression profiling malignant and stromal cells within primary tumors involves dissociating the tumor tissues and then purifying individual cell subtypes based on the expression of surface markers. Our approach is thematically similar to previous work, but we believe is novel due to the size of the cohort studied, the focus on both lung adenocarcinoma and SCC, and the use of primary human samples. Choi et al. performed a study in a transgenic mouse model of lung cancer, focusing on three immune populations plus epithelial cells in a small number of samples, but did not consider the role of endothelial cells or fibroblasts which we found to be crucial players in the NSCLC TME [45]. In a related study, Durrans et al. examined gene expression in human tumor samples including myeloid cells, neutrophils, and epithelial cells but only in adenocarcinoma, and only from 6 patients [46]. Hence, they did not make any inferences about intercellular crosstalk. An earlier study in breast cancer did not consider fibroblasts which, in one of our key findings, are the most prolific expressors of genes encoding ligands that could mediate crosstalk [47]. Recent works have begun to exploit scRNA-seq for understanding the TME. One such study by Kumar et al. regressed interactions with tumor phenotypes in mouse models, but did not perform validation studies [48]. Finally, a seminal paper by Tirosh et al. dissected the landscape of metastatic melanoma using scRNAseq, but was descriptive in nature and did not attempt to reconstruct cellular cross-talk or perform validation experiments [49].
By RNA-seq profiling of cell types from lung tumors, we found that GREM1, high levels of which are associated with worse patient outcomes, is specifically expressed on fibroblasts in the adenocarcinoma microenvironment. The LTMI identified a positive association between fibroblast GREM1 expression and malignant cell proliferation genes. Gremlin-1 has been shown previously to induce proliferation of normal lung cells and to be over-expressed in adenocarcinoma, but not SCC, compared to normal lung [50]. However, to the best of our knowledge, neither its fibroblast origin, nor a specific role in stimulating proliferation of lung cancer lines has been noted. Cancerassociated fibroblasts (CAFs) represent a major class of stromal cells that interact with the malignant cell compartment in lung cancers [14]. CAFs appear biologically distinct from fibroblasts present in benign microenvironments [51]. Although several studies have found functionally important interactions between CAFs and lung cancer cells [52][53][54], the role of Gremlin-1 identified using the LTMI appears to be novel. Adenocarcinoma cell lines express GREM1 variably. Si-RNA knockdown in high-expressing cell lines resulted in reduced proliferation independent of KDR receptor expression. However, our data suggest that in primary tumors, receptor expression is required. In support of our computational inference from the LTMI, co-culturing of high-GREM1 expressing CAFs with malignant cells resulted in phosphorylation of the KDR receptor (encoded by VEGFR2) and increased their proliferation. Interfering with this TME interaction may therefore represent a novel therapeutic opportunity.
Interactions between malignant cells and infiltrating immune cells are another major class of microenvironmental interactions within lung tumors. There have been conflicting reports concerning a role for mast cells in cancer, and specifically in lung tumors, with some finding them to be a favorable prognostic factor, and others adverse [55,56]. We found similar contradictions when we examined the association of angiogenesis and inflammation markers in bulk expression data using PRECOG. On the one hand, VEGFA and VEGFC expression portended worse outcomes in adenocarcinoma (P = 3 × 10 − 6 and P = 2 × 10 − 6 respectively), whereas the expression of resistin (RETN), which has been implicated in a wide variety of inflammatory processes [57], was associated with better survival (P = 8 × 10 − 4 ). Such results illustrate the complexity of processes like inflammation, which can have pro-and anti-tumor effects, and the limitations in bulk data for disentangling them. Using the LTMI, we identified and validated a favorable prognostic association of mast cell infiltration in lung tumors. This finding was consistent with a novel inverse correlation between mast cell infiltration and numbers of KI67-positive malignant cells. The mechanistic influence of mast cells on NSCLC malignant cells clearly requires further investigation to understand whether there are differences in their spatial localization or functional state that might explain adverse vs favorable associations in different contexts; however, they have been proposed to have cytolytic activity in breast cancer [58].
Limitations of our study include the focus on four pre-defined sub-populations, the potential impact of cell dissociation and sorting on transcriptional profiles, and the restriction to expression data. It is possible that ligands and receptors that we identified in particular populations could also be present on other cell types that we did not profile. We attempted to allow for this by filtering for correlations of ligand expression with changes in gene expression of downstream genes in populations expressing the cognate receptor. Nonetheless, we were able to identify and validate associations between malignant and stromal cell types. An additional limitation is that we cannot perform these analyses within subgroups such as stage or genetic subtype because there would be insufficient sample sizes to obtain cross-correlations. It is likely that the TME and its interactions will differ in these contexts, and this represents an important direction for investigation. In the future, single-cell RNA-seq will increasingly be used to dissect the tumor microenvironment and will allow further resolution of transcriptional properties of malignant and stromal sub-populations within the TMI. However, it is not yet practical for large cohorts and still has technical limitations that preclude full coverage of the transcriptome.
Our LTMI resource is useful for generating hypotheses for subsequent follow-up, but cannot make definitive causal inferences, which would require more detailed validation with perturbation experiments. This drawback is common to computational methods leveraging retrospective primary sample genomic resources. The ultimate output of signaling pathways is whether their downstream pathways are transcriptionally activated. These would be the end point of phospho-signaling. Nonetheless, a variety of validated computational approaches use the activity of transcriptional targets to infer the activity of an upstream regulatory protein, in lieu of directly measuring its phosphorylation state [59,60], and these methods could be extended to utilize data such as provided in the LTMI.
In conclusion, we have developed a publicly available resource called the Lung Tumor Microenvironment Interactome that allows interrogation of potential interactions between cell subpopulations within human lung tumors. We anticipate that this resource will complement scRNA-seq analyses and facilitate future studies of lung cancer biology that will allow identification of novel drug targets for improving treatment outcomes for this devastating disease.

Materials and methods
Freshly resected surgical tumor samples from patients with NSCLC were dissociated and sorted as described [20] using A700 anti-human CD45 clone HI30 (pan-leukocyte cell marker), PE anti-human CD31 clone XWM59 (endothelial cell marker), APC antihuman EpCAM clone X9C4 (epithelial cell marker), and PE-Cy7 anti-human CD10 clone XHI10a (fibroblast marker). All antibodies were obtained from BioLegend (San Diego, CA). Library preparation and sequencing were performed as described previously. For co-culture experiments, fibroblasts were isolated from fresh human lung adenocarcinomas obtained immediately after patient resection under IRB approval (15166) and underwent immediate dissociation for single-cell suspension using a gen-tleMACS™ dissociator (Miltenyi Biotec) according to the manufacturer protocol. The single-cell suspension was then plated in a 35-mm dish and fibroblasts expanded for further use.

Cell lines and reagents
The human lung cancer cell lines were obtained from American Type Culture Collection. All cell lines were cultured in RPMI-1640, supplemented with 10% FBS and 100 mg/L penicillin/streptomycin, and maintained at 37°C with 5% CO 2 . The HCC-78 cell line was validated by exome sequencing.

Effect of exogenous gremlin-1 in lung adenocarcinoma cells
Recombinant Grem-1 protein (500 ng/ml) was added to lung adenocarcinoma cell lines (HCC78, SW1573) with low intrinsic GREM1 expression in 2D or 3D culture. The clones were stained with crystal violet and enumerated at 10-14 days after seeding the cells. Effect of Grem-1 on lung adenocarcinoma cell migration and invasion were evaluated using in vitro trans-well migration assays. Recombinant Grem-1 protein (500 ng/ ml) was added to lung HCC78 and SW1573 cell lines in 3D culture. Numbers of spheres were counted 8-10 days after seeding the cells on matrigel.
si-RNA knockdown of GREM1, viability, and clonogenic assays siRNA (30 M) targeted against Grem-1 (siGrem-1) was used to decrease GREM1 mRNA expression in lung adenocarcinoma cell lines with high GREM1 expression (H1755 and H1792). Viability of si-Grem-1 transfected cells was examined using the CellTiter 96 Non-Radioactive Cell Proliferation Assay (MTS), according to the manufacturer's protocol (Promega BioSciences). For clonogenic assays (2D), identical number of cells with or without treatment were reseeded at low density in six-well plates in triplicate and incubated at 37°C under 5% CO 2 . After 10 to 12 days, plates were washed, fixed in 50% methanol, and stained with 0.1% crystal violet and then the number of colonies was counted. Evaluation of colony formation was also conducted in 3D cell culture using matrigel (Corning) and cell culture inserts for 24-well plates (Corning). After 10 to 12 days, the number of spheres was enumerated under a light microscope.

In vitro migration and invasion assays
Effect of Grem-1 in the invasion of cells was assayed using the BD BioCoat Matrigel Invasion Chambers (BD Bioscience). Each well of a 24-well plate contained an insert with an 8-mm pore size PET (polyethylene terephthalate) membrane. Inserts coated with a thin layer of matrigel basement membrane matrix were used to measure the ability of the cells to invade through the reconstituted basement membrane. 1 × 10 5 cells were seeded inside the insert with medium containing 1% serum. High-serum (10%) medium was then added to the bottom chamber of 24-well plates to serve as a chemoattractant. After 24 h, the membranes were washed, stained, then separated with a sterile scalpel and mounted on a glass slide. The number of migrating cells was counted under a light microscope.
Co-culture and flow cytometry HCC827 lung adenocarcinoma cells were co-cultured at 1:1 ratio with primary fibroblasts expressing low (+) or high (+++) Gremlin-1. After 24 h, cancer cells were separated from the fibroblasts with Anti-Fibroblast MicroBeads (Miltenyi Biotec) according to the manufacturer's protocol. A HCC827 monoculture adjusted for the number of cells was used as control. The sorted cell types were kept at − 80°C for further analysis. CC827 monoculture or HCC827:fibroblast co-cultures were rinsed with PBS 1× and lifted off tissue culture plates using TrypLE (Life technologies, 12605-010). Aliquots of 1 × 106 cells per condition were stained using Zombie Aqua™ Fixable Viability Kit (Bio-Legend) for 5 min in PBS 1× at RT in the dark before washing with cell staining media (CSM: 0.5% w/v BSA, 0.02% w/v NaN3 and 2 mM EDTA in PBS). Cells were centrifuged at 500g for 5 min at 4°C to pellet cells and then fixed by adding PFA at a final concentration of 1.6% for 10 min at room temperature. Cells were washed with eBioscience™ Permeabilization Buffer solution (Thermo Fisher), then centrifuged at 500g for 5 min at 4°C to pellet cells and PFA was removed. Cells were then incubated with E-cadherin (PE/Cy7, Biolegend, Clone 67A4, 324,115), CD10 (BV421, BioLegend, Clone HI10a, 312218), and Ki67 (AF 647, BD Biosciences, clone B56, 558,615) antibodies in the permeabilization buffer for 30 min at 4°C. After two washes with CSM, cells were resuspended in 500 μl of CSM and analyzed using a BD LSRFortessa™ X-20 (BD Biosciences). Results were analyzed using Cytobank single-cell analysis software.

Quantitative PCR
qRT-PCR analysis was utilized to analyze expression changes of GREM1, MYC, CDKN1A (encoding p21 protein), and GAPDH. Total RNA was isolated from cells using the Paris Kit (Ambion). One microgram of total RNA was reverse transcribed using The High Capacity cDNA Reverse Transcription Kit (Applied Biosystems) as specified by the manufacturer. qRT-PCR was done using SYBRGreen PCR Master Mix (Applied Biosystems) and an ABI PRISM 7900 Sequence Detection System (Applied Biosystems). Primers for PCR amplifications (Table 1) were designed using Primer 3 Input (version 0.4.0). Relative mRNA levels were calculated using the 2ΔΔCt method.

Tissue microarray staining and analysis
The tissue microarray (TMA) was cut into 4-μm-thick sections, deparaffinized, and hydrated. For the Ki-67 staining, the TMA slide was subjected to Epitope Retrieval Solution 2 (ER2, Leica) antigen retrieval and stained with a prediluted anti-Ki-67 antibody (Mouse, Clone MIB-1, Dako #M7240) using an automated immunostainer. For the MCT (mast cell tryptase; corresponding to TPSAB1 gene) staining, the TMA slide was subjected to Cell Conditioning 1 (CC1, Ventana) antigen retrieval and stained with a prediluted anti-MCT antibody (Mouse, Clone G3, Millipore #MAB1222) using an automated immunostainer. Mast cell counts were assigned to pre-defined categories by pathologists as follows: "none" when 0 mast cells, "low" when 1-9 mast cells, "medium" when 10-30 mast cells, and "high" when greater than 31 mast cells were present in each entire 0.6 mm core.

Computational analysis
Briefly, paired-end reads were aligned to the human genome (GRCh38) using STAR version 2.5.0 [61], with Gencode v23 transcriptome annotation [62] using a two-pass approach. Alignment files were de-duplicated and further processed using the Genome Analysis Toolkit (GATK). Expression levels were also quantified using Salmon v0.4.2 [63]. Individual Salmon runs were integrated into a single expression matrix using the tximport package in R, using TPM (transcripts per million) to summarize to the gene level [64]. Despite the use of the Nugen kit for eliminating ribosomal rRNA, in common with previous studies [65], we found that a large and variable proportion of reads derived from mitochondrial rRNA, specifically MT-RNR1 (Mitochondrially Encoded 12S RNA) and MT-RNR2 (Mitochondrially Encoded 16S RNA). Accordingly, we renormalized the data matrix by removing these transcripts and rescaling each sample to have TPM sum to 10 6 . This preserved the relative ranking within each sample but rescales between samples, eliminating the distorting effect of the two mitochondrial rRNAs without changing the relative ranking on genes in TPM space. For subsequent analysis, we eliminated genes that had mean TPM < 1 in all sample subtypes (bulk, fibroblast, endothelial, immune, malignant) in all histologies (adenocarcinoma, SCC, or "other"). For clustering, visualization, and subsequent analyses, we used the moderated log of TPM, i.e., log 2 (1 + TPM).
We observed that there were significant batch effects between sequencing lanes based on Salmon quantification of RNA levels, with the majority of transcripts being significantly associated with sequencing lane. We applied batch correction at the level of flowcell identity using ComBat [66], with histology/sub-population as a model matrix, to avoid eradicating  biologically meaningful signals. After this step, the lane-level batch effect was largely eliminated. RNA-seq data are available in the Gene Expression Omnibus under accession number GSE111907 [67]. The lung-TMI website interface (http://lungtmi.stanford.edu) was built using R/Shiny. In order to assess the spatial distribution of GREM1-positive cells, we immuno-stained and digitally imaged four tissue samples corresponding to very low, low, medium, or high GREM1 expression. We developed an automated image processing pipeline using the Pillow 2.7.0 fork of the Python Imaging Library to detect nuclei and classify GREM1 positive vs. negative. This pipeline involved the following: 1. Non-negative Matrix Factorization (using the NMF function from scikit-learn) to separate the GREM1 and hematoxylin channels.
2. Applying a Laplacian filter with radius 6 to the hematoxylin channel followed by non-maximum suppression to detect nuclei centers.
3. Applying a Gaussian filter with radius 6 to the GREM1 channel and evaluating at the nuclei centers, followed by thresholding at 0.1 to detect positive vs. negative GREM1 expression.

Verification of sample identities
In order to verify that there were no sample swaps during preparation or sequencing, we performed pairwise comparison of all BAM files using bam-matcher [21]. This tool uses a set of known common SNPs and computes the overlap between genotypes of samples. We used the Freebayes genotyping option, a depth threshold of 10 for considering a position, and the largest available set of common SNPs (n = 7550).

Secreted and surface factors
We compiled lists of genes encoding potential secreted and surface factors from several sources. For secreted factors, we included the following: known chemokines and cytokines obtained by searching Entrez gene; genes whose SwissPROT function or localization included "secreted" as a keyword; and an additional list of WNT-and Sonic Hedgehog genes that we noted were not included among the previous groups. For surface factor genes, we took the computationally inferred list previously described as the "Surfaceome" [68]. We also incorporated information on ligand-receptor pairs defined by the FANTOM5 consortium.