The epigenetic landscape differed between normal lung and LUAD tissues
To explore the epigenetic landscape in LUAD, we assembled a cohort of clinical samples from patients who received surgery in Fudan University Shanghai Cancer Center (FUSCC). The histological subtype and tumor cell content of each LUAD sample were confirmed by two independent pathologists. H3K27ac high-throughput chromatin immunoprecipitation sequencing (ChIP-seq) was then successfully performed in 42 paired LUAD and adjacent normal lung samples. Among the 42 LUAD samples, 1 sample was minimally invasive adenocarcinoma (MIA) and 41 samples were invasive adenocarcinomas (IACs) containing different subtypes (Additional file 2: Table S1).
We next processed the ChIP-seq data for each sample as previously described before further analysis [28, 29]. Briefly, after data quality control, reads alignment, peak calling, and filtering, in total we identified ~ 100 thousand H3K27ac peaks for LUAD samples and ~ 60 thousand peaks for normal samples, mostly located at intergenic and intronic regions (Additional file 1: Figures S1A and S1C). Interestingly, the LUAD and normal lung tissues can be well separated by their H3K27ac landscape, indicating that epigenetic alteration was one of the major differences and might play important roles in LUAD tumorigenesis (Fig. 1a, Additional file 1: Figure S1B). We grouped the H3K27ac ChIP-seq profiles of LUAD and normal lung samples, and then used the MAnorm2 model to compare these two groups of profiles [29]. By this method, we detected a huge number of differential H3K27ac sites, including 4784 tumor-specific and 7645 normal-specific ones (Fig. 1b, c). We found that genes regulated by the normal-specific sites were enriched for basic cellular functions, such as actin cytoskeleton organization and GTPase activity whereas the tumor-specific sites were significantly associated with a number of genes known to be dysregulated in cancers (Additional file 1: Figure S1D), indicating that aberrant epigenetic modifications directly contribute to transcriptional dysregulation in LUAD.
We and others have previously investigated the H3K27ac landscape in cell lines instead of tissue samples to explore LUAD epigenetic alterations [27, 30]. Strikingly, we found that the H3K27ac profiles of LUAD cell lines, including the widely used A549 cell line and two derived from Chinese patients, were quite different from those of primary tumor tissues (Additional file 1: Figure S1E). This indicated that observation obtained from cell lines may not accurately represent the in vivo situation and directly profiling the epigenetic landscape in primary tissues is of irreplaceable value to cancer studies. Taken together, these data uncovered the dramatic difference of the H3K27ac landscape between LUAD and normal lung tissues, which might be of functional importance.
Super-enhancers control LUAD driver oncogenes
Previous studies have revealed that large clusters of enhancers, named super-enhancers (SEs), have prominent roles in determining cell identity, tumorigenesis, and chronic disease [28]. Compared to typical enhancers (TEs), SEs elicit stronger effects and predominantly exert a transactivation function to induce continuous and high expression of target genes [31]. We thus wanted to identify the SEs in our dataset and determine the role of SEs in LUAD tumorigenesis. To do so, we first separately identified SEs for each H3K27ac ChIP-seq sample using ROSE, a software specifically developed for this purpose [31, 32]. By this means, we identified ~ 800 and ~ 700 SEs for each tumor and normal sample, respectively, leaving the other distal H3K27ac sites as TEs (Additional file 1: Figure S1F). The SEs only accounted for a minor portion of the total enhancer domains (median, 5.58%) but for the majority of H3K27ac signals (median, 33.76%) (Additional file 1: Figure S1G). We merged the SEs identified from all tumors and normal samples and obtained 2,893 SEs in total. We further mapped the differential H3K27ac sites detected between LUAD and normal lung samples to these SEs and thus defined 205 tumor-specific and 324 normal-specific SEs (Fig. 1d; Additional file 1: Figure S2A - S2C). Interestingly, > 50% of the 11,574 differential H3K27ac sites at distal regions were located within SEs (Additional file 1: Figure S2D). This finding emphasizes that SE abnormalities might play an important role in LUAD development.
Generally, SEs and TEs regulate spatially closed genes, but they can also contact distant gene transcription start sites (TSSs) by forming long-range enhancer-promoter interactions [33]. While the most widely used method for annotating the target gene of each regulatory element is to map it to gene with the nearest TSS, we used a more reliable gene annotation method here which considered the correlation between the normalized H3K27ac intensities of each SE and those at gene promoters within 500 kb of the SE across all tumors and normal samples (Additional file 1: Figure S3A - 3C showing the SE at MAX-FUT8 gene locus as an example). In this way, we found that a number of well-known oncogenes, including MYC and SOX9, were annotated as targets of tumor-specific SEs, while several tumor suppressor genes in LUAD, such as DLC1 and RB1, were linked to normal-specific SEs (Fig. 1d; Additional file 3: Table S2, Additional file 4: Table S3). Among tumor-specific SE-associated genes, lots of genes were important transcriptional factors (TFs). Then, we ranked all the TFs and found SOX9 was ranked as top one tumor-specific SE-associated TF (Fig. 1e). We performed immunohistochemistry (IHC) staining of tumor-specific SE-associated transcription factor SOX9 on patient tissue samples and found that higher level of SOX9 was observed in tumors than adjacent normal lung tissues (Fig. 1f). We also performed IHC staining of other top-ranked TFs including SIX1, RUNX1, and SOX4 and again found higher expression level of these TFs in tumor tissues (Additional file 1: Figure S3E). The tumor-specific SE-associated genes were enriched in important pathways, such as chordate embryonic development (RUNX2, FOXA1), regulation of cell adhesion (RUNX1, SOX2 and VEGFA), and epithelial cell differentiation (SOX9, ELF3) (Fig. 1g). Some of these pathways are related to cancer stem cell (CSC) or LUAD cancer cell migration and invasion, including chordate embryonic development, cell fate specification, regulation of cell adhesion, and collagen catabolic process [34, 35]. In particular, the activity of SOX9 was known to be associated with the primitive transcriptional programs spanning stem cell-like to regenerative pulmonary epithelial progenitor states during metastasis in LUAD, consistent with a lot of epithelial differentiation and embryonic related pathways enriched in tumor-specific super-enhancer associated genes [34]. Genes associated with normal-specific SEs were enriched in cell growth regulation (DLC1, RB1, GATA6) and other basic cellular functions, such as cell morphogenesis, GTPase-mediated signal transduction and regulation of phosphate metabolic process (Fig. 1g). For example, CAV1 and CAV2 were marker genes of lung alveolar cells [36], and the H3K27ac levels at their nearby SEs were significantly upregulated in normal lung tissues compared to LUAD (Fig. 1h). Thus, in both normal lung and LUAD samples, SEs regulated key cell identity genes, and the SE abnormality contributed to tumorigenesis.
Previous studies have revealed that genomic rearrangement is highly associated with SE abnormality in cancers, and thus we focused on the most frequent fused oncogene ALK in LUAD [37]. ALK-fusion is a powerful driver mutant and important therapeutic target that occurs in ~ 5% of LUAD patients, especially in advanced stage patients [8]. The malignant behavior caused by ALK-fusion can be strongly inhibited by receptor tyrosine kinase inhibitors (TKIs), thus dramatically improved patients’ prognosis [38]. Three of the 42 patients involved in our study were detected with ALK-EML4 gene fusion, which hijacked (or translocated) the SE located upstream of EML4 (Fig. 1i). It was suggested that the fusion protein can activate several canonical signaling pathways, including RAS/MEK/ERK pathway and PI3K/AKT cascades [39], and our results indicated that the hijacked SE probably maintained aberrant expression of ALK-EML4. Similar SE hijacking was also found in one patient with ROS1-SLC34A2 fusion (Additional file 1: Figure S3F). These data suggested that SE abnormality could also directly get involved in LUAD oncogenesis through SE hijacking of driver oncogenes, and thus, the hijacked SE might be a potential target for LUAD therapy, especially for TKIs-resistant patients [32].
LUAD is composed of two different epigenetic states
Our data thus far highlighted the key epigenetic differences between normal lung and LUAD tissues. However, a high epigenetic heterogeneity was also observed among LUAD tissues from different patients (Fig. 1a). We next quantitatively evaluated the intertumoral variations of H3K27ac signals normalized by MAnorm2 on genome scale. In order to remove the mean-variance dependence, we fitted a mean-variance curve and selected those peaks with significant deviations above the curve [29]. By this method, we identified 4615 hyper-variable H3K27ac peaks (HVPs) among the tumor samples (Fig. 2a). We downloaded the susceptible SNPs of lung cancer identified by a previous genome-wide association study (GWAS) [40] and found they were highly enriched in the HVPs of LUAD samples (Additional file 1: Figure S4B). This finding suggests that the fluctuations of H3K27ac levels at these HVPs might provide valuable insights in understanding the epigenetic heterogeneity of LUAD. We further noticed that the vast majority of the tumor HVPs were not identified as HVPs across normal samples (Fig. 2b; Additional file 1: Figure S4C). It indicated that most of the intertumoral epigenetic heterogeneity observed at the tumor HVPs could hardly be explained by the epigenetic variations already existed in the patients’ normal lung tissue and it predominantly emerged during tumorigenesis. Meanwhile, there was little overlap between the tumor HVPs and the peaks upregulated in tumor samples compared to normal samples (previous identified tumor-specific peaks) showing consistent H3K27ac changes between tumors and normal tissues (Fig. 2b), suggesting that a very large fraction of them may be peaks specific to a subset of tumor samples. Thus, further investigation on the observed intertumoral epigenetic heterogeneity was needed and might provide insights into LUAD progression.
We next performed principal component analysis (PCA) on the normalized H3K27ac intensities of tumor HVPs across all tumor ChIP-seq samples. Surprisingly, we found that, for each sample, its score on the first principle component (PC1) was significantly associated with the lymph node invasion status of corresponding patient (Fig. 2c). Hierarchical clustering of the tumor samples’ PC1 scores clustered them (and thus the corresponding patients) into two subgroups, namely group I and group II (GI and GII, Fig. 2d). Of note, LUAD can be classified into several subtypes which are important predictors of prognosis by its predominant histologic component [41,42,43]. Based on that, we have divided the 42 LUAD patients in our cohort into three pathological subgroups: low-risk, median-risk, and high-risk (Additional file 2: Table S1). Here, after a comprehensive comparison of the pathological subtype, tumor size, and lymph node metastasis status of patients in two subgroups, we observed a weak difference in tumor size and a significant difference in lymph node invasion between GI and GII patients (Fig. 2d). More importantly, we found that the tumor samples’ PC1 scores were strongly associated with the patient pathological subtypes and could largely distinguish the more aggressive tumors (GII) from the less aggressive ones (GI) (Fig. 2d). High-risk pathological subgroup composed of micropapillary and solid predominant LUAD tumors was only contained in GII patients (Additional file 2: Table S1). In contrast, GI contained more low-risk pathological subgroup patients (Fig. 2d, Additional file 2: Table S1). Survival analysis of these two groups of patients further supported that GII patients had a significantly poorer prognosis than GI patients (Fig. 2e). In summary, by systematically dissecting the inter-tumor epigenetic heterogeneity, we were able to classify the LUAD samples into two subgroups that correlated with clinical outcomes.
In addition to clinical features, we also compared the gene mutation patterns between GI and GII patients. EGFR mutation occurred in > 50% of Asian LUAD patients and was the most frequent driver mutation in our cohort (27 of all 42 patients). However, we found no difference in the EGFR mutation frequency between GI and GII (Additional file 1: Figure S4E). Our previous studies revealed that patients with ALK-fusion are more likely to be in advanced stages, and the EML4-ALK variant 3 usually indicates a poor prognosis [44]. Consistently, all 3 ALK-fusion positive patients were found in GII and had a solid predominant subtype (Additional file 1: Figure S4E). In conclusion, based on the intertumoral heterogeneity of H3K27ac profiles, we successfully classified LUAD into two distinct subgroups with different pathological subtypes and clinical outcomes.
Epigenomic and transcriptomic changes underlie distinct pathways and malignancy between two LUAD subclasses
Our data suggested that patients in GI had better prognosis than patients in GII, as evidenced by the significant prolonged relapse-free survival (RFS) and over-all survival (OS) time (Fig. 2e). To investigate the reasons underlying these differences in survival, we performed RNA sequencing (RNA-seq) of the tumor samples and determined the differentially expressed gene (DEG) signatures for each group. In total, we identified 2501 significant DEGs between GII and GI (Fig. 3a). Functional enrichment analysis revealed that GII-upregulated genes were enriched in the cell cycle and DNA replication pathways, both of which were signatures of highly malignant tumors. Besides, FOXM1 pathway and E2F pathway which were both enriched in GII have been proven to be related to LUAD invasion and metastasis [45, 46]. Conversely, a number of genes involved in maintenance of normal cell functions and metabolic processes were significantly downregulated from GI to GII (Fig. 3b). Consistent with GI-specific genes signatures, several independent studies have shown that KRAS-driven murine LUAD preferentially arises from alveolar type II (AT2) cells, supported by the IHC staining of alveolar markers in early-stage tumors [47]. Surfactant metabolism-related pathways were usually highly activated in AT2 cells but significantly decreased in GII compared to GI, indicating that GI might somehow represent early-stage tumors. Next, we moved to test the epigenomic differences between GI and GII. After a quantitative comparison of the H3K27ac profiles between GI and GII tumors using MAnorm2, we defined 17,713 GII-specific and 13,408 GI-specific H3K27ac peaks, which then led to the identification of 194 GII-specific and 437 GI-specific SEs. We further found that these SE were highly correlated with the genes differentially expressed between GI and GII (Fig. 3c, d; Additional file 1: Figures S6 and S7A). We chose two subgroup-specific SE-associated genes, RUNX2 and NKX2-1(thyroid transcription factor 1, also known as TTF1), for IHC staining to assess their protein levels in GI and GII tumors. The LUAD maker NKX2-1 was reported to suppress LUAD progression, whereas RUNX2 was reported as key regulator to promote cancer progression [47, 48]. In our study, higher RUNX2 expression was detected in GII tumors compared to GI tumors, whereas NKX2-1 seemed to be specifically expressed in GI tumors (Additional file 1: Figure S7C and D).
To further investigate the significant pathways changed with epigenetic alterations, we ranked genes by the correlation of gene expression with PC1 scores across all tumor samples and performed gene set enrichment analysis (GSEA). The analysis confirmed the positive enrichment of gene sets related to cell cycle and DNA repair. On the contrary, the negative enrichment of genes set was associated with immune-cell signaling related pathway, such as asthma, complement, and coagulation cascades in aggressive tumors (Additional file 1: Figure S7B). These results indicated the strong connection between epigenetic alterations with transcriptional changes during LUAD progression. Accordingly, we found that more GII-specific enhancers were associated with cell cycle genes than GI-specific enhancers, indicating that epigenetic alterations might lead to group-specific transcription signatures (Fig. 3e; Additional file 1: Figure S8A).
We noticed that a number of lineage-specific TFs were upregulated in GI group and associated with GI-specific SEs, such as NKX2-1 (LUAD marker), FOXA2 (regulating surfactant protein production), and CEBPD (AT2 cell marker) (Fig. 3d). Loss of lineage identity during tumor development was shown in a recent study [47]. Thus, we used gene single sample set enrichment analysis (ssGSEA) to evaluate the activities of these two important pathways related to stem cell characteristics in GI and GII (Fig. 3f), and then performed GSEA analysis to verify the differential activation of stem cell related pathways between GI and GII (Additional file 1: Figure S8C and D). Compared with GI, the GII-upregulated genes were enriched in processes associated with rapid cell proliferation and a de-differentiated state (Additional file 1: Figure S8C and D). These results suggested that GI represented as more differentiated state while GII displayed a more stem cell-like phenotype. However due to the lack of temporal relationship between samples, we could not perform a trajectory analysis to check whether GI and GII corresponded to early and late-stage during tumor progression.
In order to investigate the epigenomic and transcriptomic changes in the continuum progression from differentiated state to de-differentiated state, we further divided GII tumor samples into GII.1 and GII.2 based on previous hierarchical clustering result and re-analyzed the epigenetic and transcriptomic changes among these three subgroups (Additional file 1: Figure S7E and F). GII.1 and GI had great differences in epigenome but not in transcriptome (Additional file 1: Figure S7G, third column), compared to the differential analysis between GII.2 and GI (Additional file 1: Figure S7G, first column). The epigenetic alterations in GII.1 compared to GI seemed like prior to the transcriptional changes between them during the progression. These results indicated that epigenetic regulators might be involved in the epigenome remodeling in GII.1 during the progression. Taken together, epigenetic heterogeneity led to the discovery of two LUAD subgroups and then different driver genes and biological pathways enriched in each LUAD subgroup, which can greatly help to explain the differences in clinical outcomes among patients.
DEGs between GI and GII tumors can predict LUAD prognosis
To verify the utility of our LUAD classification model, we incorporated the transcriptomic data of 504 LUAD patients in TCGA database from diverse populations and tried to map each patient to GI or GII based on transcriptome similarity. We first calculated the ssGSEA scores of GI-specific and GII-specific genes in the RNA-seq profile of each TCGA-LUAD patient’ tumor sample, which were called the GI and GII scores, respectively. We designated the TCGA patients with high GI score and low GII score samples as GI-like, and the patients with low GI score and high GII score samples as GII-like, leaving the remainder as the intermediate group (intergroup). In total, we defined 143 GI-like, 142 GII-like, and 219 intergroup patients. Consistently, GI-like patients had significantly better OS than GII-like patients (Fig. 3g). We also noticed that the majority of GI-like patients were labeled as stage I patients by TCGA, while most GII-like patients were at later stages (II-IV, Fig. 3i). In particular, even for the stage I patients, the GII-like ones showed clearly worse prognosis compared to the GI-like ones (Fig. 3h). These findings indicated that our LUAD classification model could be potentially used as a reference to predict tumor progression and risk.
Additionally, genomic mutation analysis of TCGA-LUAD datasets revealed more gene mutations in GII-like patients than in GI-like patients (Fig. 3j). Especially, some tumor suppressor genes (TSGs) including DLC1 and SYNE1, which have been found to be downregulated in our GII tumor samples compared to GI samples, were predominantly mutated in GII-like TCGA patients (Fig. 3j). This result indicated that loss of TSG function caused by gene mutation and downregulation can directly contribute to progression in tumorigenesis. Taken together, these results supported our new classification model and showed pronounced transcriptomic changes in these two epigenetic subgroups that correlate with the OS rate. The GI-GII score derivative from our classification model could be used to identify high-risk patients, especially in stage I, and, therefore, constituted an important supplement to current TNM staging system.
Subgroup-specific core regulators determined transcription characteristics in GI and GII
In order to explore key regulators driving the transcriptional differences between two subgroups, a co-expression network was constructed based on the differential expressed epigenetic regulators and transcription factors between GI and GII, which were considered regulators here, as well as their target genes identified by co-expression analysis. We found that the network formed two different sub-networks, which correspond to the two subgroups and were named as GI-specific network and GII-specific network, respectively (Fig. 4a). More importantly, GI-specific network contained 539 (50.4%) GI-specific genes, and GII-specific networks covered 947 (66.2%) GII-specific genes (Additional file 5: Table S4, Additional file 6: Table S5). These results indicated that the transcriptional characteristics of GI and GII were driven by different regulatory modules. We also found that the degrees of the transcription and epigenetic regulators showed a power law distribution (Additional file 1: Figure S9A), indicated that a small set of them had very high connectivity in the network, and thus might play important roles in driving the transcription characteristics of GI and GII. Then, we constructed the co-expression network of regulators and identified the core regulators of GI and GII, which not only had high connectivity with each other but also with other regulators (see methods). By this means, we identified 6 core regulators for GI and 18 core regulators for GII (Fig. 4b, c). In addition, 9 of the 18 GII-specific core regulators were annotated to cell cycle pathway (Additional file 1: Figure S9B), such as MYBL2, FOXM1, previously known to be related to cancer abnormal cell cycle [49]. In contrast, GI-specific core regulators were assigned to complement and coagulation cascades and metabolic related pathways (Additional file 1: Figure S9B), such as POU2F3, which has been identified as a good prognosis marker of LUAD. The better prognosis of high POU2F3 expression patients was confirmed by The Human Protein Atlas [50], with a 46% 5-year survival rate compared to 34% 5-year survival rate of low POU2F3 expression patients. The shift of core regulators might happen at the initial stage and drives the transition from GI to GII. We observed an upregulation of GII-specific core regulators EZH2 and EHMT2 in GII.1 compared to GI, and they were further upregulated in GII.2 (Additional file 1: Figure S9C). We also detected a progressive downregulation of GI-specific core regulators HLF and IRX1 (Additional file 1: Figure S9C). Compared with GI samples, GII.1 upregulated genes were highly enriched in epigenetic regulation of gene expression, consistent with the function of GII-specific core regulators (Additional file 1: Figure S9D). Moreover, by integrating with our previously defined links from super-enhancers, distal enhancers and proximal H3K27ac peaks to gene s[51], a large fraction of the GI and GII-specific core regulators were found to be directly assigned to at least one differential epigenetic element (Fig. 4d). In summary, the transcriptional characteristics of the LUAD subgroups identified here were driven by a number of subgroup-specific core regulators, which potentially influenced the biological pathways related to clinical prognosis and contributed to LUAD progression. Meanwhile, subgroup-specific epigenetic elements acted as upstream factors driving the differential expression of these core regulators.
Screening of genes regulated by GI-specific core regulators and in vivo validation reveals CLU as a potential tumor suppressor
After constructing a co-expression network dominated by the core regulators comprised of two subgroup-specific sub-networks, we asked whether the function of downstream genes regulated by these core regulators were directly linked with subgroup biological features. Making use of a previously compiled list of 49 TSGs whose expression was consistently downregulated in 11 different cancer types [52], we found that the GI core regulators target genes enriched more TSGs than GII core regulators target genes and the expression of these TSGs predominantly showed a positively correlation with GI-specific core regulators but a negatively correlation with GII-specific core regulators (Additional file 1: Figure S10A). Group-specific distal enhancers or SEs could control TSGs such as DLC1 and MAPK10 directly or indirectly via group-specific core regulators, which might help to explain the downregulation of these TSGs and tumor progression in GII (Fig. 5a). We then screened the genes directed linked with GI-specific core regulators to discover new candidate TSGs that can inhibit LUAD progression. As the top 3 GI core regulators were also regulated by SEs, we firstly searched the downstream genes co-regulated by HLF, KLF15, and POU2F3 (Fig. 5b). A total of 38 protein-coding genes were identified (Additional file 7: Table S6). Interestingly, there were multiple well-known TSGs, including MAPK10, a well-studied TSG in esophageal cancer, cervical cancer, and breast cancer (Additional file 1: Figure S10B) [53,54,55]. To screen for progression associated TSGs, we further selected the candidate genes whose expression was only downregulated from GI-like to GII-like TCGA-LUAD tumor samples, but not from normal to GI-like ones. Three GI-specific genes were filtered out and we noticed CLU, one potential LUAD TSG which was included in our previous TSGs screening study [56]. During LUAD progression, CLU was significantly downregulated from GI-like to GII-like samples. However, there was no significant difference between normal and GI-like samples (Fig. 5c). The epigenetic analysis also showed that CLU was directly regulated by GI-specific enhancers (Fig. 5d). These results suggested that CLU might act as a TSG to inhibit LUAD progression. Our previous study showed that CLU knockout mediated by CRISPR/Cas9 technique in KrasG12D-based genetically engineered mouse model (GEMM) promoted LUAD malignant progression [56]. We further found that ectopic CLU expression in CRL-5803 and PC9 cells significantly inhibited cell proliferation and suppressed colony formation in soft agar (Fig. 5e–g). Conversely, CLU knockdown in CRL-5872 cells accelerated cell proliferation and promoted colony formation in soft agar (Fig. 5h–j). We also assessed the protein levels of CLU as well as Ki-67, a proliferation marker, in our cohort using IHC staining. Compared with GI tumors, the CLU level was significantly downregulated in GII tumors. In contrast, the Ki-67 level was conversely upregulated in GII tumors (Fig. 5k, l). We here identified CLU as a new TSG in LUAD which was related to tumor progression.
We then further investigated the regulatory relationship between CLU and associated enhancers. Using dual gRNA CRISPR system [57], we successfully performed knockout of a GI-specific enhancer, which contains the binding motif of GI core regulator HLF, in CRL-5872 cells (Fig. 5m, n). Consistent with our CLU knockdown results, CLU enhancer knockout significantly downregulated the expression of CLU and accelerated cell proliferation (Fig. 5o, p). Taken together, these data support that GI tumors display relatively high expression of TSGs, which may be directly regulated by GI-specific enhancers to suppress malignant progression and contribute to better patient prognosis.