- Open Access
Convergent network effects along the axis of gene expression during prostate cancer progression
Genome Biology volume 21, Article number: 302 (2020)
Tumor-specific genomic aberrations are routinely determined by high-throughput genomic measurements. It remains unclear how complex genome alterations affect molecular networks through changing protein levels and consequently biochemical states of tumor tissues.
Here, we investigate the propagation of genomic effects along the axis of gene expression during prostate cancer progression. We quantify genomic, transcriptomic, and proteomic alterations based on 105 prostate samples, consisting of benign prostatic hyperplasia regions and malignant tumors, from 39 prostate cancer patients. Our analysis reveals the convergent effects of distinct copy number alterations impacting on common downstream proteins, which are important for establishing the tumor phenotype. We devise a network-based approach that integrates perturbations across different molecular layers, which identifies a sub-network consisting of nine genes whose joint activity positively correlates with increasingly aggressive tumor phenotypes and is predictive of recurrence-free survival. Further, our data reveal a wide spectrum of intra-patient network effects, ranging from similar to very distinct alterations on different molecular layers.
This study uncovers molecular networks with considerable convergent alterations across tumor sites and patients. It also exposes a diversity of network effects: we could not identify a single sub-network that is perturbed in all high-grade tumor regions.
Prostate cancer (PCa) represents one of the most common neoplasms among men with almost 1,300,000 new cases and 360,000 deaths in 2018  accounting for 15% of all cancers diagnosed. PCa is the fifth leading cause of cancer death in men and represents 6.6% of the total cancer mortality in men . Despite earlier detection and new treatments, the lifetime risk to die of PCa has remained stable at approximately 3% since 1980. (National Cancer Institute SEER data: https://seer.cancer.gov/statfacts/html/prost.html). In many patients, PCa is indolent and slowly growing. The challenge is to identify those patients who are unlikely to experience significant progression while offering radical therapy to those who are at risk. Current risk stratification models are based on clinicopathological variables including histomorphologically defined grade groups, prostate-specific antigen (PSA) levels, and clinical stage. Although those variables provide important information for clinical risk assessment and treatment planning [2, 3], they do not sufficiently predict the course of the disease.
Extensive genomic profiling efforts have provided important insights into the common genomic alterations in primary and metastatic PCa [4,5,6,7,8,9]. Interestingly, PCa genomes show a high frequency of recurrent large-scale chromosomal rearrangements such as TMPRSS2-ERG . In addition, extensive copy number alterations (CNAs) are common in PCa, yet point mutations are relatively infrequent in primary PCa compared to other cancers [6, 11]. A major complicating factor is that around 80% of PCas are multifocal and harbor multiple spatially and often morphologically distinct tumor foci [12, 13]. Several recent studies have suggested that the majority of topographically distinct tumor foci appear to arise independently and show few or no overlap in driver gene alterations [14,15,16]. Therefore, a given prostate gland can harbor clonally independent PCas.
To allow for a more functional assessment of the biochemical state of PCa, it is necessary to go beyond genomic alterations and comprehensively catalog cancer-specific genomic, transcriptomic, and proteomic alterations in an integrated manner [17,18,19]. Such an approach will provide critical information for basic and translational research and could result in clinically relevant markers. While hundreds of PCa genomes and transcriptomes have been profiled to date , little is known about the PCa proteome. Although recent work has emphasized the need for integrated multi-omics profiling of PCa, we still lack understanding about how genomic changes impact mRNA and protein levels [17,18,19]. Especially, the complex relationship between tumor grade, tumor progression, and multi-layered molecular network changes remains largely elusive.
For example, previous work has shown that copy number changes may alter transcript levels of many genes, whereas the respective protein levels remain relatively stable . Indeed, there is compelling evidence across multiple tumor types that many genomic alterations are “buffered” at the protein level and are hence mostly clinically inconsequential . To better understand the evolution of PCa and to identify core networks perturbed by genomic alterations and thus central for the tumor phenotype, it is therefore essential to investigate the transmission of CNAs to the transcriptomic and proteomic level.
To this end, it is important to decipher which genomic alterations impact PCa proteomes, which of those proteomic alterations are functionally relevant, and how molecular networks are perturbed at the protein level across tumors.
To address these open questions, we performed a multi-omics profiling of radical prostatectomy (RP) specimens at the level of the genome, transcriptome, and proteome from adjacent biopsy-level samples, using state-of-the-art technologies. Unique features of this study are (1) the utilization of pressure cycling technology (PCT)-sequential window acquisition of all theoretical mass spectra (SWATH) mass spectrometry [23, 24], allowing rapid and reproducible quantification of thousands of proteins from biopsy-level tissue samples collected in clinical cohorts; (2) the simultaneous profiling of all omics layers from the same tissue regions; (3) inclusion and full profiling of benign regions, which provides a matching control for each tumor; and (4) the full multi-omics characterization of multiple tumor regions from the same patients, thus enabling the detailed investigation of tumor heterogeneity. This design resulted in the multi-layered analyses of 105 samples from 39 PCa patients, as well as of the exome of corresponding peripheral blood cells yielding a comprehensive molecular profile for each patient, and identified molecular networks that are commonly altered in multiple patients. Importantly, some of the affected genes/proteins exhibited very small individual effect sizes, suggesting that the combined network effects of multiple genes may significantly contribute to determining PCa phenotypes.
Proteogenomic analysis of the sample cohort identifies known PCa biomarkers
In this study, we analyzed 39 PCa patients (Additional file 1: Fig. S1) belonging to three groups who underwent laparoscopic robotic-assisted RP. The patients were from the PCa Outcomes Cohort (ProCOC) study [25, 26]. Tumor areas were graded using the International Society of Urological Pathology (ISUP) grade groups , which range from ISUP grade group G1 (least aggressive) to G5 (most aggressive). The more advanced grade groups G4 and G5 were considered jointly (G4/5). The cohort tested included 12 low-grade (G1), 17 intermediate-grade (G2 and G3), and 10 high-grade (G4/5) patients (Fig. 1a, Additional file 1: Fig. S1, Additional file 2: Table S1). For low-grade PCa patients, we selected two representative regions, one of benign prostatic hyperplasia (BPH) and one of malignant tumor (TA). Since PCa often presents as a multifocal disease with heterogeneous grading within each prostate specimen , we analyzed two different tumor regions from the 27 intermediate- and high-grade patients. In those cases, three representative regions, including BPH, the most aggressive tumor (TA1), and a secondary, lower-grade tumor (TA2) , were analyzed. Thus, TA1 always represented the higher-grade nodule compared to TA2. Note that whereas each patient was assigned a patient-specific overall grade (i.e., “low,” “intermediate,” or “high”), each tumor area was additionally assigned an individual grade group based on its histological appearance. According to current ISUP guidelines, the grading of the entire prostate specimen depends on the size and grade of individual nodules . Thus, it is possible that the patient grading is lower than the grading of the most aggressive nodule, if another lower-grade nodule is larger. Tumor regions contained at least 70% tumor cellularity, and the distance between the analyzed areas (TA1 versus TA2) was at least 5 mm. Altogether, we obtained 105 prostate tissue specimens (Additional file 2: Table S1). Three adjacent tissue biopsies of the dimensions 0.6 × 0.6 × 3.0 mm were punched from each representative region for exome sequencing, CNA (derived from the exome sequencing data), RNA sequencing (RNA-seq), and quantitative proteomic analysis using the PCT-SWATH technology , respectively. Proteomic analysis was performed in duplicates for each tissue sample. Peripheral blood samples from each patient were also subjected to exome sequencing and served as the genomic wild-type reference (Fig. 1). All three types of grading (i.e., patient-specific overall grading, TA1 grading, and TA2 grading) were predictive of the recurrence-free survival (RFS) in our study.
In agreement with prior reports, we observed relatively few recurrent point mutations across patients (Additional file 1: Fig. S2, Additional file 3: Table S2), but substantial CNAs (Additional file 1: Figs. S3 and S4, Additional file 4: Table S3). Mutations in SPOP, FOXA1, and MED12 reported in independent cohorts [4,5,6,7,8,9] were confirmed in this cohort. In total, 1110 genes showed copy number gains in at least five samples or copy number losses in at least five samples (see Additional file 1: Supplementary Text for details). Additional file 1: Fig. S4 shows the CNA status of signature genes representing known areas of recurrent CNAs in PCa—split into fusion-partner and non-fusion-partner genes—for instance, loss of PTEN and gain of MYC in high-grade PCa . Likewise, our data confirmed the differential expression of several transcripts/proteins that had previously been suggested as PCa biomarkers or which are known oncogenes in other tumor types (Additional file 1: Supplementary Text and Fig. S5, Additional file 5: Table S4 and Additional file 6: Table S5). We further identified somatic fusions from the RNA-seq data. A large fraction of the tumors harbored ETS family gene fusions, which are frequently detected in PCa [8, 9, 11]. ETS fusions were mutually exclusive and appeared in tumors from all grade groups (Additional file 1: Fig. S6; Additional file 5: Table S4). This consistency with previously published results confirmed the quality of our data and motivated us to go beyond previous work by performing a network-based multi-omics multi-gene analysis.
Molecular perturbations correlate with tumor grade
Mutational burden is associated with PCa risk [8, 9, 11]. Hence, as the first step towards a cross-layer analysis, we asked if high-grade PCa would generally be affected by stronger alterations (compared to low-grade PCa) at the genome, transcriptome, and proteome layers . For that purpose, we devised molecular perturbation scores that quantified the number of affected genes/proteins and the extent to which these genes/proteins were altered in the tumor specimens compared to their benign controls (see the “Methods” section for details). In the case of the DNA layer, these scores carry a similar meaning as established mutational burden scores. However, we wanted to capture the effects at all three molecular layers measured in this study. Higher-grade tumors (G3 and G4/5) exhibited significantly higher molecular perturbation scores than lower-grade tumors (G1 and G2). Those differences were statistically significant in all but one case (P value < 0.05, one-sided Wilcoxon rank sum test, Fig. 2). The CNA perturbation magnitude exhibited the highest correlation with the PCa grading, confirming prior studies documenting the tight association between CNA, histopathological grade, and risk of progression [4, 5, 31]. Further, we found that mRNA fold changes (FCs) correlated more strongly with CNAs of the same genes than protein FCs (average CNA-mRNA Spearman ρ = 0.1 and average CNA-protein Spearman ρ = 0.02). This observation is in agreement with previous work, which suggested that copy number changes are to some extent buffered at the protein level [17, 21, 32]. Interestingly, we observed that proteins known to be part of protein complexes were significantly less strongly correlated with the FCs of their coding mRNAs than proteins not known to be part of protein complexes (P value < 2.6e−11, one-sided t test, Additional file 1: Fig. S7). This result is consistent with the concept that protein complex stoichiometry contributes to the buffering of mRNA changes at the level of proteins [21, 22, 33,34,35]. Thus, molecular patterns in high-grade PCa are more strongly perturbed at all layers, and the effects of genomic variation are progressively but non-uniformly attenuated along the axis of gene expression.
Effects of distinct CNAs converge on common proteins
It has previously been suggested that mutations affecting different genes could impact common molecular networks if the respective gene products interact at the molecular level . However, previous analyses were mostly restricted to individual molecular layers. For example, it was shown that genes mutated in different patients often cluster together in molecular interaction networks . Yet, the effects of these mutations on transcript and protein levels remained unexplored in this case.
Previous work of Sinha and colleagues already suggested extensive trans-effects of CNAs on mRNA and protein levels . Thus, here, we aimed to systematically explore how different CNA events would impact the level of one common protein. In order to prioritize potentially interesting proteins for such an analysis, we focused on the 20 proteins with the largest average absolute FCs across all tumor specimens (Additional file 1: Fig. S8, Additional file 7: Table S6). Thus, these proteins represent a set of proteins that were strongly affected across most tumors independent of tumor grade. Among them was PSA (KLK3), and several other well-established PCa-associated proteins like AGR2 , MDH2 , MFAP4 , and FABP5 . RABL3 was one of the most strongly downregulated proteins, which is a surprising finding as RABL3 is known to be upregulated in other solid tumors [41, 42]. Interestingly, in most cases, these proteins were from loci that were not subject to CNAs (Additional file 1: Fig. S8, Additional file 7: Table S6), hinting that independent genomic events would impact these target proteins via network effects in trans.
Among those top targets, we selected AGR2, ACPP, POSTN, and LGALS3BP for further analysis (Fig. 3a), because these proteins/genes had correlated protein- and mRNA FCs; thus, protein level changes were likely caused by cognate mRNA level changes. To identify potential regulators for each target gene, we used the STRING gene interaction network  and selected putative effectors at most one edge away from the target genes. Further, we required that neighbors or the target itself was subject to CNAs in at least four tumor samples (the “Methods” section). By including the target itself, we account for potential CNA cis-effects. However, only POSTN passed that filter. This filtering identified 13 neighbors of ACPP, 28 neighbors of POSTN (and POSTN itself), 14 neighbors of LGALS3BP, and one neighbor for AGR2, which was not further considered. Next, we correlated CNAs of those neighbor genes with the mRNA FCs of the respective target genes (Fig. 3b). We then used the non-neighboring genes (i.e., the network complement) to generate a background distribution of CNA-target correlations specifically for each target. Here, we also only considered genes with at least four CNAs across the tumor samples. Since STRING reports predicted functional associations between genes, we expected only a minority of the neighbors to actually correlate with their putative targets. Further note that edges in STRING could represent indirect gene-gene relationships. Yet, in the case of ACPP, we found that CNA levels of its 13 neighbors were on average more strongly correlated with ACPP mRNA FCs than the complement (Fig. 3b). This observation does not preclude the possibility that also some of the POSTN and LGALS3BP neighbors impacted their mRNA levels in trans. However, the fact that ACPP neighbors were on average more strongly correlated with ACPP mRNA FCs suggested to us that multiple of its network neighbors might be involved in tumorigenic downregulation of ACPP.
ACPP, which is also known as ACP3 or PAcP, is a prostate-specific acid phosphatase with a critical role in PCa etiology and has been suggested as a PCa biomarker long before PSA . ACP3 is known to inhibit cell proliferation and is therefore typically downregulated in PCa , despite elevated ACP3 protein levels in patient’s blood . In our cohort, ACP3 mRNA levels were strongly downregulated in all of the high-grade patients and in the vast majority of low- and intermediate-grade patients, suggesting that ACP3 downregulation represents an early event during PCa evolution. Despite its established role in PCa, little is known about the oncogenic driver events downregulating ACP3 .
We speculated that CNA events affecting different ACP3 neighbors might be in operation in different tumor specimens. Thus, to further narrow the list of candidates, we devised a multi-dimensional regression approach modeling the combined CNA effects of neighbors on ACPP mRNA FCs (the “Methods” section). Five neighbors (DGUOK, APRT, GOT1L1, NKX3-1, and ENTPD4) had statistically significant effects in that multi-dimensional model (Fig. 3c). We utilized two independent PCa cohorts (TCGA  and MSKCC ) to validate the potential effects of those five genes: we computed the association between the CNAs of each significant regulator and the corresponding mRNA log-FC of ACPP in each cohort and confirmed that the signs of the effects (i.e., effect directions) were the same for all five genes in all cohorts. Among those hits was NKX3-1, which is a prostate-specific tumor suppressor gene, and loss of a single allele may predispose to prostate carcinogenesis [46, 47]. NKX3-1 is a transcription factor found to have substantial trans-effects in PCa . Consistent with its potential role as an ACP3 regulator, NKX3-1 has been found to bind within 1 kb of the transcription start site of ACP3 (; GEO GSE40269). Interestingly, the CNA signatures of the five putative regulators split into two clusters affecting two distinct sets of patients (Fig. 3c): the first one harboring joint deletions of DGUOK and APRT, the second one harboring joint deletions of NKX3-1, ENTPD4, and GOT1L1. The latter three genes are all encoded on chromosome 8, and thus, their deletion may be due to single CNA events. DGUOK and APRT, however, are encoded on different chromosomes. Importantly, these events were clonal in most cases, i.e., they were mostly common to both tumor samples of a given patient. Hence, our network analysis hints that distinct deletions in the network vicinity of ACP3 can lead to the repression of this anti-proliferative protein. Taken together, these findings suggest that tumor mechanisms in different patients converged on common protein endpoints.
Joint network effects of CNAs drive tumor progression
The analysis above identified molecular networks driving tumor alterations and thus indicated altered biochemical states that were common to most tumor specimens. To identify sub-networks that specifically distinguish high-grade from low-grade tumors, we performed a distinct network analysis: we mapped our data onto the STRING gene interaction network  and employed network propagation [49, 50] separately to the CNA, transcriptome, and proteome data for each of the tumor samples. We excluded point mutations from this analysis as their frequency was too low in our cohort. By combining published molecular interactome data with a network propagation algorithm [36, 49], we aimed to “enrich” network regions with many perturbed genes/proteins. We reasoned that the convergent consequences of genomic variants on common network regions would be indicative of specific biochemical functions that are important for the tumor biology. We therefore identified genes/proteins in network regions that showed a higher score (or a lower score) in high-grade (G4/5) relative to lower-grade (G1) tumor groups at all three levels (Fig. 4a, b; the “Methods” section). This analysis identified sub-networks consisting of over- and under-expressed genes (relative to the benign controls). We found 57 amplified genes (Additional file 7: Table S6) for which transcripts and proteins were often over-expressed in high-grade PCa (Fig. 4a) and 21 genes with copy number loss (Additional file 7: Table S6) for which transcripts and proteins were often downregulated compared to lower-grade tumors (Fig. 4b).
Among the upregulated network nodes, we observed genes modulating the stability of chromatin, such as chromatin-binding protein chromobox 1 (CBX1) ; SET domain bifurcated 1 (SETDB1) , a function linking to H3K27me3 and H3K9me3 in chromatin; and CBX3 (known as HP1-γ) . SETDB1 is an oncogene in melanoma  and has also been found to be over-expressed in PCa and cell lines . Further, we found genes involved in DNA damage repair, such as SMG7 , ATR , and PRKCZ , which had already been suggested as a biomarker prognostic for survival in PCa . Multiple actin-related proteins including ARPC1B , ARPC5 , ACTL6A , and CFL1 , which are markers for aggressive cancers, were part of the upregulated network nodes. Moreover, the upregulated genes contained proteins related to the cell cycle like BANF1 and proteins interacting with the centrosome including LAMTOR1 and RAB7A that had already been associated with PCa . Finally, several signaling molecules with known roles in PCa were upregulated, such as the transcription factor Yin Yang 1 (YY1) , the TGF-β receptor TGFBR1 , and KPNA4, which promotes metastasis through activation of NF-κB and Notch signaling . Thus, upregulated network nodes are involved in DNA/chromatin integrity and growth control.
Likewise, several of the downregulated genes had functions associated with PCa, for example, the oxidative stress-related gene MGST1, which is recurrently deleted in PCa . ALDH1A3 is a direct androgen-responsive gene, which encodes NAD-dependent aldehyde dehydrogenase . DHCR24 is involved in cholesterol biosynthesis and regulated by the androgen receptor . Polymorphisms in CYP1A1 are associated with PCa risk in several meta-analyses among different ethnicities [71,72,73].
Further, our network analysis is suggesting tumor mechanisms converging on genes that are known contributors to PCa tumor biology. For example, the PCa-associated gene SF3B2 [74, 75] was only weakly amplified in some of the high-grade tumors (average log2FC = 0.016), and mRNA levels showed similarly small changes (average log2FC = 0.024). On the other hand, the SF3B2 protein levels were consistently and more strongly upregulated across tumors (average log2FC = 0.31), especially within the high-grade tumors (Additional file 1: Fig. S9). Another example is UBE2T whose over-expression is known to be associated with PCa . Unfortunately, we could not quantify the corresponding protein levels. However, we observed a strong and consistent mRNA over-expression across several tumors (average log2FC = 0.73), even though at the DNA level, the gene was only weakly amplified (average log2FC = 0.023; Additional file 1: Fig. S9). Our findings of more heterogeneous CNAs, but more uniform mRNA and protein alterations, point to convergent evolutionary mechanisms, as we move along the axis of gene expression.
Next, we analyzed the largest connected component with genes upregulated in advanced disease in more detail (see the “Methods” section). It consists of the nine nodes EMD, BANF1, ACTL6A, YY1, RUVBL1, KANSL1, MRGBP, VPS72, and ZNHIT1 (Fig. 4a) and is referred to in the following as Network Component 1 (Additional file 7: Table S6). Proteins encoded by these seven genes are involved in chromosome organization which may induce genomic alterations and influence the outcome of multiple cancers including PCa . For example, the actin-related protein ACTL6A is a member of the SWI/SNF (BAF) chromatin remodeling complex  and a known oncogene and a prognostic biomarker for PCa . Further, ACTL6A, RUVBL1, and MRGBP are together part of the NuA4/Tip60-HAT complex, which is another chromatin remodeling complex involved in DNA repair . Likewise, KANSL1 is involved in histone post-translation modifications, while VPS72 is a member of histone- and chromatin remodeling complexes . Thus, Network Component 1 consists of genes involved in chromatin remodeling and DNA repair, many of which are known to be involved in cancers.
Several samples were characterized by a small, but consistent, DNA amplification of multiple members of Network Component 1 (Fig. 4c). Out of the 66 tumor samples, there were 30 samples—belonging to all grade groups—with a weak but remarkably consistent DNA amplification of Network Component 1 members, while the high-grade samples had stronger amplifications on average (i.e., larger effect sizes). Importantly, gene members of Network Component 1 were dispersed across eight chromosomes (Additional file 7: Table S6). The parallel DNA amplification of these genes is therefore the result of multiple independent CNA events, while the signal on any single gene alone was too weak to be significant in isolation. Further, members of Network Component 1 were consistently amplified in both tumor areas (i.e., TA1 and TA2) of six patients (H2, H4, H5, H6, H8, and H9; Fig. 4c), thus establishing them as likely clonal events. In some but not all cases, the amplifications led to a small, but consistent increase in mRNA expression of the amplified gene loci (Fig. 4c). We were able to reconcile 40 tumor samples with a significant enrichment of this network component in either the CNA or mRNA layer. Unfortunately, only three out of the nine proteins were detected in our proteomics experiments (Fig. 4c). Interestingly, patients where the DNA amplifications led to transcript over-expression were almost always high-grade patients, whereas patients where the amplification affected the gene expression to a smaller extent were low- or intermediate-grade patients (Fig. 4c). Further, we noticed that TA2 samples graded as G3 from high-grade patients carried amplifications of Network Component 1, whereas tumor areas graded as G3 from intermediate-grade patients did not have amplifications of this network component (Fig. 4c). Thus, although the tumor areas were histologically equally classified, tumor areas from high-grade patients carried a CNA signature and expression patterns reminiscent of the high-grade areas from the same patients. Therefore, within the cohort tested, the joint DNA amplification of this network component along with RNA upregulation is a signature of high-grade tumors. Curiously, the higher-grade tumor areas of those high-grades patients (TA1) carried stronger DNA amplifications than the respective lower-grade areas (TA2), which implies that the progressive amplification of Network Component 1 during tumor evolution may contribute to an increasingly aggressive phenotype. To further corroborate the clinical relevance of this network perturbation, we analyzed published datasets of three additional PCa cohorts (TCGA , MSKCC , and Aarhus ), together comprising a total of 709 patients with known clinical outcome. We found that the amplification of genes from Network Component 1 was a significant predictor of reduced RFS in the MSKCC cohort (P value = 8.8e−3, log-rank test). In the TCGA cohort, we observed the same trend although the difference in RFS was not statistically significant (P value = 0.17; Fig. 4d). Additionally, we found that over-expression of genes from Network Component 1 was a significant predictor of reduced RFS in the TCGA cohort (P value = 2.1e−4, log-rank test), which was the cohort with the largest number of patients. In the other two cohorts, we observed the same trend, although the difference in RFS was not statistically significant (P value = 0.30 and 0.093 for MSKCC and Aarhus, respectively; Fig. 4d). Thus, both CNA and RNA changes of Network Component 1 are predictive of the time to relapse in independent cohorts. To also account for covariates, we fitted Cox proportional hazards models with the age and the copy number burden as (additional) covariates. When including only the age in our model, the results showed minor changes (Fig. 4d). When including both the age and the copy number burden in our model, the effect direction of Network Component 1 remained the same but was not statistically significant anymore (P value = 0.055 for TCGA, mRNA, and P value = 0.064 for MSKCC, CNA).
In conclusion, our findings suggest that relatively weak but broad CNAs of the entire network components are associated with high-grade tumors and that the presence of some of these perturbations in lower-grade tumors may be predictive of the future development of a more aggressive phenotype.
Analysis of distinct tumor nodules defines intra-patient heterogeneity (TA1 versus TA2 comparison)
The CNA patterns (Additional file 1: Fig. S4) and the Network Component 1 analysis (Fig. 4c) suggest that different tumor areas from the same patient shared several mutations. Such common signatures are expected if different tumor nodules originate from a common clone. If this was true, we would expect mutational signatures to be more similar between different nodules from the same patient than between patients, even though mutated genes may be shared across patients. To compare the intra- and inter-patient molecular heterogeneity at the levels of CNAs, transcript, and protein FCs, we computed the Pearson correlation between tumor area 1 (TA1) and its paired tumor area 2 (TA2) for each layer and all of the 27 patients with two characterized tumor areas (25 for the mRNA, see the “Methods” section and Additional file 1: Supplementary Text). As a control, we also computed all pairwise Pearson correlations between the samples within each of the grade groups (i.e., inter-patient correlation). As expected, paired TA1 and TA2 from the same patient were on average more strongly correlated to each other compared to samples from different patients within the same grade group. This finding was consistent for all omics layers (Fig. 5a) and was more pronounced at the CNA and mRNA layers compared to the protein layer.
Next, we tested whether a high correlation at the level of CNAs also implies a high correlation at the level of mRNA and proteins. We tested this idea by “correlating the correlations,” i.e., we correlated the TA1-TA2 correlation of CNA profiles with the correlation between the mRNA and protein profiles of the same tumor areas (Fig. 5b). Indeed, a higher correlation of two tumor areas at the level of CNA correlated significantly with a higher correlation at the level of mRNA (r = 0.49, P value = 0.014). In other words, knowing how similar two tumor areas of a patient are at the CNA level supports a prediction of their similarity at the mRNA level (and conversely). Although the correlation between protein and CNA was not statistically significant, it followed the same trend (r = 0.35, P value = 0.076).
Comparing molecular similarity across omics layers allowed us to identify specific types of patients. The patients H2, H4, and M13 had highly correlated tumor areas at all three layers (upper right corner in all scatterplots of Fig. 5b). Likely, the tumor areas of these patients have a common clonal origin (Additional file 1: Fig. S3). In contrast, patients M12 and M14 had weakly correlated tumor areas at all levels (bottom left corner in all scatterplots of Fig. 5b). These tumor nodules either have independent clonal origins or they diverged at an earlier stage during tumor evolution (Additional file 1: Fig. S3) . For example, in the case of patient M12, large parts of the genome were not affected by CNAs in the benign sample as well as in TA1 and TA2. However, as shown on Additional file 1: Fig. S3, a large region was amplified in TA1, whereas the same region was deleted in TA2. This is consistent with a scenario in which TA1 and TA2 show parallel evolution. The third class of patients is exemplified by the patients M9 and M17, who showed a high correlation between their tumor areas on the CNA and mRNA levels, but not on the protein level. Yet, other patterns were apparent in patients M4, M7, and H10. They showed similar mRNA and protein patterns in the two tumor areas, but relatively uncorrelated CNAs. The results here apply to global proteome patterns and therefore hint that such convergent network effects of CNAs can be frequent. We confirmed that protein-level similarity correlated with similar histological characteristics of the tumor areas. Additional file 1: Fig. S10 shows formalin-fixed paraffin-embedded (FFPE) tissue microarray images (duplicates) from the analyzed tumor nodules (TA1 and TA2, diameter 0.6 mm), further underlining the hypothesis that ultimately protein-level alterations are responsible for common cellular phenotypes. Although we cannot fully exclude the possibility that some of these results were affected by technical noise in the data, our findings suggest that transcript alterations can frequently be buffered at the level of proteins (patients M9, M17, Additional file 1: Fig. S7) and that convergent evolutionary processes may lead to the alteration of common proteins (patients M4, M7, H10). We also note that our findings are specific to the two tumor areas available in this study and could be different if other nodules had been sampled for each of the patients. However, our findings on patients with weakly correlated tumor areas at all levels like M12 and M14 suggest that these patients might carry more than one disease .
Despite 20 years of oncological research involving genome-scale (omics) technologies, we know remarkably little about how the discovered genomic alterations affect the biochemical state of a cell and consequently the disease phenotype. In particular, little is known about how genomic alterations propagate along the axis of gene expression [17, 18]. Here, we have exploited recent technological advances in data acquisition that made it possible to characterize small samples of the same tumor specimens at the level of genomes, transcriptomes, and proteomes and advances in computational strategies towards the network-based integration of multi-omics data.
In our study, samples were generated from small, less than 1 mm diameter punches in immediate spatial proximity in the tumor and subsequently profiled at all three “omics layers” (DNA, RNA, proteome). Due to the large spatial heterogeneity of PCa [14, 24], this design—which is so far uncommon for studies profiling multiple layers from tumor specimens—was instrumental for increasing the comparability of the various omics layers and thus facilitated the analysis of molecular mechanisms. Our key findings are as follows: (1) we confirmed the importance of CNAs for PCa biology and the alteration of many known PCa-associated genes at the transcript and protein levels, (2) we revealed a generally elevated molecular alteration of high-grade tumors compared to lower-grade tumors, and (3) although our study confirmed large within- and between-patient genomic heterogeneity, (4) we detected molecular networks that were commonly altered at the mRNA and protein levels. The fact that many of those target molecules are known drivers of PCa tumorigenesis supports the notion that these proteins/transcripts are subject to convergent evolutionary mechanisms.
We integrated the three omics layers using a network-based approach as opposed to directly comparing gene perturbations (mutations) to gene products (transcripts and proteins). Using genome data only, it had previously been hypothesized that whereas the identity of specific mutated genes may differ between tumors, those mutations might still affect common molecular networks . In other words, tumor phenotypes are determined by the perturbation of molecular networks and not by the perturbation of isolated genes. Our study provides experimental evidence that such network effects are indeed propagated to subsequent molecular layers and that this effect propagation may be clinically relevant. A very prominent example is the indication derived from our data that the long-known PCa gene ACPP (ACP3) is downregulated through diverse CNA events. Of particular interest is the potential role of NKX3-1 in ACP3 downregulation. Although both genes have a well-established PCa association, their regulatory relationship had not been reported so far (to our knowledge).
Our multi-omics network analysis revealed molecular sub-networks that distinguished high-grade PCa tumors from low-grade tumors. Specifically, our analysis led to the identification of Network Component 1, a sub-network involved in chromatin remodeling and consisting of genes that were weakly amplified in intermediate-grade (G3) tumor specimens. Signals of individual gene members of this component were virtually indistinguishable from noise in our cohort. However, their consistent alterations across the network region, across molecular layers and the fact that the same genes showed enhanced signals in high-grade specimens, rendered this component highly interesting. The fact that copy number and expression changes of Network Component 1 members were predictive for survival in independent cohorts further supports the potential clinical relevance of this sub-network. Amplification of Network Component 1 was to some extent confounded with overall CNA burden (r = 0.58 (TCGA, CNA), r = 0.55 (TCGA, mRNA), r = 0.34 (MSKCC, CNA), r = 0.16 (MSKCC, mRNA)). However, the amplifications of Network Component 1 members were highly correlated and on average above the background of CNAs. Thus, the coordinated amplification of Network Component 1 does not simply mirror the overall CNA burden. Our network-based cross-omics analysis identified nine other network components (Fig. 4) successfully capturing several known and potentially new PCa-associated genes. However, neither Network Component 1 nor any of the other network components were uniformly subject to CNAs across all high-grade patients. Instead, we found different network components modified in different patients, and these sub-networks were involved in cellular processes as diverse as actin remodeling, DNA damage response, and metabolic functions, all of which are known contributors to PCa biology. This further underlines the large inter-patient variability of PCa, and it demonstrates the diversity of molecular mechanisms leading to histologically similar phenotypes. Future prediction models of PCa including the ISUP grade groups, PSA levels, and clinical stage might be improved by exploiting multi-omics network analyses. Detecting aggressive network alterations in prostate biopsies would help clinicians to advise either active surveillance or active therapy. However, the development of such multi-dimensional biomarkers would require much larger patient cohorts.
Another distinguishing feature of this study was the simultaneous profiling of two different tumor regions in 27 out of the 39 patients. The profiling of multiple tumor regions from the same prostate helped to further highlight the enormous heterogeneity of PCa within patients and provided important insights into PCa evolution. The fact that Network Component 1 was more strongly affected in the paired higher-grade nodules of high-grade patients suggests that at least certain sub-networks are subject to an evolutionary process, which progressively “moves” protein levels towards a more aggressive state. Generally, and at all molecular layers tested, the two paired tumor areas were more similar to each other compared to two samples from the same grade group but different patients, suggesting common evolutionary origins. Although the two tumor areas seemed to mostly originate from the same clone, this was not always the case. In some patients, different nodules exhibited different molecular patterns at all omics layers, suggesting early evolutionary separation. Thus, for the first time, current diagnostic, expert-level consensus guidelines  are supported by detailed proteogenomic data. Our findings support earlier claims that clonality itself might be a prognostic marker with implications for future, more tumor-specific treatment when targeted therapies become available also for PCa [16, 83].
Our study shows that all three molecular layers (genome, transcriptome, and proteome) contributed valuable information for understanding the biology of PCa. In particular, the DNA layer informed about causal events, clonality, and genomic similarity between tumors. The transcriptome was relevant for understanding the transmission of CNA effects to proteins and served as a surrogate in cases where protein levels remained undetected. The proteome was crucial for revealing protein-level buffering of CNA effects as well as for indicating convergent evolution on functional endpoints. In a routine diagnostic context though, measuring all three layers may not be feasible for the near future due to resource and time limitations. Thus, the identification of improved, routine-usable molecular markers for PCa diagnostics and prognosis remains an open problem .
This study uncovered molecular networks with considerable convergent alterations across tumor sites and patients. In particular, we identified a sub-network consisting of nine genes whose joint activity positively correlated with increasingly aggressive tumor phenotypes. The fact that this sub-network was predictive for survival in independent cohorts further supports its potential clinical relevance. At the same time though, our study also exposed a diversity of network effects: we could not identify a single sub-network that was perturbed in all high-grade tumor regions, let alone the observed distinct intra-patient alterations at all omics layers for some patients. Overall, our study has significantly expanded our understanding of PCa biology and serves as a model for future work aiming to explore the network effects of mutations with an integrated multi-omics approach.
Patients and samples
A total of 39 men with localized PCa who were scheduled for RP were selected from a cohort of 1200 patients within the ProCOC study and processed at the Department of Pathology and Molecular Pathology, University Hospital Zurich, Switzerland . Each of the selected intermediate- and high-grade patients had two different tumor nodules with different ISUP grade groups. Hematoxylin and eosin (H&E)-stained fresh frozen tissue sections of 105 selected BPH and tumor regions were evaluated by two experienced pathologists (PJW, NJR) to assign malignancy, tumor stage, and grade group according to the International Union Against Cancer (UICC) and WHO/ISUP criteria. This study was approved by the Cantonal Ethics Committee of Zurich (KEK-ZH-No. 2008-0040); the associated methods were carried out in accordance with the approved guidelines, and each patient has signed an informed consent form. Patients were followed up on a regular basis (every 3 months in the first year and at least annually thereafter) or on an individual basis depending on the disease course in the following years. The RFS was calculated with a biochemical recurrence (BCR) defined as a PSA ≥ 0.1 ng/ml. Patients were censored if lost to follow-up or event-free at their most recent clinic visit. Patients with a postoperative PSA persistence or without distinct follow-up, data for the endpoint BCR were excluded from the analysis of BCR.
Exome sequencing and somatic variant analysis
The exome sequencing (exome-seq) was performed using the Agilent Sure Select Exome platform for library construction and Illumina HiSeq 2500 for sequencing read generation. We mapped and processed the reads using a pipeline based on bowtie2  (1.1.1) and the Genome Analysis Tools Kit (GATK)  (3.2-2). We detected and reported non-synonymous variants or variants causing splicing changes using Strelka (1.0.14) and Mutect (1.1.7) combined with post-processing by the CLC Genomics Workbench (8.0.3). In this process, all tissue samples of a patient were compared to the respective blood sample.
Trimmomatic  (0.36) was used for adaptor clipping and low-quality subsequence trimming of the FASTQ files. Subsequently, single reads were aligned to the hg19 reference genome with bowtie2 with options “--very-sensitive -k 20.” We applied samtools  (0.1.19) and picard-tools (1.119) to sort the resulting bam files in coordinate order, merge different lanes, filter out all non-primary alignments, and remove PCR duplicates. The quality of the runs was checked using a combination of BEDtools  (2.21), samtools, R (3.1), and FastQC (0.11.2).
Bam files containing the mapped reads were preprocessed in the following way: indel information was used to realign individual reads using the RealignerTargetCreator and IndelRealigner option of the GATK. Mate-pair information between mates was verified and fixed using Picard tools, and single bases were recalibrated using GATK’s BaseRecalibrator. After preprocessing, variant calling was carried out by comparing benign or tumor prostate tissue samples with matched blood samples using the programs MuTect  and Strelka  independently. Somatic variants that were only detected by one of the two programs were filtered out using CLC Genomics Workbench, so were those that had an entry in the dbSNP  common database and those that represented synonymous variants without predicted effects on splicing.
CNA analysis of exome-seq data
The bam files generated during the process of somatic variant calling were processed with the CopywriteR package (v.2.2.0) for the R software . CopywriteR makes use of the so-called off-target reads, i.e., reads that cover areas outside of the exon amplicons. “Off-target” reads are produced due to inefficient enrichment strategies. In our case, on average, 28.5% of the total reads were not on target. Briefly, CopywriteR removes low-quality and anomalous read pairs, then peaks are called in the respective blood reference, and all reads in this region are discarded. After mapping the reads into bins, those peak regions, in which reads had been removed, were compensated for. Additionally, read counts are corrected based on mappability and GC content. Finally, a circular binary segmentation is carried out, and for each segment, the log count ratios between tissue samples and the respective blood sample are reported as copy number gain or loss. The copy number of each gene in each sample was reported based on the log count ratio of the respective segment in which the gene was located. The overall performance of this CNA-calling approach was evaluated by comparing the results of the TA1 (and TA) samples with CNA results obtained by applying the OncoScan Microarray pipeline to FFPE samples from the same tumors (Additional file 1: Fig. S11).
OncoScan copy number assays were carried out and analyzed as described previously . Briefly, DNA was extracted from punches of FFPE cancer tissue blocks. Locus-specific molecular inversion probes were hybridized to complementary DNA, and gaps were filled in a nucleotide-specific manner. After amplification and cleavage of the probes, the probes were hybridized to the OncoScan assay arrays. Scanning the fluorescence intensity and subsequent data processing using the Affymetrix® GeneChip® Command Console and BioDiscovery Nexus express resulted in log intensity ratio data (sample versus Affymetrix reference) and virtual segmentation of the genome into areas with copy number gain, loss, or stability.
RNA sequencing was performed at the Functional Genomics Center Zurich. RNA-seq libraries were generated using the TruSeq RNA stranded kit with PolyA enrichment (Illumina, San Diego, CA, USA). Libraries were sequenced with 2 × 126 bp paired-end on an Illumina HiSeq 2500 with an average of 105.2 mio reads per sample.
Paired-end reads were mapped to the human reference genome (GRCh37) using the STAR aligner (version 2.4.2a) . Quality control of the resulting bam files using QoRTs  and mRIN  showed strong RNA degradation  in a significant fraction of the samples: mRIN classified 31 samples as highly degraded (Additional file 1: Fig. S12, Additional file 5: Table S4). In order to correct for this 3′ bias, 3 tag counting was performed as described by Sigurgeirsson et al.  using a tag length of 1000. After 3′ bias correction, three samples still showed a clear 3′ bias: the two tumor regions (TA1 and TA2) of the patient M5 and TA2 from patient M8 (Additional file 1: Fig. S12). These samples were excluded from subsequent analyses. Additionally, the BPH region of the patient M5 was excluded due to the exclusion of both its tumor regions.
FeatureCounts  was used to determine read counts for all genes annotated in ENSEMBL v75. Genes for which no read was observed in any of the samples in the original data were excluded from the analysis. Further, after 3 tag counting, all genes with without at least 1 read per million in N of the samples were removed. We chose N to be 10 which corresponds to the size of the smallest grade group (G2). In the last reduction step, all genes with more than one transcript were excluded, yielding a final set of 14,281 genes.
Read count normalization and differential gene expression analysis was performed using the R packages sva  and DESeq2 . All benign tissues were considered biological replicates and differential gene expression for the individual tumor samples was determined against all benign tissues. Gene expression changes with an adjusted P value < 0.1 were considered significant.
RNA-seq—3′ bias correction
The 3 tag counting approach for 3′ bias correction was used on the RNA-seq dataset . This approach requires changing of the annotation file in two steps: (1) isoform filtering and (2) transcript length restriction. As proposed in  for each gene, we determined the highest expressed isoform within a set of high-quality samples. All samples with an mRIN score greater than or equal to 0.02 were considered to be of high quality. This set contains 7 benign and 15 tumor samples. Isoform expression was determined using cufflinks . As for transcript length, we chose 1000 bp.
FusionCatcher (version 0.99.5a beta) was used to determine gene fusions for all samples. Fusions classified as “probably false positive” are discarded unless they are also classified as “known fusion.”
PCT-assisted sample preparation for SWATH-MS
We first washed each tissue sample to remove O.C.T., followed by PCT-assisted tissue lysis and protein digestion, and SWATH-MS analysis, as described previously . Briefly, a series of ethanol solutions were used to wash each tissue, including 70% ethanol/30% water (30 s), water (30 s), 70% ethanol/30% water (5 min, twice), 85% ethanol/15% water (5 min, twice), and 100% ethanol (5 min, twice). Subsequently, the tissue punches were lysed in PCT-MicroTubes with PCT-MicroPestle  with 30 μl lysis buffer containing 8 M urea, 0.1 M ammonium bicarbonate, complete protease inhibitor cocktail (Roche), and PhosSTOP phosphatase inhibitor cocktail (Roche) using a barocycler (model NEP2320-45k, PressureBioSciences, South Easton, MA). The lysis was performed with 60 cycles of high pressure (45,000 psi., 50 s per cycle) and ambient pressure (14.7 psi, 10 s per cycle). The extracted proteins were then reduced and alkylated prior to lys-C and trypsin-mediated proteolysis under pressure cycling. Lys-C (Wako; enzyme-to-substrate ratio, 1:40)-mediated proteolysis was performed using 45 cycles of pressure alternation (20,000 psi for 50 s per cycle and 14.7 psi for 10 s per cycle), followed by trypsin (Promega; enzyme-to-substrate ratio, 1:20)-mediated proteolysis using the same cycling scheme for 90 cycles. The resultant peptides were cleaned using SEP-PAC C18 (Waters Corp., Milford, MA) and analyzed, after spike-in 10% iRT peptides , using SWATH-MS following the 32-fixed-size-window scheme as described previously [24, 105] using a 5600 TripleTOF mass spectrometer (Sciex) and a 1D+ Nano LC system (Eksigent, Dublin, CA). The LC gradient was formulated with buffer A (2% acetonitrile and 0.1% formic acid in HPLC water) and buffer B (2% water and 0.1% formic acid in acetonitrile) through an analytical column (75 μm × 20 cm) and a fused silica PicoTip emitter (New Objective, Woburn, MA, USA) with 3-μm 200 Å Magic C18 AQ resin (Michrom BioResources, Auburn, CA, USA). Peptide samples were separated with a linear gradient of 2 to 35% buffer B over 120 min at a flow rate of 0.3 μl min− 1. Ion accumulation time for MS1 and MS2 was set at 100 ms, leading to a total cycle time of 3.3 s.
SWATH assay query library for prostate tissue proteome
To build a comprehensive library for SWATH data analysis, we analyzed unfractionated prostate tissue digests prepared by the PCT method using Data Dependent Acquisition (DDA) mode in a tripleTOF mass spectrometer over a gradient of 2 h as described previously . We spiked iRT peptides  into each sample to enable retention time calibration among different samples. We then combined these data with the DDA files from the pan-human library project . Altogether, we analyzed 422 DDA files using X!Tandem  and OMSSA  against three protein sequence databases downloaded on October 21, 2016, from UniProt, including the SwissProt database of curated protein sequences (n = 20,160), the splicing variant database (n = 21,970), and the trembl database (n = 135,369). Using each database, we built a target-decoy protein sequence database by reversing the target protein sequences. We allowed maximal two missed cleavages for fully tryptic peptides, and 50 ppm for peptide precursor mass error, and 0.1 Da for peptide fragment mass error. Static modification included carbamidomethyl at cysteine, while variable modification included oxidation at methionine. Search results from X!Tandem and OMSSA were further analyzed through Trans-Proteomic Pipeline (TPP, version 4.6.0)  using PeptideProphet and iProphet, followed by SWATH assay library building procedures as detailed previously [105, 110]. Altogether, we identified 167,402 peptide precursors, from which we selected the proteins detected in prostate tissue samples, and built a sample-specific library. SWATH wiff files were converted into mzXML files using ProteoWizard  msconvert v.3.0.3316, and then mzML files using OpenMS  tool FileConverter. OpenSWATH  was performed using the tool OpenSWATHWorkflow with input files including the mzXML file, the TraML library file, and TraML file for iRT peptides.
Peptide quantification using OpenSWATH
To obtain consistent quantification of the SWATH files, we obtained the all annotated b and y fragments from the sp, sv, and tr libraries. About ten thousand redundant and low-quality assays were removed. Then, we extracted the chromatography of these fragments and MS1 signals using OpenSWATHWorkflow, followed by curation using DIA-expert . Briefly, the chromatography of all fragments and MS1 signals were subject to scrutiny by empirically developed expert rules. A reference sample with the best q value by pyprophet was picked up to refined fragments. The peptide precursors are further filtered based on the following criteria: (i) remove peptide precursors with a q value higher than 1.7783e−06 to achieve a false discovery rate of 0.00977 at peptide level using SWATH2stats , (ii) peptides with a FC higher than 2 between the reference sample and its technical replicate were removed, and (iii) peptides matching to multiple SwissProt protein sequences were removed. The data matrix was first quantile normalized and log2 transformed, followed by batch correction using the ComBat R package . Finally, for each protein and pair of technical replicates, the average value was computed.
All plots were produced with R. Kaplan-Meier estimators were used for RFS analysis. Differences between survival estimates were evaluated by the log-rank test.
Computation of molecular perturbation scores
On the genomic level (mutation and CNA), we kept the tumor samples (66 in total) that contain FCs with respect to the blood. The mutation matrix was further discretized by setting all non-zero events to 1. At the transcriptomics level, the FCs for the 63 tumor samples were computed as described above (see the “RNA sequencing” section). Finally, on the proteomics level, we computed the FCs for the tumor samples (66 in total) as follows: for each protein, its mean intensity over the normal samples was subtracted from the intensities of the tumor samples. (We chose to compute the FCs for the tumor samples with respect to a global reference (average of all normal samples) and not with respect to their paired benign sample in order to achieve a higher consistency with the transcriptomics level.)
We assigned to each sample two molecular perturbation scores summarizing/quantifying the magnitude of its FCs: DE_count counts the number of mutated/differentially expressed (DE) genes, while the DE_sum score is the sum of absolute FCs of all genes. Thus, while the first score counts the number of events (mutations/DE genes), the second one quantifies their magnitude. These two scores can be regarded as generalizations of the term “mutational burden” for the mRNA and protein layer. A gene is regarded as mutated/DE if its value is 1 in the mutation layer and if its absolute value is above a threshold that has been set to 1 for the mRNA and protein layer. For the CNA layer, the corresponding threshold was set to 0.5 because the range of FCs in the CNA matrix is smaller than the mRNA and protein matrices. Both types of scores were computed for each molecular level, except for the point mutations where only DE_count was computed. Afterwards, the 66 DE_count scores (63 for the mRNA) and the DE_sum scores at each layer were divided into the four grade groups G1, G2, G3, and G4/5 respectively.
Correlating CNAs with mRNA and protein layer
For each of the 2120 genes measured in all three layers (CNA, mRNA, and protein), we computed the Spearman correlation between its CNAs and corresponding mRNA FCs as well as between its CNAs and corresponding protein FCs. We reduced each layer to the 63 tumor samples with available mRNA data.
As a network, the STRING gene interaction network (version 10)  was used, after removing all edges with a combined score smaller or equal to 0.9 and keeping subsequently the largest connected component. The resulting network consisted of 10,729 nodes and 118,647 (high-confidence) edges. For the network smoothing, the weight matrix was computed as described in Vanunu et al. , but for an unweighted graph and the propagation, the parameter was set to 0.5. The propagation was iteratively repeated 500 times to ensure convergence of the results. For the mapping from gene symbols to STRING identifiers (Additional file 7: Table S6), we used the R/Bioconductor package STRINGdb . The gene symbols with no matching STRING identifier were removed, while for those that mapped to multiple STRING identifiers, the first mapping was kept (default choice in the package). From the multiple gene symbols that mapped to the same STRING identifier, the first mapping was kept. The genes that were not present in the network were removed from the datasets, while those that were present in the network but not in the corresponding dataset were initially filled in with 0’s.
Genes with very small, “smoothed” (absolute) FCs were filtered out as follows: after the network propagation, only network nodes that had protein measurements themselves or at least one direct neighbor (on the filtered STRING network) with protein measurements were considered in the next steps of this analysis, i.e., network nodes without measured FCs at the protein layer that had no direct neighbor with measured protein values were removed from the subsequent analyses.
For significance testing, the one-sided Wilcoxon rank sum test comparing the smoothed FCs between the groups G4/5 (consisting of 12 samples for the CNA, mRNA, and protein layer) and G1 (consisting of 26 samples for the CNA and proteins and 25 for the mRNA) was applied to each network node (after filtering) and layer, once for upregulation and once for downregulation. The resulting sub-networks (upregulated and downregulated) consisted of those genes that were significant (P value below 0.05) at all three layers and all of the edges connecting them on the filtered STRING network.
It should be noted that although measurements from the same patient might not be statistically independent, we have kept them in our analyses firstly in order to increase statistical power and secondly because not all of them correspond to clonal events as shown in Fig. 5. To make sure though that having two samples for some of the patients has not affected our conclusions, we have repeated the statistical testing step (one-sided Wilcoxon rank sum test) in two ways: comparing G4/5 with G1 as before but removing the second tumor area of a patient if it belonged to the same grade group as tumor area 1 (i.e., removing TA2 of patients H3 and H10), and secondly comparing G4/5 with the combined (G1 and G2) group and once again removing the second tumor area of a patient if it belonged to the same grade group as his tumor area 1. P values resulting from these analyses were highly correlated (Additional file 1: Fig. S9), and we would thus consider the current conclusions to be robust.
Network Component 1 analysis
For each tumor sample at the CNA layer, a one-sided, one-sample t test has been applied testing if its average FC over the genes of the Network Component 1 (and in particular those that have been measured at the CNA) is significantly greater than 0. Due to the presence of outliers in some samples, the non-parametric, one-sided Wilcoxon signed-rank test has been applied as well yielding very similar results (data not shown). A result is considered to be significant if the corresponding P value is below 0.05. The analysis has been repeated for the mRNA and protein layer.
Independent cohorts validation
For the validation of Network Component 1, we used published datasets of three PCa cohorts: TCGA, MSKCC, and Aarhus. For TCGA and MSKCC, we downloaded the CNA, mRNA with precomputed z-scores per gene, and corresponding clinical data from cBioPortal  (https://www.cbioportal.org/). There were 489 samples with log2CNA data and 493 samples with mRNA profiles in TCGA. In MSKCC, there were 157 primary tumors with CNA data and 131 primary tumors with mRNA data. The clinical endpoint used in TCGA was the progression-free survival time and the disease-free survival in MSKCC. All previous samples had known survival time.
For the Aarhus study (NCBI GEO dataset GSE46602), we downloaded the mRNA matrix and corresponding clinical information as described in Ycart et al. . The resulting mRNA matrix consisted of 20,186 genes and 50 samples—36 PCa samples with known RFS time and 14 benign samples. Once excluding the benign samples, we computed z-scores per gene in order to have comparable values with the other two studies. These 36 PCa samples were also considered in the subsequent survival analysis. CNA data was not available for the Aarhus study.
We reduced all datasets to the nine genes of Network Component 1. In each of the datasets, we computed for each sample an average copy number change (CNA) or an average z-score (mRNA) across the nine genes of Network Component 1 (combined risk score). Subsequently, we used these combined risk scores to split the samples of each dataset into two groups: samples with a combined risk score larger or equal to the median combined risk score of the study were considered as “altered” and the rest as “unaltered.” Kaplan-Meier curves were generated for the two groups. Due to the high level of discretized values in MSKCC at the CNA layer, a sample is considered to be “altered” in that dataset if its combined risk score is above zero.
Additionally, we fitted for each dataset a Cox proportional hazards model to predict survival time using as input variables the average copy number change (CNA) or average z-score (mRNA) of Network Component 1 (variable of interest) and the age (when available, i.e., for TCGA and Aarhus). For each dataset with available copy number information (i.e., for the TCGA and MSKCC studies), we fitted a second Cox proportional hazards model with the fraction of genome altered as an additional input variable. For the model fitting, we used the R package survival (https://cran.r-project.org/web/packages/survival/index.html).
Analysis of regulators and target genes
For this analysis, we used once again the STRING gene interaction network. For each target gene (AGR2, ACPP, POSTN, LGALS3BP), we split the network nodes into two groups as follows: firstly, we identified the neighbors of the target gene supported by a combined evidence larger than 0.2. This set together with the target gene constituted group 1 while the remaining network nodes constituted group 2. For this splitting, only genes present in the network with copy number measurements and with a matching STRING identifier (Additional file 7: Table S6) were considered (i.e., 17,306 genes in total). Subsequently, genes altered (i.e., with log2 copy number ratio greater than 0.5 in absolute) in fewer than four tumor samples across the 66 tumor samples were filtered out in each of the two groups. Genes in group 1 after the CNA filtering are potential regulators of the target under consideration. For each gene in the two groups after the filtering, we computed the Spearman correlation between its CNAs and the mRNA FCs of the target gene. For computing the correlation, the samples were reduced to the 63 tumor samples with available mRNA data.
Subsequently, we fitted an elastic net model with alpha = 0.5 for ACPP. We used as output variable the mRNA FC of ACPP and as input variables the CNAs of the genes in group 1 after the copy number event filtering. The value for the regularization parameter lambda was chosen through 10-fold cross-validation (default in the R package glmnet (https://cran.r-project.org/web/packages/glmnet/)). The samples were necessarily reduced to the 63 mRNA tumor samples. Predictors/regulators with a non-zero beta coefficient were deemed significant. We have used the elastic net model with alpha = 0.5 because it is a method giving sparse solutions and can deal with correlated predictors at the same time.
As an additional validation to our approach, we used the two independent PCa cohorts described above (TCGA and MSKCC) and reduced the samples to those having both CNA and mRNA profile. This resulted in 488 samples for TCGA and 109 samples for MSKCC. Next, for each of the significant regulators/predictors, we computed the Spearman correlation between its CNAs and the corresponding mRNA z-scores of ACPP in each of the two independent studies and checked if the sign of the Spearman correlation matched the sign of the Spearman correlation computed for our cohort, i.e., there was an agreement regarding the direction of the association.
Availability of data and materials
Exome and RNA sequencing data were submitted to the Sequence Read Archive (SRA) at NCBI under accession numbers PRJNA577801 (exome-seq)  and PRJNA579899 (RNA-seq) , respectively. The SWATH proteomics data were deposited in PRIDE. The project accession code is PXD004589 . The published datasets of the two PCa cohorts (TCGA and MSKCC) analyzed during the current study can be downloaded from cBioPortal [122, 123] while the third (Aarhus) is available at the NCBI GEO repository under the accession number GSE46602 . Finally, we downloaded the STRING gene interaction network from the STRING database .
Ferlay J, Soerjomataram I, Dikshit R, Eser S, Mathers C, Rebelo M, Parkin DM, Forman D, Bray F. Cancer incidence and mortality worldwide: sources, methods and major patterns in GLOBOCAN 2012. Int J Cancer. 2015;136:E359–86.
Epstein JI, Zelefsky MJ, Sjoberg DD, Nelson JB, Egevad L, Magi-Galluzzi C, Vickers AJ, Parwani AV, Reuter VE, Fine SW, et al. A contemporary prostate cancer grading system: a validated alternative to the Gleason Score. Eur Urol. 2016;69:428–35.
Gordetsky J, Epstein J. Grading of prostatic adenocarcinoma: current state and prognostic implications. Diagn Pathol. 2016;11:25.
Berger MF, Lawrence MS, Demichelis F, Drier Y, Cibulskis K, Sivachenko AY, Sboner A, Esgueva R, Pflueger D, Sougnez C, et al. The genomic complexity of primary human prostate cancer. Nature. 2011;470:214–20.
Barbieri CE, Baca SC, Lawrence MS, Demichelis F, Blattner M, Theurillat JP, White TA, Stojanov P, Van Allen E, Stransky N, et al. Exome sequencing identifies recurrent SPOP, FOXA1 and MED12 mutations in prostate cancer. Nat Genet. 2012;44:685–9.
Robinson D, Van Allen EM, Wu YM, Schultz N, Lonigro RJ, Mosquera JM, Montgomery B, Taplin ME, Pritchard CC, Attard G, et al. Integrative clinical genomics of advanced prostate cancer. Cell. 2015;161:1215–28.
Baca SC, Prandi D, Lawrence MS, Mosquera JM, Romanel A, Drier Y, Park K, Kitabayashi N, MacDonald TY, Ghandi M, et al. Punctuated evolution of prostate cancer genomes. Cell. 2013;153:666–77.
Cancer Genome Atlas Research N. The molecular taxonomy of primary prostate cancer. Cell. 2015;163:1011–25.
Wedge DC, Gundem G, Mitchell T, Woodcock DJ, Martincorena I, Ghori M, Zamora J, Butler A, Whitaker H, Kote-Jarai Z, et al. Sequencing of prostate cancers identifies new cancer genes, routes of progression and drug targets. Nat Genet. 2018;50:682–92.
Tomlins SA, Rhodes DR, Perner S, Dhanasekaran SM, Mehra R, Sun XW, Varambally S, Cao X, Tchinda J, Kuefer R, et al. Recurrent fusion of TMPRSS2 and ETS transcription factor genes in prostate cancer. Science. 2005;310:644–8.
Grasso CS, Wu YM, Robinson DR, Cao X, Dhanasekaran SM, Khan AP, Quist MJ, Jing X, Lonigro RJ, Brenner JC, et al. The mutational landscape of lethal castration-resistant prostate cancer. Nature. 2012;487:239–43.
Boutros PC, Fraser M, Harding NJ, de Borja R, Trudel D, Lalonde E, Meng A, Hennings-Yeomans PH, McPherson A, Sabelnykova VY, et al. Spatial genomic heterogeneity within localized, multifocal prostate cancer. Nat Genet. 2015;47:736–45.
Cooper CS, Eeles R, Wedge DC, Van Loo P, Gundem G, Alexandrov LB, Kremeyer B, Butler A, Lynch AG, Camacho N, et al. Analysis of the genetic phylogeny of multifocal prostate cancer identifies multiple independent clonal expansions in neoplastic and morphologically normal prostate tissue. Nat Genet. 2015;47:367–72.
Lovf M, Zhao S, Axcrona U, Johannessen B, Bakken AC, Carm KT, Hoff AM, Myklebost O, Meza-Zepeda LA, Lie AK, et al. Multifocal primary prostate cancer exhibits high degree of genomic heterogeneity. Eur Urol. 2019;75:498–505.
Lindberg J, Klevebring D, Liu W, Neiman M, Xu J, Wiklund P, Wiklund F, Mills IG, Egevad L, Gronberg H. Exome sequencing of prostate cancer supports the hypothesis of independent tumour origins. Eur Urol. 2013;63:347–53.
Espiritu SMG, Liu LY, Rubanova Y, Bhandari V, Holgersen EM, Szyca LM, Fox NS, Chua MLK, Yamaguchi TN, Heisler LE, et al. The evolutionary landscape of localized prostate cancers drives clinical aggression. Cell. 2018;173:1003–13 e1015.
Sinha A, Huang V, Livingstone J, Wang J, Fox NS, Kurganovs N, Ignatchenko V, Fritsch K, Donmez N, Heisler LE, et al. The proteogenomic landscape of curable prostate cancer. Cancer Cell. 2019;35:414–27 e416.
Latonen L, Afyounian E, Jylha A, Nattinen J, Aapola U, Annala M, Kivinummi KK, Tammela TTL, Beuerman RW, Uusitalo H, et al. Integrative proteomics in prostate cancer uncovers robustness against genomic and transcriptomic aberrations during disease progression. Nat Commun. 2018;9:1176.
Iglesias-Gato D, Wikstrom P, Tyanova S, Lavallee C, Thysell E, Carlsson J, Hagglof C, Cox J, Andren O, Stattin P, et al. The proteome of primary prostate cancer. Eur Urol. 2016;69:942–52.
Attard G, Parker C, Eeles RA, Schroder F, Tomlins SA, Tannock I, Drake CG, de Bono JS. Prostate cancer. Lancet. 2016;387:70–82.
Goncalves E, Fragoulis A, Garcia-Alonso L, Cramer T, Saez-Rodriguez J, Beltrao P. Widespread post-transcriptional attenuation of genomic copy-number variation in cancer. Cell Syst. 2017;5:386–98 e384.
Liu Y, Beyer A, Aebersold R. On the dependency of cellular protein levels on mRNA abundance. Cell. 2016;165:535–50.
Guo T, Kouvonen P, Koh CC, Gillet LC, Wolski WE, Rost HL, Rosenberger G, Collins BC, Blum LC, Gillessen S, et al. Rapid mass spectrometric conversion of tissue biopsy samples into permanent quantitative digital proteome maps. Nat Med. 2015;21:407–13.
Guo T, Li L, Zhong Q, Rupp NJ, Charmpi K, Wong CE, Wagner U, Rueschoff JH, Jochum W, Fankhauser CD, et al. Multi-region proteome analysis quantifies spatial heterogeneity of prostate tissue biomarkers. Life Sci Alliance. 2018;1:e201800042.
Umbehr M, Kessler TM, Sulser T, Kristiansen G, Probst N, Steurer J, Bachmann LM. ProCOC: the prostate cancer outcomes cohort study. BMC Urol. 2008;8:9.
Wettstein MS, Saba K, Umbehr MH, Murtola TJ, Fankhauser CD, Adank JP, Hofmann M, Sulser T, Hermanns T, Moch H, et al. Prognostic role of preoperative serum lipid levels in patients undergoing radical prostatectomy for clinically localized prostate cancer. Prostate. 2017;77:549–56.
Epstein JI, Egevad L, Amin MB, Delahunt B, Srigley JR, Humphrey PA, Grading C. The 2014 International Society of Urological Pathology (ISUP) Consensus Conference on Gleason Grading of Prostatic Carcinoma: definition of grading patterns and proposal for a new grading system. Am J Surg Pathol. 2016;40:244–52.
Epstein JI, Amin MB, Reuter VE, Humphrey PA. Contemporary Gleason grading of prostatic carcinoma: an update with discussion on practical issues to implement the 2014 International Society of Urological Pathology (ISUP) Consensus Conference on Gleason Grading of Prostatic Carcinoma. Am J Surg Pathol. 2017;41:e1–7.
DeMarzo AM, Nelson WG, Isaacs WB, Epstein JI. Pathological and molecular aspects of prostate cancer. Lancet. 2003;361:955–64.
Reznik E, Sander C. Extensive decoupling of metabolic genes in cancer. PLoS Comput Biol. 2015;11:e1004176.
Taylor BS, Schultz N, Hieronymus H, Gopalan A, Xiao Y, Carver BS, Arora VK, Kaushik P, Cerami E, Reva B, et al. Integrative genomic profiling of human prostate cancer. Cancer Cell. 2010;18:11–22.
Zhang B, Wang J, Wang X, Zhu J, Liu Q, Shi Z, Chambers MC, Zimmerman LJ, Shaddox KF, Kim S, et al. Proteogenomic characterization of human colon and rectal cancer. Nature. 2014;513:382–7.
Geiger T, Cox J, Mann M. Proteomic changes resulting from gene copy number variations in cancer cells. PLoS Genet. 2010;6:e1001090.
Juschke C, Dohnal I, Pichler P, Harzer H, Swart R, Ammerer G, Mechtler K, Knoblich JA. Transcriptome and proteome quantification of a tumor model provides novel insights into post-transcriptional gene regulation. Genome Biol. 2013;14:r133.
Stingele S, Stoehr G, Peplowska K, Cox J, Mann M, Storchova Z. Global analysis of genome, transcriptome and proteome reveals the response to aneuploidy in human cells. Mol Syst Biol. 2012;8:608.
Hofree M, Shen JP, Carter H, Gross A, Ideker T. Network-based stratification of tumor mutations. Nat Methods. 2013;10:1108–15.
Zhang JS, Gong A, Cheville JC, Smith DI, Young CY. AGR2, an androgen-inducible secretory protein overexpressed in prostate cancer. Genes Chromosomes Cancer. 2005;43:249–59.
Liu Q, Harvey CT, Geng H, Xue C, Chen V, Beer TM, Qian DZ. Malate dehydrogenase 2 confers docetaxel resistance via regulations of JNK signaling and oxidative metabolism. Prostate. 2013;73:1028–37.
Yang J, Song H, Chen L, Cao K, Zhang Y, Li Y, Hao X. Integrated analysis of microfibrillar-associated proteins reveals MFAP4 as a novel biomarker in human cancers. Epigenomics. 2019;11:1635–51.
Senga S, Kobayashi N, Kawaguchi K, Ando A, Fujii H. Fatty acid-binding protein 5 (FABP5) promotes lipolysis of lipid droplets, de novo fatty acid (FA) synthesis and activation of nuclear factor-kappa B (NF-kappaB) signaling in cancer cells. Biochim Biophys Acta Mol Cell Biol Lipids. 1863;2018:1057–67.
Pan Y, Liu Z, Feng Z, Hui D, Huang X, Tong D, Jin Y. The overexpression of Rabl3 is associated with pathogenesis and clinicopathologic variables in hepatocellular carcinoma. Tumour Biol. 2017;39:1010428317696230.
Zhang W, Sun J, Luo J. High expression of Rab-like 3 (Rabl3) is associated with poor survival of patients with non-small cell lung cancer via repression of MAPK8/9/10-mediated autophagy. Med Sci Monit. 2016;22:1582–8.
Szklarczyk D, Franceschini A, Wyder S, Forslund K, Heller D, Huerta-Cepas J, Simonovic M, Roth A, Santos A, Tsafou KP, et al. STRING v10: protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res. 2015;43:D447–52.
Muniyan S, Chaturvedi NK, Dwyer JG, Lagrange CA, Chaney WG, Lin MF. Human prostatic acid phosphatase: structure, function and regulation. Int J Mol Sci. 2013;14:10438–64.
Araujo CL, Quintero IB, Ovaska K, Herrala AM, Hautaniemi S, Vihko PT. Transmembrane prostatic acid phosphatase (TMPAP) delays cells in G1 phase of the cell cycle. Prostate. 2016;76:151–62.
Bhatia-Gaur R, Donjacour AA, Sciavolino PJ, Kim M, Desai N, Young P, Norton CR, Gridley T, Cardiff RD, Cunha GR, et al. Roles for Nkx3.1 in prostate development and cancer. Genes Dev. 1999;13:966–77.
Gurel B, Ali TZ, Montgomery EA, Begum S, Hicks J, Goggins M, Eberhart CG, Clark DP, Bieberich CJ, Epstein JI, De Marzo AM. NKX3.1 as a marker of prostatic origin in metastatic tumors. Am J Surg Pathol. 2010;34:1097–105.
Oki S, Ohta T, Shioi G, Hatanaka H, Ogasawara O, Okuda Y, Kawaji H, Nakaki R, Sese J, Meno C. ChIP-Atlas: a data-mining suite powered by full integration of public ChIP-seq data. EMBO Rep. 2018;19:e46255.
Vanunu O, Magger O, Ruppin E, Shlomi T, Sharan R. Associating genes and protein complexes with disease via network propagation. PLoS Comput Biol. 2010;6:e1000641.
Zhou D, Bousquet O, Lal TN, Weston J, Schölkopf B. Learning with local and global consistency. Adv Neural Inf Proces Syst. 2004;16:312–28.
Baude A, Aaes TL, Zhai B, Al-Nakouzi N, Oo HZ, Daugaard M, Rohde M, Jaattela M. Hepatoma-derived growth factor-related protein 2 promotes DNA repair by homologous recombination. Nucleic Acids Res. 2016;44:2214–26.
Guler GD, Tindell CA, Pitti R, Wilson C, Nichols K, Kaiwai CT, Kim HJ, Wongchenko M, Yan Y, Haley B. Repression of stress-induced LINE-1 expression protects cancer cell subpopulations from lethal drug exposure. Cancer Cell. 2017;32:221.
Slezak J, Truong M, Huang W, Jarrard D. HP1γ expression is elevated in prostate cancer and is superior to Gleason Score as a predictor of biochemical recurrence after radical prostatectomy. BMC Cancer. 2013;13:148.
Ceol CJ, Yariv H, Judit JV, Steve B, Orlando DA, Valentine B, Lauriane F, Lin WM, Hollmann TJ, Fabrizio F. The SETDB1 histone methyltransferase is recurrently amplified in and accelerates melanoma. Nature. 2011;471:513–7.
Sun Y, Wei M, Ren SC, Chen R, Xu WD, Wang FB, Lu J, Shen J, Yu YW, Hou JG, et al. Histone methyltransferase SETDB1 is required for prostate cancer cell proliferation, migration and invasion. Asian J Androl. 2014;16:319–24.
Luo H, Cowen L, Yu G, Jiang W, Tang Y. SMG7 is a critical regulator of p53 stability and function in DNA damage stress response. Cell Discovery. 2016;2:15042.
Karanika S, Karantanos T, Li L, Wang J, Park S, Yang G, Zuo X, Song JH, Maity SN, Manyam GC. Targeting DNA damage response in prostate cancer by inhibiting androgen receptor-CDC6-ATR-Chk1 signaling. Cell Rep. 2017;18:1970.
Diouf B, Cheng Q, Krynetskaia NF, Yang W, Cheok M, Pei D, Fan Y, Cheng C, Krynetskiy EY, Geng H. Somatic deletions of genes regulating MSH2 protein stability cause DNA mismatch repair deficiency and drug resistance in human leukemia cells. Nat Med. 2011;17:1298–303.
Yao S, Bee A, Brewer D, Dodson A, Beesley C, Ke Y, Ambroisine L, Fisher G, Møller H, Dickinson T. PRKC-ζ expression promotes the aggressive phenotype of human prostate cancer cells and is a novel target for therapeutic intervention. Genes Cancer. 2010;1:444–64.
Zhou W, Jiang Y, Zhu M, Hang D, Chen J, Zhou J, Dai J, Ma H, Hu Z, Jin G, et al. Low-frequency nonsynonymous variants in FKBPL and ARPC1B genes are associated with breast cancer risk in Chinese women. Mol Carcinog. 2017;56:774–80.
De Robertis M, Mazza T, Fusilli C, Loiacono L, Poeta ML, Sanchez M, Massi E, Lamorte G, Diodoro MG, Pescarmona E, et al. EphB2 stem-related and EphA2 progression-related miRNA-based networks in progressive stages of CRC evolution: clinical significance and potential miRNA drivers. Mol Cancer. 2018;17:169.
Zeng Z, Yang H, Xiao S. ACTL6A expression promotes invasion, metastasis and epithelial mesenchymal transition of colon cancer. BMC Cancer. 2018;18:1020.
Castro MA, Dal-Pizzol F, Zdanov S, Soares M, Muller CB, Lopes FM, Zanotto-Filho A, da Cruz FM, Moreira JC, Shacter E, Klamt F. CFL1 expression levels as a prognostic and drug resistance marker in nonsmall cell lung cancer. Cancer. 2010;116:3645–55.
Overbye A, Skotland T, Koehler CJ, Thiede B, Seierstad T, Berge V, Sandvig K, Llorente A. Identification of prostate cancer biomarkers in urinary exosomes. Oncotarget. 2015;6:30357–76.
Deng Z, Wan M, Cao P, Rao A, Cramer SD, Sui G. Yin Yang 1 regulates the transcriptional activity of androgen receptor. Oncogene. 2009;28:3746–57.
Fournier PG, Juarez P, Jiang G, Clines GA, Niewolna M, Kim HS, Walton HW, Peng XH, Liu Y, Mohammad KS, et al. The TGF-beta signaling regulator PMEPA1 suppresses prostate cancer metastases to bone. Cancer Cell. 2015;27:809–21.
Yang J, Lu C, Wei J, Guo Y, Liu W, Luo L, Fisch G, Li X. Inhibition of KPNA4 attenuates prostate cancer metastasis. Oncogene. 2017;36:2868–78.
Liu W, Xie CC, Zhu Y, Li T, Sun J, Cheng Y, Ewing CM, Dalrymple S, Turner AR, Sun J, et al. Homozygous deletions and recurrent amplifications implicate new genes involved in prostate cancer. Neoplasia. 2008;10:897–907.
Trasino SE, Harrison EH, Wang TT. Androgen regulation of aldehyde dehydrogenase 1A3 (ALDH1A3) in the androgen-responsive human prostate cancer cell line LNCaP. Exp Biol Med (Maywood). 2007;232:762–71.
Bonaccorsi L, Luciani P, Nesi G, Mannucci E, Deledda C, Dichiara F, Paglierani M, Rosati F, Masieri L, Serni S, et al. Androgen receptor regulation of the seladin-1/DHCR24 gene: altered expression in prostate cancer. Lab Investig. 2008;88:1049–56.
Li H, Xiao D, Hu L, He T. Association of CYP1A1 polymorphisms with prostate cancer risk: an updated meta-analysis. Mol Biol Rep. 2012;39:10273–84.
Aktas D, Hascicek M, Sozen S, Ozen H, Tuncbilek E. CYP1A1 and GSTM1 polymorphic genotypes in patients with prostate cancer in a Turkish population. Cancer Genet Cytogenet. 2004;154:81–5.
Acevedo C, Opazo JL, Huidobro C, Cabezas J, Iturrieta J, Quinones Sepulveda L. Positive correlation between single or combined genotypes of CYP1A1 and GSTM1 in relation to prostate cancer in Chilean people. Prostate. 2003;57:111–7.
Cao HM, Wan Z, Wu Y, Wang HY, Guan C. Development and internal validation of a novel model and markers to identify the candidates for lymph node metastasis in patients with prostate cancer. Medicine (Baltimore). 2019;98:e16534.
Kawamura N, Nimura K, Saga K, Ishibashi A, Kitamura K, Nagano H, Yoshikawa Y, Ishida K, Nonomura N, Arisawa M, et al. SF3B2-mediated RNA splicing drives human prostate cancer progression. Cancer Res. 2019;79:5204–17.
Wen M, Kwon Y, Wang Y, Mao JH, Wei G. Elevated expression of UBE2T exhibits oncogenic properties in human prostate cancer. Oncotarget. 2015;6:25226–39.
Kleppe A, Albregtsen F, Vlatkovic L, Pradhan M, Nielsen B, Hveem TS, Askautrud HA, Kristensen GB, Nesbakken A, Trovik J, et al. Chromatin organisation and cancer prognosis: a pan-cancer study. Lancet Oncol. 2018;19:356–69.
Lee RS, Roberts CW. Linking the SWI/SNF complex to prostate cancer. Nat Genet. 2013;45:1268–9.
Chuu C-P, Lin C-Y, Huang S-H, Tsai KK-C. MO2-10-6ACTL6A is a novel oncogene and prognostic biomarker for prostate cancer. Ann Oncol. 2019;30:vi106.
Doyon Y, Selleck W, Lane WS, Tan S, Cote J. Structural and functional conservation of the NuA4 histone acetyltransferase complex from yeast to humans. Mol Cell Biol. 2004;24:1884–96.
Cai Y, Jin J, Florens L, Swanson SK, Kusch T, Li B, Workman JL, Washburn MP, Conaway RC, Conaway JW. The mammalian YL1 protein is a shared subunit of the TRRAP/TIP60 histone acetyltransferase and SRCAP complexes. J Biol Chem. 2005;280:13665–70.
Mortensen MM, Hoyer S, Lynnerup AS, Orntoft TF, Sorensen KD, Borre M, Dyrskjot L. Expression profiling of prostate cancer tissue delineates genes associated with recurrence after prostatectomy. Sci Rep. 2015;5:16018.
Hussain M, Mateo J, Fizazi K, Saad F, Shore ND, Sandhu S, Chi KN, Sartor O, Agarwal N, Olmos D, et al. LBA12_PRPROfound: phase III study of olaparib versus enzalutamide or abiraterone for metastatic castration-resistant prostate cancer (mCRPC) with homologous recombination repair (HRR) gene alterations. Ann Oncol. 2019;30:v881-v882.
Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods. 2012;9:357–9.
McKenna A, Hanna M, Banks E, Sivachenko A, Cibulskis K, Kernytsky A, Garimella K, Altshuler D, Gabriel S, Daly M, DePristo MA. The Genome Analysis Toolkit: a MapReduce framework for analyzing next-generation DNA sequencing data. Genome Res. 2010;20:1297–303.
Bolger AM, Lohse M, Usadel B. Trimmomatic: a flexible trimmer for Illumina sequence data. Bioinformatics. 2014;30:2114–20.
Li H, Handsaker B, Wysoker A, Fennell T, Ruan J, Homer N, Marth G, Abecasis G, Durbin R, Genome Project Data Processing S. The sequence alignment/map format and SAMtools. Bioinformatics. 2009;25:2078–9.
Quinlan AR, Hall IM. BEDTools: a flexible suite of utilities for comparing genomic features. Bioinformatics. 2010;26:841–2.
Cibulskis K, Lawrence MS, Carter SL, Sivachenko A, Jaffe D, Sougnez C, Gabriel S, Meyerson M, Lander ES, Getz G. Sensitive detection of somatic point mutations in impure and heterogeneous cancer samples. Nat Biotechnol. 2013;31:213–9.
Saunders CT, Wong WSW, Swamy S, Becq J, Murray LJ, Cheetham RK. Strelka: accurate somatic small-variant calling from sequenced tumor-normal sample pairs. Bioinformatics. 2012;28:1811–7.
Sherry ST, Ward MH, Kholodov M, Baker J, Phan L, Smigielski EM, Sirotkin K. dbSNP: the NCBI database of genetic variation. Nucleic Acids Res. 2001;29:308–11.
Kuilman T, Velds A, Kemper K, Ranzani M, Bombardelli L, Hoogstraat M, Nevedomskaya E, Xu G, de Ruiter J, Lolkema MP, et al. CopywriteR: DNA copy number detection from off-target sequence data. Genome Biol. 2015;16:49.
Noske A, Brandt S, Valtcheva N, Wagner U, Zhong Q, Bellini E, Fink D, Obermann EC, Moch H, Wild PJ. Detection of CCNE1/URI (19q12) amplification by in situ hybridisation is common in high grade and type II endometrial cancer. Oncotarget. 2017;8:14794–805.
Dobin A, Davis CA, Schlesinger F, Drenkow J, Zaleski C, Jha S, Batut P, Chaisson M, Gingeras TR. STAR: ultrafast universal RNA-seq aligner. Bioinformatics. 2013;29:15–21.
Hartley SW, Mullikin JC. QoRTs: a comprehensive toolset for quality control and data processing of RNA-Seq experiments. BMC Bioinformatics. 2015;16:224.
Feng H, Zhang X, Zhang C. mRIN for direct assessment of genome-wide and gene-specific mRNA integrity from large-scale RNA-sequencing data. Nat Commun. 2015;6:7816.
Shao W, Guo T, Toussaint NC, Xue P, Wagner U, Li L, Charmpi K, Zhu Y, Wu J, Buljan M, et al. Comparative analysis of mRNA and protein degradation in prostate tissues indicates high stability of proteins. Nat Commun. 2019;10:2524.
Sigurgeirsson B, Emanuelsson O, Lundeberg J. Sequencing degraded RNA addressed by 3′ tag counting. PloS One. 2014;9:e91851.
Liao Y, Smyth GK, Shi W. featureCounts: an efficient general purpose program for assigning sequence reads to genomic features. Bioinformatics. 2014;30:923–30.
Leek JT, Johnson WE, Parker HS, Jaffe AE, Storey JD. The sva package for removing batch effects and other unwanted variation in high-throughput experiments. Bioinformatics. 2012;28:882–3.
Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15:550.
Trapnell C, Williams BA, Pertea G, Mortazavi A, Kwan G, van Baren MJ, Salzberg SL, Wold BJ, Pachter L. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat Biotechnol. 2010;28:511–5.
Shao S, Guo T, Gross V, Lazarev A, Koh CC, Gillessen S, Joerger M, Jochum W, Aebersold R. Reproducible tissue homogenization and protein extraction for quantitative proteomics using MicroPestle-assisted pressure-cycling technology. J Proteome Res. 2016;15:1821–9.
Escher C, Reiter L, MacLean B, Ossola R, Herzog F, Chilton J, MacCoss MJ, Rinner O. Using iRT, a normalized retention time for more targeted measurement of peptides. Proteomics. 2012;12(8):1111–21.
Röst HL, Rosenberger G, Navarro P, Gillet L, Miladinovic SM, Schubert OT, Wolski W, Collins BC, Malmstrom J, Malmstrom L, Aebersold R. OpenSWATH enables automated, targeted analysis of data-independent acquisition MS data. Nat Biotechnol. 2014;32:219–23.
Rosenberger G, Koh CC, Guo T, Rost HL, Kouvonen P, Collins BC, Heusel M, Liu Y, Caron E, Vichalkovski A, et al. A repository of assays to quantify 10,000 human proteins by SWATH-MS. Sci Data. 2014;1:140031.
MacLean B, Eng JK, Beavis RC, McIntosh M. General framework for developing and evaluating database scoring algorithms using the TANDEM search engine. Bioinform. 2006;22(22):2830–32.
Geer LY, Markey SP, Kowalak JA, Wagner L, Xu M, Maynard DM, Yang X, Shi W, Bryant SH. Open Mass Spectrometry Search Algorithm. J Proteome Res. 2004;3(5):958–64.
Deutsch EW, Mendoza L, Shteynberg D, Farrah T, Lam H, Tasman N, Sun Z, Nilsson E, Pratt B, Prazen B, Eng JK, Martin DB, Nesvizhskii AI, Aebersold R. A guided tour of the Trans-Proteomic Pipeline. Proteomics. 2010;10(6):1150–59.
Schubert OT, Gillet LC, Collins BC, Navarro P, Rosenberger G, Wolski WE, Lam H, Amodei D, Mallick P, MacLean B, Aebersold R. Building high-quality assay libraries for targeted analysis of SWATH MS data. Nat Protoc. 2015;10(3):426–41.
Chambers MC, Maclean B, Burke R, Amodei D, Ruderman DL, Neumann S, Gatto L, Fischer B, Pratt B, Egertson J, Hoff K, Kessner D, Tasman N, Shulman N, Frewen B, Baker TA, Brusniak M-Y, Paulse C, Creasy D, Flashner L, Kani K, Moulding C, Seymour SL, Nuwaysir LM, Lefebvre B, Kuhlmann F, Roark J, Rainer P, Detlev S, Hemenway T, Huhmer A, Langridge J, Connolly B, Chadick T, Holly K, Eckels J, Deutsch EW, Moritz RL, Katz JE, Agus DB, MacCoss M, Tabb DL, Mallick P. A cross-platform toolkit for mass spectrometry and proteomics. Nat Biotechnol. 2012;30(10):918–20.
Sturm M, Bertsch A, Gröpl C, Hildebrandt A, Hussong R, Lange E, Pfeifer N, Schulz-Trieglaff O, Zerck A, Reinert K, Kohlbacher O. OpenMS – An open-source software framework for mass spectrometry. BMC Bioinformatics. 2008;9(1).
Guo T, Luna A, Rajapakse VN, Koh CC, Wu Z, Liu W, Sun Y, Gao H, Menden MP, Xu C, et al. Quantitative proteome landscape of the NCI-60 cancer cell lines. iScience. 2019;21:664–80.
Blattmann P, Heusel M, Aebersold R. SWATH2stats: an R/Bioconductor package to process and convert quantitative SWATH-MS proteomics data for downstream analysis tools. PLoS One. 2016;11:e0153160.
Johnson WE, Li C, Rabinovic A. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics. 2007;8:118–27.
Franceschini A, Szklarczyk D, Frankild S, Kuhn M, Simonovic M, Roth A, Lin J, Minguez P, Bork P, von Mering C, Jensen LJ. STRING v9.1: protein-protein interaction networks, with increased coverage and integration. Nucleic Acids Res. 2013;41:D808–15.
Gao J, Aksoy BA, Dogrusoz U, Dresdner G, Gross B, Sumer SO, Sun Y, Jacobsen A, Sinha R, Larsson E, et al. Integrative analysis of complex cancer genomics and clinical profiles using the cBioPortal. Sci Signal. 2013;6:pl1.
Ycart B, Charmpi K, Rousseaux S, Fournié J-J. Large scale statistical analysis of GEO datasets. Gene Technology. 2014;4:113:111–119.
Charmpi K, Guo T, Zhong Q, Wagner U, Sun R, Toussaint NC, Fritz CF, Yuan C, Chen H, Rupp NJ, et al. PCP39: raw genomic data for 39 prostate cancer patients. Dataset. SRA: PRJNA577801. https://www.ncbi.nlm.nih.gov/Traces/study/?acc=PRJNA577801 (2020).
Charmpi K, Guo T, Zhong Q, Wagner U, Sun R, Toussaint NC, Fritz CF, Yuan C, Chen H, Rupp NJ, et al. PCP39: raw RNA-Seq data for 39 prostate cancer patients. Dataset. SRA: PRJNA579899. https://www.ncbi.nlm.nih.gov/Traces/study/?acc=PRJNA579899 (2020).
Charmpi K, Guo T, Zhong Q, Wagner U, Sun R, Toussaint NC, Fritz CF, Yuan C, Chen H, Rupp NJ, et al. PCP39: prostate cancer proteome for 39 patients by PCT-SWATH. Dataset. PRIDE: PXD004589. http://proteomecentral.proteomexchange.org/cgi/GetDataset?ID=PXD004589 (2018).
Hoadley KA, Yau C, Hinoue T, Wolf DM, Lazar AJ, Drill E, Shen R, Taylor AM, Cherniack AD, Thorsson V, et al. Prostate adenocarcinoma TCGA PanCancer data. Dataset. cBioPortal. https://www.cbioportal.org/study/summary?id=prad_tcga_pan_can_atlas_2018 (2018). Accessed Apr 2020.
Taylor BS, Schultz N, Hieronymus H, Sawyers CL. Comprehensive profiling of 218 prostate tumors (181 primaries, 37 metastases) and 14 prostate cancer cell lines and xenografts. MSKCC Prostate Oncogenome Project. Dataset. cBioPortal. https://www.cbioportal.org/study/summary?id=prad_mskcc (2010). Accessed Nov 2019.
Mortensen MM, Dyrskjøt L. Expression data from prostate cancer and benign prostate glands. Dataset. Gene Expression Omnibus: GSE46602. https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE46602 (2015). Accessed Dec 2019.
Szklarczyk D, Franceschini A, Wyder S, Forslund K, Heller D, Huerta-Cepas J, Simonovic M, Roth A, Santos A, Tsafou KP, et al. STRING gene interaction network (version 10). Dataset. STRING. http://version10.string-db.org/cgi/download.pl (2017). Accessed Aug 2020.
We wish to thank the patients participating in this study and we are grateful to two anonymous reviewers for their valuable input.
Peer review information
Anahita Bishop was the primary editor of this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.
The review history is available as Additional file 8.
This work was supported by the SystemsX.ch project PhosphoNet PPM (to R.A. and P.J.W.), the Swiss National Science Foundation (grant no. 3100A0-688 107679 to R.A.), the European Research Council (grant no. ERC-2008-AdG 233226 and ERC-2014-AdG670821 to R.A.), the Foundation for Scientific Research at the University of Zurich (to P.J.W.), the Westlake Startup Grant (to T.G.), the Zhejiang Provincial Natural Science Foundation of China (Grant No. LR19C050001 to T.G.), and the Hangzhou Agriculture and Society Advancement Program (Grant No. 20190101A04 to T.G.). A.B. and K.C. are supported by the German Federal Ministry of Education and Research Grants Sybacol and PhosphoNetPPM.
Ethics approval and consent to participate
This study was approved by the Cantonal Ethics Committee of Zurich (KEK-ZH-No. 2008-0040), the associated methods were carried out in accordance with the approved guidelines and complied with the Declaration of Helsinki, and each patient has signed an informed consent form.
Consent for publication
R.A. holds shares of Biognosys AG, which operates in the field covered by the article. The research groups of R.A. and T.G. are supported by SCIEX, which provides access to prototype instrumentation, and Pressure Biosciences Inc., which provides access to advanced sample preparation instrumentation.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary text and supplementary figures.
Table S1. Clinicopathological, immunological and other molecular information of the 39 PCa patients. (a) Overall clinicopathological characteristics. (b) Detailed information for each patient. Pat: numeric patient ID; Pat_id: patient ID grouped by the overall grade. L: low grade; M: intermediate grade; H: high grade; Overall_Gleason_GrGp: overall ISUP grade group; pT: tumor stage; pN: nodal status; R: surgical margin status; Age_at_OP: age at operation; PSA_at_Diag: blood PSA level at diagnosis; Time (months): RFS time. A value of 0 corresponds to patients excluded for the reasons explained in the ‘Methods’ section (see ‘Patients and samples’); Status: status indicator. 1 means recurrence; DX name: tissue region name; ImageName: name of the scanned images; index_tumor_id: patient ID of TA1 (or TA); TA1_GrGp: grade group for TA1; T_GrGp: grade group for TA2.
Table S2. Exome analysis of the peripheral blood cells and 105 prostate tumor punches in 39 patients. (a) Allele frequencies (AF) of somatic single nucleotide variants (SNVs) that were called by our bioinformatics pipeline. Genes with called SNV are indicated by an AF > 0. A value of 0 indicates that no SNV was found in the respective genes. In our data, no gene was found with more than one called somatic SNV. (b) Number of samples per gene with called somatic SNV. (c) Protein domain analysis using DAVID.
Table S3. Copy number analysis of 105 PCa samples. (a) Log2 ratios indicating the CNA status are shown for all genes in all samples. Values were determined by overlapping gene locations with CNA segments as calculated by CopywriteR. In case more than one segment overlapped with a gene, number was chosen that had the highest absolute value. (b) Genes are shown with log2 ratios higher than 0.5 or lower than − 0.5 in at least one sample.
Table S4. RNA-seq analysis. (a) Log2FCs (relative to all benign samples) for all genes across the tumor samples. (b) mRIN score per sample generated using mRIN (v1.2.0). (c) ETS family gene fusions observed in tumor samples using FusionCatcher: a value of 1 means that the fusion was observed in the respective sample but not its corresponding benign sample, otherwise the value is 0. (d) Normalized RNA-seq count data matrix.
Table S5. Proteomics data of 210 PCa samples with duplicates. (a) Sample information includes patient ID, clinical diagnosis, sample ID and batch design. (b) Protein matrix of log2 scaled intensity of 2371 proteins quantified in 210 PCa samples.
Table S6. Integration analysis of 66 tumor samples. (a) Information (i.e. reference linking them to PCa, consistency between observed and reported effect and number of tumor samples with CNAs) for the first 10 highest-scoring proteins (those with largest average absolute FCs across all tumor specimens). (b) Consistently up-regulated genes in the high-grade tumors: for each of these genes, there is a significant up-regulation of its FCs after network smoothing in the group G4/5 compared to the group G1 in all three layers (CNA, mRNA and protein). (c) Consistently down-regulated genes in the high-grade tumors: for each of these genes, there is a significant down-regulation of its FCs after network smoothing in the group G4/5 compared to the group G1 in all three layers (CNA, mRNA and protein). (d) Chromosome information for the gene members of Network Component 1. (e) Mapping from gene symbols to STRING identifiers.
About this article
Cite this article
Charmpi, K., Guo, T., Zhong, Q. et al. Convergent network effects along the axis of gene expression during prostate cancer progression. Genome Biol 21, 302 (2020). https://doi.org/10.1186/s13059-020-02188-9
- Molecular aberrations
- Network effects
- Prostate cancer
- Proteogenomic analysis
- Tumor heterogeneity