Modeling precision treatment of breast cancer
- Anneleen Daemen1, 2, 13,
- Obi L Griffith1, 3, 6,
- Laura M Heiser1, 4,
- Nicholas J Wang1, 4,
- Oana M Enache1,
- Zachary Sanborn5,
- Francois Pepin1, 14,
- Steffen Durinck1,
- James E Korkola1, 4,
- Malachi Griffith6,
- Joe S Hur7,
- Nam Huh8,
- Jongsuk Chung8,
- Leslie Cope9,
- Mary Jo Fackler9,
- Christopher Umbricht9,
- Saraswati Sukumar9,
- Pankaj Seth10,
- Vikas P Sukhatme10,
- Lakshmi R Jakkula1,
- Yiling Lu11,
- Gordon B Mills11,
- Raymond J Cho12,
- Eric A Collisson1, 2,
- Laura J van’t Veer2,
- Paul T Spellman1, 3 and
- Joe W Gray1, 4
© Daemen et al.; licensee BioMed Central Ltd. 2013
Received: 1 March 2013
Accepted: 31 October 2013
Published: 10 December 2013
First-generation molecular profiles for human breast cancers have enabled the identification of features that can predict therapeutic response; however, little is known about how the various data types can best be combined to yield optimal predictors. Collections of breast cancer cell lines mirror many aspects of breast cancer molecular pathobiology, and measurements of their omic and biological therapeutic responses are well-suited for development of strategies to identify the most predictive molecular feature sets.
We used least squares-support vector machines and random forest algorithms to identify molecular features associated with responses of a collection of 70 breast cancer cell lines to 90 experimental or approved therapeutic agents. The datasets analyzed included measurements of copy number aberrations, mutations, gene and isoform expression, promoter methylation and protein expression. Transcriptional subtype contributed strongly to response predictors for 25% of compounds, and adding other molecular data types improved prediction for 65%. No single molecular dataset consistently out-performed the others, suggesting that therapeutic response is mediated at multiple levels in the genome. Response predictors were developed and applied to TCGA data, and were found to be present in subsets of those patient samples.
These results suggest that matching patients to treatments based on transcriptional subtype will improve response rates, and inclusion of additional features from other profiling data types may provide additional benefit. Further, we suggest a systems biology strategy for guiding clinical trials so that patient cohorts most likely to respond to new therapies may be more efficiently identified.
Breast cancer is a clinically and genomically heterogeneous disease. Six subtypes were defined approximately a decade ago based on transcriptional characteristics and were designated luminal A, luminal B, ERBB2-enriched, basal-like, claudin-low and normal-like [1, 2]. New cancers can be assigned to these subtypes using a 50-gene transcriptional signature designated the PAM50. However, the number of distinct subtypes is increasing steadily as multiple data types are integrated. Integration of genome copy number and transcriptional profiles defines 10 subtypes, and adding mutation status, methylation pattern, pattern of splice variants, protein and phosphoprotein expression, and microRNA expression and pathway activity may define still more subtypes. The Cancer Genome Atlas (TCGA) project and other international genomics efforts were founded to improve our understanding of the molecular landscapes of most major tumor types with the ultimate goal of increasing the precision with which individual cancers are managed. One application of these data is to identify molecular signatures that can be used to assign specific treatment to individual patients. However, strategies to develop optimal predictive marker sets are still being explored. Indeed, it is not yet clear which molecular data types (genome, transcriptome, proteome, and so on) will be most useful as response predictors.
In breast cancer, cell lines mirror many of the molecular characteristics of the tumors from which they were derived, and are therefore a useful preclinical model in which to explore strategies for predictive marker development [8, 9]. To this end, we have analyzed the responses of 70 well characterized breast cancer cell lines to 90 compounds and used two independent machine learning approaches to identify pretreatment molecular features that are strongly associated with responses within the cell line panel. For most compounds tested, in vitro cell line systems provide the only experimental data that can be used to identify predictive response signatures, as most of the compounds have not been tested in clinical trials. Our study focuses on breast cancer [10, 11] and extends earlier efforts [12–14], by including more cell lines, by evaluating a larger number of compounds relevant to breast cancer, and by increasing the molecular data types used for predictor development. Data types used for correlative analysis include pretreatment measurements of mRNA expression, genome copy number, protein expression, promoter methylation, gene mutation, and transcriptome sequence (RNAseq). This compendium of data is now available to the community as a resource for further studies of breast cancer and the inter-relationships between data types. We report here on initial machine learning-based methods to identify correlations between these molecular features and drug response. In the process, we assessed the utility of individual data sets and the integrated data set for response predictor development. We also describe a publicly available software package that we developed to predict compound efficacy in individual tumors based on their omic features. This tool could be used to assign an experimental compound to individual patients in marker-guided trials, and serves as a model for how to assign approved drugs to individual patients in the clinical setting. 
We explored the performance of the predictors by using them to assign compounds to 306 TCGA samples based on their molecular profiles.
Results and discussion
Breast cancer cell line panel
We assembled a collection of 84 breast cancer cell lines composed of 35 luminal, 27 basal, 10 claudin-low, 7 normal-like, 2 matched normal cell lines, and 3 of unknown subtype (Additional file 1). Fourteen luminal and 7 basal cell lines were also ERBB2-amplified. Seventy cell lines were tested for response to 138 compounds by growth inhibition assays. The cells were treated in triplicate with nine different concentrations of each compound as previously described. The concentration required to inhibit growth by 50% (GI50) was used as the response measure for each compound. Compounds with low variation in response in the cell line panel were eliminated, leaving a response data set of 90 compounds. An overview of the 70 cell lines with subtype information and 90 therapeutic compounds with GI50 values is provided in Additional file 1. All 70 lines were used in development of at least some predictors depending on data type availability. The therapeutic compounds include conventional cytotoxic agents such as taxanes, platinols and anthracyclines, as well as targeted agents such as hormone and kinase inhibitors. Some of the agents target the same protein or share common molecular mechanisms of action. Responses to compounds with common mechanisms of action were highly correlated, as has been described previously.
A rich and multi-omic molecular profiling dataset
Seven pretreatment molecular profiling data sets were analyzed to identify molecular features associated with response. These included profiles for DNA copy number (Affymetrix SNP6 - EGA accessions EGAS00000000059 and EGAS00001000585), mRNA expression (Affymetrix U133A and Exon 1.0 ST array - ArrayExpress accessions E-TABM-157 and E-MTAB-181), transcriptome sequence (RNAseq - Gene Expression Omnibus (GEO) accession GSE48216), promoter methylation (Illumina Methylation27 BeadChip - GEO accession GSE42944), protein abundance (Reverse Protein Lysate Array - Additional file 2), and mutation status (Exome-Seq - GEO accession GSE48216). The data were preprocessed as described in Supplementary Methods of Additional file 3. Figure S1 in Additional file 3 gives an overview of the number of features per data set before and after filtering based on variance and signal detection above background where applicable. Exome-seq data were available for 75 cell lines, followed by SNP6 data for 74 cell lines, therapeutic response data for 70, RNAseq for 56, exon array for 56, Reverse Phase Protein Array (RPPA) for 49, methylation for 47, and U133A expression array data for 46 cell lines. Information on the overlap in cell lines with both response data and molecular data is provided in Additional file 3. The set of 48 core cell lines was defined as those with response data and at least 4 molecular data sets.
We investigated the association between expression, copy number and methylation data. We distinguished correlation at the cell line level and gene level. At the cell line level, we report average correlation between datasets for each cell line across all genes, while correlation at the gene level represents the average correlation between datasets for each gene across all cell lines. Correlation among the three expression datasets (U133A, exon array, and RNAseq) ranged from 0.6 to 0.77 at the cell line level, and from 0.58 to 0.71 at the gene level. Promoter methylation and gene expression were, on average, negatively correlated as expected, with correlation ranging from -0.16 to -0.25 at the cell line level and -0.10 to -0.15 at the gene level. Across the genome, copy number and gene expression were positively correlated (0.18 to 0.22 at the cell line level; 0.35 to 0.44 at the gene level). When restricted to copy number aberrations, 22 to 39% of genes in the aberrant regions showed a significant concordance between their genomic and transcriptomic profiles from U133A, exon array and RNAseq after multiple testing correction (see the 'Intra-data relationships’ section in Supplementary Results in Additional file 3 and Table S4a-c in Additional file 3).
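The distinction between the two correlation levels can be illustrated with a minimal sketch. It assumes Pearson correlation and a simple gene-by-cell-line dictionary layout; both are assumptions made for illustration, and the function names `pearson` and `average_correlations` are hypothetical rather than part of the published toolbox.

```python
from statistics import mean

def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5

def average_correlations(data_a, data_b):
    """Return (cell line-level, gene-level) average correlation between two datasets.

    data_a, data_b: dict mapping gene -> dict mapping cell line -> value
    (hypothetical layout). Only shared genes and shared cell lines are used.
    """
    genes = sorted(set(data_a) & set(data_b))
    lines = sorted(set.intersection(*(set(data_a[g]) & set(data_b[g]) for g in genes)))
    # Cell line level: one correlation per cell line, computed across all genes.
    cell_level = mean(
        pearson([data_a[g][c] for g in genes], [data_b[g][c] for g in genes])
        for c in lines
    )
    # Gene level: one correlation per gene, computed across all cell lines.
    gene_level = mean(
        pearson([data_a[g][c] for c in lines], [data_b[g][c] for c in lines])
        for g in genes
    )
    return cell_level, gene_level

# Toy values only, not measurements from the cell line panel.
toy_expression = {
    "G1": {"A": 1.0, "B": 2.0, "C": 4.0},
    "G2": {"A": 3.0, "B": 1.0, "C": 2.0},
    "G3": {"A": 5.0, "B": 6.0, "C": 1.0},
}
toy_copy_number = {g: {c: 2 * v + 1 for c, v in row.items()} for g, row in toy_expression.items()}
print(average_correlations(toy_expression, toy_copy_number))
```

The two averages answer different questions: the cell line-level value asks whether two platforms rank genes similarly within a sample, whereas the gene-level value asks whether they rank samples similarly for a given gene.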
Machine learning approaches identify accurate cell line-derived response signatures
The candidate signatures incorporated copy number, methylation, transcription and/or proteomic features. We also included the mutation status of TP53, PIK3CA, MLL3, CDH1, MAP2K4, PTEN and NCOR1, chosen based on reported frequencies from the TCGA breast project. That project sequenced the exomes of 507 breast invasive carcinomas and identified approximately 30,000 somatic mutations. Each of the 7 genes was mutated in at least 3% of samples with a false discovery rate (FDR) P-value <0.05. Our whole exome sequencing showed that these genes were also mutated in at least 3% of the breast cancer cell lines. Their mutation rate in TCGA and the cell line panel showed a similar distribution across the subtypes (Figure S2 in Additional file 3). We excluded lower prevalence mutations because their low frequency limits the possibility of significant associations.
The signatures incorporating any of these molecular features are shown in Additional file 5. They predicted compound response within the cell lines with high estimated accuracy (AUC >0.70) regardless of classification method for 51 (57%) of the compounds tested. Concordance between GI50 and total growth inhibition (TGI) exceeded 80% for 67% (34/51) of these compounds. A comparison across all 90 compounds of the least squares-support vector machine (LS-SVM) and random forest (RF) models with highest AUC based on copy number, methylation, transcription and/or proteomic features revealed a high correlation between both classification methods (Spearman correlation coefficient = 0.85, P-value <0.001), with the LS-SVM more predictive for 35 compounds and RF for 55 compounds (Figure S3 in Additional file 3). However, there was a better correlation between both classification methods for compounds with strong biomarkers of response (upper third; Spearman correlation coefficient 0.84) and compounds without a clear signal associated with drug response (lower third; Spearman correlation coefficient 0.46). This suggests that for compounds with strong biomarkers, a signature can be identified by either approach. For compounds with a weaker signal of drug response (middle third), there was a larger discrepancy in performance between both classification methods (Spearman correlation coefficient 0.16), with neither of them outperforming the other.
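For readers less familiar with the AUC criterion, the following minimal sketch computes it directly from its pairwise definition: the probability that a randomly chosen sensitive cell line receives a higher predicted score than a randomly chosen resistant one. The function name `auc` and the toy scores are hypothetical; the cross-validation scheme actually used to estimate AUC is described in the Supplementary Methods and is not reproduced here.

```python
def auc(scores, labels):
    """Area under the ROC curve from its pairwise (rank) definition.

    scores: predicted probability of sensitivity per cell line.
    labels: 1 for sensitive, 0 for resistant.
    Ties between a sensitive and a resistant line count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both sensitive and resistant cell lines are required")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Toy predictions for six hypothetical cell lines.
toy_scores = [0.9, 0.7, 0.6, 0.4, 0.3, 0.2]
toy_labels = [1, 1, 0, 1, 0, 0]
value = auc(toy_scores, toy_labels)
print(value, value > 0.70)  # compare against the AUC >0.70 cut-off used in the text
```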
In vivo validation of the cell line-derived response signatures
Unfortunately, omic profiles and corresponding clinical responses are not available for the other compounds tested in vitro. For these, we investigated whether the in vitro predictive signature was present in 536 breast TCGA tumors and consistent with the signature observed in cell lines. Here, we limited our analyses to those data types that are available in the TCGA dataset. Specifically, we developed response predictors for the breast cancer cell line panel using profiles for expression (U133A, exon array at the gene level, or RNAseq at the gene level), copy number, and promoter methylation for 51 compounds for which predictive power was high (AUC >0.7; Additional file 5). We applied these signatures to a set of 369 luminal, 95 basal, 8 claudin-low, and 58 ERBB2-amplified samples from the TCGA project. We used profiles of expression (n = 536), copy number (n = 306) and promoter methylation (n = 318) in our analyses. Additional file 5 shows that the transcriptional subtype specificities measured for these compounds in the cell lines were concordant with the subtype of TCGA samples predicted to respond. Figure S5 in Additional file 3 shows the predicted probability of response to four compounds with test AUC >0.7 for TCGA tumor samples ordered according to increasing probability. Importantly, genes in these signatures that were coordinately regulated in the set of cell lines were also coordinately regulated in the tumor samples (average Jaccard coefficient = 0.68, P-value <0.0001; Figure S6 in Additional file 3). This panel of 51 compounds represented most major therapeutic target classes (phosphatidylinositol 3-kinase (PI3K), receptor tyrosine kinase, anti-mitotic, DNA damage, cell cycle, proteasome, anti-metabolite, TP53, mitogen-activated protein kinase (MAPK), and estrogen antagonist). Eighteen of these compounds have been approved by the US Food and Drug Administration, including five for breast cancer. 
Phase I clinical trials are ongoing for seven compounds, phase II trials are underway for seven compounds, including six for breast cancer, and one compound is currently being tested in a phase III trial (Additional file 5). Thus further validation of signatures may be possible in the near future.
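The Jaccard coefficient used above to compare coordinate regulation in cell lines and tumors is the size of the intersection of two sets divided by the size of their union. The sketch below applies it to sets of co-regulated gene pairs; representing co-regulation as unordered gene pairs is an assumption made for illustration (the exact procedure is given in Additional file 3), and the pairs shown are hypothetical.

```python
def jaccard(a, b):
    """Jaccard coefficient |A ∩ B| / |A ∪ B| (defined as 1.0 for two empty sets)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0

# Hypothetical co-regulated gene pairs; frozenset makes each pair unordered.
pairs_cell_lines = {
    frozenset(("DUSP6", "SPRY2")),
    frozenset(("DUSP6", "ERRFI1")),
    frozenset(("SPRY1", "SPRY4")),
}
pairs_tumors = {
    frozenset(("SPRY2", "DUSP6")),
    frozenset(("SPRY1", "SPRY4")),
    frozenset(("GADD45A", "ERRFI1")),
}
print(jaccard(pairs_cell_lines, pairs_tumors))
```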
Robust predictors of drug response are found at all levels of the genome
In Table S6b,d in Additional file 3, a distinction is made between two groups of compounds: compounds for which all datasets perform similarly well (for example, CGC-11047, GSK461364, GSK2126458, lapatinib) versus compounds for which results with one dataset are much better than obtained with any of the other datasets, defined as an AUC increase of at least 0.1. For example, exon array worked best for VX-680 (AUC 0.81), RNAseq for carboplatin (AUC 0.89), and RPPA for bortezomib (AUC 0.87). Data type specificity was in general not related to therapeutic compound class, although there were a few exceptions for LS-SVM with RNAseq performing well for polyamine analogs (CGC-11047, CGC-11144) and mitotic inhibitors (ixabepilone, paclitaxel, vinorelbine), SNP6 for ERBB2/epidermal growth factor receptor (EGFR) inhibitors (AG1478, BIBW2992, erlotinib, gefitinib, lapatinib), and methylation for CDK1 inhibitors (NU6102, purvalanol A). The full combination of genome-wide datasets yielded a higher AUC value than the best performing individual dataset for only a limited number of compounds (AKT1-2 inhibitor, GSK461364 and PF-4691502). The full combination signatures, however, generally ranked closely to the best signatures based on individual data types. We refer to the 'Robust predictors of drug response' section in Supplementary Results in Additional file 3 for two additional complementary analyses on dataset comparison.
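The grouping rule described above (one dataset exceeding every other dataset by an AUC of at least 0.1) can be written as a short sketch. The function name `best_specific_dataset` is hypothetical, and apart from the RPPA value of 0.87 reported for bortezomib, the AUC values below are invented for illustration.

```python
def best_specific_dataset(auc_by_dataset, min_gain=0.1, tol=1e-9):
    """Return the dataset whose AUC beats all others by at least min_gain, else None.

    auc_by_dataset: dict mapping dataset name -> AUC for one compound.
    tol guards against floating-point round-off at the threshold.
    """
    if len(auc_by_dataset) < 2:
        raise ValueError("at least two datasets are needed for a comparison")
    ranked = sorted(auc_by_dataset.items(), key=lambda kv: kv[1], reverse=True)
    (best, best_auc), (_, runner_up_auc) = ranked[0], ranked[1]
    return best if best_auc - runner_up_auc >= min_gain - tol else None

# Only the RPPA value is taken from the text; the others are illustrative.
bortezomib_like = {"RPPA": 0.87, "RNAseq": 0.70, "SNP6": 0.65}
similar_performance = {"U133A": 0.75, "RNAseq": 0.78, "SNP6": 0.74}
print(best_specific_dataset(bortezomib_like))      # RPPA
print(best_specific_dataset(similar_performance))  # None
```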
Splice-specific predictors provide only minimal information
We compared the performance of classifiers between the fully featured data and gene-level data in order to investigate the contribution of splice-specific predictors for RNAseq and exon array data. The fully featured data included transcript- and exon-level estimates for the exon array data and transcript-, exon-, junction-, boundary-, and intron-level estimates for the RNAseq data. Overall, there was no increase in performance for classifiers built with 'splice-aware' data versus gene level only. The overall difference in AUC from all features minus gene-level was 0.002 for RNAseq and -0.006 for exon array, a negligible difference in both cases. However, there were a few individual compounds with a modest increase in performance when considering splicing information (Table S8 in Additional file 3). Interestingly, both ERBB2 targeting compounds, BIBW2992 and lapatinib, showed improved performance using splice-aware features in both RNAseq and exon array datasets. This suggests that splice-aware predictors may perform better for prediction of ERBB2 amplification and response to compounds that target it. However, the overall result suggests that prediction of response does not benefit greatly from splicing information over gene-level estimates of expression. This indicates that the high performance of RNAseq for discrimination may have more to do with that technology’s improved sensitivity and dynamic range, rather than its ability to detect splicing patterns.
Pathway overrepresentation analysis aids in interpretation of the response signatures
We surveyed the pathways and biological processes represented by genes for the 49 best-performing therapeutic response signatures incorporating copy number, methylation, transcription, and/or proteomic features (i.e., no mutation status) with AUC >0.7 (Additional file 5). For these compounds we created functionally organized networks with the ClueGO plugin in Cytoscape using Gene Ontology (GO) categories and Kyoto Encyclopedia of Genes and Genomes (KEGG)/BioCarta pathways (Supplementary Methods in Additional file 3). Our previous work identified transcriptional networks associated with response to many of these compounds. In this study, 5 to 100% (median 79%) of GO categories and pathways present in the predictive signatures were found to be significantly associated with drug response (FDR P-value <0.05). The majority of these significant pathways, however, were also associated with transcriptional subtype (17 to 100%, median 70%). These were filtered out to capture subtype-independent biology underlying each compound’s mechanism of action. The resulting non-subtype-specific pathways with FDR P-value <0.05 are shown in Additional file 6. Eighty-eight percent of the compounds for which we conducted pathway analysis were significantly associated with one or more GO category and 80% were significantly associated with one or more KEGG pathway. The most commonly identified KEGG pathways (six or more compounds) were hedgehog signaling, basal cell carcinoma, glycosphingolipid biosynthesis, ribosome, spliceosome and Wnt signaling. The most commonly identified GO processes (six or more compounds) also included many critical cancer pathways and processes, such as regulation of cell cycle, cell death, protein kinase activity, metabolism, TGFβ receptor signaling, cell-cell adhesion, microtubule polymerization, and Wnt receptor signaling. Many of these processes can be linked directly to the known mechanisms of action of their associated compounds. 
For example, the signature for docetaxel was significantly enriched for microtubule polymerization genes. Docetaxel is known to function by inhibiting microtubule disassembly. Similarly, signatures for the AKT1/2 kinase inhibitor, bosutinib SRC kinase inhibitor, TCS PIM-11 kinase inhibitor and four PI3K inhibitors (GSK2119563, GSK2126458, PF-4691502, TGX-221) were all enriched in genes involved in the negative regulation of protein kinase activity. These kinase regulation genes tended to be consistently up-regulated or both methylated and down-regulated, depending on the therapeutic response signature. Many of the genes in this enriched gene set have well-described roles in modulation of the PI3K/MAPK cascades, including ERRFI1, DUSP6/7/8 and SPRY1/2/4. In particular, we found that high expression of GADD45A was associated with resistance to GSK2126458, PF-4691502 and the AKT1/2 inhibitor, which is consistent with the observation that AKT inhibition modulates cell growth via activation of GADD45A. The pan-PI3K targeting agent GSK2126458 is reported to function as a competitive ATP binding inhibitor and the signature for this compound was over-represented in ATP metabolic processes.
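The logic of an overrepresentation analysis with FDR control can be sketched as follows. This is not the ClueGO implementation: it assumes a one-sided hypergeometric test per category followed by Benjamini-Hochberg adjustment, and the function names and gene counts are hypothetical.

```python
from math import comb

def hypergeom_upper_tail(k, K, n, N):
    """P(X >= k): chance of seeing at least k category genes in a signature.

    N: genes in the background, K: background genes in the category,
    n: genes in the signature, k: signature genes in the category.
    """
    total = comb(N, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return hits / total

def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted P-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P-value down
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# Hypothetical counts: (signature genes in category, category size).
background_size, signature_size = 20000, 50
categories = {"microtubule polymerization": (5, 40), "ribosome": (1, 150)}
raw = [hypergeom_upper_tail(k, K, signature_size, background_size)
       for k, K in categories.values()]
for name, fdr in zip(categories, benjamini_hochberg(raw)):
    print(name, fdr < 0.05)
```

Filtering out categories that are also associated with transcriptional subtype, as described above, would be a further step applied to the categories that pass the FDR threshold.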
Genomic aberrations and transcriptomic/proteomic features played prominent roles in some of the candidate response signatures. For copy number aberrations, ERBB2 amplification was strongly associated with response to the ERBB2 targeting compounds lapatinib (two-sample t-test, P-value 2.1e-11) and BIBW2992 (1.6e-5) and to EGFR inhibitors AG1478 (2.5e-4) and gefitinib (9.5e-4). In addition to the association of overall mutation status with tamoxifen and CGC-11144 response discussed above, we also found several individual mutations to be relevant for treatment response. The presence of mutations in TP53 was strongly associated with response to the PI3K inhibitor BEZ235, with 13/25 (52%) of the sensitive cell lines harboring TP53 mutations compared to 3/19 (16%) for the resistant cell lines (Fisher’s exact test, P-value 0.025). This may be an indication of synthetic lethality resulting from BEZ235 inhibition of ATR (Ataxia telangiectasia and Rad3-related protein) leading to replicative stress in TP53-deficient cells. Kim et al. showed a similar trend in a study of 310 cell lines across multiple lineages in which co-mutation of TP53 and PIK3CA was positively associated with response to BEZ235. In our study, mutation status for PIK3CA was associated with response to the PI3K inhibitor GSK1059615B, with 11/27 (41%) sensitive cell lines carrying PIK3CA mutations compared to 2/21 (10%) for resistant cell lines (P-value 0.022). These findings are consistent with recent clinical observations in patients with breast and gynecologic malignancies where treatment with similar agents resulted in response for 30% of patients with PIK3CA mutations compared to a response rate of 10% in wild-type PIK3CA patients.
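The TP53/BEZ235 comparison above can be checked with a small, self-contained implementation of the two-sided Fisher's exact test. The 2x2 counts (13 of 25 sensitive and 3 of 19 resistant lines mutated) come from the text; the function name is hypothetical, and the two-sided convention used here (summing all tables no more probable than the observed one) is an assumption about how the reported P-value was obtained.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows: sensitive / resistant cell lines. Columns: mutated / wild-type.
    Sums the probabilities of all tables with the same margins that are
    no more likely than the observed table. Integer weights are compared
    so that the tie rule is not affected by floating-point round-off.
    """
    row1, col1, total = a + b, a + c, a + b + c + d

    def weight(x):
        # Unnormalized hypergeometric probability of x mutated sensitive lines.
        return comb(col1, x) * comb(total - col1, row1 - x)

    lo = max(0, row1 - (total - col1))
    hi = min(row1, col1)
    observed = weight(a)
    extreme = sum(weight(x) for x in range(lo, hi + 1) if weight(x) <= observed)
    return extreme / comb(total, row1)

# TP53 mutation status versus BEZ235 response, counts as reported in the text.
p_tp53 = fisher_exact_two_sided(13, 12, 3, 16)
print(round(p_tp53, 3))
```

Under these assumptions the result is approximately 0.025, in line with the P-value reported for this comparison.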
Response signature Toolbox to predict response in individual tumors
Our long-term goal is to develop a way to select therapeutic compounds most likely to be effective in an individual patient. A shorter-term goal is to test experimental compounds in patients that are most likely to be responsive. Both of these goals require a strategy to order compounds according to their predicted relative efficacy for individual patients. To this end, we developed software to rank order compounds for predicted efficacy in individual patients (see the 'Patient response prediction toolbox in R' section in Supplementary Results in Additional file 3). The software applies signatures of response developed in vitro to measurements of expression, copy number, and/or methylation for individual samples and produces a list of recommended treatments ranked according to predicted probability of response and in vitro GI50 dynamic range. For cases where several compounds are predicted to be equally effective, highest priority is assigned to the compound with highest GI50 dynamic range in the cell line panel.
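The ranking rule described above can be sketched in a few lines: compounds are ordered by predicted probability of response, and ties are broken in favor of the compound with the larger in vitro GI50 dynamic range. The published toolbox is written in R; this Python function, its name and the numeric values are hypothetical and serve only to illustrate the ordering.

```python
def rank_compounds(predictions):
    """Order compounds for one patient sample.

    predictions: list of (compound, probability_of_response, gi50_dynamic_range).
    Primary key: higher predicted probability first.
    Tie-breaker: larger GI50 dynamic range in the cell line panel first.
    """
    return sorted(predictions, key=lambda p: (-p[1], -p[2]))

# Illustrative values only; not output of the published toolbox.
toy_predictions = [
    ("lapatinib", 0.80, 2.1),
    ("tamoxifen", 0.35, 1.2),
    ("docetaxel", 0.80, 3.4),
]
for compound, probability, dynamic_range in rank_compounds(toy_predictions):
    print(compound, probability, dynamic_range)
```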
In this study we developed strategies to identify molecular response signatures for 90 compounds based on measured responses in a panel of 70 breast cancer cell lines, and we assessed the predictive strengths of several strategies. The molecular features comprising the high quality signatures are candidate molecular markers of response that we suggest for clinical evaluation. In most cases, the signatures with high predictive power in the cell line panel show significant PAM50 subtype specificity, suggesting that assigning compounds in clinical trials according to transcriptional subtype will increase the frequency of responding patients. However, our findings suggest that treatment decisions could further be improved for most compounds using specifically developed response signatures based on profiling at multiple omic levels, independent of - or in addition to - the previously defined transcriptional subtypes. We make available the drug response data and molecular profiling data from seven different platforms for the entire cell line panel as a resource for the community to aid in improving methods of drug response prediction.
We found predictive signatures of response across all platforms and levels of the genome. When restricting the analysis to just 55 well-known cancer proteins and phosphoprotein genes, all platforms do a reasonable job of measuring a signal associated with and predictive of drug response. This indicates that if a compound has a molecular signature that correlates with response, it is likely that many of the molecular data types will be able to measure this signature in some way. Furthermore, there was no substantial advantage of the combined platforms compared with the individual platforms. Some platforms might be able to measure the signature with slightly better accuracy, but our results indicate that many of the platforms could be optimized to identify a response-associated predictor.
Conversely, in the genome-wide comparison, the more comprehensive platforms are the ones that overall resulted in better prediction performance. This difference may reflect the fact that for those platforms, we selected the most significant feature per gene. For example, when a gene measured on the Affymetrix microarray is significantly differentially expressed, the chance is high that a particular exon or transcript is even more significant. Thus, the richness of data types like RNAseq offers the chance to identify both the signature and the most useful specific gene regions and junctions for use in a diagnostic (Figure 4). Taken together, these results suggest that the more comprehensive genome-wide platforms could be used for discovery, and once identified, significant features can be migrated to alternative platforms for a lab diagnostic.
Currently, treatment decisions are guided by ER and ERBB2 status. Using the TCGA dataset of 306 samples with expression, copy number and methylation measurements as a hypothetical example (Figure 5), a personalized treatment decision would be available for 81% of patients based on ERBB2 or ER status alone (55 ERBB2+, 193 ERBB2-/ER+). However, given reported response rates for trastuzumab (15 to 50%) and tamoxifen (approximately 25%), we can expect a substantial fraction of these will not respond. The candidate predictors proposed here could inform such clinical decisions for nearly all patients. Therefore, by considering diverse molecular data, we might suggest treatment options for not only the approximately 20% of patients who are ERBB2-/ER- but also secondary treatment options for those who will suboptimally respond to ER or ERBB2 directed treatments.
While our efforts to develop predictive drug response signatures are quite promising, they come with several conceptual caveats. Although the cell line panel is a reasonable model system, it does not capture several features known to be of critical importance in primary tumors. In particular, we have not modeled influences of the microenvironment, including additional cell types known to contribute to tumorigenesis, as well as variation in oxygen content, which has been shown to influence therapeutic response. Expanding these experiments to three-dimensional model systems or mouse xenografts would aid in translation to the clinic. Additionally, validating these predictors in independent data sets will be important for determining how robust they are (see Supplementary Results and Additional file 8). Despite these limitations, our observation that we could find evidence of these predictive signatures in the TCGA data suggests that our cell line system is likely capturing many of the key elements involved in mediating therapeutic response.
Of course, the cell line-derived predictive signatures described in this study require substantial clinical validation. One possibility is in neoadjuvant trials like the I-SPY 2 TRIAL, in which in vitro-derived signatures for individual compounds are tested for power in predicting pathologic complete response or change in tumor volume measured with magnetic resonance imaging. An alternative approach for validation of signatures for approved drugs is to compare outcomes in patients assigned compounds according to in vitro predictors with outcomes in patients assigned drugs according to physicians’ first treatment choice. This study constitutes the basis for such a trial, with the development of a portfolio of in vitro predictors (for example, the 22 compounds displayed in Figure 5) and a computational tool that physicians might use to select compounds from that portfolio for individual patients.
Regardless of the specific design of the clinical trial, gene expression, methylation and copy number levels should be collected for all patients. High throughput sequencing techniques can provide all three with the additional benefit of alternative splicing information. As outlined in Figure 1, measurements of expression, methylation and copy number would serve as input to the predictor toolbox. The output of the toolbox consists of an individualized report for each patient, with the 22 therapeutic compounds ranked according to the patient’s likelihood of response and in vitro GI50 dynamic range. The full panel of 22 drug compounds could be tested simultaneously in a multi-arm trial to speed up the validation of the in vitro approach. The proposed clinical trial may also involve further optimizing of the number of markers in the signatures and choosing clinically relevant thresholds for tumor classification.
Materials and methods
We refer to Supplementary Methods in Additional file 3 for a detailed description of the therapeutic compound response data, molecular data for the breast cancer cell lines, molecular data for the external breast cancer tumor samples used for validation, classification methods, data integration approach, statistical methods, pathway overrepresentation analysis, and the patient response prediction toolbox for the R project for statistical computing.
Data and code deposition
Genome copy number data have been deposited at the European Genome-phenome Archive (EGA), hosted at the EBI (accession numbers EGAS00000000059 and EGAS00001000585). Gene expression data for the cell lines were derived from Affymetrix GeneChip Human Genome U133A and Affymetrix GeneChip Human Exon 1.0 ST arrays. Raw data are available in ArrayExpress, hosted at the EBI (accession numbers E-TABM-157 and E-MTAB-181). RNAseq and exome-seq data can be accessed at GEO, accession number GSE48216. Genome-wide methylation data for the cell lines are also available through GEO, accession number GSE42944. Software and data for treatment response prediction are available on Synapse. The software has also been deposited at GitHub. The raw drug response data are available as Additional file 9.
AD was partly supported by a BAEF Fellowship of the Belgian American Educational Foundation for postdoctoral research; OLG was supported by a Fellowship from the Canadian Institutes of Health Research; EAC is supported by NCI 5K08CA137153-02.
AUC: area under the receiver operating characteristic curve
EGFR: epidermal growth factor receptor
FDR: false discovery rate
GEO: Gene Expression Omnibus
GI50: concentration at which growth is inhibited by 50%
KEGG: Kyoto Encyclopedia of Genes and Genomes
LS-SVM: least squares support vector machine
MAPK: mitogen-activated protein kinase
PAM: Prediction Analysis for Microarrays
RPPA: Reverse Phase Protein Array
SNP: single nucleotide polymorphism
TCGA: The Cancer Genome Atlas
TGI: total growth inhibition.
We thank Michael Kellen, Brian Bot, and Steven Friend at Sage Bionetworks for their assistance in porting the R Toolbox into Synapse. This work was supported by the Director, Office of Science, Office of Biological and Environmental Research, of the US Department of Energy under contract number DE-AC02-05CH11231; by the National Institutes of Health, National Cancer Institute grants P50 CA 58207 (LJV, JWG), U54 CA 112970 and CA 126551; by SU2C-AACR-DT0409; and by research grants from GSK, Pfizer Corporation and the Prospect Creek Foundation to JWG. The content of the information does not necessarily reflect the position or the policy of the Government, and no official endorsement should be inferred.
- Parker JS, Mullins M, Cheang MC, Leung S, Voduc D, Vickery T, Davies S, Fauron C, He X, Hu Z, Quackenbush JF, Stijleman IJ, Palazzo J, Marron JS, Nobel AB, Mardis E, Nielsen TO, Ellis MJ, Perou CM, Bernard PS: Supervised risk predictor of breast cancer based on intrinsic subtypes. J Clin Oncol. 2009, 27: 1160-1167.
- Reis-Filho JS, Pusztai L: Gene expression profiling in breast cancer: classification, prognostication, and prediction. Lancet. 2011, 378: 1812-1823.
- Curtis C, Shah SP, Chin SF, Turashvili G, Rueda OM, Dunning MJ, Speed D, Lynch AG, Samarajiwa S, Yuan Y, Gräf S, Ha G, Haffari G, Bashashati A, Russell R, McKinney S, Langerød A, Green A, Provenzano E, Wishart G, Pinder S, Watson P, Markowetz F, Murphy L, Ellis I, Purushotham A, Børresen-Dale AL, Brenton JD, Tavaré S, METABRIC Group, et al: The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups. Nature. 2012, 486: 346-352.
- The Cancer Genome Atlas Network: Comprehensive characterization of the molecular portraits of human breast tumors. Nature. 2012, 490: 61-67.
- Holm K, Hegardt C, Staaf J, Vallon-Christersson J, Jonsson G, Olsson H, Borg A, Ringner M: Molecular subtypes of breast cancer are associated with characteristic DNA methylation patterns. Breast Cancer Res. 2010, 12: R36.
- Lapuk A, Marr H, Jakkula L, Pedro H, Bhattacharya S, Purdom E, Hu Z, Simpson K, Pachter L, Durinck S, Wang N, Parvin B, Fontenay G, Speed T, Garbe J, Stampfer M, Bayandorian H, Dorton S, Clark TA, Schweitzer A, Wyrobek A, Feiler H, Spellman P, Conboy J, Gray JW: Exon-level microarray analyses identify alternative splicing programs in breast cancer. Mol Cancer Res. 2010, 8: 961-974.
- Kamel D, Brady B, Tabchy A, Mills GB, Hennessy B: Proteomic classification of breast cancer. Curr Drug Targets. 2012, 13: 1495-1509.
- Heiser LM, Sadanandam A, Kuo WL, Benz SC, Goldstein TC, Ng S, Gibb WJ, Wang NJ, Ziyad S, Tong F, Bayani N, Hu Z, Billig JI, Dueregger A, Lewis S, Jakkula L, Korkola JE, Durinck S, Pepin F, Guan Y, Purdom E, Neuvial P, Bengtsson H, Wood KW, Smith PG, Vassilev LT, Hennessy BT, Greshock J, Bachman KE, Hardwicke MA, et al: Subtype and pathway specific responses to anticancer compounds in breast cancer. Proc Natl Acad Sci U S A. 2012, 109: 2724-2729.
- Neve RM, Chin K, Fridlyand J, Yeh J, Baehner FL, Fevr T, Clark L, Bayani N, Coppe JP, Tong F, Speed T, Spellman PT, DeVries S, Lapuk A, Wang NJ, Kuo WL, Stilwell JL, Pinkel D, Albertson DG, Waldman FM, McCormick F, Dickson RB, Johnson MD, Lippman M, Ethier S, Gazdar A, Gray JW: A collection of breast cancer cell lines for the study of functionally distinct cancer subtypes. Cancer Cell. 2006, 10: 515-527.
- Barretina J, Caponigro G, Stransky N, Venkatesan K, Margolin AA, Kim S, Wilson CJ, Lehar J, Kryukov GV, Sonkin D, Reddy A, Liu M, Murray L, Berger MF, Monahan JE, Morais P, Meltzer J, Korejwa A, Jane-Valbuena J, Mapa FA, Thibault J, Bric-Furlong E, Raman P, Shipway A, Engels IH, Cheng J, Yu GK, Yu J, Aspesi P, de Silva M, et al: The Cancer Cell Line Encyclopedia enables predictive modelling of anticancer drug sensitivity. Nature. 2012, 483: 603-607.
- Garnett MJ, Edelman EJ, Heidorn SJ, Greenman CD, Dastur A, Lau KW, Greninger P, Thompson IR, Luo X, Soares J, Liu Q, Iorio F, Surdez D, Chen L, Milano RJ, Bignell GR, Tam AT, Davies H, Stevenson JA, Barthorpe S, Lutz SR, Kogera F, Lawrence K, McLaren-Douglas A, Mitropoulos X, Mironenko T, Thi H, Richardson L, Zhou W, Jewitt F, et al: Systematic identification of genomic markers of drug sensitivity in cancer cells. Nature. 2012, 483: 570-575.
- Staunton JE, Slonim DK, Coller HA, Tamayo P, Angelo MJ, Park J, Scherf U, Lee JK, Reinhold WO, Weinstein JN, Mesirov JP, Lander ES, Golub TR: Chemosensitivity prediction by transcriptional profiling. Proc Natl Acad Sci U S A. 2001, 98: 10787-10792.
- Liedtke C, Wang J, Tordai A, Symmans WF, Hortobagyi GN, Kiesel L, Hess K, Baggerly KA, Coombes KR, Pusztai L: Clinical evaluation of chemotherapy response predictors developed from breast cancer cell lines. Breast Cancer Res Treat. 2010, 121: 301-309.
- Lee JK, Coutant C, Kim YC, Qi Y, Theodorescu D, Symmans WF, Baggerly K, Rouzier R, Pusztai L: Prospective comparison of clinical and genomic multivariate predictors of response to neoadjuvant chemotherapy in breast cancer. Clin Cancer Res. 2010, 16: 711-718.
- Suykens JAK, De Brabanter J, Lukas L, Vandewalle J: Weighted least squares support vector machines: robustness and sparse approximation. Neurocomputing. 2002, 48: 85-105.
- Breiman L: Random forests. Mach Learn. 2001, 45: 5-32.
- Loi S, Haibe-Kains B, Desmedt C, Lallemand F, Tutt AM, Gillet C, Ellis P, Harris A, Bergh J, Foekens JA, Klijn JG, Larsimont D, Buyse M, Bontempi G, Delorenzi M, Piccart MJ, Sotiriou C: Definition of clinically distinct molecular subtypes in estrogen receptor-positive breast carcinomas through genomic grade. J Clin Oncol. 2007, 25: 1239-1246.
- Zhang Y, Sieuwerts AM, McGreevy M, Casey G, Cufer T, Paradiso A, Harbeck N, Span PN, Hicks DG, Crowe J, Tubbs RR, Budd GT, Lyons J, Sweep FC, Schmitt M, Schittulli F, Golouh R, Talantov D, Wang Y, Foekens JA: The 76-gene signature defines high-risk patients that benefit from adjuvant tamoxifen therapy. Breast Cancer Res Treat. 2009, 116: 303-309.
- Symmans WF, Hatzis C, Sotiriou C, Andre F, Peintinger F, Regitnig P, Daxenbichler G, Desmedt C, Domont J, Marth C, Delaloge S, Bauernhofer T, Valero V, Booser DJ, Hortobagyi GN, Pusztai L: Genomic index of sensitivity to endocrine therapy for breast cancer. J Clin Oncol. 2010, 28: 4111-4119.
- Sotiriou C, Wirapati P, Loi S, Harris A, Fox S, Smeds J, Nordgren H, Farmer P, Praz V, Haibe-Kains B, Desmedt C, Larsimont D, Cardoso F, Peterse H, Nuyten D, Buyse M, Van de Vijver MJ, Bergh J, Piccart M, Delorenzi M: Gene expression profiling in breast cancer: understanding the molecular basis of histologic grade to improve prognosis. J Natl Cancer Inst. 2006, 98: 262-272.
- Cohen AL, Soldi R, Zhang H, Gustafson AM, Wilcox R, Welm BE, Chang JT, Johnson E, Spira A, Jeffrey SS, Bild AH: A pharmacogenomic method for individualized prediction of drug sensitivity. Mol Syst Biol. 2011, 7: 513.
- Bindea G, Mlecnik B, Hackl H, Charoentong P, Tosolini M, Kirilovsky A, Fridman WH, Pages F, Trajanoski Z, Galon J: ClueGO: a Cytoscape plug-in to decipher functionally grouped gene ontology and pathway annotation networks. Bioinformatics. 2009, 25: 1091-1093.
- Ferby I, Reschke M, Kudlacek O, Knyazev P, Pante G, Amann K, Sommergruber W, Kraut N, Ullrich A, Fassler R, Klein R: Mig6 is a negative regulator of EGF receptor-mediated skin morphogenesis and tumor formation. Nat Med. 2006, 12: 568-573.
- Jeffrey KL, Camps M, Rommel C, Mackay CR: Targeting dual-specificity phosphatases: manipulating MAP kinase signalling and immune responses. Nat Rev Drug Discov. 2007, 6: 391-403.
- Kim HJ, Bar-Sagi D: Modulation of signalling by Sprouty: a developing story. Nat Rev Mol Cell Biol. 2004, 5: 441-450.
- Zhu QS, Ren W, Korchin B, Lahat G, Dicker A, Lu Y, Mills G, Pollock RE, Lev D: Soft tissue sarcoma cells are highly sensitive to AKT blockade: a role for p53-independent up-regulation of GADD45 alpha. Cancer Res. 2008, 68: 2895-2903.
- Greger JG, Eastman SD, Zhang V, Bleam MR, Hughes AM, Smitheman KN, Dickerson SH, Laquerre SG, Liu L, Gilmer TM: Combinations of BRAF, MEK, and PI3K/mTOR Inhibitors Overcome Acquired Resistance to the BRAF Inhibitor GSK2118436 Dabrafenib, Mediated by NRAS or MEK Mutations. Mol Cancer Ther. 2012, 11: 909-920.
- Toledo LI, Murga M, Zur R, Soria R, Rodriguez A, Martinez S, Oyarzabal J, Pastor J, Bischoff JR, Fernandez-Capetillo O: A cell-based screen identifies ATR inhibitors with synthetic lethal properties for cancer-associated mutations. Nat Struct Mol Biol. 2011, 18: 721-727.
- Kim N, He N, Kim C, Zhang F, Lu Y, Yu Q, Stemke-Hale K, Greshock J, Wooster R, Yoon S, Mills GB: Systematic analysis of genotype-specific drug responses in cancer. Int J Cancer. 2012, 131: 2456-2464.
- Janku F, Wheler JJ, Westin SN, Moulder SL, Naing A, Tsimberidou AM, Fu S, Falchook GS, Hong DS, Garrido-Laguna I, Luthra R, Lee JJ, Lu KH, Kurzrock R: PI3K/AKT/mTOR inhibitors in patients with breast and gynecologic malignancies harboring PIK3CA mutations. J Clin Oncol. 2012, 30: 777-782.
- Blum JL, Jones SE, Buzdar AU, LoRusso PM, Kuter I, Vogel C, Osterwalder B, Burger HU, Brown CS, Griffin T: Multicenter phase II study of capecitabine in paclitaxel-refractory metastatic breast cancer. J Clin Oncol. 1999, 17: 485-493.
- Blum JL, Dieras V, Lo Russo PM, Horton J, Rutman O, Buzdar A, Osterwalder B: Multicenter, Phase II study of capecitabine in taxane-pretreated metastatic breast carcinoma patients. Cancer. 2001, 92: 1759-1768.
- Cobleigh MA, Vogel CL, Tripathy D, Robert NJ, Scholl S, Fehrenbacher L, Wolter JM, Paton V, Shak S, Lieberman G, Slamon DJ: Multinational study of the efficacy and safety of humanized anti-HER2 monoclonal antibody in women who have HER2-overexpressing metastatic breast cancer that has progressed after chemotherapy for metastatic disease. J Clin Oncol. 1999, 17: 2639-2648.
- Buchanan RB, Blamey RW, Durrant KR, Howell A, Paterson AG, Preece PE, Smith DC, Williams CJ, Wilson RG: A randomized comparison of tamoxifen with surgical oophorectomy in premenopausal patients with advanced breast cancer. J Clin Oncol. 1986, 4: 1326-1330.
- Hanahan D, Coussens LM: Accessories to the crime: functions of cells recruited to the tumor microenvironment. Cancer Cell. 2012, 21: 309-322.
- Strese S, Fryknas M, Larsson R, Gullbo J: Effects of hypoxia on human cancer cell line chemosensitivity. BMC Cancer. 2013, 13: 331.
- Barker AD, Sigman CC, Kelloff GJ, Hylton NM, Berry DA, Esserman LJ: I-SPY 2: an adaptive breast cancer trial design in the setting of neoadjuvant chemotherapy. Clin Pharmacol Ther. 2009, 86: 97-100.
- European Genome-phenome Archive. [http://www.ebi.ac.uk/ega/]
- ArrayExpress. [http://www.ebi.ac.uk/arrayexpress/]
- Gene Expression Omnibus. [http://www.ncbi.nlm.nih.gov/geo/]
- Modeling precision treatment of breast cancer. [https://www.synapse.org/#!Synapse:syn2179898]
- Rtoolbox. [https://github.com/obigriffith/Rtoolbox]
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.