Cell line-based response prediction strategy. (A) We assembled a collection of 84 breast cancer cell lines composed of 35 luminal, 27 basal, 10 claudin-low, 7 normal-like, 2 matched normal and 3 of unknown subtype. Fourteen luminal and 7 basal cell lines were also ERBB2-amplified. (B) Seventy lines were tested for response to 138 compounds by growth inhibition assays. Compounds with low variation in response in the cell line panel were eliminated, leaving a response data set of 90 compounds. Cell lines were divided into a sensitive and resistant group for each compound using the mean GI50 value for that compound. (C) Seven pretreatment molecular profiling data sets were analyzed to identify molecular features associated with response. Exome-seq data were available for 75 cell lines, followed by SNP6 data for 74 cell lines, RNAseq for 56, exon array for 56, RPPA for 49, methylation for 47, and U133A expression array data for 46 cell lines. All 70 lines were used in development of at least some predictors depending on data type availability. (D) Classification signatures were developed using the molecular feature data (after filtering) and with response status as the target. Two methods, weighted least squares support vector machine (LS-SVM) and random forests (RF), were utilized. The best performing signature was chosen for each drug and data type combination. This allows prediction of response for additional cell lines or tumors with any given combination of input data types. (E) Cell line-based response predictors were applied to 306 TCGA breast tumors for which expression (Exp), copy number (CNV) and methylation (Meth) measurements were all available. (F) This identified 22 compounds with a model AUC >0.7 for which at least some patients were predicted to be responsive with a probability >0.65. Thresholds for considering a tumor responsive were objectively chosen for each compound from the distribution of predicted probabilities and each patient was assigned to a status of resistant, intermediate or sensitive. WPMV, weighted percent of model variables.