A consensus prognostic gene expression classifier for ER positive breast cancer
- Andrew E Teschendorff†^{1}Email author,
- Ali Naderi†^{1},
- Nuno L Barbosa-Morais^{1, 2},
- Sarah E Pinder^{3},
- Ian O Ellis^{4},
- Sam Aparicio^{1, 5},
- James D Brenton^{1} and
- Carlos Caldas^{1}Email author
DOI: 10.1186/gb-2006-7-10-r101
© Teschendorff et al.; licensee BioMed Central Ltd. 2006
Received: 7 June 2006
Accepted: 31 October 2006
Published: 31 October 2006
Abstract
Background
A consensus prognostic gene expression classifier is still elusive in heterogeneous diseases such as breast cancer.
Results
Here we perform a combined analysis of three major breast cancer microarray data sets to hone in on a universally valid prognostic molecular classifier in estrogen receptor (ER) positive tumors. Using a recently developed robust measure of prognostic separation, we further validate the prognostic classifier in three external independent cohorts, confirming the validity of our molecular classifier in a total of 877 ER positive samples. Furthermore, we find that molecular classifiers may not outperform classical prognostic indices but that they can be used in hybrid molecular-pathological classification schemes to improve prognostic separation.
Conclusion
The prognostic molecular classifier presented here is the first to be valid in over 877 ER positive breast cancer samples and across three different microarray platforms. Larger multi-institutional studies will be needed to fully determine the added prognostic value of molecular classifiers when combined with standard prognostic factors.
Background
The identification of a prognostic gene expression signature in breast cancer that is valid across multiple independent data sets and different microarray platforms is a challenging problem [1]. Recently, there have been reports of molecular prognostic and predictive signatures that were also valid in external independent cohorts [2–7]. One of these studies derived the prognostic signature from genes correlating with histological grade [4], while in [5] it was derived directly from correlations with clinical outcome data and was validated in estrogen receptor positive lymph node negative (ER+LN-) breast cancer. Another study validated a predictive score, based on 21 genes, for ER+LN-tamoxifen treated breast cancer [2]. These results are encouraging, yet, as explained recently in [8, 9], much larger cohort sizes may be needed before a consensus prognostic signature emerges. While the intrinsic subtype classification does appear to constitute a set of consensus signatures [7], it is also clear that these classifiers are not optimized for prognosis. Moreover, although different prognostic signatures have recently been shown to give similar classifications in one breast cancer cohort [6], this result was not shown to hold in other cohorts. In fact, a problem remains in that the two main prognostic gene signatures derived so far [10, 11] do not validate in the other's data set, even when cohort differences are taken into account [9, 12]. Furthermore, the 21 genes that make up the predictive score [2] were derived from a relatively small number of genes (approximately 250) using criteria such as assay-probe performance. Hence, it is likely that other gene combinations could result in improved classifiers. These problems have raised questions about the clinical utility of molecular signatures as currently developed [13].
There are many factors that may contribute to the observed lack of consistency between derived signatures. In addition to cohort size, another factor is the use of dichotomized outcome variables, a procedure that is justified clinically but which may introduce significant bias [14]. A related problem concerns the way molecular prognostic classifiers have been evaluated, which is often done by dichotomizing the associated molecular prognostic index (MPI). Such dichotomizations are often not justified since they implicitly assume a bi-modal distribution for the MPI, while the evidence points at prognostic indices that are often best described in terms of uni-modal distributions [4, 10, 11]. Another difficulty concerns the evaluation of a prognostic index in external independent studies, which requires a careful recalibration procedure, but which is often either ignored or not addressed rigorously [15]. A strategy that may allow for uni-modal prognostic index distributions and that allows a more objective and reliable evaluation of a prognostic classifier across independent cohorts is, therefore, desirable [16].
Another matter of recent controversy is whether a molecular prognostic signature can outperform classical prognostic factors, such as lymph node status, tumor size, grade or combinations thereof such as the Nottingham Prognostic Index (NPI) [17]. It was shown that molecular prognostic signatures are the strongest predictors in multivariate Cox-regression models that include standard prognostic factors [4, 5, 18, 19]. On the other hand, more objective tests that compare a molecular prognostic signature with classical prognostic factors in completely independent cohorts profiled on different platforms is still lacking. Furthermore, it appears that prognostic models that combine classical prognostic factors in multivariate models may perform as well, or even better than, molecular prognostic signatures [20].
One way to effectively increase the cohort size is to use a combined ('meta-analysis') approach. Meta-analyses of microarray data sets have already enabled identification of robust metagene signatures associated with neoplastic transformation and progression and particular gene functions across a wide range of different tumor types [21, 22]. A meta-analysis of breast cancer was also recently attempted [23], where four independent breast cancer cohorts were fused together using an ingenious Bayesian method [24], and from which a metasignature was derived that correlated with relapse in each of the four studies. This study was exploratory in nature, however, and did not evaluate the metasignature in independent data sets. Furthermore, the metasignature was derived from a mix of ER+ and ER-tumors and was, therefore, confounded by ER status. In fact, this signature does not validate in the more recent breast cancer cohorts (Teschendorff AE, unpublished).
In this work we present a combined analysis of ER+ breast cancer that uses a recently proposed framework [16] for objectively evaluating prognostic separation of a molecular classifier across independent data sets and platforms. Importantly, this evaluation method does not dichotomize the prognostic index, allowing for prognostic index distributions that may be uni-modal. Using this novel approach, the purpose of our work is two-fold. First, to hone in on a consensus set of prognostic genes by using a meta-analysis to derive a prognostic molecular classifier in ER+ breast cancer and show that it validates in completely independent external cohorts and different platforms. Second, to evaluate its prognostic separation relative to histopathological prognostic factors and to explore the prognostic added value of molecular classifiers when combined with classical prognostic factors. We use six of the largest breast cancer cohorts available (described in [4, 11, 12, 18, 25, 26]; in [4] we used the independent cohort of 101 samples from the John Radcliffe Hospital, Oxford, UK), representing a total of 877 ER+ patients profiled across three different microarray platforms.
Results
Breast cancer data sets used
Study | Cohort name | Platform | ER+ samples | Events (RIP/DM) |
---|---|---|---|---|
van de Vijver [18] | NKI2 | oligos Agilent | 226 | 45 |
Wang [11] | EMC | oligos Affymetrix | 208 | 80 |
Naderi [12] | NCH | oligos Agilent | 93 | 21 |
Sotiriou [25] | JRH-1 | spotted cDNA | 65 | 20 |
Miller [26] | UPP | oligos Affymetrix | 213 | 49 |
Sotiriou [4] | JRH-2 | oligos Affymetrix | 72 | 17 |
A meta-analysis derived molecular prognostic index (MPI)
The derivation of the molecular classifier is described in detail in Materials and methods (see also Figure 1). Briefly, each of the three training cohorts was divided into 10 different training-test set partitions [27], ensuring the same number of training samples for each training cohort. Because of the small cohort size of NCH (n = 93), all samples from this cohort were used; thus, 93 training samples were also used from the NKI2 and EMC cohorts. We found that, by choosing a smaller training set for NCH, the performance of the classifier in the NCH test set would be too variable and would unduly influence the derived prognostic classifier. While using the whole NCH cohort as a training set introduces a slight bias towards selecting features that perform well in the NCH cohort, this is offset by optimizing the classifier to the test sets in NKI2 and EMC. The remaining samples in NKI2 (n = 133) and EMC (n = 115) were used as additional independent test sets. The common genes were z-score normalized and ranked, for each training-test set partition p = 1,...,10, according to their average univariate Cox-scores over the three training data sets. A continuous molecular prognostic index (MPI_{ p }) for each of the test samples (i) in the training cohorts (s) and for a given number of top-ranked genes in the classifier (n) was then computed by the dot product of the average Cox-regression coefficient vector ($\widehat{\beta}$_{ gp }, (g = 1,..., n)) (as estimated from the training-set samples) with the vector of normalized gene expression values (x_{ gis }, (g = 1,..., n)), that is:
$MP{I}_{isp}={\displaystyle \sum _{g=1}^{n}{\widehat{\beta}}_{gp}{x}_{gis}}$
The D-index of prognostic factors across cohorts
Training | Test | |||||
---|---|---|---|---|---|---|
Factor | NKI2 | ECM | NCH | JRH-1 | UPP | JRH-2* |
Grade | 3.80 (<10^{-5}) | NA | 3.57 (0.001) | 3.84 (0.003) | 2.55 (0.0003) | 2.15 (0.17) |
Node status | 1.01 (0.97) | all LN- | 2.23 (0.05) | 2.64 (0.04) | 4.03 (<10^{-6}) | 2.36 (0.25) |
Size | 1.59 (0.06) | NA | 3.36 (0.003) | 4.16 (<10^{-3}) | 3.18 (<10^{-5}) | 3.04 (0.008) |
NPI | 2.27 (<10^{-3}) | NA | 4.07 (<10^{-3}) | 5.16 (<10^{-4}) | 3.82 (<10^{-7}) | 3.78 (0.03) |
MPI^{†} | 3.32 (<10^{-3}) | 2.29 (0.002) | NA | 3.20 (0.002) | 2.71 (<10^{-4}) | 7.96 (<10^{-4}) |
MPI^{‡} | 3.64 (<10^{-7}) | 2.56 (<10^{-6}) | 6.45 (<10^{-5}) | 3.44 (<10^{-3}) | 2.80 (<10^{-4}) | 11.26 (<10^{-5}) |
MPI^{§} | 3.64 (<10^{-6}) | 2.51 (<10^{-5}) | 6.51 (<10^{-5}) | 3.10 (<10^{-5}) | 2.84 (0.001) | 10.10 (<10^{-4}) |
A consensus molecular prognostic classifier
To arrive at a final list of prognostic genes, independent of any choice of training-test set realization, we computed the global average Cox-scores over the ten training-test set realizations and three training cohorts. The resulting global averaged Cox-scores were then used to give a final ranking of the genes. A 'consensus' optimal classifier was then built by sequentially adding genes from the top of this list to a classifier set and computing the D-index of this classifier for each of the three training cohorts. An overall D-index score, D_{ O }, was then evaluated as the weighted average of the D-indices for each training cohort (D_{ S }), that is:
${D}_{o}={\displaystyle {\sum}_{s\in {S}_{train}}{w}_{s}{D}_{s}}$
Top prognostic genes in ER+ breast cancer
UniGene symbol | Coefficient sign | Cytoband | GO |
---|---|---|---|
RACGAP1 | + | 12q13.12 | GTPase activator activity, electron transporter activity |
STK6 | + | 20q13.2-q13.3 | ATP binding, mitosis, phosphorylation, kinase activity |
HUMMLC2B | - | 16p11.2 | calcium ion binding, muscle myosin |
MELK | + | 9p13.2 | ATP binding, phosphorylation, tyrosine kinase activity |
PPARA | - | 22q12-q13.1 | Transcription factor, steroid hormone activity/lipid metabolism |
DHCR7 | + | 11q13.2-q13.5 | cholesterol binding and biosynthesis, electron transporter activity |
MAD2L1 | + | 4q27 | Cell-cycle, mitotic checkpoint, spindle |
ZWINT | + | 10q21-q22 | Nucleus |
KIF20A | + | 5q31 | ATP binding, microtubule associated complex |
CDCA8 | + | 1p34.3 | Cytokinesis |
KIAA0101 | + | 15q22.31 | PCNA associated factor |
TIMELESS | + | 12q12-q13 | Development, negative regulation of transcription |
PTTG1 | + | 5q35.1 | DNA metabolism, repair, replication and chromosome cycle |
WSB2 | + | 12q24.23 | Intracellular signaling cascade |
ABCC5 | + | 3q27 | ATP binding, ATPase activity, transmembrane movement |
KIF23 | + | 15q23 | ATP binding, microtubule complex/motor activity, mitosis |
H2AFY | + | 5q31.3-q32 | DNA binding, chromosome organization, nucleosome assembly |
Validation in three external cohorts
We next validated the 17-gene and 52-gene classifiers in the three external independent cohorts JRH-1, UPP and JRH2. The MPI associated with these classifiers induced in each of these cohorts an ordering based on the relative risks of the samples. As before, the association of the predicted risk ordering with outcome was tested by computing the D-indices and the corresponding log-rank test p values yielded their levels of significance. Remarkably, both classifiers were valid in the three external independent cohorts JRH-1, UPP and JRH-2 and performed equally well (Table 2), with statistically significant D-index values (for the 52-gene classifier) of 3.44 (95%CI 1.67 to 7.00; p < 10^{-3}), 2.80 (95%CI 1.73 to 4.54; p < 10^{-4}) and 11.26 (95%CI 3.66 to 34.57; p < 10^{-5}), respectively. The distribution of MPI values in these cohorts as well as heatmaps of gene expression of our optimal classifier confirmed the robustness of the classifier across different cohorts and platforms (Figure 2d-f). To further test the robustness of this result, we also evaluated the 10 optimal classifiers (${C}_{p}^{\ast}$, p = 1,..., 10) in the three external cohorts JRH-1, UPP and JRH-2. The median D-index and the median p value over the 10 ${C}_{p}^{\ast}$ classifiers in each of these cohorts are shown in Table 2, which also provides a comparison with the D-indices for the standard prognostic factors in ER+ breast cancer. Over all 10 ${C}_{p}^{\ast}$ classifiers, the D-index ranged from 2.04 to 3.96 in JRH-1, from 2.39 to 3.04 in UPP, and from 5.08 to 12.61 in JRH-2, with p values in all cases statistically significant (p < 0.05). It is noteworthy that all 10 molecular classifiers ${C}_{p}^{\ast}$ predicted prognosis in the external sets as well as in the independent test sets of the training cohorts (Table 2), a strong indication that the molecular classifiers were not overfitted to the training data.
Molecular versus classical prognostic indices
Table 2 also shows that the molecular prognostic classification did not outperform standard histopathological prognostic factors. Notably, in two of the external studies it did not outperform a modified NPI [17] (see Materials and methods), which was overall the best prognostic indicator.
Multivariate D-index analysis
1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 | 10 | Opt. | |
---|---|---|---|---|---|---|---|---|---|---|---|
JRH-1 | |||||||||||
MPI (A) | 0.21 | 0.19 | 0.03 | 0.03 | 0.05 | 0.06 | 0.43 | 0.12 | 0.04 | 0.16 | 0.15 |
Grade | 0.06 | 0.07 | 0.11 | 0.05 | 0.10 | 0.11 | 0.05 | 0.08 | 0.10 | 0.10 | 0.05 |
Node status | 0.79 | 0.73 | 0.96 | 0.86 | 0.93 | 0.89 | 0.65 | 0.87 | 0.83 | 0.91 | 0.91 |
Size | 0.07 | 0.01 | 0.02 | 0.05 | 0.03 | 0.05 | 0.04 | 0.02 | 0.02 | 0.11 | 0.11 |
MPI (B) | 0.22 | 0.31 | 0.08 | 0.06 | 0.13 | 0.14 | 0.42 | 0.18 | 0.11 | 0.16 | 0.16 |
NPI | <0.005 | <0.005 | <0.005 | <0.005 | <0.005 | 0.01 | <0.005 | <0.005 | <0.005 | 0.01 | 0.01 |
UPP | |||||||||||
MPI (A) | 0.01 | 0.01 | 0.06 | 0.01 | 0.02 | 0.01 | <0.005 | 0.01 | 0.01 | 0.01 | 0.01 |
Grade | 0.72 | 0.84 | 0.85 | 0.90 | 0.98 | 0.99 | 0.73 | 0.83 | 0.85 | 0.88 | 0.83 |
Node status | <0.005 | <0.005 | <0.005 | <0.005 | <0.005 | <0.005 | <0.005 | <0.005 | <0.005 | <0.005 | <0.005 |
Size | 0.02 | 0.02 | 0.02 | 0.02 | 0.02 | 0.02 | 0.02 | 0.03 | 0.02 | 0.02 | 0.02 |
MPI (B) | 0.02 | 0.03 | 0.12 | 0.02 | 0.04 | 0.03 | 0.01 | 0.02 | 0.02 | 0.02 | 0.02 |
NPI | <0.005 | <0.005 | <0.005 | <0.005 | <0.005 | 0.01 | <0.005 | <0.005 | <0.005 | <0.005 | <0.005 |
JRH-2 | |||||||||||
MPI (A) | 0.01 | 0.01 | 0.01 | 0.01 | 0.02 | 0.01 | 0.01 | 0.04 | 0.01 | <0.005 | 0.01 |
Grade | 0.19 | 0.10 | 0.24 | 0.24 | 0.07 | 0.15 | 0.23 | 0.13 | 0.21 | 0.43 | 0.10 |
Node status | 0.44 | 0.16 | 0.76 | 0.45 | 0.24 | 0.35 | 0.67 | 0.50 | 0.25 | 0.37 | 0.24 |
Size | 0.12 | 0.73 | 0.28 | 0.28 | 0.93 | 0.82 | 0.27 | 0.47 | 0.35 | 0.36 | 0.97 |
MPI (B) | <0.005 | <0.005 | 0.01 | 0.01 | <0.005 | 0.01 | <0.005 | 0.01 | <0.005 | <0.005 | <0.005 |
NPI* | 0.51 | 0.53 | 0.75 | 0.58 | 0.97 | 0.78 | 0.99 | 0.63 | 0.50 | 0.47 | 0.96 |
Hybrid models to evaluate prognostic added value of MPI
The prognostic added value of the MPI
Model | JRH-1 | UPP | JRH-2^{†} |
---|---|---|---|
Grade | 3.85 | 2.55 | 2.15 |
Grade + MPI* | 5.85 (2.49-13.72) | 2.85 (1.79-4.51) | 10.60 (2.79-40.20) |
Grade + MPI** | 4.62 (2.15-9.91) | 2.90 (1.83-4.58) | 8.14 (2.11-31.31) |
Node Status | 2.64 | 4.03 | 2.36 |
Node Status + MPI* | 2.98 (1.48-6.01) | 4.71 (2.83-7.86) | 14.07 (2.08-94.84) |
Node Status + MPI** | 3.09 (1.53-6.23) | 4.40 (2.74-7.06) | 11.79 (1.86-74.44) |
Size | 4.16 | 3.18 | 3.04 |
Size + MPI* | 5.40 (2.51-11.62) | 3.41 (2.16-5.38) | 5.21 (2.29-11.84) |
Size + MPI** | 4.88 (2.35-10.13) | 3.65 (2.29-5.81) | 4.51 (2.08-9.77) |
NPI | 5.16 | 3.82 | 3.78 |
NPI + MPI* | 5.85 (2.54-13.47) | 4.02 (2.52-6.41) | 19.11 (2.62-139.3) |
NPI + MPI** | 4.93 (2.30-10.56) | 4.04 (2.53-6.44) | 25.23 (2.53-251.6) |
Gene Ontology
Enrichment of gene ontologies among the top 100 prognostic genes was studied using the Gene Ontology (GO) Tree Machine (GOTM) [29]. Not surprisingly, and in agreement with previous studies [11, 18], most of the genes (23/100, p < 10^{-9}) were associated with mitotic cell-cycle functions. In terms of molecular function, nucleid acid and ATP binding was also significantly overrepresented (26/100, p < 10^{-3}). Furthermore, most genes were associated with intracellular component (62/100, p < 10^{-4}). Interestingly, other significantly overrepresented biological processes included microtubule cytoskeleton organization and biogenesis and DNA metabolism. Similar results were obtained for the top 150 and 200 prognostic genes. Summary gene functions for the top 17 and 52 prognostic genes are shown in Table 3 and Additional data file 3, respectively, while the detailed summaries can be found in Additional data files 7, 8, 9.
Overlap with other prognostic gene lists
Finally, we considered the overlap of our 52 prognostic classifier with the four main molecular prognostic gene lists presented in [4, 10–12] (Additional data file 10). Interestingly, the strongest overlap was with the 97 gene list reported in [4], where we found 20 genes in common, and which may explain the better prognostic performance in this cohort, although a mere sample size effect cannot be excluded. Among these 20 genes are well-known prognostic genes in breast cancer (for example, BIRC5, BUB1B, CDC2, MAD2L1, MYBL2, STK6). The overlap with the other three prognostic signatures was weaker: a 2-gene overlap (ATAD2, CCNE2) with the 76-gene signature of [11], an 8-gene overlap (CCNE2, BIRC5, STK6, EZH2, BM039, PSMD7, PRAME, MAD2L1) with the 231 prognostic genes of [10], and a 12-gene overlap with the 70-gene signature of [12].
Discussion
The D-index [15, 16] has three key properties that make it particularly suited as a measure of prognostic separation. First, it does not require the MPI to be recalibrated since it is invariant under monotonic transformations that preserve the risk-ordering of samples. Second, because it does not require the MPI to be dichotomized, it allows for uni-modal MPI distributions. Indeed, using various pattern recognition algorithms [30, 31], we verified that bi-modality is very often absent from the MPI profiles. Third, because it doesn't use a prospectively defined cut-off it avoids the pitfalls associated with using such a cut-off when evaluating the prognostic performance of a classifier in external cohorts of widely different characteristics. Thus, the D-index provides a more reliable and objective measure of prognostic separation for evaluating classifiers across multiple independent data sets and platforms than, for example, the hazard ratio or the area under the curve. While dichotomization of a prognostic index into good and poor prognostic classes is necessary for clinical decision making, for the purposes of our work dichotomization of the MPI was not necessary.
Using the D-index in a meta-analysis of three ER+ breast cancer microarray data sets, we derived an optimal molecular classifier of 52 genes with an associated rule for computing a MPI and successfully validated it in three completely independent external cohorts. Moreover, we showed that a slightly less optimal but much simpler classifier made up of only 17 genes performed comparably to the 52-gene classifier across all six studies.
The optimal 52-gene classifier showed a notable overlap of 20 genes with the grade-derived prognostic signature reported in [4], which is perhaps not surprising given that the latter signature was prognostic in up to 5 breast cancer cohorts. Intriguingly though, the grade-derived signature was not validated in a large available cohort [11], raising doubts as to its wider applicability. Importantly, and in spite of the significant overlap between our optimal classifier and the grade-derived signature reported in [4], we found that our optimal classifier performed independently of grade. In addition, we verified that our optimal classifier performed independently of the ER gene expression level (data not shown) in ER+ tumors. The overlap of the 52-gene classifier with either van't Veer's or Wang's prognostic signature was smaller, yet these two signatures also fail to validate in each other's data set. We believe that all these results strongly support the validity of the 52-gene and 17-gene prognostic signatures and that we have successfully honed in on a core set of prognostic genes for ER+ breast cancer, to be tested further in prospective clinical studies.
The D-index also provided us with a framework in which to objectively evaluate the molecular prognostic index against classical prognostic factors in external cohorts. We found that molecular classifiers may increase prognostic separability when added to single prognostic factors, such as grade or node status. However, in agreement with [20], we didn't find the molecular prognostic index to either outperform or add prognostic value to the NPI. In fact, our analyses showed that the degree of improvement in prognostic separation over the NPI was strongly dependent on the cohort considered, indicating that larger cohorts of more uniform characteristics will be needed to rigorously elucidate the future clinical role of molecular prognostic classifiers in breast cancer.
Conclusion
The molecular classifier derived here is the first molecular prognostic classification scheme that is valid across six major breast cancer studies representing a total of 877 ER+ patients profiled over three different platforms. In order to further test this prognostic classifier and to fully evaluate the prognostic value it adds over standard prognostic factors such as the NPI, we propose a multi-institutional study that profiles the consensus set of genes identified here over larger and more homogeneous cohorts using either quantitative RT-PCR or custom-made arrays.
Materials and methods
Internal data set
The cohort of 135 primary breast tumors was profiled using Agilent Human 1A 60-mer (Agilent Technologies, Santa Clara, CA, USA) oligonucleotide microarrays containing 22,575 features (19,061 genes and 3,514 control spots) [12]. Details regarding RNA amplification, labeling, hybridization and scanning are as described previously [32, 33]. Feature extraction, normalization of the raw data and data filtering were performed using the Agilent G2567AA Feature Extraction software (Agilent Technologies) and Spotfire DecisionSite 8.0 (Somerville, MA, USA). This resulted in a normalized matrix of 8,278 genes (Additional data file 11). The clinical data are also summarized in Additional data file 11.
External data sets and gene annotation
The external microarray breast cancer data sets considered in this work are described in [4, 11, 18, 25, 26]. For these cohorts we used the normalized data, which are available in the public domain (see references). The retrieved data sets were further normalized, if necessary, by transforming them onto a common log2-scale and shifting the median of each array to zero. We also created an automated computational pipeline (Perl scripts on a Linux platform) to cross-link the annotation provided for each dataset with UniGene. For some datasets, the linkage relied on Ensembl [34] external database identifiers. Thus, each probe was associated with a universal gene name. This procedure generated a non-redundant set of gene identifiers for the subsequent meta-analysis.
The D-index measure for prognostic separation
Here, we briefly review the D-index measure for prognostic separation as proposed in [16]. A classifier C induces on a set of n samples with gene expression vectors (${\overrightarrow{x}}_{1},{\overrightarrow{x}}_{2},\mathrm{...},{\overrightarrow{x}}_{n}$) a 'risk ordering' based on the relative magnitude of the continuous prognostic indices PI_{ k }= PI(${\overrightarrow{x}}_{\kappa}$) (k = 1,..., n). Given outcome data O = (T × E)^{ n }, where T ∈ [0, t_{ max }] and E ∈ {0,1} represent the time to event and event type random variables, respectively, one may evaluate the prognostic separation predicted by C by a Cox-proportional hazards regression:
log(h_{ i }(t)) = log(h_{0}(t)) + PI_{ i }∀i = 1,..., n. (2)
and estimating the log-rank test p value. A difficulty with this approach is that, generally, the prognostic index needs recalibrating in the independent data sets where prognostic separation is to be evaluated. To overcome this difficulty a robust measure of prognostic separation that does not need model recalibration has been proposed. It is obtained by considering only the relative risk ordering of the samples and then evaluating this risk ordering against the actual outcome data. Specifically, let us assume that C induces the ordering (i_{1}, i_{2},..., i_{ n }), so that $P{I}_{{i}_{1}}\le P{I}_{{i}_{2}}\le \mathrm{...}\le P{I}_{{i}_{n}}$. Assume further that the PI_{ i }are normally distributed (this assumption is not crucial to the argument as similar results hold for PI that are not normally distributed [16]), so that they can be expressed in terms of the standard gaussian (ordered) rankits (u_{1},..., u_{ n }) as:
where ε_{ j }denote the error terms, μ is the mean of the PI distribution and σ denotes the standard deviation of the PI and is a direct measure of prognostic separation. A robust measure of prognostic separation can now be obtained by regressing the outcome data against the scaled rankits:
${z}_{j}=\sqrt{\frac{\pi}{8}}{u}_{j}$
that is,
log(${h}_{{i}_{j}}$(t)) = b(t) + σ*z_{ j }∀j = 1,..., n. (4)
and estimating the coefficient σ*. Note that the mean μ has been absorbed into the baseline hazard function. As explained in more detail in [16], the scaling of the rankits ensures the interpretability of σ* as a generalized log-hazard ratio. We adopt here a slightly different convention to [16] and define the D-index, D, as D ≡ ${e}^{{\sigma}^{*}}$.
The D-index, in contrast to the hazard ratio (HR) [35] and the Brier Score [36], combines interpretability, precision (confidence intervals can be readily computed) and robustness (because it only depends on the relative risk ordering of the samples, it is invariant under monotonic recalibration transformations). Ties in the PI are treated by averaging the corresponding rankits as explained in [16]. In the extreme case of a binary prognostic index PI ∈ {0,1}, it can be shown that D ≤ HR and D ≈ HR when the imbalance between 1s and 0s is small.
Derivation of the molecular prognostic index
We first decided which data sets to use for training and deriving an optimal molecular prognostic classifier and which to leave out for external independent validation tests. Denoting S_{ train }and S_{ test }as the set of training studies and test studies, respectively, we then divided each of the cohorts in S_{ train }randomly into 10 different training-test set partitions. The partitioning was performed ensuring equal proportions of events (death or distant metastasis) in training and test sets and to ensure approximately equal numbers of training samples in each training cohort. Next, genes were normalized to have mean zero and unit standard deviation across the training samples in each training cohort separately. For each training cohort and training set we then performed univariate Cox-regressions over the G genes common to all studies in S_{ train }. Let $\overrightarrow{\beta}$_{ sp }and $\overrightarrow{Q}$_{ sp }denote, for study s ∈ S_{ train }and partition p ∈ {1,...,10}, the vectors of G estimated regression coefficients and G Cox-scores, respectively. We then computed for each partition p the average coefficient vector $\overrightarrow{\widehat{\beta}}$_{ p }and average Cox-score vector $\overrightarrow{\widehat{Q}}$as:
${\overrightarrow{\widehat{\beta}}}_{p}\equiv \frac{1}{\left|{S}_{train}\right|}{\displaystyle \sum _{s\in {S}_{train}}{\overrightarrow{\beta}}_{sp}}\left(5\right)$
and similarly for $\overrightarrow{\widehat{Q}}$_{ p }. We next ranked for each partition p the G genes according to their average Cox-scores across the training studies. Let $\overrightarrow{r}$_{ p }= (r_{1p},..., r_{ Gp }) where r_{ gp }specifies, for partition p, the position in {1,..., G} of the gth ranked gene. The following procedure was then carried out for each partition p to obtain an optimal molecular classifier ${C}_{p}^{\ast}$:
First, let n denote the number of top ranked genes in the classifier set. Set n = 1.
Second, for every test sample i in each of the training cohorts s we compute a molecular prognostic index:
$MP{I}_{isp}^{(n)}={\displaystyle {\sum}_{g=1}^{n}{\widehat{\beta}}_{{r}_{gp}p}{x}_{{r}_{gp}is}}$
where ${x}_{{r}_{gp}is}$ denotes the normalized expression of gene r_{ gp }in sample i of training study s ∈ S_{ train }. The expression normalization is done using the mean and standard deviation from the training set.
Third, for each partition p and training study s we compute the D-index on the test samples as explained in the previous subsection. This yields a value ${D}_{sp}^{(n)}$.
Fourth, compute the average D-index over the training studies:
${\widehat{D}}_{p}^{(n)}={\displaystyle {\sum}_{s\in {S}_{train}}{w}_{s}{D}_{sp}^{(n)}}$
where w_{ s }denotes the weights for each training study to take account of the varying numbers of samples in the test sets of the training studies.
Fifth, increment n by 1 and repeat steps two to five until n ≤ n_{ max }.
Sixth, let ${n}_{p}^{\ast}$ = {n : max(${\widehat{D}}_{p}^{(n)}$) and n ≥ 10} denote the optimal number of top-ranked genes. Thus, for each partition p we have an optimal classifier ${C}_{p}^{\ast}$ defined by a set of ${n}_{p}^{\ast}$ genes and associated average regression coefficients {${\widehat{\beta}}_{{r}_{1p}p},\mathrm{...},{\widehat{\beta}}_{{r}_{{n}_{p}^{*}p}p}$} and the rule in step-2 for computing a continuous MPI for independent samples.
Two notes with this procedure are in order: n_{ max }can be estimated by evaluating the statistical significance of the average ranking position of the common genes across the training studies - for our purposes, setting n_{ max }= 100 led to suitable optimal classifiers; and we constrained n* to be larger than 10 in order to ensure a significant number of overlapping genes for the independent validation tests.
Validation in independent data sets
The above procedure yields for each choice of training-test set partition in the training studies a molecular classifier and an associated rule for computing a continuous molecular prognostic index for independent test samples. For an independent test sample i in test study s ∈ S_{ test }we compute its molecular prognostic index as:
$MP{I}_{isp}^{*}={\displaystyle \sum _{g=1}^{{n}_{p}^{*}}{\widehat{\beta}}_{{r}_{gp}p}{x}_{{r}_{gp}is}}\left(6\right)$
where ${x}_{{r}_{gp}is}$ has been normalized as before. Recalibration of the MPI values is necessary if a prognostic meaning is to be attached to these values. This is difficult because of inter-cohort variability. The D-index measure of prognostic separation, however, overcomes this difficulty since it only depends on the relative risk ordering of the external samples. Thus, for each test study s and partition p we can evaluate the prognostic separation of the molecular classifier ${C}_{p}^{\ast}$ by computing the D-index D_{ sp }and the associated log-rank test p values.
Hybrid or augmented prognostic models
To evaluate whether the molecular prognostic index improved prognostic separation over histopathological classifiers we considered hybrid augmented models within the D-index framework. Specifically, we assume that we have a rule to compute a MPI (as given in the last section) for a sample i in a new cohort, which we denote by MPI_{ i }. Suppose further we have a histopathological prognostic index value for each sample i in this new cohort, which we denote by SPI_{ i }. Both indices induce a relative risk ordering of the samples. Let ${r}_{i}^{M}$ and ${r}_{i}^{P}$ denote the rank position of sample i as predicted by the prognostic indices MPI and SPI, respectively. It is then clear that the weighted average rank position of sample i over the two indices, that is:
${r}_{i}^{H}\equiv {w}_{M}{r}_{i}^{M}+{w}_{P}{r}_{i}^{P}$
with w_{ M }+ w_{ P }= 1 represents on overall relative risk of sample i as predicted by both indices. Finally, the D-index of this hybrid prognostic index, HPI, can be evaluated using the same procedure as before.
The Nottingham Prognostic Index
Because the precise number of positive axillary nodes was not known in two of the external studies (UPP and JRH-2), we used here a slightly different definition for the NPI:
NPI = 0.2 × (Tumor_Size [cm]) + Grade + 1.5 × (Node_Status) + 1
where Node Status can be positive (1) or negative (0) and Grade can be 1, 2 or 3. While our modified NPI gave slightly smaller D-index values than the NPI in those cohorts where axillary node information was available, the difference between the two values was minimal, thus validating our definition.
Software used
All computations were carried out using the R-software environment version 2.2.1 [37]. We made use of the R-packages survival, mclust and vabayelMix. R-scripts that apply the algorithms as implemented here are available on request.
Additional data files
The following additional data are available with the online version of this paper. Additional data file 1 is a figure showing the weighted average D-index over test sets in the training cohorts. Additional data file 2 is a figure showing the D-index variation as a function of top-ranked genes in the overall molecular classifier. Additional data file 3 lists the 52-gene optimal classifier. Additional data files 4, 5 and 6 are figures showing the D-index of hybrid equal-and-variable weight prognostic models in independent cohorts. Additional data files 7, 8 and 9 contain the GO analysis results for the top 200 prognostic genes, as produced by the GOTM [29]. Additional data file 10 shows the overlap of our 52-gene classifier with the prognostic signatures in [4, 10–12]. Additional data file 11 contains the clinical and gene expression data of the NCH cohort.
Notes
Declarations
Acknowledgements
This research was supported by grants from Cancer Research UK, the Cambridge-MIT Institute and a grant from the Isaac Newton Trust to Simon Tavaré. NLBM is supported by Fellowship PRAXIS XXI SFRH/BD/2914/2000 from Fundação para a Ciência e a Tecnologia (Portugal)/European Social Fund (3rd Framework Programme). JDB is a CR-UK Senior Clinical Research Fellow. We wish to thank Max Parmar and Patrick Royston for their helpful advice on survival analysis.
Authors’ Affiliations
References
- Simon R: Development and validation of therapeutically relevant multi-gene biomarker classifiers. J Natl Cancer Inst. 2005, 97: 866-867.PubMedView ArticleGoogle Scholar
- Paik S, Shak S, Tang G, Kim C, Baker J, Cronin M, Baehner FL, Walker MG, Watson D, Park T, et al: A multigene assay to predict recurrence of tamoxifen-treated, node-negative breast cancer. N Engl J Med. 2004, 351: 2817-2826. 10.1056/NEJMoa041588.PubMedView ArticleGoogle Scholar
- Pawitan Y, Bjohle J, Amler L, Borg AL, Egyhazi S, Hall P, Han X, Holmberg L, Huang F, Klaar S, et al: Gene expression profiling spares early breast cancer patients from adjuvant therapy: derived and validated in two population-based cohorts. Breast Cancer Res. 2005, 7: R953-R964. 10.1186/bcr1325.PubMedPubMed CentralView ArticleGoogle Scholar
- Sotiriou C, Wirapati P, Loi S, Harris A, Fox S, Smeds J, Nordgren H, Farmer P, Praz V, Haibe-Kains B, et al: Gene expression profiling in breast cancer: understanding the molecular basis of histologic grade to improve prognosis. J Natl Cancer Inst. 2006, 98: 262-272.PubMedView ArticleGoogle Scholar
- Foekens JA, Atkins D, Zhang Y, Sweep FC, Harbeck N, Paradiso A, Cufer T, Sieuwerts AM, Talantov D, Span PN, et al: Multicenter validation of a gene expression-based prognostic signature in lymph node-negative primary breast cancer. J Clin Oncol. 2006, 24: 1665-1671. 10.1200/JCO.2005.03.9115.PubMedView ArticleGoogle Scholar
- Fan C, Oh DS, Wessels L, Weigelt B, Nuyten DS, Nobel AB, van't Veer LJ, Perou CM: Concordance among gene-expression-based predictors for breast cancer. N Engl J Med. 2006, 355: 560-569. 10.1056/NEJMoa052933.PubMedView ArticleGoogle Scholar
- Hu Z, Fan C, Oh DS, Marron JS, He X, Qaqish BF, Livasy C, Carey LA, Reynolds E, Dressler L, et al: The molecular portraits of breast tumors are conserved across microarray platforms. BMC Genomics. 2006, 7: 96-10.1186/1471-2164-7-96.PubMedPubMed CentralView ArticleGoogle Scholar
- Ein-Dor L, Kela I, Getz G, Givol D, Domany E: Outcome signature genes in breast cancer: is there a unique set?. Bioinformatics. 2005, 21: 171-178. 10.1093/bioinformatics/bth469.PubMedView ArticleGoogle Scholar
- Ein-Dor L, Zuk O, Domany E: Thousands of samples are needed to generate a robust gene list for predicting outcome in cancer. Proc Natl Acad Sci USA. 2006, 103: 5923-5928. 10.1073/pnas.0601231103.PubMedPubMed CentralView ArticleGoogle Scholar
- van't Veer LJ, Dai H, van de Vijver MJ, He YD, Hart AA, Mao M, Peterse HL, van der Kooy K, Marton MJ, Witteveen AT, et al: Gene expression profiling predicts clinical outcome of breast cancer. Nature. 2002, 415: 530-536. 10.1038/415530a.View ArticleGoogle Scholar
- Wang Y, Klijn JG, Zhang Y, Sieuwerts AM, Look MP, Yang F, Talantov D, Timmermans M, Meijer-van Gelder ME, Yu J, et al: Gene-expression profiles to predict distant metastasis of lymph-node-negative primary breast cancer. Lancet. 2005, 365: 671-679.PubMedView ArticleGoogle Scholar
- Naderi A, Teschendorff AE, Barbosa-Morais NL, Pinder SE, Green AR, Powe DG, Robertson JF, Aparicio S, Ellis IO, Brenton JD, Caldas C: A gene-expression signature to predict survival in breast cancer across independent data sets. Oncogene. 2006, Aug 28; [Epub ahead of print] doi: 10.1038/sj.onc.1209920Google Scholar
- Brenton JD, Carey LA, Ahmed AA, Caldas C: Molecular classification and molecular forecasting of breast cancer: ready for clinical application?. J Clin Oncol. 2005, 23: 7350-7360. 10.1200/JCO.2005.03.3845.PubMedView ArticleGoogle Scholar
- Royston P, Altman DG, Sauerbrei W: Dichotomizing continuous predictors in multiple regression: a bad idea. Stat Med. 2006, 25: 127-141. 10.1002/sim.2331.PubMedView ArticleGoogle Scholar
- Royston P, Parmar MK, Sylvester R: Construction and validation of a prognostic model across several studies, with an application in superficial bladder cancer. Stat Med. 2004, 23: 907-926. 10.1002/sim.1691.PubMedView ArticleGoogle Scholar
- Royston P, Sauerbrei W: A new measure of prognostic separation in survival data. Stat Med. 2004, 23: 723-748. 10.1002/sim.1621.PubMedView ArticleGoogle Scholar
- Galea MH, Blamey RW, Elston CE, Ellis IO: The Nottingham Prognostic Index in primary breast cancer. Breast Cancer Res Treat. 1992, 22: 207-219. 10.1007/BF01840834.PubMedView ArticleGoogle Scholar
- van de Vijver MJ, He YD, van't Veer LJ, Dai H, Hart AA, Voskuil DW, Schreiber GJ, Peterse JL, Roberts C, Marton MJ, et al: A gene-expression signature as a predictor of survival in breast cancer. N Engl J Med. 2002, 347: 1999-2009. 10.1056/NEJMoa021967.PubMedView ArticleGoogle Scholar
- Chang HY, Nuyten DS, Sneddon JB, Hastie T, Tibshirani R, Sorlie T, Dai H, He YD, van't Veer LJ, Bartelink H, et al: Robustness, scalability, and integration of a wound-response gene expression signature in predicting breast cancer survival. Proc Natl Acad Sci USA. 2005, 102: 3738-3743. 10.1073/pnas.0409462102.PubMedPubMed CentralView ArticleGoogle Scholar
- Eden P, Ritz C, Rose C, Ferno M, Peterson C: 'Good Old' clinical markers have similar power in breast cancer prognosis as microarray gene expression profilers. Eur J Cancer. 2004, 40: 1837-1841. 10.1016/j.ejca.2004.02.025.PubMedView ArticleGoogle Scholar
- Rhodes DR, Yu J, Shanker K, Deshpande N, Varambally R, Ghosh D, Barrette T, Pandey A, Chinnaiyan AM: Large-scale meta-analysis of cancer microarray data identifies common transcriptional profiles of neoplastic transformation and progression. Proc Natl Acad Sci USA. 2004, 101: 9309-9314. 10.1073/pnas.0401994101.PubMedPubMed CentralView ArticleGoogle Scholar
- Segal E, Friedman N, Koller D, Regev A: A module map showing conditional activity of expression modules in cancer. Nat Genet. 2004, 36: 1090-1098.PubMedView ArticleGoogle Scholar
- Shen R, Ghosh D, Chinnaiyan AM: Prognostic meta-signature of breast cancer developed by two-stage mixture modeling of microarray data. BMC Genomics. 2004, 5: 94-10.1186/1471-2164-5-94.PubMedPubMed CentralView ArticleGoogle Scholar
- Parmigiani G, Garrett ES, Anbazhagan R, Gabrielson E: A statistical framework for expression-based molecular classification in cancer. J Roy Stat Soc B. 2002, 64: 717-736. 10.1111/1467-9868.00358.View ArticleGoogle Scholar
- Sotiriou C, Neo SY, McShane LM, Korn EL, Long PM, Jazaeri A, Martiat P, Fox SB, Harris AL, Liu ET: Breast cancer classification and prognosis based on gene expression profiles from a population-based study. Proc Natl Acad Sci USA. 2003, 100: 10393-10398. 10.1073/pnas.1732912100.PubMedPubMed CentralView ArticleGoogle Scholar
- Miller LD, Smeds J, George J, Vega VB, Vergara L, Ploner A, Pawitan Y, Hall P, Klaar S, Liu ET, Bergh J: An expression signature for p53 status in human breast cancer predicts mutation status, transcriptional effects, and patient survival. Proc Natl Acad Sci USA. 2005, 102: 13550-13555. 10.1073/pnas.0506230102.PubMedPubMed CentralView ArticleGoogle Scholar
- Michiels S, Koscielny S, Hill C: Prediction of cancer outcome with microarrays: a multiple random validation strategy. Lancet. 2005, 365: 488-492. 10.1016/S0140-6736(05)17866-0.PubMedView ArticleGoogle Scholar
- Kaufman L, Rousseeuw PJ: Finding Groups in Data: An Introduction to Cluster Analysis. 1990, New York: WileyView ArticleGoogle Scholar
- Zhang B, Schmoyer D, Kirov S, Snoddy J: GOTree Machine (GOTM): a web-based platform for interpreting sets of interesting genes using Gene Ontology hierarchies. BMC Bioinformatics. 2004, 5: 16-10.1186/1471-2105-5-16.PubMedPubMed CentralView ArticleGoogle Scholar
- Yeung KY, Fraley C, Murua A, Raftery AE, Ruzzo WL: Model-based clustering and data transformations for gene expression data. Bioinformatics. 2001, 17: 977-987. 10.1093/bioinformatics/17.10.977.PubMedView ArticleGoogle Scholar
- Teschendorff AE, Wang Y, Barbosa-Morais NL, Brenton JD, Caldas C: A variational Bayesian mixture modelling framework for cluster analysis of gene-expression data. Bioinformatics. 2005, 21: 3025-3033. 10.1093/bioinformatics/bti466.PubMedView ArticleGoogle Scholar
- Naderi A, Ahmed AA, Barbosa-Morais NL, Aparicio S, Brenton JD, Caldas C: Expression microarray reproducibility is improved by optimising purification steps in RNA amplification and labelling. BMC Genomics. 2004, 5: 9-10.1186/1471-2164-5-9.PubMedPubMed CentralView ArticleGoogle Scholar
- Naderi A, Ahmed AA, Wang Y, Brenton JD, Caldas C: Optimal amounts of fluorescent dye improve expression microarray results in tumor specimens. Mol Biotechnol. 2005, 30: 151-154. 10.1385/MB:30:2:151.PubMedView ArticleGoogle Scholar
- Hubbard T, Barker D, Birney E, Cameron G, Chen Y, Clark L, Cox T, Cuff J, Curwen V, Down T, et al: The Ensembl genome database project. Nucleic Acids Res. 2002, 30: 38-41. 10.1093/nar/30.1.38.PubMedPubMed CentralView ArticleGoogle Scholar
- Cox DR, Oakes D: Analysis of survival data. Monographs on Statistics and Applied Probability 21. 1984, London: Chapman and HallGoogle Scholar
- Graf E, Schmoor C, Sauerbrei W, Schumacher M: Assessment and comparison of prognostic classification schemes for survival data. Stat Med. 1999, 18: 2529-2545. 10.1002/(SICI)1097-0258(19990915/30)18:17/18<2529::AID-SIM274>3.0.CO;2-5.PubMedView ArticleGoogle Scholar
- The R Project for Statistical Computing. [http://www.R-project.org]
Copyright
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.