Searching for differentially expressed gene combinations
© Dettling et al.; licensee BioMed Central Ltd. 2005
- Received: 4 April 2005
- Accepted: 8 August 2005
- Published: 19 September 2005
We propose 'CorScor', a novel approach for identifying gene pairs with joint differential expression. This is defined as a situation with good phenotype discrimination in the bivariate, but not in the two marginal distributions. CorScor can be used to detect phenotype-related dependencies and interactions among genes. Our easily interpretable approach is scalable to current microarray dimensions and yields promising results on several cancer-gene-expression datasets.
- Acute Myeloid Leukemia
- Gene Pair
- Additional Data File
- Conditional Correlation
- Permutation Distribution
Gene-expression monitoring by microarray technologies has become an important approach in biological and medical research over the past decade. A common experimental design is the comparison of two sets of samples from different phenotypes (for example, diseased and normal tissue), with the goal of searching for genes showing differential expression. This is usually done via statistical testing procedures, often followed by corrections for multiple testing. Prominent examples include t-tests, significance analysis of microarrays [1], and empirical Bayes analysis [2]. A comprehensive review of such approaches can be found in Pan [3]. All these methods use a one-gene-at-a-time strategy, considering only the association between single genes and the phenotype.
Many approaches for classification of phenotypes using microarrays do consider multiple genes simultaneously, but they address a different question: their goal is to produce sets of differentially expressed genes for use in class prediction [4–8]. While interesting, these approaches cannot be applied comprehensively to all possible gene pairs; that is, there are currently no practical tools for exploring phenotype-related dependencies and interactions among all gene pairs in large datasets. In this paper we present a methodology for addressing this issue, and we show that it can find interesting biological relationships that would be missed by existing approaches.
A more complex case is shown in our second artificial example, in the right panel of Figure 1. There is no obvious demarcation in space and, again, neither of the two genes carries information on its own; together, however, they do. Biologically speaking, this example could reflect an 'on/off situation'. If both genes are off (expression values below 1.5 units) or both genes are on (expression values above 1.5 units), we observe the red-circle phenotype. In contrast, if only one of the genes is turned on, the blue-triangle phenotype predominates.
Statistically, we define joint differential expression as good phenotype discrimination by the joint distribution, but not by the univariate marginal distributions, of two genes. From a functional genomics perspective, such pairs could represent interesting novel biological interactions, for example between genes in the same pathway.
The identification of gene pairs with joint differential expression is challenging for several reasons. First, gene pair identification is subject to the curse of dimensionality. While the usual number p of genes is in the tens of thousands, the number of gene pairs is p(p-1)/2, usually in the millions. Second, there are no existing and quickly computable test statistics that exactly address our notion of joint differential expression. Existing bivariate tests such as Hotelling's T² [9] only screen for differences in the bivariate mean vectors and will thus favor pairs consisting of genes with strong marginal effects. Third, identifying joint differential expression by comparing predictive models for pairs and for single genes is conceptually sound but unattractive because of its prohibitive computational burden.
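The quadratic growth of the pair count is easy to make concrete. The following minimal Python sketch (our illustration; the authors worked in R) evaluates p(p-1)/2 for gene numbers that appear in this paper: the colon data, the BRCA1 data, and an array with 12,000 genes.

```python
from math import comb

def n_gene_pairs(p: int) -> int:
    """Number of unordered gene pairs among p genes, p(p-1)/2."""
    return comb(p, 2)

# Colon data, BRCA1 data, and a larger array with 12,000 genes.
for p in (2000, 2654, 12000):
    print(f"{p:>6,} genes: {n_gene_pairs(p):>12,} pairs")
```

For the 2,000-gene colon data this gives 1,999,000 pairs, and for 12,000 genes about 72 million.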
Here we propose a novel, efficient, and scalable approach for searching for gene pairs with joint differential expression. It relies on calculating an appropriately defined test statistic from the unconditional as well as the class-conditional correlation matrices; hence we call our method CorScor, shorthand for correlation scoring. Its biggest advantages are its straightforward interpretation and its speed: it can be calculated very quickly, which allows an exhaustive search among the millions of pairs even in large gene-expression datasets. On the basis of several gene-expression datasets from the literature, we illustrate our method and collect empirical evidence that it yields gene pairs that tend to share biological relationships.
We illustrate the power and utility of our method with a comprehensive analysis of two datasets, and display the results for four further problems in the additional data files. The first dataset discussed in detail is from a publicly available study on colon cancer by Alon et al. [10, 11]. It originated from Affymetrix Hum6000 arrays and contains the expression values of the 2,000 genes with highest minimal intensity across 62 colon tissues, 40 of which were tumorous and 22 of which were normal. We applied a base-10 log-transformation to the data and standardized each array to zero mean and unit variance across genes. The second is a publicly available breast cancer dataset from Hedenfalk et al. [12, 13]. The data were obtained from Stanford-type cDNA microarrays, monitoring 2,654 genes across 22 breast cancer samples, 7 of which were found to carry germline BRCA1 mutations. Normalization was carried out following the approach of Yang et al. [14]. Our selection of data illustrates that CorScor can be applied to data from different platforms.
We require accurately preprocessed expression data from n samples and p genes, stored in an (n × p) matrix denoted by $(x_{ig})$. In what follows, we encode the phenotype information generically as 0 and 1, and store it in the n-dimensional response variable y.
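The preprocessing of the colon data can be sketched as follows. This is a minimal Python illustration, assuming the raw intensities are strictly positive and stored with arrays (samples) in rows and genes in columns, matching the (n × p) layout above. The paper does not say whether the population or the sample standard deviation was used; we assume the population version.

```python
import math
from statistics import fmean, pstdev

def preprocess(raw):
    """log10-transform an (n x p) intensity matrix (rows = arrays, columns = genes)
    and standardize each array to zero mean and unit variance across its genes."""
    out = []
    for row in raw:
        logged = [math.log10(v) for v in row]  # assumes strictly positive intensities
        mu = fmean(logged)
        sd = pstdev(logged, mu)  # population SD (assumption, not stated in the paper)
        out.append([(v - mu) / sd for v in logged])
    return out
```

Because the standardization is done per array, it removes array-wide differences in overall intensity and spread before any gene pairs are compared.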
The gap/substitution cases
Our method for revealing genes with joint differential expression relies on computing a simple score function. Given a pair consisting of genes g and g', we determine a measure of pairwise dependence ρ(g,g') between their expression vectors. Next, by restricting in turn to just the samples from each phenotype, we obtain both class-conditional measures of dependence ρ0(g,g') and ρ1(g,g').
For finding gene pairs that jointly discriminate the two phenotypes according to a gap or substitution mechanism as shown by the artificial example in the left panel of Figure 1, we recommend computing the scoring function
$$S(\rho,\rho_0,\rho_1) = \left| \rho_0 + \rho_1 - \alpha\rho \right| \qquad (1)$$
The rationale behind scoring function (1) is as follows. High conditional correlations arise if the data points within each group are tightly aligned along a straight line, which can be represented by the first principal component of each group, shown in Figure 2 by the dashed lines. Good joint differential expression requires such tight clustering and close-to-parallel alignment axes. Hence, high conditional correlations of concordant sign, together with a shift between the alignment axes, are necessary. The bigger this shift, and thus the clearer the joint separation, the lower the unconditional correlation ρ becomes. We therefore diminish the sum of ρ0 and ρ1 by αρ. Taking the absolute value treats positively and negatively sloped alignment axes symmetrically, so that the gap and the substitution cases are captured together. The scalar tuning parameter α governs the balance between separation and parallel alignment. We empirically observed good results with α ∈ [1, 2], and use α = 1.5 throughout the paper.
Correlation coefficients and CorScor values for the gap/substitution scenario
Three of these six genes (GSN, ACTN1, and SPARCL1) share a common annotation in the Kyoto Encyclopedia of Genes and Genomes pathway database (KEGG [15]): they are all involved in the 'regulation of actin cytoskeleton'. The remaining three genes lack pathway annotation in KEGG, but an analysis of their Gene Ontology terms (GO [16]) still reveals a functional connection: TPM1 has the GO terms 'actin binding' and 'cytoskeleton'. SPARCL1 is involved in 'calcium ion binding', a term it shares with GSN and ACTN1.
The heat map of the BRCA1 data, shown in the right panel of Figure 3, does not show an equally pronounced block structure. The absence of KEGG annotation for a large proportion of the genes makes it challenging to carry out the same type of validation. However, consistent with the known DNA-binding function of BRCA1 [17], many of the genes are related to binding activities. For a full overview of the genes involved in the heat maps, we refer readers to our supplementary web page [18].
Our findings on the colon data illustrate that CorScor has the potential to bring up gene pairs with a functional relationship, and that our heat maps are a helpful visualization tool for grouping and detecting the most important ones among them. The major benefit of CorScor, compared with established clustering techniques based on the expression values of single genes, is that we are able to capture genes without strong marginal effects. The genes involved in our pairs do not show pronounced fold changes across the phenotypes, but nevertheless seem to be key in molecular processes closely linked to the phenotype.
Another scenario in which joint differential expression is important is illustrated with the artificial example in the right panel of Figure 1. While the marginal distributions are not informative, the joint distribution clearly is: one phenotype is prevalent when the expression of both genes is either turned on or turned off, whereas the other phenotype is predominant when only one of the genes is expressed. An effective scoring function to capture these gene pairs is
$$S(\rho,\rho_0,\rho_1) = \left| \rho_1 - \rho_0 \right| \qquad (2)$$
Correlation coefficients and CorScor values for the on/off scenario
We emphasize again that, because of the very different scope, such findings could not be made with one-at-a-time gene selection and/or hierarchical clustering based on gene-expression values. Again, for this on/off scenario, the full information and annotation of the genes involved in the most promising gene pairs are available from our supplementary website [18].
Next, we address the question of whether, and how many, gene pairs achieve promising score values by chance alone. We do this by performing a permutation-based empirical Bayes analysis [2]. We generate 100 noise gene-expression datasets by randomly permuting the phenotype labels. We then run CorScor on each of these 100 noise datasets, obtaining a vector of p(p-1)/2 score values, which we sort by size. By averaging within each rank over the 100 permutations, we obtain an estimated null distribution of CorScor values.
Gene pairs exceeding quantiles
Comparison with predictive modeling
Next, we contrast the results of searching for jointly differentially expressed gene pairs with CorScor with an alternative search based on predictive modeling, implemented with logistic regression. This is also a novel method, although some ideas in this direction were presented in a conference talk by P. Wirapati [19]. The predictive-modeling approach is far more computationally intensive and currently not applicable to arrays with tens of thousands of features. We chose the following procedure for our predictive-modeling search. In the gap/substitution situation and for each gene pair (g,g'), we fitted three logistic regression models: a model with both genes as additive inputs to capture bivariate differential expression, and two univariate models with each gene as input to capture the marginal separation. This generates conditional probability estimates $p_i(x_g, x_{g'})$, $p_i(x_g)$, and $p_i(x_{g'})$ for each observation i. We then compute three log-likelihoods on the basis of these probabilities,
$$l(y,p) = \sum_{i=1}^{n} \left[ y_i \log p_i + (1 - y_i) \log(1 - p_i) \right] \qquad (3)$$
The log-likelihood is a very natural measure for the amount of discrimination in binary problems. A gene pair with good joint differential expression reflecting a gap or substitution should show good discrimination for the bivariate model but comparatively poor discrimination for the single-gene models. Hence, we can define a scoring function based on predictive modeling as
The on/off scenario requires a different approach. For each gene pair (g,g'), we chose to measure the improvement in predictive accuracy when comparing a full two-gene interaction model versus a two-gene additive model. This requires generating conditional probability estimates $p_i(x_g, x_{g'}, x_{gg'})$ and $p_i(x_g, x_{g'})$ using logistic regression for each observation i. These are then plugged into the log-likelihood from (3). From these, we can obtain a predictive-modeling-based scoring function for the on/off scenario via
$$T(g,g') = l\big(y, p(x_g, x_{g'}, x_{gg'})\big) - l\big(y, p(x_g, x_{g'})\big) \qquad (5)$$
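Equation (5) can be illustrated with a small, self-contained Python sketch. Several parts are our assumptions rather than details given in the paper: we take the log-likelihood (3) to be the usual Bernoulli log-likelihood, we code the interaction input x_gg' as the product x_g·x_g', and we fit the logistic regressions by plain, unpenalized gradient ascent (any standard logistic-regression fitter could be used instead). On perfectly separable toy data the maximum-likelihood estimates do not exist, so the fixed number of iterations acts as an implicit stopping rule.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def log_likelihood(y, p, eps=1e-12):
    """Bernoulli log-likelihood: sum of y*log(p) + (1 - y)*log(1 - p)."""
    total = 0.0
    for yi, pi in zip(y, p):
        pi = min(max(pi, eps), 1.0 - eps)  # avoid log(0)
        total += yi * math.log(pi) + (1 - yi) * math.log(1.0 - pi)
    return total

def fit_logistic(features, y, lr=0.5, n_iter=2000):
    """Unpenalized logistic regression by batch gradient ascent on the mean
    log-likelihood; an intercept is added. Returns fitted probabilities."""
    rows = [[1.0] + list(f) for f in features]
    w = [0.0] * len(rows[0])
    n = len(rows)
    for _ in range(n_iter):
        p = [sigmoid(sum(wj * xj for wj, xj in zip(w, r))) for r in rows]
        grad = [sum((yi - pi) * r[j] for yi, pi, r in zip(y, p, rows)) / n
                for j in range(len(w))]
        w = [wj + lr * gj for wj, gj in zip(w, grad)]
    return [sigmoid(sum(wj * xj for wj, xj in zip(w, r))) for r in rows]

def onoff_predictive_score(xg, xgp, y):
    """T = l(y, p(x_g, x_g', x_gg')) - l(y, p(x_g, x_g')), Equation (5),
    with the interaction input taken to be the product x_g * x_g' (assumption)."""
    full = fit_logistic([(a, b, a * b) for a, b in zip(xg, xgp)], y)
    additive = fit_logistic(list(zip(xg, xgp)), y)
    return log_likelihood(y, full) - log_likelihood(y, additive)

# Toy on/off data with centered values: -1 = off, +1 = on, plus small offsets.
offsets = [-0.2, -0.1, 0.0, 0.1, 0.2]
xg, xgp, y = [], [], []
for a in (-1.0, 1.0):          # gene g off / on
    for b in (-1.0, 1.0):      # gene g' off / on
        for d in offsets:
            xg.append(a + d)
            xgp.append(b - d)
            y.append(1 if a != b else 0)  # class 1 when exactly one gene is on
print(round(onoff_predictive_score(xg, xgp, y), 1))
```

On these XOR-like toy data the additive model cannot do better than p_i = 0.5 for every sample, so T ends up close to its upper bound of 20 log 2 ≈ 13.9.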
The concordance of this measure with CorScor's output is illustrated in the right two panels of Figure 6. We observe a correlation of 0.54 in the colon data and 0.29 in the BRCA1 data, but many of CorScor's top-scoring gene pairs are not identified by predictive modeling.
All our computations were implemented in the statistical programming language R [20]. Its function cor provides a very convenient and efficient routine for estimating Pearson and Spearman correlation coefficients for all gene pairs from an expression matrix. In the colon and BRCA1 data, an exhaustive search across all gene pairs with CorScor takes about 5 seconds on a 1.5 GHz Intel Pentium personal computer with 512 MB of RAM.
All our code for identifying gene pairs with joint differential expression, as well as for visualizing them with scatterplots and heat maps, is available as a documented package named corscor, and will be submitted to the Bioconductor project [21]. Links and updates can also be found on our supplementary website [18].
In a recent paper, Xiao and colleagues [22] considered multivariate searches for differentially expressed gene combinations. Their goal was to uncover subsets of a predefined size k for which the multivariate distributions of expression in the two phenotypes differ. Similar ideas were used by the same group in the context of data exploration and variable selection [23, 24]. Their approach aims to uncover sets that may consist of combinations of jointly and marginally differentially expressed genes, which is a different goal from the one considered here. For example, in Figure 4, vertically shifting all the blue points would increase the multivariate difference but leave the on/off scores from Equation (2) unchanged. Here, we emphasize the search for interactions per se, because of their clearer functional genomics implications, though a high multivariate distance can also be of interest. The approach of Xiao et al. is computationally demanding because each set is evaluated by an additional cross-validation. Comprehensive exploration of all pairs is challenging, and stochastic search is necessary for subsets of three or more genes.
In the section 'Comparison with predictive modeling', we presented an approach to screening for joint differential expression based on predictive modeling. While this shares the scope of CorScor, it is not scalable to the current dimensions of gene-expression data. A full search with predictive modeling on the colon or the BRCA1 data, with fewer than 3,000 genes each, requires about two weeks of CPU time, whereas CorScor needs only about 5 seconds. Since the number of gene pairs, and thus the computing time, grows quadratically with the number of genes, analyzing an Affymetrix HGU133 array with more than 12,000 genes (roughly five times as many) would increase the computing time by a factor of roughly 25, making the predictive-modeling approach prohibitive for practical application. We also observed that the gene pairs found by CorScor and by the predictive-modeling approach differ. To develop a better sense of the nature of the differences, we visually compared a large number of gene pairs from the two methods (not shown). The scatterplots of the top gene pairs according to the gap/substitution predictive-modeling scoring function in Equation (4) reveal that the predictive approach is very sensitive to outliers, whereas CorScor is more robust in this regard. Additionally, the joint separation is often more pronounced with CorScor. In the on/off search, visual scatterplot inspection and examination of gene annotations further favor CorScor. The predictive-modeling objective function in Equation (5) does not seem to match the scope of its correlation-based counterpart exactly and generally did not yield any gene pairs that could serve as indicators of aberrant molecular processes.
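The quadratic scaling argument can be written out explicitly. Taking the approximation above that the larger array has about five times as many genes (p' ≈ 5p), the number of pairs grows by roughly the square of that factor:
$$\frac{p'(p'-1)/2}{p(p-1)/2} \approx \left(\frac{p'}{p}\right)^2 \approx 5^2 = 25$$
Under the simplifying assumption that the time per pair stays constant, the roughly two weeks of CPU time for predictive modeling would grow to about 50 weeks, while CorScor's 5 seconds would grow to about 2 minutes.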
In the on/off search, in particular, a critical difference is that pairs can show strong evidence of a reversal in the sign of the conditional correlations while still having a substantial overlap of the two conditional distributions (see, for example, the top left and top right pairs in Figure 4). This can lead to a high CorScor value but only a moderate predictive score and a small multivariate distance. These cases, however, can be highly relevant biologically, and it is important to be able to identify them. In conclusion, of the two approaches that we propose and investigate here, CorScor is the simpler and the more computationally efficient, and it also appears to identify gene pairs that are more promising candidates for detailed biological analysis.
Another tool for finding interactions among gene pairs is relevance networks [25]. They examine interactions among genes by thresholding covariance matrices and graphically displaying the connections among the genes whose correlations exceed the threshold. We investigated a different type of gene interaction here, namely interactions that differ between the phenotypes being compared. However, the type of visualization implemented in relevance networks could also be used to represent the findings of our algorithm. Moreover, our approach was illustrated here using Pearson's and Spearman's correlations, but the general idea extends straightforwardly to any easily computed measure of pairwise association among gene-expression levels. Finally, Zhou et al. [26] introduced second-order expression correlations, which investigate regulatory networks by exploring the variation of correlations across conditions. Whereas their method focuses on concordant correlations, our approach is based on correlation differences.
In summary, this paper presents a novel approach for finding gene pairs with joint differential expression. It complements the widely used one-gene-at-a-time testing approaches and the associated list-enrichment tests. The idea behind joint differential expression is to find genes that discriminate two given phenotypes only in pairs, and not individually. These pairs make it possible to explore dependence and interaction among genes, as well as to screen for molecular processes that are linked to disease. Since the usual number of gene pairs is in the millions, there is a need for a quickly computable criterion. We propose two scoring functions, based on conditional and unconditional correlation coefficients. We show that these measures have the ability to uncover gene pairs that show promising scatterplot patterns and tend to share a biological relationship. In cancer research, a strength of CorScor lies in its potential ability to find genes that have not traditionally been associated with cancer, as they may represent new avenues for cancer cell biology and, more importantly, for therapeutic intervention.
The following additional data are available with the online version of this paper. To provide further evidence for the general applicability of the CorScor approach, we provide empirical results for four additional microarray problems as additional data files. Additional data file 1 is from a publicly available leukemia study by Armstrong et al. [27, 28]. The data originated from Affymetrix HG U95A arrays and, after our normalization, feature the expression of 6,177 genes across a total of 72 samples. For the CorScor analysis, we restricted the data to the binary comparison of 24 samples from acute lymphoblastic leukemias (ALL) versus 28 samples from acute myeloid leukemias (AML).
Additional data file 2 is based on a dataset from a publicly available lung cancer study of Bhattacharjee et al. [29, 30]. It also originated from Affymetrix HG U95A arrays and contains 3,171 genes after our normalization. The CorScor analysis was run on 20 carcinoid samples and 17 normal lung tissues. Additional data file 3 is a dataset from the seminal leukemia study of Golub et al. [31, 32]. It originated from Affymetrix Hu6800 arrays. The version we used after our normalization contained the expression of 3,571 genes across a total of 72 samples, 25 of which were from patients who had acute myeloid leukemias and 47 of which were from patients with acute lymphoblastic leukemia. Additional data file 4 is our analysis of publicly available cDNA arrays from Gruvberger et al. [33, 34]. The data in Additional data file 4 monitor 3,389 genes across 30 estrogen-receptor-negative and 28 estrogen-receptor-positive breast cancer samples.
The scatterplots in the additional data files clearly show the presence of joint differential expression for the gap/substitution situation in all four datasets. Again, our idea works here because the red and blue data points are tightly aligned along their respective principal components, yielding high conditional correlations. On the other hand, the two phenotypes are separated, resulting in a low overall correlation. The scatterplots for the on/off situation also clearly show the presence of joint differential expression, and they confirm that there are gene pairs with reversed correlation in the case and control samples.
In the tables in the additional data files, we report the results of the permutation test on each of the four datasets. They are qualitatively similar to those from the colon and BRCA1 data shown in Table 3: again, the real gene pairs score higher than those obtained from the permuted data.
Work supported by NSF grant NSF034211, by the Johns Hopkins SPORE in breast cancer P50CA88843 and GI cancer P50CA62924, and by core grant P30CA06973. We thank Ben Ho Park for his useful comments.
- Tusher VG, Tibshirani R, Chu G: Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci USA. 2001, 98: 5116-5121. 10.1073/pnas.091062498.
- Efron B, Tibshirani R, Storey J, Tusher V: Empirical Bayes analysis of a microarray experiment. J Am Stat Assoc. 2001, 96: 1151-1160. 10.1198/016214501753382129.
- Pan W: A comparative review of statistical methods for discovering differentially expressed genes in replicated microarray experiments. Bioinformatics. 2002, 18: 546-554. 10.1093/bioinformatics/18.4.546.
- Dudoit S, Fridlyand J: Classification in microarray experiments. Statistical Analysis of Gene Expression Data. Edited by: Speed T. 2003, New York: Chapman and Hall, 93-158.
- Dettling M, Bühlmann P: Finding predictive gene groups from microarray data. J Multivariate Anal. 2004, 90: 106-131. 10.1016/j.jmva.2004.02.012.
- Dettling M: Bagboosting for tumor classification with gene expression data. Bioinformatics. 2004, 20: 3583-3593.
- Li T, Zhang C, Ogihara M: A comparative study of feature selection and multiclass classification methods for tissue classification based on gene expression. Bioinformatics. 2004, 20: 2429-2437. 10.1093/bioinformatics/bth267.
- Soukup M, Cho H, Lee J: Robust classification modeling on microarray data using misclassification penalized posterior. Bioinformatics. 2005, 21 (suppl 1): i423-i430. 10.1093/bioinformatics/bti1020.
- Hotelling H: Multivariate quality control. Techniques of Statistical Analysis. Edited by: Eisenhart C, Hastay MW, Wallis WA. 1947, New York: McGraw-Hill, 111-184.
- Alon U, Barkai N, Notterman D, Gish K, Ybarra S, Mack D, Levine A: Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc Natl Acad Sci USA. 1999, 96: 6745-6750. 10.1073/pnas.96.12.6745.
- Princeton Colorectal Cancer Research Page. [http://microarray.princeton.edu/oncology]
- Hedenfalk I, Duggan D, Chen Y, Radmacher M, Bittner M, Simon R, Meltzer P, Gusterson B, Esteller M, Raffeld M, et al: Gene-expression profiles in hereditary breast cancer. New Engl J Med. 2001, 344: 539-548. 10.1056/NEJM200102223440801.
- Hedenfalk BRCA1 Data Supplementary Page. [http://research.nhgri.nih.gov/microarray/NEJM_Supplement]
- Yang Y, Dudoit S, Luu P, Lin D, Peng V, Ngai J, Speed T: Normalization for cDNA microarray data: a robust composite method addressing single and multiple slide systematic variation. Nucleic Acids Res. 2002, 30: e15. 10.1093/nar/30.4.e15.
- Kanehisa M, Goto S: KEGG: Kyoto Encyclopedia of Genes and Genomes. Nucleic Acids Res. 2000, 28: 27-30. 10.1093/nar/28.1.27.
- Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, et al: Gene ontology: tool for the unification of biology: the Gene Ontology Consortium. Nat Genet. 2000, 25: 25-29. 10.1038/75556.
- Paull T, Cortez D, Bowers B, Elledge S, Gellert M: From the cover: direct DNA binding by BRCA1. Proc Natl Acad Sci USA. 2001, 98: 6086-6091. 10.1073/pnas.111125998.
- Marcel Dettling's Joint Differential Expression Supplementary Page. [http://stat.ethz.ch/~dettling/jde.html]
- Identifying Joint Differential Expression in Microarray Data. [http://stat.ethz.ch/talks/Ascona_04/Slides/wirapati.pdf]
- R Development Core Team: R: A Language and Environment for Statistical Computing. 2004, Vienna, Austria.
- Gentleman R, Carey V, Bates D, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, et al: Bioconductor: open software development for computational biology and bioinformatics. Genome Biol. 2004, 5: R80. 10.1186/gb-2004-5-10-r80.
- Xiao Y, Frisina R, Gordon A, Klebanov L, Yakovlev A: Multivariate search for differentially expressed gene combinations. BMC Bioinformatics. 2004, 5: 164. 10.1186/1471-2105-5-164.
- Szabo A, Boucher K, Carroll W, Klebanov L, Tsodikov A, Yakovlev A: Variable selection and pattern recognition with gene expression data generated by the microarray technology. Math Biosci. 2002, 176: 71-98. 10.1016/S0025-5564(01)00103-1.
- Szabo A, Boucher K, Jones D, Klebanov L, Tsodikov A, Yakovlev A: Multivariate exploratory tools for microarray data analysis. Biostatistics. 2003, 4: 555-567. 10.1093/biostatistics/4.4.555.
- Butte AJ, Tamayo P, Slonim D, Golub TR, Kohane IS: Discovering functional relationships between RNA expression and chemotherapeutic susceptibility using relevance networks. Proc Natl Acad Sci USA. 2000, 97: 12182-12186. 10.1073/pnas.220392197.
- Zhou X, Kao M, Huang H, Wong A, Nunez-Iglesias J, Primig M, Aparicio O, Finch C, Morgan T, Wong W: Functional annotation and network reconstruction through cross-platform integration of microarray data. Nat Biotechnol. 2005, 23: 238-243. 10.1038/nbt1058.
- Armstrong S, Staunton J, Silverman L, Pieters R, den Boer M, Minden M, Sallan S, Lander E, Golub T, Korsmeyer S: MLL translocations specify a distinct gene expression profile that distinguishes a unique leukemia. Nat Genet. 2002, 30: 41-47. 10.1038/ng765.
- Broad Institute Cancer Program Publication. [http://www.broad.mit.edu/cgi-bin/cancer/publications/pub_paper.cgi?mode=view&paper_id=63]
- Bhattacharjee A, Richards W, Staunton J, Li C, Monti S, Vasa P, Ladd C, Beheshti J, Bueno R, Gillette M, et al: Classification of human lung carcinomas by mRNA expression profiling reveals distinct adenocarcinoma subclasses. Proc Natl Acad Sci USA. 2001, 98: 13790-13795. 10.1073/pnas.191502998.
- Meyerson Laboratory: Lung Cancer Genomics. [http://research.dfci.harvard.edu/meyersonlab/lungca/]
- Golub T, Slonim D, Tamayo P, Huard C, Gaasenbeek M, Mesirov J, Coller H, Loh M, Downing J, Caligiuri M, et al: Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science. 1999, 286: 531-538. 10.1126/science.286.5439.531.
- Broad Institute: Cancer Program Datasets. [http://www.broad.mit.edu/cgi-bin/cancer/datasets.cgi]
- Gruvberger S, Ringner M, Chen Y, Panavally S, Saal L, Borg A, Fernö M, Peterson C, Meltzer P: Estrogen receptor status in breast cancer is associated with remarkably distinct gene expression patterns. Cancer Res. 2001, 61: 5979-5984.
- NIH Website Supporting the Gruvberger et al. Publication. [http://research.nhgri.nih.gov/microarray/ER_data.txt]
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.