
Classifying tumors by gene expression profiling

Jonathan B Weitzman
Genome Biology 2000, 1:reports008

DOI: 10.1186/gb-2000-1-1-reports008

Received: 15 November 1999

Published: 17 March 2000


Microarray DNA chip technology has been used for the first time to demonstrate the feasibility of cancer classification on the basis of gene-expression profile analysis.

Significance and context

New modalities for cancer therapy will come from treatment strategies that are specifically adapted to the tumor type, maximizing efficacy and minimizing toxicity. This requires improved methods for cancer classification. At present, tumors are classified by histopathological analysis, using techniques that are time-consuming and imperfect. Improved classification and subclass definition will depend on identifying sets of molecular markers. Golub and colleagues from Eric Lander's laboratory at the Massachusetts Institute of Technology have applied microarray technology to cancer class discovery and class prediction, choosing human acute leukemias as a test case. Acute leukemias can be divided into those of lymphoid origin (acute lymphoblastic leukemia, ALL) and those of myeloid origin (acute myeloid leukemia, AML). ALL and AML are currently diagnosed by immunochemistry and cytogenetic analysis, and they require distinct clinical treatment protocols. This study clearly shows that future cancer classification could be based on gene-expression profiles.

Key results

The authors used primary material from cancer patients rather than cultured cell lines. They first analyzed 38 bone marrow samples to define a set of genes for class prediction. Using commercially available Affymetrix microarray DNA chips, they screened 6,817 genes and identified about 1,100 whose expression differed between the AML and ALL classes. 'Neighborhood analysis' was then used to test whether these correlations were stronger than expected by chance. This approach allowed the authors to build a 'class predictor': a set of 50 informative genes most closely correlated with the distinction between AML and ALL. When this predictor was applied to additional leukemia samples, it made predictions for most of them, and all of these predictions were correct. Examining the predictor genes also provides useful insights into cancer pathogenesis: the set includes genes involved in cell-cycle regulation, chromatin remodeling and cell adhesion.

The authors then asked whether gene-expression data could be used for 'class discovery', the automatic identification of new classes of cancer. For this, they used self-organizing maps (SOMs) to cluster tumors on the basis of their gene-expression profiles. To test the approach, they clustered the leukemia samples with an SOM, constructed a class predictor from the resulting clusters, and cross-validated it on the tumor samples. This analysis also subdivided the ALL samples into two groups, which proved to reflect the T- or B-cell lineage of the tumors.
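The report does not spell out the scoring or voting rules, so the following is only a minimal Python sketch of how a gene-based class predictor of this kind could work. It assumes a per-gene signal-to-noise score, a simple weighted vote, and synthetic Gaussian "expression" data; all helper names (`signal_to_noise`, `train_predictor`, `predict`, `make_data`) are hypothetical. It also includes a neighborhood-analysis-style check that compares how many genes are strongly correlated with the real class labels against how many appear when the labels are randomly shuffled.

```python
import numpy as np

# Hypothetical sketch: marker selection plus a weighted-vote class predictor.
# Assumptions (not from the report): signal-to-noise scoring, synthetic data,
# and every helper name below.

def signal_to_noise(X, y):
    """Per-gene score (mean_1 - mean_0) / (sd_1 + sd_0) for labels y in {0, 1}."""
    a, b = X[y == 1], X[y == 0]
    return (a.mean(axis=0) - b.mean(axis=0)) / (a.std(axis=0) + b.std(axis=0) + 1e-12)

def count_correlated(X, y, threshold):
    """Number of genes whose |score| exceeds threshold for this labelling."""
    return int(np.sum(np.abs(signal_to_noise(X, y)) > threshold))

def train_predictor(X, y, n_genes=50):
    """Keep the n_genes/2 strongest markers for each class, their scores as
    weights, and the midpoint between the two class means for each gene."""
    s = signal_to_noise(X, y)
    order = np.argsort(s)  # ascending: most class-0-like first, most class-1-like last
    genes = np.concatenate([order[-(n_genes // 2):], order[:n_genes // 2]])
    mid = (X[y == 1][:, genes].mean(axis=0) + X[y == 0][:, genes].mean(axis=0)) / 2
    return genes, s[genes], mid

def predict(x, genes, weights, mid):
    """Each gene votes weight * (x - midpoint); positive votes favour class 1.
    Returns (predicted class, prediction strength in [0, 1])."""
    votes = weights * (x[genes] - mid)
    v1, v0 = votes[votes > 0].sum(), -votes[votes < 0].sum()
    return int(v1 > v0), abs(v1 - v0) / (v1 + v0)

def make_data(n, rng, n_total=500, n_marker=20, shift=2.0):
    """Synthetic data: class 1 is high on genes 0-19, class 0 on genes 20-39."""
    y = np.arange(n) % 2
    X = rng.normal(size=(n, n_total))
    X[y == 1, :n_marker] += shift
    X[y == 0, n_marker:2 * n_marker] += shift
    return X, y

rng = np.random.default_rng(0)
X_train, y_train = make_data(38, rng)  # 38 echoes the report's training set; data are fake
X_test, y_test = make_data(30, rng)    # arbitrary size for the held-out set

# Neighborhood-analysis-style check: real labels vs. randomly shuffled labels.
observed = count_correlated(X_train, y_train, threshold=0.5)
null_counts = [count_correlated(X_train, rng.permutation(y_train), 0.5) for _ in range(100)]

model = train_predictor(X_train, y_train)
results = [predict(x, *model) for x in X_test]
accuracy = np.mean([cls == label for (cls, _), label in zip(results, y_test)])
print(observed, max(null_counts), accuracy)
```

In this sketch each gene's vote grows with both its score and the distance of the sample from the midpoint between the class means. Thresholding the prediction strength is one way a predictor could leave weak calls unassigned rather than forcing a prediction for every sample.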

Methodological innovations

This study uses commercially available microarrays produced by Affymetrix. The details of the experimental protocol and the analysis are presented very clearly at Eric Lander's laboratory website. Although microarray chips of this type are currently beyond the budget of many smaller laboratories, this is likely to change as the chip technology is developing at a startling rate. The statistical and mathematical methodology may be daunting for readers unfamiliar with concepts such as neighborhood analysis, clustering algorithms and self-organizing maps, but the steps are clearly explained in the accompanying information on the authors' website.
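Self-organizing maps are probably the least familiar of these methods, so here is a minimal NumPy sketch of an SOM used for unsupervised clustering. It assumes a small rectangular grid of nodes, a Gaussian neighborhood, a linearly decaying learning rate and radius, and synthetic two-group data; the function names and parameter values are illustrative assumptions, not the settings used in the study.

```python
import numpy as np

# Hypothetical sketch of a small self-organizing map (SOM) for clustering samples.
# Grid size, schedules and data are illustrative assumptions.

def train_som(X, grid=(2, 1), n_iter=2000, lr0=0.5, sigma0=1.0, seed=0):
    """Fit SOM nodes to the rows of X; returns node vectors and grid coordinates."""
    rng = np.random.default_rng(seed)
    coords = np.array([(i, j) for i in range(grid[0]) for j in range(grid[1])], dtype=float)
    # Initialise each node from a distinct randomly chosen sample.
    nodes = X[rng.choice(len(X), size=len(coords), replace=False)].astype(float)
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1 - frac)                # learning rate decays linearly to 0
        sigma = sigma0 * (1 - frac) + 0.01   # neighbourhood radius shrinks over time
        x = X[rng.integers(len(X))]          # one random sample per step
        bmu = np.argmin(((nodes - x) ** 2).sum(axis=1))  # best-matching unit
        d2 = ((coords - coords[bmu]) ** 2).sum(axis=1)   # squared grid distance to BMU
        h = np.exp(-d2 / (2 * sigma ** 2))               # Gaussian neighbourhood weights
        nodes += lr * h[:, None] * (x - nodes)           # pull BMU and neighbours toward x
    return nodes, coords

def assign(X, nodes):
    """Cluster label for each sample = index of its nearest node."""
    return np.argmin(((X[:, None, :] - nodes[None, :, :]) ** 2).sum(axis=2), axis=1)

rng = np.random.default_rng(1)
y = np.repeat([0, 1], 30)          # hidden group labels, not shown to the SOM
X = rng.normal(size=(60, 100))     # 60 synthetic samples x 100 genes
X[y == 1, :40] += 3.0              # group 1 is higher on the first 40 genes

nodes, coords = train_som(X, grid=(2, 1))
clusters = assign(X, nodes)
# Cluster indices are arbitrary, so score agreement under either labelling.
agreement = max(np.mean(clusters == y), np.mean(clusters != y))
print(agreement)
```

Passing `grid=(2, 2)` gives four clusters instead of two; a larger map of this kind is one way finer structure, such as the T- and B-lineage split within ALL, could surface.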


Visit Patrick Brown's laboratory website for a guide to building your own chips.


This study illustrates the powerful potential of genome-wide gene-expression analysis for defining distinct pathologies and identifying classes of cancer, with clear diagnostic applications. Careful analysis and record keeping will allow doctors to correlate expression profiles with patients' responses to treatment and with clinical outcomes. This should lead to focused cancer therapy protocols tailored to the multigene expression signature and molecular markers of particular tumor types. The authors point out that the method of sample preparation is critical, so care is needed when comparing results generated by different laboratories. This issue, and the cost of microarray technology, will need to be addressed if the approach is to be widely used in diagnostic settings. The authors also suggest that techniques such as laser microdissection of tumor material will help to avoid artifacts. Furthermore, as data sets grow, it may be possible to reduce the number of genes in the class predictor, enabling quick and accurate analysis.

Reporter's comments

This report shows why everyone is getting excited about DNA chips. The study is particularly elegant and thorough in its analysis and conclusions. The ability to define molecular signatures for classes of disease, be it cancer or other diseases with heterogeneous pathologies, has potential for refining therapeutic protocols. Furthermore, this approach will lead to the unbiased identification of genes associated with particular tumor types, which will provide new information about the mechanisms of oncogenesis and new targets for drug discovery.

References


  1. Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, Coller H, Loh ML, Downing JR, Caligiuri MA, et al: Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science. 1999, 286: 531-537.


© BioMed Central Ltd 2000