- Deposited research article
- Open Access
Identifying biological themes within lists of genes with EASE
© BioMed Central Ltd 2003
- Received: 17 April 2003
- Published: 25 April 2003
EASE is a customizable software application for rapid biological interpretation of gene lists that result from the analysis of microarray, proteomics, SAGE, and other high-throughput genomic data. The biological themes returned by EASE recapitulate manually determined themes in previously published gene lists and are robust to varying methods of normalization, intensity calculation and statistical selection of genes. EASE is a powerful tool for rapidly converting the results of functional genomics studies from "genes to themes."
- Gene List
- Text File
- Online Tool
- Biological Theme
- Gene Selection Method
High-density microarray and proteomic technologies have enabled the discovery of global patterns of biological responses with respect to experimental or natural perturbations. Much work has addressed the issues of data normalization and statistical selection of genes significantly modulated or clustered based upon expression profiles. The net result of these efforts is one or more lists of genes. Unfortunately, little work has addressed the issue of rapidly identifying biological themes in such lists. Most investigators currently annotate genes one at a time using internet-based databases or manual literature searches. Following this tedious process, many researchers struggle to identify the most salient biological themes in order to make sense of their results and have no systematic way to prioritize these themes for further analysis. A parallel issue in interpreting such data is how to leverage the ever-expanding flood of functional genomic data and tools. We developed the Expression Analysis Systematic Explorer (EASE) to automate the process of biological theme determination for lists of genes and to serve as a customizable gateway to online analysis tools. This is the first report to show that the highest-ranking themes derived by a computational method can recapitulate manually derived themes in previously published results, and that these themes are stable to varying methods of gene selection.
EASE performs three basic functions with any list of genes: 1) over-representation analysis of functional gene categories, 2) customizable linking to online tools, and 3) creation of descriptive annotation tables. Each of these functions uses a system of tab-delimited text files that are easy to customize and update. EASE is an easy-to-use, customizable tool that allows investigators to systematically mine the mass of functional information associated with data generated by microarray, proteomics or SAGE studies.
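As a rough illustration of how a tab-delimited classification file might be read, the sketch below loads rows of the form "gene identifier, tab, category name" into a lookup table. The two-column layout, the `#` comment convention and the function name `load_categories` are assumptions made for this example; the actual EASE file layout is not described here.

```python
import csv
import io
from collections import defaultdict

def load_categories(handle):
    """Read 'gene_id<TAB>category' rows into {category: set of gene ids}.

    The two-column layout is a hypothetical format chosen for illustration.
    """
    categories = defaultdict(set)
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2 or row[0].startswith("#"):
            continue  # skip blank, malformed, or comment lines
        gene_id, category = row[0].strip(), row[1].strip()
        categories[category].add(gene_id)
    return dict(categories)

# A small in-memory file stands in for a real classification file.
example = io.StringIO(
    "# gene\tcategory\n"
    "G1\timmune response\n"
    "G2\timmune response\n"
    "G2\tmitochondrion\n"
    "G3\tmitochondrion\n"
)
categories = load_categories(example)
```

Because the file is plain text, adding or updating a classification system under this assumed layout would only require editing or replacing the file.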
The core function of EASE is to annotate or analyze a list of genes input as gene identifiers, and to display the result in the system web browser or save it as tab-delimited text or in Microsoft Excel format. The identifiers can be loaded from a text file or pasted into EASE from another application. Upon input of identifiers, the user can generate an annotation table by clicking the "Annotate Genes" button (Figure 1). The user can also link to any number of online tools such as DAVID via the "Link to:" list box; this function automatically loads the information specific to the current gene list into the online tool, thereby allowing EASE to serve as a convenient interface to these resources.
The identification of biological themes in the gene list is initiated by clicking the "Find over-represented gene categories" button. This function returns an output of all gene categories ranked by over-representation, with associated probabilities, counts used in the probability calculation, associated genes from the original list and links to various online tools for these genes. The most significantly over-represented categories that result from this analysis are deemed "biological themes" of the gene list. The user can optionally limit these analyses to any particular set of gene categories to answer questions such as "what is special about the mitochondrial genes on my list compared to all mitochondrial genes on the microarray?" The user can further use the "Refine" functionality of EASE to remove specific genes from the original list and enable an over-representation analysis of the remaining genes exclusively. These two functions can be applied repeatedly until the gene list is thoroughly characterized. EASE also allows for comparisons of gene lists at a thematic level, wherein the results are expressed in terms of gene categories over-represented in one list compared to all lists combined.
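The ranking described above can be sketched as follows. This passage does not define the probability that EASE reports, so the example uses a plain one-tailed hypergeometric (Fisher exact) probability as a stand-in; the function names, the toy gene identifiers and the two categories are all hypothetical.

```python
from math import comb

def upper_tail(list_hits, list_total, pop_hits, pop_total):
    """P(X >= list_hits) when list_total genes are drawn without replacement
    from pop_total genes, pop_hits of which belong to the category."""
    top = min(list_total, pop_hits)
    num = sum(comb(pop_hits, k) * comb(pop_total - pop_hits, list_total - k)
              for k in range(list_hits, top + 1))
    return num / comb(pop_total, list_total)

def rank_categories(gene_list, population, categories):
    """Return (category, probability, list_hits, pop_hits) tuples,
    most over-represented category first."""
    genes = set(gene_list) & population
    rows = []
    for name, members in categories.items():
        pop_hits = len(members & population)
        list_hits = len(members & genes)
        p = upper_tail(list_hits, len(genes), pop_hits, len(population))
        rows.append((name, p, list_hits, pop_hits))
    return sorted(rows, key=lambda row: row[1])

# Toy data: 20 genes on the array, two categories, a 4-gene list.
population = {f"G{i}" for i in range(20)}
categories = {
    "A": {f"G{i}" for i in range(0, 5)},
    "B": {f"G{i}" for i in range(10, 20)},
}
gene_list = ["G0", "G1", "G2", "G10"]
ranked = rank_categories(gene_list, population, categories)
```

Restricting `population` and `gene_list` to the members of one category (for example, only mitochondrial genes) before calling `rank_categories` would correspond to the optional limiting step described in the text.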
Calculating statistics on thousands of gene categories can yield a few seemingly significant probabilities by random chance alone. To address this multiple comparison issue, EASE can apply a variety of probability corrections, including Bonferroni-type methods and bootstrap methods. The bootstrap methods iteratively run over-representation analyses on random gene lists to determine more accurately the true probability of observing a given categorical enrichment. Nevertheless, EASE is most appropriately viewed as an exploratory tool that directs the attention of the researcher to enriched biological themes by prioritizing functional categories according to the significance of their over-representation.
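A minimal sketch of the two kinds of correction mentioned above is given here. It assumes a one-tailed hypergeometric probability per category and, for the resampling correction, compares each observed probability with the smallest probability obtained from random gene lists of the same size. The `(1 + count) / (1 + iterations)` estimator, the function names and the toy data are assumptions for illustration, not a description of the EASE implementation.

```python
import random
from math import comb

def upper_tail(list_hits, list_total, pop_hits, pop_total):
    """One-tailed hypergeometric probability P(X >= list_hits)."""
    top = min(list_total, pop_hits)
    num = sum(comb(pop_hits, k) * comb(pop_total - pop_hits, list_total - k)
              for k in range(list_hits, top + 1))
    return num / comb(pop_total, list_total)

def category_pvalues(gene_list, population, categories):
    """Raw over-representation probability for every category."""
    genes = set(gene_list) & population
    return {
        name: upper_tail(len(members & genes), len(genes),
                         len(members & population), len(population))
        for name, members in categories.items()
    }

def bonferroni(pvalues):
    """Multiply each probability by the number of categories, capped at 1."""
    m = len(pvalues)
    return {name: min(1.0, p * m) for name, p in pvalues.items()}

def resampling_corrected(gene_list, population, categories,
                         iterations=1000, seed=0):
    """Fraction of random gene lists whose best category is at least as
    extreme as each observed probability."""
    rng = random.Random(seed)
    observed = category_pvalues(gene_list, population, categories)
    pool = sorted(population)  # fixed order keeps the run reproducible
    size = len(set(gene_list) & population)
    null_minima = []
    for _ in range(iterations):
        random_list = rng.sample(pool, size)
        null = category_pvalues(random_list, population, categories)
        null_minima.append(min(null.values()))
    return {
        name: (1 + sum(m <= p for m in null_minima)) / (1 + iterations)
        for name, p in observed.items()
    }

# Toy data: 30 genes, three categories, a 5-gene list.
population = {f"G{i}" for i in range(30)}
categories = {
    "A": {f"G{i}" for i in range(0, 5)},
    "B": {f"G{i}" for i in range(5, 15)},
    "C": {f"G{i}" for i in range(15, 30)},
}
gene_list = ["G0", "G1", "G2", "G3", "G20"]
raw = category_pvalues(gene_list, population, categories)
bonf = bonferroni(raw)
corrected = resampling_corrected(gene_list, population, categories,
                                 iterations=200, seed=1)
```

The Bonferroni adjustment is cheap but conservative when categories share many genes, whereas the resampling approach reflects the actual category structure at the cost of repeating the whole analysis for every random list.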
The published gene lists of Kayo et al. were re-analyzed with EASE to test the ability of EASE to generate themes comparable to manually determined themes. In the Kayo study, the authors generated four gene lists corresponding to genes up- and down-regulated in primate muscle in response to aging or caloric restriction. These gene lists were analyzed with the categorical over-representation function of EASE using EASE scores that were corrected for multiplicity using 10,000 bootstrap iterations. All significant (p < 0.05) categories resulting from each list were compared to the themes manually determined and published by Kayo et al. (Figure 2).
Figure 3a demonstrates the instability of the size and overlap of the gene lists that result from varying gene selection methods. The percentage of genes overlapping in any two lists was highly variable, and ranged from 7% to 60%. In spite of this striking variation, the top five biological themes returned by EASE for each of the eight gene lists were virtually the same; all derived from a group of six categories that implicate a vigorous interferon-induced immune response in patients with rebounding HIV viral loads (Figure 3b). The conversion of genes to themes with EASE allowed the "biological result" of the experiment to be determined despite substantial differences in gene list content resulting from the use of various normalization, gene intensity and statistical selection methods.
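For readers who want to reproduce this kind of comparison on their own lists, the pairwise overlap can be sketched as below. The text does not state how the percentage was calculated, so this example assumes shared genes divided by the genes present in either list; the function names and the three toy lists are hypothetical.

```python
from itertools import combinations

def percent_overlap(list_a, list_b):
    """Shared genes as a percentage of all genes found in either list.

    Using the union as the denominator is an assumption of this sketch.
    """
    a, b = set(list_a), set(list_b)
    union = a | b
    return 100.0 * len(a & b) / len(union) if union else 0.0

def pairwise_overlaps(named_lists):
    """Percent overlap for every unordered pair of named gene lists."""
    return {
        (first, second): percent_overlap(named_lists[first], named_lists[second])
        for first, second in combinations(sorted(named_lists), 2)
    }

# Toy gene lists standing in for lists from different selection methods.
gene_lists = {
    "method_1": ["G1", "G2", "G3"],
    "method_2": ["G2", "G3", "G4"],
    "method_3": ["G5"],
}
overlaps = pairwise_overlaps(gene_lists)
```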
EASE rapidly converts a list of genes into an ordered table of robust biological themes that summarize the biological result of the experiment. This method has immediate utility for finding themes that most differentiate lists of genes, e.g. up-regulated versus down-regulated in a single experiment, but could potentially be applied to compare the results of different experiments, even involving different species and/or microarray platforms. The EASE method has proven useful for a SAGE analysis of cancer (W.D. Stein, manuscript in preparation) and for microarray analyses of cancer (A. Domkowski, manuscript in preparation; K. Akagi, personal communication), cataracts (M. Kantorow, manuscript in preparation) and immune function in HIV disease [9, 10]. The EASE method also enables a rapid assay for overlap between gene clusters identified in any number of experiments when the user creates gene classification schema based upon these clusters. EASE can potentially be used to facilitate the development of data normalization and gene selection criteria by observing the highest enrichment attained for EASE themes within a particular experiment in which the biological phenomenon is well characterized and confirmed. EASE allows investigators to fully leverage the potential of high-throughput functional genomics technologies to infer biological themes. A full-featured version of EASE is freely available to non-profit researchers for use on Windows operating systems (http://david.niaid.nih.gov/david/ease.htm), and a limited online version of the EASE over-representation function is available on the DAVID website.
- Heller MJ: DNA microarray technology: devices, systems, and applications. Annu Rev Biomed Eng. 2002, 4: 129-153. 10.1146/annurev.bioeng.4.020702.153438.
- Quackenbush J: Microarray data normalization and transformation. Nat Genet. 2002, 32 Suppl: 496-501. 10.1038/ng1032.
- Slonim DK: From patterns to pathways: gene expression data analysis comes of age. Nat Genet. 2002, 32 Suppl: 502-508. 10.1038/ng1033.
- Database for Annotation, Visualization and Integrated Discovery. [http://david.niaid.nih.gov/]
- Kayo T, Allison DB, Weindruch R, Prolla TA: Influences of aging and caloric restriction on the transcriptional profile of skeletal muscle from rhesus monkeys. Proc Natl Acad Sci USA. 2001, 98: 5093-5098. 10.1073/pnas.081061898.
- Li C, Wong WH: Model-based analysis of oligonucleotide arrays: Expression index computation and outlier detection. Proc Natl Acad Sci USA. 2001, 98: 31-36. 10.1073/pnas.011404098.
- Sidorov IA, Hosack DA, Gee D, Yang J, Cam MC, Lempicki RA, Dimitrov DS: Oligonucleotide microarray data distribution and normalization. Information Sciences. 2002, 146: 65-71. 10.1016/S0020-0255(02)00215-3.
- Tusher VG, Tibshirani R, Chu G: Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci USA. 2001, 98: 5116-5121. 10.1073/pnas.091062498.
- Cicala C, Arthos J, Selig SM, Dennis G, Hosack DA, Van Ryk D, Spangler ML, Steenbeke TD, Khazanie P, Gupta N, et al: HIV envelope induces a cascade of cell signals in non-proliferating target cells that favor virus replication. Proc Natl Acad Sci USA. 2002, 99: 9380-9385. 10.1073/pnas.142287999.
- Chun TW, Justement JS, Lempicki RA, Yang J, Dennis G, Hallahan CW, Sanford C, Pandya P, Liu S, McLaughlin M, et al: Gene expression and viral prodution in latently infected, resting CD4+ T cells in viremic versus aviremic HIV-infected individuals. Proc Natl Acad Sci USA. 2003, 100: 1908-1913. 10.1073/pnas.0437640100.