The DAVID Gene Functional Classification Tool: a novel biological module-centric algorithm to functionally analyze large gene lists
- Da Wei Huang†1,
- Brad T Sherman†1,
- Qina Tan1,
- Jack R Collins2,
- W Gregory Alvord3,
- Jean Roayaei3,
- Robert Stephens2,
- Michael W Baseler4,
- H Clifford Lane5 and
- Richard A Lempicki1
© Huang et al.; licensee BioMed Central Ltd. 2007
Received: 5 February 2007
Accepted: 4 September 2007
Published: 04 September 2007
The DAVID Gene Functional Classification Tool http://david.abcc.ncifcrf.gov uses a novel agglomeration algorithm to condense a list of genes or associated biological terms into organized classes of related genes or biology, called biological modules. This organization is accomplished by mining the complex biological co-occurrences found in multiple sources of functional annotation. It is a powerful method to group functionally related genes and terms into a manageable number of biological modules for efficient interpretation of gene lists in a network context.
Biological interpretation of large gene lists derived from high-throughput genomic or proteomic studies can be a challenging and daunting process. Some of the difficulties include: acquiring large amounts of functional annotation for every gene; the distributed, non-centralized nature of annotation across numerous sources; summarizing which genes are associated with specific biological processes and ranking these processes by over-representation analysis; condensing repetitive or redundant annotation data; identifying functional biological modules consisting of related genes and terms; and viewing inter-relationships between groups of genes and groups of biological terms. A number of publicly available bioinformatics tools have addressed the first three points above, including, but not limited to, GoMiner, DAVID, EASE, GOstat, Onto-Express, GOToolBox, FatiGO, GOSSIP, GFINDer, GObar, and so on [1–25]. The power of many of these applications is to systematically highlight the most over-represented biological terms, out of a list of hundreds or thousands of terms, to increase the likelihood of investigators identifying biological processes most pertinent to the biological phenomena under study. While these tools are extremely useful, they are still weak in mining the many-to-many gene-to-term relationships found in functional annotation databases, as well as in condensing redundant contents.
Individual genes can clearly be associated with multiple biological terms and, conversely, individual biological terms can be associated with multiple genes. These associations form a complex relationship network of 'many-genes-to-many-terms' that represents the true complex nature of biological processes. Data-mining tools that can extract these complex and redundant relationships should be able to identify functional gene-term biological modules. This identification can be accomplished by using exploratory statistical methods that identify groups of genes sharing similar biological terms or, alternatively, identifying groups of biological terms sharing similar genes. For example, if a subset of genes in a list is sodium transporters, then one can expect that they will have major functional annotations in common. A method that can group these genes based on the strength of overlap of the functional annotation should identify modules of related genes and terms. Similarly, terms that have many genes in common can also be grouped into a module of related terms and genes; for example, the terms 'apoptosis', 'cell death', 'death', and 'regulation of cell death' will be grouped together because these terms share a large number of common genes. The advantages of this method of classifying groups of genes and terms into biological modules are: it largely reduces redundant results into a manageable size; it is much easier to understand and visualize gene-to-gene, term-to-term, and gene-to-term relationships, since related genes and terms are brought together in one place; and it is much easier to relate biological modules of interest to a study than it is to relate hundreds of individual terms.
The goals of the project are to identify groups of genes sharing common biology or, alternatively, to identify groups of biological terms sharing common genes relevant to an investigator's study. Most importantly, the heterogeneous annotations/genes can be grouped as long as they are within the same, relevant biological context. In this sense, the definition of functional group in this work is much broader than the traditional concept. The improvement of biological discovery is through better organization of massive and redundant results into a more readable and manageable format (that is, biological groups). To this end, we developed the DAVID (Database for Annotation, Visualization and Integrated Discovery) Gene Functional Classification Tool and the DAVID Functional Annotation Clustering Tool to provide a module-centric approach for functional analysis of large gene lists. First, we developed a new method to measure gene-gene similarity, based on the assumption that genes that share global functional annotation profiles are functionally related to each other. Conversely, we measure term-term similarity based on the assumption that terms that share global gene profiles are functionally related to each other. Then, a DAVID agglomeration method was developed to group related genes or terms into functional groups (biological modules) based on the similarity distance measure. The fuzziness feature of the agglomeration method allows a gene or term to participate in more than one functional group, better reflecting the true 'multiple-roles' nature of genes that can be lost if exclusive methods, such as hierarchical, K-means, or SOM clustering, are used. Functional groups are ranked based on all group members' overall participation in the enriched biological processes associated with the total gene list. A global view of group-to-group relationships is also provided through a unique fuzzy heat map visualization.
A subset of 'drill-down' functions associated with each biological module allows investigators to explore and visualize relationships between genes and terms. In this paper, we will mainly describe the key algorithms associated with the DAVID Gene Functional Classification Tool, illustrate the usefulness of several of the functionalities, and demonstrate how quickly investigators can apply the information in a biological module to their study.
Measuring functional relationship of gene pairs based on the similarity of global annotation profiles
The traditional ways of grouping related genes are based on either sequence similarity (sequence homologs), functional categories (protein domain families), or co-expression clusters (microarray clusters). In fact, the majority of co-functioning genes, such as genes in the same pathway, are neither sequence-related nor in the same protein families. Therefore, the traditional phylogenetic grouping methods are powerful for evolution-based studies, but too specific and strict to be of much use in classifying genes for the purpose of functional annotation. We propose a novel method to identify related genes by measuring the similarity of their global annotation profiles based on the hypothesis that if two genes have similar annotation profiles, they should be functionally related. This method is able to identify much broader gene groups in which genes share major common biological features while tolerating some differences. For example, many different types of genes, with or without much sequence similarity, could be grouped into a transcription regulation class. We believe that the broader functional groups are more useful for functional annotation purposes and, hence, biological interpretation.
where K_mn is 1 for perfect co-occurrence and 0 for co-occurrence no better than random chance (Figure 2b).
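As an illustration of how such a score behaves, the following Python sketch computes a kappa value for two genes from binary annotation profiles (1 if the gene carries a term, 0 otherwise). It assumes the standard Cohen's kappa form (observed agreement minus chance agreement, divided by one minus chance agreement); the function name `kappa_score` and the toy profiles are hypothetical, and tool-specific details such as the minimum-overlap setting are not reproduced.

```python
def kappa_score(profile_m, profile_n):
    """Cohen's kappa between two equal-length binary annotation profiles.

    Each profile is a list of 0/1 flags, one per annotation term.
    Assumption: standard Cohen's kappa; not the tool's exact implementation.
    """
    if len(profile_m) != len(profile_n) or not profile_m:
        raise ValueError("profiles must be non-empty and of equal length")
    total = len(profile_m)

    # Observed agreement: terms both genes have, plus terms both genes lack.
    both = sum(1 for a, b in zip(profile_m, profile_n) if a == 1 and b == 1)
    neither = sum(1 for a, b in zip(profile_m, profile_n) if a == 0 and b == 0)
    observed = (both + neither) / total

    # Chance agreement from each gene's marginal annotation frequency.
    frac_m = sum(profile_m) / total
    frac_n = sum(profile_n) / total
    chance = frac_m * frac_n + (1 - frac_m) * (1 - frac_n)

    if chance == 1.0:
        # Both profiles are all-0 or all-1: kappa is undefined, treat as 0.
        return 0.0
    return (observed - chance) / (1 - chance)


# Toy profiles over four hypothetical terms.
identical = kappa_score([1, 1, 0, 0], [1, 1, 0, 0])    # perfect co-occurrence
unrelated = kappa_score([1, 1, 0, 0], [1, 0, 1, 0])    # agreement at chance level
```

With these toy inputs the identical profiles reach the upper bound of the score, while the second pair agrees exactly as often as chance predicts.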
A novel agglomeration method to classify a gene list into functionally related groups based on the functional similarity scores
After the kappa score matrix of all possible gene pairs is calculated, it is possible to classify the highly related genes. We examined the typical clustering methods, including hierarchical tree, K-means, FANNY, and SOM. All of them produced weaker clustering results (Additional data file 5) with our test datasets. The poor clustering results stem from one or more of the following weaknesses associated with the aforementioned clustering algorithms. First, genes must be assigned to one cluster, even though their absolute relationship to all clusters is weak. This results in higher contamination of clusters with noise by forcing membership of weakly related genes. Second, genes can belong to only one cluster, which does not align well with the basic biological nature of genes, in that one gene could participate in multiple, different roles. Third, outliers and uneven cluster sizes can greatly affect clustering quality. Fourth, it is difficult to know the optimal K (number of clusters) for K-means, FANNY, or SOM.
This method works better than others for this particular type of analysis (Additional data file 5). This method: eliminates overall irrelevant/weak elements, as orphan genes, to significantly push the signal out of noise; allows for fuzziness by allowing genes to be assigned to more than one cluster which aligns with the biological nature; dynamically determines the number of clusters based on the chosen threshold; generates grand groups for easy interpretation; and tolerates outliers extremely well by excluding them in step 1.
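The properties listed here (orphan removal, fuzzy membership, a threshold-driven number of groups) can be illustrated with a simple seed-and-merge heuristic. The Python sketch below is a simplified stand-in written under stated assumptions, not the tool's implementation: the function name `fuzzy_agglomerate`, the parameters `threshold`, `min_seed_size` and `merge_overlap`, their default values, and the toy kappa values are all hypothetical.

```python
from itertools import combinations


def fuzzy_agglomerate(genes, kappa, threshold=0.35, min_seed_size=4,
                      merge_overlap=0.5):
    """Group genes by pairwise kappa; returns (groups, orphans).

    kappa maps frozenset({gene_a, gene_b}) to a similarity score.
    All parameter names and defaults are illustrative assumptions.
    """
    def related(a, b):
        return kappa.get(frozenset((a, b)), 0.0) >= threshold

    # Seeding: every gene proposes a group of itself plus its close partners;
    # proposals that are too small are dropped.
    seeds = set()
    for g in genes:
        members = frozenset([g] + [h for h in genes if h != g and related(g, h)])
        if len(members) >= min_seed_size:
            seeds.add(members)

    # Merging: repeatedly fuse two groups that share most of the members of
    # the smaller one, until no such pair remains.
    groups = list(seeds)
    merged = True
    while merged:
        merged = False
        for a, b in combinations(groups, 2):
            if len(a & b) / min(len(a), len(b)) > merge_overlap:
                groups.remove(a)
                groups.remove(b)
                union = a | b
                if union not in groups:
                    groups.append(union)
                merged = True
                break

    # Genes that ended up in no group are reported as orphans.
    grouped = set().union(*groups)
    orphans = [g for g in genes if g not in grouped]
    return groups, orphans


# Toy data: two tight families, one gene (X) linked to both, one orphan (Z).
toy_genes = list("ABCDEFGHXZ")
toy_kappa = {}
for family in ("ABCD", "EFGH"):
    for a, b in combinations(family, 2):
        toy_kappa[frozenset((a, b))] = 0.8
toy_kappa[frozenset(("X", "A"))] = 0.6
toy_kappa[frozenset(("X", "E"))] = 0.6

toy_groups, toy_orphans = fuzzy_agglomerate(toy_genes, toy_kappa)
```

In this toy run the gene X remains a member of both families because the two groups overlap too little to be merged, and Z is left out of every group. The number of groups is not fixed in advance; it follows from the thresholds.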
There is no gold standard or null hypothesis to evaluate clustering methods and hence no right or wrong answers for any given clustering algorithms. One method may work better than others in the sense that it is more sensitive to the natural structure of a particular problem. However, this method, like any other heuristic approach, has the common weakness that an improper running criteria setting can lead to distorted results. In order to aid less advanced users with the setting of these criteria, we preset five general levels representing combinations of the detailed settings from very low to very high stringencies; based on our extensive tests on multiple datasets, the default stringency level (medium) should be optimal for most cases.
Since there is not a null hypothesis test to compare the quality between clustering algorithms, we try to summarize the quality of our agglomeration algorithm based on randomly selected genes that all clearly belong to one protein family (for example, kinase, phosphatase, chemokine, and so on). Then, the genes were classified by the method. Since we have pre-knowledge about the gene family information, the gene(s) that are grouped incorrectly or excluded from the correct group(s) can be roughly estimated. We observed that the leaking rate (that is, a gene not placed into a group to which it does belong) is between 1% and 2%, and the noise rate (that is, a gene incorrectly placed into a group to which it does not belong) is between 1% and 5%. Most importantly, the method is able to identify key members of groups so that the major biology of each group can quickly be determined. Since the analytical approach is biological module-centric, the major biology associated with each gene group is determined by the majority of gene members rather than by individual genes. Thus, the biology of each group should be very stable, even though there is a chance that a few members are excluded or incorrectly included. In summary, this clustering method shows reasonable performance by eliminating irrelevant, 'noisy' genes and by bringing together strongly related functional groups, while maintaining the fuzzy nature of biology by which genes may be involved in multiple processes.
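One way to make the two error rates concrete is sketched below in Python. The denominators are an assumption, since the text does not state them: here the leaking rate is taken relative to the size of the known family and the noise rate relative to the size of the reported group. The function name and the gene identifiers are hypothetical.

```python
def leak_and_noise_rates(true_family, predicted_group):
    """Return (leaking rate, noise rate) for one known gene family.

    true_family: set of genes known to belong to the family.
    predicted_group: set of genes the method placed in the matching group.
    Denominators are an assumption made for illustration.
    """
    if not true_family or not predicted_group:
        raise ValueError("both gene sets must be non-empty")
    missed = true_family - predicted_group   # family members left out
    wrong = predicted_group - true_family    # non-members pulled in
    return len(missed) / len(true_family), len(wrong) / len(predicted_group)


# Hypothetical example: 50 known kinases, one missed, two intruders.
kinases = {f"kinase_{i}" for i in range(50)}
reported = (kinases - {"kinase_49"}) | {"other_1", "other_2"}
leak, noise = leak_and_noise_rates(kinases, reported)
```

Here one of 50 family members is missed and two of the 51 reported genes do not belong, so both rates stay small while the majority of the group still reflects the family's biology.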
The last question is, 'Which final functional gene groups are more significant for the experiment?' We extended the traditional enrichment analysis logic so that a gene group is more important if a majority of its gene members is associated with highly enriched annotation terms as found in the traditional enrichment analysis of the total gene list. Thus, the enrichment score of each group is measured by the geometric mean of the EASE scores (modified Fisher Exact) associated with the enriched annotation terms that belong to this gene group. Importantly, the multiple testing correction issues are considered in the individual EASE scores. In addition, all EASE scores (significant or insignificant) associated with the group participate in the algorithm. In order to emphasize that the geometric mean is a relative score instead of an absolute p value, minus log transformation is applied on the geometric mean (Additional data file 6). Therefore, the group enrichment scores are intended to order the relative importance of the gene groups instead of as absolute decision values. A higher score for a group indicates that the group members are involved in more important (enriched) roles. However, all gene groups are potentially interesting despite lower rankings.
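The scoring rule described in this paragraph can be sketched in a few lines of Python. The function name and the example p-values are hypothetical, and the base of the logarithm is an assumption (base 10 is used here); the text only states that a minus log transformation is applied to the geometric mean.

```python
import math


def group_enrichment_score(ease_scores):
    """Minus log10 of the geometric mean of a group's EASE scores.

    ease_scores: p-value-like scores in (0, 1] for all terms of the group,
    significant or not. Base-10 logarithm is an assumption.
    """
    if not ease_scores:
        raise ValueError("at least one EASE score is required")
    if any(p <= 0 or p > 1 for p in ease_scores):
        raise ValueError("EASE scores must lie in (0, 1]")
    # log10 of the geometric mean equals the arithmetic mean of the log10 values.
    mean_log = sum(math.log10(p) for p in ease_scores) / len(ease_scores)
    return -mean_log


# Hypothetical groups: one strongly enriched, one weakly enriched.
strong = group_enrichment_score([0.001, 0.1])   # geometric mean 0.01
weak = group_enrichment_score([0.5, 0.2])       # geometric mean about 0.32
```

Because every score of the group enters the mean, a single very small p-value cannot dominate the ranking on its own, which matches the intent that the value orders groups rather than serving as a decision threshold.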
Visualization of results in a very simple text format and a novel fuzzy heat map view
Results and discussion
We examined the newly developed biological module-centric tools (see Additional data file 8 for a graphical tutorial of using the tools) on two published microarray datasets. It is important to mention that, to avoid potential bias, the datasets of the case studies are different from those used during algorithm development. For the first microarray dataset, G1 response genes were identified by microarray experiments after introducing G1 cyclin Cln3p to cln- yeast cells that were previously arrested with cdc34-2. For comparison, the dataset was analyzed by tools with very different algorithms, that is, DAVID Tools, GoMiner, Ontologizer, GOStat, ermineJ, ADGO and GENECODIS. All tools are able to highlight the major terms (for example, cell cycle, DNA repair, DNA replication, budding, and so on), consistent with previously published observations. However, the DAVID methods are more sensitive to a couple of additional important terms (for example, cyclin-dependent kinase activity, mating, and so on) that were not found among the top terms in the output from the other tools. For more detailed results, comparisons and discussion, see Additional data file 14.
The following detailed discussion is mainly focused on the second microarray dataset, of which the gene list is available as demo list 2 on our tool entry page. In this example dataset, the authors treated freshly isolated peripheral blood mononuclear cells (PBMCs) with an HIV envelope protein (gp120) and further measured genome-wide gene expression changes using Affymetrix U95A chips. This study provides a global view of the complex interaction between viral and cellular factors, which is an essential mechanism for HIV replication in resting or suboptimally activated PBMCs. A functionally significant annotation of approximately 400 genes (Additional data file 1) derived from the microarray experiment was classified by the authors into five major functional categories: cytokines, chemokines, transcription factors, kinases, and membrane fusion. While the cytokine and chemokine categories were systematically highlighted by EASE (a GO enrichment analysis based on the Fisher Exact Test), other annotation categories reported in the publication were discovered through semi-manual analysis by bioinformatics experts with an advanced level of knowledge of both biology and computer tools.
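For readers unfamiliar with this kind of enrichment test, the Python sketch below computes a one-tailed Fisher exact probability for a single term and a more conservative variant. The variant follows the commonly described EASE adjustment, in which one gene is removed from the list hits before the test; treat that as an assumption about the score rather than a statement of the tool's code. Function names, argument names and the example counts are hypothetical.

```python
from math import comb


def fisher_upper_tail(list_hits, list_total, pop_hits, pop_total):
    """P(X >= list_hits) under the hypergeometric distribution.

    list_hits: genes in the list annotated with the term
    list_total: genes in the list
    pop_hits: genes in the background annotated with the term
    pop_total: genes in the background
    """
    max_hits = min(list_total, pop_hits)
    denom = comb(pop_total, list_total)
    tail = sum(
        comb(pop_hits, k) * comb(pop_total - pop_hits, list_total - k)
        for k in range(list_hits, max_hits + 1)
    )
    return tail / denom


def ease_like_score(list_hits, list_total, pop_hits, pop_total):
    """Conservative variant: the same test with one list hit removed.

    Assumption: mirrors the usual description of the EASE adjustment.
    """
    return fisher_upper_tail(max(list_hits - 1, 0), list_total,
                             pop_hits, pop_total)


# Hypothetical counts: 3 of 3 list genes carry a term found in 3 of 10 genes.
p_fisher = fisher_upper_tail(3, 3, 3, 10)
p_ease = ease_like_score(3, 3, 3, 10)
```

The adjusted value is never smaller than the plain Fisher probability, so terms supported by only a handful of genes are penalized most.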
The same data re-analyzed by typical functional annotation tools
The top 20 enriched terms for demo list 2 by various traditional functional annotation tools
Response to pathogenic bacteria
Response to stimulus
Induction of positive chemotaxis
Inflammatory response/extracellular region
Clathrin coat of coated pit
Response to pest, pathogen or parasite
Positive regulation of vascular endothelium
Viral genome replication
Response to stress
Cell surface receptor linked signal transduction
Cell-cell signaling/extracellular space
Response to external biotic stimulus
Positive regulation of protein metabolic process
Soluble fraction/chemokine activity
Clathrin vesicle coat
Response to wounding
Cytoskeleton organization and biogenesis
Vascular endothelial growth factor receptor
Establishment of spindle localization
Clathrin coated vesicle membrane
Negative regulation of biological process
Extracellular matrix binding
Sensory perception/chemokine activity
Negative regulation of physiological process
Viral genome replication
Inflammatory response/chemokine activity
Establishment of mitotic spindle localization
Response to other organism
Cytoplasmic vesicle membrane
Sensory perception/extracellular space
Regulation of cellular process
Cytoplasmic vesicle membrane
Regulation of biological process
RNA polymerase II transcription factor activity
Negative regulation of cellular process
Regulation of biological process
G-protein coupled receptor protein signaling pathway/extracellular space
Establishment of cellular localization
Inflammatory response/extracellular space
Viral infectious cycle
Extracellular space/chemokine activity
Positive regulation of protein metabolism
Calpain inhibitor activity
Anatomical structure formation
Ammonia ligase activity
G-protein coupled receptor protein signaling pathway/chemokine activity
Regulation of protein-nucleus import
Regulation of isotype switching
Endothelin-converting enzyme 1 activity
Immune cell migration
Negative regulation of cellular physiological process
U-plasminogen activator receptor activity
Cell-cell signaling/chemokine activity
Nitrogen compound biosynthetic process
Cell proliferation/extracellular space
Extracellular region/chemokine activity
Cytokine biosynthetic process
Response to pathogenic bacteria
G-protein coupled receptor protein signaling pathway/soluble fraction
Immune system process
Hyaluronic acid binding
Sensory perception/extracellular region
Total 380 terms (p < 0.05)
Total 157 terms (p < 0.05)
Total 119 terms (p < 0.05)
Total 31 terms (p < 0.05)
Total 160 terms (p < 0.05)
Total 67 terms (p < 0.05)
Even though the results from the tools all point in the same biological direction, there are four obvious problems. First, redundant/similar/hierarchical terms appear in different (significance) positions within the reports (for example, response to stress, response to wounding, response to pathogenic bacteria, response to other organisms, response to external biotic stimulus, inflammatory response, and so on), which makes it difficult for the user to gain or maintain a clear focus of the whole biological picture. It is not easy for users to comprehensively pool all genes related to the same key biology without manually summarizing all related redundant terms. Second, the redundant/similar/hierarchical terms could largely dilute the focus on other key biology that has few or no redundancies (for example, only one term is for establishment of cellular localization). If several redundant/similar/hierarchical terms are represented at the top of the list, less redundant terms may be pushed down the list, possibly decreasing the chance of discovery; for example, a transcription regulation term, reported in the original publication, was not listed in the top 20 by any of the tools. Third, in contrast, due to differences in the annotation levels of different sources, redundant/similar/hierarchical terms may themselves be diluted. While a single term alone may not be at the top of the list, in combination with redundant/similar/hierarchical terms, the biological function may be very significant. Fourth, current tools do not emphasize the inter-relationships between key biological terms (for example, relationships between chemokine/cytokine and signal transduction).
In conclusion, the recent improvement of functional annotation tools provides a powerful means for users to systematically identify key biological functions associated with a gene list. However, due to the weaknesses discussed above, refinement of current gene-term enrichment algorithms and improvement of software usability alone may not address all the issues. Therefore, the development of novel alternative algorithms as a complement is still very necessary.
The same data analyzed by the DAVID Gene Functional Classification Tool
Sixteen total gene functional groups identified by the Functional Classification Tool
Gene functional group no.
Group enrichment score
Signal transduction/membrane receptors
RNA processing/splicing factors
Organic acid transport
DNA metabolism/chromosome organization
Cellular macromolecule catabolism
RAS small GTPase
The DAVID Gene Functional Classification Tool allows users to further explore a given biological module/gene group in depth. For example, the 'enriched terms' button '2-D View' is able to list all related terms and genes for the kinase group. Thus, a user who is not familiar with kinases can explore the terms of kinase activity, transferase activity, ATP-binding, nucleotide binding, protein metabolism, tyrosine specificity, serine/threonine specificity, regulation of G protein signaling, signal transduction, and so on in one view at the same time (Figure 6). Therefore, we can quickly learn the biology for the kinase group with the above related terms in a single view and also identify the fine differences among them. For example, there are two G-protein-coupled receptor kinases, three protein tyrosine kinases and six kinases involved in cell surface receptor-linked signal transduction among the 23 kinases within the group (Figure 6). The fine details may be very important for pinpointing the key biology associated with a study.
Furthermore, the DAVID Gene Functional Classification Tool allows one gene to be present in more than one functional group, which closely reflects the nature of biology whereby one gene could play multiple roles in different processes. This fuzziness feature improves the chances of discovery by maximally preserving all of the true relationships. For example, general transcription factor II H (GTF2H4/TFIIH, 41371_at) was assigned to group 2 (transcription regulation group) and group 5 (DNA damage/repair group) (Additional data file 2). Some studies suggest TFIIH increases polymerase processivity in HIV infection. Currently, there are few reports about the TFIIH DNA repair mechanism being involved in HIV infection, although this DNA repair mechanism could be essential in HIV integration. Hence, the fuzzy capability allows users not only to focus on the TFIIH transcription regulation role but also to consider the possible role in HIV integration through the DNA repair mechanism. As another example, ring finger protein 40 (RNF40) is in group 2 (transcription regulation group) and group 10 (chromosome assembly) (Additional data file 2). Although the biological significance of the ring finger protein in HIV infection is still largely unclear, the annotation result points out two potential areas for further exploration: first, the ring finger protein regulates the tumor necrosis factor-related transcriptional pathway, which is critical to many aspects of HIV transcription; and second, it plays some role in DNA packaging and chromosome integration. Thus, compared to exclusive methods, the fuzziness capability is a powerful feature to maximally preserve biological patterns and to discover fine differences for a given gene.
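Given a group-to-genes listing such as the tool's text output, genes with multiple memberships can be found by inverting the mapping. The Python sketch below is illustrative only: the function name is hypothetical, the group numbers for GTF2H4 and RNF40 follow the examples in the text, and the remaining gene names are placeholders rather than real group members.

```python
from collections import defaultdict


def genes_in_multiple_groups(groups):
    """Map each gene that appears in more than one group to its group ids.

    groups: dict of group id -> iterable of gene symbols.
    """
    membership = defaultdict(set)
    for group_id, genes in groups.items():
        for gene in genes:
            membership[gene].add(group_id)
    return {gene: sorted(ids) for gene, ids in membership.items()
            if len(ids) > 1}


# Placeholder members (PLACEHOLDER_*) are hypothetical, not real output.
example_groups = {
    2: ["GTF2H4", "RNF40", "PLACEHOLDER_TF"],
    5: ["GTF2H4", "PLACEHOLDER_REPAIR"],
    10: ["RNF40", "PLACEHOLDER_HISTONE"],
}
shared = genes_in_multiple_groups(example_groups)
```

Such a listing gives a quick starting point for following up genes whose second role might otherwise be overlooked.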
The sensitivity of the Functional Classification Tool can vary with different datasets and stringency criteria. If the running criteria are not suitable to a particular dataset, the output can be distorted. In such cases, some exploration of different running stringencies is necessary in order to obtain the optimal results to meet the expectation of the study.
The same data analyzed by the Functional Annotation Clustering Tool
The top 20 annotation clusters identified by the DAVID Functional Annotation Clustering Tool
Representative annotation terms
Negative regulation of biological process
Viral genome replication
Regulation of biological process
Regulation of cell cycle
Positive regulation of biological process
Biological process unknown
Physiological interaction between organisms
Antimicrobial humoral response
Transcription cofactor activity
Integral to plasma membrane
Coated vesicle membrane
DNA repair/DNA metabolism
The DAVID Gene Functional Classification Tool is able to organize and condense large gene lists into biologically meaningful modules. It changes functional annotation analysis from term- or gene-centric to biological module-centric. This method takes into account the redundant and network nature of biological annotation contents in order to concentrate on the larger biological picture rather than on individual terms or genes. The DAVID Gene Functional Classification Tool is complementary to other functional annotation tools.
Additional data files
The following additional data are available with the online version of this paper. Additional data file 1 lists the genes used in the paper. Additional data file 2 provides the complete output in text format for demo list 2 analyzed by the DAVID Gene Functional Classification Tool. Additional data file 3 provides the complete output in text format for demo list 2 analyzed by the DAVID Functional Annotation Clustering Tool. Additional data file 4 is a figure showing the fuzzy heat map visualization of biological modules. Additional data file 5 is a comparison of the novel fuzzy heuristic partitioning method with other clustering methods. Additional data file 6 is an example of the group enrichment score calculation used for the Functional Annotation Clustering Tool. Additional data file 7 describes the fourteen annotation categories used in the DAVID Functional Classification Tool. Additional data file 8 provides graphical instruction and a tutorial on how to use the DAVID Functional Classification Tool and the DAVID Functional Annotation Clustering Tool. Additional data file 9 gives the output examples for the related gene search and related term search. Additional data file 10 is the default setting for minimum overlapped annotation in kappa score calculation. Additional data file 11 describes the effect of kappa statistics on biased annotation data. Additional data file 12 provides a hypothetical example to measure the relationships of gene-gene pairs by kappa statistics with annotations organized in a 'flat' matrix. Additional data file 13 provides a hypothetical example to demonstrate the general procedure of our agglomeration procedure. Additional data file 14 includes detailed results and comparisons of the new DAVID clustering tools with regard to yeast cell cycle G1 genes. Additional data file 15 gives the annotation results of demo list 2 by GOMiner, GOStat, DAVID chart, topGO, ermineJ ORA, Ontologizer (three methods), ADGO and GENECODIS.
PBMC: peripheral blood mononuclear cell.
The authors are grateful to the reviewers for their constructive comments. We would like to thank David Liu and David Bryant in the ABCC group for database and web server support. We thank Yongjian Guo in BSSP/OTIS/NIAID for his comments on the manuscript. We would like to thank Wei Gao, Melaku Gedil, Ping Ren, and Jun Yang in the LIB group for helpful works and discussions, and Doug Powell in the CS&S group for helpful statistical discussions. We also thank Bill Wilton and Mike Tartakovsky for information technology and network support. This research was supported in whole by the National Institute of Allergy and Infectious Disease. This project has been funded in whole with federal funds from the National Cancer Institute, National Institutes of Health, under contract N01-CO-12400. The content of this publication does not necessarily reflect the views or policies of the Department of Health and Human Services, nor does mention of trade names, commercial products, or organizations imply endorsement by the US Government. Funding to pay the Open Access publication charges for this article was provided by the same source as above.
- Dennis G, Sherman BT, Hosack DA, Yang J, Gao W, Lane HC, Lempicki RA: DAVID: Database for Annotation, Visualization, and Integrated Discovery. Genome Biol. 2003, 4: P3.
- Hosack DA, Dennis G, Sherman BT, Lane HC, Lempicki RA: Identifying biological themes within lists of genes with EASE. Genome Biol. 2003, 4: R70.
- Beissbarth T, Speed TP: GOstat: find statistically overrepresented Gene Ontologies within a group of genes. Bioinformatics. 2004, 20: 1464-1465.
- Al-Shahrour F, Diaz-Uriarte R, Dopazo J: FatiGO: a web tool for finding significant associations of Gene Ontology terms with groups of genes. Bioinformatics. 2004, 20: 578-580.
- Martin D, Brun C, Remy E, Mouren P, Thieffry D, Jacq B: GOToolBox: functional analysis of gene datasets based on Gene Ontology. Genome Biol. 2004, 5: R101.
- Al-Shahrour F, Minguez P, Vaquerizas JM, Conde L, Dopazo J: BABELOMICS: a suite of web tools for functional annotation and analysis of groups of genes in high-throughput experiments. Nucleic Acids Res. 2005, 33: W460-464.
- Castillo-Davis CI, Hartl DL: GeneMerge - post-genomic analysis, data mining, and hypothesis testing. Bioinformatics. 2003, 19: 891-892.
- Zhong S, Storch KF, Lipan O, Kao MC, Weitz CJ, Wong WH: GoSurfer: a graphical interactive tool for comparative analysis of large gene sets in gene ontology™ space. Appl Bioinformatics. 2004, 3: 261-264.
- Zhang B, Schmoyer D, Kirov S, Snoddy J: GOTree Machine (GOTM): a web-based platform for interpreting sets of interesting genes using Gene Ontology hierarchies. BMC Bioinformatics. 2004, 5: 16.
- Shah NH, Fedoroff NV: CLENCH: a program for calculating Cluster ENriCHment using the Gene Ontology. Bioinformatics. 2004, 20: 1196-1197.
- Draghici S, Khatri P, Bhavsar P, Shah A, Krawetz SA, Tainsky MA: Onto-Tools, the toolkit of the modern biologist: Onto-Express, Onto-Compare, Onto-Design and Onto-Translate. Nucleic Acids Res. 2003, 31: 3775-3781.
- Khatri P, Bhavsar P, Bawa G, Draghici S: Onto-Tools: an ensemble of web-accessible, ontology-based tools for the functional design and interpretation of high-throughput gene expression experiments. Nucleic Acids Res. 2004, 32: W449-456.
- Sharan R, Maron-Katz A, Shamir R: CLICK and EXPANDER: a system for clustering and visualizing gene expression data. Bioinformatics. 2003, 19: 1787-1799.
- Liu H, Hu ZZ, Wu CH: DynGO: a tool for visualizing and mining of Gene Ontology and its associations. BMC Bioinformatics. 2005, 6: 201.
- Lee JS, Katari G, Sachidanandam R: GObar: a gene ontology based analysis and visualization tool for gene sets. BMC Bioinformatics. 2005, 6: 189.
- Zeeberg BR, Qin H, Narasimhan S, Sunshine M, Cao H, Kane DW, Reimers M, Stephens RM, Bryant D, Burt SK, et al: High-throughput GoMiner, an 'industrial-strength' integrative gene ontology tool for interpretation of multiple-microarray experiments, with application to studies of common variable immune deficiency (CVID). BMC Bioinformatics. 2005, 6: 168.
- Khatri P, Draghici S: Ontological analysis of gene expression data: current tools, limitations, and open problems. Bioinformatics. 2005, 21: 3587-3595.
- Maere S, Heymans K, Kuiper M: BiNGO: a Cytoscape plugin to assess overrepresentation of gene ontology categories in biological networks. Bioinformatics. 2005, 21: 3448-3449.
- Berriz GF, King OD, Bryant B, Sander C, Roth FP: Characterizing gene sets with FuncAssociate. Bioinformatics. 2003, 19: 2502-2504.
- Ben-Shaul Y, Bergman H, Soreq H: Identifying subtle interrelated changes in functional gene categories using continuous measures of gene expression. Bioinformatics. 2005, 21: 1129-1137.
- Zhong S, Tian L, Li C, Storch KF, Wong WH: Comparative analysis of gene sets in the Gene Ontology space under the multiple hypothesis testing framework. Proc IEEE Comput Syst Bioinform Conf. 2004, 425-435.
- Doniger SW, Salomonis N, Dahlquist KD, Vranizan K, Lawlor SC, Conklin BR: MAPPFinder: using Gene Ontology and GenMAPP to create a global gene-expression profile from microarray data. Genome Biol. 2003, 4: R7.
- Cheng J, Sun S, Tracy A, Hubbell E, Morris J, Valmeekam V, Kimbrough A, Cline MS, Liu G, Shigeta R, et al: NetAffx Gene Ontology Mining Tool: a visual approach for microarray data analysis. Bioinformatics. 2004, 20: 1462-1463.
- Robinson PN, Wollstein A, Bohme U, Beattie B: Ontologizing gene-expression microarray data: characterizing clusters with Gene Ontology. Bioinformatics. 2004, 20: 979-981.
- Bluthgen N, Brand K, Cajavec B, Swat M, Herzel H, Beule D: Biological profiling of gene groups utilizing Gene Ontology. Genome Inform. 2005, 16: 106-115.
- DAVID Home Page. [http://david.abcc.ncifcrf.gov]
- DAVID Gene Functional Classification Tool. [http://david.abcc.ncifcrf.gov/gene2gene.jsp]
- DAVID Functional Annotation Clustering. [http://david.abcc.ncifcrf.gov/summary.jsp]
- DAVID Knowledgebase. [http://david.abcc.ncifcrf.gov/content.jsp?file=/knowledgebase/DAVID_knowledgebase.html]
- Cohen J: A coefficient of agreement for nominal scales. Educ Psychol Meas. 1960, 20: 37-46.
- Byrt T, Bishop J, Carlin JB: Bias, prevalence and kappa. J Clin Epidemiol. 1993, 46: 423-429.
- Alexa A, Rahnenfuhrer J, Lengauer T: Improved scoring of functional groups from gene expression data by decorrelating GO graph structure. Bioinformatics. 2006, 22: 1600-1607.
- Grossmann S, Bauer S, Robinson PN, Vingron M: An improved statistic for detecting over-represented Gene Ontology annotations in gene sets. Res Comput Mol Biol. 2006, 3909: 85-98.
- Bader GD, Betel D, Hogue CW: BIND: the Biomolecular Interaction Network Database. Nucleic Acids Res. 2003, 31: 248-250.
- Spellman PT, Sherlock G, Zhang MQ, Iyer VR, Anders K, Eisen MB, Brown PO, Botstein D, Futcher B: Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Mol Biol Cell. 1998, 9: 3273-3297.
- Lee HK, Braynen W, Keshav K, Pavlidis P: ErmineJ: tool for functional analysis of gene expression data sets. BMC Bioinformatics. 2005, 6: 269.
- Nam D, Kim SB, Kim SK, Yang S, Kim SY, Chu IS: ADGO: analysis of differentially expressed gene sets using composite GO annotation. Bioinformatics. 2006, 22: 2249-2253.
- Carmona-Saez P, Chagoyen M, Tirado F, Carazo JM, Pascual-Montano A: GENECODIS: a web-based tool for finding significant concurrent annotations in gene lists. Genome Biol. 2007, 8: R3.
- Cicala C, Arthos J, Selig SM, Dennis G, Hosack DA, Van Ryk D, Spangler ML, Steenbeke TD, Khazanie P, Gupta N, et al: HIV envelope induces a cascade of cell signals in non-proliferating target cells that favor virus replication. Proc Natl Acad Sci USA. 2002, 99: 9380-9385.
- Affymetrix. [http://www.affymetrix.com/products/arrays/specific/hgu95.affx]
- Clayton F, Kapetanovic S, Kotler DP: Enteric microtubule depolymerization in HIV infection: a possible cause of HIV-associated enteropathy. AIDS. 2001, 15: 123-124.
- Isel C, Karn J: Direct evidence that HIV-1 Tat stimulates RNA polymerase II carboxyl-terminal domain hyperphosphorylation during transcriptional elongation. J Mol Biol. 1999, 290: 929-941.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.