ArrayPlex: distributed, interactive and programmatic access to genome sequence, annotation, ontology, and analytical toolsets
© Killion and Iyer; licensee BioMed Central Ltd. 2008
Received: 22 September 2008
Accepted: 12 November 2008
Published: 12 November 2008
ArrayPlex is a software package that centrally provides a large number of flexible toolsets useful for functional genomics, including microarray data storage, quality assessments, data visualization, gene annotation retrieval, statistical tests, genomic sequence retrieval and motif analysis. It uses a client-server architecture based on open source components, provides graphical, command-line, and programmatic access to all needed resources, and is extensible by virtue of a documented application programming interface. ArrayPlex is available at http://sourceforge.net/projects/arrayplex/.
Although centralized storage of microarray data is provided by a number of databases, such as ArrayExpress, Gene Expression Omnibus, Stanford Microarray Database/Longhorn Array Database, Bioarray Software Environment, and TM4 [1–6], many common downstream analysis procedures remain challenging, especially when reference to large-scale data in external databases is required. Data analysis typically involves association of gene names with systematic and custom annotations, gene ontology information, and genomic DNA sequence, followed by a battery of analyses such as enrichment of functional annotations in gene sets, statistical tests for significance, analysis of cis-regulatory motifs and regulator-target relationships. Resources for these tasks are difficult to manually assemble while ensuring they remain error free. Amplifying the challenge is the fact that such analyses are not executed just once, but usually consist of a series of iterations with changing parameters. In order to reduce inefficiency and minimize errors, new algorithms for newly devised data analyses must ideally interface with pre-existing code and algorithms that already satisfactorily address other domains of data analysis.
Table entries (Hs, human; Mm, mouse; Sc, yeast): Gene Ontology descriptors (Hs, Mm, Sc); Hs Gene Ontology assignments; Mm Gene Ontology assignments; Sc genome sequence; Sc Gene Ontology assignments; genomic sequence matching.
Our goal was to develop an open-source, robust, and easy-to-maintain network-centric system that enables the construction of reusable pipelines of complex data analysis procedures. We designed the system to support three levels of interaction: a graphical user interface for interactive data manipulation, a set of command-line analytical modules for script-driven analysis, and a documented Java-based application programming interface (API) for programmatic access. Below we describe the system architecture of the ArrayPlex environment and the genomic resources included within it. Additionally, we demonstrate how ArrayPlex has been indispensable in the large-scale analysis of a transcriptional regulatory network.
Core technology, design, network operation
ArrayPlex was implemented exclusively with open-source technologies. Components were selected to enable creation of an encapsulated system; virtually all of the open-source, distributable software components required for operation are bundled within the installation package.
The ArrayPlex client is a graphical user interface that contains dozens of data management, analysis, and visualization features. It is compatible with Mac OS X, Windows XP, Windows Vista and most distributions of Linux. It communicates by standard network protocols with the ArrayPlex server and, thus, can operate on any computer with network connectivity to that server. Because it uses the same protocol as a web browser, the ArrayPlex client requires no special changes to client firewall configurations or network settings. The ArrayPlex client also requires no local installation: the application resides on the ArrayPlex server and is remotely retrieved and launched through Java Web Start. This ensures that with each execution the end-user is running the latest version of the ArrayPlex client, and it allows a large user group to share a customizable and expanding graphical user interface without the need for distributed upgrades or reinstallations with each cycle of improvement.
In addition to the graphical user interface, ArrayPlex has a set of command-line client-side modules packaged as standard Java Archive (JAR) files. These modules contain documented analytical routines that communicate with the ArrayPlex server exactly as the ArrayPlex client does. This feature allows the distributed network design of ArrayPlex to be used by command-line applications and script-driven analyses just as easily as by the graphical interface.
Bundled genomic resources
All resources are processed from their heterogeneous downloaded forms to a structured query language (SQL) format that is loaded into the ArrayPlex relational database schema. The transformation removes the organism-specific nature of the data and allows the ArrayPlex programmatic API to be designed such that reusable code modules can be implemented independent of the original source of the annotations. Gene Ontology (GO) assignments are a practical example. This information is species-specific and details the mapping of universal GO terms to specific genes in a given organism. The downloaded forms of these assignments for human and mouse differ from those for yeast in format and content, because the assignments are curated and managed by independent research institutions: the European Bioinformatics Institute for human and mouse, and the Saccharomyces Genome Database for yeast. Transforming this information to a single format and storing it in a relational schema enabled a single set of ArrayPlex database source-code to be written to retrieve and use it. Programmers using the ArrayPlex programmatic API can therefore write data retrieval and analysis routines that are independent of organism-specific caveats and institution-specific file formats. File format changes will be handled through alteration of the ArrayPlex parsing routines and released upgrades. These internal adaptations will be transparent to programmers using the API, thus shielding them from future file format evolution.
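The normalization idea described above can be sketched in a few lines. This is a minimal illustration in Python, not ArrayPlex's Java code: the two input layouts, the table name `go_assignment`, the gene names, the placeholder term identifier, and the helper functions are all hypothetical, and an in-memory SQLite database stands in for PostgreSQL only to keep the example self-contained.

```python
import sqlite3


def parse_tabular(lines, organism):
    """Hypothetical layout A: one 'gene<TAB>go_id' pair per line."""
    for line in lines:
        gene, go_id = line.rstrip("\n").split("\t")
        yield organism, gene, go_id


def parse_grouped(lines, organism):
    """Hypothetical layout B: 'go_id|gene1,gene2' with several genes per line."""
    for line in lines:
        go_id, genes = line.strip().split("|")
        for gene in genes.split(","):
            yield organism, gene, go_id


def genes_for_term(db, organism, go_id):
    """One organism-independent query over the normalized table."""
    rows = db.execute(
        "SELECT gene FROM go_assignment "
        "WHERE organism = ? AND go_id = ? ORDER BY gene",
        (organism, go_id),
    )
    return [row[0] for row in rows]


# Load both source layouts into the same relational table.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE go_assignment (organism TEXT, gene TEXT, go_id TEXT)")
insert = "INSERT INTO go_assignment VALUES (?, ?, ?)"
db.executemany(insert, parse_tabular(["geneA\tGO:0000001\n", "geneB\tGO:0000001\n"], "Sc"))
db.executemany(insert, parse_grouped(["GO:0000001|geneX,geneY\n"], "Hs"))

yeast_genes = genes_for_term(db, "Sc", "GO:0000001")
human_genes = genes_for_term(db, "Hs", "GO:0000001")
print(yeast_genes, human_genes)
```

Only the parsers know about the source layouts; if a provider changes its file format, only the corresponding parser needs to change, while `genes_for_term` and any code built on it stay the same.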
In addition to GO and gene annotations, complete genome sequence is downloaded for each of the supported model organisms. This genome sequence is in FASTA format but is converted to National Center for Biotechnology Information (NCBI) BLAST-database format by the ArrayPlex installation program using NCBI-provided utilities. This transformation provides two advantages. First, it allows the ArrayPlex programmatic API to include complete BLAST functionality as a part of its catalogue of analytical operations. Second, and more importantly, it allows the ArrayPlex environment to take advantage of all the pre-existing NCBI-bundled toolsets for genome sequence retrieval.
Genome resources are most valuable when synchronized with the most recent versions available. GO and other gene annotation assignments are modified and extended frequently as they are curated. In order to keep analysis routines and the resulting biological interpretations up to date, ArrayPlex is designed not only to download and store annotations upon system installation, but also to check for updated information, retrieve it, and update the resources managed within the relational schema. This functionality is implemented, and documented, as a job for the standard system scheduler that is part of the server operating system.
Integrated open-source sequence analysis toolsets
In addition to the many genome resources hosted on the ArrayPlex server, a large number of open-source analytical toolsets are integrated into the environment (Table 2). This set of tools includes NCBI BLAST, cluster, CLUSTALW, AVID/rVista, and several sequence motif discovery applications: AlignACE, MDscan, and MEME [13–20]. As detailed in Table 2, the majority of these applications are downloaded, compiled from source-code, and installed by the ArrayPlex installation program. Licensing restrictions prevented this for a few of the integrated toolsets; complete documentation on how to retrieve and install these additional utilities is included with the ArrayPlex installation. The inclusion of these toolsets transformed ArrayPlex from solely an information warehouse into a server with extended analytical capacity. All of these analytical features are accessible by way of the graphical ArrayPlex client application, the command-line modules, and the programmatic API. Such access facilitates centralized and coordinated high-throughput data and sequence operations such as sequence retrieval, data manipulation and transformation, multi-genome BLAST, sequence motif search and discovery, hierarchical clustering, and sequence alignment. For example, it is possible to retrieve genomic sequence upstream of a set of genes of interest and carry out sequence motif discovery, all based on a few user-defined parameters. All of these utilities are executed on the ArrayPlex server, with only the results being transmitted to the client computer. Thus, client computers that might not be able to compile or run these large-scale functional analysis programs can still access all their power in real time, and programmatically if so desired.
Analytical accessibility and customization
Figure 2 depicts the overall interactivity and relationship of the subcomponents of the ArrayPlex environment. Both the ArrayPlex client and the command-line modules communicate over a network connection with the ArrayPlex server using the hypertext transfer protocol (HTTP). Many individual clients and/or command-line modules can interact with a single server simultaneously. On several occasions we have run more than a dozen command-line modules at once against a single ArrayPlex server, requesting annotation, ontology, and genome sequence as well as analytical toolset executions. The ArrayPlex server was easily able to manage these parallel requests, some of which took days to weeks to complete.
This design and capacity are notable in two ways. First, the user invoking the client API routines needs no knowledge that the programmatic request will be fulfilled over a network on a remote server. The API is designed such that the complication of network implementation is hidden from the user. For example, the operation executeBlastAll(organism, evalue, sequence), which is part of the SequenceResources client API, does not reveal to the programmatic user that, during its execution, the parameters organism, evalue, and sequence are encoded into an object and sent across the network to the ArrayPlex server, where the NCBI-BLAST utility blastall is actually executed. The result of that blastall execution is then formatted into a programmatic object on the server and returned across the network to the client computer. To the programmatic user of the client API no network operation is evident; the BlastResult object is the result of the operation, and their programmatic routines move to the next step just as if everything had executed and completed on their local computer.
Second, the information that is exchanged with the ArrayPlex server is in the form of documented API objects. This increases the efficiency with which a programmatic user can utilize the ArrayPlex API compared to other methods that launch processes remotely and retrieve results locally. Most methods of remote task invocation require the user to parse a stream of resulting information that is returned from the server, and the task of parsing this information and determining the actual results is error-prone. In the example above, the BlastResult object that is returned from the ArrayPlex server is a programmatic object just like any other in the application environment. By referring to the provided documentation, the programmatic user can find out that the BlastResult object is composed of a set of BlastHit objects, each of which has parameters describing the genomic loci where BLAST found matching sequences.
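The pattern described above, a local-looking call that hides a network round trip and returns typed objects rather than raw text, can be sketched as follows. This is an illustration in Python, not the ArrayPlex Java API: only the names SequenceResources, executeBlastAll, BlastResult, and BlastHit come from the text, while the field names, the JSON encoding, and the `fake_server` stand-in (used in place of a real HTTP exchange) are assumptions made for the example.

```python
import json
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class BlastHit:
    """One matching genomic locus (field names are illustrative)."""
    chromosome: str
    start: int
    end: int
    evalue: float


@dataclass(frozen=True)
class BlastResult:
    """Typed result handed to the caller instead of raw program output."""
    hits: List[BlastHit]


class SequenceResources:
    """Client-side facade; the caller never handles the transport directly."""

    def __init__(self, transport: Callable[[str], str]):
        # transport takes an encoded request and returns an encoded response.
        # In a real deployment this would be an HTTP round trip to the server.
        self._transport = transport

    def execute_blast_all(self, organism: str, evalue: float, sequence: str) -> BlastResult:
        request = json.dumps({"op": "blastall", "organism": organism,
                              "evalue": evalue, "sequence": sequence})
        response = json.loads(self._transport(request))
        return BlastResult([BlastHit(**hit) for hit in response["hits"]])


def fake_server(encoded_request: str) -> str:
    """Stand-in for the server: decodes the request and returns a canned hit."""
    request = json.loads(encoded_request)
    hit = {"chromosome": "chrIV", "start": 100,
           "end": 100 + len(request["sequence"]) - 1, "evalue": 1e-20}
    return json.dumps({"hits": [hit]})


api = SequenceResources(fake_server)
result = api.execute_blast_all("Sc", 1e-5, "ACGTACGTAC")
for hit in result.hits:
    print(hit.chromosome, hit.start, hit.end, hit.evalue)
```

The calling code iterates over `result.hits` without parsing any text, which is the benefit the object-based exchange is meant to provide.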
The entire ArrayPlex environment is designed to allow customization. The ArrayPlex client can incorporate internationalization and localization of language elements through modification of a single resource bundle containing nearly all labels that appear throughout its interactive graphical interface. Sections of the ArrayPlex client can be removed; newly designed sections can be accommodated.
Documentation and guidance
Results and discussion
Analytical proving ground
We have tested the entire ArrayPlex system - server resources, client, and all command-line modules - over the course of more than a year in a real-world research context. We recently described the reconstruction of a genome-wide transcriptional regulatory network based on integrating data from more than 600 individual microarray experiments covering more than 260 transcription factors. ArrayPlex was the central hub of all computational activities for this project throughout each phase of data transformation and analysis.
We systematically screened hundreds of independent microarray experiments for channel-specific signal bias. We used a sophisticated error model implementation to identify statistically significant target genes based on replicate microarray data. With target genes identified for each of the 260 transcription factors profiled, we carried out regulatory epistasis analysis, expansive GO enrichment analysis, characterized sequence motif search, and novel sequence motif discovery. Additionally, ArrayPlex format-conversion capabilities were used to elucidate significant novel transcription factor-to-factor regulatory insights.
Table entries: Genome annotation and ontology retrieval; User dataset retrieval, transformation, and manipulation; Genome sequence extraction, search, discovery, manipulation; Example routines in replicate combination; Example routines in network modelling; Example routines in ontological and sequence analysis.
High-throughput microarray data quality analysis
One important step in most DNA microarray analyses is data quality evaluation. For example, it is important to check for any signal intensity bias and to understand the effect of data normalization on individual microarray experiments and on entire batches. In addition, the selection of significant microarray values for an individual experiment or a set of experiments involves filtering candidate spots on a variety of spot metrics. Signal-to-noise ratios, spot consistency regression correlations, and background-subtracted single-channel intensity values are typical metrics used to separate statistically meaningful spot values from those of dubious quality.
To address these issues we developed an entire section of the ArrayPlex client dedicated to processing, statistical analysis, and visualization of large batches of input data. The GenePix Results File Operations section of the ArrayPlex client has the capacity to batch-process a large number of GenePix Results (GPR) files for quality control evaluation. First, the GenePix Results File Charting section can read sets of GPR files into a batch queue for graphical analysis, such as generating MA plots (log-ratio M plotted against spot fluorescent intensity A), which can reveal a bias in the relationship between absolute signal intensity and spot ratio. In addition to MA plots, histograms and scatter-plots can be mass-produced for any of the dozens of GPR spot metrics, enabling detection of biased signal-to-ratio relationships, non-normal log-ratio distributions, and substandard signal-to-noise distributions with the selection of just a few parameters and the browsing of automatically saved images.
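For readers unfamiliar with MA plots, the coordinates are conventionally computed per spot from the two channel intensities R and G as M = log2(R/G) and A = (1/2) log2(R·G). The sketch below, in Python, uses this standard definition; it is not taken from the ArrayPlex source, and the function name and the choice to skip spots with non-positive intensities are assumptions made for the example.

```python
import math


def ma_values(red, green):
    """Return (M, A) pairs for paired background-subtracted intensities.

    Spots with a non-positive intensity in either channel are skipped,
    because the logarithm is undefined for them.
    """
    pairs = []
    for r, g in zip(red, green):
        if r <= 0 or g <= 0:
            continue
        m = math.log2(r / g)        # log-ratio
        a = 0.5 * math.log2(r * g)  # mean log-intensity
        pairs.append((m, a))
    return pairs


red = [1024.0, 512.0, 0.0, 2048.0]
green = [256.0, 512.0, 300.0, 4096.0]
points = ma_values(red, green)
print(points)
```

In an unbiased hybridization the M values scatter around zero across the whole range of A; a trend in M as A changes is the intensity-dependent bias that the plots are used to detect.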
The feature-set provided by the GenePix Results File Group Analysis section of the ArrayPlex client was invaluable in the earliest stages of transcription factor knock-out primary data aggregation and processing. Implementation of the error model required the systematic construction of several separate primary data matrices for the hundreds of individual microarray experiments that were the input to this stage of data processing. These included channel-specific foreground intensity, background intensity, and signal-to-noise matrices as well as spot-quality metrics such as regression correlation. ArrayPlex allowed us to explore and aggregate hundreds of individual microarray experiments as a single unit through importation as a single file group. Once the file group was created we were able to study the dataset-specific effects of various spot-metric thresholds on matrix construction and filter-mediated spot exclusion. These features impacted our understanding of both individual experiments as well as sets of microarray hybridizations performed together as batch groups. For each of the candidate statistical thresholds that were under consideration, we were able to understand the proportion of spots that would be excluded from individual experiments as well as gain visibility as to which batch groups were the most susceptible to filter-induced data exclusion. Filter toggling allowed us to clearly understand which individual filters in a logical group were having the most impact on spot exclusion. Once we arrived at a set of thresholds we deemed functionally appropriate, we then exported internally consistent data-matrices for each of the spot-metrics required by the error model. This section of the ArrayPlex client was so effective for these operations that it replaced our microarray database (Longhorn Array Database) for all data aggregation, filtering, and filtered dataset extraction portions of this research initiative.
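The kind of threshold exploration described above can be illustrated with a small sketch that reports, for each filter and for all filters combined, the fraction of spots that would be excluded. This is written in Python under stated assumptions: the metric names (`snr`, `rgn_r2`), the threshold values, and the function `exclusion_report` are hypothetical and are not the thresholds or code used in the study.

```python
def exclusion_report(spots, filters):
    """Fraction of spots excluded by each filter and by all filters together.

    spots   -- list of dicts mapping a spot-metric name to its value
    filters -- dict mapping a filter name to a predicate that returns
               True when a spot PASSES that filter
    """
    total = len(spots)
    report = {}
    for name, passes in filters.items():
        failed = sum(1 for spot in spots if not passes(spot))
        report[name] = failed / total
    failed_any = sum(
        1 for spot in spots
        if not all(passes(spot) for passes in filters.values())
    )
    report["ALL"] = failed_any / total
    return report


spots = [
    {"snr": 5.0, "rgn_r2": 0.9},
    {"snr": 1.5, "rgn_r2": 0.8},
    {"snr": 4.0, "rgn_r2": 0.3},
    {"snr": 1.0, "rgn_r2": 0.2},
]
filters = {
    "snr>=3": lambda spot: spot["snr"] >= 3.0,
    "r2>=0.6": lambda spot: spot["rgn_r2"] >= 0.6,
}
report = exclusion_report(spots, filters)
print(report)
```

Comparing the per-filter fractions with the combined fraction shows how much the filters overlap, which is the information needed to judge which filter in a group drives spot exclusion.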
Ontological enrichment and connectivity
A successful component of the reconstruction of the functional regulatory network was the mining of GO assignments among the target genes of a given transcription factor for statistically significant GO term enrichment. This functionality is built into both the ArrayPlex client and the command-line module TargetAnalysis.jar. The command-line module AnnotationResources.jar has the supplemental capacity to return a normalized single-format set of both ontology term declarations and organism-specific term assignments for each of the supported organisms.
The high-throughput capacity of the GO term enrichment toolsets provided by the command-line module TargetAnalysis.jar allowed us to calculate statistical enrichment for regulated target sets of each of the hundreds of transcription factors characterized. The process was simplified and easily repeatable through the module-provided ability to process input as a single file for all transcription factor target sets. Execution time was significantly reduced through parallel multi-threaded processing functionality provided as a user-selectable option. Configurable ArrayPlex server-mediated maintenance of GO terms and assignments ensured that up-to-date information was provided for each repetition of the analytical workflow.
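The text does not name the statistical test used for enrichment. A common choice for this kind of analysis is the one-sided hypergeometric test, and the sketch below, in Python, assumes it purely for illustration; the function name and the example numbers are hypothetical.

```python
from math import comb


def hypergeom_enrichment_p(population, annotated, sample, hits):
    """P(X >= hits) for X ~ Hypergeometric(population, annotated, sample).

    population -- number of genes in the genome (or on the array)
    annotated  -- genes in the population carrying the GO term
    sample     -- size of the target gene set
    hits       -- genes in the target set carrying the GO term
    """
    upper = min(annotated, sample)
    favourable = sum(
        comb(annotated, k) * comb(population - annotated, sample - k)
        for k in range(hits, upper + 1)
    )
    return favourable / comb(population, sample)


# Toy example: 20 genes, 5 carry the term, 4 targets, 3 of them annotated.
p_value = hypergeom_enrichment_p(population=20, annotated=5, sample=4, hits=3)
print(round(p_value, 4))
```

Because each transcription factor's target set is tested against many GO terms, a real workflow would also need a multiple-testing correction, which is omitted here.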
The GO term enrichment toolsets contained the novel capacity to evaluate both individual (RAW) and network-aggregate (COMPOSITE) terms for statistical enrichment. The capacity for GO term enrichment calculations to evaluate both individual and aggregate terms ensured that ontology terms that were proximally co-located in the GO hierarchy were mined for statistical significance. Individual terms that might have missed statistical thresholds for significance were evaluated for the ability to network-aggregate up to significant higher-order terms.
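One plausible reading of the RAW versus COMPOSITE distinction is that a composite term collects the genes assigned to the term itself and to all of its descendants in the GO hierarchy. The Python sketch below implements that reading only; it is an interpretation, not the documented ArrayPlex algorithm, and the term names, gene names, and function name are placeholders.

```python
def composite_assignments(raw, parents):
    """Propagate direct (RAW) assignments upward through the term hierarchy.

    raw     -- dict: term -> set of genes annotated directly to that term
    parents -- dict: term -> list of parent terms (a directed acyclic graph)
    Returns a dict: term -> set of genes annotated to it or to any descendant.
    """
    composite = {}

    def add_upward(term, genes):
        composite.setdefault(term, set()).update(genes)
        for parent in parents.get(term, ()):
            add_upward(parent, genes)

    for term, genes in raw.items():
        add_upward(term, genes)
    return composite


parents = {"child_1": ["parent"], "child_2": ["parent"], "parent": ["root"]}
raw = {"child_1": {"g1", "g2"}, "child_2": {"g3"}, "parent": {"g4"}}
composite = composite_assignments(raw, parents)
print(sorted(composite["parent"]))
```

Under this reading, two sibling terms that each fall short of significance on their own can still contribute to a significant parent term, because the parent's composite gene set pools their annotations.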
Sequence analysis - discovery and search
We used ArrayPlex exclusively in the analysis of promoter sequences of regulated targets for each profiled transcription factor. We wished to determine whether sets of regulated targets had novel over-represented sequence motifs in their respective promoter regions. Similarly, we used ArrayPlex to characterize the statistical over-representation of previously characterized motifs within these regulatory regions. The module TargetAnalysis.jar provides the transcription factor target-specific implementations of these methods developed during this study while more generic implementations of these services are available in the module SequenceAnalysis.jar.
Sequence sets for each regulated target pool were extracted using ArrayPlex genome sequence retrieval services - available through the ArrayPlex client, command-line modules, and programmatic API. These sequence sets were then re-submitted to the ArrayPlex server for de novo analysis using APIs that invoked the bundled toolsets AlignACE, MEME, and MDscan. Each of these programs is a motif-discovery application designed to use a background sequence model to find over-represented motifs within the set of sequences provided. The background sequence model we utilized was a nucleotide frequency matrix computed from all S. cerevisiae intergenic regions. Each program was run over permutations of a set of module-accessible parameters, including the desired motif width and the number of expected motifs. The output from this process was converted by ArrayPlex from the native output of each of the motif-discovery toolsets to a single universal format, thereby significantly simplifying downstream analysis. Ultimately, 105 transcription factor target sets produced 490 statistically significant and novel sequence motifs that passed systematic evaluation of over 400,000 candidate sequences for properties such as nucleotide complexity and motif length.
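As a simple illustration of a background model, the Python sketch below computes single-nucleotide frequencies from a set of sequences. It assumes a zero-order model (independent base frequencies); the text does not specify the order of the frequency matrix actually used, and the function name and toy sequences are hypothetical.

```python
from collections import Counter


def nucleotide_frequencies(sequences):
    """Zero-order background model: relative frequency of A, C, G, and T.

    Characters other than A, C, G, and T (for example N) are ignored.
    """
    counts = Counter()
    for sequence in sequences:
        counts.update(base for base in sequence.upper() if base in "ACGT")
    total = sum(counts.values())
    return {base: counts[base] / total for base in "ACGT"}


# Toy stand-ins for intergenic regions.
background = nucleotide_frequencies(["AATT", "acgtnn"])
print(background)
```

Motif-discovery programs compare how often a candidate motif occurs in the input sequences with how often it would be expected under such a background, so a model built from the organism's own intergenic regions avoids rewarding motifs that merely reflect base composition.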
A motif search process was executed for each transcription factor target set to scan promoter regions for the existence of previously characterized cis-regulatory sequences. A set of consensus sequences for each of the transcription factor deletions was aggregated from the Saccharomyces Genome Database, The Promoter Database of Saccharomyces cerevisiae (SCPD), and previous research [24–27]. Each of the consensus sequences was used by ArrayPlex to synthesize a regular expression that was evaluated for statistical enrichment relative to the previously described background model. Of the more than 200 profiled, 102 transcription factor target-set promoter regions showed enrichment for previously characterized sequence motifs, thereby increasing confidence in both the biological validity of the characterized motifs and the target pools characterized by this study.
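Turning a consensus sequence into a regular expression can be sketched by expanding each IUPAC ambiguity code into a character class. The Python example below shows this conversion and counts promoters with a match on either strand; it is an illustration rather than the ArrayPlex implementation, the consensus `ACGYN` and the promoter strings are arbitrary toy inputs, and the enrichment statistic itself is not reproduced.

```python
import re

# Standard IUPAC nucleotide ambiguity codes expanded to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def consensus_to_regex(consensus):
    """Compile a consensus sequence into a regular expression."""
    return re.compile("".join(IUPAC[base] for base in consensus.upper()))


def reverse_complement(sequence):
    """Return the reverse complement of a DNA sequence."""
    return sequence.upper().translate(COMPLEMENT)[::-1]


def count_promoters_with_motif(consensus, promoters):
    """Count promoters that contain the motif on either strand."""
    pattern = consensus_to_regex(consensus)
    return sum(
        1 for sequence in promoters
        if pattern.search(sequence.upper())
        or pattern.search(reverse_complement(sequence))
    )


promoters = ["TTACGCATT", "TTTTTTTT", "TGCGTAA"]
pattern = consensus_to_regex("ACGYN")
matches = count_promoters_with_motif("ACGYN", promoters)
print(pattern.pattern, matches)
```

The count of matching promoters in a target set would then be compared with the count expected under the background model to decide whether the motif is enriched.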
Each of these sequence analysis processes was made possible and iteratively repeatable through the on-demand and up-to-date genome sequence resources offered by the ArrayPlex server, the parametric options available in its command-line modules and programmatic API, and the bundled sequence discovery and search toolsets.
Visualization - regulator on regulator analysis
The transcription factors PHD1, STP4, MCM1, MBF1, and HMS2 each have a significant count of either in-bound or out-bound regulatory connections with the other transcription factors that were profiled. Specifically, MCM1 activates a large number of transcription factors, while STP4 is conversely activated by a large number of transcription factors. It is not surprising that MCM1 appears to be an activation hub for many transcription factors in the larger regulatory network: MCM1 has been shown to perform an active role in cell-cycle regulation through regulation of DNA replication initiation. STP4 has little official annotation. GO term enrichment analysis of its affected targets indicates statistically significant roles in nucleotidyltransferase, polyamine transporter, spermine transporter, and polyamine activities. These activities are general to the many pathways of amino acid metabolism, and it is thus not surprising that STP4 would be activated by a wide variety of other transcription factors. Also of interest in the regulatory network, the transcription factors MBF1, PHD1 and HMS2 are each repressed by many factors. Both PHD1 and HMS2 have been shown to perform an active role in pseudohyphal growth adaptation [30–32]. It is reasonable to believe that their transcriptional abundance would be repressed by many transcription factors in the many conditions in which their cellular role is not required.
It would have been difficult to detect these nuanced relationships without this rendering. The capacity for ArrayPlex to interconvert file formats from primary data formats to visualization-ready formats increases the efficiency and flexibility of data exploration and analysis.
Comparison to similar software packages
While ArrayPlex provides features found in many commonly used microarray databases (Bioarray Software Environment, Stanford Microarray Database, Longhorn Array Database), the ArrayPlex environment is not intended to operate as one. ArrayPlex was developed to fulfil the need for interactive, command-line, and programmatic access to up-to-date genomic resources and analytical toolsets in a networked computational environment. Thus, while several ArrayPlex client functions such as hierarchical clustering, ontology analysis, and GenePix Result File Group Analysis have the intrinsic capacity to store proprietary and tabular microarray data, ArrayPlex was not designed to supplant the functionality of traditional microarray databases or to provide this capacity in a comparable manner. Several other research projects address some of these goals in a variety of ways, but none provides the combined suite of resources that ArrayPlex does. EnsMart, Atlas, Mayday, SeqHound, EMMA, GEPAS, and DAVID are all examples of bioinformatic server environments that address many of the stated data association and analytical goals [33–39]. SeqHound and Atlas each house an extensive API-accessible list of resources yet lack both an extensible user interface and pre-defined command-line modules. EnsMart, EMMA, and GEPAS each have a web interface or command-shell environment but lack a client-server enabled API. Such an API was core to ArrayPlex's design goal of enabling all computers in a research environment to be productive platforms on which data analysis can be accomplished. Mayday and DAVID are toolsets focused upon DNA microarray data analysis and GO analysis, respectively. Each is feature-rich in its category but lacks integration with the wide variety of genomic resources provided by the ArrayPlex environment.
The high-throughput quality evaluation capabilities of the GenePix Results File Operations section of the ArrayPlex client, command-line modules, and programmatic API surpass existing commercial and open-source software offerings and greatly reduce the time and error involved in screening large microarray datasets for signal bias. Molecular Devices, the original manufacturer of GenePix Scanners, provides similar quality evaluation features in its GenePix Pro and Acuity software packages. These features, however, are limited to graphical user interface access and low-throughput single-microarray analysis. Bioconductor is an open-source microarray data analysis environment that offers programmatic API access to software routines capable of high-throughput quality evaluation and plot generation similar to that of ArrayPlex. Bioconductor, however, does not offer a graphical user interface or simplified command-line modules. It requires specialized knowledge of both the R programming language and shell environments, and is thus an option suited primarily to experienced computational biologists and programmers. Finally, in addition to quality evaluation, the GenePix Results File Normalization and GenePix Results File Group Analysis modules of the ArrayPlex client provide varied normalization, filter-based evaluation, and extraction features only partially provided by commercial software packages.
The ArrayPlex environment is a robust platform for genomic data analysis and visualization. It is easy to install and operate, and it provides ready-to-use aggregated genome resources, genome sequence, and analytical toolsets to users of the graphical interactive ArrayPlex client, command-line modules, and programmatic API. The ArrayPlex server keeps managed genome resources up to date, thus providing information and analytical results that are synchronized with curated knowledge. The open-source programmatic API allows all of the ArrayPlex functions, both client and server, to be expanded. ArrayPlex has been tested and improved in the course of a large-scale research project that made use of all of its genome resources, genome sequence, and analytical toolsets.
Requirements and availability
ArrayPlex is available from its project site at sourceforge.net. The ArrayPlex server, client, and command-line modules are included in a single installation package. The ArrayPlex client and the command-line modules are prepared during the process of ArrayPlex server installation such that they are configured to communicate with the ArrayPlex server being installed by the system administrator. Complete source-code is provided for each of the operational components.
ArrayPlex server requirements
The default server installation requires either an Intel-based computer running the Linux operating system or any computer running Mac OS X. Linux servers running both the 2.4 and 2.6 generation of kernels have been tested. During its development, ArrayPlex was operated on many distributions of Linux [42–46]. Mac OS X has been tested with version 10.4 (Tiger), but we expect that most generations of this operating system will be compatible. Additionally, an operational PostgreSQL relational database system is required. The ArrayPlex development and testing process has utilized PostgreSQL server versions from 7.3 to 8.2. The database server does not need to be installed on the same computer as the ArrayPlex server, only reachable by TCP/IP network connectivity and standard PostgreSQL client utilities. A sequestered ArrayPlex schema instance is created within the PostgreSQL database server such that ArrayPlex can co-exist with other database instances. Neither the Java Runtime Environment (JRE) nor Apache Tomcat needs to be separately installed by the user, since each of these resources is bundled within the ArrayPlex installation in order to create a more encapsulated and ready-to-operate system. Alternative implementations of the JRE or Apache Tomcat can be substituted through simple sub-folder replacement within the installed ArrayPlex server. This process is documented in the ArrayPlex Server Installation Guide.
The ArrayPlex distribution, as downloaded from the SourceForge.net project site, is 350 MB in size. The ArrayPlex server, however, downloads a large quantity of genomic annotation and sequence during the installation process. The genomic sequence files are transformed into NCBI BLAST-compatible databases that allow for rapid sequence retrieval. This results in the consumption of significant drive space such that an operational ArrayPlex server requires at least 14 GB for complete installation.
ArrayPlex client requirements
The ArrayPlex client is not installed but rather launched from the ArrayPlex server by clicking a link within any web browser. The client is supported on Mac OS X, Windows XP, Windows Vista, and most distributions of the Linux operating system. Each of these client operating systems must have a JRE installed. The default Microsoft-provided Java installation on any version of Windows is not supported. A JRE should be downloaded and installed from Sun Microsystems. The JRE that is bundled with Mac OS X (10.2 Jaguar to 10.5 Leopard) has been tested.
Command-line module requirements
The requirements for use of the command-line modules match those of the ArrayPlex client. The modules are built by the ArrayPlex server installation process and can be downloaded with a web browser to any supported client computer.
API: application programming interface
HTTP: hypertext transfer protocol
JRE: Java Runtime Environment
NCBI: National Center for Biotechnology Information
PCL: pre-clustering file format.
We would like to thank the authors, creators, and maintainers of all the referenced and included data resources and toolsets for both the initial creation and continued support of these materials. We thank all members of the Iyer Lab for use and testing of initial ArrayPlex prototypes. This work was supported by grants from the US National Institutes of Health (NIH) and the US National Science Foundation (VRI), by a training grant from the US National Institute on Alcohol Abuse and Alcoholism and by a University of Texas Continuing Fellowship (PJK).
- Sherlock G, Hernandez-Boussard T, Kasarskis A, Binkley G, Matese J, Dwight S, Kaloper M, Weng S, Jin H, Ball C, Eisen M, Spellman P, Brown P, Botstein D, Cherry J: The Stanford Microarray Database. Nucleic Acids Res. 2001, 29: 152-155. 10.1093/nar/29.1.152.
- Sarkans U, Parkinson H, Lara G, Oezcimen A, Sharma A, Abeygunawardena N, Contrino S, Holloway E, Rocca-Serra P, Mukherjee G, Shojatalab M, Kapushesky M, Sansone S, Farne A, Rayner T, Brazma A: The ArrayExpress gene expression database: a software engineering and implementation perspective. Bioinformatics. 2005, 21: 1495-1501. 10.1093/bioinformatics/bti157.
- Saeed A, Sharov V, White J, Li J, Liang W, Bhagabati N, Braisted J, Klapa M, Currier T, Thiagarajan M, Sturn A, Snuffin M, Rezantsev A, Popov D, Ryltsov A, Kostukovich E, Borisovsky I, Liu Z, Vinsavich A, Trush V, Quackenbush J: TM4: a free, open-source system for microarray data management and analysis. BioTechniques. 2003, 34: 374-378.
- Saal L, Troein C, Vallon-Christersson J, Gruvberger S, Borg A, Peterson C: BioArray Software Environment (BASE): a platform for comprehensive management and analysis of microarray data. Genome Biol. 2002, 3: SOFTWARE0003-10.1186/gb-2002-3-8-software0003.
- Killion PJ, Sherlock G, Iyer VR: The Longhorn Array Database (LAD): an open-source, MIAME compliant implementation of the Stanford Microarray Database (SMD). BMC Bioinformatics. 2003, 4: 32-10.1186/1471-2105-4-32.
- Ball C, Awad I, Demeter J, Gollub J, Hebert J, Hernandez-Boussard T, Jin H, Matese J, Nitzberg M, Wymore F, Zachariah Z, Brown P, Sherlock G: The Stanford Microarray Database accommodates additional microarray platforms and data formats. Nucleic Acids Res. 2005, 33: D580-582. 10.1093/nar/gki006.
- Linux. [http://www.linux.org/]
- Apache Tomcat. [http://tomcat.apache.org/]
- PostgreSQL. [http://www.postgresql.org/]
- Sun Java Web Start - JWS. [http://java.sun.com/products/javawebstart/]
- Sun Java Archive Format - JAR. [http://java.sun.com/javase/6/docs/technotes/guides/jar/jar.html]
- FASTA Format. [http://www.ncbi.nlm.nih.gov/blast/fasta.shtml]
- Bray N, Dubchak I, Pachter L: AVID: A global alignment program. Genome Res. 2003, 13: 97-102. 10.1101/gr.789803.
- Loots GG, Ovcharenko I, Pachter L, Dubchak I, Rubin EM: rVista for comparative sequence-based discovery of functional transcription factor binding sites. Genome Res. 2002, 12: 832-839.
- Hughes J, Estep P, Tavazoie S, Church G: Computational identification of cis-regulatory elements associated with groups of functionally related genes in Saccharomyces cerevisiae. J Mol Biol. 2000, 296: 1205-1214. 10.1006/jmbi.2000.3519.
- Thompson JD, Higgins DG, Gibson TJ: CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 1994, 22: 4673-4680. 10.1093/nar/22.22.4673.
- Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol. 1990, 215: 403-410.
- Sherlock G: Cluster. [http://genome-www5.stanford.edu/download/]
- MEME. [http://meme.nbcr.net/downloads/]
- MDSCAN. [http://motif.stanford.edu/distributions/mdscan/]
- Hu Z, Killion PJ, Iyer VR: Genetic reconstruction of a functional transcriptional regulatory network. Nat Genet. 2007, 39: 683-687. 10.1038/ng2012.
- Smyth G, Speed T: Normalization of cDNA microarray data. Methods. 2003, 31: 265-273. 10.1016/S1046-2023(03)00155-5.
- Hahn JS, Hu Z, Thiele DJ, Iyer VR: Genome-wide analysis of the biology of stress responses through heat shock transcription factor. Mol Cell Biol. 2004, 24: 5249-5256. 10.1128/MCB.24.12.5249-5256.2004.
- Chiang DY, Moses AM, Kellis M, Lander ES, Eisen MB: Phylogenetically and spatially conserved word pairs associated with gene-expression changes in yeasts. Genome Biol. 2003, 4: R43-10.1186/gb-2003-4-7-r43.
- Harbison CT, Gordon DB, Lee TI, Rinaldi NJ, MacIsaac KD, Danford TW, Hannett NM, Tagne JB, Reynolds DB, Yoo J, Jennings EG, Zeitlinger J, Pokholok DK, Kellis M, Rolfe PA, Takusagawa KT, Lander ES, Gifford DK, Fraenkel E, Young RA: Transcriptional regulatory code of a eukaryotic genome. Nature. 2004, 431: 99-104. 10.1038/nature02800.
- Kellis M, Patterson N, Endrizzi M, Birren B, Lander ES: Sequencing and comparison of yeast species to identify genes and regulatory elements. Nature. 2003, 423: 241-254. 10.1038/nature01644.
- Zhu J, Zhang MQ: SCPD: a promoter database of the yeast Saccharomyces cerevisiae. Bioinformatics. 1999, 15: 607-611. 10.1093/bioinformatics/15.7.607.
- Shannon P, Markiel A, Ozier O, Baliga N, Wang J, Ramage D, Amin N, Schwikowski B, Ideker T: Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 2003, 13: 2498-2504. 10.1101/gr.1239303.
- Lydall D, Ammerer G, Nasmyth K: A new role for MCM1 in yeast: cell cycle regulation of SWI5 transcription. Genes Dev. 1991, 5: 2405-2419. 10.1101/gad.5.12b.2405.
- Gimeno C, Fink G: Induction of pseudohyphal growth by overexpression of PHD1, a Saccharomyces cerevisiae gene related to transcriptional regulators of fungal development. Mol Cell Biol. 1994, 14: 2100-2112.
- Lorenz M, Heitman J: Regulators of pseudohyphal differentiation in Saccharomyces cerevisiae identified through multicopy suppressor analysis in ammonium permease mutant strains. Genetics. 1998, 150: 1443-1457.
- Pan X, Heitman J: Sok2 regulates yeast pseudohyphal differentiation via a transcription factor cascade that regulates cell-cell adhesion. Mol Cell Biol. 2000, 20: 8364-8372. 10.1128/MCB.20.22.8364-8372.2000.
- Dietzsch J, Gehlenborg N, Nieselt K: Mayday - a microarray data analysis workbench. Bioinformatics. 2006, 22: 1010-1012. 10.1093/bioinformatics/btl070.
- Dondrup M, Goesmann A, Bartels D, Kalinowski J, Krause L, Linke B, Rupp O, Sczyrba A, Puhler A, Meyer F: EMMA: a platform for consistent storage and efficient analysis of microarray data. J Biotechnol. 2003, 106: 135-146. 10.1016/j.jbiotec.2003.08.010.
- Huang da W, Sherman BT, Tan Q, Collins JR, Alvord WG, Roayaei J, Stephens R, Baseler MW, Lane HC, Lempicki RA: The DAVID Gene Functional Classification Tool: a novel biological module-centric algorithm to functionally analyze large gene lists. Genome Biol. 2007, 8: R183-10.1186/gb-2007-8-9-r183.
- Kasprzyk A, Keefe D, Smedley D, London D, Spooner W, Melsopp C, Hammond M, Rocca-Serra P, Cox T, Birney E: EnsMart: a generic system for fast and flexible access to biological data. Genome Res. 2004, 14: 160-169. 10.1101/gr.1645104.
- Michalickova K, Bader GD, Dumontier M, Lieu H, Betel D, Isserlin R, Hogue CW: SeqHound: Biological Sequence and Structure Database as a Platform for Bioinformatics Research. BMC Bioinformatics. 2002, 3: 32-10.1186/1471-2105-3-32.
- Montaner D, Tarraga J, Huerta-Cepas J, Burguet J, Vaquerizas JM, Conde L, Minguez P, Vera J, Mukherjee S, Valls J, Pujana MA, Alloza E, Herrero J, Al-Shahrour F, Dopazo J: Next station in microarray data analysis: GEPAS. Nucleic Acids Res. 2006, 34: W486-491. 10.1093/nar/gkl197.
- Shah SP, Huang Y, Xu T, Yuen MM, Ling J, Ouellette BF: Atlas - a Data Warehouse for Integrative Bioinformatics. BMC Bioinformatics. 2005, 6: 34-10.1186/1471-2105-6-34.
- Gentleman R, Carey V, Bates D, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge Y, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini A, Sawitzki G, Smith C, Smyth G, Tierney L, Yang J, Zhang J: Bioconductor: open software development for computational biology and bioinformatics. Genome Biol. 2004, 5: R80-10.1186/gb-2004-5-10-r80.
- SourceForge.net ArrayPlex Project. [http://sourceforge.net/projects/arrayplex/]
- Mandriva Linux. [http://www.mandriva.com/]
- Fedora Linux. [http://www.fedoraproject.org/]
- Gentoo Linux. [http://www.gentoo.org/]
- RedHat Linux. [http://www.redhat.com/]
- Ubuntu Linux. [http://www.ubuntu.com]
- Sun Java Runtime Environment - JRE. [http://www.java.com]
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.