XGAP: a uniform and extensible data model and software platform for genotype and phenotype experiments
- Morris A Swertz1, 2, 3,
- K Joeri van der Velde1, 2,
- Bruno M Tesson2,
- Richard A Scheltema2,
- Danny Arends1, 2,
- Gonzalo Vera2,
- Rudi Alberts4,
- Martijn Dijkstra5,
- Paul Schofield6,
- Klaus Schughart4,
- John M Hancock7,
- Damian Smedley3,
- Katy Wolstencroft8,
- Carole Goble8,
- Engbert O de Brock9,
- Andrew R Jones10,
- Helen E Parkinson3,
- members of the Coordination of Mouse Informatics Resources (CASIMIR)6,
- Genotype-To-Phenotype (GEN2PHEN) Consortiums1 and
- Ritsert C Jansen1, 2
© Swertz et al.; licensee BioMed Central Ltd. 2010
Received: 14 July 2009
Accepted: 9 March 2010
Published: 9 March 2010
We present an extensible software model for the genotype and phenotype community, XGAP. Readers can download a standard XGAP (http://www.xgap.org) or auto-generate a custom version using MOLGENIS with programming interfaces to R-software and web-services or user interfaces for biologists. XGAP has simple load formats for any type of genotype, epigenotype, transcript, protein, metabolite or other phenotype data. Current functionality includes tools ranging from eQTL analysis in mouse to genome-wide association studies in humans.
Modern genetic and genomic technologies provide researchers with unprecedented amounts of raw and processed data. For example, recent genetical genomics [1–3] studies have mapped gene expression (eQTL), protein abundance (pQTL) and metabolite abundance (mQTL) to genetic variation using genome-wide linkage and genome-wide association experiments on various microarray, mass spectrometry and proton nuclear magnetic resonance (NMR) platforms and in a wide range of organisms, including human [4–8], yeast [9, 10], mouse [11], rat [12], Caenorhabditis elegans [13] and Arabidopsis thaliana [14–16].
Understanding these and other high-tech genotype-to-phenotype data is challenging and depends on suitable 'cyber infrastructure' to integrate and analyze data [17, 18]: data infrastructures to store and query the data from different organisms, biomolecular profiling technologies, analysis protocols and experimental designs; graphical user interfaces (GUIs) to submit, trace and retrieve these particular data; communicating infrastructure in, for example, R [19], Java and web services to connect to different processing infrastructures for statistical analysis [20–24] and/or integration of background information from public databases ; and a simple file format to load and exchange data within and between projects.
Many elements of the required cyber infrastructure are available: The Generic Model Organism Database (GMOD) community developed the Chado schema for sequence, expression and phenotype data  and delivered reusable software components like gbrowse ; the BioConductor community has produced many analysis packages that include data structures for particular profiling technologies and experimental protocols ; and numerous bespoke databases, data models, schemas and formats have been produced, such as the public and private microarray expression databases and exchange formats [29–31]. Some integrated cyber infrastructures are also available: the National Center for Biotechnology Information (NCBI) has launched dbGaP (database of genotypes and phenotypes) , a public database to archive genotype and clinical phenotype data from human studies; and the Complex Trait Consortium has launched GeneNetwork , a database for mouse genotype, classical phenotype and gene expression phenotype data with tools for 'per-trait' quantitative trait loci (QTL) analysis.
However, a suitable and customizable integration of these elements to support high throughput genotype-to-phenotype experiments is still needed : dbGaP, GeneNetwork and the model organism databases are designed as international repositories and not to serve as general data infrastructure for individual projects; many of the existing bespoke data models are too complicated and specialized, hard to integrate between profiling technologies, or lack software support to easily connect to new analysis tools; and customization of the existing infrastructures dbGaP, GeneNetwork or other international repositories [35, 36] or assembly of Bioconductor and generic model organism database components to suit particular experimental designs, organisms and biotechnologies still requires many minor and sometimes major manual changes in the software code that go beyond what individual lab bioinformaticians can or should do, and result in duplicated efforts between labs if attempted.
To fill this gap we here report development of an extensible data infrastructure for genotype and phenotype experiments (XGAP) that is designed as a platform to exchange data and tools and to be easily customized into variants to suit local experimental models. We therefore adopted an alternative software engineering strategy, as outlined in our recent review , that enables generation of such software efficiently using three components: a compact and extensible 'standard' model of data and software; a high-level domain-specific language (DSL) to simply describe biology-specific customizations to this software; and a software code generator to automatically translate models and extensions into all low-level program files of the complete working software, building on reusable elements such as listed above as well as general informatics elements and some new/optimized elements that were missing.
Features of XGAP database for genotype and phenotype experiments
Store genotype and phenotype experimental data using only four 'core' data types: Trait, Subject, Data, and DataElement. For example: a single-channel microarray reports raw gene expression Data for each microarray probe Trait and each individual Subject. Add information on data provenance by giving details in Investigation, Protocols and ProtocolApplications
Customize 'my' XGAP database with extended variants of Trait and Subject. In the online XGAP demonstrator, Probe traits have a sequence and genome location and Strain subjects have parent strains and (in)breeding method. Describe extensions using MOLGENIS language and the generator automatically changes XGAP database software to your research
Upload data from measurement devices, public databases, collaborating XGAP databases, or a public XGAP repository with community data. Simply download trait information as tab-delimited files from one XGAP and upload it into another; this works because of the uniformity of the core data types (and extensions thereof)
Search genetical genomics data using the graphical user interface with advanced query tools. The uniformity of the 'code generated' interfaces makes it easy to learn and use interfaces for both 'core' data types and customized extensions
Analyze data by connecting tools using simple methods in Java, R, Web Services or Internet hyperlinks. For example, map and plot quantitative trait loci in R using XGAP data retrieved via the R interface
Plug in the best analysis tools to the user interface so biologists can use them. Bioinformaticians are provided with simple mechanisms to seamlessly add such tools to XGAP, building on the automatically generated GUI and API building blocks
Share data, customizations, connected analysis tools and user interface plug-ins with the genetical genomics community, using XGAP as exchange platform. For example, the MetaNetwork R package can talk to data in XGAP. This makes it easy for other XGAP owners to also use it
Minimal and extensible object model
We developed the XGAP object model to uniformly capture the wide variety of (future) genotype and phenotype data, building on the generic standard model FuGE (Functional Genomics Experiment)  for describing the experimental 'metadata' on samples, protocols and experimental variables of functional genomics experiments, the OBO model (of the Open Biological and Biomedical Ontologies foundry) for use of standard and controlled vocabularies and ontologies that ease integration , and lessons learned from previous, profiling technology-specific modeling efforts .
Use cases of core data types
A growth measurement (Data) reports the time (DataElement) it took to flower (Trait) for an Arabidopsis plant (Subject)
A two-color microarray result (Data) describes raw intensities measured (DataElement) for gene transcript probe hybridization (Trait) for each pair of Arabidopsis individuals (Subject)
A marker measurement (ProtocolApplication) resulted in a genetic profile (Data) with genotype values (DataElement) for each SNP/microsatellite marker (Trait) for each human individual (Subject)
A genetical genomics stem cell Investigation was carried out on 30 recombinant mouse inbred strains (Subject). It involved a ProtocolApplication of the 'Affymetrix MG-U74Av2' Protocol to produce expression profiles (Data) for 12,422*16 microarray probes (Traits). These profiles consisted of a matrix of signals (DataElement) for each Probe (Traits) and each InbredStrain (Subject). Subsequently, these Data were taken as inputData in a normalization procedure (ProtocolApplication) using RMA normalization Protocol, which resulted in outputData of normalized profiles (Data) of Probe*InbredStrain (Trait*Subject)
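The four core data types in the use cases above can be sketched in a few lines of Python. This is only an illustration of the idea, not the XGAP implementation: the class layout, the `put`/`get` helper names and the example value of 23.0 days are assumptions made for this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class DimensionElement:
    """Anything that can label a row or a column of a Data matrix."""
    name: str


@dataclass(frozen=True)
class Trait(DimensionElement):
    """A measured property, e.g. a microarray probe or a marker."""


@dataclass(frozen=True)
class Subject(DimensionElement):
    """The thing being measured, e.g. an individual or an inbred strain."""


@dataclass
class Data:
    """A named matrix; every cell holds the value of one DataElement."""
    name: str
    rows: List[DimensionElement]
    cols: List[DimensionElement]
    elements: Dict[Tuple[str, str], float] = field(default_factory=dict)

    def put(self, row: DimensionElement, col: DimensionElement, value: float) -> None:
        # Hypothetical helper: refuse cells that are not declared on the axes.
        if row not in self.rows or col not in self.cols:
            raise KeyError(f"({row.name}, {col.name}) is not a cell of {self.name}")
        self.elements[(row.name, col.name)] = value

    def get(self, row: DimensionElement, col: DimensionElement) -> float:
        return self.elements[(row.name, col.name)]


# The flowering-time use case: one Trait, one Subject, one DataElement.
flowering = Trait("time_to_flower_days")
plant = Subject("arabidopsis_plant_1")
growth = Data("growth_measurement", rows=[flowering], cols=[plant])
growth.put(flowering, plant, 23.0)  # 23.0 is an invented example value
```

The same `Data` class would hold a microarray or marker profile unchanged; only the Trait and Subject instances on the axes differ.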
Use cases of extended data types
Sample is a Subject with the additional property that 'Tissue' can be specified
Individual is a Subject with the additional property that relationships with Mother and Father individuals, as well as Strain, can be specified
PairedSample is a Sample with the additional property that 'Dye' has to be specified and which two Subjects (or subclasses such as Individual) are labeled with 'Cy3' and 'Cy5'
An InbredStrain is a Strain with the additional property that the 'Parents' (mother Individual and father Individual) are specified and the 'type' of inbreeding used
An amplified fragment length polymorphism, microsatellite or SNP Marker (is a Trait) may refer to a genetic and possibly a genomic location (Marker also is a Locus)
A correlation computation (Data) reports associations (DataElement) between Metabolite (is a Trait); because Trait and Subject are both extensions of DimensionElement, they can be connected to a row and column of DataElement interchangeably
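The extension mechanism in these use cases maps naturally onto class inheritance. The sketch below assumes a plain Python class hierarchy (not the generated XGAP code); the field names and the `empty_matrix` helper are hypothetical. It shows that, because Trait and Subject share the DimensionElement parent, a matrix can be Trait × Trait just as easily as Trait × Subject.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Sequence, Tuple


@dataclass(frozen=True)
class DimensionElement:
    name: str


@dataclass(frozen=True)
class Subject(DimensionElement):
    pass


@dataclass(frozen=True)
class Trait(DimensionElement):
    pass


@dataclass(frozen=True)
class Sample(Subject):
    """A Subject for which a tissue can be specified."""
    tissue: Optional[str] = None


@dataclass(frozen=True)
class Individual(Subject):
    """A Subject with optional parents and strain."""
    mother: Optional["Individual"] = None
    father: Optional["Individual"] = None
    strain: Optional[str] = None


@dataclass(frozen=True)
class Metabolite(Trait):
    pass


def empty_matrix(
    rows: Sequence[DimensionElement], cols: Sequence[DimensionElement]
) -> Dict[Tuple[str, str], Optional[float]]:
    """Build an unfilled matrix; any DimensionElement may sit on either axis."""
    for element in list(rows) + list(cols):
        if not isinstance(element, DimensionElement):
            raise TypeError(f"{element!r} is not a DimensionElement")
    return {(r.name, c.name): None for r in rows for c in cols}


leaf = Sample("sample_1", tissue="leaf")
mother = Individual("plant_A")
child = Individual("plant_C", mother=mother, father=Individual("plant_B"))

# A correlation matrix has Metabolite traits on both rows and columns.
metabolites = [Metabolite("mass_peak_1"), Metabolite("mass_peak_2")]
correlations = empty_matrix(metabolites, metabolites)
```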
Several standard data types were also inherited from FuGE to enable researchers to provide 'Minimum Information' for QTLs and Association Studies such as defined in the MIQAS checklist  - a member of the Minimum Information for Biological and Biomedical Investigations (MIBBI) guideline effort . Data types Action(Application), Software(Application), Equipment(Application) and Parameter(Value) can be used to describe Protocol(Application)s in more detail. For example, a normalization Protocol may involve a 'robust multiarray average (RMA) normalization' Action that uses Bioconductor 'affy' Software  with certain ParameterValues. Data types Description, BibliographicReferences, DatabaseEntry, URI, and FileAttachment enable researchers to freely add additional annotations to certain data types - DimensionElement, Investigation, Protocol, ProtocolApplication, and Data. For example, researchers can annotate a Gene with one or more DatabaseEntries, referring to unique database accession numbers for automated data integration.
Use cases of annotation data types
A Gene in an Arabidopsis Investigation can be connected to a DatabaseEntry describing a reference to related information in the TAIR database  and another DatabaseEntry describing a reference to the MIPS database 
Each Individual in a C. elegans Investigation is annotated with an OntologyTerm to indicate that it was grown in an environment of either 16°C or 24°C
The Arabidopsis Investigation was annotated with the BibliographicReferences pointing to the paper describing the investigation and expected results
A Protocol describes the 'MapTwoPart' method for QTL mapping and was annotated with the URI linking to the 'MetaNetwork R-package', which contains this method, and a BibliographicReference pointing to the paper [22, 67] that describes the MapTwoPart protocol
A file with a Venn diagram describing the number of masses detected in each population was added as FileAttachment to the Arabidopsis metabolite Investigation
Another feature of XGAP is the uniform treatment of all data on these subjects and traits. To understand basic data in XGAP, newcomers just have to learn that all data are stored as Data matrices with each DataElement describing an observation on Subjects and/or Traits (rows × columns). Unlike the proven matrix structures used in MAGE-TAB (tabular format for microarray gene expression experiments) , in XGAP these data can be on any Trait and/or Subject combination, that is, we did not create many variants of DataElement to accommodate each combination of Trait and Subject such as MAGE-TAB's ExpressionDataElement (Probe × Sample), MassSpecDataElement (MassPeak × Sample), eQtlMappingDataElement (Marker × Probe), and so on. Instead, we store all these data using the generic type DataElement and limit extension to Trait and Subject only. This avoids the (combinatorial) explosion of DataElement extensions so researchers can provide basic data as common data matrices (of DataElements) and can still add particular annotations flexibly to the matrix rows and columns to allow for (new) biotechnologies as demonstrated in the various Trait extensions in Figure 1. Keeping this simple and uniform data structure greatly enhances data and software (re)usability and hence productivity, in line with the findings by Brazma et al.  and Rayner et al.  that the simple tabular structures underlying biological data should be exploited instead of being made overly complicated.
After structural homogenization, such as provided by FuGE and XGAP, semantic queries are the remaining major barrier for integration of experimental metadata. This requires ontologies that describe the properties of the materials and also descriptions of experimental processes, data and instruments. The former are provided by species-specific ontologies that are available from various sources. The Ontology for Biomedical Investigations  may provide a solution for the experimental descriptors and is being used in this context by, for example, the Immune Epitope Database . To enable researchers to use these well understood descriptors, XGAP inherits from FuGE the mechanism of 'annotations', a special field to link any data object to one or more ontology terms. For example, researchers can annotate a Gene with one or more OntologyTerms if required, referring to standard ontology terms from OBO  or ontology terms defined locally.
Simple text-file format for data exchange
To enable data exchange using the XGAP model, we produced a simple text-file format (XGAP-TAB) based on the experience that for data formats to be used, data files should be easily created using simple Excel and text editor tools and closely resemble existing practices. This format is automatically derived from the model by requiring that all annotations on Investigations, Protocols, Traits, Subjects, and extensions thereof, are described as delimited text files (one file per data type) with columns matching the properties described in the object model and each row describing one data instance. Optionally, sets of DataElements can also be formatted as separate text matrices with row and column names matching those in the Trait and Subject annotation files, and with each matrix value matching one DataElement. The dimensions of each data matrix are then listed by a row in the annotations on Data.
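A minimal sketch of reading files laid out this way follows. It assumes tab as the delimiter and uses invented column names, file contents and helper functions (`read_annotations`, `read_matrix`, `unknown_names`); the actual XGAP-TAB column names follow from the object model and are not reproduced here.

```python
import csv
import io

# Invented example content: one annotation file per data type, plus one matrix.
TRAIT_FILE = "name\tchromosome\nprobe1\t1\nprobe2\t2\n"
SUBJECT_FILE = "name\ttissue\nstrainA\tliver\nstrainB\tliver\n"
MATRIX_FILE = "\tstrainA\tstrainB\nprobe1\t7.1\t6.8\nprobe2\t5.0\t5.4\n"


def read_annotations(text):
    """One dict per row; keys are the column headers of the annotation file."""
    return list(csv.DictReader(io.StringIO(text), delimiter="\t"))


def read_matrix(text):
    """Return (row_names, col_names, {(row, col): value}) for a text matrix."""
    rows = list(csv.reader(io.StringIO(text), delimiter="\t"))
    col_names = rows[0][1:]          # first header cell is the empty corner
    row_names = [row[0] for row in rows[1:]]
    values = {}
    for row in rows[1:]:
        for col_name, cell in zip(col_names, row[1:]):
            values[(row[0], col_name)] = float(cell)
    return row_names, col_names, values


def unknown_names(matrix_names, annotations):
    """Names used in the matrix that have no row in the annotation file."""
    known = {record["name"] for record in annotations}
    return [name for name in matrix_names if name not in known]


traits = read_annotations(TRAIT_FILE)
subjects = read_annotations(SUBJECT_FILE)
row_names, col_names, values = read_matrix(MATRIX_FILE)

# Every matrix row must be an annotated Trait, every column an annotated Subject.
missing_traits = unknown_names(row_names, traits)
missing_subjects = unknown_names(col_names, subjects)
```

Checking matrix names against the annotation files is the kind of consistency test an importer needs before any value is stored.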
Easy to customize software infrastructure
A pilot software infrastructure is available at  to help genotype-to-phenotype researchers to adopt XGAP as a backbone for their data and tool integration. We chose to use the MOLGENIS toolkit (biosoftware generator for MOLecular GENetics Information Systems; see Materials and methods) to auto-generate from the XGAP model: 1, an SQL (Structured Query Language for relational databases) file with all necessary statements for setting up your own, customized variant of the XGAP database; 2, application programming interfaces (APIs) in R, Java and Web Services that allow bioinformaticians to plug-in their R processing scripts, Taverna workflows [25, 52, 53] and other tools; 3, a bespoke web-based graphical user interface (GUI) by which researchers can submit and retrieve data and run plugged-in tools; and 4, import/export wizards to (un)load and validate data sets exchanged in XGAP-TAB format. The auto-generation process can be repeated to quickly customize XGAP from an extended model, for example, to accommodate a particular new type of measurement technology or experimental design.
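The principle of generating database code from a compact model can be illustrated with a toy generator. This sketch is not the MOLGENIS language or generator: the dictionary-based model, the one-table-per-type mapping and the `generate_sql` function are assumptions chosen to keep the example short.

```python
import sqlite3

# Hypothetical mini-model: each entity lists its parent type and extra fields.
MODEL = {
    "Subject": {"extends": None, "fields": {"name": "TEXT NOT NULL UNIQUE"}},
    "Strain": {"extends": "Subject", "fields": {"inbreeding_type": "TEXT"}},
}


def generate_sql(model):
    """Translate the model into CREATE TABLE statements, parents first."""
    statements = []
    for entity, spec in model.items():
        if spec["extends"]:
            # An extended type shares its primary key with the parent table.
            key = f"id INTEGER PRIMARY KEY REFERENCES {spec['extends']}(id)"
        else:
            key = "id INTEGER PRIMARY KEY"
        columns = [key] + [f"{name} {sql_type}" for name, sql_type in spec["fields"].items()]
        statements.append(f"CREATE TABLE {entity} ({', '.join(columns)});")
    return statements


statements = generate_sql(MODEL)

# Run the generated statements against an in-memory database to check them.
connection = sqlite3.connect(":memory:")
for statement in statements:
    connection.execute(statement)
tables = [row[0] for row in connection.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
```

Adding a new extended type to `MODEL` and rerunning the generator yields the extra table without any hand-written SQL, which is the effect the auto-generation step aims for at much larger scale.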
Graphical user interface
Use cases of the graphical user interface for biologists
Navigate all Investigations, and for each Investigation, see the Assays and available Data
Select a Gene and find all Investigations in which this Gene is regulated as suggested by significant eQTL Data (P-value < 0.001)
For a given Locus, select all Genes that have QTL Data mapping 'in trans' and thus may be regulated by this Locus, for example, absolute(QTL locus - gene locus) > 10 Mb and QTL P-value < 0.001
Download a selection of raw gene expression Data as a tab-delimited file (to import into other software)
Upload Investigation information from tab-delimited files
Upload Affymetrix Assays using custom *.CEL/*.CDF file readers
Plot highly correlated metabolic network Data in a network visualization graph
Define security levels for Assays/Investigations to ensure that appropriate data can be viewed only by collaborators, and not by other people
A MassPeak has been identified to be 'proline' and we can follow the link-out URI to Pubchem , because it was annotated to have 'cid' 614, to find information on structure, activity, toxicology, and more
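The 'in trans' selection in the use cases above reduces to a simple filter. The sketch below applies the two thresholds from the text (distance > 10 Mb, P-value < 0.001); the record layout, the example hits and the rule that a gene on a different chromosome counts as trans are assumptions of this illustration.

```python
from typing import List, NamedTuple


class QtlHit(NamedTuple):
    gene: str
    gene_chr: str
    gene_bp: int      # gene position in base pairs
    locus_chr: str
    locus_bp: int     # QTL position in base pairs
    p_value: float


TRANS_DISTANCE_BP = 10_000_000  # 10 Mb, as in the use case
P_THRESHOLD = 0.001


def is_trans(hit: QtlHit) -> bool:
    """True if the QTL is significant and maps far from the gene itself."""
    if hit.p_value >= P_THRESHOLD:
        return False
    if hit.gene_chr != hit.locus_chr:
        return True  # assumption: other chromosome is treated as trans
    return abs(hit.locus_bp - hit.gene_bp) > TRANS_DISTANCE_BP


def trans_genes(hits: List[QtlHit]) -> List[str]:
    return [hit.gene for hit in hits if is_trans(hit)]


# Invented example records.
hits = [
    QtlHit("geneA", "1", 5_000_000, "1", 6_000_000, 0.0001),   # cis: 1 Mb away
    QtlHit("geneB", "1", 5_000_000, "1", 40_000_000, 0.0001),  # trans: 35 Mb away
    QtlHit("geneC", "2", 5_000_000, "1", 40_000_000, 0.0001),  # trans: other chromosome
    QtlHit("geneD", "1", 5_000_000, "1", 40_000_000, 0.05),    # not significant
]
selected = trans_genes(hits)
```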
Application programming interfaces
De facto standard analysis tools are emerging, for example, tools for transcript data [20, 21, 24] or metabolite abundance data  to mention just a few. These tools are typically implemented using the open source software for statistical analysis and graphics named R . Bioinformaticians can connect their particular R or Java programs to the XGAP database using an API with similar functionality to the GUI, that is, using simple commands like 'find', 'add' and 'update' (R/API, Java/API). Scripts in other programming languages and workflow tools like Taverna  can use web services (SOAP/API) or a simple hyperlink-based interface (HTTP/API), for example, http://my-xgap/api/find/Data?investigation=1 returns all data in investigation '1'. On top of this, conversion tools have been added to the R interface to read and write XGAP data to the widely used R/qtl package .
Use cases of the application programming interface for bioinformaticians
In R, parse a set of tab-delimited Marker, Genotype and Trait files and load them into the database (R/API)
In R, retrieve all Traits, Markers, expression Data, and genotype Data from an investigation as data matrices, before QTL mapping with MetaNetwork (R/API)
In Java, retrieve a list of QTL profile correlation Data to show them as a regulatory network graph (J/API)
In Java, customize generated file readers to load specific file formats (J/API)
In Taverna, retrieve Genes from XGAP to find pathway information in KEGG (WS/API)
In Python, retrieve a list of QTL mapping Data using a hyperlink to XGAP (HTTP/API)
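For the hyperlink-based interface, a client only has to assemble a URL of the form shown earlier (http://my-xgap/api/find/Data?investigation=1). The sketch below builds such URLs with the Python standard library. The host name `my-xgap` is the placeholder from the text, `build_find_url` and `fetch` are hypothetical helper names, and the format of the response body is not specified here, so `fetch` simply returns it as text.

```python
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def build_find_url(base_url: str, data_type: str, **filters) -> str:
    """Compose <base>/api/find/<type>?<field>=<value>&... for the HTTP/API."""
    url = f"{base_url.rstrip('/')}/api/find/{quote(data_type)}"
    if filters:
        url += "?" + urlencode(filters)
    return url


def fetch(url: str) -> str:
    """Download the response body as text (response format is an assumption)."""
    with urlopen(url) as response:
        return response.read().decode("utf-8")


# Reproduces the example hyperlink from the text; no request is sent here
# because "my-xgap" is only a placeholder host.
url = build_find_url("http://my-xgap", "Data", investigation=1)
```

Calling `fetch(url)` against a real XGAP installation would then return whatever that server sends for the query.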
A generated import tool takes care of checking the consistency of all traits, subjects and data that are provided in XGAP-TAB text files and loads them into the database. The entries in all files should be correctly linked, the data must be imported in the right order and the names and IDs need to be resolved between all the annotation files to check and link genes, microarray probes and gene expression to the data. The import program takes care of all these issues (conversion, relationship checks, dependency ordering, and so on). Moreover, the import program supports 'transactions', which ensures that all data inserts are rolled back if an import fails halfway, preventing incomplete or incorrect investigation data from being stored in the database. In a similar way, an export wizard is provided to download investigation data as a zipped directory of XGAP-TAB files.
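The combination of dependency ordering and transactional rollback can be sketched with SQLite. This is an illustration of the technique only, not the generated XGAP importer: the three-table schema, the function names and the example data are assumptions.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    conn.executescript("""
        CREATE TABLE subject (name TEXT PRIMARY KEY);
        CREATE TABLE trait (name TEXT PRIMARY KEY);
        CREATE TABLE data_element (
            trait TEXT NOT NULL REFERENCES trait(name),
            subject TEXT NOT NULL REFERENCES subject(name),
            value REAL NOT NULL
        );
    """)


def import_investigation(conn, subjects, traits, elements) -> bool:
    """Insert annotations before data; undo everything if any insert fails."""
    try:
        with conn:  # commits on success, rolls back on an exception
            conn.executemany("INSERT INTO subject VALUES (?)", [(s,) for s in subjects])
            conn.executemany("INSERT INTO trait VALUES (?)", [(t,) for t in traits])
            conn.executemany("INSERT INTO data_element VALUES (?, ?, ?)", elements)
        return True
    except sqlite3.IntegrityError:
        return False


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks links only when enabled
create_schema(conn)

# A consistent data set loads completely.
first_ok = import_investigation(
    conn, ["strainA", "strainB"], ["probe1"],
    [("probe1", "strainA", 7.1), ("probe1", "strainB", 6.8)])

# A data set that refers to an unknown trait is rejected as a whole,
# including the otherwise valid subject "strainC".
second_ok = import_investigation(
    conn, ["strainC"], [], [("missing_probe", "strainC", 5.0)])

subject_count = conn.execute("SELECT COUNT(*) FROM subject").fetchone()[0]
```

After the failed second import the subject table still holds only the two strains from the first import, which is the behaviour the transaction support is meant to guarantee.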
When XGAP is customized with additional data type variants, the import/export program is automatically extended by the MOLGENIS generator, 'future-proofing' the data format for new biotechnological profiling platforms. Moreover, the auto-generated import program can also be used as a template for parsers of proprietary data formats, such as implemented in parsers for the PED/MAP, HapMap, and GeneNetwork data. Collaborations are underway within EBI and GEN2PHEN to also enable import/export of MAGE-TAB  files, the standard format for microarray experiments, of PAGE-OM  files, a specialized format for genome-variation oriented genotype-to-phenotype experiments, and of ISA-TAB  files, a generalized evolution of MAGE-TAB to represent all experimental metadata on any investigation, study and assay designed to be FuGE compatible. Also, convertors to ease retrieval and submission to public repositories like dbGaP are under development. It is envisaged that integration of all these formats will enable integrated analysis of experimental data from, for example, mouse and human experiments using various biotechnology platforms, which was previously near impossible for biological labs to implement.
All XGAP and MOLGENIS software can be downloaded for free under the terms of the open source license LGPL. Extended documentation on XGAP and MOLGENIS customization is available online at the XGAP and MOLGENIS wikis [51, 57].
In this paper we report a minimal and extensible data infrastructure for the management and exchange of genotype-to-phenotype experiments, including an object model for genotype and phenotype data (XGAP-OM), a simple file format to exchange data using this model (XGAP-TAB) and easy-to-customize database software (XGAP-DB) that will help groups to directly use and adapt XGAP as a platform for their particular experimental data and analysis protocols.
XGAP participating consortia
The collection and distribution of large volumes of complex data typical of functional genomics is carried out by an increasing number of disseminated databases of hugely variable scale and scope. Combined analysis of highly distributed datasets provides much of the power of the approach of functional genomics, but depends on databases' ability to exchange data with each other and on analytical tools with semantic and structural integrity. Agreement on the standards adopted by databases will inevitably be a matter of community consensus and to that end a recent coordination action funded by the European Commission, CASIMIR , is engaged in a community consultation on the nature of the technical and semantic standards needed. What has already become clear in use-case studies conducted so far is that whatever standards are adopted, they will inevitably remain dynamic and continue to develop, particularly as new data types are collected. Crucially, they should allow the open-ended development of analytical and data-mining software, while integration of efforts to agree such standards and develop new software is essential.
Currently available genotype-to-phenotype (G2P) databases are few and far between, have great diversity of design, and limited or no interoperability between them. This arrangement provides no convenient way to populate the databases, no easy way to exchange, compare or integrate their content, and absolutely no way to search the totality of gathered information. In this context, the European Commission has recently funded the GEN2PHEN project , which intends to significantly improve the database infrastructure available within Europe for the collation, storage, and analysis of human and model-organism G2P data. This will be achieved by first developing various cutting-edge solutions, and then deploying these in conjunction with proven concepts, so as to transform the current elementary G2P database reality into a powerful networked hierarchy of interlinked databases, tools and standards.
Based on these experiences, we expect use of XGAP to help the community of genome-to-phenome researchers to share data and tools, notwithstanding large variations in their research aims. The XGAP data format can be used to represent and exchange all raw, intermediate and result data associated with an investigation, and an XGAP database, for instance, can be used as a platform to share both data and computational protocols (for example, written in the R statistical language) associated with a research publication in an open format. We envision a directory service to which XGAP users can publish metadata on their investigations either manually or automatically by configuring this option in the XGAP administration user interface. This directory service can then be used as an entry point for federated querying between the community of XGAPs to share data and tools.
Groups that already have an infrastructure can assimilate XGAP to ease evolution of their existing software. Next to their existing user tools, they can 'rewire' algorithms and visual tools to also use the MOLGENIS APIs as data backend. Thus, researchers still have the same features as before, plus the features provided by the generated infrastructure (for example, data management GUIs, R/API) and connected tools (for example, R packages developed elsewhere). Moreover, much less software code needs to be maintained by hand when replacing hand-written parts by MOLGENIS-generated parts, allowing software engineers to add new features for researchers much more rapidly.
We invite the broader community to join our efforts at the public XGAP.org wiki, mailing list and source code versioning system to evolve and share the best XGAP customizations and GUI/API 'plug-in' enhancements, to support the growing range of profiling technologies, create data pipelines between repositories, and to push developments in the directions that will most benefit research.
Materials and methods
Software modeling, auto-generation/configuration and component toolboxes are increasingly used in bioinformatics to speed up (bespoke) biological software development; see our recent review . For XGAP we required a software toolbox providing query interfaces, data management interfaces, programming interfaces to R and web services, simple data exchange formats and a minimal requirement of programming knowledge. The MOLGENIS modeling language and software generator toolbox [37, 56] was chosen as it combines all these features.
Several alternative toolboxes were evaluated: BioMart [57, 61] and InterMine  generate powerful query interfaces for existing data but are not suited for data management; Omixed  generates programmatic interfaces onto databases, including a security layer, but lacks user interfaces; PEDRO/Pierre  generates data entry and retrieval user interfaces but lacks programmatic interfaces; and general generators such as AndroMDA  and Ruby-on-Rails  require much more programming/configuration efforts compared to tools specific to the biological domain. Turnkey  seemed to be closest to our needs: it emerged from the GMOD community having GUI and SOAP interfaces but lacks auto-generation of R interfaces and a file exchange format.
Abbreviations
API: application programming interface
dbGaP: database of genotypes and phenotypes
DSL: domain-specific computer language
FuGE: Functional Genomics Experiment model
GMOD: Generic Model Organism Database
GUI: graphical user interface
LC-MS: liquid chromatography-mass spectrometry
MAGE-TAB: tabular format for microarray gene expression experiments
MOLGENIS: biosoftware generator for MOLecular GENetics Information Systems
NMR: proton nuclear magnetic resonance
QTL: quantitative trait locus
SOAP: web services using simple object access protocol
SQL: Structured Query Language for relational databases
XGAP: eXtensible Genotype And Phenotype platform.
The authors thank CASIMIR (funded by the European Commission under contract number LSHG-CT-2006-037811; Table 7), and GEN2PHEN, an FP7 project funded by the European Commission (FP7-HEALTH contract 200754; Table 7). The authors also thank NWO (Rubicon Grant 825.09.008) for financial support.
- Li Y, Breitling R, Jansen RC: Generalizing genetical genomics: getting added value from environmental perturbation. Trends Genet. 2008, 24: 518-524. 10.1016/j.tig.2008.08.001.PubMedView ArticleGoogle Scholar
- Jansen RC, Nap JP: Genetical genomics: the added value from segregation. Trends Genet. 2001, 17: 388-391. 10.1016/S0168-9525(01)02310-1.PubMedView ArticleGoogle Scholar
- Li J, Burmeister M: Genetical genomics: combining genetics with gene expression analysis. Hum Mol Genet. 2005, 14 (Spec No 2): R163-169. 10.1093/hmg/ddi267.PubMedView ArticleGoogle Scholar
- Editorial: Pinpointing expression differences. Nat Genet. 2007, 39: 1175-
- Goring HH, Curran JE, Johnson MP, Dyer TD, Charlesworth J, Cole SA, Jowett JB, Abraham LJ, Rainwater DL, Comuzzie AG, Mahaney MC, Almasy L, Maccluer JW, Kissebah AH, Collier GR, Moses EK, Blangero J: Discovery of expression QTLs using large-scale transcriptional profiling in human lymphocytes. Nat Genet. 2007, 39: 1208-1216. 10.1038/ng2119.PubMedView ArticleGoogle Scholar
- Dixon AL, Liang L, Moffatt MF, Chen W, Heath S, Wong KC, Taylor J, Burnett E, Gut I, Farrall M, Lathrop GM, Abecasis GR, Cookson WO: A genome-wide association study of global gene expression. Nat Genet. 2007, 39: 1202-1207. 10.1038/ng2109.PubMedView ArticleGoogle Scholar
- Stranger BE, Nica AC, Forrest MS, Dimas A, Bird CP, Beazley C, Ingle CE, Dunning M, Flicek P, Koller D, Montgomery S, Tavare S, Deloukas P, Dermitzakis ET: Population genomics of human gene expression. Nat Genet. 2007, 39: 1217-1224. 10.1038/ng2142.PubMedPubMed CentralView ArticleGoogle Scholar
- Heap GA, Trynka G, Jansen RC, Bruinenberg M, Swertz MA, Dinesen LC, Hunt KA, Wijmenga C, Vanheel DA, Franke L: Complex nature of SNP genotype effects on gene expression in primary human leucocytes. BMC Med Genomics. 2009, 2: 1-10.1186/1755-8794-2-1.PubMedPubMed CentralView ArticleGoogle Scholar
- Brem RB, Yvert G, Clinton R, Kruglyak L: Genetic dissection of transcriptional regulation in budding yeast. Science. 2002, 296: 752-755. 10.1126/science.1069516.PubMedView ArticleGoogle Scholar
- Foss EJ, Radulovic D, Shaffer SA, Ruderfer DM, Bedalov A, Goodlett DR, Kruglyak L: Genetic basis of proteome variation in yeast. Nat Genet. 2007, 39: 1369-1375. 10.1038/ng.2007.22.PubMedView ArticleGoogle Scholar
- Bystrykh L, Weersing E, Dontje B, Sutton S, Pletcher MT, Wiltshire T, Su AI, Vellenga E, Wang JT, Manly KF, Lu L, Chesler EJ, Alberts R, Jansen RC, Williams RW, Cooke MP, de Haan G: Uncovering regulatory pathways that affect hematopoietic stem cell function using 'genetical genomics'. Nat Genet. 2005, 37: 225-232. 10.1038/ng1497.PubMedView ArticleGoogle Scholar
- Hubner N, Wallace CA, Zimdahl H, Petretto E, Schulz H, Maciver F, Mueller M, Hummel O, Monti J, Zidek V, Musilova A, Kren V, Causton H, Game L, Born G, Schmidt S, Muller A, Cook SA, Kurtz TW, Whittaker J, Pravenec M, Aitman TJ: Integrated transcriptional profiling and linkage analysis for identification of genes underlying disease. Nat Genet. 2005, 37: 243-253. 10.1038/ng1522.PubMedView ArticleGoogle Scholar
- Li Y, Alvarez OA, Gutteling EW, Tijsterman M, Fu J, Risken JAG, Hazendonk E, Prins P, Plaster RHA, Jansen RC, Breitling R, Kammenga JE: Mapping determinants of gene expression plasticity by genetical genomics in C. elegans. PLoS Genet. 2006, 2: e222-10.1371/journal.pgen.0020222.PubMedPubMed CentralView ArticleGoogle Scholar
- Keurentjes JJ, Fu J, Terpstra IR, Garcia JM, Ackerveken van den G, Snoek LB, Peeters AJ, Vreugdenhil D, Koornneef M, Jansen RC: Regulatory network construction in Arabidopsis by using genome-wide gene expression quantitative trait loci. Proc Natl Acad Sci USA. 2007, 104: 1708-1713. 10.1073/pnas.0610429104.
- Keurentjes JJ, Fu J, de Vos CH, Lommen A, Hall RD, Bino RJ, Plas van der LH, Jansen RC, Vreugdenhil D, Koornneef M: The genetics of plant metabolism. Nat Genet. 2006, 38: 842-849. 10.1038/ng1815.
- Fu J, Keurentjes JJ, Bouwmeester H, America T, Verstappen FW, Ward JL, Beale MH, de Vos RC, Dijkstra M, Scheltema RA, Johannes F, Koornneef M, Vreugdenhil D, Breitling R, Jansen RC: System-wide molecular evidence for phenotypic buffering in Arabidopsis. Nat Genet. 2009, 41: 166-167. 10.1038/ng.308.
- Stein L: Towards a cyberinfrastructure for the biological sciences: progress, visions and challenges. Nat Rev Genet. 2008, 9: 678-688. 10.1038/nrg2414.
- Fay DS: Classical genetics goes high-tech. Nat Methods. 2008, 5: 863-864. 10.1038/nmeth1008-863.
- Ihaka R, Gentleman RC: R: A language for data analysis and graphics. J Comput Graphical Stat. 1996, 399-414.
- Carey VJ, Morgan M, Falcon S, Lazarus R, Gentleman R: GGtools: analysis of genetics of gene expression in bioconductor. Bioinformatics. 2007, 23: 522-523. 10.1093/bioinformatics/btl628.
- Alberts R, Vera G, Jansen RC: affyGG: computational protocols for genetical genomics with Affymetrix arrays. Bioinformatics. 2008, 24: 433-434. 10.1093/bioinformatics/btm614.
- Fu J, Swertz MA, Keurentjes JJ, Jansen RC: MetaNetwork: a computational protocol for the genetic study of metabolic networks. Nat Protocols. 2007, 2: 685-694. 10.1038/nprot.2007.96.
- Bhave SV, Hornbaker C, Phang TL, Saba L, Lapadat R, Kechris K, Gaydos J, McGoldrick D, Dolbey A, Leach S, Soriano B, Ellington A, Ellington E, Jones K, Mangion J, Belknap JK, Williams RW, Hunter LE, Hoffman PL, Tabakoff B: The PhenoGen informatics website: tools for analyses of complex traits. BMC Genet. 2007, 8: 59. 10.1186/1471-2156-8-59.
- Broman KW, Wu H, Sen S, Churchill GA: R/qtl: QTL mapping in experimental crosses. Bioinformatics. 2003, 19: 889-890. 10.1093/bioinformatics/btg112.
- Smedley D, Swertz MA, Wolstencroft K, Proctor G, Zouberakis M, J B, Hancock JM, Schofield P, consortium aomotC: Solutions for data integration in functional genomics: a critical assessment and case study. Brief Bioinform. 2008, 9: 532-544. 10.1093/bib/bbn040.
- Mungall CJ, Emmert DB: A Chado case study: an ontology-based modular schema for representing genome-associated biological information. Bioinformatics. 2007, 23: i337-346. 10.1093/bioinformatics/btm189.
- Stein LD, Mungall C, Shu SQ, Caudy M, Mangone M, Day A, Nickerson E, Stajich JE, Harris TW, Arva A, Lewis S: The Generic Genome Browser: a building block for a model organism system database. Genome Res. 2002, 12: 1599-1610. 10.1101/gr.403602.
- Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, Ellis B, Gautier L, Ge YC, Gentry J, Hornik K, Hothorn T, Huber W, Iacus S, Irizarry R, Leisch F, Li C, Maechler M, Rossini AJ, Sawitzki G, Smith C, Smyth G, Tierney L, Yang JYH, Zhang JH: Bioconductor: open software development for computational biology and bioinformatics. Genome Biol. 2004, 5: R80. 10.1186/gb-2004-5-10-r80.
- Brazma A, Krestyaninova M, Sarkans U: Standards for systems biology. Nat Rev Genet. 2006, 7: 593-605. 10.1038/nrg1922.
- Saal LH, Troein C, Vallon-Christersson J, Gruvberger S, Borg A, Peterson C: BioArray Software Environment (BASE): a platform for comprehensive management and analysis of microarray data. Genome Biol. 2002, 3: SOFTWARE0003. 10.1186/gb-2002-3-8-software0003.
- Galperin MY, Cochrane GR: Nucleic Acids Res annual Database Issue and the NAR online Molecular Biology Database Collection in 2009. Nucleic Acids Res. 2009, 37: D1-4. 10.1093/nar/gkn942.
- Mailman MD, Feolo M, Jin Y, Kimura M, Tryka K, Bagoutdinov R, Hao L, Kiang A, Paschall J, Phan L, Popova N, Pretel S, Ziyabari L, Lee M, Shao Y, Wang ZY, Sirotkin K, Ward M, Kholodov M, Zbicz K, Beck J, Kimelman M, Shevelev S, Preuss D, Yaschenko E, Graeff A, Ostell J, Sherry ST: The NCBI dbGaP database of genotypes and phenotypes. Nat Genet. 2007, 39: 1181-1186. 10.1038/ng1007-1181.
- Chesler EJ, Lu L, Shou SM, Qu YH, Gu J, Wang JT, Hsu HC, Mountz JD, Baldwin NE, Langston MA, Threadgill DW, Manly KF, Williams RW: Complex trait analysis of gene expression uncovers polygenic and pleiotropic networks that modulate nervous system function. Nat Genet. 2005, 37: 233-242. 10.1038/ng1518.
- Thorisson GA, Muilu J, Brookes AJ: Genotype-phenotype databases: challenges and solutions for the post-genomic era. Nat Rev Genet. 2009, 10: 9-18. 10.1038/nrg2483.
- Zeng H, Luo L, Zhang W, Zhou J, Li Z, Liu H, Zhu T, Feng X, Zhong Y: PlantQTL-GE: a database system for identifying candidate genes in rice and Arabidopsis by gene expression and QTL information. Nucleic Acids Res. 2007, 35: D879-882. 10.1093/nar/gkl814.
- Hu ZL, Fritz ER, Reecy JM: AnimalQTLdb: a livestock QTL database tool set for positional QTL information mining and beyond. Nucleic Acids Res. 2007, 35: D604-609. 10.1093/nar/gkl946.
- Swertz MA, Jansen RC: Beyond standardization: dynamic software infrastructures for systems biology. Nat Rev Genet. 2007, 8: 235-243. 10.1038/nrg2048.
- Jones AR, Miller M, Aebersold R, Apweiler R, Ball CA, Brazma A, Degreef J, Hardy N, Hermjakob H, Hubbard SJ, Hussey P, Igra M, Jenkins H, Julian RK, Laursen K, Oliver SG, Paton NW, Sansone SA, Sarkans U, Stoeckert CJ, Taylor CF, Whetzel PL, White JA, Spellman P, Pizarro A: The Functional Genomics Experiment model (FuGE): an extensible framework for standards in functional genomics. Nat Biotechnol. 2007, 25: 1127-1133. 10.1038/nbt1347.
- Smith B, Ashburner M, Rosse C, Bard J, Bug W, Ceusters W, Goldberg LJ, Eilbeck K, Ireland A, Mungall CJ, Leontis N, Rocca-Serra P, Ruttenberg A, Sansone SA, Scheuermann RH, Shah N, Whetzel PL, Lewis S: The OBO Foundry: coordinated evolution of ontologies to support biomedical data integration. Nat Biotechnol. 2007, 25: 1251-1255. 10.1038/nbt1346.
- Brown SD, Chambon P, de Angelis MH: EMPReSS: standardized phenotype screens for functional annotation of the mouse genome. Nat Genet. 2005, 37: 1155. 10.1038/ng1105-1155.
- MIQAS - Minimum Information for QTLs and Association Studies. [http://miqas.sourceforge.net/]
- Taylor CF, Field D, Sansone SA, Aerts J, Apweiler R, Ashburner M, Ball CA, Binz PA, Bogue M, Booth T, Brazma A, Brinkman RR, Michael Clark A, Deutsch EW, Fiehn O, Fostel J, Ghazal P, Gibson F, Gray T, Grimes G, Hancock JM, Hardy NW, Hermjakob H, Julian RK, Kane M, Kettner C, Kinsinger C, Kolker E, Kuiper M, Novere NL, et al: Promoting coherent minimum reporting guidelines for biological and biomedical investigations: the MIBBI project. Nat Biotechnol. 2008, 26: 889-896. 10.1038/nbt.1411.
- Irizarry RA, Hobbs B, Collin F, Beazer-Barclay YD, Antonellis KJ, Scherf U, Speed TP: Exploration, normalization, and summaries of high density oligonucleotide array probe level data. Biostatistics. 2003, 4: 249-264. 10.1093/biostatistics/4.2.249.
- Jones AR, Paton NW: An analysis of extensible modelling for functional genomics data. BMC Bioinformatics. 2005, 6: 235. 10.1186/1471-2105-6-235.
- Rayner TF, Rocca-Serra P, Spellman PT, Causton HC, Farne A, Holloway E, Irizarry RA, Liu J, Maier DS, Miller M, Petersen K, Quackenbush J, Sherlock G, Stoeckert CJ, White J, Whetzel PL, Wymore F, Parkinson H, Sarkans U, Ball CA, Brazma A: A simple spreadsheet-based, MIAME-supportive format for microarray data: MAGE-TAB. BMC Bioinformatics. 2006, 7: 489. 10.1186/1471-2105-7-489.
- The PubChem Project. [http://pubchem.ncbi.nlm.nih.gov/]
- Peters B, Sidney J, Bourne P, Bui HH, Buus S, Doh G, Fleri W, Kronenberg M, Kubo R, Lund O, Nemazee D, Ponomarenko JV, Sathiamurthy M, Schoenberger S, Stewart S, Surko P, Way S, Wilson S, Sette A: The immune epitope database and analysis resource: from vision to blueprint. PLoS Biol. 2005, 3: e91. 10.1371/journal.pbio.0030091.
- XGAP data sets. [http://www.xgap.org/wiki/DataSets]
- Stranger BE, Forrest MS, Dunning M, Ingle CE, Beazley C, Thorne N, Redon R, Bird CP, de Grassi A, Lee C, Tyler-Smith C, Carter N, Scherer SW, Tavare S, Deloukas P, Hurles ME, Dermitzakis ET: Relative impact of nucleotide and copy number variation on gene expression phenotypes. Science. 2007, 315: 848-853. 10.1126/science.1136678.
- Myers AJ, Gibbs JR, Webster JA, Rohrer K, Zhao A, Marlowe L, Kaleem M, Leung D, Bryden L, Nath P, Zismann VL, Joshipura K, Huentelman MJ, Hu-Lince D, Coon KD, Craig DW, Pearson JV, Holmans P, Heward CB, Reiman EM, Stephan D, Hardy J: A survey of genetic human cortical gene expression. Nat Genet. 2007, 39: 1494-1499. 10.1038/ng.2007.16.
- XGAP - eXtensible Genotype And Phenotype platform. [http://www.xgap.org]
- Taverna Workbench. [http://taverna.sourceforge.net]
- Hull D, Wolstencroft K, Stevens R, Goble C, Pocock MR, Li P, Oinn T: Taverna: a tool for building and running workflows of services. Nucleic Acids Res. 2006, 34: W729-732. 10.1093/nar/gkl320.
- PAGE-OM - The Phenotype And Genotype Object Model. [http://www.pageom.org/]
- GEN2PHEN - EU consortium to unify human Genotype-To-Phenotype databases. [http://www.gen2phen.org]
- Swertz MA, de Brock EO, van Hijum SAFT, de Jong A, Buist G, Baerends RJS, Kok J, Kuipers OP, Jansen RC: Molecular Genetics Information System (MOLGENIS): alternatives in developing local experimental genomics databases. Bioinformatics. 2004, 20: 2075-2083. 10.1093/bioinformatics/bth206.
- MOLGENIS flexible biosoftware generation toolkit. [http://www.molgenis.org]
- Baile JS, Grabowski-Boas L, Steff BM, Wiltshire T, Churchill GA, Tarantino LM: Identification of quantitative trait loci for locomotor activation and anxiety using closely related inbred strains. Genes Brain Behav. 2008, 7: 761-769. 10.1111/j.1601-183X.2008.00415.x.
- Beamer WG, Shultz KL, Churchill GA, Frankel WN, Baylink DJ, Rosen CJ, Donahue LR: Quantitative trait loci for bone density in C57BL/6J and CAST/EiJ inbred mice. Mamm Genome. 1999, 10: 1043-1049. 10.1007/s003359901159.
- Fu J, Keurentjes JJ, Bouwmeester H, America T, Verstappen FW, Ward JL, Beale MH, de Vos RC, Dijkstra M, Scheltema RA, Johannes F, Koornneef M, Vreugdenhil D, Breitling R, Jansen RC: System-wide molecular evidence for phenotypic buffering in Arabidopsis. Nat Genet. 2009, 41: 166-167. 10.1038/ng.308.
- Smedley D, Haider S, Ballester B, Holland R, London D, Thorisson G, Kasprzyk A: BioMart - biological queries made easy. BMC Genomics. 2009, 10: 22. 10.1186/1471-2164-10-22.
- Lyne R, Smith R, Rutherford K, Wakeling M, Varley A, Guillier F, Janssens H, Ji W, McLaren P, North P, Rana D, Riley T, Sullivan J, Watkins X, Woodbridge M, Lilley K, Russell S, Ashburner M, Mizuguchi K, Micklem G: FlyMine: an integrated database for Drosophila and Anopheles genomics. Genome Biol. 2007, 8: R129. 10.1186/gb-2007-8-7-r129.
- Omixed. [http://www.omixed.org/]
- Jameson D, Garwood K, Garwood C, Booth T, Alper P, Oliver SG, Paton NW: Data capture in bioinformatics: requirements and experiences with Pedro. BMC Bioinformatics. 2008, 9: 183. 10.1186/1471-2105-9-183.
- AndroMDA. [http://www.andromda.org/]
- Ruby on Rails. [http://www.rubyonrails.org]
- O'Connor BD, Day A, Cain S, Arnaiz O, Sperling L, Stein LD: GMODWeb: a web framework for the Generic Model Organism Database. Genome Biol. 2008, 9: R102. 10.1186/gb-2008-9-6-r102.
- FuGE - Functional Genomics Experiment model. [http://fuge.sourceforge.net]
- Eclipse Integrated Software Development platform. [http://www.eclipse.org]
- CASIMIR - EU consortium for Coordination and Sustainability of International Mouse Informatics Resources. [http://www.casimir.org.uk]
- TAIR - The Arabidopsis Information Resource. [http://www.Arabidopsis.org]
- MIPS - The MIPS Mammalian Protein-Protein Interaction Database. [http://mips.helmholtz-muenchen.de/proj/ppi/]
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.