Exploration of the omics evidence landscape: adding qualitative labels to predicted protein-protein interactions
© van Noort et al.; licensee BioMed Central Ltd. 2007
Received: 4 June 2007
Accepted: 19 September 2007
Published: 19 September 2007
In the post-genomic era various functional genomics, proteomics and computational techniques have been developed to elucidate the protein interaction network. While some of these techniques are specific for a certain type of interaction, most predict a mixture of interactions. Qualitative labels are essential for the molecular biologist to experimentally verify predicted interactions.
Of the individual protein-protein interaction prediction methods, some can predict physical interactions without producing other types of interactions. None of the methods can specifically predict metabolic interactions. We have constructed an 'omics evidence landscape' that combines all sources of evidence for protein interactions from various types of omics data for Saccharomyces cerevisiae. We explore this evidence landscape to identify areas with either only metabolic or only physical interactions, allowing us to specifically predict the nature of new interactions in these areas. By combining the datasets in ways that examine the whole evidence landscape, and not only the highest scoring protein pairs in both datasets, we obtain specific predictions.
The combination of evidence types in the form of the evidence landscape allows for qualitative labels to be inferred and placed on the predicted protein interaction network of S. cerevisiae. These qualitative labels will help in the biological interpretation of gene networks and will direct experimental verification of the predicted interactions.
Genome sequencing projects have resulted in the listing of all protein coding and RNA genes for a large number of organisms. In order to understand the inner workings of the cell, a plethora of omics (genome-scale) techniques that measure the functional coupling between all the components has been developed. All these techniques measure different aspects of functional coupling: for example, yeast-two-hybrid assays [1, 2] uncover direct physical interactions between proteins, whereas affinity purification [3, 4] measures the tendency for proteins to be members of the same protein complex, and micro-arrays detect the concerted expression of genes at the mRNA level. Furthermore, functional relationships are predicted from many other sources: genetic interaction data, gene fusion, conserved gene neighborhood and gene co-occurrence [7–9], conserved co-expression between species [10, 11] or the sharing of transcription factors. Many of these high-throughput techniques to infer functional relationships produce noisy data. The noise level of the data has led to the development of bioinformatics data integration strategies to increase the reliability of the prediction of functional coupling.
Despite the obvious success of these integrative approaches, they discard the information on the type of functional coupling that was measured in the original assays. High quality generic gene networks have been inferred from the integration of very heterogeneous data, such as synthetic lethals, yeast-two-hybrid and mRNA-derived co-expression [13, 14]. These networks contain many accurate predictions, but specific information on the type of functional coupling is lost. In addition to the loss of specificity from integration, some techniques to measure interactions, such as co-expression, predict only generic functional couplings even without integration. This lack of specificity is a problem, because for the biological interpretation of gene networks and the prioritization of experimental verification we need not only to identify protein interactions, but also to add qualitative labels to the interactions. We here present a bioinformatics approach that distinguishes different types of functional coupling on the basis of their behavior across different high-throughput datasets. We first study how well individual in silico predictions and omics datasets predict a specific type of interaction. Subsequently, we combine the information from in silico predictions, functional genomics data and protein interaction assays into evidence landscapes. In these landscapes we identify regions that are populated solely by physical or solely by metabolic interactions, allowing specific prediction of the nature of interactions between proteins.
Co-pathway membership is another common functional relationship that is frequently used [13, 14] and which we chose as our second category. Metabolic interactions, in which proteins are part of the same metabolic pathway, are the clearest exponent of these pathway interactions for which clear-cut databases of sufficient size are available. No high-throughput method exists that exclusively detects pathway or metabolic interactions, even though certain methods detect them among other functional relationships. As a basis for metabolic interactions we took only those KEGG maps that represent metabolic pathways (that is, with map number below 2000). Metabolic pathways obviously contain multimeric enzyme complexes, but we scored the interactions within these complexes neither as positives nor as negatives in our metabolic reference. We did, however, consider the links between these enzymes and other enzymes from the pathway as metabolic (Figure 1). This resulted in 18,460 positives and 275,768 negatives.
Score-intervals and positive predictive value
To determine whether any single type of omics evidence is by itself typical for either of the two categories, we cannot simply plot the prediction performance for each reference set independently. We have to count true and false metabolic interactions as false physical interactions, and vice versa. The positive predictive value (ppv) of metabolic interactions is calculated as the number of true metabolic interactions divided by the sum of the true and false metabolic and the true and false physical interactions. Likewise, the ppv of physical interactions is the number of true physical interactions divided by the same sum. In this way we can determine not only whether proteins with a certain score in a certain dataset are likely to interact, but also how they interact.
Qualitative information from individual omics datasets
We calculated the ppv for each omics evidence type and each score interval. Figure 2 shows at what score each evidence type successfully predicts either metabolic or physical interactions. The ppv for physical interactions (ppv phys) increases similarly to the ppv for metabolic interactions (ppv meta) for gene co-expression (CoExp), as well as for combinations of gene co-expression between species (CoExp2Sp, CoExp4Sp) and the combination of gene co-expression with shared transcription factor binding sites (ChIP-chipCoExp) (Figure 2). These data are, therefore, not specific for either metabolic or physical interactions. In contrast, for gene neighborhood (GenNeigh) the ppv depends on the score: a very high score is specific for physical interactions, whereas a lower, but still significant, score is indicative of a metabolic interaction (Figure 2, GenNeigh). The highest ppv meta in this set is 0.79, at a point where the ppv phys is 0.05, whereas the highest ppv phys is 0.73 when the ppv meta is 0.11. Therefore, GenNeigh can be used to obtain some specificity about the type of predicted interaction. Correlated phylogenetic profiles (CoOccur) show a similar, but less pronounced, trend of differential ppv.
Table 1. Logistic regression coefficients with metabolic and physical interactions
Qualitative information from evidence landscapes
Normally, logistic regression provides the best fitting function between a dependent variable and a set of independent variables. In this case the explanatory variables are clearly not independent of each other, as can be readily observed in a correlation matrix (Additional data file 1). There are also huge differences in coefficients between functions fitted on the separate variables and functions fitted on multiple variables at the same time (data not shown). Therefore, a simple logistic regression, for example, as applied in , is not appropriate for these data. Moreover, the interval score-ppv plots show that the probabilities of interactions do not always follow a logistic curve. An exploration of the combinations of scores of the different input data is more suitable. We call the combinations of omics data 'evidence landscapes': surfaces on which the x and y coordinates represent the scores of two types of omics data. On these surfaces we plot the specificity for either metabolic or physical interactions, estimated by the differential ppv. The differential ppv is computed by subtracting the physical interaction ppv from the metabolic interaction ppv. This means that if a region scores equally well in both reference sets (be it very poorly or very well), it has a differential ppv of zero, reflecting the inability of this region to differentiate between metabolic and physical interactions. However, if it is very accurate in predicting metabolic relations but unable to accurately predict physical interactions, it has a very high differential ppv; vice versa, a very negative value reflects specificity for physical interactions. Thus, the differential ppv is a tool to judge whether areas exist that specifically predict either type of interaction.
What we have observed in Figure 2 is that intermediate scores in correlated phylogenetic profiles and gene neighborhood conservation are often indicative of metabolic interactions. The evidence landscape combining these two evidence types contains regions specific for metabolic interactions at intermediate scores in both sets (Figure 3g). Thus, we find purely metabolic interactions not only among gene pairs that score zero in the protein-protein interaction datasets, but also where intermediate-scoring parts of other evidence types overlap.
A cellular network with qualitative labels on the predicted interactions
Several metabolic pathways are completely retrieved, such as the arginine and the threonine biosynthesis pathways, which are connected only by predicted metabolic interactions (blue lines). The arginine biosynthesis pathway is depicted in Figure 4b. We find many known physical protein complexes as clusters densely connected by red lines, as has been previously shown in many integrative bioinformatics studies [18–21]. Interestingly, we now also observe the pathway interactions that exist between them. For example, in the upper right corner is the oxidative phosphorylation pathway: members of the same complex have red lines (physical interactions) between them, whereas members of different complexes have blue lines (metabolic interactions) between them. Even though we derived the metabolic pathway interactions by identifying the regions in the landscapes that scored highly in a metabolic reference set, we expect this class to also capture functional associations from other types of cellular pathways. Therefore, the blue lines between, for example, the exosome and the small nucleolar ribonucleoprotein complex are not necessarily metabolic, as they are in the oxidative phosphorylation pathway, but rather reflect other types of functional association in which a substrate is passed on from one protein to another. Likewise, the oxidative stress cluster contains interactions between thioredoxin reductases and glutaredoxins. As far as is known, these proteins are not part of the same pathway in the sense that they pass, for example, reducing equivalents to each other, but they are part of the same system.
It is perhaps logical in hindsight that we detect metabolic interactions in areas where both proteomic approaches report no co-purification while there are strong indications for co-regulation, but this has some important implications. Data integration should use not only the top-scoring protein pairs but also the non-scoring ones. For the co-purification data this requires that the absence of a reported interaction reflects a cellular reality: in other words, we need physical protein interaction datasets in which the negatives are true negatives rather than merely the absence of results. Although a comparison of the Gavin et al. and Krogan et al. co-purification data reveals that both datasets still harbor some false negatives, a combination of both comes close to having the ideal properties for our objective, and only since the publication of these data has a differential genomics approach as proposed here become possible.
Differential rates of evolution also contribute to distinguishing metabolic from physical interactions. We could not obtain the same level of differential ppv for the prediction of metabolic interactions in landscapes with the conserved co-expression set of Stuart and co-workers as we did with the two-species orthologous conserved co-expression set, because the former predicts mainly physical interactions. As the conserved co-expression set of Stuart et al. is based on four species and the other on only two, we speculate that metabolic interactions are less conserved in evolution than physical interactions, which is consistent with results on the evolutionary modularity of metabolic pathways and protein complexes in biological systems. The higher rate of evolution of metabolic interactions also explains why a very high level of gene neighborhood conservation or of correlation between phylogenetic profiles indicates a physical interaction, whereas intermediate levels are more indicative of metabolic interactions.
One striking observation is that we predict many more physical interactions than metabolic interactions. This difference might be explained by the fact that there are specific experimental methods to find physical interactions but none to find metabolic interactions. Even the shared genetic interactions, which we previously thought to be indicative of co-pathway membership, turn out to correlate mostly with physical protein interactions. Only co-expression data and the in silico prediction methods contain metabolic interactions, and there they are mixed with physical interactions, making it hard to specifically extract metabolic interactions from omics data. Ultimately, it might even be the nature of metabolic interactions themselves that makes them less amenable to prediction: metabolic interactions are by nature indirect, and only in linear pathways do the enzymes involved have the kind of mutual dependence that proteins in the same complex have, which might explain why metabolic interactions leave a weaker signal in the genomics data than physical interactions. A weaker type of interaction between enzymes in the same pathway is also suggested by our observation that metabolic interactions are prevalent at intermediate degrees of gene order conservation or correlation between phylogenetic profiles, while high levels of gene order conservation correlate with physical interactions.
It is of course tempting to combine more than two types of omics data. There are, however, two reasons why we here explore pairs of evidence types rather than the multidimensional evidence landscape given by all evidence types simultaneously. Firstly, visual inspection of differential ppv plots is still possible in two dimensions but becomes more troublesome in higher dimensions. Secondly, and more importantly, overlapping all evidence types at the same time results in very small numbers of protein pairs in each multidimensional volume in the reference sets, which in turn hampers the reliable calculation of prediction ppv.
As an extension to this work we would like to specifically predict more than two types of interactions. One type of interaction that we cannot predict is a kinase-target interaction; the prediction of these kinds of interactions is a field in its own right and requires integration of many more types of prediction methods and data, such as sequence data. Furthermore, the type of method we use here requires reference sets that are of high quality and at the same time cover many protein pairs. For transient physical interactions, such reference sets are not available at the moment, although they might become available in the near future.
Protein relations predicted by our computational integration should be less laborious to test experimentally, because their qualitative labels indicate which assays are suitable for biochemical verification. For example, it would be futile to try to verify our predicted metabolic relations by co-immunoprecipitation (CoIP). In general, we expect that novel ways of integration and the advent of more and more types of omics data will allow the further development of approaches to increase the specificity and to extract more qualitative data on the nature of protein interactions.
When predicting interactions between genes it is essential to specify the type of interaction that is predicted to allow biological interpretation. Some data types are already specific for the type of interaction: for example, ChIP-on-chip data of transcription factors are indicative of regulatory interactions and co-purifications are specific for physical interactions. However, co-regulation, correlated expression, shared genetic interactions and in silico interactions are not intrinsically specific to any type of interaction. Here we have shown that although some datasets do contain a high level of metabolic interactions at intermediate scores, it is not possible to reliably predict metabolic interactions from any of them alone. However, by combining the datasets in ways that examine the whole evidence landscape, and not only the highest scoring protein pairs in both datasets, we can find specific predictions; for example, protein pairs whose co-expression is evolutionarily conserved but that never co-purify in two comprehensive protein-protein interaction datasets can be labeled as metabolic interactions. This is a first step towards improved biological interpretation of gene networks generated from the integration of high-throughput data.
Materials and methods
We downloaded the yeast protein complex purifications published by Gavin and co-workers and recalculated the SA scores that reflect the likelihood of interaction, so as to also include proteins that were purified only once. Protein pairs that were never co-purified but were both purified at least once received an SA score of zero. We also downloaded the protein complex purifications of Krogan and co-workers. These authors produced a different interaction score per protein pair, which was optimized to overlap with protein complexes from the MIPS database. To obtain a score independent of any reference set, we calculated SA scores based on the purifications of Krogan et al. Protein pairs that were never found together in a purification but were purified at least once were given a score of zero. As a third set we took the sum of SA scores of all protein pairs occurring in both protein-protein interaction datasets. Scored yeast-two-hybrid interactions were obtained from the STRING database.
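The zero-filling and summing of SA scores described above can be sketched as follows. This is a minimal illustration, assuming SA scores have already been computed per dataset and stored in a dictionary keyed by unordered protein pairs; the socio-affinity calculation itself is not reproduced, and the function names `pair`, `zero_filled_sa` and `combined_sa` are hypothetical.

```python
from itertools import combinations


def pair(a, b):
    """Order-independent key for a protein pair."""
    return frozenset((a, b))


def zero_filled_sa(sa_scores, purified):
    """Return an SA score for every pair of proteins purified at least once.

    sa_scores: dict mapping pair -> SA score for co-purified pairs.
    purified: set of proteins seen in at least one purification.
    Pairs that were never co-purified (but were both purified) get 0.0.
    """
    scores = {}
    for a, b in combinations(sorted(purified), 2):
        scores[pair(a, b)] = sa_scores.get(pair(a, b), 0.0)
    return scores


def combined_sa(gavin, krogan):
    """Sum the SA scores of pairs that are scored in both datasets."""
    return {p: gavin[p] + krogan[p] for p in gavin.keys() & krogan.keys()}
```

Using `frozenset` keys makes (A, B) and (B, A) the same pair, so the two datasets can be intersected directly.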
In silico predictions of functional interactions were obtained from the STRING database . From this database we took the co-occurrence scores based on phylogenetic profiles of COGs and gene neighborhood conservation also based on COGs. The scores were transferred from pairs of COGs to pairs of S. cerevisiae genes. If more than three yeast genes belonged to the same COG, the score was considered ambiguous and was removed from the dataset.
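A sketch of the COG-to-gene transfer, assuming COG-pair scores and COG memberships are available as plain dictionaries (a hypothetical layout, not the STRING file format). How a gene pair reached through more than one COG pair was handled is not stated; keeping the maximum score is an assumption of this sketch.

```python
from itertools import product

MAX_GENES_PER_COG = 3  # COGs with more yeast genes than this are treated as ambiguous


def transfer_cog_scores(cog_scores, cog_members):
    """Map COG-pair scores onto pairs of S. cerevisiae genes.

    cog_scores: dict mapping (cog_a, cog_b) -> score (e.g. co-occurrence).
    cog_members: dict mapping COG id -> list of yeast genes in that COG.
    Returns a dict mapping frozenset({gene1, gene2}) -> score.
    """
    gene_scores = {}
    for (cog_a, cog_b), score in cog_scores.items():
        genes_a = cog_members.get(cog_a, [])
        genes_b = cog_members.get(cog_b, [])
        # Drop the score entirely if either COG is ambiguous
        if len(genes_a) > MAX_GENES_PER_COG or len(genes_b) > MAX_GENES_PER_COG:
            continue
        for g1, g2 in product(genes_a, genes_b):
            if g1 != g2:
                key = frozenset((g1, g2))
                # Assumption: keep the highest score if a pair is reached twice
                gene_scores[key] = max(score, gene_scores.get(key, score))
    return gene_scores
```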
We used two multi-species conserved co-expression datasets: co-expression conservation between human, yeast, fly and worm, and between yeast and worm. We also used co-expression conservation between pairs of paralogs in yeast. For the two-species conservation we took the maximum expression correlation over all pairs of orthologs and averaged this maximum with the expression correlation of the gene pair itself. For paralogous conservation we took the maximum expression correlation over all parallel duplicated gene pairs and averaged this maximum with the expression correlation of the gene pair itself.
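For the two-species set, the score of a yeast gene pair is the mean of its own expression correlation and the best correlation among its ortholog pairs. A minimal sketch, with correlation lookups passed in as functions (hypothetical interfaces); the paralog variant would call the same function with a paralog map and the yeast correlation lookup for both `corr_ref` and `corr_other`.

```python
def conserved_coexpression(i, j, corr_ref, corr_other, orthologs):
    """Average a pair's own correlation with its best ortholog-pair correlation.

    corr_ref: function (gene, gene) -> expression correlation in yeast.
    corr_other: function (gene, gene) -> correlation in the second species.
    orthologs: dict mapping a yeast gene -> list of its orthologs.
    Returns None when either gene has no orthologs.
    """
    # Skip identical orthologs, which would trivially correlate perfectly
    pairs = [(a, b) for a in orthologs.get(i, []) for b in orthologs.get(j, []) if a != b]
    if not pairs:
        return None
    best = max(corr_other(a, b) for a, b in pairs)
    return (best + corr_ref(i, j)) / 2.0
```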
Gene pairs that share a promoter were excluded. To increase the reliability of the co-regulation signal, we multiplied the correlation in binding profile by the correlation in mRNA expression profile based on a large-scale expression dataset in yeast, that is:

$$S^{\mathrm{new}}_{ij} = r_{ij} \times S_{ij}$$

where $S_{ij}$ is the correlation in binding profile and $r_{ij}$ is the expression correlation of genes $i$ and $j$.
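The co-regulation score can be sketched as follows, assuming Pearson correlation for both the binding-profile correlation and the expression correlation (the text does not name the correlation measure). Note that two negative correlations give a positive product; how such cases were treated is not stated.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences.

    Raises ZeroDivisionError for a constant profile (correlation undefined).
    """
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def coregulation_score(binding_i, binding_j, expr_i, expr_j):
    """S_new_ij = r_ij * S_ij: binding correlation weighted by expression correlation."""
    s_ij = pearson(binding_i, binding_j)  # correlation of TF binding profiles
    r_ij = pearson(expr_i, expr_j)        # correlation of mRNA expression profiles
    return r_ij * s_ij
```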
A set of synthetic lethal and synthetic sick interactions was downloaded from the Saccharomyces Genome Database. It was found earlier that genetic interactions on their own are only marginally useful for predicting direct interactions, but that shared genetic interactions do indicate involvement in similar pathways. We corrected the number of shared genetic interactions $N_{ij}$ by the geometric average of the total numbers of interactions $T_i$ and $T_j$ of the two proteins, that is, $N_{ij}/\sqrt{T_i T_j}$, exactly the same as for the co-regulation score.
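The corrected shared-interaction score $N_{ij}/\sqrt{T_i T_j}$ can be sketched as below, assuming genetic interactions are stored as a dictionary from gene to its set of synthetic lethal or sick partners (a hypothetical layout). Whether a direct interaction between the two genes themselves counts towards $T$ is not specified; here it does if present in the input.

```python
import math


def shared_genetic_interaction_score(partners, i, j):
    """N_ij / sqrt(T_i * T_j): shared partners corrected by the geometric mean of degrees.

    partners: dict mapping gene -> set of its genetic interaction partners.
    Genes without any interactions get a score of 0.0.
    """
    pi, pj = partners.get(i, set()), partners.get(j, set())
    if not pi or not pj:
        return 0.0
    n_ij = len(pi & pj)  # number of shared genetic interactors
    return n_ij / math.sqrt(len(pi) * len(pj))
```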
We downloaded known complexes from MIPS and removed all categories containing the terms 'other' or 'predicted'. Removal of the predicted category was especially important, because it contains complexes derived from purified complexes identified by mass spectrometry in earlier high-throughput publications from the same groups that produced the Krogan et al. and the Gavin et al. datasets. We took complexes at the lowest level of definition. Protein pairs that are in the same complex are positive examples, and protein pairs that are in different complexes are negative examples. The positive and negative examples constitute the physical interaction reference set.
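The construction of the physical reference set from complex memberships can be sketched as follows, assuming the MIPS complexes have already been filtered as described and are given as a dictionary from complex identifier to member set (hypothetical layout). A pair that shares at least one complex is treated as positive, even if its members also belong to other, different complexes.

```python
from itertools import combinations


def physical_reference(complexes):
    """Build positive and negative pair sets from complex memberships.

    complexes: dict mapping complex id -> set of member proteins.
    Returns (positives, negatives) as sets of frozenset pairs.
    """
    positives = set()
    for members in complexes.values():
        positives.update(frozenset(p) for p in combinations(sorted(members), 2))
    annotated = sorted(set().union(*complexes.values()))
    # Pairs of annotated proteins that never share a complex are negatives
    negatives = {frozenset(p) for p in combinations(annotated, 2)} - positives
    return positives, negatives
```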
From the KEGG database we took all metabolic maps with indices smaller than 2,000. Maps with higher indices are not metabolic and contain other processes, including many that consist of a single protein complex. Positive examples are all protein pairs that co-occur on a metabolic map, and negative examples are all protein pairs that do not co-occur on a metabolic map but are, nevertheless, present in the metabolic maps of KEGG. To keep physical interactions out of our metabolic reference set, we removed all protein pairs with the same EC number and all protein pairs that are part of the same complex according to SGD/GO annotation [30, 31] or MIPS. Together, the positive and negative examples form the metabolic interaction reference set.
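A corresponding sketch for the metabolic reference set, assuming KEGG maps are given as a dictionary from integer map number to the yeast genes on that map (hypothetical layout). "Same EC number" is interpreted here as sharing at least one EC number, and excluded pairs are dropped from both positives and negatives; both choices are assumptions.

```python
from itertools import combinations


def metabolic_reference(kegg_maps, ec_numbers, same_complex_pairs):
    """Build positive and negative pair sets from KEGG map membership.

    kegg_maps: dict mapping integer map number -> set of yeast genes.
    ec_numbers: dict mapping gene -> set of EC numbers.
    same_complex_pairs: set of frozenset pairs in one complex (SGD/GO or MIPS).
    """
    metabolic = {m: genes for m, genes in kegg_maps.items() if m < 2000}
    positives = set()
    for genes in metabolic.values():
        positives.update(frozenset(p) for p in combinations(sorted(genes), 2))
    present = sorted(set().union(*metabolic.values()))
    negatives = {frozenset(p) for p in combinations(present, 2)} - positives

    def excluded(p):
        a, b = tuple(p)
        shares_ec = bool(ec_numbers.get(a, set()) & ec_numbers.get(b, set()))
        return shares_ec or p in same_complex_pairs

    # Remove candidate physical interactions from both classes
    positives = {p for p in positives if not excluded(p)}
    negatives = {p for p in negatives if not excluded(p)}
    return positives, negatives
```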
Cytoplasmic ribosomal proteins were removed from all reference sets and datasets. Because they account for very many pairwise interactions, including them would bias all statistics towards ribosomes.
ppv and differential ppv
The conserved co-expression values of the Kim lab were rescaled by transforming the -log(P-value) to scores between 0 and 1, such that high scores correspond to more likely interactions. All other scores were rescaled to scores between 0 and 1 by a linear transformation. In the score-ppv plots for each set we calculated ppv based on intervals with bin width 0.025. In the evidence landscape plots, we plotted two datasets against each other in a heat map-like fashion and colored the squares according to their differential ppv (see below). Squares were made with sides of 0.05; if a square contained fewer than two true positives, a larger square with sides of 0.1 was used instead, to avoid high performance scores based on very few examples.
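The rescaling and binning steps can be sketched as follows. This is a minimal illustration under stated assumptions: the -log(P-value) values are min-max scaled onto [0, 1] (with min-max scaling the log base does not matter), the fallback to a 0.1 square is triggered when the 0.05 square holds fewer than two true positives of both classes combined, and the labels 'TP_meta', 'FP_meta', 'TP_phys' and 'FP_phys' are hypothetical names for the four kinds of reference-set pairs.

```python
import math
from collections import Counter, defaultdict


def rescale_neglog_p(p_values):
    """Map P-values onto [0, 1] via -log10(P) and min-max scaling (high = more likely)."""
    logs = [-math.log10(p) for p in p_values]  # requires every P > 0
    lo, hi = min(logs), max(logs)
    return [(v - lo) / (hi - lo) for v in logs]


def rescale_linear(values):
    """Min-max scale raw scores onto [0, 1]."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


def bin_index(score, width):
    """Index of the interval of the given width containing a score in [0, 1]."""
    n = round(1 / width)
    return min(int(score * n), n - 1)  # a score of exactly 1.0 goes into the last bin


def landscape_counts(points, fine=0.05, coarse=0.1, min_tp=2):
    """Count labeled pairs per fine square, pooling to the coarse square when TPs are scarce.

    points: iterable of (x, y, label), x and y in [0, 1], label one of
            'TP_meta', 'FP_meta', 'TP_phys', 'FP_phys'.
    Returns a dict mapping each occupied fine square (i, j) to the Counter used for it.
    Assumes coarse is an integer multiple of fine.
    """
    fine_counts = defaultdict(Counter)
    coarse_counts = defaultdict(Counter)
    for x, y, label in points:
        fine_counts[(bin_index(x, fine), bin_index(y, fine))][label] += 1
        coarse_counts[(bin_index(x, coarse), bin_index(y, coarse))][label] += 1
    ratio = round(coarse / fine)
    result = {}
    for (i, j), counts in fine_counts.items():
        if counts["TP_meta"] + counts["TP_phys"] >= min_tp:
            result[(i, j)] = counts
        else:
            # Too few true positives: use the enclosing coarse square instead
            result[(i, j)] = coarse_counts[(i // ratio, j // ratio)]
    return result
```

The same `bin_index` helper with `width=0.025` gives the one-dimensional intervals used in the score-ppv plots.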
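True and false positives are counted over the reference-set gene pairs present in each bin:

$$\mathrm{ppv}_{\mathrm{meta}} = \frac{TP_{\mathrm{meta}}}{TP_{\mathrm{meta}} + FP_{\mathrm{meta}} + TP_{\mathrm{phys}} + FP_{\mathrm{phys}}}$$

$$\mathrm{ppv}_{\mathrm{phys}} = \frac{TP_{\mathrm{phys}}}{TP_{\mathrm{meta}} + FP_{\mathrm{meta}} + TP_{\mathrm{phys}} + FP_{\mathrm{phys}}}$$

$$\mathrm{ppv}_{\mathrm{diff}} = \mathrm{ppv}_{\mathrm{meta}} - \mathrm{ppv}_{\mathrm{phys}}$$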
Differential ppv is computed by subtracting the ppv phys from the ppv meta. This means that if a region scores equally well in both reference sets (be it very poorly or very well), it has a differential ppv of zero, reflecting the inability of this region to differentiate between metabolic and physical interactions. However, if it is very accurate in predicting metabolic relations but unable to accurately predict physical interactions, it has a very high differential ppv; vice versa, a very negative value reflects specificity for physical interactions.
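Given the four counts for a bin or square, the three ppv values follow directly from the definitions of ppv meta, ppv phys and ppv diff. A minimal sketch; the dictionary keys are hypothetical names for TP meta, FP meta, TP phys and FP phys.

```python
def ppv_values(counts):
    """Return (ppv_meta, ppv_phys, ppv_diff) for one bin, or None for an empty bin.

    counts: mapping with keys 'TP_meta', 'FP_meta', 'TP_phys', 'FP_phys' giving
    the numbers of reference-set gene pairs of each kind in the bin (missing = 0).
    """
    keys = ("TP_meta", "FP_meta", "TP_phys", "FP_phys")
    denom = sum(counts.get(k, 0) for k in keys)  # shared denominator for both ppvs
    if denom == 0:
        return None
    ppv_meta = counts.get("TP_meta", 0) / denom
    ppv_phys = counts.get("TP_phys", 0) / denom
    return ppv_meta, ppv_phys, ppv_meta - ppv_phys
```

Because both ppvs share one denominator, a bin populated equally by true metabolic and true physical pairs gets a differential ppv of zero, as described above.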
We took all gene pairs that fell into the reference sets and took as a binary dependent variable the absence or presence of a known interaction. Again, for metabolic interactions the gene pairs of the physical interaction reference set were added as gene pairs with an absent interaction, and for physical interactions the gene pairs of the metabolic interaction reference set were added as gene pairs with an absent interaction. The scores of the 'omics' datasets were, in turn, considered as the continuous independent variable to fit a logit function. The intercepts (a) and coefficients (b) are reported in Table 1. An approximation of the $R^2$ value was calculated as:

$$R^2 = \frac{\text{null variance} - \text{residual variance}}{\text{null variance}}$$
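The logit fit and the approximate $R^2$ can be sketched for a single predictor with a small Newton-Raphson routine. Reading "null variance" and "residual variance" as the null and residual deviances that R's `glm` reports is an assumption, as are the iteration limit and tolerance; this pure-Python version is only illustrative.

```python
import math


def _prob(a, b, x):
    """Logistic probability for intercept a, coefficient b and score x."""
    return 1.0 / (1.0 + math.exp(-(a + b * x)))


def fit_logit(x, y, max_iter=50, tol=1e-10):
    """Fit P(y = 1) = 1 / (1 + exp(-(a + b*x))) by Newton-Raphson; returns (a, b).

    Assumes the classes are not perfectly separated by x (otherwise the MLE diverges).
    """
    a = b = 0.0
    for _ in range(max_iter):
        ga = gb = haa = hab = hbb = 0.0
        for xi, yi in zip(x, y):
            p = _prob(a, b, xi)
            w = p * (1.0 - p)
            ga += yi - p           # gradient of the log-likelihood
            gb += (yi - p) * xi
            haa += w               # information matrix X'WX
            hab += w * xi
            hbb += w * xi * xi
        det = haa * hbb - hab * hab
        step_a = (hbb * ga - hab * gb) / det
        step_b = (haa * gb - hab * ga) / det
        a, b = a + step_a, b + step_b
        if abs(step_a) + abs(step_b) < tol:
            break
    return a, b


def deviance(x, y, a, b):
    """-2 times the log-likelihood of binary outcomes y under the logit model."""
    total = 0.0
    for xi, yi in zip(x, y):
        p = _prob(a, b, xi)
        total += -2.0 * (yi * math.log(p) + (1 - yi) * math.log(1.0 - p))
    return total


def logit_with_r2(x, y):
    """Return (a, b, R2) with R2 = (null deviance - residual deviance) / null deviance."""
    a, b = fit_logit(x, y)
    p0 = sum(y) / len(y)
    null_dev = deviance(x, y, math.log(p0 / (1.0 - p0)), 0.0)  # intercept-only model
    resid_dev = deviance(x, y, a, b)
    return a, b, (null_dev - resid_dev) / null_dev
```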
Adding specificity to predicted interactions
We took all gene pairs that fell into squares with differential ppv larger than 0.95 and at least five true positive metabolic interactions and called them 'predicted metabolic interactions'. We selected all gene pairs that fell into squares with differential ppv smaller than -0.95 and at least five true positive physical interactions and called them 'predicted physical interactions'.
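The selection rule can be sketched as follows, assuming each gene pair has already been assigned to a landscape square and that each square's differential ppv and true-positive counts are known (hypothetical data layout). Pairs that fall into labeled squares of several landscapes would need a rule for conflicts, which this sketch does not address.

```python
def label_pairs(square_of_pair, square_stats, threshold=0.95, min_tp=5):
    """Assign 'metabolic' or 'physical' labels to gene pairs from their landscape square.

    square_of_pair: dict mapping gene pair -> square id.
    square_stats: dict mapping square id -> (ppv_diff, tp_meta, tp_phys).
    Pairs in squares meeting neither criterion are left unlabeled.
    """
    labels = {}
    for gene_pair, square in square_of_pair.items():
        ppv_diff, tp_meta, tp_phys = square_stats[square]
        if ppv_diff > threshold and tp_meta >= min_tp:
            labels[gene_pair] = "metabolic"
        elif ppv_diff < -threshold and tp_phys >= min_tp:
            labels[gene_pair] = "physical"
    return labels
```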
Additional data files
The following additional data are available with the online version of this paper. Additional data file 1 is a correlation matrix that provides the correlation between the scores of all pairs of omics datasets.
Abbreviations
ChIP-chip: chromatin immunoprecipitation followed by chip
CoExp2Sp: correlated expression in two species
CoExp4Sp: correlated expression in four species
GenNeigh: gene neighborhood conservation
ppv: positive predictive value
VvN was supported by a grant from the Netherlands Bioinformatics Center (NBIC).
- Ito T, Chiba T, Ozawa R, Yoshida M, Hattori M, Sakaki Y: A comprehensive two-hybrid analysis to explore the yeast protein interactome. Proc Natl Acad Sci USA. 2001, 98: 4569-4574. 10.1073/pnas.061034498.
- Uetz P, Giot L, Cagney G, Mansfield TA, Judson RS, Knight JR, Lockshon D, Narayan V, Srinivasan M, Pochart P, et al: A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae. Nature. 2000, 403: 623-627. 10.1038/35001009.
- Gavin AC, Aloy P, Grandi P, Krause R, Boesche M, Marzioch M, Rau C, Jensen LJ, Bastuck S, Dumpelfeld B, et al: Proteome survey reveals modularity of the yeast cell machinery. Nature. 2006, 440: 631-636. 10.1038/nature04532.
- Krogan NJ, Cagney G, Yu H, Zhong G, Guo X, Ignatchenko A, Li J, Pu S, Datta N, Tikuisis AP, et al: Global landscape of protein complexes in the yeast Saccharomyces cerevisiae. Nature. 2006, 440: 637-643. 10.1038/nature04670.
- Hughes TR, Marton MJ, Jones AR, Roberts CJ, Stoughton R, Armour CD, Bennett HA, Coffey E, Dai H, He YD, et al: Functional discovery via a compendium of expression profiles. Cell. 2000, 102: 109-126. 10.1016/S0092-8674(00)00015-5.
- Tong AH, Lesage G, Bader GD, Ding H, Xu H, Xin X, Young J, Berriz GF, Brost RL, Chang M, et al: Global mapping of the yeast genetic interaction network. Science. 2004, 303: 808-813. 10.1126/science.1091317.
- Huynen M, Snel B, Lathe W, Bork P: Predicting protein function by genomic context: quantitative evaluation and qualitative inferences. Genome Res. 2000, 10: 1204-1210. 10.1101/gr.10.8.1204.
- Jensen LJ, Lagarde J, von Mering C, Bork P: ArrayProspector: a web resource of functional associations inferred from microarray expression data. Nucleic Acids Res. 2004, 32: W445-448. 10.1093/nar/gkh407.
- Kelley R, Ideker T: Systematic interpretation of genetic interactions using protein networks. Nat Biotechnol. 2005, 23: 561-566. 10.1038/nbt1096.
- van Noort V, Snel B, Huynen MA: Predicting gene function by conserved co-expression. Trends Genet. 2003, 19: 238-242. 10.1016/S0168-9525(03)00056-8.
- Stuart JM, Segal E, Koller D, Kim SK: A gene-coexpression network for global discovery of conserved genetic modules. Science. 2003, 302: 249-255. 10.1126/science.1087447.
- Snel B, van Noort V, Huynen MA: Gene co-regulation is highly conserved in the evolution of eukaryotes and prokaryotes. Nucleic Acids Res. 2004, 32: 4725-4731. 10.1093/nar/gkh815.
- von Mering C, Jensen LJ, Kuhn M, Chaffron S, Doerks T, Kruger B, Snel B, Bork P: STRING 7 - recent developments in the integration and prediction of protein interactions. Nucleic Acids Res. 2007, 35: D358-362. 10.1093/nar/gkl825.
- Lee I, Date SV, Adai AT, Marcotte EM: A probabilistic functional network of yeast genes. Science. 2004, 306: 1555-1558. 10.1126/science.1099511.
- Beyer A, Workman C, Hollunder J, Radke D, Moller U, Wilhelm T, Ideker T: Integrated assessment and prediction of transcription factor binding. PLoS Comput Biol. 2006, 2: e70. 10.1371/journal.pcbi.0020070.
- Sprinzak E, Altuvia Y, Margalit H: Characterization and prediction of protein-protein interactions within and between complexes. Proc Natl Acad Sci USA. 2006, 103: 14718-14723. 10.1073/pnas.0603352103.
- LANDSCAPE Project. [http://www.cmbi.ru.nl/~vvnoort/LANDSCAPE]
- Troyanskaya OG, Dolinski K, Owen AB, Altman RB, Botstein D: A Bayesian framework for combining heterogeneous data sources for gene function prediction (in Saccharomyces cerevisiae). Proc Natl Acad Sci USA. 2003, 100: 8348-8353. 10.1073/pnas.0832373100.
- Han JD, Bertin N, Hao T, Goldberg DS, Berriz GF, Zhang LV, Dupuy D, Walhout AJ, Cusick ME, Roth FP, et al: Evidence for dynamically organized modularity in the yeast protein-protein interaction network. Nature. 2004, 430: 88-93. 10.1038/nature02555.
- Newman ME: Modularity and community structure in networks. Proc Natl Acad Sci USA. 2006, 103: 8577-8582. 10.1073/pnas.0601602103.
- Palla G, Derenyi I, Farkas I, Vicsek T: Uncovering the overlapping community structure of complex networks in nature and society. Nature. 2005, 435: 814-818. 10.1038/nature03607.
- Snel B, Huynen MA: Quantifying modularity in the evolution of biomolecular systems. Genome Res. 2004, 14: 391-397. 10.1101/gr.1969504.
- Linding R, Jensen LJ, Ostheimer GJ, van Vugt MA, Jorgensen C, Miron IM, Diella F, Colwill K, Taylor L, Elder K, et al: Systematic discovery of in vivo phosphorylation networks. Cell. 2007, 129: 1415-1426. 10.1016/j.cell.2007.05.052.
- von Mering C, Huynen M, Jaeggi D, Schmidt S, Bork P, Snel B: STRING: a database of predicted functional associations between proteins. Nucleic Acids Res. 2003, 31: 258-261. 10.1093/nar/gkg034.
- Harbison CT, Gordon DB, Lee TI, Rinaldi NJ, Macisaac KD, Danford TW, Hannett NM, Tagne JB, Reynolds DB, Yoo J, et al: Transcriptional regulatory code of a eukaryotic genome. Nature. 2004, 431: 99-104. 10.1038/nature02800.
- Saccharomyces Genome Database. [http://www.yeastgenome.org/]
- Ozier O, Amin N, Ideker T: Global architecture of genetic interactions on the protein network. Nat Biotechnol. 2003, 21: 490-491. 10.1038/nbt0503-490.
- Mewes HW, Frishman D, Guldener U, Mannhaupt G, Mayer K, Mokrejs M, Morgenstern B, Munsterkotter M, Rudd S, Weil B: MIPS: a database for genomes and protein sequences. Nucleic Acids Res. 2002, 30: 31-34. 10.1093/nar/30.1.31.
- Kanehisa M, Goto S, Kawashima S, Okuno Y, Hattori M: The KEGG resource for deciphering the genome. Nucleic Acids Res. 2004, 32: D277-280. 10.1093/nar/gkh063.
- Dwight SS, Harris MA, Dolinski K, Ball CA, Binkley G, Christie KR, Fisk DG, Issel-Tarver L, Schroeder M, Sherlock G, et al: Saccharomyces Genome Database (SGD) provides secondary gene annotation using the Gene Ontology (GO). Nucleic Acids Res. 2002, 30: 69-72. 10.1093/nar/30.1.69.
- Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, et al: Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000, 25: 25-29. 10.1038/75556.
- XMGRACE. [http://plasma-gate.wizmann.ac.il/Grace]
- R Project. [http://www.R-project.org]
- Cytoscape. [http://www.cytoscape.org]
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.