Towards accurate imputation of quantitative genetic interactions
© Ulitsky et al.; licensee BioMed Central Ltd. 2009
Received: 1 September 2009
Accepted: 10 December 2009
Published: 10 December 2009
Recent technological breakthroughs have enabled high-throughput quantitative measurements of hundreds of thousands of genetic interactions among hundreds of genes in Saccharomyces cerevisiae. However, these assays often fail to measure the genetic interactions among up to 40% of the studied gene pairs. Here we present a novel method, which combines genetic interaction data together with diverse genomic data, to quantitatively impute these missing interactions. We also present data on almost 190,000 novel interactions.
Understanding the interactions between genes and proteins is essential for elucidating their function. Genetic interactions (GIs) describe the phenotype of a double knock-out in comparison to the phenotypes of single mutants, and they can be crudely classified into positive (alleviating), neutral, and negative (aggravating) interactions [1, 2]. In a negative GI, the fitness (typically estimated by growth rate) of the double-mutant is lower than expected based on the fitness of single mutants. The most extreme example of a negative interaction is synthetic lethality, in which the joint deletion of two nonessential genes leads to a lethal phenotype. In a positive GI, on the other hand, the double mutant is healthier than expected. The expected fitness is usually defined as the product of the fitnesses of the single mutants [1, 3, 4].
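The multiplicative definition of expected fitness comes down to a few lines of arithmetic. The sketch below is illustrative only. The function names and the `tol` band for calling an interaction neutral are hypothetical. E-MAP S-scores are computed by a different, statistically normalized procedure.

```python
def expected_fitness(w_a: float, w_b: float) -> float:
    """Multiplicative null model: expected double-mutant fitness is W_A * W_B."""
    return w_a * w_b


def interaction_score(w_a: float, w_b: float, w_ab: float) -> float:
    """Observed double-mutant fitness minus the multiplicative expectation."""
    return w_ab - expected_fitness(w_a, w_b)


def interaction_sign(eps: float, tol: float = 0.05) -> str:
    """Crude three-way call; `tol` is a hypothetical neutrality band."""
    if eps < -tol:
        return "negative"  # aggravating: double mutant sicker than expected
    if eps > tol:
        return "positive"  # alleviating: double mutant healthier than expected
    return "neutral"


# Single mutants at 80% and 50% of wild-type fitness: expected double mutant at 40%.
eps = interaction_score(0.8, 0.5, 0.2)
print(round(eps, 3), interaction_sign(eps))  # -0.2 negative
```

Under this model, synthetic lethality (observed fitness 0) is simply the most extreme negative deviation.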
In a genome of over 6,000 genes, such as that of Saccharomyces cerevisiae, there are some 18 million gene pairs, making the mapping of the complete genetic interactome a formidable challenge. Towards this goal, several techniques for high-throughput GI profiling have been developed. For example, two approaches, systematic genetic analysis (SGA) [5, 6] and dSLAM (heterozygote diploid-based synthetic lethality analysis with microarrays) [7, 8], have made it possible to screen for negative GIs, namely synthetic sick or synthetic lethal interactions, between a query gene and the collection of all nonessential genes. The recent introduction of E-MAP (epistatic miniarray profile) technology, which is an adaptation of SGA [9–12], has made it possible to quantitatively measure both positive and negative GIs among several hundred genes [9–11]. The largest published E-MAP to date covers GIs between 743 S. cerevisiae genes involved in various aspects of chromosome biology. The use of quantitative GIs was shown to significantly improve gene function prediction.
Using the E-MAP technology, hundreds of thousands of GIs have been measured in S. cerevisiae. It is therefore appealing to use these data along with other genomic information to predict additional GIs. Wong et al.  pioneered the prediction of GIs in S. cerevisiae, using probabilistic decision trees and diverse genomic data, including mRNA expression, functional annotations, subcellular localization, deletion phenotypes and physical interactions. These authors also introduced '2-hop features' for capturing the relationship between a gene pair and a third gene. For example, if protein A physically interacts with protein C, and gene B is synthetic lethal with gene C, then the gene pair A-B possesses the characteristic '2-hop physical-synthetic lethal', which was shown to increase the likelihood of a synthetic lethal interaction between A and B. Assessment of the performance on SGA-tested gene pairs revealed sensitivity of 80% at a false positive rate of 18%. The 2-hop features were shown to be the most effective features for prediction of GIs, and omission of other individual features did not significantly hurt the performance. This result suggested that most negative GIs occur between pairs of compensating physical pathways. This phenomenon has since been extensively studied [14–18]. Zhong and Sternberg  used similar ideas and combined diverse genomic information from three species to predict synthetic lethal interactions in Caenorhabditis elegans using a logistic regression classifier. Paladugu et al.  focused on features based on protein-protein interaction (PPI) networks, such as node degree, centrality, and clustering coefficient. Using a support vector machine classifier, they showed that using PPI network information together with 2-hop features is sufficient for predicting synthetic lethality at about 90% accuracy.
Recently, Qi et al.  devised the first GI prediction scheme based solely on GI data. Observing that genetically interacting gene pairs are connected by many odd-length paths in the GI network, they developed a graph diffusion kernel that successfully predicts novel GIs. Combining this kernel with kernels based on other genomic data had little effect on prediction accuracy, leading them to conclude that most of the information needed to predict new GIs can be found in the existing GI network. Another method for predicting negative GIs using random walks has been recently proposed by Chipman and Singh .
All available methods for predicting GIs were designed and tested on synthetic sick or synthetic lethal GIs obtained with the SGA method [5, 6]. SGA differs from E-MAP in two key aspects. First, SGA screens are inherently asymmetrical, as a relatively small set of 'baits' is tested against a genome-wide collection of 'preys', whereas E-MAPs test all pairwise interactions among a subset of the genes. Second, E-MAP is quantitative and captures both positive and negative GIs. Unfortunately, for technical reasons E-MAPs contain a large number of missing interactions. In the ChromBio E-MAP, for example, over 34% of the interactions were not measured. The fraction of missing interactions is higher for essential genes (46% on average) and similar for genes with reduced fitness in rich media and for other nonessential genes (29% and 33%, respectively). It is natural to surmise that the vast number of interactions measured in the available E-MAPs can be used to predict the unmeasured GIs. The unique features of E-MAPs suggest that a dedicated approach to prediction of missing GIs in E-MAPs may be more powerful than previously suggested techniques for GI prediction. It is this possibility that we address here.
Most previous studies on GI prediction were based on a large variety of genomic information available for each gene in S. cerevisiae. Exceptions are the studies by Qi et al. and Chipman and Singh, which showed that information about the GI network alone is sufficient for relatively accurate qualitative prediction of negative GIs. Here we show that by integrating GI information across genes, it is possible to achieve quantitative prediction of both positive and negative GIs that significantly outperforms predictions made by other methods. Furthermore, this prediction can be improved by combining E-MAP-based information with other genomic data, although this improvement is relatively minor. We thus show that the measured gene pairs in the E-MAP are the best source of information for predicting the pairs that could not be measured.
We demonstrate the utility of the imputed E-MAP values for two tasks: to improve the ability to detect functionally similar genes using either predicted interactions or correlations of imputed GI profiles; and to more fully inspect the landscape of GIs among co-complexed genes. Finally, we address three scenarios that give rise to missing values in E-MAPs and discuss the ability of our method to predict a substantial number of new interactions through a combination of E-MAPs.
Results and Discussion
Construction of gene-pair feature sets
Table 1: Features used in this study (columns: feature; number of features; previous use for GI prediction)
Features: shortest physical path; mutual clustering coefficient; sequence similarity (BLAST E-value); occurrence in a specific protein complex; co-occurrence in any protein complex; a common deletion phenotype; correlation of quantitative phenotype profiles; Gene Ontology semantic similarity; a common subcellular localization; S-score in S. pombe; mRNA expression (correlation); S-score between A and genes similar to B (or vice versa); S-scores among genes similar to A and genes similar to B
Data sources listed for network features: BioGrid and the E-MAP
The first two groups contain features that were used in previous studies [13, 20]: the NETWORK group, which includes features based on the physical and GI networks, and the GENOMIC group, which includes features based on various genomic characteristics. Unlike previous studies, we defined separate individual features for each protein complex, phenotype and localization, whereas others used a single feature, encoding whether the gene pair shares any complex, phenotype or localization. This change stemmed from observations that some complexes tend to take part in a large number of GIs [15, 16].
The third and fourth groups constitute the main innovation in our feature set compared to previous work: the use of information on genetically similar genes (GSGs; Figure 1b; Materials and methods). The GI profile of a gene is a vector of the scores of its GIs with the other genes that took part in the GI screen. Previous studies have shown that similarity of GI profiles is a powerful indicator of functional similarity between genes [9, 10, 18, 23]. Following this reasoning, we hypothesized that when predicting the GI between genes A and B, it would be useful to find genes whose GI profiles are similar to those of A and B and to examine the GIs among them (Figure 1b). We call a set of genes the GSGs of gene A if their GI profiles are the most similar to that of A among all the genes in the E-MAP. The third group, called the GSG feature set, contains, for a gene pair A-B whose GI we wish to predict, the GI scores (which, following , we call S-scores) between A and the GSGs of B and vice versa (see Materials and methods).
Recent studies have shown that many GIs occur between pairs of functional modules [15–18]. If A and B belong to distinct functional modules, it is reasonable to expect that the S-scores between other members of these two modules will be indicative of the S-score between A and B. This is the rationale behind the fourth group, called GSG-MATRIX, which contains S-scores between GSGs of A and GSGs of B (see Materials and methods). For the ChromBio E-MAP we used 15 NETWORK, 117 GENOMIC, 10 GSG and 25 GSG-MATRIX features (167 features in total).
Comparison of feature sets and classifiers for prediction of quantitative GIs
Table 2: Classifiers used in this study
Quantitative GI prediction: least median squared linear regression; Gaussian radial basis function network; k nearest neighbors
GI class prediction: J48 decision tree; discretized linear regression (see Materials and methods)
Comparison of feature sets and classifiers for prediction of GI class
We also tested different combinations of feature sets and classifiers for qualitative prediction of GIs. The GIs in the training set were labeled as positive, negative or neutral (see Materials and methods), and the classifiers were trained to predict the three classes. We compared five classifiers (Table 2), including those used in previous GI prediction studies [13, 19]. We also compared our approach to the diffusion kernel method recently proposed by Qi et al. (using the original implementation provided by the authors, which we applied to the same dataset; see Materials and methods). We used the G- diffusion kernel (based on the number of odd-length paths between the two genes) for prediction of negative interactions, and the G+ kernel (based on the number of even-length paths) for prediction of positive interactions (see Materials and methods). An implementation of the random walk method of Chipman and Singh was not available for comparison. Classifier performance was evaluated separately for prediction of positive and negative interactions, using two criteria. First, as in previous studies, we computed the area under the curve (AUC) score: the area under the receiver operating characteristic (ROC) curve, which plots the true positive rate as a function of the false positive rate as the prediction threshold varies. Although widely used, the AUC criterion is not very informative in our case because the dataset is skewed: for each prediction task there are many more negative examples than positive ones (the ratio between negative, positive and neutral interactions is approximately 6:3:91 in the ChromBio E-MAP and 3:2:95 in the ER and RNA E-MAPs). In GI prediction, it is especially important that the best-ranked predictions, which are the candidates for experimental testing, contain a sufficient fraction of true positives.
One way to quantify this is the precision-recall curve, which plots precision (the fraction of predictions that are correct) as a function of recall (the true positive rate, that is, the fraction of true pairs that were predicted correctly). The area under the precision-recall curve (AUPR) provides a better quantitative assessment of performance when the dataset is skewed. A method with perfect classification accuracy has an AUC of 1 and an AUPR of 1, while a random classifier has an expected AUC of 0.5 and an expected AUPR approximately equal to the fraction of positive examples, which is close to 0 when positive examples are rare.
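The contrast between AUC and AUPR on skewed data is easy to see numerically. The sketch below computes AUC as the probability that a random positive outscores a random negative, and AUPR as average precision, which is one common estimator of the area under the precision-recall curve. The function names and the 5% positive rate (close to the skew of the negative-GI task) are our own illustrative choices.

```python
import random
from typing import Sequence


def roc_auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUC as the probability that a random positive outscores a random negative (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg))


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUPR estimated as average precision: mean precision at the rank of each true positive.
    Tied scores are ranked in input order."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    tp, total = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            tp += 1
            total += tp / rank
    return total / sum(labels)


# Skewed data: 5% positives, scored by a random classifier.
rng = random.Random(0)
labels = [1] * 50 + [0] * 950
scores = [rng.random() for _ in labels]
print(f"random:  AUC={roc_auc(scores, labels):.2f}  AUPR={average_precision(scores, labels):.2f}")
# A perfect classifier scores every positive above every negative.
print(f"perfect: AUC={roc_auc(labels, labels):.2f}  AUPR={average_precision(labels, labels):.2f}")
```

The random classifier lands near AUC 0.5 but near AUPR 0.05. This is why AUPR separates useful from useless rankings more sharply when positives are rare.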
Accurate imputation of negative GIs not measured in the E-MAP
Validation of quantitative predictions of GIs
While the comparison with BioGrid shows that our method is capable of predicting strong negative GIs, our main goals are to predict positive GIs and to make quantitative predictions. To test our ability to accomplish these goals, we used the RNA E-MAP, which shares 127 genes with the ChromBio E-MAP. Among these genes, we found 779 gene pairs for which GIs were measured only in the RNA E-MAP. These pairs could be effectively used as an independent test of our ability to predict quantitative GIs. When we imputed the missing values in the ChromBio E-MAP using linear regression with all the features, the correlation between the predicted values and the S-scores in the RNA E-MAP was 0.452 (Pearson correlation P-value = 2.2 × 10^-16). While highly significant, this correlation is lower than the 0.604 we recorded in our cross-validation experiments using only the ChromBio E-MAP. A likely partial explanation for this is the E-MAP-specific normalization, which uses data from other genes in the same E-MAP to compute S-scores based on raw colony size measurements. Similar to the results of the cross-validation experiments, the accuracy of the prediction of negative interactions was higher than that of positive interactions (52.5% versus 37.5%).
Individual features most useful for prediction of GI type
The features with the highest correlation to measured S-scores: GSG #1 for A; GSG #1 for B; GSG #2 for A; GSG #2 for B; GSG #3 for A; GSG #3 for B; GSG #4 for A; GSG #4 for B; GSG #5 for A; GSG #5 for B; SL degree (average of A and B); SS degree (average of A and B); S-score in S. pombe; GO cellular component similarity; MIPS phenotype: Slow-growth; Quantitative phenotype correlation; GO biological process similarity; Co-occurrence in any subcellular localization
Our feature set contained separate features representing individual complexes, phenotypes or localizations, whereas previous work summarized this information using a single feature. Thirteen individual complex features were ranked higher than the 'same MIPS complex' feature, 25 individual phenotype features were ranked higher than 'same MIPS phenotype', and two localizations were ranked higher than 'same localization'. Hence, using individual features is indeed beneficial, as their information content frequently exceeds that of 'summary' features.
Finally, we compared the performance of each of the four groups of features separately using linear regression (Figure S5 in Additional file 1) and found that the GSG features alone performed best, followed, in decreasing order, by the GSG-MATRIX, NETWORK and GENOMIC groups. Note that performance does not track the number of features: the GSG group is the smallest (10 features) and the GENOMIC group the largest (117 features), indicating that the quality of the features is much more important than their number.
Gene pairs predicted to genetically interact are functionally related
Imputation improves correspondence between genetic and functional similarity
Predicted genetic interactions within protein complexes
We next analyzed the predicted landscape of GIs among genes belonging to the same protein complex. Bandyopadhyay et al.  studied the ChromBio E-MAP and found that many protein complexes are enriched with either positive or negative GIs, and that complexes enriched with negative interactions commonly carry out essential functions and thus are more likely to contain essential genes. However, several complexes, such as TFIID, TFIIF and Mediator, contained a very large number of missing values and therefore could not be reliably studied using the measured interactions. We performed imputation on the ChromBio E-MAP using linear regression and all the features, and inspected the fraction of positive and negative interactions among genes belonging to the same complex.
We were able to significantly increase the number of complexes that have predominantly negative interactions (Figure 8a). Four such complexes are shown in Figure 8c: DNA replication factor C, TFIIF, RNA polymerase III and TFIID. Among protein complexes enriched with positive GIs (Figure 8b), most of the interactions were measured ones, with the exception of the SWI/SNF complex, in which we predicted many positive GIs (Figure 8d). Consistent with the results of Bandyopadhyay et al. , in six out of the seven complexes in which the majority of the negative interactions were newly predicted ones, at least two-thirds of the complex members are essential. In contrast, none of the members of the SWI/SNF complex are essential.
We emphasize that gene essentiality was not part of the features used for GI prediction. Our results provide further evidence that complexes enriched with negative GIs are likely to carry out essential functions.
The effect of missing values abundance and distribution on prediction accuracy
The results are presented in Figure 9d, e. Our predictions were reasonably accurate (r > 0.4) when up to 50% of the E-MAP values were hidden in the 'Random' and 'Submatrix' models. In the 'Cross' model, performance had already deteriorated when 40% of the data were removed. Performance remained better than random (which would yield a correlation of 0) even when up to 90% of the data were missing. As expected, when up to 40% of the interactions were hidden, prediction was more accurate in the 'Random' model than in the 'Submatrix' model. Surprisingly, this trend was reversed when 50% or more of the data were hidden. A possible explanation is that the number of common GI partners of a gene pair scales quadratically with the fraction of missing values for all gene pairs in the first scenario, but only linearly for some gene pairs in the second scenario (see Text S1 in Additional file 1 for a detailed explanation). With regard to the utility of our method for a combination of E-MAPs, we find that missing GIs can be predicted quite accurately (r > 0.4) when the two E-MAPs share ≥ 64% of their genes (which leads to ≤ 30% missing values). As the percentage of missing GIs increases, the NETWORK and GENOMIC features are expected to become more helpful. Indeed, the difference between the performance using only the GSG and GSG-MATRIX features (Figure 9d) and using all the features (Figure 9e) was small (<10%) as long as ≤ 40% of the data were removed, but rose above 20% when ≥ 70% of the data were removed.
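The quadratic scaling in the 'Random' scenario follows from a simple count. Assume each gene pair is missing independently with probability p. Then each of the other n - 2 genes is measured with both A and B with probability (1 - p)^2, so a pair has on average (n - 2)(1 - p)^2 common measured partners. The simulation below is a minimal sketch that checks this expectation. The independence assumption and the function name are illustrative and not taken from the study's code.

```python
import numpy as np


def mean_common_partners(n_genes: int, p_missing: float, seed: int = 0) -> float:
    """Average, over all gene pairs (A, B), of the number of third genes C for which both
    A-C and B-C are measured, when each pair is missing independently with prob. p_missing."""
    rng = np.random.default_rng(seed)
    upper = np.triu(rng.random((n_genes, n_genes)) >= p_missing, k=1)
    measured = (upper | upper.T).astype(int)  # symmetric 0/1 mask with a zero diagonal
    common = measured @ measured  # common[a, b] = number of C measured with both a and b
    iu = np.triu_indices(n_genes, k=1)
    return float(common[iu].mean())


n = 300
for p in (0.1, 0.3, 0.5, 0.7, 0.9):
    expected = (n - 2) * (1 - p) ** 2
    print(f"p={p:.1f}  simulated={mean_common_partners(n, p):7.2f}  (n-2)(1-p)^2={expected:7.2f}")
```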
In this study we investigated prediction of quantitative GIs using data from E-MAP experiments. To the best of our knowledge, this is the first study attempting to address this problem. Our results suggest that such imputation is possible, with a correlation of about 0.6 between predicted and measured S-scores, by combining information from the available GI maps. Adding genomic data contributes only marginally to the prediction accuracy. This finding has important implications for the study of organisms other than S. cerevisiae, such as Schizosaccharomyces pombe, for which two GI maps are already available [11, 34] but other genomic data, such as PPIs, are still scarce. Our results show that imputation of missing values in future studies of such organisms will not be seriously affected by the lack of other genomic data.
The strength of the proposed approach is that it borrows information about GIs from related genes. This also underlines one of its limitations: it can only predict GIs among genes that have been studied genetically (that is, they appear in the same E-MAP). This limitation is shared by other methods utilizing only data about GIs , which are restricted to predicting GIs among genes that appear in the GI network.
To the best of our knowledge, this is also the first attempt to predict positive GIs. Our results show that the available approaches for predicting negative GIs perform poorly for prediction of positive interactions. While our method provides encouraging results in predicting such interactions, this task is evidently much more difficult than prediction of negative interactions: the best method for predicting negative interactions achieves more than double the accuracy of the best method for predicting positive interactions (AUPR of 0.45 versus 0.2). One possible explanation for this difference is that there are fewer positive interactions in the E-MAPs, and therefore fewer data points to properly train the classifiers. Another possibility is that these interactions are more complex in nature than negative GIs, making their prediction more difficult. Perhaps other, yet to be discovered features can predict these interactions with better accuracy.
The use of GI maps in yeast has already led to identification of novel complexes and gene functions, some of which were not recovered by other available methods [10, 35–40]. It is thus expected that the use of such maps will increase, and large GI maps will be created for other biological systems (for example, mammalian cell lines) in the near future. As long as these maps remain prone to biological and technical noise, imputation of missing data will play a key role in their computational analysis.
Materials and methods
We used the S-scores reported in the original publications [9–12]. To avoid bias due to extreme S-scores, S-scores below -10 were set to -10 and S-scores above 10 were set to 10. When an open reading frame was represented by more than one deletion strain (for example, a knock-out strain and a strain with a DAmP allele), the strain with the fewest missing values was chosen. When predicting the type of the GI, following , we defined a GI as negative if the S-score was below -2.5 and as positive if the S-score was above 2.
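The capping and class assignment can be written in a few lines. This is a minimal sketch assuming the E-MAP is held as a NumPy array with NaN marking unmeasured pairs. The function names and the -1/0/+1 class encoding are hypothetical.

```python
import numpy as np


def preprocess_s_scores(s: np.ndarray) -> np.ndarray:
    """Cap S-scores to [-10, 10]; NaN (unmeasured pair) propagates unchanged."""
    return np.clip(s, -10.0, 10.0)


def gi_class(s: np.ndarray) -> np.ndarray:
    """-1 for negative (S < -2.5), +1 for positive (S > 2), 0 for neutral, NaN if unmeasured."""
    measured = ~np.isnan(s)
    filled = np.where(measured, s, 0.0)  # avoid comparing NaN values
    labels = np.zeros(s.shape, dtype=float)
    labels[filled < -2.5] = -1.0
    labels[filled > 2.0] = 1.0
    labels[~measured] = np.nan
    return labels


s = preprocess_s_scores(np.array([-14.2, -3.0, 0.5, 2.4, 11.0, np.nan]))
print(s.tolist())            # [-10.0, -3.0, 0.5, 2.4, 10.0, nan]
print(gi_class(s).tolist())  # [-1.0, -1.0, 0.0, 1.0, 1.0, nan]
```

Both thresholds are strict here, matching "below -2.5" and "above 2" in the text.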
Network and genomic feature sets
We now describe the features based on network properties and genomic information that we used. Previous studies that employed these features for GI prediction are listed in Table 1. We used three networks, all taken from BioGrid: the PPI network and the synthetic lethal and synthetic sick networks. To the synthetic lethal network we added interactions between gene pairs in the analyzed E-MAP that had S-scores ≤ -2.5.
Physical interaction is a binary feature indicating if the proteins interact in the physical network.
Network degree is the number of neighbors in the PPI, synthetic lethal and synthetic sick networks recorded for each gene. Following  we used two features for each network and each gene pair with degrees d1 and d2: the average degree (d1 + d2)/2 and the absolute difference between the degrees, |d1 - d2|.
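The two degree features per network can be sketched as follows. This assumes each network is a list of undirected edges. The helper names and the toy network are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple


def build_adjacency(edges: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Undirected adjacency sets; duplicate edges and self-loops are ignored."""
    adj: Dict[str, Set[str]] = {}
    for u, v in edges:
        if u != v:
            adj.setdefault(u, set()).add(v)
            adj.setdefault(v, set()).add(u)
    return adj


def degree_features(adj: Dict[str, Set[str]], a: str, b: str) -> Tuple[float, int]:
    """Average degree (d1 + d2) / 2 and absolute degree difference |d1 - d2| of a gene pair."""
    d1, d2 = len(adj.get(a, ())), len(adj.get(b, ()))
    return (d1 + d2) / 2, abs(d1 - d2)


# Hypothetical toy network: A has three neighbors, B has one.
ppi = build_adjacency([("A", "X"), ("A", "Y"), ("A", "Z"), ("B", "X")])
print(degree_features(ppi, "A", "B"))  # (2.0, 2)
```

The same two features would be computed separately for the PPI, synthetic lethal and synthetic sick networks.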
Shortest physical path
The shortest physical path is the length of the shortest path between the proteins in the PPI network.
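In an unweighted PPI network, the shortest path length can be found by breadth-first search. This is a minimal sketch using adjacency sets. The text does not state how unreachable pairs were encoded, so returning `None` is our own choice.

```python
from collections import deque
from typing import Dict, Optional, Set


def shortest_path_length(adj: Dict[str, Set[str]], a: str, b: str) -> Optional[int]:
    """Breadth-first search; returns the number of edges on a shortest a-b path,
    or None if b is unreachable from a."""
    if a == b:
        return 0
    seen = {a}
    queue = deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        for nbr in adj.get(node, ()):
            if nbr == b:
                return dist + 1
            if nbr not in seen:
                seen.add(nbr)
                queue.append((nbr, dist + 1))
    return None


# Hypothetical PPI network given directly as adjacency sets.
ppi = {"A": {"C", "E"}, "C": {"A", "D"}, "D": {"C", "B"}, "B": {"D"}, "E": {"A"}, "F": set()}
print(shortest_path_length(ppi, "A", "B"))  # 3
print(shortest_path_length(ppi, "A", "F"))  # None
```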
Mutual clustering coefficient
Mutual clustering coefficient was computed as described in  using the PPI network.
The 2-hop feature was computed as described in , using the physical, synthetic lethal and synthetic sick networks.
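For a pair of networks (X, Y), a 2-hop feature reduces to a neighbor-set intersection. This is a minimal sketch. It assumes the feature counts connecting genes C in either orientation of the unordered pair. Whether the original features were counts or binary indicators is not stated here, and the function name is hypothetical.

```python
from typing import Dict, Set


def two_hop_count(adj_x: Dict[str, Set[str]], adj_y: Dict[str, Set[str]], a: str, b: str) -> int:
    """Number of third genes C such that one gene of the pair is linked to C in network X
    and the other is linked to C in network Y (e.g. X = physical, Y = synthetic lethal)."""
    empty: Set[str] = set()
    via_a = adj_x.get(a, empty) & adj_y.get(b, empty)  # A-C in X, B-C in Y
    via_b = adj_x.get(b, empty) & adj_y.get(a, empty)  # B-C in X, A-C in Y
    return len((via_a | via_b) - {a, b})


# The example from the text: protein A binds C, and gene B is synthetic lethal with C.
physical = {"A": {"C"}, "C": {"A"}}
synthetic_lethal = {"B": {"C"}, "C": {"B"}}
print(two_hop_count(physical, synthetic_lethal, "A", "B"))  # 1
```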
Protein complexes were taken from the MIPS (Munich Information Center for Protein Sequences) database . Only complexes in which at least three proteins appeared in the analyzed E-MAP were used. For each complex we added a ternary feature indicating how many of the proteins in the gene pair (0, 1 or 2) appeared as part of the complex. These features were called 'individual' as they refer to individual complexes. In addition we added a binary feature indicating whether the genes in the pair shared any protein complex. Using a newer collection of protein complexes  did not significantly affect the prediction performance (results not shown).
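The individual (ternary) and summary (binary) complex features can be sketched as follows. This assumes complexes are given as gene sets. The function name is hypothetical. Computing the 'shares any complex' feature over the retained complexes only is also our assumption.

```python
from typing import Dict, Set, Tuple


def complex_features(
    complexes: Dict[str, Set[str]],
    emap_genes: Set[str],
    a: str,
    b: str,
    min_members: int = 3,
) -> Tuple[Dict[str, int], int]:
    """Per-complex ternary features (0, 1 or 2 members of the pair in the complex) plus a
    binary 'shares any complex' feature."""
    kept = {
        name: members
        for name, members in complexes.items()
        if len(members & emap_genes) >= min_members  # at least three members in the E-MAP
    }
    individual = {name: int(a in m) + int(b in m) for name, m in kept.items()}
    shares_any = int(any(count == 2 for count in individual.values()))
    return individual, shares_any


# Hypothetical complexes and E-MAP gene set.
complexes = {"complex_1": {"G1", "G2", "G3", "G4"}, "complex_2": {"G5", "G6"}}
emap = {"G1", "G2", "G3", "G5", "G6"}
print(complex_features(complexes, emap, "G1", "G2"))  # ({'complex_1': 2}, 1)
print(complex_features(complexes, emap, "G1", "G5"))  # ({'complex_1': 1}, 0)
```

The deletion phenotype features follow the same pattern, with phenotype gene sets in place of complexes.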
S. cerevisiae single deletion strain phenotypes (for example, sensitivity to DNA damaging agents) were obtained from MIPS . Only phenotypes shared by at least three genes in the analyzed E-MAP were used. As for protein complexes, we added a ternary feature for each phenotype and a binary feature indicating whether the gene pair shared any phenotype.
Quantitative phenotype correlation
We used the quantitative measurements of single deletion phenotypes described in . For each gene pair, we computed the Pearson correlation between the phenotypic profiles of the genes.
GO semantic similarity
Semantic similarity between the annotations of the two genes was computed using a previously described method. Similarity was computed separately for each of the three GO ontologies: 'biological process', 'molecular function' and 'cellular component'.
Protein sequence similarity
Translated open reading frames obtained from the Saccharomyces Genome Database were compared using BLAST to quantify protein sequence similarity. The feature equals -log(E-value) of the best local alignment found (if the best E-value was above 5, the feature was set to 0).
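The sequence similarity rule can be sketched as a small function. This assumes a base-10 logarithm, since the base is not stated. BLAST reports an E-value of 0.0 for extremely strong hits, and the cap applied to that case here is a hypothetical choice.

```python
import math
from typing import Optional


def sequence_similarity_feature(
    best_evalue: Optional[float], cutoff: float = 5.0, zero_evalue_score: float = 180.0
) -> float:
    """-log10(E-value) of the best local alignment; 0 when there is no hit or E > cutoff."""
    if best_evalue is None or best_evalue > cutoff:
        return 0.0
    if best_evalue == 0.0:
        # Hypothetical cap for hits that BLAST reports with E = 0.0.
        return zero_evalue_score
    return -math.log10(best_evalue)


print(round(sequence_similarity_feature(1e-30), 6))  # 30.0
print(sequence_similarity_feature(7.0))              # 0.0
```

As stated, the rule gives small negative values for 1 < E ≤ 5. Clamping them to 0 would be a natural variant.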
Subcellular localization for S. cerevisiae proteins was obtained from . Only localizations shared by at least three genes in the analyzed E-MAP were used.
S-score in S. pombe
For each gene pair, this feature contained the S-score between the orthologs of the two genes in S. pombe (if available in the Pombe E-MAP). Orthology assignments between S. cerevisiae and S. pombe were taken from .
GSG and GSG-MATRIX features
For each gene A, we ranked all the other genes by the similarity between their GI profile and the GI profile of A, using Euclidean distance (a smaller distance means higher similarity). Gene B is called a GSG of A if it is among the genes most similar to A. A GSG of A is informative about B if its GI with B is available (that is, it is neither missing nor hidden in the cross-validation experiments). The GSG feature set consists of 2k features: for each gene pair A-B, it contains the S-scores between A and the k top-ranked GSGs of B that are informative about A (called GSG #1 through GSG #k for A), and between B and the k top-ranked GSGs of A that are informative about B (called GSG #1 through GSG #k for B; Figure 1b; Figure S7 in Additional file 1). Note that since gene pairs are unordered, the k pairs of GSG features are symmetric (that is, GSG #1 for A and GSG #1 for B should be equally informative). Therefore, the small differences we observe between the members of these feature pairs (Table S1 in Additional file 1) probably arise by chance. We used k = 5 throughout this study (see Figure S2 in Additional file 1 for an analysis of sensitivity to k).
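The GSG construction can be sketched as follows. Several details here are assumptions not specified in the text. The E-MAP is taken to be a symmetric NumPy array with NaN for missing or hidden values and on the diagonal. Euclidean distance is computed over jointly measured entries and scaled to a root-mean-square difference, so that profiles with different numbers of measurements are comparable. Genes A and B themselves are excluded from the GSG lists. Missing features are padded with NaN. The function names are hypothetical.

```python
import numpy as np


def profile_distances(S: np.ndarray, g: int) -> np.ndarray:
    """Distance between gene g's GI profile (row g of S) and every other row: root-mean-square
    difference over jointly measured entries; +inf if there are none (and for g itself)."""
    n = S.shape[0]
    dist = np.full(n, np.inf)
    for h in range(n):
        both = ~np.isnan(S[g]) & ~np.isnan(S[h])
        if h != g and both.any():
            dist[h] = np.sqrt(np.mean((S[g, both] - S[h, both]) ** 2))
    return dist


def gsg_features(S: np.ndarray, a: int, b: int, k: int = 5) -> list:
    """2k features: S-scores between A and the top-k GSGs of B that are informative about A
    (GSG #1..#k for A), then between B and the top-k informative GSGs of A (GSG #1..#k for B)."""

    def scores_with_gsgs(target: int, partner: int) -> list:
        dist = profile_distances(S, partner)
        values = []
        for g in np.argsort(dist, kind="stable"):  # most similar first
            if len(values) == k or not np.isfinite(dist[g]):
                break
            if g in (a, b) or np.isnan(S[target, g]):
                continue  # not informative about `target`
            values.append(float(S[target, g]))
        return values + [np.nan] * (k - len(values))  # pad if too few informative GSGs

    return scores_with_gsgs(a, b) + scores_with_gsgs(b, a)


S = np.full((5, 5), np.nan)
for (i, j), v in {(0, 2): -4.0, (0, 3): 1.0, (0, 4): 0.5, (1, 2): 0.0, (1, 3): 2.0,
                  (1, 4): -1.0, (2, 3): 2.0, (2, 4): -1.0, (3, 4): 3.0}.items():
    S[i, j] = S[j, i] = v  # symmetric toy E-MAP; the A-B pair (0, 1) is unmeasured
print(gsg_features(S, a=0, b=1, k=2))  # [-4.0, 0.5, 0.0, -1.0]
```

In the toy matrix, gene 2 has exactly the same measured profile as gene 1 (B), so it is B's top GSG. Its S-score with A (-4.0) becomes "GSG #1 for A".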
The GSG-MATRIX feature set contains k^2 features representing the available S-scores between the top GSGs of A and the top GSGs of B (Figure S7 in Additional file 1). Because of missing values, fewer than k^2 S-scores are typically available between the top k GSGs of A and the top k GSGs of B. We therefore used the following strategy. Denote by GSG_i(A) the i-th GSG of A. In iteration i, we added to the feature set the available S-scores between GSG_i(A) and the top i GSGs of B, and between GSG_i(B) and the top i GSGs of A; within the iteration, j was increased from 1 to i - 1 and the available S-scores for the pairs [GSG_i(A), GSG_j(B)] and [GSG_i(B), GSG_j(A)] were added, together with the pair [GSG_i(A), GSG_i(B)]. Starting from i = 1, we increased i and stopped as soon as k^2 features had been obtained. This way, we ensured that the feature set did not contain missing values and preferred features corresponding to genes more similar to A and B.
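The iteration order can be sketched as below. The GSG lists are assumed to be precomputed and sorted by decreasing similarity. Indices are 0-based, and within each iteration the diagonal pair [GSG_i(A), GSG_i(B)] is visited after the off-diagonal pairs. That placement is our assumption, as is the NaN padding used if the GSG lists run out before k^2 scores are found. The function name is hypothetical.

```python
import numpy as np


def gsg_matrix_features(S: np.ndarray, gsgs_a: list, gsgs_b: list, k: int = 5) -> list:
    """Collect k^2 available S-scores between GSGs of A and GSGs of B in shell order
    (gsgs_a / gsgs_b are GSG indices sorted by decreasing similarity)."""
    target = k * k
    features = []
    for i in range(max(len(gsgs_a), len(gsgs_b))):
        candidates = []
        for j in range(i):  # j = 1 .. i-1 in the text's 1-based notation
            if i < len(gsgs_a) and j < len(gsgs_b):
                candidates.append((gsgs_a[i], gsgs_b[j]))  # [GSG_i(A), GSG_j(B)]
            if i < len(gsgs_b) and j < len(gsgs_a):
                candidates.append((gsgs_b[i], gsgs_a[j]))  # [GSG_i(B), GSG_j(A)]
        if i < len(gsgs_a) and i < len(gsgs_b):
            candidates.append((gsgs_a[i], gsgs_b[i]))  # [GSG_i(A), GSG_i(B)]
        for g, h in candidates:
            if not np.isnan(S[g, h]):  # skip missing S-scores
                features.append(float(S[g, h]))
                if len(features) == target:
                    return features
    return features + [np.nan] * (target - len(features))  # fallback if GSGs run out


S = np.arange(36, dtype=float).reshape(6, 6)  # toy (non-symmetric) matrix: S[r, c] = 6r + c
S[3, 4] = np.nan                              # one missing S-score
print(gsg_matrix_features(S, gsgs_a=[2, 3, 0], gsgs_b=[4, 5, 1], k=2))  # [16.0, 32.0, 23.0, 4.0]
```

Because S[3, 4] is missing, the procedure moves on to the third GSG of A to fill the fourth slot.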
We used the classifiers implemented in Weka . A fast implementation of Random Forest was taken from . All the classifiers were used with default parameters. For GI class prediction, the linear regression predicted values were treated as negative if the predicted score was ≤ -2.5 and positive if it was ≥ 2.
Prediction of GIs using a diffusion kernel
We constructed a synthetic lethality network by combining interactions from BioGrid with interactions between genes whose S-score in the E-MAP was ≤ -2.5. The network was analyzed using the supplementary MATLAB code of Qi et al. The G- kernel was used to predict negative GIs, and the G+ kernel to predict positive GIs. Note that the G+ kernel was originally proposed for prediction of PPIs, but we found that it performed better than G- for prediction of positive interactions (a task that was not addressed by Qi et al.). We tested values of the γ parameter between 1 and 40 and selected, for each E-MAP, the value that obtained the best AUC.
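The study used Qi et al.'s own MATLAB code, so the sketch below is not their implementation. It only illustrates the odd/even idea behind G- and G+. The exponential diffusion kernel exp(γA) splits into an odd part sinh(γA), which sums odd-length walks, and an even part cosh(γA), which sums even-length walks. We assume a symmetrically degree-normalized adjacency matrix so that γ up to 40 stays finite. Qi et al.'s exact normalization may differ, and the function name is hypothetical.

```python
import numpy as np


def odd_even_diffusion_kernels(adj: np.ndarray, gamma: float):
    """Return (sinh(gamma*A), cosh(gamma*A)) for A = D^-1/2 W D^-1/2, the normalized adjacency.
    The sinh part weights odd-length walks between two genes and the cosh part even-length walks."""
    deg = adj.sum(axis=1)
    inv_sqrt = np.zeros_like(deg, dtype=float)
    inv_sqrt[deg > 0] = 1.0 / np.sqrt(deg[deg > 0])
    a_norm = adj * inv_sqrt[:, None] * inv_sqrt[None, :]
    eigvals, eigvecs = np.linalg.eigh(a_norm)  # A = V diag(eigvals) V^T
    g_minus = (eigvecs * np.sinh(gamma * eigvals)) @ eigvecs.T  # odd-length walks
    g_plus = (eigvecs * np.cosh(gamma * eigvals)) @ eigvecs.T   # even-length walks
    return g_minus, g_plus


# Hypothetical synthetic lethal chain 0-1-2: genes 0 and 2 are linked only by even-length walks.
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
g_minus, g_plus = odd_even_diffusion_kernels(adj, gamma=2.0)
print(g_minus[0, 1], g_minus[0, 2], g_plus[0, 2])
```

In the chain example, g_minus[0, 2] is numerically zero while g_plus[0, 2] is positive. This matches the intuition that odd-length connectivity points to negative GIs and even-length connectivity to positive ones.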
The gene pairs with measured values in the analyzed E-MAP were divided into ten random groups. In each iteration (fold), nine of the groups were used to train the classifiers and their performance was evaluated using the tenth group. In order to enhance computational efficiency, only 30% of the ChromBio and 50% of the RNA E-MAP measured gene pairs were used as the training set in each fold (the subset used was chosen randomly).
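The splitting scheme (ten random groups, each used once for testing, with a random 30% or 50% subset of the remaining pairs used for training) can be sketched as a generator. The function name and the seeding are hypothetical. Fold sizes differ by at most one pair.

```python
import random
from typing import Iterator, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def subsampled_cv_folds(
    pairs: Sequence[T], n_folds: int = 10, train_fraction: float = 0.3, seed: int = 0
) -> Iterator[Tuple[List[T], List[T]]]:
    """Yield (train, test) splits: each fold is tested once, and the training set is a random
    train_fraction of the pairs in the other n_folds - 1 folds."""
    rng = random.Random(seed)
    shuffled = list(pairs)
    rng.shuffle(shuffled)
    folds = [shuffled[f::n_folds] for f in range(n_folds)]  # ten random groups
    for f in range(n_folds):
        pool = [p for g, fold in enumerate(folds) if g != f for p in fold]
        train = rng.sample(pool, round(train_fraction * len(pool)))
        yield train, folds[f]


pairs = [(i, j) for i in range(50) for j in range(i + 1, 50)]  # 1,225 hypothetical measured pairs
train, test = next(subsampled_cv_folds(pairs))
print(len(train), len(test))  # 331 123
```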
Enrichment of protein complexes with positive or negative interactions
We used the following procedure to evaluate whether a protein complex C is enriched with positive (negative) interactions. Suppose C contains k positive interactions. We generated an unweighted graph G_p in which the nodes are the genes in the E-MAP and an edge connects u and v in G_p if there is a positive interaction between u and v in the E-MAP. We then generated 1,000 random degree-preserving graphs using edge shuffling. The empirical P-value of the enrichment of C with positive interactions was estimated as the fraction of these graphs that contained at least k edges between the nodes in C. An analogous procedure was used to estimate the significance of the enrichment of C with negative interactions. Complexes enriched at a false discovery rate < 0.05 were selected using the Benjamini-Hochberg procedure.
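The enrichment test combines three pieces: degree-preserving randomization by double-edge swaps, an empirical P-value, and Benjamini-Hochberg selection. The sketch below assumes a simple undirected graph given as an edge list. The number of swaps per random graph (`swaps_per_edge`) is a hypothetical parameter not given in the text, and the function names are our own.

```python
import random
from typing import List, Sequence, Set, Tuple

Edge = Tuple[str, str]


def edges_within(edges: Sequence[Edge], nodes: Set[str]) -> int:
    """Number of edges with both endpoints inside the complex."""
    return sum(1 for u, v in edges if u in nodes and v in nodes)


def degree_preserving_shuffle(edges: Sequence[Edge], n_swaps: int, rng: random.Random) -> List[Edge]:
    """Double-edge swaps (a-b, c-d -> a-d, c-b) keep every node's degree; swaps that would
    create self-loops or duplicate edges are skipped."""
    edges = list(edges)
    present = {frozenset(e) for e in edges}
    for _ in range(n_swaps):
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        if rng.random() < 0.5:
            c, d = d, c  # pick one of the two possible rewirings at random
        if len({a, b, c, d}) < 4 or frozenset((a, d)) in present or frozenset((c, b)) in present:
            continue
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {frozenset((a, d)), frozenset((c, b))}
        edges[i], edges[j] = (a, d), (c, b)
    return edges


def enrichment_p_value(edges, complex_nodes, n_random=1000, swaps_per_edge=10, seed=0) -> float:
    """Fraction of random degree-preserving graphs with at least as many within-complex edges."""
    rng = random.Random(seed)
    nodes = set(complex_nodes)
    observed = edges_within(edges, nodes)
    hits = 0
    for _ in range(n_random):
        shuffled = degree_preserving_shuffle(edges, swaps_per_edge * len(edges), rng)
        hits += edges_within(shuffled, nodes) >= observed
    return hits / n_random


def benjamini_hochberg(p_values: Sequence[float], alpha: float = 0.05) -> Set[int]:
    """Indices of hypotheses rejected at false discovery rate alpha."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= alpha * rank / m:
            cutoff = rank  # largest rank whose P-value passes its threshold
    return set(order[:cutoff])


# Hypothetical graph: a fully connected 4-gene complex plus a 16-gene ring.
complex_genes = ["C0", "C1", "C2", "C3"]
edges = [(u, v) for i, u in enumerate(complex_genes) for v in complex_genes[i + 1:]]
ring = [f"R{i}" for i in range(16)]
edges += [(ring[i], ring[(i + 1) % 16]) for i in range(16)]
p = enrichment_p_value(edges, complex_genes, n_random=200)
print(p, benjamini_hochberg([p, 0.04, 0.5]))
```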
Abbreviations
AUC: area under curve
AUPR: area under the precision-recall curve
E-MAP: epistatic miniarray profile
GSG: genetically similar gene
ROC: receiver operating characteristic
SGA: systematic genetic analysis
We thank Roye Rozov for comments on an early version of this manuscript. Ron Shamir was supported in part by the Raymond and Beverly Sackler Chair in Bioinformatics and by the Israel Science Foundation (grant no. 802/08). Igor Ulitsky was supported in part by a fellowship from the Edmond J Safra Bioinformatics Program at Tel Aviv University.
- Segre D, Deluna A, Church GM, Kishony R: Modular epistasis in yeast metabolism. Nat Genet. 2005, 37: 77-83.
- Beyer A, Bandyopadhyay S, Ideker T: Integrating physical and genetic maps: from genomes to interaction networks. Nat Rev Genet. 2007, 8: 699-710. 10.1038/nrg2144.
- Mani R, St Onge RP, Hartman JL 4th, Giaever G, Roth FP: Defining genetic interaction. Proc Natl Acad Sci USA. 2008, 105: 3461-3466. 10.1073/pnas.0712255105.
- St Onge RP, Mani R, Oh J, Proctor M, Fung E, Davis RW, Nislow C, Roth FP, Giaever G: Systematic pathway analysis using high-resolution fitness profiling of combinatorial gene deletions. Nat Genet. 2007, 39: 199-206. 10.1038/ng1948.
- Tong AH, Evangelista M, Parsons AB, Xu H, Bader GD, Page N, Robinson M, Raghibizadeh S, Hogue CW, Bussey H, Andrews B, Tyers M, Boone C: Systematic genetic analysis with ordered arrays of yeast deletion mutants. Science. 2001, 294: 2364-2368. 10.1126/science.1065810.
- Tong AH, Lesage G, Bader GD, Ding H, Xu H, Xin X, Young J, Berriz GF, Brost RL, Chang M, Chen Y, Cheng X, Chua G, Friesen H, Goldberg DS, Haynes J, Humphries C, He G, Hussein S, Ke L, Krogan N, Li Z, Levinson JN, Lu H, Menard P, Munyana C, Parsons AB, Ryan O, Tonikian R, Roberts T, et al: Global mapping of the yeast genetic interaction network. Science. 2004, 303: 808-813. 10.1126/science.1091317.
- Pan X, Yuan DS, Xiang D, Wang X, Sookhai-Mahadeo S, Bader JS, Hieter P, Spencer F, Boeke JD: A robust toolkit for functional profiling of the yeast genome. Mol Cell. 2004, 16: 487-496. 10.1016/j.molcel.2004.09.035.
- Pan X, Ye P, Yuan DS, Wang X, Bader JS, Boeke JD: A DNA integrity network in the yeast Saccharomyces cerevisiae. Cell. 2006, 124: 1069-1081. 10.1016/j.cell.2005.12.036.
- Schuldiner M, Collins SR, Thompson NJ, Denic V, Bhamidipati A, Punna T, Ihmels J, Andrews B, Boone C, Greenblatt JF, Weissman JS, Krogan NJ: Exploration of the function and organization of the yeast early secretory pathway through an epistatic miniarray profile. Cell. 2005, 123: 507-519. 10.1016/j.cell.2005.08.031.
- Collins SR, Miller KM, Maas NL, Roguev A, Fillingham J, Chu CS, Schuldiner M, Gebbia M, Recht J, Shales M, Ding H, Xu H, Han J, Ingvarsdottir K, Cheng B, Andrews B, Boone C, Berger SL, Hieter P, Zhang Z, Brown GW, Ingles CJ, Emili A, Allis CD, Toczyski DP, Weissman JS, Greenblatt JF, Krogan NJ: Functional dissection of protein complexes involved in yeast chromosome biology using a genetic interaction map. Nature. 2007, 446: 806-810. 10.1038/nature05649.
- Roguev A, Bandyopadhyay S, Zofall M, Zhang K, Fischer T, Collins SR, Qu H, Shales M, Park HO, Hayles J, Hoe KL, Kim DU, Ideker T, Grewal SI, Weissman JS, Krogan NJ: Conservation and rewiring of functional modules revealed by an epistasis map in fission yeast. Science. 2008, 322: 405-410. 10.1126/science.1162609.PubMedPubMed CentralView ArticleGoogle Scholar
- Wilmes GM, Bergkessel M, Bandyopadhyay S, Shales M, Braberg H, Cagney G, Collins SR, Whitworth GB, Kress TL, Weissman JS, Ideker T, Guthrie C, Krogan NJ: A genetic interaction map of RNA-processing factors reveals links between Sem1/Dss1-containing complexes and mRNA export and splicing. Mol Cell. 2008, 32: 735-746. 10.1016/j.molcel.2008.11.012.PubMedPubMed CentralView ArticleGoogle Scholar
- Wong SL, Zhang LV, Tong AH, Li Z, Goldberg DS, King OD, Lesage G, Vidal M, Andrews B, Bussey H, Boone C, Roth FP: Combining biological networks to predict genetic interactions. Proc Natl Acad Sci USA. 2004, 101: 15682-15687. 10.1073/pnas.0406614101.PubMedPubMed CentralView ArticleGoogle Scholar
- Zhang LV, King OD, Wong SL, Goldberg DS, Tong AH, Lesage G, Andrews B, Bussey H, Boone C, Roth FP: Motifs, themes and thematic maps of an integrated Saccharomyces cerevisiae interaction network. J Biol. 2005, 4: 6. 10.1186/jbiol23.
- Kelley R, Ideker T: Systematic interpretation of genetic interactions using protein networks. Nat Biotechnol. 2005, 23: 561-566. 10.1038/nbt1096.
- Ulitsky I, Shamir R: Pathway redundancy and protein essentiality revealed in the Saccharomyces cerevisiae interaction networks. Mol Syst Biol. 2007, 3: 104. 10.1038/msb4100144.
- Bandyopadhyay S, Kelley R, Krogan NJ, Ideker T: Functional maps of protein complexes from quantitative genetic interaction data. PLoS Comput Biol. 2008, 4: e1000065. 10.1371/journal.pcbi.1000065.
- Ulitsky I, Shlomi T, Kupiec M, Shamir R: From E-MAPs to module maps: dissecting quantitative genetic interactions using physical interactions. Mol Syst Biol. 2008, 4: 209. 10.1038/msb.2008.42.
- Zhong W, Sternberg PW: Genome-wide prediction of C. elegans genetic interactions. Science. 2006, 311: 1481-1484. 10.1126/science.1123287.
- Paladugu SR, Zhao S, Ray A, Raval A: Mining protein networks for synthetic genetic interactions. BMC Bioinformatics. 2008, 9: 426. 10.1186/1471-2105-9-426.
- Qi Y, Suhail Y, Lin YY, Boeke JD, Bader JS: Finding friends and enemies in an enemies-only network: a graph diffusion kernel for predicting novel genetic interactions and co-complex membership from yeast genetic interactions. Genome Res. 2008, 18: 1991-2004. 10.1101/gr.077693.108.
- Chipman KC, Singh AK: Predicting genetic interactions with random walks on biological networks. BMC Bioinformatics. 2009, 10: 17. 10.1186/1471-2105-10-17.
- Ye P, Peyser BD, Pan X, Boeke JD, Spencer FA, Bader JS: Gene function prediction from congruent synthetic lethal interactions in yeast. Mol Syst Biol. 2005, 1: 2005.0026. 10.1038/msb4100034.
- Collins SR, Schuldiner M, Krogan NJ, Weissman JS: A strategy for extracting and analyzing large-scale quantitative epistatic interaction data. Genome Biol. 2006, 7: R63. 10.1186/gb-2006-7-7-r63.
- Wang Y, Witten I: Induction of model trees for predicting continuous classes. 1996, Hamilton: The University of Waikato.
- Van Rijsbergen CJ: Information Retrieval. 1979, Newton, MA: Butterworth-Heinemann.
- Chua HN, Sung WK, Wong L: Exploiting indirect neighbours and topological weight to predict protein function from protein-protein interactions. Bioinformatics. 2006, 22: 1623-1630. 10.1093/bioinformatics/btl145.
- Stark C, Breitkreutz BJ, Reguly T, Boucher L, Breitkreutz A, Tyers M: BioGRID: a general repository for interaction datasets. Nucleic Acids Res. 2006, 34: D535-D539. 10.1093/nar/gkj109.
- Ozier O, Amin N, Ideker T: Global architecture of genetic interactions on the protein network. Nat Biotechnol. 2003, 21: 490-491. 10.1038/nbt0503-490.
- Lord PW, Stevens RD, Brass A, Goble CA: Investigating semantic similarity measures across the Gene Ontology: the relationship between sequence and annotation. Bioinformatics. 2003, 19: 1275-1283. 10.1093/bioinformatics/btg153.
- Resnik P: Semantic similarity in a taxonomy: an information-based measure and its application to problems of ambiguity in natural language. J Artif Intell Res. 1999, 11: 95-130.
- Wang JZ, Du Z, Payattakool R, Yu PS, Chen CF: A new method to measure the semantic similarity of GO terms. Bioinformatics. 2007, 23: 1274-1281. 10.1093/bioinformatics/btm087.
- Pu S, Wong J, Turner B, Cho E, Wodak SJ: Up-to-date catalogues of yeast protein complexes. Nucleic Acids Res. 2009, 37: 825-831. 10.1093/nar/gkn1005.
- Dixon SJ, Fedyshyn Y, Koh JL, Prasad TS, Chahwan C, Chua G, Toufighi K, Baryshnikova A, Hayles J, Hoe KL, Kim DU, Park HO, Myers CL, Pandey A, Durocher D, Andrews BJ, Boone C: Significant conservation of synthetic lethal genetic interaction networks between distantly related eukaryotes. Proc Natl Acad Sci USA. 2008, 105: 16653-16658. 10.1073/pnas.0806261105.
- Jonikas MC, Collins SR, Denic V, Oh E, Quan EM, Schmid V, Weibezahn J, Schwappach B, Walter P, Weissman JS, Schuldiner M: Comprehensive characterization of genes required for protein folding in the endoplasmic reticulum. Science. 2009, 323: 1693-1697. 10.1126/science.1167983.
- Kornmann B, Currie E, Collins SR, Schuldiner M, Nunnari J, Weissman JS, Walter P: An ER-mitochondria tethering complex revealed by a synthetic biology screen. Science. 2009, 325: 477-481. 10.1126/science.1175088.
- Schuldiner M, Metz J, Schmid V, Denic V, Rakwalska M, Schmitt HD, Schwappach B, Weissman JS: The GET complex mediates insertion of tail-anchored proteins into the ER membrane. Cell. 2008, 134: 634-645. 10.1016/j.cell.2008.06.025.
- Nagai S, Dubrana K, Tsai-Pflugfelder M, Davidson MB, Roberts TM, Brown GW, Varela E, Hediger F, Gasser SM, Krogan NJ: Functional targeting of DNA damage to a nuclear pore-associated SUMO-dependent ubiquitin ligase. Science. 2008, 322: 597-602. 10.1126/science.1162790.
- Keogh MC, Kurdistani SK, Morris SA, Ahn SH, Podolny V, Collins SR, Schuldiner M, Chin K, Punna T, Thompson NJ, Boone C, Emili A, Weissman JS, Hughes TR, Strahl BD, Grunstein M, Greenblatt JF, Buratowski S, Krogan NJ: Cotranscriptional set2 methylation of histone H3 lysine 36 recruits a repressive Rpd3 complex. Cell. 2005, 123: 593-605. 10.1016/j.cell.2005.10.025.
- Fiedler D, Braberg H, Mehta M, Chechik G, Cagney G, Mukherjee P, Silva AC, Shales M, Collins SR, van Wageningen S, Kemmeren P, Holstege FC, Weissman JS, Keogh MC, Koller D, Shokat KM, Krogan NJ: Functional organization of the S. cerevisiae phosphorylation network. Cell. 2009, 136: 952-963. 10.1016/j.cell.2008.12.039.
- Goldberg DS, Roth FP: Assessing experimentally derived interactions in a small world. Proc Natl Acad Sci USA. 2003, 100: 4372-4376. 10.1073/pnas.0735871100.
- Mewes HW, Hani J, Pfeiffer F, Frishman D: MIPS: a database for protein sequences and complete genomes. Nucleic Acids Res. 1998, 26: 33-37. 10.1093/nar/26.1.33.
- Pu S, Wong J, Turner B, Cho E, Wodak SJ: Up-to-date catalogues of yeast protein complexes. Nucleic Acids Res. 2009, 37: 825-831. 10.1093/nar/gkn1005.
- Brown JA, Sherlock G, Myers CL, Burrows NM, Deng C, Wu HI, McCann KE, Troyanskaya OG, Brown JM: Global analysis of gene function in yeast by quantitative phenotypic profiling. Mol Syst Biol. 2006, 2: 2006.0001. 10.1038/msb4100043.
- Cherry JM, Adler C, Ball C, Chervitz SA, Dwight SS, Hester ET, Jia Y, Juvik G, Roe T, Schroeder M, Weng S, Botstein D: SGD: Saccharomyces Genome Database. Nucleic Acids Res. 1998, 26: 73-79. 10.1093/nar/26.1.73.
- Huh WK, Falvo JV, Gerke LC, Carroll AS, Howson RW, Weissman JS, O'Shea EK: Global analysis of protein localization in budding yeast. Nature. 2003, 425: 686-691. 10.1038/nature02026.
- Gasch AP, Huang M, Metzner S, Botstein D, Elledge SJ, Brown PO: Genomic expression responses to DNA-damaging agents and the regulatory role of the yeast ATR homolog Mec1p. Mol Biol Cell. 2001, 12: 2987-3003.
- Gasch AP, Spellman PT, Kao CM, Carmel-Harel O, Eisen MB, Storz G, Botstein D, Brown PO: Genomic expression programs in the response of yeast cells to environmental changes. Mol Biol Cell. 2000, 11: 4241-4257.
- Roberts CJ, Nelson B, Marton MJ, Stoughton R, Meyer MR, Bennett HA, He YD, Dai H, Walker WL, Hughes TR, Tyers M, Boone C, Friend SH: Signaling and circuitry of multiple MAPK pathways revealed by a matrix of global gene expression profiles. Science. 2000, 287: 873-880. 10.1126/science.287.5454.873.
- Hughes TR, Marton MJ, Jones AR, Roberts CJ, Stoughton R, Armour CD, Bennett HA, Coffey E, Dai H, He YD, Kidd MJ, King AM, Meyer MR, Slade D, Lum PY, Stepaniants SB, Shoemaker DD, Gachotte D, Chakraburtty K, Simon J, Bard M, Friend SH: Functional discovery via a compendium of expression profiles. Cell. 2000, 102: 109-126. 10.1016/S0092-8674(00)00015-5.
- Spellman PT, Sherlock G, Zhang MQ, Iyer VR, Anders K, Eisen MB, Brown PO, Botstein D, Futcher B: Comprehensive identification of cell cycle-regulated genes of the yeast Saccharomyces cerevisiae by microarray hybridization. Mol Biol Cell. 1998, 9: 3273-3297.
- O'Rourke SM, Herskowitz I: Unique and redundant roles for HOG MAPK pathway components as revealed by whole-genome expression analysis. Mol Biol Cell. 2004, 15: 532-542. 10.1091/mbc.E03-07-0521.
- Causton HC, Ren B, Koh SS, Harbison CT, Kanin E, Jennings EG, Lee TI, True HL, Lander ES, Young RA: Remodeling of yeast genome expression in response to environmental changes. Mol Biol Cell. 2001, 12: 323-337.
- Frank E, Hall M, Trigg L, Holmes G, Witten IH: Data mining in bioinformatics using Weka. Bioinformatics. 2004, 20: 2479-2481. 10.1093/bioinformatics/bth261.
- Fast Random Forest Project. [http://code.google.com/p/fast-random-forest/]
- Shen-Orr SS, Milo R, Mangan S, Alon U: Network motifs in the transcriptional regulation network of Escherichia coli. Nat Genet. 2002, 31: 64-68. 10.1038/ng881.
- Benjamini Y, Hochberg Y: Controlling the false discovery rate: a practical and powerful approach to multiple testing. J R Stat Soc Series B Methodol. 1995, 57: 289-300.
- Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G: Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000, 25: 25-29. 10.1038/75556.
- Witten I, Frank E: Data mining: practical machine learning tools and techniques with Java implementations. ACM SIGMOD Record. 2002, 31: 76-77. 10.1145/507338.507355.
- Wang Y, Witten I: Induction of model trees for predicting continuous classes. [http://www.cs.waikato.ac.nz/~ml/publications/1997/Wang-Witten-Induct.pdf]
- Rousseeuw P, Leroy A: Robust Regression and Outlier Detection. 1987, Wiley.
- Bishop C: Neural Networks for Pattern Recognition. 1995, Oxford University Press.
- Krogan NJ, Keogh MC, Datta N, Sawa C, Ryan OW, Ding H, Haw RA, Pootoolal J, Tong A, Canadien V, Richards DP, Wu X, Emili A, Hughes TR, Buratowski S, Greenblatt JF: A Snf2 family ATPase complex required for recruitment of the histone H2A variant Htz1. Mol Cell. 2003, 12: 1565-1576. 10.1016/S1097-2765(03)00497-0.
- le Cessie S, van Houwelingen JC: Ridge estimators in logistic regression. Applied Statistics. 1992, 41: 191-201. 10.2307/2347628.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.