Differential analysis of high-throughput quantitative genetic interaction data

Synthetic genetic arrays have been very effective at measuring genetic interactions in yeast in a high-throughput manner and recently have been expanded to measure quantitative changes in interaction, termed 'differential interactions', across multiple conditions. Here, we present a strategy that leverages statistical information from the experimental design to produce a novel, quantitative differential interaction score, which performs favorably compared to previous differential scores. We also discuss the added utility of differential genetic-similarity in differential network analysis. Our approach is preferred for differential network analysis, and our implementation, written in MATLAB, can be found at http://chianti.ucsd.edu/~gbean/compute_differential_scores.m.


Background
Genetic interactions are functional dependencies between genes, which become apparent when the phenotypic effect of one mutation is altered by the presence of a second. In model organisms such as yeast, genetic interactions can be rapidly assessed through the systematic construction of double mutants and measurement of quantitative phenotypes such as growth rate. Quantitative interactions may be positive or negative, indicating less or more severe double mutant phenotypes than expected from the single mutant phenotypes. Many large genetic network maps have been constructed from high-throughput genetic interaction screens in yeast, providing insight into the global landscape of interactions within the cell as well as the functional relationships between specific components of biological processes and pathways [1][2][3][4][5].
Recently, we used genetic interaction mapping in a 'differential mode' to compare the changes in genetic networks across experimental conditions [6][7][8]. To demonstrate this approach, called differential epistasis mapping, we compared the difference between quantitative genetic interaction scores derived from yeast grown on standard versus DNA-damaging media [6]. We found substantial changes in interaction patterns and demonstrated that the difference in scores was more effective than the scores in either static condition for highlighting interactions relevant to the pathway under study (DNA damage response (DDR)). Other biological networks, such as protein-protein interaction (PPI) or protein-DNA interaction networks, have also progressed from observing single experimental conditions to comparing the changes in interactions across multiple experimental conditions or genetic backgrounds. For example, Wrana and colleagues [9] developed the LUMIER (luminescence-based mammalian interactome mapping) strategy to identify pairwise PPIs among a set of human factors with and without stimulation by transforming growth factor β. Similarly, Workman et al. [10] used genome-wide chromatin immunoprecipitation to focus on changes in transcription factor binding after exposure to the DNA damaging agent methyl methanesulfonate (MMS). More recently, a quantitative approach has been presented by Bisson et al. [11] for measuring differential interactions in PPI networks. This approach, which the authors call affinity purification-selected reaction monitoring (AP-SRM), was used to map quantitative changes in interaction with the protein Grb2, which showed that the composition of Grb2 complexes was remarkably dependent on the stimulation. By focusing on additional hub proteins beyond Grb2, this method is likely to be useful for obtaining a global overview of protein network remodeling in response to a stimulus.
The progression from static to differential network biology in many fields increases the need for specialized statistical strategies for scoring differential networks. One approach to improving differential signal is to use paired experimental designs that reduce the noise between treated and untreated measurements. For example, experimental designs such as the two-color microarray were originally developed to reduce the noise resulting from technical variability, and various statistical methods have been developed to leverage the paired structure of these experiments (reviewed in [12][13][14][15]). Similar to two-color microarrays, differential network measurements can pair treated and untreated measurements. While some of the differential interaction studies [6,7] have employed such an experimental design, they did not utilize this information in their analysis, treating each measurement as independent.
Here, we investigate the statistical structure of two largescale differential genetic interaction experiments [6,7] and present a generalized strategy for scoring differential genetic interaction data. Our strategy produces differential genetic interaction networks that are more reproducible and more enriched for biologically relevant interactions than previous approaches based on network subtraction. A MATLAB implementation of our strategy is provided as Additional file 1 with the online version of this article.

Results and discussion
The differential interaction model The format of a differential genetic interaction experiment takes growth-rate measurements for each double mutant across two or more conditions. A single mutant yeast strain, called the 'query', is mated with an entire set of other single mutants (for example, deletions of all nonessential yeast genes), referred to as 'array' strains. The resulting diploids are sporulated and then undergo multiple selection steps to produce colonies of haploid double deletion mutants. In the last step of the pipeline, the same yeast colonies are replicated onto different media exhibiting the chosen growth conditions (Figure 1a; see [3,6,16] for high-throughput genetic interaction screening protocols).
Because one run of this experimental pipeline produces double mutant colonies that are grown in separate conditions but share the same initial steps, we had reason to believe that the double mutant growth-rate measurements are not independent. Using data from Bandyopadhyay et al. [6], we tested this hypothesis by comparing the correlation of experimental replicates (that is, colonies generated in separate pipelines but grown in the same condition) with the correlation of colonies generated in the same pipeline but grown in different final conditions. Strikingly, we found that the correlation of colonies grown in different conditions was much greater than the correlation of experimental replicates (Figure 1b growth conditions and the conditional replicates were not. This observation suggested some degree of statistical dependence between the conditional replicate measurements. We further assessed the dependence across the conditional measurements with an analysis of the variance of replicate measurements. Assuming independence, the difference between two normally distributed random variables is distributed normally, with a variance equal to the sum of the variances of the original distributions (Equation 1): Therefore, for each double mutant, the variance of the differences between the static measurements should be equal to the sum of the variances of the static measurements.
Using the data from two differential interaction mapping experiments comparing MMS and standard growth conditions [6,7], we found that the variance of the difference for each double mutant was less than half of the expected differential variance, and even less than the variance of static (non-differential) measurements ( Figure 2). These results confirm that the across-condition measurements are not independent and raise the possibility that significant error reduction may be achieved by the differential mode of analysis.
The dS score: a quantitative measure of differential interaction Accordingly, we developed a strategy for scoring differential genetic interactions, which accounts for the dependency structure of the data. Assuming a growth constant p for each plate, which captures plate-to-plate differences in growth rate, the observed double mutant colony size z qai can be factored as follows: where q and a represent the query and array strains, i represents the experimental replicate, c represents the condition, f indicates the single mutant fitnesses, and Î represents the residual. Collins et al. [17] developed a strategy that uses colony size population trends to estimate p, f q , and f a and obtain a measurement of the residual, which serves to quantify the degree of genetic interaction between the query and array mutants.
For differential interactions, the null or 'non-interaction' model is that the mean of the differences between paired residuals is equal to zero: where c indicates the treatment and c 0 indicates the untreated, or reference, condition, and δ represents the difference in colony size residuals. Assuming thes Î are normally distributed, the degree to which this mean differs from zero given the variance of the replicates can be modeled using the paired t-statistic. We call our statistic the dS score, 'd' for 'differential' and 'S score' after the name of the statistic used by Collins et al. [17]: where δ qac is the mean of the differences of the residuals (Equation 3) and s qac is the sample standard deviation of the differences of the residuals. Unlike the S-score [17], we found that the sample variance was the best approximation of the variance (based on the quality control metrics described below) and did not employ a minimum bound or any modifiers or priors (such as in the case of SAM, Cyber-T, or LIMMA in microarray analysis [15,18,19]; see also [20]).

Similarity of differential interaction profiles provides distinct functional information
Previously, it has been shown that the correlation of static interaction profiles identifies many gene functional relationships not identified by direct genetic interactions (a genetic interaction profile is the set of all interactions with a given gene) [1,17]. Given our new quantitative  score for differential interactions, we therefore investigated whether differential interaction profiles could also be used to provide distinct functional information. Indeed, we found that the correlation of differential interaction profiles was able to identify relationships relevant to the treatment response and, furthermore, that these links were not identified either by direct interactions (static or differential) or by correlation of static profiles.
For example, using the dS score, we observed a very high differential similarity score between SWI4 and the subunits of the HIR complex ( Figure 3). In contrast, when computing genetic profile similarity between SWI4 and HIR in either static condition (standard or MMS-treated), similarity scores were strikingly low. SWI4 is the DNA-binding member of the SBF complex, a key regulator of genes involved in DNA synthesis and repair in G1 to S phase [21,22]. HIR1, HIR2, and HIR3 are subunits of the HIR complex that negatively regulate histone protein transcription [23] under control of the DNA-damage checkpoint kinase DUN1 [24]. Although SWI4 and HIR have not been previously implicated in a genetic relationship, SWI4 has been shown to regulate histone gene expression [25,26], suggesting that an interaction between SWI4 and HIR is feasible, especially in context of the DDR. Thus, differential similarity can identify functional relationships between genes that are not apparent from profile similarity analysis in static conditions.
We identified a total of 99 functional associations like SWI4 and HIR, that is, gene pairs with low static similarity and high differential similarity (see Additional file 2 for a complete list of gene pairs and their interaction and similarity scores). These gene pairs indicate DDRrelevant interactions that would not be identified through previously available methods. One of the key limitations of static profile similarity is that the static profile is populated by interactions pertaining to both the treatment as well as general cell growth. These non-relevant interactions diminish the similarity between genes that otherwise function very similarly in the treatment response. Additionally, the larger variance inherent in the static measurements contributes to noisier interaction profiles, which decreases the similarity of otherwise related profiles. Differential interactions are effective at identifying treatment-relevant relationships because they cut down the noise and eliminate non-related interactions.
Performance of the dS score and differential profile similarity We investigated the quality of the dS score by examining its false discovery rate, reproducibility and biological enrichment. As a baseline for comparison, where applicable dS scores were compared to the differential P-values described by Bandyopadhyay et al. [6], which indicate an empirically determined significance for the difference in S scores between two conditions. We designate the -log P-values from Bandyopadhyay et al. [6] as the 'B score'. To estimate the false discovery rate of different dS score thresholds, we first generated a dS null distribution using the data from Bandyopadhyay et al. [6], in which the final step involved pinning each double mutant twice in the same condition. These two colonies were paired and scored as if they were colonies grown in separate conditions (corresponding to z qaic and z qaic0 in Equation 2 above). We observed that the dS score has approximately symmetric false discovery rates for positive and negative scores ( Figure 4a). Next, we assessed reproducibility of the dS score by comparing B and dS scores generated using replicates 1 to 3 and, separately, 4 to 6 from Guénolé et al. [7]. Using only gene pairs that were scored in both analyses, we found that the dS score yields a much tighter reproducibility across replicates than the B score (Figure 4b-c; Figure S1 in Additional file 3). In particular, the Pearson correlation across replicates was remarkably higher for the dS score than the B score ( Figure 4d; the values on the far right correspond to data shown in Figure 4b,c). We found it of particular interest that for the most significant interactions, the dS score tends to greater and greater reproducibility, while the reproducibility of the B score drops to zero, indicating that for larger and larger values, the B score picks up on less and less signal.
To measure the biological enrichment of the dS score, we generated a bronze-standard set of interactions similar to that used by Bandyopadhyay et al. [6]. We included in our standard set any gene pair in which both genes were annotated as 'DNA-damage response' (DDR) in the Gene Ontology [27] (corresponding to 903 or 2,575 gene pairs in the Bandyopadhyay et al. [6] or Guénolé et al. [7] data sets, respectively), as well as any gene pair defined by the YeastNet 2.0 benchmark set [28] containing at least one DDR gene (390 or 772 gene pairs, respectively). As a second standard, we used the set of co-complex interactions compiled by Baryshnikova et al. [29], which is based on the set of macromolecular complexes recorded in the Saccharomyces Genome Database [30] or in the CYC2008 protein complex catalogue [31]. Using these two standards, we generated precision-recall plots for two previously published differential interaction networks (Bandyopadhyay et al. [6] and Guénolé et al. [7]). This analysis indicated that the dS score has essentially the same precision for recovering the DDR and the cocomplex standards as the original P-values published by Bandyopadhyay et al. [6] (Figure 5; see also Figures S2 to S4 in Additional file 3). However, we observed a notable improvement in enrichment for DDR interactions when using profile similarity of dS scores compared to profile similarity of B scores (Figure 5a,b).
Additionally, it is well known that gene pairs with high profile similarity are often members of the same physical complexes [32,33], so we investigated whether the same is true for differential-profile similarity. We found that the genes with similar dS score profiles are strikingly more enriched for co-complex pairs (Figure 5c,d), and specifically for protein complexes involved in the DDR ( Figure S2 in Additional file 3). For example, differential profile similarity was able to achieve a precision of 60 to 100% for recovering either DDR pathway interactions or protein complexes, using data from either of two studies. This performance was in contrast to that of individual differential interactions, which had a precision of 1 to 20% using these same standards and data.
It is interesting that B score profile similarity is underenriched for meaningful relationships. Part of this behavior may be explained by our observation that extreme B score values tend to capture noise and are not reproducible (Figure 4b-d). Because profile similarity is heavily influenced by larger values, B score profile similarity is overly sensitive to noise. Thus, relatively few spurious interactions can have an extensive influence on profile similarity.
We finally compared dS scores and dS profile similarity scores to the static S scores and profile similarity scores from the same data. We found that differential similarity scores are more enriched for DDR interactions than static similarity scores, even though static scores are more enriched for non-DDR-specific interactions ( Figure S3 in Additional file 3).
The reasons for the improved performance in identifying relevant genetic relationships of the dS score over the B score and the static scores deserve some attention. Genetic interaction mapping experiments are subject to many systematic sources of noise. For example, the ratio of double mutant cells to single mutant cells in the colonies growing on the single-mutant selection plate (see Figure 1 for an outline of the experimental workflow) affects the observed double mutant fitness in the following step. Other sources of systematic noise include uneven agar surfaces, which affect the quantity of material that is picked up and deposited during plate pinning, and variations in incubation time, humidity, and so on (Table 1). Despite sophisticated data processing methods, traces of these systematic artifacts may be preserved, and this noise can influence the estimation of interaction effects. The current experimental design for static interaction mapping experiments does not control for these artifacts, and the previous method for scoring differential interactions did not take advantage of builtin controls. However, our approach uses the paired relationships between plates to eliminate many sources of systematic noise, increasing our ability to identify reproducible and relevant differential interactions (Figures 1,  2, and 4). This result is of broad interest because finding the appropriate control plays an important part in differential experimental design in many fields.

Interpretation of the dS score
The previous approach to scoring differential interactions derived a score from the difference between static interaction scores in each condition. This explicit comparison of scores led to a natural discussion about the interpretation of the differential score based on the sign and magnitudes of the static scores [6]. However, because the dS score is not based on the difference between static scores, we suggest the dS score be interpreted following the same logic as static interaction scores. In the static case, positive interactions generally denote gene relationships within the same pathway or complex, while negative interactions generally indicate gene relationships that span parallel or redundant pathways [34]. The difference between differential and static interpretation is that static scores indicate interactions that affect general cell growth, whereas differential scores indicate interactions that affect the treatment response.
While the theoretical interpretation of the dS score is straightforward, the practical interpretation is more complex because the static interaction scores provide a context for the interpretation of the dS score. For example, a gene pair exhibiting a positive interaction in untreated conditions that is more positive in MMS (yielding a positive dS score) should be interpreted differently than an interaction that is negative in untreated conditions that becomes positive in MMS (also yielding a positive dS score). According to the standard interaction model, the latter example is supposedly going from  Bean and Ideker Genome Biology 2012, 13:R123 http://genomebiology.com/content/13/12/R123 a between-pathway relationship in untreated conditions to a within-pathway relationship in the treatment, which quality the former example does not have, even though both examples exhibit a co-pathway relationship in the DDR response. These various classes of differential interactions exhibit different enrichment rates for our DDR standard ( Figure S4 in Additional file 3), suggesting that there may be unique qualities to each class, but a more detailed investigation of differential interaction interpretation is left for future work.

Conclusions
Here, we have put forth a quantitative differential interaction score, the dS score, based on important statistical information inherent in the experimental design. This score not only provides more information about each interaction than previous approaches, but also shows improved reproducibility and comparable biological enrichment. Additionally, quantitative differential interactions give rise to differential interaction profiles, which we demonstrate to be biologically relevant and uniquely insightful. Furthermore, we provide a new interpretation for differential interactions based on the accepted interpretation of static genetic interactions. We conclude that our differential interaction score is preferred to the previous approach for differential genetic interaction mapping analysis.

Correlation of query replicates
We used normalized colony size residuals to calculate the correlation of query replicates (Figure 1b). Our approach to computing these residuals is based on the approach published by Collins et al. [17]. In brief, the raw colony sizes are pre-processed to filter bad colonies and correct spatial artifacts. Each plate (that is, the set of all colony sizes from the same plate) is normalized by the plate mode, calculated using a kernel density estimation method [35]. Next, array single mutant fitnesses are estimated using the median normalized colony size for a given array position across all plates, which are then subtracted from the respective double mutant colony sizes to yield normalized colony size residuals. These residuals are, in turn, used to calculate several quantities: (1) the pair-wise correlation for each pair of conditional plate replicates, that is, double mutant selection plates derived from the same single mutant selection plate differing only in the growth condition; (2) pairwise correlation of untreated experimental replicates; and (3) pairwise correlation of randomly selected queries.

The dS score
Normalized differentials are obtained by subtracting untreated normalized colony sizes from the corresponding treated normalized colony sizes. The dS score is then computed as the pooled t-statistic of the six replicates for a given double mutant versus all double mutant measurements containing the respective array gene deletion. Note that the S score, for scoring static interactions, employs a minimum bound on the variance of the six double mutant replicates [17], while the dS score does not bound the variance.

Scoring null differential interactions
The null distribution of dS scores was generated by using replicate pairs of measurements grown on the same plate (and therefore same condition) and following the same scoring procedure already described. The differentials for the three replicates in each condition were pooled to produce six total replicates for each gene pair. We computed false discovery rates for each dS score cutoff as the ratio of the proportion of null scores beyond the cutoff to the proportion of observed dS scores beyond the cutoff.

Biological enrichment
The 'bronze' standard for differential genetic interactions in response to DNA damage was compiled as (1) the set of all gene pairs in which both genes are annotated as 'DNA damage response' (DDR) in the Gene Ontology [27] (term ID GO:0006974, direct association; accessed December 2011), and (2) the set of all gene pairs indicated by the YeastNet 2.0 benchmark set [28] in which at least one gene is annotated as DDR. The lists of DDR genes and bronze-standard DDR gene pairs are provided as Additional file 4.
The gold standard used for co-complex membership is defined by Baryshnikova et al. [29]. Precision-recall plots were computed using the absolute value of the dS scores (treating positive and negative interactions equally).

Significance of Pearson correlation
To assess the significance of the difference between the correlation coefficients of the scores in Figure 3, we calculated the correlation of bootstrapped data for 10,000 iterations in a paired fashion and counted the number of