Comparative analysis indicates regulatory neofunctionalization of yeast duplicates

Tirosh, Itay; Barkai, Naama

doi:10.1186/gb-2007-8-4-r50

Research
Open access
Published: 05 April 2007

Itay Tirosh¹ &
Naama Barkai²

Genome Biology volume 8, Article number: R50 (2007)

Abstract

Background

Gene duplication provides raw material for the generation of new functions, but most duplicates are rapidly lost due to the initial redundancy in gene function. How gene function diversifies following duplication is largely unclear. Previous studies analyzed the diversification of duplicates by characterizing their coding sequence divergence. However, functional divergence can also be attributed to changes in regulatory properties, such as protein localization or expression, which require only minor changes in gene sequence.

Results

We developed a novel method to compare expression profiles from different organisms and applied it to analyze the expression divergence of yeast duplicated genes. The expression profiles of Saccharomyces cerevisiae duplicate pairs were compared with those of their pre-duplication orthologs in Candida albicans. Duplicate pairs were classified into two classes, corresponding to symmetric versus asymmetric rates of expression divergence. The latter class includes 43 duplicate pairs in which only one copy has a significant expression similarity to the C. albicans ortholog. These may present cases of regulatory neofunctionalization, as supported also by their dispensability and variability.

Conclusion

Duplicated genes may diversify through regulatory neofunctionalization. Notably, the asymmetry of gene sequence evolution and the asymmetry of gene expression evolution are only weakly correlated, underscoring the importance of expression analysis to elucidate the evolution of novel functions.

Background

Current genomes were shaped by numerous duplications of single genes, chromosomal segments and even entire genomes [1–3]. In most cases, one copy of the duplicated gene is rapidly lost either by deletion or through mutations ('nonfunctionalization'), reflecting the lack of selection for each individual copy. In other cases, however, both duplicates may survive despite the initial redundancy and become fixed in the genome. The retention of both duplicates over millions of years implies that they confer an advantage such that deletion of either copy will cause a reduction in fitness.

While the evolutionary advantage of duplicate retention is usually difficult to ascertain, several models have been suggested [4]. First, duplicates could be retained due to selection for robustness through redundancy [5], although this view has been frequently challenged [6, 7]. Second, selection for high protein dosage may favor the presence of two gene copies [8]. In these cases, similarity between the two copies can be maintained by negative selection or by gene conversion. Third, each of the duplicates may specialize in a subset of the ancestral functions, such that the ancestral functions require the activity of both genes ('subfunctionalization'). Fourth, one of the duplicates may retain the ancestral functions while the other evolves to perform a novel function ('neofunctionalization'). Identifying these scenarios and, in particular, recognizing cases of neofunctionalization may provide new insights into genome evolution, since duplications are believed to constitute the main origin of novel functions.

The term neofunctionalization refers to the acquisition of a novel function. However, it is typically difficult to define what the function of a gene is, and what constitutes a novel function. One obvious aspect of gene function is the catalytic activity performed by encoded enzymes. A broader definition of gene function, however, should include other aspects, such as protein localization, interactions with other proteins and expression patterns. These features are usually difficult to infer from the protein sequence, but the abundance of functional genomics datasets and the advent of microarray technology can now be used to analyze these properties directly. Of particular interest are the expression patterns of genes in various conditions. Changes in expression patterns have been suggested to be the primary source of phenotypic divergence among related species [9]. Such regulatory changes can have a profound effect on the function of a duplicated gene and, thus, lead to the preservation of a duplicate pair [10–12]. We refer to this scenario, where one copy of a duplicate pair diverges in expression pattern thereby facilitating the acquisition of a novel function, as regulatory neofunctionalization.

The yeast Saccharomyces cerevisiae is an excellent model to study the diversification of duplicate gene pairs. First, extensive functional annotations and expression data are available for S. cerevisiae. Second, the S. cerevisiae ancestor has undergone a whole genome duplication (WGD) event about 100 million years ago [13]. Sequencing of the pre-duplication yeast, Kluyveromyces waltii, identified hundreds of duplicate gene-pairs that were retained following this WGD event [2]. Many of these duplicate pairs accumulated extensive divergence and evolved new or altered functions. For example, sequence comparisons between S. cerevisiae duplicate pairs and their single orthologs from K. waltii revealed that in a significant portion of the duplicate pairs (115 out of 457), one copy has diverged in sequence significantly faster than the other copy [2]. This was taken as evidence for neofunctionalization, with the more conserved copy retaining the ancestral function and the other copy evolving to perform a new or altered function. A similar analysis of expression patterns may reveal additional cases of regulatory neofunctionalization.

Recent studies reported that 40% of the duplicate pairs in S. cerevisiae differ significantly in their expression patterns [14, 15]. However, to identify cases of neofunctionalization, the expression pattern of each of the copies must be compared with the ancestral expression pattern. To circumvent this problem, Gu et al. [15] focused on gene families that contain a duplicate pair and at least one additional gene that was assumed to represent the ancestral expression pattern. In the absence of data about the expression of the ancestral genes, however, the validity of this assumption is difficult to assess.

Here we analyze the diversification of yeast duplicates by directly comparing their expression patterns in a post-duplication species (S. cerevisiae) to those of their single orthologs in a pre-duplication species (Candida albicans) as a proxy for the ancestral gene expression. We first describe a general method for comparative analysis of expression profiles from related organisms. We apply this method to compare large datasets from hundreds of microarray experiments in both yeast species. Focusing on duplicate gene pairs, we identify 43 duplicated gene pairs with asymmetric rates of expression divergence. These gene pairs are likely to present instances of regulatory neofunctionalization. Notably, the level of sequence divergence in many of these duplicates is similar, emphasizing the need to include gene regulation as a complementary means for analyzing functional divergence.

Results

We first describe our method for comparison of expression profiles between one-to-one orthologs from two organisms, and later extend it to examine the expression conservation of duplicate gene pairs.

A novel method for comparative analysis of gene expression

Ideally, we wish to compare the transcription responses of the S. cerevisiae genes to those of their C. albicans orthologs under the same set of conditions. However, the expression data of the two species was measured under different conditions and by different laboratories and could not be directly compared. We thus developed a novel method for comparing the expression profiles of two organisms, called 'iterative comparison of coexpression' (ICC; see Materials and methods and Figure 1). To analyze the expression conservation of an orthologous gene pair from S. cerevisiae and Candida albicans (a_i^cer and a_i^can, respectively), we compare their expression correlations with all other one-to-one orthologous pairs (a_g^cer, a_g^can; g = 1..n), as described below. This method follows the conceptual framework described by Ihmels et al. [16] and Dutilh et al. [17] and compares the architecture of the co-expression networks.

Dutilh et al. [17] defined expression conservation as the similarity between (i) the expression correlations between a gene from S. cerevisiae (a_i^cer) and all other S. cerevisiae genes (a_g^cer, g = 1..n), and (ii) the expression correlations between its ortholog from Candida albicans (a_i^can) and all other Candida albicans orthologs (a_g^can, g = 1..n), that is:

EC(i) = PCC(R_i,g^cer, R_i,g^can), g = 1..n

where PCC is the Pearson correlation coefficient and R_i,g^cer is a vector of intra-species correlations, whose component R_i,j^cer is the correlation between the expression patterns of a_i^cer and a_j^cer (Figure 1). However, we note that a difference between R_i,g^cer and R_i,g^can does not necessarily correspond to a difference in the expression patterns of a_i^cer and a_i^can. For example, if a_j^cer and a_j^can have highly divergent expression profiles, then R_i,j^cer and R_i,j^can will be different even if the expression of a_i^cer and a_i^can has been completely conserved. Thus, when calculating the similarity between the vectors of correlations (R_i,g^cer and R_i,g^can), larger weight should be given to orthologous pairs whose expression has been conserved. In other words, when comparing a pair of orthologs, we would like to focus on their correlations with other orthologous pairs whose expression has been conserved.
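The unweighted definition above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`pearson`, `correlation_matrix`, `expression_conservation`) and the toy expression values are hypothetical, the diagonal self-correlations are kept in each row, and the rows of the two input matrices are assumed to be ordered so that row i of each matrix belongs to the same orthologous pair.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(expr):
    """expr[g] is the expression profile of gene g within one species."""
    n = len(expr)
    return [[pearson(expr[i], expr[j]) for j in range(n)] for i in range(n)]

def expression_conservation(expr_cer, expr_can):
    """EC(i) = PCC(R_i^cer, R_i^can) for ortholog-ordered expression matrices."""
    r_cer = correlation_matrix(expr_cer)
    r_can = correlation_matrix(expr_can)
    return [pearson(r_cer[i], r_can[i]) for i in range(len(r_cer))]

# Toy data (not from the paper): four orthologous pairs, measured under
# five conditions in one species and three unrelated conditions in the other.
toy_cer = [[1, 2, 3, 4, 5], [2, 4, 6, 8, 11], [5, 3, 4, 1, 2], [1, 3, 2, 5, 4]]
toy_can = [[1, 2, 4], [2, 3, 7], [4, 1, 2], [1, 5, 3]]
toy_ec = expression_conservation(toy_cer, toy_can)
```

Because only the within-species correlation matrices are compared, the two species do not need to share conditions or even the same number of experiments.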

To account for this effect, we employ an iterative algorithm that estimates expression conservation iteratively (see Figure 1 and Materials and methods). Briefly, at the first iteration, expression conservation is calculated as in Dutilh et al. [17]; at each subsequent iteration, the expression conservation values from the previous iteration are used as weights to calculate new expression conservation values. The iterative process proceeds until the expression conservation values converge.

Expression conservation between S. cerevisiae and C. albicans orthologs

We applied ICC to the set of one-to-one orthologs between S. cerevisiae and C. albicans [18]. To this end, we assembled a large dataset of genome-wide expression data, consisting of approximately 1,700 expression profiles for S. cerevisiae and 244 expression profiles for C. albicans [16, 19]. The results are summarized in Figure 2 and Additional data file 1.

Several tests were performed to validate the results. First, we ran the algorithm several times, starting from randomly chosen initial weights for each orthologous pair. In all cases the algorithm converged to the same results (Figure 2a), thus verifying the robustness of the iterative procedure. Second, we ran the algorithm with randomly chosen subsets of the expression datasets consisting of half the number of conditions for each species. Also in this case, the algorithm converged to the same results (Figure 2a). Third, we defined the set of conserved and divergent genes (5% highest or lowest expression conservation values, respectively) and examined their properties. Approximately 60% of the most conserved S. cerevisiae genes are essential [20] compared with 26% for all genes with orthologs in C. albicans (p < 10^-16 by the hypergeometric test). Furthermore, ribosome biogenesis was found to be the most enriched Gene Ontology (GO) term among the conserved genes (p < 10^-50 by hypergeometric test), whereas mitochondrion and mitochondrial ribosome were the most enriched GO terms among the divergent genes (p < 10^-17 for both by hypergeometric test). Indeed, these latter groups have undergone a large-scale adaptation of their expression profiles following the WGD [19]. Thus, the results of our algorithm are in good agreement with prior knowledge and expectations. Finally, we compared the distribution of expression conservation scores obtained for the orthologous pairs to that obtained for randomly chosen (non-orthologous) gene pairs. Expression conservation was higher for orthologs than non-orthologs (Figure 2b), indicating that the expression networks of the two yeast species have retained significant similarities.
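The enrichment p values quoted above come from the hypergeometric test. As a minimal sketch of how such a tail probability can be computed, the helper below (a hypothetical name, `hypergeom_sf`) sums the exact hypergeometric probabilities; the numbers in the usage line are toy values, not the counts behind the reported results.

```python
from math import comb

def hypergeom_sf(k, M, K, n):
    """P(X >= k) for X ~ Hypergeometric.

    M: total number of genes in the background set
    K: number of background genes carrying the annotation (e.g. essential)
    n: number of genes in the tested subset (e.g. the 5% most conserved)
    k: number of annotated genes observed in the tested subset
    """
    upper = min(K, n)
    # Exact integer arithmetic; a single division at the end.
    favourable = sum(comb(K, x) * comb(M - K, n - x) for x in range(k, upper + 1))
    return favourable / comb(M, n)

# Toy usage: 10 genes, 3 annotated, subset of 4 containing all 3 annotated genes.
toy_p = hypergeom_sf(3, 10, 3, 4)
```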

Comparison of duplicate gene pairs with their single orthologs

We next focused on the expression conservation of duplicate gene pairs. To this end, we used the expression conservation scores generated by the ICC for each of the one-to-one orthologs as weights to calculate the expression conservation of duplicate genes. Namely, for each duplicate gene pair in S. cerevisiae, we calculated two expression conservation scores between each of the duplicates and their single ortholog from C. albicans (see Materials and methods).

Out of the 457 duplicate pairs from the WGD event [2], we focused on 244 pairs compiled with the following two conditions. First, we performed a phylogenetic analysis and required that a single ortholog from C. albicans was predicted for both of the duplicates and that no other in-paralogs were found in S. cerevisiae (Materials and methods). This single ortholog serves as an out-group to estimate the expression of the ancestral gene before the WGD. Second, to avoid cases where the duplicates cross-hybridize to microarrays, thus leading to artificial correlations, we considered only duplicate pairs whose nucleotide sequence similarity was lower than 90%.

Two modes of expression divergence for duplicates

As shown in Figure 3a, a large percentage of the duplicates appear to have evolved at a similar rate, as both copies show similar expression conservation to their single C. albicans orthologs (for example, 79 duplicates in the yellow region). Notably, however, a similarly large fraction of duplicate pairs display distinctly different levels of expression conservation (for example, 63 duplicates in the green region). These cases indicate asymmetric rates of expression evolution among the two duplicate genes.

To further explore the distinction between duplicate pairs that evolve at similar versus asymmetric rates, we focused on the 96 duplicate pairs in which at least one of the copies has significantly high expression conservation (EC > 0.37; see dashed line in Figure 3a). This constraint removed cases for which it is difficult to infer the ancestral expression pattern, since the C. albicans expression pattern is much different from that of both duplicate genes. The expression conservation of the least conserved duplicates, in these cases, displays a bimodal distribution with a boundary at approximately EC = 0 (Figure 3b). This distribution thus partitions the duplicate gene pairs into two classes.

The first class corresponds to duplicate gene pairs for which the expression of both copies resembles that of the C. albicans ortholog. Of these duplicate pairs, 28 have significantly high expression conservation for both copies; we refer to these as duplicate pairs with conserved expression. This class includes duplicate pairs whose divergence is probably related to other aspects of protein function, such as protein structure or interactions. In addition, duplicates in this class tend to have higher mRNA and protein abundance [21, 22] than other duplicates (Additional data file 2), suggesting that some of these duplicate pairs could have been retained due to selection for high dosage.

Interestingly, in the second class, comprising 45% of the duplicate gene pairs (43 out of 96), one copy has a significant similarity (EC > 0.37) to the C. albicans ortholog (conserved), and the second copy has no similarity (EC < 0) to the C. albicans ortholog (divergent). The duplicate pairs displaying this asymmetric pattern of expression evolution are given in Table 1. This pattern is consistent with regulatory neofunctionalization, suggesting that the conserved copy has retained the ancestral function while the divergent copy performs a novel or altered function.
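The thresholds used in this section can be summarized as a small decision rule. The sketch below is an illustration only: the function name and the class labels are hypothetical, and the "intermediate" label is an assumption covering pairs whose less conserved copy falls between the two thresholds (0 <= EC <= 0.37).

```python
EC_HIGH = 0.37  # significance threshold quoted in the text
EC_LOW = 0.0    # boundary of the bimodal distribution (Figure 3b)

def classify_pair(ec_a, ec_b, high=EC_HIGH, low=EC_LOW):
    """Classify a duplicate pair from the EC scores of its two copies."""
    best, worst = max(ec_a, ec_b), min(ec_a, ec_b)
    if best <= high:
        # Neither copy resembles the C. albicans ortholog, so the
        # ancestral pattern cannot be inferred with confidence.
        return "ancestral pattern unclear"
    if worst > high:
        return "conserved"      # both copies significantly conserved
    if worst < low:
        return "asymmetric"     # one conserved copy, one divergent copy
    return "intermediate"       # assumption: label for the remaining pairs
```

For example, `classify_pair(0.6, -0.2)` falls in the asymmetric class regardless of the order in which the two copies are passed.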

Table 1 Duplicate pairs with asymmetric expression evolution

To verify the asymmetric divergence of these duplicate pairs we also performed an ancestral reconstruction analysis; since our method relies on correlations of expression with multiple genes, we performed a parsimony-based reconstruction [23] for each correlation value (see Materials and methods). This allowed us to decompose the expression divergence of each duplicate gene into two components: duplicate versus ancestor and ancestor versus C. albicans ortholog (Figure 3c). By definition, the ancestral reconstruction procedure tends to estimate an ancestral state that is an intermediate between the two duplicates. However, asymmetric expression divergence was still evident when examining the duplicate versus ancestor expression similarity (Table 1 and Figure 3c). In all cases, the expression similarity of the divergent copy was much lower than that of the conserved copy, and in most cases even lower than zero. Furthermore, the predicted ancestral expression patterns were more similar to the C. albicans patterns in duplicate pairs with asymmetric divergence compared to duplicate pairs with conserved expression (Figure 3c; p = 0.004 in a Wilcoxon rank sum test). This implies that expression of the duplicate pairs with asymmetric divergence is, in general, highly conserved, and divergence in these cases was restricted to one of the copies after duplication.

Properties of duplicates predicted to undergo regulatory neofunctionalization

Within a duplicate pair predicted to undergo regulatory neofunctionalization, our analysis distinguishes the conserved from the divergent copy. We next compared the set of conserved copies with that of the divergent copies using several datasets. First, we examined the fraction of essential genes [20] in the two gene sets. While eight of the conserved copies are essential, all of the divergent copies are dispensable (Figure 4). Second, we examined the extent of sequence variability [24], as well as expression variability [25], of these genes among the closely related sensu-stricto species, which diverged from S. cerevisiae long after the WGD. In both cases, the divergent copies were, on average, more variable than the conserved ones (Figure 4), indicating that they are still evolving rapidly. Taken together, these results suggest that the conserved copies typically perform stable and important functions, while the divergent copies are dispensable and undergoing continuous fine-tuning, as expected for newly derived functions.

Whole-genome versus smaller-scale duplications

Recent studies have suggested that duplicate pairs arising from a WGD event have different characteristics to those arising from smaller-scale duplications [26–28]. To examine if this is the case with respect to gene expression evolution, we repeated the analysis presented above with 46 gene pairs from S. cerevisiae that were predicted to arise from small-scale duplications after speciation from C. albicans (see Materials and methods). Interestingly, only 1 duplicate pair had asymmetric expression divergence while 14 duplicate pairs had conserved expression (see Additional data file 3). This ratio differs markedly from that of the WGD analysis, where 43 duplicate pairs had asymmetric expression divergence and only 28 duplicate pairs had conserved expression. This difference may indicate that divergence of WGD duplicates is more likely to occur through regulatory divergence compared with small-scale duplicates.

Divergence of protein sequence versus expression pattern

We asked whether the observed asymmetry in the evolution of duplicates' expression patterns is correlated with asymmetric evolution of protein sequences [17, 29, 30]. To this end, we used a parsimony-based approach to assess the protein sequence divergence of each of the WGD duplicates from their pre-duplication ancestors (see Materials and methods and Table 1). We then compared the asymmetry of protein sequence divergence with that of expression divergence as estimated in the ancestral reconstruction analysis (Figure 5; see Materials and methods). The two measures of asymmetry are only weakly correlated (r = 0.15, p = 0.11). While most of the copies with asymmetric expression divergence also have high asymmetry of sequence divergence, others show similar levels of sequence divergence, and some even show an opposite trend where the divergent copy in terms of expression is more conserved in terms of sequence (negative sequence divergence asymmetry in Figure 5). These results suggest that although in many cases protein sequence and expression divergence are correlated, they represent distinct evolutionary mechanisms for the acquisition of novel functions.

Discussion

We developed a new method for comparative analysis of genome-wide expression data (ICC) and applied it to characterize the diversification of yeast duplicates that originated at the WGD event. We identified a natural separation of duplicate pairs into two classes. The first class includes duplicates with symmetric expression divergence, such that both S. cerevisiae copies displayed similar conservation with their C. albicans ortholog. The expression of many of these duplicate pairs is highly correlated (not shown), suggesting that they were retained by selection for high protein dosage or evolved through other functional aspects, such as protein structure or interactions.

The second class includes 43 duplicate gene pairs in which one copy showed a significant expression similarity to the C. albicans ortholog while the other copy displayed no similarity to the C. albicans ortholog. Some of these cases may represent neutral evolution of gene expression that has no functional significance. Alternatively, these cases may involve regulatory neofunctionalization. Although our method is not capable of detecting the action of directional selection, as required for neofunctionalization, the high conservation of one copy and the total lack of conservation of the other copy appear to be inconsistent with a neutral model. We thus interpreted this class as enriched with cases of regulatory neofunctionalization.

Another alternative interpretation is that this pattern indicates evolution by subfunctionalization, whereby the expression of the ancestral gene is partitioned between the two copies [4, 11]. Our method does not compare the expression of duplicates and their orthologs under the same conditions, and thus subfunctionalization can lead to different patterns of expression conservation and is difficult to infer. In contrast, the neofunctionalization model clearly predicts that the gene with ancestral function will have high expression conservation, while the gene with derived function has low expression conservation. Our observations are, therefore, more consistent with the neofunctionalization model. It is important to note, however, that the neofunctionalization and subfunctionalization models are not mutually exclusive. For example, duplicates can evolve by subfunctionalization in terms of protein structure but by neofunctionalization in terms of expression profile. Furthermore, an initial subfunctionalization can be followed by neofunctionalization [31].

Our interpretation of neofunctionalization of the indicated genes is supported by their increased dispensability and enhanced variability in sequence and expression among closely related yeast species. Importantly, each of these 43 cases (Table 1) entails a prediction for the function of the ancestral protein in C. albicans and the evolutionary trajectory of the duplicate pair.

The new functions encoded by genes that evolved by neofunctionalization probably had an important role in the adaptation of yeast following the WGD. Perhaps the most significant adaptation of the S. cerevisiae lineage was the transformation from aerobic to predominantly anaerobic metabolism [32]. This adaptation involved the generation of novel pathways, most notably the repression of oxidative phosphorylation and related processes in the presence of glucose, known as glucose repression [33]. Indeed, of the duplicate pairs with asymmetric expression evolution at least two encode isoenzymes, and in these pairs the genes encoding the predicted novel function (HXK1 and PYK2) are under the control of the glucose repression pathway, while the genes encoding the predicted ancestral function (HXK2 and CDC19) are not repressed by glucose [34, 35]. Another pair of isoenzymes (APA1 and APA2), which are ATP adenylyltransferases whose functional distinction is unclear, shows a similar pattern of regulation. The enzyme with the predicted novel function (APA2) is co-regulated with the anaerobic genes (expression correlation r = 0.4875 for HXK1 and r = 0.3966 for PYK2 in the S. cerevisiae dataset), while the enzyme with the predicted ancestral function (APA1) is co-regulated with the aerobic genes (expression correlation r = 0.5220 for HXK2 and r = 0.2940 for CDC19 in the S. cerevisiae dataset). This observation suggests that APA2 is the anaerobic ATP adenylyltransferase while APA1 is the aerobic one.

Neofunctionalization could also refine the function of existing complexes by creating specialized subunits with an elaborate regulation. For example, two of the duplicate gene pairs predicted to have evolved by neofunctionalization are alternative subunits of the same complex: EGD1 and BTT1 of the nascent polypeptide-associated complex [17, 36], and FKS1 and GSC2 of beta-1,3-glucan synthase. Similarly, the transcription factors GZF3 (conserved) and DAL80 (divergent) are two regulators of nitrogen metabolism that can homo- or hetero-dimerize [37], presumably leading to different activities. These cases may provide examples where regulation of the alternative subunits' expression determines the composition of the complex at any cellular state, and thus dictates its condition-dependent function.

Conclusion

Genes can evolve new functions by modulation of different characteristics, including the structure, physical interactions, expression patterns and localization of the proteins they encode. A comprehensive understanding of functional divergence thus requires an integrated analysis of different measures of divergence. Here, we studied the expression divergence of yeast duplicate pairs and identified 43 pairs with asymmetric divergence that is compatible with regulatory neofunctionalization. Importantly, most of these were not identified by sequence analysis [2] and, in general, the asymmetry of sequence divergence and that of expression divergence were only marginally correlated. Future studies will undertake the challenge of integrating these and other data types to provide a better understanding of the functional diversification of genes following duplications.

Materials and methods

Definition of homology relationships

The Inparanoid software [18] was used to identify one-to-one orthology between genes in S. cerevisiae and C. albicans. Duplicate pairs from the WGD were taken from Kellis et al. [2] and filtered with the following phylogenetic analysis: for each duplicate pair we constructed a ClustalW multiple alignment of the duplicates, their single K. waltii ortholog (which was determined by synteny [2]) and all other matches from S. cerevisiae and C. albicans with a BLAST p value smaller than 10^-4. These alignments were used to construct a neighbor-joining phylogenetic tree with the Jukes-Cantor distance, after ignoring gaps. We then demanded that each tree (or its subtree) contain only the pair of duplicates, the syntenic K. waltii ortholog and a single C. albicans ortholog. To further verify the C. albicans ortholog we also verified that the K. waltii ortholog and one of the duplicates are its best matches in the corresponding genomes.

The set of smaller-scale duplications was defined by: first, taking all duplications predicted by Inparanoid (that is, clusters of one C. albicans gene and two S. cerevisiae genes); second, excluding those that were predicted to arise from the WGD [2]; and third, filtering the remaining set using the phylogenetic analysis described above.

Method for comparative analysis of gene expression

Expression datasets for S. cerevisiae and C. albicans containing multiple experimental conditions were collected as described in Ihmels et al. [16]. These expression matrices were restricted to genes for which orthology relationships were identified and ordered accordingly (that is, equivalent rows of the two matrices correspond to the expression profiles of a pair of orthologs). Next, these matrices were converted into correlation matrices by calculating, within each organism, the Pearson correlation coefficient (PCC) between the expression profiles of each pair of genes, over all the conditions. The resulting matrices ($R_{g,g}^{cer}$, $R_{g,g}^{can}$) contain all the correlations between genes for which an orthology relationship was defined (g = 1..n). These matrices have similar dimensionality, and we proceeded by comparing equivalent rows:

$$EC_{0}(i) = \mathrm{PCC}(R_{i,g}^{cer}, R_{i,g}^{can})$$

This corresponds to the initial estimation of expression conservation (EC) in which identical weights are given to the correlations with all genes. We then iteratively refined this measure by calculating a weighted correlation, where the weight for a correlation with each gene is given by the EC of that gene from the previous iteration:

$$EC_{k}(i) = \mathrm{PCCw}(R_{i,g'}^{cer}, R_{i,g'}^{can})$$

where:

$$\mathrm{PCCw}(X, Y) = \frac{\sum w_{i} (X_{i} - \bar{X})(Y_{i} - \bar{Y})}{\sqrt{\sum w_{i} (X_{i} - \bar{X})^{2} \sum w_{i} (Y_{i} - \bar{Y})^{2}}}$$

$$w_{i} = EC_{k-1}(i)$$

$$g' = \{ l \in g \mid EC_{k-1}(l) > 0 \}$$

This procedure was repeated until convergence:

$$\sum_{i \in g} [EC_{k}(i) - EC_{k-1}(i)]^{2} < 0.1$$

Finally, to validate the iterative heuristic, we calculated EC scores when the initial weights were randomly selected:

$$EC_{0}(i) = \mathrm{PCCw}(R_{i,g}^{cer}, R_{i,g}^{can})$$

where $w_{i} = \mathrm{rand}([0,1])$.

This was repeated ten times; in each case the algorithm described above was applied until convergence and the EC scores were compared to those without randomization. In all cases the results from the randomized procedure were similar to those of the original procedure (PCC > 0.99), indicating that the original results reflect a global minimum.
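The iterative procedure described in this section can be sketched as follows. This is a minimal illustration rather than the original implementation. Several details are assumptions: the means in the weighted correlation are taken as weighted means, the convergence criterion is read as a sum of squared changes, and the names `weighted_pearson`, `icc` and `max_iter` are hypothetical. The inputs are the two ortholog-ordered correlation matrices.

```python
import math

def weighted_pearson(x, y, w):
    """Weighted Pearson correlation (PCCw); assumes weighted means."""
    total = sum(w)
    mx = sum(wi * xi for wi, xi in zip(w, x)) / total
    my = sum(wi * yi for wi, yi in zip(w, y)) / total
    sxy = sum(wi * (xi - mx) * (yi - my) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - mx) ** 2 for wi, xi in zip(w, x))
    syy = sum(wi * (yi - my) ** 2 for wi, yi in zip(w, y))
    return sxy / math.sqrt(sxx * syy)

def icc(r_cer, r_can, tol=0.1, max_iter=100):
    """Iteratively refine EC scores from two n x n correlation matrices."""
    n = len(r_cer)
    # EC_0: identical weights for the correlations with all genes.
    ec = [weighted_pearson(r_cer[i], r_can[i], [1.0] * n) for i in range(n)]
    for _ in range(max_iter):
        # g' keeps only genes with positive EC in the previous iteration.
        keep = [g for g in range(n) if ec[g] > 0]
        if len(keep) < 3:
            raise ValueError("too few conserved genes to compute a correlation")
        w = [ec[g] for g in keep]
        new_ec = [
            weighted_pearson([r_cer[i][g] for g in keep],
                             [r_can[i][g] for g in keep], w)
            for i in range(n)
        ]
        change = sum((a - b) ** 2 for a, b in zip(new_ec, ec))
        ec = new_ec
        if change < tol:  # convergence threshold from the text
            break
    return ec
```

The randomized validation run corresponds to replacing the uniform initial weights with random values in [0, 1] and checking that the converged scores agree.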

Application to duplicate gene pairs

After EC scores were computed for all ortholog gene pairs, these scores were used as weights for a similar analysis of duplicates. For each pair of duplicates from S. cerevisiae and their orthologs from C. albicans, we calculated two EC scores for comparison of each of the duplicates with their ortholog.

mRNA and protein abundance

mRNA abundance averaged over various studies was taken from Beyer et al. [22], and protein abundance was taken from Ghaemmaghami et al. [21]. These values were log₂-transformed, and then centered and normalized.
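A minimal sketch of this transformation is given below. It assumes that "centered and normalized" means subtracting the mean and dividing by the sample standard deviation, which the text does not state explicitly; the function name `log2_zscore` is hypothetical.

```python
import math
import statistics

def log2_zscore(values):
    """Log2-transform positive abundances, then center and scale them."""
    logged = [math.log2(v) for v in values]
    mean = statistics.fmean(logged)
    sd = statistics.stdev(logged)  # assumption: sample standard deviation
    return [(x - mean) / sd for x in logged]
```

For the toy input `[1, 2, 4]` the log₂ values are 0, 1 and 2, so the result is -1, 0 and 1.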

Ancestral reconstruction of expression correlations

Each gene in S. cerevisiae and C. albicans is represented in our analysis by its expression correlation with a reference set of one-to-one orthologs. Thus, for each pair of duplicates and each reference gene, we performed ancestral reconstruction to infer the correlation of the ancestral gene (before duplication) with the reference gene. Ancestral reconstruction is done with a parsimony-based procedure [23], which uses the correlation values in each of the duplicates and the C. albicans ortholog to infer the ancestral correlation that minimizes the total divergence of that value. The inferred correlations with the entire reference set define the ancestral expression pattern that is then compared to the duplicate pair and the C. albicans ortholog using the EC score defined above.
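The exact parsimony criterion of [23] is not spelled out here, so the following is only a sketch under an explicit assumption: if "total divergence" is measured as the sum of absolute changes along the three branches (ancestor to each duplicate and ancestor to the C. albicans ortholog), the minimizing ancestral value for each reference gene is the median of the three observed correlations. The function name is hypothetical.

```python
import statistics

def reconstruct_ancestral_profile(r_dup1, r_dup2, r_can):
    """Per-reference-gene ancestral correlation under linear parsimony.

    Each argument is a vector of correlations with the same reference genes.
    Assumption: total divergence is |a - d1| + |a - d2| + |a - c|, which is
    minimized by the median of (d1, d2, c).
    """
    return [statistics.median(values) for values in zip(r_dup1, r_dup2, r_can)]
```

Under this assumption the inferred value always lies between the two duplicates (or coincides with one of them), in line with the remark in the Results that the reconstruction tends to estimate an intermediate state. A squared-change criterion would instead give the mean of the three values.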

Variability of protein sequence and expression profiles

Variability of protein sequences (adjusted Ka/Ks) among four yeast species from the Saccharomyces sensu-stricto complex was taken from [17], transformed as in the original study (f(k) = Log [k + 0.001]), and normalized by subtracting the mean and dividing by the standard deviation. Variability of expression profiles in response to environmental stresses among four yeast species from the Saccharomyces sensu-stricto complex was taken from [18].
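The transformation and normalization of the sequence variability values can be sketched as below. The base of the logarithm and the choice of population rather than sample standard deviation are not stated in the text; base 10 and the population form are assumptions made only for this illustration, and the function name is hypothetical.

```python
import math
from statistics import mean, pstdev


def normalize_variability(values, offset=0.001):
    """Apply f(k) = log(k + offset), then subtract the mean and divide by the SD."""
    logged = [math.log10(v + offset) for v in values]  # assumption: base-10 log
    mu = mean(logged)
    sd = pstdev(logged)  # assumption: population standard deviation
    return [(x - mu) / sd for x in logged]
```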

Protein sequence divergence

Multiple alignments of the duplicates and their single orthologs from K. waltii and C. albicans were used to estimate protein sequence divergence using a parsimony-based approach. Namely, each position with the same amino acid in the K. waltii ortholog, the C. albicans ortholog and at least one of the duplicates was assumed to represent the ancestral state before duplication; if the second duplicate had a different amino acid at that position, then a substitution was inferred. The number of substitutions inferred for each duplicate gene is used as an estimate of protein sequence divergence (Table 1).
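The counting rule can be sketched over four aligned sequences as follows. The function name is hypothetical, and skipping any column that contains a gap is an assumption, since the text does not say how gaps were handled.

```python
def count_substitutions(dup1, dup2, kwal, calb, gap="-"):
    """Count inferred substitutions in each duplicate from aligned sequences.

    A column is informative when the K. waltii and C. albicans residues agree
    and at least one duplicate shares that residue; the other duplicate is
    charged one substitution if it differs.
    """
    if not (len(dup1) == len(dup2) == len(kwal) == len(calb)):
        raise ValueError("aligned sequences must have equal length")
    subs1 = subs2 = 0
    for a, b, k, c in zip(dup1, dup2, kwal, calb):
        if gap in (a, b, k, c):
            continue  # assumption: gapped columns are ignored
        if k != c:
            continue  # outgroups disagree, so no ancestral state is inferred
        if a == k and b != k:
            subs2 += 1
        elif b == k and a != k:
            subs1 += 1
    return subs1, subs2
```

Columns in which neither duplicate matches the two outgroups are left uncounted, because no ancestral state is assigned to them under this rule.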

Asymmetry of sequence and expression divergence

Asymmetry was defined as $\frac{x_{1} - x_{2}}{x_{1} + x_{2}}$, where x₁ and x₂ are measures of divergence of the two copies in a duplicate gene pair. For sequence divergence, x_i was the number of amino acid substitutions, and for expression divergence it was 1 - EC. For each gene pair, x₁ was assigned to the copy with lower expression conservation, such that asymmetry of expression divergence is always positive and the sign of asymmetry of sequence divergence reflects the congruence between sequence and expression analyses (negative asymmetry of sequence divergence means that the copy with lower expression conservation had higher sequence conservation). Note that this measure is not equivalent to that used to detect extreme cases of asymmetry, where we demanded that one copy has EC > 0.372 and the other copy has EC < 0.
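A short sketch of the two asymmetry measures and of the separate threshold rule for extreme cases is given below. The function names are hypothetical, and returning 0 when both divergence values are zero is an assumption for the undefined case.

```python
def asymmetry(x1, x2):
    """(x1 - x2) / (x1 + x2); defined here as 0.0 when both values are zero."""
    total = x1 + x2
    if total == 0:
        return 0.0  # assumption: no divergence in either copy means no asymmetry
    return (x1 - x2) / total


def pair_asymmetries(ec_a, ec_b, subs_a, subs_b):
    """Return (expression asymmetry, sequence asymmetry) for one duplicate pair.

    Copy 1 is the copy with lower expression conservation, so the expression
    asymmetry is non-negative and the sign of the sequence asymmetry shows
    whether the two analyses agree.
    """
    if ec_a > ec_b:
        ec_a, ec_b, subs_a, subs_b = ec_b, ec_a, subs_b, subs_a
    expression = asymmetry(1 - ec_a, 1 - ec_b)
    sequence = asymmetry(subs_a, subs_b)
    return expression, sequence


def is_extreme_asymmetric(ec_a, ec_b, high=0.372, low=0.0):
    """Threshold rule from the text: one copy has EC > 0.372, the other EC < 0."""
    return max(ec_a, ec_b) > high and min(ec_a, ec_b) < low
```

For example, a pair with EC values of 0.6 and -0.2 and with 10 and 30 substitutions, respectively, gives an expression asymmetry of 0.5 and a sequence asymmetry of 0.5: the copy with diverged expression also carries more substitutions.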

Additional data files

The following additional data are available with the online version of this paper. Additional data file 1 is a table that lists the expression conservation values of 2,644 orthologous pairs from S. cerevisiae and C. albicans. Additional data file 2 is a figure showing the high mRNA and protein abundance of duplicated genes with conserved expression compared with other duplicated genes. Additional data file 3 is a figure showing the expression conservation of duplicated genes from small-scale duplication events. In contrast to duplicates from the WGD, there is only one case of asymmetric divergence and many cases of conserved expression.

References

Lynch M, Force A: The probability of duplicate gene preservation by subfunctionalization. Genetics. 2000, 154: 459-473.
Kellis M, Birren BW, Lander ES: Proof and evolutionary analysis of ancient genome duplication in the yeast Saccharomyces cerevisiae. Nature. 2004, 428: 617-624. 10.1038/nature02424.
Li WH, Gu Z, Cavalcanti AR, Nekrutenko A: Detection of gene duplications and block duplications in eukaryotic genomes. J Struct Funct Genomics. 2003, 3: 27-34. 10.1023/A:1022644628861.
Prince VE, Pickett FB: Splitting pairs: the diverging fates of duplicated genes. Nat Rev Genet. 2002, 3: 827-837. 10.1038/nrg928.
Tischler J, Lehner B, Chen N, Fraser AG: Combinatorial RNA interference in C. elegans reveals that redundancy between gene duplicates can be maintained for more than 80 million years of evolution. Genome Biol. 2006, 7: R69. 10.1186/gb-2006-7-8-r69.
Kafri R, Bar-Even A, Pilpel Y: Transcription control reprogramming in genetic backup circuits. Nat Genet. 2005, 37: 295-299. 10.1038/ng1523.
Nowak MA, Boerlijst MC, Cooke J, Smith JM: Evolution of genetic redundancy. Nature. 1997, 388: 167-171. 10.1038/40618.
Sugino RP, Innan H: Selection for more of the same product as a force to enhance concerted evolution of duplicated genes. Trends Genet. 2006, 22: 642-644. 10.1016/j.tig.2006.09.014.
King MC, Wilson AC: Evolution at two levels in humans and chimpanzees. Science. 1975, 188: 107-116. 10.1126/science.1090005.
Adams KL, Wendel JF: Novel patterns of gene expression in polyploid plants. Trends Genet. 2005, 21: 539-543. 10.1016/j.tig.2005.07.009.
Duarte JM, Cui L, Wall PK, Zhang Q, Zhang X, Leebens-Mack J, Ma H, Altman N, dePamphilis CW: Expression pattern shifts following duplication indicative of subfunctionalization and neofunctionalization in regulatory genes of Arabidopsis. Mol Biol Evol. 2006, 23: 469-478. 10.1093/molbev/msj051.
Li WH, Yang J, Gu X: Expression divergence between duplicate genes. Trends Genet. 2005, 21: 602-607. 10.1016/j.tig.2005.08.006.
Wolfe KH, Shields DC: Molecular evidence for an ancient duplication of the entire yeast genome. Nature. 1997, 387: 708-713. 10.1038/42711.
Gu Z, Nicolae D, Lu HH, Li WH: Rapid divergence in expression between duplicate genes inferred from microarray data. Trends Genet. 2002, 18: 609-613. 10.1016/S0168-9525(02)02837-8.
Gu X, Zhang Z, Huang W: Rapid evolution of expression and regulatory divergences after yeast gene duplication. Proc Natl Acad Sci USA. 2005, 102: 707-712. 10.1073/pnas.0409186102.
Ihmels J, Bergmann S, Berman J, Barkai N: Comparative gene expression analysis by differential clustering approach: application to the Candida albicans transcription program. PLoS Genet. 2005, 1: e39. 10.1371/journal.pgen.0010039.
Dutilh BE, Huynen MA, Snel B: A global definition of expression context is conserved between orthologs, but does not correlate with sequence conservation. BMC Genomics. 2006, 7: 10. 10.1186/1471-2164-7-10.
Remm M, Storm CE, Sonnhammer EL: Automatic clustering of orthologs and in-paralogs from pairwise species comparisons. J Mol Biol. 2001, 314: 1041-1052. 10.1006/jmbi.2000.5197.
Ihmels J, Bergmann S, Gerami-Nejad M, Yanai I, McClellan M, Berman J, Barkai N: Rewiring of the yeast transcriptional network through the evolution of motif usage. Science. 2005, 309: 938-940. 10.1126/science.1113833.
Giaever G, Chu AM, Ni L, Connelly C, Riles L, Veronneau S, Dow S, Lucau-Danila A, Anderson K, Andre B, et al: Functional profiling of the Saccharomyces cerevisiae genome. Nature. 2002, 418: 387-391. 10.1038/nature00935.
Ghaemmaghami S, Huh WK, Bower K, Howson RW, Belle A, Dephoure N, O'Shea EK, Weissman JS: Global analysis of protein expression in yeast. Nature. 2003, 425: 737-741. 10.1038/nature02046.
Beyer A, Hollunder J, Nasheuer HP, Wilhelm T: Post-transcriptional expression regulation in the yeast Saccharomyces cerevisiae on a genomic scale. Mol Cell Proteomics. 2004, 3: 1083-1092. 10.1074/mcp.M400099-MCP200.
Rossnes R, Eidhammer I, Liberles DA: Phylogenetic reconstruction of ancestral character states for gene expression and mRNA splicing data. BMC Bioinformatics. 2005, 6: 127. 10.1186/1471-2105-6-127.
Wall DP, Hirsh AE, Fraser HB, Kumm J, Giaever G, Eisen MB, Feldman MW: Functional genomic analysis of the rates of protein evolution. Proc Natl Acad Sci USA. 2005, 102: 5483-5488. 10.1073/pnas.0501761102.
Tirosh I, Weinberger A, Carmi M, Barkai N: A genetic signature of interspecies variations in gene expression. Nat Genet. 2006, 38: 830-834. 10.1038/ng1819.
Davis JC, Petrov DA: Do disparate mechanisms of duplication add similar genes to the genome?. Trends Genet. 2005, 21: 548-551. 10.1016/j.tig.2005.07.008.
Guan Y, Dunham MJ, Troyanskaya OG: Functional analysis of gene duplications in Saccharomyces cerevisiae. Genetics. 2007, 175: 933-943. 10.1534/genetics.106.064329.
Casneuf T, De Bodt S, Raes J, Maere S, Van de Peer Y: Nonrandom divergence of gene expression following gene and genome duplications in the flowering plant Arabidopsis thaliana. Genome Biol. 2006, 7: R13. 10.1186/gb-2006-7-2-r13.
Wagner A: Decoupled evolution of coding region and mRNA expression patterns after gene duplication: implications for the neutralist-selectionist debate. Proc Natl Acad Sci USA. 2000, 97: 6579-6584. 10.1073/pnas.110147097.
Kim SH, Yi SV: Correlated asymmetry of sequence and functional divergence between duplicate proteins of Saccharomyces cerevisiae. Mol Biol Evol. 2006, 23: 1068-1075. 10.1093/molbev/msj115.
He X, Zhang J: Rapid subfunctionalization accompanied by prolonged and substantial neofunctionalization in duplicate gene evolution. Genetics. 2005, 169: 1157-1164. 10.1534/genetics.104.037051.
Piskur J, Langkjaer RB: Yeast genome sequencing: the power of comparative genomics. Mol Microbiol. 2004, 53: 381-389. 10.1111/j.1365-2958.2004.04182.x.
Carlson M: Glucose repression in yeast. Curr Opin Microbiol. 1999, 2: 202-207. 10.1016/S1369-5274(99)80035-6.
Boles E, Schulte F, Miosga T, Freidel K, Schluter E, Zimmermann FK, Hollenberg CP, Heinisch JJ: Characterization of a glucose-repressed pyruvate kinase (Pyk2p) in Saccharomyces cerevisiae that is catalytically insensitive to fructose-1,6-bisphosphate. J Bacteriol. 1997, 179: 2987-2993.
Rodriguez-Trelles F, Tarrio R, Ayala FJ: Evolution of cis-regulatory regions versus codifying regions. Int J Dev Biol. 2003, 47: 665-673.
Reimann B, Bradsher J, Franke J, Hartmann E, Wiedmann M, Prehn S, Wiedmann B: Initial characterization of the nascent polypeptide-associated complex in yeast. Yeast. 1999, 15: 397-407. 10.1002/(SICI)1097-0061(19990330)15:5<397::AID-YEA384>3.0.CO;2-U.
Svetlov VV, Cooper TG: The Saccharomyces cerevisiae GATA factors Dal80p and Deh1p can form homo- and heterodimeric complexes. J Bacteriol. 1998, 180: 5682-5688.

Acknowledgements

We thank Yonatan Bilu and Andreas Doncic for critical reading and members of our lab for helpful discussions. This work was supported by grants from the Kahn fund for Systems Biology at the Weizmann Institute of Science, the Tauber fund, the Israeli Ministry of Science and the Bi-national Science Foundation (BSF).

Author information

Authors and Affiliations

Department of Molecular Genetics, Weizmann Institute of Science, 76100, Rehovot, Israel
Itay Tirosh
Department of Physics of Complex Systems, Weizmann Institute of Science, 76100, Rehovot, Israel
Naama Barkai

Corresponding author

Correspondence to Naama Barkai.

About this article

Cite this article

Tirosh, I., Barkai, N. Comparative analysis indicates regulatory neofunctionalization of yeast duplicates. Genome Biol 8, R50 (2007). https://doi.org/10.1186/gb-2007-8-4-r50

Received: 21 December 2006
Revised: 15 February 2007
Accepted: 05 April 2007
Published: 05 April 2007
DOI: https://doi.org/10.1186/gb-2007-8-4-r50
