Co-evolutionary networks of genes and cellular processes across fungal species
© Tuller et al.; licensee BioMed Central Ltd. 2009
Received: 24 February 2009
Accepted: 5 May 2009
Published: 5 May 2009
The introduction of measures such as evolutionary rate and propensity for gene loss has significantly advanced our knowledge of the evolutionary history and selection forces acting upon individual genes and cellular processes.
We present two new measures, the 'relative evolutionary rate pattern' (rERP), which records the relative evolutionary rates of conserved genes across the different branches of a species' phylogenetic tree, and the 'copy number pattern' (CNP), which quantifies the rate of gene loss of less conserved genes. Together, these measures yield a high-resolution study of the co-evolution of genes in 9 fungal species, spanning 3,540 sets of orthologs. We find that the evolutionary tempo of conserved genes varies in different evolutionary periods. The co-evolution of genes' Gene Ontology categories exhibits a significant correlation with their functional distance in the Gene Ontology hierarchy, but not with their location on chromosomes, showing that cellular functions are a more important driving force in gene co-evolution than their chromosomal proximity. Two fundamental patterns of co-evolution of conserved genes, cooperative and reciprocal, are identified; only genes co-evolving cooperatively functionally back each other up. The co-evolution of conserved and less conserved genes exhibits both commonalities and differences; DNA metabolism is positively correlated with nuclear traffic, transcription processes and vacuolar biology in both analyses.
Overall, this study charts the first global network view of gene co-evolution in fungi. The future application of the approach presented here to other phylogenetic trees holds much promise in characterizing the forces that shape cellular co-evolution.
The molecular clock hypothesis states that throughout evolutionary history mutations occur at an approximately uniform rate [1, 2]. In many cases this hypothesis provides a good approximation of the actual mutation rate [2, 3] while in other cases it has proven unrealistic [2, 4]. The evolutionary rate (ER) of a gene, the ratio between the number of its non-synonymous to synonymous mutations, dN/dS, is a basic measure of evolution at the molecular level. This measure is affected by many systemic factors, including gene dispensability, expression level, number of protein interactions, and recombination rate [5–11]. Since the factors that influence evolutionary rate are numerous and change in a dynamic fashion, it is likely that the evolutionary rate of an individual gene may vary between different evolutionary periods. Previous studies have investigated co-evolutionary relationships between genes on a small scale, mainly with the aim of inferring functional linkage [12–17]. These studies were mostly based on the genes' phyletic patterns (the occurrence pattern of a gene in a set of current organisms). Recently, Lopez-Bigas et al.  performed a comprehensive analysis of the evolution of different functional categories in humans. They showed that certain functional categories exhibit dynamic patterns of sequence divergence across their evolutionary history. Other studies have examined the correlations between genes' evolutionary rates to predict physical protein-protein interactions [19–24]. A recent publication by Juan et al.  focused on Escherichia coli and generated a co-evolutionary network containing the raw tree similarities for all pairs of proteins in order to improve the prediction accuracy of protein-protein interactions. Here our goal and methodology are different; we concentrate on a set of nine fungal species spanning approximately 1,000 million years . We develop tools to investigate co-evolution in both conserved and less-conserved genes. 
For the first group, whose members have an identical phylogenetic tree, we employ high-resolution ER measures to investigate gene co-evolution. In the case of less conserved genes, we generalize the concept of propensity for gene loss  to encompass the whole phylogenetic tree in order to better understand the driving forces behind co-evolution.
The first part of this paper describes the analysis of conserved genes. We define a new measure of co-evolution for such genes and study their evolutionary rates along different parts of the evolutionary tree. Next, we reconstruct a co-evolutionary network of genes and a co-evolutionary network of cellular processes according to this measure. In such a network two genes/processes are connected if their co-evolution is correlated. We identify two patterns of co-evolution, correlated (cooperative) and anti-correlated (reciprocal). We show that co-evolution is significantly correlated with co-functionality but not with chromosomal co-organization of genes. We conclude this part by identifying clusters of functions in the co-evolutionary network. Subsequently, in the second part of the paper, we study the evolution of less-conserved genes. We describe a new measure of evolution for such genes and reconstruct a co-evolutionary network of cellular processes according to this measure. We study the resulting clusters in this network and compare it to the co-evolutionary network of the conserved genes.
Results and discussion
The co-evolution of conserved genes
Computing the relative evolutionary rate pattern
The evolutionary rate along different branches of the evolutionary tree
Co-evolution of cellular processes
Two fundamental types of co-evolution
Co-evolutionary network of SOGs and its properties
Co-evolution is correlated with similar functionality
A co-evolution network of cellular functional categories was built for each of the three GO ontologies (biological process, cellular component, molecular function), using two significance cutoff values (Spearman P-value < 0.01 and Spearman P-value < 0.001) to determine significant correlations between GO categories. A list of highly correlated pairs of GO terms is provided in Additional data file 5. The correlation between the distance of GO groups in the 0.001 cutoff co-evolution network (that is, their evolutionary distance) and their distance in the corresponding GO ontology network (that is, their functional distance) is highly significant: 0.38 for cellular component, 0.16 for biological process and 0.43 for molecular function (all three with P-values < 10^-16; a similar trend is observed using the 0.01 cutoff network). A similarly marked correlation between evolutionary and functional relationships of GO groups is also found when considering positive and negative co-evolution networks separately (Note 3 in Additional data file 2).
Similar results were observed when we considered classification according to Enzyme Commission (EC) number, which is a numerical classification scheme for enzymes based on the chemical reactions they catalyze. By this classification, the code of each enzyme consists of the letters 'EC' followed by four numbers separated by periods. Those numbers represent progressively finer classifications of the enzyme, and thus induce a functional distance. Our analysis shows that pairs of orthologs with smaller functional distance (genes whose two coarsest classification levels are identical) exhibit higher levels of correlation between their rERPs than other pairs of orthologs (mean rERP correlation of 0.31 versus 0.27, P = 1.23 × 10^-7).
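The comparison of the two coarsest EC levels described above can be illustrated with a short sketch. The helper name `shares_top_ec_levels` and the example EC strings are assumptions made here for illustration; they are not taken from the original analysis.

```python
def shares_top_ec_levels(ec_a, ec_b, levels=2):
    """Return True if two EC numbers agree on their first `levels` fields."""
    def fields(ec):
        # Drop the 'EC' prefix, then split 'a.b.c.d' into its four fields.
        return ec.replace("EC", "").strip().split(".")
    return fields(ec_a)[:levels] == fields(ec_b)[:levels]


# Toy inputs (assumed for illustration only).
close_pair = shares_top_ec_levels("EC 2.7.1.1", "EC 2.7.11.1")   # both start with 2.7
distant_pair = shares_top_ec_levels("EC 2.7.1.1", "EC 1.1.1.1")  # 2.7 versus 1.1
```

Under this sketch, pairs for which the function returns True would form the 'smaller functional distance' group whose rERP correlations are compared with those of all other pairs.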
Co-evolutionary score and other properties of cellular functions and SOGs
We did not find a parallel significant correlation between the genomic co-localization of GO groups and their co-evolutionary score (see Materials and methods for a description of how we computed the co-localization score of pairs of GO groups). The co-evolution of genes and their chromosomal location are not correlated even when considering each chromosome separately. Thus, we conclude that cellular functionality is a more important force driving gene co-evolution than their genomic organization.
The rERP measure correlates well with other systemic qualities such as genetic and physical interactions. The average Spearman correlation between rERP levels of interacting proteins in the S. cerevisiae protein interaction network is 0.063, which is 155 times higher than the average correlation (4.05 × 10^-4) for non-interacting proteins (P < 10^-16). Proteins that are part of a complex show a correlation of 0.05 between their rERPs, 100 times higher than the average correlation for proteins that are not a part of the same complex (P < 10^-16). The Spearman correlation between rERP levels of genetically interacting proteins is 0.02, which is 32 times higher than the average correlation (6.08 × 10^-4) for non-interacting proteins (P = 2.71 × 10^-6). Protein rERPs are also correlated with the co-expression of their genes (Spearman correlation 0.063, P < 10^-16). The significant correlation between co-evolution and physical/functional interactions suggests that physical interactions between the products of conserved genes play a part in their co-evolution. Namely, to maintain the functionality of an interaction, a change in one protein is likely to facilitate the evolution of the proteins interacting with it, as has already been shown. Yet, as the magnitude of this correlation is rather low, it is likely that other co-evolutionary forces play a part in determining co-evolution, such as the sharing of common and varying growth environments during evolutionary history.
Clustering of co-evolutionary networks
Co-evolution of less conserved genes
The copy number pattern measure
The results presented above were focused on the analysis of a conserved set of genes whose orthologs appear in all nine fungal species studied, comprising 1,372 SOGs and spanning a total of 12,348 genes. The fungal dataset additionally includes 2,168 orthologous sets spanning more than 74,851 genes that exhibit at least one change in their copy number along the phylogenetic tree (and hence have undergone gene loss and/or gene duplication events). The 'propensity for gene loss' (PGL) was shown to correlate with gene essentiality, the number of protein-protein interactions and the expression levels of genes. PGL has been used in methods for predicting functional gene linkage [42, 43], extending upon previous methods that used the occurrence pattern of a gene in different organisms for the same aim [12–14]. Recently, a probabilistic approach related to the PGL was developed. A related measure, which is also based on a gene's phyletic pattern (the occurrence pattern of a gene in different current organisms), is phylogenetic profiling (PP) [15, 16, 43]. This measure has been employed in previous small scale studies to identify sets of genes with a shared evolutionary history [12–15, 43]. We describe a new measure of co-evolution that is a generalization/unification of both PGL and PP, termed the copy number pattern (CNP). Like PP, it characterizes each gene by examining its phyletic pattern (but additionally takes into account the number of paralogous copies of each gene in the genome). Like PGL, it exploits the information embedded in a species' phylogenetic tree to more accurately characterize the evolutionary history of each gene (in comparison, PP carries out a similar computation based on just the phyletic pattern). We used the new CNP measure to analyze orthologous sets that exhibit at least one change in copy number along the analyzed phylogenetic tree.
This set of genes is, by definition, not completely conserved, and complements the conserved set of genes analyzed by the rERP measure.
Co-evolution of less conserved genes with the copy number pattern measure
Since changes in the copy number of genes are infrequent events, the Spearman correlations between pairs of CNP vectors are usually very high (the average Spearman correlation is 0.63). To overcome this, we generated CNP vectors of GO processes (according to the biological processes ontology) where the CNP of a GO category is the mean CNP of all the genes it contains. These GO process vectors exhibit a wider range of CNP values. Next, we constructed a GO process co-evolution network. In this network two biological processes are connected by an edge only if they manifest an extreme co-evolution pattern - that is, if they have a Spearman rank correlation that is higher (green colored edges, denoting coordinated relationships) or lower (red, denoting reciprocal relationships) than the correlation values of X% of the total GO pairs. We examined the networks formed under two edge-selection regimes, a more stringent one where X% = 99.9% and a less stringent one where X% = 98%. The correlation between the distance of GO groups in the network with X% = 99.9% and the distance of GO groups in the different GO ontology networks is highly significant (r = 0.4209, P < 10^-16) for green, cooperative edges, and negatively correlated (r = -0.12, P < 0.04) for red, reciprocal edges. This suggests that the two types of edges are informative: the green edges represent functional relationships while the red ones represent pairs of GOs with distant functions.
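The two steps described above (averaging gene-level CNP vectors into a GO-level vector, and keeping only pairs with extreme correlations) can be sketched as follows. This is a minimal sketch under stated assumptions: the function names, the toy identifiers and the toy correlation values are hypothetical, and the toy threshold of X = 90 is chosen only because the example has ten pairs, whereas the analysis above uses X = 99.9 and X = 98.

```python
def mean_vector(vectors):
    """Mean CNP vector of a GO category: element-wise mean over its genes."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def extreme_edges(pair_correlations, x_percent):
    """Select pairs whose correlation is higher (green) or lower (red)
    than the correlations of at least x_percent of all pairs."""
    values = list(pair_correlations.values())
    n = len(values)
    edges = {}
    for pair, rho in pair_correlations.items():
        below = sum(1 for v in values if v < rho)
        above = sum(1 for v in values if v > rho)
        # Multiply through by n to avoid dividing before comparing.
        if below * 100 >= x_percent * n:
            edges[pair] = "green"   # coordinated relationship
        elif above * 100 >= x_percent * n:
            edges[pair] = "red"     # reciprocal relationship
    return edges


# Toy data (hypothetical): two genes in one GO category, and ten GO pairs.
go_cnp = mean_vector([[1, 2, 0], [3, 2, 2]])
toy_correlations = {("GO_%d" % i, "GO_x"): -0.9 + 0.2 * i for i in range(10)}
toy_edges = extreme_edges(toy_correlations, x_percent=90)
```

With ten pairs and X = 90, only the single highest and single lowest correlations pass the cutoff, which mirrors how the stringent regime keeps a small set of extreme edges.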
Clustering of the copy number pattern evolutionary network
Comparison of the co-evolution of conserved versus less-conserved genes
A comparison between the results obtained by the rERP and CNP methodologies at a global level should be done with some caution, for three main reasons. First, these two measures are applicable for the analysis of completely disjoint, complementary sets of orthologs. Second, the two methodologies measure different types of co-evolution. The rERP measures evolution via amino acid substitutions while the CNP measures co-evolution via changes in gene copy number, which are mainly driven by gene gain and loss events. Thus, third, these co-evolutionary relationships are possibly the result of the action of different evolutionary forces. However, it may be noted that some biological processes present the same type of evolutionary relationship with both methods. For example, DNA metabolism is always positively correlated with nuclear traffic, transcription and vacuolar biology (ER-Golgi traffic). Yet, some of the clusters exhibit different relationships when analyzed by the two measures. For example, a cluster containing mainly genes labeled ribosome biology and vacuolar biology exhibits reciprocal evolution with DNA metabolism by rERP (clusters A3 to A7) but coordinated evolution by CNP (clusters B5 to B7). Thus, within a certain biological process, the evolutionary pressures exerted on highly conserved genes may differ from those that apply to less conserved ones, and may thus provide different opportunities for co-evolution.
Our analysis charts the first global network view of the co-evolution of conserved and less conserved genes in nine fungal species. We find that cellular functions are a more important driving force in gene co-evolution than the genes' chromosomal location. Two fundamental patterns of co-evolution, cooperative and reciprocal, are defined, and, remarkably, we find that only genes co-evolving cooperatively functionally back each other up. At the single gene level, the observation that genes have evolved at accelerated rates in a localized manner on only three branches of the fungal tree is in line with previous findings suggesting that a large fraction of DNA mutations can be attributed to punctuated evolution. The fungal tree analyzed here is a natural starting point. The future application of the approach presented here to other phylogenetic trees, including the mammalian one, holds much promise in characterizing the forces that shape cellular co-evolution.
Materials and methods
The GO functional classification used in this work is the most comprehensive, qualitative, and widely used annotation database .
The GO and GO slim annotations and protein composition data for complexes were downloaded from the Saccharomyces Genome Database. We checked and report results both for the GO slim classification (Figure 5), the roughest level of classification, and the general GO ontology (Figures 4, 7, 8, 10, and 12), where we filtered GO groups that were too small (with fewer than five SOGs in our dataset). That is, the main bulk of the analysis was performed across the whole GO ontology, without focusing on any arbitrary level. Note 4 in Additional data file 2 and Additional data file 7 include statistics and error rates of the annotations used in this work.
The genetic interaction network data were downloaded from the BioGrid database ; similar results were obtained when the genetic interaction network of Tong et al.  was used (data not shown). Recent work showed that only 5% of the genetic interactions are conserved in S. cerevisiae and Caenorhabditis elegans . Note that the evolutionary distances between these species (1,542 million years, according to ) are much larger than those between the organisms in our dataset (20 to 837 million years). Further, C. elegans is multi-cellular while all the analyzed fungi are unicellular. Thus, it is not clear how the conclusions of Tischler et al.  are related to our dataset.
More importantly, in this work we study the relationship between the genetic interaction network and its co-evolutionary network for one organism (S. cerevisiae) for which we know the genetic interactions network. As there is no reason to believe that this organism is 'special' in any way, we believe that these findings are representative of the expected findings for other organisms if or when their genetic interaction networks become known.
The protein interaction network of the budding yeast S. cerevisiae was downloaded from the BioGrid database .
Gene expression data were taken from the Stanford MicroArray Database . The GO ontology network of yeast was downloaded from the Open Biomedical Ontologies Foundry ontologies . EC numbers of the analyzed genes were downloaded from the Kyoto Encyclopedia of Genes and Genomes (KEGG) .
Computing relative evolutionary rate patterns for orthologous gene sets
The selection of species used here is not arbitrary; obviously, different selections are perhaps equally plausible but several considerations led us to the current selection, which we outline below. For this study, we used fungi whose genomes have been completely assembled (at the time this study was performed: July 2007) according to the National Center for Biotechnology Information (NCBI) and for which we could infer the tRNA gene repertoire reliably and, thus, compute the tRNA adaptation index (tAI). These include S. cerevisiae, C. glabrata, K. lactis, D. hansenii, Y. lipolytica and Schizosaccharomyces pombe. This selection was then augmented by three additional species: C. albicans, an important fungal pathogen for which a high-quality gene collection (including tRNA genes) has recently become available ; S. bayanus, a Saccharomyces sensu stricto species that diverged from S. cerevisiae approximately 20 million years ago and for which an overwhelming majority of the open reading frames are available ; and A. nidulans, a filamentous fungus with a high-quality sequence. Furthermore, these species were analyzed recently by Man and Pilpel , serving as an appropriate reference set for studying evolutionary events in fungi.
Finally, due to the large evolutionary distance between S. pombe and the hemiascomycotic species (350 to 1,000 million years ago ), this set of species presents a good distribution of evolutionary time. We believe that small changes in the set of fungal species would likely yield quite similar results (see details in Note 5 in Additional data file 2).
The final dataset included genomes of nine fungal species: A. nidulans, C. albicans, C. glabrata, D. hansenii, K. lactis, S. bayanus, S. cerevisiae, S. pombe, Y. lipolytica.
Computation of the rERPs is a multi-step process (Figure 1 provides an overview), described in detail as follows. The phylogenetic tree used to analyze the data (Figure 2) was formed according to the analysis of 18S rRNA data in , the analysis of 531 concatenated proteins , and the analysis of additional gene sets listed in  (step A in Figure 1). The orthologous sets for the nine fungi were downloaded from  (step B in Figure 1). This dataset was generated by the MultiParanoid program . We considered only sets that include orthologs in all nine species. Sets of homologs that did not include exactly one representative in each organism were removed from our dataset to filter out paralogs and avoid potential errors in evolutionary rate estimation due to duplication events (step C in Figure 1). Horizontal gene transfer events (see, for example, ) are rare in fungi  and thus were not considered in our analysis. The final dataset included 1,372 orthologous sets. Stop codons were removed and each gene was translated to a sequence of amino acids. Each orthologous set was then aligned by CLUSTALW 1.83  with default parameters. By using amino acids as templates for the nucleotide sequences and by ignoring gaps we generated gap-free multiple alignments of the nine orthologous proteins in each orthologous set and their corresponding coding sequences (step D in Figure 1).
Given the alignments of each set of orthologs and given the phylogenetic tree, we used the codeml program in PAML for the joint reconstruction of ancestral codons in each of the internal nodes of the phylogenetic tree (step E in Figure 1). This reconstruction induced the sequence of ancestral proteins and their corresponding ancestral DNA coding sequences. We hence obtained sets of 16 sequences; 9 from the previous step (corresponding to the 9 leaves of the phylogenetic tree; Figure 2) plus 7 reconstructed sequences of the internal nodes of the phylogenetic tree (ancestral nodes 10-16 in Figure 2). We denote such a set of 16 sequences a 'complete orthologous set'. For each complete orthologous set, we computed the dN and dS in each branch of the evolutionary tree using the yn00 program in PAML [28, 59] (step F in Figure 1). The outputs of this stage are two vectors of 15 positive real numbers for each complete orthologous set (1,372 pairs of vectors in our case). These vectors denote the dN and dS values at the 15 different branches of the evolutionary tree.
where r0 is the neutral evolutionary rate, k is a constant, and t is time. Our goal is to estimate dS' = r0 × t, which is done using regression. This requires the computation of the tAIs of each of the gene sequences (the leaves of the phylogenetic tree), and the estimation of the tAIs of the sequences at the internal nodes of the phylogenetic tree. To this end (step G in Figure 1) we used the tRNA copy number of each species as reported in , and the ancestral tRNA copy numbers were reconstructed following  (step I in Figure 1) using the CAFÉ program.
The edge lengths (step H in Figure 1) for CAFÉ were computed by the following steps. Step one: we inferred edge lengths under the molecular clock assumption for the tree topology of Figure 2 and the concatenation of all the sets of ortholog proteins (561,072 sites) using the codeml program in PAML. Step two: we normalized the log of branch lengths to obtain branch lengths that are integers between 0 and 1,000 that reflect putative time units (this is the requirement of the method of ). Step three: we used an expectation-maximization (EM) algorithm to find the optimal value of λ (0.001756) for the model (see ).
It is important to note that by optimizing λ we actually optimize the likelihood of the model, and the result is invariant to the choice of the normalization factor of the branch lengths. The relatively similar tRNA copy number distribution of the nine species also induces a quite similar tRNA copy number distribution at the ancestral nodes of the phylogenetic tree. To compute the tAI of each complete orthologous set (step J in Figure 1), we used the Matlab, R, and Perl scripts from  (see  for the exact description of how to compute the tAI).
Thus, by using tAI and dS we were able to adjust the dS values for selection on synonymous sites, resulting in a new value, dS'. This was done for each orthologous set in each of the tree branches (step K in Figure 1). These dS' values were used for computing the corresponding values of adjusted evolutionary rates, dN/dS'. As mentioned, the idea underlying this step is to assume a linear relationship between dS and tAI, and its computation proceeds as follows.
The final output of this procedure is a total of 1,372 vectors, each with 15 dN/dS' values denoting the ERP values of each complete orthologous set.
Usually, for very high levels of substitution rate (long branches in the evolutionary tree), the error in the estimated dS values increases. This well known phenomenon is named saturation. Thus, we perform an additional normalization of the dN values by computing the ranked evolutionary rate, rER. The ranked evolutionary rate, rER (step L in Figure 1), is computed separately for each branch of the evolutionary tree. For a given branch, the rank of the dN/dS' of a complete orthologous set among the dN/dS' values of all the complete orthologous sets is the number of sets that have lower dN/dS' values in this branch (a number between 1 and the total number of complete orthologous sets, 1,372). The rERP of a complete orthologous set is the vector of its ranked evolutionary rate along the 15 branches of the evolutionary tree. Note 6 in Additional data file 2 and Additional data file 8 include a comparison of the dN/dS' values to the evolutionary rate results of a previous study by Wall et al.
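The per-branch ranking that turns dN/dS' vectors into rERPs can be sketched as follows. This is an illustrative sketch rather than the pipeline used in this work: the function name, the set identifiers and the toy rates are hypothetical, ranks are taken as 1-based (one plus the number of sets with a lower value, so that they fall between 1 and the number of sets), and tied values, which the text does not discuss, simply share a rank.

```python
def rank_evolutionary_rates(rates_by_set):
    """rates_by_set: dict mapping a set id to its list of dN/dS' values,
    one value per branch. Returns a dict mapping each set id to its rERP,
    the list of its per-branch ranks."""
    ids = list(rates_by_set)
    n_branches = len(rates_by_set[ids[0]])
    rerp = {set_id: [] for set_id in ids}
    for branch in range(n_branches):
        column = [rates_by_set[set_id][branch] for set_id in ids]
        for set_id in ids:
            value = rates_by_set[set_id][branch]
            lower = sum(1 for v in column if v < value)
            rerp[set_id].append(lower + 1)  # 1-based rank within this branch
    return rerp


# Toy input (hypothetical): three orthologous sets, three branches.
toy_rates = {
    "SOG_a": [0.10, 0.30, 0.05],
    "SOG_b": [0.20, 0.10, 0.15],
    "SOG_c": [0.05, 0.20, 0.25],
}
toy_rerp = rank_evolutionary_rates(toy_rates)
# toy_rerp["SOG_a"] is [2, 3, 1]: second-lowest rate on the first branch,
# highest on the second, lowest on the third.
```

Because each branch is ranked on its own, a long branch with uniformly high rates contributes the same range of ranks as a short one, which is the purpose of this normalization.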
The rERP developed and used in this work is different from the measure used in  in many important ways. We ranked the ER and adjusted the computed dN/dS for selection on synonymous sites by using the tAI measure (an approach that has not been used before). Additionally, we used the non-parametric Spearman correlation instead of the Pearson correlation. A comparison of the results obtained using our measure with those obtained using Fraser et al.'s ER measure for studying co-evolution shows that the ratio between the average correlation of physically interacting genes versus non-interacting genes is very low when using Fraser et al.'s measure (0.06/0.022 = 2.72), showing a very low discriminative power. In comparison, the ratio obtained using our measure is markedly higher (0.063/(4.05 × 10^-4) = 155), in correspondence with the expectation that interacting proteins would tend to co-evolve much more than non-interacting ones (for example, see ).
Measures that were based only on dN instead of dN/dS performed worse than the rERP mentioned above. Using dN without ranking is problematic as longer branches have, in general, higher dN and, thus, the correlations obtained were very high and quite similar when comparing genes that physically interact and those that do not (r = 0.77 versus r = 0.73).
When we used ranked dN instead of ranked dN/dS, we achieved better results, which were almost as good as the results we got using rERP, in terms of separating pairs of physically interacting from non-interacting proteins via their co-evolution. For example, the ratio between the correlation of protein-interacting to non-interacting proteins was 20 using ranked dN, weaker than the ratio of 155 observed using rERP values.
Constructing and analyzing the co-evolution networks of GO terms
The co-evolution network of GO terms was constructed as follows. First, consider only GO groups that include at least three genes. Second, compute the rERP of each GO group. Third, compute the Spearman correlation between the rERP of all pairs of GO groups. Fourth, connect a pair of GO groups, Gi and Gj, if the following two conditions are satisfied: condition one, they have significant correlation (P < 0.01 in the case of the network in Figure 10), where significance is computed empirically versus a corresponding random shuffled network; condition two, the two sets do not strongly overlap in their gene content, having a Jaccard coefficient < 0.5 .
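The construction steps above can be sketched in code. This is a minimal sketch under stated assumptions, not the implementation used in this work: the GO identifiers, gene names and vectors are hypothetical, the GO-level rERP vectors are taken as given, and the empirical significance test against shuffled networks is replaced by a caller-supplied predicate (here a simple cutoff on the absolute correlation, which is a stand-in only).

```python
from itertools import combinations


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors.
    Assumes neither vector is constant."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


def build_edges(go_vectors, go_genes, is_significant, max_jaccard=0.5):
    """Connect GO groups that correlate significantly and do not overlap strongly."""
    edges = []
    for g1, g2 in combinations(sorted(go_vectors), 2):
        if jaccard(go_genes[g1], go_genes[g2]) >= max_jaccard:
            continue  # condition two: gene content overlaps too strongly
        rho = spearman(go_vectors[g1], go_vectors[g2])
        if is_significant(rho):  # condition one (stand-in test)
            edges.append((g1, g2, "cooperative" if rho > 0 else "reciprocal"))
    return edges


# Toy data (hypothetical); every group has at least three genes.
go_vectors = {"GO:A": [1, 2, 3, 4, 5], "GO:B": [2, 3, 4, 5, 6],
              "GO:C": [5, 4, 3, 2, 1], "GO:D": [1, 2, 3, 4, 5]}
go_genes = {"GO:A": {"g1", "g2", "g3"}, "GO:B": {"g4", "g5", "g6"},
            "GO:C": {"g7", "g8", "g9"}, "GO:D": {"g1", "g2", "g3", "g10"}}
toy_network = build_edges(go_vectors, go_genes, lambda rho: abs(rho) >= 0.9)
```

In the toy data, GO:A and GO:D share three of four genes (Jaccard coefficient 0.75), so no edge is drawn between them even though their vectors are perfectly correlated.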
The distance between GO terms on the GO network was computed by replacing each directed edge in the original graph with an undirected one, and computing the length of the minimal path between the two GO groups.
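This distance can be computed with a breadth-first search once the edges are treated as undirected. The sketch below is illustrative: the function name and the toy term names are hypothetical, and the input is assumed to be a list of (child, parent) pairs.

```python
from collections import deque


def go_distance(directed_edges, source, target):
    """Length of the shortest path between two terms, ignoring edge direction.
    Returns None if either term is absent or the terms are not connected."""
    adjacency = {}
    for child, parent in directed_edges:
        # Replace each directed edge with an undirected one.
        adjacency.setdefault(child, set()).add(parent)
        adjacency.setdefault(parent, set()).add(child)
    if source not in adjacency or target not in adjacency:
        return None
    distance = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return distance[node]
        for neighbour in adjacency[node]:
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return None


# Toy ontology (hypothetical term names).
toy_ontology = [("child1", "parent"), ("child2", "parent"),
                ("parent", "root"), ("other", "root")]
sibling_distance = go_distance(toy_ontology, "child1", "child2")  # via 'parent'
cousin_distance = go_distance(toy_ontology, "child1", "other")    # via 'parent' and 'root'
```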
The co-evolution network was clustered and visualized using the Matlab implementation of the PRISM algorithm . The PRISM algorithm was instrumental to our analysis as it partitions the graph according to the two types of edges (positive (cooperative) or negative (reciprocal) rERP correlations) to get clusters of nodes, such that nodes from one cluster have edges of a similar type with nodes from other clusters. This is of particular interest when studying co-evolution, since it identifies 'monochromatic' relationships between groups of genes/GO functions - that is, groups of genes that relate to each other in either a completely cooperative or a completely reciprocal manner. To the best of our knowledge, no other method/algorithm is available to achieve this goal. Other clustering algorithms do not preserve the 'monochromatic' property and, hence, are not suitable for addressing the question at hand.
The significance of the monochromaticity of the resulting clustering was computed by comparing the number of conflicts (the number of edges between nodes that are in different clusters and have a color different from that of the majority of edges between the two clusters) in the original clustering to its distribution in 1,000 randomly shuffled networks with similar topological properties.
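The conflict count and its empirical significance can be sketched as follows. This is a minimal sketch under stated assumptions: the names and toy data are hypothetical, and the null model used here simply permutes edge colors over a fixed topology, which is a simplified stand-in for the shuffled networks with similar topological properties described above.

```python
import random
from collections import Counter, defaultdict


def count_conflicts(edges, cluster_of):
    """edges: list of (u, v, color); cluster_of: dict node -> cluster id.
    For each pair of clusters, edges whose color differs from the
    majority color between those clusters count as conflicts."""
    between = defaultdict(Counter)
    for u, v, color in edges:
        cu, cv = cluster_of[u], cluster_of[v]
        if cu != cv:
            between[frozenset((cu, cv))][color] += 1
    return sum(sum(c.values()) - max(c.values()) for c in between.values())


def monochromaticity_p_value(edges, cluster_of, n_shuffles=1000, seed=0):
    """Fraction of color-shuffled networks with no more conflicts than observed."""
    rng = random.Random(seed)
    observed = count_conflicts(edges, cluster_of)
    colors = [color for _, _, color in edges]
    hits = 0
    for _ in range(n_shuffles):
        rng.shuffle(colors)
        shuffled = [(u, v, c) for (u, v, _), c in zip(edges, colors)]
        if count_conflicts(shuffled, cluster_of) <= observed:
            hits += 1
    return hits / n_shuffles


# Toy clustering (hypothetical): A-B edges all green, A-C edges all red.
cluster_of = {"a1": "A", "a2": "A", "b1": "B", "b2": "B", "c1": "C", "c2": "C"}
clean = [("a1", "b1", "green"), ("a1", "b2", "green"),
         ("a2", "b1", "green"), ("a2", "b2", "green"),
         ("a1", "c1", "red"), ("a1", "c2", "red"),
         ("a2", "c1", "red"), ("a2", "c2", "red")]
noisy = clean[:-1] + [("a2", "c2", "green")]  # one edge against the A-C majority

clean_conflicts = count_conflicts(clean, cluster_of)
noisy_conflicts = count_conflicts(noisy, cluster_of)
clean_p = monochromaticity_p_value(clean, cluster_of)
```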
Analyzing the genomic co-localization of GO terms
We define the distance between two GO groups as the median of all shortest distances between each gene in one GO group to each gene in the other GO group. We did not consider genes that are common to the two GO groups. For estimating to what extent two GO groups tend to be located close to each other in the genome, we computed a P-value based on comparing their median distance to that of a background model obtained by randomly locating all the genes of both groups in the genome and recomputing their median distance for each such assignment (repeating this process 100 times to obtain a distribution of background model medians). The P-value is the fraction of times that a random assignment yields a lower distance between the two GO groups.
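The background-model comparison can be sketched as follows. This is an illustrative sketch with simplifying assumptions: gene positions are treated as integer coordinates on a single linear genome (the handling of separate chromosomes is not modeled here), genes shared by both groups are assumed to have been removed by the caller, and all names and toy coordinates are hypothetical.

```python
import random
from statistics import median


def group_distance(positions_a, positions_b):
    """Median of the distances between every gene in one group and every gene in the other."""
    return median(abs(a - b) for a in positions_a for b in positions_b)


def colocalization_p_value(positions_a, positions_b, genome_length,
                           n_iter=100, seed=0):
    """Fraction of random placements whose median distance is lower than observed."""
    rng = random.Random(seed)
    observed = group_distance(positions_a, positions_b)
    lower = 0
    for _ in range(n_iter):
        # Randomly relocate all genes of both groups in the genome.
        random_a = [rng.randrange(genome_length) for _ in positions_a]
        random_b = [rng.randrange(genome_length) for _ in positions_b]
        if group_distance(random_a, random_b) < observed:
            lower += 1
    return lower / n_iter


# Toy coordinates (hypothetical) on a genome of 1,000,000 positions.
p_clustered = colocalization_p_value([1000, 1010, 1020], [1005, 1015, 1025], 1_000_000)
p_distant = colocalization_p_value([0, 10, 20], [999_000, 999_010, 999_020], 1_000_000)
```

A small P-value (as for the tightly interleaved toy groups) indicates co-localization, whereas groups placed at opposite ends of the genome yield a P-value close to 1.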
Additional data files
The following additional data are available with the online version of this paper: a table that includes the orthologous sets that exhibit positive evolution for each of the tree branches (Additional data file 1); supplementary notes 1 to 6 (Additional data file 2); a table with GO processes with less than 20 genes (biological process ontology) sorted by their mean rERP and variance of rERP (Additional data file 3); a figure that includes the mean correlation between the evolutionary patterns of pairs of GO groups (y-axis) as a function of their distance (the shortest connecting pathway) in the GO network (x-axis) when using the ontology of S. pombe (Additional data file 4); a table with pairs of GO groups exhibiting a significant correlation between their rERPs (Additional data file 5); a table with GO enrichments (biological process) for the conserved and non-conserved genes (Additional data file 6); a figure that depicts the distribution of the number of annotations per gene for the conserved and non-conserved genes (Additional data file 7); a figure that depicts the ER values computed in our study versus the ER values computed in Wall et al.  (Additional data file 8).
Abbreviations
CNP: copy number pattern
ERP: evolutionary rate pattern
PGL: propensity for gene loss
rERP: relative evolutionary rate pattern
SOG: set of orthologous genes
tAI: tRNA adaptation index.
TT is supported by the Edmond J Safra Bioinformatics program at Tel Aviv University and the Yeshay Horowitz Association through the Center for Complexity Science. MK is supported by grants from the Israel Science Foundation, the Israel Cancer Research Fund, the Israel Cancer Association and the US-Israel Bi-National Fund (BSF). ER's research is supported by grants from the Israel Science Foundation, including the Converging Technologies grant with MK, and the Tauber Fund.
- Zuckerkandl E, Pauling LB: Molecular disease, evolution, and genetic heterogeneity. Horizons in Biochemistry. Edited by: Kasha M, Pullman B. 1962, New York: Academic Press, 189-225.
- Kumar S: Molecular clocks: four decades of evolution. Nat Rev Genet. 2005, 6: 654-662.
- Douzery EJ, Delsuc F, Stanhope MJ, Huchon D: Local molecular clocks in three nuclear genes: divergence times for rodents and other mammals and incompatibility among fossil calibrations. J Mol Evol. 2003, 57: S201-213.
- Pagel M, Venditti C, Meade A: Large punctuational contribution of speciation to evolutionary divergence at the molecular level. Science. 2006, 314: 119-121.
- Fraser HB, Hirsh AE, Steinmetz LM, Scharfe C, Feldman MW: Evolutionary rate in the protein interaction network. Science. 2002, 296: 750-752.
- Hirsh AE, Fraser HB: Protein dispensability and rate of evolution. Nature. 2001, 411: 1046-1049.
- Drummond DA, Bloom JD, Adami C, Wilke CO, Arnold FH: Why highly expressed proteins evolve slowly. Proc Natl Acad Sci USA. 2005, 102: 14338-14343.
- Wall DP, Hirsh AE, Fraser HB, Kumm J, Giaever G, Eisen MB, Feldman MW: Functional genomic analysis of the rate of protein evolution. Proc Natl Acad Sci USA. 2005, 102: 5483-5488.
- Pál C, Papp B, Lercher MJ: An integrated view of protein evolution. Nat Rev Genet. 2006, 7: 337-348.
- Chen Y, Dokholyan NV: The coordinated evolution of yeast proteins is constrained by functional modularity. Trends Genet. 2006, 22: 416-419.
- Marino-Ramirez L, Bodenreider O, Kantz N, Jordan IK: Co-evolutionary rates of functionally related yeast genes. Evol Bioinform Online. 2006, 2: 295-300.
- Wu J, Kasif S, DeLisi C: Identification of functional links between genes using phylogenetic profiles. Bioinformatics. 2003, 19: 1524-1530.
- Snel B, Huynen MA: Quantifying modularity in the evolution of biomolecular systems. Genome Res. 2004, 14: 391-397.
- Bowers PM, Pellegrini M, Thompson MJ, Fierro J, Yeates TO, Eisenberg D: Prolinks: a database of protein functional linkages derived from coevolution. Genome Biol. 2004, 5: R35.
- Jothi R, Przytycka TM, Aravind L: Discovering functional linkages and uncharacterized cellular pathways using phylogenetic profile comparisons: a comprehensive assessment. BMC Bioinformatics. 2007, 8: 173.
- Sun J, Xu J, Liu Z, Liu Q, Zhao A, Shi T, Li Y: Refined phylogenetic profiles method for predicting protein-protein interactions. Bioinformatics. 2005, 21: 3409-3415.
- Krylov DM, Wolf YI, Rogozin IB, Koonin EV: Gene loss, protein sequence divergence, gene dispensability, expression level, and interactivity are correlated in eukaryotic evolution. Genome Res. 2003, 13: 2229-2235.
- Lopez-Bigas N, De S, Teichmann SA: Functional protein divergence in the evolution of Homo sapiens. Genome Biol. 2008, 9: R33.
- Pazos F, Helmer-Citterich M, Ausiello G, Valencia A: Correlated mutations contain information about protein-protein interaction. J Mol Biol. 1997, 271: 511-523.
- Goh C, Cohen FE: Co-evolutionary analysis reveals insights into protein-protein interactions. J Mol Biol. 2002, 324: 177-192.
- Pazos F, Valencia A: In silico two-hybrid system for the selection of physically interacting protein pairs. Proteins. 2002, 47: 219-227.
- Ramani AK, Marcotte EM: Exploiting the co-evolution of interacting proteins to discover interaction specificity. J Mol Biol. 2003, 327: 273-284.
- Fraser HB, Hirsh AE, Wall DP, Eisen MB: Coevolution of gene expression among interacting proteins. Proc Natl Acad Sci USA. 2004, 101: 9033-9038.
- Juan D, Pazos F, Valencia A: High-confidence prediction of global interactomes based on genome-wide coevolutionary networks. Proc Natl Acad Sci USA. 2008, 105: 934-939.
- Berbee M, Taylor J: Systematics and evolution. The Mycota. Edited by: McLaughlin D, McLaughlin E, Lemke P. 2001, Berlin: Springer, VIIB: 229-245.
- Prillinger H, Lopandic K, Schweigkofler W, Deak R, Aarts HJ, Bauer R, Sterflinger K, Kraus GF, Maraz A: Phylogeny and systematics of the fungi with special reference to the Ascomycota and Basidiomycota. Chem Immunol. 2002, 81: 207-295.
- Kuramae EE, Robert V, Snel B, Weiss M, Boekhout T: Phylogenomics reveal a robust fungal tree of life. FEMS Yeast Res. 2006, 6: 1213-1220.
- Yang Z, Nielsen R: Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models. Mol Biol Evol. 2000, 17: 32-43.
- Hirsh AE, Fraser HB, Wall DP: Adjusting for selection on synonymous sites in estimates of evolutionary distance. Mol Biol Evol. 2005, 22: 174-177.
- Koonin EV, Rogozin IB: Getting positive about selection. Genome Biol. 2003, 4: 331.
- Wolfe KH, Shields DC: Molecular evidence for an ancient duplication of the entire yeast genome. Nature. 1997, 387: 708-713.
- Ohno S: Evolution by Gene Duplication. 1970, Berlin: Springer.
- Man O, Pilpel Y: Differential translation efficiency of orthologous genes is involved in phenotypic divergence of yeast species. Nat Genet. 2007, 39: 415-421.
- Dujon B, Sherman D, Fischer G, Durrens P, Casaregola S, Lafontaine I, De Montigny J, Marck C, Neuvéglise C, Talla E, Goffard N, Frangeul L, Aigle M, Anthouard V, Babour A, Barbe V, Barnay S, Blanchin S, Beckerich JM, Beyne E, Bleykasten C, Boisramé A, Boyer J, Cattolico L, Confanioleri F, De Daruvar A, Despons L, Fabre E, Fairhead C, Ferry-Dumazet H: Genome evolution in yeasts. Nature. 2004, 430: 35-44.
- Scannell DR, Butler G, Wolfe KH: Yeast genome evolution-the origin of the species. Yeast. 2007, 24: 929-942.
- Bremer J, Greenberg DM: Methyl transferring enzyme system of microsomes in the biosynthesis of lecithin (phosphatidylcholine). Biochim Biophys Acta. 1961, 46: 205-216.
- Chin J, Bloch K: Phosphatidylcholine synthesis in yeast. J Lipid Res. 1988, 29: 9-14.
- Tong AH, Lesage G, Bader GD, Ding H, Xu H, Xin X, Young J, Berriz GF, Brost RL, Chang M, Chen Y, Cheng X, Chua G, Friesen H, Goldberg DS, Haynes J, Humphries C, He G, Hussein S, Ke L, Krogan N, Li Z, Levinson JN, Lu H, Ménard P, Munyana C, Parsons AB, Ryan O, Tonikian R, Roberts T, et al: Global mapping of the yeast genetic interaction network. Science. 2004, 303: 808-813.
- van Noort V, Snel B, Huynen MA: The yeast coexpression network has a small-world, scale-free architecture and can be explained by a simple model. EMBO Rep. 2004, 5: 280-284.
- Enzyme EC Numbers. [http://www.genome.ad.jp/htbin/get_htext?ECtable]
- Segre D, Deluna A, Church GM, Kishony R: Modular epistasis in yeast metabolism. Nat Genet. 2005, 37: 77-83.
- Borenstein E, Shlomi T, Ruppin E, Sharan R: Gene loss rate: a probabilistic measure for the conservation of eukaryotic genes. Nucleic Acids Res. 2007, 35: e7.
- Wu J, Hu Z, DeLisi C: Gene annotation and network inference by phylogenetic profiling. BMC Bioinformatics. 2006, 7: 80.
- De Bie T, Cristianini N, Demuth JP, Hahn MW: CAFE: a computational tool for the study of gene family evolution. Bioinformatics. 2006, 22: 1269-1271.
- Gene Ontology. [http://www.geneontology.org/index.shtml]
- Saccharomyces Genome Database. [http://www.yeastgenome.org/]
- BioGRID Database. [http://www.thebiogrid.org/]
- Tischler J, Lehner B, Fraser AG: Evolutionary plasticity of genetic interaction networks. Nat Genet. 2008, 40: 390-391.
- Marinelli RJ, Montgomery K, Liu CL, Shah NH, Prapong W, Nitzberg M, Zachariah ZK, Sherlock GJ, Natkunam Y, West RB, van de Rijn M, Brown PO, Ball CA: The Stanford Microarray Database. Nucleic Acids Res. 2001, 29: 152-155.
- The Open Biomedical Ontologies. [http://obofoundry.org/]
- KEGG LIGAND Database. [ftp://ftp.genome.jp/pub/kegg/ligand/]
- Braun BR, van Het Hoog M, d'Enfert C, Martchenko M, Dungan J, Kuo A, Inglis DO, Uhl MA, Hogues H, Berriman M, Lorenz M, Levitin A, Oberholzer U, Bachewich C, Harcus D, Marcil A, Dignard D, Iouk T, Zito R, Frangeul L, Tekaia F, Rutherford K, Wang E, Munro CA, Bates S, Gow NA, Hoyer LL, Köhler G, Morschhäuser J, Newport G: A human-curated annotation of the Candida albicans genome. PLoS Genet. 2005, 1: 36-57.
- Kellis M, Birren BW, Lander ES: Proof and evolutionary analysis of ancient genome duplication in the yeast Saccharomyces cerevisiae. Nature. 2004, 428: 617-624.
- Kurtzman CP, Robnett CJ: Phylogenetic relationships among yeasts of the 'Saccharomyces complex' determined from multigene sequence analyses. FEMS Yeast Res. 2003, 3: 417-432.
- Alexeyenko A, Tamas I, Liu G, Sonnhammer EL: Automatic clustering of orthologs and inparalogs shared by multiple proteomes. Bioinformatics. 2006, 22: e9-e15.
- Bapteste E, Susko E, Leigh J, MacLeod D, Charlebois RL, Doolittle WF: Do orthologous gene phylogenies really support tree-thinking?. BMC Evol Biol. 2005, 5: 33.
- Chenna R, Sugawara H, Koike T, Lopez R, Gibson TJ, Higgins DG, Thompson JD: Multiple sequence alignment with the Clustal series of programs. Nucleic Acids Res. 2003, 31: 3497-3500.
- Pupko T, Pe'er I, Shamir R, Graur D: A fast algorithm for joint reconstruction of ancestral amino acid sequences. Mol Biol Evol. 2000, 17: 890-896.
- Yang Z: PAML: a program package for phylogenetic analysis by maximum likelihood. Comput Appl Biosci. 1997, 13: 555-556.
- Akashi H: Gene expression and molecular evolution. Curr Opin Genet Dev. 2001, 11: 660-666.
- dos Reis M, Savva R, Wernisch L: Solving the riddle of codon usage preferences: a test for translational selection. Nucleic Acids Res. 2004, 32: 5036-5044.
- Sharp PM, Li WH: The codon adaptation index: a measure of directional synonymous codon usage bias and its potential applications. Nucleic Acids Res. 1987, 15: 1281-1295.
- Gojobori T: Codon substitution in evolution and the "saturation" of synonymous changes. Genetics. 1983, 105: 1011-1027.
- Jaccard P: The distribution of flora in the alpine zone. New Phytologist. 1912, 11: 37-50.
- Kiryutin B, Koonin EV, Krylov DM, Mazumder R, Mekhedov SL, Nikolskaya AN, Rao BS, Smirnov S, Sverdlov AV, Vasudevan S, Wolf YI, Yin JJ, Natale DA: The COG database: an updated version includes eukaryotes. BMC Bioinformatics. 2003, 4: 41.
- Jones CE, Brown AL, Baumann U: Estimating the annotation error rate of curated GO database sequence annotations. BMC Bioinformatics. 2007, 8: 170.
- Buza TJ, McCarthy FM, Wang N, Bridges SM, Burgess SC: Gene Ontology annotation quality analysis in model eukaryotes. Nucleic Acids Res. 2008, 36: e12.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.