Evolution of biological interaction networks: from models to real data
© BioMed Central Ltd 2011
Received: 15 August 2011
Accepted: 12 December 2011
Published: 28 December 2011
We are beginning to uncover common mechanisms leading to the evolution of biological networks. The driving force behind these advances is the increasing availability of comparative data in several species.
Biological interaction networks have been the main focus of systems biology in recent years. These interactions form a spectrum of biological networks, including protein-protein interaction (PPI), transcription factor-target regulation, genetic interaction and metabolic networks. Each of these networks provides insight into different intracellular communication systems, from gene and post-translational regulation (transcription factor-target and kinase-substrate phosphorylation networks, respectively) to physical interactions between proteins (PPI networks). Given their importance, studies have attempted to characterize the global evolutionary mechanisms that shape network architectures, which would help to understand the network design principles and evolutionary forces that ultimately determine the network of a species. Such studies are possible as a result of the development of methods such as the yeast two-hybrid system [1, 2], tandem affinity purification followed by mass spectrometry [3, 4], and chromatin immunoprecipitation followed by either microarray chip (ChIP-chip) or high-throughput sequencing (ChIP-seq) [6, 7]. These methods can rapidly interrogate the interaction network of a given species, and have led to a dramatic increase in biological interaction data for several species. Large, but as yet incomplete, networks for Homo sapiens [1, 2, 8] and model eukaryotic organisms such as Saccharomyces cerevisiae [3–5, 9–13], Caenorhabditis elegans [6, 14, 15] and Drosophila melanogaster [7, 16–20] are available in many multispecies data repositories [21–24].
We review recent progress in the study of biological network evolution, with a particular focus on the PPI network, because this has been studied in more depth (other networks, such as the transcription factor-target network, are also available to varying degrees of completion). While networks have been examined in the past using computational simulations [25, 26], here we focus on studies based on experimental data primarily from high-throughput methods. The shift to using experimental data has enabled observation of different properties of network evolution. For instance, early studies suggested that certain interactions tend to be conserved, and this finding was used to transfer annotation knowledge and identify important cellular pathways between different species. We also discuss network hubs and motifs, which are conserved elements whose members are more likely to maintain the same functionality between species. Conversely, networks are evolutionarily very dynamic. We explore divergent network elements, such as how networks change over time between species (a phenomenon known as network rewiring). We review the different rates at which interaction networks, such as PPI and transcription factor-target networks, rewire, and explore why regulatory networks rewire at a more rapid rate than PPI networks. Finally, we look at methods to estimate the rate of network rewiring, given that different types of interaction networks have been elucidated to different degrees of completeness.
Similar to the history of molecular evolution, interest in network evolution began before the availability of sufficient high-throughput data for empirical study. Thus, early studies focused on top-down models of network evolution, and attempted to recapitulate global features of these experimentally determined networks by focusing on their topological properties. Biological networks maintain interesting topological properties, such as a scale-free topology (where only a few proteins or metabolic substrates maintain a high number of interacting partners), hierarchical modularity (a scale-free network composed of modular components) and degree-dissortativity (the tendency of hub proteins to connect with non-hub proteins in PPI networks) [38–40]. To explore evolutionary mechanisms responsible for developing current biological networks, various models have been proposed to describe these observed network properties, thereby proposing how different network architectures and topologies are formed. Each of these models provides a unique perspective on network evolution and can be grouped by how the network is grown; they include (1) preferential attachment, (2) gene duplication and divergence (DD), and (3) physical constraint models (for example, the crystal growth model).
Instead of focusing on topological rules such as preferential attachment to generate networks with a desired network topology, other studies have focused on physical attributes or constraints that may guide network evolution. One of the first constraints developed was the notion that intrinsic protein fitness influences the growth of new nodes (that is, proteins that are more important to the cell gradually acquire more interactions). Such a method is capable of realizing scale-free networks; however, it only considers protein change, which is only one of the possible network rewiring mechanisms. To capture additional topological attributes found in biological networks, a crystal growth model has been proposed, whereby network growth is governed by the availability of unoccupied protein interaction surface. In this model, each new node is either added to an existing cluster of nodes or used to start a new cluster, and interactions to other nodes are made based on the availability of unoccupied interaction surface area. If the new node (i) is added to an existing cluster, a two-step extension process is performed: (1) selecting a cluster member, node j, based on the available unoccupied interaction surface area; and (2) adding neighbors of node j randomly as interaction partners to the new node i (Figure 2). Interestingly, this physical constraint approach creates networks that not only have scale-free topology, but also hierarchical modularity and degree-dissortativity, properties found in experimentally derived PPI networks.
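The two-step extension process described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the published crystal growth model: the function name `crystal_growth` and the parameters `capacity`, `p_new_cluster` and `p_neighbor` are hypothetical, and the unoccupied interaction surface of a protein is approximated simply as a fixed number of partner slots minus its current number of partners.

```python
import random

def crystal_growth(n_nodes, capacity=6, p_new_cluster=0.1, p_neighbor=0.5, seed=0):
    """Grow a toy network; free surface is modeled as capacity minus degree (an assumption)."""
    rng = random.Random(seed)
    adj = {0: set()}        # adjacency sets; node 0 founds the first cluster
    cluster_of = {0: 0}     # cluster label of every node
    n_clusters = 1
    for i in range(1, n_nodes):
        # Existing nodes that still have unoccupied surface, with their free slots.
        free = {j: capacity - len(nb) for j, nb in adj.items() if len(nb) < capacity}
        adj[i] = set()
        if not free or rng.random() < p_new_cluster:
            # The new node starts a cluster of its own and has no partners yet.
            cluster_of[i] = n_clusters
            n_clusters += 1
            continue
        # Step 1: pick anchor node j with probability proportional to its free surface.
        candidates = sorted(free)
        j = rng.choices(candidates, weights=[free[c] for c in candidates], k=1)[0]
        neighbours = sorted(adj[j])          # neighbours of j before i is attached
        adj[i].add(j)
        adj[j].add(i)
        cluster_of[i] = cluster_of[j]
        # Step 2: randomly add neighbours of j as partners of i while surface remains.
        rng.shuffle(neighbours)
        for k in neighbours:
            if len(adj[i]) >= capacity:
                break
            if len(adj[k]) < capacity and rng.random() < p_neighbor:
                adj[i].add(k)
                adj[k].add(i)
    return adj, cluster_of

if __name__ == "__main__":
    adj, cluster_of = crystal_growth(500)
    degrees = sorted((len(nb) for nb in adj.values()), reverse=True)
    print("clusters:", len(set(cluster_of.values())), "highest degrees:", degrees[:5])
```

Because a new node copies part of the neighborhood of its anchor, interactions stay within clusters and the copied neighborhoods produce locally dense modules. Whether a given parameter choice reproduces the topological properties reported for the published model would need to be checked separately.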
To comprehensively describe the evolutionary events giving rise to the network architectures we observe today, knowledge about the ancestral networks is required. While ancestral network reconstruction methods exist, they require complete interaction networks. Since complete interaction networks are currently unavailable, the validity of different evolutionary models is evaluated by their ability to recapitulate known topological network properties, thereby demonstrating that these models provide insight into the origins and putative mechanisms giving rise to the organization of present-day networks. By these metrics, the crystal growth model is likely to capture the underlying principles resulting in different network architectures. Interestingly, the models presented here all grow the network randomly while capturing many network topological features without including any explicit selective pressure, suggesting that a stochastic component contributes to network evolution. Additionally, current models of network evolution only grow the interaction network: once the nodes and interactions are created, they remain unaltered for the duration of the simulation. Future models should consider not only network growth but also network rewiring events at nodes other than the newly created node, thereby incorporating a larger number of observed network mechanisms.
Conserved network elements such as protein orthologs and interologs were some of the first features discovered in networks. These conserved elements can be considered analogous to conserved motifs or domains in protein sequences. Focusing on proteins, or conserved interactions between proteins and their binding partners, led to the discovery of several features that are evolutionarily conserved [33, 52, 53].
The binary classification between hub and non-hub proteins only considers the total number of interactions in which a protein participates. Network motifs are used to help describe interaction patterns among multiple proteins: a network motif is an interaction pattern within an interaction network that occurs significantly more or less frequently than in a randomized network (Figure 3b). Network motifs are analogous to sequence motifs. They were first found in the Escherichia coli and S. cerevisiae transcription regulation networks, and their discovery helped uncover local network substructures, such as feed-forward loops, highlighting a preference for specific protein arrangements within these regulation networks. A subsequent study by Wuchty et al. asked whether proteins participating within a network motif were more conserved. Using the presence of an ortholog in the comparison species to determine protein conservation, they found that proteins that are members of larger and more interconnected motifs are more likely to be conserved than proteins that are not. This is most probably because highly interconnected motifs tend to be protein complexes, whereby the interaction partners place evolutionary constraints on each other.
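The definition of a motif as a pattern that is over- or under-represented relative to a randomized network can be made concrete with a small sketch. The code below counts feed-forward loops in a directed regulation network and compares the count with degree-preserving randomizations obtained by edge swapping. The function names, the number of swaps and the z-score summary are illustrative assumptions, not the procedure of the studies cited above.

```python
import random
from statistics import mean, pstdev

def count_ffl(edges):
    """Count feed-forward loops: A->B, B->C and A->C over three distinct nodes."""
    succ = {}
    for a, b in edges:
        succ.setdefault(a, set()).add(b)
    total = 0
    for a, targets in succ.items():
        for b in targets:
            for c in succ.get(b, ()):
                if c != a and c != b and c in targets:
                    total += 1
    return total

def rewire(edges, n_swaps, rng):
    """Randomize by swapping edge targets; every node keeps its in- and out-degree."""
    edges = list(edges)
    if len(edges) < 2:
        return edges
    present = set(edges)
    for _ in range(n_swaps):
        i, j = rng.randrange(len(edges)), rng.randrange(len(edges))
        (a, b), (c, d) = edges[i], edges[j]
        # Reject swaps that would create a self-loop or a duplicate edge.
        if a == d or c == b or (a, d) in present or (c, b) in present:
            continue
        present -= {(a, b), (c, d)}
        present |= {(a, d), (c, b)}
        edges[i], edges[j] = (a, d), (c, b)
    return edges

def motif_z_score(edges, n_random=200, seed=0):
    """Return observed count, mean randomized count and z-score (NaN if no variance)."""
    rng = random.Random(seed)
    observed = count_ffl(edges)
    null = [count_ffl(rewire(edges, 10 * len(edges), rng)) for _ in range(n_random)]
    sd = pstdev(null)
    z = (observed - mean(null)) / sd if sd > 0 else float("nan")
    return observed, mean(null), z
```

A large positive z-score would indicate that the pattern is enriched relative to networks with the same degree sequence, and a large negative one that it is depleted. The input is assumed to be a list of distinct (regulator, target) pairs without self-loops.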
While functionally conserved network elements such as motifs have been identified, functional conservation itself can also be used for interaction prediction, because conserved proteins are likely to maintain the same functionality. Thus, functional knowledge about an interaction in the interaction network of one species can likely be transferred to another species if both interaction partners are identifiable as orthologs. The additional constraint of an interaction further increases the likelihood that the conserved proteins maintain the same functionality across different species. Hence, we can transfer an interaction by identifying the so-called interolog of the queried interaction in a PPI network, or the regulog in a transcription factor-target regulation network [33, 35]. Taking this a step further, we can identify conserved pathways between species using network alignments. Similar to sequence alignments, network alignments must account for gaps and mismatches, corresponding to the loss or gain of a protein and to dissimilar proteins, respectively, in the alignment [62–65]. In this manner, it is possible to assign functionality by transferring knowledge to conserved protein groups across species, or to make interaction predictions based on sequence similarity and the concurrence of interaction partners. Using the protein interaction networks of three species, S. cerevisiae, C. elegans and D. melanogaster, interactions can be predicted with relatively high accuracy, resulting in a 40% to 52% success rate upon experimental validation. In this manner, Sharan et al. could predict and experimentally validate the interaction between the non-ATPase subunit 2 (Nas2p) and the regulatory particle triphosphatase 4 (Rpt4p) proteins in S. cerevisiae. One-to-many orthology relationships and lost or gained interactions add to the complexity of aligning conserved protein groups, a complexity that modern approaches have overcome. For an example of a network alignment of the S. cerevisiae, C. elegans and D. melanogaster protein interaction networks, see Figure 3c.
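The interolog idea, transferring an interaction when both partners have orthologs in the target species, can be illustrated with a short sketch. The function names and the protein identifiers in the usage example are hypothetical placeholders, and the orthology map is assumed to come from a separate orthology assignment step. One-to-many orthology is handled naively by predicting every combination of orthologs.

```python
def transfer_interologs(source_ppi, orthologs):
    """Predict target-species interactions from source interactions and an orthology map.

    source_ppi: iterable of (p, q) protein pairs in the source species.
    orthologs:  dict mapping a source protein to a set of target-species orthologs.
    Returns a set of unordered pairs (frozensets) of target-species proteins.
    """
    predicted = set()
    for p, q in source_ppi:
        for p_t in orthologs.get(p, ()):
            for q_t in orthologs.get(q, ()):
                # Two different source proteins mapping to one target protein
                # would predict a self-interaction; skip that ambiguous case.
                if p != q and p_t == q_t:
                    continue
                predicted.add(frozenset((p_t, q_t)))
    return predicted

def validated_fraction(predicted, known_target_ppi):
    """Fraction of predicted interactions that are present in a known target network."""
    known = {frozenset(pair) for pair in known_target_ppi}
    return len(predicted & known) / len(predicted) if predicted else 0.0

if __name__ == "__main__":
    # Hypothetical identifiers: srcA..srcC in the source species, tgt1..tgt3 in the target.
    source_ppi = [("srcA", "srcB"), ("srcB", "srcC")]
    orthologs = {"srcA": {"tgt1"}, "srcB": {"tgt2", "tgt2b"}}
    predictions = transfer_interologs(source_ppi, orthologs)
    print(sorted(sorted(pair) for pair in predictions))
```

The interaction between srcB and srcC is not transferred because srcC has no ortholog in the map. A network alignment generalizes this pairwise transfer by also tolerating gaps and mismatches, which this sketch does not attempt.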
To observe evolutionary change, one has to look beyond conserved elements and examine divergent network features. The increase in the number and completeness of biological networks has now enabled researchers to study the differences between biological networks of species in more detail, highlighting the heterogeneous nature of interaction networks within the cell [67–69]. However, network incompleteness remains a concern: either false negatives or interactions that were never queried may lead to erroneous conclusions about changes between networks in two species. To study global properties of diverged interactions while minimizing the effect of varying network coverage for different species, early network rewiring studies used the interaction network of a single species to derive rates of interaction change by comparing paralogs, which are proteins derived from genes that are related by a gene duplication event. This strategy has also been used in experimental studies to characterize the functional divergence of transcription factors: such studies can perform the same assay for different paralogous genes within the same species, so that the elucidated interactions can be compared directly [70, 71]. More complete biological networks can be generated by computational methods using experimentally determined binding specificity maps. However, these computational methods are currently restricted to proteins with domains that bind linear motifs, such as SH3, kinase and transcription-factor domains [68, 72, 73]. Experimental identification of all interaction partners for the same protein orthologs across multiple species has been performed, but only for a few proteins, such as the Ste12p and Tec1p transcription factors in three closely related yeast species, and the CEBPA and HNF4A transcription factors across five and three vertebrates, respectively (both sets include human and mouse).
Such work provides a high resolution network, enabling one to delineate the evolutionary mechanisms contributing to the formation of the network. Future technologies capable of interrogating the interaction network in a high-throughput manner for multiple species will enable more comprehensive and reliable network characterization across species.
Determining the network rewiring rates for interaction networks enables one to select appropriate species based on their divergence distance when performing comparative network analyses. For example, a recent evolutionary study showed that the kinase-substrate phosphorylation and PPI networks rewired at different rates between S. cerevisiae and H. sapiens (which shared a common ancestor 1 billion years ago), the former at 2.2 × 10⁻⁵ interaction changes per protein pair per million years and the latter at 1.1 × 10⁻⁶ interaction changes per protein pair per million years. When only network rewiring events involving the loss of an interaction were considered, using S. cerevisiae as the reference, 100% (0 of 4,068 conserved) and 98.5% (30,247 of 30,695 lost) of the kinase-substrate phosphorylation and PPI interactions were lost, respectively, indicating that the use of highly divergent species for global system comparisons is likely to be inappropriate, especially for kinase-substrate phosphorylation networks.
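The units of these rates (interaction changes per protein pair per million years) suggest a simple way to work with them, sketched below. The helper names are hypothetical, and the normalization by a single divergence time is an assumption made for illustration; individual studies may, for example, count the time elapsed along both lineages.

```python
def rewiring_rate(n_changes, n_protein_pairs, divergence_my):
    """Interaction changes per protein pair per million years (simple normalization)."""
    return n_changes / (n_protein_pairs * divergence_my)

def fraction_lost(n_lost, n_reference):
    """Fraction of reference-species interactions not found in the comparison species."""
    return n_lost / n_reference

def expected_changes_per_pair(rate, divergence_my):
    """Expected changes per protein pair over a divergence time, assuming a constant rate."""
    return rate * divergence_my

if __name__ == "__main__":
    # Rates quoted in the text, applied to 1,000 million years of divergence.
    for name, rate in [("kinase-substrate", 2.2e-5), ("PPI", 1.1e-6)]:
        print(name, expected_changes_per_pair(rate, 1000))
    # PPI loss figure quoted in the text: 30,247 of 30,695 reference interactions.
    print(round(100 * fraction_lost(30247, 30695), 1))
```

On these figures the kinase-substrate network rewires roughly twenty times faster than the PPI network, and the PPI loss fraction rounds to the 98.5% quoted above.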
Neutral changes play an important role in molecular evolution. Kimura's neutral theory of molecular evolution postulates that most genomic mutations are neutral and do not affect the fitness of an organism. In network evolution, changes to neutral interactions can be thought of in a similar way. Neutral interactions are non-functional interactions within a network, and their existence can provide an explanation for the poor interaction conservation found across species. Such neutral interactions and their associated network rewirings are thought to be especially common in kinase-substrate interactions, suggesting that only a fraction of kinase-substrate interactions are functional [67, 78]. Likewise, spurious DNA-binding sites with no apparent functional role, found in recent transcription-factor-binding studies, are also likely to evolve in an unconstrained manner, resulting in an apparent rate of network rewiring faster than that found within the kinase-substrate interaction network. This phenomenon is analogous to neutral changes in sequence evolution, which occur at a faster rate than negatively selected changes. Finding the rate of neutral interactions across different networks will help identify the extent to which these interactions shape networks and the evolutionary constraints they define compared with those applied by selective pressures [77, 80].
To obtain more accurate network rewiring rates, subsequent studies utilized multiple species' interaction maps of greater coverage [69, 82], as the rates based on paralogs within the S. cerevisiae PPI network used only a small subset of the interactions. Beltrao and Serrano determined the interaction rewiring rate for the PPI networks of four species: S. cerevisiae, C. elegans, D. melanogaster and H. sapiens. Because these species are highly divergent from each other (sharing a common ancestor over 1 billion years ago), a direct comparison results in a small overlap between these PPI networks. Since many of the PPI networks were elucidated using high-throughput methods with relatively high false-negative and false-positive rates, the observed small overlap is likely to be due to a combination of poor coverage and data quality. In a similar vein to earlier work by Wagner, paralogs were used to estimate the network rewiring rates indirectly within the PPI network of each species, using a number of closely related species to establish orthology relationships and thereby identify recently duplicated genes. With this approach, in which the divergence time to a reference species was less than 100 million years, they estimated that the network rewiring rate (including rates of both interaction gain and interaction loss) was on the order of 1 × 10⁻⁵ interactions per protein pair per million years; this is ten times faster than the rate found by Wagner. The difference in the estimated rates shows the importance of selecting the appropriate species when performing network comparisons, because the estimated rates depend on the divergence distance between the compared species.
PPIs span a range of different interaction types, from the binding of large globular domains to domain-peptide-mediated interactions such as those of SH3 domains. Different types of interactions evolve at different rates, probably due to a number of biophysical differences in the protein interaction, such as the amount of protein surface participating in the interaction. Interactions between two globular proteins or between members of stable complexes often involve large interaction surface areas and tend to be conserved throughout evolution. This is mainly due to the large number of mutations needed to abolish such an interaction. Conversely, in the case of peptide-mediated interactions, the interaction surface provided by the binding motif is small: between three and ten amino acids in length. Such short binding regions make these interactions relatively dynamic, since a single mutation or a few mutations can abolish the interaction. Similarly, only a few mutations are required for the formation of a new binding site. The rapid rate of binding site loss can be observed in the experimentally determined transcription factor-target binding sites of specific transcription factors [68, 90–92], and was highlighted in the comparison between yeast species, where up to 80% of the target binding sites are lost over about 300 million years of divergence [68, 91]. However, a comparison of closely related species sharing a common ancestor 10 million years ago reveals that only 1% to 5% of the binding sites change [92, 93]. Changes in specificity and target binding sites are not the only means by which interactions may diverge: changes to protein localization and expression timing, and the appearance of novel inhibitors, can also change interaction binding, either permanently or temporarily.
Different types of protein interactions have been associated with specific biological processes. For example, domain-peptide interactions are common in regulatory networks, such as those for cell signaling, transcription factor-target regulation and kinase-substrate phosphorylation, which have been the focus of current studies. One would expect that these biological networks would rewire at a faster rate than the general PPI network, and this indeed appears to be the case [67, 69]. Unfortunately, a direct comparison between network rewiring rates is difficult because different reference species were used in the different studies. Owing to the low coverage of current networks, no authoritative answer can be given at this point as to the exact degree to which these biological networks differ in terms of network rewiring rates. Nevertheless, indirect methods can be applied to compare the network rewiring between multiple interaction networks from different reference species. Shou and colleagues fitted a linear model to each biological network using different species as a reference, enabling them to extrapolate the rate of interaction rewiring to a predetermined divergence distance chosen to perform the comparison. Ordering the network rewiring rates for biological networks from the fastest to the slowest after 800 million years of divergence, they found the following order: transcription factor-target > phosphorylation-substrate > genetic interactions ~ PPIs > metabolic pathway. While the network rewiring rate for each biological network is likely to be underestimated, because multiple rewiring events since the last common ancestor cannot be observed, such a method is capable of elucidating the network ordering.
It is tempting to speculate that this ordering suggests that regulatory networks of transcription factor-target and phosphorylation-substrate networks rapidly reorganize to adapt to selective constraints while maintaining a core network that coordinates basic cellular functionality with regulatory inputs.
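The extrapolation strategy described above, fitting a linear model per network and comparing the fitted values at a common divergence distance, can be sketched as follows. The function names are hypothetical, the input data in the usage example are invented placeholders rather than measured values, and an ordinary least-squares fit of rewired fraction against divergence time is an assumed simplification of the published analysis.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("need at least two distinct divergence times")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def extrapolate(points, target_my=800):
    """Predicted rewired fraction at target_my from (divergence_my, fraction) points."""
    xs, ys = zip(*points)
    slope, intercept = fit_line(xs, ys)
    return slope * target_my + intercept

def rank_networks(observations, target_my=800):
    """Order network names from fastest to slowest rewiring at the target divergence."""
    return sorted(observations,
                  key=lambda name: extrapolate(observations[name], target_my),
                  reverse=True)

if __name__ == "__main__":
    # Invented placeholder data: (divergence in million years, fraction rewired).
    observations = {
        "network_A": [(50, 0.10), (150, 0.28), (300, 0.61)],
        "network_B": [(100, 0.05), (400, 0.22), (900, 0.44)],
    }
    print(rank_networks(observations))
```

Because each network is compared at the same extrapolated distance, the ordering does not require that the underlying measurements share a reference species. A linear fit cannot account for repeated rewiring of the same interaction, which is one reason the absolute rates are likely to be underestimated.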
While theoretical models have driven initial advances in network evolution, ultimately, more comparative cross-species data are needed. Although technological advances have enabled early biological network comparisons, the availability and coverage of biological interaction network data remain obstacles. The varying degrees of coverage between interaction networks of different species make it difficult to perform direct network comparisons. Several approaches have been developed to overcome this limitation, thus enabling network comparisons on a limited scale. Technologies permitting the elucidation of complete interaction networks for multiple species will allow the precise evolutionary mechanisms to be revealed. Despite the current lack of complete interaction networks, network alignment methods have been developed to compare the interaction networks of multiple species. Ultimately, such comparisons will lead to more sophisticated evolutionary models that will enable the reconstruction of putative ancestral networks. Both conserved and divergent elements of biological network evolution have been identified, providing the raw material upon which evolution may act to form the observed biological interaction networks. Conserved network elements have provided insights into the essentiality of particular topological arrangements, such as the importance of hub proteins for cell viability. Analyses of network rewiring have revealed a wide range of rewiring rates between different biological networks, providing the stepping stones for future network evolutionary models. Such models might enable identification of evolutionary events that are under selective pressure, analogous to models in molecular evolution. Ultimately, they will enable a mesoscale view of evolution and, as such, provide a link between molecular evolution and evolution at the whole-organism level.
DD: duplication and divergence
MAPK: mitogen-activated protein kinase
SNF1: sucrose non-fermenting 1.
The authors acknowledge support from the Natural Sciences and Engineering Research Council of Canada (NSERC); PMK received funding from an NSERC Discovery Grant (#386671) and MGS from a PGS Fellowship (PGSD3-410270-2011).