The elusive yeast interactome
Genome Biology volume 7, Article number: 223 (2006)
Simple eukaryotic cells such as yeast could contain around 800 protein complexes, as two new comprehensive studies show. But slightly different approaches resulted in surprising differences between the two datasets, showing that more work is required to get a complete picture of the yeast interactome.
Protein complexes are the workhorses of the cell as they are involved in almost all biological processes from transmembrane signaling to gene expression. Only a few are really well understood in terms of structure and function, however, and many appear to be involved in processes we do not know much about. In two independent recent papers [1, 2], groups from the European Molecular Biology Laboratory (EMBL), Cellzome (a spin-off company from EMBL), and the University of Toronto have published comprehensive surveys of all the protein complexes detected in yeast - the yeast interactome or 'complexome' as one might now call it (Table 1; see also [3, 4]). This is a landmark achievement, given that no other cell or organism has been surveyed at such a level of detail. More important, yeast is a prototypic eukaryotic cell that is a model for human cells, and most yeast complexes probably have homologs in humans.
From proteome to complexome
The characterization of protein complexes sounds trivial: insert a piece of DNA encoding a 'tag' into a protein-coding gene and let the cells express the tagged protein (the 'bait'). Then break up the cell and 'pull out' the tagged protein with all its associated proteins (the 'preys') by some technique such as co-immunoprecipitation or tandem affinity purification (TAP). Finally, identify all the proteins in the purified complex by mass spectrometry [5]. Then repeat this procedure for all the protein-coding genes in the yeast genome.
This is exactly what the two teams did [1, 2]. But although the identification of protein complexes sounds easy, it is not. Complications arise, for example, when several proteins belonging to the same complex are tagged and the resulting complexes are purified. In most cases this leads to conflicting information, because these purifications have slightly different protein compositions, depending on which protein was the tagged one (Figure 1 and Table 2). Different 'complexes' are recovered even when the same tagged protein is purified repeatedly. For example, Gavin et al. [1] repeated 139 of their purifications (99 with soluble and 40 with membrane proteins) and, as foreshadowed in their previous pilot study [6], only 69% of the recovered proteins were common to both purifications. The pull-down approach is thus fairly reproducible but does have a significant error margin. In addition, many proteins are part of several different complexes: one bait protein may thus pull down several independent complexes that appear in the experiment to be one large complex.
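As a rough illustration of how the reproducibility of a repeated purification might be scored, the sketch below computes the fraction of all recovered proteins that appear in both runs. This is an assumption about the measure: the text does not say exactly how the 69% figure was calculated, and the function name and protein labels are hypothetical placeholders.

```python
def shared_fraction(run_a, run_b):
    """Fraction of all recovered proteins that appear in both runs.

    Assumption: overlap is measured as |A and B| / |A or B| (Jaccard index);
    the published figure may have been computed differently.
    """
    a, b = set(run_a), set(run_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Hypothetical repeat purifications of the same bait (placeholder names).
first_run = {"P1", "P2", "P3", "P4"}
second_run = {"P1", "P2", "P3", "P5"}

# Three proteins are shared out of five recovered in total.
print(shared_fraction(first_run, second_run))
```

A value of 1.0 would mean that both runs recovered exactly the same proteins; values well below 1.0 indicate the kind of run-to-run variation described above.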
Although the strategy is similar, there are a number of differences between the approaches taken by Gavin et al. [1] and Krogan et al. [2]. First, the protocols were not identical. Second, only Gavin et al. attempted to tag all transmembrane proteins. Third, Gavin et al. provide raw purification data whereas Krogan et al. provide only computationally processed information at the time of writing: for example, the latter removed 44 preys detected in more than 3% of purifications and nearly all ribosomal proteins. These proteins were considered as nonspecific contaminants and thus as false positives. In contrast, such nonspecific contaminants were left in the raw dataset of Gavin et al. and only later removed (or not) when they determined their final list of complexes (see Figure 1 and below). Both groups aimed at the same goal: to unravel all the protein complexes in yeast. Using similar technology they should have got the same results, despite certain differences in method. But did they? As we shall see, not quite.
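The frequency-based contaminant filter mentioned above can be sketched as follows. This is a minimal illustration, not the pipeline used by either group: the function name, the data layout (a mapping from bait to a set of preys) and the toy data are assumptions. Only the 3% default threshold comes from the text.

```python
from collections import Counter


def drop_frequent_preys(purifications, max_fraction=0.03):
    """Remove preys detected in more than max_fraction of all purifications.

    purifications: dict mapping bait name -> iterable of prey names
    (a hypothetical layout chosen for this sketch).
    Returns the filtered purifications and the set of removed preys.
    """
    n = len(purifications)
    # Count each prey once per purification in which it was detected.
    counts = Counter(
        prey for preys in purifications.values() for prey in set(preys)
    )
    frequent = {prey for prey, c in counts.items() if c / n > max_fraction}
    filtered = {
        bait: set(preys) - frequent for bait, preys in purifications.items()
    }
    return filtered, frequent


# Toy data: prey "X" shows up in three of four purifications.
toy = {
    "B1": {"X", "A"},
    "B2": {"X", "B"},
    "B3": {"X", "C"},
    "B4": {"D"},
}

# A high threshold is used here only because the toy dataset is tiny.
filtered, frequent = drop_frequent_preys(toy, max_fraction=0.5)
print(sorted(frequent))
print(sorted(filtered["B1"]))
```

The trade-off is the one discussed in the text: a stricter threshold removes more sticky proteins but may also discard genuine subunits that happen to occur in many complexes.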
To distill defined complexes from their raw purification data, both groups first transformed their raw data into weighted binary interactions. While Krogan et al. [2] used a machine learning algorithm trained by hand-curated protein complexes, Gavin et al. [1] invented a new measure, solely based on raw purification data, which they called the 'socio-affinity index'. In the next step, cluster algorithms were used to determine distinct complexes. Using an iterative clustering procedure, Gavin et al. [1] came up with the classification outlined in Figure 1e. The first class is defined as 'cores'; these are sets of proteins that are present in most purifications of a complex, no matter which protein is tagged; they consisted on average of around three proteins, but ranged from one to 23 proteins. Altogether, Gavin et al. [1] found 491 complexes in yeast and an equivalent number of cores. In fact, they estimated that there may be up to 800 core machines in yeast. The second class comprises proteins often found together but not always with the same cores; such groups were called 'modules'. Gavin and colleagues identified 147 modules, of which 87 were mutually exclusive. Of the 87 modules, 31 appear to be related to differences in subcellular location, and might thus specify subtle differences in function. Most modules consisted of two or three proteins. Finally, a large number of proteins appear to be more or less loosely associated with cores and modules; these so-called 'attachments' may not always be essential for complex formation and may often represent modulators of the function of a protein complex. Interestingly, modules tend to be even more conserved, or share the same function and localization, than are cores. Attachments often do not share a common function or localization although they appear to be well conserved.
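To make the idea of turning purifications into weighted binary interactions more concrete, the sketch below counts how often each protein pair is observed as bait and prey and how often as two preys of the same bait. It is deliberately simpler than either published method: it is not the socio-affinity index of Gavin et al. nor the machine learning score of Krogan et al., and the function name, data layout and toy data are assumptions made for illustration only.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_weights(purifications):
    """Count pairwise co-occurrences in pull-down data.

    purifications: dict mapping bait -> iterable of preys (hypothetical layout).
    Returns two Counters keyed by frozenset({protein_a, protein_b}):
      bait_prey: times the pair was seen as bait and prey
      prey_prey: times the pair was seen as two preys of the same bait
    """
    bait_prey, prey_prey = Counter(), Counter()
    for bait, preys in purifications.items():
        preys = set(preys) - {bait}
        for prey in preys:
            bait_prey[frozenset((bait, prey))] += 1
        for a, b in combinations(sorted(preys), 2):
            prey_prey[frozenset((a, b))] += 1
    return bait_prey, prey_prey


# Toy data: A and B retrieve each other, and both retrieve C.
toy_purifications = {"A": {"B", "C"}, "B": {"A", "C"}}
bait_prey, prey_prey = cooccurrence_weights(toy_purifications)

print(bait_prey[frozenset(("A", "B"))])  # seen twice as bait and prey
print(prey_prey[frozenset(("A", "C"))])  # seen once as co-preys of bait B
```

Raw counts like these would still need to be normalized for how often each protein is detected overall before they could serve as interaction weights for clustering.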
These are not the first studies to get to grips with the yeast complexome. In the previous study from the EMBL and Cellzome authors [6], 1,739 proteins were tagged with TAP tags and the associated proteins analyzed. In another study, Ho et al. [7] tagged 725 proteins with the eight amino-acid FLAG epitope and purified the associated complexes. These datasets represent only subsets of the yeast proteome, however, and are only partially overlapping. For example, only 94 baits were common to both screens. Both groups also used quite different protocols for their analysis. Not surprisingly, the resulting complexes looked very different. On average, the number of proteins common to corresponding purifications was less than 9% of the total number of proteins in both datasets [8]. The degree of reproducibility was thus rather disappointing, even though it could be explained by the different protocols.
With the much more comparable procedures and comprehensive datasets from the two new studies, we can compare their results more rationally. Both groups tagged the vast majority of all yeast proteins, although only a third of these were ultimately purified, namely 1,993 in Gavin et al. [1] and 2,357 in Krogan et al. [2] (Table 1). While this does not sound like a lot, most of these purifications co-purified with at least one interacting protein (namely 1,754 out of 1,993 attempts in [1]; no such number was given in [2]). Altogether, about 2,700 unique proteins were reliably identified this way by each group, corresponding to about 60% of the yeast proteome.
Gavin et al. [1] found 73% of the complexes that have been documented in the Munich Information Center for Protein Sequence (MIPS) database [9] (217 complexes) and the literature (62 complexes not in MIPS). Thus the study was comprehensive, but also missed many complexes. In fact, the authors mention that they have not found 74 complexes that have been reported in the literature. This may be due to technical limitations (for example, when membrane-associated complexes were involved) or to biological reasons (for example, because complexes form only under conditions not tested). On the other hand, 257 of the 491 complexes were entirely novel and only 20 of those known previously had no novel component in this study [1].
How do the two screens compare?
The two datasets cannot in fact be compared easily because of different data formats and computational methods used to infer complexes from raw purification data. Gavin et al. [1] provide a list of baits and co-purifying preys, whereas Krogan et al. [2] do not show their raw purification data (instead, they provide four lists of interactions computationally generated from raw data). Both groups condense their raw purification data into one list of 'complexes' - in each case it is important to remember that these complexes do not necessarily correspond to real physical entities, but rather to perceived complexes (see Figures 1 and 2).
In the following discussion we will consider only these two lists of derived complexes. It is impossible to say which is of 'better quality' until the two raw datasets are systematically compared to thoroughly studied individual complexes (which will then serve as 'gold standards'). Also, both groups have applied various computational strategies to weed out false positives from their final complexes, which in turn affects the size of the complexes: the more stringent the weeding the fewer false positives there are, but the resulting complex may also have lost some biologically relevant proteins. That said, each group identified parameters that appear to represent a reasonable balance between removal of false positives and loss of real positives.
The 491 complexes found by Gavin et al. [1] comprise 1,483 proteins (including modules and attachments) or 23% of the yeast proteome, while the 547 complexes found by Krogan et al. [2] contain 2,702 proteins or 42% of the yeast proteome. When both datasets are combined they add up to 3,033 proteins or 47% of the yeast proteome. Interestingly, the intersection of both datasets contains only 1,152 proteins (18%). Given this overlap, it is a reasonable assumption that there are 800 to 900 complexes in yeast.
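The protein counts quoted above are mutually consistent, as a short inclusion-exclusion check shows. The variable names are ours, and the derived ratios are simple arithmetic on the published counts rather than figures reported by either study.

```python
# Protein counts quoted in the text.
gavin = 1483    # proteins in the complexes of Gavin et al.
krogan = 2702   # proteins in the complexes of Krogan et al.
shared = 1152   # proteins present in both datasets

# Inclusion-exclusion: |A or B| = |A| + |B| - |A and B|
union = gavin + krogan - shared
print(union)  # 3033, matching the combined total given in the text

# Derived ratios (our arithmetic, not reported in the studies).
print(round(shared / union, 2))   # share of the combined set found by both
print(round(shared / gavin, 2))   # share of Gavin et al. proteins also in Krogan et al.
print(round(shared / krogan, 2))  # share of Krogan et al. proteins also in Gavin et al.
```

Seen this way, most proteins in the smaller dataset are also present in the larger one, while fewer than half of the proteins in the larger dataset appear in the smaller one.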
Only six complexes are identical between the two datasets. Remarkably, 132 cores (27.62%) from the study of Gavin et al. [1] are completely contained in 115 complexes (21.02%) from the study by Krogan et al. [2], with an average overlap of 2.64 proteins. We found 188 complexes in one dataset that do not share a single subunit with any complex in the other; by contrast, only 20 complexes in the second dataset share no subunit with any complex in the first. A comparison of the two datasets is shown in Figure 3. Although our initial comparisons provide reasonable evidence that the two datasets are quite different, both groups need to run their own algorithm on the dataset of their competitor and see if they retrieve the same lists of complexes as with their own raw data. This would allow a comparison not only of the derived complexes but also of the underlying algorithms.
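The three kinds of comparison used above (identical complexes, complexes fully contained in a complex of the other dataset, and complexes sharing no subunit with the other dataset) reduce to simple set operations. The sketch below shows one possible way to classify complexes. The function name, the dict-of-sets layout and the toy complexes are hypothetical, and treating 'identical' and 'contained' as separate categories is our choice.

```python
def compare_complex_sets(set_a, set_b):
    """Classify each complex in set_a relative to the complexes in set_b.

    set_a, set_b: dict mapping complex name -> iterable of subunit names
    (a hypothetical layout). A complex is reported as
      identical: same subunits as some complex in set_b
      contained: proper subset of some complex in set_b
      disjoint:  shares no subunit with any complex in set_b
    Complexes with only partial overlap fall into none of these lists.
    """
    others = [frozenset(members) for members in set_b.values()]
    result = {"identical": [], "contained": [], "disjoint": []}
    for name, members in set_a.items():
        members = frozenset(members)
        if any(members == other for other in others):
            result["identical"].append(name)
        elif any(members < other for other in others):
            result["contained"].append(name)
        elif all(members.isdisjoint(other) for other in others):
            result["disjoint"].append(name)
    return result


# Toy complexes with placeholder subunit names.
dataset_a = {
    "a1": {"P1", "P2"},
    "a2": {"P3", "P4"},
    "a3": {"P7", "P8"},
    "a4": {"P5", "P9"},
}
dataset_b = {
    "b1": {"P1", "P2"},
    "b2": {"P3", "P4", "P5"},
}

print(compare_complex_sets(dataset_a, dataset_b))
```

Running the same function with the arguments swapped gives the comparison in the opposite direction, which is why the counts of non-overlapping complexes can differ so much between the two datasets.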
Comparison of protein purification and yeast two-hybrid data
It is difficult enough to compare the two datasets of complexes in [1] and [2], but it is even more difficult to compare them with protein-interaction datasets obtained with other methods. After complex purification, the most common procedure for identifying protein interactions is the yeast two-hybrid system [10], which discovers binary interactions but not complexes. Ideally, a two-hybrid screen using all the proteins of a complex would yield all the binary interactions within that complex, but this is rarely the case (Figure 2). In most cases, only a few interactions are discovered. On the other hand, the two-hybrid system often picks up weak interactions that are lost during complex purification because of the necessary washing steps. Thus, the data generated by protein-complex purification and two-hybrid analysis overlap even less than datasets obtained using the same method.
Comparison is also limited by the fact that no two-hybrid screen has been done in yeast that is as comprehensive as the protein-complex purification studies in [1] and [2]. Although the two-hybrid screens by Ito et al. [11] and by one of us (P.U.) and colleagues [12] claim to be comprehensive, they were by no means saturated. In fact, we estimate that only about 20% of the yeast genome has been used as baits and exhaustively screened by two-hybrid methods. In addition, two-hybrid screens suffer from a problem similar to that of protein-complex purifications: only about half of all screens yield reproducible interactions (C. Ester, R. Häuser, T. Kuhn, C. Müller, S.V. Rajagopala, B. Titz, P.U. and K. Wohlbold, unpublished observations). For example, we found only 19 complexes in one of the two purification datasets and 40 in the other in which all proteins had previously been screened productively in two-hybrid screens. Most of these complexes are small, containing only two to five proteins. An example is shown in Figure 2.
Two-hybrid screens clearly do yield quite different interactions from protein-complex purifications. Given the very different nature of the methods this is hardly surprising. In fact, Aloy and Russell [13] have shown that protein purifications tend to pick up stable interactions whereas two-hybrid screens have a certain preference for transient interactions. It will be interesting to see how strong these trends are when truly quantitative and structural data become available. We have not compared the studies in [1] and [2] with other large-scale datasets such as genetic synthetic lethal screens, but such analyses will certainly be published shortly. For further comparisons with two-hybrid datasets or protein array data we will need more complete data. Comprehensive datasets using protein [14] or peptide arrays [15] are not available for yeast, but it is clear that they will also yield different results [16].
What remains to be done?
Gavin et al. [1] and Krogan et al. [2] have provided us with a glimpse of what the yeast complexome looks like in a mixture of happily growing cells. This is only half the truth. In nature, yeast is mostly starving and exposed to a variety of environmental conditions from heat to cold and wet to dry. We know that many physiological processes adapt with dramatic changes to such different growth conditions, and protein interactions reflect that. It would be exciting to see how the interactome reacts to such environmental factors, but such studies require much extra effort. Not only the interactome is subject to environmental influences: gene expression, signal transduction and metabolism are all affected as well. Given that at least several thousand proteins appear to be phosphorylated and dephosphorylated in yeast [17], we begin to sense how complex even simple cells must be.
Comparative studies tell us that each analytic method only provides part of the truth. Although there are comprehensive datasets for purified complexes, there are only partial data for two-hybrid interactions and we have not even started to seriously apply protein arrays or structural genomics to the whole proteome or interactome of yeast. Let us not even think about more complex organisms.
Even assuming all those datasets had been collected under all conditions for all proteins and other compounds in a cell, and that we even knew how those molecules behave in space and time, would we understand the cell? Not unless we can represent this plethora of information in computer-readable databases and information systems that can be understood by humans. Only if we manage to solve these informational problems as well as the technological ones will we be doing systems biology.
Gavin AC, Aloy P, Grandi P, Krause R, Boesche M, Marzioch M, Rau C, Jensen LJ, Bastuck S, Dümpelfeld B, et al: Proteome survey reveals modularity of the yeast cell machinery. Nature. 2006, 440: 631-636. 10.1038/nature04532.
Krogan NJ, Cagney G, Yu H, Zhong G, Guo X, Ignatchenko A, Li J, Pu S, Datta N, Tikuisis AP, et al: Global landscape of protein complexes in the yeast Saccharomyces cerevisiae. Nature. 2006, 440: 637-643. 10.1038/nature04670.
Protein complexes in yeast. [http://yeast-complexes.embl.de]
The TAP project. [http://tap.med.utoronto.ca]
Aebersold R, Mann M: Mass spectrometry-based proteomics. Nature. 2003, 422: 198-207. 10.1038/nature01511.
Gavin AC, Bosche M, Krause R, Grandi P, Marzioch M, Bauer A, Schultz J, Rick JM, Michon AM, Cruciat CM, et al: Functional organization of the yeast proteome by systematic analysis of protein complexes. Nature. 2002, 415: 141-147. 10.1038/415141a.
Ho Y, Gruhler A, Heilbut A, Bader GD, Moore L, Adams SL, Millar A, Taylor P, Bennett K, Boutilier K, et al: Systematic identification of protein complexes in Saccharomyces cerevisiae by mass spectrometry. Nature. 2002, 415: 180-183. 10.1038/415180a.
Cornell M, Paton NW, Oliver SG: A critical and integrated view of the yeast interactome. Comp Funct Genom. 2004, 5: 382-402. 10.1002/cfg.412.
Munich Information Center on Protein Sequences (MIPS). [http://mips.gsf.de]
Fields S, Song O: A novel genetic system to detect protein-protein interactions. Nature. 1989, 340: 245-246. 10.1038/340245a0.
Ito T, Chiba T, Ozawa R, Yoshida M, Hattori M, Sakaki Y: A comprehensive two-hybrid analysis to explore the yeast protein interactome. Proc Natl Acad Sci USA. 2001, 98: 4569-4574. 10.1073/pnas.061034498.
Uetz P, Giot L, Cagney G, Mansfield TA, Judson RS, Knight JR, Lockshon D, Narayan V, Srinivasan M, Pochart P, et al: A comprehensive analysis of protein-protein interactions in Saccharomyces cerevisiae. Nature. 2000, 403: 623-627. 10.1038/35001009.
Aloy P, Russell RB: The third dimension for protein interactions and complexes. Trends Biochem Sci. 2002, 27: 633-638. 10.1016/S0968-0004(02)02204-1.
Zhu H, Bilgin M, Bangham R, Hall D, Casamayor A, Bertone P, Lan N, Jansen R, Bidlingmaier S, Houfek T, et al: Global analysis of protein activities using proteome chips. Science. 2001, 293: 2101-2105. 10.1126/science.1062191.
Jones RB, Gordus A, Krall JA, MacBeath G: A quantitative protein interaction network for the ErbB receptors using protein microarrays. Nature. 2005, 439: 168-174. 10.1038/nature04177.
Uetz P, Stagljar I: The interactome of human EGF/ErbB receptors. Mol Systems Biol. 2006, 2: E1-E2. 10.1038/msb4100048.
Ptacek J, Devgan G, Michaud G, Zhu H, Zhu X, Fasolo J, Guo H, Jona G, Breitkreutz A, Sopko R, et al: Global analysis of protein phosphorylation in yeast. Nature. 2005, 438: 679-684. 10.1038/nature04187.
Chen XP, Yin H, Huffaker TC: The yeast spindle pole body component Spc72p interacts with Stu2p and is required for proper microtubule assembly. J Cell Biol. 1998, 141: 1169-1179. 10.1083/jcb.141.5.1169.
Knop M, Pereira G, Geissler S, Grein K, Schiebel E: The spindle pole body component Spc97p interacts with the gamma-tubulin of Saccharomyces cerevisiae and functions in microtubule organization and spindle pole body duplication. EMBO J. 1997, 16: 1550-1564. 10.1093/emboj/16.7.1550.
Knop M, Schiebel E: Spc98p and Spc97p of the yeast gamma-tubulin complex mediate binding to the spindle pole body via their interaction with Spc110p. EMBO J. 1997, 16: 6985-6995. 10.1093/emboj/16.23.6985.
Protein complexes in yeast - a comparative look at different datasets. [http://uetz.fzk.de/yeast-complexes]
We thank Patrick Aloy, Rob Russell, and Anne-Claude Gavin for comments on an earlier version of the manuscript.
Goll, J., Uetz, P. The elusive yeast interactome. Genome Biol 7, 223 (2006). https://doi.org/10.1186/gb-2006-7-6-223