Expansion of the human mitochondrial proteome by intra- and inter-compartmental protein duplication
Genome Biology volume 10, Article number: R135 (2009)
Mitochondria are highly complex, membrane-enclosed organelles that are essential to the eukaryotic cell. The experimental elucidation of organellar proteomes combined with the sequencing of complete genomes allows us to trace the evolution of the mitochondrial proteome.
We present a systematic analysis of the evolution of mitochondria via gene duplication in the human lineage. The most common duplications are intra-mitochondrial, in which the ancestral gene and the daughter genes encode mitochondrial proteins. These duplications significantly expanded carbohydrate metabolism, the protein import machinery and the calcium regulation of mitochondrial activity. The second most prevalent duplication, inter-compartmental, extended the catalytic as well as the RNA processing repertoire by the novel mitochondrial localization of the protein encoded by one of the daughter genes. Evaluation of the phylogenetic distribution of N-terminal targeting signals suggests a prompt gain of the novel localization after inter-compartmental duplication. Relocalized duplicates are more often expressed in a tissue-specific manner relative to intra-mitochondrial duplicates and mitochondrial proteins in general. In a number of cases, inter-compartmental duplications can be observed in parallel in yeast and human lineages leading to the convergent evolution of subcellular compartments.
One-to-one human-yeast orthologs are typically restricted to their ancestral subcellular localization. Gene duplication relaxes this constraint on the cellular location, allowing nascent proteins to be relocalized to other compartments. We estimate that the mitochondrial proteome expanded at least 50% since the common ancestor of human and yeast.
Mitochondria, next to their widely recognized function in respiration and ATP production, also play a role in key cellular processes such as lipid metabolism, synthesis of steroid hormones, regulation of apoptosis  and calcium signaling . Instrumental to mitochondrial function is the proteome of the organelle, consisting of an estimated 1,500 proteins in human . Recently, owing to advanced proteomics techniques, major progress has been made in elucidating the content of the mammalian mitochondrial proteome. The integration of many types of experimental data and computational predictions resulted in a list of mitochondrial proteins approaching saturation, with a reasonably small false discovery rate of 10% . At the same time analyses of the list of proteins revealed that only a minor fraction of the present day mitochondrial proteome, less than 20%, shows convincing evidence of having originated from the alpha-proteobacterial ancestor [5–7]. This brings the origin of the large majority of mitochondrial proteins into question and suggests that other cellular compartments may have been a source for new mitochondrial proteins. We can examine this hypothesis by comparing organellar proteomes between species.
Detailed, large-scale studies of the inter-species evolution of subcellular localization have begun only recently and have shown conservation between Schizosaccharomyces pombe and Saccharomyces cerevisiae . There are a number of specific discoveries that indicate that present-day localizations for mitochondrial enzymes and complete pathways do not necessarily reflect their evolutionary origin and there is evidence for the relocalization of multiple metabolic pathways between subcellular compartments. For example, a citrate synthase has been relocalized from mitochondria to the peroxisome in S. cerevisiae , and most of the proteins that were derived from the ancestor of the mitochondria are not mitochondrial in present day species . It has been observed that Frataxin and Isu1P, which are involved in the iron-sulfur cluster assembly in mitochondria, are localized mainly in the cytosol of the microsporidian species Trachipleistophora hominis . After the whole genome duplication event in the ancestor of S. cerevisiae a great majority of duplicated genes were purged from the genome . Of those retained, at least 25% functionally diversified via a localization change, altering their amino acid composition, interaction partners and level of expression . But what are the quantitative trends in the evolution of mitochondria in the lineage leading to human?
The composition of the human and mammalian mitochondrial proteome has received great attention in the past years [13–17]. Most recently, probabilistic integrative strategies, which are less plagued with false discoveries specific to any single approach, have allowed the estimation of the mammalian mitochondrial proteome at a level nearing saturation . Next to the human mitochondrion, a wealth of data is available specifically on the localization of mitochondrial proteins in various species: S. cerevisiae [18, 19], Arabidopsis thaliana  and Tetrahymena thermophila . More than 500 proteins have been found in the mitochondria of the ciliate T. thermophila and the estimate for yeast reaches approximately 1,000 proteins . The mammalian mitochondrion is larger still and leads to the question: which biological processes and molecular functions of proteins were introduced to the organelle? Furthermore, how and when were these integrated? We examine the evolutionary history of gene families that contain mitochondrial proteins to answer these questions.
The phylogenomic evidence indicates that the mitochondrial proteome expanded not only by duplications of mitochondrial proteins, but also by relocalizations of paralogs to the organelle, when a copy of a non-mitochondrial protein became targeted to the mitochondrion. We also found that the dates of the appearance of mitochondrial targeting signals indicate that the relocalization of proteins followed promptly after gene duplication.
Human nuclear-encoded mitochondrial proteins were collected from MitoCarta, the state-of-the-art compendium for the mammalian mitochondrial proteome, created using a combination of experimental identification, bioinformatics analysis, and literature curation. The mitochondrial proteome of S. cerevisiae, containing published experimental data [18, 22–24], was obtained from the MitoP2 database together with the most comprehensive yeast mitochondrial proteome dataset to date. For the dataset of non-mitochondrial proteins required for our analysis, we used proteins known to localize to 1 of 24 other subcellular compartments (see Materials and methods for details).
Conservation of mitochondrial localization among one-to-one orthologs
We first ask to what extent mitochondrial localization is conserved between man and yeast for unambiguous one-to-one orthologs that have not been duplicated since the common ancestor of the two species. Mitochondrial localization appears to be very well conserved, with a few notable exceptions. From 143 one-to-one orthologous pairs localized to mitochondria in either of the two species, we find that 124 proteins (87%) are found in this organelle in both species and only 19 proteins localize to mitochondria in one species, but not the other (13%; Table S1 in Additional data file 1). Of the 19 differentially localized proteins, 17 are localized to mitochondria in human and not in yeast, with experimental evidence supporting the localization for all but one protein (Table S1 in Additional data file 1). The two remaining yeast proteins (SEN2 and DNM1), unlike the 17 human mitochondrial proteins, do not enter the yeast mitochondrion, but instead attach to the outer membrane [26, 27]. We can infer the ancestral localization of the human mitochondrial proteins by using the A. thaliana mitochondrial proteome. Of all 143 unambiguous human-yeast orthologs, 27 proteins were found in plant mitochondria in a liquid chromatography-tandem mass spectrometry study , a number that includes only 1 of the 19 differentially localized proteins. With this lack of corroborated mitochondrial localization in the outgroup species, we propose that a gain of mitochondrial localization in the human lineage, rather than a loss in the yeast lineage, has been the main cause of this disparate localization.
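The conservation figures above follow from a simple tally over ortholog pairs. The sketch below is a minimal illustration in Python, assuming each pair is recorded as two booleans (mitochondrial in human, mitochondrial in yeast). The function name and input layout are illustrative assumptions, not the pipeline used in the study, and the pair list is rebuilt only from the counts quoted in the text (124 shared, 17 human-only, 2 yeast-only).

```python
from collections import Counter

def tally_localization(pairs):
    """Tally one-to-one ortholog pairs by the species in which they are mitochondrial.

    pairs: iterable of (mito_in_human, mito_in_yeast) booleans.
    Pairs that are mitochondrial in neither species are ignored.
    """
    counts = Counter()
    for in_human, in_yeast in pairs:
        if in_human and in_yeast:
            counts["both"] += 1
        elif in_human:
            counts["human_only"] += 1
        elif in_yeast:
            counts["yeast_only"] += 1
    return counts

# Pair list rebuilt from the counts reported in the text, not from raw data.
pairs = [(True, True)] * 124 + [(True, False)] * 17 + [(False, True)] * 2
counts = tally_localization(pairs)
total = sum(counts.values())
conserved_pct = round(100 * counts["both"] / total)
print(total, conserved_pct, counts["human_only"] + counts["yeast_only"])
```

With these counts the tally reproduces the 143 pairs, the 87% conserved fraction and the 19 differentially localized proteins stated above.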
A search for a discernible functional coherence among the retargeted proteins revealed the relocalization of a multi-protein functional module in human. Three enzymes participating in ornithine metabolism can be found in mitochondria in human and ureotelic mammals, but not in yeast: OTC, CPSase I and P5CS. Of these, OTC and CPSase I are part of the urea cycle whose evolutionary relocalization has been reported extensively [28, 29].
At least 8 of the 17 proteins relocalized in human were concomitantly found in other subcellular compartments of the mammalian cell as indicated in the published literature based on small-scale experiments (Table S2 in Additional data file 1). It should therefore be noted that complete relocalizations to the mitochondria that also involve the loss of the ancestral localization are even rarer than gains of mitochondrial localization without the loss of the ancestral one. Apparently, a protein tends to gain a novel localization without losing the ancestral subcellular localization, for example by adding a mitochondrial targeting signal to one of its isoforms, as in the case of dUTP pyrophosphatase (DUT) and peroxiredoxin-5 [30, 31]. Although interesting in themselves, these observations emphasize that relocalizations of products of single copy genes between subcellular compartments are rare and limited to a relatively small set of cellular functions.
Increase of the human mitochondrial proteome via intra-mitochondrial protein duplication
Investigations of the subcellular localization of one-to-one orthologs do not explain the expansion of the mitochondrial proteome. We therefore examined the evolutionary history of duplicated genes containing mitochondrial paralogs. We analyzed eukaryotic gene trees reconciled with the species phylogeny to identify gene duplications that followed the divergence of human and yeast (see Materials and methods for details). We observed two prevailing ways in which gene duplications contributed to the expansion of the metazoan mitochondrial proteome (Table 1). In the first mode, 65 duplications of nuclear genes encoding mitochondrial proteins gave rise to a set of 118 mitochondrial proteins, with up to four proteins per family as in the case of pyruvate dehydrogenases or ADP/ATP translocases (see Table S3 in Additional data file 1 for the list of proteins). With all human paralogs and the yeast ortholog localized to mitochondria, the ancestral protein was most likely targeted to this organelle as well, which is confirmed by the presence of approximately 50% orthologous proteins in plant mitochondria in the study . Figure 1 shows the specific cellular functions performed by intra-mitochondrial protein duplications. A Gene Ontology (GO) analysis reveals enrichment of proteins involved in carbohydrate metabolism ([GO:5975], P < 2e-4) and various components of transport ([GO:6810], P < 6e-4, amino acid transport, ion transport and protein transport complexes embedded in the inner and outer membranes). Additionally, 11 out of 23 calcium ion binding proteins [GO:5509] originate from intra-mitochondrial duplications (P < 7e-4; see Table S5 in Additional data file 1 for the list of all categories). These functional gene classes are significantly overrepresented relative to the composition of the whole mitochondrial proteome, and therefore reflect a specific characteristic of intra-mitochondrial duplications.
Increase of the human mitochondrial proteome via inter-compartmental protein duplication
The second most common type of duplication associated with increasing the mitochondrial proteome is characterized by human mitochondrial proteins with a human non-mitochondrial paralog (Table 1; Table S6 in Additional data file 1). For those gene families that have a non-mitochondrial ortholog in yeast, the most parsimonious scenario suggests a non-mitochondrial localization in the common ancestor of human and yeast, and a subsequent gain of mitochondrial localization. We therefore hypothesized that these proteins represent gains of mitochondrial localization in the human lineage. To validate this hypothesis, we inspected the localization of plant orthologs of inter-compartmental duplications, identifying only two mitochondrial proteins among 29 orthologs in A. thaliana. This suggests that the majority of mitochondrial proteins with a non-mitochondrial paralog were ancestrally non-mitochondrial and represent gains of mitochondrial localization in the lineage leading to human. A detailed GO analysis of the entire set of inter-compartmental duplications reveals enrichment of molecular functions such as cofactor binding (P < 2e-3, [GO:48037]), intramolecular oxidoreductase activity (P < 5e-3, [GO:16863]), ceramide kinase activity (P < 4e-4, [GO:1729]) and catalytic activity in general (P < 2e-3, [GO:3824]), but also of the biological process of 12S rRNA methylation (P < 4e-3, [GO:154]; Table S7 in Additional data file 1), which is necessary for the stability of the small ribosomal subunit.
The assumption that we can use the non-mitochondrial localization in yeast as a proxy for the ancestral localization enables us to recognize protein retargeting events between mitochondria and other subcellular compartments, including the nucleus (8 out of 29 proteins; Table S8 in Additional data file 1), peroxisome (6 out of 29) and endoplasmic reticulum (5 out of 29 proteins). Four of the six peroxisomal relocalization events encode proteins responsible for fatty acid beta-oxidation in yeast (PCD1, ECI1, DCI1, POX1) and their duplicated orthologs are found in human mitochondria.
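The parsimony reasoning used to separate the two duplication modes can be sketched as a small decision rule. This is a deliberately simplified illustration in Python, assuming each family is summarized by the localization of its human paralogs and of a single yeast ortholog. The function name, the labels and the gene names in the usage lines are hypothetical, and the study itself worked on reconciled gene trees rather than on flat summaries like this.

```python
def classify_family(human_paralogs, yeast_ortholog_mito):
    """Classify a duplicated gene family by a simple parsimony rule.

    human_paralogs: dict mapping gene name -> True if mitochondrial.
    yeast_ortholog_mito: True, False, or None if the yeast localization is unknown.
    """
    mito = [g for g, is_mito in human_paralogs.items() if is_mito]
    non_mito = [g for g, is_mito in human_paralogs.items() if not is_mito]

    # The rule only applies to duplicated families with a mitochondrial member.
    if len(human_paralogs) < 2 or not mito:
        return "not applicable"

    if not non_mito:
        # All human paralogs and the yeast ortholog are mitochondrial:
        # the ancestor was most likely mitochondrial as well.
        return "intra-mitochondrial" if yeast_ortholog_mito else "unresolved"

    # Mixed localization in human plus a non-mitochondrial yeast ortholog:
    # the most parsimonious scenario is a gain of mitochondrial localization.
    if yeast_ortholog_mito is False:
        return "inter-compartmental"
    return "unresolved"

# Hypothetical families, for illustration only.
print(classify_family({"famA_1": True, "famA_2": True}, True))
print(classify_family({"famB_1": True, "famB_2": False}, False))
```

Families that fit neither pattern are reported as unresolved rather than forced into a category, in keeping with the caution that absence of evidence for mitochondrial localization is not evidence of absence.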
Relocalized proteins often originate from ancient, pre-metazoan duplications
Using phylogenetic trees of genes that encode the modern human mitochondrion, we inferred the timing of duplications (see Materials and methods). Around 80% of duplications are equally divided between two evolutionary stages: before the divergence of bilateria and before the divergence of vertebrates (Figure 2). Intra-mitochondrial gene duplications were found to be representative of the general duplication trends across the whole genome (no statistical difference with the genome-wide duplication trend, P > 0.6 Fisher exact test). By contrast, the duplications associated with relocalizations to the mitochondria happened predominantly in the earlier stage of evolution, before the divergence of bilateria. At this evolutionary time point they significantly exceed the genome-wide fraction of duplications (P < 0.003). Following the massive duplication events before the radiation of vertebrates (the 2R hypothesis [33, 34]; although alternative hypotheses exist ), mitochondrial protein content continued to evolve as exemplified by the recent duplication of glutamate dehydrogenase . And even though the reference mitochondrial proteome used in this study is derived from mouse tissues, and therefore the accurate protein localization data for primate-specific duplications is limited, we encountered 16 gene duplications of mitochondrial proteins in primates (Table S11 in Additional data file 1).
Relocalizations promptly follow duplications
An unmentioned assumption in the analysis of inter-compartmental protein duplications is that the protein relocalization followed shortly after the gene duplication. Even though the pre-sequence mitochondrial import pathway is only one of four presently recognized means of protein import (reviewed in ), many mitochondrial proteins contain a short, amino-terminal localization sequence that is indicative of this pathway. This sequence feature is amenable to computational methods . For proteins imported to the mitochondria via the pre-sequence pathway, the gain of a novel localization may be caused by the acquisition of an amino-terminal targeting signal. Indeed, when examining all proteins with a novel mitochondrial localization, a potential mitochondrial targeting signal can be identified in 50% of the proteins, five times more often than in their non-mitochondrial human paralogs (P < 0.00005, Fisher exact test). Assuming that in these proteins the targeting signal is responsible for the mitochondrial localization, we examined whether its appearance in evolution coincides with the gene duplication, and thus whether the duplication was concomitant with a gain of mitochondrial localization.
Among human mitochondrial proteins with a non-mitochondrial paralog we find 12 proteins with a recognizable short, amino-terminal targeting sequence. Despite the limitations of computational targeting sequence prediction (for example, ) in 9 out of the 12 gene families the phylogenetic analysis indicates that the mitochondrial targeting signal was gained in the same era as, or shortly after, the gene duplication (Table 2).
Tissue-specific expression of novel mitochondrial proteins
Using mass spectrometry total peak intensity data available for 14 different mouse tissues , we performed quantitative analysis of tissue-specific protein expression by counting the number of tissues in which the protein was detected (specifically, the number of tissues with log10 peak intensities of at least 7). A typical mitochondrial protein is abundantly expressed and detectable in 12 (median value) out of 14 tissues (Table S12 in Additional data file 1). Only proteins that underwent inter-compartmental duplications are expressed in significantly fewer tissues (median 5; P < 0.01 using a two-sided Wilcoxon rank sum test performed pairwise with other datasets). These novel mitochondrial proteins (proteins that possess a non-mitochondrial paralog and a non-mitochondrial yeast ortholog) more often exhibit a tissue specific expression pattern with 45% expressed in three tissues or fewer (compared to the mitochondrial average of 23%), and are more rarely widely expressed (in more than 10 tissues; 28% novel mitochondrial proteins compared to 55% on average) (Figure 3).
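The tissue-breadth measure described above can be sketched in a few lines of Python. The detection cutoff (log10 peak intensity of at least 7) and the breadth classes (three tissues or fewer, more than 10 tissues) come from the text. The function names, the input layout and the toy numbers are assumptions for illustration and are not data from the study.

```python
from statistics import median

LOG10_THRESHOLD = 7.0  # detection cutoff stated in the text

def tissues_detected(log10_intensities, threshold=LOG10_THRESHOLD):
    """Count tissues in which a protein's log10 peak intensity reaches the cutoff.

    Missing measurements are passed as None and count as not detected.
    """
    return sum(1 for v in log10_intensities if v is not None and v >= threshold)

def breadth_summary(tissue_counts):
    """Summarize expression breadth for a set of proteins.

    tissue_counts: number of tissues in which each protein was detected.
    Returns the median and the fractions of narrowly (<= 3 tissues)
    and widely (> 10 tissues) expressed proteins.
    """
    counts = list(tissue_counts)
    n = len(counts)
    return {
        "median": median(counts),
        "frac_narrow": sum(1 for c in counts if c <= 3) / n,
        "frac_wide": sum(1 for c in counts if c > 10) / n,
    }

# Hypothetical toy profile (not data from the study).
toy_profile = [8.1, 7.5, 6.2, None, 7.0]
print(tissues_detected(toy_profile))
print(breadth_summary([1, 3, 12]))
```

Comparing such per-protein counts between duplication classes would then be a job for a rank-based test such as the Wilcoxon rank sum test used in the text.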
Subcellular differentiation via independent gene duplications
While tracing the history of duplications that extend the mitochondrial proteome, one can imagine, in the most drastic scenario, that independent duplications in unrelated lineages with subsequent parallel relocalizations to mitochondria could lead to a convergent evolution in the mitochondrial protein content. Several paralogs present this unusual evolutionary pattern (Table 3). For example, branched-chain-amino-acid aminotransferase underwent duplication at the root of vertebrates, in addition to an independent event in yeast as a result of whole genome duplication. In both species one copy is targeted to the mitochondria (BCAT2 in human), the other is cytosolic (BCAT1). In the case of this gene family, the analysis of distant orthologs for the presence/absence of the targeting signal sheds light on the likely ancestral localization. Using MitoProt II  and TargetP  the signal can be detected in the fly sequence as well as Leishmania major orthologs, suggesting that the ancestral BCAT protein was part of the mitochondrial proteome in the ancestor of human and yeast (Figure 4).
The growth of the mitochondrial proteome by gene duplication
Knowing the homology of proteins with a determined localization in human and yeast, we reconstructed the (partial) protein complement of mitochondria of the common ancestor of human and yeast, comprising circa 200 proteins in total. Starting with this ancestral proteome, we counted 128 duplications of mitochondrial proteins in the human lineage, including intra-mitochondrial duplications and proteins novel to the mitochondria (relocalizations following the duplication of non-mitochondrial proteins). As not all types of evolutionary events allow us to easily infer the ancestral localization, this puts a lower bound on the protein count, concluding that the metazoan mitochondrion in the human lineage expanded by 64% (128 out of 200) by means of gene duplication and relocalization since the evolutionary split with the yeast lineage (see Materials and methods for details). These counts are likely to be an underestimate of a real mitochondrial proteome expansion, as we disregard proteins without recognizable orthologs in S. cerevisiae that appeared in the metazoan lineage.
Discussion and conclusions
Our investigation reveals a dynamic mitochondrial proteome and paints a picture of a eukaryotic organelle with a functional repertoire evolving by gene duplication. In the absence of gene duplication, we find little room for functional diversification of the mitochondrial proteome by relocalization of proteins. The subcellular localization of proteins that did not duplicate since the divergence of human and yeast is almost always conserved in evolution, with a few notable exceptions. In the presence of duplication events the mitochondrion expanded via two major modes. In the first, more conservative mode, intra-mitochondrial duplications expanded the mitochondrial proteome by duplication of proteins that were already localized to mitochondria. In the second and a more radical mode of proteome growth, inter-compartmental duplications expanded the metazoan and human mitochondrial proteome by the duplication of non-mitochondrial proteins and redirecting the newly arisen gene products to the mitochondria.
The two modes of proteome expansion comprise different functional protein classes. Duplications of genes responsible for carbohydrate metabolism, calcium ion binding and various forms of transport appear to be specific to intra-mitochondrial protein duplications, whereas cofactor binding, intramolecular oxidoreductases, ceramide kinase and rRNA methylation functions are more often associated with duplicates that have novel mitochondrial localization.
Intra-mitochondrial duplications that expanded the repertoire of transport proteins are exemplified by two duplications of TIMM8A/B and TIMM17A/B proteins. Expression of both paralogs leads to distinct variants of the intermembrane complexes TIMM8-TIMM13 and Tim23 embedded in the inner membrane [40–42] (Additional data file 1). The Pyruvate dehydrogenase (PDH) complex, which participates in carbohydrate metabolism (a functional class significantly enriched among intra-mitochondrial duplications), underwent intra-mitochondrial duplications at various points in evolution (E1-beta subunit duplicated before the divergence of bilateria; E2 subunit duplicated before the divergence of chordates; E1-alpha subunit duplicated before the divergence of eutheria). The duplication pattern of post-translational regulators of the PDH complex differs from that of the complex itself. The inactivating phosphorylation of the PDH complex is carried out by four paralogs of PDH kinase, and all duplication events occurred before the divergence of the vertebrates. Prior to the catalytic activation, PDH must be dephosphorylated by one of the two paralogous proteins: PDP1 (PPM2C) and PDP2. PDP1, in contrast to its paralog, is activated by calcium ions and, therefore, might mediate the effects of calcium-mobilizing hormones . It is difficult to establish the evolutionary origin of a domain responsible for the binding with Ca2+, as the binding site is created upon the formation of a complex with the E2 subunit of the PDH complex and requires the lipoyl groups of E2 . Nevertheless, the calcium-dependence of PDP1 is consistent with a trend present in mitochondrial proteins. We identify duplications of Ca2+-binding mitochondrial solute carriers , as well as proteins responsible for calcium-sensitive mitochondrial trafficking along microtubules [46, 47]. 
Overall, 11 out of 23 of the calcium ion binding proteins originate from intra-mitochondrial duplications that occurred at the root of vertebrates (P < 7e-4, [GO:5509]).
In general, it appears that the regulators of cellular complexes are evolutionarily more recent than the complexes they control. That the duplications of the PDH complex occurred before the vertebrate duplications of its regulators, the kinases and phosphatases, is not a unique case. Also, the soluble mitochondrial matrix deacetylase SIRT3 has a relatively recent origin, and was shown to augment Complex I activity by binding with the 39 kDa subunit of Complex I, NDUFA9. It is known that the growth of many mitochondrial protein complexes occurred early in evolution, with mitochondrial Complex I and the mitochondrial ribosome expanding significantly at the root of eukaryotes [49–51]. Interestingly, regulators of activity of the complexes via phosphorylation and dephosphorylation (as for PDH) or deacetylation (Complex I) did not appear concomitantly in evolution and were not adapted from existing regulators, but emerged long after the metazoan diversification.
When analyzing duplications of proteins that expanded the mitochondrial proteome, it would be interesting to know the selective forces driving duplication events. We show that the novel mitochondrial localization that is detectable at the sequence level has been gained rapidly after the duplication event. On the one hand, we know that only a small fraction of duplicated genes is retained in the genome in the long term, and this holds also for large-scale genomic events such as whole genome duplication . On the other hand, the acquisition of an amino-terminal targeting signal coinciding with the gene duplication event could provide the rationale for the retention of the duplicated gene. As the change of localization alters the role of a protein in the cell, it could be accompanied by further functional diversification. This diversification may be extensive, even for relatively recent duplications, as in the case of HTRA2 protease (Table 2). The membrane-bound HTRA2, unlike its secreted paralogs, promotes or induces cell apoptosis through caspase-dependent and -independent pathways  and its loss of function mutations cause neurodegeneration and Parkinson's disease .
Analysis of the timing of duplication events reveals that the majority of inter-compartmental duplications occurred further back in time than the genomic trend would suggest and that they contributed little to the expansion of the mitochondrial proteome in the vertebrate lineage. The fact that most inter-compartmental duplications occurred before animals diverged suggests that cellular differentiation is partly responsible for inter-compartmental duplications. We propose that the inter-compartmental duplicated proteins could have helped to satisfy the variable energy demands that emerging metazoan tissues presented. There is some anecdotal evidence that could support this hypothesis. For example, the pattern of tissue-specific expression of TOP1MT (Table 2) has adapted to meet the requirements for higher mitochondrial activity in specific organs, such as skeletal muscle, heart, and brain. Additionally, we observed that inter-compartmental duplications/relocalizations are characterized by a narrower, more tissue-specific expression than average mitochondrial proteins (see Table S12 and Figure S2 in Additional data file 1).
Our quantitative results of the evolution of the mitochondrial proteome match anecdotal evidence for the role of inter-compartmental duplications in the expansion of the proteomes of other eukaryotic organelles. Some pathways and key enzymes were known to have duplicated between plastids and other cellular compartments , as observed in the case of sulfate assimilation and cysteine biosynthesis found in the chloroplasts, cytosol and mitochondria of plants . In addition, the evolutionary history of 12 Calvin cycle enzymes shows that plant proteins encoded by the nucleus have relocalized to alternative compartments, regardless of their origin, cyanobacterial or otherwise .
With 87% of mitochondrial proteins preserving their ancestral compartment between human and yeast, a gene duplication event appears to be a necessary prerequisite to release the localization constraint, allowing nascent proteins to be retargeted to distinct compartments. We therefore conclude that non-mitochondrial protein duplications followed by the gain of a novel mitochondrial localization comprise a qualitatively and quantitatively important mode of expansion of the mitochondrial proteome.
Materials and methods
Mammalian nuclear-encoded mitochondrial proteins were downloaded from MitoCarta, the state-of-the-art compendium of the mammalian mitochondrial proteome established using a combination of experimental identification, bioinformatic analysis, and literature curation. We mapped 1,001 human orthologous proteins onto Ensembl identifiers using human-mouse ortholog lists from Ensembl v44 (April 2007) and the Mouse Genome Database. For yeast, to assure specificity of its mitochondrial proteome, a reference set was downloaded from the MitoP2 database. This set of 545 proteins contains published experimental data based on various studies [18, 22–24] and was subsequently manually curated. To exclude non-confirmed mitochondrial proteins, for which a mitochondrial localization was only predicted or derived from early high-throughput studies, we also required mitochondrial proteins to be present among 851 proteins from the most comprehensive dataset of the yeast mitochondrial proteome to date. The proteomes selected as described assure few false positive proteins, but do not completely cover mitochondrial protein content. Because of the incomplete coverage, the absence of evidence for mitochondrial localization cannot be taken as evidence for the absence of mitochondrial localization. For the non-mitochondrial protein set, only proteins localized to other eukaryotic subcellular compartments were taken into account. This included proteins explicitly assigned to 24 non-mitochondrial compartments as annotated in GO of human genes (see Table S10 in Additional data file 1 for the full list of the compartments), analogous to the non-mitochondrial reference dataset from .
Gene trees of mitochondrial proteins
To take into account the evolutionary history of every protein, including gene losses and duplications, we performed analysis of individual gene trees reconciled with the species phylogeny, as provided by the Ensembl team. The phylogenomic Ensembl pipeline provides a dataset of gene trees across multiple species, constructed using both dS, dN (substitution rates), nucleotide and protein distance measures. These data, together with the standard species tree, inform the gene tree construction performed by the TreeBeST program (L Heng, AJ Vilella, E Birney, R Durbin, in preparation). First, all protein coding genes are queried using WUBLASTP against the whole protein database. Subsequently, a graph of proteins is constructed, with edges created for best reciprocal hits or when score(P1, P2)/max(score(P1, P1), score(P2, P2)) > 0.33. Connected components of the graph are extracted and subsequently aligned with MUSCLE. The back-translated multiple alignment is passed to the tree constructing program, TreeBeST, together with the species tree for the reconciliation and the duplication calls on internal nodes, as the coverage of genomes in the Ensembl database provides topologically based timings in order to label duplication events. All human gene trees with a mitochondrial gene product (mitochondrial proteins in either human or yeast) were downloaded from Ensembl database v44. When integrating datasets from human and yeast, we did not detect homologs in the other species for 50% of human genes and 46% of yeast proteins, representing a likely gene loss or gain in one of these lineages.
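The clustering step of the pipeline (edges for best reciprocal hits or for a normalized score above 0.33, followed by extraction of connected components) can be illustrated with a short Python sketch. The 0.33 cutoff and the normalization by the larger self-score come from the text. The function names, the dictionary layout of the scores and the toy values are assumptions made for illustration and do not reproduce the Ensembl implementation.

```python
from collections import defaultdict

def build_edges(scores, best_reciprocal, cutoff=0.33):
    """Create graph edges between proteins.

    scores: dict mapping (p, q) -> alignment score, including self-scores (p, p).
    best_reciprocal: set of frozenset({p, q}) pairs that are best reciprocal hits.
    An edge is added for a best reciprocal hit, or when
    score(p, q) / max(score(p, p), score(q, q)) exceeds the cutoff.
    """
    edges = set()
    for (p, q), s in scores.items():
        if p == q:
            continue
        normalized = s / max(scores[(p, p)], scores[(q, q)])
        pair = frozenset((p, q))
        if normalized > cutoff or pair in best_reciprocal:
            edges.add(pair)
    return edges

def connected_components(nodes, edges):
    """Return the connected components of the protein graph as a list of sets."""
    adjacency = defaultdict(set)
    for edge in edges:
        a, b = tuple(edge)
        adjacency[a].add(b)
        adjacency[b].add(a)
    seen, components = set(), []
    for node in nodes:
        if node in seen:
            continue
        stack, component = [node], set()
        while stack:
            current = stack.pop()
            if current in seen:
                continue
            seen.add(current)
            component.add(current)
            stack.extend(adjacency[current] - seen)
        components.append(component)
    return components

# Hypothetical toy scores (not real alignment data).
scores = {
    ("A", "A"): 100, ("B", "B"): 90, ("C", "C"): 80, ("D", "D"): 100,
    ("A", "B"): 40,   # 40 / 100 = 0.40 > 0.33, so A and B are linked
    ("A", "C"): 10,   # 10 / 100 = 0.10, no edge
    ("C", "D"): 20,   # 20 / 100 = 0.20, linked only as a best reciprocal hit
}
best_reciprocal = {frozenset(("C", "D"))}
edges = build_edges(scores, best_reciprocal)
families = connected_components(["A", "B", "C", "D"], edges)
print(sorted(sorted(f) for f in families))
```

In this toy case the two criteria yield two families, one joined by the score ratio and one joined only by the best reciprocal hit. Each family would then be aligned and passed to tree construction.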
Unambiguous one-to-one orthologs between human and yeast
The trees for gene families were separated at the speciation branches into opisthokont orthogroups and the number of paralogs in human and yeast lineages was counted. One-to-one unambiguous orthologs were represented by trees with a single gene in both lineages.
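Counting paralogs per lineage within an orthogroup amounts to a small classification step. The following Python sketch assumes each orthogroup is given as a list of (gene, species) pairs. The function name, the labels and the gene identifiers are hypothetical and serve only to illustrate the definition of an unambiguous one-to-one ortholog given above.

```python
def classify_orthogroup(members):
    """Classify an orthogroup by the number of human and yeast genes it contains.

    members: list of (gene_name, species) tuples, species being "human" or "yeast".
    """
    n_human = sum(1 for _, species in members if species == "human")
    n_yeast = sum(1 for _, species in members if species == "yeast")
    if n_human == 1 and n_yeast == 1:
        return "one-to-one"
    if n_human == 0 or n_yeast == 0:
        return "no homolog in other species"
    return "duplicated"

# Hypothetical orthogroups, for illustration only.
print(classify_orthogroup([("h1", "human"), ("y1", "yeast")]))
print(classify_orthogroup([("h1", "human"), ("h2", "human"), ("y1", "yeast")]))
```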
For each gene family of n genes, we infer n-1 duplications, each duplication corresponding to an internal tree node. The dating of each duplication was inferred from the tree topology, as annotated by the Ensembl team. We use rooted trees of homologous genes, in which branching points are labeled with the inferred time of duplication. For example, a gene tree ((GeneA, GeneB):Euteleostomi,(GeneC, GeneD):Euteleostomi):Chordata yields a single chordate duplication that is followed by two vertebrate duplications. For inter-compartmental duplications, the divergence time of a mitochondrial protein and its closest non-mitochondrial paralog was inferred from the internal node giving rise to the duplication. To assess the quality of gene duplication calls, we used the duplication consistency score. The score is the number of species shared by the two post-duplication subtrees (the intersection) divided by the number of species present in either subtree (the union); one expects that after most duplications both gene copies persist with equal likelihood in the subsequent lineages. All three duplication datasets (intra-mitochondrial, inter-compartmental and duplications outside mitochondria) had similar, high consistency scores, with median values of 0.85, 0.86 and 0.85, respectively (Figure S1 in Additional data file 1). A two-sided Wilcoxon rank sum test showed no statistically significant differences between the datasets (P-value > 0.65).
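The duplication count and the consistency score can be illustrated with a short sketch. It assumes a hypothetical tree encoding (a leaf is a string, an internal duplication node is a `(label, children)` tuple) and hypothetical function names; it is not the Ensembl code.

```python
from collections import Counter


def leaves(node):
    """Return the gene names below a node."""
    if isinstance(node, str):
        return [node]
    _, children = node
    return [leaf for child in children for leaf in leaves(child)]


def duplication_labels(node):
    """Collect the timing label of every internal (duplication) node."""
    if isinstance(node, str):
        return []
    label, children = node
    labels = [label]
    for child in children:
        labels.extend(duplication_labels(child))
    return labels


def consistency_score(species_left, species_right):
    """Species shared by both post-duplication subtrees over species in either."""
    left, right = set(species_left), set(species_right)
    union = left | right
    return len(left & right) / len(union) if union else 0.0


# The example tree from the text:
# ((GeneA, GeneB):Euteleostomi,(GeneC, GeneD):Euteleostomi):Chordata
tree = ("Chordata", [("Euteleostomi", ["GeneA", "GeneB"]),
                     ("Euteleostomi", ["GeneC", "GeneD"])])

n_genes = len(leaves(tree))
timing = Counter(duplication_labels(tree))

# Toy species sets (hypothetical): one copy was lost in chicken.
score = consistency_score({"human", "mouse", "chicken"}, {"human", "mouse"})
```

For the example tree, four genes give three duplications: one labeled Chordata and two labeled Euteleostomi. In the toy species sets two of three species retain both copies, so the score is 2/3.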
Of the differentially localized one-to-one orthologs, we find 17 proteins localized to mitochondria only in human; 16 of these are either reference mitochondrial proteins known from the literature or were experimentally verified in the Pagliarini et al. study. For families with gene duplications and differentially localized human paralogs, localization was predicted computationally for only three mitochondrial proteins. The remaining proteins were validated experimentally in the Pagliarini et al. study, either by a green fluorescent protein marker (4 proteins) or by proteomics approaches (7 proteins), or are part of a mammalian mitochondrial reference set based on literature curation (15 proteins).
Of the one-to-one human-yeast orthologs, 104 possess an ortholog in plants (determined using the HomoloGene database) and 27 were found in mitochondria in Heazlewood et al. With regard to intra-mitochondrial duplications, 47 plant orthologs were found, 23 of which are in the mitochondria.
Estimation of the expansion of the mitochondrial proteome
We identified 122 unambiguous one-to-one nuclear encoded gene products with a reliable mitochondrial localization in human and yeast (Table S1 in Additional data file 1), with 17 differentially localized orthologs likely to be mitochondrial gains in the human lineage (see Results). Genes that underwent duplications originated from at least 66 ancestral opisthokont genes (for which we can find at least one protein from the family in mitochondria of both human and yeast; family counts are 53 + 8 + 4 + 1 from Table S4 in Additional data file 1, with each family stemming from a single ancestral gene), or 78 if we add families with uncertain common ancestry (mitochondrial only in human; an additional 12 families). This, together with one-to-one orthologs, gives 188 to 200 ancestral proteins. Given the present human mitochondrial protein compendium, restricted to proteins with an ortholog in yeast with a known localization, we arrive at 128 to 140 mitochondrial acquisitions in the human lineage. Given 188 to 200 ancestral mitochondrial proteins and 128 to 140 gains in the metazoan evolutionary branch, we estimate an expansion of the mitochondrial proteome between 64% (128/200) and 74% (140/188).
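The arithmetic behind this estimate can be reproduced step by step. The sketch below uses only the counts given in the text; the variable names are chosen for this illustration.

```python
# Counts taken from the text.
one_to_one = 122                      # unambiguous one-to-one orthologs
families_shared = 53 + 8 + 4 + 1      # families mitochondrial in human and yeast
families_uncertain = 12               # families mitochondrial only in human

# Lower and upper bound on the number of ancestral mitochondrial proteins.
ancestral_low = one_to_one + families_shared
ancestral_high = ancestral_low + families_uncertain

# Mitochondrial acquisitions in the human lineage, as reported.
gains_low, gains_high = 128, 140

# Smallest gain over largest ancestral set, and vice versa.
expansion_low = gains_low / ancestral_high
expansion_high = gains_high / ancestral_low

print(f"ancestral proteins: {ancestral_low} to {ancestral_high}")
print(f"expansion: {expansion_low:.0%} to {expansion_high:.0%}")
```

The lower bound pairs the smaller number of gains with the larger ancestral set (128/200), and the upper bound pairs the larger number of gains with the smaller ancestral set (140/188), as in the text.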
Dating mitochondrial relocalization
For the prediction of the amino-terminal targeting signal in the protein sequences, TargetP was used for all known isoforms of a given gene. Note that pre-sequence analysis programs do not use homology to known mitochondrial proteins or mitochondria-specific domains as an indicator of the presence or absence of a targeting signal.
Gene Ontology analysis
Additional data files
The following additional data are available with the online version of this paper: supplementary text, Tables S1-S12, and Figures S1 and S2 (Additional data file 1).
Green DR, Reed JC: Mitochondria and apoptosis. Science. 1998, 281: 1309-1312. 10.1126/science.281.5381.1309.
Berridge MJ, Lipp P, Bootman MD: The versatility and universality of calcium signalling. Nat Rev Mol Cell Biol. 2000, 1: 11-21. 10.1038/35036035.
Meisinger C, Sickmann A, Pfanner N: The mitochondrial proteome: from inventory to function. Cell. 2008, 134: 22-24. 10.1016/j.cell.2008.06.043.
Pagliarini DJ, Calvo SE, Chang B, Sheth SA, Vafai SB, Ong S, Walford GA, Sugiana C, Boneh A, Chen WK, Hill DE, Vidal M, Evans JG, Thorburn DR, Carr SA, Mootha VK: A mitochondrial protein compendium elucidates complex I disease biology. Cell. 2008, 134: 112-123. 10.1016/j.cell.2008.06.016.
Karlberg O, Canbäck B, Kurland CG, Andersson SG: The dual origin of the yeast mitochondrial proteome. Yeast. 2000, 17: 170-187. 10.1002/1097-0061(20000930)17:3<170::AID-YEA25>3.0.CO;2-V.
Gabaldón T, Huynen MA: Reconstruction of the proto-mitochondrial metabolism. Science. 2003, 301: 609. 10.1126/science.1085463.
Gabaldón T, Huynen MA: From endosymbiont to host-controlled organelle: the hijacking of mitochondrial protein synthesis and metabolism. PLoS Comput Biol. 2007, 3: e219. 10.1371/journal.pcbi.0030219.
Matsuyama A, Arai R, Yashiroda Y, Shirai A, Kamata A, Sekido S, Kobayashi Y, Hashimoto A, Hamamoto M, Hiraoka Y, Horinouchi S, Yoshida M: ORFeome cloning and global analysis of protein localization in the fission yeast Schizosaccharomyces pombe. Nat Biotechnol. 2006, 24: 841-847. 10.1038/nbt1222.
Gabaldón T, Snel B, van Zimmeren F, Hemrika W, Tabak H, Huynen MA: Origin and evolution of the peroxisomal proteome. Biol Direct. 2006, 1: 8. 10.1186/1745-6150-1-8.
Goldberg AV, Molik S, Tsaousis AD, Neumann K, Kuhnke G, Delbac F, Vivares CP, Hirt RP, Lill R, Embley TM: Localization and functionality of microsporidian iron-sulphur cluster assembly proteins. Nature. 2008, 452: 624-628. 10.1038/nature06606.
Wolfe KH, Shields DC: Molecular evidence for an ancient duplication of the entire yeast genome. Nature. 1997, 387: 708-713. 10.1038/42711.
Marques A, Vinckenbosch N, Brawand D, Kaessmann H: Functional diversification of duplicate genes through subcellular adaptation of encoded proteins. Genome Biol. 2008, 9: R54. 10.1186/gb-2008-9-3-r54.
Taylor SW, Fahy E, Zhang B, Glenn GM, Warnock DE, Wiley S, Murphy AN, Gaucher SP, Capaldi RA, Gibson BW, Ghosh SS: Characterization of the human heart mitochondrial proteome. Nat Biotechnol. 2003, 21: 281-286. 10.1038/nbt793.
Forner F, Foster LJ, Campanaro S, Valle G, Mann M: Quantitative proteomic comparison of rat mitochondria from muscle, heart, and liver. Mol Cell Proteomics. 2006, 5: 608-619.
Johnson DT, Harris RA, French S, Blair PV, You J, Bemis KG, Wang M, Balaban RS: Tissue heterogeneity of the mammalian mitochondrial proteome. Am J Physiol Cell Physiol. 2007, 292: C689-697. 10.1152/ajpcell.00108.2006.
Foster LJ, de Hoog CL, Zhang Y, Zhang Y, Xie X, Mootha VK, Mann M: A mammalian organelle map by protein correlation profiling. Cell. 2006, 125: 187-199. 10.1016/j.cell.2006.03.022.
Kislinger T, Cox B, Kannan A, Chung C, Hu P, Ignatchenko A, Scott MS, Gramolini AO, Morris Q, Hallett MT, Rossant J, Hughes TR, Frey B, Emili A: Global survey of organ and organelle protein expression in mouse: combined proteomic and transcriptomic profiling. Cell. 2006, 125: 173-186. 10.1016/j.cell.2006.01.044.
Sickmann A, Reinders J, Wagner Y, Joppich C, Zahedi R, Meyer HE, Schönfisch B, Perschil I, Chacinska A, Guiard B, Rehling P, Pfanner N, Meisinger C: The proteome of Saccharomyces cerevisiae mitochondria. Proc Natl Acad Sci USA. 2003, 100: 13207-13212. 10.1073/pnas.2135385100.
Reinders J, Zahedi RP, Pfanner N, Meisinger C, Sickmann A: Toward the complete yeast mitochondrial proteome: multidimensional separation techniques for mitochondrial proteomics. J Proteome Res. 2006, 5: 1543-1554. 10.1021/pr050477f.
Heazlewood JL, Tonti-Filippini JS, Gout AM, Day DA, Whelan J, Millar AH: Experimental analysis of the Arabidopsis mitochondrial proteome highlights signaling and regulatory components, provides assessment of targeting prediction programs, and indicates plant-specific mitochondrial proteins. Plant Cell. 2004, 16: 241-256. 10.1105/tpc.016055.
Smith DGS, Gawryluk RMR, Spencer DF, Pearlman RE, Siu KWM, Gray MW: Exploring the mitochondrial proteome of the ciliate protozoon Tetrahymena thermophila: direct analysis by tandem mass spectrometry. J Mol Biol. 2007, 374: 837-863. 10.1016/j.jmb.2007.09.051.
Pflieger D, Le Caer J, Lemaire C, Bernard BA, Dujardin G, Rossier J: Systematic identification of mitochondrial proteins by LC-MS/MS. Anal Chem. 2002, 74: 2400-2406. 10.1021/ac011295h.
Ohlmeier S, Kastaniotis AJ, Hiltunen JK, Bergmann U: The yeast mitochondrial proteome, a study of fermentative and respiratory growth. J Biol Chem. 2004, 279: 3956-3979. 10.1074/jbc.M310160200.
Prokisch H, Scharfe C, Camp DG, Xiao W, David L, Andreoli C, Monroe ME, Moore RJ, Gritsenko MA, Kozany C, Hixson KK, Mottaz HM, Zischka H, Ueffing M, Herman ZS, Davis RW, Meitinger T, Oefner PJ, Smith RD, Steinmetz LM: Integrative analysis of the mitochondrial proteome in yeast. PLoS Biol. 2004, 2: e160. 10.1371/journal.pbio.0020160.
Prokisch H, Andreoli C, Ahting U, Heiss K, Ruepp A, Scharfe C, Meitinger T: MitoP2: the mitochondrial proteome database--now including mouse data. Nucleic Acids Res. 2006, 34: D705-711. 10.1093/nar/gkj127.
Yoshihisa T, Ohshima C, Yunoki-Esaki K, Endo T: Cytoplasmic splicing of tRNA in Saccharomyces cerevisiae. Genes Cells. 2007, 12: 285-297. 10.1111/j.1365-2443.2007.01056.x.
Otsuga D, Keegan BR, Brisch E, Thatcher JW, Hermann GJ, Bleazard W, Shaw JM: The dynamin-related GTPase, Dnm1p, controls mitochondrial morphology in yeast. J Cell Biol. 1998, 143: 333-349. 10.1083/jcb.143.2.333.
Casey CA, Anderson PM: Submitochondrial localization of arginase and other enzymes associated with urea synthesis and nitrogen metabolism, in liver of Squalus acanthias. Comp Biochem Physiol B. 1985, 82: 307-315. 10.1016/0305-0491(85)90246-9.
Walsh: Subcellular localization and biochemical properties of the enzymes of carbamoyl phosphate and urea synthesis in the batrachoidid fishes Opsanus beta, Opsanus tau and Porichthys notatus. J Exp Biol. 1995, 198: 755-766.
Ladner RD, McNulty DE, Carr SA, Roberts GD, Caradonna SJ: Characterization of distinct nuclear and mitochondrial forms of human deoxyuridine triphosphate nucleotidohydrolase. J Biol Chem. 1996, 271: 7745-7751. 10.1074/jbc.271.13.7745.
Knoops B, Clippe A, Bogard C, Arsalane K, Wattiez R, Hermans C, Duconseille E, Falmagne P, Bernard A: Cloning and characterization of AOEB166, a novel mammalian antioxidant enzyme of the peroxiredoxin family. J Biol Chem. 1999, 274: 30451-30458. 10.1074/jbc.274.43.30451.
Metodiev MD, Lesko N, Park CB, Cámara Y, Shi Y, Wibom R, Hultenby K, Gustafsson CM, Larsson N: Methylation of 12S rRNA is necessary for in vivo stability of the small subunit of the mammalian mitochondrial ribosome. Cell Metab. 2009, 9: 386-397. 10.1016/j.cmet.2009.03.001.
Lundin LG: Evolution of the vertebrate genome as reflected in paralogous chromosomal regions in man and the house mouse. Genomics. 1993, 16: 1-19. 10.1006/geno.1993.1133.
Dehal P, Boore JL: Two rounds of whole genome duplication in the ancestral vertebrate. PLoS Biol. 2005, 3: e314. 10.1371/journal.pbio.0030314.
Hughes AL, Friedman R: 2R or not 2R: testing hypotheses of genome duplication in early vertebrates. J Struct Funct Genomics. 2003, 3: 85-93. 10.1023/A:1022681600462.
Rosso L, Marques AC, Reichert AS, Kaessmann H: Mitochondrial targeting adaptation of the hominoid-specific glutamate dehydrogenase driven by positive Darwinian selection. PLoS Genet. 2008, 4: e1000150. 10.1371/journal.pgen.1000150.
Bolender N, Sickmann A, Wagner R, Meisinger C, Pfanner N: Multiple pathways for sorting mitochondrial precursor proteins. EMBO Rep. 2008, 9: 42-49. 10.1038/sj.embor.7401126.
Emanuelsson O, von Heijne G: Prediction of organellar targeting signals. Biochim Biophys Acta. 2001, 1541: 114-119. 10.1016/S0167-4889(01)00145-8.
Claros MG, Vincens P: Computational method to predict mitochondrially imported proteins and their targeting sequences. Eur J Biochem. 1996, 241: 779-786. 10.1111/j.1432-1033.1996.00779.x.
Bauer MF, Gempel K, Reichert AS, Rappold GA, Lichtner P, Gerbitz KD, Neupert W, Brunner M, Hofmann S: Genetic and structural characterization of the human mitochondrial inner membrane translocase. J Mol Biol. 1999, 289: 69-82. 10.1006/jmbi.1999.2751.
Bömer U, Rassow J, Zufall N, Pfanner N, Meijer M, Maarse AC: The preprotein translocase of the inner mitochondrial membrane: evolutionary conservation of targeting and assembly of Tim17. J Mol Biol. 1996, 262: 389-395. 10.1006/jmbi.1996.0522.
Ewing RM, Chu P, Elisma F, Li H, Taylor P, Climie S, McBroom-Cerajewski L, Robinson MD, O'Connor L, Li M, Taylor R, Dharsee M, Ho Y, Heilbut A, Moore L, Zhang S, Ornatsky O, Bukhman YV, Ethier M, Sheng Y, Vasilescu J, Abu-Farha M, Lambert J, Duewel HS, Stewart II, Kuehl B, Hogue K, Colwill K, Gladwish K, Muskat B, et al: Large-scale mapping of human protein-protein interactions by mass spectrometry. Mol Syst Biol. 2007, 3: 89. 10.1038/msb4100134.
Huang B, Gudi R, Wu P, Harris RA, Hamilton J, Popov KM: Isoenzymes of pyruvate dehydrogenase phosphatase. DNA-derived amino acid sequences, expression, and regulation. J Biol Chem. 1998, 273: 17680-17689. 10.1074/jbc.273.28.17680.
Turkan A, Gong X, Peng T, Roche TE: Structural requirements within the lipoyl domain for the Ca2+-dependent binding and activation of pyruvate dehydrogenase phosphatase isoform 1 or its catalytic subunit. J Biol Chem. 2002, 277: 14976-14985. 10.1074/jbc.M108434200.
del Arco A, Satrústegui J: Identification of a novel human subfamily of mitochondrial carriers with calcium-binding domains. J Biol Chem. 2004, 279: 24701-24713. 10.1074/jbc.M401417200.
Saotome M, Safiulina D, Szabadkai G, Das S, Fransson A, Aspenstrom P, Rizzuto R, Hajnóczky G: Bidirectional Ca2+-dependent control of mitochondrial dynamics by the Miro GTPase. Proc Natl Acad Sci USA. 2008, 105: 20728-20733. 10.1073/pnas.0808953105.
Macaskill AF, Rinholm JE, Twelvetrees AE, Arancibia-Carcamo IL, Muir J, Fransson A, Aspenstrom P, Attwell D, Kittler JT: Miro1 is a calcium sensor for glutamate receptor-dependent localization of mitochondria at synapses. Neuron. 2009, 61: 541-555. 10.1016/j.neuron.2009.01.030.
Ahn B, Kim H, Song S, Lee IH, Liu J, Vassilopoulos A, Deng C, Finkel T: A role for the mitochondrial deacetylase Sirt3 in regulating energy homeostasis. Proc Natl Acad Sci USA. 2008, 105: 14447-14452. 10.1073/pnas.0803790105.
Gabaldón T, Rainey D, Huynen MA: Tracing the evolution of a large protein complex in the eukaryotes, NADH:ubiquinone oxidoreductase (Complex I). J Mol Biol. 2005, 348: 857-870. 10.1016/j.jmb.2005.02.067.
Huynen MA, de Hollander M, Szklarczyk R: Mitochondrial proteome evolution and genetic disease. Biochim Biophys Acta. 2009, 1792: 1122-1129.
Smits P, Smeitink JAM, Heuvel van den LP, Huynen MA, Ettema TJG: Reconstructing the evolution of the mitochondrial ribosomal proteome. Nucleic Acids Res. 2007, 35: 4686-4703. 10.1093/nar/gkm441.
Nadeau JH, Sankoff D: Comparable rates of gene loss and functional divergence after genome duplications early in vertebrate evolution. Genetics. 1997, 147: 1259-1266.
Vande Walle L, Lamkanfi M, Vandenabeele P: The mitochondrial serine protease HtrA2/Omi: an overview. Cell Death Differ. 2008, 15: 453-460. 10.1038/sj.cdd.4402291.
Strauss KM, Martins LM, Plun-Favreau H, Marx FP, Kautzmann S, Berg D, Gasser T, Wszolek Z, Müller T, Bornemann A, Wolburg H, Downward J, Riess O, Schulz JB, Krüger R: Loss of function mutations in the gene encoding Omi/HtrA2 in Parkinson's disease. Hum Mol Genet. 2005, 14: 2099-2111. 10.1093/hmg/ddi215.
Zhang H, Barceló JM, Lee B, Kohlhagen G, Zimonjic DB, Popescu NC, Pommier Y: Human mitochondrial topoisomerase I. Proc Natl Acad Sci USA. 2001, 98: 10608-10613. 10.1073/pnas.191321998.
Lunn JE: Compartmentation in plant metabolism. J Exp Bot. 2007, 58: 35-47. 10.1093/jxb/erl134.
Lunn JE, Droux M, Martin J, Douce R: Localization of ATP Sulfurylase and O-Acetylserine(thiol)lyase in Spinach Leaves. Plant Physiol. 1990, 94: 1345-1352. 10.1104/pp.94.3.1345.
Martin W, Schnarrenberger C: The evolution of the Calvin cycle from prokaryotic to eukaryotic chromosomes: a case study of functional redundancy in ancient pathways through endosymbiosis. Curr Genet. 1997, 32: 1-18. 10.1007/s002940050241.
Hubbard TJP, Aken BL, Beal K, Ballester B, Caccamo M, Chen Y, Clarke L, Coates G, Cunningham F, Cutts T, Down T, Dyer SC, Fitzgerald S, Fernandez-Banet J, Graf S, Haider S, Hammond M, Herrero J, Holland R, Howe K, Howe K, Johnson N, Kahari A, Keefe D, Kokocinski F, Kulesha E, Lawson D, Longden I, Melsopp C, Megy K, et al: Ensembl 2007. Nucleic Acids Res. 2007, 35: D610-617. 10.1093/nar/gkl996.
Bult CJ, Eppig JT, Kadin JA, Richardson JE, Blake JA: The Mouse Genome Database (MGD): mouse biology and model systems. Nucleic Acids Res. 2008, 36: D724-728. 10.1093/nar/gkm961.
Perocchi F, Jensen LJ, Gagneur J, Ahting U, von Mering C, Bork P, Prokisch H, Steinmetz LM: Assessing systems properties of yeast mitochondria through an interaction map of the organelle. PLoS Genet. 2006, 2: e170. 10.1371/journal.pgen.0020170.
Calvo S, Jain M, Xie X, Sheth SA, Chang B, Goldberger OA, Spinazzola A, Zeviani M, Carr SA, Mootha VK: Systematic identification of human mitochondrial disease genes through integrative genomics. Nat Genet. 2006, 38: 576-582. 10.1038/ng1776.
Vilella AJ, Severin J, Ureta-Vidal A, Heng L, Durbin R, Birney E: EnsemblCompara GeneTrees: Complete, duplication-aware phylogenetic trees in vertebrates. Genome Res. 2009, 19: 327-335. 10.1101/gr.073585.107.
TreeSoft: Softwares for Phylogenetic Trees. [http://treesoft.sourceforge.net/treebest.shtml]
Edgar RC: MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004, 32: 1792-1797. 10.1093/nar/gkh340.
Wheeler DL, Barrett T, Benson DA, Bryant SH, Canese K, Chetvernin V, Church DM, DiCuccio M, Edgar R, Federhen S, Geer LY, Kapustin Y, Khovayko O, Landsman D, Lipman DJ, Madden TL, Maglott DR, Ostell J, Miller V, Pruitt KD, Schuler GD, Sequeira E, Sherry ST, Sirotkin K, Souvorov A, Starchenko G, Tatusov RL, Tatusova TA, Wagner L, Yaschenko E: Database resources of the National Center for Biotechnology Information. Nucleic Acids Res. 2007, 35: D5-12. 10.1093/nar/gkl1031.
Emanuelsson O, Nielsen H, Brunak S, von Heijne G: Predicting subcellular localization of proteins based on their N-terminal amino acid sequence. J Mol Biol. 2000, 300: 1005-1016. 10.1006/jmbi.2000.3903.
Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G: Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet. 2000, 25: 25-29. 10.1038/75556.
Maere S, Heymans K, Kuiper M: BiNGO: a Cytoscape plugin to assess overrepresentation of gene ontology categories in biological networks. Bioinformatics. 2005, 21: 3448-3449. 10.1093/bioinformatics/bti551.
Dereeper A, Guignon V, Blanc G, Audic S, Buffet S, Chevenet F, Dufayard J, Guindon S, Lefort V, Lescot M, Claverie J, Gascuel O: Phylogeny.fr: robust phylogenetic analysis for the non-specialist. Nucleic Acids Res. 2008, 36: W465-469. 10.1093/nar/gkn180.
Thompson JD, Higgins DG, Gibson TJ: CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res. 1994, 22: 4673-4680. 10.1093/nar/22.22.4673.
We thank the Ensembl team, including A Vilella and B Overduin for helping us with tree analysis, I Duarte, U Kudla, and T Cuypers for stimulating discussions, and J Parmley for the critical reading of the manuscript. We also thank anonymous reviewers for suggestions. This work was supported by the Netherlands Genomics Initiative (Horizon Programme).
RS and MH conceived the study. RS carried out the analysis and wrote the manuscript. All authors read and approved the final manuscript.