Comparison of the oxidative phosphorylation (OXPHOS) nuclear genes in the genomes of Drosophila melanogaster, Drosophila pseudoobscura and Anopheles gambiae

An analysis of nuclear-encoded oxidative phosphorylation genes in Drosophila and Anopheles reveals that pairs of duplicated genes have strikingly different expression patterns.


Background
The accessibility of whole-genome sequence data for several organisms, together with the development of efficient computer-based search tools, has revolutionized modern biology, allowing in-depth comparative analysis of genomes [1][2][3][4]. In many cases, comparisons among species at various levels of divergence have helped to define protein-coding genes, recognize nonfunctional genes, and find regulatory sequences and other functional elements in the genome. When applied to a set of genes correlated by function and/or subcellular localization of their products, intra- and interspecies comparative analyses can be especially efficient tools to obtain information on the functional constraints acting on the evolution of the gene set and on the mechanisms regulating its coordinate expression.
A set of genes present in all eukaryotic genomes and expected to be subject to peculiar evolutionary constraints is represented by the genes involved in oxidative phosphorylation (OXPHOS), the primary energy-producing process in all aerobic organisms [5]. To generate cellular ATP, OXPHOS uses the products of both nuclear and mitochondrial genes, organized in five large complexes embedded in the lipid bilayer of the inner mitochondrial membrane. Except for complex II, which is formed by four proteins encoded by nuclear genes, the other respiratory complexes depend on both mitochondrial and nuclear genomes; so, assembling the OXPHOS complexes and fine tuning their activity to satisfy cell- and tissue-specific energy demands requires specialized regulatory mechanisms and evolutionary strategies to optimize the cross-talk between the two genomes and ensure the coordinated expression of their relevant products.
Analysis of co-regulated mitochondrial and nuclear genes, and of the transcription factors regulating the functional network they constitute, might also be a useful approach to investigate the origin of mitochondrial dysfunction in humans. Disorders of mitochondrial oxidative phosphorylation are now recognized as the most common inborn errors of metabolism, affecting at least one in 5,000 newborn children [6]. In this context, the expanding spectrum of identified mitochondrial proteins provides an opportunity to test a whole new range of candidate genes whose mutations may be responsible for common human diseases. For example, a recent study by Mootha et al. [7] suggests a promising strategy for clarifying the molecular etiology of mitochondrial pathologies by profiling the tissue-specific expression pattern of candidate mitochondrial proteins.
Despite the long evolutionary divergence time, many key pathways that control development and physiology are conserved between Drosophila and humans, and about 70% of the genes associated with human disease have direct counterparts in the Drosophila genome [8,9]. For example, the potential role of Drosophila as a model system for understanding the molecular mechanisms involved in human genetic disease is validated by the recent identification of a Drosophila mutation causing a necrotic phenotype that mimics in detail the diseases that arise from serpin mutations in humans [10].
It has been suggested that comparisons between D. melanogaster and other species of the genus Drosophila could provide a model system for developing and testing new algorithms and strategies for the functional annotation of complex genomes [3]. To obtain new information on the evolution of a set of genes that control a basic biological function by encoding products targeted to a specific cellular compartment, we have performed a comparative analysis of the OXPHOS genes of D. melanogaster and D. pseudoobscura; the complete genome of the latter was recently made available by the Baylor Human Genome Sequencing Center. These two species are the only species of the Drosophila genus for which whole-genome sequence data exist at present [11][12][13]. We also took advantage of the complete sequence of the A. gambiae genome [14] to compare the Drosophila OXPHOS genes with those of this more distantly related dipteran (the divergence time between D. melanogaster and A. gambiae is thought to be approximately 250 million years, as compared to 46 million years between D. melanogaster and D. pseudoobscura [15,16]). Although extensive reshuffling within and between chromosomal regions is known to have occurred since the divergence of Anopheles from Drosophila [4,17,18], we show that in these organisms the conservation of the OXPHOS genes is still sufficient to permit their meaningful comparison.
Here we report the identification of 78 D. pseudoobscura and 78 A. gambiae genes representing the counterparts of D. melanogaster OXPHOS genes which, in turn, were previously identified as putative orthologs of human OXPHOS genes [19]. We have annotated these genes, taking into account conservation in amino-acid sequence, intron-exon structure, intron length, and the presence of duplications in the genome. The conservation of genomic organization and evidence from evolutionary trees based on sequence similarity suggest that these genes are one-to-one orthologs in the three species, and that in many cases they gave rise to duplicates by transpositional and/or recombinational events during evolution. We have identified in the three dipteran genomes a total of 47 genes that probably originated by duplication of the above-mentioned genes, and we show that the duplicate gene has usually acquired a pattern of expression strikingly different from that of the gene from which it derived. Moreover, when the comparison is possible, the gene duplicate almost always shows a strongly testis-biased expression, in contrast to the soma-biased expression of its parent gene.

Identification and comparative annotation of D. pseudoobscura and A. gambiae OXPHOS genes
We have previously reported [19] the identification of 285 D. melanogaster nuclear genes encoding mitochondrial proteins that represent the counterparts of human peptides annotated in the Swiss-Prot database as mitochondrial [20]. On the basis of comparative evidence obtained by BLASTP analysis, 78 of these genes are involved in the OXPHOS system, encoding 66 proteins known to be components of the five large respiratory complexes and 12 proteins involved in oxidative phosphorylation as accessory proteins. To identify the D. pseudoobscura and A. gambiae counterparts of these genes, we performed TBLASTN searches [13,21] on the whole genome sequences of these species using the amino-acid sequences of the 78 D. melanogaster peptides as queries. Sequences giving the best reciprocal BLAST hits were tentatively assumed to identify functional counterparts in two species if they could be aligned over at least 60% of the gene length and the BLAST E-value was less than 10^-30. By these criteria, all the 78 D. melanogaster OXPHOS genes investigated have a counterpart both in D. pseudoobscura and in A. gambiae. To better compare the structure of the OXPHOS genes in the three dipteran species, we used the predicted coding sequences as queries for a search of expressed sequence tags (EST) [21], and used the retrieved sequences to annotate the transcribed noncoding sequences of the A. gambiae genes investigated. Although little EST information is available for D. pseudoobscura, it was still possible to predict unambiguously the exon-intron gene structure of the OXPHOS genes in this species, as well as the amino-acid sequence of their full-length products, by exploiting the high level of similarity with D. melanogaster.
The results of BLAST analysis, together with the construction of phylogenetic trees that also include other genes that show lesser but still significant sequence similarity to the 78 genes assumed to be one-to-one orthologs in the three species investigated (see below), strongly suggest that the newly identified D. pseudoobscura and A. gambiae genes are the functional counterparts of the 78 D. melanogaster genes used as probes. Table 1 lists the 78 putative orthologous OXPHOS genes in the three dipteran genomes and their cytological location. For each gene, a record showing the gene map and reporting the annotated genomic sequences as well as the mRNA and protein sequences is available and can be queried at the MitoComp website [22] (see also Additional data files). MitoComp also compares the structure of the D. melanogaster, D. pseudoobscura and A. gambiae putative orthologous genes and their duplications when present (see below), aligns the orthologous coding sequences (CDS), and aligns their deduced amino-acid products with the corresponding human protein.
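The reciprocal-best-hit criterion described above can be illustrated with a minimal Python sketch. It is not the pipeline used in this work: the `Hit` record, the function names and the example identifiers are hypothetical, and the hit tables are assumed to have been parsed from BLAST output beforehand. Only the two thresholds (E-value below 10^-30, alignment over at least 60% of the query length) come from the text.

```python
from typing import NamedTuple

class Hit(NamedTuple):
    """One BLAST hit (hypothetical record; fields parsed from BLAST output)."""
    query: str
    subject: str
    evalue: float
    aligned_fraction: float  # aligned length / query length

def best_hits(hits):
    """Map each query to its lowest-E-value hit."""
    best = {}
    for h in hits:
        if h.query not in best or h.evalue < best[h.query].evalue:
            best[h.query] = h
    return best

def reciprocal_best_hits(forward, reverse, max_evalue=1e-30, min_fraction=0.60):
    """Return (query, subject) pairs that are each other's best hit
    and whose forward hit passes the E-value and coverage thresholds."""
    fwd = best_hits(forward)
    rev = best_hits(reverse)
    pairs = []
    for query, hit in fwd.items():
        back = rev.get(hit.subject)
        if back is None or back.subject != query:
            continue  # not reciprocal
        if hit.evalue < max_evalue and hit.aligned_fraction >= min_fraction:
            pairs.append((query, hit.subject))
    return sorted(pairs)

# Made-up example data: only CG1/DpA passes every criterion.
forward = [Hit("CG1", "DpA", 1e-80, 0.95), Hit("CG1", "DpB", 1e-40, 0.90),
           Hit("CG2", "DpC", 1e-10, 0.90), Hit("CG3", "DpD", 1e-50, 0.40)]
reverse = [Hit("DpA", "CG1", 1e-78, 0.94), Hit("DpB", "CG1", 1e-40, 0.90),
           Hit("DpC", "CG2", 1e-10, 0.90), Hit("DpD", "CG3", 1e-50, 0.40)]
pairs = reciprocal_best_hits(forward, reverse)
```

In this toy input, CG2 fails the E-value threshold and CG3 fails the coverage threshold, so only the CG1/DpA pair is retained.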

Amino-acid sequence comparison
For the products of the OXPHOS genes investigated, the D. melanogaster/D. pseudoobscura average amino-acid sequence identity is 88%, compared to 64% between D. melanogaster and A. gambiae. Figure 1 shows the frequency dis- they are much higher if the putative amino-terminal presequences are excluded, since such sequences, possessed by most mitochondrion-targeted products, show little amino-acid sequence conservation [23,24], although they do share specific physicochemical properties [25,26]. When only the predicted mature protein is considered, the average percentage identity increases to 90% between D. melanogaster and D. pseudoobscura, and to 70% between D. melanogaster and A. gambiae.
A striking example of evolutionary conservation is provided by the genes encoding cytochrome c (an essential and ubiquitous protein found in all organisms) in the three dipteran species: the amino-acid sequences of the gene products are identical in D. melanogaster and D. pseudoobscura, whereas 96% identity is preserved between Drosophila and Anopheles. Coding sequences are also extremely conserved, suggesting that the nucleotide sequence itself is subject to strong evolutionary constraints, maybe due to codon usage bias.
Only synonymous substitutions (21 out of 108 codons) were found on comparing D. melanogaster and D. pseudoobscura cytochrome c coding sequences, whereas 28 synonymous substitutions and only four nonsynonymous substitutions were observed between D. melanogaster and A. gambiae (see MitoComp website [22]).
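A count of this kind can be sketched as follows. The sketch assumes two gap-free coding sequences of equal length that are already aligned codon by codon and contain only A, C, G and T; it counts differing codons (not individual nucleotide changes) and classifies each with the standard genetic code. The function name is hypothetical and the example sequences are made up, not the cytochrome c sequences.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

def classify_codon_differences(cds_a, cds_b):
    """Return (synonymous, nonsynonymous) counts of differing codons."""
    if len(cds_a) != len(cds_b) or len(cds_a) % 3:
        raise ValueError("sequences must be aligned, gap-free and a multiple of 3 long")
    syn = nonsyn = 0
    for i in range(0, len(cds_a), 3):
        codon_a = cds_a[i:i + 3].upper()
        codon_b = cds_b[i:i + 3].upper()
        if codon_a == codon_b:
            continue
        if CODON_TABLE[codon_a] == CODON_TABLE[codon_b]:
            syn += 1      # same amino acid, different codon
        else:
            nonsyn += 1   # amino acid changed
    return syn, nonsyn

# Made-up example: GGT->GGC (Gly) and GTT->GTA (Val) are both synonymous.
silent_only = classify_codon_differences("ATGGGTGTTCCA", "ATGGGCGTACCA")
# GTT->ATT changes Val to Ile, a nonsynonymous difference.
one_change = classify_codon_differences("ATGGGTGTTCCA", "ATGGGTATTCCA")
```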

Gene structure comparisons
It is well known that a given function may be supplied in different species by genes that are not directly derived from a common ancestor, that is, by paralogous, not orthologous, genes. Therefore, we thought it would be interesting to compare the structural organization of the OXPHOS genes in the three species investigated, on the principle that it should be possible to infer derivation from a common ancestor, that is, 'structural orthology', if an identical or very similar overall structure was preserved. As the introns of the putative orthologous OXPHOS genes in the three species are, as expected, too divergent in DNA sequence to be aligned, we used conservation of number of introns, conservation of their location in the coding sequence, and preservation of the reading frame with respect to the flanking exons as our primary criteria.
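These criteria can be expressed as a small comparison routine. The sketch below is illustrative only: it assumes each intron has already been mapped to its offset in a shared, aligned coding-sequence coordinate, and the three category labels are hypothetical names for the outcomes discussed in the text. The intron phase is simply the offset modulo 3.

```python
def intron_sites(cds_offsets):
    """Convert intron offsets in the aligned CDS to (codon index, phase) pairs."""
    return {divmod(offset, 3) for offset in cds_offsets}

def compare_intron_structure(offsets_a, offsets_b):
    """Classify two genes by conservation of intron number, position and phase."""
    a, b = intron_sites(offsets_a), intron_sites(offsets_b)
    if a == b:
        return "conserved"        # same number, same positions, same phases
    if (a & b) or not a or not b:
        return "gain_or_loss"     # shared sites kept, or one copy is intronless
    return "not_preserved"        # no intron position in common

# Made-up offsets for illustration.
same = compare_intron_structure([100, 250], [100, 250])
lost_one = compare_intron_structure([100, 250], [100])
different = compare_intron_structure([100], [130])
```

Treating a site as a (codon index, phase) pair means that two introns count as the same only if they interrupt the same codon at the same phase, which matches the requirement that the reading frame be preserved with respect to the flanking exons.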
With the sole exception of Dpse\CG5037, putatively encoding protoheme IX farnesyltransferase, whose 5' genomic sequence was impossible to find in the relevant contig assembly, all other investigated D. pseudoobscura genes show a structural organization almost identical to that of their D. melanogaster counterparts. Of the 78 Anopheles genes studied, 39 maintain the structural organization observed in Drosophila, whereas gain or loss of introns occurred in 33, and in six the location of introns is not preserved at all. In agreement with a previous report [4], the intron-exon structure of the gene appears to be conserved in all three dipteran species when splicing of alternative coding exons occurs: the alternative splice forms of both the Drosophila NADH-ubiquinone oxidoreductase acyl carrier protein (mtacp1, CG9160) [27] and the Drosophila ATP synthase epsilon chain (sun, CG9032) [19] have very similar counterparts in Anopheles, as shown by genomic structure comparison, alignment of splice variants and EST mapping (Figure 2).
Genes encoding the acyl carrier protein (mtacp1) in the three species are characterized by the mutually exclusive use of homologous exons that are repeated in tandem (Figure 2a). The duplicate exons occur at the same location in the aligned amino-acid sequences, and are flanked on both sides by a phase 1 intron. When the sequences of the duplicated exons are compared, they show the expected divergence pattern (that is, the similarity between duplicate exons within a gene is less than the similarity of each exon to its equivalent in the orthologous gene). Evidence from genomic and transcribed sequences (GenBank accession numbers BI510891 and BI508135) shows that the duplicated mtacp1 exons are also preserved in the more distantly related insect Apis mellifera (honeybee) (Figure 2c,d), indicating a specific adaptive benefit for this gene structure, as also suggested by the evolutionary convergence leading to the occurrence of alternative splicing in members of three different ion-channel gene families from Drosophila to humans [28]. However, there is no evidence from ESTs that duplicated mtacp1 exons undergo alternative splicing in vertebrates and nematodes.

Analysis of intron length
Interspecies comparison of the introns of putative orthologous genes indicates that there is little constraint on their nucleotide sequence, which undergoes nucleotide substitutions at a rate comparable to that of pseudogenes [29]. However, several observations suggest that intron size is subject to natural selection. For example, in D. melanogaster and several other organisms the distribution of intron length has been shown to be asymmetrical, with a large group of introns falling into a narrow distribution around a 'minimal' length and the remaining showing a much broader length distribution, ranging from hundreds to thousands of base-pairs [30][31][32].
Of the introns that interrupt the coding sequence in the 78 OXPHOS genes investigated in the present study, 88 (

OXPHOS gene duplications
It is generally accepted that gene duplication is the basic process that underlies the diversification of genes and the origination of novel gene functions [33]; however, many features of this process are still elusive. To obtain more information on the molecular evolution of the genes involved in the OXPHOS system, we searched the genomes of D. melanogaster, D. pseudoobscura and A. gambiae for duplications of the 78 OXPHOS genes whose orthologs we have identified in the three species.
Duplicate gene pairs were tentatively identified within each genome as best reciprocal hits with an E-value of less than 10^-20 in both directions in a TBLASTN search using the default parameters. Deciding whether two proteins may be considered homologous becomes difficult when their sequence identity is within the 20-30% range (the so-called 'twilight zone' [34]), and so the following additional criteria were used: first, the two sequences could be aligned over more than 60% of their length; second, the putative processed proteins encoded had to have more than 40% identity; and third, amino-acid percentage similarity had to be larger than percentage identity [35]. Even if meeting these criteria and reported as different genes in the ENSEMBL database [36], identical Anopheles nucleotide sequences were excluded from further analysis, as they are likely to reflect annotation artifacts.
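Taken together, the thresholds above amount to a simple filter on each candidate pair. The following sketch encodes them directly; the `PairwiseComparison` record and its field names are hypothetical, the example values are made up, and the exclusion of identical Anopheles sequences is not modeled.

```python
from dataclasses import dataclass

@dataclass
class PairwiseComparison:
    """Summary of a within-genome comparison (hypothetical record)."""
    evalue_forward: float    # TBLASTN E-value, parent -> candidate
    evalue_reverse: float    # TBLASTN E-value, candidate -> parent
    aligned_fraction: float  # fraction of sequence length aligned (0-1)
    identity_pct: float      # % identity of the putative processed proteins
    similarity_pct: float    # % similarity of the same alignment

def is_candidate_duplicate(c):
    """Apply the E-value, coverage, identity and similarity criteria."""
    return (c.evalue_forward < 1e-20
            and c.evalue_reverse < 1e-20
            and c.aligned_fraction > 0.60
            and c.identity_pct > 40.0
            and c.similarity_pct > c.identity_pct)

# Made-up examples: the second pair falls in the low-identity range and is rejected.
accepted = is_candidate_duplicate(PairwiseComparison(1e-45, 1e-43, 0.92, 62.0, 78.0))
rejected = is_candidate_duplicate(PairwiseComparison(1e-25, 1e-24, 0.80, 28.0, 45.0))
```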
Duplications, or in some instances triplications, of 24 OXPHOS genes were found. Overall, we identified 47 genes (20 in D. melanogaster, 19 in D. pseudoobscura and eight in A. gambiae) each of which shows significant similarity with one of the 78 OXPHOS genes reported above. When the structure of a member of a paralogous gene set indicates that it has been produced by retroposition, it seems reasonable to assume that it is derived from a pre-existing 'parent' gene. For duplicates not clearly originating by retroposition, we also assume, on the basis of the much higher level of conservation and expression, that the genes we find to be the structural orthologs in all three species are the parent ones, and in this case also we will henceforth refer to their paralogs as OXPHOS gene duplicates. The amino-acid percentage identity between the products of duplicate gene pairs ranges from 40% to 85%. For each of the OXPHOS gene duplicates, cytological localization, number of exons interrupting the coding sequence, and number of ESTs found in the D. melanogaster and A. gambiae EST databases are reported in Table 2.
Neighbor-joining trees derived from distance matrix analysis and showing the inferred evolutionary relationship between members of each gene cluster are available at the MitoComp website [22].
Duplications (or triplications) of 16 of the 78 OXPHOS genes investigated were found in both D. melanogaster and D. pseudoobscura. In such cases, pairwise orthology was assigned by taking into account conservation of structural organization and, given the general conservation of microsyntenic gene order in the two species, by using the products of D. melanogaster genes flanking the duplicate loci to search for homologous sequences flanking the same genes in the D. pseudoobscura genome.
The genomic organization of many OXPHOS duplicates shows that they originated by retropositional events, because they are intronless, or have only very few introns that are likely to have been inserted into the coding sequence after the duplication event. In other cases, duplication apparently resulted from transposition of genomic DNA sequences or from recombinational events, as duplicate genes maintain an identical or very similar structural organization.
On the basis of the presence of the duplication in both species, supported by evidence from evolutionary trees and conservation of microsyntenic gene order, it can be inferred that 15 of the duplications identified occurred before the D. melanogaster/D. pseudoobscura divergence (about 46 million years ago). On the other hand, five duplications were found only in D. melanogaster and four only in D. pseudoobscura; in these instances, if the duplication occurred before the divergence of the two species, it has been followed by loss of one of the copies in the lineage leading to the species in which the gene is no longer duplicated. On the assumption that the rate of gene duplication is constant over time, this translates to approximately 0.0014 duplications per gene per million years (4 or 5 duplications per 78 genes per 46 million years) that achieved fixation and long-term preservation in the genome. This value is about twofold lower than the 0.0023 value calculated by Lynch and Conery [37] for the 13,601 genes of the whole genome of D. melanogaster. However, it can be argued that the rate of long-term preservation in the genome of OXPHOS gene duplicates cannot be meaningfully compared with the general rate of preservation of duplicates in the whole genome since, while recent data suggest that in eukaryotic genomes there is preferential duplication of conserved proteins [38], duplicates of genes that encode subunits of multiprotein complexes, as most of the genes we have investigated do, negatively influence the fitness of an organism [39], and are therefore unlikely to become fixed in the population. In summary, it appears reasonable to assume that the preservation in the genome of OXPHOS gene duplicates should occur very infrequently, unless special mechanisms allowing their fixation in the population are present (see the next section).
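The arithmetic behind this estimate can be reproduced in a few lines. The sketch below uses only the figures quoted in the text (four or five species-specific duplications, 78 genes, 46 million years); the function name is hypothetical. Five duplications give a value that rounds to the quoted 0.0014, whereas four give approximately 0.0011.

```python
def duplication_rate(n_duplications, n_genes, divergence_myr):
    """Duplications per gene per million years, assuming a constant rate."""
    return n_duplications / (n_genes * divergence_myr)

rate_four = duplication_rate(4, 78, 46)   # about 0.0011
rate_five = duplication_rate(5, 78, 46)   # about 0.0014
genome_wide = 0.0023                      # value of Lynch and Conery [37], as quoted
fold_difference = genome_wide / rate_five
```

With the upper estimate, the genome-wide value is between 1.6 and 1.7 times the OXPHOS value, which is the basis for the "about twofold" comparison.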
In A. gambiae we found only four duplications and two triplications of the OXPHOS genes analyzed; of these, four involve genes also duplicated in one or both Drosophila species (Table 2). Pairwise orthology could not be assigned between Drosophila and Anopheles gene duplicates as neither microsynteny nor evolutionary trees provide sufficient evidence for the origin of the gene pairs from a single-copy gene before the Drosophila/Anopheles divergence.

Expression pattern of OXPHOS gene duplicates
The relative abundance of ESTs in an EST library may be assumed roughly to reflect the level of expression of each mRNA in the tissues from which the library was prepared. We therefore used the mRNA sequences predicted in silico to be transcribed from the OXPHOS duplicate genes investigated in this work as queries in a search of the public D. melanogaster and A. gambiae EST databases to infer the relative abundance of the mRNA copies from the hits scored. For each gene, the number of ESTs found in the databases is detailed in Table 2. Our finding that the expression of the OXPHOS genes originated by duplication is strongly testis-biased is validated by the data obtained by Parisi et al. [40] using the FlyGEM microarray to identify D. melanogaster genes showing ovary-, testis- or soma-biased expression. With the exception of CG7349, CG30354, CG30093 and CG12810, for which no data were presented by Parisi et al. [40], all other genes reported in this work as OXPHOS gene duplicates were found in the genomic fraction showing testis-biased expression, whereas all the parent genes present in the dataset showed soma-biased expression. Additional data file 2 summarizes the relevant data extracted from Parisi et al. [40].
The pattern of strongly testis-biased expression of OXPHOS gene duplicates holds for a further sample of 40 duplications of genes annotated in the MitoDrome database [19] as encoding products that are mitochondrion-targeted but not involved in the OXPHOS system. For 15 of these no data are provided by Parisi et al. [40], but all the remaining 25 genes show a testis-biased expression (data not shown).
Duplications of genes encoding OXPHOS subunits, for which stoichiometry is important, are likely to be strongly deleterious owing to the negative consequences of an imbalance in the concentration of the respiratory complex constituents, unless, as proposed by Lynch and Force [41], 'subfunctionalization' and/or a differential expression pattern of duplicate copies occurs. In this case, the duplicate OXPHOS genes would have a reduced or absent capacity to functionally complement mutations in their parent genes, in contrast to what is generally assumed to be the main short-term advantage of gene duplication. In D. melanogaster at least there is evidence for this, as FlyBase [42] Table 2).
A systematic investigation of the expression pattern of other D. melanogaster duplicate genes will be necessary to answer the question of whether the testis-biased expression pattern reported here is specific to the duplicates of genes encoding mitochondrial proteins, or is a more general phenomenon. According to the balance hypothesis, validated by experimental results obtained on yeast [39], single gene duplications involving genes encoding components of multiprotein complexes are expected to severely affect fitness. Therefore, the expression pattern we have observed could be a necessary condition to maintain some gene duplicates in the D. melanogaster genome, at least until they evolve a new useful function. Finally, as nothing is known about the tissue-specific pattern of expression of the genes investigated in D. pseudoobscura and Anopheles, it also remains unclear whether the testis-biased expression of gene copies originated by duplication is specific to D. melanogaster, or is also to be found in other dipterans, and possibly in other organisms.

Codon usage in the OXPHOS genes
Because of the preferential use of codons ending in C or G, the D. melanogaster coding sequences have an average GC content higher than the genomic average [44,45]. This is also true for the 78 D. melanogaster OXPHOS coding sequences reported in this work and for their D. pseudoobscura and A. gambiae counterparts (68% of the codons in the OXPHOS genes end in C or G in D. pseudoobscura and 77% in A. gambiae, compared to 74% in D. melanogaster). In all three species, the coding sequences of OXPHOS gene duplicates show a lower percentage of codons ending in C or G, when compared to both the entire set of 78 orthologous OXPHOS genes and the gene subset including only their parent genes. In samples including all the OXPHOS gene duplicates annotated in this paper the aggregate percentage of C- or G-ending codons is 63%, 46% and 73% in D. melanogaster, D. pseudoobscura and A. gambiae respectively, as compared with 70%, 64% and 88% in their parent genes. In D. pseudoobscura, the shift toward a higher percentage of A- or T-ending codons is also detected in the pattern of synonymous codon usage; for 12 of the 18 amino acids that are encoded by more than one codon, the most frequently used codon in the D. pseudoobscura gene duplicates is different from the one used in their parent genes (see Additional data file 3).
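The percentage of C- or G-ending codons can be computed as in the minimal sketch below. It assumes an in-frame coding sequence whose length is a multiple of three; whether stop codons were included in the published figures is not stated, so the sketch counts every codon supplied. The function name and the example sequence are hypothetical.

```python
def gc3_percent(cds):
    """Percentage of codons whose third position is G or C."""
    cds = cds.upper()
    if not cds or len(cds) % 3:
        raise ValueError("coding sequence must be non-empty and a multiple of 3 long")
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    gc_ending = sum(1 for codon in codons if codon[2] in "GC")
    return 100.0 * gc_ending / len(codons)

# Made-up example: ATG, GCC and TTC end in G or C; AAA does not.
example = gc3_percent("ATGGCCAAATTC")
```

An aggregate value for a gene set, as reported in the text, would be obtained by pooling the codons of all sequences in the set before taking the percentage, rather than by averaging per-gene values.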

Chromosomal arm location, interarm homology and microsynteny
It has been reported that in many eukaryotes including yeast [46], C. elegans [47], D. melanogaster [48,49] and humans [50], genes with related functions and similar expression patterns tend to be clustered, suggesting that they share aspects of transcriptional regulation depending on their inclusion in the same chromatin domain. In particular, Boutanaev et al. [48] reported that in D. melanogaster clusters of three or more testis-specific genes are much more frequent than expected by chance. Therefore, we investigated the chromosomal distribution of the OXPHOS genes to determine whether clustering could be detected. In all three dipteran species considered, the 78 OXPHOS orthologous genes are randomly distributed on all chromosomal arms (Table 1). Two D. melanogaster genes (Ucrh, encoding the 11 kDa subunit of ubiquinol-cytochrome c reductase, and CG40002, encoding the AGGG subunit of NADH-ubiquinone oxidoreductase) have a heterochromatic location.
No evidence of OXPHOS gene duplicate clustering was found either, despite the common testis-biased expression of such genes. Moreover, no evidence of clustering with other testis-specific genes was found when an EST database search for such genes was performed in the regions flanking the investigated gene duplicates.
However, in accord with two studies reporting a significant deficit of genes with a male-biased expression on the D. melanogaster X chromosome [51,52], only one out of the 20 D. melanogaster OXPHOS gene duplicates, two out of 19 in D. pseudoobscura and none (out of eight) in A. gambiae were found to be X-linked (Table 2). It may be that duplications of X-linked genes encoding OXPHOS subunits would be especially deleterious because of the male X chromosome transcriptional hyperactivity, which allows dosage compensation.
In all three dipteran species, a disproportionately high fraction of OXPHOS gene duplicates appears to be constituted of autosomal genes derived from parent genes located on the X chromosome (Table 2). As suggested by recent work on the generation and preservation of functional genes produced by retroposition both in Drosophila [53] and in the human and mouse genomes [54], this may be explained by a selective advantage for duplicates of X-linked genes that move to an autosomal location and so escape the X inactivation in early spermatogenesis that occurs both in Drosophila [55] and in mammals [56].
We would like to speculate that such selective advantage may be especially significant for duplicates of OXPHOS genes, given the heavy reliance of sperm on mitochondrial function. In fact, the excess of autosomal duplicates of X-linked genes is not observed for MitoDrome annotated genes not involved in the OXPHOS system (see above). However, as the general pattern of much lower, testis-biased expression holds even for OXPHOS and other mitochondrial gene duplicates that apparently derive from autosomal parental genes, and even for X-linked duplicates, this pattern (and the explanation of the evolutionary preservation of such genes) cannot only be due to the selective advantage of escaping X inactivation during spermatogenesis.
With the exception of CG9603, all euchromatic D. melanogaster orthologs maintain their localization on the homologous D. pseudoobscura chromosomal arm (Table 3). CG9603, encoding the VIIa polypeptide of cytochrome c oxidase, is located on the 3R chromosomal arm in D. melanogaster, whereas Dpse\CG9603, its counterpart in D. pseudoobscura, is located on XR; microsyntenic gene order with the flanking genes is conserved in both species, suggesting that a chromosomal rearrangement occurred after their divergence.
OXPHOS gene duplicates also almost always maintain the same chromosomal location and microsyntenic gene order in D. melanogaster and in D. pseudoobscura. However, a more complex situation was observed with regard to the gene encoding subunit IV of cytochrome c oxidase, which is duplicated in D. melanogaster and triplicated in D. pseudoobscura (  (Table 3). This result is consistent with previous reports that compared the location of homologous genes in D. melanogaster and A. gambiae, concluding that extensive reshuffling both within and between chromosomal regions has occurred since the divergence of the two species [4,17].

Conclusions
We have catalogued 78 nuclear genes that control oxidative phosphorylation in three dipteran species and compiled a web-based dataset, MitoComp [22], that contains all the data on which this article is based and which is available with the online version of this article. We have conducted only some basic comparative analyses of the many which are possible using such a dataset, and it is our hope that it will provide a valuable resource for those looking for information about nuclear genes encoding mitochondrion-targeted products in the context of functional genomics and proteomics. Future studies based on this information, especially if the comparative analysis is extended to other species, will surely allow a better understanding of the evolutionary history of a set of genes that control a basic biological function, and also offer interesting insights into the mechanisms of their coordinated expression. In fact, a first in silico analysis of the D. melanogaster and D. pseudoobscura nuclear energy gene sequences suggests that a genetic regulatory circuit, based on a single regulatory element, coordinates the expression of the whole set of energy-producing genes in Drosophila [57].
The comparative analysis of the 78 OXPHOS genes in the three dipteran species shows a high level of amino-acid sequence identity, as well as a substantial conservation of intron-exon structure, indicating that these genes are under strong selective constraints. An unexpected and intriguing result of this study is that in D. melanogaster, duplication-originated OXPHOS genes are expressed at a much lower level (or possibly not expressed at all) in most or all the tissues where their parent genes are expressed, as judged by the abundance of ESTs derived from their transcripts in all libraries other than those derived from testis. On the other hand, OXPHOS gene duplicates have a strongly testis-biased pattern of expression, a finding validated by other authors with a different approach based on the use of microarrays [40]. In A. gambiae, although no testis-specific EST databases are available, the abundance of sequences derived from the duplicate genes is very low in all available EST libraries when compared with that of the sequences derived from their respective parent genes. This suggests that almost all duplicate OXPHOS genes have a pattern of expression different from that of the gene from which they originated, possibly limited to specific tissues.
We suggest that, at least in D. melanogaster, the acquisition of a new, testis-biased pattern of expression may be required to maintain duplicates of certain genes in the genome. This may also allow rapid acquisition of new functions by the gene product(s), as it has recently been shown that proteins encoded by duplicated genes with a changed expression pattern often show accelerated evolution [58,59]. Subfunctionalization could then further favor the preservation of multiple paralogous genes.
No data are at present available to support the possibility that our findings could be extrapolated to other gene sets or even to the whole genome. However, we propose that duplication of the genes encoding products that are part of multiprotein complexes may be especially deleterious, unless sequence divergence allowing only testis-specific expression of one of the duplicate copies occurs. In turn, this could facilitate the development of novel functions, which is usually assumed to be the main evolutionary advantage of gene duplication, providing a general mechanism for originating phenotypic changes that might also lead to species differentiation.

Materials and methods
To identify orthologous OXPHOS genes and their duplications in D. pseudoobscura and A. gambiae, contigs from BCM [13] and scaffolds from AnoBase [21] were searched using TBLASTN with the D. melanogaster OXPHOS peptides listed in the MitoDrome database [19] as queries.
Amino-acid sequence identity and similarity values were obtained from pairwise alignments using the Needleman-Wunsch global alignment algorithm at the EMBL-EBI server [60]. Multiple sequence alignments of the OXPHOS amino-acid and coding sequences and visualization of the dendrograms were obtained using the MultAlin 5.4.1 software [61] from the MultAlin server [62].
The genomic sequence of each gene was manually searched for intron-exon boundaries and the predicted mRNA sequence reconstructed in silico. A. gambiae mRNAs were assembled by overlapping ESTs extracted from AnoBase [21].
We have named each newly identified A. gambiae gene with the four-letter code 'agEG' followed by the last four or five digits of its Ensembl [36] gene number, excluding the multiple zeros of the prefix; the D. pseudoobscura genes were named with the code 'Dpse\CG' followed by the Celera number of their D. melanogaster counterparts.
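The naming convention can be illustrated with the short sketch below. The format of the Ensembl gene identifier used in the example (a letter prefix followed by a zero-padded number) is an assumption, the identifier itself is made up, and both function names are hypothetical.

```python
import re

def anopheles_gene_name(ensembl_gene_id):
    """Build an 'agEG' name from the trailing number of an Ensembl gene ID,
    dropping the leading zeros (ID format is an assumption)."""
    match = re.search(r"(\d+)$", ensembl_gene_id)
    if match is None:
        raise ValueError("no trailing gene number in %r" % ensembl_gene_id)
    return "agEG" + match.group(1).lstrip("0")

def pseudoobscura_gene_name(melanogaster_cg):
    """Prefix the D. melanogaster CG identifier with 'Dpse\\'."""
    if not re.fullmatch(r"CG\d+", melanogaster_cg):
        raise ValueError("expected an identifier such as CG9603")
    return "Dpse\\" + melanogaster_cg

# 'ENSANGG00000012345' is a made-up identifier used only for illustration.
mosquito_name = anopheles_gene_name("ENSANGG00000012345")
fly_name = pseudoobscura_gene_name("CG9603")
```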
The D. pseudoobscura OXPHOS genes investigated here were assigned a chromosomal location where possible, using the putative chromosomal assignments available at BCM [13] for the majority of the large D. pseudoobscura contigs. We also utilized the Ensembl mosquito genome server [36] to identify and visualize the chromosomal location of the A. gambiae annotated OXPHOS DNA sequences.
The D. melanogaster EST database, available from the National Center for Biotechnology Information (NCBI) contains ESTs from cDNA libraries obtained from different developmental stages and body parts. The relative abundance of the transcripts of duplicate or triplicate D. melanogaster OXPHOS genes was defined by counting their cognate ESTs in non-normalized cDNA libraries generated by the Berkeley Drosophila Genome Project (BDGP) [43] from embryos (LD), larvae/pupae (LP), and adult ovary (GM), head (GH) and testes (AT), and also the ESTs from adult testes generated at the National Institute of Diabetes and Digestive and Kidney Diseases (NIDDK) [63]. ESTs from BDGP normalized EST