A genetic code alteration generates a proteome of high diversity in the human pathogen Candida albicans
Genome Biology volume 8, Article number: R206 (2007)
Genetic code alterations have been reported in mitochondrial, prokaryotic, and eukaryotic cytoplasmic translation systems, but their evolution and how organisms cope and survive such dramatic genetic events are not understood.
Here we used an unusual decoding of leucine CUG codons as serine in the main human fungal pathogen Candida albicans to elucidate the global impact of genetic code alterations on the proteome. We show that C. albicans decodes CUG codons ambiguously and tolerates partial reversion of their identity from serine back to leucine on a genome-wide scale.
Such codon ambiguity expands the proteome of this human pathogen exponentially and is used to generate important phenotypic diversity. This study highlights novel features of C. albicans biology and unanticipated roles for codon ambiguity in the evolution of the genetic code.
Since the elucidation of the genetic code in the 1960s, 24 alterations in codon identity have been recorded in prokaryotic and eukaryotic translation systems. These alterations involve redefinition of the identity of both sense and nonsense codons, as well as codon unassignment (codons that have vanished from genomes). Furthermore, artificial expansion of the genetic code to incorporate non-natural amino acids [2–4] and natural incorporation of selenocysteine (Sec; 21st amino acid) and pyrrolysine (22nd amino acid) have also been reported [5, 6]. Sec is incorporated in both prokaryotic and eukaryotic selenoproteins through reprogramming of UGA stop codons by novel translation elongation factors (selenoprotein translation factor B in prokaryotes; elongation factor [EF]-Sec and selenium-binding protein 2 in eukaryotes), a new tRNA (tRNASec), and a Sec mRNA insertion element. L-pyrrolysine insertion occurs in the archaeon Methanosarcina barkeri through reprogramming of the UAG stop codon by a pyrrolysine insertion sequence in the methylamine methyltransferase mRNA. The flexibility of the genetic code is further exemplified by the absence of glutamine and asparagine aminoacyl-tRNA synthetases in several mitochondria and archaeal and bacterial species. In those particular cases, aminoacylation of tRNAGln and tRNAAsn is accomplished by an ATP-dependent transamidation reaction on mis-charged Glu-tRNAGln and Asp-tRNAAsn [9–11]. Methanococcus jannaschii, Methanopyrus kandleri, and Methanothermobacter thermoautotrophicus all lack canonical cysteinyl-tRNA synthetases and charge tRNACys with the intermediate substrate O-phosphoserine (Sep), using the enzyme Sep-tRNA synthetase. Sep-tRNACys is then converted to Cys-tRNACys by Sep-tRNA:Cys-tRNA synthetase.
The unusual decoding properties described above reflect evolutionary steps in the development of the genetic code. They support the co-evolutionary theory of organization of the primordial genetic code and demonstrate that most of the alterations and expansions are mediated by structural changes in the protein synthesis machinery, in particular in tRNAs, aminoacyl-tRNA synthetases, EFs and termination factors. However, these data per se do not provide insight into the evolutionary forces that drive codon identity redefinition, nor do they help in evaluating the impact of genetic code alterations on proteome and genome stability, gene expression, adaptation, and ultimately evolution of new phenotypes.
In order to shed new light on the above questions, we chose the human pathogen Candida albicans as a well-studied model system [15–18]. C. albicans and other Candida spp. have a unique genetic code because of the change in the identity of the leucine CUG codon to serine, which evolved through an ambiguous codon decoding mechanism that affected approximately 30,000 CUG codons in more than 50% of the genes. Because serine is polar and leucine hydrophobic, the change in identity of CUG codons across all of the open reading frames (ORFeome) must have caused major proteome disruption. This raises an important question of how the Candida ancestor managed to survive such a dramatic genetic event. Here, we deployed direct protein mass spectrometry analysis to shed new light on this important biologic issue. We show that the CUG codon is decoded as both serine and leucine in vivo and that C. albicans tolerates up to 28.1% of leucine mis-incorporation at CUG positions, which represents a 28,000-fold increase in decoding error. This dramatically increased the number of different proteins encoded by the 6,438 C. albicans genes and resulted in extensive and unanticipated phenotypic variability. The data provide new insight into the evolution of the genetic code and C. albicans biology, and demonstrate that alterations in genetic code are dynamic molecular processes of unexpected relevance to phenotypic diversity.
Identity of the C. albicans CUG codon in vivo
The genetic code alteration in Candida is the only known case of a sense-to-sense codon identity redefinition in eukaryotes. The other cases deal with redefinition of stop codons, for instance UAR to glutamine in various ciliates and green algae, UGA to cysteine in Euplotes spp., and UAG to glutamate in various peritrich species.
In Candida, the alteration in identity of the CUG codon evolved over 272 ± 25 million years through an ambiguous codon decoding mechanism [17, 19]. It arose from competition of a mutant tRNACAGSer with wild-type tRNACAGLeu and from leucine mischarging of the former tRNA [19–21]. Because the novel C. albicans tRNACAGSer has identity elements for both seryl-tRNA synthetases and leucyl-tRNA synthetases (LeuRSs) and can still be mischarged in vitro with leucine, we investigated whether CUG codons could remain ambiguous in vivo. For this purpose, a reporter protein for monitoring ambiguous CUG decoding, containing an amino-terminal CUG cassette, was constructed based on the C. albicans PGK (phosphoglycerate kinase) protein (Figure 1a). The protein was then expressed in C. albicans CAI-4 cells using a C. albicans shuttle vector (pUA63; Additional data file 1 [Figure S1A]), purified to near homogeneity (Figure 1a), and in-gel digested with enterokinase and thrombin. The resulting peptides were identified and quantified using high-pressure liquid chromatography (HPLC) and tandem mass spectrometry (Figure 2).
In order to determine whether the HPLC-mass spectrometry methodology used was adequate to quantify leucine mis-incorporation at the CUG codon, synthetic peptides of identical amino acid sequence were used (see Materials and methods, below). Furthermore, amino acid mis-incorporation at near-cognate codons was monitored to ensure that leucine mis-incorporation at the CUG position could be detected above background noise. Near-cognate misreading is the most frequent mistranslation error because it involves misreading at the wobble position by near-cognate tRNAs. This error has been monitored in yeast in vivo and is on the order of 0.001%. Because the aspartate GAU and lysine AAA codons encoded by the reporter peptide (Figure 1a) could be misread by near-cognate tRNAGlu and tRNAAsn, respectively, the mass of the aberrant peptides containing glutamate at the aspartate-GAU position or asparagine at the lysine-AAA position was determined (Figure 2a). The peptides resulting from correct serine incorporation and leucine mis-incorporation at the CUG position were clearly visible in the mass spectrum (Figure 2b,c), whereas the peptides containing serine at the CUG position plus glutamate at the aspartate-GAU position or serine at CUG plus asparagine at the lysine-AAA position were not detected (Figure 2d,e). This confirmed that our methodology was robust for accurate quantification of mistranslation of the C. albicans serine CUG codon as leucine.
The levels of leucine mis-incorporation at the CUG codons were then quantified and were 2.96% in C. albicans white cells grown at 30°C, 3.9% at 37°C, 4.03% in the presence of hydrogen peroxide (H2O2), and 4.95% at pH 4.0 (Figure 3a,b). These values represent between 2,960-fold and 4,950-fold increases in mistranslation (relative to a typical error of 10⁻⁵) and imply that the tRNACAGSer is charged in vivo with both serine and leucine and that the mischarged leu-tRNACAGSer is neither edited by the LeuRS nor discriminated by translation elongation factor 1A.
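The fold increases quoted above follow from dividing each measured mis-incorporation level by the typical background error of about 0.001% (10⁻⁵). The short Python sketch below reproduces that arithmetic; the names `BACKGROUND_ERROR`, `fold_increase`, and `FOLD` are illustrative and do not come from the original study.

```python
# Typical background mistranslation error quoted in the text (0.001% = 1e-5).
BACKGROUND_ERROR = 1e-5

# Leucine mis-incorporation at CUG (percent) reported for each growth condition.
MISINCORPORATION_PERCENT = {
    "30C": 2.96,
    "37C": 3.9,
    "H2O2": 4.03,
    "pH 4.0": 4.95,
}


def fold_increase(percent, background=BACKGROUND_ERROR):
    """Return the mis-incorporation level relative to the background error."""
    return (percent / 100.0) / background


FOLD = {cond: fold_increase(p) for cond, p in MISINCORPORATION_PERCENT.items()}

if __name__ == "__main__":
    for cond, fold in FOLD.items():
        print(f"{cond}: about {fold:,.0f}-fold above background")
```

The same function applied to 28.1% gives roughly 28,000-fold, the figure quoted for the engineered strain.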
The unexpected CUG mistranslation in wild-type cells prompted us to investigate whether the identity of the CUG codon could be reverted to leucine or whether CUG ambiguity could be tolerated at higher levels. For this, a Saccharomyces cerevisiae gene encoding a mutant tRNACAGLeu, which decodes CUG codons as leucine by standard Watson-Crick base pairing, was inserted into plasmid pUA63, which already contained the CUG-reporter protein gene, producing plasmid pUA65 (Additional data file 1 [Figure S1B]). The pUA65 plasmid was then transformed into C. albicans CAI-4 cells. Because the recombinant tRNACAGLeu was expected to decode CUG codons as leucine, higher levels of leucine incorporation were expected at the CUG codon position in the reporter protein. This protein was purified by nickel affinity chromatography and CUG ambiguity was quantified by HPLC-mass spectrometry, as above. Surprisingly, the levels of leucine and serine incorporated in response to the CUG codon in the PGK reporter were 28.1% and 71.9%, respectively (Figure 3c,d). Remarkably, however, this dramatic increase in decoding error (28,000-fold) did not significantly decrease growth rate (data not shown).
Double identity of the CUG codon expands the C. albicans proteome
The discoveries that C. albicans tolerates up to 28.1% of leucine mis-incorporation (Figure 3c,d) and that wild-type cells mis-incorporate leucine at 3% to 5% under standard and mild stress conditions (Figure 3a,b) raised the intriguing issue of proteome complexity in C. albicans. In other words, how many different proteins can be generated from the 6,438 C. albicans genes? To address this important question, we conducted a detailed survey of the global distribution of CUGs in the C. albicans genome. There are 13,074 CUG codons in the haploid genome of C. albicans, distributed over 66% of its genes, at a frequency of 1 to 38 CUGs per gene (Figure 4a), with an average of three CUGs per gene. A genome-wide codon-context survey did not identify any particular context bias for the CUG codon (see Additional data file 2), suggesting that leucine and serine are inserted randomly at CUG positions. Therefore, the total number of different proteins that can be generated from ambiguous CUG decoding is 2ⁿ (n = total number of CUGs per gene). This implies that the size (diversity) of the C. albicans proteome expands exponentially with the number of CUG codons per gene, and that the 6,438 protein-encoding genes of C. albicans have the potential to produce a staggering 2.8379 × 10¹¹ different proteins through CUG ambiguity (Figure 4b). In other words, each protein is represented by a mixture (array) of molecules containing leucine or serine at positions encoded by CUG codons. This is of profound biologic significance because it implies that each C. albicans cell has a unique combination of proteins.
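The 2ⁿ relationship can be sketched in a few lines of Python. The sketch assumes that the proteome-wide total is obtained by summing 2ⁿ over genes, which is one reading of the text; it fits the observation that 2³⁸ alone is about 2.75 × 10¹¹, close to the reported total. The gene list `toy_counts` is hypothetical and is not taken from the C. albicans genome.

```python
def variants_per_gene(n_cug):
    """Distinct Ser/Leu polypeptides for a gene with n_cug CUG codons (2**n)."""
    return 2 ** n_cug


def proteome_size(cug_counts):
    """Sum of variants over all genes; a gene without CUGs contributes 1."""
    return sum(variants_per_gene(n) for n in cug_counts)


# Hypothetical toy genome: CUG counts for five genes (illustration only).
toy_counts = [0, 1, 3, 3, 38]

if __name__ == "__main__":
    print(f"variants for a 3-CUG gene:  {variants_per_gene(3)}")
    print(f"variants for a 38-CUG gene: {variants_per_gene(38):.4e}")
    print(f"toy proteome size:          {proteome_size(toy_counts):.4e}")
```

Because the count doubles with every additional CUG, the few genes with the most CUG codons dominate the total.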
An important characteristic of the C. albicans proteome is that small differences in leucine mis-incorporation have large effects on proteome expansion and diversity. This effect results from the binomial probability of one gene with n CUG codons having i leucines incorporated at these CUG positions (see Materials and methods, below). To illustrate this, we calculated the probability of synthesis of different proteins for number of leucines 0, 1, 2, and 3; for genes containing three CUGs; and for ambiguity levels of 2.96% (cells grown at 30°C), 3.9% (cells grown at 37°C), 4.95% (cells grown at pH 4.0), 4.03% (cells grown in presence of H2O2), and 28.1% (pUA65 cells; Figure 4c). Indeed, the probabilities of such a protein to contain one leucine in cells grown at 30°C, 37°C, pH 4.0 and H2O2 are 8.36%, 10.8%, 13.4% and 11.1%, respectively. In engineered highly ambiguous cells (28.1% leucine mis-incorporation), 43% of the proteins contain at least one leucine at one of the CUG positions (Figure 4c).
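The binomial calculation described above can be checked with a short Python sketch. It assumes independent incorporation at each CUG position, as the text suggests; the function name `prob_leucines` is illustrative.

```python
from math import comb


def prob_leucines(n_cug, i_leu, p_leu):
    """Binomial probability that a protein from a gene with n_cug CUG codons
    carries exactly i_leu leucines, given leucine mis-incorporation p_leu."""
    return comb(n_cug, i_leu) * p_leu ** i_leu * (1 - p_leu) ** (n_cug - i_leu)


# Ambiguity levels reported in the text (as fractions).
LEVELS = {"30C": 0.0296, "37C": 0.039, "pH 4.0": 0.0495,
          "H2O2": 0.0403, "pUA65": 0.281}

if __name__ == "__main__":
    for cond, p in LEVELS.items():
        probs = [100 * prob_leucines(3, i, p) for i in range(4)]
        print(cond, " ".join(f"P({i})={v:.2f}%" for i, v in enumerate(probs)))
```

For a gene with three CUGs this gives about 8.36%, 10.8%, 13.4%, and 11.1% for exactly one leucine at 30°C, 37°C, pH 4.0, and H2O2, respectively. At 28.1% ambiguity the same formula gives about 43.6% for exactly one leucine and about 37.2% for none.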
We also calculated the direct impact of ambiguous CUG decoding on expansion of the C. albicans proteome by taking advantage of the 'codon adaptation index' (CAI; Figure 5a-d). In S. cerevisiae, the 10% of the proteins with the highest CAI values are represented by 50,000 molecules/cell, whereas the 10% of the proteins with the lowest CAI values are represented by 5,000 molecules/cell. Because S. cerevisiae and C. albicans are close relatives, we used these values as reference for protein expression levels in the latter. For this, the global distribution of CAI values was calculated for C. albicans (Figure 5a). In C. albicans, CAI values had a broader distribution toward higher values, indicating that its genes often use a small subset of codons to optimize gene expression. We then assumed the following: all C. albicans genes are expressed; the abundance of proteins is 5,000 molecules/cell for the 10% of genes with lowest CAI values; the abundance of proteins is 50,000 molecules/cell for the 10% of genes with highest CAI values; and the abundance of proteins is 20,000 molecules/cell for the remaining 80% of genes. This permitted estimation of the number of different protein molecules that could be present within a C. albicans cell according to their level of expression. On the basis of CAI distribution for C. albicans (Figure 5a,b), we estimated that for CUG mis-translation levels of 2.9% and 28.1% the 6,438 C. albicans genes will produce 6 × 10⁶ and 40 × 10⁶ proteins, respectively (Figure 4d).
The proteome analysis was extended one step further to compare the impact of CUG ambiguity in abundant and rare proteins. CDC3 and RAD17 genes, whose CAI values (0.69 and 0.448, respectively) are at the high and low extremes of the distribution of CAI values for C. albicans (Figure 5a,b), were chosen for this analysis. Ambiguous CUG decoding had a stronger impact on CDC3 than on RAD17, indicating that highly expressed proteins encoded by genes with high CAI values are affected the most. Indeed, for 2.9% ambiguity, Rad17p is represented by 4,569 wild-type and 429 novel polypeptides (8.58%), whereas Cdc3p is represented by 45,691 wild-type and 4,306 novel polypeptides (8.6%), containing a combination of one, two, or three leucines at the three CUG positions (Figures 6 and 7). Overall, approximately 10% of the proteins synthesized from mRNAs containing three CUG codons are novel. Interestingly, codon usage analysis showed that CUG codons are highly under-represented in 10% of C. albicans genes with the highest CAI values, but are used frequently in 10% of the genes with the lowest CAI values (Figure 5c,d). Furthermore, 83% of C. albicans genes with the highest CAI do not have CUG codons, whereas 81% of genes with the lowest CAI have at least one CUG. This is in sharp contrast to CUG usage in S. cerevisiae, in which only 56% of genes with highest CAI and 6% of genes with average CAI did not have CUGs.
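The wild-type and novel molecule counts for Rad17p and Cdc3p can be approximated as in the sketch below. It assumes three CUG codons per gene, abundances of 5,000 and 50,000 molecules/cell for the low-CAI and high-CAI protein, and the 30°C mis-incorporation level of 2.96%, which comes closest to the quoted figures. These inputs are inferred from the text, not taken from the authors' calculation.

```python
def expected_wild_type(molecules, n_cug, p_leu):
    """Expected molecules with serine at every CUG position."""
    return molecules * (1 - p_leu) ** n_cug


def expected_novel(molecules, n_cug, p_leu):
    """Expected molecules carrying at least one leucine at a CUG position."""
    return molecules - expected_wild_type(molecules, n_cug, p_leu)


P_LEU = 0.0296  # assumed: leucine mis-incorporation measured at 30C
N_CUG = 3       # assumed: three CUG codons in each gene

if __name__ == "__main__":
    for name, abundance in (("Rad17p", 5_000), ("Cdc3p", 50_000)):
        wt = expected_wild_type(abundance, N_CUG, P_LEU)
        novel = expected_novel(abundance, N_CUG, P_LEU)
        print(f"{name}: ~{wt:.0f} wild-type, ~{novel:.0f} novel "
              f"({100 * novel / abundance:.2f}%)")
```

Under these assumptions the novel fraction is the same for both proteins (about 8.6%); the absolute number of novel molecules is ten times higher for the abundant protein.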
Ambiguous CUG decoding generates phenotypic diversity
C. albicans cells grow on agar plates as white smooth or slightly wrinkled colonies (Figure 8a). They can acquire alternative morphologies at low frequency (10⁻⁴ to 10⁻¹) when they are exposed to both physical and chemical agents, namely serum, low pH, nutrient starvation, high temperature, and UV light. These morphologies range from smooth to various wrinkled forms, and result from induction of hypha development inside the colonies. Also, some strains are able to switch from the typical white form to an alternative form termed opaque. Opaque cells are larger, have different gene expression profiles, and are less virulent than white cells. They are also homozygotic for the mating locus (MTL; AA or αα) and are able to mate, while white cells are heterozygotic (A/α) and do not mate.
Ambiguous CUG decoding exposed hidden phenotypic diversity without any chemical or physical inducer. Indeed, a high percentage of the colonies of the pUA65 clone, expressing the S. cerevisiae leucine CUG decoding tRNACAGLeu, but not the cells transformed with plasmid pUA63 (lacking the S. cerevisiae tRNACAGLeu), exhibited highly variable morphologies characterized by formation of aerial hyphae and white-opaque sectoring (data not shown). To exclude possible secondary effects of the PGK reporter gene on the phenotypic variation observed, we constructed two new plasmids that lack the reporter gene, namely a plasmid containing the S. cerevisiae tRNACAGLeu gene only (pUA15) and a control plasmid that does not contain the heterologous tRNACAGLeu gene (pUA12; Additional data file 3 [Figures S3A,B]). Again, 88% of the colonies of the pUA15 clone, expressing the S. cerevisiae leucine tRNACAGLeu gene, exhibited highly variable morphologies characterized by formation of aerial hyphae and white-opaque sectoring (Figure 8b,c). Colonies of pUA12 clones (control plasmid) did not show this phenotypic variability and were similar to untransformed CAI-4 cells (Figure 8a). Approximately 40% of the pUA15 clones produced hyphae that penetrated deeply into agar, and 40% to 50% (depending on the clone) produced opaque sectors that frequently occupied 20% or more of the colony. In some colonies the entire surface was covered with long aerial hyphae (Figure 8b) and cells from these colonies formed very long filaments and flocculated when grown in liquid media (data not shown), suggesting that they were highly hydrophobic. Cells from colonies with alternative morphologies also exhibited strong morphologic variability. Each colony was composed of a mixture of yeast-like cells, pseudohyphae, and hyphal cells in various proportions, depending on the clone (Figure 9a-e).
Large cells and ovoid-elongated cells were often observed, suggesting that these colonies contained a mixture of opaque and white cells (Figure 9b-e).
Considering that increased CUG ambiguity induced extensive morphologic variation and that C. albicans plasmids lack a centromere and are inherently unstable, we tested whether random integration of the pUA15 plasmid in the C. albicans genome could be responsible for the phenotypes observed. For this, we selected clones that could rapidly lose the pUA12 or pUA15 plasmids (nonintegrated plasmids) using minimal medium containing uridine plus 5-fluoro-orotic acid (5-FOA) . Because clones that maintained the plasmids (pUA12 or pUA15) would die in presence of 5-FOA as a result of expression of their URA3 selective marker gene, we were able to confirm whether plasmid loss would result in disappearance of the phenotypic diversity observed. Indeed, CAI-4 untransformed as well as pUA12 and pUA15 transformed cells that grew in 5-FOA (lost the plasmid) did not exhibit morphologic variation (Additional data file 4 [Figures S4A-D]). To ensure further that the above-mentioned spurious plasmid integrations did not affect phenotypic variability through eventual disruption of one of the copies of the endogenous serine tRNACAGSer gene, we checked the integrity of this gene by PCR amplification of its locus. No disruption was observed in the clones tested (Additional data file 5 [Figures S5A-C]). Finally, the high level of white-opaque switching prompted us to verify the conformation of the mating locus of our C. albicans CAI-4 strain. Because only homozygotic MTLAA or MTLαα cells can switch from the white to the opaque phenotype [29, 30], we checked whether the original strain was MTL homozygotic. For this, the OBPα and MTLA1 genes were amplified by PCR. Untransformed CAI-4 cells or cells transformed with the pUA12 control plasmid were heterozygotic MTLAα, but two pUA15 clones tested were homozygotic MTLαα (Additional data file 6 [Figures S6A,B]). 
These findings, plus the inability of the pUA12 plasmid to induce phenotypic variation, confirmed that CUG ambiguity is an authentic generator of phenotypic diversity in C. albicans.
We attempted to isolate colonies that could maintain homogeneous morphologies by removing cells from sectors of pUA15 clones and re-plating them on fresh agar (Figure 8c). However, there was always high reversion and switching between different morphologies. This was in accordance with the statistical nature of the C. albicans proteome and it is likely that the main role of the dual identity of the tRNACAGSer is to generate phenotypic diversity. It raises the hypothesis that CUG ambiguity created by this unique tRNA may increase adaptation potential and allow C. albicans to escape the immune system by continuously rearranging its surface antigens.
Implications for the evolution of the genetic code
Genetic code alterations pose unanswered questions about the mechanisms by which they evolve, and their potential selective advantage and physiologic acceptability. We chose the Candida genetic code change as a molecular and cellular model to elucidate those questions. This and previous studies [17, 31–33] strongly support the hypothesis that genetic code alterations evolved through ambiguous codon decoding mechanisms [16, 34].
Ambiguous CUG decoding in C. albicans, which results from mis-charging of the tRNACAGSer, proved interesting from a structural perspective, because it is not yet clear how this novel tRNA is recognized by the LeuRS and why this enzyme fails to edit the mischarged leu-tRNACAGSer. Archaeal and most eukaryotic LeuRSs recognize the long variable arm of cognate tRNALeu, whereas the yeast LeuRS makes direct contact with the methyl group of m1G37 and with A35 in the anticodon loop and nonspecific contacts with the phosphate backbone of the anticodon stem [21, 36]. Like canonical tRNALeu, tRNACAGSer contains A35 and m1G37 in its anticodon loop. However, the discriminator base is G73 (as in other tRNASer) and not A73 (as in tRNALeu), which should prevent its recognition by the C. albicans LeuRS. This is of particular relevance because changing A73 to G73 in both yeast and human tRNALeu [37, 38] changes its identity from leucine to serine. In the Pyrococcus horikoshii LeuRS-tRNALeu complex, A73 is recognized by amino acid residue 504 of the editing domain and the interaction is disrupted when A73 is replaced by G73. It is possible that the C. albicans LeuRS evolved a novel mechanism for recognizing both G and A at position 73. Regarding the failure of LeuRS to edit mis-charged leu-tRNACAGSer, the LeuRS binds its cognate amino acid (leucine), activates it (as normal), and transfers it to the tRNACAGSer (see above). In other words, both leucine and tRNACAGSer are cognate substrates for the LeuRS and consequently the post-transfer editing mechanism is not activated. This is supported by the high degree of amino acid conservation between LeuRS of C. albicans and those of other yeasts, particularly within the editing domain. Functionally, the S. cerevisiae CDC60 (LeuRS) gene could also be complemented by its C. albicans homolog.
Implications of CUG ambiguity for C. albicans biology
C. albicans is a diploid polymorphic commensal opportunist that causes infection in immune compromised hosts. Morphologic variation, growth at high temperature, yeast-hypha transition, proteinase and lipase secretion, and various adhesins all play important roles in infection [40–42]. The phenotypic diversity induced by CUG ambiguity was unanticipated, but it is not yet clear whether it is relevant to pathogenesis. To clarify this important new question, novel reporter systems for monitoring CUG ambiguity in vivo during infection will have to be developed. Nevertheless, the phenotypic diversity generated by CUG ambiguity also suggests that genetic code ambiguity has a strong impact on C. albicans gene expression, which may in part explain the morphologic diversity observed (see below). However, the multiplicity of forms of C. albicans pUA15 cells in liquid and agar cultures complicates quantitative analysis of the link between CUG ambiguity and phenotypic diversity because of differences in gene expression between cells present in the same culture. The exponential increase in the size of the C. albicans proteome may ultimately be the main factor contributing to morphologic variation (see below). However, one cannot exclude the hypothesis that CUG ambiguity may activate a master regulator or signalling pathway that regulates morphogenesis in C. albicans. This should be clarified by stabilizing some of the morphologies (Figure 8b,c) and comparing the gene expression profiles of each morphotype with that of control cells.
The most remarkable consequence of CUG ambiguity is the exponential expansion of the C. albicans proteome. This is of profound biologic significance because arrays of proteins are generated from single mRNAs creating a statistical proteome. It implies that C. albicans proteins are quasi-species  and that the probability of finding two identical cells in a population is extremely small. It also implies that the C. albicans proteome is unstable, and it will be most interesting to determine whether such instability affects genome stability because the latter is notoriously unstable in this human pathogen [44, 45]. Our data leave no doubt that important proteome diversity can be generated by small increases in CUG decoding ambiguity. We have found slight increases in CUG ambiguity under stress, in particular at low pH (4.95%), suggesting that the relative activity of the LeuRS increases under stress (Figure 3b). At this point it is not clear how this is achieved, but in S. cerevisiae the LeuRS is processed by yscY endopeptidase, which cleaves and inactivates it . Also, the two alleles of the C. albicans CaCDC60 gene (LeuRS) are under control of divergent promoters (data not shown), suggesting that LeuRS expression and activity may be modulated by transcriptional and post-transcriptional regulatory mechanisms.
Genetic code ambiguity as a generator of phenotypic diversity
In yeast, codon ambiguity successfully induces the stress response and increases tolerance to high temperature, lethal doses of heavy metals, and drugs . In an earlier described case, inactivation of the heat shock protein (Hsp)90 molecular chaperone in Drosophila melanogaster and Arabidopsis thaliana allowed expression of polymorphic proteins that are involved in cell signalling pathways and generated phenotypic diversity [47–50]. In S. cerevisiae and C. albicans, Hsp90 plays a critical role in drug resistance by maintaining mutant drug resistance genes in a functional state . In another example, proteome disruption created by generalized stop codon read-through of genes and pseudogenes, induced by the yeast [PSI] prion , resulted in morphologic variation and in a combinatorial response to an array of carbon and nitrogen sources and toxic concentrations of metals, salts, and drugs [50, 53]. All three cases - Hsp90 inhibition, [PSI] prion induction, and genetic code ambiguity - have similar destabilizing impacts on the proteome (they all lead to large scale synthesis/accumulation of aberrant proteins) and increase phenotypic variation. Recent studies showed that mRNA mistranslation in multicellular organisms is associated with disease [54, 55]. However, our data clearly indicate that the negative effect of codon ambiguity on the proteome may, under certain physiologic conditions, be overcome by its capacity to generate novel adaptive traits, at least in unicellular organisms.
Recent reports on the introduction of non-natural amino acids into the genetic code confirm the hypothesis that organisms are highly tolerant to genetic code changes and readily adapt to genetic code ambiguity [32, 56–59]. Our study strongly suggests that genetic code ambiguity generates unanticipated proteome expansion and advantageous phenotypes. This supports the hypothesis that earlier expansion of the genetic code, from a small number of amino acids existent in primordial life forms to the 22 encoded by extant organisms, could have been driven by selection through codon ambiguity. This is compatible with the co-evolutionary theory of the genetic code, which postulates that gradual establishment of amino acid biosynthetic pathways permitted gradual incorporation of new amino acids into the code through a mechanism of donation of codons belonging to pre-existing amino acids [13, 60]. The statistical proteome and phenotypic changes described herein for C. albicans support the hypothesis that gradual codon identity changes will inevitably block lateral gene transfer and create genetic barriers that may result in evolution of new species. This is confirmed by the inability to express heterologous genes in C. albicans. If this hypothesis is valid, then the Candida genus should have arisen as a direct consequence of this genetic code alteration, thus illustrating how ambiguous expansion of the genetic code could have played a critical role in the evolution of the primordial life forms, whereas general mRNA mistranslation is de facto a generator of phenotypic diversity.
Materials and methods
Strains and growth conditions
Escherichia coli strain JM109 (recA1 supE44 endA1 hsdR17 gyrA96 relA1 thi Δ[lac-proAB] F'[traD36 proAB-lacI lacZ ΔM15]) was used as a host for all DNA manipulations. C. albicans CAI-4 (ura3Δ::imm434/ura3::imm434) was grown at 30°C in YEPD (2% glucose, 1% yeast extract, and 1% peptone). Transformed C. albicans CAI-4 was grown in minimal medium lacking uridine (MM-uri; 0.67% yeast nitrogen base without amino acids, 2% glucose, 2% agar and 100 μg/ml of the required amino acids). Growth under suboptimal conditions was performed in MM-uri at 37°C, or at 30°C in MM-uri supplemented with either 50 mmol/l citrate buffer (pH 4.0) or 1.5 mmol/l H2O2. Opaque cells were grown at 25°C.
Plasmid construction and transformation
The C. albicans plasmids used in this study were based on the stable double ARS pRM1 vector described by Pla and coworkers, with the following modifications. A multi-cloning site was inserted (NruI/EcoRV) into that plasmid to construct plasmid pUA12. For heterologous expression of the S. cerevisiae tRNACAG gene in C. albicans CAI-4, a genomic DNA fragment containing the wild-type S. cerevisiae tRNAGAGLeu gene (90 base pairs [bp]) was cloned into the ApaI/AvaIII cloning sites of the pUA12 plasmid. Upstream of this gene, a 250 bp fragment of the 5' flanking C. albicans Ser-tRNACAG gene was also inserted at the XhoI/ApaI cloning sites, yielding the plasmid pUA15. The S. cerevisiae tRNAGAGLeu gene was then altered by site-directed mutagenesis to change its near-cognate anticodon 5'-GAG-3' to the cognate anticodon 5'-CAG-3' for the CUG codon.
The reporter system was constructed on the basis of the C. albicans CaPGK1 gene and was assembled into pSL1190 in three cloning steps. First, the promoter and the amino-terminal sequence, encoding the first 69 amino acids of CaPGK1, was amplified with the forward primer 5'-ATTAGGAAGCTTAGTGTTGCGTGTGTGTCAG-3' and the reverse primer 5'-TTATCCCTCGAGACCGTTTGGTCTACCCAAG-3', and inserted at the HindIII and XhoI restriction sites of pSL1190. Second, a cassette containing the CUG codon and the sequence encoding both protease cleavage sites, along with XhoI and SacII restriction sites, was inserted into the tail of the forward primer 5'-ACTAGACCGCGGGATTATAAAGATGATGATGATAAGAACGACAAATACTCATTAGC-3', which hybridized with CaPGK1. The reverse primer 5'-ATTAGATCGCGATTAGTGATGGTGATGGTGATGGTTTTTGTTGGAAAGAGCAAC-3' had a six-histidine tail to aid protein purification by nickel affinity chromatography. This second fragment was cloned into the pSL1190 plasmid containing the first fragment at the XhoI and NruI restriction sites. Finally, the 3'-untranslated region sequence of CaeEF1-α was amplified with the forward primer 5'-CTCAACTCGCGAGCTAGTTGAATATTATGTAAGATCTG-3' and the reverse primer 5'-AATTTTCTGCAGCCTTTTGGTGTACGAGAG-3', and cloned into the NruI and PstI restriction sites of the plasmid from above. Once assembled in pSL1190, the whole reporter gene was subcloned into the HindIII and PstI restriction sites of both pUA12 and pUA15. This yielded plasmids pUA63 and pUA65, respectively, which were used to determine CUG decoding ambiguity in C. albicans. DNA amplifications were carried out using a Mastercycler gradient (Eppendorf) and standard PCR protocols, and all the cloning was done as described by Sambrook and coworkers. Transformation of E. coli was carried out as described by Sambrook and coworkers, and C. albicans CAI-4 transformation was performed by the spheroplast method, as described in the .
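The anticodon change described above can be illustrated with a small Python sketch that derives the codon read by strict Watson-Crick pairing from a 5'-to-3' anticodon. Wobble pairing at the third codon position is deliberately ignored, and the helper name `cognate_codon` is illustrative.

```python
# RNA base complements for strict Watson-Crick pairing (no wobble).
_COMPLEMENT = str.maketrans("ACGU", "UGCA")


def cognate_codon(anticodon):
    """Return the codon (5'->3') that pairs with a 5'->3' anticodon.

    Pairing is antiparallel, so the complement is reversed.
    """
    return anticodon.upper().translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    for anticodon in ("GAG", "CAG"):
        print(f"anticodon 5'-{anticodon}-3' pairs with "
              f"codon 5'-{cognate_codon(anticodon)}-3'")
```

Under strict pairing the wild-type GAG anticodon matches CUC, whereas the mutated CAG anticodon matches CUG, which is why the mutant tRNA is described as cognate for the CUG codon.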
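As a simple sanity check on the cloning strategy, the sketch below scans each primer for one of the restriction sites named in the text. The recognition sequences are the standard ones for these enzymes. The primer labels and the pairing of each primer with a single site are assumptions made for illustration.

```python
# Recognition sequences (5'->3') of the restriction enzymes named in the text.
SITES = {
    "HindIII": "AAGCTT",
    "XhoI": "CTCGAG",
    "SacII": "CCGCGG",
    "NruI": "TCGCGA",
    "PstI": "CTGCAG",
}

# Primers as printed in the text; labels and site pairings are illustrative.
PRIMERS = [
    ("promoter forward", "ATTAGGAAGCTTAGTGTTGCGTGTGTGTCAG", "HindIII"),
    ("promoter reverse", "TTATCCCTCGAGACCGTTTGGTCTACCCAAG", "XhoI"),
    ("cassette forward",
     "ACTAGACCGCGGGATTATAAAGATGATGATGATAAGAACGACAAATACTCATTAGC", "SacII"),
    ("cassette reverse",
     "ATTAGATCGCGATTAGTGATGGTGATGGTGATGGTTTTTGTTGGAAAGAGCAAC", "NruI"),
    ("3'UTR forward", "CTCAACTCGCGAGCTAGTTGAATATTATGTAAGATCTG", "NruI"),
    ("3'UTR reverse", "AATTTTCTGCAGCCTTTTGGTGTACGAGAG", "PstI"),
]


def site_position(primer, enzyme):
    """Return the 0-based index of the enzyme site in the primer, or -1."""
    return primer.upper().replace(" ", "").find(SITES[enzyme])


REPORT = {name: site_position(seq, enz) for name, seq, enz in PRIMERS}

if __name__ == "__main__":
    for name, seq, enz in PRIMERS:
        print(f"{name}: {enz} site at index {REPORT[name]}")
```

In each primer the site sits directly after a six-nucleotide 5' extension, a common design that helps the enzyme cut close to the end of a PCR product.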
Protein purification and digestion
Cells from overnight cultures were collected by centrifugation and lysed in 100 mmol/l NaH2PO4, 10.0 mmol/l Tris-Cl (pH 8.0), 8.0 mol/l urea, 2.0 mmol/l PMSF and complete mini EDTA-free protease inhibitor cocktail (Roche, Basel, Switzerland), using glass beads and a BeadBeater (Biospec Products, Bartlesville, OK, USA), with 15 cycles of 1 minute of beating and 3 minutes of resting on ice. The His-tagged reporter protein was purified by Ni-NTA agarose chromatography, as described by the manufacturer (Qiagen, Hilden, Germany). After fractionation by SDS-PAGE, the band corresponding to the reporter protein was excised and digested in-gel, as described by Kussmann and Roepstorff, except that the proteases used were enterokinase and thrombin (Novagen-Merck, Darmstadt, Germany) and the cleavage buffer was 20 mmol/l Tris-Cl (pH 7.6), 0.15 mol/l NaCl, and 2.5 mmol/l CaCl2.
Mass spectrometry and data analysis
Mass spectra were collected using a Micromass Q-ToF Micro (Waters, Milford, MA, USA) equipped with a nanoelectrospray ion source coupled to a nanoflow HPLC system (CapLC; Micromass). Synthetic peptides with amino acid sequences identical to that of the CUG-reporter peptide were used as mass fingerprint controls in all experiments. The identity of the peptides was determined by tandem mass spectrometry analysis. The spectra were analyzed with MassLynx software version 4.0 from Micromass. Peaks corresponding to leucine-containing and serine-containing peptides of +3 and +2 charges with m/z of 508.56, 762.35, 499.88 and 749.32, respectively, were analyzed. The percentage of leucine incorporation at the CUG codon position was calculated as the fraction of the leucine peptide present in the mixture of both leucine and serine peptides. Three or four independent measurements were taken for quantification of leucine and serine incorporation at the CUG codon positions. An analysis of variance (ANOVA) of the data obtained was performed; when the null hypothesis of equal variances within groups of the ANOVA was rejected, the post-hoc Scheffe's test was used and the P values determined. To ensure that only the CUG codon was misread, the peaks corresponding to hypothetical peptides resulting from misreading of cognate codons by near-cognate tRNAs, namely of the aspartate-GAU codon as glutamate and the lysine-AAA codon as asparagine, were screened in the mass spectrum.
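The leucine incorporation calculation described above can be sketched as follows. This is a minimal illustration, not the authors' analysis script: it assumes that peak intensities for the +3 and +2 charge states are simply summed for each peptide (the text does not say how peaks were integrated), the intensity values in the example are made up, and the residue masses are standard monoisotopic values not given in the text. The sketch also checks that the spacing between the reported m/z values matches the leucine-serine mass difference.

```python
# Monoisotopic residue masses in Da (standard values, not from the text).
LEU_RESIDUE = 113.08406
SER_RESIDUE = 87.03203

# m/z values reported in the text, keyed by charge state.
LEU_MZ = {3: 508.56, 2: 762.35}
SER_MZ = {3: 499.88, 2: 749.32}


def leucine_percentage(leu_intensities, ser_intensities):
    """Leucine peptide signal as a percentage of total (Leu + Ser) signal."""
    leu = sum(leu_intensities)
    ser = sum(ser_intensities)
    return 100.0 * leu / (leu + ser)


def mass_shift(charge):
    """Neutral mass difference implied by the Leu/Ser peak pair at one charge."""
    return (LEU_MZ[charge] - SER_MZ[charge]) * charge


if __name__ == "__main__":
    # Hypothetical intensities for the +3 and +2 peaks of each peptide.
    print(leucine_percentage([30.0, 20.0], [600.0, 350.0]))
    for z in (3, 2):
        print(z, round(mass_shift(z), 2), round(LEU_RESIDUE - SER_RESIDUE, 2))
```

Both charge states give a mass shift of about 26 Da, as expected for a single serine-to-leucine substitution.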
Bioinformatics analysis of the genome and proteome
The C. albicans genome (assembly 19; haploid version), containing 6,438 annotated ORFs, was downloaded from the Candida Genome Database and analyzed with ANACONDA. This in-house built software package counted all codons present in the annotated ORFs. The probability of different proteins being generated from genes containing CUGs because of serine or leucine insertion at those CUG positions was calculated using the binomial distribution (b(i,n,P)):
b(i,n,P) = [n! / (i! (n - i)!)] × P^i × (1 - P)^(n - i)
Where n is the total number of CUG codons per gene, P is the probability of leucine incorporation at CUG positions for different percentages of ambiguity, and i is the number of CUGs decoded as leucine. (For example, for genes containing three CUGs, n = 3 and i = 0, 1, 2, or 3.) The total number of novel proteins in the proteome of C. albicans was estimated taking into consideration the studies of Ghaemmaghami and colleagues, who calculated the correlation between protein abundance and CAI and showed that protein abundance in yeast ranges from 50 up to more than 10^6 molecules per cell. We have assumed the following: all C. albicans genes are expressed; the abundance of proteins (N total) is 5,000 molecules/cell for the 10% of genes with the lowest CAI values; the N total is 50,000 molecules/cell for the 10% of genes with the highest CAI values; and the N total is 20,000 molecules/cell for the remaining 80% of genes. The number of novel proteins arising (N novel) for each gene was given by the following equation:
N novel = N total × (1 - b(0,n,P)); where b(0,n,P) is the probability of polypeptides having no leucine at CUG codons.
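The estimate above can be sketched in a few lines. This is a minimal illustration under stated assumptions: the function names b and n_novel are chosen here to mirror the text's notation, and the leucine incorporation probability P = 0.05 in the example is an arbitrary illustrative value, not a measured result.

```python
from math import comb  # requires Python 3.8 or later


def b(i, n, p):
    """Binomial probability that exactly i of n CUG codons are decoded as leucine."""
    return comb(n, i) * p**i * (1 - p) ** (n - i)


def n_novel(n_total, n_cug, p):
    """Molecules per cell carrying at least one leucine at a CUG position."""
    return n_total * (1 - b(0, n_cug, p))


if __name__ == "__main__":
    # Gene with three CUG codons in the 20,000 molecules/cell abundance class,
    # using an illustrative (assumed) leucine incorporation probability of 5%.
    p = 0.05
    for i in range(4):
        print(i, b(i, 3, p))
    print(n_novel(20_000, 3, p))
```

With these illustrative inputs, b(0,3,P) = 0.95^3, so about 14% of the molecules made from such a gene would carry at least one leucine at a CUG position.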
Phenotypic diversity analysis
C. albicans cells grown overnight at 30°C in MM-uri were serially diluted to 1,000 cells/ml. Approximately 50 cells were plated onto fresh agar plates and then allowed to grow at 30°C for 7 days in a humidified incubator to prevent drying of the agar surface. Sectored colonies exhibiting atypical morphology were scored and the data were analyzed for significance using ANOVA. Colonies were photographed using a Stemi 2000-C dissecting microscope equipped with AxioVision Software and an AxioCam HRc camera from Zeiss (Munich, Germany). Cells were photographed using a Zeiss MC80 Axioplan2 light microscope.
Additional data files
The following additional data are available with the online version of this paper. Additional data file 1 is a figure showing maps of the pUA63 and pUA65 plasmids that were used to quantify CUG decoding ambiguity in C. albicans. Additional data file 2 is a figure of CUG codon context in various yeast species, including C. albicans. Additional data file 3 is a figure of the maps of pUA12 and pUA15 plasmids that were used throughout the study. Additional data file 4 is a figure showing that elimination of the pUA15 vector in 5-FOA selective media results in disappearance of phenotypic diversity. Additional data file 5 is a figure showing that the pUA15 plasmid did not alter the tRNACAGSer locus. Additional data file 6 is a figure of the amplification of the MTL locus of CAI-4/pUA12 and CAI-4/pUA15 cells.
Abbreviations
ANOVA: analysis of variance
CAI: codon adaptation index
HPLC: high-pressure liquid chromatography
Hsp: heat shock protein
ORF: open reading frame
PCR: polymerase chain reaction
References
Knight RD, Freeland SJ, Landweber LF: Rewiring the keyboard: evolvability of the genetic code. Nat Rev Genet. 2001, 2: 49-58.
Anderson JC, Wu N, Santoro SW, Lakshman V, King DS, Schultz PG: An expanded genetic code with a functional quadruplet codon. Proc Natl Acad Sci USA. 2004, 101: 7566-7571.
Pastrnak M, Magliery TJ, Schultz PG: A new orthogonal suppressor tRNA/aminoacyl-tRNA synthetase pair for evolving an organism with an expanded genetic code. Helv Chim Acta. 2000, 83: 2277-2286.
Santoro SW, Anderson JC, Lakshman V, Schultz PG: An archaebacteria-derived glutamyl-tRNA synthetase and tRNA pair for unnatural amino acid mutagenesis of proteins in Escherichia coli. Nucleic Acids Res. 2003, 31: 6700-6709.
Zinoni F, Birkmann A, Leinfelder W, Bock A: Cotranslational insertion of selenocysteine into formate dehydrogenase from Escherichia coli directed by a UGA codon. Proc Natl Acad Sci USA. 1987, 84: 3156-3160.
Hao B, Gong W, Ferguson TK, James CM, Krzycki JA, Chan MK: A new UAG-encoded residue in the structure of a methanogen methyltransferase. Science. 2002, 296: 1462-1466.
Namy O, Rousset JP, Napthine S, Brierley I: Reprogrammed genetic decoding in cellular gene expression. Mol Cell. 2004, 13: 157-168.
Theobald-Dietrich A, Giege R, Rudinger-Thirion J: Evidence for the existence in mRNAs of a hairpin element responsible for ribosome dependent pyrrolysine insertion into proteins. Biochimie. 2005, 87: 813-817.
Curnow AW, Tumbula DL, Pelaschier JT, Min B, Soll D: Glutamyl-tRNA(Gln) amidotransferase in Deinococcus radiodurans may be confined to asparagine biosynthesis. Proc Natl Acad Sci USA. 1998, 95: 12838-12843.
Rogers KC, Soll D: Divergence of glutamate and glutamine aminoacylation pathways: providing the evolutionary rational for mischarging. J Mol Evol. 1995, 40: 476-481.
Tumbula-Hansen D, Feng L, Toogood H, Stetter KO, Soll D: Evolutionary divergence of the archaeal aspartyl-tRNA synthetases into discriminating and nondiscriminating forms. J Biol Chem. 2002, 277: 37184-37190.
Sauerwald A, Zhu W, Major TA, Roy H, Palioura S, Jahn D, Whitman WB, Yates JR, Ibba M, Soll D: RNA-dependent cysteine biosynthesis in archaea. Science. 2005, 307: 1969-1972.
Wong JTF: A co-evolution theory of the genetic code. Proc Natl Acad Sci USA. 1975, 72: 1909-1912.
Yokobori S, Suzuki T, Watanabe K: Genetic code variations in mitochondria: tRNA as a major determinant of genetic code plasticity. J Mol Evol. 2001, 53: 314-326.
Santos MAS, Keith G, Tuite MF: Non-standard translational events in Candida albicans mediated by an unusual seryl-tRNA with a 5'-CAG-3' (leucine) anticodon. EMBO J. 1993, 12: 607-616.
Santos MAS, Tuite MF: The CUG codon is decoded in vivo as serine and not leucine in Candida albicans. Nucleic Acids Res. 1995, 23: 1481-1486.
Santos MAS, Perreau VM, Tuite MF: Transfer RNA structural change is a key element in the reassignment of the CUG codon in Candida albicans. EMBO J. 1996, 15: 5060-5068.
Santos MAS, Ueda T, Watanabe K, Tuite MF: The non-standard genetic code of Candida spp.: an evolving genetic code or a novel mechanism for adaptation?. Mol Microbiol. 1997, 26: 423-431.
Massey SE, Moura G, Beltrao P, Almeida R, Garey JR, Tuite MF, Santos MA: Comparative evolutionary genomics unveils the molecular mechanism of reassignment of the CTG codon in Candida spp. Genome Res. 2003, 13: 544-557.
Sugiyama H, Ohkuma M, Masuda Y, Park SM, Ohta A, Takagi M: In vivo evidence for non-universal usage of the codon CUG in Candida maltosa. Yeast. 1995, 11: 43-52.
Suzuki T, Ueda T, Watanabe K: The 'polysemous' codon: a codon with multiple amino acid assignment caused by dual specificity of tRNA identity. EMBO J. 1997, 16: 1122-1134.
Kurland C, Gallant J: Errors of heterologous protein expression. Curr Opin Biotechnol. 1996, 7: 489-493.
Stansfield I, Jones KM, Herbert P, Lewendon A, Shaw WV, Tuite MF: Missense translation errors in Saccharomyces cerevisiae. J Mol Biol. 1998, 282: 13-24.
Ghaemmaghami S, Huh WK, Bower K, Howson RW, Belle A, Dephoure N, O'Shea EK, Weissman JS: Global analysis of protein expression in yeast. Nature. 2003, 425: 737-741.
Brown AJ: Morphogenetic signaling pathways in Candida albicans. Candida and Candidiasis. Edited by: Calderone R. 2002, Washington, DC: ASM Press, 95-106.
Soll DR: Phenotypic switching. Candida and Candidiasis. Edited by: Riachard AC. 2002, Washington, DC: ASM Press, 123-142.
Miller MG, Johnson AD: White-opaque switching in Candida albicans is controlled by mating-type locus homeodomain proteins and allows efficient mating. Cell. 2002, 110: 293-302.
Wellington M, Kabir MA, Rustchenko E: 5-fluoro-orotic acid induces chromosome alterations in genetically manipulated strains of Candida albicans. Mycologia. 2006, 98: 393-398.
Magee BB, Magee PT: Induction of mating in Candida albicans by construction of MTLa and MTLalpha strains. Science. 2000, 289: 310-313.
Lockhart SR, Pujol C, Daniels KJ, Miller MG, Johnson AD, Pfaller MA, Soll DR: In Candida albicans, white-opaque switchers are homozygous for mating type. Genetics. 2002, 162: 737-745.
Pezo V, Metzgar D, Hendrickson TL, Waas WF, Hazebrouck S, Doring V, Marliere P, Schimmel P, Crecy-Lagard V: Artificially ambiguous genetic code confers growth yield advantage. Proc Natl Acad Sci USA. 2004, 101: 8593-8597.
Bacher JM, Bull JJ, Ellington AD: Evolution of phage with chemically ambiguous proteomes. BMC Evol Biol. 2003, 3: 24.
Santos MAS, Cheesman C, Costa V, Moradas-Ferreira P, Tuite MF: Selective advantages created by codon ambiguity allowed for the evolution of an alternative genetic code in Candida spp. Mol Microbiol. 1999, 31: 937-947.
Schultz DW, Yarus M: Transfer RNA mutation and the malleability of the genetic code. J Mol Biol. 1994, 235: 1377-1380.
Fukunaga R, Yokoyama S: Aminoacylation complex structures of leucyl-tRNA synthetase and tRNALeu reveal two modes of discriminator-base recognition. Nat Struct Mol Biol. 2005, 12: 915-922.
Soma A, Kumagai R, Nishikawa K, Himeno H: The anticodon loop is a major identity determinant of Saccharomyces cerevisiae tRNA(Leu). J Mol Biol. 1996, 263: 707-714.
Breitschopf K, Gross HJ: The exchange of the discriminator base A73 for G is alone sufficient to convert human tRNA(Leu) into a serine-acceptor in vitro. EMBO J. 1994, 13: 3166-3167.
Breitschopf K, Achsel T, Busch K, Gross HJ: Identity elements of human tRNA(Leu): structural requirements for converting human tRNA(Ser) into a leucine acceptor in vitro. Nucleic Acids Res. 1995, 23: 3633-3637.
O'Sullivan JM, Mihr MJ, Santos MAS, Tuite MF: The Candida albicans gene encoding the cytoplasmic leucyl-tRNA synthetase: implications for the evolution of CUG codon reassignment. Gene. 2001, 275: 133-140.
Calderone RA, Fonzi WA: Virulence factors of Candida albicans. Trends Microbiol. 2001, 9: 327-335.
Cutler JE: Putative virulence factors of Candida albicans. Annu Rev Microbiol. 1991, 45: 187-218.
Berman J, Sudbery PE: Candida albicans: a molecular revolution built on lessons from budding yeast. Nat Rev Genet. 2002, 3: 918-930.
Freist W, Sternbach H, Pardowitz I, Cramer F: Accuracy of protein biosynthesis: quasi-species nature of proteins and possibility of error catastrophes. J Theor Biol. 1998, 193: 19-38.
Barton RC, Scherer S: Induced chromosome rearrangements and morphologic variation in Candida albicans. J Bacteriol. 1994, 176: 756-763.
Rustchenko E: Chromosome instability in Candida albicans. FEMS Yeast Res. 2007, 7: 2-11.
Larrinoa IF, Heredia CF: Yeast proteinase yscB inactivates the leucyl tRNA synthetase in extracts of Saccharomyces cerevisiae. Biochim Biophys Acta. 1991, 1073: 502-508.
Queitsch C, Sangster TA, Lindquist S: Hsp90 as a capacitor of phenotypic variation. Nature. 2002, 417: 618-624.
Rutherford SL, Lindquist S: Hsp90 as a capacitor for morphological evolution. Nature. 1998, 396: 336-342.
Sollars V, Lu X, Xiao L, Wang X, Garfinkel MD, Ruden DM: Evidence for an epigenetic mechanism by which Hsp90 acts as a capacitor for morphological evolution. Nat Genet. 2003, 33: 70-74.
True HL, Lindquist SL: A yeast prion provides a mechanism for genetic variation and phenotypic diversity. Nature. 2000, 407: 477-483.
Cowen LE, Lindquist S: Hsp90 potentiates the rapid evolution of new traits: drug resistance in diverse fungi. Science. 2005, 309: 2185-2189.
Tuite MF, Lindquist SL: Maintenance and inheritance of yeast prions. Trends Genet. 1996, 12: 467-471.
Wilson MA, Meaux S, Parker R, van Hoof A: Genetic interactions between [PSI+] and nonstop mRNA decay affect phenotypic variation. Proc Natl Acad Sci USA. 2005, 102: 10244-10249.
Nangle LA, Motta CM, Schimmel P: Global effects of mistranslation from an editing defect in mammalian cells. Chem Biol. 2006, 13: 1091-1100.
Lee JW, Beebe K, Nangle LA, Jang J, Longo-Guess CM, Cook SA, Davisson MT, Sundberg JP, Schimmel P, Ackerman SL: Editing-defective tRNA synthetase causes protein misfolding and neurodegeneration. Nature. 2006, 443: 50-55.
Bacher JM, Ellington AD: Selection and characterization of Escherichia coli variants capable of growth on an otherwise toxic tryptophan analogue. J Bacteriol. 2001, 183: 5414-5425.
Balashov S, Humayun MZ: Mistranslation induced by streptomycin provokes a RecABC/RuvABC-dependent mutator phenotype in Escherichia coli cells. J Mol Biol. 2002, 315: 513-527.
Ren L, Rahman MS, Humayun MZ: Escherichia coli cells exposed to streptomycin display a mutator phenotype. J Bacteriol. 1999, 181: 1043-1044.
Slupska MM, Baikalov C, Lloyd R, Miller JH: Mutator tRNAs are encoded by the Escherichia coli mutator genes mutA and mutC: a novel pathway for mutagenesis. Proc Natl Acad Sci USA. 1996, 93: 4380-4385.
Di Giulio M: Genetic code origin: are the pathways of type Glu-tRNA(Gln) --> Gln-tRNA(Gln) molecular fossils or not?. J Mol Evol. 2002, 55: 616-622.
Pla J, Perez-Diaz RM, Navarro-Garcia F, Sanchez M, Nombela C: Cloning of the Candida albicans HIS1 gene by direct complementation of a C. albicans histidine auxotroph using an improved double-ARS shuttle vector. Gene. 1995, 165: 115-120.
Sambrook J, Fritsch EF, Maniatis T: Molecular Cloning: a Laboratory Manual. 1989, Cold Spring Harbor, NY: Cold Spring Harbor Laboratory Press
Invitrogen: Manual for Preparation and Transformation of Pichia pastoris Spheroplasts, version A. 2002, San Diego, CA: Invitrogen, [http://www.invitrogen.com/content/sfs/manuals/pichspher_man.pdf]
Kussmann M, Roepstorff P: Sample preparation techniques for peptides and proteins analysis by MALDI-MS. Mass Spectrometry of Proteins and Peptides: Methods in Molecular Biology. 2000, New Jersey: Humana Press, 146: 405-424.
d'Enfert C, Goyard S, Rodriguez-Arnaveilhe S, Frangeul L, Jones L, Tekaia F, Bader O, Albrecht A, Castillo L, Dominguez A, et al: CandidaDB: a genome database for Candida albicans pathogenomics. Nucleic Acids Res. 2005, 33 (Database issue): D353-D357.
Moura G, Pinheiro M, Silva R, Miranda I, Afreixo V, Dias G, Freitas A, Oliveira JL, Santos MA: Comparative context analysis of codon pairs on an ORFeome scale. Genome Biol. 2005, 6: R28.
Sharp PM, Li WH: The codon adaptation index: a measure of directional synonymous codon usage bias, and its potential applications. Nucleic Acids Res. 1987, 15: 1281-1295.
Acknowledgements
We are most grateful to Mick F Tuite for his useful comments and critical reading of the manuscript, to Jorge Rino for helping with the light microscopy studies, to Alexander Johnson for providing the C. albicans CAI-4 strain, and to Concha Gil for the pRM1 plasmid. This study was supported by FCT/FEDER projects REF: POCI/BIA-MIC/55466/04, POCI/BIA-PRO/55472/2004, and POCI/SAU-MMO/55476/2004. IM, RR and ACG are supported by FCT/FEDER, BD/19807/99, BD/8296/2002, SFRH/BD/15233/2004 PhD grants, respectively. MASS was supported by an EMBO YIP and a Human Frontier Science Programme Grant (REF: RGP45/2005). BT and AA are supported by Wellcome Trust and EP Abraham Research Fund (Oxford).
Authors' contributions
ACG, IM, and GRM carried out experimental work. RMS and GRM contributed to data discussion. AK and BT helped with mass spectrometry analysis. MASS wrote the manuscript, supervised the study, and contributed to the experimental design.
Cite this article
Gomes, A.C., Miranda, I., Silva, R.M. et al. A genetic code alteration generates a proteome of high diversity in the human pathogen Candida albicans. Genome Biol 8, R206 (2007) doi:10.1186/gb-2007-8-10-r206