Genomic and proteomic adaptations to growth at high temperature
© BioMed Central Ltd 2004
Published: 30 September 2004
Most positively selected mutations cause changes in metabolism, resulting in a better-adapted phenotype. But as well as acting on the information content of genes, natural selection may also act directly on nucleic acid and protein molecules. We review the evidence for direct temperature-dependent natural selection acting on genomes, transcriptomes and proteomes.
Variations in environmental temperature represent an obvious and easily quantifiable form of environmental heterogeneity. Biologists have long been aware of a host of behavioral, morphological and physiological adaptations to this environmental variable. Recently, the accumulation of genomic data has led to an interest in another type of temperature adaptation. Specifically, we would like to know whether the genomes themselves - along with their encoded proteomes - are subject to predictable, temperature-dependent patterns of molecular evolution.
While variations in environmental temperature share many of the characteristics of other environmental variables, temperature is special because of its pervasiveness: it can penetrate physical barriers and can have dramatic effects on the structure of virtually all macromolecules. And given that temperature variation affects all levels of biological adaptation, we see adaptive responses at all of these levels. For instance, variations in environmental temperature can be used to explain the evolution of biological phenomena as diverse as the migration patterns of birds, on the one hand, or the density of hydrogen bonds in a nucleic acid sequence, on the other.
A number of recent studies (discussed in more detail below) have shown other sequence differences between mesophiles and thermophiles, such as the increased level of purine bases in the coding strands of thermophiles [4, 8, 13, 14]. While these effects can be detected at the DNA level, and may be due to the effects of natural selection, they reflect selection for RNA stability rather than direct selection on DNA.
The transcriptome includes both the structural RNAs (such as ribosomal and transfer RNAs, rRNAs and tRNAs) and the protein-encoding messenger RNAs. One could argue that these molecules, especially the structural RNAs, would be subject to the same temperature-dependent constraints as DNA. Of course, given that the expected correlation between G+C content of genomic DNA and growth temperature is not seen, we might expect that the correlation would also be lacking at the RNA level. But, interestingly, this is not the case. For instance, Galtier and Lobry demonstrated that there is a significant correlation between the G+C content of structural RNAs and growth temperature, and that the high G+C content was concentrated in the double-stranded stem regions of the molecule. This provides strong evidence for selection acting to increase the thermostability of these regions by changing the nucleotide composition. Indeed, this enrichment of G and C is so striking that structural RNA genes virtually identify themselves within the genomes of hyperthermophiles whose DNA is otherwise AT-rich.
The effects of natural selection are not limited to the double-stranded regions of these RNAs, however: selection is also acting to reduce the G+C content of the single-stranded regions of rRNA molecules, thus maintaining them in the single-stranded state. An obvious question that comes to mind is why we observe the expected correlation between nucleotide content and growth temperature in the paired regions of an RNA molecule, but not in double-stranded DNA. One possible answer is that single mutations affecting nucleotide composition have a much greater effect on the stability of the stem regions of an RNA molecule than they do on double-stranded genomic DNA, simply because the length of the paired region is much shorter in the RNA molecule.
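The contrast between paired and unpaired regions can be made concrete with a small calculation. The following is a minimal sketch, not the method used in the studies cited above: it assumes the secondary structure is supplied in dot-bracket notation (parentheses for paired positions, dots for unpaired ones), and the function name `gc_by_pairing` and the toy hairpin are hypothetical.

```python
def gc_by_pairing(seq, structure):
    """Return (G+C fraction of paired bases, G+C fraction of unpaired bases).

    `structure` is assumed to be dot-bracket notation of the same length
    as `seq`: '(' and ')' mark paired positions, '.' marks unpaired ones.
    """
    if len(seq) != len(structure):
        raise ValueError("sequence and structure must have the same length")

    seq = seq.upper()
    paired = [b for b, s in zip(seq, structure) if s in "()"]
    unpaired = [b for b, s in zip(seq, structure) if s == "."]

    def gc_fraction(bases):
        # NaN signals that the region is empty rather than G+C-free.
        if not bases:
            return float("nan")
        return sum(b in "GC" for b in bases) / len(bases)

    return gc_fraction(paired), gc_fraction(unpaired)


# Toy hairpin (hypothetical data): a G+C-rich stem closing an A+U-rich loop.
toy_seq = "GGCGCAAUAAGCGCC"
toy_structure = "(((((.....)))))"
stem_gc, loop_gc = gc_by_pairing(toy_seq, toy_structure)
print(stem_gc, loop_gc)
```

Applied to real rRNA or tRNA sequences with known structures, the two fractions could be compared against growth temperature separately, which is the comparison the paragraph above describes.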
In contrast to structural RNAs, the critical feature of the protein-coding messenger RNAs is not their secondary structure but their coding capacity. Thus we might not a priori expect to see strong selection for structural stability in these molecules. While it is true that a given, specific secondary structure may not be important for mRNAs, stability per se is critically important, because it affects the steady-state level of the genetic message within the cell. There is now growing evidence [8, 13, 14, 16–18] that all single-stranded RNA molecules, along with the single-stranded segments of structural RNAs, show characteristic patterns of nucleotide composition in all organisms. Specifically, they are relatively rich in purines, particularly adenine [13, 14, 16]. Moreover, the degree of purine-richness correlates with environmental growth temperature. The initial interpretation of these trends was that they acted to prevent purine-pyrimidine base pairing between coding sequences. Such base pairing would be prevented by having a preponderance of one type of base - either purines or pyrimidines - on the coding strand. Subsequent studies [4, 8, 13] indicate, however, that the selection is specifically for purines.
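Purine-richness of a coding strand is simply the fraction of its bases that are adenine or guanine. As a minimal sketch (the function name `purine_fraction` and the example sequence are hypothetical, and ambiguous bases such as N are assumed to be ignored):

```python
def purine_fraction(seq):
    """Fraction of unambiguous bases in `seq` that are purines (A or G)."""
    bases = [b for b in seq.upper() if b in "ACGTU"]  # skip N and other ambiguity codes
    if not bases:
        raise ValueError("sequence contains no unambiguous bases")
    return sum(b in "AG" for b in bases) / len(bases)


# Hypothetical coding-strand fragment; a value above 0.5 indicates purine loading.
example = "ATGAAAGAGGCT"
print(purine_fraction(example))  # 9 purines out of 12 bases
```

Computed over all coding sequences of a genome, this value could then be plotted against optimal growth temperature to look for the trend described above.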
Although different synonymous codons may encode a single amino acid, there has been considerable interest in the possibility that some codons are functionally 'preferred'. The idea of preferred codons stems from the work of Ikemura, who showed a positive correlation between the frequency of particular codons and the abundance of their cognate tRNAs. Over the past two decades, many genomic studies have attempted to detect clear evidence for selection acting on synonymous codons, but it now appears that the major determinant of synonymous codon usage on a genome-wide scale is mutational bias rather than selection [10, 20–22]. Despite the dominant effect of nucleotide composition, recent genomic surveys have shown that environmental growth temperature can have an important secondary effect on patterns of synonymous codon usage [8, 23, 24]. Although there is no obvious explanation for why particular codons are used preferentially among thermophiles, the fact that the pattern is repeated within different evolutionary lineages provides strong support for the idea that it is based on natural selection.
Given that the thermolability of protein structures - like that of nucleic acid structures - can easily be demonstrated in the laboratory, and since protein function depends on protein structure, we expect the proteins of thermophilic organisms to have been subjected to intense natural selection for stability at high temperature. It is, however, difficult to predict the precise outcome of such selection because the forces governing protein structure and function are not yet well understood. Many comparisons of individual protein sequences between mesophiles and thermophiles have been reported in the recent literature. Although several of these studies point to differences between thermophilic proteins and their mesophilic homologs, different studies have tended to identify different aspects of protein sequence and structure as contributing to thermostability. The attraction of studying entire proteomes is that we can hope to identify the more 'universal' adaptations underlying protein stability at high temperature. But, as pointed out by Petsko, the problem with such genome-wide studies is that they may only discover some of the lowest common denominators for thermal adaptation at the protein level.
Most of the proteome-based studies to date have focused on the average amino-acid composition of proteins in the proteomes of mesophiles and thermophiles. If we consider that protein structure is determined to a large extent by the primary amino-acid sequences, then we can look for consistent differences in amino-acid composition between the proteins of thermophiles and mesophiles. Such differences have been reported for individual genes and in whole-genome comparisons [8, 27–29]. These studies show that while the average amino-acid composition of a given proteome is dramatically affected by the underlying patterns of genomic nucleotide bias [6, 9], there is a secondary but highly significant effect of growth temperature. One study found a significant effect of nucleotide bias, but did not reveal any selection on the amino-acid content of thermophilic proteins. By limiting the analysis to a subset of genomes with comparable nucleotide compositions, we showed that the major effect of thermophily at the proteome level was a significant reduction in the frequency of the thermolabile amino acids histidine, glutamine and threonine. This is consistent with the recent observation of increased evolutionary constraint on thermophilic proteomes. The concomitant increase, among thermophiles, of both positively charged residues (arginine and lysine) and negatively charged residues (glutamic acid) suggests that ionic bonds between oppositely charged residues may help to stabilize multimeric proteins at high temperature. The proteomes of thermophiles also contain a larger fraction of proteins with isoelectric points in the basic range, and a general bias in favor of charged rather than polar residues among thermophiles has been noted in two separate studies [32, 33].
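A proteome-wide composition comparison of this kind reduces to counting residues across all proteins and comparing class frequencies. The sketch below is illustrative only: the residue classes (charged = D, E, K, R; polar uncharged = N, Q, S, T) are an assumption made here for the example and may not match the definitions used in the cited studies, and the function names and toy sequences are hypothetical.

```python
from collections import Counter

# Assumed residue classes for illustration; the cited studies may define them differently.
CHARGED = "DEKR"
POLAR_UNCHARGED = "NQST"


def composition(proteins):
    """Return amino-acid frequencies pooled over all sequences in `proteins`."""
    counts = Counter()
    for protein in proteins:
        counts.update(protein.upper())
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no residues to count")
    return {aa: n / total for aa, n in counts.items()}


def charged_minus_polar(proteins):
    """Difference between the pooled frequencies of charged and polar residues."""
    freqs = composition(proteins)
    charged = sum(freqs.get(aa, 0.0) for aa in CHARGED)
    polar = sum(freqs.get(aa, 0.0) for aa in POLAR_UNCHARGED)
    return charged - polar


# Toy "proteome" of two short peptides (hypothetical data).
toy_proteome = ["KEKR", "DDNA"]
print(charged_minus_polar(toy_proteome))
```

Because nucleotide bias also shifts amino-acid composition, a comparison like this is only informative between genomes of similar base composition, as noted in the paragraph above.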
One of the genome-wide surveys also found support for the conclusions of previous pilot studies (based on one or a few genes) that there are average length differences between the proteins of mesophilic and thermophilic species [32, 34, 35]. Specifically, the proteins of thermophiles tend to be somewhat shorter than their mesophilic homologs. Finally, a number of recent structural genomics studies [36–39] support the sequence-based studies in that they point to an increase in intra-helical salt bridges and in hydrogen-bond formation among thermophiles. The increased number of salt bridges may contribute to protein stability at high temperature.
Most species can survive for short periods of time at temperatures that are significantly higher than their normal growth temperature. Such a pulse of increased temperature usually triggers the expression of heat-shock proteins that act as chaperones to facilitate protein stabilization and proper protein folding. Such protein chaperones do, in fact, also play a role in thermophiles. Furthermore, genome-sequence surveys have uncovered evidence for a novel, thermophile-specific set of molecular chaperones among highly thermophilic species. Thus, in addition to encoding more thermostable mRNAs and proteins, thermophilic organisms may devote more energy to the stabilization of those proteins at high temperature.
A significant complication in genomic surveys, although one that is often ignored, is that the average patterns seen in genomes and proteomes are not independent; for instance, the nucleotide composition of the genome can have a dramatic effect on the amino-acid composition of the encoded proteome [6, 43, 44]. Although most of the studies to date have looked at the effect of G+C content on protein composition, similar effects will result from other kinds of genomic biases [45, 46]. For instance, a genome whose coding regions are very rich in purines will necessarily encode a proteome that is deficient in phenylalanine residues, and a genome with pyrimidine-rich coding regions would correspondingly encode few lysines and glutamic acids. Thus, if the sequences on the coding strand are subject to selection for increased purine content because of increased mRNA stability, this selection at the level of RNA can result in a correlated change in the amino-acid content of the proteins, and even in deterministic changes in the biochemical properties of these proteins - the isoelectric point, for example. Many recent studies have discussed the possibility that mutational biases can mimic the effects of selection, but few authors seem aware of the problem where a selective effect at one level results in an apparent selective effect at another level.
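The link between coding-strand purine content and amino-acid composition follows directly from the genetic code, and can be checked by counting purines in the codons for each amino acid. This is a minimal sketch assuming the standard genetic code (written here in the conventional TCAG ordering); the names `CODON_TABLE` and `mean_purines_per_codon` are introduced only for this example.

```python
from collections import defaultdict
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def mean_purines_per_codon():
    """Average number of purines (A or G) per codon for each amino acid."""
    per_aa = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        per_aa[aa].append(sum(base in "AG" for base in codon))
    return {aa: sum(v) / len(v) for aa, v in per_aa.items()}


purines = mean_purines_per_codon()
# Phenylalanine (TTT, TTC) has no purines; lysine (AAA, AAG) and
# glutamic acid (GAA, GAG) are encoded entirely by purines.
for aa in "FKE":
    print(aa, purines[aa])
```

The same table shows why selection for purine-rich mRNA would be expected to raise lysine and glutamic acid frequencies, and with them the charge properties of the proteome, without any selection acting on the proteins themselves.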
Several authors have drawn parallels between thermophilic and mesophilic microbes on the one hand, and warm- and cold-blooded vertebrates on the other. In fact, a considerable amount of work has been done on the correlation of differences in genomic G+C content with the body temperature of animals. Although at first glance there does appear to be a convincing correlation between elevated genomic G+C content (especially in isochore regions) and homeothermy, these results are subject to alternative explanations. For instance, the higher G+C content in certain regions of mammalian genomes may be due to elevated recombination rates in those regions [56, 57]. It is also worth noting that the body temperature of mammals is well below 45°C, which is usually taken as the lower threshold for thermophily among prokaryotes.
In conclusion, given that temperature is a single, clearly defined environmental variable, one might expect to see a single, characteristic genomic and/or proteomic response to changes in this variable. We do see selective responses at the nucleic acid and protein levels, but they are varied and unpredictable. It is especially difficult to predict any significant differences above the level of primary sequence composition. A number of general trends have been identified in the sequence composition of DNA, RNA and proteins, but it has proved much more difficult to identify thermophilic responses at the higher levels of structural organization. This is particularly true of protein structure, partly because we do not yet have a good understanding of the rules governing protein folding, and partly because it now seems likely that different proteins may respond to selection for greater thermostability in distinctly different ways. Despite the obvious complexities of the issue, we can expect widespread continued study of temperature adaptation at the molecular level, especially in proteins, because the results are not only of great biological interest but also of commercial and practical interest - both in the discovery of new, naturally occurring 'thermozymes' and in the design of new custom thermozymes for industrial purposes [58–60].
The authors' research was supported by grants from NSERC Canada to DAH and from the Science Foundation Ireland to K.H. Wolfe, supervisor to GACS.