- Open Access
Developmental stage related patterns of codon usage and genomic GC content: searching for evolutionary fingerprints with models of stem cell differentiation
Genome Biology volume 8, Article number: R35 (2007)
The usage of synonymous codons shows considerable variation among mammalian genes. How and why this usage is non-random are fundamental biological questions and remain controversial. It is also important to explore whether mammalian genes that are selectively expressed at different developmental stages bear different molecular features.
In two models of mouse stem cell differentiation, we established correlations between codon usage and the patterns of gene expression. We found that the optimal codons exhibited variation (AT- or GC-ending codons) in different cell types within the developmental hierarchy. We also found that genes that were enriched (developmental-pivotal genes) or specifically expressed (developmental-specific genes) at different developmental stages had different patterns of codon usage and local genomic GC (GCg) content. Moreover, at the same developmental stage, developmental-specific genes generally used more GC-ending codons and had higher GCg content compared with developmental-pivotal genes. Further analyses suggest that the model of translational selection might be consistent with the developmental stage-related patterns of codon usage, especially for the AT-ending optimal codons. In addition, our data show that after human-mouse divergence, the influence of selective constraints is still detectable.
Our findings suggest that developmental stage-related patterns of gene expression are correlated with codon usage (GC3) and GCg content in stem cell hierarchies. Moreover, this paper provides evidence for the influence of natural selection at synonymous sites in the mouse genome and novel clues for linking the molecular features of genes to their patterns of expression during mammalian ontogenesis.
Synonymous codons, which encode the same amino acid, are not used randomly. Such codon usage biases are explained as the balance between mutational drift and natural selection. In unicellular organisms [2–6] and invertebrate metazoans [7–11], the levels of gene expression can be used to explain codon biases. Specifically, highly expressed genes, compared with weakly expressed ones, selectively use 'optimal codons' that correspond to abundant tRNAs so as to improve their translational efficiency [11–15].
Nevertheless, in vertebrates, whose genes display more dramatic codon usage biases than those of simple organisms, the correlations between codon usage and patterns of gene expression (that is, the levels and breadth of gene expression) remain controversial [11, 16]. In a number of rodent and human tissues, recent studies have indicated positive correlations between levels of gene expression, as estimated by SAGE and/or microarray analysis, and GC3 (the GC content at the third codon position) [16–19]. However, these results contradict observations made by analyzing expressed sequence tags (ESTs) [11, 16]. Among extremely highly expressed genes, the H3 histone gene family is biased towards GC-ending codons. However, there is no difference in codon usage between ribosomal protein genes, which are also expressed at very high levels, and other genes. As for correlations between breadth of gene expression and codon usage, some studies suggest that housekeeping genes, with a wider breadth of expression, are biased towards GC-ending codons [18, 21–24] (also see the debate between  and ); however, other papers have described different observations [11, 26–29]. Although codon usage has been found to vary among human genes specifically expressed in six tissues, the effect is very weak and cannot be generalized to explain the global variation (the preference for AT-ending or GC-ending codons) of synonymous codons across thousands of mammalian genes.
Moreover, in vertebrates, the reasons for the correlations between codon usage and patterns of gene expression remain to be elucidated. Multivariate analyses (MVA) have shown that highly expressed genes use T-ending codons excessively in Xenopus and the Cyprinidae family. However, both natural selection and 'transcription-associated mutation bias' (TAMB) [34–36] could account for these observations. In tissues with no evidence of TAMB, a set of GC-ending codons favored in highly expressed genes has been suggested to be optimal codons. Moreover, GC-ending codons are more abundant in highly expressed genes and in constitutively spliced exons. However, if GC-ending codons are optimal because of selective advantages, it is difficult to see why the synonymous substitution rate (Ks) would increase with GC-ending codon usage [38–41] or why the Ks of alternatively spliced exons would be lower than that of constitutively spliced exons. It has been reported that highly expressed genes have higher recombination rates [43–45]. Moreover, according to the model of biased gene conversion (BGC), recombination rates are positively correlated with GC3 [46–51], indicating that both natural selection and BGC may be responsible for the correlations between the levels of gene expression and GC3. The variation in synonymous codon usage among tissue-specific genes has been suggested to be a consequence of translational selection; a recent study, however, has indicated that these observations were due to regional variation in substitution patterns rather than translational selection. Taken together, further research is still needed to clarify the mechanisms of vertebrate codon usage bias.
In this paper, to investigate the regularity and mechanisms of mammalian codon usage, we have taken developmental stage-related patterns of gene expression into account in models of stem cell differentiation (Figure 1 and Table 1). Stem cells, progenitor cells and their derivatives, defined by their distinct differentiation potential (Figure 1a), play critical roles in the early stages of metazoan ontogenesis and thus provide ideal models of the mammalian developmental hierarchy. Moreover, developmental processes are believed to be of critical importance to the investigation of evolutionary mechanisms, even at the genomic level. Specifically, we have taken advantage of two independent models of stem cell differentiation [54, 55] to identify developmental stage-related patterns of gene expression and to investigate their correlations with codon usage.
To define the developmental stage-related patterns of gene expression in models of stem cell differentiation, we introduced two parameters. First, the 'level of gene expression' is defined as the intensity of gene transcription in a particular cell type. Second, the 'fold change of gene expression' is defined as the ratio of the expression levels of the same gene in two cell types at neighboring stages of the developmental hierarchy (Figure 1b). Of these two cell types, we call the one higher in the developmental hierarchy the earlier cell type and the one lower in the hierarchy the later cell type; together they constitute a 'differentiation pair'. The 'fold change of gene expression' is thus a descriptive index of the level of gene enrichment in a given differentiation pair.
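The two parameters can be illustrated with a minimal Python sketch. It assumes, since the text does not say, that the fold change is the earlier-stage level divided by the later-stage level and that a level of 0.0 (or a missing entry) means the gene was not detected; the function name `fold_changes`, the gene identifiers and the numbers are hypothetical.

```python
from typing import Dict, Optional


def fold_changes(earlier: Dict[str, float],
                 later: Dict[str, float]) -> Dict[str, Optional[float]]:
    """Fold change of expression for every gene seen in a differentiation pair.

    Assumptions (hypothetical, not taken from the paper): the ratio is
    earlier / later, and a level of 0.0 (or absence) means 'not expressed'.
    Genes expressed in only one cell type get None, since no ratio exists.
    """
    result: Dict[str, Optional[float]] = {}
    for gene in earlier.keys() | later.keys():
        e_level = earlier.get(gene, 0.0)
        l_level = later.get(gene, 0.0)
        if e_level > 0 and l_level > 0:
            result[gene] = e_level / l_level
        else:
            result[gene] = None
    return result


# Toy levels for a hypothetical ESC/NSC pair (made-up numbers).
esc = {"geneA": 8.0, "geneB": 1.0, "geneC": 5.0}
nsc = {"geneA": 2.0, "geneB": 4.0, "geneD": 3.0}
for gene, fc in sorted(fold_changes(esc, nsc).items()):
    print(gene, fc)
# geneA 4.0
# geneB 0.25
# geneC None
# geneD None
```

Returning None rather than infinity or zero keeps genes expressed in only one cell type separate, which matches the later treatment of such genes as a distinct group.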
In the present work, we investigate the correlations between developmental stage-related patterns of gene expression (that is, the 'levels of gene expression' in each cell type in the models of stem cell differentiation and the 'fold changes of gene expression' in each differentiation pair) and the molecular features (GC3 and genomic GC (GCg) content) of these genes. We also explore possible mechanisms for these developmental stage-related patterns of codon usage. This study reveals that developmental stage-related patterns of gene expression are correlated with GC3 and GCg in models of stem cell differentiation. Moreover, these analyses suggest that translational selection, rather than the other hypotheses that have been put forward, is the most likely explanation for the developmental stage-related patterns of codon usage, especially for the negative correlations between the levels of gene expression and GC3.
'Levels of gene expression' are correlated with GC3 and GCg: variation of optimal codons within developmental hierarchies
First, we focused on the correlations between the levels of gene expression and GC3. We found significant negative correlations between the levels of gene expression and GC3 in eight cell types (P < 0.005; Table 2). Only in the lateral ventricles of the brain (LVB), which contain predominantly mature neural cells, were the levels of gene expression significantly positively correlated with GC3 (P < 0.005; Table 2). We next compared codon usage between 'highly expressed genes' and 'mid to lowly expressed genes', separated at the 0.67 quantile (Q0.67) of the levels of gene expression in each cell type. In the eight cell types in which the levels of gene expression were negatively correlated with GC3, highly expressed genes used significantly more AT-ending codons than mid to lowly expressed genes (P < 0.01; Table 2). In contrast, in LVB, highly expressed genes used more GC-ending codons than mid to lowly expressed genes (P < 0.05; Table 2). We define 'optimal codons' here as the codons preferentially used in highly expressed genes. Our observations therefore show that the optimal codons vary within the developmental hierarchies.
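The Q0.67 split and the GC3 comparison can be sketched as follows. This is a minimal illustration, assuming that GC3 is computed over every complete in-frame codon (the text does not say whether Met, Trp or stop codons were excluded) and that the cut-off is the 67th percentile of one cell type's expression levels, interpolated with Python's `statistics.quantiles`; the helper names `gc3`, `split_at_q067` and `mean_gc3` are hypothetical.

```python
import statistics
from typing import Dict, List, Tuple


def gc3(cds: str) -> float:
    """Fraction of G or C at the third position of each complete codon.

    Assumption: all in-frame codons are counted; the study may have excluded
    Met, Trp or stop codons, which this sketch does not do.
    """
    thirds = cds.upper()[2::3]  # index 3k+2 exists only for complete codons
    if not thirds:
        raise ValueError("need at least one complete codon")
    return sum(base in "GC" for base in thirds) / len(thirds)


def split_at_q067(levels: Dict[str, float]) -> Tuple[List[str], List[str]]:
    """Split genes into 'highly expressed' (above Q0.67) and 'mid to lowly expressed'."""
    # quantiles(n=100) returns the 1st..99th percentiles; index 66 is the 67th.
    cut = statistics.quantiles(levels.values(), n=100, method="inclusive")[66]
    high = [gene for gene, level in levels.items() if level > cut]
    rest = [gene for gene, level in levels.items() if level <= cut]
    return high, rest


def mean_gc3(genes: List[str], cds_by_gene: Dict[str, str]) -> float:
    """Average GC3 of a group of genes."""
    return statistics.mean(gc3(cds_by_gene[gene]) for gene in genes)


print(gc3("ATGGCCAAA"))  # third positions G, C, A -> 0.6666666666666666
```

In use, `mean_gc3(high, cds)` and `mean_gc3(rest, cds)` would be compared within each cell type; a lower value for the highly expressed group corresponds to the AT-ending optimal codons described above. The sketch does not reproduce the significance test.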
In accordance with the variation in GC3, GCg also differed significantly between highly expressed and mid to lowly expressed genes (P < 0.05) in each of the nine cell types in which the levels of gene expression were significantly correlated with GC3 (positively in LVB, negatively in the other eight; P < 0.005; Table 2). Consistent with earlier studies (for example, [14, 40]), GC3 and GCg were closely correlated in our dataset (Spearman rank correlation coefficient (Rs) = 0.665, N = 11,066; P < 10⁻⁶). We thus suggest that the variation in GCg between highly expressed and mid to lowly expressed genes might well be a consequence of this correlation.
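The Rs values reported throughout are Spearman rank correlation coefficients. A dependency-free sketch of the standard definition, the Pearson correlation of average ranks with ties sharing the mean of their ranks, is shown below; it returns only the coefficient, not the P values given in the text, and in practice a library routine such as SciPy's `scipy.stats.spearmanr` would normally be used instead. The function names are illustrative.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks, with tied values sharing the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's Rs: Pearson correlation of the average ranks of x and y."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    rx, ry = average_ranks(x), average_ranks(y)
    mx = sum(rx) / len(rx)
    my = sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    vx = sum((a - mx) ** 2 for a in rx)
    vy = sum((b - my) ** 2 for b in ry)
    return cov / (vx * vy) ** 0.5  # undefined if either input is constant


print(spearman([1, 2, 3, 4], [10, 20, 30, 40]))  # 1.0
```

Applied to per-gene GC3 and GCg values, this is the kind of calculation behind the Rs = 0.665 quoted above.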
'Fold changes of gene expression' are correlated with GC3 and GCg: genes specifically expressed in different developmental stages bear different molecular features
First, we examined correlations between the fold changes of gene expression and GC3 in 12 differentiation pairs for which there was experimental evidence of the differentiation processes (Figure 1b; also see Discussion). In 10 of the 12 differentiation pairs, the fold changes of gene expression were significantly correlated with GC3 (P < 0.005; Table 3). Strikingly, in the differentiation pairs neural stem cells (NSCs)/LVB and embryonic stem cells (ESCs)/hematopoietic stem cells (HSCs), as much as 14.3% (Rs = 0.378) and 11.4% (Rs = 0.338), respectively, of the variation in GC3 could be explained by the fold changes of gene expression.
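The percentages above are the squared Spearman coefficients (the coefficient of determination), which can be checked directly. Strictly speaking, R_s^2 measures the shared variance of the ranks rather than of the raw GC3 values.

```latex
R^2 = R_s^2:\qquad
0.378^2 = 0.1429 \approx 14.3\%, \qquad
0.338^2 = 0.1142 \approx 11.4\%
```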
We next investigated the variation of GC3 and GCg between genes enriched in the two cell types of each differentiation pair. When a gene is expressed in both cell types of a given differentiation pair, its 'fold change of gene expression' measures its level of enrichment in that pair. We therefore define a gene as developmental-pivotal in a differentiation pair if its fold change is greater than 2 or less than 0.5. In nine differentiation pairs, GC3 differed significantly between the developmental-pivotal genes enriched at the earlier and at the later developmental stage (P < 0.05; Table 3). GCg also differed significantly between these two groups of genes in seven differentiation pairs (P < 0.05), especially in ESC/NSC, NSC/LVB, ESC/HSC and ESC/fetal neural stem cells (FNSCs) (P < 0.001; Table 3).
Some genes are expressed only in the earlier or only in the later cell type of a differentiation pair and therefore cannot be described by a 'fold change of gene expression'. We define these genes as developmental-specific genes. Both GC3 and GCg differed between the developmental-specific genes of the earlier and later stages in seven differentiation pairs (P < 0.05; Table 3). In addition, at the same developmental stage, most groups of developmental-specific genes use more GC-ending codons and are located in genomic domains with higher GC content than developmental-pivotal genes (Table 3; Additional data file 1).
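The grouping of genes within one differentiation pair can be summarized as a small decision rule. The thresholds (greater than 2, less than 0.5, and 0.5 to 2 for non-developmental-pivotal genes) come from the text; the assumptions that the fold change is earlier / later and that a level of 0.0 means 'not expressed' are ours, as are the function name and group labels.

```python
def classify_gene(earlier: float, later: float) -> str:
    """Assign a gene to one group of a differentiation pair.

    Thresholds follow the text: fold change > 2 or < 0.5 marks a
    developmental-pivotal gene; 0.5 to 2 is non-developmental-pivotal;
    genes expressed in only one cell type are developmental-specific.
    Assumptions (hypothetical): fold change = earlier / later, and a
    level of 0.0 means the gene is not expressed in that cell type.
    """
    if earlier > 0 and later > 0:
        fc = earlier / later
        if fc > 2:
            return "pivotal-earlier"   # enriched in the earlier cell type
        if fc < 0.5:
            return "pivotal-later"     # enriched in the later cell type
        return "non-pivotal"
    if earlier > 0:
        return "specific-earlier"      # expressed only in the earlier cell type
    if later > 0:
        return "specific-later"        # expressed only in the later cell type
    return "not-expressed"


print(classify_gene(10.0, 2.0))  # pivotal-earlier
```

Boundary values (a fold change of exactly 2 or 0.5) fall into the non-pivotal group here, because the text uses strict inequalities for developmental-pivotal genes.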
Possible mechanisms of developmental stage-related codon usage: testing the hypotheses of BGC, TAMB and natural selection
We then attempted to investigate the mechanisms resulting in the patterns of developmental stage-related codon usage observed. In mammals, BGC, mutational bias, and natural selection have been suggested to account for the biased usage of synonymous codons [11, 40].
The BGC model predicts a positive correlation between GC content (including GC3) and recombination rates [46–50]. We observed that GC3 was positively correlated with recombination rates in our datasets (Rs = 0.14, N = 10,383, P < 10⁻⁶). Because we had established correlations between GC3 and the patterns of gene expression, we next examined the correlations between the patterns of gene expression and recombination rates to determine whether the developmental stage-related patterns of codon usage are byproducts of the BGC effect. No significant correlations between recombination rates and the levels of gene expression were observed (Rs ranging from -0.033 to 0.020, P > 0.10; Additional data file 2). The only exception was in fetal liver mature blood cells (FLMBCs; Rs = -0.043, P = 0.02), but this correlation was weaker than that between the levels of gene expression and GC3 in FLMBCs. The fold changes of gene expression were significantly correlated with recombination rates only in the differentiation pairs NSC/LVB and FLLCP/FLMBC (Rs = -0.083 and 0.062, respectively, P < 0.01; Additional data file 3), and these correlations were weaker than those between the fold changes of gene expression and GC3 in the same differentiation pairs (Table 3). In the other differentiation pairs, no significant correlations between the fold changes of gene expression and recombination rates were observed (Rs ranging from -0.045 to 0.034, P > 0.05; Additional data file 3). We also observed that the recombination rates of developmental-specific genes, despite their excessive usage of GC-ending codons, were not significantly higher than those of non-developmental-pivotal genes (genes whose fold changes of gene expression lie between 0.5 and 2) (data not shown). Taken together, our results suggest that the developmental stage-related patterns of codon usage are not byproducts of the BGC effect.
The model of mutational bias proposes that codon bias is simply due to unbalanced base substitutions [15, 56–60]. Transcription can increase the frequency of mutations from cytosine (C) to thymine (T) and from adenine (A) to guanine (G), because the single-stranded DNA exposed more often during transcription is more susceptible to deamination [34–36]. This TAMB model thus predicts a positive correlation between the levels of gene expression and T or G content. If TAMB were the only cause of the excessive usage of T-ending and G-ending codons in highly expressed genes, T3/G3 (T/G content at the third codon position) and Ti/Gi (T/G content in the untranslated region) should increase in parallel with the levels of gene expression. To evaluate the influence of TAMB, we used the slopes of Ni (nucleotide content in the untranslated regions) and N3 (nucleotide content at the third codon position) regressed against the levels of gene expression as descriptive indices of their rates of increase. Although G3 and Gi increased in parallel in LVB, T3 increased more rapidly (slopes ranging from 5.38 to 10.60) than Ti (slopes ranging from 1.86 to 5.03) in the other cell types in which the levels of gene expression were negatively correlated with GC3 (Additional data file 2). Moreover, the increase in C3 with the levels of gene expression in LVB cannot be attributed to TAMB, which predicts increases in T and G rather than C. Consequently, although these results cannot completely rule out a contribution from TAMB, they strongly suggest that factors other than TAMB are the primary cause of our observations.
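The slope comparison can be sketched as two ordinary least-squares fits of nucleotide content on expression level, one for third codon positions (N3) and one for the untranslated region (Ni). This is a minimal illustration under stated assumptions: contents are fractions rather than percentages, one untranslated-region sequence is supplied per gene, and the function names are hypothetical. The text does not give the units behind slopes such as 5.38 and 1.86, so the sketch's slopes are not on the same scale.

```python
from typing import Sequence, Tuple


def ols_slope(x: Sequence[float], y: Sequence[float]) -> float:
    """Ordinary least-squares slope of y regressed on x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences of length >= 2")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("x has no variance")
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx


def base_fraction(seq: str, base: str) -> float:
    """Fraction of positions in seq equal to base (case-insensitive)."""
    seq = seq.upper()
    return seq.count(base.upper()) / len(seq) if seq else 0.0


def tamb_slopes(levels: Sequence[float], cds_seqs: Sequence[str],
                utr_seqs: Sequence[str], base: str = "T") -> Tuple[float, float]:
    """Slopes of N3 (third codon positions) and Ni (untranslated region) on expression.

    Under TAMB alone the two slopes should be similar; an N3 slope much
    steeper than the Ni slope points to factors acting on codons.
    """
    n3 = [base_fraction(cds[2::3], base) for cds in cds_seqs]
    ni = [base_fraction(utr, base) for utr in utr_seqs]
    return ols_slope(levels, n3), ols_slope(levels, ni)


# Toy data (made up): T3 rises faster with expression than Ti.
slope_t3, slope_ti = tamb_slopes([1.0, 2.0, 3.0],
                                 ["AAAAAA", "AATAAA", "AATAAT"],
                                 ["AAAA", "AAAA", "AAAT"])
print(slope_t3, slope_ti)
```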
Natural selection can act on mammalian genes; for example, highly expressed genes are reported to have shorter [19, 61] and fewer introns, as well as to use cheaper amino acids (however, see ). Natural selection could also influence mammalian codon usage biases [62–68], for example at the levels of transcription [69, 70], RNA processing [71–73], translation [19, 62, 74, 75] and mRNA secondary structure, as well as at the protein level [77, 78]. If codons were selected to improve transcriptional efficiency, highly expressed genes would contain more GC-ending codons, because the conformation of GC-rich DNA would facilitate transcription [69, 70]. Selection for transcriptional efficiency is therefore unlikely to explain the excessive usage of AT-ending codons in highly expressed genes. If certain codons have selective advantages in translational efficiency over their synonyms, these codons would be used more frequently in highly expressed than in weakly expressed genes; the correlations between the levels of gene expression and codon usage are consistent with this hypothesis. Taken together, translational selection, rather than BGC or TAMB, is the more likely explanation for these findings, especially for the negative correlations between the levels of gene expression and GC3.
If the codon bias of highly expressed genes has been shaped by selective pressures, it is useful to ask whether these pressures remained effective after the human-mouse divergence. Assuming that mutation rates are nearly homogeneous across the mammalian genome, continued selective pressure would result in lower synonymous substitution rates (Ks) between human-mouse orthologous genes. Except in HSCs, bone marrow (BM) of model A and CD45 of model B, highly expressed genes had lower Ks than mid to lowly expressed genes in all other cell types (P < 0.05; Table 2). Previous studies have indicated that substitutions at nonsynonymous sites may indirectly affect silent substitution rates. We therefore removed the codons in which doublet substitutions occurred and recalculated synonymous substitution rates (Ks_noDS). In each of the 15 cell types at different developmental stages, highly expressed genes had lower Ks_noDS than mid to lowly expressed genes (P < 0.05; Table 2). We also found that the nonsynonymous substitution rates (Ka) and Ka/Ks of highly expressed genes were significantly lower than those of mid to lowly expressed genes (P < 0.01; Table 2).
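The doublet-substitution filter behind Ks_noDS can be sketched as a pass over an aligned pair of orthologous coding sequences that drops every codon differing at two or more positions. This is a minimal illustration, assuming an ungapped, in-frame alignment and treating triple differences the same way as doublets; the exact procedure of the cited method may differ. Ks would then be re-estimated on the filtered sequences with a standard estimator, which is not shown, and the function name is hypothetical.

```python
from typing import List, Tuple


def drop_doublet_codons(seq_a: str, seq_b: str) -> Tuple[str, str]:
    """Remove aligned codons that differ at two or more positions.

    Assumption: seq_a and seq_b are an ungapped, in-frame alignment of
    orthologous coding sequences (for example, human and mouse).
    """
    if len(seq_a) != len(seq_b) or len(seq_a) % 3:
        raise ValueError("expected two in-frame alignments of equal length")
    kept_a: List[str] = []
    kept_b: List[str] = []
    for i in range(0, len(seq_a), 3):
        codon_a, codon_b = seq_a[i:i + 3], seq_b[i:i + 3]
        differences = sum(x != y for x, y in zip(codon_a, codon_b))
        if differences < 2:  # keep codons with zero or one difference
            kept_a.append(codon_a)
            kept_b.append(codon_b)
    return "".join(kept_a), "".join(kept_b)


print(drop_doublet_codons("ATGCTGAAA", "ATGTTAAAG"))  # ('ATGAAA', 'ATGAAG')
```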
We next focused on the substitution rates of developmental-pivotal genes and developmental-specific genes. The developmental-pivotal genes of the earlier developmental stages in ESC/HSC and NSC/LVB had lower Ks and Ka/Ks than non-developmental-pivotal genes (P < 0.05; Table 3), and the developmental-pivotal genes of the earlier stage in ESC/HSC also had lower Ks_noDS after removal of doublet substitutions (P < 0.05; Table 3). These results suggest that negative selection after the human-mouse divergence may still be detectable in the codon usage of some groups of developmental-pivotal genes. Nevertheless, many groups of developmental-pivotal genes, as well as almost all groups of developmental-specific genes, have higher Ks, Ka/Ks and Ks_noDS than non-developmental-pivotal genes (Table 3).
The models of stem cell differentiation are precise descriptions of developmental hierarchies of mammalian ontogenesis
In this paper, to investigate developmental stage-related patterns of mammalian codon usage, we used two models of stem cell differentiation to define the developmental stage-related patterns of gene expression. We suggest that the patterns of gene expression defined in these models faithfully reflect developmental regulation. First, development, as a process of ontogenesis, can be divided into many stages according to the steps of cellular differentiation. In our models, distinct cell types within the processes of differentiation were isolated with high homogeneity by selective culture and fluorescence-activated cell sorting (FACS) (Table 1). For identifying the patterns of gene expression in early developmental stages, these isolation strategies seem more precise than earlier approaches that treated whole embryos as 'early developmental stages' [26, 81], because embryos are in fact a mixture of differentiated mature cells and undifferentiated stem cells. Second, in our models, the processes of stem cell differentiation (Figure 1b) were constructed according to published experimental evidence. The pluripotency of ESCs can be examined by injecting them into blastocysts to produce normal embryos [82–84]. ESCs are able to differentiate into multipotent stem cells (MSCs), including the MSCs of neural and hematopoietic tissues. Moreover, both FNSCs and adult NSCs are able to generate mature neural cells in vitro and in vivo, including neurons, astrocytes and oligodendrocytes. Furthermore, both fetal liver hematopoietic stem cells (FLHSCs) and bone marrow HSCs (or long-term hematopoietic stem cells (LTHSCs)) can functionally repopulate entire hematopoietic systems in recipients. In these repopulation processes, hematopoietic stem cells give rise to mature blood cells by generating lineage-committed progenitors (LCPs). Notably, in cell lineage tracing assays, FLHSCs have been observed to acquire the ability to generate LTHSCs directly during ontogenesis.
Developmental stage-related patterns of codon usage: methodological artifacts or byproducts of other correlations?
In this study, we observed that developmental stage-related patterns of gene expression (that is, the 'levels of gene expression' and the 'fold changes of gene expression') were correlated with GC3. We suggest that neither methodological bias of the microarray nor the correlations between gene length and GC3 substantially influence these observations. Methodological issues can affect the correlations between the levels of gene expression and codon usage: SAGE and microarray analysis risk overestimating the expression levels of genes with high GC content [11, 92]. Because this bias favors GC-rich genes, it cannot explain our observation of excessive usage of AT-ending codons in highly expressed genes; on the contrary, the true correlations between the levels of gene expression and AT-ending codon usage might be even stronger. Correlations between patterns of gene expression and gene length have also been reported in mammals [19, 62]; it was therefore necessary to determine whether the correlations between the patterns of gene expression and GC3 are byproducts of these correlations. We suggest that gene length does not substantially influence these observations because, in our datasets, the levels of gene expression were negatively correlated with the lengths of both transcripts (correlation coefficients ranging from -0.182 to -0.084, P < 10⁻⁶) and coding sequences (ranging from -0.172 to -0.084, P < 10⁻⁶) (Additional data file 2), whereas the levels of gene expression were negatively correlated with GC3 in most cases (Table 2). Gene length also does not substantially affect the correlations between the fold changes of gene expression and GC3: in nine of the ten differentiation pairs in which these correlations are significant (positive or negative), the correlations between the fold changes of gene expression and gene length were weaker than, or opposite in sign to, the correlations between the fold changes of gene expression and GC3 (Table 3; Additional data file 3).
Analyses of codon usage within developmental hierarchies: implications for understanding of evolutionary issues
Developmental processes are believed to be useful guides to the exploration of evolutionary mechanisms. One famous example is Haeckel's hypothesis that ontogeny may, to some extent, recapitulate phylogeny. Although we clearly cannot regard the early stages of mammalian development as simple organisms, in this paper models of stem cell differentiation covering early stages of mammalian ontogeny have yielded useful clues about evolutionary issues at the molecular level. Some of these clues, for instance the correlations between the levels of gene expression and codon usage, may help in understanding the codon usage biases that occur in simple organisms [2–11]. In addition, stem cells have been described as units of natural selection [95, 96] and as the origin of many types of cancer [97, 98]. These observations suggest that stem cells might play critical roles in evolutionary processes. We therefore suggest that considering patterns of gene expression in early stages of developmental hierarchies (that is, in stem cells and progenitor cells) might lead to a better understanding of mammalian codon usage biases.
AT-ending optimal codons in early developmental stages
In this paper, we found that optimal codons varied (AT-ending or GC-ending) among cell types within the developmental hierarchy. The 'optimal codons' are defined here as the codons that are excessively used in highly expressed genes. It has long been assumed that, in certain vertebrates, the optimal codons, if they exist, coincide with the major codons, which are used more frequently on average across all known transcripts of a species [16, 18, 19, 62]. Notably, our results show that in some circumstances, for example in certain mouse stem cells and progenitor cells in early stages of mammalian ontogeny, the optimal codons were AT-ending, whereas the mouse major codons are GC-ending (average GC3 content of mouse transcripts is 0.555, based on Ensembl build 26). This discrepancy may arise because the previous studies that identified GC-ending codons as optimal defined the levels of gene expression as average levels in whole tissues, or in whole organisms at embryonic or adult stages, which actually contain a mixture of cell types from different developmental stages [16, 18, 19, 62]. Such averages mainly reflect gene expression in mature cells and may not accurately characterize gene expression in early developmental stages, because stem cells and progenitor cells constitute only a negligible fraction of these tissues.
Previous reports have indicated correlations between GC content and the patterns of gene expression in both human and mouse [11, 16–18, 25, 27, 99, 100]. Specifically, mouse GC3 content is positively correlated with levels of gene expression in many tissues. The R² of these correlations (the coefficient of determination, which indicates how much of the variability in codon usage can be 'explained by' variation in the levels of gene expression) is as high as 2.6% (spleen) and 2.3%. In this work, we show that the R² of the negative correlations between mouse GC3 and the levels of gene expression can reach 2.8% (ESCs of model A), a value comparable with previous observations. Notably, by defining the 'fold change of gene expression' as a novel pattern of gene expression in the models of stem cell differentiation, we observed that the R² values of the correlations between GC3 and the fold changes of gene expression in NSC/LVB (R² = 14.3%), ESC/HSC (R² = 11.4%) and ESC/FNSC (R² = 5.7%) were higher than the R² of correlations between GC3 and other known patterns of gene expression tested in another mouse microarray dataset [16, 18]. In that dataset, the levels of gene expression were defined as the average levels in each of 45 tissues. We further tested whether taking early developmental stages into consideration could improve the predictability of codon usage from gene expression. Using MVA, we found that the levels of gene expression explained 16.0% (in 5 cell types of model A) and 15.5% (in 10 cell types of model B) of GC3 variation. These values are much higher than the 8.8% obtained from the average levels of gene expression in each of the 45 tissues. This difference suggests that the AT-ending optimal codons in the early developmental stages seem to be critical to understanding the regularity of codon usage.
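The text does not say which multivariate method produced the 16.0% and 15.5% figures. One plausible reading, sketched here with NumPy's `numpy.linalg.lstsq`, is a multiple linear regression of GC3 on the expression levels of several cell types, with R² as the explained fraction of GC3 variance. The function name and the synthetic data are hypothetical, and this is not presented as the authors' procedure.

```python
import numpy as np


def explained_fraction(expression: np.ndarray, gc3_values: np.ndarray) -> float:
    """R^2 of an OLS fit of GC3 on expression levels in several cell types.

    expression: array of shape (n_genes, n_cell_types)
    gc3_values: array of shape (n_genes,)
    An intercept column is added, so the result lies in [0, 1] up to rounding.
    """
    design = np.column_stack([np.ones(expression.shape[0]), expression])
    coef, *_ = np.linalg.lstsq(design, gc3_values, rcond=None)
    residuals = gc3_values - design @ coef
    centered = gc3_values - gc3_values.mean()
    return 1.0 - float(residuals @ residuals) / float(centered @ centered)


# Synthetic example: 200 genes, 5 cell types (made-up data, not the study's).
rng = np.random.default_rng(0)
expr = rng.normal(size=(200, 5))
gc3_sim = 0.55 - 0.02 * expr[:, 0] + rng.normal(scale=0.05, size=200)
print(f"{explained_fraction(expr, gc3_sim):.1%} of simulated GC3 variance explained")
```

Fitting all cell types jointly lets expression in each cell type contribute separately, which is why a joint R² can exceed the R² of any single-cell-type correlation.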
Possible explanations for the correlations between GC3 and the levels of gene expression
It has been suggested that translational selection cannot explain mammalian codon usage [14, 102]. Conversely, recent studies have presented evidence that translational selection might influence the synonymous sites of coding regions [19, 62, 74, 75]. These findings also agree with observations that synonymous changes can dramatically influence translational efficiency in mammalian cells [103–106]. In the present study, we tested the hypotheses of BGC, TAMB and natural selection (at the levels of transcription and translation) to analyze the possible mechanisms behind the developmental stage-related patterns of codon usage. Our results suggest that, of the hypotheses tested, natural selection at the translational level most probably accounts for the correlations between the levels of gene expression and GC3 in many cell types.
If the usage of synonymous codons correlates with translational efficiency, there might be selective pressure to choose the synonymous codon that matches the most abundant tRNA. In unicellular organisms and invertebrate metazoans, the optimal codons generally correspond to abundant tRNAs with high gene copy numbers [11–14, 80, 107]. In mammals, tRNA abundance is likewise assumed to correlate with tRNA gene copy number [19, 74]. Under this assumption, however, it is difficult to understand why optimal codons vary (AT-ending or GC-ending) within the same species, because tRNA gene copy numbers are the same in every cell type. Although the biological basis of the variation in optimal codons remains to be investigated, we hypothesize that one factor behind these pressures may be variation in the specific biochemical environment, for example developmental stage-related modification of tRNA molecules. Biochemical modification at the wobble position of tRNA molecules has been reported to regulate their codon recognition preferences [108–111]. For example, uridine modified by thiolation or 5-carboxymethylation prefers A over G at the third codon position. Moreover, developmental stage-related patterns of tRNA modification have been observed [113, 114]. Taken together, we suggest that the developmental stage-related variation of optimal codons might be correlated with developmental stage-related patterns of tRNA modification.
Possible explanations for the correlations between GC3 and the fold change of gene expression
In this paper, we defined the 'fold change of gene expression' as the ratio of the expression levels of the same gene in two cell types from neighboring stages of the developmental hierarchy. It is therefore not surprising that, in a given differentiation pair, the correlation between the fold change of gene expression and GC3 is related to the correlations between the levels of gene expression and GC3 in the two cell types. Moreover, if the correlations between the levels of gene expression and GC3 are a consequence of natural selection, the correlation between the fold change of gene expression and GC3 can be regarded as reflecting the difference between the selective pressures acting in the cell types at the earlier and later developmental stages. In the differentiation pairs ESC/NSC, NSC/LVB, ESC/HSC, ESC/FNSC, FLHSC/LTHSC and LCP/mature blood cells (MBCs), selective pressure towards AT-ending codons is much stronger in the cell type of the earlier developmental stage than in that of the later stage. Accordingly, genes enriched in the earlier cell types show greater usage of AT-ending codons than genes enriched in the later cell types. Similar results were obtained for short-term hematopoietic stem cells (STHSCs)/LCP. Consistent with this explanation, in ESC/FLHSC, where the selective pressures towards AT-ending codons are very similar in the earlier and later cell types, the patterns of codon usage do not differ significantly between the genes enriched at the earlier and later developmental stages (Table 3). However, in FLHSC/FLLCP, FLLCP/FLMBC and LTHSC/STHSC, where the selective pressures towards AT-ending codons are also very similar between the earlier and later cell types, the fold changes of gene expression were nonetheless significantly correlated with AT3.
We suggest that these observations may reflect other factors that shape the codon usage of many of the genes enriched in these particular differentiation pairs. Taken together, our observations are consistent with the possibility that the greater the difference between the putative selective pressures in the cell types at the earlier and later developmental stages, the greater the difference in codon usage (GC3) between the genes enriched in the earlier and later cell types (Table 3). Across the differentiation pairs, we also show that the GC3 of genes highly expressed at both the earlier and later developmental stages was correlated with the sum of the correlation coefficients between the levels of gene expression and GC3 at these two stages (that is, the putative combination of selective pressures; Rs = 0.78).
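The fold-change analysis described above can be illustrated with a short Python sketch. It assumes hypothetical inputs, namely dictionaries keyed by gene ID that hold expression levels for the earlier and later cell types and GC3 values. It reimplements Spearman's rank correlation with average ranks for ties. The study itself used R, so this is a stand-in and not the authors' code.

```python
import math


def average_ranks(values):
    """Rank values 1..n, giving tied values the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # convert 0-based positions to 1-based ranks
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; raises ZeroDivisionError if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman's rho: the Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def fold_change_vs_gc3(expr_early, expr_late, gc3):
    """Correlate the fold change (earlier / later expression) with GC3.

    Genes missing from any input, or with non-positive expression in the
    later cell type, are skipped (an assumption made for this sketch).
    """
    genes = sorted(
        g for g in expr_early
        if g in expr_late and g in gc3 and expr_late[g] > 0
    )
    fold = [expr_early[g] / expr_late[g] for g in genes]
    return spearman_rho(fold, [gc3[g] for g in genes])
```

Spearman's coefficient depends only on ranks, so using the ratio or its logarithm as the fold change gives the same result. With toy inputs in which the genes most enriched in the earlier cell type have the lowest GC3, the function returns -1.0. This is the direction expected if selection favors AT-ending codons more strongly at the earlier stage.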
Comparative genomic analysis of developmental stage-related genes
We also provide evidence of negative selection at synonymous sites following the human-mouse divergence. In all mouse cell types, highly expressed genes have a lower Ks_noDS (Ks after removing doublet substitutions). This is consistent with previous results showing that, in bacteria and Drosophila, synonymous substitution rates are lower in highly expressed genes than in other genes [9, 115–117]. If negative selection acts at synonymous sites, Ks underestimates the neutral substitution rate. Ka/Ks, which has long been used to evaluate protein evolutionary rates, therefore risks overestimating them . Consequently, earlier studies that assumed exonic synonymous sites to be neutral may require reevaluation (see also [19, 64, 65]). Notably, even with their lower Ks, highly expressed genes and developmental-pivotal genes in ESCs of the ESC/HSC differentiation pair still showed lower evolutionary rates (Ka/Ks; Tables 2 and 3). These findings are consistent with previous results showing that protein evolutionary rates are negatively correlated with levels of gene expression, from unicellular organisms to vertebrates [118–120].
In many groups of developmental-pivotal and developmental-specific genes, we also show that both Ks and Ka/Ks are higher than in non-developmental-pivotal genes. These results suggest that the codon usage of most developmental-pivotal and developmental-specific genes has been under weaker selective constraint. Furthermore, the higher Ka/Ks of these genes may imply that they have been subject to different functional constraints since the divergence of human and mouse. This explanation is consistent with the observation that orthologous genes can play different roles in human and mouse stem cells . However, current knowledge of the mechanisms of stem cell differentiation is very limited. Further study of the functions of orthologous developmental-pivotal and developmental-specific genes will be needed to understand why Ks and Ka/Ks are higher in these genes.
Comparisons between developmental-pivotal genes and developmental-specific genes
The expression of developmental-pivotal genes (regulated up and down) and of developmental-specific genes (switched on and off) is regulated in different ways. When these two groups of genes were combined, both GC3 and GCg still differed significantly between the genes selectively expressed at the earlier and later developmental stages of many differentiation pairs (Additional data file 4). However, our data show that the two groups differ in their molecular characteristics, genomic composition and evolutionary rates. We therefore discuss developmental-pivotal genes and developmental-specific genes separately in this paper.
First, compared with developmental-pivotal genes, developmental-specific genes used more GC-ending codons. In most cases they were also located in genomic regions with higher GC content (Table 3; Additional data file 1). Second, Ka, Ks, Ks_noDS and Ka/Ks for many groups of developmental-specific genes were significantly higher than those of developmental-pivotal genes (Table 3; Additional data file 5). These observations indicate that the two groups of genes differ. More evidence is clearly needed, but the results raise the possibility that the regulation patterns of genes are correlated with their codon usage, genomic GC content and evolutionary rates.
Analyses of codon usage within developmental models: implications for understanding differentiation processes
The current study applied analyses of codon usage to stem cell differentiation in order to better understand developmental processes at the genomic level . First, both developmental-pivotal and developmental-specific genes have been proposed to maintain cells at each developmental stage and to regulate differentiation, and many of them have been experimentally shown to do so [54, 55]. We have shown that codon usage, a 'silent' property of these genes, differs between the earlier and later developmental stages of differentiation pairs. These findings suggest that the genes responsible for different developmental stages have different origins and regulation patterns. Moreover, developmental-pivotal and developmental-specific genes are regulated differently. During differentiation, the transcriptional intensity of developmental-pivotal genes must be appropriately increased or decreased, whereas developmental-specific genes must be silenced at one stage and activated at another. It has been suggested that the chromatin structures and genomic locations of the two groups are quite different. Developmental-pivotal genes might be located in euchromatin, whereas most developmental-specific genes might be located in facultative heterochromatin . In this paper, we demonstrate that developmental-specific genes generally use more GC-ending codons than developmental-pivotal genes. We suggest that this difference in molecular properties may be correlated with the different regulation patterns and chromatin structures of the two groups, although the precise mechanisms remain unclear.
Second, it has been shown that stem cell differentiation is accompanied by remodeling of the entire chromatin structure [123–128]. However, little is known about the characteristics of the chromatin segments involved in these remodeling processes. Previous studies have shown that the chromatin segments containing developmental stage-specific genes are remodeled during differentiation [129–132]. Moreover, nucleosome formation potential has been reported to correlate with the GC content of DNA . Our results indicate that the GC content of the genomic regions containing developmental-pivotal and developmental-specific genes differs between the earlier and later developmental stages of differentiation pairs. Altogether, our results suggest that whether a genomic segment is involved in chromatin remodeling during differentiation may be correlated with its GC content. It has been suggested that mammalian genomes are mosaics of 'isochores', which might relate to variation in GC content on the scale of hundreds of kilobases to megabases [22, 23, 40, 133, 134]. Furthermore, isochores have been proposed to correlate with tissue specificity . Previous work also shows that, during ESC differentiation, many differentiation-induced changes in replication timing and expression are restricted to AT-rich isochores . Our findings of developmental stage-correlated codon usage and GCg content suggest that isochores may be related to different developmental stages during mammalian ontogenesis.
In this investigation, we used models of stem cell differentiation to observe developmental stage-related patterns of mouse codon usage. Notably, we found a bias towards AT-ending optimal codons in the early stages of mouse ontogeny. We also found that, during mammalian ontogenesis, genes selectively expressed at different developmental stages have different codon usage (GC3) and local GCg content. We hypothesize that translational selection, rather than other hypotheses such as BGC or TAMB, most probably accounts for these codon usage biases, especially for the AT-ending optimal codons. Selective constraints were still detectable at the synonymous sites of many groups of developmental stage-related genes. Moreover, at the same developmental stage, developmental-specific genes usually used more GC-ending codons than developmental-pivotal genes, and also had higher GCg content and higher substitution rates. By applying codon usage analysis to developmental hierarchies, this paper provides new clues for understanding differentiation processes. For example, whether a genomic segment is involved in chromatin remodeling may be correlated with its GC content. Further investigation will be needed to understand the significance and implications of these findings.
Materials and methods
After removing 2,672 pseudogenes according to their annotations, we extracted information on 31,022 transcripts from the Mouse division (build 26) of the Ensembl genome database for further analysis. To investigate the evolutionary conservation of mouse genes, we also extracted information from the Human division (build 26) of the Ensembl database.
We used two independent oligonucleotide microarray datasets (Affymetrix MG-U74Av2) for the models of mouse stem cell differentiation [54, 55]. For dataset A, the raw data are available from the website of Melton's lab , and we processed them with Affymetrix MAS 5.0. For dataset B, the raw data were processed with Affymetrix MAS 4.0 , and we accessed these data from the Science website . For both datasets, we used the 'Detection Call' provided by the Affymetrix MAS software to identify whether a transcript is present (P) or absent (A). Marginal calls are marked M.
The mapping relationships between Affymetrix probe-sets and their corresponding transcripts were extracted from the Ensembl database. The detailed mapping algorithms were implemented by the Ensembl team .
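For dataset A, we used the average of the two replicates as the level of gene expression if the probe-set fulfilled the following criteria. Here $E_1$ and $E_2$ denote the expression values of the two replicates, and $\mathrm{SE}_1$ and $\mathrm{SE}_2$ denote their standard errors. First, the gene was expressed stably in both replicates, such that the standard error was less than a quarter of the measured expression value:
$$\mathrm{SE}_i < \frac{E_i}{4}, \qquad i = 1, 2$$
Second, the gene expression levels were stable between the two replicates, such that the absolute difference between the two replicates' expression values was smaller than half of their mean value:
$$\lvert E_1 - E_2 \rvert < \frac{1}{2} \cdot \frac{E_1 + E_2}{2}$$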
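The two replicate-stability criteria for dataset A can be sketched as a small Python filter. The function names and the per-replicate inputs (an expression value and a standard error for each of the two replicates) are hypothetical. The sketch only mirrors the thresholds stated above: SE below one quarter of the value, and a between-replicate difference below half of the replicate mean.

```python
def passes_dataset_a_filter(e1, se1, e2, se2):
    """Hypothetical helper mirroring the two stability criteria for dataset A."""
    # Criterion 1: within each replicate, SE must be below a quarter of the value.
    stable_within = se1 < e1 / 4 and se2 < e2 / 4
    # Criterion 2: |E1 - E2| must be below half of the two replicates' mean.
    mean = (e1 + e2) / 2
    stable_between = abs(e1 - e2) < mean / 2
    return stable_within and stable_between


def dataset_a_expression(e1, se1, e2, se2):
    """Return the replicate mean if both criteria hold, otherwise None."""
    if passes_dataset_a_filter(e1, se1, e2, se2):
        return (e1 + e2) / 2
    return None
```

Replicates of 1000 (SE 100) and 1100 (SE 150) pass both criteria and yield 1050.0. Replicates of 1000 and 2000 fail the second criterion, because their difference (1000) exceeds half of their mean (750).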
For dataset B, the average of the two to four available replicates was used as the level of gene expression. Moreover, as suggested by Su et al. , genes with expression levels below 200 were removed so that only confidently expressed genes were retained.
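A corresponding sketch for dataset B follows. It assumes that the threshold of 200 is applied to the averaged value rather than to each replicate, which the text does not specify. The function name is hypothetical.

```python
def dataset_b_expression(replicates, threshold=200.0):
    """Average two to four replicate values; drop genes whose mean is below threshold.

    Hypothetical helper: applying the threshold to the mean (not to each
    replicate) is an assumption of this sketch.
    """
    if not 2 <= len(replicates) <= 4:
        raise ValueError("dataset B provides two to four replicates per sample")
    mean = sum(replicates) / len(replicates)
    return mean if mean >= threshold else None
```

Values exactly at 200 are kept, because only levels below 200 were removed.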
To calculate the codon usage, only probe-sets corresponding to unique transcripts on U74Av2 were considered.
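The restriction to probe-sets corresponding to unique transcripts can be sketched as follows. This sketch assumes a one-to-one reading of "unique": a probe-set is kept only if it maps to exactly one transcript and no other probe-set maps to that transcript. The input format, a list of (probe-set, transcript) pairs, is hypothetical.

```python
from collections import defaultdict


def unique_probe_sets(mapping):
    """Keep probe-sets with a one-to-one probe-set/transcript relationship.

    mapping: iterable of (probe_set_id, transcript_id) pairs (hypothetical format).
    Returns a dict {probe_set_id: transcript_id}.
    """
    by_probe = defaultdict(set)
    by_transcript = defaultdict(set)
    for probe, transcript in mapping:
        by_probe[probe].add(transcript)
        by_transcript[transcript].add(probe)
    result = {}
    for probe, transcripts in by_probe.items():
        if len(transcripts) == 1:
            (transcript,) = transcripts
            # Also require that no other probe-set targets the same transcript.
            if len(by_transcript[transcript]) == 1:
                result[probe] = transcript
    return result
```

Under a looser reading, only the first condition would be applied. That reading would be a one-line change.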
Nucleotide composition analysis
The untranslated regions (UTRs) and coding sequences (CDSs) of each transcript were extracted from the Ensembl database according to the entry's annotation and validated by chromosome mapping. Sequences with ambiguous annotations were checked manually. To evaluate the influence of TAMB on gene composition, we calculated the A, C, G and T content of the UTRs and of the third positions of synonymous codons in the CDSs [19, 36]. As suggested by Lercher et al. , we also calculated the nucleotide composition (GC fraction) of contiguous 20 kb windows as the genomic background of each gene (GCg; Tables 2 and 3).
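The 20 kb window calculation can be sketched in Python. The text does not specify two details, so this sketch makes assumptions for both. First, ambiguous bases (N) are excluded from the denominator. Second, each gene is assigned the GC fraction of the window containing its midpoint. The helper `gene_gcg` is therefore hypothetical and is not the authors' procedure.

```python
def gc_fraction(seq):
    """G+C fraction over unambiguous bases (A, C, G, T); N and others are ignored."""
    s = seq.upper()
    gc = s.count("G") + s.count("C")
    acgt = gc + s.count("A") + s.count("T")
    return gc / acgt if acgt else float("nan")


def window_gc(sequence, window=20_000):
    """GC fraction of each contiguous, non-overlapping window (the last may be shorter)."""
    return [gc_fraction(sequence[i:i + window])
            for i in range(0, len(sequence), window)]


def gene_gcg(chrom_sequence, gene_start, gene_end, window=20_000):
    """Hypothetical GCg: GC of the window containing the gene midpoint (0-based coords)."""
    midpoint = (gene_start + gene_end) // 2
    start = (midpoint // window) * window
    return gc_fraction(chrom_sequence[start:start + window])
```

In practice `chrom_sequence` would be a whole chromosome. The default window of 20,000 bp matches the 20 kb stated above.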
Recombination rate estimates
Recombination rates across the mouse genome were estimated by dividing the genetic length (cM) by the physical sequence length (Mb) between genetic markers [49, 139]. These data were derived from the Whitehead Mouse Genetic Map website .
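The cM/Mb estimate can be sketched as follows. Markers are given as hypothetical (Mb, cM) positions on one chromosome, sorted by physical position. A gene position is assigned the rate of the marker interval that contains it, which is an assumption of this sketch.

```python
from bisect import bisect_right


def interval_rates(markers):
    """Recombination rate (cM/Mb) between adjacent markers on one chromosome.

    markers: list of (position_mb, position_cm), sorted by physical position.
    Returns a list of (start_mb, end_mb, rate_cm_per_mb).
    """
    rates = []
    for (mb1, cm1), (mb2, cm2) in zip(markers, markers[1:]):
        if mb2 <= mb1:
            raise ValueError("markers must have strictly increasing Mb positions")
        rates.append((mb1, mb2, (cm2 - cm1) / (mb2 - mb1)))
    return rates


def rate_at(position_mb, rates):
    """Rate of the interval containing position_mb, or None if outside all intervals."""
    starts = [start for start, _, _ in rates]
    i = bisect_right(starts, position_mb) - 1
    if i >= 0 and position_mb <= rates[i][1]:
        return rates[i][2]
    return None
```

For markers at (10 Mb, 5 cM), (20 Mb, 10 cM) and (30 Mb, 12 cM), the two intervals have rates of 0.5 and 0.2 cM/Mb.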
Codon usage analyses
CodonW software was used to calculate the GC content at the third codon positions (GC3) and the relative synonymous codon usage (RSCU) value of each synonymous codon, according to Sharp et al. . Only genes with CDS > 200 were considered.
We identified orthologous pairs based on the annotation of the Ensembl build 26 EnsMart database. Ka, Ks and Ka/Ks were calculated for each ortholog pair with the Nei and Gojobori method , as implemented in PAML (yn00) [142, 143]. Following the PAML manual , genes with Ks > 1 were excluded from further analyses. Synonymous substitution rates after removing doublet substitutions (Ks_noDS) were calculated as previously described  (Tables 2 and 3).
Spearman's rank correlation test was used for the analysis of paired samples, and linear regression analysis was performed with standard routines in the statistical package R . All necessary scripts and/or programs are available.
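The two codon usage indices can be sketched in Python using the standard genetic code. RSCU follows the usual definition: the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The six-fold families (Leu, Ser, Arg) are treated as single families. For GC3, the sketch computes G+C at synonymous third positions, excluding Met, Trp and stop codons, as CodonW's GC3s index does. Whether the study used exactly this variant is an assumption.

```python
from collections import Counter, defaultdict

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
# Standard genetic code: codon -> one-letter amino acid ('*' = stop).
CODE = {a + b + c: AMINO[16 * i + 4 * j + k]
        for i, a in enumerate(BASES)
        for j, b in enumerate(BASES)
        for k, c in enumerate(BASES)}


def codons(cds):
    """Split a CDS into complete codons, ignoring any trailing partial codon."""
    cds = cds.upper()
    return [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]


def rscu(cds):
    """RSCU per codon: observed count / (amino-acid count / family size)."""
    counts = Counter(c for c in codons(cds) if c in CODE and CODE[c] != "*")
    families = defaultdict(list)
    for codon, aa in CODE.items():
        if aa != "*":
            families[aa].append(codon)
    values = {}
    for aa, members in families.items():
        total = sum(counts[c] for c in members)
        if len(members) < 2 or total == 0:
            continue  # Met, Trp, or amino acid absent from this CDS
        expected = total / len(members)
        for c in members:
            values[c] = counts[c] / expected
    return values


def gc3s(cds):
    """G+C at synonymous third positions (excluding Met, Trp and stop codons)."""
    thirds = [c[2] for c in codons(cds) if c in CODE and CODE[c] not in "*MW"]
    return sum(b in "GC" for b in thirds) / len(thirds) if thirds else float("nan")
```

Codons that contain ambiguous bases are skipped. The CDS length filter mentioned above is not applied here.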
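The Nei-Gojobori method first counts synonymous and nonsynonymous sites and differences between two aligned codon sequences. It then applies the Jukes-Cantor correction to the proportions of differing sites. The sketch below shows only this final step together with the Ks > 1 exclusion. The site and difference counts are hypothetical inputs, and the study itself obtained these estimates from PAML yn00.

```python
import math


def jukes_cantor(p):
    """Jukes-Cantor corrected distance for a proportion p of differing sites."""
    if not 0 <= p < 0.75:
        raise ValueError("Jukes-Cantor correction is undefined for p >= 0.75")
    return -0.75 * math.log(1 - 4 * p / 3)


def ka_ks(syn_sites, nonsyn_sites, syn_diffs, nonsyn_diffs):
    """Return (Ka, Ks, Ka/Ks) from site and difference counts (hypothetical inputs)."""
    ks = jukes_cantor(syn_diffs / syn_sites)
    ka = jukes_cantor(nonsyn_diffs / nonsyn_sites)
    ratio = ka / ks if ks > 0 else float("inf")
    return ka, ks, ratio


def keep_pair(ks, max_ks=1.0):
    """Keep an ortholog pair unless Ks exceeds the cutoff (Ks > 1 excluded)."""
    return ks <= max_ks
```

For example, 50 synonymous differences over 200 synonymous sites give p = 0.25 and Ks = 0.75 ln 1.5, which is about 0.30.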
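Simple linear regression of the kind described above can be sketched in plain Python for a single predictor. For one predictor, ordinary least squares gives the same slope and intercept as R's `lm(y ~ x)`. The variables passed in, such as GC3 and GCg per gene, are illustrative only.

```python
def ols(x, y):
    """Ordinary least squares for y = intercept + slope * x.

    Returns (slope, intercept, r_squared).
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    if sxx == 0:
        raise ValueError("x is constant; slope is undefined")
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    r_squared = 1 - ss_res / ss_tot if ss_tot else float("nan")
    return slope, intercept, r_squared
```

For points lying exactly on y = 1 + 2x, the function returns a slope of 2.0, an intercept of 1.0 and an R² of 1.0.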
Additional data files
The following additional data are available with the online version of this paper. Additional data file 1 compares GC3 and GCg between developmental-pivotal and developmental-specific genes. Additional data file 2 provides supplementary information on the mechanisms underlying our observations that levels of gene expression are correlated with codon usage, recombination rate, gene length and nucleotide composition. Additional data file 3 provides supplementary information on the mechanisms underlying our observations that fold changes of gene expression are correlated with codon usage, recombination rate and gene length. Additional data file 4 gives the GC3 and GCg of developmental-pivotal genes, developmental-specific genes and both groups combined in each differentiation pair. Additional data file 5 compares substitution rates between developmental-pivotal and developmental-specific genes.
Bulmer M: The selection-mutation-drift theory of synonymous codon usage. Genetics. 1991, 129: 897-907.
Gouy M, Gautier C: Codon usage in bacteria: correlation with gene expressivity. Nucleic Acids Res. 1982, 10: 7055-7074. 10.1093/nar/10.22.7055.
Sharp PM, Li WH: An evolutionary perspective on synonymous codon usage in unicellular organisms. J Mol Evol. 1986, 24: 28-38. 10.1007/BF02099948.
Sharp PM, Tuohy TM, Mosurski KR: Codon usage in yeast: cluster analysis clearly differentiates highly and lowly expressed genes. Nucleic Acids Res. 1986, 14: 5125-5143. 10.1093/nar/14.13.5125.
Coghlan A, Wolfe KH: Relationship of codon bias to mRNA concentration and protein length in Saccharomyces cerevisiae. Yeast. 2000, 16: 1131-1145. 10.1002/1097-0061(20000915)16:12<1131::AID-YEA609>3.0.CO;2-F.
Akashi H: Gene expression and molecular evolution. Curr Opin Genet Dev. 2001, 11: 660-666. 10.1016/S0959-437X(00)00250-1.
Stenico M, Lloyd AT, Sharp PM: Codon usage in Caenorhabditis elegans: delineation of translational selection and mutational biases. Nucleic Acids Res. 1994, 22: 2437-2446. 10.1093/nar/22.13.2437.
Moriyama EN, Powell JR: Codon usage bias and tRNA abundance in Drosophila. J Mol Evol. 1997, 45: 514-523. 10.1007/PL00006256.
Powell JR, Moriyama EN: Evolution of codon usage bias in Drosophila. Proc Natl Acad Sci USA. 1997, 94: 7784-7790. 10.1073/pnas.94.15.7784.
Duret L, Mouchiroud D: Expression pattern and, surprisingly, gene length shape codon usage in Caenorhabditis, Drosophila, and Arabidopsis. Proc Natl Acad Sci USA. 1999, 96: 4482-4487. 10.1073/pnas.96.8.4482.
Duret L: Evolution of synonymous codon usage in metazoans. Curr Opin Genet Dev. 2002, 12: 640-649. 10.1016/S0959-437X(02)00353-2.
Dong H, Nilsson L, Kurland CG: Co-variation of tRNA abundance and codon usage in Escherichia coli at different growth rates. J Mol Biol. 1996, 260: 649-663. 10.1006/jmbi.1996.0428.
Akashi H, Eyre-Walker A: Translational selection and molecular evolution. Curr Opin Genet Dev. 1998, 8: 688-693. 10.1016/S0959-437X(98)80038-5.
Kanaya S, Yamada Y, Kinouchi M, Kudo Y, Ikemura T: Codon usage and tRNA genes in eukaryotes: correlation of codon usage diversity with translation efficiency and with CG-dinucleotide usage as assessed by multivariate analysis. J Mol Evol. 2001, 53: 290-298. 10.1007/s002390010219.
Duret L: tRNA gene number and codon usage in the C. elegans genome are co-adapted for optimal translation of highly expressed genes. Trends Genet. 2000, 16: 287-289. 10.1016/S0168-9525(00)02041-2.
Semon M, Mouchiroud D, Duret L: Relationship between gene expression and GC-content in mammals: statistical significance and biological relevance. Hum Mol Genet. 2005, 14: 421-427. 10.1093/hmg/ddi038.
Konu O, Li MD: Correlations between mRNA expression levels and GC contents of coding and untranslated regions of genes in rodents. J Mol Evol. 2002, 54: 35-41. 10.1007/s00239-001-0015-z.
Vinogradov AE: Isochores and tissue-specificity. Nucleic Acids Res. 2003, 31: 5212-5220. 10.1093/nar/gkg699.
Comeron JM: Selective and mutational patterns associated with gene expression in humans: influences on synonymous composition and intron presence. Genetics. 2004, 167: 1293-1304. 10.1534/genetics.104.026351.
DeBry RW, Marzluff WF: Selection on silent sites in the rodent H3 histone gene family. Genetics. 1994, 138: 191-202.
Mouchiroud D, Fichant G, Bernardi G: Compositional compartmentalization and gene composition in the genome of vertebrates. J Mol Evol. 1987, 26: 198-204. 10.1007/BF02099852.
Bernardi G: The isochore organization of the human genome and its evolutionary history - a review. Gene. 1993, 135: 57-66. 10.1016/0378-1119(93)90049-9.
Bernardi G: The human genome: organization and evolutionary history. Annu Rev Genet. 1995, 29: 445-476. 10.1146/annurev.ge.29.120195.002305.
Pesole G, Bernardi G, Saccone C: Isochore specificity of AUG initiator context of human genes. FEBS Lett. 1999, 464: 60-62. 10.1016/S0014-5793(99)01675-0.
Lercher MJ, Urrutia AO, Pavlicek A, Hurst LD: A unification of mosaic structures in the human genome. Hum Mol Genet. 2003, 12: 2411-2415. 10.1093/hmg/ddg251.
Ponger L, Duret L, Mouchiroud D: Determinants of CpG islands: expression in early embryo and isochore structure. Genome Res. 2001, 11: 1854-1860.
Goncalves I, Duret L, Mouchiroud D: Nature and structure of human genes that generate retropseudogenes. Genome Res. 2000, 10: 672-678. 10.1101/gr.10.5.672.
D'Onofrio G: Expression patterns and gene distribution in the human genome. Gene. 2002, 300: 155-160. 10.1016/S0378-1119(02)01048-X.
Zhang L, Li WH: Mammalian housekeeping genes evolve more slowly than tissue-specific genes. Mol Biol Evol. 2004, 21: 236-239. 10.1093/molbev/msh010.
Plotkin JB, Robins H, Levine AJ: Tissue-specific codon usage and the expression of human genes. Proc Natl Acad Sci USA. 2004, 101: 12588-12591. 10.1073/pnas.0404957101.
Semon M, Lobry JR, Duret L: No evidence for tissue-specific adaptation of synonymous codon usage in humans. Mol Biol Evol. 2006, 23: 523-529. 10.1093/molbev/msj053.
Musto H, Cruveiller S, D'Onofrio G, Romero H, Bernardi G: Translational selection on codon usage in Xenopus laevis. Mol Biol Evol. 2001, 18: 1703-1707.
Romero H, Zavala A, Musto H, Bernardi G: The influence of translational selection on codon usage in fishes from the family Cyprinidae. Gene. 2003, 317: 141-147. 10.1016/S0378-1119(03)00701-7.
Fryxell KJ, Zuckerkandl E: Cytosine deamination plays a primary role in the evolution of mammalian isochores. Mol Biol Evol. 2000, 17: 1371-1383.
Francino MP, Ochman H: Deamination as the basis of strand-asymmetric evolution in transcribed Escherichia coli sequences. Mol Biol Evol. 2001, 18: 1147-1150.
Green P, Ewing B, Miller W, Thomas PJ, Green ED: Transcription-associated mutational asymmetry in mammalian evolution. Nat Genet. 2003, 33: 514-517. 10.1038/ng1103.
Iida K, Akashi H: A test of translational selection at 'silent' sites in the human genome: base composition comparisons in alternatively spliced genes. Gene. 2000, 261: 93-105. 10.1016/S0378-1119(00)00482-0.
Smith NG, Hurst LD: The effect of tandem substitutions on the correlation between synonymous and nonsynonymous rates in rodents. Genetics. 1999, 153: 1395-1402.
Bielawski JP, Dunn KA, Yang Z: Rates of nucleotide substitution and mammalian nuclear gene evolution. Approximate and maximum-likelihood methods lead to different conclusions. Genetics. 2000, 156: 1299-1308.
Eyre-Walker A, Hurst LD: The evolution of isochores. Nat Rev Genet. 2001, 2: 549-555. 10.1038/35080577.
Hurst LD, Williams EJ: Covariation of GC content and the silent site substitution rate in rodents: implications for methodology and for the evolution of isochores. Gene. 2000, 261: 107-114. 10.1016/S0378-1119(00)00489-3.
Chen FC, Wang SS, Chen CJ, Li WH, Chuang TJ: Alternatively and constitutively spliced exons are subject to different evolutionary forces. Mol Biol Evol. 2006, 23: 675-682. 10.1093/molbev/msj081.
Nickoloff JA: Transcription enhances intrachromosomal homologous recombination in mammalian cells. Mol Cell Biol. 1992, 12: 5311-5318.
Droge P: Transcription-driven site-specific DNA recombination in vitro. Proc Natl Acad Sci USA. 1993, 90: 2759-2763. 10.1073/pnas.90.7.2759.
Nicolas A: Relationship between transcription and initiation of meiotic recombination: toward chromatin accessibility. Proc Natl Acad Sci USA. 1998, 95: 87-89. 10.1073/pnas.95.1.87.
Fullerton SM, Bernardo Carvalho A, Clark AG: Local rates of recombination are positively correlated with GC content in the human genome. Mol Biol Evol. 2001, 18: 1139-1142.
Galtier N, Piganeau G, Mouchiroud D, Duret L: GC-content evolution in mammalian genomes: the biased gene conversion hypothesis. Genetics. 2001, 159: 907-911.
Galtier N: Gene conversion drives GC content evolution in mammalian histones. Trends Genet. 2003, 19: 65-68. 10.1016/S0168-9525(02)00002-1.
Meunier J, Duret L: Recombination drives the evolution of GC-content in the human genome. Mol Biol Evol. 2004, 21: 984-990. 10.1093/molbev/msh070.
Montoya-Burgos JI, Boursot P, Galtier N: Recombination explains isochores in mammalian genomes. Trends Genet. 2003, 19: 128-130. 10.1016/S0168-9525(03)00021-0.
Khelifi A, Meunier J, Duret L, Mouchiroud D: GC content evolution of the human and mouse genomes: insights from the study of processed pseudogenes in regions of different recombination rates. J Mol Evol. 2006, 62: 745-752. 10.1007/s00239-005-0186-0.
Arthur W: The emerging conceptual framework of evolutionary developmental biology. Nature. 2002, 415: 757-764.
Chang CC, Cook CE: Trends in genomic 'evo-devo'. Genome Biol. 2002, 3: reports4019. 10.1186/gb-2002-3-7-reports4019.
Ramalho-Santos M, Yoon S, Matsuzaki Y, Mulligan RC, Melton DA: "Stemness": transcriptional profiling of embryonic and adult stem cells. Science. 2002, 298: 597-600. 10.1126/science.1072530.
Ivanova NB, Dimos JT, Schaniel C, Hackney JA, Moore KA, Lemischka IR: A stem cell molecular signature. Science. 2002, 298: 601-604. 10.1126/science.1073823.
Wolfe KH, Sharp PM, Li WH: Mutation rates differ among regions of the mammalian genome. Nature. 1989, 337: 283-285. 10.1038/337283a0.
Eyre-Walker AC: An analysis of codon usage in mammals: selection or mutation bias? J Mol Evol. 1991, 33: 442-449. 10.1007/BF02103136.
Sharp PM, Averof M, Lloyd AT, Matassi G, Peden JF: DNA sequence evolution: the sounds of silence. Philos Trans R Soc Lond B Biol Sci. 1995, 349: 241-247. 10.1098/rstb.1995.0108.
Hughes AL, Yeager M: Comparative evolutionary rates of introns and exons in murine rodents. J Mol Evol. 1997, 45: 125-130. 10.1007/PL00006211.
Francino MP, Ochman H: Isochores result from mutation not selection. Nature. 1999, 400: 30-31. 10.1038/21804.
Castillo-Davis CI, Mekhedov SL, Hartl DL, Koonin EV, Kondrashov FA: Selection for short introns in highly expressed genes. Nat Genet. 2002, 31: 415-418.
Urrutia AO, Hurst LD: The signature of selection mediated by expression on human genes. Genome Res. 2003, 13: 2260-2264. 10.1101/gr.641103.
Eyre-Walker A: Evidence of selection on silent site base composition in mammals: potential implications for the evolution of isochores and junk DNA. Genetics. 1999, 152: 675-683.
Hurst LD, Pal C: Evidence for purifying selection acting on silent sites in BRCA1. Trends Genet. 2001, 17: 62-65. 10.1016/S0168-9525(00)02173-9.
Chamary JV, Hurst LD: Similar rates but different modes of sequence evolution in introns and at exonic silent sites in rodents: evidence for selectively driven codon usage. Mol Biol Evol. 2004, 21: 1014-1023. 10.1093/molbev/msh087.
Hellmann I, Zollner S, Enard W, Ebersberger I, Nickel B, Paabo S: Selection on human genes as revealed by comparisons to chimpanzee cDNA. Genome Res. 2003, 13: 831-837. 10.1101/gr.944903.
Chamary JV, Parmley JL, Hurst LD: Hearing silence: non-neutral evolution at synonymous sites in mammals. Nat Rev Genet. 2006, 7: 98-108. 10.1038/nrg1770.
Schattner P, Diekhans M: Regions of extreme synonymous codon selection in mammalian genes. Nucleic Acids Res. 2006, 34: 1700-1710. 10.1093/nar/gkl095.
Vinogradov AE: Noncoding DNA, isochores and gene expression: nucleosome formation potential. Nucleic Acids Res. 2005, 33: 559-563. 10.1093/nar/gki184.
Vinogradov AE: DNA helix: the importance of being GC-rich. Nucleic Acids Res. 2003, 31: 1838-1844. 10.1093/nar/gkg296.
Parmley JL, Chamary JV, Hurst LD: Evidence for purifying selection against synonymous mutations in mammalian exonic splicing enhancers. Mol Biol Evol. 2006, 23: 301-309. 10.1093/molbev/msj035.
Willie E, Majewski J: Evidence for codon bias selection at the pre-mRNA level in eukaryotes. Trends Genet. 2004, 20: 534-538. 10.1016/j.tig.2004.08.014.
Chamary JV, Hurst LD: Biased codon usage near intron-exon junctions: selection on splicing enhancers, splice-site recognition or something else? Trends Genet. 2005, 21: 256-259. 10.1016/j.tig.2005.03.001.
Lavner Y, Kotlar D: Codon bias as a factor in regulating expression via translation rate in the human genome. Gene. 2005, 345: 127-138. 10.1016/j.gene.2004.11.035.
Comeron JM: Weak selection and recent mutational changes influence polymorphic synonymous mutations in humans. Proc Natl Acad Sci USA. 2006, 103: 6940-6945. 10.1073/pnas.0510638103.
Chamary JV, Hurst LD: Evidence for selection on synonymous mutations affecting stability of mRNA secondary structure in mammals. Genome Biol. 2005, 6: R75. 10.1186/gb-2005-6-9-r75.
Oresic M, Dehn M, Korenblum D, Shalloway D: Tracing specific synonymous codon-secondary structure correlations through evolution. J Mol Evol. 2003, 56: 473-484. 10.1007/s00239-002-2418-x.
Archetti M: Selection on codon usage for error minimization at the protein level. J Mol Evol. 2004, 59: 400-415. 10.1007/s00239-004-2634-7.
Wolfe KH, Sharp PM: Mammalian gene evolution: nucleotide sequence divergence between mouse and rat. J Mol Evol. 1993, 37: 441-456. 10.1007/BF00178874.
Duret L, Mouchiroud D: Determinants of substitution rates in mammalian genes: expression pattern affects selection intensity but not mutation rate. Mol Biol Evol. 2000, 17: 68-74.
Castillo-Davis CI, Hartl DL: Genome evolution and developmental constraint in Caenorhabditis elegans. Mol Biol Evol. 2002, 19: 728-735.
Martin GR: Isolation of a pluripotent cell line from early mouse embryos cultured in medium conditioned by teratocarcinoma stem cells. Proc Natl Acad Sci USA. 1981, 78: 7634-7638. 10.1073/pnas.78.12.7634.
Nagy A, Gocza E, Diaz EM, Prideaux VR, Ivanyi E, Markkula M, Rossant J: Embryonic stem cells alone are able to support fetal development in the mouse. Development. 1990, 110: 815-821.
Ying QL, Nichols J, Chambers I, Smith A: BMP induction of Id proteins suppresses differentiation and sustains embryonic stem cell self-renewal in collaboration with STAT3. Cell. 2003, 115: 281-292. 10.1016/S0092-8674(03)00847-X.
Ying QL, Stavridis M, Griffiths D, Li M, Smith A: Conversion of embryonic stem cells into neuroectodermal precursors in adherent monoculture. Nat Biotechnol. 2003, 21: 183-186. 10.1038/nbt780.
Burt RK, Verda L, Kim DA, Oyama Y, Luo K, Link C: Embryonic stem cells as an alternate marrow donor source: engraftment without graft-versus-host disease. J Exp Med. 2004, 199: 895-904. 10.1084/jem.20031916.
Qian X, Shen Q, Goderie SK, He W, Capela A, Davis AA, Temple S: Timing of CNS cell generation: a programmed sequence of neuron and glial cell production from isolated murine cortical stem cells. Neuron. 2000, 28: 69-80. 10.1016/S0896-6273(00)00086-6.
Doetsch F, Caille I, Lim DA, Garcia-Verdugo JM, Alvarez-Buylla A: Subventricular zone astrocytes are neural stem cells in the adult mammalian brain. Cell. 1999, 97: 703-716. 10.1016/S0092-8674(00)80783-7.
Jordan CT, McKearn JP, Lemischka IR: Cellular and developmental properties of fetal hematopoietic stem cells. Cell. 1990, 61: 953-963. 10.1016/0092-8674(90)90061-I.
Krause DS, Theise ND, Collector MI, Henegariu O, Hwang S, Gardner R, Neutzel S, Sharkis SJ: Multi-organ, multi-lineage engraftment by a single bone marrow-derived stem cell. Cell. 2001, 105: 369-377. 10.1016/S0092-8674(01)00328-2.
Gothert JR, Gustin SE, Hall MA, Green AR, Gottgens B, Izon DJ, Begley CG: In vivo fate-tracing studies using the Scl stem cell enhancer: embryonic hematopoietic stem cells significantly contribute to adult hematopoiesis. Blood. 2005, 105: 2724-2732. 10.1182/blood-2004-08-3037.
Margulies EH, Kardia SL, Innis JW: Identification and prevention of a GC content bias in SAGE libraries. Nucleic Acids Res. 2001, 29: e60. 10.1093/nar/29.12.e60.
Gilbert SF: The morphogenesis of evolutionary developmental biology. Int J Dev Biol. 2003, 47: 467-477.
Gould SJ: Ontogeny and Phylogeny. 1977, Cambridge: Harvard University Press.
Laird DJ, De Tomaso AW, Weissman IL: Stem cells are units of natural selection in a colonial ascidian. Cell. 2005, 123: 1351-1360. 10.1016/j.cell.2005.10.026.
Weissman IL: Stem cells: units of development, units of regeneration, and units in evolution. Cell. 2000, 100: 157-168. 10.1016/S0092-8674(00)81692-X.
Clarke MF, Fuller M: Stem cells and cancer: two faces of eve. Cell. 2006, 124: 1111-1115. 10.1016/j.cell.2006.03.011.
Huntly BJ, Gilliland DG: Leukaemia stem cells and the evolution of cancer-stem-cell research. Nat Rev Cancer. 2005, 5: 311-321. 10.1038/nrc1592.
Versteeg R, van Schaik BD, van Batenburg MF, Roos M, Monajemi R, Caron H, Bussemaker HJ, van Kampen AH: The human transcriptome map reveals extremes in gene density, intron length, GC content, and repeat pattern for domains of highly and weakly expressed genes. Genome Res. 2003, 13: 1998-2004. 10.1101/gr.1649303.
Lercher MJ, Urrutia AO, Hurst LD: Clustering of housekeeping genes provides a unified model of gene order in the human genome. Nat Genet. 2002, 31: 180-183. 10.1038/ng887.
Su AI, Cooke MP, Ching KA, Hakak Y, Walker JR, Wiltshire T, Orth AP, Vega RG, Sapinoso LM, Moqrich A, et al: Large-scale analysis of the human and mouse transcriptomes. Proc Natl Acad Sci USA. 2002, 99: 4465-4470. 10.1073/pnas.012025199.
dos Reis M, Savva R, Wernisch L: Solving the riddle of codon usage preferences: a test for translational selection. Nucleic Acids Res. 2004, 32: 5036-5044. 10.1093/nar/gkh834.
Levy JP, Muldoon RR, Zolotukhin S, Link CJ: Retroviral transfer and expression of a humanized, red-shifted green fluorescent protein gene into human tumor cells. Nat Biotechnol. 1996, 14: 610-614. 10.1038/nbt0596-610.
Wells KD, Foster JA, Moore K, Pursel VG, Wall RJ: Codon optimization, genetic insulation, and an rtTA reporter improve performance of the tetracycline switch. Transgenic Res. 1999, 8: 371-381. 10.1023/A:1008952302539.
Zhou J, Liu WJ, Peng SW, Sun XY, Frazer I: Papillomavirus capsid protein expression level depends on the match between codon usage and tRNA availability. J Virol. 1999, 73: 4972-4982.
Kudla G, Lipinski L, Caffin F, Helwak A, Zylicz M: High guanine and cytosine content increases mRNA levels in mammalian cells. PLoS Biol. 2006, 4: e180. 10.1371/journal.pbio.0040180.
Kanaya S, Yamada Y, Kudo Y, Ikemura T: Studies of codon usage and tRNA genes of 18 unicellular organisms and quantification of Bacillus subtilis tRNAs: gene expression level and species-specific diversity of codon usage based on multivariate analysis. Gene. 1999, 238: 143-155. 10.1016/S0378-1119(99)00225-5.
Agris PF: Decoding the genome: a modified view. Nucleic Acids Res. 2004, 32: 223-238. 10.1093/nar/gkh185.
Murphy FV 4th, Ramakrishnan V, Malkiewicz A, Agris PF: The role of modifications in codon discrimination by tRNA(Lys)UUU. Nat Struct Mol Biol. 2004, 11: 1186-1191. 10.1038/nsmb861.
Tong KL, Wong JT: Anticodon and wobble evolution. Gene. 2004, 333: 169-177. 10.1016/j.gene.2004.02.028.
Umeda N, Suzuki T, Yukawa M, Ohya Y, Shindo H, Watanabe K: Mitochondria-specific RNA-modifying enzymes responsible for the biosynthesis of the wobble base in mitochondrial tRNAs. Implications for the molecular pathogenesis of human mitochondrial diseases. J Biol Chem. 2005, 280: 1613-1624. 10.1074/jbc.M409306200.
Ikemura T: Codon usage and tRNA content in unicellular and multicellular organisms. Mol Biol Evol. 1985, 2: 13-34.
White BN, Tener GM, Holden J, Suzuki DT: Analysis of tRNAs during the development of Drosophila. Dev Biol. 1973, 33: 185-195. 10.1016/0012-1606(73)90173-5.
Hosbach HA, Kubli E: Transfer RNA in aging Drosophila: II. Isoacceptor patterns. Mech Ageing Dev. 1979, 10: 141-149. 10.1016/0047-6374(79)90077-0.
Sharp PM, Li WH: The rate of synonymous substitution in enterobacterial genes is inversely related to codon usage bias. Mol Biol Evol. 1987, 4: 222-230.
Sharp PM, Li WH: On the rate of DNA sequence evolution in Drosophila. J Mol Evol. 1989, 28: 398-402. 10.1007/BF02603075.
Shields DC, Sharp PM, Higgins DG, Wright F: "Silent" sites in Drosophila genes are not neutral: evidence of selection among synonymous codons. Mol Biol Evol. 1988, 5: 704-716.
Pal C, Papp B, Hurst LD: Highly expressed genes in yeast evolve slowly. Genetics. 2001, 158: 927-931.
Krylov DM, Wolf YI, Rogozin IB, Koonin EV: Gene loss, protein sequence divergence, gene dispensability, expression level, and interactivity are correlated in eukaryotic evolution. Genome Res. 2003, 13: 2229-2235. 10.1101/gr.1589103.
Subramanian S, Kumar S: Gene expression intensity shapes evolutionary rates of the proteins encoded by the vertebrate genome. Genetics. 2004, 168: 373-381. 10.1534/genetics.104.028944.
Rao M: Conserved and divergent paths that regulate self-renewal in mouse and human embryonic stem cells. Dev Biol. 2004, 275: 269-286. 10.1016/j.ydbio.2004.08.013.
Lowell S: Stem cells in the genomic age. Genome Biol. 2006, 7: 315. 10.1186/gb-2006-7-5-315.
Kosak ST, Groudine M: Form follows function: The genomic organization of cellular differentiation. Genes Dev. 2004, 18: 1371-1384. 10.1101/gad.1209304.
Akashi K, He X, Chen J, Iwasaki H, Niu C, Steenhard B, Zhang J, Haug J, Li L: Transcriptional accessibility for genes of multiple tissues and hematopoietic lineages is hierarchically controlled during early hematopoiesis. Blood. 2003, 101: 383-389. 10.1182/blood-2002-06-1780.
Ajamian F, Suuronen T, Salminen A, Reeben M: Upregulation of class II histone deacetylases mRNA during neural differentiation of cultured rat hippocampal progenitor cells. Neurosci Lett. 2003, 346: 57-60. 10.1016/S0304-3940(03)00545-7.
Lee JH, Hart SR, Skalnik DG: Histone deacetylase activity is required for embryonic stem cell differentiation. Genesis. 2004, 38: 32-38. 10.1002/gene.10250.
Rasmussen TP: Embryonic stem cell differentiation: a chromatin perspective. Reprod Biol Endocrinol. 2003, 1: 100. 10.1186/1477-7827-1-100.
Lee TI, Jenner RG, Boyer LA, Guenther MG, Levine SS, Kumar RM, Chevalier B, Johnstone SE, Cole MF, Isono K, et al: Control of developmental regulators by Polycomb in human embryonic stem cells. Cell. 2006, 125: 301-313. 10.1016/j.cell.2006.02.043.
Ballas N, Battaglioli E, Atouf F, Andres ME, Chenoweth J, Anderson ME, Burger C, Moniwa M, Davie JR, Bowers WJ, et al: Regulation of neuronal traits by a novel transcriptional complex. Neuron. 2001, 31: 353-365. 10.1016/S0896-6273(01)00371-3.
Takizawa T, Nakashima K, Namihira M, Ochiai W, Uemura A, Yanagisawa M, Fujita N, Nakao M, Taga T: DNA methylation is a critical cell-intrinsic determinant of astrocyte differentiation in the fetal brain. Dev Cell. 2001, 1: 749-758. 10.1016/S1534-5807(01)00101-0.
Song MR, Ghosh A: FGF2-induced chromatin remodeling regulates CNTF-mediated gene expression and astrocyte differentiation. Nat Neurosci. 2004, 7: 229-235. 10.1038/nn1192.
Kuwabara T, Hsieh J, Nakashima K, Taira K, Gage FH: A small modulatory dsRNA specifies the fate of adult neural stem cells. Cell. 2004, 116: 779-793. 10.1016/S0092-8674(04)00248-X.
Bernardi G, Olofsson B, Filipski J, Zerial M, Salinas J, Cuny G, Meunier-Rotival M, Rodier F: The mosaic genome of warm-blooded vertebrates. Science. 1985, 228: 953-958. 10.1126/science.4001930.
Bernardi G: Isochores and the evolutionary genomics of vertebrates. Gene. 2000, 241: 3-17. 10.1016/S0378-1119(99)00485-0.
Hiratani I, Leskovar A, Gilbert DM: Differentiation-induced replication-timing changes are restricted to AT-rich/long interspersed nuclear element (LINE)-rich isochores. Proc Natl Acad Sci USA. 2004, 101: 16861-16866. 10.1073/pnas.0406687101.
Mapping Algorithms. [http://www.ensembl.org/info/data/docs/microarray_probe_set_mapping.html]
Dietrich WF, Miller JC, Steen RG, Merchant M, Damron D, Nahf R, Gross A, Joyce DC, Wessel M, Dredge RD, et al: A genetic map of the mouse with 4,006 simple sequence length polymorphisms. Nat Genet. 1994, 7: 220-245. 10.1038/ng0694supp-220.
Nei M, Gojobori T: Simple methods for estimating the numbers of synonymous and nonsynonymous nucleotide substitutions. Mol Biol Evol. 1986, 3: 418-426.
Yang Z: PAML: a program package for phylogenetic analysis by maximum likelihood. Comput Appl Biosci. 1997, 13: 555-556.
Fischer A, Gilad Y, Man O, Paabo S: Evolution of bitter taste receptors in humans and apes. Mol Biol Evol. 2005, 22: 432-436. 10.1093/molbev/msi027.
PAML Manual. [http://abacus.gene.ucl.ac.uk/software/paml.html]
Statistical Package R. [http://www.r-project.org]
We thank anonymous reviewers for valuable suggestions. This work is supported by the Ministry of Science and Technology Grant (2001CB510106), National Natural Science Foundation of China for Outstanding Young Scientist Award (30125022) and for Creative Research Groups (30421004) to HD. We thank Dr Chung-I Wu, Dr Liping Wei, and Dr Johnny He for helpful discussions and Xiaojun Wang, Meiling Zhang, Wenzhe Lu, Dongbiao Shen and Lingyun Xie for data collection. We are grateful to Bruce Michael and Jiayuan Quan for assistance in manuscript editing.