Conservation of gene order in evolution
General trends in gene order conservation
To examine how gene order is conserved during evolution, I measured gene order conservation between pairs of prokaryotic genomes as a function of their evolutionary distance, estimated from small subunit rRNA (SSU rRNA) substitutions. The results are shown in Figure 1.
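The conservation measure itself is defined in Materials and methods, not here. As a hypothetical illustration only, and not necessarily the measure used in this study, one simple way to score a pair of genomes is the fraction of neighbouring ortholog pairs in one genome that are also neighbours in the other. The sketch assumes each gene is labelled by an ortholog-group identifier (one copy per genome), ignores strand and genome circularity, and scores adjacency only among genes that have an ortholog in both genomes. The function name is an illustrative choice.

```python
from typing import Sequence


def conserved_adjacency_fraction(genome_a: Sequence[str], genome_b: Sequence[str]) -> float:
    """Fraction of adjacent ortholog pairs in genome_a that are also adjacent in genome_b.

    Hypothetical measure for illustration: genes are ortholog-group labels,
    genes without an ortholog in the other genome are dropped first, and
    gene orientation is ignored.
    """
    shared = set(genome_a) & set(genome_b)
    a = [g for g in genome_a if g in shared]
    b = [g for g in genome_b if g in shared]
    # Unordered pairs, so an adjacency survives a swap of its two genes.
    pairs_a = {frozenset(p) for p in zip(a, a[1:])}
    pairs_b = {frozenset(p) for p in zip(b, b[1:])}
    if not pairs_a:
        return 0.0
    return len(pairs_a & pairs_b) / len(pairs_a)


# Toy genomes (not real data): "x" has no ortholog, and "d"/"e" swapped places.
print(conserved_adjacency_fraction(list("abcde"), list("abxced")))  # 0.75
```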
In the Bacteria, conservation of gene order appears to follow a common trend for all species. Gene order conservation clearly declines as phylogenetic distance increases, but some conservation persists even at long distances, mainly because of clusters of genes that remain well conserved during bacterial evolution [12]. At small phylogenetic distances, gene order is extensively conserved, mostly because there has not yet been time for rearrangements to occur.
The distribution in Figure 1a fits a sigmoid curve, suggesting that the loss of gene order is a cooperative process. This might be related to the existence of operons, in which the displacement of a single gene can facilitate the rearrangement of the rest of the operon. Previous studies proposed an exponential shape for the distribution [8,17]. This disagreement probably arises because those studies did not include pairs of closely related species and therefore missed the leftmost part of the graph, the region of high conservation that distinguishes a sigmoid from an exponential decay.
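The argument about the missing leftmost points can be checked on synthetic data. The sketch below is an illustration, not the fitting procedure used for Figure 1: it generates noise-free points from a logistic decay with made-up parameters and no residual floor (for simplicity), then scores each model by the R² of a straight-line fit, using the fact that an exponential decay is linear in log space and a logistic decay is linear in logit space.

```python
import math


def linear_r2(xs, ys):
    """R^2 of an ordinary least-squares straight line through (xs, ys)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy * sxy / (sxx * syy)


def exponential_r2(ds, gs):
    # g = A * exp(-k * d) is a straight line in ln(g) versus d.
    return linear_r2(ds, [math.log(g) for g in gs])


def logistic_r2(ds, gs):
    # g = 1 / (1 + exp(k * (d - d0))) is a straight line in ln(1/g - 1) versus d.
    return linear_r2(ds, [math.log(1.0 / g - 1.0) for g in gs])


# Synthetic, noise-free points from a logistic decay (arbitrary units, not Figure 1 data).
ds = [i / 20 for i in range(21)]
gs = [1.0 / (1.0 + math.exp(10.0 * (d - 0.4))) for d in ds]

# Drop the closely related pairs (small d), as in a sample lacking close relatives.
tail_ds = [d for d in ds if d >= 0.5]
tail_gs = [g for d, g in zip(ds, gs) if d >= 0.5]

print("logistic, all pairs:      ", round(logistic_r2(ds, gs), 3))
print("exponential, all pairs:   ", round(exponential_r2(ds, gs), 3))
print("exponential, distant only:", round(exponential_r2(tail_ds, tail_gs), 3))
```

With all points, the logistic model fits exactly and the exponential clearly falls short; restricted to the distant pairs, the exponential also fits almost perfectly, which is how a sample without close relatives could suggest an exponential shape.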
Within this global trend, several bacterial species deviate slightly from the average. Although these deviations are small, in some cases they point to evolutionary processes that have shaped the genomes.
An interesting case is that of Buchnera. Figure 1b shows that gene order conservation in Buchnera is greater than expected from the phylogenetic position of this bacterium, as previously observed [18,19]. As an endosymbiont, Buchnera is undergoing extensive gene loss through reductive processes, so lower levels of gene order conservation might be expected. However, many gene rearrangement processes depend on RecA activity [20,21], and no recA gene has been found in Buchnera [18]. As a result, the genome of this bacterium has probably experienced few rearrangement events. Lateral gene transfer also seems negligible in this case [22], so gene loss remains the only process capable of altering gene order in the Buchnera genome. Apart from the lost genes, the Buchnera genome might therefore reflect the gene order it had when the bacterium became an endosymbiont and lost recA, which would make it a convenient reference point for studies of gene order.
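The claim that gene loss alone cannot reorder the surviving genes can be made concrete with a toy model. In the sketch below, the function names are hypothetical and an inversion stands in for a rearrangement event: deleting genes leaves the relative order of the remaining genes unchanged, whereas a single inversion does not.

```python
def delete_genes(genome, lost):
    """Gene loss: drop the lost genes and keep everything else in place."""
    return [g for g in genome if g not in lost]


def invert_segment(genome, start, end):
    """A rearrangement, modelled here as an inversion of genome[start:end]."""
    return genome[:start] + genome[start:end][::-1] + genome[end:]


def same_relative_order(ancestor, descendant):
    """True if the genes present in both genomes occur in the same order in both."""
    shared = set(ancestor) & set(descendant)
    return [g for g in ancestor if g in shared] == [g for g in descendant if g in shared]


ancestor = list("abcdefgh")
print(same_relative_order(ancestor, delete_genes(ancestor, {"c", "f"})))  # True
print(same_relative_order(ancestor, invert_segment(ancestor, 2, 5)))      # False
```

Under this model, a genome that only loses genes remains a faithful record of the ancestral order of the genes it keeps.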
Deep-branching species on the bacterial tree, such as Aquifex and Synechocystis, also deviate from the average. These species have the lowest values for gene order conservation among the Bacteria (Figure 1b). This agrees with classical molecular phylogenies as well as with genomic phylogenies based on whole-proteome analysis [3], in which these species are also the most divergent within the Bacteria.
To test whether the same trend in gene order conservation holds across the prokaryotes, I also included archaeal species in the comparisons. According to Figure 1a, the trend observed in the Bacteria is not found in the Archaea. Gene order conservation between archaea is lower than between bacteria, even for very closely related species (Pyrococcus horikoshii and Pyrococcus abyssi), and the level at which only residual conservation persists is reached at much shorter distances. I think this difference is probably an artifact caused by biased estimates of the phylogenetic distances between organisms. Brinkmann and Philippe [23] argued that bacterial SSU rRNAs evolve faster than archaeal ones, which leads to an underestimation of the phylogenetic distances between archaea. The true distances between archaea are therefore probably larger than shown in Figure 1a, in which case their gene order conservation would fit the overall bacterial trend, although the lack of points on the left-hand side of the graph makes it difficult to draw a firm conclusion. Likewise, the distances between bacteria and archaea should be larger, which would shift the Bacteria-Archaea points to the right and remove the surprising, artificial overlap between the Bacteria-Archaea and Bacteria-Bacteria points.
This is a good example of the difficulties encountered when using molecular phylogenies. Phenomena such as unequal mutation rates and lateral gene transfer, or artifacts such as long-branch attraction, may bias the results [1]. The present results suggest that these problems can be overcome with the aid of genomic methods: the unequal mutation rate in SSU rRNA, which can otherwise be detected only by carefully comparing different molecular phylogenies, becomes readily apparent when gene order conservation is examined. Hence, gene order conservation could serve as an alternative measure of the distance between organisms, especially when such distances are small.
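Using gene order conservation as a distance measure means inverting the fitted trend. As a hedged sketch, assume the trend is a logistic decay with a residual floor, G(d) = floor + (1 - floor) / (1 + exp(k(d - d0))), where k, d0 and floor are hypothetical parameters (the values below are illustrative, not fitted to Figure 1). Solving for d gives the distance implied by an observed conservation value; dividing it by the SSU rRNA distance for the same pair gives a rough factor by which the rRNA distance may be under- or overestimated.

```python
import math


def conservation(d, k, d0, floor):
    """Assumed logistic decay of gene order conservation with a residual floor."""
    return floor + (1.0 - floor) / (1.0 + math.exp(k * (d - d0)))


def distance_from_conservation(g, k, d0, floor):
    """Invert conservation(): d = d0 + ln((1 - floor) / (g - floor) - 1) / k."""
    if not floor < g < 1.0:
        raise ValueError("conservation must lie strictly between the floor and 1")
    return d0 + math.log((1.0 - floor) / (g - floor) - 1.0) / k


def implied_rate_factor(g, measured_d, k, d0, floor):
    """Curve-implied distance divided by the measured (e.g. SSU rRNA) distance.

    Values above 1 suggest the measured distance underestimates divergence.
    """
    return distance_from_conservation(g, k, d0, floor) / measured_d


# Illustrative parameters only; they are not fitted to any data in this study.
K, D0, FLOOR = 10.0, 0.4, 0.05

g = conservation(0.3, K, D0, FLOOR)
print(round(distance_from_conservation(g, K, D0, FLOOR), 6))  # 0.3
print(round(implied_rate_factor(g, 0.15, K, D0, FLOOR), 6))   # 2.0
```

The inverse is defined only for floor < G < 1 and is informative only where the curve has an appreciable slope: near either plateau, small differences in conservation correspond to large differences in distance.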
Conservation of gene order between bacteria and archaea is much lower than within each domain, and is even absent in some cases. There is one exception: gene order conservation between the hyperthermophilic bacterium Thermotoga maritima and archaea is higher than for the other bacteria, and much higher than between Aquifex and archaea, even though the SSU rRNA distances between bacteria and archaea are approximately equal. Extensive lateral gene transfer between Thermotoga maritima and archaea has been claimed [24]. This possibility is important, as it implies that lateral gene transfer can occur between domains; if so, Thermotoga provides a clear example of gene order conservation produced by lateral gene transfer.
Better distance estimates from molecular phylogenies of universally conserved genes
A different set of phylogenetic distances can be obtained by averaging the distances from the molecular phylogenies of universally conserved genes (see Materials and methods). The results are shown in Figure 2. Distances between organisms seem to be estimated more accurately with this set of genes, and gene order conservation within the Archaea accordingly follows the bacterial trend more closely. Because the agreement between the two distributions is still not complete, however, I conjecture that the distance estimates are still not entirely correct. It is likely that the amount of gene order conservation does not differ between the Bacteria and the Archaea, and therefore that the trend of gene order conservation is approximately the same in both domains.
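Materials and methods describes how these distances were obtained; the sketch below only illustrates the averaging step. It assumes that pairwise distances have already been estimated from each universally conserved gene's phylogeny and that a plain arithmetic mean is taken, skipping organism pairs missing from any gene (one possible convention). Gene and organism labels and the numbers are placeholders.

```python
from statistics import mean


def average_distances(per_gene):
    """Average organism-pair distances over several gene phylogenies.

    per_gene maps a gene label to {frozenset({org1, org2}): distance}.
    Only organism pairs that have a distance for every gene are averaged.
    """
    tables = list(per_gene.values())
    common = set.intersection(*(set(t) for t in tables))
    return {pair: mean(t[pair] for t in tables) for pair in common}


def pair(a, b):
    # Unordered key, so the distance A-B is the same entry as B-A.
    return frozenset((a, b))


# Placeholder genes, organisms and distances (not values from this study).
per_gene = {
    "gene1": {pair("A", "B"): 0.10, pair("A", "C"): 0.30},
    "gene2": {pair("A", "B"): 0.20, pair("A", "C"): 0.50, pair("B", "C"): 0.40},
}
averaged = average_distances(per_gene)
print(sorted((sorted(p), round(d, 3)) for p, d in averaged.items()))
```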
Common gene content and gene order conservation
Recognizing the difficulty of estimating the relationships between organisms from molecular phylogenies, some authors have proposed a genomic method based on the gene content shared by genomes [2,3]. This way of estimating distances is claimed to be more accurate because it is not affected by the drawbacks of molecular phylogenies. I used common gene content as an additional estimate of the distance between genomes and compared the resulting distances with gene order conservation. The results are shown in Figure 3.
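A gene-content distance can be sketched in a few lines. The normalization used in [2,3] is not stated here, so the code assumes one common choice: the number of shared ortholog groups divided by the gene count of the smaller genome, subtracted from 1. Genomes are represented as sets of hypothetical ortholog-group labels.

```python
def gene_content_distance(genes_a, genes_b):
    """1 - shared ortholog groups / gene count of the smaller genome (assumed normalization)."""
    a, b = set(genes_a), set(genes_b)
    if not a or not b:
        raise ValueError("both genomes need at least one gene")
    return 1.0 - len(a & b) / min(len(a), len(b))


# Toy ortholog-group labels (not real genomes): 2 shared groups, smaller genome has 3.
print(gene_content_distance({"g1", "g2", "g3", "g4"}, {"g1", "g2", "g5"}))
```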
When common gene content is used as the measure of phylogenetic distance, gene order conservation in the Archaea follows a trend similar to that in the Bacteria (Figure 3a). Although common gene content has some biases, as illustrated below, these are expected to affect the Bacteria and the Archaea alike. This reinforces the hypothesis that both domains follow a similar trend in gene order conservation.
More generally, common gene content appears to be a noisy measure, as it is affected by factors such as the different lifestyles of the organisms. For example, Xylella fastidiosa is a proteobacterium, and one of its closest relatives in this study is Pseudomonas aeruginosa. Nevertheless, their common gene content is low, less than 40%, whereas E. coli and Haemophilus influenzae, at a comparable phylogenetic distance, share around 70% of their genes. The reason is that X. fastidiosa has a very high number of open reading frames (ORFs) with no known relatives in other species (unique genes): as many as 40% of its ORFs [25], and the proportion is also very high in some other species [26]. As a result, common gene content overestimates the distances between X. fastidiosa and other bacteria. The same often happens with closely related bacteria that have different lifestyles, such as E. coli and Vibrio cholerae, which share less than half of their genes because their different environments require different adaptations and systems. Common gene content therefore has drawbacks as a measure of phylogenetic distance. In contrast, gene order conservation traces the course of genome evolution much more precisely, as it is not affected by the presence of particular sets of genes in individual genomes.
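The contrast between the two measures can be illustrated with toy genomes (hypothetical labels, not the X. fastidiosa data). Here a query genome gains unique genes interleaved with its shared core. Under the gene-content distance assumed for this sketch (shared groups over the smaller gene count), the distance jumps, while an adjacency-based order score computed only over shared orthologs is unchanged. Both definitions are illustrative assumptions; if unique genes were not filtered out before scoring adjacency, they would also break adjacencies.

```python
def gene_content_distance(a, b):
    """1 - shared genes / gene count of the smaller genome (assumed normalization)."""
    sa, sb = set(a), set(b)
    return 1.0 - len(sa & sb) / min(len(sa), len(sb))


def order_conservation(a, b):
    """Share of adjacent shared-gene pairs in a that are also adjacent in b."""
    shared = set(a) & set(b)
    fa = [g for g in a if g in shared]
    fb = [g for g in b if g in shared]
    pairs_a = {frozenset(p) for p in zip(fa, fa[1:])}
    pairs_b = {frozenset(p) for p in zip(fb, fb[1:])}
    return len(pairs_a & pairs_b) / len(pairs_a) if pairs_a else 0.0


core = [f"core{i}" for i in range(10)]
reference = core + [f"ref_only{i}" for i in range(20)]
query = list(core)
# Same core in the same order, with one unique gene after each core gene.
query_with_unique = [gene for g in core for gene in (g, f"unique_{g}")]

for q in (query, query_with_unique):
    print(round(gene_content_distance(reference, q), 2), order_conservation(reference, q))
```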
Regions of conservation and non-conservation of gene order in the genome
The second objective of this study was to determine how gene order conservation is distributed along the genome. Are the conserved regions spread uniformly, or are there well-defined regions of high and low conservation? The latter appears to be the case. Figure 4 shows gene order conservation using the genomes of E. coli and X. fastidiosa as references. The other genomes are sorted by their phylogenetic distance to the reference genomes (estimated from SSU rRNA substitutions). The gradual loss of gene order is easy to see, and regions of high gene order conservation clearly coexist with regions in which no conservation can be found.
Regions with no trace of gene order conservation are not rare, even between closely related organisms. They are either regions in which active rearrangement occurs or regions in which most genes are unique. The first case is illustrated in Figure 4a for E. coli: the replication terminus, a recombination hotspot, shows no gene order conservation because of extensive rearrangement in this region. The second case is shown in Figure 4b for the genome of X. fastidiosa, in which regions dominated by unique genes are easily detected by their lack of gene order conservation.
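Profiles of this kind can be sketched by walking along a reference genome and asking, for each pair of neighbouring genes, whether the pair is also adjacent in another genome. In the sketch below (hypothetical function names and toy genomes), reference genes without an ortholog simply fail this test, so regions dominated by unique genes appear as stretches of zero conservation, just as heavily rearranged regions do. The window size and minimum region length are arbitrary parameters.

```python
def adjacency_flags(reference, other):
    """For each neighbouring gene pair in reference, True if it is adjacent in other."""
    other_pairs = {frozenset(p) for p in zip(other, other[1:])}
    return [frozenset(p) in other_pairs for p in zip(reference, reference[1:])]


def window_profile(flags, window):
    """Fraction of conserved adjacencies in each sliding window along the reference."""
    return [sum(flags[i:i + window]) / window for i in range(len(flags) - window + 1)]


def zero_conservation_regions(flags, min_len):
    """Half-open (start, end) ranges of at least min_len consecutive unconserved adjacencies.

    Adjacency i joins reference genes i and i + 1.
    """
    regions, start = [], None
    for i, conserved in enumerate(flags + [True]):  # sentinel closes a trailing region
        if not conserved and start is None:
            start = i
        elif conserved and start is not None:
            if i - start >= min_len:
                regions.append((start, i))
            start = None
    return regions


# Toy genomes: reference genes e, f and g are absent from the other genome.
reference = list("abcdefghij")
other = list("abcdxyzhij")
flags = adjacency_flags(reference, other)
print(window_profile(flags, 3))
print(zero_conservation_regions(flags, 3))  # [(3, 7)]
```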
At the other extreme, all the genomes contain regions of high gene order conservation. Figure 1 shows that a remnant of gene order conservation persists even between distantly related organisms, in both the Bacteria and the Archaea. These highly conserved regions may be under selection to keep their genes together, so I analyzed their functional composition.
To find out whether the conserved regions share any functional characteristics, the proteins encoded by their genes were functionally classified using the EUCLID system [27]. I also examined how the runs of genes correspond to experimentally determined operons, as listed in the RegulonDB database [28]. The most conserved runs are shown in Table 1. No preference for particular functional classes was apparent, apart from the translation class, which is over-represented because of the ribosomal proteins. The runs contain genes for proteins involved in many different types of processes, from metabolism-related classes to information-related ones. With some exceptions, each run consists mainly of ORFs from a single functional class. The selective forces keeping these genes together could well differ between runs of different functional classes. For instance, gene order conservation in metabolism-related runs often reflects the fact that they encode enzymes acting sequentially in a pathway, which in several cases form multifunctional complexes. For runs related to cellular processes and information management, the selective scenario might be more complex [7].
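A hedged sketch of how conserved runs and their functional make-up might be extracted: walk along a reference genome, extend a run while each neighbouring pair is adjacent in at least a chosen number of other genomes, then report each run's most common functional class and the share of genes it covers. The support threshold, the toy genomes and the class labels are placeholders; real labels would come from a classification such as EUCLID, whose output format is not assumed here.

```python
from collections import Counter


def conserved_runs(reference, others, min_genomes, min_len=2):
    """Maximal runs of reference genes whose neighbouring pairs are adjacent
    in at least min_genomes of the other genomes."""
    if not reference:
        return []
    other_pairs = [{frozenset(p) for p in zip(g, g[1:])} for g in others]
    runs, current = [], [reference[0]]
    for left, right in zip(reference, reference[1:]):
        support = sum(frozenset((left, right)) in pairs for pairs in other_pairs)
        if support >= min_genomes:
            current.append(right)
        else:
            if len(current) >= min_len:
                runs.append(current)
            current = [right]
    if len(current) >= min_len:
        runs.append(current)
    return runs


def class_purity(run, functional_class):
    """Most common functional class in a run and the fraction of genes it covers."""
    cls, n = Counter(functional_class[g] for g in run).most_common(1)[0]
    return cls, n / len(run)


reference = list("abcdefg")
others = [list("abcxefg"), list("abczfge")]
runs = conserved_runs(reference, others, min_genomes=2)
print(runs)  # [['a', 'b', 'c'], ['f', 'g']]

# Placeholder class labels for illustration only.
classes = {
    "a": "translation", "b": "translation", "c": "energy metabolism",
    "f": "transport", "g": "transport",
}
for run in runs:
    print(run, class_purity(run, classes))
```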
The conserved runs of genes usually correspond to operons in E. coli, and combinations of two or even three operons are common. Given the proposal that operons are unstable structures [9], the maintenance of gene order within operons would be striking in itself, and the conservation of combinations of operons points to additional factors, other than common regulation, acting to conserve gene order. Lateral gene transfer could play a part in this process [13], although it is not easy to see how it could explain such extensive conservation. It is too early to say whether these findings challenge the assumption that operons are independent units [29]. Further research on these conserved structures is needed to elucidate the factors acting in each case.
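Whether a conserved run covers one operon or a combination of several can be checked by intersecting the run with an operon list. The sketch assumes operons are already available as collections of gene names (for example, exported from RegulonDB; no particular RegulonDB file format or API is assumed) and lists every operon that shares at least one gene with the run. Gene and operon names are toy placeholders; in the example, "g6" lies outside any listed operon.

```python
def operons_spanned(run, operons):
    """Names of operons that share at least one gene with a conserved run.

    operons maps an operon name to the genes it contains.
    """
    genes = set(run)
    return sorted(name for name, members in operons.items() if genes & set(members))


def run_coverage(run, operons):
    """Fraction of the run's genes that fall inside any listed operon."""
    in_operons = set().union(*map(set, operons.values())) if operons else set()
    return sum(g in in_operons for g in run) / len(run)


operons = {"opA": ["g1", "g2"], "opB": ["g3", "g4", "g5"], "opC": ["g9"]}
run = ["g1", "g2", "g3", "g4", "g5", "g6"]
print(operons_spanned(run, operons))  # ['opA', 'opB']
print(run_coverage(run, operons))
```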