Dramatic changes in transcription factor binding over evolutionary time
© BioMed Central Ltd. 2010
Published: 1 June 2010
A recent study reveals a surprisingly high degree of change in the occupancy patterns of two transcription factors in the livers of five vertebrates.
It is a long-standing hypothesis that alterations in transcriptional regulation are a major driving force in evolution, and the results of many recent studies offer corroborating evidence (reviewed in ). Recent studies also indicate that cis-regulatory sequence is the major determinant of differences in transcriptional output among related species, as opposed to other influences, such as changes in transcription factor (TF) DNA binding domains, other chromatin factors, or external signals. Wilson et al. showed that mouse liver cells containing human chromosome 21 'read' the human DNA in much the same way as do human liver cells, with the TFs hepatocyte nuclear factor (HNF)1A, HNF4A, and HNF6 all binding the same chromosome 21 locations that they would in human cells, rather than the locations bound in the orthologous mouse chromosome. However, important details have remained elusive, including the degree to which regulatory interactions vary between species across the entire genome, the types of mutations that are responsible for regulatory changes, and whether striking differences in TF binding occupancy are observed more generally among species. In a recent issue of Science, Schmidt et al. now show that individual regulatory elements are frequently gained and lost among vertebrates and that local cis-regulatory point mutations can account for much of the evolution of transcriptional regulation.
For both TFs, the majority of lineage-specific 'losses' (binding events not present in one placental mammal, but present at aligned, orthologous regions in the other two placental mammals) can be accounted for by either one or two point mutations (and not by insertions or deletions), suggesting that changes in TF occupancy are largely caused by the steady accumulation of small sequence changes. Interestingly, a substantial proportion of losses (between 20% and 40%) occur at genomic locations with unchanged sequence composition at the TF binding site. Although changes in other trans-acting factors might have a role in these cases, another explanation could be the presence of local sequence changes that influence the chromatin state and/or the association of other factors (such as cofactors) with DNA.
Despite widespread evidence of binding site loss and gain, a small number of binding events were found to be 'ultra-shared' (present in all five species; Figure 1d). The relative scarcity of such events emphasizes the low sensitivity of comparative techniques such as phylogenetic footprinting for identifying in vivo binding sites. However, these events were found to be almost always located near known liver-specific genes, suggesting that deep conservation of a binding event is indeed indicative of functionality, in agreement with the fact that highly conserved sequence is known to specifically identify functional regulatory sequence. In contrast, the authors did not find a tendency for stronger binding events to be preferentially conserved: neither the strength of match to the consensus sequence nor sequencing read depth correlates with sequence conservation. If conservation is a measure of functionality, these results suggest that stronger binding does not necessarily imply functionality, a result compatible with evidence that weaker binding sites are functionally important and that TFs can often bind to a wide range of sequences.
The finding that TF binding events have diverged rapidly throughout the vertebrate lineage is consistent with recent results comparing related yeasts and different human and yeast individuals [5–7]. In contrast, a recent study comparing the genome-wide binding of six TFs between two closely related Drosophila species reports that 'where we observe binding by a factor in one species, we almost always observe binding by that factor to the orthologous sequence in the other species'. What factors might contribute to such strikingly different findings? One possible explanation is that the observed differences might be attributable to differences in the evolutionary distance separating the species analyzed in each study. The Drosophila species of Bradley et al. have neutral substitution rates of approximately one in ten bases, a rate much lower than that of the vertebrates of Schmidt et al. (about one in three among placental mammals) and the yeast species of Borneman et al. (about one in four). With such low Drosophila substitution rates, perhaps there simply has not been enough time for changes in the regulatory sequences to accumulate. However, this notion is inconsistent with the data comparing different human and yeast individuals [5–7]. Furthermore, recent results comparing the global binding patterns of RNA polymerase II between human and chimpanzee, which have substantially lower substitution rates than the two Drosophila species, also indicate that as many as 32% of genes have diverged regulatory programs.
An alternative explanation is that Bradley et al. focus on early embryogenesis, a developmental stage that might be expected to be under stronger selective constraint, whereas the other studies [3, 5, 6] analyze samples taken from adult tissues. It is also possible that some of the differences between conclusions reached by different studies are due to differences in methodology of data collection and analysis. For example, Bradley et al. identified binding event losses as those present in one species (using a stringent threshold) and completely absent in the other species (using a lenient threshold). Accordingly, a binding event that is strong in one species and weak in the other would be considered a 'conservation' event by Bradley et al. but a 'loss' event by Schmidt et al. Other discrepancies might arise from differences in false negative rates. If one study has a false negative rate of 5% in each species, then even two species with completely conserved binding events would show an expected divergence rate of roughly 10%, because an event missed in either species is scored as divergent; a second study with a different false negative rate would have a different expected divergence rate. Finally, simulation studies have shown that TF binding sites cannot be aligned accurately at many of the divergence distances considered in the above studies, so that some apparent binding site losses arise simply from alignment errors. In the end, an unbiased, methodologically uniform assessment comparing the results of these studies would be greatly beneficial. Ideally, such a study would address whether there is evidence for selection acting to preserve binding events - it is currently unclear how many conserved binding events would be expected by chance alone.
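The false-negative arithmetic in the preceding paragraph can be made concrete with a short calculation. The Python sketch below is illustrative only and is not taken from any of the studies discussed: the function name `apparent_divergence` is hypothetical, and it assumes that every binding event is truly present in both species and that detection failures are independent between species.

```python
from typing import Optional, Tuple


def apparent_divergence(fnr_a: float, fnr_b: Optional[float] = None) -> Tuple[float, float]:
    """Expected apparent divergence between two species whose binding
    events are in truth completely conserved.

    Assumptions (illustrative, not from the studies discussed):
    every event is truly bound in both species, and misses are
    independent between species.

    Returns (fraction of all true events detected in exactly one species,
             the same count as a fraction of events detected at least once).
    """
    if fnr_b is None:
        fnr_b = fnr_a  # same false negative rate in both species
    for rate in (fnr_a, fnr_b):
        if not 0.0 <= rate < 1.0:
            raise ValueError("false negative rates must lie in [0, 1)")

    only_a = (1.0 - fnr_a) * fnr_b          # detected in A, missed in B
    only_b = fnr_a * (1.0 - fnr_b)          # missed in A, detected in B
    both = (1.0 - fnr_a) * (1.0 - fnr_b)    # detected in both species

    discordant = only_a + only_b
    detected_at_least_once = discordant + both
    return discordant, discordant / detected_at_least_once


if __name__ == "__main__":
    for fnr in (0.01, 0.05, 0.20):
        of_all, of_detected = apparent_divergence(fnr)
        print(f"FNR {fnr:.0%}: {of_all:.1%} of true events, "
              f"{of_detected:.1%} of detected events look divergent")
```

Under these assumptions, a 5% false negative rate in each species makes about 9.5% of fully conserved events look lineage-specific, which matches the figure of roughly 10% quoted above. The apparent divergence grows almost linearly with the false negative rate, so two studies with different detection sensitivities would report different divergence rates even for identical biology.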
Central to the significance of all of these studies [2–8] is the question of what proportion of individual TF binding sites are functional. Results from several recent ChIP-microarray (ChIP-chip) and ChIP-Seq studies (reviewed in ) demonstrate that many TFs bind promiscuously genome-wide, but that most binding events seem to have little influence on gene expression, echoing earlier results from yeast. Given the large number of binding events and mounting evidence supporting the transient nature of TF binding events, it is possible that most individual TF binding sites have limited functional importance. Furthermore, given that 30 to 50% of CEBPA and HNF4A binding site sequences overlap in the genome, many binding events might be non-functional interactions with accessible motifs in regions of open chromatin - in yeast, nucleosome depletion is a strong predictor of where TFs will bind.
Deciphering the determinants of TF binding and their relationship to gene expression output will be important for understanding both the function and the evolution of transcriptional regulatory mechanisms. Nonetheless, the findings of Schmidt et al. offer intriguing insights not only into the evolution of transcriptional regulation, but into evolution itself. At first glance, it might seem somewhat surprising that something as important as TF binding sites is evolving so rapidly. However, assuming that gene regulation occurs by ensembles of modules that act largely independently of one another - a model that is supported by a wealth of evidence - most losses (and gains) of individual binding sites are likely to have a small effect on overall transcriptional output. In such a model, the vast majority of individual TF binding sites would be disposable over the long term, because compensatory sites would also arise frequently, resulting in the accumulation of point mutations disrupting individual binding sites at near-neutral rates. The ability to tolerate such changes could also increase an organism's capacity to generate heritable phenotypic variation, and so increase overall 'evolvability'. The fluidity of eukaryotic transcriptional regulatory regions may therefore enable the exploration of potentially beneficial new regulatory sequence configurations.
We are grateful to Alan Moses and Harm van Bakel for their thoughtful critique of this manuscript.