Extreme conservation of non-repetitive non-coding regions near the HoxD complex of vertebrates

Homeotic gene complexes determine the anterior-posterior body axis in animals. The expression patterns and functions of hox genes along this axis are colinear with the order in which the genes are arranged in the complex. This correspondence between chromosomal organization and function is conserved in all bilaterians investigated. Although the molecular basis of this 'colinearity' is not yet understood, control elements within or near these complexes may establish and maintain the expression patterns of hox genes in a coordinated fashion. We report here an unprecedented conservation of non-coding DNA sequences adjacent to the HoxD complex of vertebrates. Stretches of hundreds of base pairs in a 7 kb region upstream of the HoxD complex show 100% conservation from fish to human. Using primers designed from these sequences of the human HoxD complex, we amplified the corresponding regions from different vertebrates, including mammals, birds, reptiles, amphibians and fish. Such a high degree of conservation, with no variation allowed during ~500 million years of evolution, suggests a critical function for these sequences in the regulation of the HoxD complex. These sequences also provide a molecular handle for gaining insight into the mechanism that regulates this complex.

Background:
Eukaryotic genomes contain a large excess of non-coding sequences. Conservation of these sequences among species is a strong indication of their functional significance, and with the availability of genome sequences such sequences can be identified using a comparative genomics approach [1][2][3]. Clustering of genes that are regulated in a linked manner has been noticed in several cases [4][5]. Among the most conserved regions of the vertebrate genome are the clusters of homeotic genes [6][7]. The homeotic gene complex was first identified in Drosophila melanogaster, where it was shown to play a major role in anterior-posterior body axis formation [8]. In flies, and similarly in vertebrates, hox genes are expressed in a coordinated manner along the body axis. The molecular mechanism behind this coordinated regulation, however, is not yet understood. Several mechanisms have been proposed that link the organization of homeotic genes to their spatiotemporally controlled expression [9][10][11]; the most attractive of these implicates higher order chromatin organization in this process [12]. An upstream region spanning up to 20 kb has been shown to play an important role in the regulation of this complex [13]. Such studies have led to the speculation that repressive elements in this region may initially silence the complex and then release the genes for expression in a sequential manner. Fine mapping of such sequences, and of their conservation in other vertebrates, has not been reported. The role of higher order chromatin organization in the regulation of homeotic gene complexes is better understood in the case of the bithorax complex of Drosophila [14].

Results and discussion:
We compared the genomic regions flanking hox complexes in order to identify conserved regions. Here we report that upstream regions spanning several hundred base pairs, in particular CR-2, show 100% conservation (Table 1, Fig. 1b).
These sequences are single-copy and vertebrate-specific. Among mammals we also noticed longer stretches of conservation, which gradually shorten toward lower vertebrates; the portion shared across all vertebrate classes defines the core of each conserved region (Table 1). In addition, compared with mammals, the intervening sequences in shark are shorter by ~1300 bp between CR-2 and CR-3 and by ~600 bp between CR-1 and Evx-2. Sequencing of the PCR products confirmed these observations.
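The idea of a conserved core that shortens as more distant vertebrate classes are added can be illustrated with a small sketch. It assumes the sequences have already been aligned (for example with Clustal X, as described under Sequence analysis) and reports the alignment column ranges in which every sequence carries the same ungapped nucleotide. The function name `identical_runs` and the toy sequences are hypothetical illustrations, not data or code from this study.

```python
from typing import List, Optional, Tuple


def identical_runs(alignment: List[str], min_len: int = 1) -> List[Tuple[int, int]]:
    """Return half-open (start, end) column ranges where every aligned
    sequence has the same nucleotide and no gap, keeping runs >= min_len."""
    if not alignment:
        return []
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("all aligned sequences must have the same length")

    runs: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for col in range(width):
        # A column is fully conserved if all rows share one base and it is not a gap.
        bases = {row[col].upper() for row in alignment}
        conserved = len(bases) == 1 and "-" not in bases
        if conserved and start is None:
            start = col  # a new identical run begins here
        elif not conserved and start is not None:
            if col - start >= min_len:
                runs.append((start, col))
            start = None  # the run is broken by this column
    # Close a run that extends to the end of the alignment.
    if start is not None and width - start >= min_len:
        runs.append((start, width))
    return runs


# Hypothetical toy sequences, not data from this study.
mammal_a = "ACGTACGTTA"
mammal_b = "ACGTACGTTA"
shark = "ACGAACGTTC"
print(identical_runs([mammal_a, mammal_b]))         # [(0, 10)]
print(identical_runs([mammal_a, mammal_b, shark]))  # [(0, 3), (4, 9)]
```

Adding the more distant toy sequence splits the single mammalian run into shorter runs, which is the pattern the text describes for the class-specific extensions around each core.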
Several recent reports using comparative genomics have identified conserved non-coding regions among different vertebrates [15][16][17], but none to the degree that we report here. The mechanism that would require such a high degree of conservation is not known, and it is therefore not immediately clear what the precise (regulatory) role of these sequences is. Part of CR-1, CR-2 or CR-3 could be the Evx-2 enhancer or another regulatory element that may lie in this region [4][5]. The size and extent of conservation, however, rule out such enhancer-type regulatory sequences as the only functional elements associated with these regions. The conserved sequences fall within the region that has been suggested to organize a repressive complex [13]. Identification of CR-1, CR-2 and CR-3, and of their class-specific extensions (Table 1), will help in the search for the molecular components of this or any other mechanism of HoxD regulation.
An EST database search revealed that parts of CR-1 and CR-3 are transcribed, but no EST corresponding to CR-2 or to any other part of the 7.5 kb region was found. These transcripts are expressed early in development (Fig. 1). One possible mechanism could involve RNA from this region acting through base pairing to implement the temporal and spatial regulation of the homeotic genes; if that is the case, such high conservation could be expected. A role for transcription in the regulation of the bithorax complex is emerging from recent studies [18][19][20][21]. Further studies will be required to determine whether such a process is common to vertebrate Hox complexes as well.
Such extreme conservation of several hundred nucleotides over half a billion years, in a region that does not code for any known protein, clearly implies an essential role for these sequences, probably in the regulation of the HoxD complex. Yet no known regulatory element requires such extreme conservation over hundreds of base pairs. It is therefore likely that these elements are components of a novel mechanism, common to all vertebrates, that regulates this gene complex. We are tempted to suggest that a region so strongly conserved from fish to human, and linked to a gene complex known to determine body axis formation, may be a key molecular determinant of early ontogeny. Early embryos of all vertebrates show striking similarity, and we suggest that these elements may control the early expression pattern of HoxD that leads to this similar embryo shape. While very speculative, such possibilities can be tested experimentally. The gradient of conservation seen in this region from fish to human may reflect the evolutionary history of this locus. Extensive molecular analysis of these sequences could potentially correlate the diversification of the vertebrate classes with the morphological features along the anterior-posterior body axis that were acquired during evolution [22][23].

Sequence analysis
The genomic sequences containing Evx-2 and any of the Hoxd genes were downloaded and annotated using gene/ORF prediction tools. A similar approach was used for the other hox complexes.
Homology searches with the upstream sequence of the human HoxD region (AC009336, nucleotides 56601 to 64095) were carried out using the NCBI BLAST program. Sequences that showed significant homology were further analyzed with the BLAST 2 program to determine the extent of homology. The conserved regions from each sequence were extracted and subjected to multiple sequence alignment using Clustal X. To identify expressed sequences corresponding to the conserved regions, the conserved sequences, together with the unique sequences, were searched with BLAST against EST databases (human, mouse and dbEST).
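The query coordinates above can be turned into an extraction step with a short sketch. It assumes the AC009336 positions follow the GenBank convention (1-based, both ends inclusive), which must be converted to Python's 0-based, half-open slicing; the helper name `extract_region` is hypothetical, and a placeholder string stands in for the real accession sequence, which is not reproduced here.

```python
def extract_region(sequence: str, start_1based: int, end_1based: int) -> str:
    """Extract a 1-based, inclusive coordinate range (GenBank convention,
    assumed here) using Python's 0-based, half-open slicing."""
    if not 1 <= start_1based <= end_1based <= len(sequence):
        raise ValueError("coordinates out of range for this sequence")
    # Position p (1-based) is index p - 1; the slice end stays at end_1based
    # because Python excludes the end index.
    return sequence[start_1based - 1:end_1based]


# Placeholder stand-in for the AC009336 sequence (hypothetical, not real data).
placeholder = "N" * 70000
region = extract_region(placeholder, 56601, 64095)
print(len(region))  # 7495
```

Under this convention the queried span is 64095 - 56601 + 1 = 7495 bp, consistent with the 7.5 kb region mentioned in the EST search.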
The contigs that showed significant homology to the upstream sequences of human HoxD were annotated using the tBLASTx program and by searching the translated amino acid sequences in the …

Vertebrate conservation is based on a comparison of human, mouse and shark sequences. Although we find genomic sequences of other vertebrates (fugu and zebrafish) that show conservation in this region, those sequences are in draft form and have therefore been excluded from this comparison. Lengths of the core and extended regions of conservation are given in base pairs. Numbers in brackets indicate percent homology.
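The bracketed percentages can be made concrete with a sketch of one common way to score a pairwise alignment. The text does not state its exact formula, so the definition used here is an assumption: identical ungapped columns divided by the total number of alignment columns, with gaps counted as mismatches, similar to the way BLAST reports identities over the aligned length. The function name `percent_identity` and the toy sequences are hypothetical.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percent identity of two equal-length, gapped aligned sequences.

    Assumed definition: identical non-gap columns / all alignment columns,
    so any column containing a gap counts as a mismatch."""
    if not seq_a or len(seq_a) != len(seq_b):
        raise ValueError("expected two non-empty aligned sequences of equal length")
    identical = sum(
        1 for a, b in zip(seq_a.upper(), seq_b.upper()) if a == b and a != "-"
    )
    return 100.0 * identical / len(seq_a)


# Hypothetical toy alignments, not data from this study.
print(percent_identity("ACGTACGTTA", "ACGAACGTTC"))  # 80.0
print(percent_identity("AC-T", "ACGT"))              # 75.0
```

Other conventions (for example, excluding gapped columns from the denominator) give higher values for the same alignment, so figures computed this way are comparable only when the same definition is applied throughout.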