The role of DNA sequence in centromere formation
© BioMed Central Ltd 2003
Published: 29 April 2003
Centromeres are key to the correct segregation and inheritance of genetic information. Eukaryotic centromeres, which are located in large blocks of highly repetitive DNA, have been notoriously difficult to sequence. Several groups have recently succeeded in analyzing centromeric sequences in human, Drosophila and Arabidopsis, providing new insights into the importance of DNA sequence for centromere function.
Centromeres are essential for the proper segregation of chromosomes during cell division in eukaryotes. They are characterized by highly repetitive DNA regions and bound kinetochore proteins, which are required for the attachment of microtubules to the chromosomes during mitosis. Centromeres present a paradox: their basic function is highly conserved across eukaryotes, but their sequences are divergent, even between closely related species. Several investigators have therefore suggested that the DNA sequence may not be essential for centromere formation. It has been difficult to address this issue because no complete sequence was available for any higher eukaryotic centromere, and sequencing efforts have been confounded by the highly repetitive DNA in which centromeres are located. Several groups [3–7] have recently developed novel methods to overcome these difficulties and report extensive centromeric sequence data from human, Drosophila and Arabidopsis.
Deletion of large regions of the human Y chromosome has shown that centromere activity is associated with a block of tandemly repeated 171 base-pair (bp) units, termed α-satellite DNA. Further work has demonstrated that every human centromere is associated with arrays of this α-satellite DNA that can be several megabases (Mb) in size. These massive arrays are embedded between blocks of pericentric heterochromatin containing highly repetitive DNA. In situ hybridization with α-satellite and immunolabeling using antibodies against kinetochore proteins also confirm that centromeres are located in these regions.
Schueler et al. used variation among the 171 bp repeats of α-satellite DNA in the human centromere to design PCR markers. The markers were used to construct a 500 kilobase (kb) contig of bacterial artificial chromosomes (BACs) covering a region that is immediately adjacent to, and includes part of, a 3 Mb array of α-satellite located at the centromere of the human X chromosome. Shotgun and BAC-end sequencing gave a sampling of this region that consisted of approximately 62% diverged α-satellite DNA, about 24% other satellite repeats, and about 16% LINE-type retroelements, as well as other sequences. The 3 Mb array of α-satellite DNA consists of nearly identical copies of the 171 bp unit that have more than 99% sequence identity and are all oriented in the same direction. At the edge of the array is approximately 40 kb of α-satellite DNA that becomes more divergent with distance from the center of the 3 Mb array, with identity falling from 98% to 70% at the outermost edge.
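The identity figures quoted above compare individual repeat units with one another. As a rough illustration of how such a percentage can be computed, the sketch below cuts a tandem array into fixed-length units and scores each against a reference unit by simple position-by-position matching. The 10 bp sequences are invented toy data (real α-satellite monomers are 171 bp), the function names are hypothetical, and the method ignores insertions and deletions; it is not the procedure used in the studies cited.

```python
def percent_identity(a: str, b: str) -> float:
    """Percentage of positions at which two equal-length sequences match."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length (gaps are not handled)")
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def split_monomers(array: str, unit_length: int) -> list:
    """Cut a tandem array into consecutive units, dropping any trailing partial unit."""
    n_units = len(array) // unit_length
    return [array[i * unit_length:(i + 1) * unit_length] for i in range(n_units)]


# Invented 10 bp reference unit; stands in for a 171 bp monomer.
reference = "ACGTTGCAAT"
# Three tandem copies: an exact copy, one with 1 mismatch, one with 3 mismatches.
toy_array = "ACGTTGCAAT" + "ACGTTGCAGT" + "TCGTAGCAAA"

identities = [percent_identity(reference, m) for m in split_monomers(toy_array, 10)]
print(identities)  # [100.0, 90.0, 70.0]
```

Applied to real monomers ordered by position, a list like this would show the pattern described in the text: values near 99% inside the homogeneous array and progressively lower values toward its edge.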
Arabidopsis centromeres include a 178 bp satellite repeat, which is organized in tandem arrays that range in size from 0.4 Mb to 1.4 Mb on different chromosomes and are located between regions enriched for various satellites and other repetitive elements [6, 11]. The clusters of α-satellite DNA in human and the 178 bp centromeric element in Arabidopsis are organized in similar ways, although their primary sequences are completely unrelated. Interestingly, centromeres of other plants have also been shown to contain DNA elements of similar length, and this may reflect a common requirement for centromere function (see, for example, ).
To overcome difficulties in sequencing repetitive DNA from Drosophila centromeres, a novel approach was used involving the Drosophila minichromosome Dp1187, which is derived from the X chromosome and retains a fully functional centromere. Several deletion derivatives of this minichromosome were recovered after irradiation and were used to map the centromere to a 420 kb region. One derivative chromosome of 620 kb was isolated by electrophoresis and gel extraction. Its DNA was fractionated and cloned, and bacterial transposons were inserted into the cloned DNA. Previous work had demonstrated that the centromere of the Drosophila X chromosome is composed of arrays of two types of simple 5 bp satellites, AATAT and AAGAG, that are interrupted by five retrotransposons and an 'island' of complex DNA. Using primers specific to the inserted bacterial transposons or tagged primers that consisted of satellite sequence attached to non-homologous sequence, Sun et al. were able to sample 31 kb of the AATAT and AAGAG satellites. This study and previous work showed that the arrays in the Drosophila centromere are highly similar - the AATAT sequence had 2.2% variation and AAGAG had only 0.3% variation in sequence - and that the repeats in each satellite are in the same orientation. Whereas transposon-like sequences previously found in Drosophila heterochromatin often consisted of scrambled clusters of different elements, the retrotransposons in the centromere of the X chromosome were intact. This suggested that they had recently been inserted into the genome or that their sequence is functionally conserved. The island of complex DNA was shown to be 39 kb long, including 16.2 kb of AT-rich sequence and retrotransposon-like elements that are arranged in blocks in different orientations. The beginning and end of this island contain a similar sequence, but are oriented in opposite directions - an arrangement analogous to fission yeast centromeres.
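Two properties of the simple satellites are reported above: how much the repeats vary, and whether they all point the same way. A minimal sketch of how both could be read off a sampled array is given below. It assumes the array starts in phase with the repeat unit and measures variation per base, which may not match how the cited figures were derived; the function names and the toy array are hypothetical. Orientation is detectable for AATAT because its reverse complement, ATATT, is not a rotation of the forward unit.

```python
def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T only)."""
    pairs = {"A": "T", "T": "A", "C": "G", "G": "C"}
    return "".join(pairs[base] for base in reversed(seq))


def summarize_satellite(array: str, unit: str) -> dict:
    """Count in-frame units equal to `unit` or to its reverse complement,
    and report the percentage of bases that differ from the forward unit."""
    k = len(unit)
    # Assumption: the array begins exactly at the start of a repeat unit.
    units = [array[i:i + k] for i in range(0, len(array) - k + 1, k)]
    forward = sum(u == unit for u in units)
    reverse = sum(u == reverse_complement(unit) for u in units)
    mismatched_bases = sum(x != y for u in units for x, y in zip(u, unit))
    return {
        "units": len(units),
        "forward": forward,
        "reverse": reverse,
        "percent_base_variation": 100.0 * mismatched_bases / (k * len(units)),
    }


# Invented toy array: nine exact AATAT units and one unit with a single change.
toy_array = "AATAT" * 9 + "AATGT"
summary = summarize_satellite(toy_array, "AATAT")
print(summary)
# {'units': 10, 'forward': 9, 'reverse': 0, 'percent_base_variation': 2.0}
```

A nonzero "reverse" count would indicate inverted repeats, which is the opposite of what the text reports for the X-chromosome satellites.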
All of the elements identified by Sun et al.  are also found at non-centromeric locations in the Drosophila genome; the AATAT and AAGAG satellites are present in other but not all centromeres. Indeed, in Drosophila there are no DNA sequences that are located at every centromere, suggesting that primary centromeric sequence alone is neither sufficient nor necessary for centromere formation. The arrays identified in the X chromosome may therefore be merely permissive for centromere organization.
Drosophila centromeres are unusual in being composed of sequences that are abundant elsewhere in the genome, whereas in plants or mammals this is not the case under normal circumstances. There are, however, cases in which the usual human centromeric sequences are found at other chromosomal locations, where they display no detectable centromeric activity. For example, Robertsonian translocations, which are whole-arm rearrangements between acrocentric chromosomes, can link two centromeres, and yet the resulting chromosome is stably transmitted through mitosis and meiosis. Furthermore, in situ analysis using antibodies against essential kinetochore proteins, such as CENP-C, an essential component of the inner kinetochore plate, and CENP-A, the centromere-specific variant of histone H3 in human, has shown that only one of the two centromeric locations retains function.
Also in humans, rearranged chromosomes have been found that lack the region in which the centromere is usually present, and in these cases a new location has acquired centromeric activity. The new site ('neocentromere') has the usual hallmarks of a centromere - it forms a cytologically discernible constriction on the chromosome and has kinetochore proteins bound [10, 14]. The DNA sequences that gave rise to two of these neocentromeres were determined by immunoprecipitation of chromatin with antibodies against the centromeric histone H3 protein CENP-A. Analysis of the isolated DNA regions showed that there are no elements in common between the two neocentromeres and normal centromeres [15, 16].
Human artificial chromosomes can be generated by introducing α-satellite DNA arrays into cells, but not by introducing the DNA sequences of neocentromeres in a similar fashion. When the chromosome arms surrounding the neocentromere are truncated by insertion of telomere sequences, however, the resulting minichromosomes composed of the neocentromere DNA can be perpetuated through cell divisions. This indicates that the satellite array of normal centromeres can direct de novo centromere formation, whereas the neocentromere DNA cannot. Nevertheless, the chromatin structure of the neocentromere, once established, appears to be stably maintained throughout the cell cycle. Because the primary sequences are not similar between neocentromeres and usual centromeres, the presence of neocentromeres suggests that centromere function may be regulated on an epigenetic level independent of DNA sequence.
The importance of chromatin structure for centromere function is supported by the presence of species-specific variants of histone H3 found in the centromeric chromatin of all eukaryotes. The variants interact with the other core histone proteins, H2A, H2B and H4, to form a type of nucleosome that is present only at functional centromeres. It has been suggested that nucleosomes containing centromeric histone H3 are indispensable for centromere function and likely to serve as anchors for kinetochore formation. A model proposing that correct spacing of centromeric and normal nucleosomes is required for centromere function is supported by recent data from Drosophila and human cells showing that stretched chromatin from centromeres is organized into blocks of centromeric nucleosomes interspersed between blocks of nucleosomes containing the normal core histone H3. This spacing may be facilitated by the satellites present at centromeres. Centromeric satellites from mammals and plants are approximately the length required to wrap around a nucleosome, and even in Drosophila multiples of the 5 bp satellites could add up to a unit of nucleosomal length.
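The closing point about Drosophila is simple arithmetic, sketched below: how many tandem copies of a 5 bp unit are needed to span a nucleosome-sized stretch. The 171 bp and 178 bp lengths come from the text; the 147 bp figure for DNA wrapped around a nucleosome core is a commonly quoted textbook value and is an assumption added here, as is the helper name `copies_to_span`.

```python
import math

SATELLITE_UNIT_BP = 5  # length of the AATAT and AAGAG units, from the text

# Target lengths in bp. The first entry is an assumed textbook value,
# the other two are the repeat lengths given in the text.
TARGET_LENGTHS_BP = {
    "nucleosome core DNA (assumed)": 147,
    "human alpha-satellite monomer": 171,
    "Arabidopsis centromeric repeat": 178,
}


def copies_to_span(target_bp: int, unit_bp: int = SATELLITE_UNIT_BP) -> int:
    """Smallest number of whole repeat units whose total length reaches target_bp."""
    return math.ceil(target_bp / unit_bp)


for name, length_bp in TARGET_LENGTHS_BP.items():
    n = copies_to_span(length_bp)
    print(f"{name}: {length_bp} bp ~ {n} copies ({n * SATELLITE_UNIT_BP} bp)")
```

On these assumptions, roughly 30 to 36 copies of a 5 bp satellite cover the same length as one human or Arabidopsis centromeric repeat unit.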
Analysis of centromeric histone H3 in related species of mammals, flies, and plants has shown that the variants are highly similar to core histone H3 proteins in the regions that interact with the other histone proteins [20–22]. In the region that is likely to contact the DNA strand, however, centromeric histone H3 proteins appear to be under adaptive selection. Because the DNA sequence elements that are in contact with the centromeric H3 histones are divergent between species, it has been suggested that the centromeric histone H3 protein and the DNA are coevolving. Meiotic drive (a distortion of chromosome segregation) resulting from preferential positioning of 'stronger' centromeres to the egg during female meiosis might be the mechanism for this coevolution [20, 21].
Many models for centromere determination predict that centromere function is independent of the underlying sequence. Such models are formulated to explain how nucleosomes containing centromeric histone H3 are maintained at all functional centromeres regardless of the DNA sequence with which they are associated. Spatial or temporal sequestration of the centromeres within nuclear compartments coupled to the availability of centromeric nucleosomes within these compartments or time phases has been suggested as a mechanism. Another model predicts that extant nucleosomes containing centromeric histone H3 are distributed to each strand during replication and subsequently used in post-replication recruitment of additional centromeric nucleosomes (for further discussion see ).
Models for centromere formation that do not rely on sequence must account for certain elements, such as the human α-satellite DNA and the Arabidopsis 178 bp repeat, that are present at every centromere in a normal karyotype within a given species. It seems that there must be mechanisms that homogenize repetitive elements such as centromeric repeats. For example, unequal crossing-over has been postulated to explain homogenization of α-satellite DNA within a chromosome, but there must also be a process that homogenizes the repeats between nonhomologous chromosomes. Unless the homogenization mechanism imposes constraints on the substrate sequence, changes to centromeric elements that become fixed in different populations would become randomly distributed in the absence of selection for sequence content. The analysis of Arabidopsis and human centromeric satellites identified regions that were conserved among the various iterations, as well as regions that were more variable than average, implying that selection pressures act on the sequence of centromeric elements. The observed non-random distribution of centromeric satellite DNA is not consistent with a model proposing complete irrelevance of sequence.
Some investigators [23, 24] have raised the possibility that secondary structure or even higher order DNA structure could be a factor in determining centromere position and function. This idea may reconcile data showing irrelevance of primary sequence on the one hand with data that show conservation of DNA elements on the other. Conservation of DNA secondary structure allows for large variation in sequence, but does not exclude fine-tuning of the primary sequence, perhaps through coevolution with the domain of the centromeric histone H3 that associates with DNA. Similarly, epigenetic models of centromere formation, proposing regulation at the chromatin level, would not exclude fine-tuning of primary sequence. In either model, formation of a centromere with a new sequence would be allowed as long as the region permitted the proper higher-order DNA organization.
Data from neocentromere analysis do provide support for the idea that centromeres self-perpetuate without the need for a specific underlying sequence. In contrast, conservation of human and Arabidopsis centromeric repeat sequences suggests specific requirements at this level. Extreme models advocating a specific DNA element at centromeres versus no requirement at all will probably require a new synthesis. The means by which the position of the centromere on the chromosome is determined has yet to be resolved, but the recent elucidation of DNA sequence from the centromeres of various species is valuable information for making new predictive models. To determine the importance of various DNA elements found in or near the centromere, the mechanisms that drive evolution of centromeric DNA need to be clarified. For example, the lack of any centromeric elements common to all centromeres in Drosophila may be the result of a homogenization mechanism that is fundamentally different from the one that seems to function in mammals and plants. As additional centromeric sequences continue to become available from many different species, insights into the homogenization of sequences and their involvement in centromere formation will grow.