- Open Access
Recurrent insertion and duplication generate networks of transposable element sequences in the Drosophila melanogaster genome
© Bergman et al.; licensee BioMed Central Ltd. 2006
- Received: 31 July 2006
- Accepted: 29 November 2006
- Published: 29 November 2006
The recent availability of genome sequences has provided unparalleled insights into the broad-scale patterns of transposable element (TE) sequences in eukaryotic genomes. Nevertheless, the difficulties that TEs pose for genome assembly and annotation have prevented detailed, quantitative inferences about the contribution of TEs to genome sequences.
Using a high-resolution annotation of TEs in Release 4 genome sequence, we revise estimates of TE abundance in Drosophila melanogaster. We show that TEs are non-randomly distributed within regions of high and low TE abundance, and that pericentromeric regions with high TE abundance are mosaics of distinct regions of extreme and normal TE density. Comparative analysis revealed that this punctate pattern evolves jointly by transposition and duplication, but not by inversion of TE-rich regions from unsequenced heterochromatin. Analysis of genome-wide patterns of TE nesting revealed a 'nesting network' that includes virtually all of the known TE families in the genome. Numerous directed cycles exist among TE families in the nesting network, implying concurrent or overlapping periods of transpositional activity.
Rapid restructuring of the genomic landscape by transposition and duplication has recently added hundreds of kilobases of TE sequence to pericentromeric regions in D. melanogaster. These events create ragged transitions between unique and repetitive sequences in the zone between euchromatic and beta-heterochromatic regions. Complex relationships of TE nesting in beta-heterochromatic regions raise the possibility of a co-suppression network that may act as a global surveillance system against the majority of TE families in D. melanogaster.
- Transposable Element
- Long Terminal Repeat
- Pericentromeric Region
- Transposable Element Insertion
- Fourth Chromosome
Nearly all eukaryotic genomes contain a substantial fraction of middle repetitive, transposable element (TE) sequences interspersed with the unique sequences encoding genes and cis-regulatory elements. The broad-scale patterns of TE abundance and distribution in various model organisms have become increasingly well-understood with the recent availability of essentially complete genome sequences (for example, [1–4]). Despite these general advances, however, a detailed understanding of the evolutionary forces that control the abundance and distribution of TEs remains elusive, owing in part to the dynamic nature of this component of the genome as well as to the inherent problems that TE sequences present for genome assembly and annotation.
As with all unfinished whole-genome shotgun assemblies, uncertainty in the assembly of repetitive DNA in the first two releases of the Drosophila melanogaster genome sequence posed difficulties for analysis of TE sequences [5–8]. The improved assembly of repetitive regions in the D. melanogaster Release 3 genome sequence presented the first opportunity to study TEs in a finished whole genome shotgun sequence [2, 9], revealing the true challenge that these sequences pose for their systematic annotation [10, 11]. With further improvements in the Release 4 genome sequence made possible by the efforts of the Berkeley Drosophila Genome Project  (especially in regions of high TE density where several gaps have been completed), we are now in a position to establish more stable trends in TE abundance for D. melanogaster. In addition to having access to improved genome sequence data, we have recently developed an improved TE annotation pipeline that uses the combined evidence of multiple computational methods to predict 'TE models' in genome sequences . We have shown that this pipeline identifies a large number of predicted TEs that were omitted from the Release 3 genome annotations, and subsequently applied this system to the D. melanogaster Release 4 sequence . Here we analyze the results of this effort in detail, which allows an extremely high-resolution view of the structure and location of TEs in one of the highest quality metazoan genome sequences currently available.
We first revised baseline estimates of the TE abundance in the Drosophila genome sequence, based on the fact that TEs show a strikingly non-random distribution across the genome. We then used this baseline to identify specific regions of extremely high TE density in the genome sequence. This analysis showed that regions of the genome broadly known to have high TE abundance, such as pericentromeric regions and the fourth chromosome, are in fact often characterized by distinctly localized regions of extremely high TE density interrupted by regions of lower TE density. Comparative sequence analysis showed that this punctate pattern is unlikely to have arisen in the D. melanogaster genome by inversion of TE-rich heterochromatic sequences, but can evolve in situ by the joint action of recurrent transposition and duplication. Finally, we analyzed in detail the patterns of TE nesting in the genome sequence, taking advantage of the improved joining of fragments from the same TE insertion event in our new annotation. We framed the process of TE nesting as a directed graph and borrowed techniques from network analysis to study genome-wide patterns of TE nesting. This work demonstrates the added value of high-resolution annotations for understanding how TEs impact genome organization and evolution, and preludes the interpretation of TE-rich heterochromatic regions currently being sequenced by the Drosophila Heterochromatin Genome Project .
Abundance and distribution of TEs in the Release 4 genome sequence
Table 1. Abundance of D. melanogaster TEs annotated in Release 4 genome sequence by genomic region. (Table body not recovered; column headings: Total bp TE; No. of TEs; No. of TE per Mbp; No. of TE full length; % TE full length; No. of TE nested; % TE nested.)
This set of 5,390 TEs defined 4,684 TE-free regions (TFRs) in the Release 4 genome sequence; 94.5% (111.9 Mb of 118.4 Mb) of the Release 4 genome sequence can be found in TFRs, with 89.8% (106.2 Mb) and 56.1% (66.4 Mb) of the genome found in TFRs of greater than 10 Kb (n = 1,393) and 100 Kb (n = 357), respectively. The longest TFR in D. melanogaster is 855,890 base-pairs (bp) in length on chromosome 2R from 14,374,883-15,230,772, contains 106 genes, and is over 10 times longer than the longest TFR in the human genome. The mean TFR length of 23,878 bp is consistent with the genome-wide minimum estimate of the distance between middle-repetitive interspersed repeats (>13 Kb) based on reassociation kinetics; however, the median TFR length of 1,992 bp is much smaller. The distribution of TFR lengths departs significantly from an exponential distribution parameterized on this mean length using an adjusted Kolmogorov-Smirnov test (D = 0.4513, p < 0.001), which is based on the maximal difference between observed and expected cumulative distributions and accounts for the fact that the rate parameter for the exponential distribution has been estimated from the data. Similar results are obtained if the rate parameter for the exponential is calculated from the number of TE insertions divided by the total length of TFRs (as in ), either including (adjusted Kolmogorov-Smirnov test, D = 0.4719, p < 0.001) or excluding (adjusted Kolmogorov-Smirnov test, D = 0.4456, p < 0.001) TEs nested in other TEs. These results are not simply a consequence of a high density in pericentromeric regions (see below) and demonstrate that TEs are non-randomly distributed at the level of the complete D. melanogaster genome sequence, confirming previous results [7, 8, 15]. We note that TFRs in the D. melanogaster genome are likely to vary among individuals since most TE insertions are not fixed in the species; however, these results should be representative of other strains to the extent that the TE composition of the genome sequence reflects general properties of the species.
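The TFR analysis can be illustrated with a minimal sketch, which is not the pipeline used for the results reported here. It assumes TE annotations are available as 1-based inclusive (start, end) intervals on a single chromosome. The function names (`te_free_regions`, `ks_exponential`, `monte_carlo_p`) are hypothetical. The Monte Carlo calibration stands in for the adjusted Kolmogorov-Smirnov test: it re-estimates the exponential mean in every simulated sample, which is the issue the adjustment addresses, but it is a substitute technique rather than the published procedure.

```python
import math
import random


def te_free_regions(chrom_length, te_intervals):
    """Lengths of TE-free regions, given 1-based inclusive TE intervals.

    Overlapping or nested TE intervals are merged implicitly.
    """
    lengths = []
    pos = 1  # first base not yet covered by any TE
    for start, end in sorted(te_intervals):
        if start > pos:
            lengths.append(start - pos)
        pos = max(pos, end + 1)
    if pos <= chrom_length:
        lengths.append(chrom_length - pos + 1)
    return lengths


def ks_exponential(lengths):
    """Maximal difference D between the empirical CDF and an exponential
    CDF whose mean is estimated from the same data."""
    xs = sorted(lengths)
    n = len(xs)
    mean = sum(xs) / n
    d = 0.0
    for i, x in enumerate(xs):
        cdf = 1.0 - math.exp(-x / mean)
        d = max(d, (i + 1) / n - cdf, cdf - i / n)
    return d


def monte_carlo_p(lengths, n_sim=500, seed=0):
    """Return (D, p). The null distribution of D is simulated from
    exponential samples of the same size, with the mean re-estimated in
    each sample (assumption: this replaces the tabulated adjustment)."""
    rng = random.Random(seed)
    observed = ks_exponential(lengths)
    n = len(lengths)
    hits = 0
    for _ in range(n_sim):
        sim = [rng.expovariate(1.0) for _ in range(n)]
        if ks_exponential(sim) >= observed:
            hits += 1
    return observed, (hits + 1) / (n_sim + 1)


# Toy chromosome of 100 bp with three TE annotations, two of which overlap.
tfr_lengths = te_free_regions(100, [(11, 20), (15, 30), (51, 60)])
print(tfr_lengths)
```

Because D is unchanged by rescaling the data once the mean is re-estimated, simulating from a unit-rate exponential is sufficient for the null distribution.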
Pericentromeric regions, non-pericentromeric regions and the fourth chromosome differ drastically in TE content
Since non-random distribution of TEs can lead to greater than one order of magnitude differences in TE abundance in pericentromeric and non-pericentromeric regions [2, 7, 8, 15, 24], overall genome-wide summary statistics do not accurately reflect TE abundance for any region of the genome sequence. To account for this heterogeneity, we attempted to partition the major chromosome arms into regions of high (pericentromeric) and low (non-pericentromeric) TE density using an independent criterion that is not based on TE content. Our primary goal here was to estimate the TE content in non-pericentromeric regions of the genome as accurately as possible, to understand baseline levels of TE abundance throughout the majority of the genome. Initially we investigated using a partition based on the cytologically defined boundaries between euchromatin and β-heterochromatin estimated in Hoskins et al. . As shown in Figure 1 (red triangles), the cytologically defined limits of the euchromatin/β-heterochromatin boundaries correspond almost exactly to the most distal pericentromeric region of high TE density on chromosome arms 3L and 3R. However, on chromosome arms 2L, 2R and X the most distal pericentromeric regions of extreme TE density are up to 2 Mb from the estimated euchromatin/β-heterochromatin boundary. Thus, using this cytological criterion to partition the genome into regions of high and low TE density still leads to an over-estimate of the true TE abundance for the majority of the genome.
We next evaluated whether genetically defined regions of different recombination rates estimated by Charlesworth  could partition the genome into high and low TE density regions. For all chromosome arms (excluding the fourth chromosome), we found that the estimated boundaries between 'reduced' and 'null' (that is, very low) recombination rates in pericentromeric regions (Figure 1, orange triangles) were located extremely close to the cytologically defined boundaries between euchromatin and β-heterochromatin. Thus, the same tendency to bias estimates of TE abundance exists if the boundary between reduced and null recombination rates is used to partition the genome as for the cytological criterion above. In contrast, the estimated transitions between 'high' and 'reduced' recombination rates in pericentromeric regions (Figure 1, green triangles) are approximately 1 to 2 Mb distal to estimated euchromatin/β-heterochromatin boundaries for all major chromosome arms. Virtually all regions with high TE density were included in the 11% of the genome sequence labeled under this definition as 'pericentromeric' (Figure 1), and, therefore, this partition was used to estimate TE abundance in different regions of D. melanogaster genome. Because our aim was to estimate the TE content in non-pericentromeric regions as a baseline to identify regions of extremely high TE content elsewhere in the genome, the inclusion of some low TE content regions in pericentromeric regions on chromosome arms 3L and 3R using this partition should not bias estimates of the background TE abundance throughout the euchromatin.
A 'typical' region of the D. melanogaster Release 4 genome sequence (that is, the 88% of the genome in non-pericentromeric, high recombination regions on the major chromosome arms) contains approximately 3.32% TE sequences, with an average of 16.9 TEs per Mb (Table 1). Previous estimates based on Releases 1 and 2 are not meaningful because of assembly errors [7, 15], and those based on Releases 3 and 4 were computed across the entire genome [2, 10]; thus, the current figures represent the first unbiased estimates of TE content for the majority of the D. melanogaster genome sequence. As observed in previous releases of the D. melanogaster genome sequence [2, 7], the rank order of abundance of major TE classes in non-pericentromeric regions is: LTR elements (2.42%, 4.96/Mb) > non-LTR elements (0.62%, 3.24/Mb) > TIR elements (0.15%, 2.06/Mb). INE-1 elements account for only 0.10% of a typical region of the D. melanogaster genome, but contribute 6.36 TEs/Mb. Approximately 20.5% of the TEs in non-pericentromeric regions are estimated to be full-length (± 3% of the canonical element including the length of inserted sequences), although this value will undoubtedly change with different definitions of what constitutes a full-length element. Virtually every TE in non-pericentromeric regions exists as an individual insertion, with only 6.41% involved in nests of TEs inserted into other TEs. The majority of TE families (97/121, 80.2%) present in the genome sequence have copies in non-pericentromeric regions.
In stark contrast, the 11% of the genome sequence in pericentromeric, low-recombination regions on major chromosome arms contains 57.5% (n = 3,101) of the 5,390 TEs annotated and 42.7% (2.78 Mb) of the 6.51 Mb of sequence annotated as TE. On average, pericentromeric regions are composed of 20.9% TE sequences, with 233 TEs/Mb (Table 1). Overall, there is an approximately 6-fold enrichment in the amount of TE DNA and a 14-fold increase in TE density in pericentromeric regions relative to non-pericentromeric regions. It must be noted, however, that average values of TE content for pericentromeric regions are more variable than for non-pericentromeric regions, because of heterogeneity both within a given pericentromeric region (Figure 1, see below) and among pericentromeric regions on different chromosome arms. For example, the pericentromeric region of chromosome arm 3R had a much lower TE density than other chromosome arms, perhaps relating to the lack of β-heterochromatic sequences in polytene chromosomes at the base of this chromosome arm [27, 28]. TE abundance in the pericentromeric region of the X chromosome is likely to be underestimated because of an unsized and unsequenced physical gap in cytological division 20 [9, 12], which is embedded in a region of extremely high TE density. Because of these effects and the inclusion of some low TE content regions on 3L and 3R that arise from our use of the high-reduced recombination rate boundary (see above), estimates of TE abundance in pericentromeric regions should be treated as approximate. The rank order of abundance for the major classes of TEs is the same in the pericentromeric regions as in non-pericentromeric regions (% TE sequence: LTR > non-LTR > TIR > INE-1; number of TEs/Mb: INE-1 > LTR > non-LTR > TIR). Four-fold fewer pericentromeric TEs were full-length (5.1%) relative to non-pericentromeric regions, with 3-fold greater numbers involved in nests (19.5%) (see Table 1). Virtually all TE families (118/121, 97.5%) present in the genome sequence have copies in pericentromeric regions.
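The fold differences quoted above follow directly from the percentages and densities reported in the text for the two compartments. The short check below recomputes them. The dictionary keys are illustrative names, and the input values are copied from the text rather than from Table 1 itself.

```python
# Values as reported in the text for the major chromosome arms.
non_peri = {"pct_te": 3.32, "te_per_mb": 16.9, "pct_full": 20.5, "pct_nested": 6.41}
peri = {"pct_te": 20.9, "te_per_mb": 233.0, "pct_full": 5.1, "pct_nested": 19.5}

# Ratio of pericentromeric to non-pericentromeric values for each measure.
fold = {key: peri[key] / non_peri[key] for key in peri}

# Share of all annotated TEs and of all TE sequence found in pericentromeric regions.
share_of_tes = 3101 / 5390
share_of_te_bp = 2.78 / 6.51

for key, value in fold.items():
    print(f"{key}: {value:.2f}-fold")
print(f"share of TEs: {share_of_tes:.1%}, share of TE sequence: {share_of_te_bp:.1%}")
```

A ratio below 1 for `pct_full` corresponds to the roughly four-fold lower proportion of full-length elements in pericentromeric regions.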
Like pericentromeric regions, the fourth chromosome has a much higher TE abundance than is typical of the genome as a whole: although the fourth chromosome is only 1% of the genome sequence, approximately 10% of TEs annotated are found on chromosome 4. Overall, there is an approximately 7-fold enrichment in the amount of TE DNA and a 25-fold increase in TE density on the fourth chromosome relative to regions of normal TE abundance. Important differences in TE abundance between pericentromeric regions and the fourth chromosome were also observed [2, 7] (Table 1). Relative to pericentromeric regions, the fourth chromosome has a higher number of TEs per unit of physical distance (422 TEs/Mb), but a similar proportion of genome sequence annotated as TE (22.6%). As noted previously [2, 7], the rank order abundance of the major TE classes on chromosome 4 differs from the rest of the genome, with TIR elements as the most abundant class of TE (% TE sequence: TIR ~ INE-1 > LTR > non-LTR; number of TEs/Mb: INE-1 > TIR > non-LTR > LTR). To test the robustness of this pattern, we removed the most numerous family from each of the major TE classes on the fourth chromosome: LTR, 297 (n = 3); non-LTR, Cr1a (n = 17); TIR, 1360 (n = 62). In the absence of these three highly abundant families, the rank order percent TE sequence (INE-1 > LTR > non-LTR > TIR) and number of TEs/Mb (INE-1 > TIR ~ non-LTR > LTR) change for the fourth chromosome. This result indicates that patterns of abundance by class on the fourth chromosome are heavily influenced by a few highly abundant families, suggesting that Cr1a in addition to INE-1 and 1360 may play an important role in defining the unusual features of this chromosome [18, 29]. Fewer TEs on the fourth chromosome are full-length (2.77%) relative to pericentromeric regions, and a lower proportion of TEs are involved in nests (12.6%). Less than half of all TE families (55/121, 45.5%) present in the genome sequence have copies on the fourth chromosome.
Clear differences were also observed in the distribution of TFRs in these three genomic compartments. Consistent with TE densities, non-pericentromeric regions have on average the largest uninterrupted regions of unique sequence (mean 60,320 bp; median 29,280 bp; n = 1,663), relative to pericentromeric regions (mean 4,147 bp; median 726 bp; n = 2,541) and the fourth chromosome (mean 2,067 bp; median 1,150 bp; n = 480). Nevertheless, separate analyses of TFR distributions within each compartment revealed non-random distribution of TEs based on mean TFR lengths in non-pericentromeric regions (adjusted Kolmogorov-Smirnov test, D = 0.1627, p < 0.001), pericentromeric regions (adjusted Kolmogorov-Smirnov test, D = 0.3501, p < 0.001) and chromosome 4 (adjusted Kolmogorov-Smirnov test, D = 0.1541, p < 0.001). We note that the finding of a non-random distribution of TEs in non-pericentromeric regions in the genome sequence differs from previous conclusions based on cytological estimates. Our results indicate that the non-random distribution of TEs across the entire genome is not explained solely by overall differences in TE abundance between genomic compartments and suggest that the mechanisms that determine the location of TE insertions, such as gene density and ectopic recombination [7, 15, 31], may be decoupled from overall TE abundance.
Localized regions of extremely high TE density
Table 2. Regions with extreme TE density in the D. melanogaster Release 4 genome sequence. (Table body not recovered; surviving column headings: No. of families; No. of TEs.)
As expected, nearly all HDRs are located in pericentromeric regions or on chromosome 4, consistent with the general observation that heterochromatic and/or low-recombination rate regions of the genome sequence have high TE densities (see above) [2, 7, 15]. Three HDRs (1, 16, 17) on the major chromosome arms are located in regions not defined as pericentromeric; however, HDR1 on the X-chromosome is found very close to the boundary demarcating these regions and could probably be classified as pericentromeric. HDRs total 4.27 Mb of sequence and, therefore, comprise only 3.6% of the genome, but contain one-third (1,822/5,390; 33.8%) of annotated TEs. Interestingly, one of the most extreme regions of localized TE density in the D. melanogaster genome sequence (HDR4) contains the insertion site for a P-element induced allele (flampy+(P)) of the as-yet-uncharacterized gene flamenco , one of the few genetic loci shown to regulate the activity of transposable elements in Drosophila . HDR4 (which includes the physical gap in cytological division 20) occupies over 230 Kb of DNA and contains at least 104 TEs and 6 genes, including DIP1, which has been excluded as being the gene that is causal for the flamenco mutation . We note that the COM locus also in 20A2-3, which is known to regulate the ZAM and Idefix families of LTR elements, is genetically separable from flamenco  and, therefore, unlikely to correspond to the same region.
The second exceptional HDR (17) on chromosome arm 3R corresponds to a tandemly duplicated array of invader4 elements embedded within the sub-telomeric mini-satellites called telomere-associated sequences ('TAS'). We also found that TAS repeats from chromosome arm 2R  and the original TAS repeat derived from the Dp1187 X-minichromosome  also contain invader4 sequences (results not shown), although no homology to invader4 (or any other TE) is observed in the TAS repeat derived from chromosome arms 2L or 3L [48, 50], suggesting that TE sequences are not functionally constitutive components of TAS repeats. The presence of mobile TE sequences in TAS repeats may explain non-telomeric hybridization signal to TAS probes in the chromocenter and basal euchromatic locations . No HDRs are observed at the ends of other chromosome arms, despite the fact that, in Drosophila, the retrotransposons Het-A, TART and TAHRE function as telomeric repeats to ensure proper integrity of the chromosome ends [51–53]. In the Release 4 sequence, only the X chromosome and fourth chromosome  terminate with small clusters of telomeric TE sequences.
Mechanisms that generate localized regions of high TE density
Surprisingly, the improved resolution provided by our new annotation showed that TE density is not uniformly high in pericentromeric regions, nor is TE density simply an increasing function of proximity to centromeric regions (Figure 1, inset panels). This is especially true for chromosome arms X, 2L and 2R, where pericentromeric HDRs are interspersed with regions of normal TE density, creating a ragged, punctate increase in TE abundance in the direction of the centromere. Chromosome 4 also exhibits discrete regions of different TE density (Table 2), despite a higher overall level of TE abundance. Some HDRs (for example, 1, 8, 13, 16) clearly occur in regions of low INE-1 density, which suggests a recent origin for the high TE density in these regions, assuming that INE-1 represents the ancestral TE distribution at the time of its major burst of activity prior to the split of D. melanogaster from its sister species D. simulans [16, 32]. Other HDRs (9, 10, 15 and those on the fourth chromosome) co-occur with regions of high INE-1 density, suggesting that these regions of the genome have permitted a high density of TEs at least as far back as the ancestor of the D. melanogaster species subgroup [16, 32]. This is also likely to hold true for HDRs 11, 12 and 14 at the bases of chromosome arms 2L, 2R and 3L, where non-INE-1 TEs occupy virtually all of the sequence, creating an apparent negative association with INE-1 density.
We did, however, find a relatively high incidence of duplicated sequences in HDRs, suggesting that tandem or segmental duplication plays an important role in the genesis of TE-rich regions of the genome: 13 of 23 HDRs show evidence of duplication (Table 2, Figures 2 and 3c,d). Duplications in HDRs can contain multiple TEs from different families, often nested, sometimes with different copies of the duplicated region containing additional TE insertions (Figure 2). Duplications in HDRs also amplified cellular genes as well as TE sequences: for example, eight partial and complete duplicates of the gene CG32381 are present in HDR1 (Figure 2b). HDRs may also include retrotransposed gene duplicates, such as the Mgst1-like CG12628 , which is found in a nest of TEs in HDR11. The series of events leading to tandem duplication of TEs in HDRs is often highly complex, with repeat structures present at different scales (Figure 3c,d). Duplication of TE sequences could also be observed in other regions of the genome with lower TE density, such as duplication of Rt1c elements interspersed between the SDIC gene duplicates [55, 56]. A more thorough analysis of the interplay between TEs and segmental duplications will be the subject of a separate study (A-S Fiston, D Anxolabehere and H Quesneville, personal communication).
A graph-based approach to analyze patterns of TE nesting
Regions of extremely high TE density typically contain a high proportion of TEs inserted into other TEs, and our new annotation allowed us to examine patterns of TE nesting in greater detail than has previously been possible. Few methods exist to analyze TE nesting, partly because of limitations in accurately joining fragments of a TE insertion that become separated in the genome by a subsequent nested TE insertion, and partly because analysis of TE nesting is complicated by the redundancies inherent in complex nesting relationships. For example, if one TE (A) is nested within a second (B) that is in turn nested within a third (C), simply analyzing overlapping ranges of TEs in the genome will erroneously yield three nesting events (A→B, A→C, and B→C), when only two occurred historically (A→B and B→C). We found that complex nesting relationships could best be analyzed by identifying 'primary' nesting relationships (A→B and B→C in the example above) and assembly of these simple binary events into more complex nesting relationships by applying concepts from network analysis to describe and quantify patterns of TE nesting. In this formulation of the problem, TE nesting relationships are represented as a graph having TEs as nodes and transposition events as directed edges. The directed nature of this graph implies both the spatial relationships of nested TEs in the genome as well as temporal relationships implied in TE nesting resulting from the fact that the outer TE in a nest must have existed in the genome prior to the insertion of the inner TE . This 'nesting graph' is amenable to standard computation and can be recast in several forms, since each annotated TE node can be analyzed at the individual, family or class level. 
(We chose not to analyze the degree distribution of nesting graphs for 'small-world' properties because of biases resulting from duplicated nests, and because the subgraphs in the sequenced portion of the genome may not reflect properties of the entire nesting graph.)
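The graph formulation can be sketched as follows. This is an illustration under stated assumptions, not the implementation used in this study. It assumes each TE is given as a span covering all of its joined fragments, and that nests are strictly hierarchical (an inner TE lies entirely within the outer TE). The identifiers, family names (`famA`, `famB`, `famC`), coordinates, and function names are all hypothetical. Edges point from the inner TE to the outer TE, following the A→B notation above.

```python
from collections import defaultdict


def primary_nestings(tes):
    """tes: list of (te_id, family, start, end) with spans of joined fragments.

    Returns (inner_id, outer_id) pairs for direct nesting events only, so a
    TE nested two levels deep is linked to its immediate container but not
    to the outermost TE.
    """
    edges = []
    stack = []  # chain of currently open, mutually nested TEs
    for te in sorted(tes, key=lambda t: (t[2], -t[3])):
        # Drop open TEs that end before this TE ends: they cannot contain it.
        while stack and stack[-1][3] < te[3]:
            stack.pop()
        if stack:
            edges.append((te[0], stack[-1][0]))
        stack.append(te)
    return edges


def family_graph(tes, edges):
    """Collapse TE-level edges to family-level directed edges."""
    family_of = {te_id: family for te_id, family, _, _ in tes}
    graph = defaultdict(set)
    for inner, outer in edges:
        graph[family_of[inner]].add(family_of[outer])
    return graph


def has_directed_cycle(graph):
    """Depth-first search for a directed cycle (self-loops count as cycles)."""
    WHITE, GREY, BLACK = 0, 1, 2
    color = defaultdict(int)

    def visit(node):
        color[node] = GREY
        for nxt in graph.get(node, ()):
            if color[nxt] == GREY or (color[nxt] == WHITE and visit(nxt)):
                return True
        color[node] = BLACK
        return False

    return any(color[node] == WHITE and visit(node) for node in list(graph))


# Toy annotation: t3 is nested in t2, which is nested in t1; t4 is nested in t5.
toy_tes = [
    ("t1", "famC", 100, 1000),
    ("t2", "famB", 200, 800),
    ("t3", "famA", 300, 400),
    ("t4", "famC", 5000, 5200),
    ("t5", "famA", 4000, 6000),
]
toy_edges = primary_nestings(toy_tes)
toy_graph = family_graph(toy_tes, toy_edges)
print(toy_edges)
print(has_directed_cycle(toy_graph))
```

In the toy data the redundant t3→t1 relationship is never emitted, and the family-level graph contains the cycle famA→famB→famC→famA. This is the kind of pattern that, in real data, would imply overlapping periods of activity among the families involved.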
Patterns of nesting among different classes of TE in the D. melanogaster Release 4 genome sequence
Organization of TEs in β-heterochromatic regions
The nature of the transition zone between euchromatin and heterochromatin in D. melanogaster has been the subject of much controversy, in part because heterochromatic regions (as defined in mitotic chromosomes) can be further subdivided into α-heterochromatin and β-heterochromatin . β-heterochromatic regions are cytologically visible in polytene chromosomes, although their banding pattern is 'diffuse' or 'mesh-like,' suggesting under-replication relative to the finely banded euchromatic regions (reviewed in ). Under-replicated regions are observed elsewhere in polytene chromosomes and co-localize with regions referred to as 'weak points' or 'intercalary heterochromatin' that form ectopic contacts and are subject to chromosome breakage [62, 63]. The amount and degree of polytenization in β-heterochromatic regions is subject to both environmental and genetic factors , as most conclusively shown by the appearance of several large banded regions in the chromocenter of salivary gland chromosomes of the Su(UR) mutant . Charlesworth et al. estimate that 10% of the D. melanogaster genome is composed of β-heterochromatin  and large amounts of β-heterochromatic DNA are found in pericentromeric regions of most (but not all) chromosome arms [27, 28], a fraction of which is captured in the Release 4 genome sequence (Figure 1).
Analysis of the first draft of the D. melanogaster genome sequence offered the first glimpse of the contiguous molecular organization of β-heterochromatin, and suggested that "there is no clear boundary between heterochromatin and euchromatin" but rather that the transition is characterized by "a gradual increase in the density of transposable elements and other repeats" . The view that the β-heterochromatic regions exhibit a gradual increase in TE density has been subsequently reiterated [25, 67], although our results call this view into question for three of the five major chromosome arms. Far from a gradual transition, our high-resolution TE annotation provides evidence for discretely localized regions of extremely high TE density at the base of chromosome arms X, 2L and 2R overlain on a background of increased TE abundance, such that the increase in TE content is not monotonic in the direction of the centromere. This result represents the inverse of, and provides an explanation for, previous observations that the distribution of genes on these chromosome arms alternates between low and high density in the centromere proximal direction [66, 67]. We note that the alternating pattern of high and low TE (versus unique) sequences reported here in β-heterochromatic regions differs from the 'islands' of complex (TE) sequences surrounded by 'seas' of satellite DNA observed deeper in α-heterochromatic regions .
Further evidence that discrete regions of extreme TE density outside of β-heterochromatic regions may have unusual cytological properties can be found on chromosome 2. Discrete HDRs can be observed in the vicinity of the Histone cluster in 39E (HDRs 9+10) and just distal to the major tRNA cluster at 42A (HDR13). Both of these regions are known to be 'weak points' in polytene chromosomes, which form breaks and ectopic contacts with other weak points in the genome that are alleviated by the Su(UR) mutation, suggesting that these regions are under-replicated in polytene chromosomes . These observations, together with the generally poor banding patterns in high TE density pericentromeric regions and on the fourth chromosome, suggest that high TE density may directly interfere with the process of polytenization, either through stalling replication forks  or through DNA elimination . Thus, high TE densities may not be directly responsible for heterochromatin formation per se, but may simply inhibit the ability to detect bona fide euchromatic regions that are TE dense, at least in salivary gland polytene chromosomes. The formation of large blocks of TE-rich, banded material deep in heterochromatic regions in under-replication suppressing strains like Su(UR) supports this view [65, 74]. Moreover, if regions of high TE density affect polytenization, ectopic contact among 'weak points' may occur via homology between sequences of the same TE family. Additionally, the inherent mobility of TEs provides a mechanism to explain differences in the presence or absence of β-heterochromatin on homologous chromosome arms among Drosophila species .
Origin of 'clustered scrambled repeats'
Although the predominant organization of middle repetitive DNA such as TE sequences in D. melanogaster is characterized by individual repeats found within long regions of single copy DNA (the 'long period interspersion' pattern), direct evidence has long existed for an alternative organization characterized by 'clustered scrambled repeats' [21, 41]. Wensink et al. estimated that the genome of D. melanogaster contained over 1,000 such clustered scrambled repeats and predicted that these regions were created by recurrent mobile element insertion. The HDRs and TE nests detected in the present study likely correspond to a subset of the clustered, scrambled repeats detected by Wensink et al., with the remainder yet to be discovered in currently unfinished or unsequenced heterochromatic regions. Clustered, scrambled TE nests are generally thought to arise through the serial transposition of individual elements into previously inserted TEs, as shown by the analysis of nested TEs in maize, which demonstrated that inner TEs are younger than the outer TEs into which they insert. Such serial transposition is ultimately responsible for the origin of nested TEs, though once formed, nests may be subsequently copied and amplified. Therefore, it is possible that other mechanisms, such as the transposition or duplication of previously nested elements, play an important role in the genome-wide pattern of clustered scrambled repeats.
Such observations are difficult to explain without proposing that the duplicated copies of this jockey2→Cr1a nest arose by transposition of a pre-existing nested element, as was proposed to occur by Wensink et al. . Other potential examples of transposition of clustered, scrambled repeats can also be observed in our data, such as a jockey-Rt1c cluster present in both HDR1 and HDR2, which are separated by over a megabase of DNA, and a BS3-X-element cluster that is present near su(f) and is also found in HDR7 . Though it may seem unlikely, the transposition of nested TEs is indeed plausible, since DNA-based elements can transpose when additional sequences are inserted between TIRs , and retroelements may reverse transcribe mRNAs arising from nested or rearranged TEs, a mechanism that has been invoked previously for the formation of new TE families . Moreover, the raw material for retrotransposition of nested elements is available in the fly transcriptome, as reflected in the chimeric transcripts that arise from two or more TEs found in D. melanogaster EST/cDNA libraries (results not shown).
Do β-heterochromatic regions permit the evolution of co-suppression networks?
A growing body of evidence implicates RNA silencing mechanisms in regulating the activity of TE expression and transposition in Drosophila. Expression of TE-derived transcripts is elevated in mutants for genes involved in RNA silencing, including spn-E, aubergine and piwi [81–84]. The capacity of telomeric P-element insertions to induce the repressive P-cytotype is also impaired in aubergine mutants . All major classes of TE in Drosophila produce small repeat-associated RNAs (rasiRNAs)  that may be used to silence TE expression using a dicer-independent RNA silencing pathway . Moreover, the Argonaute family member piwi regulates expression from copia and gypsy reporter transgenes [82, 83] and rates of mdg1 transposition are elevated in a piwi mutant background . Similarly, resistance to invasion by the I-element can be provided by strains carrying a transgene containing I-element sequences in a dose-, length- and transcription-dependent manner . Heterologous reporter genes carrying transcribed gypsy sequences are also sensitive to regulation by flamenco , suggesting the possibility of an RNA-dependent mechanism of action for this locus, which is known to regulate rates of gypsy transposition.
Regulation of TE transposition may rely on endogenous TE sequences present in the genome as well as the RNA silencing machinery. Jensen et al.  proposed an indirect model of co-suppression through 'relay' sequences derived from defective I-elements located in pericentromeric regions. Likewise, mapping of factors controlling rates of gypsy, ZAM and Idefix transposition to a β-heterochromatic location at the base of the X chromosome has led Desset et al.  to propose that transcription from remnants of TEs in 20A may provide the critical substrate for co-suppression of these transposable element families. Our work demonstrates that the Drosophila genome contains ample material for co-suppression within virtually all TE families, given the fact that transcription is known to occur in β-heterochromatin regions .
In addition to the possibility of co-suppression among different copies of the same TE family, our analysis of nesting relationships among different TE families suggests the possibility of an extensive network of co-suppression among essentially all families in the genome (Figure 5). We propose that expression of chimeric sequences from TE nests may simultaneously co-suppress multiple TE families by acting as relay sequences that co-suppress transcripts from other nests or individual elements located in the euchromatic arms. Evidence for such a 'co-suppression network' is found in the COM locus, which appears to control the activity of more than one TE family simultaneously. Even in the absence of direct co-suppression on a family, once a member of a newly invading TE family transposes into the nesting network, the entire family could become regulated by co-suppression mechanisms. This model proposes that the accumulation of clustered, nested TEs in β-heterochromatic regions may incidentally provide a trap for the regulation of TEs across the genome, and removes the need for the host to evolve separate genic changes to regulate the transposition of each new family that invades the genome. Such a co-suppression network could act as a global TE surveillance mechanism, with the highly nested structure of TEs in β-heterochromatic regions intrinsically providing a systems-level 'adaptive immunity' from invasion by horizontal transfer. Moreover, since nesting can bring several TE promoters in close proximity to each other and thereby increase the probability of transcription, TE nesting may facilitate a more effective co-suppression network than would be possible through the cumulative effects among isolated TEs within single families.
Finally, because the chance that a TE family participates in the co-suppression network is likely to increase as the family increases in number, pervasive nesting may also generate a pressure on TE families to diversify (as has previously been proposed for the mechanism of ectopic recombination ), potentially providing an explanation for the large diversity of TE families in the Drosophila genome.
By accounting for the non-random distribution of TEs across the genome, we provide an accurate estimate of TE abundance for the vast majority of the genome sequence in high-recombination, non-pericentromeric regions. We confirm that regions of extreme TE density are mostly in the pericentromeric and/or low-recombination regions of the genome that are known broadly to have high TE abundance. However, we show that regions of high TE density within pericentromeric regions are often distinctly localized and interrupted by regions of normal TE density in the transition zone from euchromatic to β-heterochromatic regions. Through comparative analysis with D. yakuba, we found no evidence that this ragged, punctate pattern of highly localized TE abundance arises via inversion of TE-rich sequences from deeper in heterochromatic regions. We did find, however, that duplication of TE sequences plays an important role in the rapid evolution of localized regions of extreme TE abundance. We introduced network analysis techniques to study patterns of TE nesting, providing a comprehensive view of the spatial and temporal interactions among TEs at the individual, family and class levels. We show the existence of a highly connected family-level nesting network, which suggests the possibility of an intrinsic 'co-suppression network' acting to regulate the vast majority of TE families in the D. melanogaster genome. The results presented here provide a framework for comparison with finished heterochromatic sequences being produced by the Drosophila Heterochromatin Genome Project .
Dataset of TE annotations
The combined-evidence method used to identify TE sequences has been described previously . Briefly, borrowing concepts from gene annotation, we have developed a TE annotation pipeline that integrates multiple lines of computational evidence to generate 'TE models.' The 6,013 predicted TE models of Quesneville et al.  were used with the following exceptions. Three TE annotations (FlyBase IDs: FBti0062904, FBti0060950 and FBti0060875) were removed because they have subsequently been shown to be redundant entries that resulted from edge effects in overlapping contigs. In addition, to be conservative in our analyses, all TE models based on non-D. melanogaster canonical elements were removed, with the exception of those from D. simulans, the sister species to D. melanogaster. These 620 annotations from foreign elements account for over 10% of the TE models but only 82,229 bp (1.2%) of the sequence annotated as TE in .
Testing a model of random TE distribution
Under the null hypothesis that TEs are distributed uniformly throughout a genomic region, distances between TEs (TE-free regions, abbreviated as TFRs) should follow a negative exponential distribution . In contrast to the analysis of Simons et al.  who evaluate the number of TFRs above an arbitrary length cutoff, we test the fit of observed TFR lengths to the full negative exponential distribution. The rate parameter for the negative exponential can be estimated in two ways, either as the inverse of the mean of observed TFR lengths, or by dividing the number of TE insertions by the total length of TFRs, as in . In the first case, the observed TFR distribution can be tested directly against the expected distribution computed from the negative exponential distribution. In the second case, since nested elements contribute to the number of TE insertions but not the length of TFRs, the number of inner nested TEs (491) must be discounted from the total number of insertions before computing the average TFR length, or an equivalent number of 0-length TFRs must be added to the observed TFR length distribution. Goodness-of-fit to the negative exponential distribution was calculated using the Kolmogorov-Smirnov one-sample statistic in R , which computes the maximal difference between the observed and expected cumulative distributions. We have used adjusted critical values taken from  to account for the fact that the rate parameter of the expected distribution was estimated from the data.
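The goodness-of-fit test described above can be illustrated with a minimal sketch. The analysis in this study was carried out in R; the Python below is only an illustration under stated assumptions. The function names and the simulated TFR lengths are hypothetical, the rate is estimated as the inverse of the mean TFR length (the first of the two estimators described), and the adjusted critical values needed to assess significance are not reproduced here.

```python
import math
import random


def exponential_rate(tfr_lengths):
    """Estimate the rate parameter as the inverse of the mean TFR length."""
    return len(tfr_lengths) / sum(tfr_lengths)


def ks_statistic_exponential(tfr_lengths, rate):
    """One-sample Kolmogorov-Smirnov statistic: the maximal difference
    between the observed and the expected (negative exponential)
    cumulative distributions."""
    xs = sorted(tfr_lengths)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs):
        expected = 1.0 - math.exp(-rate * x)
        # The empirical CDF steps from i/n to (i + 1)/n at x,
        # so both sides of the step are compared with the expectation.
        d = max(d, (i + 1) / n - expected, expected - i / n)
    return d


# Hypothetical data: TFR lengths simulated under a uniform TE distribution,
# and under a mixture of TE-dense and TE-poor regions.
random.seed(0)
uniform_like = [random.expovariate(1 / 5000) for _ in range(2000)]
clustered = ([random.expovariate(1 / 200) for _ in range(1000)]
             + [random.expovariate(1 / 20000) for _ in range(1000)])

d_uniform = ks_statistic_exponential(uniform_like, exponential_rate(uniform_like))
d_clustered = ks_statistic_exponential(clustered, exponential_rate(clustered))
print(f"D (uniform-like) = {d_uniform:.3f}; D (clustered) = {d_clustered:.3f}")
```

In this sketch the clustered sample yields a much larger statistic than the uniform-like sample, which is the qualitative signature of a non-random TE distribution. Because the rate is estimated from the same data, the statistic would need to be compared with adjusted critical values rather than the standard Kolmogorov-Smirnov table.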
Definition of chromatin and recombination boundaries
Cytological positions of the pericentromeric euchromatin/heterochromatin boundaries were estimated from the mitotic FISH data in , as mapped to Release 4 (Chris D Smith, personal communication). Boundaries between 'high,' 'reduced,' and 'null' recombination rates in pericentromeric regions  were estimated by mapping cytological locations to the Release 4 sequence using the 'cytolocation' search in FlyBase gbrowser . Ranges of cytological divisions were grouped into genome coordinates following Bartolome et al. . Boundaries of pericentromeric regions were operationally defined for the major chromosome arms by the onset of reduced recombination, taken as the proximal positions of bands 19D3 on chromosome arm X (20,369,021), 38A1 on chromosome arm 2L (19,669,505), and 77E1 on chromosome arm 3L (20,545,022), and the distal positions of bands 42F3 on chromosome arm 2R (2,692,485) and 84B1 on chromosome arm 3R (2,811,816).
Definition and analysis of regions of high TE density
Sliding window analysis to identify HDRs was done using 50 Kb windows with a 25 Kb offset. The number of TEs per window, rather than percent TE sequence, was used to identify regions of high TE density. Windows having 10 or more TEs/50 Kb (that is, 200 TEs/Mb), a density that is approximately 20-fold the average of non-pericentromeric regions, were used to seed HDRs. Neighboring windows were then merged to define the final set of HDRs, allowing intervening windows of 9 or more TEs/50 Kb to account for small fluctuations in TE abundance. Regions of the D. yakuba genome sequence (droYak1, April 2004) orthologous to HDRs in D. melanogaster ± 50 Kb were extracted from the Berkeley pipeline whole-genome alignments  and updated to the most recent version of the D. yakuba genome sequence (droYak2, November 2005) using BLAT . Dotplot analysis of orthologous regions was performed on both forward and reverse strands of HDRs and their orthologues using the dottup program in the EMBOSS package .
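The seeding and merging steps above can be sketched as follows. This is an illustration under stated assumptions, not the procedure as implemented for this study: the function names are hypothetical, each TE is assigned to a window by its start coordinate, and merging is interpreted as joining consecutive windows with at least 9 TEs into a run that is reported only if it contains at least one seed window with 10 or more TEs.

```python
def window_counts(te_starts, arm_length, window=50_000, step=25_000):
    """Count TEs (by start coordinate, an assumption of this sketch)
    in overlapping windows along one chromosome arm."""
    counts = []
    start = 0
    while start < arm_length:
        end = start + window
        n = sum(1 for pos in te_starts if start <= pos < end)
        counts.append((start, n))
        start += step
    return counts


def call_hdrs(counts, window=50_000, seed=10, extend=9):
    """Merge consecutive windows with >= `extend` TEs and keep the runs
    that contain at least one seed window with >= `seed` TEs."""
    hdrs = []
    run = []
    # A sentinel window with a negative count closes the final run.
    for start, n in counts + [(None, -1)]:
        if n >= extend:
            run.append((start, n))
            continue
        if run and any(m >= seed for _, m in run):
            hdrs.append((run[0][0], run[-1][0] + window))
        run = []
    return hdrs


# Hypothetical coordinates: a dense cluster of 12 TEs plus two isolated TEs.
te_starts = [100_000 + 3_000 * i for i in range(12)] + [10_000, 220_000]
hdrs = call_hdrs(window_counts(te_starts, arm_length=300_000))
print(hdrs)
```

With these toy coordinates, the window starting at 100,000 holds all 12 clustered TEs and seeds a region, and the overlapping window starting at 75,000 holds 9 TEs and is merged into it. Counting TEs rather than TE base pairs, as in the text, makes the call insensitive to the length of individual elements.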
Graphical analysis of TE nesting
Patterns of TE nesting were analyzed using network analysis techniques, with nesting events represented as directed edges between two TE nodes. The Release 4 annotation represents nested TEs as overlapping spans among sets of genome coordinates, where the range of an inner TE in a nest is fully subsumed within the range of an outer TE. For each inner TE in the genome that met these conditions, we identified the 'primary' nesting relationship with the single outer TE immediately present on both sides of the inner TE span, and created a directed edge in the nesting graph labeled inner→outer. The inner and outer labels were individual, family or class identifiers, depending on the biological level of analysis. These primary nesting relationships provide a sufficient and non-redundant description of TE nesting in the genome, and can be used to reconstruct more complex nesting relationships at the individual, family or class levels. Manipulation, analysis and visualization of nesting graphs were conducted using the Perl Graph module version 0.69 , the Combinatorica package in Mathematica 5.1  and Cytoscape 2.2 . Enumeration of all directed cycles of a fixed path length was performed using the method described in .
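The construction of a family-level nesting graph and the search for directed cycles can be sketched as follows. This is an illustration under stated assumptions: the function names, the tuple layout and the coordinates are hypothetical, the 'primary' outer TE is taken to be the smallest annotated span that fully contains the inner TE, and cycles are found with a simple depth-first search rather than the enumeration method cited above.

```python
from collections import defaultdict


def primary_nesting_edges(tes):
    """tes: list of (te_id, family, start, end) on one sequence.
    Returns the set of family-level (inner, outer) edges, using the
    smallest span that fully contains each inner TE as its outer TE."""
    edges = set()
    for te_id, family, start, end in tes:
        containers = [
            (o_end - o_start, o_family)
            for o_id, o_family, o_start, o_end in tes
            if o_id != te_id
            and o_start <= start and end <= o_end
            and (o_end - o_start) > (end - start)
        ]
        if containers:
            edges.add((family, min(containers)[1]))
    return edges


def cycles_of_length(edges, k):
    """Enumerate directed cycles with exactly k distinct nodes."""
    graph = defaultdict(set)
    for inner, outer in edges:
        graph[inner].add(outer)
    found = []

    def extend(path):
        if len(path) == k:
            if path[0] in graph[path[-1]]:
                found.append(tuple(path))
            return
        for nxt in graph[path[-1]]:
            # Each cycle starts at its smallest label, so it is reported once.
            if nxt > path[0] and nxt not in path:
                extend(path + [nxt])

    for node in sorted(graph):
        extend([node])
    return found


# Hypothetical annotation: a 1360 inside a roo, and a roo inside a 1360.
tes = [
    ("a1", "roo", 100, 900),
    ("a2", "1360", 300, 500),
    ("b1", "1360", 2000, 3000),
    ("b2", "roo", 2200, 2600),
    ("c1", "INE-1", 2300, 2400),
]
edges = primary_nesting_edges(tes)
two_cycles = cycles_of_length(edges, 2)
print(sorted(edges), two_cycles)
```

In this toy example the innermost element is linked only to the roo copy that immediately surrounds it, not to the outermost 1360 copy, which mirrors the non-redundant description given by primary nesting relationships. The reciprocal roo and 1360 edges form a directed cycle of length two, the kind of pattern interpreted in the text as overlapping periods of transpositional activity.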
We thank Chris Smith for supplying Release 4 estimates of the euchromatin/heterochromatin boundaries, Brian Charlesworth for advice on estimated boundaries of recombination rates, and the Washington University Genome Sequencing Center for the D. yakuba genome sequences. We thank Douda Bensasson, Brian Charlesworth, Scott Hawley, Roger Hoskins, Gary Karpen and Steve Russell for insightful discussions and Brian Charlesworth, Roger Hoskins, Max Reuter, Chris Smith and one anonymous reviewer for comments on the manuscript. This work was supported by a USA Research Fellowship from the Royal Society to CMB; by the 'Centre National de Recherche Scientifique' (CNRS), the Universities P and M Curie and D Diderot (Institut Jacques Monod, UMR 7592) and by the 'Programme Bio-Informatique' (CNRS); and by an MRC Programme Grant to MA and S Russell.
- The Arabidopsis Genome Initiative: Analysis of the genome sequence of the flowering plant Arabidopsis thaliana. Nature. 2000, 408: 796-815. 10.1038/35048692.
- Kaminker JS, Bergman CM, Kronmiller B, Carlson J, Svirskas R, Patel S, Frise E, Wheeler DA, Lewis SE, Rubin GM, et al: The transposable elements of the Drosophila melanogaster euchromatin: a genomics perspective. Genome Biol. 2002, 3: RESEARCH0084-10.1186/gb-2002-3-12-research0084.
- Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W, et al: Initial sequencing and analysis of the human genome. Nature. 2001, 409: 860-921. 10.1038/35057062.
- Yu J, Hu S, Wang J, Wong GK, Li S, Liu B, Deng Y, Dai L, Zhou Y, Zhang X, et al: A draft sequence of the rice genome (Oryza sativa L. ssp. indica). Science. 2002, 296: 79-92. 10.1126/science.1068037.
- Myers EW, Sutton GG, Delcher AL, Dew IM, Fasulo DP, Flanigan MJ, Kravitz SA, Mobarry CM, Reinert KH, Remington KA, et al: A whole-genome assembly of Drosophila. Science. 2000, 287: 2196-2204. 10.1126/science.287.5461.2196.
- Benos PV, Gatt MK, Murphy L, Harris D, Barrell B, Ferraz C, Vidal S, Brun C, Demaille J, Cadieu E, et al: From first base: the sequence of the tip of the X chromosome of Drosophila melanogaster, a comparison of two sequencing strategies. Genome Res. 2001, 11: 710-730. 10.1101/gr.173801.
- Bartolome C, Maside X, Charlesworth B: On the abundance and distribution of transposable elements in the genome of Drosophila melanogaster. Mol Biol Evol. 2002, 19: 926-937.
- Blumenstiel JP, Hartl DL, Lozovsky ER: Patterns of insertion and deletion in contrasting chromatin domains. Mol Biol Evol. 2002, 19: 2211-2225.
- Celniker SE, Wheeler DA, Kronmiller B, Carlson JW, Halpern A, Patel S, Adams M, Champe M, Dugan SP, Frise E, et al: Finishing a whole genome shotgun sequence assembly: release 3 of the Drosophila euchromatic genome sequence. Genome Biol. 2002, 3: RESEARCH0079-10.1186/gb-2002-3-12-research0079.
- Quesneville H, Bergman CM, Andrieu O, Autard D, Nouaud D, Ashburner M, Anxolabehere D: Combined evidence annotation of transposable elements in genome sequences. PLoS Comput Biol. 2005, 1: e22-10.1371/journal.pcbi.0010022.
- Caspi A, Pachter L: Identification of transposable elements using multiple alignments of related genomes. Genome Res. 2006, 16: 260-270. 10.1101/gr.4361206.
- Berkeley Drosophila Genome Project: Release 4 Notes. [http://www.fruitfly.org/annot/release4.html]
- Drosophila Heterochromatin Genome Project. [http://www.dhgp.org/]
- Locke J, Howard LT, Aippersbach N, Podemski L, Hodgetts RB: The characterization of DINE-1, a short, interspersed repetitive element present on chromosome and in the centric heterochromatin of Drosophila melanogaster. Chromosoma. 1999, 108: 356-366. 10.1007/s004120050387.
- Rizzon C, Marais G, Gouy M, Biemont C: Recombination rate and the distribution of transposable elements in the Drosophila melanogaster genome. Genome Res. 2002, 12: 400-407. 10.1101/gr.210802. Article published online before print in February 2002.
- Kapitonov VV, Jurka J: Molecular paleontology of transposable elements in the Drosophila melanogaster genome. Proc Natl Acad Sci USA. 2003, 100: 6569-6574. 10.1073/pnas.0732024100.
- Montgomery E, Charlesworth B, Langley CH: A test for the role of natural selection in the stabilization of transposable element copy number in a population of Drosophila melanogaster. Genet Res. 1987, 49: 31-41.
- Slawson EE, Shaffer CD, Malone CD, Leung W, Kellmann E, Shevchek RB, Craig CA, Bloom SM, Bogenpohl J, Dee J, et al: Comparison of dot chromosome sequences from D. melanogaster and D. virilis reveals an enrichment of DNA transposon sequences in heterochromatic domains. Genome Biol. 2006, 7: R15-10.1186/gb-2006-7-2-r15.
- Yang HP, Hung TL, You TL, Yang TH: Genome-wide comparative analysis of the highly abundant transposable element DINE-1 suggests a recent transpositional burst in Drosophila yakuba. Genetics. 2005, 173: 189-96. 10.1534/genetics.105.051714.
- Simons C, Pheasant M, Makunin IV, Mattick JS: Transposon-free regions in mammalian genomes. Genome Res. 2006, 16: 164-172. 10.1101/gr.4624306.
- Manning JE, Schmid CW, Davidson N: Interspersion of repetitive and nonrepetitive DNA sequences in the Drosophila melanogaster genome. Cell. 1975, 4: 141-155. 10.1016/0092-8674(75)90121-X.
- Srinivasan R: An approach to testing the goodness of fit of incompletely specified distributions. Biometrika. 1970, 57: 605-611. 10.2307/2334778.
- Charlesworth B, Langley CH: The population genetics of Drosophila transposable elements. Annu Rev Genet. 1989, 23: 251-287. 10.1146/annurev.ge.23.120189.001343.
- Charlesworth B, Jarne P, Assimacopoulos S: The distribution of transposable elements within and between chromosomes in a population of Drosophila melanogaster. III. Element abundances in heterochromatin. Genet Res. 1994, 64: 183-197.
- Hoskins RA, Smith CD, Carlson JW, Carvalho AB, Halpern A, Kaminker JS, Kennedy C, Mungall CJ, Sullivan BA, Sutton GG, et al: Heterochromatic sequences in a Drosophila whole-genome shotgun assembly. Genome Biol. 2002, 3: RESEARCH0085-10.1186/gb-2002-3-12-research0085.
- Charlesworth B: Background selection and patterns of genetic diversity in Drosophila melanogaster. Genet Res. 1996, 68: 131-149.
- Gall JG: Repetitive DNA in Drosophila. Molecular Cytogenetics. Edited by: Hamkalo BA, Papaconstantinou J. 1973, New York: Plenum, 59-74.
- Miklos GL, Cotsell JN: Chromosome structure at interfaces between major chromatin types: alpha- and beta-heterochromatin. Bioessays. 1990, 12: 1-6. 10.1002/bies.950120102.
- Sun FL, Haynes K, Simpson CL, Lee SD, Collins L, Wuller J, Eissenberg JC, Elgin SC: cis-Acting determinants of heterochromatin formation on Drosophila melanogaster chromosome four. Mol Cell Biol. 2004, 24: 8210-8220. 10.1128/MCB.24.18.8210-8220.2004.
- Charlesworth B, Lapid A, Canada D: The distribution of transposable elements within and between chromosomes in a population of Drosophila melanogaster. II. Inferences on the nature of selection against elements. Genet Res. 1992, 60: 115-130.
- Wright SI, Agrawal N, Bureau TE: Effects of recombination rate and gene density on transposable element distributions in Arabidopsis thaliana. Genome Res. 2003, 13: 1897-1903.
- Singh ND, Petrov DA: Rapid sequence turnover at an intergenic locus in Drosophila. Mol Biol Evol. 2004, 21: 670-680. 10.1093/molbev/msh060.
- Butler H, Levine S, Wang X, Bonyadi S, Fu G, Lasko P, Suter B, Doerig R: Map position and expression of the genes in the 38 region of Drosophila. Genetics. 2001, 158: 1597-1614.
- Maside X, Assimacopoulos S, Charlesworth B: Fixation of transposable elements in the Drosophila melanogaster genome. Genet Res. 2005, 85: 195-203. 10.1017/S0016672305007548.
- Robert V, Prud'homme N, Kim A, Bucheton A, Pelisson A: Characterization of the flamenco region of the Drosophila melanogaster genome. Genetics. 2001, 158: 701-713.
- Prud'homme N, Gans M, Masson M, Terzian C, Bucheton A: Flamenco, a gene controlling the gypsy retrovirus of Drosophila melanogaster. Genetics. 1995, 139: 697-711.
- Desset S, Meignin C, Dastugue B, Vaury C: COM, a heterochromatic locus governing the control of independent endogenous retroviruses from Drosophila melanogaster. Genetics. 2003, 164: 501-509.
- Lis JT, Prestidge L, Hogness DS: A novel arrangement of tandemly repeated genes at a major heat shock site in D. melanogaster. Cell. 1978, 14: 901-919. 10.1016/0092-8674(78)90345-8.
- O'Hare K, Chadwick BP, Constantinou A, Davis AJ, Mitchelson A, Tudor M: A 5.9-kb tandem repeat at the euchromatin-heterochromatin boundary of the X chromosome of Drosophila melanogaster. Mol Genet Genomics. 2002, 267: 647-655. 10.1007/s00438-002-0698-x.
- Hackett RW, Lis JT: DNA sequence analysis reveals extensive homologies of regions preceding hsp70 and alphabeta heat shock genes in Drosophila melanogaster. Proc Natl Acad Sci USA. 1981, 78: 6196-6200. 10.1073/pnas.78.10.6196.
- Wensink PC, Tabata S, Pachl C: The clustered and scrambled arrangement of moderately repetitive elements in Drosophila DNA. Cell. 1979, 18: 1231-1246. 10.1016/0092-8674(79)90235-6.
- Livak KJ, Freund R, Schweber M, Wensink PC, Meselson M: Sequence organization and transcription at two heat shock loci in Drosophila. Proc Natl Acad Sci USA. 1978, 75: 5613-5617. 10.1073/pnas.75.11.5613.
- Lipatov M, Lenkov K, Petrov DA, Bergman CM: Paucity of chimeric gene-transposable element transcripts in the Drosophila melanogaster genome. BMC Biol. 2005, 3: 24-10.1186/1741-7007-3-24.
- Lerman DN, Michalak P, Helin AB, Bettencourt BR, Feder ME: Modification of heat-shock gene expression in Drosophila melanogaster populations via transposable elements. Mol Biol Evol. 2003, 20: 135-144. 10.1093/molbev/msg015.
- Walser JC, Chen B, Feder ME: Heat-shock promoters: targets for evolution by P transposable elements in Drosophila. PLoS Genet. 2006, 2: e165-10.1371/journal.pgen.0020165.
- Leigh Brown AJ: Variation at the 87A heat-shock loci in Drosophila melanogaster. Proc Natl Acad Sci USA. 1983, 80: 5350-5354. 10.1073/pnas.80.17.5350.
- Maside X, Bartolome C, Charlesworth B: S-element insertions are associated with the evolution of the Hsp70 genes in Drosophila melanogaster. Curr Biol. 2002, 12: 1686-1691. 10.1016/S0960-9822(02)01181-8.
- Abad JP, De Pablos B, Osoegawa K, De Jong PJ, Martin-Gallardo A, Villasante A: Genomic analysis of Drosophila melanogaster telomeres: full-length copies of HeT-A and TART elements at telomeres. Mol Biol Evol. 2004, 21: 1613-1619. 10.1093/molbev/msh174.
- Karpen GH, Spradling AC: Analysis of subtelomeric heterochromatin in the Drosophila minichromosome Dp1187 by single P-element insertional mutagenesis. Genetics. 1992, 132: 737-753.
- Walter MF, Jang C, Kasravi B, Donath J, Mechler BM, Mason JM, Biessmann H: DNA organization and polymorphism of a wild-type Drosophila telomere region. Chromosoma. 1995, 104: 229-241.
- Biessmann H, Champion LE, O'Hair M, Ikenaga K, Kasravi B, Mason JM: Frequent transpositions of Drosophila melanogaster HeT-A transposable elements to receding chromosome ends. EMBO J. 1992, 11: 4459-4469.
- Levis RW, Ganesan R, Houtchens K, Tolar LA, Sheen FM: Transposons in place of telomeric repeats at a Drosophila telomere. Cell. 1993, 75: 1083-1093. 10.1016/0092-8674(93)90318-K.
- Abad JP, De Pablos B, Osoegawa K, De Jong PJ, Martin-Gallardo A, Villasante A: TAHRE, a novel telomeric retrotransposon from Drosophila melanogaster, reveals the origin of Drosophila telomeres. Mol Biol Evol. 2004, 21: 1620-1624. 10.1093/molbev/msh180.
- Betran E, Thornton K, Long M: Retroposed new genes out of the X in Drosophila. Genome Res. 2002, 12: 1854-1859. 10.1101/gr.6049.
- Nurminsky DI, Nurminskaya MV, De Aguiar D, Hartl DL: Selective sweep of a newly evolved sperm-specific gene in Drosophila. Nature. 1998, 396: 572-575. 10.1038/25126.
- Ponce R, Hartl DL: The evolution of the novel Sdic gene cluster in Drosophila melanogaster. Gene. 2006, 376: 174-183. 10.1016/j.gene.2006.02.011.
- SanMiguel P, Gaut BS, Tikhonov A, Nakajima Y, Bennetzen JL: The paleontology of intergene retrotransposons of maize. Nat Genet. 1998, 20: 43-45. 10.1038/1695.
- Stumpf MP, Wiuf C, May RM: Subnets of scale-free networks are not scale-free: sampling properties of networks. Proc Natl Acad Sci USA. 2005, 102: 4221-4224. 10.1073/pnas.0501179102.
- Harden N, Ashburner M: Characterization of the FB-NOF transposable element of Drosophila melanogaster. Genetics. 1990, 126: 387-400.
- Badal M, Portela A, Xamena N, Cabre O: Molecular and bioinformatic analysis of the FB-NOF transposable element. Gene. 2006, 371: 130-135. 10.1016/j.gene.2005.11.020.
- Heitz E: Uber a and b-heterochromatin sowie Konstanz und Bau der Chromomeren bei Drosophila. Biologisches Zentralblatt. 1934, 54: 588-609.
- Bridges CB: Salivary chromosome maps with a key to banding of the chromosomes of Drosophila melanogaster. J Hered. 1935, 26: 60-64.
- Kauffmann BP: Distribution of induced breaks along the X-chromosome of Drosophila melanogaster. Proc Natl Acad Sci USA. 1939, 25: 571-577. 10.1073/pnas.25.11.571.
- Koryakov DE, Alekseyenko AA, Zhimulev IF: Dynamic organization of the beta-heterochromatin in the Drosophila melanogaster polytene X chromosome. Mol Gen Genet. 1999, 260: 503-509. 10.1007/s004380050923.
- Belyaeva ES, Zhimulev IF, Volkova EI, Alekseyenko AA, Moshkin YM, Koryakov DE: Su(UR)ES: a gene suppressing DNA underreplication in intercalary and pericentric heterochromatin of Drosophila melanogaster polytene chromosomes. Proc Natl Acad Sci USA. 1998, 95: 7532-7537. 10.1073/pnas.95.13.7532.
- Adams MD, Celniker SE, Holt RA, Evans CA, Gocayne JD, Amanatides PG, Scherer SE, Li PW, Hoskins RA, Galle RF, et al: The genome sequence of Drosophila melanogaster. Science. 2000, 287: 2185-2195. 10.1126/science.287.5461.2185.
- Myster SH, Wang F, Cavallo R, Christian W, Bhotika S, Anderson CT, Peifer M: Genetic and bioinformatic analysis of 41C and the 2R heterochromatin of Drosophila melanogaster: a window on the heterochromatin-euchromatin junction. Genetics. 2004, 166: 807-822. 10.1534/genetics.166.2.807.
- Sun X, Le HD, Wahlstrom JM, Karpen GH: Sequence analysis of a functional Drosophila centromere. Genome Res. 2003, 13: 182-194. 10.1101/gr.681703.
- Lifschytz E: Fine-structure analysis and genetic organization at the base of the X chromosome in Drosophila melanogaster. Genetics. 1978, 88: 457-467.
- Yamamoto MT, Mitchelson A, Tudor M, O'Hare K, Davies JA, Miklos GL: Molecular and cytogenetic analysis of the heterochromatin-euchromatin junction region of the Drosophila melanogaster X chromosome using cloned DNA sequences. Genetics. 1990, 125: 821-832.
- Ashburner M, Misra S, Roote J, Lewis SE, Blazej R, Davis T, Doyle C, Galle R, George R, Harris N, et al: An exploration of the sequence of a 2.9-Mb region of the genome of Drosophila melanogaster: the Adh region. Genetics. 1999, 153: 179-219.
- Ashburner M: Some aspects of the structure and function of the polytene chromosomes of the Diptera. Insect Cytogenetics. Edited by: Blackman RL, Hewitt G, Ashburner M. 1980, Oxford: Blackwell Scientific Publications, 65-84.
- Glaser RL, Karpen GH, Spradling AC: Replication forks are not found in a Drosophila minichromosome demonstrating a gradient of polytenization. Chromosoma. 1992, 102: 15-19. 10.1007/BF00352285.
- Moshkin YM, Belyakin SN, Rubtsov NB, Kokoza EB, Alekseyenko AA, Volkova EI, Belyaeva ES, Makunin IV, Spierer P, Zhimulev IF: Microdissection and sequence analysis of pericentric heterochromatin from the Drosophila melanogaster mutant Suppressor of Underreplication. Chromosoma. 2002, 111: 114-125.
- Nurminsky DI, Shevelyov Y, Nuzhdin SV, Gvozdev VA: Structure, molecular evolution and maintenance of copy number of extended repeated structures in the X-heterochromatin of Drosophila melanogaster. Chromosoma. 1994, 103: 277-285.
- Wicker T, Stein N, Albar L, Feuillet C, Schlagenhauf E, Keller B: Analysis of a contiguous 211 kb sequence in diploid wheat (Triticum monococcum L.) reveals multiple mechanisms of genome evolution. Plant J. 2001, 26: 307-316. 10.1046/j.1365-313X.2001.01028.x.PubMedView ArticleGoogle Scholar
- Wicker T, Zimmermann W, Perovic D, Paterson AH, Ganal M, Graner A, Stein N: A detailed look at 7 million years of genome evolution in a 439 kb contiguous sequence at the barley Hv-eIF4E locus: recombination, rearrangements and repeats. Plant J. 2005, 41: 184-194. 10.1111/j.1365-313X.2004.02285.x.PubMedView ArticleGoogle Scholar
- Lippman Z, Gendrel AV, Black M, Vaughn MW, Dedhia N, McCombie WR, Lavine K, Mittal V, May B, Kasschau KD, et al: Role of transposable elements in heterochromatin and epigenetic control. Nature. 2004, 430: 471-476. 10.1038/nature02651.PubMedView ArticleGoogle Scholar
- Rubin GM, Spradling AC: Genetic transformation of Drosophila with transposable element vectors. Science. 1982, 218: 348-353. 10.1126/science.6289436.PubMedView ArticleGoogle Scholar
- Losada A, Abad JP, Agudo M, Villasante A: The analysis of Circe, an LTR retrotransposon of Drosophila melanogaster, suggests that an insertion of non-LTR retrotransposons into LTR elements can create chimeric retroelements. Mol Biol Evol. 1999, 16: 1341-1346.PubMedView ArticleGoogle Scholar
- Aravin AA, Naumova NM, Tulin AV, Vagin VV, Rozovsky YM, Gvozdev VA: Double-stranded RNA-mediated silencing of genomic tandem repeats and transposable elements in the D. melanogaster germline. Curr Biol. 2001, 11: 1017-1027. 10.1016/S0960-9822(01)00299-8.PubMedView ArticleGoogle Scholar
- Kalmykova AI, Klenov MS, Gvozdev VA: Argonaute protein PIWI controls mobilization of retrotransposons in the Drosophila male germline. Nucleic Acids Res. 2005, 33: 2052-2059. 10.1093/nar/gki323.PubMedPubMed CentralView ArticleGoogle Scholar
- Sarot E, Payen-Groschene G, Bucheton A, Pelisson A: Evidence for a piwi-dependent RNA silencing of the gypsy endogenous retrovirus by the Drosophila melanogaster flamenco gene. Genetics. 2004, 166: 1313-1321. 10.1534/genetics.166.3.1313.PubMedPubMed CentralView ArticleGoogle Scholar
- Vagin VV, Sigova A, Li C, Seitz H, Gvozdev V, Zamore PD: A distinct small RNA pathway silences selfish genetic elements in the germline. Science. 2006, 313: 320-324. 10.1126/science.1129333.
- Reiss D, Josse T, Anxolabehere D, Ronsseray S: aubergine mutations in Drosophila melanogaster impair P cytotype determination by telomeric P elements inserted in heterochromatin. Mol Genet Genomics. 2004, 272: 336-343. 10.1007/s00438-004-1061-1.
- Aravin AA, Lagos-Quintana M, Yalcin A, Zavolan M, Marks D, Snyder B, Gaasterland T, Meyer J, Tuschl T: The small RNA profile during Drosophila melanogaster development. Dev Cell. 2003, 5: 337-350. 10.1016/S1534-5807(03)00228-4.
- Jensen S, Gassama MP, Heidmann T: Taming of transposable elements by homology-dependent gene silencing. Nat Genet. 1999, 21: 209-212. 10.1038/5997.
- Jensen S, Gassama MP, Heidmann T: Cosuppression of I transposon activity in Drosophila by I-containing sense and antisense transgenes. Genetics. 1999, 153: 1767-1774.
- Desset S, Conte C, Dimitri P, Calco V, Dastugue B, Vaury C: Mobilization of two retroelements, ZAM and Idefix, in a novel unstable line of Drosophila melanogaster. Mol Biol Evol. 1999, 16: 54-66.
- Lakhotia SC, Jacob J: EM autoradiographic studies on polytene nuclei of Drosophila melanogaster. II. Organization and transcriptive activity of the chromocentre. Exp Cell Res. 1974, 86: 253-263. 10.1016/0014-4827(74)90711-3.
- Petrov DA, Aminetzach YT, Davis JC, Bensasson D, Hirsh AE: Size matters: non-LTR retrotransposable elements and ectopic recombination in Drosophila. Mol Biol Evol. 2003, 20: 880-892. 10.1093/molbev/msg102.
- Comprehensive R Archive Network. [http://cran.r-project.org/]
- FlyBase Gbrowser Cytolocation Tool. [http://www.flybase.org/cgi-bin/gbrowse_fb/dmel]
- Vista Genome Browser. [http://pipeline.lbl.gov/cgi-bin/gateway2?bg=dm2&selector=vista]
- Kent WJ: BLAT - the BLAST-like alignment tool. Genome Res. 2002, 12: 656-664. 10.1101/gr.229202. Article published online before March 2002.
- Rice P, Longden I, Bleasby A: EMBOSS: the European Molecular Biology Open Software Suite. Trends Genet. 2000, 16: 276-277. 10.1016/S0168-9525(00)02024-2.
- CPAN Graph PERL module. [http://search.cpan.org/]
- Mathematica 5.1 Combinatorica Package. [http://documents.wolfram.com/v5/Add-onsLinks/StandardPackages/DiscreteMath/Combinatorica.html]
- Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T: Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 2003, 13: 2498-2504. 10.1101/gr.1239303.
- Liu H, Wang J: A new way to enumerate cycles in graph. Advanced International Conference on Telecommunications and International Conference on Internet and Web Applications and Services. 2006, Los Alamitos, CA, USA (AICT/ICIW), 57-
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.