Long terminal repeat retrotransposons of Mus musculus
© McCarthy and McDonald 2004
Received: 4 September 2003
Accepted: 9 January 2004
Published: 13 February 2004
Long terminal repeat (LTR) retrotransposons make up a large fraction of the typical mammalian genome. They comprise about 8% of the human genome and approximately 10% of the mouse genome. On account of their abundance, LTR retrotransposons are believed to hold major significance for genome structure and function. Recent advances in genome sequencing of a variety of model organisms have provided an unprecedented opportunity to better evaluate the diversity of LTR retrotransposons resident in eukaryotic genomes.
Using a new data-mining program, LTR_STRUC, in conjunction with conventional techniques, we have mined the GenBank mouse (Mus musculus) database and the more complete Ensembl mouse dataset for LTR retrotransposons. We report here that the M. musculus genome contains at least 21 separate families of LTR retrotransposons; 13 of these families are described here for the first time.
All families of mouse LTR retrotransposons are members of the gypsy-like superfamily of retroviral-like elements. Several different families of unrelated non-autonomous elements were identified, suggesting that the evolution of non-autonomy may be a common event. High sequence similarity between several LTR retrotransposons identified in this study and those found in distantly-related species suggests that horizontal transfer has been a significant factor in the evolution of mouse LTR retrotransposons.
Retrotransposons are mobile genetic elements that make up a large fraction of most eukaryotic genomes. All retrotransposons are distinguished by a life cycle involving an RNA intermediate. The RNA genome of a retroelement is copied into a double-stranded DNA molecule by reverse transcriptase, which is subsequently integrated into the host's genome. Retrotransposons fall into two main categories: those with long terminal repeats (LTRs), such as retroviruses and LTR retrotransposons, and those that lack such repeats, for example, long interspersed nuclear elements (LINEs).
Retrotransposons are particularly abundant in plants, where they are often a principal component of nuclear DNA. In corn, 50-80%, and in wheat fully 90%, of the genome is made up of retrotransposons [1, 2]. This percentage is generally lower in animals than in plants but it can still be significant. For example, about 8% of the human genome is now known to be composed of LTR retrotransposons. In the mouse genome this figure has been estimated at 10%.
This article presents the results of a recent survey (December 2002) of the GenBank mouse (M. musculus) database (GBMD) and the 2.9 Gbp Ensembl mouse dataset (EMD) for the presence of LTR retrotransposons. We have employed a new search program, LTR_STRUC (LTR retrotransposon structure program), as the initial data-mining tool in our survey. Identified elements were subjected to sequence analyses to identify open reading frames (ORFs) encoding reverse transcriptase (RT) and other retroviral proteins. LTR_STRUC finds only full-length elements, that is, ones having two LTRs and a pair of target site duplications (TSDs). We therefore augmented our search approach by conducting BLAST searches using reverse transcriptase queries. These queries were of two types: previously known RTs in the public database from mouse and other mammals, and RTs obtained from our initial scan of the EMD with LTR_STRUC. Subsequent RT sequence alignments were carried out, followed by construction of phylogenetic trees.
An LTR retrotransposon 'family' is defined as a group of elements with RTs at least 90% similar at the amino acid level. Experience has shown that when two elements have RTs that are 90% similar, their LTRs are typically about 60% similar. Thus, non-autonomous elements, lacking an RT ORF, are assigned to the same family if their LTRs are at least 60% similar. Many LTR retrotransposons replicate non-autonomously. Four different families of murine LTR retrotransposons have non-autonomous members: MalR elements, ETn elements, VL30 elements and a new type identified in this study, related to IAP elements. These non-autonomous elements are discussed below. Non-autonomous elements can reach a high copy number even though they lack an RT ORF [4, 8–11].
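The 90% family threshold described above can be illustrated with a minimal sketch. It is not the procedure used in this study: it assumes the RT sequences are already aligned to a common length, uses percent identity as a stand-in for the similarity measure, and groups elements by single linkage. The function names percent_identity and group_into_families are hypothetical.

```python
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns in which neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    matches = sum(x == y for x, y in pairs)
    return 100.0 * matches / len(pairs)


def group_into_families(aligned_rts: dict[str, str],
                        threshold: float = 90.0) -> list[set[str]]:
    """Single-linkage grouping: two elements share a family if a chain of
    pairwise comparisons at or above the threshold connects them."""
    parent = {name: name for name in aligned_rts}

    def find(x: str) -> str:
        # Follow parent links to the group representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in combinations(aligned_rts, 2):
        if percent_identity(aligned_rts[a], aligned_rts[b]) >= threshold:
            parent[find(a)] = find(b)

    families: dict[str, set[str]] = {}
    for name in aligned_rts:
        families.setdefault(find(name), set()).add(name)
    return list(families.values())
```

Single linkage is only one possible design choice; a stricter rule (for example, requiring every pair within a family to meet the threshold) would split some groups that this sketch merges.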
Currently there is no standard mouse retrotransposon nomenclature. In our system of classification for mouse, LTR retrotransposons are specified by the acronym Mmr (M. musculus retrotransposon). Distinct families are indicated by number (for example, Mmr1, Mmr2, Mmr3). We have chosen to adopt the Mmr nomenclature in this study because it is consistent with the systematic logic ('Mm' indicative of the genus and species of the host organism; 'r' indicates retrotransposon) used in previous articles [8, 12]. In each case where we use the Mmr acronym in this article to refer to a previously named family, we also include any pre-existing name for the family.
Results and discussion
RTs from elements identified in our survey fall into numerous distinct families. All autonomous LTR retrotransposons identified were gypsy-like elements (Classes I, II, and III). Autonomous retroviral-like elements in the mouse genome usually have an overall length of between 6,000 and 9,000 bp. Results of our study indicate that the TSDs of mouse LTR retrotransposons are four to six base pairs long and that within each of the three major classes of these elements a single TSD length is characteristic (see below). With the exception of a few mutated copies, mouse LTR retrotransposons seem to have the same canonical dinucleotides terminating the LTRs as are typically found in other species (TG/CA). The LTRs of murine retroviral-like elements are generally 300-600 bp long, with the exception of mouse mammary tumor virus (MMTV), where the LTRs are some 1,300 bp in length. Our survey shows that at least 21 distinct LTR retrotransposon families exist in the mouse genome, 13 of which have not been described previously.
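The structural hallmarks mentioned above (matching TSDs of four to six base pairs and LTRs terminating in TG/CA) can be checked with a short sketch. It assumes the input is laid out as TSD + element + TSD, that the element begins and ends with an LTR of known length, and that both lengths are supplied by the caller. The function check_ltr_element is hypothetical and is not part of LTR_STRUC.

```python
def check_ltr_element(flanked: str, tsd_len: int, ltr_len: int) -> dict[str, bool]:
    """Report structural hallmarks for a sequence laid out as
    TSD + (5' LTR ... 3' LTR) + TSD."""
    if tsd_len < 1 or ltr_len < 4:
        raise ValueError("tsd_len must be >= 1 and ltr_len must be >= 4")
    seq = flanked.upper()
    if len(seq) < 2 * (tsd_len + ltr_len):
        raise ValueError("sequence too short for the stated TSD and LTR lengths")

    left_tsd = seq[:tsd_len]
    right_tsd = seq[-tsd_len:]
    element = seq[tsd_len:-tsd_len]
    ltr5 = element[:ltr_len]
    ltr3 = element[-ltr_len:]

    return {
        # The two flanking duplications should be identical in an intact copy.
        "tsd_match": left_tsd == right_tsd,
        # Range reported in the text for mouse elements.
        "tsd_length_in_range": 4 <= tsd_len <= 6,
        # Canonical terminal dinucleotides of each LTR.
        "ltr5_tg_ca": ltr5.startswith("TG") and ltr5.endswith("CA"),
        "ltr3_tg_ca": ltr3.startswith("TG") and ltr3.endswith("CA"),
    }
```

A copy that fails one of these checks is not necessarily spurious; as noted above, a few mutated copies depart from the canonical dinucleotides.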
LTR retrotransposon families of the murine genome
To date, LTR retrotransposon diversity has been rigorously classified into families for only a few organisms (for example, Oryza sativa, Drosophila melanogaster and Caenorhabditis elegans). This article represents a first attempt to establish a similar uniform classification and nomenclature for the domestic mouse. Previous studies have classified murine retrotransposons into broad categories only, which ignore the standard definition of 'family' (see above). For example, the term 'intracisternal type A particle' (IAP) has been used to refer to elements that belong to several distinct LTR-retrotransposon phylogenetic groups. The autonomous elements identified in our survey of the GBMD and EMD fall into 20 families on the basis of degree of RT divergence (greater than 10% denotes family). In addition, we have classified MalR elements, which are non-autonomous, into a twenty-first family that is closely related to MuERV-L elements, because these two types of transposons have similar LTRs. MusD and ETn elements form a second pair of related autonomous and non-autonomous elements; MmERV and VL30 elements constitute a third. These three paired families are discussed in more detail below.
Class I (families 1-4)
Non-murine RTs obtained from translating BLAST (table columns: name of retrotransposon, position of RT in file)
Human endogenous retrovirus L
Human endogenous C type retrovirus
Class II (families 5-19)
Exemplars of mouse LTR retrotransposon families characterized in this study (table columns: element length (bp), LTR-LTR identity (%))
Class III (families 20 and 21)
Like MalRs in other species, murine MalRs are all internally deleted. The internal region contains only non-coding repetitive DNA. Nevertheless, they have typical LTRs, a primer binding site and a polypurine tract. Members of Mmr21_MaLR are of two types: MT MalRs, the most common type of LTR retrotransposon in the mouse genome (mean length approximately 1,980 bp), and ORR1 MalRs (mean length approximately 2,460 bp). Our survey suggests that in the mouse genome, MT MalRs are about ten times as common as their longer relatives, the ORR1 MalRs. Non-truncated copies of Mmr20_MuERV-L elements have an overall length of about 6,400 bp.
Length variation in murine LTR retrotransposons
Although all copies of family Mmr10_IAP found by LTR_STRUC have two LTRs and recognizable TSDs (as required by the search algorithm employed by the program), the individual members of this abundant family vary widely in overall length (2,700-7,200 bp) due to the presence of internal deletions of varying length. By contrast, the two abundant types of non-autonomous Class III elements (MT and ORR1 MalRs) exhibit a markedly different pattern of variation. Lengths of ORR1 MalRs peak sharply at 2,300 bp and those of MT MalRs at 1,980 bp, with very few elements (<1%) in either case differing from these peak values by more than 100 bp. Moreover, Mmr10_IAP elements of all lengths, from the shortest to the longest, are preponderantly represented by copies with a high level of LTR-LTR identity (>99%), a finding consistent with recent transposition. The ability of internally truncated Mmr10_IAPs to complete their replication cycle is consistent with the fact that a number of Mmr10_IAP copies bearing the same 1,800-bp deletion (affecting the polyprotein ORF) were found in our survey on a variety of different mouse chromosomes. A similar dispersed distribution of lengths was observed in two other families, Mmr19_MusD and Mmr1_MmERV. Comparison of a VL30 element (AF486451) with our data revealed a high degree of LTR-LTR similarity (>90%) to elements in family Mmr1_MmERV; VL30 elements are therefore members of that family (VL30s are non-autonomous and cannot be compared with other elements on the basis of RT similarity).
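The contrast drawn above, between a sharply peaked length distribution and a dispersed one, can be quantified with a simple summary. The sketch below is illustrative only: the function length_profile, the 100-bp bin size and the use of the bin midpoint as the peak are assumptions, not the analysis performed in this study.

```python
from collections import Counter


def length_profile(lengths: list[int], bin_size: int = 100,
                   window: int = 100) -> dict[str, float]:
    """Find the most populated length bin and the fraction of elements
    whose length lies within `window` bp of that bin's midpoint."""
    if not lengths:
        raise ValueError("no lengths supplied")
    # Count elements per bin; each bin is labeled by its lower bound.
    bins = Counter((n // bin_size) * bin_size for n in lengths)
    # Most populated bin; ties are resolved in favor of the shorter bin.
    peak_start = max(bins, key=lambda start: (bins[start], -start))
    peak_mid = peak_start + bin_size // 2
    near_peak = sum(abs(n - peak_mid) <= window for n in lengths)
    return {
        "peak_bin_start": peak_start,
        "fraction_near_peak": near_peak / len(lengths),
        "min_length": min(lengths),
        "max_length": max(lengths),
    }
```

A family with a tightly constrained length would give a fraction_near_peak close to 1, whereas a family carrying internal deletions of many different sizes would give a much lower value.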
Known RTs used for comparison in phylogenies (table column: name of retrovirus)
Gibbon ape leukemia virus
Porcine endogenous retrovirus ERV-PK15
Bovine Leukemia Virus
Human endogenous retrovirus K
Human breast cancer associated
Human endogenous retrovirus L
Golden hamster intracisternal A-particle H18
Feline leukemia virus
Rabbit endogenous retrovirus
Golden hamster intracisternal type-A
Simian SRV-1 type D retrovirus
Mason-Pfizer Monkey Virus
Moloney murine leukemia virus
Murine endogenous retrovirus ERV-L
Murine type D-like endogenous retrovirus MusD1
Human endogenous retrovirus type C oncovirus
Koala type C endogenous virus
M. dunni endogenous virus
Mouse mammary tumor virus
M. musculus endogenous retrovirus
All autonomous retrotransposons identified in our study were retroviral-like elements (of Classes I, II, and III). At least 21 distinct families of murine LTR retrotransposons exist. Families Mmr4, Mmr5, Mmr6, Mmr7, Mmr8, Mmr9, Mmr12, Mmr13, Mmr14, Mmr15, Mmr16, Mmr17, and Mmr18 have not been previously recognized, 13 families in all. These new families are all Class II elements (with the exception of Mmr4, which belongs to Class I) and are thus akin to immune deficiency viruses such as simian retrovirus SRV-1, to mouse mammary tumor virus (MMTV), and to IAP elements.
Our purpose in using LTR_STRUC to begin our survey of the mouse genome was to obtain a broadly representative sample of murine retrotransposons. Since the algorithm it employs is not dependent upon sequence homology, as in standard search methods such as BLAST, the initial results of our survey presumably were not biased toward a particular set of queries. Also, since the current version of LTR_STRUC now categorizes the elements it locates and assigns a new name to any element that differs sufficiently from any found earlier in the search, the chance of overlooking low-copy families has been reduced. The thoroughness of our BLAST search can only have been augmented by using LTR_STRUC because, in the BLAST phase of our survey, the queries used were a combination of those element types already recognized, prior to our investigation, with those found by LTR_STRUC. We believe this approach is the reason we were able to identify the 13 previously unreported families listed above.
Materials and methods
Using a new data-mining program, LTR_STRUC, we have mined the Ensembl mouse (M. musculus) dataset for LTR retrotransposons. We have used elements found in this initial search, as well as murine LTR retrotransposons identified by previous workers, to conduct BLAST searches of the GenBank mouse database.
Automated characterization of LTR retrotransposons
The methods used in our survey of the mouse genome are essentially the same as those used in our earlier study of the rice genome and are described elsewhere. Briefly, we began our survey by using a new computer program, LTR_STRUC, which identifies new LTR retrotransposons based on the presence of characteristic retroelement features. Additional elements were identified by BLAST searches using the RTs, both of elements located by LTR_STRUC and of ones recognized in earlier studies.
Initial scans with LTR_STRUC were conducted on a dataset consisting of the 2.9 Gbp of M. musculus sequence data available in the Ensembl database at the time of the initial scan (December 2002). The dataset (EMD) was obtained from the Ensembl website. In an effort to identify additional elements not picked up in the initial survey with LTR_STRUC, we have used representative sequences from each retrotransposon family identified in this study as queries to conduct BLAST searches against the GenBank mouse database (GBMD). Thus, the results reported here constitute a reasonably unbiased survey of LTR-retrotransposon diversity in mouse. RT sequences were identified according to previously described criteria [16, 17].
Multiple sequence alignments and phylogenetic analyses
The RT domains of the various Mmr elements were aligned, as described elsewhere, with previously reported RT sequences (Table 3). In the case of elements lacking an RT sequence because of fragmentation or internal truncation, the LTR sequences were used to assign them to the proper family.
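Assignment of an element that lacks an RT to a family on the basis of its LTR can be sketched as follows. This is an illustration under stated assumptions rather than the method used here: similarity is approximated with the standard-library difflib.SequenceMatcher ratio instead of an alignment-based score, the 60% threshold is taken from the family definition given earlier, and the function names and family labels are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Optional


def ltr_similarity(a: str, b: str) -> float:
    """Rough percent similarity between two unaligned LTR sequences.
    autojunk is disabled because the four-letter DNA alphabet would
    otherwise trigger the 'popular element' heuristic on long inputs."""
    matcher = SequenceMatcher(None, a.upper(), b.upper(), autojunk=False)
    return 100.0 * matcher.ratio()


def assign_family_by_ltr(query_ltr: str, family_ltrs: dict[str, str],
                         threshold: float = 60.0) -> Optional[str]:
    """Return the family whose representative LTR is most similar to the
    query, or None if no family reaches the threshold."""
    best_name, best_score = None, 0.0
    for name, reference_ltr in family_ltrs.items():
        score = ltr_similarity(query_ltr, reference_ltr)
        if score > best_score:
            best_name, best_score = name, score
    return best_name if best_score >= threshold else None
```

Returning None for a query that matches no family keeps unassignable fragments separate instead of forcing them into the nearest group.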
References
1. SanMiguel P, Tikhonov A, Jin YK, Motchoulskaia N, Zakharov D, Melake-Berhan A, Springer PS, Edwards KJ, Lee M, Avramova Z, Bennetzen JL: Nested retrotransposons in the intergenic regions of the maize genome. Science 1996, 274:765–768.
2. Flavell RB: Repetitive DNA and chromosome evolution in plants. Philos Trans R Soc Lond B Biol Sci 1986, 312:227–242.
3. Lander ES, Linton LM, Birren B, Nusbaum C, Zody MC, Baldwin J, Devon K, Dewar K, Doyle M, FitzHugh W, et al.: Initial sequencing and analysis of the human genome. Nature 2001, 409:860–921.
4. Waterston RH, Lindblad-Toh K, Birney E, Rogers J, Abril JF, Agarwal P, Agarwala R, Ainscough R, Alexandersson M, An P, et al.: Initial sequencing and comparative analysis of the mouse genome. Nature 2002, 420:520–562.
5. Mouse Genome Server [http://www.ensembl.org/Mus_musculus/]
6. McCarthy EM, McDonald JF: LTR_STRUC: a novel search and annotation program for LTR retrotransposons. Bioinformatics 2003, 19:362–367.
7. Bowen N, McDonald JF: Drosophila euchromatic LTR retrotransposons are much younger than the host species in which they reside. Genome Res 2001, 11:1527–1540.
8. McCarthy EM, Liu J, Gao L, McDonald JF: Long terminal repeat retrotransposons of Oryza sativa. Genome Biol 2002, 3:research0053.1–0053.11.
9. Mager DL, Freeman JD: Novel mouse Type D endogenous proviruses and ETn elements share long terminal repeat and internal sequences. J Virol 2000, 74:7221–7229.
10. Jiang N, Jordan IK, Wessler SR: Dasheng and RIRE2: a non-autonomous long terminal repeat element and its putative autonomous partner in the rice genome. Plant Physiol 2002, 130:1697–1705.
11. Witte CP, Hien L, Bureau T, Kumar A: Terminal-repeat retrotransposons in miniature (TRIM) are involved in restructuring the host-plant genome. Proc Natl Acad Sci USA 2001, 98:13778–13783.
12. Bowen N, McDonald JF: Genomic analysis of Caenorhabditis elegans reveals ancient families of retroviral-like elements. Genome Res 1999, 9:924–935.
13. Bromham L, Clark F, McKee JJ: Discovery of a novel murine type C retrovirus by data mining. J Virol 2001, 75:3053–3057.
14. Brulet P, Kaghad M, Xu YS, Croissant O, Jacob F: Early differential tissue expression of transposon-like repetitive DNA sequences of the mouse. Proc Natl Acad Sci USA 1983, 80:5641–5645.
15. Kuff EL, Lueders KK: The intracisternal A-particle gene family: structure and functional aspects. Adv Cancer Res 1988, 51:183–276.
16. Xiong Y, Eickbush TH: Similarity of reverse transcriptase-like sequences of viruses, transposable elements, and mitochondrial introns. Mol Biol Evol 1988, 5:675–690.
17. Xiong Y, Eickbush TH: Origin and evolution of retroelements based upon their reverse-transcriptase sequences. EMBO J 1990, 9:3353–3362.
This article is published under license to BioMed Central Ltd. This is an Open Access article: verbatim copying and redistribution of this article are permitted in all media for any purpose, provided this notice is preserved along with the article's original URL.