The origin of recent introns: transposons?
© BioMed Central Ltd 2004
Published: 29 November 2004
The long-standing question of how genes acquire introns has provoked much debate. A recent study makes considerable progress by identifying numerous recently gained introns in nematodes, although it remains difficult to distinguish definitively between models of intron gain.
The origin of spliceosomal introns is one of molecular biology's longest-standing unsolved mysteries. Despite 27 years of extensive study, we are confident of the origin of an intron in only two cases: a short interspersed nucleotide element (SINE) insertion that gave rise to a new intron in the coding region of the catalase A gene of rice, and two midge globin genes that acquired an intron via gene conversion with an intron-containing paralog. Previous large-scale studies have failed to find a single convincing case of intron gain since the divergence of humans and mice, or a single convincing case of sequence homology between introns in the same genome for a range of taxa. Although some other cases of recent intron insertion have been discovered, the sources of these introns remain unknown. Yet all characterized metazoan species and most other eukaryotes harbor multiple introns per gene, which requires hundreds of thousands, if not millions, of individual intron gains to have occurred throughout eukaryotic evolution.
Coghlan and Wolfe recently studied newly gained introns in the nematodes Caenorhabditis elegans and Caenorhabditis briggsae. They identified 122 apparently recent gains by searching for introns that are present in only one of the two species and are absent from the distantly related parasitic nematode Brugia malayi as well as from paralogs and orthologs from several other species. These introns are longer than control introns, are more likely to lie in genes expressed in the germline, and contain more palindromic sequences and microsatellites. The absence of group II introns in Caenorhabditis mitochondria rules out the self-splicing intron model as an explanation for the origins of these introns; the authors' requirement that the new intron be at a site which is intronless in known paralogs excludes the intron transfer hypothesis. Coghlan and Wolfe then sought to distinguish between the three remaining hypotheses: intron transposition, transposon insertion, and tandem genomic duplication.
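The presence/absence criterion described above can be sketched as a simple filter. This is only an illustration of the logic as summarized here, not the authors' actual pipeline: the `IntronSite` record, its field names, and the example data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntronSite:
    """Hypothetical record: is an intron present at one aligned gene position?"""
    in_elegans: bool
    in_briggsae: bool
    in_other_orthologs: bool  # Brugia malayi or orthologs from other species
    in_paralogs: bool         # any known paralog with an intron at this site


def is_candidate_recent_gain(site: IntronSite) -> bool:
    """Apply the criterion summarized in the text (assumed encoding)."""
    # Present in exactly one of the two Caenorhabditis species.
    one_species_only = site.in_elegans != site.in_briggsae
    # Absent from the outgroup, other orthologs, and all known paralogs.
    return one_species_only and not site.in_other_orthologs and not site.in_paralogs


# Made-up example sites, for illustration only.
sites = [
    IntronSite(True, False, False, False),   # candidate gain in C. elegans
    IntronSite(False, True, False, False),   # candidate gain in C. briggsae
    IntronSite(True, True, False, False),    # shared, so not a recent gain
    IntronSite(True, False, True, False),    # in an outgroup: more likely a loss
    IntronSite(False, True, False, True),    # in a paralog: excluded
]
candidates = [s for s in sites if is_candidate_recent_gain(s)]
print(len(candidates))
```

The outgroup check is what separates a gain in one species from a loss in the other: an intron also found in a distant relative was most likely ancestral.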
They found that 21 of 81 new introns in C. elegans and 7 of 41 in C. briggsae show significant sequence similarity to other introns in the same genome. In three of these 28 cases, two in C. briggsae and one in C. elegans, the recently gained intron shows homology to another intron in the same gene. In 19 cases, the new intron matches multiple introns in the same genome. Sequence similarity of new introns to other introns is clearly a central expectation of the intron transposition model. Such similarities are also consistent with the other two models, however: with the transposon model, because a second copy of the intron-forming transposon may independently insert into another, previously existing, intron; and with the genomic duplication model, because the new intron sequence would be homologous to nearby exonic and intronic sequences. Further analysis showed that the newly gained introns are not enriched for known repetitive elements relative to control introns (apparent evidence against the transposon hypothesis) and that the ends of new introns show no similarity to flanking exonic sequences (apparent evidence against the genomic duplication model). Thus intron transposition seems to be supported by a process of elimination.
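For orientation, the counts quoted above translate into the following proportions. This is plain arithmetic on the numbers given in the text; the variable names are illustrative.

```python
# (introns with significant matches, total new introns), as quoted in the text
counts = {
    "C. elegans": (21, 81),
    "C. briggsae": (7, 41),
}

matched_total = sum(matched for matched, _ in counts.values())
gained_total = sum(total for _, total in counts.values())

for species, (matched, total) in counts.items():
    print(f"{species}: {matched}/{total} = {100 * matched / total:.1f}%")
print(f"combined: {matched_total}/{gained_total} = "
      f"{100 * matched_total / gained_total:.1f}%")
# C. elegans: 21/81 = 25.9%
# C. briggsae: 7/41 = 17.1%
# combined: 28/122 = 23.0%
```

Roughly three-quarters of the new introns therefore show no detectable similarity to any other intron in the same genome.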
This raises the possibility that intron 3 acquired this palindromic element not by transposition of another intron but by a third transposon insertion, either into a pre-existing intron 3 or into a contiguous coding region, leading to the creation of intron 3 (the transposon model). The finding of Coghlan and Wolfe that new introns are generally enriched in palindromic sequences suggests the latter. The possibility of intron origin by insertion of palindromic transposons is enticing, because the tendency of palindromic elements to form hairpin structures could bring the 5' and 3' splice sites of the new intron into proximity, perhaps facilitating splicing. (A shorter hairpin structure is maintained by selection in the first intron of the Adh gene in Drosophila melanogaster.) The intron sequence could then gradually lose its palindromic character as other compensatory local mutations increased the intron's splicing efficiency, leading eventually to the quasi-random sequence characteristic of most introns. Although the authors' finding that recently gained introns are not enriched in known repetitive elements seems to be evidence against transposon origins for these introns, the two could be reconciled if the palindromic elements involved are extinct and their extant copies are too divergent (the intron matches in the Coghlan and Wolfe study show around 70% nucleotide identity) to warrant inclusion in libraries of known transposable elements.
Other mechanisms could also account for the excess of palindromic elements in new introns, however. Regions with more stable DNA secondary structures (such as palindromic elements) are expected to experience more replication slippage, leading to higher rates of duplication of short-to-medium stretches of DNA. If such duplications occasionally lead to the creation of new introns (the tandem duplication hypothesis), these introns would themselves contain the palindromic sequences of adjacent regions. That the authors find no similarity between the terminal 25 base-pair regions of new introns and those of flanking exons could be due to the age of the gains (the levels of observed sequence similarity in the study are around 70%, a level that is not significant over short stretches) and/or to stronger positive selection near the boundaries of the new introns. The higher frequency of intron gain in germline-expressed genes is, however, harder to explain by genomic duplication except by recourse to the generally faster evolution of germline genes. Also, these arguments do not exclude intron transposition as a possibility. As pointed out by the authors, the palindromic character of new introns could reflect longer survival times of introns with stable secondary structures, affording more opportunity to be reverse-spliced. In cases where the new intron is homologous both to a transposon and to another intron, however, it seems more parsimonious to postulate a single, reasonably common transposon insertion rather than a series of three rarer events (intron reinsertion, transcript retroposition and gene conversion).
What evidence remains for intron transposition? First, germline-expressed genes preferentially acquire introns, as would be expected if intron gain occurs at the RNA level, although this could instead reflect preferential insertion of palindromic elements into actively transcribing regions or generally faster evolution of germline-expressed genes. Second, genes involved in mRNA processing and splicing preferentially gain introns. This is a surprise under any model, though it does intuitively seem to implicate the spliceosome in intron gain. As Coghlan and Wolfe point out, however, it is hard to imagine why a mechanism that inserts introns via a protein complex would tend to favor insertion into the genes coding for these proteins. More attention will be necessary to determine the cause and generality across taxa of this intriguing bias. By identifying clear recent intron gains, Coghlan and Wolfe have taken a large step forward in deciphering the origins of introns. That even this study is subject to interpretation underscores the slipperiness of the problem. The increasing focus of sequencing projects on closely related genomes is promising, and similar comparative studies in other taxa should help to finally unravel this mystery.