Alu elements: know the SINEs
© BioMed Central Ltd 2011
Received: 21 September 2011
Accepted: 16 December 2011
Published: 28 December 2011
Alu elements are primate-specific repeats and comprise 11% of the human genome. They have wide-ranging influences on gene expression. Their contribution to genome evolution, gene regulation and disease is reviewed.
Alu elements represent one of the most successful of all mobile elements, with a copy number well in excess of 1 million in the human genome (almost 11% of the genome). They belong to a class of retroelements termed SINEs (short interspersed elements) and are primate specific. These elements are non-autonomous, in that they acquire trans-acting factors for their amplification from the only active family of autonomous human retroelements: LINE-1 (L1).
Although active at higher levels earlier in primate evolution, Alu elements continue to insert in modern humans, including somatic insertion events, creating genetic diversity and contributing to disease through insertional mutagenesis. They are also a major factor contributing to non-allelic homologous recombination events causing copy number variation and disease. Alu elements code for low levels of RNA polymerase III transcribed RNAs that contribute to retrotransposition. However, the ubiquitous presence of Alu elements throughout the human genome has led to their presence in a large number of genes and their transcripts. Many individual Alu elements have wide-ranging influences on gene expression, including influences on polyadenylation [3, 4], splicing [5–7] and ADAR (adenosine deaminase that acts on RNA) editing [8–10].
This review focuses heavily on studies generated as a result of the advent of high-throughput genomics providing huge datasets of genome sequences, and data on gene expression and epigenetics. These data provide tremendous insight into the role of Alu elements in genetic instability and genome evolution, as well as their many impacts on expression of the genes in their vicinity. These roles then influence normal cellular health and function, as well as having a broad array of impacts on human health.
Each RNA polymerase III generated Alu RNA is unique in terms of: (i) accumulated mutations in the Alu element itself; (ii) the length and accumulated sequence heterogeneity in the encoded A-rich region at its 3' end; and (iii) the unique 3' end on each RNA transcribed from the adjacent genomic site. Those RNAs are then thought to assemble into ribonucleoprotein particles (Figure 1b) that involve the SRP9/14 heterodimer, polyA-binding protein (PABP) [14, 15] and at least one other unidentified protein that binds to the RNA structure [14, 15]. The SRP9/14 proteins and PABP are thought to help the Alu RNA associate with a ribosome, where it might become associated with ORF2 protein (ORF2p) being translated from L1 elements [2, 16, 17]. Alu RNAs then utilize the purloined ORF2p to copy themselves at a new genomic site using a process termed target-primed reverse transcription (Figure 1c; reviewed in [18, 19]).
Although Alu is dependent on L1 ORF2p, Alu retrotransposition is not simply an extension of the L1 retrotransposition process. For instance, L1 depends on ORF1p and ORF2p, while Alu requires ORF2p only [2, 20, 21]. This may be one of the reasons why Alu causes several times as many diseases as L1 through insertion [22, 23] and has twice the copy number of L1. Because L1 elements have been shown to have a splice variant that makes only ORF2p, and may also express ORF2p from elements with a mutated ORF1, Alu might be able to amplify in cells that do not effectively amplify L1. In fact, although L1 transcription is high in the testis, almost none of the RNA is full-length, mostly due to splicing. This means that Alu may retrotranspose well in the testis, even though L1 retrotransposes poorly. Alu and L1 differ in several other ways. Following expression, Alu RNAs can retrotranspose rapidly, whereas L1 RNAs take almost 24 h longer. Retrotransposition of Alu and L1 elements is also differentially influenced by different APOBEC3 proteins [26–28]. Alu elements encode the A-tail separately at each locus rather than acquiring it through post-transcriptional polyadenylation, as L1 does. Thus, Alu A-tails are prone to shrinkage and accumulation of mutations that can affect the amplification process from each particular locus (discussed below).
These mechanistic features all contribute to the observed paucity of actively amplifying 'master' or 'source' Alu elements in the human genome. The internal RNA polymerase III promoter is not strong unless it fortuitously lands near appropriate flanking sequences. Furthermore, epigenetic silencing seems to affect the majority of Alu loci. Thus, there are generally very low levels of RNA polymerase III transcribed Alu RNAs in a cell, and this RNA is transcribed from a number of dispersed loci, including many loci that are incapable of active retrotransposition. Because the A-tail grows during the insertion process [2, 34], most new inserts have a sufficiently long A-tail for effective amplification. However, because each new insert lands in a different genomic environment, the new loci will vary tremendously in their transcription potential owing to the influences of flanking sequences and epigenetics. In addition, the 3' flanking sequence will provide the RNA polymerase III terminator, and those elements with longer 3' unique regions will be poor at retrotransposition. Following insertion, those elements that are initially capable of retrotransposition will gradually lose that capability through a series of sequence changes. The most rapid change will be that the long, relatively unstable A-tails will shrink, resulting in lower retrotransposition capability [12, 29]. In addition, the A-tails will rapidly accumulate mutations and often form variant microsatellite-like sequences at their ends that will also impair the activity. Over the long run, the body of the Alu element will accumulate mutations, first CpG mutations and then other random mutations, which will alter the promoter, RNA folding, and/or interactions with cellular proteins, leaving relatively few of the older Alu elements capable of retrotransposition. The sum of all of these factors contributes to the lack of activity of most Alu elements.
Alu elements are ancestrally derived from the 7SL RNA gene [35, 36]. Although the details of the origin are not known, it seems likely that a relatively inefficient retrotransposon was formed by a deleted version of the 7SL RNA gene sometime before the primate/rodent evolutionary divergence. This precursor then evolved into B1 repeats in rodents, and into FLAM (free left Alu monomer) and FRAM (free right Alu monomer) sequences in the primate lineage [36, 37]. A dimer of FLAM and FRAM eventually took on the highly efficient amplification characteristics of the Alu elements.
Alu elements have an even larger impact on genome instability than that provided by their insertional mutagenesis, because they provide the most common source of homology for non-allelic homologous recombination events leading to disease [23, 46]. The bioinformatics required to analyze these types of rearrangements from comparative genomic data is technically more difficult than characterizing insertions. However, studies of the human and chimpanzee genomes show that approximately 500 deletion events have occurred in both genomes (Figure 3) [47, 48]. It has not been possible to assess the duplication events that are also caused by this type of recombination, but it is likely that there is approximately the same number of events, and these events have also been suggested to contribute to genomic inversions and segmental duplications. The lower number of apparent non-allelic Alu/Alu recombination events between human and chimpanzee relative to the number of Alu insertion events (Figure 3) suggests that the recombination events are under stronger negative selection, because there are many more Alu recombination events than insertions causing disease. Thus, they contribute more to disease, but are less often fixed in the population. This is consistent with the relatively short length of the fixed deletions relative to the longer deletions commonly found associated with disease.
Alu elements are preferentially enriched in regions that are generally gene rich, whereas L1 elements are enriched in the gene-poor regions. This also correlates with Alu elements being enriched in reverse G bands, as well as in G+C-rich genomic isochores. However, younger Alu and L1 elements do not show much disparity in their locations, making it most likely that the differences in location are the result of losses of L1 and Alu elements in different genomic regions. It is easy to understand why the much larger L1 elements might have more negative selection when located in genes, making Alu elements much more stably maintained within the genes. It is more difficult to understand why Alu elements seem to be preferentially lost between genes over evolutionary time compared with L1. It is most likely that the tendency of Alu elements to participate in non-allelic homologous recombination events might allow loss of these elements when not under selection [53, 54].
Alu elements have continued to insert in the modern human lineage as evidenced by their continued contribution to human genetic disease. It is estimated that there is about one new Alu insert per 20 human births, leading to about one in every 1,000 new human genetic diseases. Comparison between two completed human genomes showed that there were approximately 800 polymorphic Alu elements between those two individuals.
Alu insertions in human disease
Loci (with insertion counts and Alu subfamilies where listed): 3 × HEMB (IX): Ya5, Ya5, Yb8; 2 × HEMA (VIII); 2 × CLCN5; 2 × BTK; 3 × FGFR2: Ya5, Yb8, Yc1; 3 × BRCA2: Ya5, Yc1, Y; 15 × NF1.
Diseases listed: X-linked severe combined immunodeficiency disease; glycerol kinase deficiency; hyper IgM syndrome; autosomal dominant optic atrophy; hypocalciuric hypercalcemia and hyperparathyroidism; associated with leukemia; hereditary desmoid disease; chronic hemolytic anemia; lipoprotein lipase deficiency; Walker Warburg syndrome; autoimmune lymphoproliferative syndrome; acute intermittent porphyria; congenital disorder of glycosylation type I.
Genomic studies are now beginning to delve into the diversity of Alu elements in the human population. Several studies involve the resequencing of multiple independent human genomes, resulting in the discovery of many new polymorphic Alu elements [59–61]. These studies largely confirm earlier work on the tremendous amount of diversity contributed to individual genomes by Alu insertions, as well as Alu subfamily types and distribution. These studies have utilized multiple available human genome sequences, primarily those available with low-to-moderate sequence coverage from the first 185 genomes from the 1000 Genomes Project. New, focused, next-generation sequencing (NGS) approaches seem very promising for looking at more specific questions about Alu activity. Among these approaches is a PCR method to isolate sequences flanking L1 or Alu sequences [62, 63]. This approach isolated an additional 403 polymorphic Alu inserts from a number of individuals (also see a second method in the section Somatic insertions of Alu elements). The added sensitivity of these directed NGS approaches will aid in studies for detecting rare insertions in germline tissues, as well as for detecting somatic insertions present in only a few cells within an organ or tumor.
Almost all studies on Alu element activity have focused on germ line or tissue culture cell inserts [2, 12, 29, 31]. However, there is reason to believe that Alu elements are also active in somatic tissues and may continue to contribute to genetic instability throughout the life of an individual, possibly leading to cancer or other age-related degenerations. The high levels of Alu insertion in tissue culture cells from transfected tagged constructs demonstrate that Alu is capable of retrotransposing in cells that are at least somewhat differentiated [2, 29]. However, the only way to demonstrate endogenous activity of Alu elements in tissues is by utilizing the power of high-throughput NGS technologies.
One NGS approach has claimed detection of somatic Alu insertions. This approach uses hybrid selection with probes to Alu elements to enrich Alu-containing regions prior to NGS. DNA was sequenced from several brain regions, particularly the hippocampus, which has been reported to have higher levels of somatic L1 retrotransposition. Using very deep sequencing, this study found evidence of thousands of individual Alu insertions, but it was unable to quantify the relative insertion rate per cell. Each insertion also has extremely low sequence coverage, as if each one is specific to only a small proportion of cells within the tissue, consistent with insertion very late in the differentiation process. However, with so many of these rare insertions, these data suggest that there is a significant amount of genetic mosaicism created by the activity of mobile elements. A feature of note for the somatic Alu insertions was that there were apparently a large number of insertions of the older S subfamilies. This group of subfamilies is almost completely inactive in the human germ line, implying that the rules of Alu amplification [29, 31] may differ between the somatic cells and the germ line. However, this study needs to be further substantiated, as the NGS reads are short and may have led to some misassignments or misinterpretations.
The hnRNA and mRNA molecules described above are transcribed by RNA polymerase II and are not involved in the Alu amplification process. What is often not appreciated is that RNA polymerase III generated Alu transcripts are generally expressed at very low levels. It has been estimated that HeLa cells express about 100 molecules of Alu RNA (defined as RNA polymerase III generated), although this could increase under various cellular stresses, including heat shock and viral infection. By contrast, there are hundreds of thousands of mRNA molecules in each cell, and therefore tens of thousands of RNA polymerase II transcribed RNAs that contain Alu sequences. Thus, only a tiny proportion of Alu-containing RNAs in the cell are transcribed by RNA polymerase III. This makes it extremely difficult to measure and characterize the authentic Alu transcripts that might be involved in the amplification process relative to those that are just 'passengers' in other RNAs.
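The scale of this imbalance can be illustrated with a short calculation. This is only a rough sketch: the figure of 100 RNA polymerase III transcripts per cell comes from the estimate above, whereas the value of 20,000 used for 'tens of thousands' of Alu-containing RNA polymerase II transcripts is an arbitrary illustrative assumption, not a measurement.

```python
# Rough proportion of Alu-containing RNAs in a cell that are Pol III transcripts.
POL3_ALU_RNAS = 100      # approximate Pol III Alu RNAs per HeLa cell (estimate quoted in the text)
POL2_ALU_RNAS = 20_000   # ASSUMPTION: illustrative stand-in for "tens of thousands"


def pol3_fraction(pol3: int, pol2: int) -> float:
    """Fraction of all Alu-containing RNAs that were made by RNA polymerase III."""
    return pol3 / (pol3 + pol2)


fraction = pol3_fraction(POL3_ALU_RNAS, POL2_ALU_RNAS)
print(f"Pol III share of Alu-containing RNAs: {fraction:.2%}")
```

Under these assumed numbers, fewer than 1 in 100 Alu-containing RNAs would be an authentic RNA polymerase III transcript, which is why assays that cannot tell the two classes apart mostly report the 'passenger' sequences.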
Given the technical challenges involved, it is not surprising that very few studies have looked properly at Alu RNA polymerase III transcripts. These studies have used either a primer extension approach to define the 5' end of the Alu transcript, to prove that the transcripts were generated by RNA polymerase III rather than being read-throughs of Alu elements in RNA polymerase II transcripts, or size fractionation combined with a 3' RACE (rapid amplification of cDNA ends) technique after in vitro tailing of the RNA to define the 3' end of the Alu RNA. Any other traditional method of RNA characterization, such as northern blots, RT-PCR or cDNA cloning, is more likely to study either the closely related 7SL RNA (a band of approximately 300 nucleotides in northern blots) or Alu elements included in RNA polymerase II transcripts, rather than those that might be transcribed by RNA polymerase III.
Many recent studies attempting to measure Alu RNA transcripts do not seem to be aware of the difficulties described above. Some groups using northern blots to look at Alu transcripts have detected a band that is more likely to be 7SL rather than the expected smear of heterogeneous Alu transcripts. Similarly, investigators often do not realize that typical cDNA cloning approaches [68, 69] or RT-PCR of Alu elements are also unable to distinguish RNA polymerase III transcripts from those that are contained within RNA polymerase II transcripts (Figure 4). Thus, many claims regarding Alu non-coding RNAs probably reflect the inclusion of Alu elements in mRNAs.
Every time an Alu element inserts in or near a gene, it has the potential to influence expression of that gene in several ways. It is very likely that the majority of such influences would be under negative selection. Thus, only rarely would an Alu element insert and evolve in conjunction with a specific gene to truly become a regulator of that gene.
Alu elements are relatively rich in CpG residues, which appear to be widely subject to methylation and therefore are responsible for approximately 25% of all of the methylation in the genome. Because methylated CpGs readily mutate to TpG, the higher density of methylation occurs in the younger elements. Methylation of Alu elements does vary in different tissues and appears to decrease in many tumors. It is likely that demethylation of an Alu increases expression from that Alu locus. It has also been proposed that Alu elements might be a source of new CpG islands that could influence the regulation of nearby genes. However, studies to date do not make a clear case for Alu methylation being the driving force for nearby gene expression changes rather than the alternative, that Alu methylation is influenced by other nearby genome features.
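The link between element age and CpG content can be sketched with a commonly used observed/expected CpG ratio (CpG count × length / (C count × G count)). The sequence below is a made-up toy string, not an Alu consensus, and the 'aged' copy simply converts every CpG to TpG, which is a deliberately extreme assumption used only to show the direction of the effect.

```python
def cpg_count(seq: str) -> int:
    """Number of CpG dinucleotides on the given strand."""
    return seq.upper().count("CG")


def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio: CpG count * length / (C count * G count)."""
    seq = seq.upper()
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return 0.0
    return cpg_count(seq) * len(seq) / (c * g)


def decay_all_cpg(seq: str) -> str:
    """Model complete CpG decay by converting every CpG to TpG (extreme case)."""
    return seq.upper().replace("CG", "TG")


young = "ACGTCGGCGA"          # HYPOTHETICAL toy sequence, CpG rich like a young element
old = decay_all_cpg(young)    # same element after all CpGs have mutated

print(young, cpg_count(young), cpg_obs_exp(young))
print(old, cpg_count(old), cpg_obs_exp(old))
```

In real elements the decay is gradual, and deamination on the opposite strand produces CpA rather than TpG, so a fuller model would treat both outcomes as possible at each site.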
Alu elements have also been found to host a number of transcription-factor-binding sites. Some of these binding sites are specific to certain Alu subfamilies, and some are also enhanced by changes that occur in Alu elements post-insertion. Dozens of different transcription-factor-binding sites have been predicted within subsets of Alu elements. Although most of these are not validated, they do illustrate the opportunity for such sites to evolve at specific loci into regulatory elements. Factors for which binding studies have demonstrated an association with Alu elements include several families of nuclear receptors [73–75], NF-kappaB and p53. Thus, Alu elements have, at the least, a tremendous capacity to serve as a sink of bound transcription factors, and in limited specific cases have been found to influence expression of nearby genes.
The data are even more compelling for Alu elements to contribute to an array of post-transcriptional processes. These include providing polyadenylation sites [3, 4], sites for alternative splicing [5–7] and sites for RNA editing [8–10] that then influence the fate of the RNA. Alu elements have two runs of A in their consensus sequence that can be readily mutated to the AATAAA consensus polyadenylation site. An analysis suggested that the modest bias for Alu elements in the reverse orientation to the gene in which they insert might be because of negative selection against the introduction of potential polyA sites. This was further confirmed by a bioinformatic analysis demonstrating that a number of human genes utilize Alu sites to provide polyadenylation [3, 78], including some that caused differences in human gene transcripts relative to chimpanzee.
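How easily a run of A residues can become a polyadenylation signal can be sketched by scanning for hexamers that already match AATAAA or lie a single substitution away from it. The sequence used here is a hypothetical toy string containing a run of six A residues, not an actual Alu sequence, and the function names are chosen only for this illustration.

```python
POLYA_SIGNAL = "AATAAA"  # consensus polyadenylation signal named in the text


def mismatches(hexamer: str, signal: str = POLYA_SIGNAL) -> int:
    """Number of positions at which a hexamer differs from the signal."""
    return sum(1 for a, b in zip(hexamer, signal) if a != b)


def exact_sites(seq: str) -> list:
    """Start positions (0-based) of hexamers that match the signal exactly."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 5) if mismatches(seq[i:i + 6]) == 0]


def near_miss_sites(seq: str) -> list:
    """Start positions of hexamers exactly one substitution away from the signal."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 5) if mismatches(seq[i:i + 6]) == 1]


a_run = "GGCAAAAAAGG"      # HYPOTHETICAL toy sequence with a run of six A residues
mutated = "GGCAATAAAGG"    # the same sequence after a single A-to-T change

print(exact_sites(a_run), near_miss_sites(a_run))
print(exact_sites(mutated))
```

In this toy case the A run contains no signal but one hexamer that is a single A-to-T change away from it, and the mutated copy carries an exact AATAAA at that position.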
Alu elements appear to contribute to a relatively unique form of gene regulation involving ADARs. These enzymes recognize RNAs with double-strand character and deaminate some adenosines to form inosines in those duplex regions. Most ADAR editing in cells occurs on primary transcripts in the nucleus in which two Alu elements in opposite orientations form a hairpin (Figure 5b). One of the major consequences of this editing process is the retention of transcripts in the nucleus. Because ADAR is most prevalent in the brain, but also present in other tissues and tumors, it seems likely that this results in a tissue-specific alteration in RNA retention in the nucleus [8, 82].
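Candidate hairpin-forming pairs of this kind can be picked out of an annotation of Alu positions within a transcript by looking for neighboring elements in opposite orientations. The following is a minimal sketch: the coordinates are invented, and the 2,000-base maximum gap is an arbitrary assumption for illustration rather than a known biological threshold.

```python
from typing import List, NamedTuple, Tuple


class Alu(NamedTuple):
    start: int   # start coordinate within the transcript
    end: int     # end coordinate within the transcript
    strand: str  # "+" or "-" orientation relative to the transcript


def inverted_pairs(alus: List[Alu], max_gap: int = 2000) -> List[Tuple[Alu, Alu]]:
    """Return pairs of non-overlapping Alu elements in opposite orientations
    separated by at most max_gap bases (ASSUMED cutoff, illustration only)."""
    ordered = sorted(alus, key=lambda a: a.start)
    pairs = []
    for i, left in enumerate(ordered):
        for right in ordered[i + 1:]:
            gap = right.start - left.end
            if gap > max_gap:
                break  # later elements start even further away
            if gap >= 0 and left.strand != right.strand:
                pairs.append((left, right))
    return pairs


# HYPOTHETICAL annotation of four Alu elements in one primary transcript
annotated = [Alu(100, 400, "+"), Alu(900, 1200, "-"),
             Alu(1500, 1800, "-"), Alu(9000, 9300, "+")]

for left, right in inverted_pairs(annotated):
    print(left, right)
```

A pair found this way is only a candidate: whether it forms a duplex that ADAR edits would depend on sequence similarity between the two elements and on the folding of the surrounding RNA, neither of which is modeled here.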
There have also been suggested associations between miRNAs and Alu elements. It has been suggested that the Alu promoter drives expression of sequences that can be processed into miRNAs. However, at least in one case this has been suggested to be due to the co-presence of Alu and the miRNA in the intron of an hnRNA molecule, rather than an RNA polymerase III generated Alu RNA. Additionally, some miRNAs appear to recognize Alu elements in other transcripts and may lead to regulation of the large number of transcripts with Alu elements in their 3' ends [5, 85]. This regulation can be altered by RNA editing of the Alu elements, influencing the specificity of the regulation.
There are several cases where the RNA polymerase III transcribed Alu RNAs have been suggested to play roles in gene expression and function (for example, in response to stress). It has similarly been suggested that the interaction of Alu RNAs with the RNA polymerase II molecule can attenuate transcription. More recently it was reported that alterations in Dicer expression in age-related macular degeneration would lead to increased accumulation of Alu RNAs that were responsible for the pathogenesis. All of these studies are supported either by transient overexpression of Alu RNAs or in vitro studies. However, given the relatively low levels of endogenous Alu transcripts, even upon stress stimulation, it is not completely clear that the necessary levels of RNA to achieve these influences are made in cells.
The abundance of Alu elements in the human genome demonstrates that they have had a tremendous impact on insertional mutagenesis and evolution of the primate genome. Their distribution throughout the genome has exacerbated that impact, supplying the primary sequences for non-allelic homologous recombination events throughout the genome. Extensive genomic sequencing efforts demonstrate that these forms of instability have not only resulted in major evolutionary changes in genomes, but continue to cause human diversity and contribute to human diseases. The ubiquity of Alu elements throughout the genome, and their enrichment in genes, has also led them to be inextricably mixed with a number of types of influence on gene expression and regulation. Many high-throughput studies have ignored Alu elements because of the technical difficulties in analyzing such high-copy-number elements. New NGS approaches are beginning to address the intricate relationships between Alu elements and other genomic features.
ADAR: adenosine deaminase that acts on RNA
FLAM: free left Alu monomer
FRAM: free right Alu monomer
hnRNA: primary nuclear transcript
ORF: open reading frame
RT-PCR: reverse transcriptase PCR
Thanks to Drs Astrid Engel and Victoria Belancio for helpful discussions and comments on the manuscript, and to Melanie Cross for editorial help. Dr Deininger's research on Alu elements is supported by a grant from the NIH (R01GM45668).