Exon identity crisis: disease-causing mutations that disrupt the splicing code
© BioMed Central Ltd 2014
Published: 23 January 2014
Cis-acting RNA elements control the accurate expression of human multi-exon protein coding genes. Single nucleotide variants altering the fidelity of this regulatory code and, consequently, pre-mRNA splicing are expected to contribute to the etiology of numerous human diseases.
Although genes span 33.4% of the human genome from start codon to stop codon, only 3.66% of their sequence comprises protein-coding sequences. Introns make up the rest of this gene space, separating adjacent protein-coding exons from one another. To produce a mature mRNA that encodes a continuous string of codons, these exons must be joined together following the precise excision of introns in a process referred to as precursor messenger RNA splicing (pre-mRNA splicing). Aberrant pre-mRNA splicing is now recognized as the underlying cause of many human diseases. Mutations in trans-acting factors or cis-acting regulatory elements compromise the expression of protein-coding genes by decreasing the specificity or fidelity of splice site selection, a fundamental step in the expression of multi-exon genes.
At least three mechanisms can induce RNA-based disease. First, genetic variants such as point mutations can abolish cis-acting elements recognized by RNA binding proteins (RBPs), thereby inducing disease phenotypes in humans. Work on many disease genes, including the breast cancer gene BRCA1, the gene encoding the cystic fibrosis transmembrane conductance regulator (CFTR), the growth hormone gene GH1 and the ataxia telangiectasia mutated gene ATM, has demonstrated that all classes of point mutations, including nonsense mutations, can disrupt exonic splicing regulatory elements (ESRs) and induce aberrant pre-mRNA splicing [2–6]. Second, RBPs are implicated (either by mutation or aberrant expression) in numerous human diseases, including cancer, Alzheimer’s disease, frontotemporal dementia, spinal muscular atrophy (SMA) and retinitis pigmentosa (reviewed in ). These observations suggest that processing of specific transcripts or, more likely, families of related transcripts may be mis-regulated. Finally, RNAs transcribed from genes containing trinucleotide repeat expansions also induce disease. These toxic RNAs seem to function by sequestering RBPs and causing gross changes in post-transcriptional gene expression programs.
Given that other review articles have already done an exceptional job at summarizing the pleiotropic effects of RBPs and toxic RNA elements on pathogenesis [9–13], here we focus on aberrant protein-RNA interactions implicated in monogenic human diseases.
The decision of whether to splice or not to splice is typically modeled as a stochastic rather than deterministic process, such that even the best-defined splice sites are sometimes spliced incorrectly. However, under normal conditions, pre-mRNA splicing proceeds with surprisingly high fidelity. This is attributed in part to the activity of adjacent cis-acting auxiliary exonic and intronic splicing regulatory elements (ESRs or ISRs) [20–24]. Typically, these functional elements are classified as either exonic or intronic splicing enhancers (ESEs or ISEs) or silencers (ESSs or ISSs) based on their ability to stimulate or inhibit splicing, respectively. Although there is now evidence that some auxiliary cis-acting elements may act by influencing the kinetics of spliceosome assembly, such as the arrangement of the complex between U1 snRNP and the 5′ splice site, it seems very likely that many elements function in concert with trans-acting RBPs. For example, the serine- and arginine-rich (SR) proteins are a conserved family of RBPs that have a key role in defining exons. SR proteins promote exon recognition by recruiting components of the pre-spliceosome to adjacent splice sites or by antagonizing the effects of ESSs in the vicinity [28–30]. The repressive effects of ESSs can be mediated by members of the heterogeneous nuclear ribonucleoprotein (hnRNP) family and can alter recruitment of core splicing factors to adjacent splice sites. In addition to their roles in splicing regulation, silencer elements are suggested to have a role in repression of pseudo-exons, sets of decoy intronic splice sites with the typical spacing of an exon but without a functional open reading frame. ESEs and ESSs, in cooperation with their cognate trans-acting RBPs, represent important components in a set of splicing controls that specify how, where and when mRNAs are assembled from their precursors [30, 33, 34].
The sequences marking the exon-intron boundaries are degenerate signals of varying strengths that occur at high frequency within human genes. In multi-exon genes, different pairs of splice sites can be linked together in many different combinations, creating a diverse array of transcripts from a single gene [36, 37]. This is commonly referred to as alternative pre-mRNA splicing, and is classified into several discrete event types that have been observed both in vitro and in vivo.
Recent studies suggest that 86 to 94% of human multi-exon genes undergo alternative splicing [39, 40] and a considerable portion of human functional variation within the population is likely to cause changes at the transcript level. The sheer abundance of this phenomenon is remarkable, raising the question of how many of the isoforms produced by a single gene encode functional messages. Although most mRNA isoforms produced by alternative splicing will be exported from the nucleus and translated into functional polypeptides, different mRNA isoforms from a single gene can vary greatly in their translation efficiency. Those mRNA isoforms with premature termination codons at least 50 nucleotides upstream of an exon junction complex are likely to be targeted for degradation by the nonsense-mediated mRNA decay (NMD) pathway. Although this type of unproductive splicing is typically thought to be a byproduct of splicing as a stochastic process, the SR genes are clear examples of how it can be exploited as an essential regulatory mechanism [44–46]. SR proteins have been shown to regulate the splicing of their own genes, each of which contains an ultraconserved sequence, such as a poison exon containing a premature termination codon; when spliced into the mature RNA, these exons can trigger transcript degradation by NMD [48, 49]. The first example of this form of splicing factor autoregulation coupled to mRNA surveillance was characterized in the SRSF2/SC35 gene (a member of the SR family): high levels of the SRSF2/SC35 protein promote a 3′ untranslated region splicing event that destabilizes the SRSF2/SC35 mRNA.
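The positional rule for NMD targeting described above can be illustrated with a minimal Python sketch. It is a simplification under stated assumptions, not a validated predictor: exon-exon junction positions stand in for exon junction complex positions, only the last junction is considered, distance is measured from the end of the stop codon, and the function name, coordinate convention and example exon lengths are hypothetical.

```python
def is_nmd_candidate(ptc_start, exon_lengths, threshold=50):
    """Flag a transcript as a likely NMD target.

    ptc_start    -- 0-based mRNA index of the first base of the premature
                    termination codon (assumed coordinate convention)
    exon_lengths -- exon lengths in transcript order, in nucleotides
    threshold    -- minimum distance (nt) between the end of the stop codon
                    and the last exon-exon junction

    Assumption: the last exon-exon junction is used as a proxy for the most
    downstream exon junction complex.
    """
    if len(exon_lengths) < 2:
        return False  # a single-exon transcript has no junctions
    last_junction = sum(exon_lengths[:-1])  # index of first base of last exon
    ptc_end = ptc_start + 3                 # index just past the stop codon
    return last_junction - ptc_end >= threshold


# Hypothetical three-exon transcript; the last junction is at position 220.
exons = [100, 120, 90]
print(is_nmd_candidate(150, exons))  # stop codon ends 67 nt before the junction
print(is_nmd_candidate(200, exons))  # stop codon ends 17 nt before the junction
print(is_nmd_candidate(250, exons))  # stop codon lies in the last exon
```

Under these assumptions only the first call reports an NMD candidate; a stop codon close to, or downstream of, the final junction is not flagged.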
Given that exon-intron boundaries can occur at any of the three positions of a codon, it is clear that only a subset of alternative splicing events can maintain the canonical open reading frame. For example, only exons whose lengths are evenly divisible by 3 can be skipped or included in the mRNA without any alteration of reading frame. Splicing events that do not have compatible phases will induce a frame-shift. Unless reversed by downstream events, frame-shifts will almost certainly lead to one or more premature termination codons, probably resulting in subsequent degradation by NMD. The most common frame-preserving alternative event type is compatibly phased exon skipping; however, 20% of all frame-preserving alternative splicing events involve the alternative use of adjacent 3′ NAGNAG splice sites [50, 51]. Several studies have investigated the evolution of multi-exon gene architectures and found significant correlation of the edges of exons with protein domain boundaries [52, 53]. Furthermore, exons whose edges correlated with protein domain boundaries were significantly enriched for compatible splice site phase. These observations have been used as evidence for the evolutionary hypothesis of exon shuffling, a mechanism for diversification of modular protein functions [54, 55]. Moreover, the data clearly support the postulate that the evolutionary history of a gene will affect its susceptibility to alternative splicing-induced frame-shifting.
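The reading-frame arithmetic in this paragraph can be made concrete with a short Python sketch. The function names and example lengths are hypothetical, and the sketch considers only length changes in the coding sequence, not whether a stop codon is created at the new junction.

```python
def skipping_preserves_frame(exon_length):
    """An exon can be skipped or included without shifting the reading
    frame only if its length is a multiple of 3."""
    return exon_length % 3 == 0


def net_frame_preserved(length_changes):
    """Check whether a series of splicing events leaves the downstream
    reading frame intact.

    length_changes -- signed changes in coding length (nt) for each event,
                      e.g. -100 for skipping a 100-nt exon, +3 for using
                      the upstream site of a NAGNAG 3' splice site pair
    """
    return sum(length_changes) % 3 == 0


print(skipping_preserves_frame(54))    # True: 54 is divisible by 3
print(skipping_preserves_frame(100))   # False: skipping shifts the frame
print(net_frame_preserved([3]))        # True: a 3-nt NAGNAG shift
print(net_frame_preserved([-100, 1]))  # True: a downstream event restores frame
```

The last call illustrates the "unless reversed by downstream events" case: two individually frame-shifting events whose net length change is a multiple of 3 restore the original frame downstream of the second event.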
Following a model of neutral genetic drift, some genes are under greater selective constraints than others. Genes encoding proteins that have vital and non-redundant roles may impart a major loss of fitness to an organism if disrupted by germline or somatic mutation. Depending on protein structure and function and exon-intron architecture, these genes may be more or less susceptible to aberrant function by different means. For example, different mutations causing loss of function in CFTR can cause cystic fibrosis (CF) of varying severity. Although 70% of CF cases are at least heterozygous for a deletion of phenylalanine 508 (ΔF508) that impairs protein folding and subsequent function, only four other mutations (G542X, N1303K, G551D and W1282X) have allele frequencies above 1%. The remaining atypical CF-associated mutations are rare or unique to individuals or families, and as a result roughly 15% of all CF cases carry mutations of unknown function. Moreover, about 13 to 20% of all the CF-associated mutations are thought to cause pre-mRNA splicing defects, with aberrant inclusion or exclusion of several of the 27 exons as a primary mechanism of disease causation. At least one of these, exon 9, has been studied in great detail, illuminating a complex set of regulatory elements that control its alternative splicing [60–62].
High-throughput DNA sequencing is now revealing the extent of human genetic variation on a comprehensive scale. However, because of the complexity of these data, it is often unclear which variants are functional and which biochemical mechanisms they affect. For genes that are highly susceptible to aberrant splicing by a number of different mechanisms (such as CFTR; Figure 1), determining the penetrance associated with de novo atypical mutations remains a crucial gap in achieving comprehensive molecular diagnosis of their associated diseases. To tackle this problem for CFTR and other genes with pre-mRNA splicing defects, it is necessary to consider the possible mechanistic impacts of a point mutation on the splicing machinery. Figure 1b,c illustrates some of the architectural features of a generic wild-type (healthy) gene, such as: the presence of one or more exonic splicing enhancers; splicing silencers that work to repress intronic pseudo-exons; and cryptic splice sites. Mutation of 5′ and 3′ splice site dinucleotides and adjacent bases can render them inactive; this is the most easily recognized mechanism of splicing disruption, accounting for 10% of all human inherited disease mutations. For this reason, disruption of the GU and AG splice site dinucleotides is recognized as deleterious by most of the recent single-nucleotide polymorphism functional classification tools, such as those based on SIFT [65, 66]. However, the need for methods or tools to evaluate the impact of genetic variants on the loss or gain of both ISRs and ESRs remains critical.
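One simple way to frame the loss or gain of splicing regulatory elements is to compare the short sequence windows that overlap a variant before and after the substitution. The Python sketch below does this for hexamers. It is an illustration only: the enhancer and silencer sets are hypothetical toy motifs rather than published ESE or ESS hexamers, the sequence is invented, and real tools would use experimentally derived motif sets and scoring.

```python
def overlapping_kmers(seq, pos, k=6):
    """Return the set of k-mers in seq whose window covers index pos."""
    start = max(0, pos - k + 1)
    end = min(pos, len(seq) - k)
    return {seq[i:i + k] for i in range(start, end + 1)}


def esr_changes(seq, pos, alt, enhancers, silencers, k=6):
    """Classify motif gains and losses caused by a single substitution.

    seq       -- wild-type sequence (DNA alphabet, uppercase)
    pos       -- 0-based index of the substituted base
    alt       -- the variant base
    enhancers -- set of k-mers treated as enhancers (hypothetical here)
    silencers -- set of k-mers treated as silencers (hypothetical here)
    """
    if seq[pos] == alt:
        raise ValueError("variant base equals the reference base")
    mutant = seq[:pos] + alt + seq[pos + 1:]
    wt = overlapping_kmers(seq, pos, k)
    mt = overlapping_kmers(mutant, pos, k)
    lost, gained = wt - mt, mt - wt
    return {
        "ese_lost": sorted(lost & enhancers),
        "ese_gained": sorted(gained & enhancers),
        "ess_lost": sorted(lost & silencers),
        "ess_gained": sorted(gained & silencers),
    }


# Toy example: both motif sets and the sequence are made up for illustration.
toy_enhancers = {"CAGACA"}
toy_silencers = {"TAGACA"}
result = esr_changes("TTCAGACAAGG", 2, "T", toy_enhancers, toy_silencers)
print(result)
```

In this toy case a single C > T substitution removes a window from the enhancer set and creates one in the silencer set, so both `ese_lost` and `ess_gained` are non-empty while the other two lists are empty.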
Table: Genes sorted by percentage of ESR loss or gain mutations per gene (for genes with more than 10 such mutations), and genes sorted by total number of ESR gain or loss mutations. Column headings include the number of ESR mutations in the gene and the number of ESR gains.
The susceptibility of many disease genes (such as DMD, ATM, NPC1 (Niemann-Pick disease), F9 (hemophilia B) and F8 (hemophilia A)) to aberrant pre-mRNA splicing has spawned creative therapeutic approaches that have been the focus of a great deal of time and effort [77–84]. Of these, one of the most successful cases has been the reversal of aberrant exon 7 skipping in the SMA-related gene SMN2 by antisense oligonucleotides [79, 83, 84]. SMA is an autosomal recessive disorder of varying severity caused by loss of function of SMN1, of which humans have one copy on each chromosome 5. A nearly identical paralog, SMN2, has only five single nucleotide differences, all of which are non-coding except one C > T synonymous mutation six bases from the 3′ splice site within exon 7. The mechanistic impact of this C > T transition has been studied extensively, and has been shown to be associated with both the loss of an ESE that binds SRSF1 to stimulate exon definition [85, 86] and the antagonistic gain of an ESS that binds hnRNP A1 to repress exon definition. In vivo selection studies and antisense oligonucleotide tiling experiments have additionally discovered a number of other regulatory elements within and adjacent to this exon [80, 87, 88].
Because individuals with SMA typically have loss of SMN1 but normal copies of SMN2, research into a general treatment for SMA has been targeted towards methods to increase splicing of endogenous SMN2 exon 7 as a means to increase functional SMN protein. Recent studies have robustly ameliorated symptoms of severe SMA mouse models through delivery of antisense oligonucleotides masking the ESS-N1 element [79, 83, 84], demonstrating that antisense approaches may represent an effective treatment for SMA. Although most inherited disease-related genes do not have a backup copy similar to SMN2 to serve as a template for RNA targeted therapies, this scenario does illuminate the potential feasibility of rational nucleic acid-based therapeutics in the coming years.
Functional characterization of both germline and somatic variants remains a considerable challenge. This is due in part to the limited understanding of the gene architectural contexts that give rise to varying degrees of susceptibility to aberrant processing. How different degrees of susceptibility contribute to the etiology of inherited and somatic diseases remains a crucial question in the field. This question is becoming increasingly important for several inherited and somatic diseases, including cancer. The root of this question lies at the heart of unraveling the networks of protein-RNA interaction that are active in various cellular contexts. Beyond this task lies the promise of potentially groundbreaking therapeutic approaches based on correcting aberrant protein-RNA interactions within a cell.
CFTR: Cystic fibrosis transmembrane conductance regulator
ESE: Exonic splicing enhancer
ESR: Exonic splicing regulator
ESS: Exonic splicing silencer
ISE: Intronic splicing enhancer
ISS: Intronic splicing silencer
RBP: RNA binding protein
SMA: Spinal muscular atrophy
snRNP: Small nuclear ribonucleoprotein particle
SR: Serine- and arginine-rich protein
We wish to thank our funding agencies, including the Ellison Foundation for Medical Research (JRS), NIH NIA (JRS) and The Santa Cruz Cancer Benefit Group (JRS). We also thank Jon Howard for his thoughtful comments on the manuscript.
This article is published under license to BioMed Central Ltd. The licensee has exclusive rights to distribute this article, in any medium, for 12 months following its publication. After this time, the article is available under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.