Divergence of exonic splicing elements after gene duplication and the impact on gene structures
- Zhenguo Zhang†1, 2,
- Li Zhou†1, 2,
- Ping Wang1, 2,
- Yang Liu1, 2,
- Xianfeng Chen1, 2,
- Landian Hu1, 3 and
- Xiangyin Kong1, 3Email author
© Zhang et al.; licensee BioMed Central Ltd. 2009
Received: 15 June 2009
Accepted: 2 November 2009
Published: 2 November 2009
The origin of new genes and their contribution to functional novelty has been the subject of considerable interest. There has been much progress in understanding the mechanisms by which new genes originate. Here we examine a novel way that new gene structures could originate, namely through the evolution of new alternative splicing isoforms after gene duplication.
We studied the divergence of exonic splicing enhancers and silencers after gene duplication and the contributions of such divergence to the generation of new splicing isoforms. We found that exonic splicing enhancers and exonic splicing silencers diverge especially fast shortly after gene duplication. About 10% and 5% of paralogous exons undergo significantly asymmetric evolution of exonic splicing enhancers and silencers, respectively. When compared to pre-duplication ancestors, we found that there is a significant overall loss of exonic splicing enhancers and the magnitude increases with duplication age. Detailed examination reveals net gains and losses of exonic splicing enhancers and silencers in different copies and paralog clusters after gene duplication. Furthermore, we found that exonic splicing enhancer and silencer changes are mainly caused by synonymous mutations, though nonsynonymous changes also contribute. Finally, we found that exonic splicing enhancer and silencer divergence results in exon splicing state transitions (from constitutive to alternative or vice versa), and that the proportion of paralogous exon pairs with different splicing states also increases over time, consistent with previous predictions.
Our results suggest that exonic splicing enhancer and silencer changes after gene duplication have important roles in alternative splicing divergence and that these changes contribute to the generation of new gene structures.
There is an intimate link between the evolution of novel functions and the evolution of new genes [1–3]. Thus, it is important to understand the origin and evolution of new gene structures in order to understand the mechanisms responsible for the generation of new functions. Several mechanisms for the origin of new genes have been proposed, such as exon shuffling, gene duplication, and retroposition (reviewed in ). It is also possible that new gene structures are generated by the evolution of new alternative splicing forms after gene duplication. Consistent with this idea, 6 to 8% of profiled human-chimpanzee orthologous exons display significant splicing level differences in corresponding tissues. These genes affect diverse functions, including regulation of gene expression, signal transduction, cell death, immune defense, and susceptibility to diseases . Previous studies have focused on the quantitative relationship between gene duplication and alternative splicing, and their findings support the quantitative divergence of alternative splicing events after gene duplication [6–9]. However, the details of this mechanism of alternative splicing evolution and its contributions to the evolution of new gene structures remain largely unknown.
Alternative splicing is a common and vital mechanism in higher eukaryotes that helps to increase transcriptional complexity [10, 11]. By ligation of different exons or regions, alternative splicing is able to produce more than one transcript isoform from the same gene . Alternative splicing requires a complex network and is regulated by many factors. In addition to sequences that comprise the branchpoint and the 3' and 5' splice sites, the cellular splicing machinery relies on additional information in the form of exonic splicing enhancer (ESE) and intronic splicing enhancer and exonic splicing silencer (ESS) and intronic splicing silencer sequences [10, 12–14]. To carry out their functions, these elements are bound by different splicing factors, such as SR proteins  and heterogeneous nuclear ribonucleoproteins [10, 16, 17]. Several groups have made progress in the identification of these functional elements using experimental and computational methods . The RESCUE-ESE method identified 238 ESEs that are preferentially associated with constitutive exons with weak splice sites . In another study, octamers that are overrepresented in internal non-coding exons compared with pseudo-exons and 5' untranslated regions of intronless genes were compiled . Putative ESSs were also identified by using in vivo  and computational methods [20, 22]. These lists of putative elements provide a rich resource to study their functions in exon splicing.
To evaluate the contribution of alternative splicing to the evolution of new gene structures, we studied the evolution of splicing-related elements after gene duplication and determined their influence on splicing. The results can also help improve the understanding of functional divergence. We began with ESEs and ESSs, which are the two types of regulatory elements that are currently the best annotated. Using the synonymous substitution rate (Ks) between duplicates as a proxy for time after gene duplication, we find that ESEs and ESSs diverge with evolutionary time and the patterns are different between real and control motifs, indicating the effect of splicing. Furthermore, we observed that there is a tendency for duplicates to loose ESEs over time. Moreover, gains and losses of ESEs and ESSs tend to be concerted processes with one exon in a duplicated pair tending to always either gain or lose ESE or ESS. The net losses and gains are mainly caused by synonymous changes, which suggests that splicing selection is the major force influencing the splicing element changes. Consistent with the roles of ESE and ESS in splicing, we found that alternative exons have higher ESS and lower ESE densities when compared to their constitutive copies. And the fractions of exon pairs with different constitutive exons increase with duplication age. We conclude that ESE and ESS changes after gene duplication play important roles in alternative splicing divergence and contribute to the origin of new gene structures.
ESE and ESS divergences increase with evolutionary time after gene duplication
Spearman rank correlation of ESE and ESS divergence with Ks
Paralogs (n = 2,074)
Orthologs (n = 99,716)
Asymmetric evolution of paralogous exons for ESEs and ESSs
Count (P< 0.05)
In most cases, ESE and ESS motifs in exons contain both protein-coding and splicing-related information. Therefore, evolution of these motifs should be constrained at two levels of selection - the protein sequence constraint and the requirement for accurate splicing - so called dual-coding . To separate the two types of selection and see how splicing requirement affects ESE/ESS evolution, we constructed a set of control motifs using the method described by Ke et al. . Briefly, we reversed and complemented each sequence of the original ESE set and then purged this set of sequences from the original set to obtain the control ESE set. The same procedure was applied to the ESS set. The control ESE and ESS sets have the same coherence among motifs and the same CG dinucleotide content as the original experimental sets. Compared with real motifs, these control motifs also code proteins, but do not presumably have splicing signals. Therefore, a comparison of the real ESE/ESS motifs to these control motifs can separate the effect of selection from splicing constraint. As shown in Figure 1a, real ESE divergence is generally lower than the control in all stages, and more so at the late stage. For the ESSs, the changes in the real motifs are slightly lower than in the control at the early stage, similar to ESEs, but the divergence exceeds that of the control set in the late stage (Figure 1b). These results suggest that splicing selection indeed has an effect on ESE and ESS changes after gene duplication, but the effects are different for ESEs and ESSs. For ESEs, selection for splicing may suppress their divergence to some extent, but for ESSs, the divergence is enlarged by selection on splicing at the late stage. This indicates that ESEs and ESSs have different evolutionary routes. This idea is consistent with another observation that ESEs are favored by exons but that ESSs are selected against during evolution .
One possibility to consider is that ESE/ESS divergence is a common phenomenon when sequence divergence occurs and is not actually related to the gene duplication event, for example, in a manner similar to changes between orthologs. To test this, we reanalyzed the relationships in human-mouse orthologous exons. Indeed, we found positive correlations between ESE and ESS differences and Ks (Table 1; rho = 0.091868 and 0.170465 for ESEs and ESSs, respectively). However, the magnitudes of correlations are much smaller than those that are observed between paralogs (Table 1). This result indicates that sequence divergence itself could also result in ESE/ESS divergence, but the degree is much stronger in paralogs.
Gain and loss of ESEs and ESSs after gene duplication
The above result shows that both ESEs and ESSs diverge in the time after duplication. A more informative analysis would involve the comparison of the paralogs with their pre-duplication ancestors, from which creations or losses could be determined. The Inparanoid program is able to distinguish duplication events that happened before and after speciation and identify only the paralogs that were produced after speciation . We used human and mouse proteins as inputs to identify paralogs in humans. Under these conditions, it is suitable to use mouse orthologs as the pre-duplication ancestors.
We can also employ the shuffled motifs (as used in Figure 1), to see whether splicing requirements affect ESEs and ESSs when protein constraints are controlled. As shown in Figure 2a and Table S2 in Additional data file 1, generally the real ESE elements are lost more than the control ones for paralogous exons. In particular, compared to mouse exons, the average changes of control ESE motifs is around zero when all the paralogous exons ('All' group) are used, but a significant loss for real ESE motifs exists (two-sided Wilcoxon rank sum test, P = 0.002135). By contrast to the paralogous exons, the human-mouse orthologous exon pairs (Orth group) show greater conservation for the real ESE motifs than for the control (two-sided Wilcoxon rank sum test, P = 4.66 × 10-15). Different from ESEs, the changes of ESS elements in paralogous exons are smaller than the control ones, showing fewer gains of ESSs, especially when all the paralogous exons are used (P = 0.01321). Once again we observed a converse pattern in orthologous exons, showing more gains of ESSs than the control (P = 1.31 × 10-8). These observations suggest that splicing greatly affects the gains and losses of splicing elements in duplicates, and the pattern is different from that in non-duplicated genes.
It is also important to examine the ESE and ESS changes with the age of duplication. We use the Ka between duplicates as a proxy of duplication age. Compared to mouse, the ESE changes are negatively correlated with Ka (Spearman rank correlation analysis, rho = -0.1463, P = 2.38E-07; Figure 2c), which suggests that more ESE losses occur as the age of the duplication increases. For ESSs, as expected, there is no detectable correlation between the changes and Ka (rho = 0.0252, P = 0.3903) when the mean values of paralogous exon pairs are used (Figure 2d). We also use Ks as the proxy of duplication age and the same result is observed (Additional data file 2).
Loss and gain of ESEs/ESSs are mainly caused by synonymous changes
Losses and gains of ESEs and ESSs in different copies of paralogous exons have been observed. Especially for the ESEs, only overall losses are found. It has been reported that protein sequences after gene duplications may diverge quickly [29, 30]. Protein sequence changes are likely to disrupt and create ESEs or ESSs. Thus, it is possible that the changes in ESEs and ESSs are just a passive process that is caused by protein divergence rather than the requirement of splicing. A simple way to test this is to see which kind of mutations, synonymous or nonsynonymous, are more likely to result in ESE or ESS net gains and net losses. In other words, the synonymous changes do not affect the protein sequences and should be mainly affected by splicing selection.
ESE and ESS net changes in human duplicates
The above analysis, by averaging the changes in different copies and clusters of paralogs, may obscure the underlying changes to some extent because the changes in different copies and in different paralogous exon pairs are discordant (Figure 3a, b). To support the above analysis further, each time we only examined the changes generated in the High or Low copy of each paralogous exon pair, as defined in the previous section. This method may enlarge the changes to some extent, but it is safe to test whether the changes are mainly caused by synonymous mutations. As shown in Table 3, in the High group, there were no net creations or disruptions for ESEs on average (mean = 0.022805, P = 0.4164), but there were significant net creations for ESSs (mean = 0.110604, P = 1.37E-07). Both synonymous and nonsynonymous mutations produced more ESS creations (P = 0.000106 and 0.03008, respectively), and the magnitude of net changes that were created by synonymous mutations was larger (Wilcoxon signed-rank test, P = 0.03114). In the Low group, net disruptions were observed for both ESEs (mean = -0.31471, P = 7.6E-13) and ESSs (mean = -0.0992, P = 1.72E-05). Moreover, the net changes produced by synonymous mutations are again larger (synonymous versus nonsynonymous, P = 0.01216 and 0.1173 for ESEs and ESSs, respectively). These findings are consistent with a previous report that the ratio of ESE/ESS changes are mostly caused by synonymous mutations  and suggest splicing itself is the major factor influencing ESE and ESS evolution.
ESE and ESS changes influence exon splicing isoforms
ESEs and ESSs function as regulatory elements to activate or repress splicing of a certain exon [10, 32]. The changes in these elements should contribute to the formation of different splicing isoforms (constitutive or alternative). To test this, we compared the changes of ESEs and ESSs between alternative and constitutive paralogous exons.
Alternative and constitutive exon counts in the ESE/ESS High and Low groups
Proportion of A
Proportion of A
Several mechanisms for the creation of new gene structures are known (reviewed in ). For example, duplication of ymp followed by recombination of retroposed Adh generated the jingwei gene . However, evolution works like a tinker , and gene duplication may be only the first step. Subsequent changes to the duplicated gene are also important for sub-functionalization or neo-functionalization and determine the fates of duplicates [1, 19, 36]. In this study, we examined the divergence of splicing properties after gene duplication.
ESE and ESS divergence contributes to new gene structures and functional divergence
It has reported that alternative exons could be derived from constitutive exons [31, 37, 38], possibly by weakening of 5' splice sites . Exon splicing regulatory elements also play important roles in splicing state transitions , especially under weak splice sites . Consistent with this idea, we observed lower ESE densities and higher ESS densities in alternative exons than in their constitutive copies (Figure 5a). We also checked if the alternative exons had lower splice site scores, but there was no difference (Table S4 in Additional data file 1). This result suggests that ESE/ESS changes, more than splice site changes, are responsible for splicing state transitions after gene duplication. Based on this, we speculated that one copy of the CC exon pairs with large ESE/ESS changes may be on the verge of becoming alternatively spliced, as any changes that relax the splicing signals would result in alternatively spliced exons. Consistent with our idea, a previous study proposed a model of alternative splicing evolution relevant to duplicated genes. The model depicts that alternative splicing isoforms are lost rapidly after gene duplication at an early stage and then new splicing isoforms evolve at a late stage (see Figure 2 in ). Our results support this model well (Figure 6). These new splicing isoforms that evolved after gene duplication are valuable sources of new gene structures. Some originally constitutive exons may be alternatively spliced in a minor mode and even lost in the future, depending on natural selection and genetic drift . Similarly, some originally alternatively spliced exons may become constitutive when they are beneficial. Compared to other mechanisms of new gene structure origination, modifications of gene structures after gene duplication may require fewer changes and changes of otherwise low impact, thus rendering such a path to new structures relatively evolvable.
Divergence of splicing structures may also produce functional divergence among duplicates. It is easy to think that ESE/ESS changes that result in splicing state transitions will also result in new splicing isoforms, which will, in turn, acquire novel functions or partition ancestral functions. This may be realized mainly by two categories of changes. Alternative splicing could change functions by altering the expression of the gene, such as expression level or the place and time of expression. Alternative splicing is a subnetwork of gene regulation , and, thus, the changes could influence gene expression [32, 41–44]. For example, alternative splicing is coupled with nonsense-mediated mRNA decay to regulate gene expression [45–47]. Previous studies showed that expression divergence is partially caused by changes in promoter regulatory elements and related trans-acting factors [48–50]. However, these changes could only explain 2 to 20% of expression divergence [48, 49], and a weak relationship is also found in orthologs , which suggests that the promoter region is a weak predictor of gene expression levels. We hypothesize that alternative splicing is another possible way to alter expression. Since ESEs and ESSs are generally bound by specific factors in order to function, such as SR-proteins (for ESEs) and heterogeneous nuclear ribonucleoproteins (for ESSs) . These factors may be expressed in a tissue-specific manner, such as nova [52–54]. These factors, combined with ESE/ESS divergence, could result in different expression among different tissues.
Consistent with this idea, Makova and Li  found that many duplicates are expressed differentially in different tissues. On the other hand, alternative splicing involves either inclusion or skipping of certain regions or exons in the gene, such that different protein products and structures are generated [56–59]. Alternative splicing could also change the reading frame in certain regions of exons and, in turn, encode different peptides . Both mechanisms should be very important for the evolution of new functions or subfunctions that are different from the ancestral gene. Compared to coding sequence divergence, changes that are required in alternative splicing are maybe much more minor in order to reach the same degree of divergence . However, it is possible that ESEs and ESSs may result in functional divergence in an alternative-splicing-independent manner. For example, ESEs may function as a transcriptional pause site , influence the transcription rate, and, in turn, produce different expression levels.
A previous study reported the negative correlation between coexpression and Ka . In our study, we also found that both ESE changes are negatively correlated with Ka (Figure 2). This suggests that ESE/ESS changes are coupled with protein sequence changes. This is very important as protein sequence changes may help improve functionality when the protein is expressed in a different place or time. Furthermore, alternative splicing itself could alter the evolutionary rate of alternative exons or regions by lowering purifying selection pressure .
ESEs and ESSs are selected at alternative splicing and protein levels
As ESEs and ESSs are mostly located within protein-coding sequences, their changes possibly affect both the protein sequence and splicing efficiency. Previous studies have proposed that ESEs constrained protein sequence changes and codon usage, especially in proximity to the exon boundary [27, 62, 63]. These studies suggest selection of splicing on ESEs inhibits the protein sequence changes and codon choices. On the other hand, protein sequence changes should also affect ESE/ESS motifs. Protein sequence divergence after gene duplication is important to evolve new functions [23, 30]. Thus, the disruption and creation of ESEs/ESSs caused by protein sequences may be inevitable. Consistent with this idea, both creation and disruption could be caused by nonsynonymous changes (Table 3). However, these nonsynonymous changes may be slightly deleterious and the primary force may actually be from the splicing selections. In fact, the changes in ESEs/ESSs at nonsynonymous sites may be selected at two levels, protein and splicing . It will be interesting to discriminate between the two types of selections and determine which is more predominant in future studies. As the two types of selections may not always coordinate, splicing selection should better operate at different places, such as synonymous sites, which are independent of protein sequence. Consistent with this, most ESE and ESS changes are caused by synonymous mutations (Figure 4 and Table 3). This result is also consistent with previous findings that synonymous sites show a much clearer trend toward increased evolutionary rate with increased distance from the splice sites than nonsynonymous sites , which suggests that splicing mainly acts on synonymous sites.
The limitation of our studies
In this study, we do not include the intronic elements, such as intronic splicing silencer and enhancers, due to lack of validated global sets of these motifs. It will be interesting to determine how intronic elements have evolved after gene duplication and their relationships with ESEs/ESSs, and 5' and 3' start sites when the data sets are available. In fact, a significant challenge in this area is determining the method by which this information can be combined and by which exon splicing could be predicted [13, 16, 18, 32]. This would heavily rely on our knowledge of splicing regulation. Although we found higher ESS and lower ESE densities in alternative exons than their constitutive copies, this is just an average trend and does not mean that exons with higher ESS or lower ESE densities are deterministically alternatively spliced. Changes in ESEs or ESSs alone could not predict exon splicing state accurately . It is likely that understanding of splicing element changes will improve when the 'splicing code' [65, 66] is well defined in the future.
We used the duplicates that occurred after the human-mouse split, about 91 million years ago . Therefore, the duplicates in our datasets are relatively young. It will be interesting to see how ESEs/ESSs evolve when duplicates over a long evolutionary period are used in the future. On the other hand, since only a limited fraction of exons in our mouse set was mapped to the ASAP2 database , there is not enough exon splicing state information for the mouse exons, which prevents us from determining the direction of splicing state changes. This should be extended in future analyses, when mouse exon splicing data have accumulated. The newly emerging high-throughput sequencing technology has provided a powerful tool in this direction [47, 69–72].
Our results revealed the evolution pattern of exonic splicing elements after gene duplication and its influence on exon splicing. Furthermore the findings suggest that splicing requirement but not protein sequence mostly determines the changes of ESE and ESS. Combined with previous studies, our results imply that alternative splicing changes after gene duplication can have important roles in generating new splicing isoforms and gene structures.
Materials and methods
Identification of homologous exons
We downloaded the sequences and annotations for human and mouse from the NCBI RefSeq  database (build 36.1)  in January 2007. Any mRNAs for which the length in nucleotides of the coding sequence was not a multiple of three or that contain in-frame stop codons were excluded. The exon-intron structure information from the annotation files was then parsed using a Perl script.
To identify orthologs and paralogs for human and mouse, all the human and mouse proteins were input into the Inparanoid program (version 1.35) [28, 75, 76], which was run with the default parameters except for the overlap cutoff, which was set to 0.8 to improve specificity. The Inparanoid program is able to identify orthologs and in-paralogs (duplication that occurred after speciation) between two species. Therefore, we could obtain the orthologs between human and mouse and the paralogs in human, which were generated after human-mouse split (see the InParanoid FAQ  for details). Thus, it is reasonable to assume the mouse orthologs are the pre-duplication ancestors for the human paralogs.
To identify the homologous exons, the proteins in each paralog cluster were aligned using MUSCLE  and then converted to a coding sequence alignment using the BioPerl utility aa_to_dna_aln [79, 80]. The exon-intron information was mapped on these alignments using Perl scripts. The paralogous exons were defined when they satisfied the following two criteria: the two exons must align at least at one boundary in the coding sequence alignment; and the two exons have the same length. We obtained 2,074 paralogous exons with this method (Additional data file 4). The same approach was applied to the orthologous genes between human and mouse, and 99,716 pairs of orthologous exons were identified (Additional data file 5). To improve the accuracy of homologous exon identifications, only mRNAs with the 'NM_' prefix, which generally have more experimental support, were used. The first and last coding exons in each mRNA were also excluded, as these exons are generally partially coding and may have different properties.
Calculation of Ka and Ks
To calculate Ka and Ks, the coding sequence alignments generated above were input into the PAML package , and the values were calculated using the CodeML program with the parameter CodonFreq equal to F3X4, which is estimated from the nucleotide composition at each of three codon positions.
Scanning of ESEs and ESSs in exons
where density human and density mouse are the motif densities for human and mouse genes, respectively, and the function max gets the larger value of the two. A positive value indicates a gain of motifs and a negative value indicates a loss of motifs. The ESE proportion for each is calculated as ESE count divided by the sum of ESE and ESS counts, like that described by Zhang et al. .
In order to get the control ESE and ESS motifs, as described in , the ESE motifs were converted into their reverse and complementary sequences, and original ESE sequences were purged from the converted sequences. The end result was 327 real ESE and 327 control ESE motifs. The same approach was applied to ESSs and resulted in 153 real ESS and 153 control ESS motifs. The new sets of motifs were searched separately, and compared to the real and control motifs.
In this study, we use the list of ESEs and ESSs that was compiled for humans and applied this list in a search of the mouse genome, since there is no validated ESS list for the mouse. This should have a systematic influence on the scanning of ESEs/ESSs in the mouse genome but should not change our conclusions, since we used the orthologous exon pairs as a control.
Testing the asymmetric evolution of paralogous exons
and P = 1 for d1 = d2. To minimize the noise, we exclude the exon pairs with the total difference d1 + d2 less than 6, which is the minimum to get a P-value of 0.05 in the most asymmetric situation. This resulted in 1,089 and 454 testable exon pairs for ESEs and ESSs, respectively.
Splicing score calculation
The tool MaxEntScan [83, 84] was used to calculate the scores for the 5' and 3' splice sites. The 5' splice site includes three bases in the exon and six bases in the intron. The 3' splice site includes 20 bases in the intron and 3 bases in the exon.
Exon splicing state classification
To determine the splicing state (alternative or constitutive) for each exon, as described in , we mapped the ASAP2 database  exons to our exon sets by their positions on the chromosomes. The exon splicing states were then extracted from the ASAP2 database, which, in turn, allowed us to obtain the splicing states for our exons.
Creation and disruption of ESEs/ESSs
To determine the mutations that create or disrupt ESEs/ESSs after gene duplication in human paralogs, the human exon coding sequences were aligned with mouse orthologous exons using their peptide alignment as a guide. For each pair of paralogous exons, only the mutational positions that were different within this pair of exons were considered, in order to ensure that the examined mutations occurred after gene duplication. The bases were compared to mouse orthologs at each mutation position in order to decide the mutational direction. Only those cases in which one of a pair of paralogous exons is the same as the mouse ortholog are considered, because this will enable determination of the sequence in which the mutation happened.
After determining the mutational direction, the six hexamers that overlapped the changed bases in the paralog were compared to aligned six hexamers in the ancestor to determine a net loss or gain of ESEs or ESSs. If there are more ESEs or ESSs in the ancestor, this mutation is likely a disruption, and, if it is vice versa, then the mutation is likely a creation. Note the two hexamers under comparison must contain only one mutation to exclude the possibility that the motif loss or gain is caused by mutations other than the one we are focused on.
Additional data files
The following additional data are available with the online version of this paper: a document providing supplementary materials, including supplementary Tables S1, S2, S3 and S4 (Additional data file 1); a figure showing the relationship between ESE/ESS changes and Ks (Additional data file 2); a figure providing the counts of different types of paralogous exon pairs in each Ks group (Additional data file 3); a dataset listing all the human paralogous exon pairs used in this study (Additional data file 4); the complete list of human-mouse orthologous exon pairs used in this study (Additional data file 5).
exonic splicing enhancer
exonic splicing silencer
nonsynonymous substitution rate
synonymous substitution rate.
We would like to thank Dr. Laurence D Hurst for critical comments and kind help in the manuscript preparation and thank Mrs Suchao Chen, Zhen Wang and Dr Zhixi Su for helpful discussions. We also would like to thank the two anonymous referees for their constructive suggestions for improving our manuscript. This work is supported by the National High Technology Research and Development Program of China (2006AA02Z330, 2006AA02A301), the National Basic Research Program of China (No. 2007CB512202, 2004CB518603), the National Natural Science Foundation of China, Key Program (No.30530450), and the Knowledge Innovation Program of the Chinese Academy of Sciences (Grant No. KSCX1-YW-R-74). The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.
- Katju V, Lynch M: On the formation of novel genes by duplication in the Caenorhabditis elegans genome. Mol Biol Evol. 2006, 23: 1056-1067.PubMedView ArticleGoogle Scholar
- Otto SP, Yong P: The evolution of gene duplicates. Adv Genet. 2002, 46: 451-483.PubMedView ArticleGoogle Scholar
- Long M: Evolution of novel genes. Curr Opin Genet Dev. 2001, 11: 673-680.PubMedView ArticleGoogle Scholar
- Long M, Betran E, Thornton K, Wang W: The origin of new genes: glimpses from the young and old. Nat Rev Genet. 2003, 4: 865-875.PubMedView ArticleGoogle Scholar
- Calarco JA, Xing Y, Caceres M, Calarco JP, Xiao X, Pan Q, Lee C, Preuss TM, Blencowe BJ: Global analysis of alternative splicing differences between humans and chimpanzees. Genes Dev. 2007, 21: 2963-2975.PubMedPubMed CentralView ArticleGoogle Scholar
- Talavera D, Vogel C, Orozco M, Teichmann SA, de la Cruz X: The (in)dependence of alternative splicing and gene duplication. PLoS Comput Biol. 2007, 3: e33-PubMedPubMed CentralView ArticleGoogle Scholar
- Guan Q, Zheng W, Tang S, Liu X, Zinkel RA, Tsui KW, Yandell BS, Culbertson MR: Impact of nonsense-mediated mRNA decay on the global expression profile of budding yeast. PLoS Genet. 2006, 2: e203-PubMedPubMed CentralView ArticleGoogle Scholar
- Kopelman NM, Lancet D, Yanai I: Alternative splicing and gene duplication are inversely correlated evolutionary mechanisms. Nat Genet. 2005, 37: 588-589.PubMedView ArticleGoogle Scholar
- Pan Q, Bakowski MA, Morris Q, Zhang W, Frey BJ, Hughes TR, Blencowe BJ: Alternative splicing of conserved exons is frequently species-specific in human and mouse. Trends Genet. 2005, 21: 73-77.PubMedView ArticleGoogle Scholar
- Black DL: Mechanisms of alternative pre-messenger RNA splicing. Annu Rev Biochem. 2003, 72: 291-336.PubMedView ArticleGoogle Scholar
- Blencowe BJ: Alternative splicing: new insights from global analyses. Cell. 2006, 126: 37-47.PubMedView ArticleGoogle Scholar
- Solis AS, Shariat N, Patton JG: Splicing fidelity, enhancers, and disease. Front Biosci. 2008, 13: 1926-1942.PubMedView ArticleGoogle Scholar
- Hertel KJ: Combinatorial control of exon recognition. J Biol Chem. 2008, 283: 1211-1215.PubMedView ArticleGoogle Scholar
- Lim LP, Burge CB: A computational analysis of sequence features involved in recognition of short introns. Proc Natl Acad Sci USA. 2001, 98: 11193-11198.PubMedPubMed CentralView ArticleGoogle Scholar
- Blencowe BJ, Bowman JA, McCracken S, Rosonina E: SR-related proteins and the processing of messenger RNA precursors. Biochem Cell Biol. 1999, 77: 277-291.PubMedView ArticleGoogle Scholar
- Wang Z, Burge CB: Splicing regulation: from a parts list of regulatory elements to an integrated splicing code. Rna. 2008, 14: 802-813.PubMedPubMed CentralView ArticleGoogle Scholar
- Graveley BR: Sorting out the complexity of SR protein functions. Rna. 2000, 6: 1197-1211.PubMedPubMed CentralView ArticleGoogle Scholar
- Chasin LA: Searching for splicing motifs. Adv Exp Med Biol. 2007, 623: 85-106.PubMedView ArticleGoogle Scholar
- Fairbrother WG, Yeh RF, Sharp PA, Burge CB: Predictive identification of exonic splicing enhancers in human genes. Science. 2002, 297: 1007-1013.PubMedView ArticleGoogle Scholar
- Zhang XH, Chasin LA: Computational definition of sequence motifs governing constitutive exon splicing. Genes Dev. 2004, 18: 1241-1250.PubMedPubMed CentralView ArticleGoogle Scholar
- Wang Z, Rolish ME, Yeo G, Tung V, Mawson M, Burge CB: Systematic identification and analysis of exonic splicing silencers. Cell. 2004, 119: 831-845.PubMedView ArticleGoogle Scholar
- Goren A, Ram O, Amit M, Keren H, Lev-Maor G, Vig I, Pupko T, Ast G: Comparative analysis identifies exonic splicing regulatory sequences - the complex definition of enhancers and silencers. Mol Cell. 2006, 22: 769-781.PubMedView ArticleGoogle Scholar
- Jones CD, Begun DJ: Parallel evolution of chimeric fusion genes. Proc Natl Acad Sci USA. 2005, 102: 11373-11378.PubMedPubMed CentralView ArticleGoogle Scholar
- Wagner A: Asymmetric functional divergence of duplicate genes in yeast. Mol Biol Evol. 2002, 19: 1760-1768.PubMedView ArticleGoogle Scholar
- Su Z, Wang J, Yu J, Huang X, Gu X: Evolution of alternative splicing after gene duplication. Genome Res. 2006, 16: 182-189.PubMedPubMed CentralView ArticleGoogle Scholar
- Warnecke T, Weber CC, Hurst LD: Why there is more to protein evolution than protein function: splicing, nucleosomes and dual-coding sequence. Biochem Soc Trans. 2009, 37: 756-761.PubMedView ArticleGoogle Scholar
- Ke S, Zhang XH, Chasin LA: Positive selection acting on splicing motifs reflects compensatory evolution. Genome Res. 2008, 18: 533-543.PubMedPubMed CentralView ArticleGoogle Scholar
- Remm M, Storm CE, Sonnhammer EL: Automatic clustering of orthologs and in-paralogs from pairwise species comparisons. J Mol Biol. 2001, 314: 1041-1052.PubMedView ArticleGoogle Scholar
- Lynch M, Conery JS: The evolutionary fate and consequences of duplicate genes. Science. 2000, 290: 1151-1155.PubMedView ArticleGoogle Scholar
- Ohno S: Evolution by Gene Duplication. 1970, Berlin: SpringerView ArticleGoogle Scholar
- Lev-Maor G, Goren A, Sela N, Kim E, Keren H, Doron-Faigenboim A, Leibman-Barak S, Pupko T, Ast G: The "alternative" choice of constitutive exons throughout evolution. PLoS Genet. 2007, 3: e203-PubMedPubMed CentralView ArticleGoogle Scholar
- Matlin AJ, Clark F, Smith CW: Understanding alternative splicing: towards a cellular code. Nat Rev Mol Cell Biol. 2005, 6: 386-398.PubMedView ArticleGoogle Scholar
- Zhang XH, Arias MA, Ke S, Chasin LA: Splicing of designer exons reveals unexpected complexity in pre-mRNA splicing. Rna. 2009, 15: 367-376.PubMedPubMed CentralView ArticleGoogle Scholar
- Long M, Langley CH: Natural selection and the origin of jingwei, a chimeric processed functional gene in Drosophila. Science. 1993, 260: 91-95.PubMedView ArticleGoogle Scholar
- Jacob F: Evolution and tinkering. Science. 1977, 196: 1161-1166.PubMedView ArticleGoogle Scholar
- Force A, Lynch M, Pickett FB, Amores A, Yan YL, Postlethwait J: Preservation of duplicate genes by complementary, degenerative mutations. Genetics. 1999, 151: 1531-1545.PubMedPubMed CentralGoogle Scholar
- Koren E, Lev-Maor G, Ast G: The emergence of alternative 3' and 5' splice site exons from constitutive exons. PLoS Comput Biol. 2007, 3: e95-PubMedPubMed CentralView ArticleGoogle Scholar
- Kim E, Goren A, Ast G: Alternative splicing: current perspectives. Bioessays. 2008, 30: 38-47.PubMedView ArticleGoogle Scholar
- Ast G: How did alternative splicing evolve?. Nat Rev Genet. 2004, 5: 773-782.PubMedView ArticleGoogle Scholar
- Kralovicova J, Vorechovsky I: Global control of aberrant splice-site activation by auxiliary splicing sequences: evidence for a gradient in exon and intron definition. Nucleic Acids Res. 2007, 35: 6399-6413.PubMedPubMed CentralView ArticleGoogle Scholar
- Wegmann D, Dupanloup I, Excoffier L: Width of gene expression profile drives alternative splicing. PLoS ONE. 2008, 3: e3587-PubMedPubMed CentralView ArticleGoogle Scholar
- Cheah MT, Wachter A, Sudarsan N, Breaker RR: Control of alternative RNA splicing and gene expression by eukaryotic riboswitches. Nature. 2007, 447: 497-500.PubMedView ArticleGoogle Scholar
- Elton TS, Martin MM: Alternative splicing: a novel mechanism to fine-tune the expression and function of the human AT1 receptor. Trends Endocrinol Metab. 2003, 14: 66-71.PubMedView ArticleGoogle Scholar
- Shin C, Manley JL: Cell signalling and the control of pre-mRNA splicing. Nat Rev Mol Cell Biol. 2004, 5: 727-738.PubMedView ArticleGoogle Scholar
- Cuccurese M, Russo G, Russo A, Pietropaolo C: Alternative splicing and nonsense-mediated mRNA decay regulate mammalian ribosomal gene expression. Nucleic Acids Res. 2005, 33: 5965-5977.PubMedPubMed CentralView ArticleGoogle Scholar
- Lareau LF, Brooks AN, Soergel DA, Meng Q, Brenner SE: The coupling of alternative splicing and nonsense-mediated mRNA decay. Adv Exp Med Biol. 2007, 623: 190-211.PubMedView ArticleGoogle Scholar
- Saltzman AL, Kim YK, Pan Q, Fagnani MM, Maquat LE, Blencowe BJ: Regulation of multiple core spliceosomal proteins by alternative splicing-coupled nonsense-mediated mRNA decay. Mol Cell Biol. 2008, 28: 4320-4330.PubMedPubMed CentralView ArticleGoogle Scholar
- Leach LJ, Zhang Z, Lu C, Kearsey MJ, Luo Z: The role of cis-regulatory motifs and genetical control of expression in the divergence of yeast duplicate genes. Mol Biol Evol. 2007, 24: 2556-2565.PubMedView ArticleGoogle Scholar
- Su AI, Wiltshire T, Batalov S, Lapp H, Ching KA, Block D, Zhang J, Soden R, Hayakawa M, Kreiman G, Cooke MP, Walker JR, Hogenesch JB: A gene atlas of the mouse and human protein-encoding transcriptomes. Proc Natl Acad Sci USA. 2004, 101: 6062-6067.PubMedPubMed CentralView ArticleGoogle Scholar
- Pal C, Papp B, Hurst LD: Highly expressed genes in yeast evolve slowly. Genetics. 2001, 158: 927-931.PubMedPubMed CentralGoogle Scholar
- Chiba H, Yamashita R, Kinoshita K, Nakai K: Weak correlation between sequence conservation in promoter regions and in protein-coding regions of human-mouse orthologous gene pairs. BMC Genomics. 2008, 9: 152-PubMedPubMed CentralView ArticleGoogle Scholar
- Jensen KB, Dredge BK, Stefani G, Zhong R, Buckanovich RJ, Okano HJ, Yang YY, Darnell RB: Nova-1 regulates neuron-specific alternative splicing and is essential for neuronal viability. Neuron. 2000, 25: 359-371.PubMedView ArticleGoogle Scholar
- Buckanovich RJ, Yang YY, Darnell RB: The onconeural antigen Nova-1 is a neuron-specific RNA-binding protein, the activity of which is inhibited by paraneoplastic antibodies. J Neurosci. 1996, 16: 1114-1122.PubMedGoogle Scholar
- Ule J, Jensen KB, Ruggiu M, Mele A, Ule A, Darnell RB: CLIP identifies Nova-regulated RNA networks in the brain. Science. 2003, 302: 1212-1215.PubMedView ArticleGoogle Scholar
- Makova KD, Li WH: Divergence in the spatial pattern of gene expression between human duplicate genes. Genome Res. 2003, 13: 1638-1645.PubMedPubMed CentralView ArticleGoogle Scholar
- Kondrashov FA, Koonin EV: Evolution of alternative splicing: deletions, insertions and origin of functional parts of proteins from intron sequences. Trends Genet. 2003, 19: 115-119.PubMedView ArticleGoogle Scholar
- Shionyu M, Yamaguchi A, Shinoda K, Takahashi K, Go M: AS-ALPS: a database for analyzing the effects of alternative splicing on protein structure, interaction and network in human and mouse. Nucleic Acids Res. 2009, 37: D305-309.PubMedPubMed CentralView ArticleGoogle Scholar
- Birzele F, Kuffner R, Meier F, Oefinger F, Potthast C, Zimmer R: ProSAS: a database for analyzing alternative splicing in the context of protein structures. Nucleic Acids Res. 2008, 36: D63-68.PubMedPubMed CentralView ArticleGoogle Scholar
- Stetefeld J, Ruegg MA: Structural and functional diversity generated by alternative mRNA splicing. Trends Biochem Sci. 2005, 30: 515-521.PubMedView ArticleGoogle Scholar
- Liang H, Landweber LF: A genome-wide study of dual coding regions in human alternatively spliced genes. Genome Res. 2006, 16: 190-196.PubMedPubMed CentralView ArticleGoogle Scholar
- Xing Y, Lee C: Evidence of functional selection pressure for alternative splicing events that accelerate evolution of protein subsequences. Proc Natl Acad Sci USA. 2005, 102: 13526-13531.PubMedPubMed CentralView ArticleGoogle Scholar
- Parmley JL, Urrutia AO, Potrzebowski L, Kaessmann H, Hurst LD: Splicing and the evolution of proteins in mammals. PLoS Biol. 2007, 5: e14-PubMedPubMed CentralView ArticleGoogle Scholar
- Parmley JL, Hurst LD: Exonic splicing regulatory elements skew synonymous codon usage near intron-exon boundaries in mammals. Mol Biol Evol. 2007, 24: 1600-1603.PubMedView ArticleGoogle Scholar
- Dewey CN, Rogozin IB, Koonin EV: Compensatory relationship between splice sites and exonic splicing signals depending on the length of vertebrate introns. BMC Genomics. 2006, 7: 311-PubMedPubMed CentralView ArticleGoogle Scholar
- Irimia M, Rukov JL, Roy SW: Evolution of alternative splicing regulation: changes in predicted exonic splicing regulators are not associated with changes in alternative splicing levels in primates. PLoS One. 2009, 4: e5800-PubMedPubMed CentralView ArticleGoogle Scholar
- Fu XD: Towards a splicing code. Cell. 2004, 119: 736-738.PubMedView ArticleGoogle Scholar
- Hedges SB: The origin and evolution of model organisms. Nat Rev Genet. 2002, 3: 838-849.PubMedView ArticleGoogle Scholar
- Kim N, Alekseyenko AV, Roy M, Lee C: The ASAP II database: analysis and comparative genomics of alternative splicing in 15 animal species. Nucleic Acids Res. 2007, 35: D93-98.PubMedPubMed CentralView ArticleGoogle Scholar
- Carninci P, Kasukawa T, Katayama S, Gough J, Frith MC, Maeda N, Oyama R, Ravasi T, Lenhard B, Wells C, Kodzius R, Shimokawa K, Bajic VB, Brenner SE, Batalov S, Forrest AR, Zavolan M, Davis MJ, Wilming LG, Aidinis V, Allen JE, Ambesi-Impiombato A, Apweiler R, Aturaliya RN, Bailey TL, Bansal M, Baxter L, Beisel KW, Bersano T, Bono H, et al: The transcriptional landscape of the mammalian genome. Science. 2005, 309: 1559-1563.PubMedView ArticleGoogle Scholar
- Wang Z, Gerstein M, Snyder M: RNA-Seq: a revolutionary tool for transcriptomics. Nat Rev Genet. 2009, 10: 57-63.PubMedPubMed CentralView ArticleGoogle Scholar
- Wilhelm BT, Marguerat S, Watt S, Schubert F, Wood V, Goodhead I, Penkett CJ, Rogers J, Bahler J: Dynamic repertoire of a eukaryotic transcriptome surveyed at single-nucleotide resolution. Nature. 2008, 453: 1239-1243.PubMedView ArticleGoogle Scholar
- Karolchik D, Kuhn RM, Baertsch R, Barber GP, Clawson H, Diekhans M, Giardine B, Harte RA, Hinrichs AS, Hsu F, Kober KM, Miller W, Pedersen JS, Pohl A, Raney BJ, Rhead B, Rosenbloom KR, Smith KE, Stanke M, Thakkapallayil A, Trumbower H, Wang T, Zweig AS, Haussler D, Kent WJ: The UCSC Genome Browser Database: 2008 update. Nucleic Acids Res. 2008, 36: D773-779.PubMedPubMed CentralView ArticleGoogle Scholar
- Pruitt KD, Tatusova T, Maglott DR: NCBI reference sequences (RefSeq): a curated non-redundant sequence database of genomes, transcripts and proteins. Nucleic Acids Res. 2007, 35: D61-65.PubMedPubMed CentralView ArticleGoogle Scholar
- National Center for Biotechnology Information ftp. [ftp://ftp.ncbi.nih.gov/genomes]
- Berglund AC, Sjolund E, Ostlund G, Sonnhammer EL: InParanoid 6: eukaryotic ortholog clusters with inparalogs. Nucleic Acids Res. 2008, 36: D263-266.PubMedPubMed CentralView ArticleGoogle Scholar
- InParanoid: Eukaryotic Ortholog Groups. [http://inparanoid.sbc.su.se/]
- InParanoid FAQ. [http://inparanoid.sbc.su.se/cgi-bin/faq.cgi]
- Edgar RC: MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res. 2004, 32: 1792-1797.PubMedPubMed CentralView ArticleGoogle Scholar
- Stajich JE: An introduction to BioPerl. Methods Mol Biol. 2007, 406: 535-548.PubMedGoogle Scholar
- Stajich JE, Block D, Boulez K, Brenner SE, Chervitz SA, Dagdigian C, Fuellen G, Gilbert JG, Korf I, Lapp H, Lehväslaiho H, Matsalla C, Mungall CJ, Osborne BI, Pocock MR, Schattner P, Senger M, Stein LD, Stupka E, Wilkinson MD, Birney E: The Bioperl toolkit: Perl modules for the life sciences. Genome Res. 2002, 12: 1611-1618.PubMedPubMed CentralView ArticleGoogle Scholar
- Yang Z, Nielsen R: Estimating synonymous and nonsynonymous substitution rates under realistic evolutionary models. Mol Biol Evol. 2000, 17: 32-43.PubMedView ArticleGoogle Scholar
- Corvelo A, Eyras E: Exon creation and establishment in human genes. Genome Biol. 2008, 9: R141-PubMedPubMed CentralView ArticleGoogle Scholar
- Yeo G, Burge CB: Maximum entropy modeling of short sequence motifs with applications to RNA splicing signals. J Comput Biol. 2004, 11: 377-394.PubMedView ArticleGoogle Scholar
- MaxEntScan. [http://genes.mit.edu/burgelab/maxent/Xmaxentscan_scoreseq.html]
- Zhang Z, Xin D, Wang P, Zhou L, Hu L, Kong X, Hurst LD: Noisy splicing, more than expression regulation, explains why some exons are subject to nonsense-mediated mRNA decay. BMC Biol. 2009, 7: 23-PubMedPubMed CentralView ArticleGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.