RNA sequencing of wild-type, sad1 mutant and SAD1-overexpressing plants
The Arabidopsis sad1 mutant was isolated in our previous genetic screen for components that regulate the expression of stress-responsive genes [14]. The sad1 mutant was also more sensitive to stress and ABA inhibition of seed germination and seedling growth [14]. Since loss-of-function mutations in any single-copied core LSm genes are expected to be lethal, the recovery of this point-mutation sad1 mutant provided an invaluable opportunity to study the functions of this important group of proteins. To explore the role of SAD1 in gene expression and stress tolerance, we generated transgenic Arabidopsis plants over-expressing the wild-type SAD1 cDNA (under control of the cauliflower mosaic virus 35S promoter) both in the wild type (ecotype C24) and in the sad1 mutant background (SAD1-OE, see Methods). Although the transgenic plants in both backgrounds had similar physiological and molecular phenotypes, here we mainly focus on SAD1-OE in the sad1 mutant background (referred to as SAD1-OE hereafter).
As shown in Figure 1A, overexpressing wild-type SAD1 rescued the small stature phenotype of the sad1 mutant, demonstrating that the phenotypic defects of the sad1 mutant were caused by the loss of the wild-type SAD1 protein. We genotyped these seedlings using primers that span the whole gene body. The PCR products in the SAD1-OE plant had two bands, representing the original SAD1 gene and the transferred cDNA, respectively (Figure 1B).
We next performed RNA-seq using the Illumina HiSeq platform (Illumina Inc., San Diego, CA, USA) on two-week-old seedlings of C24 (wild type), sad1 and SAD1-OE. These seedlings were subjected to two treatments: control (H2O) and salt stress (300 mM NaCl, 3 h). The salt-stress treatment was based on our previous observations that stress-responsive genes were most obviously activated and that the mutant sad1 showed strong molecular phenotypes under these conditions [14, 17]. Based on six cDNA libraries (C24-control, sad1-control, SAD1-OE-control, C24-NaCl, sad1-NaCl and SAD1-OE-NaCl), we generated a total of 164 million reads (101 bp in length, except for the SAD1-OE control, whose reads were 85 bp in length), about 90% of which could be uniquely aligned to the TAIR10 reference genome sequence (version TAIR10; [18]) (Additional file 1). Comparison of the mapped reads to the gene model (version TAIR10) revealed that approximately 95% of the reads mapped to the exonic regions, whereas only about 3% mapped to intergenic regions (Additional file 2), which were consistent with the Arabidopsis gene annotation. Plotting the coverage of reads along each transcript exhibited a uniform distribution with no obvious 3′/5′ bias, which reflects the high quality of the cDNA libraries (Additional file 3). Furthermore, assessing the sequencing saturation demonstrated that as more reads were obtained, the number of new discovered genes plateaued (Additional file 4). This suggests that extensive coverage was achieved, which can also be seen when the read coverage was plotted by chromosome, demonstrating extensive transcriptional activity in the genome (Additional file 5).
We previously identified the sad1 mutation as a G-to-A change 34 bp from the putative translation start site and predicted that the mutation would change a glutamic acid (E) residue to lysine (K). In the RNA-seq data, the mutation of sad1 at the genomic position 19,813,556 of chromosome 5 was confirmed. However, it turned out that the mutation occurred at the 3′ splicing acceptor recognition site of the first intron, changing the invariant AG dinucleotide to AA. Consequently, all of sad1 mRNAs were aberrantly spliced in the mutants, as visualized with the Integrative Genomics Viewer (IGV) browser [19, 20] (Figure 1C). We identified three main mutant transcripts in sad1: two with obvious aberrant 3′ splice sites (3′SSs) that respectively occurred 2 bp and 20 bp downstream of the mutated splice site; and one with the retention of the first intron (Figure 1C). All of these transcripts were validated by RT-PCR using primers that span the alternative 3′SSs, in which the corresponding events were detected in the sad1 mutant, but not in C24 (Figure 1D). Sequence analysis suggested that the transcript with aberrant 3′SSs that occurred 20 bp downstream of the mutated splice site did not alter the coding frame. It was predicted to produce one novel protein with the deletion of seven amino acids compared to the normal SAD1 protein. It seems that this mutant protein could provide some of the wild-type’s functions such that the sad1 mutation was not lethal. By contrast, the other alternative 3′SS and the intron retention led to a coding-frame shift that would generate a premature stop codon and thus would lead to truncated proteins. In the SAD1-OE plant, all these aberrantly spliced forms could be found, albeit at much lower levels than in sad1. However, normal SAD1 mRNA was overexpressed, with the transcript level more than 10-times higher than in C24, which was validated by quantitative RT-PCR (Figure 1E).
Identification of alternative splicing events in C24, sad1and SAD1-OE plants
To determine if there were any changes in pre-mRNA splicing upon the depletion or overexpression of SAD1, we first developed a pipeline to identify all AS events in C24, sad1 and SAD1-OE. The pipeline involved three steps: prediction of splice junctions, filtering of the false positive junctions and annotation of AS events. We randomly sampled 20 million uniquely mapped reads (estimated average approximately 57-times coverage on all the expressed transcripts) from each RNA-seq library for the identification or comparison of AS, respectively. This method ensured that the comparison of AS events would be performed at the same level.
To predict splice junctions, we mapped the RNA-seq reads onto the Arabidopsis genome using the software TopHat, which was designed to identify exon-exon splice junctions [21]. After the alignment, we identified 732,808 junctions from the six RNA-seq libraries. Comparison of these junctions to the gene annotation (TAIR10) revealed that about 83% of total junctions had previously been annotated, and the remaining 17% were assigned as novel junctions (Additional file 6A). However, when trying to characterize these novel and annotated junctions, we found that there was a large number of novel junctions that had short overhangs (that is, fewer than 20 bp) with the corresponding exons, while most of the annotated junctions had large overhangs, with the enrichment at approximately 90 bp (Additional file 6B). Moreover, the novel junctions had relatively low coverage compared with the annotated junctions (Additional file 6C). In general, the junctions with short overhangs and lower coverage were considered as false positives, which are often caused by non-specific or error alignment. Therefore, to distinguish between true splice junctions and false positives, we assessed the criteria based on simulated data of a set of randomly constituted junctions. To do this, we first generated a set of 80,000 splice junctions in which annotated exons from different chromosomes were randomly selected and spliced together in silico. We also constructed 119,618 annotated junctions from the gene annotation. Since the length of our sequencing reads was 101/85 bp, the splice junction sequences were determined to be 180/148 bp long (90/74 nucleotides on either side of the splice junction) to ensure an 11 bp overhang of the read mapping from one side of the junction onto the other. Alignments to the random splice junctions were considered to be false positives, because such junctions are thought to rarely exist when compared to annotated junctions. The alignment of the raw RNA-seq reads to the random junctions revealed that 99.90% of false positive junctions had an overhang size of fewer than 20 bp (Additional file 7A). In sharp contrast, the alignment to the annotated junctions indicated that most (98.60%) annotated junctions had larger overhang sizes. In addition, we estimated that 56.90% of false positive junctions had only one read spanning the junction, whereas the annotated junctions had higher read coverage (Additional file 7B). To minimize the false positive rate, we required that the overhang size must be more than 20 bp and that there be at least two reads spanning the junctions. Using these criteria, we filtered out almost all the false positive junctions (Additional file 7C). Finally, we obtained a junction data set of 52,599 confident novel junctions from the six RNA-seq libraries. Based on these junctions, we identified all the AS events including cassette exons, alternative 5′SSs, alternative 3′SSs, mutually exclusive exons, coordinate cassette exons, alternative first exons and alternative last exons (Additional file 8).
Depletion of SAD1 activates alternative splicing
We first compared the difference in AS between C24 and the sad1 mutant. By comparing the number of AS events, we found that the alternative 5′SSs and exon-skipping events were consistently promoted in the control and NaCl-treated mutants (Figure 2A; Additional file 9A). Furthermore, the number of splice junction reads from alternative 5′SSs and exon-skipping events in the mutant was significantly higher than that in the wild type (Fisher’s exact test, P <0.001) (Figure 2B; Additional file 9B). Using Fisher’s exact test on the junction read counts and the corresponding exon read counts between the wild type and the mutant, we identified 478 alternative 5′SSs and 138 exon-skipping events from 550 genes that were significantly over-represented in the control or NaCl-treated mutants; by contrast, we identified only 133 alternative 5′SSs and 41 exon-skipping events from 171 genes that were over-represented in the corresponding wild type (Additional files 10, 11, 12 and 13). These results indicated that SAD1 depletion increased alternative 5′SSs and exon-skipping events. In addition, the alternative 3′SSs showed significant increases in the NaCl-treated mutant. We identified 319 alternative 3′SSs that were over-represented in the mutant; by contrast, 142 were over-represented in the wild type (Additional files 14, 15). This result suggests that SAD1-depletion could also promote alternative 3′SSs under salt-stress conditions.
Twenty-two selected events were further validated by RT-PCR using the splicing-site-flanking primers, in which the corresponding AS events were detected in sad1 mutants, but were weakly or not presented in C24 (Figure 2C and Additional file 16). Figure 2C highlights three representative examples visualized by the IGV junction browser and validated by RT-PCR. The SBI1 (AT1G02100) gene had alternative 5′SSs in the 10th intron in sad1, but not in C24, an observation validated by RT-PCR using the forward primer that covered the splice junction and the reverse one that was located at the 11th exon. One can see that the corresponding isoform was detected in the sad1 mutant, but was not present in C24 (Figure 2C). The HINT3 (AT5G48545) gene had alternative 3′SSs in the fifth exon in the mutant sad1, which was validated by RT-PCR using a forward primer in the first exon and a reverse primer that covered the splice junction (Figure 2C). The gene PAC (AT2G48120) exhibited exon-skipping between the third and fifth exons, which was validated by RT-PCR using primers at the third and sixth exon, which meant that two different products were amplified, representing exon inclusion and skipping isoforms, respectively (Figure 2C).
Sequence analysis of these over-represented alternative 5′SSs and alternative 3′SSs (in the NaCl-treated sad1 mutant) revealed that these activated splice sites were still associated with GU and AG dinucleotides (Figure 2D; Additional file 17A), suggesting that the depletion of SAD1 did not change the accuracy of the sequence recognition of the splicing sites. When investigating the distribution of these activated splice sites, we found that alternative 5′SSs and 3′SSs were enriched in the downstream or upstream approximately 10 bp region of the dominant 5′SSs and 3′SSs, respectively (Figure 2E; Additional file 17B). This indicates that the depletion of SAD1 leads to the activation of the 5′SSs and 3′SSs proximal to the respective dominant ones. These results suggest that SAD1, as a component of U6 RNPs, may play a regulatory role in the selection of splice sites.
Interestingly, exon-skipping events also increased in sad1 mutants. When correlating each exon-skipping event with alternative 5′SSs and 3′SSs, we found that about 20% of the skipped exons simultaneously had alternative 5′SSs or 3′SSs in the mutants. This chance of occurrence was significantly higher than that expected for random sampling of all annotated exons (the probability of random occurrence was 0.02%, Fisher’s exact test, P <0.001). This result suggests a coordinated occurrence of exon-skipping and alternative splice site selection. Therefore, we considered that SAD1-depletion could simultaneously activate multiple alternative 5′SSs or 3′SSs that include not only the proximal ones, but also the distal ones, including those located at the next exons, albeit to a lesser extent. Nonetheless, the possibility that SAD1, probably as a splicing factor, may directly regulate exon-skipping in vivo could not be ruled out.
SAD1 depletion results in widespread intron retention
Based on DNA chip and RT-PCR analyses, very recent studies have suggested that the depletion of SAD1 and other LSm proteins can result in defects in intron removal [15, 16]. Nonetheless, genome-wide analyses at the single nucleotide level of splicing defects in these mutants are not available. Based on our RNA-seq data, we plotted the expression intensity of introns and exons between the wild-type C24 and sad1 mutants (Figure 3; Additional file 18). Figure 3 clearly shows a global up-regulation of intron expression in the mutants, but this was not seen for exon expression, suggesting widespread intron retention in the mutant. Ten selected events were further validated by RT-PCR using the intron-flanking primers, in which the corresponding intron retention events were detected in sad1 mutants, but were weakly or not presented in C24 (Additional file 19). Using Fisher’s exact test, we compared the counts of intron reads and the corresponding counts of exon reads between the wild type and mutants. We identified 4,610 introns from 2,737 genes that were significantly retained in the control or NaCl-treated mutants (P <0.001) (Additional file 20). By contrast, only 23 introns from 20 genes were significantly retained in the corresponding wild-type plants (Additional file 21). This result further demonstrated that SAD1 depletion results in widespread intron retention.
We next investigated if there is any influence of the splicing defects on the expression of the affected genes. Sequence analysis suggested that all of these intron retention events would generate premature stop codons in the intron-retained transcripts and, if translated, would produce truncated proteins. Although it is possible that some individual truncated proteins might still be functional, for our sequence analyses, we assumed that these intron-retained transcripts do not generate functional proteins. Through calculating the proportions of the intron-retained transcripts to the total transcripts for each gene with intron-retention in the mutant, we estimated that on average around 15% of total transcripts were with intron retention (Additional file 22). Moreover, when plotting the expression levels of the total and the functional transcripts (without intron) for each intron-retained gene between the wild type C24 and sad1 mutants (Additional files 23 and 24), we found that the expression levels of the total transcripts did not obviously change between C24 and sad1, but the functional transcripts tended to be down-regulated in the mutant. These results indicate that the splicing defects are associated with a global reduction of functional mRNAs, which might negatively affect the functions of these affected genes.
Genes with aberrant splicing in sad1are closely related to stress response and are activated by stress
We further analyzed functional categories and pathways of the genes with abnormal splicing in the sad1 mutants. We identified 3,354 genes with abnormal splicing in control or NaCl-treated sad1 mutants, the majority of which were with intron retention. Moreover, 83% of these genes were unique to either the control treatment or the NaCl treatment, suggesting that abnormal splicing may be specific to different treatments. An analysis of functional categories using the software DAVID [22, 23] revealed that these abnormally spliced genes were significantly enriched at several biological processes, including response to abiotic stimulus, response to stress, photosynthesis, and protein transport, suggesting that SAD1 is involved in multiple biological processes through regulating pre-mRNA splicing (Additional files 25 and 26). Interestingly, we observed a striking enrichment at the response-to-abiotic-stress pathways, which were commonly observed in both treatments (Figure 4A; Additional file 27). Further analysis using Genevestigator [24] showed that the stress-responsive genes with abnormal splicing in NaCl-treated sad1 mutants were closely associated with the response to salt and ABA stresses (Figure 4B); whereas, those in sad1 under the control condition were not associated with the response to salt and ABA stresses (Additional file 28), but rather related to the response to various other environmental stresses. These results not only are consistent with the salt-sensitive phenotypes of sad1 mutants, but also suggest that SAD1 plays critical roles in effectively regulating splicing of stress-responsive genes under stress conditions. Meanwhile, we found that genes with splicing defects coincided with those regulated by transcriptional activation under the respective treatments (shown in Figure 4B), which suggests that the occurrence of the splicing defects could follow or co-occur with transcriptional activation.
Further analysis using Mapman [25] suggested that genes with aberrant splicing in sad1 mutants are involved in various stress response pathways, including hormone-signaling pathways, MAPK-signaling pathways and transcription regulation (Figure 4C; Additional file 29). Notably, some important genes (such as SnRK2.1 and 2.2, SOS2, DREB2A, NHX1, WRKY33, WRKY25, STT3A, CAX1 and RCI2A) involved in stress responses were identified to have splicing defects in the sad1 mutant. Among these genes, SnRK2.1 and 2.2 encode members of SNF1-related protein kinases activated by ionic (salt) and non-ionic (mannitol) osmotic stress that are required for osmotic stress tolerance [26]; SOS2 encodes a protein kinase essential for salt tolerance [27]; DREB2A encodes a transcription factor that activates drought and salt stress-responsive genes [28]; NHX1 encodes a vacuolar sodium/proton antiporter whose overexpression increases salt tolerance in several plant species including Arabidopsis[29]; WRKY33 and WRKY25 encode plant WRKY transcription factors involved in response to salt and other stresses [30, 31]; STT3A encodes an oligosaccharyl transferase whose knockout mutants are hypersensitive to high salt conditions [32]; CAX1 encodes a high affinity vacuolar calcium antiporter and can be activated by SOS2 to integrate Ca2+ transport and salt tolerance [33]; and RCI2A (Rare-cold inducible 2A), whose product plays a role in preventing over-accumulation of excess Na+ and contributes to salt tolerance [34]. These genes showed increased intron retention in the mutants, which were also validated by RT-PCR using intron-flanking primers where the corresponding intron-retained transcripts were more obviously identified in sad1, consistent with the RNA-seq data (Figure 4D). Above all, these results suggest that genes with aberrant splicing in sad1 are closely related to stress response, which could directly or indirectly contribute to the stress sensitivity of the sad1 mutant.
Overexpression of SAD1 rescues the splicing defects in the sad1mutant and strengthens splicing accuracy under salt stress
To address the question whether the splicing defects seen in sad1 mutants result from loss of the wild-type SAD1 protein, we overexpressed the wild-type SAD1 cDNA in the sad1 mutant, and performed RNA-seq on the rescued plants (SAD1-OE). We first compared the expression levels of splice junctions in SAD1-OE, C24 and sad1. We found that the AS events previously seen in sad1 were completely or at least partially suppressed in SAD1-OE plants (Figure 5A; Additional file 30), demonstrating that overexpression of SAD1 was sufficient to rescue the sad1-dependent AS defects. While our previous study indicated that the sad1 mutation was recessive with regard to the morphological, physiological and stress-inducible gene expression phenotypes [14], we could not rule out the possibility that an isoform of the sad1 mutant protein (for example, isoform 3, Figure 1D) might have a dominant-negative effect that could be partly responsible for the SAD1-OE’s incomplete rescue of some of the splicing defects in sad1. Interestingly, when comparing the number of AS events between C24 and SAD1-OE, we found that the numbers of alternative 5′SSs, alternative 3′SSs and exon-skipping in the NaCl-treated SAD1-OE were obviously smaller than those in the corresponding C24 (Figure 5B), and the numbers of corresponding junction reads were also significantly lower (P <0.001) (Figure 5C). These results were not observed in the control treatment (Additional file 31). These observations indicate that overexpression of SAD1 could inhibit AS under salt-stress conditions. Using Fisher’s exact test, we identified 454 alternative 5′SSs, alternative 3′SSs and exon-skipping events from 434 genes that were significantly absent in NaCl-treated SAD1-OE (Additional file 32). Further analyses showed that these alternative 5′SSs and 3′SSs are still associated with GU or AG dinucleotides (Figure 5D) and enriched downstream or upstream of the dominant 5′SSs and 3′SSs (Figure 5E), suggesting that overexpression of SAD1 inhibits the usage of alternative 5′SSs and 3′SSs and promotes the usage of the dominant ones. Together with the result that SAD1-depletion activates the alternative 5′SSs and 3′SSs, we suggest that SAD1 could dynamically regulate the selection of 5′SSs and 3′SSs and control the splicing accuracy and efficiency.
We further compared the expression levels of introns in SAD1-OE with those in C24 and sad1. We found that the expression of most introns in SAD1-OE was restored to normal levels (Figure 5F; Additional file 33), demonstrating that the intron retention indeed resulted from the sad1 mutation. Furthermore, using Fisher’s exact test we identified 76 introns from 75 genes that were significantly absent in NaCl-treated SAD1-OE, but were over-represented in NaCl-treated C24 (Additional file 34). This result shows that SAD1-overexpression can increase splicing efficiency.
Overexpression of SAD1improves plant salt tolerance
In the NaCl-treated SAD1-OE plants, we identified 506 genes with decreased alternative 5′SSs, alternative 3′SSs, exon-skipping or intron retention. Analyses of the expression level for these genes demonstrated that their functional transcripts tended to be up-regulated in SAD1-OE plants, indicating that overexpression of SAD1 leads to the increase of functional mRNAs (Additional file 35). Analyses of the functional categories of these genes revealed that they were strikingly enriched in the group of ′response-to-abiotic-stimulus′ (Figure 6A; Additional file 36). More specifically, these genes were well associated with the response to salt and ABA stresses and transcriptional activation (Figure 6B). Therefore, overexpression of SAD1 can increase splicing accuracy and efficiency of stress-responsive genes under stress conditions. This result further elucidates the specific regulation of SAD1 in splicing of the stress-related genes and the potential relationship between transcription and splicing.
Further analysis suggested that these genes are involved in various stress response pathways (Additional file 37). Some of the stress-responsive genes that were more effectively spliced in SAD1-OE included ABF3/ABF2, encoding ABRE binding factors that mediate ABA-dependent stress responses [35, 36]; CIPK3, encoding CBL-interacting serine/threonine-protein kinase 3 that is involved in the resistance to abiotic stresses (for example, high salt, hyperosmotic stress) by regulating the expression of several stress-inducible genes [37]; and DREB2A, that encodes a transcriptional factor mediating high salinity- and dehydration-inducible transcription [28]. These genes have been reported to be key regulators of ABA or salt-stress responses.
With the increased splicing efficiency in these key regulators of ABA or salt-stress responses, we were curious to know whether the SAD1-OE plants would have improved tolerance to salt stress. To test this, one-week-old seedlings of C24, sad1 and SAD1-OE grown on the regular Murashige and Skoog (MS) agar medium were transferred to MS agar plates supplemented with 0 (control), 50, 100 or 200 mM NaCl. We found that SAD1-OE seedlings showed enhanced tolerance to 100 mM NaCl on vertically placed plates (Figure 6C). At 200 mM NaCl, however, root elongation of all genotypes was inhibited and seedlings were not able to survive an extended period of the stress treatment (data not shown). Measuring the root growth of the seedlings showed that the roots of SAD1-OE were longer than those of C24 and sad1 at 100 mM NaCl (Additional file 38). We also tested salt tolerance of seedlings on horizontally placed agar medium plates. Two-week-old seedlings from ½ MS media were transferred onto 200 mM NaCl media and incubated for five days. The percentage of green leaf number over total leaf number was calculated for each seedling. The data indicated that SAD1-OE seedlings had a higher percentage of green leaves, suggesting that they were significantly less damaged by the salt stress than were the wild-type or sad1 seedlings (Figure 6D). To test further whether SAD1-OE plants were tolerant to salt stress at the adult stage and in soil, we grew these seedlings in soil and irrigated with either 50, 100, 150, 200 or 400 mM NaCl solutions at intervals of four days (see Methods). After two weeks of treatment, we found that sad1 plants were very sensitive to salt stress at concentrations above 150 mM and wild-type seedlings also exhibited signs of damages at higher salt concentrations as indicated by wilty inflorescence and damaged leaves, whereas the SAD1-OE plants were not obviously affected by the stress treatment and were also taller than the wild-type plants (Figure 6E; Additional file 39). These results indicate that SAD1-overexpression improves salt tolerance, which correlates with increased splicing accuracy and efficiency of stress-responsive genes.