Genome-wide analysis of plant nat-siRNAs reveals insights into their distribution, biogenesis and function

Background Many eukaryotic genomes encode cis-natural antisense transcripts (cis-NATs). Sense and antisense transcripts may form double-stranded RNAs that are processed by the RNA interference machinery into small interfering RNAs (siRNAs). A few so-called nat-siRNAs have been reported in plants, mammals, Drosophila, and yeasts. However, many questions remain regarding the features and biogenesis of nat-siRNAs. Results Through deep sequencing, we identified more than 17,000 unique siRNAs corresponding to cis-NATs from biotic and abiotic stress-challenged Arabidopsis thaliana and 56,000 from abiotic stress-treated rice. These siRNAs were enriched in the overlapping regions of NATs and exhibited either site-specific or distributed patterns, often with strand bias. Out of 1,439 and 767 cis-NAT pairs identified in Arabidopsis and rice, respectively, 84 and 119 could generate at least 10 siRNAs per million reads from the overlapping regions. Among them, 16 cis-NAT pairs from Arabidopsis and 34 from rice gave rise to nat-siRNAs exclusively in the overlap regions. Genetic analysis showed that the overlapping double-stranded RNAs could be processed by Dicer-like 1 (DCL1) and/or DCL3. The DCL3-dependent nat-siRNAs were also dependent on RNA-dependent RNA polymerase 2 (RDR2) and plant-specific RNA polymerase IV (PolIV), whereas only a fraction of DCL1-dependent nat-siRNAs was RDR- and PolIV-dependent. Furthermore, the levels of some nat-siRNAs were regulated by specific biotic or abiotic stress conditions in Arabidopsis and rice. Conclusions Our results suggest that nat-siRNAs display distinct distribution patterns and are generated by DCL1 and/or DCL3. Our analysis further supported the existence of nat-siRNAs in plants and advanced our understanding of their characteristics.


Background
Gene regulation through small non-coding RNAs has been recognized as an important mechanism for almost all cellular processes in eukaryotes. Most small RNAs (sRNAs) are processed by RNase III-type ribonuclease Dicer-like (DCL) proteins, and incorporated into Argonaute (AGO) proteins to guide gene silencing at the transcriptional level through DNA or histone modification, and at the posttranscriptional level by mRNA cleavage and degradation or translational repression [1][2][3][4][5].
Based on their precursor structures and biogenesis, sRNAs can be categorized into microRNAs (miRNAs) and small-interfering RNAs (siRNAs). miRNAs are derived from hairpin-structured precursors and do not depend on RNA-dependent RNA polymerases (RDRs) or RNA amplification mechanisms, whereas endogenous siRNAs (endo-siRNA) are generated from long doublestranded RNAs (dsRNAs) as a result of antisense transcription or the activity of plant-specific RNA polymerases such as RNA polymerase (Pol)IV and RDRs [6][7][8].
Here we report the identification of 33,826 and 162,397 siRNAs corresponding to 17,141 and 56,209 unique sequencing reads that were derived from cis-NATs of biotic and abiotic stress-challenged Arabidopsis and abiotic stress-treated rice, respectively. These siR-NAs, displaying either a distributed or a site-specific pattern, were enriched in the overlapping regions and often exhibited strand bias. Genetic analysis revealed that nat-siRNAs in Arabidopsis were mainly generated by DCL1 and/or DCL3, and a subgroup of them was RDR-and PolIV-dependent. Furthermore, accumulation of some nat-siRNAs was regulated by biotic and abiotic stress conditions, which may contribute to plant responses to environmental stresses.

Results
Natural antisense transcripts and cis-NAT-derived siRNAs in Arabidopsis and rice We sequenced a total of 21 sRNA libraries prepared from biotic and abiotic stress-treated Arabidopsis and abiotic stress-challenged Oryza sativa plants. Biotic stress datasets for Arabidopsis included infection of non-pathogenic Pseudomonas syringae pv tomato (Pst) DC3000 hrcC -, a virulent strain Pst DC3000 carrying an empty vector, and an avirulent strain Pst DC3000 carrying an effector gene, avrRpt2, as previously described [28][29][30]. Abiotic stresses on Arabidopsis and rice included salt, cold and drought treatments (see Materials and methods). We obtained a total of more than 24.6, 23.2 and 30.9 million raw sequencing reads from these biotic and abiotic stress-treated Arabidopsis and abiotic stress-treated rice small-RNA libraries, respectively (Additional file 1). After trimming adaptor sequences and discarding reads shorter than 17 nucleotides or longer than 28 nucleotides or of low quality, we obtained 13,985,938, 14,664,923 and 17,446,592 reads from the biotic and abiotic stress-treated Arabidopsis libraries and the abiotic stress-treated rice libraries that perfectly match to the Arabidopsis and rice genomes or cDNAs, respectively (Additional file 1). Among all the small-RNA species profiled, 24-nucleotide sRNAs were the most abundant size class (Additional file 2a). The 5'first nucleotides of sRNAs preferentially are adenosine or uracil (Additional file 2c). In addition to a large number of miRNAs [28,29], we discovered many new endogenous siRNAs belonging to different siRNA categories. In this study, we focused on siRNAs that correspond to cis-NATs.
Using the Arabidopsis genome annotation (TAIR/ NCBI version 8.0) and the rice genome annotation (MSU RGAP 6.1), we searched for pairs of genes in Arabidopsis and rice that overlapped more than 25 nucleotides at the same genomic loci. We thus identified 1,439 and 767 pairs of cis-NATs in Arabidopsis and rice, respectively (Table 1). cis-NATs can be further categorized into three groups: convergent (3'-3' overlap), divergent (5'-5' overlap), and enclosed, in which one transcript is entirely encompassed by the other. Most cis-NATs (1,126 and 560 pairs in Arabidopsis and rice, respectively) are arranged in convergent orientation (Table 1). We excluded 65 and 10 cis-NAT pairs in which at least one gene encodes rRNA, tRNA, small nuclear RNA (snRNA), small nucleolar RNA (snoRNA), miRNA, ta-siRNA, or transposons in Arabidopsis and rice, respectively, from subsequent analyses. These RNA genes or transposons are considered 'hot spots' of RNA degradation or small RNA generation and may mask the real distribution of nat-siRNAs. We also analyzed the NAT genes targeted by miRNAs to eliminate the possibility that the sRNAs that matched to the transcripts are secondary siRNAs triggered by 22-nucleotide miRNAs [31,32]. We found that At2g33810 in the At2g33810/ At2g33815 pair was a target of miR156 and At1g53230 in the At1g53230/At1g53233 pair was a target of miR319. However, neither miR156 nor miR319 is 22 nucleotides, and the siRNAs generated from At2g33810 and At1g53230 were not arranged in a phasing pattern, which is a characteristic pattern of miRNA-triggered secondary siRNAs. Therefore, we retained At2g33810/ At2g33815 and At1g53230/At1g53233 in our list. A total of 1,374 NAT pairs in Arabidopsis were used in this study. Among them, 186 genes (6.8% of 2,748 NAT genes) are annotated as 'other RNAs'. They are most likely non-coding RNAs or RNAs with potential to encode very short proteins or peptides.
We searched our small RNA deep-sequencing libraries for reads that matched perfectly (that is, no mismatches) to the cis-NATs. We found that 84 (6.1% of 1,374) and 119 (15.7% of 757) of the Arabidopsis and rice cis-NAT pairs, respectively, gave rise to at least 10 sRNAs per one million total genome-mapped reads from the overlapping regions of the cis-NATs in all the libraries (Table 1; Additional file 3). Interestingly, among the 84 Arabidopsis NAT pairs, 54 pairs have one transcript (32.1% of 168 genes) annotated as 'other RNAs' (Additional file 4), suggesting that non-coding antisense RNAs are more likely to generate siRNAs. A total of 33,826 and 162,397 siRNAs corresponding to 17,059 and 56,209 unique reads were derived from cis-NATs in Arabidopsis and rice, respectively ( Table 2). As shown in Additional file 2, nat-siRNAs are mainly 21 and 24 nucleotides in length, whereas in the total small RNA population, the 24-nucleotide siRNAs are the most abundant species. The first-nucleotide distribution for the nat-siRNAs was similar to that of the total sRNAs (Additional file 2). Within the three types of cis-NATs, enclosed cis-NATs have the highest percentages for producing siRNAs, that is, 39 (27.1% of 144) and 39 (36.4% of 107) of the enclosed cis-NATs in Arabidopsis and rice, respectively, generated siRNAs (Table 1). Among the 84 and 119 cis-NAT pairs that produce more than 10 siRNAs per million reads in the overlapping regions in Arabidopsis and rice, respectively, 16 and 34 pairs (19.0% and 28.6%) have siRNAs exclusively in the overlapping regions ( Table 1, Figures 1a-i and 2a). To determine if siRNAs were truly enriched in the overlapping regions of NATs, we took into consideration the length differences between the overlapping and non-overlapping regions of cis-NATs by comparing the densities of siRNAs in these two regions. A statistical test showed that the density of siRNAs in the overlapping regions was more than six times that in the non-overlapping regions, with P-values of 0.0077 and 0.0242 in Arabidopsis and rice, respectively (see Materials and methods). To further explore whether cis-NATs have a higher likelihood to give rise to siRNAs than other protein-coding genes, we computed the densities of sRNAs in the 3'-UTR overlapping regions of convergent cis-NATs and 5'-UTR overlapping regions of divergent cis-NATs, and the siRNA densities in the 3'-UTRs and 5'-UTRs of protein-coding genes that were not involved in NATs, respectively (see Materials and methods). We then compared the density of siRNAs in the convergent cis-NATs with the density of siRNAs in the 3'-UTR of non-NAT genes in Arabidopsis; similarly, we compared divergent cis-NATs with the 5'-UTR of non-NAT genes. The densities of siRNAs in Table 1 Different types of cis-NATs identified in Arabidopsis and rice and cis-NATs that generated cis-nat-siRNAs    Zhang et al. Genome Biology 2012, 13:R20 http://genomebiology.com/2012/13/3/R20 divergent 5' and convergent 3' cis-NATs were statistically greater than that of the 5'-and 3'-UTRs of non-NAT genes, with P-values of 0.0366 and 0.0438, respectively. These results confirmed that cis-NATs were more likely than non-NAT genes to generate siRNAs. This is further supported by the study of Argonaute-associated sRNAs, which indicated that cis-NATs generated 2.29 times more sRNAs than other gene transcripts [33]. Our results were also consistent with that of animal endo-siRNAs, which are also enriched in the overlapping regions of cis-NATs, for example, more than ten-fold enrichment in Drosophila [21][22][23]25,34]. The distribution of siRNAs along the cis-NATs display two distinct patterns, one is the distributed pattern with siRNAs scattered along the overlapping regions or across the entire transcripts (Figures 1 and 2a, b), and the other is the site-specific pattern with siRNAs derived from one or a few specific sites (Figures 2c and 3) (see Materials and methods for additional details). We identified 9 site-specific patterns from Arabidopsis and 23 from rice (Additional files 5 and 6). Because RNA secondary structure analysis at these specific sites using RNAFold [35] revealed no obvious stem-loop structures, these regions were unlikely to harbor new miRNA genes or nat-miRNA genes as described [36], although we are aware of the limitation of RNAfold. Moreover, 46 and 63 (50.5% and 52.9%) cis-NAT pairs in Arabidopsis and rice, respectively, exhibited strand bias in generating nat-siRNAs with different directions (with more than two-fold change) ( Table 1).

Requirement of DCL1 and/or DCL3 for the accumulations of nat-siRNAs
To experimentally characterize the nat-siRNAs, we examined the expression of 15 new nat-siRNAs identified in Arabidopsis with relatively high numbers of reads. In addition to nat-siRNAATGB2, another 6 out of the 15 nat-siRNAs that we examined showed clear siRNA signals by small RNA Northern blot analysis using locked nucleic acid (LNA) probes (indicated by black arrows in Figures 1 and 3; probes used are listed in Additional file 7), which enabled us to study their biogenesis by examining their expression in small RNA pathway mutants. The biogenesis of the site-specific nat-siRNA generated from the divergent pair At1g51400/At1g51402 (referred to as nat-siRNA1g51400; Figure 3c) was dependent on DCL1  and HEN1, but did not require any RDRs or PolIV (Figure 4a). Similarly, the nat-siRNA generated from the convergent pair At2g41590/At2g41600 with a distributed pattern (referred to as nat-siRNA2g41590; Figure 1j) was also DCL1-dependent but RDR-and PolIV-independent ( Figure 4b). Both nat-siRNAs were 21 nucleotides in length. Previous studies have shown that the biogenesis of bacterial-induced nat-siRNAATGB2, and the accumulation of some of the salt-induced nat-siRNASRO5 and development-associated nat-siRNAKPL also depended on DCL1 [14,16,17]. We found that the convergent NAT pair At1g13330/ At1g13340 ( Figure 1o) could produce two classes of siR-NAs, 21-nucleotide and 24-nucleotide siRNAs (referred to as nat-siRNA1g13340) from the same site ( Figure 5a). Genetic analysis revealed that the 21-nucleotide siRNAs were DCL1-dependent, whereas the 24-nucleotide siR-NAs were DCL3-dependent. Furthermore, we found Figure 4 Biogenesis of nat-siRNAs that depend on DCL1. (a-c) Accumulation of nat-siRNA1g51400 (a), nat-siRNA2g41590 (b) and nat-lsiRNA4g34070 (c) is shown in various dcl, rdr, polIV and polV mutants. Total RNA (75 to 100 μg) was used for Northern blot analysis. LNA probes complementary to each nat-siRNA were used. U6 was an internal control to show equal loading. The same results were obtained from two biological replicates.
that two other 24-nucleotide nat-siRNAs with a distributed pattern, nat-siRNA1g28100 generated from NAT pair At1g28100/At1g28110 (Figure 1l) and nat-siR-NA1g17745 generated from NAT pair At1g17744/ At1g17745 (Figure 1n), were also dependent on DCL3 (Figure 5b, c). Interestingly, all the DCL3-dependent 24nucleotide nat-siRNAs examined are dependent on RDR2 and PolIV ( Figure 5), while only some of the 21nucleotide nat-siRNAs are dependent on RDRs and PolIV [14,16,17]. Deep sequencing results showed that 82 out of the 84 cis-NAT pairs gave rise to two classes of siRNAs, 20-to 22-nucleotide and 23-to 28-nucleotide siRNAs in the overlapping regions, while the remaining two pairs gave rise to the 20-to 22-nucleotide class only (Additional file 4). Thus, our results suggested that the cis-NATs have the potential to be processed by both DCL1 and DCL3.

nat-siRNAs generated from introns
In both plants and animals, nat-siRNAs are derived from NAT mRNAs and mainly function post-transcriptionally [16,17,23]. This was also supported by our observation that some NATs with distributed nat-siR-NAs predominantly appeared in exonic regions of protein coding genes, for example, At2g33810/At2g33815, At1g13330/At1g13340, OS08g06220/OS08g06230, and At4g35850/At4g35860 (Figures 1g, o, 2b and Figure 3a). However, based on the Arabidopsis genome annotation (TAIR version 8), we found that 383 of the 1,374 cis-NAT pairs (27.9%) contain introns in the overlapping regions. Among them, 45 (11.7%) have siRNAs mapped to introns. Examples are shown for At2g41590/ At2g41600, AT434070/At4g34071 and At2g02795/ At2g02800 (Figure 1j, k, m). In rice, 346 of 757 (45.7%) cis-NAT pairs contain introns in the overlapping regions; one example is shown for OS04g57140/ OS04g57160 in Figure 2c. A substantial number of nat-siRNAs originated from such introns within NATs, as listed in Table 2 and Additional files 8 and 9. Similarly, it has been reported that a large number of siRNAs are also generated from the introns of several cis-NATs in Drosophila [23]. It is possible that the intronic nat-siR-NAs are generated from pre-mRNAs in the nucleus before the introns are spliced out.
To confirm these intron-derived nat-siRNAs, we validated the candidates in small RNA biogenesis mutants. Although the total number of siRNAs from the entire overlapping regions might be large, the reads for individual siRNAs were still low. Most of the siRNAs examined were below the detection level of Northern blot analysis except one siRNA that is derived from an intron-exon junction of At4g34070 in the overlapping region of the divergent NAT pair At4g34070/At4g34071 (Figure 1k). We detected a distinct 30-nucleotide siRNA band using a 24-nucleotide probe. Because we used a 18-to 26-nucleotide size-fractionated small-RNA population for constructing our small-RNA libraries, lsiRNAs were excluded from our small-RNA sequencing dataset. To determine whether it was a new lsiRNA like the ones we detected before [13], or just a degradation product, we examined the dependence of this 30-nucleotide siRNA on components of the small RNA biogenesis pathways. We found that the generation of this siRNA depended on DCL1 and HEN1, but not on DCL3, or any RDRs, PolIV or PolV, indicating that it was a new lsiRNA but not a hc-siRNA ( Figure 4c). It is worth noting that the reported AtlsiRNA-1 is also dependent on DCL1 and HEN1 [13]. These results suggested that a subgroup of nat-siRNAs was likely to be generated from pre-mRNAs.

Some nat-siRNAs regulate the expression of their cognate NAT mRNAs in cis
We found that siRNAs from some cis-NATs accumulated to different levels in different libraries (Additional files 10 and 11). Eighty cis-NAT pairs in Arabidopsis and 37 cis-NAT pairs in rice displayed a more than two-fold difference of siRNA accumulation in the overlap regions under at least one stress condition as compared with mock-treated samples. This result suggested that these nat-siRNAs were likely to regulate expression of the NATs under stress conditions, although more experiments are needed to validate and confirm these changes.
To determine the regulatory function of nat-siRNAs on their NAT transcripts in cis, we performed computational analysis using dcl1-7 and dcl3-1 (inflorescence tissue) microarray data generated by the Carrington lab [37] (see Materials and methods), because DCL1 and DCL3 are responsible for nat-siRNA biogenesis. Of the 84 pairs of cis-NAT genes, 93 genes from 78 pairs were present on the 22-K microarray chip. Of the 93 cis-NAT genes, 22 (23.7%) exhibited a greater than 1.5-fold upregulation in the dcl1-7 mutant compared to the wild type (WT) control (P-value < 0.05; Additional file 12). As a comparison, only 176 out of 2,215 (7.9%) total NAT genes were up-regulated. Although we cannot absolutely rule out the possible small RNA-independent function of DCL1 on the accumulation of miRNA targets [38], the result that more siRNA-associated NATs (23.5%) were regulated by DCL1 than total NATs (7.9%) strongly suggests that these small RNA-producing cis-NATs were more prone to down-regulation by DCL1dependent nat-siRNAs. However, we did not find any of the 93 NAT genes that were up-regulated more than 1.5-fold in dcl3, suggesting that the DCL3-dependent nat-siRNAs may not be directly involved in the expression regulation of the NAT transcripts.
To confirm the results from the microarray analysis and to examine the expression of some siRNA-targeting NAT transcripts that were not present on the chip or were not identified from the analysis, we used real-time RT-PCR to examine their expression in the WT, dcl3-1 and dcl1-7 fwf2 double mutant plants. We chose to use the double mutant instead of the dcl1-7 single mutant because the fwf2 mutant, which carries a mutation in the ARF8 gene [39,40], largely rescued the pleiotropy and fertility defects of dcl1-7 [41,42]. We found that the accumulation of At1g51402 and At1g13330 increased six and four times more, respectively, in the dcl1-7 fwf2 mutant than in WT plants ( Figure 6), indicating that the antisense transcripts were no longer suppressed when nat-siRNA1g51400 and nat-siRNA1g13340 were not produced. For At2g41590/At2g41600 and At4g34070/ At4g34071 pairs that produce siRNAs from both strands and have less strand bias, both sense and antisense transcripts were up-regulated in the dcl1-7 fwf2 mutant compared to the WT, suggesting that both transcripts were under the regulation of nat-siRNAs ( Figure 6). Although it has been shown that the arf8 mutation does not interfere with miRNA-mediated gene silencing or RNAi or small RNA generation [39,40], it is still possible that up-regulation of these NAT transcripts in the dcl1-7 fwf2 double mutant is due to the fwf2 mutation that is sRNA-independent. To test this possibility, we also examined the expression levels of these NAT genes in the fwf2 single mutant. As shown in Additional file 13, in the fwf2 single mutant, there was no clear induction of these NAT transcripts except for At1g13340. Both transcripts of the At1g13340/Ag1g13330 pair were induced in the dcl1-7/fwf2 double mutant and At1g13340 was induced to a similar extent in both single and double mutants (about two-fold), suggesting that the up-regulation of At1g13340 was mainly due to the arf8 mutation whereas induction of At1g13330 was attributable to the de-repression of small RNA-mediated silencing by the mutation in DCL1. sRNAs generated from this pair had a strong strand bias (Figure 1o), were predominantly generated from At1g13340 and targeted At1g13330. These results correlated well with our realtime RT-PCR results. In summary, these data indicated that the expression of NAT transcripts was regulated by the small RNAs generated from DCL1.
We found that only three out of the ten transcripts that we tested showed an increase in the dcl3-1 mutant compared to WT (Figure 6), suggesting that some 24nucleotide nat-siRNAs are still able to regulate the expression of NAT transcripts, although a majority of the NATs were not affected by the 24-nucleotide siRNAs.
To further analyze the function of these nat-siRNAs, we searched the publicly available deep sequencing datasets of the AGO-associated small-RNAs in Arabidopsis for the nat-siRNAs in the overlap regions we identified [43,44]. As shown in Additional file 14, the 20-to 22nucleotide nat-siRNAs were mainly loaded into AGO2, the Argonaute protein that predominately binds 21nucleotide sRNAs and contributes to anti-viral and antibacterial resistances [45][46][47][48]. Interestingly, the 23-to 26nucleotide long nat-siRNAs were loaded into both AGO1 and AGO4, the Argonaute proteins responsible for mRNA cleavage and DNA methylation, respectively [49,50].

Discussion
We identified 17,141 and 56,209 unique nat-siRNA sequences originating from cis-NATs in biotic and abiotic stressed Arabidopsis plants and abiotic stress-challenged rice, respectively. These siRNAs are enriched in the overlapping regions of cis-NAT pairs and display distributed or site-specific patterns.
Our biogenesis analysis suggested that the siRNAs from cis-NATs were dependent on DCL1 and/or DCL3. Deep sequencing results showed that 82 out of the 84 cis-NAT pairs that generated siRNAs in the overlapping regions gave rise to two classes of siRNAs, 20-to 22nucleotide and 23-to 28-nucleotide nat-siRNAs, while only two pairs of NATs gave rise to one class of 20-to 22-nucleotide siRNAs (Additional file 4). The 20-to 22nucleotide nat-siRNAs are generated by DCL1, whereas the 23-to 28-nucleotide nat-siRNAs were DCL3-dependent (Figures 4 and 5). We also provide an example that a DCL1-dependent 21-nucleotide nat-siRNA and a DCL3-dependent 24-nucleotide nat-siRNA were generated from the same site in the overlapping region of a NAT pair (Figure 5a). These two classes of sRNAs have also been observed in transgene silencing, viral RNA silencing and endogenous inverted repeat (IR) loci, where long double-stranded precursors or intermediates are formed [51][52][53][54][55]. Different kinds of transgenic constructs can generate different types of sRNAs: sense transgenes mainly generate 21-and 22-nucleotide Figure 6 Expression of the NAT transcripts is increased in dcl1-7 fwf2 and dcl3-1 mutants. Expression was examined by quantitative RT-PCR and Actin2 was used as an internal control. Total RNA (5 μg) was treated with DNaseI and then subjected to reverse transcription. Error bars indicate standard deviations derived from three technical replicates. Similar results were obtained from two biological replicates.
By examining the expression of NAT transcripts that are complementary to the siRNAs and analyzing published microarray data, we found that many 21-nucleotide nat-siRNAs were able to regulate the accumulation of their target transcripts ( Figure 6; Additional file 12), which was consistent with the function of reported 21nucleotide nat-siRNAs processed by DCL1 [14,16,17]. However, only a small number of NATs targeted by siR-NAs were up-regulated in the dcl3 mutant, suggesting that only a fraction of the 24-nucleotide class of siRNAs may regulate gene expression directly. Similar results were observed in the viral-derived 24-nucleotide siR-NAs. Although 24-nucleotide viral siRNAs are as abundant as 21-nucleotide and/or 22-nucleotide siRNAs, they can neither silence viral RNAs nor suppress viral infection [60][61][62][63][64]. Endogenous 24-nucleotide siRNAs are mainly generated by the DCL3/RDR2/PolIV pathway, and are often associated with AGO4 and AGO6 and mainly direct DNA methylation or chromatin modification [49,50,54,66]. We found that the generation of the DCL3-dependent 23-to 28-nucleotide nat-siRNAs also required RDR2 and PolIV. Further studies are necessary to determine whether these 23-to 28-nucleotide nat-siRNAs can also direct DNA methylation or histone modification at their NAT loci.
We observed two major distribution patterns of nat-siRNAs -distributed and site-specific -in Arabidopsis and rice. We found that the site-specific nat-siRNAs tested were generated by DCL1. Because of its predominant role in processing hairpin structures into miRNAs, DCL1 is generally assumed to be capable of recognizing certain RNA secondary structures. The sense and antisense transcripts in a NAT pair are transcribed separately, and their overlapping regions may interact to form a dsRNA. In this partially dsRNA duplex, complex secondary structures may arise, which are recognized by the versatile DCL1 enzyme, and thus are processed to yield site-specific nat-siRNAs, or possibly distributed nat-siRNAs, depending on what secondary structures are formed and recognized by DCL1. It is also possible that these apparently site-specific nat-siRNAs may be generated across the whole overlapping region first as the distributed ones, but only specific siRNAs from particular sites are stable and thus accumulated, while the rest are degraded.
We showed that RDRs were not always required for nat-siRNA formation (Figure 4a-c), which suggested that the dsRNAs formed within the overlapping regions of the sense and antisense transcripts were sufficient for producing nat-siRNAs in such cases. As for the nat-siR-NAs that do require RDRs [13,16,17], they were not completely eliminated in the rdr mutants. The partial dependence on RDRs suggested that these nat-siRNAs could be primarily generated from the dsRNA regions formed within overlapping sense-antisense transcripts, and then may be amplified by RDRs in a secondary amplification step. However, because the enrichment of cis-NAT-generated siRNAs compared to all genic siR-NAs is relatively weak, one should not automatically assume that siRNA clusters corresponding to the cis-NAT genes are always caused by the cis-NAT configuration; it is possible that some siRNAs may coincidentally be in a cis-NAT region. Nevertheless, our observation of some siRNA clusters that appear exclusively in the overlapping regions of NATs strongly supports that at least some siRNAs are indeed caused by the cis-NAT configuration.
We also found that many nat-siRNAs were derived from introns or intron-exon junctions. It has been shown that the sense/antisense dsRNA pairs in vertebrates are processed in the nucleus but not the cytoplasm [67,68]. Therefore, some plant nat-siRNAs may also be produced in the nucleus before introns are spliced out from the NATs. Moreover, we found a new Arabidopsis lsiRNA, lsiRNA4g34070, which was generated from the exon-intron junction of a NAT pair. Similar to AtlsiRNA-1, lsiRNA4g34070 was processed by DCL1.
The extent of transcript overlap within each cis-NAT may differ from the TAIR annotations and requires careful examinations. For example, recent 3' rapid amplification of cDNA ends (RACE) experiments revealed that the major SRO5 transcripts (data not shown) are shorter than previously indicated [17], although the experiments also showed a longer SRO5 transcript that extends beyond the detected nat-siRNAs (GenBank accession JQ513374). Some genes such as AGO3 were annotated in the TAIR database as having no 3'-UTRs, but longer 3'-UTRs for some of these genes have been recovered by 3'-RACE in our labs and others. It appears that many genes have multiple forms of transcripts with various lengths of 3'-UTRs.
Plant nat-siRNAs have been shown to regulate gene expression in stress responses or under specific developmental stages [16,17]. In this study, we identified new nat-siRNAs that appeared to regulate the expression of their NAT mRNAs, which may contribute to gene expression reprogramming and fine-tuning in response to environmental stresses. NATs are widespread in eukaryotic organisms [19,[69][70][71][72]. The expression of antisense transcripts may differ among different cell types or under different environmental conditions [19,73]. It is likely that different sets of nat-siRNAs may be generated from NATs in response to different developmental and environmental cues, and therefore play a broad regulatory role in gene expression under specific conditions. Several reported nat-siRNAs, such as nat-siRNASRO5, nat-siRNAATGB2 and nat-siRNAKPL, are induced under stress or appear at specific developmental stages and suppress the expression of their antisense transcripts post-transcriptionally [14,[16][17][18]69]. Similar examples have also been observed in animal systems, for example, the siRNAs from the SIC34a1 gene pair are only generated from mouse kidney and testis [67]. Moreover, many cis-NAT gene pairs in mammalian brain, including the genes involved in Alzheimer's disease and LRRTM1/Ctnna2, a pair of cis-NAT genes that participates in schizophrenia, can generate nat-siRNAs [74]. A lot of these nat-siRNAs are up-regulated in olfactory discrimination training [74]. Therefore, most of the reported nat-siRNAs are generated under specific developmental or environmental conditions. In this study, we detected bacterial-induced nat-siRNAATGB2 [16], as well as another siRNA that is in phase with nat-siRNAATGB2 (Figure 3a; Additional file 10) by deep sequencing, whereas the developmental stage-specific nat-siRNA nat-siRNAKPL was not present in our database [14]. The salt-induced nat-siRNAs from the SRO5-P5CDH pair were also present in this dataset, but the number of reads was below our cutoff threshold (minimum ten reads per million total match reads in the overlapping region), so it was not included in our list. In order to be consistent with other abiotic stress treatment, the salt treatment condition in this study was different from that in the published work [17]. However, more SRO5-P5CDH siRNAs were detected from two other salt-treated small RNA libraries in a separate study (Additional file 15; Gene Expression Omnibus (GEO) accession number GSE33642). It is worth noting that our genome-wide analysis on nat-siRNA-targeting NAT transcripts used the microarray result generated from healthy inflorescences (GEO accession number GSE2473), which may miss some NAT transcripts that are regulated by siRNAs at a different developmental stage or under specific environmental conditions. In addition, we cannot rule out the possibility that some of the nat-siRNAs regulate the expression of their targets by translational inhibition and therefore cannot be directly detected at the mRNA level by microarray or real-time RT-PCR experiments.

Conclusion
Our studies of plant sRNAs showed that those derived from plant cis-NATs were enriched in their overlap regions and cis-NATs are more likely to generate sRNAs than other transcripts. Taken together with previously published data on plants, Drosophila and yeast, our results strongly support the existence of nat-siRNAs in eukaryotic organisms. Our genetic analysis indicated that cis-NATs were processed by DCL1 and/or DCL3. By analyzing a large number of small RNA libraries from stress-challenged plants, we found that some nat-siRNAs were likely to respond to different environmental stresses, and therefore contributed to plant stress resistance, development or other cellular processes by regulating corresponding NAT mRNAs. nat-siRNAmediated gene regulation is one of the regulatory mechanisms for at least a subgroup of cis-NATs.

Small-RNA library construction and deep sequencing
Bacteria infiltration was carried out on the leaves of 4week-old Arabidopsis WT and mutant plants as described previously [46]. Briefly, leaves of 4-week-old Arabidopsis plants growing at 23°C with 12 h light were infiltrated with mock (10 mM MgCl 2 ), non-pathogenic strain Pst DC3000 hrcC -, Pst DC3000 (empty vector) and Pst DC3000 (avrRpt2) at a concentration of 2 × 10 7 cfu/ml. The leaves were collected at 6 and 14 hours post-inoculation.
For abiotic stress treatment in Arabidopsis, plants grew at 23°C with 16 h light for 19 days, and were then divided into four groups. One of the groups was waterdeprived (drought treatment) while the others were maintained with regular irrigation. At day 29, one group of plants was treated with salt solution (200 mM NaCl as irrigation solution for 24 h). Another group of plants was transferred to a growth chamber at 5°C and 16 h light for 24 h for cold treatment. The fourth group of plants was always maintained under regular conditions as a normal control. At day 30, all four groups of plants were collected. For small RNA library construction, the whole shoots of plants (stems, leaves and inflorescences) were used.
Three-month-old rice growing at 28°C and approximately 13 h of natural light were treated with drought (water withholding for 3 days), cold (5°C for 24 h) and salt (400 mM NaCl for 24 h, added as irrigation solution). The inflorescences of these plants were collected and used for sRNA library construction.
sRNA extraction and library construction were carried out as described previously [75,76]. Briefly, total RNA was isolated from infiltrated leaves and fractionated on 15% denaturing polyacrylamide gel. RNA molecules ranging from 18 to 26 nucleotides were excised and ligated to 5'-and 3'-RNA adaptors using T4 RNA ligase followed by RT-PCR and gel purification as described in the instructions from Illumina Inc. The small RNA libraries were sequenced by Illumina Inc. and UCR core facility.

Northern blot analysis of siRNAs
We used LNA probes for small RNA Northern blot analysis to detect nat-siRNAs. A total of 100 μg total RNA was separated on a 17% acrylimide gel. The blots were probed at 39°C with PerfectHyb Plus Hybridization Buffer (Sigma-Aldrich, Saint Louis, MO, USA). The sequences of the oligos were listed in Additional file 7.

Quantitative RT-PCR analysis of small RNA targets
For QRT-PCR analysis, 5 μg total DNaseI-treated (Invitrogen, Grand Island, NY, USA) RNA was used for synthesizing cDNA using Oligo dT and SuperScript II (Invitrogen). Amplification of small RNA targets was carried out using a real-time PCR machine (MyiQ, Bio-Rad, Hercules, CA, USA). The sequences of primers used here are listed in Additional file 7.

Pre-processing of deep sequencing data
Raw sequence reads were parsed to remove 3'-adaptors using an in-house program. Unique sequences from Arabidopsis small-RNA libraries were mapped to the Arabidopsis chromosome sequences [77]. To produce candidate datasets of sRNAs, we removed sequencing reads that matched to rRNAs, tRNAs, snRNAs, snoR-NAs, transposons, retrotransposons, or repeats. rRNA, snRNA and snoRNA sequences were obtained from TAIR8 genome annotation, and tRNA sequences were downloaded from the Arabidopsis tRNA database [78] and TAIR [77]. Transposons, retrotransposons, and repeat sequences were downloaded from the TIGR v2 Arabidopsis Repeat Database [79] and RepBase [80].
The rice genome and annotation were retrieved from the MSU Rice Genome Annotation Project V6.1 (RGAP 6.1). Non-coding RNA (tRNA, rRNA, snoRNA and snRNA) were collected from fRNAdb [81] and the UCSC tRNA database. Transposons, retrotransposons and repeat sequences were based on transposable elements of MSU RGAP 6.1 and RepBase [80].

Analysis of enrichment of endogenous sRNAs in the cis-NATs
The enrichment of sRNAs in the overlapping regions of NATs was examined by the method outlined in [69]. The sRNAs in the combined libraries for Arabidopsis or rice were used here. Briefly, for an l-nucleotide genic or genomic region that gives rise to n sRNAs, we defined n/l as the density of small RNA loci in this region. For each pair of cis-NATs, we counted the number of sRNAs mapping to the overlapping region, N o , and the total number of sRNAs matching the non-overlapping regions of the two genes, N g , and measured the length of the overlapping region, L o , and the total length of non-overlapping regions of the two genes, L g . Then we computed the density of small-RNA loci in the overlapping region, N o /L o , and that of the non-overlapping regions of the two NAT genes, N g /L g . The one-tail paired two-sample t-test was applied to the two-paired samples of siRNA density, N o /L o and N g /L g . The average densities in the overlapping regions (A o ) and in the nonoverlapping regions of the NAT genes (A g ) that spawn sRNAs were computed. The ratio A o /A g was considered as the enrichment score.
In order to explore whether cis-NATs were more likely to give rise to siRNAs than other protein-coding genes, we calculated the average densities in the overlapping regions (A o ) of cis-NATs. We then calculated the average densities A u of a set of randomly chosen regions from the other protein-coding genes, which have the same size as the cis-NATs. Statistical significance was measured by a P-value computed as the frequency of the occurrences that A u was greater than A o .
We applied this method to compare the siRNA density of convergent cis-NATs with that of 3'-UTRs and the siRNA density of divergent cis-NATs with that of 5'-UTRs.
To analyze the regulation of nat-siRNAs under a specific condition, the sRNA reads in a single library that matched 100% to the genome were normalized to one million. Then the normalized sRNA abundances were compared between different libraries. Furthermore, we calculated the maximum fold change among all treated samples compared to corresponding control conditions as a supplemental value (Additional files 10 and 11).

Classification of nat-siRNAs
To classify the cis-NAT pairs into distributed and sitespecific patterns, we first clustered sequencing reads mapped to regions of cis-NATs. Starting from the first reads closest to the 5' end of a cis-NAT, reads were clustered if the positions of their first nucleotides were within a ten-nucleotide long segment. Clusters with more than five reads were retained for further analysis. We calculated two metrics for each cis-NAT: 1) the number of small-RNA clusters and 2) the percentage of the number of reads within these clusters relative to the total number of reads mapped to the whole cis-NAT. We then categorized a cis-NAT to have a site-specific pattern if 1) it had no more than ten siRNA clusters and 2) the percentage of the reads within the clusters was greater than 50%; otherwise we categorized the cis-NAT to have a distributed pattern. The criterion of 10 siRNA clusters was chosen based on the observation that the frequency of clusters of 84 Arabidopsis cis-NATs has a clear separation between 10 and above (Additional file 5a). The percentage criterion (more than 50% sequence reads) was determined based on a visual exanimation of the relationship between the number of clusters and the frequencies of reads within the clusters in the 84 cis-NATS, as plotted Additional file 5b. As shown in the figure, this relationship seemed to follow two distributions, one is a linear correlation represented by the line in Additional file 5b and the other is a group with no more than ten clusters that retain more than 50% of the total sequence reads. The second distribution thus formed site-specific patterns of siRNAs. We thus used these two criteria because a site-specific pattern should have relatively more reads appearing in a relatively smaller number of clusters than a distributed pattern.

Microarray data and data analysis
The processed microarray data of dcl1-7 and Ler WT Arabidopsis were downloaded from GEO accession number GSE2473 (samples GSM47014, GSM47015, GSM47016, GSM47017, GSM47018, GSM47019, GSM47023, GSM47024, GSM47025, GSM47026 and GSM47027). We identified differentially expressed genes using Rank Product [82]. The rationale behind Rank Product is that it is unlikely for a gene to be ranked in the top position in all replicates if the gene was not differentially expressed. It has been shown that Rank Product is less sensitive to noise and has a better performance than other methods when sample size is small [83]. We selected differentially expressed genes with a false discovery rate no greater than 0.05 for 1,000 permutations. Differentially expressed probe sets were then mapped to corresponding genes according to the annotation of the Affymatrix ATH1-121501 chip.

Analysis of association of AGOs and nat-siRNAs
The processed AGO data for Arabidopsis were downloaded from the GEO databases under accession numbers GSE10036 and GSE12037. The number of AGOassociated nat-siRNAs was obtained by perfectly mapping the identified nat-siRNAs to each of the AGO pulldown datasets separately. The number was then normalized by the total number of AGO reads in an AGO dataset that can be perfectly mapped to the genome raised to per million for normalization. The enrichment of nat-siRNAs in an AGO pull-down relative to the other AGOs was calculated as the percentage of the normalized nat-siRNAs in the AGO of the total normalized nat-siRNAs in all AGO pull-downs.

Identification of nat-siRNA regulated cis-NATs
The nat-siRNA regulated cis-NATs were identified from the set of differentially expressed genes from the microarray data of the dcl1-7 and Ler WT control Arabidopsis plants. The analysis was carried out based on two criteria: 1) more than 1.5-fold up-regulation with a P-value less than 0.05 in the dcl1-7 mutant and 2) presence of nat-siRNAs produced by the cis-NATs on the strands opposite to the NAT transcripts. The rationale behind these criteria is that the main function of DCL1-generated nat-siRNAs is to target cis-NATs on the opposite strands to down-regulate their expression. Thus, in a DCL1 mutant where no such nat-siRNAs are produced, the expressions of the targeted NATs are expected to be up-regulated.

Additional material
Additional file 1: Distributions of the sequencing reads from small RNA libraries of abiotic and biotic treated Arabidopsis and abiotic challenged rice. Shown in the table are the total number of raw sequencing reads (total), the number of qualified reads that can map perfectly to the corresponding Arabidopsis or rice genome (mapped), intergenic regions (intergenic), intronic sequences (introns), transposons or repeats (repeats/mobile), tRNA, rRNA, snoRNA and snRNA sequences (ncRNA), trans-acting siRNA (ta-siRNA), microRNA sequences (miRNAs),