Genome mapping and expression analyses of human intronic noncoding RNAs reveal tissue-specific patterns and enrichment in genes related to regulation of transcription
© Nakaya et al.; licensee BioMed Central Ltd. 2007
Received: 17 October 2006
Accepted: 26 March 2007
Published: 26 March 2007
RNAs transcribed from intronic regions of genes are involved in a number of processes related to post-transcriptional control of gene expression. However, the complement of human genes in which introns are transcribed, and the number of intronic transcriptional units and their tissue expression patterns are not known.
A survey of mRNA and EST public databases revealed more than 55,000 totally intronic noncoding (TIN) RNAs transcribed from the introns of 74% of all unique RefSeq genes. Guided by this information, we designed an oligoarray platform containing sense and antisense probes for each of 7,135 randomly selected TIN transcripts plus the corresponding protein-coding genes. We identified exonic and intronic tissue-specific expression signatures for human liver, prostate and kidney. The most highly expressed antisense TIN RNAs were transcribed from introns of protein-coding genes significantly enriched (p = 0.002 to 0.022) in the 'Regulation of transcription' Gene Ontology category. RNA polymerase II inhibition resulted in increased expression of a fraction of intronic RNAs in cell cultures, suggesting that other RNA polymerases may be involved in their biosynthesis. Members of a subset of intronic and protein-coding signatures transcribed from the same genomic loci have correlated expression patterns, suggesting that intronic RNAs regulate the abundance or the pattern of exon usage in protein-coding messages.
We have identified diverse intronic RNA expression patterns, pointing to distinct regulatory roles. This gene-oriented approach, using a combined intron-exon oligoarray, should permit further comparative analysis of intronic transcription under various physiological and pathological conditions, thus advancing current knowledge about the biological functions of these noncoding RNAs.
The five million expressed sequence tags (ESTs) deposited into public sequence databases probably constitute the best representation of the human transcriptome. Human EST data have been extensively used to identify novel genes in silico [1, 2] and novel exons of protein-coding genes [3–6]. Informatics analyses of the EST collection mapped to the human genome have also shown that the occurrence of overlapping sense/antisense transcription is widespread [7–9]. However, the complement of unspliced human transcripts that map exclusively to introns was not appreciated in those reports because the authors selected: transcripts with evidence of splicing; pairs of sense-antisense messages for which at least one exon was colinear on the genome sequence; or only ESTs where both a polyadenylation signal and a poly(A) tail were present.
A detailed analysis of the mouse transcriptome based on functional annotation of 60,770 full-length cDNAs revealed that 15,815 are noncoding RNAs (ncRNAs), of which 71% are unspliced/single exon, indicating that ncRNA is a major component of the transcriptome. The recent completion and detailed annotation of the euchromatic sequence of the human genome has identified 20,000 to 25,000 protein-coding genes; however, noncoding messages were not assessed. Extrapolation from the numbers for chromosome 7 leads to an estimate of 3,700 human ncRNAs, and two databases of human and murine noncoding RNAs are available [13, 14]. Nevertheless, there has been no comprehensive count and mapping of human noncoding RNAs.
Examples of long (0.6-2 kb) intronic noncoding RNAs involved in different biological processes are described in the literature; they participate in the transcriptional or post-transcriptional control of gene expression [15, 16], and in the regulation of exon-skipping and intron retention. In addition, microarray experiments performed by our group have revealed a set of long intronic ncRNAs whose expression is correlated to the degree of malignancy in prostate cancer. Introns are also the sources of short ncRNAs that have been characterized as microRNAs and small nucleolar RNAs (snoRNAs). Biogenesis and function are better understood for microRNAs than for other ncRNAs; they may regulate as many as one-third of human genes, and tissue-specific expression signatures have been identified in different human cancers. However, the complement and biological functions of most of the complex and diverse ncRNA output, both the short and the long ncRNAs, remain to be determined.
Different types of noncoding RNA genes can be transcribed by either RNA polymerase (RNAP) I, II or III. Recently, a fourth nuclear RNAP consisting of an isoform of the human single-polypeptide mitochondrial RNAP, named spRNAP IV, was found to transcribe a small fraction of mRNAs in human cells. Surprisingly, α-amanitin up-regulates the transcription of protein-coding mRNAs by this polymerase. The role of spRNAP IV in the transcription of ncRNAs has not been investigated.
Here we report a search for hitherto unidentified exclusively intronic unspliced RNA transcripts in the collection of transcribed human sequences available at GenBank. The characterization comprises the identification and distribution analysis of 55,000 long intronic ncRNAs over the introns of protein-coding genes and the detection of a higher frequency of alternatively spliced exons for genes that undergo intronic transcription. An oligoarray with 44,000 elements representing exons of protein-coding genes and the corresponding actively transcribed introns was employed to assess intronic transcription in different human tissues. Robust tissue signatures of exonic and intronic expression were detected in human kidney, prostate and liver. We found that in each tissue, the most highly expressed exclusively intronic antisense RNAs were transcribed from a group of protein-coding genes that is significantly enriched in the 'Regulation of transcription' Gene Ontology (GO) category. A subset of partially intronic antisense ncRNAs and the corresponding overlapping protein-coding exons showed a correlated pattern of tissue expression, indicating that intronic RNAs may have a role in regulating abundance or alternative exon-splicing events. Finally, we found that a significant fraction of wholly or partially intronic ncRNAs is insensitive to RNAP II inhibition by α-amanitin, and another fraction is even up-regulated when RNAP II transcription is blocked, suggesting that a portion of long ncRNAs may be transcribed by spRNAP IV. We conclude that oligoarray-based gene-oriented analysis of intronic transcription is a powerful tool for identifying novel potentially functional noncoding RNAs.
Defining a comprehensive reference dataset of spliced protein-coding genes
Table 1. Evidence of intronic transcription in the human mRNA/RefSeq GenBank dataset. Columns: mRNA clusters with overlap to exons of non-redundant RefSeq dataset*; mRNA clusters wholly intronic to non-redundant RefSeq dataset; mRNA clusters not mapped to RefSeq dataset. Rows: spliced mRNA clusters†; unspliced mRNA clusters†.
A detailed analysis of the mapping coordinates of these mRNA clusters with respect to the non-redundant RefSeq dataset revealed that 11,361 spliced and unspliced clusters mapped outside the non-redundant RefSeq dataset, representing less well-characterized human transcripts. As expected, most of the mRNA clusters (14,575) were spliced and mapped to exons of RefSeq genes in the sense direction (Table 1). In addition, 2,559 spliced mRNA clusters mapped in the antisense direction with respect to the non-redundant RefSeq dataset, suggesting that 16% of the RefSeq genes have spliced natural antisense transcripts that overlap at least one of their exons. Among these antisense messages, 1,414 are already annotated as RefSeq transcripts. Such genomic organization of sense-antisense gene pairs seems to have been conserved throughout vertebrate evolution [7, 8, 24, 25]. When the unspliced mRNA clusters were included, we found a total of 4,231 antisense messages with overlaps to exons in RefSeq genes, indicating that as many as 27% of the latter have antisense counterparts. This figure is in line with the prediction that over 20% of human transcripts might form sense-antisense pairs. A complete list of these sense/antisense pairs with exon overlapping is given in Additional data file 1. As a control, we cross-referenced the previously known sense/antisense pairs to our dataset (see Materials and methods) and found that essentially 100% of known pairs [8, 9] with evidence from RefSeq or mRNA are covered by our set. In addition, we found 1,116 RefSeqs with evidence of antisense exon-overlapping messages not covered by Yelin et al. and 1,573 not covered by Chen et al. The complete list of sense/antisense pairs identified here is given in Additional data file 1 along with data for the cross-reference to published sense/antisense pairs.
Most interestingly, we found 7,507 spliced and unspliced mRNA clusters that are entirely intronic to the non-redundant RefSeq genes (Table 1). While 5,002 (67%) of these mapped in the sense direction and may represent new exons of the corresponding genes, 2,505 (33%) mapped exclusively to the introns of RefSeq genes in the antisense direction and thus comprise a set of antisense mRNA clusters with no overlap to exons of sense messages that had not been appreciated in the previous analyses. A complete list of the latter wholly intronic mRNA/RefSeq clusters and the corresponding protein-coding RefSeq is given in Additional data file 1. Although the strandedness of genomic mapping of these mRNAs was taken as preliminary evidence of antisense transcription, direct experimental confirmation was obtained by microarray assays, as described in the following sections. Owing to the fragmented nature of the transcript data in GenBank, some of these intronic antisense messages may originate from the 3' or 5' ends of overlapping sense-antisense transcripts of adjacent genes. However, most of them could represent independent antisense transcriptional units, which became more evident when data from the public EST repository were taken into account, as described below.
Identification of long, unspliced, totally intronic transcripts
Table 2. Classification of GenBank ESTs with respect to their genome mapping coordinates in relation to the set of non-redundant spliced RefSeq sequences. Columns: EST clusters with overlap to exons of RefSeq genes*; EST clusters wholly intronic to RefSeq genes; EST clusters mapped outside of RefSeq genes. Rows: spliced EST contigs; number of exons of spliced EST contigs (median); total number of spliced ESTs in contigs; number of spliced ESTs per contig (median); unspliced EST contigs; total number of unspliced ESTs in contigs; number of unspliced ESTs per contig (median); spliced EST singlets; unspliced EST singlets; total non-redundant EST clusters (contigs + singlets).
The most interesting finding was that 55,139 unspliced EST contigs formed by grouping 190,583 ESTs mapped entirely to the introns of genes in the RefSeq dataset (Table 2). A marked feature of these unspliced, wholly intronic EST contigs is their low protein-coding potential; in silico analysis of the coding potential using the normalized ESTScan2 score predicted that 98% of them are probably noncoding transcripts, supporting the idea that they represent a separate class of noncoding RNAs. To check whether ESTScan2 predicted the coding potential of such a fragmented sequence dataset correctly, we created a virtual dataset in silico composed of 55,139 exonic fragments from RefSeq genes with exactly the same lengths as the 55,139 wholly intronic EST contigs. ESTScan2 correctly predicted that 70% of these in silico-generated virtual exonic fragments have coding potential. Since ESTScan2 detects coding potential in most length-matched exonic fragments but in very few (approximately 2%) of the wholly intronic EST contigs, this control supports the inference that most of the RNAs in this class (98%) are indeed noncoding messages.
Inspection of the length distribution curves (Figure 1) of the wholly intronic EST contigs reveals messages with lengths well over 1,000 nt. The median length (573 nt) is 4.1 times greater than the median length of exons (141 nt) in the RefSeq reference dataset. On the basis of these findings, we call these transcriptional units long totally intronic noncoding (TIN) transcripts.
Most mammalian snoRNAs and a large fraction of microRNAs are derived from introns in protein-coding and noncoding genes transcribed by RNAP II. To address the possibility that some of the TIN transcripts are the sources of these known small RNAs, we compared the human genomic coordinates of TIN sequences to those of 346 snoRNAs and 383 microRNAs. We found that 98 snoRNA or microRNA transcripts (14%) mapped to 86 TIN EST contigs, which may well be the sources of these small RNAs. The 86 TIN EST contigs comprise a very small portion (0.2%) of the TIN transcript dataset. We postulate that the large remaining set could be the source of new snoRNAs and microRNAs as well as of new types of ncRNAs.
Identification of long, unspliced, partially intronic transcripts
A set of unspliced partially intronic noncoding (PIN) EST contigs was identified. A PIN contig was defined as a contig that overlaps an exon of a RefSeq gene and extends at least 30 bases over both ends of the exon (Figure 1). In total, 12,592 PIN EST contigs (median length 719 nt) were identified. An estimated 90% of PIN transcripts have no or limited protein-coding potential as determined by ESTScan2 analysis. By matching the PIN contig sequences to ESTs from high-quality directionally cloned EST libraries, to transcriptionally active regions (TARs) in whole-genome strand-specific tiling arrays, and to the publicly available unspliced full-length mRNA dataset from GenBank, we found that 5,992 PIN contigs (48%) have evidence of being transcribed antisense to the corresponding RefSeq gene. It should be noted that the above EST and tiling array information was not taken as definite evidence of antisense PIN transcription. Sense/antisense PINs were determined experimentally by oligoarray hybridization as described in the following sections, using a pair of separate reverse complementary probes for each PIN in the array, and the strand information was obtained by mapping the actual 60-mer oligonucleotide single-stranded probe to the genomic sequence and recording its strand direction.
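The TIN and PIN definitions above can be illustrated with a small coordinate-based classifier. This is a minimal sketch, not the pipeline used in this work: the function name `classify_contig`, the half-open coordinate convention and the labels "exonic" and "outside" are assumptions introduced here for illustration, and strand is ignored.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # (start, end), 0-based, half-open; an assumed convention


def classify_contig(contig: Interval, exons: List[Interval],
                    min_overhang: int = 30) -> str:
    """Classify an unspliced contig relative to the exons of one gene.

    Returns "TIN" (wholly intronic), "PIN" (overlaps an exon and extends at
    least `min_overhang` bases beyond both of its ends), "exonic" (any other
    exon overlap) or "outside" (no overlap with the gene span).
    """
    c_start, c_end = contig
    gene_start = min(s for s, _ in exons)
    gene_end = max(e for _, e in exons)

    # Contig lies entirely upstream or downstream of the gene.
    if c_end <= gene_start or c_start >= gene_end:
        return "outside"

    overlapped = [(s, e) for s, e in exons if c_start < e and s < c_end]
    if not overlapped:
        # Inside the gene span without touching an exon: wholly intronic.
        return "TIN"

    for s, e in overlapped:
        # The contig must run past both exon boundaries by >= min_overhang.
        if s - c_start >= min_overhang and c_end - e >= min_overhang:
            return "PIN"
    return "exonic"


# Hypothetical two-exon gene used only to exercise the function.
toy_exons = [(100, 200), (1000, 1100)]
print(classify_contig((300, 900), toy_exons))    # inside the intron
print(classify_contig((950, 1150), toy_exons))   # spans exon 2 with 50-base overhangs
```

A strand-aware version would additionally compare the contig orientation with that of the gene to separate sense from antisense transcripts.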
Most RefSeq genes have intronic transcription
Overall, we found that at least 11,679 RefSeq genes, corresponding to 74% of all spliced human genes in the reference dataset, have transcriptionally active introns to which TIN or PIN EST contigs were mapped. If we were to consider TIN or PIN EST singlets, the fraction of RefSeq genes with intronic transcription would increase to 86% of all RefSeq genes.
TIN and PIN transcripts are potential alternative splicing regulators
We found that the average frequency of exon skipping for genes in the RefSeq reference dataset that show evidence of PIN transcripts is 0.23, and the average frequency of exon skipping for exons immediately 3' to TIN transcripts is 0.22. These frequencies are significantly (p < 0.0001) higher than the average frequency of exon skipping (0.14) in the overall set of RefSeq genes (data not shown).
Design and overall performance of a gene-oriented intron-exon oligoarray platform
We opted to use the 60-mer Agilent oligoarray technology to construct this custom-designed array because the probe characteristics and the hybridization and washing protocols in this platform have been optimized to attain reproducible results. Therefore, probe design followed Agilent recommendations with respect to GC content and melting temperature (Tm), as detailed in Materials and methods, to ensure a homogeneous and effective hybridization of fluorescent targets. In fact, the reproducibility of expression in our experiments was fairly high, as evaluated by the correlation coefficients obtained for the two-color raw intensities within each slide and the correlation coefficients of inter-slide comparisons. These correlation coefficients ranged from 0.914 to 0.981 for intra-slide and from 0.915 to 0.949 for inter-slide comparisons.
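As an illustration of the reproducibility measure, the sketch below computes a correlation coefficient between two vectors of raw intensities. It assumes the Pearson coefficient, which the text does not specify; the function name `pearson` and the intensity values are hypothetical.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two vectors of equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical raw intensities for the two dye channels of one slide.
cy3 = [120.0, 340.0, 95.0, 1500.0, 410.0]
cy5 = [131.0, 322.0, 101.0, 1433.0, 450.0]
print(f"intra-slide r = {pearson(cy3, cy5):.3f}")
```

The same function applies to inter-slide comparisons by passing the intensities of one channel from two different slides.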
Probe specificity was ensured by selecting 60-mer sequences with a homopolymeric stretch no longer than 6 bases; in addition, probes could not have 8 or more bases derived from repetitive regions of the genome. The selected probes have a low probability of cross-hybridization, as estimated by a BLAST search against the sequences of all transcribed human messages using the following criteria. All probes have 100% matches to the transcript sequences they represent, which translates into a best-match BLAST bit-score of 119. A bit-score high-end cutoff for the second-best match of each selected probe was set at 42.1, which would correspond to cross-hybridization with a maximum match of 21 bases with no gaps. This high-end cutoff level was determined from the bit-scores of the second-best hits for all the Agilent-designed commercial probes for protein-coding genes included in our platform; it is a conservative cutoff that includes 90% of the Agilent-optimized probes (Additional data file 3). Commercial probes with bit-score cross-hybridization matches higher than 42.1 were included because Agilent has tested each of its probes individually for absence of cross-hybridization. Since we did not test individual probes, we opted to use this conservative high-end cutoff parameter for the intronic probes.
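The two bit-scores quoted above can be reproduced from the standard conversion of a raw alignment score S to a bit-score, S' = (λS − ln K)/ln 2. The sketch below assumes ungapped nucleotide scoring with a +1 match reward and the corresponding Karlin-Altschul parameters λ = 1.374 and K = 0.711; these parameter values, and the helper names `bit_score`, `longest_homopolymer` and `passes_filters`, are assumptions introduced here and are not stated in the text.

```python
import math

# Assumed Karlin-Altschul parameters for ungapped nucleotide scoring
# with match reward +1 (not given in the text).
LAMBDA = 1.374
K = 0.711


def bit_score(n_matches: int, match_reward: int = 1) -> float:
    """Bit-score of an ungapped, mismatch-free alignment of n_matches bases."""
    raw = n_matches * match_reward
    return (LAMBDA * raw - math.log(K)) / math.log(2)


def longest_homopolymer(seq: str) -> int:
    """Length of the longest run of a single repeated base in seq."""
    longest = run = 0
    prev = None
    for base in seq.upper():
        run = run + 1 if base == prev else 1
        longest = max(longest, run)
        prev = base
    return longest


def passes_filters(seq: str, second_best_bits: float,
                   max_run: int = 6, max_bits: float = 42.1) -> bool:
    """Apply the length, homopolymer and cross-hybridization criteria.

    second_best_bits is the bit-score of the second-best BLAST hit as
    reported (rounded to one decimal place).
    """
    return (len(seq) == 60
            and longest_homopolymer(seq) <= max_run
            and second_best_bits <= max_bits)


print(round(bit_score(60)))      # full-length perfect match of a 60-mer
print(round(bit_score(21), 1))   # 21-base perfect match
```

The repeat-content criterion (fewer than 8 bases from repetitive regions) is omitted because it requires a repeat annotation of the genome.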
Negative controls in the oligoarray (1,198 Agilent commercial control probes, see Materials and methods) included sequences from adenovirus E1A transcripts, synthetically generated mRNAs, Arabidopsis genes and control probes designed not to hybridize to targets because of secondary structure. The hybridization and washing stringency conditions optimized by Agilent ensured that the raw signal intensities for these negative controls (median 34.3) in our experiments were low. For each experiment, the average negative control intensity plus 2 standard deviations (SD) was used as a low-limit cutoff to call the expressed and not-expressed genes.
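The low-limit cutoff described above (average negative control intensity plus 2 SD) can be sketched as follows. The function names, the use of the sample standard deviation, the strict "greater than" comparison and the intensity values are assumptions made for illustration.

```python
import statistics
from typing import Dict, Sequence


def low_limit_cutoff(negative_controls: Sequence[float], n_sd: float = 2.0) -> float:
    """Mean of the negative control intensities plus n_sd standard deviations."""
    mean = statistics.mean(negative_controls)
    sd = statistics.stdev(negative_controls)  # sample SD; an assumption
    return mean + n_sd * sd


def call_expressed(intensities: Dict[str, float], cutoff: float) -> Dict[str, bool]:
    """Flag each probe as expressed when its intensity exceeds the cutoff."""
    return {probe: value > cutoff for probe, value in intensities.items()}


# Hypothetical intensities for one experiment.
controls = [30.0, 34.0, 38.0]
probes = {"TIN_antisense_1": 134.0, "TIN_sense_1": 40.0, "coding_1": 351.0}

cutoff = low_limit_cutoff(controls)
print(cutoff, call_expressed(probes, cutoff))
```

Because the cutoff is recomputed for each experiment, the same probe can be called expressed on one slide and not-expressed on another.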
Figure 3b shows the distribution of average intensities in the microarray experiments for genes called not-expressed (below the low-limit cutoff) and for protein-coding, antisense or sense TIN and antisense PIN expressed transcripts. The distribution is skewed towards higher intensities for protein-coding transcripts and the median intensity is 351. The distribution of intensities is very similar for all types of intronic transcripts, and is skewed towards lower intensities when compared to that of protein-coding genes (Figure 3b). Nevertheless, the median intensities (134 for antisense TIN, 126 for antisense PIN and 135 for sense TIN transcripts) were sufficiently above that of the negative controls to permit a considerable number of expressed intronic transcripts to be detected in all tissues. Discrimination between expressed and not-expressed transcripts may be more critical for intronic messages than for protein-coding ones, and a larger fraction of false-negatives may be present in the intronic data. Our results corroborate previous tiling array measurements in chromosomes 21 and 22 that showed that ncRNAs were generally expressed at lower levels than protein-coding ones.
Partially and totally intronic noncoding transcripts expressed in three human tissues
It can be seen that 50% to 69% of protein-coding transcripts were expressed in each individual tissue, while 14% to 32% of antisense and sense TIN and 20% to 45% of antisense PIN transcripts were detected (Figure 4). This reveals that the abundance of intronic transcripts was lower than that of protein-coding messages, in terms of both the diversity of messages per tissue (Figure 4) and the relative distribution of signal intensities (Figure 3b).
Antisense TIN transcripts are enriched in introns of genes related to regulation of transcription
Among the top 40% most highly expressed antisense TIN transcripts mapping to 678 protein-coding genes in the prostate, 105 (16%) belong to 'Regulation of transcription, DNA-dependent' (Figure 7b). Analogous results were obtained for liver and kidney, where 71 out of 409 (17%) and 118 out of 812 (15%) of the genes, respectively, belong to 'Regulation of transcription, DNA-dependent'. A total of 123 unique genes related to 'Regulation of transcription' were found in common among the 40% most highly expressed antisense TIN transcripts in prostate, kidney or liver. Most of these (69 genes, 56%) were expressed in all three tissues (Figure 7b), while some were shared between two tissues and a few were only expressed in one. The 'Regulation of transcription' GO category includes genes encoding various DNA-binding proteins such as transcription factors, zinc fingers and nuclear receptors. The entire list of genes identified in Figure 7b can be found in Additional data file 5. Similar analyses with the top 40% highly expressed sense TIN and antisense PIN transcripts did not identify any enriched GO category.
A similar analysis using the top 40% most highly expressed protein-coding genes showed an entirely different set of significantly (p < 0.05) enriched GO categories; between 10 and 15 significantly enriched categories were detected in each tissue, and none was related to 'Regulation of transcription' (Additional data file 6). The most significantly enriched GO categories in all three tissues include genes involved in RNA and protein biosynthesis, ribosome biosynthesis, mRNA processing and initiation of translation.
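GO enrichment of the kind reported above is commonly assessed with a one-sided hypergeometric test; the text does not state which test was used here, so the following is only a sketch of that common approach. The function name `enrichment_p` is introduced for illustration, and the background sizes in the example (15,000 annotated genes, 1,800 of them in the category) are hypothetical placeholders, not values from this work.

```python
import math


def enrichment_p(k_hits: int, n_selected: int,
                 n_annotated: int, n_background: int) -> float:
    """One-sided hypergeometric P(X >= k_hits).

    n_background: genes in the reference set
    n_annotated:  reference genes that belong to the GO category
    n_selected:   genes in the list being tested
    k_hits:       selected genes that belong to the GO category
    """
    upper = min(n_selected, n_annotated)
    numerator = sum(
        math.comb(n_annotated, i)
        * math.comb(n_background - n_annotated, n_selected - i)
        for i in range(k_hits, upper + 1)
    )
    # Exact integer arithmetic; converted to float only in the final division.
    return numerator / math.comb(n_background, n_selected)


# Prostate counts from the text (105 of 678 genes) tested against a
# HYPOTHETICAL background of 15,000 genes with 1,800 in the category.
p = enrichment_p(105, 678, 1800, 15000)
print(f"p = {p:.3g}")
```

In practice the p-values for all tested categories would also be corrected for multiple testing before applying a significance threshold.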
Many TIN and PIN RNAs are insensitive to RNAP II inhibition or are even up-regulated by α-amanitin
Markedly fewer of the expressed TIN antisense (12%) and sense (14%) transcripts were affected by α-amanitin. Similar fractions of antisense (16%, 42/265) and sense (15%, 49/326) TIN transcripts were up-regulated in α-amanitin treated cells (Figure 8). PIN antisense transcript levels exhibited an expression pattern rather different from that of protein-coding transcripts when RNAP II was inhibited: only 15% were affected, of which 12% (39/339) were up-regulated. Interestingly, 3 to 4 times as many TIN and PIN RNAs as protein-coding messages (4%) were up-regulated by α-amanitin (Figure 8).
We consider that the stringent criteria used, combining two statistical methods to identify the differentially expressed transcripts, may be conservative. Therefore, the proportion of intronic messages that are up-regulated following α-amanitin treatment may be even greater than that reported here. In any case, the number of intronic ncRNAs insensitive to inhibition, or up-regulated upon α-amanitin treatment, is likely to be in the thousands when extrapolated to all the intronic transcripts found in human cells. Considering only the 55,139 wholly intronic EST clusters, over a thousand are predicted to be up-regulated if at least 13% are affected by 24 hours of RNAP II inhibition.
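The extrapolation can be checked with a short calculation. The sketch below follows one reading of the reported numbers, namely that 42/265 and 49/326 are the up-regulated shares among the affected TIN transcripts; this interpretation, the pooling of sense and antisense counts and the function name are assumptions made here.

```python
TIN_CONTIGS = 55_139          # wholly intronic EST contigs
AFFECTED_FRACTION = 0.13      # "at least 13% are affected"

# Assumed reading: up-regulated transcripts as a share of affected ones,
# pooled over antisense (42/265) and sense (49/326) TIN transcripts.
UP_SHARE_OF_AFFECTED = (42 + 49) / (265 + 326)


def predicted_upregulated(n_contigs: int, affected: float, up_share: float) -> float:
    """Expected number of up-regulated contigs under the assumptions above."""
    return n_contigs * affected * up_share


estimate = predicted_upregulated(TIN_CONTIGS, AFFECTED_FRACTION, UP_SHARE_OF_AFFECTED)
print(round(TIN_CONTIGS * AFFECTED_FRACTION), "affected;",
      round(estimate), "predicted up-regulated")
```

Under this reading the estimate exceeds one thousand, consistent with the statement in the text.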
Tissue signatures of TIN and PIN expression
A tissue signature containing 2,809 protein-coding transcripts was also identified (Figure 10d). Analysis of GO enrichment (not shown) revealed that in liver the protein-coding tissue signature is enriched in GO categories related to urea cycle (GO: 006594), cysteine metabolism (GO: 006534), cholesterol biosynthesis (GO: 008203) and prostaglandin metabolism (GO: 006693), while in kidney it is enriched in the GO categories related to sodium and potassium ion transport (GO: 006834 and GO: 006813, respectively). In the prostate, no relevant GO categories were enriched, but prostate-specific genes such as KLK3 and TMEPAI were found.
In a smaller subset of nine loci, the 3' exon of the protein-coding transcript (Figure 11b, right panel) does not follow the pattern of tissue expression of the PIN RNA and the corresponding PIN-overlapped exon of the protein-coding gene (Additional data file 11; Figure 11b, left and central panels). In addition, the PIN RNA (Additional data file 11; Figure 11c, left panel) in six loci has an inverted expression pattern relative to that of the PIN RNA-overlapped exon (Figure 11c, central panel). In some tissues, there is an inverted pattern in the relative levels of PIN-overlapped exon and the 3' exon of the protein-coding gene for these two sets (Figure 11b,c, central and right panels), suggesting that the protein-coding message is alternatively spliced in a tissue-dependent manner. The similar levels of PIN RNAs and PIN-overlapped exons in Figure 11b (central and right panels) suggest that, in these cases, the PIN RNA may be involved in exon retention of the protein-coding gene, whereas the inverted pattern observed in Figure 11c (central and right panels) suggests that the PIN RNA may favor skipping of the overlapped exon. The effect of intronic RNAs on splicing has been documented in a recent report, where overexpression of a naturally occurring antisense PIN RNA (Saf transcript) mapping to the first intron of Fas caused the retention of an alternative Fas exon that was complementary to the antisense PIN transcript.
Long intronic unspliced transcripts in humans
In this work we have evaluated the contribution of introns in the human genome to the production of noncoding RNAs by gathering data on expressed intronic sequences from public databases, and in parallel by measuring expression with combined intron-exon oligoarrays. We focused on the unspliced messages that map totally (TIN) or partially (PIN) to intronic regions and found that most of the genes defined by RefSeq sequences (74%) undergo intronic transcription. This fraction is likely to prove even greater since intronic expression has not yet been assessed in different developmental stages and physiological conditions. While some of the unspliced intronic ESTs (mapping to the sense strand) may represent hitherto-overlooked exons of alternatively spliced forms of known genes, a significant number of the sense and antisense transcripts in this dataset is likely to derive from novel independent transcriptional units. This is supported by the low protein-coding potential and long length of the TIN and PIN EST contig sequences (medians of 573 nt and 719 nt, respectively), well above the typical lengths of exons of protein-coding genes (median 141 nt).
The median length of the 55,000 TIN RNAs identified in all chromosomes in our analysis is in line with the lengths observed in previous reports by RACE analysis of non-annotated transcripts from 10 human chromosomes (average length 680 nt, range 173 to 4,650). Almost none of the TIN EST sequences (0.2%) matched known snoRNAs or microRNAs. Nevertheless, it remains possible that some long TIN messages are precursors of yet-undiscovered small RNAs.
We found no correlation between intron size and the abundance of mapped TIN unspliced EST contigs for most of the genes (approximately 60%) that showed evidence of intronic transcription, suggesting that most intronic transcription does not occur by chance. In addition, the consistent correlation between approximately 30% of TIN contigs and intron length might support the 'genomic design' hypothesis [39, 40], in the sense that transcription of the longer introns in tissue and development-specific genes could carry regulatory information [39, 40]. In effect, we see a more abundant expression of intronic antisense messages in genes with regulatory functions (see discussion below).
We have shown that long TIN RNAs were correlated to the degree of malignancy in prostate cancer. To investigate whether there is a preferential contribution of ESTs from tumor libraries to the set of TINs identified in this work, we compiled the information regarding normal or neoplastic tissue origin that is documented in the Cancer Genome Anatomy Project (CGAP) database, and assigned it to the set of five million ESTs analyzed in this work. We found that 43% and 57% of the 5 million ESTs are derived from tumor or normal libraries, respectively. Interestingly, we found that the same distribution (43% and 57%) was present in the set of 190,583 ESTs included in the TIN contig dataset. Therefore, there is no biased contribution from tumor EST libraries to the TIN dataset. Moreover, we found that 49% of the 55,139 TIN contigs contained at least one EST from a tumor library, suggesting that TIN transcription is equally present in normal and tumor tissues. These results corroborate the notion that TIN transcription is not an exclusive feature of neoplastic tissues, but rather part of the normal transcriptional output of the cells that may be partially dysfunctional in cancer.
Intronic transcripts may stabilize protein-coding transcripts or regulate their alternative splicing
Most of the PIN and TIN RNAs selected in the tissue-specific signatures have the same tissue expression patterns as the corresponding protein-coding genes. This might indicate that transcription of some PIN and TIN RNAs is linked to a cis-acting stabilization of the corresponding protein-coding transcript [42–44]. Intronic transcripts may also act in trans, for example, by controlling regional chromatin architecture as demonstrated for some specific long ncRNAs [34, 45, 46]. Overexpression of complete introns in the CFTR gene affects the expression of a large number of protein-coding messages in trans, many of them related to CFTR function.
A few PIN RNAs selected in the tissue-specific signatures showed tissue expression patterns that correlated with the corresponding protein-coding exon overlapped by the PIN transcript. However, the expression pattern of an exon closer to the 3' end of the same protein-coding gene was not correlated, suggesting that alternatively spliced isoforms of the protein-coding transcripts were tissue-specific. Splicing is known to be modulated by the binding of intronic splicing enhancer (ISE) and silencer (ISS) elements to regulatory factors, favoring or blocking spliceosome formation. Some PIN RNAs might regulate the skipping or retention of exons by interacting either with splice signals or with ISE and ISS elements in pre-mRNAs. In fact, there are examples of control of exon skipping by artificially introduced oligonucleotides in human cells, and some promising therapeutic strategies rely on antisense oligonucleotides that modulate exon-skipping. As for exon retention, a recent report has identified a long antisense noncoding transcript named Saf, which maps as a partially intronic transcript to the first intron of Fas, a gene encoding an apoptotic protein. Overexpression of Saf in Jurkat cells induced the expression of different alternatively spliced Fas isoforms, in which an alternative exon overlapped by Saf was retained and non-adjacent 3' exons were skipped, indicating that cis-acting antisense intronic RNAs have a regulatory function.
Our present microarray analysis is conservative, in that we only selected transcripts that had correlated patterns of intronic and protein-coding messages and were also simultaneously present in tissue-signatures. A more direct experimental approach, for example, over-expressing or suppressing specific PIN transcripts and measuring their effect on the splicing pattern of the overlapped exons, might reveal novel candidates for antisense RNA regulators of exon usage that possibly contribute to a ubiquitous and under-appreciated mechanism of alternative splicing regulation.
Alternative splicing affects more than 70% of human protein-coding genes, in which exon skipping is the most frequent event. We found in silico evidence that cis-acting intronic transcription influences alternative splicing, that is, a higher incidence of noncoding transcription in the first introns along with higher skipping frequency of the first exons in the protein-coding genes. In addition, the frequency of skipping for exons close to or overlapped by intronic transcripts was significantly higher than the average frequency of exon skipping in the overall set of human genes. In fact, exon skipping can be artificially induced by introducing antisense oligonucleotides that map to intron/exon junctions [49, 50] or to wholly intronic regions.
The higher incidence of transcription in the first intron, closer to the gene promoter, might have other functional implications, such as the impairment of transcription by transcriptional interference [53, 54], regulation of gene promoter usage , or regulation of the initiation of RNAP II transcription . In the latter case, ncRNAs are known to function as co-activators; protein-binding ncRNAs are expected to provide a broad and diverse way of controlling mRNA transcription [15, 56]. We speculate that a high fraction of the intronic transcripts, especially the sense TIN RNAs, may act in trans, being parts of multi-component RNA-protein complexes that regulate gene expression. There are thousands of potential RNA regulators, which may effectively amplify the complexity of a human genome with a limited number of protein-coding genes [11, 57] through RNA-RNA, RNA-DNA, or RNA-protein interactions.
Biogenesis of TIN and PIN transcripts
We evaluated the contribution of RNAP II to the biosynthesis of intronic ncRNAs in human cells by blocking its activity with α-amanitin and measuring the levels of protein-coding and noncoding intronic messages. Remarkably, a considerable fraction (12% to 16%) of the wholly intronic or partially intronic antisense transcripts was up-regulated, a fraction 3- to 4-fold higher than that observed for protein-coding messages (4%). In addition, fewer intronic (12% to 15%) than protein-coding (39%) transcripts were sensitive to RNAP II inhibition. Importantly, the sense TIN RNAs responded quite similarly to antisense TIN RNAs with respect to RNAP II inhibition, suggesting that these ncRNAs share similar properties that are different from protein-coding messages. The refractory behavior of intronic transcript expression after α-amanitin treatment, and the apparent up-regulation of many intronic transcripts, suggest that a different transcriptional system may be involved in the biosynthesis of these long wholly intronic ncRNAs. A reasonable candidate is spRNAP IV, which is activated by α-amanitin, though the mechanism involved remains elusive. Further experimentation is warranted to verify this hypothesis.
Advantages of a gene-oriented combined intron/exon expression array platform
Experimental analysis using genome tiling arrays has permitted unbiased probing of transcribed regions in the human genome [32, 33, 38, 58, 59]. Probing chromosomes 21 and 22 revealed 5.3 kb of novel transcribed sequences within or overlapping the intronic regions of well-characterized genes, of which 2.7 kb (51%) are antisense to the protein-coding genes. Tiling arrays of the whole human genome have extended these analyses, detecting messages in liver that map to 1,529 and 1,566 novel intronic transcriptionally active regions (TARs) arising, respectively, from the antisense or the sense strands of the corresponding gene. Genome tiling arrays for 10 human chromosomes revealed interlaced networks of both poly A+ and poly A- annotated transcripts and unannotated transcripts of unknown function. It has become apparent that introns as well as intergenic regions constitute major sources of non-protein-coding RNAs, and tiling arrays promise to help unravel the complex cellular program of intronic transcription.
Different physiological and pathological conditions are yet to be probed by tiling arrays, and the amount and complexity of the information generated by high-density whole human genome tiling arrays may make the experiments difficult to perform. In this context, we believe that a gene-oriented combined intron-exon expression array that samples the intronic noncoding regions of the genome from which there is previous evidence of transcription, along with the corresponding protein-coding regions, will help to identify the particular gene families, biological processes or functional gene categories of greatest relevance to any physiological condition under study. In the present case, we have opted to probe in a combined intron-exon oligoarray approximately 13% (7,135 TINs) of the 55,139 wholly intronic genomic regions with evidence of transcription. With such a platform we were able to interrogate the intronic expression of three different tissues, and we found 1,915 sense and antisense TIN transcripts expressed in liver, 3,288 in prostate and 4,012 in kidney (Figure 4). A total of 4,296 unique intronic regions (60% of all probed TIN loci) were actively transcribed in at least one tissue, as determined by our combined intron-exon expression oligoarray. Thus, it is apparent that most of the 55,139 intronic regions with evidence of transcription from EST and mRNA data can be independently confirmed by direct hybridization, pointing to the best candidate set of intronic genomic regions to be studied in more detail. High-density custom tiling arrays of selected chromosome regions containing genes that are identified as preferentially transcribed in a given tissue should permit further detailed studies of intronic expression patterns. The information gathered from such complementary approaches should help accelerate the acquisition of information about the emerging diverse roles of intronic messages.
Tissue-specific intronic expression and enrichment of genes related to regulation of transcription
Tissue-specific expression signatures provide strong evidence that intronic transcripts are physiologically relevant. Expression signatures of microRNAs have been reported to classify human cancers, adding to the evidence that different ncRNAs are tissue-specific and functionally important.
The present finding that the most abundant wholly intronic antisense RNAs are transcribed from introns of genes related to the regulation of transcription provides a clue to their functional relevance. A high degree of conservation is expected in those intronic genomic regions that are under strong selective constraints. In fact, conserved genomic regions have been identified by several different approaches in the introns of genes involved in transcriptional regulation [60–64]: identification of non-transcribed ultraconserved sequences, multispecies conserved sequences, sequences conserved in vertebrates but highly divergent among chimpanzees and humans, short blocks of multiple-copy sequences (pyknons), or conserved regions without transposon insertions. Our results add to these findings by showing that conserved intronic DNA segments of genes involved in transcriptional regulation are the sources of some of the most abundant intronic RNAs in three different human tissues. The possibility that regulatory genes are controlled by ncRNAs transcribed at the same loci is appealing. It would represent an additional mechanism for regulating the regulators, in a rather sophisticated system for fine-tuning eukaryotic gene expression.
Our approach has used an oligoarray-based gene-oriented combined intron-exon expression platform as a practical and effective compromise between a biased exon array that only probes the protein-coding messages, and the whole human genome tiling arrays. This approach has identified potentially functional intronic RNAs that are most abundantly transcribed from introns of genes involved in transcriptional regulation. Further comparative analysis of intronic transcription under a number of different physiological and pathological conditions should advance current knowledge about the diverse biological roles of these noncoding RNAs in the control of gene expression.
Materials and methods
Cross-referencing of genomic coordinates of transcripts from different sequence datasets
The analyzed sequence dataset comprises all human RefSeq, mRNA and EST sequences, whose genome coordinates were downloaded from the Genome Browser web page (hg17; NCBI Build 35, March 2005). First, sequences with poor alignment quality (coverage <0.70 and identity <0.90) or mapped to more than one genomic region were removed. Second, we discarded sequences with complicated rearrangement patterns, such as T-cell receptor and immunoglobulin genes. ESTs and mRNAs that aligned to exons of two or more non-overlapping RefSeqs from the same genomic strand were filtered out as suspected chimeras. Sequencing errors in transcripts aligned to the genome sequence led to gaps that are interpreted as introns by our parser. To avoid these falsely identified introns, we joined adjacent exons whenever an intron of less than 30 bases was detected.
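The two coordinate-level filters described above can be sketched as follows. This is a minimal illustration in Python, not the authors' Perl code; the function names, the half-open interval convention and the literal reading of the quality rule (both coverage and identity below threshold) are assumptions made for the example.

```python
from typing import List, Tuple

Block = Tuple[int, int]  # half-open genomic interval [start, end); an assumed convention

MIN_INTRON = 30  # gaps shorter than this are treated as alignment artifacts


def passes_quality(coverage: float, identity: float, n_loci: int) -> bool:
    """Keep uniquely mapped alignments that are not of poor quality.

    'Poor' follows the text literally: coverage < 0.70 and identity < 0.90.
    """
    poor = coverage < 0.70 and identity < 0.90
    return n_loci == 1 and not poor


def join_short_gaps(blocks: List[Block], min_intron: int = MIN_INTRON) -> List[Block]:
    """Merge adjacent aligned blocks separated by fewer than min_intron bases."""
    merged: List[Block] = []
    for start, end in sorted(blocks):
        if merged and start - merged[-1][1] < min_intron:
            # Gap too short to be a real intron: extend the previous exon.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

For instance, blocks at 100-200 and 210-300 are joined into a single exon, whereas a block starting 700 bases downstream is kept separate.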
A bioinformatics tool was developed to handle the over five million human ESTs efficiently. Essentially, this tool consists of a package of scripts written in Perl that uses files of genome mapping coordinates directly obtained from the UCSC genome browser. The use of coordinates avoids the computationally intensive and parameter-dependent problems of alignment-based programs. EST sequences with overlapping exons were merged into EST clusters using the genomic mapping coordinates. RefSeq and mRNA sequences were processed separately and split into four sets according to the genomic strand to which they mapped, and further sub-divided into spliced and unspliced groups of messages. Sequences from the same strand in each subgroup were merged into a transcriptional unit when their exons overlapped at the same genomic locus. The mRNA dataset was aligned against the RefSeq dataset to identify additional splice variants, intronic and antisense transcripts represented in the mRNA collection, as detailed in Table 1. A complete list of sense/antisense transcript pairs identified here is given in Additional data file 1.
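As an illustration of coordinate-based clustering, the sketch below groups transcripts that share at least one overlapping exon, using a single sweep over sorted exons and a union-find structure. It is a Python approximation of the idea, not the authors' Perl package; the function name and input layout are hypothetical, and the input is assumed to hold transcripts from one chromosome and one strand.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Exon = Tuple[int, int]  # half-open interval [start, end); an assumed convention


def cluster_by_exon_overlap(transcripts: Dict[str, List[Exon]]) -> List[List[str]]:
    """Group transcript names whose exons overlap (same chromosome and strand assumed)."""
    parent = {name: name for name in transcripts}

    def find(x: str) -> str:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    def union(a: str, b: str) -> None:
        ra, rb = find(a), find(b)
        if ra != rb:
            parent[rb] = ra

    # Sweep exons by start position, tracking the right edge of the current
    # run of overlapping exons and one transcript that belongs to it.
    exons = sorted((s, e, name) for name, ex in transcripts.items() for s, e in ex)
    run_end, run_member = -1, None
    for start, end, name in exons:
        if run_member is not None and start < run_end:
            union(run_member, name)
            run_end = max(run_end, end)
        else:
            run_end, run_member = end, name

    groups = defaultdict(list)
    for name in transcripts:
        groups[find(name)].append(name)
    return sorted(sorted(g) for g in groups.values())
```

A transcript that lies entirely inside an intron of another transcript shares no exon with it and therefore remains in its own cluster, which is what allows wholly intronic contigs to be recognized in a later step.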
From the combined data described above, a reference dataset was defined comprising the set of 15,783 spliced non-redundant RefSeq transcriptional units plus the evidence of additional splice variants obtained for each transcriptional unit from all mRNA sequences mapping to the same locus. As a control to our filter and clustering procedures this reference dataset was cross-referenced to the lists of previously known sense/antisense pairs [8, 9]. First, we eliminated from the published lists of pairs those that were composed of sequences that had been eliminated from the UCSC hg17 database and, therefore, were not by definition in the dataset analyzed here (181 pairs from , and 15 from ). Second, we eliminated from the published lists those pairs for which there was no evidence of sense/antisense overlap from RefSeq or mRNA, only from ESTs, since this was the criterion used in the present analysis to establish our reference dataset (1,002 pairs from  and 822 from ). Next, we found 45 sequences from  and 159 from  that matched clusters in our dataset that contained only mRNAs, not RefSeq sequences. The remaining 1,432 pairs from  and 1,740 from  comprise the pairs that were expected to be found in our RefSeq reference dataset. Of these, a total of 1,429 (99.8%) from  and 1,734 (99.7%) from  were covered by our dataset; only 3 pairs from  and 6 from  were filtered out from our dataset because of various low quality criteria that we had implemented (see filters above).
Subsequently, this RefSeq reference dataset was compared to the total set of EST clusters in order to define those that were exonic or wholly intronic to genes in the reference dataset. The genome mapping coordinates of each of the 55,139 unspliced EST contigs identified as wholly intronic to RefSeq genes (TIN RNAs, Table 2) are listed in Additional data file 13, and the file is formatted in a way that each entry can be uploaded as a track in the UCSC genome browser tool hg17 assembly version of May 2004 and viewed. For wholly intronic RNAs, we recorded the relative position of the RefSeq intron to which the RNA mapped with respect to the total number of introns of the respective RefSeq reference gene.
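The classification of a contig as wholly intronic, and the recording of its relative intron position, might look like the sketch below. The function name, the half-open coordinates and the convention of numbering introns from the 5' end of the gene are assumptions for illustration only.

```python
from typing import List, Optional, Tuple

Interval = Tuple[int, int]  # half-open interval [start, end); an assumed convention


def intron_position(contig: Interval, exons: List[Interval],
                    strand: str) -> Optional[Tuple[int, int]]:
    """Return (intron rank counted from the 5' end, total introns) if the contig
    lies entirely within one intron of the gene model, otherwise None."""
    exons = sorted(exons)
    # Introns are the gaps between consecutive exons of the reference gene.
    introns = [(exons[i][1], exons[i + 1][0]) for i in range(len(exons) - 1)]
    total = len(introns)
    for i, (start, end) in enumerate(introns):
        if start <= contig[0] and contig[1] <= end:
            rank = i + 1 if strand == "+" else total - i
            return rank, total
    return None
```

For a minus-strand gene the genomic order of introns is reversed, so the leftmost intron on the chromosome is reported as the last intron of the gene.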
Partially intronic EST contigs were identified in a later step, by searching for evidence of two or more overlapping EST sequences that mapped to an exon and covered the intronic regions flanking the exon on each side by more than 30 contiguous bases (unspliced extension of the exon). The genome mapping coordinates of each of the 12,592 EST contigs identified as partially intronic to RefSeq genes (PIN RNAs) are listed in Additional data file 14, and the file is formatted in a way that each entry can be uploaded as a track in the UCSC genome browser tool hg17 assembly version of May 2004 and viewed.
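A minimal sketch of the partially intronic test is given below. It assumes that the contig must extend past both exon boundaries by more than 30 bases and be supported by at least two ESTs; the function name and argument layout are hypothetical, and the check that the extension stays within the flanking introns is omitted for brevity.

```python
from typing import Tuple

Interval = Tuple[int, int]  # half-open interval [start, end); an assumed convention


def is_partially_intronic(contig: Interval, exon: Interval,
                          n_ests: int, min_extension: int = 30) -> bool:
    """True if an EST contig covers the exon and extends into the flanking
    intronic sequence on both sides by more than min_extension bases."""
    left_extension = exon[0] - contig[0]
    right_extension = contig[1] - exon[1]
    return (n_ests >= 2
            and left_extension > min_extension
            and right_extension > min_extension)
```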
Only the genomic mapping coordinates of TIN and PIN contigs were recorded, not the genomic strand orientation; direct experimental determination of strandedness of transcription was obtained by oligoarray hybridization, using a pair of separate reverse complementary probes for each TIN or PIN in the array as described in the following sections.
Exon skipping frequencies
For each exon of a gene from the RefSeq reference dataset, we counted the number of times that it mapped to an exon (# in exon) or an intron (# in intron) in all the mRNA sequences from the same subgroup (mRNAs and RefSeqs from the same locus and on the same genomic strand). Exon skipping (ES) frequency, that is, the fraction of these sequences in which the exon fell within an intron, was given by:
ES = # in intron/(# in intron + # in exon)
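The counting step can be sketched as follows. The code tallies, for one reference exon, how many transcripts of the same subgroup contain it as an exon and how many span it with an intron, and returns the ratio # in intron/(# in intron + # in exon) used in the expression above. Function names and the interval convention are assumptions for illustration.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open interval [start, end); an assumed convention


def exon_intron_counts(target: Interval,
                       transcripts: List[List[Interval]]) -> Tuple[int, int]:
    """Return (# in exon, # in intron) for a reference exon across transcripts."""
    n_exon = n_intron = 0
    for exons in transcripts:
        exons = sorted(exons)
        if any(s < target[1] and target[0] < e for s, e in exons):
            n_exon += 1  # the reference exon overlaps an exon of this transcript
            continue
        introns = [(exons[i][1], exons[i + 1][0]) for i in range(len(exons) - 1)]
        if any(s <= target[0] and target[1] <= e for s, e in introns):
            n_intron += 1  # the reference exon lies inside an intron
    return n_exon, n_intron


def intron_fraction(n_exon: int, n_intron: int) -> float:
    """Ratio # in intron / (# in intron + # in exon); NaN if the exon is never covered."""
    total = n_exon + n_intron
    return n_intron / total if total else float("nan")
```

Transcripts that end before the reference exon, or begin after it, contribute to neither count.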
Design of the 44 k intron-exon oligoarray
Oligonucleotide probes were designed for the sense and antisense strands of each of 7,135 totally (TIN) and 4,439 partially (PIN) intronic noncoding RNAs picked randomly from the list of unspliced EST contigs with most abundant ESTs representing each type of intronic transcript. First, for each PIN or TIN RNA, we selected all 60-mer sequences that satisfied a series of conditions as follows: probes should not have 8 or more bases derived from repetitive regions of the genome or homopolymeric stretches of 7 or more bases (low complexity); and they should have a GC content of 35% to 55% and a Tm of 68-76°C. To reduce cross-hybridization, each 60-mer sequence was searched by BLAST against a specific database comprising all human genomic regions for which the mRNA or EST data give any evidence of transcription. Those 60-mer sequences for which the second best hits against this database had bit-scores equal to or lower than 42.1 were carried forward to the next step. They were mapped back to their respective targets and one probe was selected closer to the 3' end of each target in the antisense direction, relative to the protein-coding genes. For each target, a second probe was selected for the opposite strand by taking the reverse complementary sequence of the selected 60-mer, so that a pair of sense/antisense probes is present for each TIN and PIN candidate region in the array. For PIN RNAs, the probe on the opposite strand corresponds to the exon of the gene where the partially intronic message overlaps.
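The sequence-level probe filters can be illustrated with the sketch below. It covers only the repeat, homopolymer and GC criteria plus the reverse-complement step; the Tm window and the BLAST bit-score filter are not reproduced because the text does not specify how Tm was computed. Treating lowercase (soft-masked) bases as repeat-derived is an assumption, as are all function names.

```python
import re

COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Probe sequence for the opposite strand."""
    return seq.translate(COMPLEMENT)[::-1]


def gc_percent(seq: str) -> float:
    s = seq.upper()
    return 100.0 * (s.count("G") + s.count("C")) / len(s)


def has_homopolymer(seq: str, run: int = 7) -> bool:
    """True if any base is repeated `run` or more times in a row."""
    return re.search(r"(.)\1{%d}" % (run - 1), seq.upper()) is not None


def passes_filters(seq: str) -> bool:
    """Apply the 60-mer length, repeat, low-complexity and GC criteria."""
    repeat_bases = sum(1 for base in seq if base.islower())  # assumes soft-masking
    return (len(seq) == 60
            and repeat_bases < 8
            and not has_homopolymer(seq)
            and 35.0 <= gc_percent(seq) <= 55.0)
```

Because the antisense probe is the exact reverse complement of the sense probe, both members of a pair share the same GC content and homopolymer profile, so a single pass of the filters suffices for each pair.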
To measure the transcriptional level of the protein-coding genes to which PIN and TIN RNAs mapped, we included in our array 14,074 elements corresponding to exons of 7,464 unique Agilent-designed probes contained in the Whole Human Genome Oligo Microarray set (matched by their Gene_Name annotations to the RefSeq reference dataset genes to which the PIN and TIN RNAs mapped), together with the set of 2,256 positive and negative control Agilent commercial probes (IS-44290-1-V1_eQC-V1) designed for the Agilent human expression oligoarrays. Our custom-designed 44 k intron-exon oligoarrays were printed by Agilent Technologies. A list of all probes is available at GEO under accession number GPL4051, and also as Additional data file 15. The genome mapping coordinates of each of the intronic and exonic oligoarray probes are listed in Additional data file 16, and the file is formatted in a way that each entry can be uploaded as a track in the UCSC genome browser tool hg17 assembly version of May 2004 and viewed.
Human tissue samples
Total RNA was purified from two pools of normal human liver, each with samples from 5 individuals, and from normal kidney tissues obtained from 17 individuals. Four pools of normal kidney were prepared (three with four samples and one with five samples). In addition, two prostate tumor samples were used. All samples were obtained from patients who signed informed consent, and approval was received from the ethics committees of the hospitals. Total RNA was purified using Trizol (Invitrogen, Carlsbad, CA, USA) according to the manufacturer's instructions, followed by treatment with DNase I following the 'on-column DNase digestion' protocol of the Qiagen RNeasy kit (Qiagen, Valencia, CA, USA) to remove potential genomic DNA contamination. All RNA samples were checked for purity using a ND-1000 spectrophotometer (NanoDrop Technologies, Wilmington, DE, USA) and for integrity by electrophoresis on a 2100 BioAnalyzer (Agilent Technologies, Santa Clara, CA, USA).
α-Amanitin experiments with LNCaP cells
The prostate carcinoma cell line LNCaP was obtained from ATCC and maintained in RPMI 1640 medium (Invitrogen) supplemented with 10% (vol/vol) fetal calf serum, 3 mM L-glutamine, 100 μg/ml streptomycin and 100 U/ml penicillin, at 37°C and 5% CO₂. For RNAP II inhibition experiments, 8 × 10⁵ LNCaP cells were plated in p60 dishes and cultured for 2 days, after which the medium was replaced by fresh medium with or without (mock) 50 μg/ml α-amanitin (Roche, Basel, Switzerland). After 24 h, the cells were washed once with ice-cold phosphate-buffered saline, harvested, pelleted and stored at -80°C. Total RNA was isolated using Qiagen RNeasy kit and treated with DNase I following the 'on-column DNase digestion' protocol (Qiagen). RNA quality was checked as described above. Two biological replicas were processed separately.
Sample labeling and microarray hybridization procedures
Cy5- and Cy3-labeled cRNA was obtained using 300 ng total RNA as template for amplification of poly(A) RNA by T7-RNA polymerase with the Agilent Low RNA Input Fluorescent Linear Amplification kit. The T7-polymerase amplified cRNA labeling approach advantageously replaces the reverse-transcriptase cDNA labeling used in early microarray experiments, because T7-RNA polymerase labeling of cRNA preserves the strand orientation of the original mRNA template. Reverse-transcriptase labeling can occasionally generate a complementary cDNA second strand and cause artifactual labeling of a target with the opposite sense to that of the original message. For LNCaP cell line samples (mock-treated or α-amanitin-treated cells), 500 ng total RNA was used and a control in vitro synthesized mRNA (Agilent RNA Spike-In kit) was spiked into the amplification and labeling assay. For kidney tissue samples, the four pools from normal individuals were considered as replicas, and each pair was labeled with either Cy3 or Cy5. Each liver sample pool, prostate tissue sample or LNCaP cell line sample was separately labeled in replicate with Cy3 or Cy5. Hybridization of 750 ng each of Cy3- and Cy5-labeled cRNA was performed with an Agilent in situ Hybridization kit-plus, as recommended by the manufacturer, using a total of six 44 k intron-exon expression oligoarrays. Slides were washed and processed according to the Agilent 60-mer Oligo Microarray Processing protocol and scanned on a GenePix 4000B scanner (Molecular Devices, Sunnyvale, CA, USA). Data were extracted from the images with ArrayVision 8.0 (Imaging Research Inc., GE Healthcare, Piscataway, NJ, USA). Cy5- and Cy3-derived intensity data from the same sample were corrected for intensity-dependent dye biases using a Lowess function implemented in the R package.
The different experiments with human tissues were normalized by the 40% trimmed mean intensity of all the spots in each slide that were above the mean plus 2 SD intensity of 1,198 negative controls. The experiments with the LNCaP cells (mock- and α-amanitin-treated cells) were normalized by the 40% trimmed mean intensity of 300 control probes from a specific probe set on the array that reports the signals from labeled targets generated from the synthetic spiked-in mRNA.
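The tissue normalization described above can be sketched as follows. Two points are assumptions rather than statements from the text: a "40% trimmed mean" is taken here to discard 20% of the values from each tail, and "normalized by" is taken to mean division by that trimmed mean. The function names are hypothetical.

```python
import statistics
from typing import List


def trimmed_mean(values: List[float], trim_total: float = 0.40) -> float:
    """Mean after dropping trim_total/2 of the values from each tail (assumed meaning)."""
    xs = sorted(values)
    k = int(len(xs) * trim_total / 2)
    return statistics.mean(xs[k:len(xs) - k])


def normalize_slide(spots: List[float], negative_controls: List[float],
                    trim_total: float = 0.40) -> List[float]:
    """Scale all spots by the trimmed mean of spots above the background cutoff."""
    cutoff = (statistics.mean(negative_controls)
              + 2 * statistics.stdev(negative_controls))
    expressed = [x for x in spots if x > cutoff]
    scale = trimmed_mean(expressed, trim_total)
    return [x / scale for x in spots]
```

For the LNCaP experiments the same scaling could be applied with the spiked-in control probes supplying the values passed to `trimmed_mean`, in place of the above-background spots.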
For tissue-specific expression profiles, the SAM approach was employed using as parameters: multi-class response, 1,000 permutations, K-Nearest Neighbors Imputer, and FDR ≤ 0.002. Analysis of variance (ANOVA), implemented in the SpotFire Decision Site for Functional Genomics (SpotFire Inc., Somerville, MA, USA) with cutoff p ≤ 0.001 was also used. Gene sets identified by SAM or ANOVA were combined in order to identify a more restricted set of genes that showed statistically significant changes of expression in a tissue by both analyses.
For RNAP II inhibition experiments, the SAM approach was employed, using as parameters: two-class unpaired response, t-statistic, 1,000 permutations, K-Nearest Neighbors Imputer, and FDRs ranging from 0.2% to 2%; a signal-to-noise ratio (SNR) analysis with 10 k permutations (p < 0.05) was also performed. Gene sets identified by SAM or SNR were combined in order to identify a more restricted set of genes that showed statistically significant changes of expression upon α-amanitin treatment by both analyses.
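A permutation-based SNR test of the kind mentioned above might be sketched as below. The SNR definition used, (mean of class 1 minus mean of class 2) divided by the sum of the two standard deviations, is the one commonly attributed to Golub et al.; that this exact form and a two-sided test were used here is an assumption, as are the function names and the handling of zero variance.

```python
import random
import statistics
from typing import Sequence


def snr(a: Sequence[float], b: Sequence[float]) -> float:
    """Signal-to-noise ratio between two classes of measurements."""
    diff = statistics.mean(a) - statistics.mean(b)
    noise = statistics.stdev(a) + statistics.stdev(b)
    if noise == 0:
        return 0.0 if diff == 0 else float("inf") * (1 if diff > 0 else -1)
    return diff / noise


def permutation_p(a: Sequence[float], b: Sequence[float],
                  n_perm: int = 10000, seed: int = 0) -> float:
    """Two-sided permutation p value for the observed |SNR|."""
    rng = random.Random(seed)
    observed = abs(snr(sorted(a), sorted(b)))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        # Sorting makes the statistic independent of within-group order.
        pa, pb = sorted(pooled[:len(a)]), sorted(pooled[len(a):])
        if abs(snr(pa, pb)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

With few replicates per class the number of distinct label permutations is small, which bounds the smallest attainable p value; this is one reason to require agreement with a second method such as SAM.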
GO enrichment analyses
We used BiNGO, the Biological Network Gene Ontology plug-in tool version 1 from the Cytoscape package, with a GO database updated as of 17 June 2006. BiNGO analysis does not include any duplicate instances of the same Gene_ID in a given selected dataset; only one event is counted for a given Gene_ID. We used the Hypergeometric statistical test with Benjamini and Hochberg's FDR multiple testing correction, choosing a significance level of 0.05. We used as the reference dataset all genes that were present in our 44 k intron-exon expression oligoarray, as follows: for protein-coding genes, we used all Gene_IDs of protein-coding probes in the array; for TIN and PIN RNAs, we used all Gene_IDs of protein-coding genes for which there were TIN or PIN RNA probes in the array mapping to the corresponding genomic loci.
Related microarray data are deposited at Gene Expression Omnibus (GEO) under accession numbers GSE5452 and GSE5453.
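The statistics behind this kind of enrichment analysis can be sketched independently of BiNGO. The code below is not BiNGO's implementation; it is a plain Python illustration of a one-sided hypergeometric test followed by Benjamini-Hochberg adjustment, with hypothetical function and parameter names.

```python
from math import comb
from typing import List


def hypergeom_sf(k: int, population: int, annotated: int, drawn: int) -> float:
    """P(X >= k): probability of seeing at least k annotated genes among `drawn`
    genes sampled without replacement from `population`, of which `annotated`
    carry the GO category."""
    upper = min(annotated, drawn)
    favorable = sum(comb(annotated, i) * comb(population - annotated, drawn - i)
                    for i in range(k, upper + 1))
    return favorable / comb(population, drawn)


def benjamini_hochberg(pvalues: List[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running = 1.0
    # Walk from the largest p value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running = min(running, pvalues[i] * m / rank)
        adjusted[i] = running
    return adjusted
```

Here `population` would be the number of unique Gene_IDs in the array-based reference set and `drawn` the number of unique Gene_IDs in the selected gene list; a category is called enriched when its adjusted p value is at most 0.05.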
Additional data files
The following additional data are available with the online version of this paper. Additional data file 1 lists sense/antisense transcript pairs with overlapping exons and with no exon overlap (wholly intronic) identified in the RefSeq and mRNA data from GenBank. Additional data file 2 shows abundance of wholly intronic noncoding transcription in RefSeq genes. Additional data file 3 shows the distribution of BLAST bit-score for the second best hit of the 60-mer oligonucleotide probes from the microarray. Additional data file 4 shows that the most highly expressed antisense TIN transcripts map to genes related to regulation of transcription. The table shows the exact p values for all significantly enriched GO categories for each of the three tissues studied. Additional data file 5 is a list of 210 probes representing antisense TIN RNAs from 123 Gene IDs of genes related to 'Regulation of transcription'.
Additional data file 6 provides gene ontology analyses with the most highly expressed protein-coding transcripts in three different human tissues. Additional data file 7 lists exact p values for all significantly enriched GO categories of genes with up-regulated intronic transcription in the presence of α-amanitin. All exonic protein-coding and intronic non-coding RNAs up-regulated upon α-amanitin treatment are also shown. Additional data file 8 lists the tissue signatures of 431 antisense PIN RNAs. Additional data file 9 lists the tissue signatures of 419 antisense TIN RNAs. Additional data file 10 lists the tissue signatures of 567 sense TIN RNAs. Additional data file 11 is a comparison of tissue signatures between antisense PIN RNAs and exons of protein-coding genes. Additional data file 12 is a comparison of tissue signatures between TIN RNAs and exons of protein-coding genes. Additional data file 13 provides the genomic coordinates of all 55,139 TIN RNAs (formatted for UCSC browser track, hg17 assembly version of May 2004). Additional data file 14 provides the genomic coordinates of all 12,592 PIN RNAs (formatted for UCSC browser track, hg17 assembly version of May 2004). Additional data file 15 shows the 44 k platform design. Additional data file 16 provides the genomic coordinates of all intronic and exonic probes in the custom-designed 44 k intron-exon oligoarray (formatted for UCSC browser track, hg17 assembly version of May 2004).
The authors thank Camila Egidio for help with testing the Agilent microarray protocol. The authors also thank Dr Marcia Kubrusly (Hospital das Clínicas, Universidade de São Paulo) and Dr Marcello Barcinski (Instituto Nacional de Câncer, Rio de Janeiro) for providing the tissue samples. This work was supported by a grant from Fundação de Amparo a Pesquisa do Estado de São Paulo, FAPESP to SVA, EMR and AMDS and by fellowships from FAPESP and Conselho Nacional de Desenvolvimento Científico e Tecnológico, CNPq, Brasil.
- Reis EM, Ojopi EP, Alberto FL, Rahal P, Tsukumo F, Mancini UM, Guimaraes GS, Thompson GM, Camacho C, Miracca E, et al: Large-scale transcriptome analyses reveal new genetic marker candidates of head, neck, and thyroid cancer. Cancer Res. 2005, 65: 1693-1699. 10.1158/0008-5472.CAN-04-3506.
- Ferguson DA, Chiang JT, Richardson JA, Graff J: eXPRESSION: an in silico tool to predict patterns of gene expression. Gene Expr Patterns. 2005, 5: 619-628. 10.1016/j.modgep.2005.03.003.
- Gupta S, Zink D, Korn B, Vingron M, Haas SA: Genome wide identification and classification of alternative splicing based on EST data. Bioinformatics. 2004, 20: 2579-2585. 10.1093/bioinformatics/bth288.
- Thanaraj TA, Clark F, Muilu J: Conservation of human alternative splice events in mouse. Nucleic Acids Res. 2003, 31: 2544-2552. 10.1093/nar/gkg355.
- Kan Z, Rouchka EC, Gish WR, States DJ: Gene structure prediction and alternative splicing analysis using genomically aligned ESTs. Genome Res. 2001, 11: 889-900. 10.1101/gr.155001.
- Modrek B, Lee CJ: Alternative splicing in the human, mouse and rat genomes is associated with an increased frequency of exon creation and/or loss. Nat Genet. 2003, 34: 177-180. 10.1038/ng1159.
- Shendure J, Church GM: Computational discovery of sense-antisense transcription in the human and mouse genomes. Genome Biol. 2002, 3: research0044.1-0044.14. 10.1186/gb-2002-3-9-research0044.
- Yelin R, Dahary D, Sorek R, Levanon EY, Goldstein O, Shoshan A, Diber A, Biton S, Tamir Y, Khosravi R, et al: Widespread occurrence of antisense transcription in the human genome. Nat Biotechnol. 2003, 21: 379-386. 10.1038/nbt808.
- Chen J, Sun M, Kent WJ, Huang X, Xie H, Wang W, Zhou G, Shi RZ, Rowley JD: Over 20% of human transcripts might form sense-antisense pairs. Nucleic Acids Res. 2004, 32: 4812-4820. 10.1093/nar/gkh818.
- Okazaki Y, Furuno M, Kasukawa T, Adachi J, Bono H, Kondo S, Nikaido I, Osato N, Saito R, Suzuki H, et al: Analysis of the mouse transcriptome based on functional annotation of 60,770 full-length cDNAs. Nature. 2002, 420: 563-573. 10.1038/nature01266.
- International Human Genome Sequencing Consortium: Finishing the euchromatic sequence of the human genome. Nature. 2004, 431: 931-945. 10.1038/nature03001.
- Scherer SW, Cheung J, MacDonald JR, Osborne LR, Nakabayashi K, Herbrick JA, Carson AR, Parker-Katiraee L, Skaug J, Khaja R, et al: Human chromosome 7: DNA sequence and biology. Science. 2003, 300: 767-772. 10.1126/science.1083423.
- Pang KC, Stephen S, Engstrom PG, Tajul-Arifin K, Chen W, Wahlestedt C, Lenhard B, Hayashizaki Y, Mattick JS: RNAdb - a comprehensive mammalian noncoding RNA database. Nucleic Acids Res. 2005, 33 (Database issue): D125-D130.
- Griffiths-Jones S, Moxon S, Marshall M, Khanna A, Eddy SR, Bateman A: Rfam: annotating non-coding RNAs in complete genomes. Nucleic Acids Res. 2005, 33 (Database issue): D121-D124.
- Goodrich JA, Kugel JF: Non-coding-RNA regulators of RNA polymerase II transcription. Nat Rev Mol Cell Biol. 2006, 7: 612-616. 10.1038/nrm1946.
- Khochbin S, Lawrence JJ: An antisense RNA involved in p53 mRNA maturation in murine erythroleukemia cells induced to differentiate. EMBO J. 1989, 8: 4107-4114.
- Yan MD, Hong CC, Lai GM, Cheng AL, Lin YW, Chuang SE: Identification and characterization of a novel gene Saf transcribed from the opposite strand of Fas. Hum Mol Genet. 2005, 14: 1465-1474. 10.1093/hmg/ddi156.
- Krystal GW, Armstrong BC, Battey JF: N-myc mRNA forms an RNA-RNA duplex with endogenous antisense transcripts. Mol Cell Biol. 1990, 10: 4180-4191.
- Reis EM, Nakaya HI, Louro R, Canavez FC, Flatschart AV, Almeida GT, Egidio CM, Paquola AC, Machado AA, Festa F, et al: Antisense intronic non-coding RNA levels correlate to the degree of tumor differentiation in prostate cancer. Oncogene. 2004, 23: 6684-6692. 10.1038/sj.onc.1207880.
- Du T, Zamore PD: microPrimer: the biogenesis and function of microRNA. Development. 2005, 132: 4645-4652. 10.1242/dev.02070.
- Kiss T: Small nucleolar RNAs: an abundant group of noncoding RNAs with diverse cellular functions. Cell. 2002, 109: 145-148. 10.1016/S0092-8674(02)00718-3.
- Lu J, Getz G, Miska EA, Alvarez-Saavedra E, Lamb J, Peck D, Sweet-Cordero A, Ebert BL, Mak RH, Ferrando AA, et al: MicroRNA expression profiles classify human cancers. Nature. 2005, 435: 834-838. 10.1038/nature03702.
- Kravchenko JE, Rogozin IB, Koonin EV, Chumakov PM: Transcription of mammalian messenger RNAs by a nuclear RNA polymerase of mitochondrial origin. Nature. 2005, 436: 735-739. 10.1038/nature03848.
- Dahary D, Elroy-Stein O, Sorek R: Naturally occurring antisense: transcriptional leakage or real overlap?. Genome Res. 2005, 15: 364-368. 10.1101/gr.3308405.
- Kiyosawa H, Yamanaka I, Osato N, Kondo S, Hayashizaki Y: Antisense transcripts with FANTOM2 clone set and their implications for gene regulation. Genome Res. 2003, 13: 1324-1334. 10.1101/gr.982903.
- Iseli C, Jongeneel CV, Bucher P: ESTScan: a program for detecting, evaluating, and reconstructing potential coding regions in EST sequences. Proc Int Conf Intell Syst Mol Biol. 1999, 138-148.
- Rodriguez A, Griffiths-Jones S, Ashurst JL, Bradley A: Identification of mammalian microRNA host genes and transcription units. Genome Res. 2004, 14: 1902-1910. 10.1101/gr.2722704.
- The snoRNABase. [http://www-snorna.biotoul.fr/]
- Griffiths-Jones S: The microRNA Registry. Nucleic Acids Res. 2004, 32 (Database issue): D109-D111. 10.1093/nar/gkh023.
- Bertone P, Stolc V, Royce TE, Rozowsky JS, Urban AE, Zhu X, Rinn JL, Tongprasit W, Samanta M, Weissman S, et al: Global identification of human transcribed sequences with genome tiling arrays. Science. 2004, 306: 2242-2246. 10.1126/science.1103388.
- Hughes TR, Mao M, Jones AR, Burchard J, Marton MJ, Shannon KW, Lefkowitz SM, Ziman M, Schelter JM, Meyer MR, et al: Expression profiling using microarrays fabricated by an ink-jet oligonucleotide synthesizer. Nat Biotechnol. 2001, 19: 342-347. 10.1038/86730.
- Kampa D, Cheng J, Kapranov P, Yamanaka M, Brubaker S, Cawley S, Drenkow J, Piccolboni A, Bekiranov S, Helt G, et al: Novel RNAs identified from an in-depth analysis of the transcriptome of human chromosomes 21 and 22. Genome Res. 2004, 14: 331-342. 10.1101/gr.2094104.
- Kapranov P, Drenkow J, Cheng J, Long J, Helt G, Dike S, Gingeras TR: Examples of the complex architecture of the human transcriptome revealed by RACE and high-density tiling arrays. Genome Res. 2005, 15: 987-997. 10.1101/gr.3455305.
- Sun BK, Deaton AM, Lee JT: A transient heterochromatic state in Xist preempts X inactivation choice without RNA stabilization. Mol Cell. 2006, 21: 617-628. 10.1016/j.molcel.2006.01.028.
- Maere S, Heymans K, Kuiper M: BiNGO: a Cytoscape plugin to assess overrepresentation of gene ontology categories in biological networks. Bioinformatics. 2005, 21: 3448-3449. 10.1093/bioinformatics/bti551.
- Tusher VG, Tibshirani R, Chu G: Significance analysis of microarrays applied to the ionizing radiation response. Proc Natl Acad Sci USA. 2001, 98: 5116-5121. 10.1073/pnas.091062498.
- Golub TR, Slonim DK, Tamayo P, Huard C, Gaasenbeek M, Mesirov JP, Coller H, Loh ML, Downing JR, Caligiuri MA, et al: Molecular classification of cancer: class discovery and class prediction by gene expression monitoring. Science. 1999, 286: 531-537. 10.1126/science.286.5439.531.
- Cheng J, Kapranov P, Drenkow J, Dike S, Brubaker S, Patel S, Long J, Stern D, Tammana H, Helt G, et al: Transcriptional maps of 10 human chromosomes at 5-nucleotide resolution. Science. 2005, 308: 1149-1154. 10.1126/science.1108625.
- Vinogradov AE: Compactness of human housekeeping genes: selection for economy or genomic design?. Trends Genet. 2004, 20: 248-253. 10.1016/j.tig.2004.03.006.
- Vinogradov AE: "Genome design" model: evidence from conserved intronic sequence in human-mouse comparison. Genome Res. 2006, 16: 347-354. 10.1101/gr.4318206.
- Cancer Genome Anatomy Project (CGAP). [http://cgap.nci.nih.gov/]
- Cawley S, Bekiranov S, Ng HH, Kapranov P, Sekinger EA, Kampa D, Piccolboni A, Sementchenko V, Cheng J, Williams AJ, et al: Unbiased mapping of transcription factor binding sites along human chromosomes 21 and 22 points to widespread regulation of noncoding RNAs. Cell. 2004, 116: 499-509. 10.1016/S0092-8674(04)00127-8.PubMedView ArticleGoogle Scholar
- Reis EM, Louro R, Nakaya HI, Verjovski-Almeida S: As antisense RNA gets intronic. Omics. 2005, 9: 2-12. 10.1089/omi.2005.9.2.PubMedView ArticleGoogle Scholar
- Mattick JS, Makunin IV: Non-coding RNA. Hum Mol Genet. 2006, 15 (Suppl 1): R17-29. 10.1093/hmg/ddl046.PubMedView ArticleGoogle Scholar
- Sanchez-Elsner T, Gou D, Kremmer E, Sauer F: Noncoding RNAs of trithorax response elements recruit Drosophila Ash1 to Ultrabithorax. Science. 2006, 311: 1118-1123. 10.1126/science.1117705.PubMedView ArticleGoogle Scholar
- Mayer C, Schmitz KM, Li J, Grummt I, Santoro R: Intergenic transcripts regulate the epigenetic state of rRNA genes. Mol Cell. 2006, 22: 351-361. 10.1016/j.molcel.2006.03.028.PubMedView ArticleGoogle Scholar
- Hill AE, Hong JS, Wen H, Teng L, McPherson DT, McPherson SA, Levasseur DN, Sorscher EJ: Micro-RNA-like effects of complete intronic sequences. Front Biosci. 2006, 11: 1998-2006. 10.2741/1941.PubMedView ArticleGoogle Scholar
- Bruno IG, Jin W, Cote GJ: Correction of aberrant FGFR1 alternative RNA splicing through targeting of intronic regulatory elements. Hum Mol Genet. 2004, 13: 2409-2420. 10.1093/hmg/ddh272.PubMedView ArticleGoogle Scholar
- De Angelis FG, Sthandier O, Berarducci B, Toso S, Galluzzi G, Ricci E, Cossu G, Bozzoni I: Chimeric snRNA molecules carrying antisense sequences against the splice junctions of exon 51 of the dystrophinpre-mRNA induce exon skipping and restoration of a dystrophin synthesis in Delta 48-50 DMD cells. Proc Natl Acad Sci USA. 2002, 99: 9456-9461. 10.1073/pnas.142302299.PubMedPubMed CentralView ArticleGoogle Scholar
- McClorey G, Fletcher S, Wilton S: Splicing intervention for Duchenne muscular dystrophy. Curr Opin Pharmacol. 2005, 5: 529-534. 10.1016/j.coph.2005.06.001.PubMedView ArticleGoogle Scholar
- Johnson JM, Castle J, Garrett-Engele P, Kan Z, Loerch PM, Armour CD, Santos R, Schadt EE, Stoughton R, Shoemaker DD: Genome-wide survey of human alternative pre-mRNA splicing with exon junction microarrays. Science. 2003, 302: 2141-2144. 10.1126/science.1090100.PubMedView ArticleGoogle Scholar
- Clark F, Thanaraj TA: Categorization and characterization of transcript-confirmed constitutively and alternatively spliced introns and exons from human. Hum Mol Genet. 2002, 11: 451-464. 10.1093/hmg/11.4.451.PubMedView ArticleGoogle Scholar
- Prescott EM, Proudfoot NJ: Transcriptional collision between convergent genes in budding yeast. Proc Natl Acad Sci USA. 2002, 99: 8796-8801. 10.1073/pnas.132270899.PubMedPubMed CentralView ArticleGoogle Scholar
- Martens JA, Laprade L, Winston F: Intergenic transcription is required to repress the Saccharomyces cerevisiae SER3 gene. Nature. 2004, 429: 571-574. 10.1038/nature02538.PubMedView ArticleGoogle Scholar
- Geirsson A, Lynch RJ, Paliwal I, Bothwell AL, Hammond GL: Human trophoblast noncoding RNA suppresses CIITA promoter III activity in murine B-lymphocytes. Biochem Biophys Res Commun. 2003, 301: 718-724. 10.1016/S0006-291X(03)00028-7.PubMedView ArticleGoogle Scholar
- Willingham AT, Orth AP, Batalov S, Peters EC, Wen BG, Aza-Blanc P, Hogenesch JB, Schultz PG: A strategy for probing the function of noncoding RNAs finds a repressor of NFAT. Science. 2005, 309: 1570-1573. 10.1126/science.1115901.PubMedView ArticleGoogle Scholar
- Mattick JS: RNA regulation: a new genetics?. Nat Rev Genet. 2004, 5: 316-323. 10.1038/nrg1321.PubMedView ArticleGoogle Scholar
- Kapranov P, Cawley SE, Drenkow J, Bekiranov S, Strausberg RL, Fodor SP, Gingeras TR: Large-scale transcriptional activity in chromosomes 21 and 22. Science. 2002, 296: 916-919. 10.1126/science.1068597.PubMedView ArticleGoogle Scholar
- Rinn JL, Euskirchen G, Bertone P, Martone R, Luscombe NM, Hartman S, Harrison PM, Nelson FK, Miller P, Gerstein M, et al: The transcriptional activity of human chromosome 22. Genes Dev. 2003, 17: 529-540. 10.1101/gad.1055203.PubMedPubMed CentralView ArticleGoogle Scholar
- Bejerano G, Pheasant M, Makunin I, Stephen S, Kent WJ, Mattick JS, Haussler D: Ultraconserved elements in the human genome. Science. 2004, 304: 1321-1325. 10.1126/science.1098119.PubMedView ArticleGoogle Scholar
- Sironi M, Menozzi G, Comi GP, Cagliani R, Bresolin N, Pozzoli U: Analysis of intronic conserved elements indicates that functional complexity might represent a major source of negative selection on non-coding sequences. Hum Mol Genet. 2005, 14: 2533-2546. 10.1093/hmg/ddi257.PubMedView ArticleGoogle Scholar
- Pollard KS, Salama SR, Lambert N, Lambot MA, Coppens S, Pedersen JS, Katzman S, King B, Onodera C, Siepel A, et al: An RNA gene expressed during cortical development evolved rapidly in humans. Nature. 2006, 443: 167-172. 10.1038/nature05113.PubMedView ArticleGoogle Scholar
- Rigoutsos I, Huynh T, Miranda K, Tsirigos A, McHardy A, Platt D: Short blocks from the noncoding parts of the human genome have instances within nearly all known genes and relate to biological processes. Proc Natl Acad Sci USA. 2006, 103: 6605-6610. 10.1073/pnas.0601688103.PubMedPubMed CentralView ArticleGoogle Scholar
- Simons C, Pheasant M, Makunin IV, Mattick JS: Transposon-free regions in mammalian genomes. Genome Res. 2006, 16: 164-172. 10.1101/gr.4624306.PubMedPubMed CentralView ArticleGoogle Scholar
- UCSC Genome Browser. [http://genome.ucsc.edu]
- Peixoto BR, Vencio RZ, Egidio CM, Mota-Vieira L, Verjovski-Almeida S, Reis EM: Evaluation of reference-based two-color methods for measurement of gene expression ratios using spotted cDNA microarrays. BMC Genomics. 2006, 7: 35-10.1186/1471-2164-7-35.PubMedPubMed CentralView ArticleGoogle Scholar
- The R Project for Statistical Computing. [http://www.r-project.org]
- Shannon P, Markiel A, Ozier O, Baliga NS, Wang JT, Ramage D, Amin N, Schwikowski B, Ideker T: Cytoscape: a software environment for integrated models of biomolecular interaction networks. Genome Res. 2003, 13: 2498-2504. 10.1101/gr.1239303.PubMedPubMed CentralView ArticleGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.