Genomewide characterization of non-polyadenylated RNAs
© Yang et al.; licensee BioMed Central Ltd. 2011
Received: 12 November 2010
Accepted: 16 February 2011
Published: 16 February 2011
RNAs can be physically classified into poly(A)+ or poly(A)- transcripts according to the presence or absence of a poly(A) tail at their 3' ends. Current deep sequencing approaches largely depend on the enrichment of transcripts with a poly(A) tail, and therefore offer little insight into the nature and expression of transcripts that lack poly(A) tails.
We have used deep sequencing to explore the repertoire of both poly(A)+ and poly(A)- RNAs from HeLa cells and H9 human embryonic stem cells (hESCs). Using stringent criteria, we found that while the majority of transcripts are poly(A)+, a significant portion of transcripts are either poly(A)- or bimorphic, being found in both the poly(A)+ and poly(A)- populations. Further analyses revealed that many mRNAs may not contain classical long poly(A) tails, and such messages are overrepresented in specific functional categories. In addition, we unexpectedly found that a few excised introns accumulate in cells and thus constitute a new class of non-polyadenylated long non-coding RNAs. Finally, we have identified a specific subset of poly(A)- histone mRNAs, including two histone H1 variants, that are expressed in undifferentiated hESCs and are rapidly diminished upon differentiation; further, these same histone genes are induced upon reprogramming of fibroblasts to induced pluripotent stem cells.
We offer a rich source of data that allows a deeper exploration of the poly(A)- landscape of the eukaryotic transcriptome. The approach we present here also applies to the analysis of the poly(A)- transcriptomes of other organisms.
Nascent pre-mRNA transcripts undergo multiple co-transcriptional and post-transcriptional processing and modification events during their maturation. A poly(A) tail is added post-transcriptionally to the 3' end of almost all eukaryotic mRNAs and plays an important role in mRNA stability, nucleocytoplasmic export, and translation. 3' end formation involves binding of the cleavage/polyadenylation machinery to the AAUAAA hexamer (or a close variant), often together with a downstream G/U-rich sequence, followed by endonucleolytic cleavage of the pre-mRNA and the addition of a non-templated 3' poly(A) tail of up to 200 to 250 adenosines in mammalian cells. As most known mRNAs are polyadenylated at their 3' ends, transcriptome analysis by deep sequencing (mRNA-seq) typically involves enrichment of poly(A)+ RNAs by oligo(dT) selection [3–6]. However, this approach precludes detection of transcripts lacking a poly(A) tail.
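The hexamer-recognition step described above can be illustrated with a simple sequence scan. The following Python sketch reports every occurrence of the canonical AAUAAA signal and of AUUAAA, which is assumed here to be the only variant worth checking; the function name `find_pas_hexamers` and the toy sequence are hypothetical, and a realistic polyadenylation-site predictor would also weigh the downstream G/U-rich element and other sequence context.

```python
# Hypothetical helper, not part of the study's pipeline: locate candidate
# polyadenylation-signal (PAS) hexamers. Only AAUAAA and the common AUUAAA
# variant are checked; other variants are deliberately left out.
PAS_HEXAMERS = ("AAUAAA", "AUUAAA")


def find_pas_hexamers(seq: str) -> list[tuple[int, str]]:
    """Return (0-based position, hexamer) for every PAS hexamer, overlaps included."""
    rna = seq.upper().replace("T", "U")  # accept DNA input by converting T to U
    hits = []
    for i in range(len(rna) - 5):
        window = rna[i:i + 6]
        if window in PAS_HEXAMERS:
            hits.append((i, window))
    return hits


if __name__ == "__main__":
    # Toy 3' end sequence, invented for illustration.
    print(find_pas_hexamers("GCUAAUAAAGCUUGUGUGUUCAUUAAAC"))  # prints [(3, 'AAUAAA'), (21, 'AUUAAA')]
```

A plain sliding window is used rather than a regular expression so that overlapping matches are reported without special handling.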
A number of functional long transcripts (defined here as those >200 nucleotides in length) are known to lack poly(A) tails. These non-polyadenylated transcripts (poly(A)- RNAs) include ribosomal RNAs (rRNAs) generated by RNA polymerases I and III, other small RNAs generated by RNA polymerase III, and the replication-dependent histone mRNAs and a few recently described long non-coding RNAs (lncRNAs) [8, 9] synthesized by RNA polymerase II. Unlike the 3' ends of poly(A)+ RNAs, which are formed by a common cleavage/polyadenylation mechanism, the 3' ends of poly(A)- transcripts are generated by quite distinct mechanisms. Most histone pre-mRNAs contain an evolutionarily conserved stem-loop structure in their 3' UTRs that directs U7 small nuclear RNA (snRNA)-mediated 3' end formation, whereas the 3' ends of the lncRNAs malat1 and menβ are generated by RNase P (which also processes the 5' ends of tRNAs); both of these lncRNAs nevertheless encode a highly conserved short poly(A) tract at their 3' ends [8, 9].
Apart from histone mRNAs and the other transcripts mentioned above, relatively little is known about poly(A)- transcripts or mRNAs with short poly(A) tails. Earlier evidence suggested the existence of non-histone, polysome-associated poly(A)- RNAs [10, 11], but these were not characterized in detail. In addition, Katinakis et al. suggested that some transcripts can be 'bimorphic', existing in both poly(A)+ and poly(A)- forms, and that the poly(A)- forms can arise from poly(A)+ RNAs whose tails are shortened or completely removed under certain conditions. This idea has been supported by more recent studies. By searching for the conserved poly(A)-limiting element, Gu et al. identified several hundred sequences in human cells that possess poly(A) tails of <20 nucleotides. By separating RNAs into two fractions according to poly(A) tail length (short and long) followed by microarray analysis, Meijer et al. found that approximately 25% of expressed genes in NIH3T3 cells have a short poly(A) tail of fewer than 30 residues in a significant percentage of their transcripts. Larger-scale bioinformatic studies also suggested that a significant fraction (>24%) of long non-coding transcripts present in cells may lack a classical poly(A) tail [15–17]. Cheng et al. used tiling arrays to detect total RNAs from ten human chromosomes in multiple human cell lines, and Wu et al. used 454 sequencing to characterize the 3' ends of transcripts regardless of whether they contained a poly(A) tail. Both groups identified many long poly(A)- transcripts, although the poly(A)- transcripts identified in the two studies overlapped relatively little.
In the current study, we have used deep sequencing to separately characterize the poly(A)+ and poly(A)- enriched transcriptomes of both HeLa cells and H9 human embryonic stem cells (hESCs). By comparing the relative abundance of long transcripts (>200 nucleotides) in the poly(A)- and poly(A)+ libraries, we identified populations of bimorphic and poly(A)- transcripts. These include not only known long poly(A)- transcripts, such as histone mRNAs, precursors of Cajal body-related small RNAs, and lncRNAs, but also many other non-polyadenylated (or short poly(A)-tail-containing) transcripts from protein-coding genes, as well as intron-derived lncRNAs. We also observed that some replication-dependent histone mRNAs are specifically expressed in pluripotent cells and thus may constitute a unique group of markers for pluripotency.
Results and discussion
Identification of poly(A)- transcripts by RNA-Seq
All libraries were then sequenced in three lanes on the Illumina Genome Analyzer IIx (GAIIx) platform. Because the correlation among lanes was high (r² > 0.98; Additional file 1), we combined the data from all lanes of each sample, obtaining between 37 and 54 million 75-nucleotide reads per library (Additional file 2). We used Bowtie [18, 19] to align the reads to a combined database of the Homo sapiens genome (GRCh37/hg19) and annotated splice junction sequences. Figure 1c shows a diagram of our analytical approach. For the poly(A)- libraries, approximately 5.0 million (H9) and 6.0 million (HeLa) reads aligned uniquely, compared with approximately 23.0 million (H9) and 33.4 million (HeLa) reads from the poly(A)+ libraries (Additional file 2).
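Pooling lanes only after confirming that they agree can be sketched as follows. This Python example is an assumption about how such a check might look (the correlations here were computed in MATLAB, and the exact vectors compared are not specified): it computes the squared Pearson correlation between every pair of per-gene or per-bin count vectors and sums the lanes only if every pair exceeds the r² > 0.98 level reported above. The function names and toy counts are hypothetical.

```python
# Hypothetical sketch: check lane-to-lane agreement, then pool technical-replicate lanes.
def pearson_r2(x: list[float], y: list[float]) -> float:
    """Squared Pearson correlation between two equal-length count vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("vectors must have equal length >= 2")
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant vector")
    return sxy * sxy / (sxx * syy)


def pool_lanes(lanes: list[list[float]], min_r2: float = 0.98) -> list[float]:
    """Sum counts across lanes, but only if every pair of lanes has r^2 > min_r2."""
    for i in range(len(lanes)):
        for j in range(i + 1, len(lanes)):
            r2 = pearson_r2(lanes[i], lanes[j])
            if r2 <= min_r2:
                raise ValueError(f"lanes {i} and {j} disagree (r^2 = {r2:.3f})")
    return [sum(values) for values in zip(*lanes)]


if __name__ == "__main__":
    # Toy per-gene counts from three lanes of one library (invented numbers).
    lanes = [[10, 20, 30, 40], [12, 22, 32, 42], [20, 40, 60, 80]]
    print(pool_lanes(lanes))  # prints [42, 82, 122, 162]
```

Summing is reasonable because lanes of one library are technical replicates; note that r² alone does not detect a constant scaling difference between lanes, which is harmless for pooling.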
We used the uniquely aligned reads to determine the fraction of the genome covered by at least one or at least two reads. In the H9 and HeLa poly(A)- samples, 3.3% and 3.8% of the genome, respectively, was covered by at least one read (Additional file 2), and 0.8% and 1.2%, respectively, by at least two reads. In contrast, in the poly(A)+ samples, 5.5% and 6.8% of the genome was covered by at least one read (Additional file 2) and 2.4% and 3.2% by at least two reads in H9 and HeLa cells, respectively. Note that because of rRNA depletion, size selection, and unique mapping, our poly(A)- data did not include rRNAs, abundant short RNAs (microRNAs, piwi-interacting RNAs (piRNAs), and small interfering RNAs (siRNAs)), tRNAs, snRNAs, many small nucleolar RNAs (snoRNAs), or repetitive transcripts such as the abundant Alu elements, long interspersed nuclear elements (LINEs), and endogenous long terminal repeats.
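The genome-coverage figures above reduce to counting positions whose read depth reaches a threshold. The Python sketch below builds a per-base depth track from read start positions using a difference array and then reports the fraction of positions covered by at least k reads. It assumes ungapped reads of fixed length (75 nucleotides by default, matching the read length used here) and 0-based start coordinates; spliced alignments are ignored, and the function names are hypothetical.

```python
# Hypothetical sketch: per-base depth from ungapped reads, then fraction of
# positions covered by at least k reads. Spliced alignments are ignored.
def coverage_from_reads(genome_length: int, read_starts: list[int], read_length: int = 75) -> list[int]:
    """Per-base read depth built with a difference array (0-based read starts)."""
    diff = [0] * (genome_length + 1)
    for start in read_starts:
        end = min(start + read_length, genome_length)  # clip reads at the sequence end
        diff[start] += 1
        diff[end] -= 1
    depth, running = [], 0
    for delta in diff[:genome_length]:
        running += delta
        depth.append(running)
    return depth


def fraction_covered(depth: list[int], min_reads: int) -> float:
    """Fraction of positions with read depth >= min_reads."""
    if not depth:
        raise ValueError("empty coverage track")
    return sum(1 for d in depth if d >= min_reads) / len(depth)


if __name__ == "__main__":
    # Toy 20-nt "genome" with three 5-nt reads (invented for illustration).
    depth = coverage_from_reads(20, [0, 3, 12], read_length=5)
    print(fraction_covered(depth, 1), fraction_covered(depth, 2))  # prints 0.65 0.1
```

The difference array keeps the cost linear in genome length plus read count, rather than touching every base of every read.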
Classification of poly(A)+, poly(A)-, and bimorphic transcripts
We next classified all expressed annotated transcripts into poly(A)+, poly(A)-, or bimorphic predominant subgroups according to their relative abundance, using BPKM values (bases per kilobase of gene model per million mapped bases; see Materials and methods) for each gene in the poly(A)+ and poly(A)- samples from the same cell line (Figure 1d). Poly(A)- predominant transcripts (for simplicity, termed 'poly(A)- transcripts' throughout this study) were defined as those with BPKM ≥1, P < 0.05, and at least two-fold greater enrichment in the poly(A)- library than in the poly(A)+ library. Conversely, poly(A)+ predominant transcripts ('poly(A)+ transcripts') were defined as those with BPKM ≥1, P < 0.05, and at least two-fold greater enrichment in the poly(A)+ library than in the poly(A)- library. Bimorphic predominant transcripts ('bimorphic transcripts') were defined as those with BPKM ≥1, P < 0.05, and less than a two-fold difference in expression between the poly(A)+ and poly(A)- libraries (Figure 1d). A number of apparently poly(A)- or bimorphic genes were discarded after manual examination because they had low or inconsistent expression patterns or contained alternative transcripts expressed from introns. For example, WDR74 was initially identified as a poly(A)- transcript, but the processed WDR74 mRNA is poly(A)+; the misclassification resulted from very high expression of an intronic poly(A)- small RNA, so we removed WDR74 from the poly(A)- list. Using the above criteria, we found that although most of the annotated expressed transcripts are poly(A)+ (84.2% in H9 cells and 74.2% in HeLa cells), a significant portion (13.1% in H9 cells and 23.3% in HeLa cells) are bimorphic. In addition, 2.7% and 2.5% of the annotated transcripts are poly(A)- in H9 and HeLa cells, respectively (Figure 1d). Full gene lists are available in Additional files 3 and 4.
It has previously been estimated that between 60% and 80% of transcripts are either poly(A)- or bimorphic [15, 16], a considerably higher proportion than we observed. This discrepancy may reflect the numerous technical and experimental differences between the previous studies and ours.
Validation of poly(A)- and poly(A)+ transcripts
We next examined several transcripts that are known to contain a poly(A) tail: ncl (nucleolin), ubb (ubiquitin B), and h2afz (H2A histone family, member Z). These mRNAs were enriched in the sequence data from the poly(A)+ samples for both cell lines (Figure 2d-f, grey and pink). As expected, semi-quantitative RT-PCR and qRT-PCR confirmed that 80 to 90% of each of these mRNAs was present in the poly(A)+ fraction in both H9 and HeLa cells (Figure 2d-f). In addition, one known polyadenylated lncRNA, the short isoform of neat1 [9, 21], was also significantly enriched in the poly(A)+ sample from HeLa cells (Additional file 7c,d; validation data not shown). Taken together, these validation experiments indicate that our method can successfully distinguish poly(A)+ from poly(A)- transcripts, allowing a thorough analysis of the transcriptome, including RNAs with different types of 3' ends.
Characterization of bimorphic transcripts
We next randomly selected several bimorphic mRNAs that are expressed in both cell types (cyclin G1, ccng1), uniquely in H9 cells (nuclear receptor subfamily 6, group A, member 1, nr6a1), or uniquely in HeLa cells (G protein-coupled receptor, family C, group 5, member A, gprc5a) (Additional files 8 and 9), and performed real-time RT-PCR to confirm their relative abundance in both RNA fractions. The results confirmed that each of the tested transcripts is present at comparable levels in the poly(A)+ and poly(A)- samples (Figure 3d). It will be of interest to investigate whether common structural features or sequence motifs regulate poly(A) tail length in these transcripts. For example, studies by Gu et al. indicated that the poly(A)-limiting element is a conserved cis-acting sequence that can regulate poly(A) tail length. They found several hundred sequences with poly(A) tails of <20 nucleotides in human cells and, consistent with the results of our gene ontology analysis, an extended family of ZNF transcription factors was overrepresented in this list. Because the precise 3'-processing sites of many of our bimorphic transcripts are uncertain (they do not match the annotated ends), it is not yet possible to compare our results directly with those of Gu et al.
In addition, because we classify transcripts according to their ability to bind oligo(dT), we cannot discriminate truly bimorphic transcripts, such as h2afx and neat1, from those whose poly(A) tails are shortened during normal transcript metabolism. Although it is not clear exactly how long a tail is necessary for retention on oligo(dT), or how long mRNAs persist once their tails are shortened, in our experiments many of these transcripts behave in the same way (low affinity for oligo(dT)) in both cell lines, and h2afx and neat1 are correctly classified as bimorphic under our selection criteria. On the other hand, some transcripts may contain genomically encoded A stretches that could cause some retention on oligo(dT). We therefore examined several known mRNAs of this type. The conserved human repetitive Alu elements contain long A stretches and are embedded in the 3' UTRs of many transcripts, such as nicn1, paics, pccb, and lin28 [24, 25]; however, almost all of these Alu element-containing transcripts were clearly classified as poly(A)+ in both cell lines. Therefore, transcripts with short encoded A stretches are unlikely to be retained on oligo(dT) under our conditions.
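Whether a transcript carries a genomically encoded A stretch, for example within an embedded Alu element, can be screened by measuring the longest uninterrupted run of A in its 3' UTR. The Python sketch below does only that. The 20-nucleotide cutoff, the function names, and the toy sequences are hypothetical choices for illustration rather than criteria used in this study, and the sketch says nothing about how strongly such a stretch would actually bind oligo(dT).

```python
import re


def longest_a_run(seq: str) -> int:
    """Length of the longest uninterrupted run of A in a DNA or RNA sequence."""
    runs = re.findall(r"A+", seq.upper())
    return max((len(run) for run in runs), default=0)


def flag_encoded_a_stretches(utrs: dict[str, str], min_len: int = 20) -> list[str]:
    """Names of transcripts whose 3' UTR encodes an A run of at least min_len (hypothetical cutoff)."""
    return sorted(name for name, utr in utrs.items() if longest_a_run(utr) >= min_len)


if __name__ == "__main__":
    # Invented toy UTRs; tx1 carries a 25-nt encoded A stretch.
    toy_utrs = {"tx1": "GC" + "A" * 25 + "GU", "tx2": "GCAAAGUAAAAAC"}
    print(flag_encoded_a_stretches(toy_utrs))  # prints ['tx1']
```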
Without additional experimental support it is difficult to predict how many of the classified transcripts truly exist as two distinct populations, but the information we provide here represents a comprehensive list of abundant, potentially bimorphic transcripts.
Incomplete transcripts do not significantly affect the population of bimorphic transcripts
Characterization of poly(A)- transcripts
Stable excised introns are a new class of long non-coding RNAs
Interestingly, a number of stable excised introns were discovered by manually analyzing our data on the UCSC genome browser. These excised introns were observed in the poly(A)- RNA samples from both H9 and HeLa cells, and therefore could represent a new class of lncRNAs lacking poly(A) tails (Figure 5a,d,e; Additional file 13). Figure 5d shows one example of the excised 16th intron of the azi1 (5-azacytidine induced 1) mRNA (EI-azi1). EI-azi1 accumulates in both H9 and HeLa cells and is only detected in the poly(A)- RNA samples. Figure 5e and Additional file 13 offer a representative list of such highly abundant excised introns from a variety of intron regions in different mRNAs. These abundant, stable excised introns are of different lengths and most can be detected in both tested cell lines. It is well known that the vast majority of excised introns are rapidly degraded after debranching. We do not yet know whether these represent introns that are inefficiently debranched, or whether their accumulation results from specific cis-elements or the association with stabilizing proteins.
In addition to excised introns, we also observed the curious accumulation of several specific exons from internal regions of genes (Additional file 14). In the cases shown, one or two adjacent exons are extremely abundant in the poly(A)- RNA samples, whereas the neighboring exons are not. Again, this occurs in samples from both cell lines. Although the mechanisms that generate these RNAs are unknown, further studies will address their biogenesis and whether these excised introns and exons have specific subcellular locations or biological functions.
Specific expression of a group of histone genes in hESCs
The majority of histone genes are expressed as replication-dependent, poly(A)- transcripts. Although most histone mRNAs are expressed in all somatic cells, different cell types have been found to express alternative histones [26–29]. More importantly, several recent observations suggest that the state of chromatin in undifferentiated stem cells differs considerably from that of differentiated cells: these cells show a more diffuse and 'hyperdynamic' heterochromatin structure, and some histone modifications on their chromatin are likely to be bivalent. Further, pluripotency may be coupled to a unique cell cycle program characterized by rapid proliferation and a truncated G1 phase [32–34]. As a result, these cells spend more than half of the entire cell cycle in S phase and may lack a G1/S checkpoint. Since histone expression is mechanistically coupled to S-phase progression, it is perhaps not surprising to find distinct histone expression in pluripotent cells. Strikingly, however, we found that at least ten poly(A)- histone transcripts are preferentially expressed in H9 cells compared with HeLa cells (fold change >10, P < 0.05), and one poly(A)- histone transcript is preferentially expressed in HeLa cells (Figure 6a,b; Additional files 16 and 17). In contrast, all poly(A)+ histone transcripts are expressed at comparable levels in both cell types (Figure 6a), although at much lower levels than the poly(A)- histone transcripts, consistent with their replication-independent expression [27, 29].
While H9 cells express a number of histone genes that are poorly expressed in HeLa cells, it is important to note that, with the exception of two histone H1 variants (hist1h1b and hist1h1d), all of these genes express proteins that are identical or nearly identical to histones expressed from other loci (data not shown). This suggests that undifferentiated H9 cells may simply require a higher dosage of replication-dependent histone gene expression in order to maintain rapid growth and self-renewal properties. However, the expression of distinct histone H1 variants may be important for the maintenance of the unique chromatin status of these cells. In addition, since some of these replication-dependent histones are expressed from the same gene clusters (Additional file 17), it will be of interest to determine how specific histone gene transcription is regulated in the different cell lines.
Finally, we examined the expression of the hESC-specific histone transcripts described above during differentiation of H9 and H14 cells. We treated hESCs with bone morphogenetic protein 4 (BMP4), which induces trophoblast lineage differentiation [25, 35, 36], and found that expression of these histone transcripts was significantly diminished upon differentiation (Figure 6c). For example, 3 days after BMP4 treatment of H14 cells, the stem cell marker genes oct3/4 and lin28 were still expressed and the trophoblast marker gene hcgβ was just beginning to be expressed, yet we already observed a significant reduction in the expression of hist1h3i and hist1h3j (Figure 6c, lanes 1 and 2). After prolonged (6 days) BMP4 treatment, expression of all of the hESC-specific histone RNAs was reduced to almost undetectable levels in H9 cells (Figure 6c, lanes 3 and 4). We note, however, that 6 days after BMP4 induction the differentiating cells grew slowly, which by itself could reduce replication-dependent histone expression. We therefore took a complementary approach to address whether specific histone expression is connected to pluripotency. Consistent with a specific pattern of histone gene expression in pluripotent cells, we observed a similar pattern of hESC-specific histone gene expression upon reprogramming of human IMR90 fibroblasts (Figure 6c, lanes 5 and 6). The hESC-specific histone mRNAs were expressed at extremely low levels in the precursor human diploid IMR90 cells, whereas their expression increased significantly upon reprogramming to induced pluripotent stem (iPS) cells. Taken together, these observations suggest that this group of histone transcripts might serve as a novel set of sensitive pluripotency markers.
As these histone transcripts are not abundantly expressed in other dividing cells such as HeLa cells and primary IMR90 cells, the possibility exists that these specific histones are functionally connected to the unique chromatin status of undifferentiated stem cells.
We have used deep sequencing to explore the repertoire of both poly(A)+ and poly(A)- RNAs from two standard cell lines, HeLa cells and H9 hESCs. This work provides a resource not only for discovery but also for the study of many novel aspects of gene regulation. We found that while the majority of transcripts are poly(A)+, a significant portion are either poly(A)- or bimorphic. Our sequencing data not only show that a number of mRNAs involved in many important biological processes may contain short poly(A) tails (Figures 3 and 5), but also provide a useful tool to visualize transcripts showing 5' or 3' end enrichment (Figure 4; Additional files 7 and 12). Furthermore, we identified excised introns as a new class of stable non-polyadenylated lncRNAs (Figure 5d,e; Additional file 13). Finally, in addition to identifying poly(A)- mRNAs and non-coding RNAs, we found that a specific subset of poly(A)- histone mRNAs is expressed in undifferentiated hESCs and rapidly diminished upon differentiation (Figure 6); these same histone genes are induced upon reprogramming of fibroblasts to iPS cells. In conclusion, we offer a rich source of data that allows a deeper exploration of the poly(A)- landscape of the eukaryotic transcriptome. This approach can also be applied to the analysis of the poly(A)- transcriptomes of other model organisms.
Materials and methods
Cell culture and differentiation
HeLa cells were cultured under standard conditions. The hESC lines H9 and H14 and the iPS cell lines were maintained on Matrigel-coated plates (BD Biosciences, Bedford, MA, USA) in either defined mTeSR medium (StemCell Technologies Inc., Vancouver, BC, Canada) or medium conditioned by irradiated mouse embryonic fibroblasts, supplemented with 4 ng/ml human basic fibroblast growth factor (Life Technologies, Inc., Grand Island, NY, USA) [25, 35, 36]. IMR90 cells at passages 2 to 6 were used in this study. For trophoblast differentiation, hESCs were treated with 100 ng/ml BMP4 (R&D Systems, Minneapolis, MN, USA) in the presence of conditioned medium and basic fibroblast growth factor for the indicated number of days [25, 35, 36]. Human iPS (IMR90) cell lines were generated from IMR90 precursor cells, verified at the UConn Stem Cell Core [37, 38], and confirmed positive for Tra-1-81, Tra-1-60, SSEA-3, and SSEA-4 by immunofluorescence and by teratoma formation. Pluripotent cell cultures were evaluated for Oct3/4 expression every 3 to 4 weeks, and cells were passaged every 6 to 7 days.
Poly(A)+ and poly(A)- RNA separation
Total RNAs were prepared using Trizol Reagent (Life Technologies, Carlsbad, CA, USA). After treatment with DNase I (DNA-free kit; Ambion, Austin, TX, USA), total RNAs were incubated with oligo(dT) magnetic beads to isolate either poly(A)+ RNAs, which were bound to beads, or poly(A)- RNAs, which were present in the flowthrough after incubation. Oligo(dT) magnetic bead selection was performed three times to ensure pure poly(A)+ or poly(A)- populations. The poly(A)- RNA population was further processed with the RiboMinus kit (Human/Mouse Module, Invitrogen, Carlsbad, CA, USA) to deplete most of the abundant ribosomal RNAs (Figure 1).
All RNA-Seq libraries were prepared using the Illumina mRNA-Seq Sample Prep Kits (P/N 1004814) according to the manufacturer's instructions. Briefly, poly(A)- or poly(A)+ RNAs were fragmented using divalent cations at elevated temperature and reverse transcribed with random hexamers to obtain double-stranded cDNA fragments, which were end-repaired and 5' phosphorylated. After addition of 'A' bases to the 3' ends, Illumina adaptor oligonucleotides were ligated to the cDNA fragments, and fragments of approximately 300 bp were isolated from an agarose gel, PCR amplified, and gel purified. After quantification with a NanoDrop spectrophotometer, the cDNA libraries were individually loaded onto flowcells for cluster generation (version 2) and sequenced on an Illumina Genome Analyzer IIx using a 75-cycle single-read protocol with v3 chemistry. All sequence files can be accessed from the NCBI Sequence Read Archive through Gene Expression Omnibus accession number [GEO:GSE24399].
The poly(A)+ and poly(A)- sequence reads were uniquely aligned to the human hg19 genome and splice junction index using Bowtie [18, 19], allowing up to two mismatches. Wiggle track files were generated from the Bowtie output files with a custom bowtie2wiggle script, and correlations among samples were calculated with MATLAB. Replicate lanes were then concatenated and viewed on the UCSC genome browser, with each track scaled to normalized read density for comparison. Because the wiggle tracks of the concatenated poly(A)+ samples for both cell lines exceeded the UCSC genome browser upload limit, for visualization we compared a single poly(A)+ sequencing dataset from each cell line with the concatenated poly(A)- sample from that cell line; all data were nevertheless used for analysis. Normalized gene expression levels were determined in units of BPKM for all 27,297 annotated genes (hg19, 2009, UCSC) using wig_integrator.pl (Additional files 3 and 4). BPKM is the sum (integral) of per-base coverage, that is, of the wiggle track, over the interval defined by a given annotated feature (exon, transcript, or gene), normalized by the total number of aligned bases and by the length of the feature.
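The BPKM definition above can be written as a short function: the summed coverage over the feature, divided by the feature length in kilobases and by the total aligned bases in millions. This Python sketch is a hypothetical re-implementation of that stated formula, not the wig_integrator.pl script itself; it assumes the feature is supplied as non-overlapping, 0-based, half-open exon intervals on a per-base coverage array.

```python
# Hypothetical re-implementation of the BPKM formula stated above; this is not
# the wig_integrator.pl script. Intervals are 0-based, half-open [start, end)
# and assumed not to overlap (e.g. merged exons of one gene).
def bpkm(depth: list[int], intervals: list[tuple[int, int]], total_aligned_bases: int) -> float:
    """Bases per kilobase of feature per million aligned bases."""
    covered_bases = sum(sum(depth[s:e]) for s, e in intervals)  # integral of the wiggle track
    feature_length = sum(e - s for s, e in intervals)
    if feature_length <= 0 or total_aligned_bases <= 0:
        raise ValueError("feature length and total aligned bases must be positive")
    return covered_bases / (feature_length / 1e3) / (total_aligned_bases / 1e6)


if __name__ == "__main__":
    # Toy example: a 1-kb, two-exon feature uniformly covered at depth 2,
    # in a library with 2 million aligned bases (invented numbers).
    depth = [2] * 2000
    print(bpkm(depth, [(0, 500), (1000, 1500)], 2_000_000))  # prints 1000.0
```

Because BPKM integrates base coverage rather than counting reads, it is insensitive to how reads are split across lanes, which fits the concatenation step described above.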
Genes were classified into subgroups according to their 3' end structures using several parameters: BPKM values for expression level, the fold change of poly(A)- versus poly(A)+ reads, and the P-value of the fold change, determined by a Wald test with a custom Perl script.
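The exact statistic implemented in the custom Perl script is not described here, so the following Python sketch swaps in a standard choice as an explicit assumption: a Wald test on the log ratio of two Poisson rates, using read counts and library sizes, with the delta-method standard error. Positive scores indicate poly(A)- enrichment, matching the sign convention of the subgroup definitions. Function names and toy numbers are hypothetical.

```python
import math


# Assumed statistic: Wald test on the log ratio of two Poisson rates. The
# study's custom Perl script is not described in enough detail to reproduce.
def wald_log_ratio(count_minus: int, total_minus: int, count_plus: int, total_plus: int) -> float:
    """Wald z for log(rate_minus / rate_plus); positive z = poly(A)- enriched."""
    if min(count_minus, count_plus) <= 0:
        raise ValueError("both counts must be positive for a log-ratio Wald test")
    log_ratio = math.log((count_minus / total_minus) / (count_plus / total_plus))
    standard_error = math.sqrt(1.0 / count_minus + 1.0 / count_plus)  # delta-method SE
    return log_ratio / standard_error


def two_sided_p(z: float) -> float:
    """Two-sided normal P-value; |z| > 1.96 corresponds to P < 0.05."""
    return math.erfc(abs(z) / math.sqrt(2.0))


if __name__ == "__main__":
    # Toy gene: 200 reads in each library, but the poly(A)- library is 4x smaller.
    z = wald_log_ratio(200, 5_000_000, 200, 20_000_000)
    print(round(z, 2), two_sided_p(z) < 0.05)  # prints 13.86 True
```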
Poly(A)- predominant subgroup
For each gene in this subgroup, the BPKM value from a poly(A)- sample must be ≥1, the fold change of the BPKM value of poly(A)- versus the BPKM value of poly(A)+ must be ≥2, and the P-value of fold change must be <0.05 (Wald score >1.96).
Poly(A)+ predominant subgroup
For each gene in this subgroup, the BPKM value from the poly(A)+ sample must be ≥1, the fold change of the BPKM value of poly(A)- versus the BPKM value of poly(A)+ must be ≤0.5, and the P-value of fold change must be <0.05 (Wald score <-1.96).
Bimorphic predominant subgroup
For each gene in this subgroup, the BPKM value from the poly(A)+ sample or poly(A)- sample must be ≥1, the fold change of the BPKM value of poly(A)- versus the BPKM value of poly(A)+ must be between 0.5 and 2, and the P-value of the fold change must be <0.05 (Wald score >1.96 or <-1.96).
Subgroup with low expression and/or low significant changes
For each gene in this subgroup, the BPKM value is <1 and/or the P-value of the fold change is >0.05 (-1.96 < Wald score < 1.96). This group also included genes with no uniquely aligned reads under the conditions used in this study. This group was not analyzed further.
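Taken together, the subgroup rules above amount to a small decision function. The Python sketch below applies them as written, including the requirement that bimorphic genes also have |Wald score| > 1.96. The function name and returned labels are hypothetical, and the Wald score is assumed to be computed beforehand, with positive values indicating poly(A)- enrichment.

```python
# Hypothetical encoding of the subgroup rules above. wald_z is assumed to be
# precomputed, with positive values meaning enrichment in the poly(A)- library.
def classify(bpkm_minus: float, bpkm_plus: float, wald_z: float) -> str:
    """Assign one gene (in one cell line) to a 3'-end subgroup."""
    # Fold change of poly(A)- over poly(A)+ BPKM; infinite if poly(A)+ BPKM is 0.
    fold = bpkm_minus / bpkm_plus if bpkm_plus > 0 else float("inf")
    if bpkm_minus >= 1 and fold >= 2 and wald_z > 1.96:
        return "poly(A)- predominant"
    if bpkm_plus >= 1 and fold <= 0.5 and wald_z < -1.96:
        return "poly(A)+ predominant"
    if max(bpkm_minus, bpkm_plus) >= 1 and 0.5 < fold < 2 and abs(wald_z) > 1.96:
        return "bimorphic predominant"
    return "low expression and/or not significant"


if __name__ == "__main__":
    # Invented BPKM values and Wald scores for four toy genes.
    for values in [(10, 2, 5.0), (1, 10, -4.0), (5, 4, 2.5), (5, 4, 1.0)]:
        print(values, classify(*values))
```

Checking the poly(A)- and poly(A)+ rules first makes the open interval 0.5 < fold < 2 for bimorphic genes unambiguous at the boundaries.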
RT-PCR and/or qRT-PCR were performed on independent poly(A)- and poly(A)+ enriched samples from different cell lines for validation. Isolated RNA samples were resuspended in equal volumes of DEPC-treated H2O, and 1 μg of each sample was reverse transcribed into cDNA using SuperScript II (Invitrogen) and random hexamers. As a control, cDNA was also synthesized from the same amount of unfractionated (total) RNA. For consistency, all semi-quantitative PCRs were run for either 26 or 28 cycles (depending on the relative abundance of the specific transcript in the transcriptome) to visualize differences between fractions. For real-time PCR, the relative abundance of each tested transcript in the poly(A)- or poly(A)+ enriched sample was normalized to total RNA. We assumed that total RNA equals poly(A)+ RNA plus poly(A)- RNA; therefore, if the signal in poly(A)- RNA was two-fold that in poly(A)+ RNA, it would account for two-thirds of the signal from total RNA. Primer sets are listed in Additional file 18.
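The two-thirds example above follows from the additivity assumption: a two-fold poly(A)- signal gives 2/(2 + 1) of the total. The Python sketch below makes that arithmetic explicit. Converting Ct values with the 2^-ΔCt method is an added assumption (it presumes roughly 100% amplification efficiency, and the quantification model used here is not stated), and the function names and Ct values are hypothetical.

```python
# Hypothetical sketch of the fraction calculation described above. Converting
# Ct values with 2^-dCt assumes ~100% amplification efficiency; the study does
# not state which quantification model was used.
def relative_quantity(ct_fraction: float, ct_total: float) -> float:
    """Abundance in a fraction relative to total RNA (2^-dCt method)."""
    return 2.0 ** -(ct_fraction - ct_total)


def fraction_split(signal_minus: float, signal_plus: float) -> tuple[float, float]:
    """Share of the transcript in the poly(A)- and poly(A)+ fractions,
    assuming total = poly(A)- + poly(A)+."""
    total = signal_minus + signal_plus
    if total <= 0:
        raise ValueError("at least one fraction must have a positive signal")
    return signal_minus / total, signal_plus / total


if __name__ == "__main__":
    # Toy Ct values: the poly(A)- signal is two-fold the poly(A)+ signal.
    minus = relative_quantity(20.0, 20.0)  # 1.0
    plus = relative_quantity(21.0, 20.0)   # 0.5
    share_minus, share_plus = fraction_split(minus, plus)
    print(round(share_minus, 3), round(share_plus, 3))  # prints 0.667 0.333
```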
Abbreviations
- BMP: bone morphogenetic protein
- BPKM: bases per kilobase of gene model per million mapped bases
- hESC: human embryonic stem cell
- iPS: induced pluripotent stem
- lncRNA: long non-coding RNA
- poly(A)- RNAs: non-polyadenylated RNAs
- poly(A)+ RNAs: polyadenylated RNAs
- snRNA: small nuclear RNA
- UCSC: University of California, Santa Cruz
- ZNF: zinc finger proteins
We thank the UCHC Translational Genomics Core for use of the Illumina Genome Analyzer. H9 and H14 cells were obtained from the WiCell Research Institute and the CT Stem Cell Core Facility. This work was supported by grant 0925347 from the National Science Foundation to GGC, grant XDA01010206 from the "Strategic Priority Research Program" of the Chinese Academy of Sciences, grant 2011CBA01105 from the National Basic Research Program of China, and awards from the State of Connecticut under the Connecticut Stem Cell Research Grants Program to LLC, GGC and BRG. Its contents are solely the responsibility of the authors and do not necessarily represent the official views of the State of Connecticut, the Department of Public Health of the State of Connecticut, or Connecticut Innovations, Inc.
- Moore MJ, Proudfoot NJ: Pre-mRNA processing reaches back to transcription and ahead to translation. Cell. 2009, 136: 688-700. 10.1016/j.cell.2009.02.001.
- Manley JL, Proudfoot NJ, Platt T: RNA 3'-end formation. Genes Dev. 1989, 3: 2218-2244. 10.1101/gad.3.12b.2218.
- Pan Q, Shai O, Lee LJ, Frey BJ, Blencowe BJ: Deep surveying of alternative splicing complexity in the human transcriptome by high-throughput sequencing. Nat Genet. 2008, 40: 1413-1415. 10.1038/ng.259.
- Wang ET, Sandberg R, Luo S, Khrebtukova I, Zhang L, Mayr C, Kingsmore SF, Schroth GP, Burge CB: Alternative isoform regulation in human tissue transcriptomes. Nature. 2008, 456: 470-476. 10.1038/nature07509.
- Li JB, Levanon EY, Yoon JK, Aach J, Xie B, Leproust E, Zhang K, Gao Y, Church GM: Genome-wide identification of human RNA editing sites by parallel DNA capturing and sequencing. Science. 2009, 324: 1210-1213. 10.1126/science.1170995.
- Wilhelm BT, Marguerat S, Watt S, Schubert F, Wood V, Goodhead I, Penkett CJ, Rogers J, Bahler J: Dynamic repertoire of a eukaryotic transcriptome surveyed at single-nucleotide resolution. Nature. 2008, 453: 1239-1243. 10.1038/nature07002.
- Marzluff WF, Wagner EJ, Duronio RJ: Metabolism and regulation of canonical histone mRNAs: life without a poly(A) tail. Nat Rev Genet. 2008, 9: 843-854. 10.1038/nrg2438.
- Wilusz JE, Freier SM, Spector DL: 3' end processing of a long nuclear-retained non-coding RNA yields a tRNA-like cytoplasmic RNA. Cell. 2008, 135: 919-932. 10.1016/j.cell.2008.10.012.
- Sunwoo H, Dinger ME, Wilusz JE, Amaral PP, Mattick JS, Spector DL: MEN epsilon/beta nuclear-retained non-coding RNAs are up-regulated upon muscle differentiation and are essential components of paraspeckles. Genome Res. 2009, 19: 347-359. 10.1101/gr.087775.108.
- Milcarek C, Price R, Penman S: The metabolism of a poly(A) minus mRNA fraction in HeLa cells. Cell. 1974, 3: 1-10. 10.1016/0092-8674(74)90030-0.
- Salditt-Georgieff M, Harpold MM, Wilson MC, Darnell JE: Large heterogeneous nuclear ribonucleic acid has three times as many 5' caps as polyadenylic acid segments, and most caps do not enter polyribosomes. Mol Cell Biol. 1981, 1: 179-187.
- Katinakis PK, Slater A, Burdon RH: Non-polyadenylated mRNAs from eukaryotes. FEBS Lett. 1980, 116: 1-7. 10.1016/0014-5793(80)80515-1.
- Gu H, Das Gupta J, Schoenberg DR: The poly(A)-limiting element is a conserved cis-acting sequence that regulates poly(A) tail length on nuclear pre-mRNAs. Proc Natl Acad Sci USA. 1999, 96: 8943-8948. 10.1073/pnas.96.16.8943.
- Meijer HA, Bushell M, Hill K, Gant TW, Willis AE, Jones P, de Moor CH: A novel method for poly(A) fractionation reveals a large population of mRNAs with a short poly(A) tail in mammalian cells. Nucleic Acids Res. 2007, 35: e132. 10.1093/nar/gkm830.
- Cheng J, Kapranov P, Drenkow J, Dike S, Brubaker S, Patel S, Long J, Stern D, Tammana H, Helt G, Sementchenko V, Piccolboni A, Bekiranov S, Bailey DK, Ganesh M, Ghosh S, Bell I, Gerhard DS, Gingeras TR: Transcriptional maps of 10 human chromosomes at 5-nucleotide resolution. Science. 2005, 308: 1149-1154. 10.1126/science.1108625.
- Wu Q, Kim YC, Lu J, Xuan Z, Chen J, Zheng Y, Zhou T, Zhang MQ, Wu CI, Wang SM: Poly A-transcripts expressed in HeLa cells. PLoS One. 2008, 3: e2803. 10.1371/journal.pone.0002803.
- Cui P, Lin Q, Ding F, Xin C, Gong W, Zhang L, Geng J, Zhang B, Yu X, Yang J, Hu S, Yu J: A comparison between ribo-minus RNA-sequencing and polyA-selected RNA-sequencing. Genomics. 2010, 96: 259-265. 10.1016/j.ygeno.2010.07.010.
- Langmead B, Trapnell C, Pop M, Salzberg SL: Ultrafast and memory-efficient alignment of short DNA sequences to the human genome. Genome Biol. 2009, 10: R25. 10.1186/gb-2009-10-3-r25.
- Brooks AN, Yang L, Duff MO, Hansen KD, Park JW, Dudoit S, Brenner SE, Graveley BR: Conservation of an RNA regulatory map between Drosophila and mammals. Genome Res. 2010, 21: 193-202. 10.1101/gr.108662.110.
- Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B: Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods. 2008, 5: 621-628. 10.1038/nmeth.1226.
- Hutchinson JN, Ensminger AW, Clemson CM, Lynch CR, Lawrence JB, Chess A: A screen for nuclear transcripts identifies two linked non-coding RNAs associated with SC35 splicing domains. BMC Genomics. 2007, 8: 39. 10.1186/1471-2164-8-39.
- Bailey TL, Elkan C: Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proceedings of the Second International Conference on Intelligent Systems for Molecular Biology; August 14-17, 1994: Stanford, California. 1994, Menlo Park, California: AAAI Press, 28-36.
- Multiple EM for Motif Elicitation. [http://meme.nbcr.net/meme4_4_0/intro.html]
- Chen LL, DeCerbo JN, Carmichael GG: Alu element-mediated gene silencing. EMBO J. 2008, 27: 1694-1705. 10.1038/emboj.2008.94.
- Chen LL, Carmichael GG: Altered nuclear retention of mRNAs containing inverted repeats in human embryonic stem cells: Functional role of a nuclear non-coding RNA. Mol Cell. 2009, 35: 467-478. 10.1016/j.molcel.2009.06.027.
- Doenecke D, Albig W, Bode C, Drabent B, Franke K, Gavenis K, Witt O: Histones: genetic diversity and tissue-specific gene expression. Histochem Cell Biol. 1997, 107: 1-10. 10.1007/s004180050083.
- Ausio J: Histone variants - the structure behind the function. Brief Funct Genomic Proteomic. 2006, 5: 228-243. 10.1093/bfgp/ell020.
- Izzo A, Kamieniarz K, Schneider R: The histone H1 family: specific members, specific functions? Biol Chem. 2008, 389: 333-343. 10.1515/BC.2008.037.
- Henikoff S, Ahmad K: Assembly of variant histones into chromatin. Annu Rev Cell Dev Biol. 2005, 21: 133-153. 10.1146/annurev.cellbio.21.012704.133518.
- Meshorer E, Yellajoshula D, George E, Scambler PJ, Brown DT, Misteli T: Hyperdynamic plasticity of chromatin proteins in pluripotent embryonic stem cells. Dev Cell. 2006, 10: 105-116. 10.1016/j.devcel.2005.10.017.
- Bernstein BE, Mikkelsen TS, Xie X, Kamal M, Huebert DJ, Cuff J, Fry B, Meissner A, Wernig M, Plath K, Jaenisch R, Wagschal A, Feil R, Schreiber SL, Lander ES: A bivalent chromatin structure marks key developmental genes in embryonic stem cells. Cell. 2006, 125: 315-326. 10.1016/j.cell.2006.02.041.
- Fujii-Yamamoto H, Kim JM, Arai K, Masai H: Cell cycle and developmental regulations of replication factors in mouse embryonic stem cells. J Biol Chem. 2005, 280: 12976-12987. 10.1074/jbc.M412224200.
- White J, Dalton S: Cell cycle control of embryonic stem cells. Stem Cell Rev. 2005, 1: 131-138. 10.1385/SCR:1:2:131.
- Becker KA, Ghule PN, Therrien JA, Lian JB, Stein JL, van Wijnen AJ, Stein GS: Self-renewal of human embryonic stem cells is supported by a shortened G1 cell cycle phase. J Cell Physiol. 2006, 209: 883-893. 10.1002/jcp.20776.
- Xu RH, Chen X, Li DS, Li R, Addicks GC, Glennon C, Zwaka TP, Thomson JA: BMP4 initiates human embryonic stem cell differentiation to trophoblast. Nat Biotechnol. 2002, 20: 1261-1264. 10.1038/nbt761.
- Xu RH, Peck RM, Li DS, Feng X, Ludwig T, Thomson JA: Basic FGF and suppression of BMP signaling sustain undifferentiated proliferation of human ES cells. Nat Methods. 2005, 2: 185-190. 10.1038/nmeth744.
- Takahashi K, Tanabe K, Ohnuki M, Narita M, Ichisaka T, Tomoda K, Yamanaka S: Induction of pluripotent stem cells from adult human fibroblasts by defined factors. Cell. 2007, 131: 861-872. 10.1016/j.cell.2007.11.019.
- Zeng H, Park JW, Guo M, Lin G, Crandall L, Compton T, Wang X, Li XJ, Chen FP, Xu R: Lack of ABCG2 expression and side population properties in human pluripotent stem cells. Stem Cells. 2009, 27: 2435-2445. 10.1002/stem.192.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.