Expanded identification and characterization of mammalian circular RNAs
© Guo et al.; licensee BioMed Central Ltd 2014
Received: 9 April 2014
Accepted: 29 July 2014
Published: 29 July 2014
The recent reports of two circular RNAs (circRNAs) with strong potential to act as microRNA (miRNA) sponges suggest that circRNAs might play important roles in regulating gene expression. However, the global properties of circRNAs are not well understood.
We developed a computational pipeline to identify circRNAs and quantify their relative abundance from RNA-seq data. Applying this pipeline to a large set of non-poly(A)-selected RNA-seq data from the ENCODE project, we annotated 7,112 human circRNAs that were estimated to comprise at least 10% of the transcripts accumulating from their loci. Most circRNAs are expressed in only a few cell types and at low abundance, but they are no more cell-type-specific than are mRNAs with similar overall expression levels. Although most circRNAs overlap protein-coding sequences, ribosome profiling provides no evidence for their translation. We also annotated 635 mouse circRNAs, and although 20% of them are orthologous to human circRNAs, the sequence conservation of these circRNA orthologs is no higher than that of their neighboring linear exons. The previously proposed miR-7 sponge, CDR1as, is one of only two circRNAs with more miRNA sites than expected by chance, with the next best miRNA-sponge candidate deriving from a gene encoding a primate-specific zinc-finger protein, ZNF91.
Our results provide a new framework for future investigation of this intriguing topological isoform while raising doubts regarding a biological function of most circRNAs.
Many classes of non-protein-coding RNAs (ncRNAs) exist in cells ,, and members of each class play important roles in either regulating gene expression or other biological processes –. For example, microRNAs (miRNAs) pair to sites within messenger RNAs (mRNAs) to target the mRNAs for translational repression and/or mRNA destabilization . In an intriguing elaboration of this regulatory pathway, the activity of the mammalian miR-7 miRNA can be inhibited by CDR1as/ciRS-7, which is in turn targeted by another miRNA, miR-671, which shows near-perfect complementarity and triggers endonucleolytic cleavage of CDR1as –. CDR1as is a circular RNA (circRNA) deriving from an antisense transcript of the CDR1 protein-coding gene . With >60 conserved sites for miR-7, CDR1as is thought to act as a sponge to titrate miR-7 from its other targets ,. A second circRNA proposed to act as a sponge is the testis-specific transcript of the male sex-determining gene Sry, which contains 16 sites for miR-138 . Because circRNAs lack poly(A) tails and 5′ termini, they would escape the deadenylation, decapping and degradation normally caused by miRNA association , an obvious advantage for an RNA acting as a miRNA sponge ,.
To begin to consider potential roles of circRNAs in post-transcriptional regulation, we developed a computational pipeline that identifies circRNAs from long-read RNA-seq data without relying on gene annotations. The pipeline resembled that reported previously , except it quantifies and considers the abundance of each circular isoform with respect to its alternative linear isoforms. Applying this pipeline to the non-poly(A)-selected RNA-seq data from the ENCODE project, we catalogued >7,000 human circRNAs and characterized their global properties, acquiring new insights regarding their biogenesis, the cell-type specificity of their expression, the extent to which they are conserved, the extent to which they are translated and their potential to act as miRNA sponges.
Properties of human circRNAs
To identify circRNAs from RNA-seq data, we developed the following computational pipeline (Figure 1B). We first mapped all the RNA-seq reads to the genome using Bowtie in single-end mode, allowing ≤2 mismatches. Then we used BLAT to find partial alignment of the unmapped reads. Dual alignments of two read segments mapping to the genome in the reversed order were indicative of circRNAs. The circular fraction (that is, the fraction of the circular isoform relative to all transcripts from the same locus) was quantified for each circRNA candidate by counting relevant reads from the same sample. We performed circRNA identification and quantification using all the currently available whole-cell non-poly(A)-selected RNA-seq data from the ENCODE project , which included a large variety of cultured cell types (Table S1A in Additional file 1). As in some previous studies ,, our pipeline used the assembled genome for sequence alignment but disregarded its annotations, and thus it was not affected by incomplete or inaccurate genome annotations and was not biased in favor of alternative isoforms of pre-mRNAs.
circRNAs produced from back-splicing would be expected to have splicing signals at their junctions. Introns spliced by the major spliceosome usually contain the GU dinucleotide at their 5′ end (the splice donor) and the AG dinucleotide at their 3′ end (the splice acceptor) . Indeed, when we analyzed all the dinucleotide frequencies in 10-nucleotide genomic windows mapping to each observed circular junction, a vast majority of candidate circular junctions contained the GT dinucleotide within 5 nucleotides of the putative donor end and the AG dinucleotide within 5 nucleotides of the putative acceptor end (Figure 1C; Figure S1A in Additional file 2). Moreover, a search for motifs within 10-nucleotide genomic windows flanking the circular junctions recaptured the canonical sequence motifs of splice donors and acceptors (Figure S1B in Additional file 2). When considering the minority of candidates without GT-AG-flanking junctions, no pronounced dinucleotide enrichment or significant motif was observed (Figure S1A,B in Additional file 2).
Reasoning that for biological circRNAs a higher fraction of the transcript isoforms might be circular, as is the case for CDR1as, for which almost no linear isoform could be detected ,, we calculated for each candidate the fraction of its transcript isoforms that were circular and compared the circular fractions of groups of circRNA candidates with different flanking dinucleotide signatures. The circular fractions of GT-AG-flanking candidates tended to be greater than those of the remaining candidates, with the circular fractions of most non-GT-AG-flanking candidates falling below 1% (Figure 1D). To test the extent to which the minor spliceosome might contribute to circRNA formation, we examined the distribution of circular fractions for AT-AC-flanking candidates, but observed no difference from the other non-GT-AG-flanking candidates (Figure 1D).
Collectively, these results indicated that back-splicing by the major spliceosome generates most, if not all, cellular circRNAs. Candidates without these splicing signals were more likely to have arisen from sequencing artifacts (such as chimeric RNA-seq reads resulting from template switching during reverse transcription or PCR), which justified the filter for GT-AG splicing signals imposed in previous pipelines . To maximize the specificity of our pipeline, we carried forward only those candidates flanked by the GT-AG splicing signals, recognizing the possibility that a few candidates discarded by this filter might be authentic circRNAs generated by mechanisms that do not involve the spliceosome, as shown in Archaea . As a second quality filter, we also required that each circRNA have a circular fraction ≥10% in two or more samples. This requirement filtered out about two-thirds of the circRNAs in each sample. With these filters, we annotated 7,112 circRNAs from 39 biological samples representing a large variety of human cell lines (Table S2A in Additional file 3).
Assuming that each circRNA had the same exon structure as the current GENCODE annotation at its locus, we found that most circRNAs spanned <5 exons (Figure 1E), with the distribution of exon abundance resembling that reported for the other GENCODE-annotated ncRNA genes in the human genome . The distribution of circRNA exonic sequence lengths also resembled that of ncRNAs, with a median length of 547 nucleotides, compared with 566 and 2,149 nucleotides for ncRNAs and mRNAs, respectively (Additional file 4). More than half of the circRNAs consisted of only protein-coding exons (Figure 1F), whereas smaller fractions also contained 5′ untranslated regions (UTRs), 3′ UTRs, or both. CDR1as was among the 68 circRNAs that mapped antisense to annotated protein-coding genes. Another 67 circRNAs mapped to annotated long intervening ncRNAs (lincRNAs) , and 342 mapped between annotated genes, with no sense or antisense overlap.
Because many circRNAs contained multiple exons (Figure 1E) and previous studies have noticed retained introns in a few circRNAs ,, we more systematically examined whether introns within circRNAs were efficiently removed. We started by mapping all the mate reads of the circular junction-spanning reads in the CD34+ hematopoietic progenitor cells sample. If intra-circular splicing did not occur, most of the mate reads would be expected to map to the first upstream or downstream intron from the back-spliced donor or acceptor, respectively (Figure 1G). We found that approximately 80% of the mates reads that did not map to the same exons as the circular junctions mapped to their neighboring exons, indicating that introns within circRNAs were usually spliced out, although a substantial fraction (approximately 20%) were retained (Figure 1G).
Comparison with previous circRNA catalogs
When comparing our circRNA catalog with those of previous studies, we found that most annotated circRNAs were present in only one catalog (Additional file 5), presumably because of differences in cell types, cutoffs and computational pipelines. A key difference between our catalog and those of others was our requirement that the circRNAs have a circular fraction ≥10%, which prompted us to examine the extent to which this filter explained the differences between our catalog and those of others. For each catalog, we randomly selected one cell type used to build the catalog and quantified the circular fraction of the circRNAs identified in that cell type by the corresponding study, using non-poly(A)-selected RNA-seq data of that cell type. Due to our circular-fraction filter, all the circRNAs from our study had circular fractions of ≥10% (Additional file 5). About half of the circRNAs identified by the Memczak et al. study  had circular fractions of ≥10%, whereas less than 10% of the circRNAs from the other two studies, which used either RNase R-treated  or poly(A)-depleted RNA-seq data  to enrich for circRNAs, had circular fractions ≥10%.
Trans-splicing rarely contributed to back-spliced junctions
Although analysis of mate reads would have identified more trans-spliced products if many members of our catalog were in fact trans-spliced and not circular, this analysis presumably missed evidence of trans-splicing in cases for which the exonic distance between the trans-spliced acceptor and donor was too large to exclude any mate reads, which was the case for most circRNAs (Additional file 4). As an orthogonal approach for discriminating between back-spliced and trans-spliced products we considered their polyadenylation status . Poly(A) selection should deplete circRNAs but not trans-spliced products, which are linear and thus expected to have poly(A) tails (Figure 2A). Indeed, using data from U2OS cells, which were independent of the data we used for circRNA discovery, we found that of the 598 members of our catalog detected through junction-spanning reads in non-poly(A)-selected RNA-seq data, only 20 were detected in poly(A)-selected RNA-seq data, as indicated by circular fractions exceeding zero for only these 20 members in the poly(A)-selected data (Figure 2D). Moreover, only six members of our catalog were detected in the poly(A)-selected data but not the non-poly(A)-selected data. The 20 detected in both datasets presumably include both trans-spliced products and circRNAs from loci that also produce trans-spliced isoforms. These observations, in conjunction with the lack of translation across the circular junctions (see below), indicated that trans-splicing contributed very few (<5%) false positives in our circRNA catalog, despite a previous study reporting that shuffled splice isoforms are predominantly trans-spliced products . We attribute our high specificity to our use of non-poly(A)-selected samples for circRNA identification (whereas the previous report started with poly(A)-selected samples) and our requirement that the circular fraction exceeded 10% in at least two samples. These results are consistent with previous studies showing that circRNAs are non-polyadenylated  or RNase R-resistant ,.
Expression of circRNAs
We next examined the cell type specificity of circRNA expression. The 39 biological samples varied in the number of detectable circRNAs (Figure 3C). Although 1,500 to 3,000 circRNAs passed our cutoffs in most cell types, some cell types (for example, HFDPCs (follicle dermal papilla cells)) had approximately three times more circRNAs in the final catalog than others (for example, HAoECs (thoracic aortic endothelial cells)) (Figure 3C). This variation could not be explained by the differences in sequencing depths (Additional file 6).
Although some circRNAs (including CDR1as) were more ubiquitously expressed, most were found in only a few cell types (Figure 3D). To assess whether circRNAs were any more cell type specific than their linear counterparts, we compared the Jensen-Shannon specificity scores  of circRNAs with those of a cohort of linearly spliced exon pairs with the same distribution of expression levels (that is, the same distribution of total junction-spanning reads) as the circRNA set. The expression of circular junctions was not more cell type-specific than that of the control cohort of linear junctions (Figure 3E), and the expression of both was less cell type-specific than that of lincRNAs . To test whether the efficiency of circularization might be regulated in a cell-type-specific manner, we examined the circular fractions of 1,299 circRNAs for which the availability of both the donor and the acceptor sites were each supported by ≥5 reads in all 39 samples. The circular fractions of these circRNAs were nearly as correlated between cell types (median Spearman’s ρ = 0.60 to 0.75) (Figure 3F) as they were between biological replicates (median Spearman’s ρ = 0.75). Taken together, our results suggested that circRNA expression is not any more regulated than expected from the availability of the primary transcripts. We compiled a list of 57 circRNAs, including CDR1as, for which the circular fraction was ≥50% in most cell types in which transcript isoforms were detected (Table S2B in Additional file 3).
To examine their subcellular localization, we quantified the circular fractions of circRNAs in each of the subcellularly fractionated K562 samples, focusing on the 514 circRNAs detected in the K562 whole-cell samples (Additional file 7). Consistent with previous results on a few circRNAs ,, most of these circRNAs were predominantly in the poly(A)-depleted cytoplasmic samples.
Conservation of circRNAs between human and mouse
The derivation of most circRNAs from coding exons complicates analysis of sequence conservation that might provide evidence for sequence-dependent biological function of the circular isoforms. A previous analysis of 223 circRNAs that both derive from coding exons and have orthologous circular isoforms in mouse reported elevated conservation levels in the third nucleotide positions of codons when compared to a control cohort of linear coding exons that were chosen to match the conservation levels at the first and second codon positions . We were able to reproduce these results using the previous list of circRNAs and found that the elevated conservation at the third codon positions was robust when compared with 1,000 different control cohorts (Figure S7A in Additional file 10). Applying this analysis to our list of 130 human circRNAs with mouse orthologs also indicated elevated conservation of the third codon positions (Figure S7A in Additional file 10). Following up on this result, we compared the nucleotide conservation of coding exons within circRNAs to their neighboring linear coding exons, reasoning that the neighboring linear exons would better control for transcript expression levels as well as other unanticipated factors that might correlate with circRNA identification. When using these alternative controls, we did not detect significantly elevated conservation in the third codon positions for either the previous list of circRNAs (Figure S7B in Additional file 10) or our new list (Figure 4E), which argued against the notion that sequence-dependent noncoding functions are enriched within circRNAs.
No evidence for translation of circRNAs
The potential of other circRNAs to act as miRNA sponges
Because the AGO2-crosslinking sites were determined in HEK293 cells, circRNAs and miRNAs not expressed in HEK293 cells were missed by this analysis. We thus concatenated the annotated exons within each circRNA, and counted the number of canonical 7- and 8-nucleotide target sites  for each of the 87 miRNA families conserved across vertebrates. Again, CDR1as ranked on top, containing 71 miR-7 sites (Figure 6C). CDR1as-miR-7 was also the only circRNA-miRNA pair that exceeded the upper limit of results from the negative control, in which the analysis was repeated with permutated miRNA sequences (Figure 6C). We conclude that among the human circRNAs, CDR1as stands alone as the most compelling miRNA sponge for any conserved miRNA seed family.
Our analysis of miRNA site number also pointed to circRNAs from the repeat-rich C2H2 zinc finger (ZNF) gene family (Figure 6D). In particular, a circRNA generated from the ZNF91 locus (circRNA-ZNF91) contains 24 miR-23 sites (Figure 6E), 19 of which were 8-nucleotide sites. These numbers exceeded that of the other proposed miRNA sponge, mouse Sry, which has 16 miR-138 sites . ZNF91 belongs to a C2H2 zinc finger (ZNF) gene family that is greatly expanded in the primate lineage and known to contain exceptionally abundant target sites for several miRNA families, including miR-23, miR-181 and miR-199 . The next nine ZNF circRNAs ranked by the total number of sites for these three miRNA families had 7 to 15 sites to one of the 3 families (Figure 6D). Expanding our miRNA site search beyond the 87 miRNA families conserved beyond mammals to the 66 miRNA families conserved only within the mammalian lineage (Figure S9A in Additional file 12), we found that circRNA-ZNF91 had 39 additional sites for miR-296 (Figure 6E). CDR1as also had 22 sites for the miR-876-5p/3167 family (Figure S9B in Additional file 12), although they were not as conserved as the miR-7 sites.
Because molecular studies of eukaryotic RNA typically begin with poly(A)-selection, circRNAs have often escaped detection and consideration. Our study adds to previous circRNA annotation efforts ,,, to yield an expanded catalog of circRNAs robustly detected from a large variety of human cell types. Our circRNA identification method resembles that previously used ,, except we focused our analyses on the circRNA loci with circular fractions ≥10%. Other recent studies take a more targeted approach and search for back-spliced junctions from annotated splice sites , and therefore miss the unannotated genes and exons, especially those that have particularly high circular fractions and are rarely found in the poly(A)+ RNA-seq data, such as CDR1as. Moreover, unlike previous studies that identify circRNAs from poly(A)-depleted RNA-seq data ,, we applied our pipeline to non-poly(A)-selected RNA-seq data, which were neither depleted nor enriched in circRNAs or their linear isoforms. An advantage of using these datasets is that we could directly estimate circular fractions without experimental calibration .
With this catalog of 7,112 human circRNAs in hand, the key question is whether they comprise an underappreciated class of molecules with cellular functions, or whether they are largely inert side-products of imperfect pre-mRNA splicing. The circRNA with the most compelling evidence for a biological function is the miR-7 sponge, CDR1as. Although a biological context has not yet been identified in which CDR1as loss-of-function influences miR-7 activity, this circRNA has >60 conserved sites to miR-7 and a developmental phenotype following its ectopic delivery ,. The other circRNA proposed to act as a miRNA sponge, mouse Sry, has only one miR-138 site in its human homolog, which indicates that the proposed sponge function is not conserved in mammals.
What about functional potential of the other 7,000-plus circRNAs? By characterizing the molecular abundance and translation of circRNAs and providing an updated perspective on their sequence conservation and potential to act as miRNA sponges, our analyses can speak to this question. Although we found thousands of circRNAs in each cell type, only approximately 2% (20 to 60, depending on the cell type) had circular fractions exceeding 50%, which indicates that most were minor alternative isoforms of their respective primary transcripts. Moreover, fewer than 10% had FPKMs ≥10 in any of the 39 samples examined. Considering that in homogeneous cell types one molecule per cell usually corresponds to an FPKM of 1 to 4 , most circRNAs only accumulated to a few molecules per cell. This generally low circular fraction and weak accumulation was observed despite the expectation that each circRNA, by virtue of its exonuclease insusceptibility, might persist in the cell much longer than its linear alternative isoforms. Such low accumulation would not be expected of molecules that titrate miRNAs or other abundant regulators away from their regulatory targets. Indeed, we find few circRNAs with the properties expected of miRNA sponges. When circRNAs are experimentally enriched by either poly(A)-depletion  or RNase R digestion , tens of thousands of more circRNAs are found, even when limiting the search to only those that use annotated splice sites. Many of these low-abundance circRNAs have zero junction-spanning reads when we searched in the non-poly(A)-selected RNA-seq data, in which circRNAs were neither enriched nor depleted (Additional file 5). Perhaps it is not too far-fetched to speculate that all multi-exon genes generate one or more circular isoforms at low frequencies, whereas circularization of CDR1as is specific and efficient in all cell types in which it is expressed.
To have a physiological effect at such low levels, circRNAs would need to either participate in a catalytic process or interact very specifically with other molecules that have important functions when present at very low cellular levels. For example, mRNAs have physiological effects when present at only a few molecules per cell because they participate in the catalytic process of translation, which can produce many protein molecules from each mRNA molecule. However, we found that circRNAs are rarely translated. Some linear lincRNAs are proposed to interact with and modulate the output of a single genomic locus, which would explain their physiological effect despite their relatively low cellular abundance . Likewise, a rare circRNA could conceivably recognize and regulate a rare mRNA. However, a specific, high-affinity interaction with an mRNA or other rare cellular component would presumably rely on the circRNA sequence, which would need to be conserved to retain its function over evolutionary time, yet we found no evidence for circRNA sequence conservation beyond that observed for neighboring linear exons.
We suspect that CDRas is not the only circRNA with an evolutionarily conserved biological function. This being said, our observations that most circRNAs 1) are inefficiently produced relative to their linear alternative isoforms, 2) accumulate to only low levels in the cell, and 3) are no more conserved than their neighboring linear exons, when considered together, suggest that most circRNAs may be inconsequential side-products of imperfect pre-mRNA splicing. For linear alternative-spliced isoforms, preferential production of orthologous isoforms in the same tissues of different species is considered evidence of function ,. For circular isoforms, this type of analysis would require non-poly(A)-selected datasets from the same tissues of different species, which unfortunately are not yet available. For now, the only observation consistent with the idea that many circRNAs could be functional is our finding that the loci that produce circRNAs in mouse also tend to do so in humans. However, retention of circRNA production since the last common ancestor of mouse and human could have other causes apart from selection for circRNA function. For example, slowed splicing at the circRNA acceptor would presumably favor circRNA production because it would allow for transcription of the downstream donor, and if this slowed splicing is conserved for reasons other than circRNA function, then the production of circRNAs might nonetheless be conserved. Therefore, considering the conserved production of circRNAs as evidence against the idea that the vast majority of circRNAs are inert splicing side-products would require a more thorough understanding of the determinants of circRNA biogenesis.
Mammalian cells produce a large number of circRNAs, which have captured the interest of many biologists, particularly after the description of CDR1as and its many conserved sites to miR-7. Our work identifies thousands of additional circRNAs and focuses on those that have circular fractions ≥10%. Unlike CDR1as, most of the previously and newly identified mammalian circRNAs represent alternatively spliced, low-abundance isoforms of protein-coding genes. Expression of circRNAs is generally not more cell-type-specific than mRNAs with similar overall expression levels. Although orthologous circRNAs were found between mouse and human, their sequence conservation is no higher than that of their neighboring linear exons, and no other identified circRNA is expected to function as a miRNA sponge nearly as effectively as CDR1as. Although some circRNAs with biological functions might exist, our results suggest that a large majority of circRNAs are inconsequential side-products of pre-mRNA splicing.
Materials and methods
circRNA identification and quantification
Human and mouse Ribo-Zero RNA-seq data were downloaded from either the ENCODE project or Gene Expression Omnibus (GEO). For each sample, Fastq reads were first mapped to hg19 or mm9 genome by Bowtie, allowing 2 mismatches. After removing PCR-duplicated reads by FASTX toolkit, all the unmapped reads were then aligned by BLAT (no mismatch or gap allowed). Dual alignments of two complimentary segments within a single read mapping to two regions on the same chromosome in the reverse order and no more than 100 kb away from each other were selected as circular-junction candidates. Next, GT and AG dinucleotides were searched for within 10 nucleotides genomic windows flanking the donor and acceptor end of each junction, respectively. Candidates with GT-AG-flanking junctions were carried forward, and the GT-AG dinucleotides were used to identify the precise splice sites. For human circRNAs, each junction required support from at least two independent reads within the sample.
To quantify the relative ratio of circular and linear isoforms, we focused on the two segments (20 nucleotides upstream from the donor and 20 nucleotides downstream from the acceptor) flanking the circular junction. Because many linear isoforms may exist for a given splice site, we took an inclusive approach and simply counted the reads that contained either of these two sequences and have enough sequence space for the other sequence (ndonor and nacceptor), and the reads that spanned the circular junction and contained both sequences (njunction). The circular fraction is calculated as njunction / (ndonor + nacceptor – njunction + 1). To be accepted into the final circRNA catalog, a circRNA candidate must have a circular fraction ≥ 10% in at least two samples.
One-to-one gene ortholog tables for gene-level analysis were downloaded from Ensembl . For exon-level analysis, human circRNA junction coordinates were converted to mouse (mm9) genome coordinates using the UCSC liftOver tool, then intersected with mouse circRNA junctions using BEDTools. To calculate the correlation of average circular fractions of circRNA orthologs, circular fractions of each circRNA in all cell types wherein it was expressed (≥1 read for each of the donor and acceptor ends) were averaged. Spearman’s rank correlation test was performed.
Analysis of translation
Twenty-nucleotide sequences were taken from circular junctions and each of the two linear junctions overlapping the circular junctions (10 nucleotides from each side of each junction). Numbers of reads containing each of these sequences, as well as the circular fractions for each circRNA, were compared using RNA-seq and RPF data from human U2OS cells.
miRNA and protein binding sites
PAR-CLIP data were downloaded from the GEO. After read alignment by Bowtie, binding clusters were identified using PARalyzer with default settings . Cluster densities of all circular exons were calculated and compared to those of their linear neighboring exons. To avoid biases, only coding exons were considered. To quantify miRNA targets sites, exonic segments within each circRNA were concatenated using the transcript models built from all ENCODE cytosolic RNA-seq data, and numbers of canonical miRNA sites (7mer-A1, 7mer-m8, and 8mer sites)  for the 87 miRNA families conserved across vertebrates and 66 miRNA families conserved across mammals were quantified for each circRNA. To estimate the distribution of sites expected by chance, the procedure was repeated using 1,000 cohorts consisting of 87 or 66 control k-mers. To select a control k-mer, each 8mer site was randomly permuted to preserve its mononucleotide composition. Permutated sequences were chosen if they preserved the CG dinucleotide number and possessed an A at the 3′-most nucleotide. Collectively, these constraints served to select control k-mers with similar expected genome-wide abundance.
RNA-seq and RPF data of human U2OS cells have been deposited in GEO under accession number GSE51584.
fragments per kilobase of transcript per million fragments sequenced
Gene Expression Omnibus
long intervening non-coding RNA
ribosome protected fragment
We thank C. Burge, S. Eichhorn, I. Ulitsky and O. Rissland for helpful discussions and suggestions. This work was supported by NIH grant GM067031 (D.P.B.), and a National Science Foundation Graduate Research Fellowship (V.A.). J.U.G. is a Damon Runyon Fellow supported by the Damon Runyon Cancer Research Foundation (DRG-2152-13). H.G. was supported by the Agency for Science, Technology and Research, Singapore. D.P.B. is an investigator of the Howard Hughes Medical Institute.
- Djebali S, Davis CA, Merkel A, Dobin A, Lassmann T, Mortazavi A, Tanzer A, Lagarde J, Lin W, Schlesinger F, Xue C, Marinov GK, Khatun J, Williams BA, Zaleski C, Rozowsky J, Röder M, Kokocinski F, Abdelhamid RF, Alioto T, Antoshechkin I, Baer MT, Bar NS, Batut P, Bell K, Bell I, Chakrabortty S, Chen X, Chrast J, Curado J: Landscape of transcription in human cells. Nature. 2012, 489: 101-108. 10.1038/nature11233.PubMedPubMed CentralView ArticleGoogle Scholar
- Derrien T, Johnson R, Bussotti G, Tanzer A, Djebali S, Tilgner H, Guernec G, Martin D, Merkel A, Knowles DG, Lagarde J, Veeravalli L, Ruan X, Ruan Y, Lassmann T, Carninci P, Brown JB, Lipovich L, Gonzalez JM, Thomas M, Davis CA, Shiekhattar R, Gingeras TR, Hubbard TJ, Notredame C, Harrow J, Guigó R: The GENCODE v7 catalog of human long noncoding RNAs: analysis of their gene structure, evolution, and expression. Genome Res. 2012, 22: 1775-1789. 10.1101/gr.132159.111.PubMedPubMed CentralView ArticleGoogle Scholar
- Sabin LR, Delas MJ, Hannon GJ: Dogma derailed: the many influences of RNA on the genome. Mol Cell. 2013, 49: 783-794. 10.1016/j.molcel.2013.02.010.PubMedView ArticleGoogle Scholar
- Guttman M, Rinn JL: Modular regulatory principles of large non-coding RNAs. Nature. 2012, 482: 339-346. 10.1038/nature10887.PubMedPubMed CentralView ArticleGoogle Scholar
- Ulitsky I, Bartel DP: lincRNAs: genomics, evolution, and mechanisms. Cell. 2013, 154: 26-46. 10.1016/j.cell.2013.06.020.PubMedPubMed CentralView ArticleGoogle Scholar
- Batista PJ, Chang HY: Long noncoding RNAs: cellular address codes in development and disease. Cell. 2013, 152: 1298-1307. 10.1016/j.cell.2013.02.012.PubMedPubMed CentralView ArticleGoogle Scholar
- Bartel DP: MicroRNAs: target recognition and regulatory functions. Cell. 2009, 136: 215-233. 10.1016/j.cell.2009.01.002.PubMedPubMed CentralView ArticleGoogle Scholar
- Memczak S, Jens M, Elefsinioti A, Torti F, Krueger J, Rybak A, Maier L, Mackowiak SD, Gregersen LH, Munschauer M, Loewer A, Ziebold U, Landthaler M, Kocks C, Ie Noble F, Rajewsky N: Circular RNAs are a large class of animal RNAs with regulatory potency. Nature. 2013, 495: 333-338. 10.1038/nature11928.PubMedView ArticleGoogle Scholar
- Hansen TB, Jensen TI, Clausen BH, Bramsen JB, Finsen B, Damgaard CK, Kjems J: Natural RNA circles function as efficient microRNA sponges. Nature. 2013, 495: 384-388. 10.1038/nature11993.PubMedView ArticleGoogle Scholar
- Hansen TB, Wiklund ED, Bramsen JB, Villadsen SB, Statham AL, Clark SJ, Kjems J: miRNA-dependent gene silencing involving Ago2-mediated cleavage of a circular antisense RNA. EMBO J. 2011, 30: 4414-4422. 10.1038/emboj.2011.359.PubMedPubMed CentralView ArticleGoogle Scholar
- Huntzinger E, Izaurralde E: Gene silencing by microRNAs: contributions of translational repression and mRNA decay. Nat Rev Genet. 2011, 12: 99-110. 10.1038/nrg2936.PubMedView ArticleGoogle Scholar
- Salzman J, Gawad C, Wang PL, Lacayo N, Brown PO: Circular RNAs are the predominant transcript isoform from hundreds of human genes in diverse cell types. PLoS One. 2012, 7: e30733-10.1371/journal.pone.0030733.PubMedPubMed CentralView ArticleGoogle Scholar
- Danan M, Schwartz S, Edelheit S, Sorek R: Transcriptome-wide discovery of circular RNAs in Archaea. Nucleic Acids Res. 2012, 40: 3131-3142. 10.1093/nar/gkr1009.PubMedPubMed CentralView ArticleGoogle Scholar
- Jeck WR, Sorrentino JA, Wang K, Slevin MK, Burd CE, Liu J, Marzluff WF, Sharpless NE: Circular RNAs are abundant, conserved, and associated with ALU repeats. RNA. 2013, 19: 141-157. 10.1261/rna.035667.112.PubMedPubMed CentralView ArticleGoogle Scholar
- Salzman J, Chen RE, Olsen MN, Wang PL, Brown PO: Cell-type specific features of circular RNA expression. PLoS Genet. 2013, 9: e1003777-10.1371/journal.pgen.1003777.PubMedPubMed CentralView ArticleGoogle Scholar
- Capel B, Swain A, Nicolis S, Hacker A, Walter M, Koopman P, Goodfellow P, Lovell-Badge R: Circular transcripts of the testis-determining gene Sry in adult mouse testis. Cell. 1993, 73: 1019-1030. 10.1016/0092-8674(93)90279-Y.PubMedView ArticleGoogle Scholar
- Nigro JM, Cho KR, Fearon ER, Kern SE, Ruppert JM, Oliner JD, Kinzler KW, Vogelstein B: Scrambled exons. Cell. 1991, 64: 607-613. 10.1016/0092-8674(91)90244-S.PubMedView ArticleGoogle Scholar
- Sharp PA, Burge CB: Classification of introns: U2-type or U12-type. Cell. 1997, 91: 875-879. 10.1016/S0092-8674(00)80479-1.PubMedView ArticleGoogle Scholar
- Cabili MN, Trapnell C, Goff L, Koziol M, Tazon-Vega B, Regev A, Rinn JL: Integrative annotation of human large intergenic noncoding RNAs reveals global properties and specific subclasses. Genes Dev. 2011, 25: 1915-1927. 10.1101/gad.17446611.PubMedPubMed CentralView ArticleGoogle Scholar
- Al-Balool HH, Weber D, Liu Y, Wade M, Guleria K, Nam PL, Clayton J, Rowe W, Coxhead J, Irving J, Elliott DJ, Hall AG, Santibanez-Koref M, Jackson MS: Post-transcriptional exon shuffling events in humans can be evolutionarily conserved and abundant. Genome Res. 2011, 21: 1788-1799. 10.1101/gr.116442.110.PubMedPubMed CentralView ArticleGoogle Scholar
- Caudevilla C, Serra D, Miliar A, Codony C, Asins G, Bach M, Hegardt FG: Natural trans-splicing in carnitine octanoyltransferase pre-mRNAs in rat liver. Proc Natl Acad Sci U S A. 1998, 95: 12185-12190. 10.1073/pnas.95.21.12185.PubMedPubMed CentralView ArticleGoogle Scholar
- Gilbert WV: Alternative ways to think about cellular internal ribosome entry. J Biol Chem. 2010, 285: 29033-29038. 10.1074/jbc.R110.150532.PubMedPubMed CentralView ArticleGoogle Scholar
- Chen CY, Sarnow P: Initiation of protein synthesis by the eukaryotic translational apparatus on circular RNAs. Science. 1995, 268: 415-417. 10.1126/science.7536344.PubMedView ArticleGoogle Scholar
- Corcoran DL, Georgiev S, Mukherjee N, Gottwein E, Skalsky RL, Keene JD, Ohler U: PARalyzer: definition of RNA binding sites from PAR-CLIP short-read sequence data. Genome Biol. 2011, 12: R79-10.1186/gb-2011-12-8-r79.PubMedPubMed CentralView ArticleGoogle Scholar
- Hafner M, Landthaler M, Burger L, Khorshid M, Hausser J, Berninger P, Rothballer A, Ascano M, Jungkamp AC, Munschauer M, Ulrich A, Wardle GS, Dewell S, Zavolan M, Tuschl T: Transcriptome-wide identification of RNA-binding protein and microRNA target sites by PAR-CLIP. Cell. 2010, 141: 129-141. 10.1016/j.cell.2010.03.009.PubMedPubMed CentralView ArticleGoogle Scholar
- Kishore S, Jaskiewicz L, Burger L, Hausser J, Khorshid M, Zavolan M: A quantitative analysis of CLIP methods for identifying binding sites of RNA-binding proteins. Nat Methods. 2011, 8: 559-564. 10.1038/nmeth.1608.PubMedView ArticleGoogle Scholar
- Hafner M, Lianoglou S, Tuschl T, Betel D: Genome-wide identification of miRNA targets by PAR-CLIP. Methods. 2012, 58: 94-105. 10.1016/j.ymeth.2012.08.006.PubMedPubMed CentralView ArticleGoogle Scholar
- Schnall-Levin M, Rissland OS, Johnston WK, Perrimon N, Bartel DP, Berger B: Unusually effective microRNA targeting within repeat-rich coding regions of mammalian mRNAs. Genome Res. 2011, 21: 1395-1403. 10.1101/gr.121210.111.PubMedPubMed CentralView ArticleGoogle Scholar
- Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B: Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nat Methods. 2008, 5: 621-628. 10.1038/nmeth.1226.PubMedView ArticleGoogle Scholar
- Merkin J, Russell C, Chen P, Burge CB: Evolutionary dynamics of gene and isoform regulation in Mammalian tissues. Science. 2012, 338: 1593-1599. 10.1126/science.1228186.PubMedPubMed CentralView ArticleGoogle Scholar
- Barbosa-Morais NL, Irimia M, Pan Q, Xiong HY, Gueroussov S, Lee LJ, Slobodeniuc V, Kutter C, Watt S, Colak R, Kim T, Misguitta-Ali CM, Wilson MD, Kim PM, Odom DT, Frey BJ, Blencowe BJ: The evolutionary landscape of alternative splicing in vertebrate species. Science. 2012, 338: 1587-1593. 10.1126/science.1230612.PubMedView ArticleGoogle Scholar
- Vilella AJ, Severin J, Ureta-Vidal A, Heng L, Durbin R, Birney E: EnsemblCompara GeneTrees: complete, duplication-aware phylogenetic trees in vertebrates. Genome Res. 2009, 19: 327-335. 10.1101/gr.073585.107.PubMedPubMed CentralView ArticleGoogle Scholar
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.