Fig. 2 | Genome Biology

Fig. 2

From: Long non-coding RNAs display higher natural expression variation than protein-coding genes in healthy humans


LncRNAs not in public annotations show fewer mRNA-like features. a Distribution of 6,249 granulocyte de novo annotated lncRNA transcripts according to coverage by three commonly used public annotations (PA): RefSeq, GENCODE-v19, Cabili [14, 58, 59]. Known lncRNA loci contain two transcript types: ‘PA transcripts’ that show full exonic overlap with an annotated lncRNA transcript (32 %, 2,003 transcripts, dark gray), or ‘isoform not in PA’ transcripts that can share exons but contain one or more additional exons not present in public annotations (37 %, 2,331 transcripts, medium gray). New lncRNA loci contain 1,921 ‘not in PA’ transcripts (31 % of lncRNA transcripts identified in granulocytes, light gray). b An example of a publicly annotated lncRNA locus (GENCODE-v19 AC007950.1) that contains additional upstream exons not in PA, from sample D2-2_pa_100ss (Additional file 2B). The annotation identifies locus gra912 (thick green bar). The annotated lncRNA isoforms of locus gra912 with alternative transcription start sites (TSS) are shown underneath as gray lines (the shorter PA transcript is shown in black for comparison). c Granulocyte-specificity analysis. Bar plot shows the percentage of granulocyte-specific (purple) and not-specific (light gray) transcripts de novo annotated in granulocytes. Each bar shows the percentage of granulocyte-specific transcripts for each transcript class, while the dashed green line shows the percentage for all lncRNAs together. d Average expression level (RPKM) in granulocyte PolyA+ RNA-seq samples used for annotation. The median values are: all mRNA transcripts (blue): 6.14, all lncRNA transcripts (green dashed line): 0.65, lncRNA transcripts ‘in PA’ (dark gray): 1.00, lncRNA transcripts ‘isoform not in PA’ (medium gray): 0.68, lncRNA transcripts ‘not in PA’ (light gray): 0.47.
e PolyA+ enrichment of de novo granulocyte annotated transcripts, calculated as the ratio of a transcript's abundance in PolyA+ RNA to its abundance in total ribosome-depleted RNA. Transcript abundance (RPKM) is averaged among all PolyA+ RNA-seq samples or all total ribosome-depleted RNA-seq samples. Transcripts not detected in total RNA-seq data (average RPKM <0.2) were not analyzed. The median values are: all mRNA transcripts (blue): 2.62, all lncRNA transcripts (dashed green line): 1.56, lncRNA transcripts ‘in PA’ (dark gray): 1.80, lncRNA transcripts ‘isoform not in PA’ (medium gray): 1.54, lncRNA transcripts ‘not in PA’ (light gray): 1.29. f Splicing efficiency of de novo granulocyte annotated transcripts. Only transcripts with average RPKM >0.2 in 21 ribosome-depleted RNA-seq samples were analyzed, and the efficiency of the most efficiently spliced site in each transcript is plotted. The median values are: all mRNA transcripts: 99.02 %, all lncRNA transcripts: 88.13 %, lncRNA transcripts ‘in PA’: 87.18 %, lncRNA transcripts ‘isoform not in PA’: 90.90 %, lncRNA transcripts ‘not in PA’: 77.97 %. Remarks on box plots d, e, and f: each box plot displays the full population, but P values are calculated using the Mann–Whitney U test on equalized population sizes. *10⁻⁵ < P < 0.001, **10⁻¹⁰ < P < 10⁻⁵, ***P < 10⁻¹⁶. Green asterisks indicate the significance of the difference between mRNAs and all lncRNAs (only the median level is plotted as a dashed green line). Outliers are not displayed.
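The PolyA+ enrichment described for panel e can be illustrated with a minimal Python sketch. It assumes per-transcript RPKM values are already available as plain lists; the function name `polya_enrichment`, the transcript IDs, and the toy numbers are hypothetical and are not taken from the article. Only the averaging across samples, the ratio, and the 0.2 RPKM detection cutoff come from the caption.

```python
from statistics import mean, median

# Detection cutoff stated in the caption: transcripts with an average
# RPKM below 0.2 in total RNA-seq data are not analyzed.
MIN_TOTAL_RPKM = 0.2


def polya_enrichment(polya_rpkm, total_rpkm, min_total=MIN_TOTAL_RPKM):
    """Return {transcript_id: mean PolyA+ RPKM / mean total RPKM}.

    polya_rpkm and total_rpkm map a transcript ID to a list of RPKM
    values, one per sample (a hypothetical input layout).
    """
    enrichment = {}
    for tid, polya_values in polya_rpkm.items():
        total_values = total_rpkm.get(tid)
        if not total_values:
            continue  # no total RNA measurement for this transcript
        total_avg = mean(total_values)
        if total_avg < min_total:
            continue  # treated as not detected in total RNA
        enrichment[tid] = mean(polya_values) / total_avg
    return enrichment


# Toy values for illustration only.
polya = {"tx_a": [4.0, 6.0], "tx_b": [1.0, 1.0], "tx_c": [0.5, 0.5]}
total = {"tx_a": [2.0, 2.0], "tx_b": [0.5, 1.5], "tx_c": [0.1, 0.1]}

ratios = polya_enrichment(polya, total)
print(ratios)                   # tx_c is dropped by the detection cutoff
print(median(ratios.values()))  # median enrichment for this toy group
```

The per-class medians reported in the caption would correspond to calling `median` on the ratios of each transcript class separately.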
