Fig. 1 | Genome Biology

Fig. 1

From: Non-coding RNAs underlie genetic predisposition to breast cancer

Identification of mencRNAs from breast cancer GWAS risk regions.

a Schematic of the RNA CaptureSeq experimental design. Oligonucleotide probes were tiled across intronic and intergenic regions within 1.5-Mb intervals surrounding breast cancer risk regions (capturing ~138 Mb, or 4.3% of the human genome). The probes were hybridized to cDNAs from breast-derived cell lines and tissues, resulting in capture and enrichment of low-abundance transcripts in the target regions, which were then sequenced. The sequencing reads were de novo assembled, mapped, and quantified.

b The number of transcripts captured from each RNA CaptureSeq library. The libraries included nine breast-derived cell lines, four breast tumor (BT) samples, and four normal breast (NB) samples. Four non-captured (NC) libraries were also sequenced.

c Distribution of mencRNA transcript length. Pooled captured transcripts from all libraries were binned by transcript length.

d Hierarchical clustering of RNA CaptureSeq libraries based on mencRNA expression profiles. ER-positive breast cancer cell lines and tumors are shown in red, ER-negative breast cancer cell lines are shown in blue, and normal breast cell lines and tissues are shown in black. NC non-captured, NB normal breast, BT breast tumor. The y-axis of the dendrogram represents a distance measure between the clusters.

e Expression distribution of captured mencRNA transcripts versus protein-coding transcripts. Multi-exonic captured transcripts with max. FPKM ≥ 0.5 were mapped in TCGA RNA-Seq data, and their average expression across the TCGA tumors was compared to that of GENCODE protein-coding genes. The x-axis shows expression as log2 (average FPKM); the y-axis shows the frequency of transcripts with a given expression value.

f Principal component analysis (PCA) of captured transcripts in TCGA normal breast and matched tumor samples. Scaled, centred, and normalized expression of the captured transcripts was analyzed for the first (x-axis; PC1) and second (y-axis; PC2) principal components. Each dot represents the expression profile of an individual sample.

g PCA of the captured transcripts in different PAM50 breast cancer subtypes.

h Comparison of tissue-specific expression of captured mencRNA versus protein-coding transcripts. Multi-exonic captured transcripts with max. FPKM ≥ 0.5 and GENCODE protein-coding genes were mapped in TCGA RNA-Seq data for primary tumors from seven different cancer types. For each gene, the tissue specificity index (Tau) was calculated, where 0 indicates broad expression and 1 indicates tissue-specific expression. The y-axis represents the frequency of transcripts with a given Tau value.
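The Tau index in panel h can be illustrated with a minimal sketch. The caption does not state how Tau was computed, so the code below assumes the commonly used definition, Tau = Σ(1 − x_i / x_max) / (n − 1), where x_i is a gene's expression in tissue i and n is the number of tissues. The function name `tau_index` and the example values are hypothetical and are not taken from the article; any normalization or log transformation applied by the authors is not reproduced here.

```python
import math


def tau_index(expression):
    """Tissue specificity index (Tau) for one gene.

    `expression` holds one non-negative expression value per tissue
    (for example, average FPKM in each of seven cancer types).
    Assumed definition: sum(1 - x_i / x_max) / (n - 1).
    Returns a value near 0 for broadly expressed genes and near 1
    for genes expressed in a single tissue.
    """
    values = [float(v) for v in expression]
    n = len(values)
    if n < 2:
        raise ValueError("Tau needs expression values from at least two tissues")
    if any(v < 0 for v in values):
        raise ValueError("expression values must be non-negative")

    x_max = max(values)
    if x_max == 0:
        # Gene is not expressed anywhere; Tau is undefined.
        return math.nan

    return sum(1.0 - v / x_max for v in values) / (n - 1)


# Hypothetical profiles across seven tissues.
broad = tau_index([2.0] * 7)              # same level everywhere
specific = tau_index([0.0] * 6 + [5.0])   # expressed in one tissue only
```

Under this definition, a gene expressed at the same level in all seven tissues scores 0, and a gene detected in only one tissue scores 1, matching the interpretation given for panel h.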
