A single molecule, single cell RNA FISH survey of lncRNAs in three human cell types
To characterize the abundance and localization patterns of lncRNAs in the three different cell types, we studied 61 lncRNAs systematically selected to span a range of parameters (Figure 1a) using single molecule RNA FISH. Specifically, we manually curated a candidate set of 61 lncRNAs for screening (Figure 1; Additional files 1 and 2) such that: (1) the lncRNAs in our set are significantly expressed in at least one of human foreskin fibroblasts (hFFs), human lung fibroblasts (hLFs), or HeLa cells, the target cell lines for our study; (2) the lncRNAs span a wide range of expression levels and tissue specificity (Additional file 1: Figure S1; Additional file 2); (3) the set includes a subset of 43 lncRNAs that have an expressed syntenic ortholog in mouse; and (4) the set includes a subset of 16 lincRNAs that are transcribed divergently to a neighboring mRNA (within 10 kb). These criteria and subsets are not mutually exclusive (Figure 1b). Finally, we included 16 previously studied lncRNAs as a point of reference. We also included two different groups of mRNA controls (Additional file 3; 34 in total): (1) nine mRNAs transcribed divergently to the ‘divergent lncRNAs’ in this study, together with the cyclin CCNA2 as a marker of cell cycle; and (2) 24 mRNAs that span a wide range of expression levels in hFF (Padovan-Merhar and Raj, personal communication).
To visualize single lncRNA molecules directly inside of cells, we used an established protocol for single molecule RNA FISH [24], where we design 10 to 48 complementary DNA oligonucleotides, each 20 bases long and labeled with a single fluorophore at its 3′ end (Figure 1a). When these probes hybridize to a single RNA molecule, the concentration of so many fluorophores at a single location renders the RNA molecule detectable by fluorescence microscopy. When applied to mRNAs, this method has typically been proven highly specific, as signal is only detectable when a large fraction of the probe set hybridizes to the target [24], and is highly accurate as gauged by quantitative polymerase chain reaction (qPCR) [44-48]. We successfully designed probe sets for 61 lncRNAs in hFFs, hLFs, and HeLa cells (Methods; Additional file 3), 53 of which yielded a detectable signal in at least one cell type. In all of the hybridizations we performed, we co-stained for CCNA2 mRNA, a cyclin whose transcripts are present only in S/G2/M, thus providing us with cell cycle information for the cells we imaged.
During the course of our investigations, we noticed that performing RNA FISH on lncRNAs presented a major challenge due to off-target binding of oligonucleotides. Even a single oligonucleotide binding to a highly abundant off-target RNA can lead to spurious signals, problems exacerbated by lncRNAs’ higher repeat content [39] (leading to more potential off-targets) and typically lower abundance than mRNAs [9] (making off-target binding more noticeable). For example, we noticed images of a particular lncRNA with similar localization patterns to MALAT1; however, removal of just one oligonucleotide from the probe pool with homology to MALAT1 resulted in complete loss of the dominant signal (Additional file 1: Figure S2a).
To control for these ‘rogue’ oligonucleotides with off-target signal, we used a two-color co-localization approach [23,24] in which we analyzed each lncRNA after partitioning its probe set into two subsets (‘even’ and ‘odd’ oligonucleotides), each labeled with a differently colored fluorophore (Figure 1a; Additional file 1: Figure S2b-d; Methods). If the oligonucleotides in the probe set were binding specifically, the signals from these two subsets should largely co-localize (for example, Figure 1a middle; Additional file 1: Figure S2b), with the number of co-localized spots roughly equaling those obtained from the full probe set (‘quantitative consistency’; Figure 1a right; Additional file 1: Figure S2d). If a single oligonucleotide hybridizes to a highly abundant off target, we would see the signal only in either the odd or even channel (see for example Figure 1a right or Additional file 1: Figure S2c for an ‘invalid’ probe set targeting). Note that for mRNA, the presence of nuclear bright foci of off-target signal is less of a concern than for lncRNA because they seldom display such bright foci without also exhibiting very large numbers of cytoplasmic RNA, whereas for lncRNA, we have found several examples for which the legitimate signal can take on this pattern (for example, Xist, Kcnq1ot1 [6,28]). We also observed cases in which the number of spots in the full probe set differed dramatically from the number of co-localized spots, potentially indicating some other non-specific background (‘quantitative inconsistency’, Figure 1a right; Additional file 1: Figure S2c).
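The co-localization check described above can be illustrated with a minimal sketch. It assumes that spots have already been detected in each channel and are given as (x, y) pixel coordinates; the function names, the greedy pairing strategy, and the 2-pixel distance threshold are hypothetical choices for illustration and are not taken from the Methods.

```python
import math


def colocalize(even_spots, odd_spots, max_dist=2.0):
    """Greedily pair spots from the two channels that lie within max_dist pixels.

    even_spots, odd_spots: lists of (x, y) coordinates.
    Returns a list of (even_index, odd_index) pairs; each spot is used at most once.
    The threshold max_dist is an illustrative assumption.
    """
    candidates = []
    for i, (ex, ey) in enumerate(even_spots):
        for j, (ox, oy) in enumerate(odd_spots):
            d = math.hypot(ex - ox, ey - oy)
            if d <= max_dist:
                candidates.append((d, i, j))
    candidates.sort()  # consider the closest candidate pairs first

    used_even, used_odd, pairs = set(), set(), []
    for _, i, j in candidates:
        if i not in used_even and j not in used_odd:
            used_even.add(i)
            used_odd.add(j)
            pairs.append((i, j))
    return pairs


def colocalized_fraction(even_spots, odd_spots, max_dist=2.0):
    """Fraction of all spots (both channels) that have a partner in the other channel."""
    total = len(even_spots) + len(odd_spots)
    if total == 0:
        return 0.0
    return 2 * len(colocalize(even_spots, odd_spots, max_dist)) / total


# Hypothetical spot coordinates: two true co-localized molecules plus one
# unmatched spot in each channel (the kind of signal a rogue oligonucleotide
# could produce).
even = [(10.0, 10.0), (50.0, 42.0), (80.0, 15.0)]
odd = [(10.5, 9.5), (50.0, 43.0), (200.0, 200.0)]
pairs = colocalize(even, odd)
frac = colocalized_fraction(even, odd)
```

In this toy example, a low co-localized fraction, or a co-localized count far from the count obtained with the full probe set, would flag the probe set for closer inspection.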
Using the ‘two-color co-localization’ validation, we eliminated 19 probe sets from further analysis, as they had major qualitative or quantitative differences in the two-color co-localization assay, underscoring the importance of testing for off-target effects for lncRNA FISH (Figure 1a; Additional file 1: Figure S2d-e and Figure S21; Additional file 4). Another eight probe sets had no discernible signal in any of the three examined cell types. We were unable to attribute the cases of no detectable signal or co-localization inconsistencies to a low number of oligonucleotides, and observed only a very slight bias toward lower abundance lncRNAs (Kruskal-Wallis one-way analysis of variance P <8.4×10⁻³; Additional file 1: Figure S3). Importantly, our validation approach was required in each cell type investigated, as some probes were valid in one cell type but not in another (Additional file 1: Figure S4). Upon further checking for quantitative consistency (Methods; Additional file 1: Figure S1a, Figure S2e, Figure S21; Additional file 4), we were left with 70 lncRNA-cell type pairs with valid signal, corresponding to 34 unique lncRNAs (Additional file 4; Additional file 1: Figure S22). Altogether, we acquired over 2,000 images in three to five separate fluorescence channels, with two to three biological replicates per gene-cell pair (the final analysis included 80, 24, and 28 cells per gene on average, for HeLa cells, hLFs, and hFFs, respectively).
lncRNAs exhibit a diversity of localization patterns composed of a few basic characteristics
We examined the cytoplasmic and nuclear localization of these 34 lncRNAs in the three cell types (70 lncRNA-cell type pairs) and observed a wide range of localization patterns (Figure 2; Additional file 1: Figure S5). These patterns consisted of combinations of a few basic features, including bright nuclear foci containing multiple RNAs, monodisperse single RNAs in the nucleoplasm, and monodisperse single RNAs in the cytoplasm. The bright nuclear foci also took a number of different forms: most consisted of a few tight puncta, but some exhibited a spatial delocalization, such as XIST, or many bright accumulations, such as MALAT1. We did not observe bright accumulations of lncRNA in the cytoplasm. These features did not manifest independently; for instance, the presence of nuclear foci was typically associated with more nuclear than cytoplasmic spots. Thus, we classified the lncRNAs into the following types (Methods; Additional file 5): (I) one or two large foci in the nucleus (nine pairs); (II) large nuclear foci and single molecules scattered through the nucleus (11 pairs); (III) predominantly nuclear, without foci (18 pairs); (IV) cytoplasmic and nuclear (28 pairs); and (V) predominantly cytoplasmic (four pairs). Validating our approach, 11 of the 12 lncRNAs previously imaged by RNA FISH [6,19,21,25,49-56] showed patterns that were consistent with previous reports (Additional file 3). These included the large nuclear foci previously observed for XIST and Kcnq1ot1 [6,7,51], localization of GAS5 to both the nucleus and cytoplasm [21] and the speckle- and para-speckle-like structures of MALAT1 and NEAT1, respectively [19,49].
The majority of lncRNAs (55% classified as class I to III; 38 lncRNA-cell type pairs) are predominantly in the nucleus (Additional file 1: Figure S3a and b; Methods; compared to 1/49 of mRNAs using the I to III classification criteria of more than 65% of molecules in the nucleus), with approximately 13% of lncRNA-cell type pairs mainly located in one or two large foci (type I). As noted, we also observed two distinct types of nuclear localization patterns: (1) localization to tight foci in the nucleus (for example, XLOC_006922, XLOC_005764); and (2) a more diffuse but spatially ‘speckled’ pattern (for example, MALAT1, MEG3, XLOC_003526). Interestingly, by simultaneously imaging MALAT1, MEG3, and XLOC_003526 in hLFs and hFFs, with each target labeled with a different fluorescent dye, we found that the three lncRNAs share a ‘speckle-like’ localization pattern, and that a significant fraction of MEG3 molecules co-localize with MALAT1 (statistically significant overlap in approximately 80% of cells examined; Additional file 1: Figure S6, Methods; Additional file 5).
The bias toward nuclear localization was significant compared to localization of mRNAs (67% of lncRNAs vs. 10% of mRNAs have more than 50% of their RNA in the nucleus; Kolmogorov-Smirnov (KS) P <13×10⁻¹¹; Figure 3a and b). Within the lncRNA set, divergent lncRNAs presented a slightly higher bias toward nuclear localization (KS P <2.12×10⁻²; effect size = 0.35; Figure 3c) while syntenic orthologs did not present such bias over the lncRNA background distribution. The latter set did, however, exhibit a slight bias toward higher expression (KS P <3.25×10⁻³; Figure 3d).
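For readers unfamiliar with the KS comparison used here, the following minimal sketch computes the two-sample KS statistic, that is, the largest gap between the empirical cumulative distributions of two samples. The nuclear-fraction values are hypothetical and chosen only for illustration; the sketch does not compute a P value and is not the analysis code used in this study.

```python
def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic.

    Returns the maximum absolute difference between the empirical
    cumulative distribution functions of the two samples.
    """
    a = sorted(sample_a)
    b = sorted(sample_b)
    n, m = len(a), len(b)
    i = j = 0
    d = 0.0
    while i < n and j < m:
        x = min(a[i], b[j])
        # Advance both samples past every value <= x, so ties are handled together.
        while i < n and a[i] <= x:
            i += 1
        while j < m and b[j] <= x:
            j += 1
        d = max(d, abs(i / n - j / m))
    return d


# Hypothetical per-gene nuclear fractions (fraction of molecules in the nucleus).
lncrna_nuclear_fraction = [0.9, 0.8, 0.7, 0.6, 0.4]
mrna_nuclear_fraction = [0.1, 0.2, 0.3, 0.45, 0.55]
d_stat = ks_statistic(lncrna_nuclear_fraction, mrna_nuclear_fraction)
```

A statistic near 1 means the two distributions barely overlap, whereas a value near 0 means they are nearly indistinguishable.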
In the vast majority (85%) of cases, the lncRNA localization pattern was consistent across the cell types where data were available. The notable exceptions were five lncRNAs (lincFOXF1, TERC, XLOC_005764, GAS5, XLOC_002746) that displayed distinct patterns in at least two cell types. These differences, however, appeared mostly to result from differences in overall abundance that likely lead to the appearance of additional bright foci in the nucleus (Figure 2, magenta stars, Additional file 1: Figure S7, S8, S9; Additional file 5). For example, we identified large lncRNA foci for TERC and XLOC_005764 in HeLa cells (type II), where they are more abundant (approximately 81 and 22 molecules per cell, respectively) than in hFFs (type III, approximately 17 and 4 molecules per cell, respectively), where these foci are missing. Similarly, GAS5 has dominant nuclear foci in HeLa cells (type II, approximately 195 molecules per cell), and less frequent foci in fibroblasts, where its expression is lower (type IV, approximately 75 molecules per cell). In other cases, higher abundance was associated with the appearance of RNA in the cytoplasm as well. For example, lincFOXF1 was more abundant in fibroblasts, where it more frequently appears in the cytoplasm, than in HeLa cells (type IV in fibroblasts vs. type II in HeLa cells; Additional file 1: Figure S8).
We next applied single molecule RNA FISH for a few of our lncRNAs on tissue sections [57,58] to test whether the localization patterns we observed in cultured cells were consistent with the patterns found in intact tissues. We selected MALAT1, NEAT1, and PVT1 (XLOC_006922), which have orthologous expressed transcripts in mouse, and performed single molecule RNA-FISH in both mouse embryonic stem cells (mESCs) and mouse neonatal cardiac/kidney tissue (Methods). For each of these lncRNAs, we observe the same unique focal nuclear pattern across species (that is, in both HeLa cells and mESCs) and in the mouse tissue (Additional file 1: Figure S10; Methods), showing that the patterns we observed in cultured cells recapitulate what we observed in vivo.
lncRNAs do not persist at nuclear foci during mitosis
The appearance of bright nuclear foci of specific lncRNAs raised the question of whether these foci persist through mitosis; persistence at the target locus through mitosis could suggest that lncRNA play a role in potential mechanisms for the maintenance of epigenetic states through cell division. To address this question, we examined the staining in mitotic cells of six lncRNA that exhibit nuclear specific localization patterns (approximately 50% of such cases).
None of the lncRNA we examined exhibited nuclear foci in cells undergoing mitosis (Figure 3e; Additional file 5). (The potential foci we observed in approximately one-third of ANRIL mitotic cells were not validated when using two-color co-localization; Additional file 5). Notably, for five of the lncRNAs, including XIST, we observed some molecules spread throughout the cytoplasm during mitosis (consistent with previous observations for XIST [6]). In the case of XLOC_001515 we did not observe any lncRNA molecules whatsoever during mitosis. Thus, we found no evidence for mitotic retention of these lncRNA to the nuclear foci they inhabit during interphase.
The extent of cell-to-cell variability in lncRNA expression is similar to that of mRNAs
When measured in bulk cell populations, lncRNAs are typically expressed at low levels compared to mRNAs [4,9]. Several studies have hypothesized that these bulk measurements may obscure an extreme cell-to-cell heterogeneity in which lncRNAs are expressed very highly in a small fraction of cells, but lowly or not at all in most other cells, resulting in low average expression [10,59]. We tested this hypothesis by quantifying the cell-to-cell variability of the lncRNAs in our panel.
We first confirmed that the average (cell population) expression level estimates for our lncRNAs were generally consistent between RNA FISH and RNA-Seq (Pearson r = 0.55; P value <2.5×10⁻⁶; Additional file 1), with discrepancies possibly due to the high variability in RNA-Seq abundance estimates for some of the examined transcripts (Additional file 1: Figure S11). We observed even higher consistency with qPCR (Pearson r = 0.788, P value <3.96×10⁻³, in comparison to Pearson r = 0.579 when comparing RNA-Seq on the same subset of genes; Additional file 1: Figure S12; Methods), as also reported by others [44-48]. The distribution of single cell counts demonstrated the relatively low overall expression of lncRNAs, with 43% of lncRNA-cell pairs having 10 or fewer molecules per cell on average and with a median of 14 molecules across all gene-cell-pair distribution medians (vs. 36 for the 49 mRNA-cell pairs we examined) (Figure 4a).
We also checked whether any of our lncRNAs showed evidence for G1 or S/G2/M dependent expression by simultaneously measuring the cyclin CCNA2 transcript count in every image we obtained, which is high in the S, G2, and M phases of the cell cycle [60,61]. We identified two lncRNAs whose expression positively correlated with CCNA2 (lincSFPQ and XLOC_001226), and one negatively correlated (XLOC_011185), (Additional file 5; Additional file 1: Figure S13), suggesting that expression of these lncRNAs was regulated through the cell cycle. Still, for the majority, any variability we observed was not due to variability in cell cycle phase.
In most cases, cell-to-cell variability in lncRNA levels was similar to that of protein coding mRNAs expressed at comparable average levels and did not reveal the presence of low frequency, highly expressing cells (Additional file 1; Figure 4c). In particular, the mean and the median molecule counts were similar, highlighting the lack of outlier cells in the single cell distributions (Additional file 1: Methods; Figure 4b; Additional file 1: Figure S9; Pearson r = 0.98, P value <2.5×10⁻³⁹). One notable exception was the tissue specific lncRNA XLOC_003526 encoded from a poorly conserved 900 kb gene desert (Figure 4d, e): it is lowly expressed on average (FPKM <1 in a population of hLF RNA-Seq, with few, if any, spliced reads; Additional file 1: Figure S14), but in RNA FISH approximately 25% of the cells express it highly (107 ± 26 molecules on average), whereas the other cells express it very lowly (9 ± 1.2 molecules on average). Its expression did not correlate with CCNA2, suggesting that its variability is not related to cell cycle.
Since we only obtained a few dozen cells for most of the lncRNA-cell line pairs examined (due to limited imaging throughput), we could not rule out the possibility of a particularly rare cell with extraordinarily high expression levels. To increase our statistical power, we imaged 500 to 700 cells for each of four lncRNA in HeLa cells (Additional file 1: Figure S15), including XLOC_004456, which displayed no signal in HeLa in our initial assessment. None of these images revealed the presence of any highly expressing outlier cells. With a sample size of n = 500 cells, we can place an upper bound of 0.6% of cells that may express high levels of the lncRNA but went undetected in our assay with a statistical power of 0.95 (Additional file 1).
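The 0.6% bound can be reproduced with a short calculation, shown here as a sketch under a simple assumption: cells are sampled independently, so the probability of seeing no highly expressing cell among n cells is (1 - p)^n when such cells make up a fraction p of the population. Requiring this probability to be at most 0.05 (that is, a 0.95 chance of observing at least one such cell) and solving for p gives the bound. The function name is hypothetical, and the exact calculation in Additional file 1 may differ.

```python
def max_undetected_fraction(n_cells, power=0.95):
    """Largest fraction p of highly expressing cells that could be missed.

    Assumes independent sampling of cells and solves
    (1 - p) ** n_cells = 1 - power for p.
    """
    return 1.0 - (1.0 - power) ** (1.0 / n_cells)


bound_500 = max_undetected_fraction(500)  # about 0.006, i.e. roughly 0.6%
bound_700 = max_undetected_fraction(700)  # a tighter bound with more cells
```

Under this model the bound shrinks roughly in proportion to 1/n, which is why imaging several hundred cells per gene substantially tightens the limit relative to a few dozen cells.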
Cellular localization and expression correlation of divergently transcribed lncRNA-mRNA transcript pairs
We have previously distinguished a subset of lincRNAs that are transcribed divergently from protein coding genes’ promoters (approximately 500, approximately 13% of human lincRNAs [9,35]; Figure 5a), but are stable, processed and spliced. One hypothesis is that these ‘divergent’ lncRNAs are co-regulated with their neighbors and possibly have a regulatory effect on their neighbor at the transcription site [35,62], with bulk assays observing co-expression of divergent transcripts [35,42,43,62]. To look for correlations at the single cell level and potential localization to the site of transcription, we simultaneously measured abundance and localization of divergent lncRNA and their mRNA neighbor for eight of the nine candidate divergent lncRNAs for which we had valid probe sets (Figure 5; Additional file 5).
We observed that in most cases (7/8) the bi-directionally promoted lncRNAs were not simply localized at one or a few foci (characteristic of type I; likely to be the site of transcription), but rather were located throughout the cell (Figure 5a and b; Additional file 1: Figure S16). For example, RNA from XLOC_011950 and XLOC_010514 were substantially cytoplasmic and showed no nuclear foci (type IV). NR_029435, TUG1, and XLOC_009233 RNA were mostly nuclear but with no apparent foci (type III). Lastly, lincMKLN1 (type II; also known as PINT [63]), lincFOXF1 (also known as FENDRR [64]), and GAS5 (type II and IV) RNA were all present as nuclear foci in some cell types. Substantial numbers of lincFOXF1 and GAS5 RNA were also found outside these foci and in the cytoplasm. Together, the subcellular localizations displayed by divergent lncRNAs were distinct from each other, and were not qualitatively different from those of the other lncRNAs in our survey.
We also observed a spectrum of correlation and expression levels of the lncRNA and its neighboring protein coding gene (Figure 5c). Both lincFOXF1 and XLOC_010514 tightly correlated with their neighbors in hLFs (Pearson r = 0.91 and 0.84, respectively). XLOC_011950 and its neighbor were positively correlated in HeLa cells, but did not correlate in hFFs, where they were still expressed to the same extent on average (Figure 5c; Additional file 1: Figure S17). NR_029435 and GAS5 were positively correlated with their neighbors in HeLa cells (Pearson r = 0.4 and 0.44, respectively), although it is possible that these relatively mild correlations resulted from a generic correlation with cellular volume (Padovan-Merhar and Raj, personal communication). We note that there was no correspondence between the existence of an expression correlation between the lncRNA and its neighbor and a particular subcellular localization pattern. Taken together, while the divergent lncRNAs in this study shared a common genomic layout, no consistent pattern of localization or co-expression with their neighboring coding gene emerged.
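As an illustration of the per-cell correlation analysis, the sketch below computes a Pearson correlation coefficient from paired molecule counts, one pair per cell. The counts are hypothetical, the function name is our own, and the sketch does not correct for cell volume, which, as noted above, could by itself produce mild positive correlations.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical per-cell counts: lncRNA molecules and neighboring mRNA molecules
# measured in the same cells.
lncrna_counts = [2, 4, 6, 8]
mrna_counts = [10, 20, 30, 40]
r_pair = pearson_r(lncrna_counts, mrna_counts)
```

With real data, values such as the r = 0.91 reported for lincFOXF1 would indicate tight coupling between a lncRNA and its neighbor across individual cells, while values near 0.4 are weak enough that a shared dependence on cell size is a plausible alternative explanation.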