Surface-seq: characterization of maxRNAs by sequencing
To examine whether nuclear-encoded RNAs can be presented on the cell surface, we developed a technology called Surface-seq. Surface-seq is based on a nanotechnology that extracts the plasma membrane from cells and tightly assembles it around polymeric cores to form membrane-coated nanoparticles (MCNPs) [10,11,12]. This technology suits our purpose because it preserves the native (right-side-out) orientation of the membrane, keeping its surface molecules facing outwards, as validated by transmission electron microscopy [10, 13]. Moreover, purifying the cell membrane and stably coating it onto the polymeric core ensures rigorous removal of intracellular contents [10,11,12]. These validated features of MCNPs enable the purification of RNAs that are stably associated with the extracellular layer of the cell membrane [10, 12]; these RNAs then serve as the input for Surface-seq library construction and sequencing.
We performed Surface-seq with EL4 cells using 2 technical variations. In variation A, after MCNPs were assembled and washed, RNAs were extracted using phenol-chloroform, quantified, and constructed into a sequencing library (Fig. 1a, Additional file 1: Fig. S1 and S2). This variation enriches for all membrane-associated RNA without differentiating the sides of the membrane. In variation B, after the MCNP assembly, the RNAs exposed on the outer surface of MCNPs (which correspond to the outer cell surface) were directly ligated to a 3′ RNA adaptor. The RNA was subsequently purified and ligated with the 5′ adaptor. Because the 3′ adaptor was selectively ligated to the outside-facing RNA, this technical variation enriches for the outside-facing membrane-associated RNA in the sequencing library (Fig. 1b).
We generated 5 Surface-seq libraries from EL4 cells, including 3 replicate libraries from technical variation A (A1, A2, A3) and 2 replicate libraries (B1, B2) from technical variation B (Additional file 1: Table S1). Our initial analysis focused on long noncoding RNAs (lncRNAs) because these have been previously associated with bacterial or mammalian cell membrane functions [4, 7]. Each sequencing library revealed 200 to 400 lncRNAs with counts per million (CPM) greater than 2, and 82 of them, including Malat1, Neat1, and Snhg20, were shared among all 5 Surface-seq libraries (Fig. 1c, d). For Malat1, for example, the Surface-seq reads were not uniformly distributed across the transcript but were enriched at specific regions, especially around its center (Fig. 1d). To identify the outside-facing RNAs, we compared the Variation B libraries (B1, B2) with the Variation A libraries (A1, A2, A3). This comparison identified 17 lncRNAs (DESeq2 [14]; Benjamini-Hochberg-adjusted FDR < 0.05 and fold change > 2), including Malat1 (note the larger scale of the B1 and B2 tracks relative to the A1, A2, and A3 tracks, Fig. 1d). These experiments identified candidate maxRNAs that consistently appeared on the outer cell membrane for further validation.
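The detection filter (CPM > 2) and the overlap across libraries described above can be sketched as follows. This is a minimal illustration, assuming raw counts are held in a plain dictionary per library and that CPM is computed against the raw library size (edgeR- or DESeq2-style normalized library sizes would shift the values slightly). The function names, the pooled "other" entry, and the toy counts are hypothetical and are not the Surface-seq data.

```python
from typing import Dict, Set


def counts_per_million(counts: Dict[str, int]) -> Dict[str, float]:
    """Scale the raw read counts of one library to counts per million (CPM)."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library has no reads")
    return {gene: n * 1e6 / total for gene, n in counts.items()}


def detected_genes(counts: Dict[str, int], min_cpm: float = 2.0) -> Set[str]:
    """Genes whose CPM is strictly greater than min_cpm in this library."""
    return {gene for gene, value in counts_per_million(counts).items() if value > min_cpm}


def shared_genes(libraries: Dict[str, Dict[str, int]], min_cpm: float = 2.0) -> Set[str]:
    """Genes that pass the CPM filter in every library (intersection across libraries)."""
    per_library = [detected_genes(c, min_cpm) for c in libraries.values()]
    return set.intersection(*per_library) if per_library else set()


# Hypothetical toy counts; each library totals 1,000,000 reads, so CPM equals the raw count.
# "other" pools all remaining reads of the library.
toy_libraries = {
    "A1": {"Malat1": 500, "Neat1": 3, "Snhg20": 1, "other": 999_496},
    "B1": {"Malat1": 800, "Neat1": 1, "Snhg20": 5, "other": 999_194},
}
print(sorted(shared_genes(toy_libraries)))  # prints ['Malat1', 'other']
```

Neat1 passes the filter only in A1 and Snhg20 only in B1, so neither survives the intersection; with five libraries the same call simply intersects five detection sets.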
Validation of maxRNAs by RNA-FISH on the cell surface (Surface-FISH)
To validate the localization of candidate maxRNAs, we carried out single-molecule RNA-FISH on the cell surface, which we termed Surface-FISH. This technique was adapted from our previously established protocol [15] by omitting the cell membrane permeabilization step. We used a set of five quantum-dot-labeled oligonucleotide probes, each 40 nt long, against the target transcript (arrows in Fig. 1d, e). We tested 2 Surface-seq-prioritized lncRNAs, Malat1 (Fig. 1f–l) and Neat1 (Fig. 1f), in EL4 cells. To control for probe specificity, we used probes carrying six mutated bases at the center of each 40-nt probe designed for Malat1 (mut-Malat1 control) and Neat1 (mut-Neat1 control) (Additional file 1: Table S3). We examined 20 to 30 single cells for each probe-set (Fig. 1f). Nearly all cells treated with Malat1 and Neat1 probes exhibited Surface-FISH signals, ranging from 1 to 10 signal foci per cell, whereas most cells treated with the control probes exhibited no signal (median = 0) (p values < 0.0001, Wilcoxon rank tests) (Fig. 1g–j).
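The central-mismatch control probes can be illustrated with a short sketch. The substitution rule below (replacing each of the six central bases with its Watson-Crick complement, so that every changed position becomes a transversion mismatch against the target) is an assumption for illustration only; the actual mutant sequences are those listed in Additional file 1: Table S3, and the function names and the 40-nt sequence are hypothetical.

```python
SWAP = str.maketrans("ACGT", "TGCA")  # complement each DNA base


def central_mismatch_control(probe: str, n_mut: int = 6) -> str:
    """Return a copy of `probe` with its n_mut central bases replaced by their complements.

    Hypothetical design rule: complementing a probe base guarantees that it no longer
    pairs with the original target base at that position.
    """
    probe = probe.upper()
    if not set(probe) <= set("ACGT"):
        raise ValueError("probe must contain only A, C, G, T")
    if not 0 < n_mut <= len(probe):
        raise ValueError("n_mut must be between 1 and the probe length")
    start = (len(probe) - n_mut) // 2  # 40-nt probe, 6 mutations: 0-based positions 17-22
    end = start + n_mut
    return probe[:start] + probe[start:end].translate(SWAP) + probe[end:]


def count_mismatches(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


wild_type = "ACGT" * 10  # hypothetical 40-nt probe, not a real Malat1 or Neat1 probe
mutant = central_mismatch_control(wild_type)
print(count_mismatches(wild_type, mutant))  # prints 6
```

Keeping the flanking 17 nt on each side identical preserves the length and most of the base composition of the probe, so a loss of signal with the mutant can be attributed to the disrupted central pairing rather than to a change in probe chemistry.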
To confirm that the Surface-FISH signals are not a result of RNA leakage from damaged cell membranes, we combined Malat1 Surface-FISH with a transmission-through-dye (TTD) microscopic analysis, in which only live cells with intact membranes are fluorescently labeled [16,17,18] (Additional file 1: Fig. S3). Malat1 FISH signals appeared on cells with fully intact membranes (Fig. 1k), as indicated by TTD staining of the same cell (Fig. 1l). Together, these observations from independent techniques suggest the presence of specific nuclear-encoded transcripts on the surface of intact live cells.
Visualization of maxRNA from primary PBMCs as a test for cell-type specificity
Based on the concept of guilt-by-association [19,20,21], cell-type specificity of maxRNA presentation may suggest the relevance of maxRNAs to the functions of the presenting cells. To evaluate this association, we tested whether maxRNAs are present in primary human cells under physiological conditions and whether their presence is cell-type specific. We chose primary PBMCs for these tests, considering their heterogeneity and their frequent interactions with one another and with other cell types. We collected 120,000 PBMCs from each of 4 human subjects and split each donor's cells into 4 aliquots of 30,000 cells each. For each donor, the 4 aliquots were used for 1 test and 3 control experiments, as described below.
In the test experiment, we probed for putative maxRNAs on PBMCs by hybridization with a randomized library of fluorescence-labeled 20-nt oligonucleotides (maxRNA probes). Hereafter we refer to this technique as in situ surface FISH (isFISH) (Fig. 2a). After probe incubation and washes, we analyzed the PBMCs by imaging flow cytometry (IFC), acquiring brightfield images together with 6 fluorescence channels that detect live/dead staining, cell nuclei (Hoechst), maxRNA, CD14 (a monocyte marker), CD3ε (a T cell marker), and CD19 (a B cell marker) (Fig. 2b, c, Additional file 1: Fig. S4). To evaluate possible fluorophore internalization or non-specific membrane attachment, we carried out 3 control experiments, in each of which the 20-mer probes were replaced with one of the following: (1) a randomized library of 6-nt oligonucleotides (6-mer library control), (2) a 20-nt probe against the Drosophila Art4 RNA (dArt4 control), or (3) the fluorophore alone, not conjugated to any oligonucleotide (fluorophore-only control).
On average, 4.8% of total PBMCs exhibited isFISH signals, at least 27-fold more than in any of the control groups (Fig. 2d) (p value < 0.005, Kruskal-Wallis test). At the level of canonical cell types, on average more than 10% of CD14+ cells and approximately 3% of CD3ε+ cells exhibited isFISH signals (p value < 0.005, t test), whereas less than 2% of CD19+ and CD3ε−CD14−CD19− cells did (Fig. 2e). These data support the presence and cell-type specificity of maxRNAs in primary human PBMCs, adding guilt-by-association evidence [19,20,21] for the relevance of maxRNAs to the functions of the presenting cells.
Single-cell transcriptome analysis of maxRNA-presenting cells: additional evidence for cell-type specificity
To provide further evidence for cell-type-specific maxRNA presentation, we characterized the maxRNA-presenting cells by combining isFISH and fluorescence-activated cell sorting (FACS) with single-cell RNA sequencing (scRNA-seq). Specifically, after isFISH labeling, we performed FACS on PBMCs and obtained 2 cell populations, isFISH+ and isFISH−. We then subjected these two populations to scRNA-seq on the 10X Genomics platform, which yielded 2486 isFISH+ and 9043 isFISH− cells, with an average of 21,059 reads per cell. The 3 control experiments (6-mer, dArt4, fluorophore only) yielded too few positive cells (Fig. 2d) for analysis on the 10X Genomics scRNA-seq platform.
Next, we employed both unsupervised and supervised methods to query whether the isFISH+ cells are associated with any known cell types. In the unsupervised analysis, we projected the single-cell transcriptomes onto a tSNE plot (Fig. 3a), on which isFISH+ and isFISH− cells formed two separate clusters (blue and red dots, Fig. 3a). Cells expressing the monocyte markers CD14 and LYZ were enriched in the isFISH+ cluster and nearly absent from the isFISH− cluster (Fig. 3b, c). In contrast, cells expressing the T cell markers CD3E and CD8A, the natural killer (NK) cell marker NKG7, and the B cell marker MS4A1 were enriched in the isFISH− cluster (Additional file 1: Fig. S5).
For a supervised analysis, we used a trained SingleCellNet classifier [22], which assigns each PBMC to one of 11 pre-defined PBMC cell types (Fig. 3d). These 11 cell types were defined by training SingleCellNet on ~ 68,000 human PBMC single-cell transcriptomes [22]. We ranked the 11 pre-defined cell types by their association with isFISH+ cells based on odds ratios (Fig. 3e). Two pre-defined cell types, CD14+ monocytes and dendritic cells, were enriched in isFISH+ cells (first two columns, Fig. 3e). The majority (87%) of isFISH+ cells were classified as CD14+ monocytes, compared with only 0.55% of isFISH− cells (odds ratio = 1143, Bonferroni-adjusted p value < 0.001). Of the other 10 cell types, dendritic cells exhibited a modest enrichment in isFISH+ cells (odds ratio = 7.78), whereas the remaining 9 cell types were relatively depleted in isFISH+ cells (odds ratio < 1, Fig. 3e). Consistent with the unsupervised analysis, this supervised analysis suggests that the majority of maxRNA-presenting cells are monocytes. Collectively, both the isFISH imaging flow cytometry data (Fig. 2) and the isFISH scRNA-seq data (Fig. 3) suggest that maxRNAs are not uniformly present across cell types and that monocytes are a major maxRNA-presenting cell type in human PBMCs.
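The odds-ratio ranking of pre-defined cell types can be sketched from two lists of classifier labels, one for isFISH+ cells and one for isFISH− cells. For each cell type, the sketch builds a 2 × 2 table (this type vs. all other types, isFISH+ vs. isFISH−) and computes (a·d)/(b·c). The 0.5 pseudocount applied when any cell of the table is zero (the Haldane-Anscombe correction) is our assumption, not necessarily what was used for Fig. 3e, and the labels in the example are hypothetical. P values, such as the Bonferroni-adjusted values reported above, would come from a separate test on the same table, for example `scipy.stats.fisher_exact`.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def cell_type_odds_ratios(
    positive_labels: Iterable[str],
    negative_labels: Iterable[str],
    pseudocount: float = 0.5,
) -> List[Tuple[str, float]]:
    """Rank cell types by the odds of being isFISH+ versus isFISH-.

    For each cell type the 2x2 table is:
        a = isFISH+ cells of this type,  b = isFISH+ cells of other types
        c = isFISH- cells of this type,  d = isFISH- cells of other types
    """
    pos = Counter(positive_labels)
    neg = Counter(negative_labels)
    n_pos, n_neg = sum(pos.values()), sum(neg.values())
    ratios = {}
    for cell_type in set(pos) | set(neg):
        a, c = pos[cell_type], neg[cell_type]
        b, d = n_pos - a, n_neg - c
        if 0 in (a, b, c, d):
            # Haldane-Anscombe correction (assumed) so the ratio stays finite.
            a, b, c, d = (x + pseudocount for x in (a, b, c, d))
        ratios[cell_type] = (a * d) / (b * c)
    return sorted(ratios.items(), key=lambda item: item[1], reverse=True)


# Hypothetical labels for 10 isFISH+ and 10 isFISH- cells.
ranked = cell_type_odds_ratios(
    ["CD14+ monocyte"] * 8 + ["CD4+ T cell"] * 2,
    ["CD14+ monocyte"] * 1 + ["CD4+ T cell"] * 9,
)
print(ranked[0])  # prints ('CD14+ monocyte', 36.0)
```

An odds ratio above 1 marks a cell type over-represented among isFISH+ cells, and a value below 1 marks depletion, which is how the columns of Fig. 3e are read.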
Antisense purification and sequencing of maxRNAs from primary human cells
To interrogate the functional relevance of maxRNAs in PBMCs, we asked which genes produce maxRNAs in these cells. To answer this question, we developed Surface-FISHseq to sequence the isFISH-captured candidate maxRNAs. The central idea of Surface-FISHseq is to purify cell surface RNAs through hybridization with biotin-tagged probes and then subject the purified RNA to sequencing (Fig. 4a). Compared with the previously described BrU labeling and Surface-seq, Surface-FISHseq enables maxRNA capture and purification from primary live cells with minimal perturbation. Additionally, it allows microscopic examination of probe hybridization at the cell surface before proceeding to the sequencing steps.
We reasoned that even if maxRNAs exist, their abundance would be far lower than that of intracellular RNAs. Thus, a successful maxRNA purification procedure would have to be highly selective. To this end, we carried out 3 Surface-FISHseq experiments. The 3 experiments shared the core Surface-FISHseq pipeline, and each included an additional selection step to remove non-maxRNA-presenting cells or intracellular components (Fig. 4a, Additional file 1: Table S2, Fig. S6A). We did not anticipate identical results from these 3 experiments because they differed in their RNA-enrichment and membrane-collection methods.
In the first experiment, we purified the cell membrane before the pulldown of the probe-RNA hybrids (Surface-FISHseq-membrane). We generated 3 Surface-FISHseq-membrane libraries from PBMC samples derived from 3 different donors (membrane+ tracks in red, Fig. 4c, e). In parallel, we generated 3 control libraries from total purified membrane RNA from these same PBMC samples (membrane control tracks in blue, Fig. 4c, e). A comparison of the Surface-FISHseq-membrane libraries against the control libraries with DESeq2 [14] identified 5722 RNAs at FDR < 0.15 (blue circle, Fig. 4b), including both protein-coding and non-coding RNAs (Additional file 1: Fig. S6B).
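DESeq2 reports Benjamini-Hochberg-adjusted p values, and the FDR < 0.15 cut-off used in these comparisons is applied to those adjusted values. For readers unfamiliar with the procedure, a minimal standalone sketch of the Benjamini-Hochberg step-up adjustment is shown below. It omits DESeq2's independent filtering, which changes which genes enter the adjustment, so it will not reproduce DESeq2's `padj` column exactly; the function name and toy p values are hypothetical.

```python
from typing import List, Sequence


def benjamini_hochberg(pvalues: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])  # indices from smallest to largest p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, keeping the adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


toy_p = [0.01, 0.04, 0.03, 0.5]  # hypothetical raw p values for four genes
toy_padj = benjamini_hochberg(toy_p)
significant = [i for i, q in enumerate(toy_padj) if q < 0.15]
print(significant)  # prints [0, 1, 2]
```

The relatively permissive FDR < 0.15 threshold keeps more candidates in each experiment; the stringency is then recovered by requiring agreement across the three independent experiments.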
In the second experiment, we used FACS to collect isFISH+ cells, followed by maxRNA biotin-purification and sequencing (Surface-FISHseq-FACS). We generated 2 Surface-FISHseq-FACS libraries from 2 PBMC samples of 2 donors (FACS+ tracks in red, Fig. 4c, e). Two control libraries were generated from pulldowns using the dArt4 probe on the same 2 PBMC samples (FACS control tracks in blue, Fig. 4c, e). A comparison between the test and control libraries with DESeq2 [14] identified 1976 RNAs at FDR < 0.15 (green circle, Fig. 4b).
In the third experiment, we used psoralen and ultraviolet light to reversibly cross-link the hybridized probes to their RNA targets, and then purified the probe-bound maxRNAs for sequencing (Surface-FISHseq-psoralen). Psoralen cross-links only hybridized nucleic acids and does not cross-link nucleic acids to proteins; combined with subsequent stringent washes, this minimizes the capture of RNAs through indirect interactions or promiscuous attachment. In total, 4 libraries were generated for this experiment: 1 targeted maxRNA library (Psoralen+ track in red, Fig. 4c, e) and 3 control libraries. The first control library was prepared by the same procedure as the targeted maxRNA library, except that psoralen was omitted during the cross-linking step. The remaining 2 control libraries used a 20-nt probe against dArt4 in place of the 20-mer oligo library, with and without psoralen cross-linking (psoralen control tracks in blue, Fig. 4c, e). A comparison of the experimental and control libraries with DESeq2 [14] identified 1571 RNAs at FDR < 0.15 (orange circle, Fig. 4b).
Pairwise comparisons of the 3 experiments revealed significant overlaps of genes detected from each experiment (odds ratio between experiments 1 and 2 = 4.53, p value < 2.2e−16; odds ratio between experiments 1 and 3 = 19.65, p value < 2.2e−16; odds ratio between experiments 2 and 3 = 5.08, p value < 2.2e−16). A total of 118 maxRNA genes were identified by all 3 Surface-FISHseq experiments (intersection, Fig. 4b). Taken together, Surface-FISHseq prioritized specific maxRNAs for downstream tests of their possible relevance to cellular functions.
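The pairwise overlap statistics can be sketched as a 2 × 2 table built from two gene lists and a shared gene universe; the odds ratio is (a·d)/(b·c), and the accompanying p values would come from Fisher's exact test on the same table (R's `fisher.test` prints very small p values as `< 2.2e-16`, the double-precision machine epsilon). The choice of universe (for example, all genes tested in both experiments) strongly affects the odds ratio and is an assumption here; the function names and gene names are hypothetical.

```python
from typing import Set, Tuple


def overlap_table(a_genes: Set[str], b_genes: Set[str], universe: Set[str]) -> Tuple[int, int, int, int]:
    """2x2 counts: (in both, only in A, only in B, in neither) within the universe."""
    if not (a_genes <= universe and b_genes <= universe):
        raise ValueError("gene sets must be subsets of the universe")
    both = len(a_genes & b_genes)
    only_a = len(a_genes - b_genes)
    only_b = len(b_genes - a_genes)
    neither = len(universe - (a_genes | b_genes))
    return both, only_a, only_b, neither


def overlap_odds_ratio(a_genes: Set[str], b_genes: Set[str], universe: Set[str]) -> float:
    """Sample odds ratio of the overlap between two gene sets."""
    both, only_a, only_b, neither = overlap_table(a_genes, b_genes, universe)
    numerator, denominator = both * neither, only_a * only_b
    if denominator == 0:
        # Degenerate table: the ratio is unbounded (or undefined if the numerator is 0 too).
        return float("inf") if numerator > 0 else float("nan")
    return numerator / denominator


# Hypothetical universe of 100 genes and three detection lists.
universe = {f"gene{i}" for i in range(100)}
exp1 = {f"gene{i}" for i in range(0, 20)}
exp2 = {f"gene{i}" for i in range(10, 40)}
exp3 = {f"gene{i}" for i in range(15, 25)}
print(overlap_odds_ratio(exp1, exp2, universe))  # prints 3.0
print(sorted(exp1 & exp2 & exp3))  # genes detected by all three experiments
```

An odds ratio well above 1 means the two experiments share more genes than expected if their detections were independent within the universe; the three-way intersection corresponds to the central region of the Venn diagram in Fig. 4b.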
Cell-cell interactions are impaired by blocking specific maxRNAs
Considering the association between maxRNA presentation and monocytes among PBMC cell types, we next evaluated whether maxRNAs affect monocyte functions. We prioritized cell-cell interactions because of the biological functions of monocytes and because such interactions require surface molecules. As one of the major cell types in innate immune and inflammatory responses, monocytes are distinctive in their ability to interact with a wide range of cell types, including platelets [23], vascular endothelial cells (ECs) [24], and smooth muscle cells [25]. Among these, the monocyte-EC interaction is essential both under steady-state conditions and during inflammatory responses [24, 26]. This process, which is initiated by surface molecules, can be reproducibly quantified with a monocyte-EC adhesion assay [24, 26]. We therefore chose the monocyte-EC attachment level as a functional readout for maxRNAs. We tested a total of 11 Surface-FISHseq-prioritized candidate maxRNAs, namely IDH1, NEDD4, CENPF, ATF1, QKI, CEP350, ARL14EP, CRNDE, CMTM6, CTSS, and FNDC3B, on monocytes isolated from PBMCs (Fig. 4b).
To perturb maxRNAs without interfering with the function of the protein encoded by the mRNA of the same gene, we used extracellular hybridization with antisense probes. These probes were designed to target regions with high Surface-FISHseq read coverage, which correspond to the exposed regions of the candidate maxRNAs (Fig. 4c–f). For each maxRNA, we designed a probe-set of 25 antisense oligos (each 20 nt long) targeting the parts of the transcript with Surface-FISHseq read coverage (test probe-sets). We incubated monocytes with each probe-set for 60 min before fluorescently labeling them and seeding them onto confluent human umbilical vein endothelial cells (HUVECs). The monocyte-EC attachment level was measured as a normalized fluorescence intensity reflecting the number of attached monocytes [27]. We also included 3 controls: monocytes incubated without any oligo probes (no-probe control), a probe-set of 25 antisense oligos (each 20 nt long) against dArt4 (dArt4 control), and a randomized 20-nt probe-set (random 20-mer control).
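The coverage-guided probe design can be sketched as a greedy selection of non-overlapping 20-nt windows with the highest summed Surface-FISHseq coverage, each converted to its reverse complement so that it hybridizes to the sense transcript. This is an illustration under stated assumptions: the greedy ranking, the exclusion of zero-coverage windows, and the function names are hypothetical, and practical designs usually also screen for GC content, secondary structure, and off-target matches, none of which is modeled here.

```python
from typing import List, Sequence, Tuple

_COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A", "U": "A"}


def reverse_complement(seq: str) -> str:
    """DNA reverse complement; accepts RNA (U) or DNA (T) input."""
    return "".join(_COMPLEMENT[base] for base in reversed(seq.upper()))


def design_antisense_probes(
    target: str,
    coverage: Sequence[float],
    n_probes: int = 25,
    probe_len: int = 20,
) -> List[Tuple[int, str]]:
    """Pick up to n_probes non-overlapping windows with the highest read coverage.

    Coverage values are assumed to be non-negative read depths, one per target base.
    Returns (0-based start on the target, antisense probe sequence) sorted by start.
    """
    if len(target) != len(coverage):
        raise ValueError("coverage must have one value per target base")
    prefix = [0.0]
    for value in coverage:
        prefix.append(prefix[-1] + value)
    windows = [
        (prefix[start + probe_len] - prefix[start], start)
        for start in range(len(target) - probe_len + 1)
    ]
    # Highest coverage first; ties broken by position for reproducibility.
    windows.sort(key=lambda w: (-w[0], w[1]))
    chosen: List[Tuple[int, str]] = []
    for total, start in windows:
        if total <= 0 or len(chosen) == n_probes:
            break
        # Two windows of equal length overlap when their starts are < probe_len apart.
        if all(abs(start - s) >= probe_len for s, _ in chosen):
            chosen.append((start, reverse_complement(target[start:start + probe_len])))
    return sorted(chosen)


# Hypothetical 60-nt transcript whose middle 20 nt carry all the reads.
toy_target = "A" * 20 + "CCCCCGGGGGUUUUUAAAAA" + "A" * 20
toy_coverage = [0] * 20 + [10] * 20 + [0] * 20
print(design_antisense_probes(toy_target, toy_coverage, n_probes=3))
# prints [(20, 'TTTTTAAAAACCCCCGGGGG')]
```

Only one probe is returned even though three were requested, because every other window either overlaps the chosen one or has no coverage; this mirrors the idea that probes should be restricted to the exposed, read-covered parts of the transcript.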
As expected, the two control groups did not exhibit a detectable difference in monocyte-EC attachment levels in 8 repeated experiments (the first two columns, Fig. 4g). The probe-sets against IDH1 and NEDD4 did not induce a detectable difference either (Fig. 4g). Although probe-sets targeting CENPF, ATF1, QKI, CEP350, ARL14EP, and CRNDE resulted in lower monocyte-EC attachment levels as compared to the controls, none of these differences reached the threshold of Bonferroni-adjusted p value < 0.001. Finally, the monocytes incubated with antisense probes against CTSS, FNDC3B, and CMTM6 exhibited reduced monocyte attachment levels (Bonferroni-adjusted p value < 0.001, Kruskal-Wallis test) (Fig. 4g), suggesting that antisense probes against specific maxRNAs can attenuate the monocyte attachment to vascular ECs.
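The Bonferroni correction multiplies each raw p value by the number of comparisons m (capping the result at 1), so the reported threshold of adjusted p < 0.001 corresponds to raw p < 0.001/m. A minimal sketch follows; treating each of the 11 test probe-sets as one comparison against a control (m = 11) is an assumption for illustration, as are the function names and toy p values.

```python
from typing import Dict, List


def bonferroni(pvalues: Dict[str, float]) -> Dict[str, float]:
    """Bonferroni-adjusted p values: min(1, p * m), where m is the number of tests."""
    m = len(pvalues)
    return {name: min(1.0, p * m) for name, p in pvalues.items()}


def passing(pvalues: Dict[str, float], alpha: float = 0.001) -> List[str]:
    """Names whose Bonferroni-adjusted p value is below alpha, sorted alphabetically."""
    return sorted(name for name, q in bonferroni(pvalues).items() if q < alpha)


# Hypothetical raw p values for 11 probe-sets (not the values behind Fig. 4g).
toy = {f"probeset{i}": 0.5 for i in range(1, 10)}
toy["probeset10"] = 5e-5  # 5e-5 * 11 = 5.5e-4 < 0.001, passes
toy["probeset11"] = 2e-4  # 2e-4 * 11 = 2.2e-3 >= 0.001, fails
print(passing(toy))  # prints ['probeset10']
print(0.001 / 11)  # raw-p threshold equivalent to adjusted p < 0.001 with m = 11
```

Because the threshold scales with m, a probe-set can show a clear reduction in attachment yet fall short of adjusted p < 0.001, which is the situation described above for CENPF, ATF1, QKI, CEP350, ARL14EP, and CRNDE.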
Specific regions of maxRNAs modulate cell-cell interactions
To test whether a specific region of a maxRNA is responsible for the reduced monocyte attachment, we repeated the above experiments with individual 20-nt probes. Based on Surface-FISHseq read coverage, we chose 9 probes from the FNDC3B probe-set for this test, including 4 probes targeting 4 exons (exons 22 and 24–26) and 5 probes targeting the 3′UTR (probe track, Fig. 4c, d). As before, we included two controls: a no-probe control and a 20-nt probe against dArt4 (dArt4 control). As expected, the no-probe and dArt4 controls did not differ detectably (black and red bars, Fig. 4h). No significant difference was detected for 5 of the 9 tested probes (E1, E2, E3, E4, U1, Fig. 4h), suggesting that not all parts of the FNDC3B transcript are required for monocyte-EC attachment. However, each of the other 4 tested probes reduced monocyte attachment levels compared with the no-probe control (Bonferroni-adjusted p value < 0.001, Kruskal-Wallis test) (U2, U3, U4, U5, Fig. 4h). All 4 of these probes targeted the 3′ end of the 3′UTR (Fig. 4d), consistent with a reproducible Surface-FISHseq peak in the 3′ portion of the 3′UTR (pink tracks, Fig. 4d). These data suggest that not all parts of a maxRNA are equally important for its cell surface functions.
To test whether this observation could be reproduced with another maxRNA, we repeated the experiment with 11 probes from the CTSS probe-set, including 1 intronic probe (E1), 1 exonic probe (E2), and 9 probes spanning the 3′UTR (bottom track, Fig. 4e, f). Neither the intronic probe (E1) nor the exonic probe (E2) significantly changed the attachment levels (Fig. 4i). The U7 probe at the center and the U9 probe at the 3′ end of the 3′UTR did not affect the attachment level either. However, all 7 other probes targeting the CTSS 3′UTR showed a trend towards reduced monocyte attachment, with the U2, U3, and U8 probes reaching the significance level of Bonferroni-adjusted p value < 0.001 (Kruskal-Wallis test). Taken together, monocyte-EC interactions can be modulated by extracellular hybridization of antisense oligos targeting specific parts of the FNDC3B and CTSS transcripts. These data suggest that the exposure of maxRNAs to the extracellular milieu is required for proper cell-cell interactions.