Correlation of gene expressions between nucleus and cytoplasm reflects single-cell physiology

Background Eukaryotes transcribe RNAs in nuclei and transport them to the cytoplasm through multiple steps of post-transcriptional regulation. Existing single-cell sequencing technologies, however, are unable to analyse nuclear (nuc) and cytoplasmic (cyt) RNAs separately and simultaneously. Hence, there remain challenges to discern correlation, localisation, and translocation between them. Results Here we report a microfluidic system that physically separates nucRNA and cytRNA from a single cell and enables single-cell integrated nucRNA and cytRNA-sequencing (SINC-seq). SINC-seq constructs two individual RNA-seq libraries, nucRNA and cytRNA per cell, quantifies gene expression in the subcellular compartments and combines them to create a novel single-cell RNA-seq data enabled by our system, which we here term in-silico single cell. Conclusions Leveraging SINC-seq, we discovered three distinct natures of correlation among cytRNA and nucRNA that reflected the physiological state of single cells: The cell-cycle-related genes displayed highly correlated expression pattern with minor differences; RNA splicing genes showed lower nucRNA-to-cytRNA correlation, suggesting a retained intron may be implicated in inhibited mRNA transport; A chemical perturbation, sodium butyrate treatment, transiently distorted the correlation along differentiating human leukemic cells to erythroid cells. These data uniquely provide insights into the regulatory network of mRNA from nucleus toward cytoplasm at the single cell level.

3 nuclei (nucRNA-seq) is an emerging alternative to profile gene expressions of cells in tissues that cannot be readily dissociated such as the adult brain and frozen samples. The method is further capable of coupling with sorting by fluorescence activated cell sorters [4,7], Fluidigm C1 [5], and Drop-seq [8], and demonstrated feasibilities of identifying cell types and cell cycles with nucRNA-seq data [9]. Although these works hypothesise that the nucRNA expression is representative of whole cells, to date, the direct evidence of the correlation in the cytRNA and nucRNA expression at single-cell resolution has not been provided.
Recent technical advances have further enabled combined sequencing at multiomic levels within the same single cells [10][11][12] and helped to understand the links underlying the regulatory cascade. Several microfluidic [13][14][15][16] and non-microfluidic protocols [17,18] offer parallel transcriptional and genomic analyses on the same single cell by fractionating cytRNAs and nuclei of single cells. However, we know of no work that has reported an integrated nucRNA-seq and cytRNA-seq with the same cell to study RNA transport and gene regulation and function through splicing of pre-mRNA [19,20].
Here we develop a novel single-cell sequencing method, SINC-seq, which combines a microfluidic protocol that physically fractionates nuclear and cytoplasmic RNAs and a subcellular RNA-seq pipeline to dissect RNA expressions in the individual subcellular compartment. We utilise SINC-seq to explore both correlated and uncorrelated gene expression between the compartments with single K562 human leukemic cells. We further explore correlation dynamics that reflect the transient response of cells with differentiating K562 cells to erythroid cells under a perturbation of sodium 4 butyrate, a histone deacetylase inhibitor. These data reveal how eukaryotes manage subcellular RNA expressions via inter-compartment regulation.

Results
A Microfluidic platform for single-cell integrated nuclear and cytoplasmic RNAsequencing: SINC-seq To dissect transcriptional correlation in the subcellular compartments, we devised SINC-seq that combines an electrophoretic fractionation of cytRNA from the nucleus [14][15][16] with off-chip RNA sequencing (Fig. 1a, b). SINC-seq constructs individual RNA-seq libraries with cytRNA and nucRNA and integrates the sequencing data in a new form of sequencing data, which we term an in-silico single cell. SINC-seq starts with a microfluidic protocol that leverages a hydrodynamic trap that captures a single cell, concentrates an electrical field to lyse the cytoplasmic membrane selectively while leaving the nuclear membrane relatively intact, and retains the nucleus during electric-field-based extraction of cytRNA to fractionate them (Fig.1b, c, Supplementary   Fig. 1, Supplementary Video 1, and Methods). The microfluidic system completes the entire process with a voltage control via three end-channel electrodes and outputs the cytRNA and nucleus to different wells in less than 5 min. We note that the hydrodynamic trap integrated in this work couples hydrodynamic flow and electric field concentration, and uniquely enables a highly automated workflow and about 20-fold reduction in the applied voltage compared to our previous protocol [14][15][16]. These key improvements allowed us to study RNA expressions in subcellular compartments of single cells systematically.

5
Library preparation and quality control with SINC-seq To critically evaluate SINC-seq, we performed experiments with 93 single cells of K562 human myeloid leukaemia cells and generated 186 RNA-seq libraries with offchip Smart-seq2 protocol [21]. Of the 93 single cells analysed in this experiment, all showed successful extraction with monitoring current during extraction (Supplementary

SINC-seq dissects the difference in subcellular gene expression
To assess the performance of SINC-seq, we computed gene expression with an in-silico single cell analysis. We used this to benchmark the sensitivity and repeatability respectively, and 8,640±1,150 genes per cell with transcripts per million (TPM) greater than 0 (Fig. 1d). SINC-seq also revealed that ~16% transcripts were in the nucleus and ~84% in the cytoplasm (Fig. 1e) and enriched expression of 226 and 3,035 genes, respectively (Fig. 1f). On average, SINC-seq displayed about 6.3% smaller number of detected genes than the conventional single-cell RNA-seq (scRNA-seq) that detected 9,230±1,220 genes (n=12). Notably, our in-silico single cell data showed a wider dynamic range in the detection of genes as compared to scRNA-seq ( Supplementary Fig. 4k). To the scRNA-seq, the in-silico single-cell data showed the higher coefficient of correlation computed with log-transformed expression (log10(TPM+1)) than either cytRNA-seq or nucRNA-seq (Fig. 1g). Combined, average gene expression profile of 12 in-silico singlecells ( Supplementary Fig. 5) showed an excellent matching with that of 12 scRNA-seq (r = 0.949, Pearson correlation coefficient, Supplementary Fig. 3q). The total number of detected genes in the 12 in-silico single cells was 15,314, of which 13,286 genes were also detected in the average of the 12 scRNA-seq ( Supplementary Fig. 3t). We again stress that the in-silico normalisation and resulting scRNA-seq is a novel method, which uniquely leverages the physical separation and minimal cross-contamination enabled by our electrophoretic fractionation.

Cell cycle-related genes show correlated expression in cytoplasm and nucleus
To view the landscape of the correlation between nucRNA and cytRNA, we computed cross-correlation of the individual gene as a measure of covariation in the two subcellular compartments and ranked the genes with the coefficient of correlation (Fig. 2a). Gene ontology analysis revealed that the highly correlated genes (top 10%) had 7 cell-cycle as an enriched function (Fig. 2b) and the bottom 10% including anti-correlation had RNA processing (Fig. 2c).
To dissect the highly correlated gene expression, we first focused on cell cycle based on transcriptional oscillations [2] and phase-score analysis [3]  Notably, we found that the anti-correlation among out-of-phase genes was slightly higher in a nucRNA (U-test, p = 7.81×10 -26 , Supplementary Fig. 6d), suggesting that nucRNAseq more directly detected phase transition in the transcriptional oscillation of cell-cycle genes. On the other hand, the correlation among in-phase genes was higher in cytRNA may indicate further modulation in the cytoplasm.
To elucidate how the transcriptional oscillations in nucRNA modulated the gene expression in the cytRNA, we extended the analyses to compute the cross-correlation between cytRNA and nucRNA with cell-cycle genes. The cell-cycle genes showed synchronised oscillation in the each of the two subcellular compartments (Fig. 2g), consistent with the observation that the subpopulations segregated into the G1 and G2 groups showed corresponding up-and down-regulation of G1 and G2 genes ( Supplementary Fig. 6e,f, Supplementary Information). Together, these results suggested that the cytRNA and nucRNA had similar expression patterns of cell-cycle genes and both 8 of them solely had a potency to detect the cell-cycle.

Nuclear retained intron attenuates the transcriptional oscillation
As an instance of the uncorrelated genes, we next studied the retained intron (RI) mediated regulation of mRNA transport [22][23][24] Fig. 9). We examined the dynamics of the cross-correlation of DEG expression between cytRNA and nucRNA along with the pseudo-time [30] (Fig. 4a), which was obtained with in-silico single cell data ( Supplementary Fig. 10a). The cross-correlation between cytRNA and nucRNA within the same single cell (along with the diagonal shown in Fig. 4a) exhibited gradual decrease with the pseudo-time, suggesting that the subcellular gene expression patterns behaved differently and diverged along the differentiation. Notably, the correlation between nucRNA fourth day and cytRNA first day (near the top right corner in Fig. 4a) displayed lower correlation as compared to that between nucRNA first day and cytRNA fourth day (near the bottom left corner), suggesting strongly that nucRNA was the driver of the diverging gene expression. We reinforced this observation with the control experiments ( Supplementary Fig. 10b-d).
We hypothesised that the divergence of gene expressions in the two subcellular compartments might reflect the transient response of different regulatory pathways and enable to resolve the on/off regulation of the transcription leveraging nucRNA-seq. To test this hypothesis, we introduced a localisation-embedded principal components analysis (L-PCA) that computed PCs with the subcellular gene expression of DEGs (Fig. 4b). As expected, L-PCA resolved the trajectory of the differentiation slightly clearer than a conventional PCA that computed PCs with in-silico single cell data (Fig. 4c). To further corroborate the L-PCA, we performed PCA on datasets of an individual fraction, showing this specific and unique transient in the nucRNA compared to that in the cytRNA-that is, in nucRNA, the third day cluster was furthest from the day 0 th cluster. (Supplementary Fig. 10e, f). These results demonstrated that the SINCseq was able to detect important regulatory events. Such events are usually masked by abundant cytoplasmic transcripts in conventional single-cell sequencing. Our study also suggests a compelling caution to an approach that approximates the transcriptomic profile of the whole cell with that of a single compartment without validation. The SINC-seq platform will be broadly applicable to different types of cells as long as they are isolated as singles. The method thus will contribute to validate existing subcellular RNA-seq methods [4, 5, 7-9, 17, 18] and also define their limitations.

Conclusion
To dissect transcriptional correlation in the subcellular compartments, we devised SINC- We included PVP to suppress electroosmotic flow. We purchased Tris, HEPES, Imidazole, and HCl from Sigma-Aldrich, and PVP (MW 1 MDa) from Polyscience. We prepared all solutions in UltraPure DNase-/RNase-free deionised (DI) water (Life 13 Technologies).

Microfluidic system setup
We fabricated polydimethylsiloxane (PDMS, Sylgard 184, Dow Corning) microchannel superstructures ( Supplementary Fig. 1a,b)  Before each experiment, we preconditioned the microchannel by filling the inlet and outlet wells with washing solutions and applying vacuum at the waste well. Our washing process was as follows: 1 M NaOH for 1 min, 1 M HCl for 1 min, and deionised (DI) water for 1 min. All washing solutions contained 0.1% Triton X-100 to suppress bubble clogging in the hydrophobic microchannel.
Following this, we loaded 9.5 μL of LE and TE to the outlet and inlet wells, respectively, and briefly applied vacuum to the waste well to exchange the solution in the microchannel with LE and TE. The hydrodynamic pressure induced by buffers in the inlet and outlet wells created a pressure driven laminar flow from both inlet and outlet wells toward the waste well and formed a stable LE-TE interface at the junction of three microchannels. We then loaded a 1 μL cell suspension containing a single cell into the inlet well and introduced it into the microchannel via the pressure driven flow. Once we visually confirmed the captured single cell at the hydrodynamic trap (Fig. 1b), we added 9.5 μL of the TE to the waste well to reduce the pressure driven flow. We placed 300 m diameter platinum wire electrodes into the wells and applied -150 V, -170 V and 0 V to the electrodes at the inlet, waste and outlet wells, respectively. The DC voltage created a concentrated electrical field at the hydrodynamic trap ( Supplementary Fig. 1d)

Library preparation and mapping analysis
We synthesised respective cDNA libraries from the fractionated cytRNA and nuclear RNA separately using Smart-seq2 (SMART-seq v4 Ultra Low Input RNA Kit for Sequencing, Clontech) with 18 PCR cycles followed by purification with Agencourt AMPure XP (BeckmanCoulter). We examined the yield and quality of cDNA, respectively, with Qubit 2.0 Fluorometer (ThermoFisher Scientific) and qPCR targeting GAPDH (glyceraldehyde-3-phosphate dehydrogenase, Hs02758991_g1, ThermoFisher Scientific) and HBG (gamma-globin genes, Hs00361131_g1, ThermoFisher Scientific).
We performed the tagmentation reaction with 200 pg cDNA using Nextera XT DNA sample prep kit (Illumina) and purified the cDNA library following the manufacturer's protocol, except we eluted the cDNA sample with 24 L instead of 50 L (see Supplementary Fig. 2a for yields of cDNA). We pooled 98-108 libraries and sequenced these on an Illumina HiSeq2500 with 100-base paired-end reads to an average depth of 4.64 million reads ( Supplementary Fig. 2b, c). We mapped the trimmed sequencing reads to the transcripts derived from the human reference genome (GRCh37.75) using a STAR(ver.2.5.1b) of mapping program [31] with ENCODE options, and calculated expression estimates with TPM using RNA-seq by expectation maximisation (RSEM ver.1.3.0) [32]. The average transcriptomic alignments were 94±1% (mean±s.d.) and 93±1%, respectively, with cytRNA-seq and nucRNA-seq ( Supplementary Fig. 2d).

Analysis of intron retention
We computed intron expressions with fragments per kilobase of intron per million mapped reads (FPKM) using 347,041 unique introns (longer than 50 nt) on the genome annotation with 48 SINC-seq data of K562 cells under a standard culturing condition. On average, SINC-seq yielded 14.8% reads mapped to introns with nucRNA-seq, but only 1.1% with cytRNA-seq. Further, SINC-seq detected 37,900±10,800 and 33,800±8,080 unique introns in nucRNA-seq and cytRNA-seq, respectively, and 54,900±9,600 per cell with FPKM of more than 0. We identified an RI that had at least 10% expression level of the gene, 95% coverage in the intronic region, and non-zero expression in the adjacent exon. We discarded intron reads locating on a gene with less than 2 TPM. On the other hand, we identified a fully-spliced intron that had less than 1% expression of the gene and 50% coverage in the adjacent exon. We discarded intron reads that failed the criteria above. We validated the RI identification with splice site scores [33], which showed lower values with RIs than fully-spliced introns (p-value < 2.2×10 -16 , U-test), using 9mer (exonic 3mer + intronic 6mer) around 5' splice site, and 23mer (intronic 20mer + exonic 3mer) around 3' splice site. In total, we detected 17,335 RIs of which 12,950 and 2,177 RIs had a higher probability of RI in nucRNA and cytRNA, respectively. The RI's enrichment in the nucleus was consistent with the previous studies [22][23][24].
To identify the NRI, we calculated the probability of intron retention defined as the proportion of cells with the RI and identified NRI that had 0.25 higher probability in the nuclear fraction than in the cytoplasmic fraction. We then filtered unique NRIs discarding smaller NRIs that had an overlap with a long NRI. On the other hand, we identified CRI that had 0.25 higher probability in the cytoplasmic fraction than in the nuclear fraction.
Computing cyt-vs nuc-normalised data: in-silico single-cell normalisation We computed in-silico single-cell RNA-seq data with cytRNA-seq and nucRNA-seq data, scaling the raw TPM values and combining the cyt and nucRNA-seq data as , (S1) where  and  are normalisation factors, which make the summation of the TPM values, TPMin-silico, of the in-silico single cell data to be one million. We here write raw TPM values with an asterisk and TPM values of an in-silico single cell data, cytRNA-seq, and nucRNA-seq, respectively, with their subscripts. We calculate  and  using Ct of qPCR data taken at the QC of cDNA (Supplementary Data 1) as where TPM * cyt,geneA, TPM * nuc, geneA, and Ct are, respectively, the raw TPM value of gene A with cytRNA-seq, the raw TPM value of gene A with nucRNA-seq, and Ct=Ctnuc, geneA-Ctcyt, geneA, which is Ct with respect to gene A. We calculated pairs of  and  with GAPDH and HBG genes (HBG1+HBG2) as gene As, and used the mean  and  to compute the TPM.