Cell integrity and RNA quality present crucial requirements for successful single-cell transcriptome sequencing experiments. Conventional conservation processes, such as freezing, lead to crystallization and disruption of cellular membranes, which impedes subsequent single-cell preparation. To conserve intact and viable cells for cell and tissue archiving, cryoprotectants are commonly used; however, their compatibility with single-cell experiments has not been established. We tested whether cells preserved with the cryoprotectant dimethyl-sulfoxide (DMSO) are suitable for single-cell genomics workflows. We sequenced 670 fresh and 816 cryopreserved single cells derived from cell lines and primary tissues (Additional file 1: Figure S1 and Additional file 2: Table S1). Single-cell transcriptome libraries were prepared with the massively parallel single-cell RNA-sequencing (MARS-Seq) protocol [5, 6]. To evaluate the impact of cryopreservation on single-cell full-length transcriptomes, we applied the Smart-seq2 protocol [12]. A variety of statistical methods, including the most common measures in single-cell genomics, were applied.
We used the MARS-Seq sample preparation protocol to determine potential impacts of the cryopreservation procedure on single-cell RNA profiles. We initially isolated single cells from four cell lines HEK293 (human embryonic kidney cells), K562 (human leukemia cells), NIH3T3 (mouse embryo fibroblasts), and MDCK (canine adult kidney cells) by fluorescence-activated cell sorting (FACS). The cells were either freshly harvested or cryopreserved in the presence of DMSO at –80 °C or in liquid nitrogen prior to single-cell separation and library preparation. To minimize technically introduced batch effects between conditions, all single cells were processed simultaneously for library preparations and sequencing reactions. As expected, the freezing process resulted in an elevated proportion of damaged cells, indicated by the positive staining with propidium iodide. HEK293, K562, and NIH3T3 presented 14%, 2%, and 15% of damaged cells when processed freshly, 66%, 55%, and 20% when cryopreserved at –80 °C, respectively. Conservation in liquid nitrogen slightly improved cell viability showing 61%, 49%, and 17% of damaged cells, respectively. Nevertheless, sequencing reads produced from sorted viable cells displayed an equal distribution over the transcripts (characteristic 3’ bias for MARS-Seq libraries), excluding systematic errors in the library preparation process (Fig. 1a).
Following gene expression quantification, we evaluated to which extent transcriptome information is maintained within single cells and compared transcript and gene information content between fresh and the cryopreserved (–80 °C and liquid nitrogen) conditions. A comparable number of genes was detected by cumulating information from single cells, suggesting that the power to detect gene transcripts in the conserved material is not reduced (Fig. 1b and Additional file 1: Figure S2). We further observed that libraries from fresh and cryopreserved cells produced a similar number of sequencing reads (Additional file 1: Figure S3). Importantly, we found a highly correlated linear relationship between the number of sequencing reads and unique transcripts for both conditions. This indicates that the capacity to capture transcript molecules and the library complexity is not different between both conditions (linear regression model; Fig. 1c and Additional file 1: Figure S4). In line, equal sequencing depth identified similar numbers of expressed genes (linear regression model; Fig. 1d and Additional file 1: Figure S5).
We further assessed the impact of sample conservation on single-cell transcriptome profiles. Genes with variable expression patterns are commonly utilized for the identification of cell subtypes, thus differences between conditions could introduce technical artefacts that complicate data interpretation. Importantly, dimensionality reduction representations using the most variable genes (MVG) point to a general conservation of the single-cell transcriptome during cryopreservation. Expression patterns from cryopreserved cells were similar to freshly processed cells in principal component analyses (PCA) (Fig. 1e and Additional file 1: Figure S6) and t-distributed stochastic neighbor embedding representations (t-SNE) (Fig. 1f and Additional file 1: Figure S6). Small differences between fresh and cryopreserved samples (Fig. 1e and Additional file 1: Figure S6) were considerably lower than technically introduced batch effects when two sequencing pools were compared (Additional file 1: Figure S7a, b) and could be the result of different sampling time points (biological variability). The homogeneity between single cells and conditions in t-SNE representations was stable with varying perplexity parameter selection, underlining the robustness of the results (Additional file 1: Figure S8). Determining the MVG separately for fresh and cryopreserved samples showed an average overlap of 53% (range 51–56%). Randomly subsampling (100 permutations) only fresh cells into two groups resulted in an average overlap of 38% (range 37–41%), while MVG overlapped in 36% (range 35–37%) when fresh and cryopreserved cells were sampled at the same cell numbers. Analyzing transcriptional uniformity across cell types, variably expressed genes distinguished between K562 and HEK293 cells, while processing conditions mixed homogenously in dimensionality reduction representations (Additional file 1: Figure S9a, b).
High similarities between single cells from fresh and cryopreserved (–80 °C) cells were confirmed by direct correlation analysis, showing highly consistent and representative gene expression profiles of HEK293 cells after cell conservation (Fig. 2a). As expected analyzing homogenous cell populations, expression profiles showed high correlation values between single cells of the same type and condition (Pearson’s correlation test, Fig. 2b). However, also between conditions, transcription profiles were highly correlated (Pearson’s correlation test, Fig. 2a, b), suggesting the freezing process to conserve single-cell transcriptome profiles. These results were reproducible across the different cell types and species (Additional file 1: Figure S10a–h). Further, we evaluated the specificity of such analysis by combining the analysis of different cell types. In accordance with the presence of tissue-specific expression programs, HEK293 and K562 cells displayed correlating profiles of their respective single cells and highly decreased associations across samples (Fig. 2c). These patterns were conserved in fresh and cryopreserved cells. Consistent expression profiles were further supported by highly correlating mean expression values when directly comparing both conditions (Fig. 2d and Additional file 1: Figure S10i–k).
In order to evaluate potential impacts on comparative expression analyses involving fresh and conserved sample types, we assessed differentially expressed genes between both conditions. In the four-cell line, we only detected a single significantly differentially expressed gene between fresh and cryopreserved samples (adjusted p value < 0.01, Additional file 2: Tables S2–5), supporting the possibility to include conserved material in studies profiling freshly processed samples. Finally, biological processes that one might suspect to change due to a challenge, such as cell cycle and apoptotic programs, remained unchanged (Fig. 2e, f). Moreover, cell subpopulations identified by hierarchical clustering of the most variable gene sets (see “Material and methods”) were equally identified in fresh and cryopreserved samples (Fig. 2g, h). We detected a similar proportion of fresh and conserved cells in a subpopulation with an activated cell cycle program indicated by G2/M checkpoint genes (χ2 test, p = 0.83; Fig. 2i). Of note, comparable compositions of cellular subtypes were also identified processing fresh and conserved cells separately, with 37% and 34% of cells pointing to cell cycle activation, respectively (Additional file 1: Figure S11).
We extended the results studying single-cell libraries produced using the Smart-seq2 protocol, which is a widely used technique to sequence full-length transcripts from single cells [12]. To this end, we produced Smart-seq2 libraries for 48 fresh and 47 cryopreserved (–80 °C) HEK293 and K562 cells, respectively, and sequenced to an average depth of 7.7 million reads per single cell (Additional file 1: Figure S1). No systematic bias in sequencing read distribution across the transcripts was detected, supporting a conserved integrity of the RNA following cryopreservation (Figs. 3a and 4a). Most libraries showed an equal distribution of sequencing reads from the 5’- to the 3´-end, indicating the detection of full-length transcripts from both conditions. Of note, while HEK293-derived libraries presented extremely consistent distribution profiles between cells and conditions (Fig. 3a), K562 cells displayed higher heterogeneity (Fig. 4a). Here, a few libraries observed in fresh and cryopreserved samples showed a bias towards the 3’-end, pointing to a partial degradation of the RNA. We excluded that the increased coverage heterogeneity observed for the cryopreserved K562 cells (Fig. 4a) is indicative of the cryopreservation performance by sequencing an additional sample with Smart-seq2. Here, the analysis of a patient-derived orthotopic xenograft (PDOX) of a lung adenocarcinoma cryopreserved for six months did not show an elevated heterogeneity between samples across the transcripts (Additional file 1: Figure S12a).
Cumulating gene expression information over single cells pointed to conserved transcriptome content in archived samples (Figs. 3b and 4b). Further, dimensionality reduction representation of the most variable genes could not distinguish between fresh and cryopreserved cells (Figs. 3c, d and 4c, d) and clearly separated the analyzed tissue types (Additional file 1: Figure S12b, c). Correlation analysis of gene expression profiles from single cells supported transcriptional profiles to be highly conserved following cryopreservation (Figs. 3e, f and 4e, f). Hierarchical clustering and t-SNE representation of the most variable gene sets (see “Material and methods”) were able to identify subpopulations in HEK293 (Fig. 3g, h) and K562 (Fig. 4g, h) samples and did not point to proportional differences between conditions (χ2 test, p = 0.46 and p = 0.86, respectively). Consistent with cell populations identified using MARS-Seq, 44% of HEK293 cells presented an activated G2/M checkpoint program (Fig. 3i), a result that could be replicated using separate analysis for fresh and frozen samples (Additional file 1: Figure S13a, b). Finally, only two genes were detected to be differentially expressed between both conditions in HEK293 cells (adjusted p value < 0.01, Additional file 2: Table S6), further supporting the possibility to join fresh and conserved samples in combined studies. Of note, significantly differentially expressed genes showed high variance within the conditions, supporting the possibility that the low number of analyzed cells led their identification and not differences between conditions per se (Additional file 1: Figure S13c). Interestingly, we identified a small subpopulation of K562 cells with a repressed G2/M checkpoint program (Fig. 4i), a result that could be replicated in separate analysis of fresh and cryopreserved cells (Additional file 1: Figure S14a, b). Comparing both conditions in the K562, we detected ten genes to be differentially expressed (adjusted p value < 0.01, Additional file 2: Table S7), which again presented high variability within the respective conditions (Additional file 1: Figure S14c).
Although conserving cell cultures for single-cell analysis opens up the applicability to more complex experimental designs, we intended to further widen the application spectrum to complex primary tissues. We performed MARS-Seq experiments on fresh and cryopreserved human peripheral blood mononuclear cells (PBMC), mouse colon tissue, and finally extended the work to human tumor samples.
We prepared MARS-Seq libraries for 341 cells derived from PBMC. While the freshly prepared sample did not show any sign of damaged cells, 23% of cryopreserved PBMCs stained positive with the marker reagent propidium iodide. Consistent with the results obtained from the cell line experiments, fresh and cryopreserved blood cells produced libraries of comparable complexity. We found a similar linear relationship between the number of sequencing reads and unique transcript counts (Fig. 5a) or the number of detected genes (Fig. 5b), suggesting equal transcriptome capture efficiencies in for both conditions. In line, cumulating gene information over single cells revealed highly comparable numbers of expressed genes in both datasets (Fig. 5c). Correlation analysis of average gene expression levels between both conditions further supported an efficient transcriptome conservation following cryopreservation (Fig. 5d). Dimensionality reduction representation of the most variable genes across datasets clearly suggested gene expression profiles to be unaltered following cell conservation (Fig. 5e). In line, we could not detect differentially expressed genes between fresh and cryopreserved single cells (Additional file 2: Table S8).
Mononuclear blood cells consist of a variety of well-defined subtypes, with marker genes indicating distinct cell populations. Consequently, we performed sample deconvolution to identify and assign blood subpopulations. Using clustering of correlating gene sets and signatures defined through the analysis of sorted blood cell types (see “Material and methods”), we observed four distinct cell populations, both detectable in fresh and cryopreserved samples (Fig. 5f–h). Based on marker genes and cell-type signatures, we assigned the four subpopulations to represent cytotoxic T-cells (expression of NKG7/GNLY/GZMB, red cluster, Additional file 1: Figure S15a), memory T-cells (expression of CD3D/G/E and CD8A/B, black cluster, Additional file 1: Figure S15b), B-cells (expression of CD24 and CD79A/B, turquois cluster, Additional file 1: Figure S15c), and myeloid cells (expression of CD33, pink cluster, Additional file 1: Figure S15d). Due to the limited number of cells analyzed, we were not able to distinguish between CD4-positive and CD8-positive cell populations or to clearly define natural killer cells within the cytotoxic subpopulation. Surprisingly, although the B-cell, monocyte and T-cell clusters were formed by equal proportions of fresh and cryopreserved cells (χ2 test, p = 0.28; Fig. 5f), we detected a bias towards preserved T-cells in the cytotoxic subpopulation (χ2 test, p = 0.0005; Fig. 5f). While this bias could represent donor blood composition variation at the sampling time points, we cannot finally exclude that it points to a technical artefact introduced by the preservation process and further tests are required with a focus on specific blood cell populations. Nevertheless, all main cell-type clusters could be detected at equal proportions using fresh and cryopreserved samples (Fig. 5f), suggesting that cryopreserved blood could be a suitable resource for single-cell transcriptomics analysis.
A fresh mouse colon sample was split and one part was cryopreserved for one week before single-cell separation. As observed previously, cryopreservation resulted in an increased proportion of damages cell, detecting 30% in fresh and 68–71% in conserved samples. This proportion was similar comparing samples cryopreserved as minced tissue (68%) or as a single-cell solution (71%). Due to the donor-matched design, fresh and cryopreserved cells could not be processed in the same library and sequencing pools, resulting in confounding batch effects when directly comparing both conditions. However, highly similar library complexity and an unaltered power to detect gene signatures and cell types further supported the value of cryopreserved tissue in single-cell studies. Specifically, both conditions resulted in libraries with comparable complexity as determined by the linear relationship between the number of sequencing reads and detected transcripts or genes (Fig. 6a, b). We were able to detect similar numbers of genes by cumulating information over single cells (Fig. 6c). Average gene expression levels were highly correlated (Fig. 6d). Due to the introduced batch effects, we could detect patterns in the transcriptional profile of the most variable genes (Fig. 6e); however, these did not bias the annotation to cell subpopulation after hierarchical clustering and t-SNE representation (Fig. 6f, g). We were able to identify transit amplifying (TA) cells, secretory enteroendocrine cells, and differentiated enterocytes in both conditions (Fig. 6g, h); the major cell types present in the colon mucosa. The single-cell transcriptome data enabled us to assign colon cell types to cell clusters using marker genes [9], such as Reg4 (secretory cells), Apoa1 (enterocytes), or ribosomal proteins (TA cells) (Fig. 6h). Importantly, all cell types were identified in equal proportions by fresh and cryopreserved cells, excluding a systematic bias introduced by the conservation process (χ2 test, p = 0.95; Fig. 6g, h). We conclude that the conservation process did not alter the transcriptional profile of single colon mucosa cells and that both, single-cell sequencing of fresh and conserved tissues, is equally suitable to extract biologically relevant information, such as cell-type-specific programs.
Finally, we applied our method to a PDOX in mouse. It is of note that digestion of the tumor sample to single-cell solution prior to cryopreservation resulted in an increased proportion of damaged cells (56% damaged cells), compared to the sample conserved as minced material (26% damaged cells). The cryopreserved ovarian clear cell carcinoma orthoxenograft (passage #2) was processed simultaneously with a matched freshly resected PDOX (passage #3). Therefore, the tumor was cryopreserved for three months and simultaneously sub-cultured in a mouse to obtain a fresh matched specimen. Consistent with prior observations, single transcriptome profiles of fresh or conserved tumor cells did not differ in their library complexity (Fig. 7a, b), transcriptional profiles (Fig. 7c), or gene expression levels (Fig. 7d). No significantly differentially expressed genes could be detected (Additional file 2: Table S9). Single cells from both conditions were able to detect a large tumor cell subpopulations (Fig. 7e, f) with an elevated expression level of ribosomal protein-coding genes (Fig. 7g) and a minor population with an activated G2/M checkpoint profile (Fig. 7h), further highlighting tissue conservation to be possible for various experimental designs, including tumor samples. We observed a proportional bias between the conditions within the putative tumor subpopulations when clustering cells using variable gene sets (χ2 test, p = 0.0002, Fig. 7g). Although we cannot finally exclude the preservation having caused the proportional shift, variant clonal heterogeneity presents a highly frequent feature of serial passaging of PDOX samples [13]. Interestingly, these differences were absent when analyzing the most variable genes separately (χ2 test, p = 0.87, Additional file 1: Figure S16), suggesting that gene set-based hierarchical clustering might be too restrictive to sensitively assign subclonal structures in heterogeneous cancer samples.