Systematic assessment of tissue dissociation and storage biases in single-cell and single-nucleus RNA-seq workflows

Background: Single-cell RNA sequencing has been widely adopted to estimate the cellular composition of heterogeneous tissues and obtain transcriptional profiles of individual cells. Multiple approaches for optimal sample dissociation and storage of single cells have been proposed as have single-nuclei profiling methods. What has been lacking is a systematic comparison of their relative biases and benefits.
Results: Here, we compare gene expression and cellular composition of single-cell suspensions prepared from adult mouse kidney using two tissue dissociation protocols. For each sample, we also compare fresh cells to cryopreserved and methanol-fixed cells. Lastly, we compare this single-cell data to that generated using three single-nucleus RNA sequencing workflows. Our data confirms prior reports that digestion on ice avoids the stress response observed with 37 °C dissociation. It also reveals cell types more abundant either in the cold or warm dissociations that may represent populations that require gentler or harsher conditions to be released intact. For cell storage, cryopreservation of dissociated cells results in a major loss of epithelial cell types; in contrast, methanol fixation maintains the cellular composition but suffers from ambient RNA leakage. Finally, cell type composition differences are observed between single-cell and single-nucleus RNA sequencing libraries. In particular, we note an underrepresentation of T, B, and NK lymphocytes in the single-nucleus libraries.
Conclusions: Systematic comparison of recovered cell types and their transcriptional profiles across the workflows has highlighted protocol-specific biases and thus enables researchers starting single-cell experiments to make an informed choice.

Next, we investigated which genes contribute to the ambient RNA profile. We again selected all 'empty droplets' and summed their counts for each sample to obtain aggregate ambient RNA counts. We then calculated the percentage of counts attributed to each gene in each sample. Additional file 2: Fig. S12C shows the 20 genes with the highest percentage in methanol-fixed samples. The figure indicates that mitochondrial genes make up a larger fraction of ambient RNA in fresh samples. In contrast, haemoglobin RNA and the abundant tubular transcripts seen in Fig. 3D contribute more to the ambient RNA profile of methanol-fixed kidneys.
Hence, while the total amount of ambient RNA does not differ drastically between the fresh and methanol-fixed samples, we found differences in the composition of the ambient RNA profile, which ultimately appear in the downstream analysis as higher levels of haemoglobins and tubular genes in methanol-fixed cells.
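
The aggregation described above can be illustrated with a minimal Python sketch. It is not the code used for the analysis: the function names, the dictionary layout and the toy counts are assumptions made for illustration, and the way droplets are flagged as empty is taken as given.

```python
def ambient_profile(counts_by_gene, is_empty):
    """Percentage of ambient counts attributed to each gene in one sample.

    counts_by_gene: dict mapping gene name -> list of counts, one per droplet.
    is_empty: list of booleans flagging droplets treated as empty
              (how they are called is outside this sketch).
    """
    # Sum counts over empty droplets only, gene by gene.
    ambient = {
        gene: sum(c for c, empty in zip(counts, is_empty) if empty)
        for gene, counts in counts_by_gene.items()
    }
    total = sum(ambient.values())
    if total == 0:
        raise ValueError("no counts found in empty droplets")
    return {gene: 100.0 * n / total for gene, n in ambient.items()}


def top_genes(percentages, n=20):
    """Return the n (gene, percentage) pairs with the highest percentage."""
    return sorted(percentages.items(), key=lambda kv: kv[1], reverse=True)[:n]


# Toy data (invented values): four genes, four droplets, three flagged empty.
counts = {
    "mt-Co1": [5, 50, 3, 2],
    "Hbb-bs": [1, 40, 6, 5],
    "Kap":    [0, 30, 1, 4],
    "Umod":   [0, 20, 0, 1],
}
is_empty = [True, False, True, True]

pct = ambient_profile(counts, is_empty)
top = top_genes(pct, n=2)
```

Running the same function once per sample gives profiles that can be compared gene by gene between fresh and methanol-fixed samples.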

Supplementary Note 3. Comparison of three nuclei isolation protocols.
Aggregate gene expression was highly correlated across the three nuclei isolation protocols (Additional file 2: Fig. S13), and each yielded similar numbers of nuclei with similar numbers of genes and UMIs detected per nucleus (Additional file 2: Fig. S14). The major difference observed was a higher percentage of reads mapping to mitochondrial genes in the SN_FANS_1x2000g_v3 data (mean of 1.69%) than in the SN_sucrose and SN_FANS_3x500g_v3 data (0.27% and 0.15%, respectively; Additional file 2: Fig. S14). Additional files 11-13 list differentially expressed genes for the three pair-wise comparisons between the protocols, calculated for each cell type separately using the Wilcoxon test in Seurat [5] with a logFC threshold of 0.5, a minimum detection rate of 0.5 and FDR < 0.05. The differential expression analysis suggested that contamination of all cell populations with highly expressed kidney transcripts was strongest in SN_FANS_1x2000g_v3 and lowest in SN_FANS_3x500g_v3. In terms of cell type composition, SN_FANS_1x2000g_v3 and SN_sucrose were mostly similar, while SN_FANS_3x500g_v3 showed significant differences across several cell populations (Additional file 2: Fig. S13). Taking into account the low levels of contamination and the simpler protocol, which does not need specialist FANS equipment, we recommend the SN_sucrose protocol.
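
To make the three filtering criteria concrete, the following Python sketch applies them to one cell type in two protocols. It is a stand-in written with the standard library, not Seurat's implementation: the rank-sum test uses a hand-written normal approximation with tie correction, logFC is assumed to be a simple difference of mean log-normalised expression, the detection rate is assumed to be the larger of the two groups' detection fractions, and FDR is assumed to mean Benjamini-Hochberg adjustment. All function names, gene names and values are hypothetical.

```python
import math


def rank_sum_pvalue(a, b):
    """Two-sided Wilcoxon rank-sum p-value (normal approximation, tie-corrected)."""
    n1, n2 = len(a), len(b)
    n = n1 + n2
    pooled = sorted([(v, 0) for v in a] + [(v, 1) for v in b])
    rank_sum_a, tie_term, i = 0.0, 0.0, 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        midrank = (i + 1 + j) / 2.0  # average of ranks i+1 .. j
        rank_sum_a += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        tie_term += (j - i) ** 3 - (j - i)
        i = j
    u = rank_sum_a - n1 * (n1 + 1) / 2.0
    var_u = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u <= 0:
        return 1.0  # all values identical: no evidence of a shift
    z = max((abs(u - n1 * n2 / 2.0) - 0.5) / math.sqrt(var_u), 0.0)
    return math.erfc(z / math.sqrt(2.0))


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    adjusted, running_min = [0.0] * n, 1.0
    for rank in range(n, 0, -1):  # walk from the largest p-value down
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted


def differential_genes(expr_a, expr_b, min_logfc=0.5, min_pct=0.5, fdr=0.05):
    """expr_a, expr_b: dict gene -> log-normalised values, one per nucleus."""
    genes = list(expr_a)
    padj = benjamini_hochberg([rank_sum_pvalue(expr_a[g], expr_b[g]) for g in genes])
    hits = {}
    for g, q in zip(genes, padj):
        a, b = expr_a[g], expr_b[g]
        logfc = sum(a) / len(a) - sum(b) / len(b)
        pct = max(sum(v > 0 for v in a) / len(a), sum(v > 0 for v in b) / len(b))
        if abs(logfc) >= min_logfc and pct >= min_pct and q < fdr:
            hits[g] = {"logFC": logfc, "FDR": q}
    return hits


# Toy data (invented): three hypothetical genes, eight nuclei per protocol.
protocol_a = {
    "geneA": [2.0, 2.1, 2.2, 2.3, 2.4, 2.5, 2.6, 2.7],
    "geneB": [1.0, 0.0, 1.1, 0.0, 1.2, 0.0, 1.3, 0.0],
    "geneC": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 3.0],
}
protocol_b = {
    "geneA": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8],
    "geneB": [0.0, 1.05, 0.0, 1.15, 0.0, 1.25, 0.0, 1.35],
    "geneC": [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.9, 0.0],
}
hits = differential_genes(protocol_a, protocol_b)
```

In this toy input only geneA passes all three filters; geneB fails the logFC threshold and geneC fails the detection-rate threshold. For real data an established implementation is preferable to this hand-written test.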

Supplementary Note 4. Comparison of bulk RNA-seq of intact kidneys and cold-dissociated cell suspensions.
We compared bulk RNA-seq profiles of undissociated kidneys to bulk RNA-seq profiles of cold-dissociated single-cell suspensions derived from kidneys of Balb/c female mice. Raw counts were normalised to gene length and then to library size using the weighted trimmed mean of M-values (TMM) method in edgeR [3], to derive gene length corrected trimmed mean of M-values (GeTMM) as described in [6] (see Methods). Note that the single-cell suspensions were filtered through 70 µm and 40 µm cell strainers; hence, this comparison could potentially reveal poorly dissociated cell types.
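
The order of the two normalisation steps can be sketched in Python as follows. This is an illustration under stated assumptions, not the pipeline used here: the TMM factors are not computed in the sketch but passed in as `norm_factors` (assumed to come from edgeR run on the length-normalised values), and the function name, sample names and numbers are hypothetical.

```python
def getmm(counts, gene_length_kb, norm_factors):
    """Length-normalise counts, then scale by TMM-adjusted library size.

    counts: dict sample -> list of raw counts (same gene order in every sample).
    gene_length_kb: list of gene lengths in kilobases, in the same gene order.
    norm_factors: dict sample -> TMM scaling factor, assumed to be computed
                  elsewhere on the reads-per-kilobase (RPK) values.
    """
    result = {}
    for sample, raw in counts.items():
        # Step 1: reads per kilobase corrects for gene length.
        rpk = [c / length for c, length in zip(raw, gene_length_kb)]
        # Step 2: the effective library size is the RPK total times the TMM factor.
        effective_size = sum(rpk) * norm_factors[sample]
        result[sample] = [1e6 * x / effective_size for x in rpk]
    return result


# Toy data (invented): three genes in two samples.
raw_counts = {
    "intact":      [100, 200, 300],
    "dissociated": [50, 400, 150],
}
lengths_kb = [1.0, 2.0, 3.0]
factors = {"intact": 1.0, "dissociated": 1.25}

values = getmm(raw_counts, lengths_kb, factors)
```

With a factor of 1.0 the values of a sample sum to one million; a factor above 1.0 lowers every value in that sample by the same proportion.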
Differential expression analysis (edgeR exact test [3] with FDR < 0.05 and logFC threshold of [...] Additional file 2: Fig. S16 shows a heatmap of average expression levels of the 102 genes in all cell types in the single-cell dataset. The heatmap reveals that several cell types, such as neutrophils, proximal tubules and endothelial cells, express genes that showed higher expression in undissociated than in dissociated kidneys. This might indicate that the corresponding cell types were incompletely dissociated. Surprisingly, mesangial cells express both genes that showed higher expression in undissociated kidneys and genes with higher expression in dissociated suspensions, which might point to mesangial cell subtypes that are unequally represented in intact vs dissociated bulk RNA-seq kidney profiles.