Good scRNA-seq data quality after cold storage
We obtained the lung, esophagus, and spleen samples from 12 organ donors (Additional file 2: Table S1). The transplant surgeon assessed each organ as overall healthy in appearance. Whole genome sequencing (WGS) was carried out for each individual, confirming that none of the study participants displayed gross genomic abnormalities (Additional file 1: Figure S1). Furthermore, for each donor, histology sections were produced from the 12- or 24-h time points of each tissue, stained with hematoxylin and eosin, and assessed by a pathologist (Additional file 1: Figure S2). This confirmed all tissue sections as healthy, except one donor with possible lung hypertension. Heterogeneity between tissue sections, for example, the presence of glands, and amount of inflammation in some sections (Additional file 1: Figure S2), is likely to impact profiling by scRNA-seq.
Samples of lung parenchyma, esophagus mid-region, and spleen (n ≥ 5; experimental design, Fig. 1a) were placed into 4 °C HypoThermosol FRS solution immediately after collection (within 2 h of cessation of circulation with cold perfusion) and were kept at 4 °C until used for scRNA-seq. For the majority of lung donors (n = 5), tissue pieces were also flash frozen at the clinic (earliest possible time point), before transport to the processing site for bulk RNA sequencing. Following transport, fresh samples were either dissociated immediately (T0) or stored at 4 °C for 12, 24, or 72 h cold ischemic time prior to processing to single-cell suspension (Fig. 1b, Additional file 2: Table S1). The T0 time point varied depending on the length of the organ transplant procedure, time required to collect samples in the clinic, and speed of courier delivery (on average 4 h of cold ischemia from cessation of circulation to receipt of tissue at the processing laboratory). Other time points were processed at 12 h, 24 h, and 72 h after T0. Cells were analyzed by 10X 3′ v2 scRNA-seq (Fig. 1c), and the number of cells obtained for each sample is given in Additional file 3: Table S2. At each time point, tissue pieces were also flash frozen for bulk RNA-seq analysis.
After alignment and normalization of scRNA-seq data, quality control metrics were assessed for all samples (Fig. 1, Additional file 1: Figure S3). The number of reads per sample, number of cells per sample, median number of genes per cell, and other quality metrics did not change significantly over time for the lung and esophagus, but we did observe changes in the spleen at the 72-h time point (Fig. 1b–d, Additional file 1: Figure S3). The percentage of reads confidently (QC = 255) mapped to the transcriptome was stable for all samples except for the spleen at 72 h (Fig. 1.e). RNA quality was good and did not change with time in any of the tissues (Additional file 4: Table S3; RIN > 7 for the majority of bulk RNA seq samples, with the exception of four lower quality outliers in spleen mainly from a single donor). We conclude that in terms of quality metrics, we do not detect changes that are associated with the length of cold storage within 24 h of cold ischemic time.
Reduced scRNA-seq data quality by 72 h in the spleen
While the majority of quality metrics did not change with time, we further studied the observed decline in confidently mapped reads in the spleen. We identified a statistically significant decrease in the percentage of reads in exons that was only observed in the spleen (Fig. 2a, b). Additionally, the percentage of reads in introns increased with storage time in the spleen, but not the lung and esophagus (Fig. 2c, d). The change in proportion of good quality reads in the spleen at 72 h (Fig. 2b, c) may lead to cell type-specific differences that are further explored later. This skewing between intronic and exonic reads becomes even more apparent when only the top and bottom quartile of cells (with respect to intronic and exonic alignment) are examined over time (Additional file 1: Figure S4). This result implies that non-spliced reads are more stable to degradation.
We next assessed the proportion of mitochondrial reads. This is a commonly used quality metric indicative of cellular stress [35], for example, induced by tissue dissociation or storage. Cells with high percentages of mitochondrial reads are generally excluded from analysis [36]. In our data, the fraction of mitochondrial reads was low, with no significant change in proportion, except in the spleen where mitochondrial reads increase by 72 h in 4 out of 5 donors (Fig. 2e, f). This is also apparent when examining the number of cells with mitochondrial percentage higher than 10%, which significantly increases with time in the spleen only (Fig. 2g).
Effect of time on doublet rates and empty droplets
Doublet scores were calculated for each cell in every sample, and these did not change with time for any of the three tissues (Additional file 1: Figure S5).
We next evaluated changes in non-cellular droplets. All droplet sequencing reactions generate many droplets that do not contain cells but capture acellular mRNA, often referred to as “ambient RNA” or “soup” [37]. We normalized the number of UMI by read depth and set arbitrary thresholds to define “ambient RNA” as 0–0.25, “debris” as 0.25–5, and “cellular material” as > 5 normalized UMI per droplet (Fig. 3a) to reflect the distribution of reads. The proportions of droplets containing UMIs in any of these intervals were not affected by time in the spleen, lung, or esophagus (Additional file 1: Figure S6). However, the mean number of normalized UMI increased in debris and decreased in cellular droplets by 72 h (but not 24 h) in the spleen (Fig. 3b, c). This was not observed in the lung or esophagus, but we note that the mean values in debris and cellular material were very variable in all three tissues.
The increasing debris in the spleen could indicate increased cellular death by 72 h. After dissociation, we observed significant variation in cell viability between samples (Additional file 1: Figure S7) that may be of biological (donor variation) or technical origin (possibly due to samples being manually counted by multiple operators throughout the study). However, viability scores became more consistent after dead cell removal. To assess if cell viability was altered in the tissue prior to dissociation, we performed TUNEL assays on T0 and 72 h tissue sections from all three tissues to visualize apoptosing cells (Additional file 1: Figure S8). TUNEL staining intensity varied both between and within individual samples, with staining being noticeably patchy. There was a trend of higher staining at 72 h for all three tissues, but T0 staining in the spleen was higher than in the other two tissues. Overall, these findings are consistent with increased cell death at later time points and with a larger effect of cell death observed in the spleen.
Since dead cells should be removed in the washing steps and viability columns, we expect not to observe the cells at the late stages of apoptosis in our sequencing data. However, we do observe more debris in the spleen by 72 h that can indicate increased sensitivity to dissociation after prolonged storage.
Annotation of cell types
The gene expression count matrices from Cell Ranger output were used to perform sequential clustering of cells from either whole tissues or particular subclusters. The cell type identities of the clusters were determined and annotated by observation of expression of known cell type markers (Fig. 4a–c, Additional file 1: Figure S9a-c, and Additional file 3: Table S2). Importantly, all time points and at least four different donors contributed to every cell type in all three tissues (Fig. 4d–f, Additional file 1: Figure S10, and Additional file 3: Table S2).
In the lung, 57,020 cells passed quality control and represented 25 cell types. We detected ciliated, alveolar types 1 and 2 cells, as well as fibroblast, muscle, and endothelial cells both from blood and lymph vessels. The cell types identified from the immune compartment included NK, T, and B cells, as well as two types of macrophages, monocytes, and dendritic cells (DC). Multiple DC populations such as conventional DC1, plasmacytoid DC (pcDC), and activated DC were detected and constituted 0.3% (163 cells), 0.08% (46 cells), and 0.2% (122 cells) of all cells, respectively. Lung club cell marker genes are detected in a small number of cells, but our clustering algorithm did not recognize these cells as a separate cluster (Additional file 1: Figure S11). All donors contributed to every cluster. Dividing cells formed separate clusters for T cells, DC, monocytes, NK, and macrophages.
The esophagus yielded 87,947 cells with over 90% belonging to the 4 major epithelial cell types: upper, stratified, suprabasal, and dividing cells of the suprabasal layer. The additional cells from the basal layer of the epithelia clustered more closely to the gland duct and mucous secreting cells. While all donors contributed to the basal layer, only 2 samples from a total of 23 esophagus samples provided the majority of the mucous secreting cells (0.06% from total; 55 cells; samples 325C, 12 h, and 356C, 72 h). Immune cells in the esophagus include T cells, B cells, monocytes, macrophages, DCs, and mast cells. Interestingly, almost 80% of the mast cells (87 cells) originated from a single donor (296C). Increased proportions of other immune cells (B cells, DC, monocytes/macrophages) were also noticed in this donor. This donor was the only one subjected to MACS dead cell removal, which was later excluded from the protocol due to concerns about losing larger cell types such as upper epithelial cells (0.5% of all cells in 296C, over 7% in all other donors). In addition, this donor was diagnosed with ventilator-associated pneumonia and some reports in mice indicate a link between mast cells and pneumonia infection [38, 39].
All the 94,257 cells from the spleen were annotated as immune cells. Follicular and mantle zone B cells were identified as the largest group with 17% (> 16,000 cells) and 20% (> 18,000 cells), respectively. Dividing B cells potentially undergoing affinity maturation were annotated by the expression of AICDA and detected with a frequency of 0.5% (437 cells). Over 6000 plasma cells were detected and annotated as plasmablasts, IgG, or IgM expressing plasma cells. About 90% from each of those originated from one donor 356C, which is consistent with the medical records showing chest infection in this donor. Over 28,000 T cells were annotated as CD4+ conventional, CD8+ activated, CD+4 naive, CD4+ follicular helper (fh), CD8+ MAIT-like, CD8+ gamma-delta, CD8+ cytotoxic lymphocyte (CTL), CD4+ regulatory, or dividing T cells. Two subpopulations of natural killer (NK) cells, a dividing NK population, monocytes, macrophages, and DCs were also identified. Multiple cell groups were represented in very low proportions, such as subpopulations of the DC including activated DC (0.04%), conventional DC1 (0.3%), and pcDC-s (0.3%), as well as innate lymphoid cells (0.6%), CD34+ progenitor cells (0.2%), platelets (0.08%), and an unknown population of cells positioned between T and B cell clusters (0.1%). Another group containing over 2207 cells expressing both T and B cell markers could represent the doublets of interacting cells and were called T_B doublet. Besides the plasma cell populations, multiple other cell types such as T_B doublet, conventional DC 1 and DC 2, DC plasmacytoid, and macrophages were also represented in higher proportions in donor 356C than any other donor. No stromal cells were detected, which is likely to be due to the fact that for the spleen, no enzymatic digestion was employed to release cells.
Tissue processing signatures
We also performed bulk RNA-sequencing for each donor at each time point to assess gene expression changes over time without dissociation artifacts and to allow us to determine what gene sets are changed by dissociation, or if specific cell populations are lost. On a UMAP plot, the bulk and single-cell pseudo-bulk (sc-pseudo-bulk) samples cluster primarily by method (bulk or sc-pseudo-bulk) and by tissue of origin, but not according to the time point (Additional file 1: Figure S12). Previous work has highlighted the effect of enzymatic tissue dissociation on gene expression patterns [20]. Differential expression analysis was carried out by the Wilcoxon signed-rank test in each tissue between bulk vs sc-pseudo-bulk samples. p values were corrected for multiple testing via the Benjamini and Hochberg (BH) method. The genes with the highest fold changes from sc-pseudo-bulk to bulk were enriched in ribosomal genes in all three tissues (Additional file 5: Table S4). Also, a long non-coding RNA “MALAT1” appeared in the top 20 genes expressed at higher levels in sc-pseudo-bulk in all of the 3 tissues (adjusted p values < 0.002 in the lung, esophagus, and spleen). The high enrichment of ribosomal genes (adjusted p value 1.15 × 10−6) as well as MALAT1 (adjusted p value 1.15 × 10−6, median log2 fold change − 4.4) in sc-pseudo-bulk samples was also evident when combining all three tissues for the analysis. All of the dissociation-related FOS, FOSB, JUN, and JUNB genes [20] were significantly higher in the sc-pseudo-bulk than in the bulk samples with adjusted p values 1.56 × 10−7, 2.3 × 10−10, 8.7 × 10−06, and 6.13 × 10−09, respectively. Differential expression of ribosomal and early response genes was also seen in previous reports of dissociation signatures [20].
We also carried out tissue-specific analysis of differential gene expression. Genes more highly expressed in bulk derive from the cell types sensitive to dissociation. Pulmonary alveolar cells are very scarce in our single-cell lung data, but abundant in the tissue. This results in the differential expression of the marker AGER and surfactant-protein encoding genes SFTPB, SFTPA1, and SFTPA2. Other genes with high fold changes between bulk and sc-pseudo-bulk lung are blood vessel endothelial markers VWF and PECAM1. In the esophagus, stromal-specific genes FLNA and MYH11 and both KRT4 and KRT5, expressed in the majority of keratinocytes, are higher in bulk vs sc-pseudo-bulk. In the spleen, the list of top genes includes APOE, CD5L, VCAM1, HMOX1, C1QA, and C1QC, which are strongly expressed in macrophages. This suggests that our sample processing protocols are mostly affecting alveolar and blood vessel cells in the lung, stromal cells in the esophagus, and macrophages in the spleen.
Cell type-specific changes
Having annotated cell types, it was possible to study the change in proportion of cell types over time. Cell type proportions varied greatly between samples and between donors (Fig. 4d–f, Additional file 3: Table S2, Additional file 1: Figure S13a). When examining cell type changes with time within donors, we noticed that the proportion of B cells increased in the spleen and that of T cells decreased in the lung and spleen with storage time (Additional file 3: Table S2, Additional file 1: Figure S13b, and Additional file 1: Figure S14). None of these changes were statistically significant after multiple testing corrections when comparing individual time points. However, we do observe a decrease of CD4 T cells and CD8 cytotoxic lymphocyte proportion in the lung when combining the T0, 12-h, and 24-h time points for comparison with the 72-h time point (BH-corrected p values < 0.01, Additional file 6: Table S5).
We next examined whether there was a cell type-specific effect of storage time on the transcriptome. Notably, UMAP plots that were calculated on highly variable genes did not reveal an obvious effect of time (Fig. 4g, h). We joined gene expression matrices for all the tissues and calculated the percentage of variability explained by different variables. Figure 4j shows that the variable donor, tissue, cell type, and number of counts account for the highest fraction of the variance explained, while the effect of storage time made the smallest contribution. This remained the case when the analysis was carried out per tissue (Additional file 1: Figure S15).
We next examined whether the observed increase in mitochondrial reads with time (spleen, 72 h, Fig. 2e–g) was due to a specific cell type. For this purpose, cells with high mitochondrial reads were assigned to a cell type via similarity. For each cell type and tissue, the mitochondrial percentages and their fold changes relative to T0 were calculated (Additional file 1: Figure S16, Fig. 5). The highest fold changes were present in the spleen at 72 h. While this effect was apparent in multiple cell types, it was particularly evident in plasma cells, where this effect was independently replicated in the two donors contributing the majority of this cell type (Additional file 1: Figure S17, Fig. 5a).
Next, similar cell types were joined together into larger clusters for more reliable analysis. The percentage of variability explained by time point in each of these cell type clusters was extremely low (Fig. 5b), especially compared with variables such as donor and number of counts (Fig. 4j), highlighting that for almost all cell types, cold storage time did not have a major effect.
We also examined which genes changed most with storage time in each cell type (see the “Methods” section). This analysis was carried on a per organ basis, as cold storage gene signatures derived from different cell types primarily clustered by organ of origin, rather than cell types. For example, storage-induced gene signatures from T cells, natural killer cells, and monocytes/macrophages grouped by organ (Additional file 1: Figure S18). Furthermore, the genes driving this similarity were among the top genes contributing to the ambient RNA contamination in the majority of samples (Additional file 1: Figure S19, Additional file 7: Table S6). For example, in the spleen, the plasma cell-specific genes such as JCHAIN, IGHA1, and IGLC3 are high in ambient RNA (Additional file 1: Figure S19) and are also overrepresented in the cold storage signature. This observation is consistent with high mitochondrial percentages (due to stress or cell death) observed in the plasma cells (Fig. 5a). In addition, we observed an overrepresentation of the cold storage signature genes (Additional file 5: Table S4) among the most strongly dissociation-related genes (adjusted p value < 0.01 and median log2 fold change < − 2, Additional file 7: Table S6) with Fisher’s exact test. We found a higher overlap of the dissociation-related signature than expected by chance in all three tissues (p values 2.2 × 10−16, 2.05 × 10−14, and 2.2 × 10−16 in the lung, esophagus, and spleen correspondingly). This can be explained either by cells becoming more sensitive to the dissociation with storage time, or by similar stress signatures being activated via storage time and dissociation independently. Therefore, the low levels of gene expression changes that we do observe with storage time are likely to be driven by stress-induced cell death leading to ambient RNA contamination.
Pairwise differential expression analysis in bulk RNA-sequencing between T0 and other time points did not yield significant genes in any tissue (Additional file 8: Table S7), further indicating that any changes observed are extremely small. For the lung, we were also able to freeze samples at the clinic immediately after collection and compare this sample to later time points. Again, no significantly differentially expressed genes were detected.
It may seem surprising to observe so few changes in gene expression with time, especially given that other studies such as the GTEx project do demonstrate such effects [22, 32]. However, it is important to note that post-mortem samples from warm autopsy were used for the GTEx project (albeit with < 24 h PMI). Our study was designed to mimic the process used during organ transplantation, in which tissues are removed rapidly (within 1 h of cessation of circulation) from cold-perfused donors and stored at 4 °C in hypothermic preservation media such as University of Wisconsin (timing is tissue-dependent; the heart can be stored for 4–6 h, lungs median 6.5 h, kidneys median 13 h). Indeed, for some organs, there is evidence that they remain functional for longer [29, 30]. Further, the work of Wang et al. [31], which looked at hypothermic preservation of mouse kidneys in HypoThermosol FRS, also demonstrated little change in gene expression over 72 h. Therefore, while it is certainly true that rapid gene expression changes will occur under certain storage conditions, at least for the organs tested in this study, it appears these can be limited by maintaining the samples cold in hypothermic preservation media. Altogether, this will be very useful for designing further studies with fresh biological samples (including biopsies from living donors) with regard to sample collection time in clinic, transport to the lab, and storage until processing is convenient.
Mapping of cell types across organs
Having generated datasets for the esophagus, lung, and spleen, we examined if cell types that can be found in all three organs, such as immune cells, would cluster by organ or by cell type. Figure 5c shows the result of hierarchical clustering using the 1000 most highly variable genes in up to 10 cells per cell type, tissue, time, and donor. In this analysis of approximately 7500 cells, we see clear subclusters of mast cells, macrophages, and plasma cells with some substructure depending on the donor and the tissue of origin, suggesting that more extensive analysis will allow us to study tissue adaptation of different immune cell populations. Other cells such B cells sit in two groups, and dividing cells (NK, macrophages, T cells) also co-segregate. Importantly for this study, samples do not cluster by time.
Variation in cell type contribution
Our protocols of single-cell dissociation are aimed at capturing the diversity of cell types present in each organ, but do not represent the proportion of each cell type in the original tissue. For example, the tissue dissociation protocol employed for the lung strongly enriches for immune cell types. Relatively high variability in the proportions of cell types was seen between samples. This was likely to be due to technical variation as well as underlying biological variation as indicated by the capture of rare structures such as glandular cells in only some esophagus samples, namely from donor 325C and 356C. Interestingly, histology on sections from donor 325C (12 h) confirmed the presence of glands (mucous secreting cells; Additional file 1: Figure S2) that were not present in the other esophagus samples sectioned. All other samples contained fewer than five mucous secreting cells (Additional file 3: Table S2). This exemplifies the difficulty in collecting cells from structures that are sparsely distributed, such as the glands in the esophagus, and suggests that some of the sample to sample variation is due to the underlying differences in the architecture of the specific tissue sections analyzed. A similar effect was seen for blood vessels (Fig. 4e, 367C, 72 h). Furthermore, the immune infiltrate seen on one of the histology sections (Additional file 1: Figure S2c, 362C, 24 h) is possibly reflected in an increase in immune cells (B, T, and monocytes/macrophages) at the single-cell level (Fig. 4e, 362C, 24 h).
Overall, we observe greater variability between donors than between samples. Additional file 6: Table S5 lists all cell types per tissue, 72 in total. Statistical tests (t tests) for changes with time yielded only two significant changes (discussed above). However, when we test differences between donors, we find that for 29 out of the 72 combinations the proportions of cell types were significantly different in at least 1 of the donors compared to the rest (one-sided ANOVA, BH-corrected p value < 0.05). This variability in cell type proportions per donor is also visualized in Additional file 1: Figure S12a. The cell type with the most significant variation between donors was mast cells in the lung. Other examples include NK cells in both the spleen and lung and dividing epithelial cells in the esophagus (Fig. 4d–f). This high variability between donors suggests that for the Human Cell Atlas, a large number of donors will have to be profiled to understand the range of “normal.”