- Open Access
CellTag Indexing: genetic barcode-based sample multiplexing for single-cell genomics
© The Author(s). 2019
- Received: 7 June 2018
- Accepted: 23 April 2019
- Published: 9 May 2019
High-throughput single-cell assays increasingly require special consideration in experimental design, sample multiplexing, batch effect removal, and data interpretation. Here, we describe a lentiviral barcode-based multiplexing approach, CellTag Indexing, which uses predefined genetic barcodes that are heritable, enabling cell populations to be tagged, pooled, and tracked over time in the same experimental replicate. We demonstrate the utility of CellTag Indexing by sequencing transcriptomes using a variety of cell types, including long-term tracking of cell engraftment and differentiation in vivo. Together, this presents CellTag Indexing as a broadly applicable genetic multiplexing tool that is complementary with existing single-cell technologies.
Single-cell technology is advancing at a rapid pace, providing unique opportunities to investigate biological systems and processes with unparalleled resolution. As an increasing variety of assays are being deployed at single-cell resolution, this has presented new challenges for experimental design and data analysis. Recently, batch effects were shown to drive aberrant clustering of the same biological sample processed via two different methodologies , demonstrating how the accuracy of single-cell data analysis can be confounded by measurement errors. Several algorithms currently exist to support the computational correction of batch effects [2–5]. These methods aim to minimize technical artifacts by regressing out known factors of variation during single-cell data processing. However, this requires prior knowledge of the specific factors contributing to batch effects, limiting these approaches. In an alternative strategy, samples are pooled together and subsequently demultiplexed, based on their natural genetic variation , a powerful approach that supports the multiplexing of up to ~ 20 samples. However, if the samples are not genetically distinct or are not accompanied by detailed genotypic knowledge, demultiplexing by genetic variation does not represent a feasible approach. For instance, this strategy would not be suitable for comparing different experimental groups from the same individual or animal model where genetic background stays constant.
Recently, several “label-and-pool” approaches have been developed to mark individual cells of the same sample with a distinct barcode prior to pooling and processing in the same single-cell RNA-sequencing (scRNA-seq) run [7–12]. For example, cells can be tagged with barcoded antibodies [9, 12], chemically labeled with DNA oligonucleotides [8, 10], or transiently transfected with DNA oligonucleotides , such that sample identifiers for each cell can be read, in parallel with their transcriptomes. Similarly, several other methods exist to couple genetic perturbations with barcodes [13–17], although these have not been demonstrated to support reliable, large-scale sample multiplexing. Here, we introduce a methodology to multiplex biological samples via long-term genetic labeling with heritable virally delivered barcodes, “CellTags.” In this approach, defined 8-nucleotide (nt) CellTag barcodes are expressed as polyadenylated transcripts, captured in standard single-cell processing protocols. This design permits the indelible labeling and subsequent identification of cells by sample, in parallel with the measurement of their identity and state. In contrast to labeling approaches based on transient physical interactions at the cell or nuclear surface, CellTag Indexed cells retain their heritable barcodes for an extended period in vitro and in vivo, supporting long-term cell tracking experiments. This also distinguishes CellTag Indexing as a unique multiplexing tool in that cell samples can be tagged, mixed and tracked within the same biological replicate, and processed together to mitigate unwanted biological and technical variation.
Here, we validate CellTag Index-based multiplexing via the labeling and mixing of genetically distinct populations, demonstrating accurate and efficient demultiplexing of sample identity. Furthermore, we demonstrate the efficacy of CellTag Indexing for long-term live cell multiplexing, via the establishment of a unique competitive transplant model. In this context, we showcase how CellTag Indexing can be used for in vivo multiplexing to precisely quantify engraftment and differentiation potential of distinct, competing cell populations. Together, this positions CellTag Indexing as a broadly applicable tool, easily deployed in cell culture- and transplantation-based assays, that is compatible across different single-cell modalities.
Genetic labeling of biological samples via CellTag Indexing
First, to ensure that CellTag Indexing does not perturb cell physiology, we tested the impact of labeling on a well-characterized lineage reprogramming system, B cell to induced macrophage reprogramming . We cultured HAFTL pre-B cells and induced reprogramming to macrophage fate with β-estradiol, as previously described . One biological replicate was transduced with CellTag lentivirus, while an independent control replicate, cultured in parallel, was not transduced (Additional file 1: Figure S1A). After 72 h of reprogramming, the two induced macrophage samples were independently processed for sequencing, along with a sample of the original, untransduced B cells. This yielded 1310 CellTagged transcriptomes, 2849 control transcriptomes, and 972 B cell transcriptomes. We detected a median of 6 CellTag transcripts per cell in CellTagged transcriptomes (CellTags were detected in every cell of this sample) and 0 in control transcriptomes (Additional file 1: Figure S1B). Clustering and visualization [5, 20] of CellTagged and control macrophage transcriptomes are interspersed with no independent clustering observed, with both clustering separately from B cells (Additional file 1: Figure S1C&D). Additionally, CellTagged and control induced macrophages exhibit comparable upregulation of macrophage marker expression, accompanied by similar levels of B cell marker downregulation (Additional file 1: Figure S1E). Genome-wide comparison of gene expression of the two samples shows a strong linear association with an R2 value of 0.98 (Additional file 1: Figure S1F), confirming that CellTag Indexing does not perturb cell identity or physiology. This is in agreement with our previous study showing that transduction with a random CellTag library does not influence cell behavior .
Species mixing of genetically distinct cells validates CellTag-based multiplexing
To assess the efficacy of CellTag-based multiplexing, we applied it to “species mixing,” an experiment commonly performed to estimate cell co-encapsulation rates in droplet-based scRNA-seq . In this experiment, one sample of human HEK293T cells was labeled with CellTag Index A (CellTagA), and one sample of mouse embryonic fibroblasts (MEFs) was labeled with CellTag Index B (CellTagB), for 24–48 h. Transduction efficiency was visualized by measuring the percentage of GFP-positive cells (~ 90%, Additional file 1: Figure S2A). Labeled cells were pooled, in equal proportions, and processed together for single-cell library preparation and sequencing, yielding a total of 18,159 transcriptomes, with 9357 single human cells (aligning predominantly to the hg19 genome), 7456 single mouse cells (aligning predominantly to the mm10 genome), and 1346 multiplets as classified by 10x Genomics’ Cell Ranger pipeline, based on alignment to the custom hg19-mm10 reference genome (Fig. 1b and Additional file 1: Figure S2B). For the purpose of validation, we take this classification result as a benchmark for comparison. To assign sample identity based on CellTag Index expression, we developed a novel demultiplexing algorithm (https://github.com/morris-lab/CellTag-Classifier)  that examines the expression distribution of each CellTag Index, followed by a dynamic binarization step to assess each CellTag Index signal on an individual cell basis (Fig. 1a and Additional file 1: Figure S2C; see the “Methods” section). With this method, we demultiplexed the pooled transcriptomes into 7510 human cells (CellTagA), 6397 mouse cells (CellTagB), 1040 multiplets, and 3212 non-determined cells (Fig. 1c and Additional file 1: Figure S2D). Overall, our algorithm successfully classified, or demultiplexed, 82.3% of all transcriptomes. The presence of non-determined cells is likely due to cells that did not receive sufficient dosage of virus during CellTag Index transduction. This can be enhanced by increasing virus multiplicity of infection (MOI) and visualizing the percentage of GFP-expressing cells prior to sequencing, as demonstrated below. For the purpose of benchmarking, we removed the 3212 non-determined cells for comparison with the 10x-based classification (Additional file 1: Figure S2E&F). Using Cohen’s kappa as a measure of agreement between independent observations, we calculated a kappa of 0.814 (Additional file 1: Figure S2G), suggesting that our demultiplexing is in strong agreement with the orthogonal 10x-based classification. Furthermore, cells designated as multiplets by both 10x and CellTagging demonstrate a clear increase in the mean numbers of transcripts per cell (Additional file 1: Figure S2H&I), suggesting they do indeed represent multiplets.
To demonstrate the efficacy of CellTag Indexing for multiplexing several biological samples in one experiment, we conducted additional validation where four samples of HEK293Ts and one sample of MEFs were transduced with five different predefined CellTag Indexes (HEK293Ts: CellTags C, D, E, and A; MEFs: CellTag B). A total of 4673 cells were sequenced, with an inferred doublet rate of 3.6% (see the “Methods” section). Overall, CellTag expression is detected in 99.2% of all cells, reflecting the improved tagging efficiency from an increased MOI. We demultiplexed the transcriptomes as above, including an additional step to resolve misclassified multiplets (Additional file 1: Figure S2C; see the “Methods” section). Overall, 4558 out of 4673 transcriptomes, or 97.5% of all transcriptomes, were successfully classified (Additional file 2: Figure S3A). Visualization of the classified transcriptomes by heatmap of CellTag barcode expression (Fig. 1d) and by dimension reduction (Fig. 1e, Additional file 2: Figure S3B&C) demonstrates clear segregation between species, suggesting that CellTag Indexing can be used to reliably multiplex numerous samples.
CellTag multiplexing enables long-term tracking of cell potential in an in vivo competitive transplant model
Current multiplexing methods are based on transient transfection or temporary molecular interactions with the cell or nucleus surface [7–12]. Although relative to CellTag Indexing, this offers faster labeling of cells, it does not support long-term labeling. Here, the unique advantage of CellTag-based multiplexing is that the label is heritable, as a result of stable integration into the cell genome, and can persist for many weeks as we have shown previously . This creates opportunities for the longitudinal analysis of cell behavior over an extended period. Moreover, since experimental groups can be tagged, mixed, and tracked within the same biological replicate, unwanted biological and technical variation is minimized. To explore this application of CellTag multiplexing, we applied the method to assess rates of cell engraftment and intestinal differentiation potential in an in vivo competitive transplant model.
We previously reported that MEFs can be directly reprogrammed, via forced expression of transcription factors, into progenitor-like cells that possess both hepatic and intestinal potential [23, 24]. We demonstrated that these cells, named induced endoderm progenitors (iEPs), are able to functionally engraft a mouse model of induced colitis . Prior to transplant, iEPs possess weak hepatic and intestinal identity, still partially resembling the fibroblasts they originated from. Twelve days after transplant into the mouse large intestine, iEPs more closely resemble differentiated intestine . However, in this study, cell identity was assessed via bulk expression analysis that cannot distinguish between different intestinal cell types. Therefore, the mechanism of engraftment and differentiation potential of cells reprogrammed to iEPs remained to be characterized.
Our recent single-cell lineage tracing of fibroblast to iEP reprogramming revealed that this lineage conversion comprises two distinct trajectories: one path successfully reprogramming to iEPs, and an alternate path characterized by progression into a “dead-end” state, where fibroblast identity is re-established . The transition along the successful reprogramming trajectory is accompanied by upregulation of genes such as Apoa1 and Cdh1 (E-cadherin). We hypothesized that the Apoa1HighEcadHigh iEP cells constitute the subpopulation responsible for our previously observed colon engraftment . In this context, CellTag Indexing is well-suited for tracking and quantifying reprogrammed and dead-end cell differentiation potential as the barcodes are stably integrated and heritable, making it possible to label cells for long-term tracing transplantation experiments.
Most intestinal cell recovery protocols focus on the harvest of the epithelium, neglecting many other cell types that constitute the intestine. Given the range of engraftment phenotypes observed in our above histology analyses, we considered that iEPs may also differentiate towards non-epithelial cell types. Thus, to capture the full spectrum of intestinal cell identities, we opted to use whole tissue single-nucleus extraction, over epithelial isolation and digestion, to process the engrafted colon for RNA sequencing. Indeed, single-nucleus RNA sequencing (snRNA-seq) from three colon samples, post-engraftment, followed by Uniform Manifold Approximation and Projection (UMAP)-based visualization  revealed 16 clusters (Fig. 2e), corresponding to a range of different intestinal epithelial cell types. This included intestinal stem cells (ISCs), enterocytes, deep crypt secretory cells, goblet cells, enteroendocrine cells, and non-epithelial cell types (endothelial cells, muscle, enteric neurons, immune cells, fibroblasts) (Fig. 2f). To our knowledge, this is the first dataset of such that profiles large intestinal cell types beyond the epithelium. Known intestinal markers are observed such as Lgr5, Lrig1, and Smoc2 in ISCs [26–29]; Reg4 in deep crypt secretory cells ; Myt1l, Asic2, and Syt1 in enteric neurons [31–34]; Vil1, Plac8, and Krt20 in enterocytes [35–37]; Nkx2-2, Chga, and Tph1 in enteroendocrine cells [38, 39]; and Fcgbp, Muc2, and Clca1 in goblet cells [40, 41] (Fig. 2g, Additional file 2: Figure S5C-E).
Upon further analysis and literature review, we annotated the ISCs into two populations, Lgr5+ ISCs (clusters 1 and 6) and Lgr5− Lrig1+ ISCs (cluster 0), based on distinct patterns of marker expression (Additional file 2: Figure S5C). Lrig1, a transmembrane negative regulator of ErbB signaling , is purported to mark a class of ISCs that are phenotypically distinct from Lgr5+ stem cells in the intestine [27, 28], with additional roles in stem cells of the gastric epithelium  and the epidermis [44–46]. Lgr5+ ISCs, located in clusters 1 and 6 in this dataset, express high levels of established intestinal stem cell markers Lgr5 and Smoc2, as well as Lrig1 (Fig. 2g, Additional file 2: Figure S5C-E). In contrast to Lgr5, Lrig1 is more widely expressed, with moderate levels of expression extending into cluster 0, where Lgr5 expression is absent (Additional file 2: Figure S5C). This is consistent with two independent studies in the small intestine and colon, where Lrig1 was expressed in many crypt cells, while the highest levels of Lrig1 expression were observed in Lgr5+ stem cells [27, 28]. Loss of Lrig1 caused crypt expansion in Lrig1-knockout animals, and three-dimensional intestinal spheres derived from Lrig1-knockout animals matured into budding organoids in culture without exogenous ErbB ligands in contrast to wild-type samples . Intriguingly, Lrig1 was shown to mark a population of ISCs that expand and repopulate the colonic crypt upon tissue damage , although a distinction was not made regarding whether this could be due to the subpopulation of Lrig1+ cells that are also Lgr5+.
Of note, two clusters that remain unannotated (cluster 11, enriched for Reln and Prox1; cluster 14, enriched for Nkain2 and Csmd1) may represent rare or previously unidentified cell types (Additional file 2: Figure S5D). For example, Reln and Prox1 are known for their roles in neuronal migration [47, 48] and neurogenesis [49, 50]; we, therefore, speculate that they may mark a peripheral neuronal cell type in cluster 11.
Colon engrafted iEPs transition through an intestinal stem cell state
Engrafted tissue was harvested in early stages of intestinal regeneration, with the epithelium still undergoing active repair. We chose this time point in an effort to understand the mechanism of iEP engraftment. Indeed, in line with this early regeneration period, cluster 4 likely represents cells in the early stages of engraftment and repair, characterized by expression of both intestinal and mesenchymal markers (Additional file 2: Figure S5D&E). Notably, Grip1, an adaptor protein implicated in maintaining the epidermal-dermal junction via the Fras1/Frem1 complex [51, 52], is among the list of marker genes for cluster 4, suggesting that cluster 4 might represent an iEP engraftment mechanism via adhesion to the basement membrane. We next focused on the proportions of fully reprogrammed EcadHigh iEPs (labeled by CellTagA) and dead-end EcadLow iEPs (labeled by CellTagB) engrafting the intestine. We found that 0.687% ± 0.214% of engrafted cells were derived from reprogrammed iEPs whereas 0.413% ± 0.113% of engrafted cells were derived from dead-end iEPs (p = 0.06; Additional file 2: Figure S6D). This low percentage was expected given that we aimed to capture a broad range of intestinal engraftment to provide an unbiased assessment of engraftment.
In our previous study, we observed that iEPs are capable of long-term (7 weeks post-transplant), functional engraftment, where entire crypts are repopulated by iEP-derived cells . At that time, we speculated that iEPs transition through an intestinal stem cell state to support long-term engraftment. Here, considering our hypothesis that fully reprogrammed EcadHigh iEPs are responsible for this long-term engraftment, we performed a randomized test that we previously developed to assign statistical significance in cluster occupancy . Here, we applied this approach to determine whether reprogrammed and dead-end iEPs were more likely to associate with any particular cluster of intestinal cells. We did not include cluster 4 in this analysis as the colonic epithelium is in the early stages of regeneration, where we consider cells in this cluster to be superficially attached, and not all these cells will eventually integrate into the recovered epithelium. Our randomized test revealed that reprogrammed EcadHigh/CellTagA cells are significantly more likely to occupy cluster 0 (Lgr5− Lrig1+ ISCs, p = 4.03 × 10− 5) and cluster 1 (Lgr5+ ISCs, p = 9.83 × 10− 3), while CellTagged reprogrammed and dead-end populations are depleted from non-epithelial cell clusters (Fig. 3e). Together, this suggests that EcadHigh/CellTagA cells integrate into the regenerating epithelium via an intestinal stem cell intermediate. Expression of the ISC markers Lgr5 and Lrig1 in engrafted iEPs supports this observation (Additional file 2: Figure S6C). As reported previously, Lrig1+ ISCs expand and repopulate the colonic crypt upon tissue damage , pointing to a potential mechanism of long-term iEP engraftment in the mouse colon.
To further investigate engraftment mechanics, we conducted RNA velocity analysis  to reveal the transcriptional kinetics of engrafting iEPs. We reasoned that if these iEPs were differentiating towards intestinal lineages, then transcript kinetics from early iEP engraftment cluster, cluster 4, should show velocity vectors towards annotated intestinal clusters. Indeed, RNA velocities projected onto the UMAP clusters show cluster 4 velocities towards the main intestinal clusters (Fig. 3f, Additional file 2: Figure S6F&G). Specifically, velocity vectors from the subset of CellTagged cells show vectors originating from cluster 4 towards cluster 0, and from the intestinal stem cell pole of the main intestinal clusters towards the more differentiated pole of enterocytes (Fig. 3g). Taken together, here, we have demonstrated the utility of CellTag Indexing to multiplex EcadHigh and EcadLow iEPs for transplantation into the mouse large intestine, suggesting that iEPs transition through a Lgr5+ and/or Lrig1+ stem cell state to engraft and repopulate the colonic epithelium, resolving speculation about their engraftment route. Our findings are consistent with previous reports of iEP differentiation potential and position CellTag Indexing as a powerful long-term tracking and multiplexing tool for scRNA-seq.
Here, we present a broadly applicable toolbox, CellTag Indexing, to label biological samples for single-cell analysis, where each sample is genetically tagged with a predefined lentiviral GFP barcode to mark its sample identity. We demonstrate that CellTag Indexing does not perturb cell physiology, and validate the utility of our multiplexing approach via species mixing, showing that it can be used to accurately multiplex samples for scRNA-seq, with subsequent demultiplexing at high efficiency. We showcase the unique feature of this heritable labeling approach, by tracking cells in a competitive in vivo transplant setting, revealing reprogrammed cell potential and mechanisms of engraftment while providing internal controls to mitigate both biological and technical batch effects. CellTag multiplexing is complementary to current strategies based on transient cell surface interactions for labeling cells immediately prior to scRNA-seq, yet unique in that CellTag barcodes are stably integrated and heritable through cell division. The flexible timing of lentiviral barcode transduction, coupled with stable barcode expression, makes our system uniquely suitable for long-term tracing experiments and transplant models where temporary tags would not be retained.
CellTag Indexing offers the advantages of minimized technical variation by experimental design, the ability to multiplex biological samples for competitive transplant, broad compatibility with various cell types and single-cell technologies, long-term barcode expression, streamlined workflow and library preparation, reduced sequencing cost, and straightforward demultiplexing. CellTag Indexing is designed for broad applications; its use of lentivirus as a labeling method represents a commonly used and accessible biological tool with minimal setup costs and reagent requirements. As lentivirus can transduce both dividing and non-dividing cells, CellTag barcodes can be introduced into a wide variety of cell types. In terms of estimating labeling efficiency, CellTag Indexing conveniently utilizes GFP as a barcode carrier, which can act as a visual readout for transduction efficiency. Generally, CellTag transcripts are abundantly expressed and can be optionally amplified during library preparation to further increase the detection rate.
Importantly, CellTag transcripts can be recovered from the nucleus, extending this approach to single-nucleus RNA sequencing. Furthermore, cells labeled with CellTag Indexes can be cultured and used in experiments prior to collection for sequencing, for example in the competitive transplant assay we demonstrated here where tagged samples act as internal controls for each other to minimize unwanted biological variation. This is complementary to existing labeling methods that utilize cell/nuclear surface chemistry or transient transfection for temporary tagging [7–12], where the labels would be progressively lost in vitro and in vivo. Additionally, as a future application, we expect that CellTag multiplexing will be compatible with single-genome-based assays such as single-cell ATAC-seq. In summary, CellTag Indexing is a broadly applicable tool complementary to existing methods for cell multiplexing and tracking, providing a diverse panel of experimental and analytical strategies.
Mouse embryonic fibroblasts were derived from the C57BL/6 J strain (The Jackson Laboratory 000664). HEK293T and mouse embryonic fibroblasts were cultured in Dulbecco’s modified Eagle medium (Gibco) supplemented with 10% fetal bovine serum (Gibco), 1% penicillin/streptomycin (Gibco), and 55 μM 2-mercaptoethanol (Gibco). HAFTL pre-B cells were cultured in RPMI1640 without phenol red (Lonza) supplemented with 10% charcoal/dextran-treated FBS (Hyclone) and 55 μM 2-mercaptoethanol (Gibco) .
Generation of iEPs
Mouse embryonic fibroblasts were converted to iEPs as previously described [23, 24]. Briefly, fibroblasts were prepared from E13.5 embryos, cultured on gelatin, and serially transduced every 12 h with Hnf4α-t2a-Foxa1 retrovirus for five times over the course of 3 days, followed by culture on collagen in hepato-medium, which is DMEM:F-12 (Gibco) supplemented with 10% FBS, 1% penicillin/streptomycin, 55 μM 2-mercaptoethanol, 10 mM nicotinamide (Sigma-Aldrich), 100 nM dexamethasone (Sigma-Aldrich), 1 μg/mL insulin (Sigma-Aldrich), and 20 ng/ml epidermal growth factor (Sigma-Aldrich).
CellTag lentiviral constructs were generated by introducing an 8-bp variable region into the 3′ UTR of GFP in the pSmal plasmid  using a gBlock gene fragment (Integrated DNA Technologies) and megaprimer insertion (https://www.addgene.org/pooled-library/morris-lab-celltag/). Individual clones were picked and Sanger sequenced to generate predefined barcodes. The specific CellTag barcodes used in this manuscript are TGCTATAT (CellTagA), GTTGGCTA (CellTagB), AGTTTAGG (CellTagC), GGTTCACA (CellTagD), and TAGAAAGC (CellTagE). These constructs are available from Addgene: https://www.addgene.org/browse/article/28197603/.
Lenti- and retrovirus production
Lentiviruses were produced by transfecting HEK293T cells with lentiviral pSMAL vector and packing plasmids pCMV-dR8.2 dvpr (Addgene plasmid 8455) and pCMV-VSV-G (Addgene plasmid 8454) using X-tremeGENE 9 (Sigma-Aldrich). Viruses were collected 48 and 72 h after transfection. Retroviruses were similarly produced, with retroviral pGCDNSam vector and packaging plasmid pCL-Eco (Imgenex).
CellTag virus-containing supernatant collected from virus-producing HEK293T cells was kept at 4 °C and used within 1 week. Prior to transduction, protamine sulfate (Sigma-Aldrich) was added to the viral solution to a final concentration of 4 μg/ml. Cells were aspirated of media, and the CellTag virus was added to the cells for a 24-h transduction period. This transduction was repeated as needed, for a total of 48 h for HEK293T cells and 72 h for MEFs in the 5-tag species mixing experiment and 72 h for iEPs.
Immunostaining and quantification
Transduced HEK293T and MEFs were cultured on a four-chamber culture slide (Falcon) for 24 h prior to fixation in 4% paraformaldehyde and staining in 300 nM DAPI in PBS. The slide was then mounted in ProLong Gold Antifade Mountant (Invitrogen). Images were acquired on a Nikon eclipse Ts2 inverted microscope. For automatic quantification, images of CellTagged HEK 293T and MEF were processed with a custom python script to count GFP-positive/negative cells. The proportion of GFP-positive cells was calculated from DAPI and GFP images. First, DAPI images were transformed into binary images by thresholding fluorescent signal. The threshold values were determined by the Otsu method. The binary nucleus image was processed by watershed segmentation to separate individual cell areas completely. Inappropriately sized objects were filtered to remove noise and doublet cells. The intensity of the GFP signal per individual cell area was then quantified to distinguish between GFP-positive cells and GFP-negative cells. These processes were run with Python 3.6.1 and its libraries: scikit-image 0.13.0, numpy 1.12.1, matplotlib 2.0.2, seaborn 0.8.1, jupyter 1.0.0.
Mouse model of colonic mucosal injury
Using a previously described procedure , we generated colonic epithelial injury with modifications as followed: C57BL/6 mice were anesthetized with inhaled isoflurane. A custom-made syringe catheter (consisted of 3-ml syringe (BD #309657), Luer lock 26-gauge 1/2″ dispensing needle (GraingerChoice #5FVG9), and polyethylene tubing (Scientific Commodities, #BB31695-PE/2) cut to approximate 5 cm in length and affixed to the needle) was used to deliver approximately 1 mL of PBS enema intraluminally via the anal canal, followed by gentle abdominal massage to promote excretion of excess fecal matter. The luminal space was then filled with 0.5 mL of 500 mM EDTA/PBS using the custom syringe catheter over the course of approximately 30 s. Mechanical abrasion was performed with Proxabrush cleaners (Sunstar #872FC) dipped in 500 mM EDTA/PBS, inserted approximately 1 cm into the colon, with 30 rotational movements to gently scratch the luminal surface.
iEP characterization and transplantation
Eight-week iEPs were stained with mouse E-cadherin-APC antibody (10 μL per one million cells, R&D Systems, FAB7481A) and sorted on a modified Beckman Coulter MoFlo into EcadHigh and EcadLow populations. Sorted iEPs were plated and cultured as above. Colony formation assay was performed as previously described . For colon engraftment, CellTagged EcadHigh and EcadLow iEPs were digested into single-cell suspensions. For each mouse, 0.5 million of EcadHigh iEPs (CellTagA) and 0.5 million of EcadLow iEPs (CellTagB) were pooled and resuspended in 50 μL of 10% Matrigel on ice. A total of 1 million iEPs was instilled into the colonic lumen of each mouse by using the custom syringe catheter, after which the mouse was held vertically head-down for approximately 2 min to prevent immediate excretion of the infused cell suspension.
Single-nucleus RNA-seq procedure
Single-nucleus extraction from tissue was performed as previously described . Briefly, engrafted colonic tissues were finely minced with a razor then transferred to a Dounce tissue homogenizer (Kimble Chase KT885300-0002) in 2 mL of ice-cold Nuclei EZ Lysis buffer (Sigma #N-3408) supplemented with protease inhibitor (Roche #5892791001) and RNase inhibitors (Promega #N2615, Thermo Fisher Scientific #AM2696). The tissue was ground 20–30 times with the loose pestle. The homogenate was filtered through a 200-μm cell strainer (pluriSelect #43-50200) then transferred back to the Dounce homogenizer, ground with the tight pestle 10–15 times. The homogenate was incubated on ice for 5 min with an additional 2 mL of lysis buffer, then filtered through a 40-μm cell strainer (pluriSelect #43-50040) and centrifuged at 500g for 5 min at 4 °C. The incubation and centrifugation steps were repeated one time, followed by resuspension Nuclei Suspension Buffer (1× PBS, 1% BSA, 0.1% RNase inhibitor) and filtering through a 5-μm cell strainer (pluriSelect #43-50005). The nuclei were then loaded onto the 10x Chromium Single Cell Platform for encapsulation and barcoding.
10x Chromium Single Cell 3′ Library & Gel Bead Kit, 10x Chromium Single Cell 3′ Chip kit, and 10x Chromium i7 Multiplex kit (10x Genomics) were used according to the manufacturer’s protocols. Libraries were quantified on the Agilent 2200 TapeStation and sequenced on Illumina HiSeq 2500.
where Cij = expression value of CellTag j in cell i and S = 1000 sample drawn from the density curve of CellTag j, Dj. This process is iterated for at least 50 times to make sure that the samples are representative of the overall density. The cells are then classified to their corresponding CellTag based on the proportions calculated above by finding the overall minimum in each proportion matrix. The uniqueness of the minimum does not eliminate the probability for the cell to be a multiplet. Hence, for cells with a unique minimum, we examine the pair-wise differences between the minimum tag and other tags using a baseline cutoff of 0.238 learned via benchmarking and training against orthogonal 10x classification. Finally, the number of multiplets identified from our pipeline is compared to the expected number derived from 10x Genomics’ Single Cell 3′ Reagents Kit v2 User Guide Rev. E (multiplet % = 0.0007589 × number of cells recovered + 0.0527214). If the number of multiplets exceeds the expected number, the optional multiplet checkpoint is implemented, where the proportion matrix is sorted such that the most likely multiplets are identified using a cutoff at the quantile of (1.5 × expected num/multiplet). The remaining cells are then classified to their singlet identities.
The Cell Ranger v.3.0.1 pipeline (https://support.10xgenomics.com/single-cell-gene-expression/software/downloads/latest) was used to process data generated using the 10x Chromium platform. For alignment of the single-nucleus RNA-seq data, a modified “pre-mRNA” mm10 reference was used to include reads aligned to introns. The R package Seurat  (version 3) was used for data processing and visualization. For the iEP dataset, we removed cells with a low number of genes detected (< 200), cells with a high number of UMI detected (> 100,000), and cells with a high proportion of UMI counts attributed to mitochondrial genes (> 20%). The filtered expression matrix was then normalized and scaled to remove unwanted sources of variation driven by the number of detected UMIs and mitochondrial gene expression. Linear dimension reduction was performed, followed by canonical correlation analysis to integrate independent biological replicates, then clustering and visualization via UMAP .
Assessing cluster occupancy by randomized testing
A randomized test that we developed previously  was used to identify clusters that are significantly occupied by EcadHigh/EcadLow iEPs. In brief, we calculated the proportions of CellTagA and CellTagB cells that fall into each cluster, serving as the null percentages for the two tags. In particular, let n be the number of cells with a CellTag. Let s be the number of cells without this tag. The two were then pooled together from which we drew n random samples without replacement for at least (n + s)/n times such that every possible ending group can be captured. With each sample drawn, the occupancy of n sampled cells in each cluster was calculated. A background proportion distribution was then generated based on this resampling result. We then used the distributions to compute the likelihood of having the null percentage or higher. Using a p value of < 0.05, we identified the clusters that are enriched for each CellTag. This randomized test was performed using a python script. We exclude cluster 4 in this test as it represents the early engraftment stage. Cell number tested for CellTagA equals 66. Cell number tested for CellTagB equals 46.
RNA velocity analysis
RNA velocity was analyzed with Velocyto.py (version 0.17.17). The analysis was done according to the web instruction; http://velocyto.org/velocyto.py/. For the input of single-cell RNA-seq data, the output files of 10x Cell Ranger pipeline were used. The single-cell RNA-seq reads for each sample were converted into read-counts after distinguishing a spliced or unspliced transcript. This process was done with command line velocyto API, and final products were saved as loom files. Next, the loom files of each scRNA-seq sample were merged into a single loom file. The merged loom file was processed with velocyto python API to create the velocyto object. Then, the velocyto object was integrated with UMAP dimensional reduction data and CellTag data which were produced in the scRNA-seq analysis with Seurat and CellTag demultiplexing process. Next, the velocyto object was subjected to quality check and filtering process. Genes were filtered by the mRNA detection level (min_expr_counts = 40, min_cells_express = 30). After feature selection by a velocyto function, the data were normalized by total molecule count. Then, velocyto object was subjected to a series of final data processing process: PCA, k-nn-based imputation, velocity estimation, and shift calculation. Finally, the vectors estimated by RNA velocity were projected on the UMAP graph.
We would like to thank members of the Morris lab and Kristen Seiler for critical discussions, John Dick for the kind gift of the pSMAL-GFP construct  and Thomas Graf for the HAFTL B cell line , Genome Technology Access Center for sequencing, and Siteman Flow Cytometry core for assistance with cell sorting.
This work was funded by the National Institutes of Health (NIH) grants R01-GM126112, R21-HG009750, and P30-DK052574; Silicon Valley Community Foundation, Chan Zuckerberg Initiative Grants HCA-A-1704-01646 and HCA2-A-1708-02799; The Children’s Discovery Institute of Washington University and St. Louis Children’s Hospital MI-II-2016-544. S.A.M. is supported by a Vallee Scholar Award, C.G. NIH-5T32GM007200-42, K.K. Japan Society for the Promotion of Science Postdoctoral Fellowship, and G.C.R.G. Washington University in. St. Louis School of Medicine Center of Regenerative Medicine Rita Levi-Montalcini Fellowship.
Availability of data and materials
All source data including sequencing reads, single-cell expression matrices, and CellTag classification metadata are available from the Gene Expression Omnibus (GEO) under accession number GSE130065 . The CellTag demultiplexing algorithm is available at https://github.com/morris-lab/CellTag-Classifier  under the MIT license.
CG and SAM conceived the research. CG led experimental work, assisted by GCRG, XY, and YK. CG and WK led the computational analysis, assisted by KK and supervised by SAM. CG wrote the manuscript, reviewed by WK, KK, GCRG, XY, YK, and SAM. All authors read and approved the final manuscript.
Ethics approval and consent to participate
All animal procedures were based on animal care guidelines approved by the Institutional Animal Care and Use Committee at Washington University in St. Louis under protocol number 20150192.
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Hicks SC, Townes FW, Teng M, Irizarry RA. Missing data and technical variability in single-cell RNA-sequencing experiments. Biostatistics. 2017;19:562–78.View ArticleGoogle Scholar
- Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W, et al. Limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 2015;43:e47.View ArticleGoogle Scholar
- Shaham U, Stanton KP, Zhao J, Li H, Raddassi K, Montgomery R, et al. Removal of batch effects using distribution-matching residual networks. Bioinformatics. 2017;33:2539–46.View ArticleGoogle Scholar
- Haghverdi L, Lun ATL, Morgan MD, Marioni JC. Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors. Nat Biotechnol. 2018;36:421–7.View ArticleGoogle Scholar
- Butler A, Hoffman P, Smibert P, Papalexi E, Satija R. Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat Biotechnol. 2018;36:411–20.View ArticleGoogle Scholar
- Kang HM, Subramaniam M, Targ S, Nguyen M, Maliskova L, McCarthy E, et al. Multiplexed droplet single-cell RNA-sequencing using natural genetic variation. Nat Biotechnol. 2018;36:89–94.View ArticleGoogle Scholar
- Cao J, Packer JS, Ramani V, Cusanovich DA, Huynh C, Daza R, et al. Comprehensive single-cell transcriptional profiling of a multicellular organism. Science. 2017;357:661–7.View ArticleGoogle Scholar
- Gehring J, Park JH, Chen S, Thomson M, Pachter L. Highly multiplexed single-cell RNA-seq for defining cell population and transcriptional spaces. bioRxiv. 2018.Google Scholar
- Stoeckius M, Zheng S, Houck-Loomis B, Hao S, Yeung BZ, Mauck WM, et al. Cell hashing with barcoded antibodies enables multiplexing and doublet detection for single cell genomics. Genome Biol. 2018;19:224.View ArticleGoogle Scholar
- McGinnis CS, Patterson DM, Winkler J, Hein MY, Srivastava V, Conrad DN, et al. MULTI-seq: scalable sample multiplexing for single-cell RNA sequencing using lipid-tagged indices. bioRxiv. 2018.Google Scholar
- Shin D, Lee W, Lee JH, Bang D. Multiplexed single-cell RNA-seq via transient barcoding for drug screening. bioRxiv. 2018.Google Scholar
- Gaublomme JT, Li B, McCabe C, Knecht A, Drokhlyansky E, Van WN, et al. Nuclei multiplexing with barcoded antibodies for single-nucleus genomics. bioRxiv. 2018.Google Scholar
- Wong ASL, Choi GCG, Cui CH, Pregernig G, Milani P, Adam M, et al. Multiplexed barcoded CRISPR-Cas9 screening enabled by CombiGEM. Proc Natl Acad Sci. 2016;113:2544–9.View ArticleGoogle Scholar
- Adamson B, Norman TM, Jost M, Cho MY, Nuñez JK, Chen Y, et al. A multiplexed single-cell CRISPR screening platform enables systematic dissection of the unfolded protein response. Cell. 2016;167:1867–1882.e21.View ArticleGoogle Scholar
- Dixit A, Parnas O, Li B, Chen J, Fulco CP, Jerby-Arnon L, et al. Perturb-Seq: dissecting molecular circuits with scalable single-cell RNA profiling of pooled genetic screens. Cell. 2016;167:1853–1866.e17.View ArticleGoogle Scholar
- Jaitin DA, Weiner A, Yofe I, Lara-Astiaso D, Keren-Shaul H, David E, et al. Dissecting immune circuits by linking CRISPR-pooled screens with single-cell RNA-Seq. Cell. 2016;167:1883–1896.e15.View ArticleGoogle Scholar
- Datlinger P, Rendeiro AF, Schmidl C, Krausgruber T, Traxler P, Klughammer J, et al. Pooled CRISPR screening with single-cell transcriptome readout. Nat Methods. 2017;14:297–301.View ArticleGoogle Scholar
- Biddy BA, Kong W, Kamimoto K, Guo C, Waye SE, Sun T, et al. Single-cell mapping of lineage and identity in direct reprogramming. Nature. 2018;564:218–24.View ArticleGoogle Scholar
- Bussmann LH, Schubert A, Vu Manh TP, De Andres L, Desbordes SC, Parra M, et al. A robust and highly efficient immune cell reprogramming system. Cell Stem Cell. 2009;5:554–66.View ArticleGoogle Scholar
- Becht E, McInnes L, Healy J, Dutertre CA, Kwok IWH, Ng LG, et al. Dimensionality reduction for visualizing single-cell data using UMAP. Nat Biotechnol. 2019;37:38–44.View ArticleGoogle Scholar
- Macosko EZ, Basu A, Satija R, Nemesh J, Shekhar K, Goldman M, et al. Highly parallel genome-wide expression profiling of individual cells using nanoliter droplets. Cell. 2015;161:1202–14.View ArticleGoogle Scholar
- Guo C, Kong W, Kamimoto K, Rivera-Gonzalez GC, Yang X, Kirita Y, et al. CellTag Indexing: genetic barcode-based sample multiplexing for single-cell genomics. Github Repos.Google Scholar
- Sekiya S, Suzuki A. Direct conversion of mouse fibroblasts to hepatocyte-like cells by defined factors. Nature. 2011;475:390–3.View ArticleGoogle Scholar
- Morris SA, Cahan P, Li H, Zhao AM, San Roman AK, Shivdasani RA, et al. Dissecting engineered cell types and enhancing cell fate conversion via Cellnet. Cell. 2014;158:889–902.View ArticleGoogle Scholar
- Fukuda M, Mizutani T, Mochizuki W, Matsumoto T, Nozaki K, Sakamaki Y, et al. Small intestinal stem cell identity is maintained with functional Paneth cells in heterotopically grafted epithelium onto the colon. Genes Dev. 2014;28:1752–7.View ArticleGoogle Scholar
- Barker N, van Es JH, Kuipers J, Kujala P, van den Born M, Cozijnsen M, et al. Identification of stem cells in small intestine and colon by marker gene Lgr5. Nature. 2007;449:1003–7.View ArticleGoogle Scholar
- Wong VWY, Stange DE, Page ME, Buczacki S, Wabik A, Itami S, et al. Lrig1 controls intestinal stem-cell homeostasis by negative regulation of ErbB signalling. Nat Cell Biol. 2012;14:401–8.View ArticleGoogle Scholar
- Powell AE, Wang Y, Li Y, Poulin EJ, Means AL, Washington MK, et al. The pan-ErbB negative regulator lrig1 is an intestinal stem cell marker that functions as a tumor suppressor. Cell. 2012;149:146–58.View ArticleGoogle Scholar
- Muñoz J, Stange DE, Schepers AG, Van De Wetering M, Koo BK, Itzkovitz S, et al. The Lgr5 intestinal stem cell signature: robust expression of proposed quiescent ′ +4′ cell markers. EMBO J. 2012;31:3079–91.View ArticleGoogle Scholar
- Sasaki N, Sachs N, Wiebrands K, Ellenbroek SIJ, Fumagalli A, Lyubimova A, et al. Reg4 + deep crypt secretory cells function as epithelial niche for Lgr5 + stem cells in colon. Proc Natl Acad Sci. 2016;113:E5399–407.View ArticleGoogle Scholar
- Vohra BPS, Tsuji K, Nagashimada M, Uesaka T, Wind D, Fu M, et al. Differential gene expression and functional analysis implicate novel mechanisms in enteric nervous system precursor migration and neuritogenesis. Dev Biol. 2006;298:259–71.View ArticleGoogle Scholar
- Levanti MB, Guerrera MC, Calavia MG, Ciriaco E, Montalbano G, Cobo J, et al. Acid-sensing ion channel 2 (ASIC2) in the intestine of adult zebrafish. Neurosci Lett. 2011;491:24–8.View ArticleGoogle Scholar
- Roy-Carson S, Natukunda K, chao CH, Pal N, Farris C, Schneider SQ, et al. Defining the transcriptomic landscape of the developing enteric nervous system and its cellular environment. BMC Genomics. 2017;18:290.View ArticleGoogle Scholar
- Memic F, Knoflach V, Morarach K, Sadler R, Laranjeira C, Hjerling-Leffler J, et al. Transcription and signaling regulators in developing neuronal subtypes of mouse and human enteric nervous system. Gastroenterology. 2018;154:624–36.View ArticleGoogle Scholar
- Chan CWM, Wong NA, Liu Y, Bicknell D, Turley H, Hollins L, et al. Gastrointestinal differentiation marker Cytokeratin 20 is regulated by homeobox gene CDX1. Proc Natl Acad Sci. 2009;106:1936–41.View ArticleGoogle Scholar
- Gao S, Yan L, Wang R, Li J, Yong J, Zhou X, et al. Tracing the temporal-spatial transcriptome landscapes of the human fetal digestive tract using single-cell RNA-sequencing. Nat Cell Biol. 2018;20:721–34.View ArticleGoogle Scholar
- Moor AE, Harnik Y, Ben-Moshe S, Massasa EE, Rozenberg M, Eilam R, et al. Spatial reconstruction of single enterocytes uncovers broad zonation along the intestinal villus axis. Cell. 2018;175:1156–67.View ArticleGoogle Scholar
- Desai S, Loomis Z, Pugh-Bernard A, Schrunk J, Doyle MJ, Minic A, et al. Nkx2.2 regulates cell fate choice in the enteroendocrine cell lineages of the intestine. Dev Biol. 2008;313:58–66.View ArticleGoogle Scholar
- Gross S, Garofalo DC, Balderes DA, Mastracci TL, Dias JM, Perlmann T, et al. The novel enterochromaffin marker Lmx1a regulates serotonin biosynthesis in enteroendocrine cell lineages downstream of Nkx2. 2 Development. 2016;143:2616–28.View ArticleGoogle Scholar
- Johansson MEV, Larsson JMH, Hansson GC. The two mucus layers of colon are organized by the MUC2 mucin, whereas the outer layer is a legislator of host-microbial interactions. Proc Natl Acad Sci. 2010;108:4659–65.View ArticleGoogle Scholar
- Pelaseyed T, Bergström JH, Gustafsson JK, Ermund A, Birchenough GMH, Schütte A, et al. The mucus and mucins of the goblet cells and enterocytes provide the first defense line of the gastrointestinal tract and interact with the immune system. Immunol Rev. 2014;260:8–20.View ArticleGoogle Scholar
- Laederich MB, Funes-Duran M, Yen L, Ingalla E, Wu X, Carraway KL, et al. The leucine-rich repeat protein LRIG1 is a negative regulator of ErbB family receptor tyrosine kinases. J Biol Chem. 2004;279:47050–6.View ArticleGoogle Scholar
- Schweiger PJ, Clement DL, Page ME, Schepeler T, Zou X, Sirokmány G, et al. Lrig1 marks a population of gastric epithelial cells capable of long-term tissue maintenance and growth in vitro. Sci Rep. 2018;8:15255.View ArticleGoogle Scholar
- Jensen KB, Watt FM. Single-cell expression profiling of human epidermal stem and transit-amplifying cells: Lrig1 is a regulator of stem cell quiescence. Proc Natl Acad Sci. 2006;103:11958–63.View ArticleGoogle Scholar
- Jensen KB, Collins CA, Nascimento E, Tan DW, Frye M, Itami S, et al. Lrig1 expression defines a distinct multipotent stem cell population in mammalian epidermis. Cell Stem Cell. 2009;4:427–39.View ArticleGoogle Scholar
- Page ME, Lombard P, Ng F, Göttgens B, Jensen KB. The epidermis comprises autonomous compartments maintained by distinct stem cell populations. Cell Stem Cell. 2013;13:471–82.View ArticleGoogle Scholar
- Sheldon M, Rice DS, D’Arcangelo G, Yoneshima H, Nakajima K, Mikoshiba K, et al. Scrambler and yotari disrupt the disabled gene and produce a reeler- like phenotype in mice. Nature. 1997;389:730–3.View ArticleGoogle Scholar
- Sentürk A, Pfennig S, Weiss A, Burk K, Acker-Palmer A. Ephrin Bs are essential components of the Reelin pathway to regulate neuronal migration. Nature. 2011;472:356–60.View ArticleGoogle Scholar
- Iwano T, Masuda A, Kiyonari H, Enomoto H, Matsuzaki F. Prox1 postmitotically defines dentate gyrus cells by specifying granule cell identity over CA3 pyramidal cell fate in the hippocampus. Development. 2012;139:3051–62.View ArticleGoogle Scholar
- Holzmann J, Hennchen M, Rohrer H. Prox1 identifies proliferating neuroblasts and nascent neurons during neurogenesis in sympathetic ganglia. Dev Neurobiol. 2015;75:1352–67.View ArticleGoogle Scholar
- Bladt F, Tafuri A, Gelkop S, Langille L, Pawson T. Epidermolysis bullosa and embryonic lethality in mice lacking the multi-PDZ domain protein GRIP1. Proc Natl Acad Sci. 2002;99:6816–21.View ArticleGoogle Scholar
- Smyth I, Du X, Taylor MS, Justice MJ, Beutler B, Jackson IJ. The extracellular matrix gene Frem1 is essential for the normal adhesion of the embryonic epidermis. Proc Natl Acad Sci. 2004;101:13560–5.View ArticleGoogle Scholar
- La Manno G, Soldatov R, Zeisel A, Braun E, Hochgerner H, Petukhov V, et al. RNA velocity of single cells. Nature. 2018;560:494–8.View ArticleGoogle Scholar
- Lu R, Neff NF, Quake SR, Weissman IL. Tracking single hematopoietic stem cells in vivo using high-throughput sequencing in conjunction with viral genetic barcoding. Nat Biotechnol. 2011;29:928–33.View ArticleGoogle Scholar
- Wu H, Uchimura K, Donnelly EL, Kirita Y, Morris SA, Humphreys BD. Comparative analysis and refinement of human PSC-derived kidney organoid differentiation with single-cell transcriptomics. Cell Stem Cell. 2018;23:869–881.e8.View ArticleGoogle Scholar
- Zorita E, Cuscó P, Filion GJ. Starcode: sequence clustering based on all-pairs search. Bioinformatics. 2015;31:1913–9.View ArticleGoogle Scholar
- Guo C, Kong W, Kamimoto K, Rivera-Gonzalez GC, Yang X, Kirita Y, et al. CellTag Indexing: genetic barcode-based sample multiplexing for single-cell genomics. Gene Expression Omnibus https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi?acc=GSE130065. 2019.