Skip to main content

CellTag Indexing: genetic barcode-based sample multiplexing for single-cell genomics

A Protocol for this article was published on 12 February 2020


High-throughput single-cell assays increasingly require special consideration in experimental design, sample multiplexing, batch effect removal, and data interpretation. Here, we describe a lentiviral barcode-based multiplexing approach, CellTag Indexing, which uses predefined genetic barcodes that are heritable, enabling cell populations to be tagged, pooled, and tracked over time in the same experimental replicate. We demonstrate the utility of CellTag Indexing by sequencing transcriptomes using a variety of cell types, including long-term tracking of cell engraftment and differentiation in vivo. Together, this presents CellTag Indexing as a broadly applicable genetic multiplexing tool that is complementary with existing single-cell technologies.


Single-cell technology is advancing at a rapid pace, providing unique opportunities to investigate biological systems and processes with unparalleled resolution. As an increasing variety of assays are being deployed at single-cell resolution, this has presented new challenges for experimental design and data analysis. Recently, batch effects were shown to drive aberrant clustering of the same biological sample processed via two different methodologies [1], demonstrating how the accuracy of single-cell data analysis can be confounded by measurement errors. Several algorithms currently exist to support the computational correction of batch effects [2,3,4,5]. These methods aim to minimize technical artifacts by regressing out known factors of variation during single-cell data processing. However, this requires prior knowledge of the specific factors contributing to batch effects, limiting these approaches. In an alternative strategy, samples are pooled together and subsequently demultiplexed, based on their natural genetic variation [6], a powerful approach that supports the multiplexing of up to ~ 20 samples. However, if the samples are not genetically distinct or are not accompanied by detailed genotypic knowledge, demultiplexing by genetic variation does not represent a feasible approach. For instance, this strategy would not be suitable for comparing different experimental groups from the same individual or animal model where genetic background stays constant.

Recently, several “label-and-pool” approaches have been developed to mark individual cells of the same sample with a distinct barcode prior to pooling and processing in the same single-cell RNA-sequencing (scRNA-seq) run [7,8,9,10,11,12]. For example, cells can be tagged with barcoded antibodies [9, 12], chemically labeled with DNA oligonucleotides [8, 10], or transiently transfected with DNA oligonucleotides [11], such that sample identifiers for each cell can be read, in parallel with their transcriptomes. Similarly, several other methods exist to couple genetic perturbations with barcodes [13,14,15,16,17], although these have not been demonstrated to support reliable, large-scale sample multiplexing. Here, we introduce a methodology to multiplex biological samples via long-term genetic labeling with heritable virally delivered barcodes, “CellTags.” In this approach, defined 8-nucleotide (nt) CellTag barcodes are expressed as polyadenylated transcripts, captured in standard single-cell processing protocols. This design permits the indelible labeling and subsequent identification of cells by sample, in parallel with the measurement of their identity and state. In contrast to labeling approaches based on transient physical interactions at the cell or nuclear surface, CellTag Indexed cells retain their heritable barcodes for an extended period in vitro and in vivo, supporting long-term cell tracking experiments. This also distinguishes CellTag Indexing as a unique multiplexing tool in that cell samples can be tagged, mixed and tracked within the same biological replicate, and processed together to mitigate unwanted biological and technical variation.

Here, we validate CellTag Index-based multiplexing via the labeling and mixing of genetically distinct populations, demonstrating accurate and efficient demultiplexing of sample identity. Furthermore, we demonstrate the efficacy of CellTag Indexing for long-term live cell multiplexing, via the establishment of a unique competitive transplant model. In this context, we showcase how CellTag Indexing can be used for in vivo multiplexing to precisely quantify engraftment and differentiation potential of distinct, competing cell populations. Together, this positions CellTag Indexing as a broadly applicable tool, easily deployed in cell culture- and transplantation-based assays, that is compatible across different single-cell modalities.


Genetic labeling of biological samples via CellTag Indexing

Here, we describe our lentiviral CellTag toolbox for labeling cells with transcribed DNA barcodes, acting as cell/sample identifiers that can be easily recovered from single transcriptomes. CellTag Indexing is based on the integration of defined 8-nt barcodes (CellTags), delivered via lentivirus. In this design, CellTags are positioned in the 3′ UTR of the green fluorescent protein (GFP) gene, followed by an SV40 polyadenylation signal sequence (Fig. 1a). Lentivirus carrying a defined CellTag is used to transduce and genetically label a sample, where GFP is included in this design to enable straightforward quantification of transduction efficiency. This results in the high expression of heritable, polyadenylated CellTag transcripts that are efficiently captured in standard single-cell library preparation pipelines, allowing for the demultiplexing of original sample identity in downstream analysis. We previously demonstrated the efficacy of this approach to label cells with combinations of random CellTags to support lineage tracing in cell fate reprogramming [18]. While this is a powerful approach to track clonally related cells, it requires more complex experimental design and significant computational analysis. Furthermore, only ~ 50% of labeled cells can be tracked via this method; while this supports high-confidence lineage reconstruction, it is not suited to high-efficiency cell labeling for the purpose of sample multiplexing. Our goal here was to expand the utility of CellTagging to support sample multiplexing.

Fig. 1
figure 1

Validation of CellTag Indexing for genetic labeling of biological samples. a Schematic of CellTag Indexing. CellTag barcodes are positioned in the 3′ UTR of a lentiviral GFP construct with a SV40 polyadenylation signal. Barcoded viruses produced from CellTag constructs are used to transduce the cells to be “tagged.” Tagged cells can then be pooled for single-cell profiling. Prior to analysis, cell identity is demultiplexed by our classifier pipeline: A CellTag digital gene expression (DGE) matrix is generated by extracting and counting CellTag sequences for each cell; the DGE is then collapsed by consensus clustering of the detected CellTags; after filtering and log normalization, the DGE is processed by dynamic binarization and classification. Classification results can be visualized as metadata overlaying single transcriptomes projected onto reduced dimensions. b Scatter plot of 18,159 transcriptomes from the 2-tag species mixing experiment, classified by 10x Genomics Cell Ranger pipeline into 9357 single human cells, 7456 single mouse cells, and 1346 multiplets based on alignment to the custom hg19-mm10 reference genome. c Scatter plot of 18,159 transcriptomes from the 2-tag species mixing experiment, demultiplexed by CellTag Indexing into 7510 human cells (CellTagA), 6397 mouse cells (CellTagB), 1040 multiplets, and 3212 non-determined cells. d Log-normalized CellTag expression of the 4673 transcriptomes from the 5-tag species mixing experiment, demultiplexed into their respective sample identity on the x-axis; CellTag barcodes, y-axis. e Transcriptomes from the 5-tag species mixing experiment projected onto reduced dimensions by t-SNE, visualized with CellTag classification. CellTagC, CellTagD, CellTagE, and CellTagA label HEK293Ts; CellTagB labels MEFs

First, to ensure that CellTag Indexing does not perturb cell physiology, we tested the impact of labeling on a well-characterized lineage reprogramming system, B cell to induced macrophage reprogramming [19]. We cultured HAFTL pre-B cells and induced reprogramming to macrophage fate with β-estradiol, as previously described [19]. One biological replicate was transduced with CellTag lentivirus, while an independent control replicate, cultured in parallel, was not transduced (Additional file 1: Figure S1A). After 72 h of reprogramming, the two induced macrophage samples were independently processed for sequencing, along with a sample of the original, untransduced B cells. This yielded 1310 CellTagged transcriptomes, 2849 control transcriptomes, and 972 B cell transcriptomes. We detected a median of 6 CellTag transcripts per cell in CellTagged transcriptomes (CellTags were detected in every cell of this sample) and 0 in control transcriptomes (Additional file 1: Figure S1B). Clustering and visualization [5, 20] of CellTagged and control macrophage transcriptomes are interspersed with no independent clustering observed, with both clustering separately from B cells (Additional file 1: Figure S1C&D). Additionally, CellTagged and control induced macrophages exhibit comparable upregulation of macrophage marker expression, accompanied by similar levels of B cell marker downregulation (Additional file 1: Figure S1E). Genome-wide comparison of gene expression of the two samples shows a strong linear association with an R2 value of 0.98 (Additional file 1: Figure S1F), confirming that CellTag Indexing does not perturb cell identity or physiology. This is in agreement with our previous study showing that transduction with a random CellTag library does not influence cell behavior [18].

Species mixing of genetically distinct cells validates CellTag-based multiplexing

To assess the efficacy of CellTag-based multiplexing, we applied it to “species mixing,” an experiment commonly performed to estimate cell co-encapsulation rates in droplet-based scRNA-seq [21]. In this experiment, one sample of human HEK293T cells was labeled with CellTag Index A (CellTagA), and one sample of mouse embryonic fibroblasts (MEFs) was labeled with CellTag Index B (CellTagB), for 24–48 h. Transduction efficiency was visualized by measuring the percentage of GFP-positive cells (~ 90%, Additional file 1: Figure S2A). Labeled cells were pooled, in equal proportions, and processed together for single-cell library preparation and sequencing, yielding a total of 18,159 transcriptomes, with 9357 single human cells (aligning predominantly to the hg19 genome), 7456 single mouse cells (aligning predominantly to the mm10 genome), and 1346 multiplets as classified by 10x Genomics’ Cell Ranger pipeline, based on alignment to the custom hg19-mm10 reference genome (Fig. 1b and Additional file 1: Figure S2B). For the purpose of validation, we take this classification result as a benchmark for comparison. To assign sample identity based on CellTag Index expression, we developed a novel demultiplexing algorithm ( [22] that examines the expression distribution of each CellTag Index, followed by a dynamic binarization step to assess each CellTag Index signal on an individual cell basis (Fig. 1a and Additional file 1: Figure S2C; see the “Methods” section). With this method, we demultiplexed the pooled transcriptomes into 7510 human cells (CellTagA), 6397 mouse cells (CellTagB), 1040 multiplets, and 3212 non-determined cells (Fig. 1c and Additional file 1: Figure S2D). Overall, our algorithm successfully classified, or demultiplexed, 82.3% of all transcriptomes. The presence of non-determined cells is likely due to cells that did not receive sufficient dosage of virus during CellTag Index transduction. This can be enhanced by increasing virus multiplicity of infection (MOI) and visualizing the percentage of GFP-expressing cells prior to sequencing, as demonstrated below. For the purpose of benchmarking, we removed the 3212 non-determined cells for comparison with the 10x-based classification (Additional file 1: Figure S2E&F). Using Cohen’s kappa as a measure of agreement between independent observations, we calculated a kappa of 0.814 (Additional file 1: Figure S2G), suggesting that our demultiplexing is in strong agreement with the orthogonal 10x-based classification. Furthermore, cells designated as multiplets by both 10x and CellTagging demonstrate a clear increase in the mean numbers of transcripts per cell (Additional file 1: Figure S2H&I), suggesting they do indeed represent multiplets.

To demonstrate the efficacy of CellTag Indexing for multiplexing several biological samples in one experiment, we conducted additional validation where four samples of HEK293Ts and one sample of MEFs were transduced with five different predefined CellTag Indexes (HEK293Ts: CellTags C, D, E, and A; MEFs: CellTag B). A total of 4673 cells were sequenced, with an inferred doublet rate of 3.6% (see the “Methods” section). Overall, CellTag expression is detected in 99.2% of all cells, reflecting the improved tagging efficiency from an increased MOI. We demultiplexed the transcriptomes as above, including an additional step to resolve misclassified multiplets (Additional file 1: Figure S2C; see the “Methods” section). Overall, 4558 out of 4673 transcriptomes, or 97.5% of all transcriptomes, were successfully classified (Additional file 2: Figure S3A). Visualization of the classified transcriptomes by heatmap of CellTag barcode expression (Fig. 1d) and by dimension reduction (Fig. 1e, Additional file 2: Figure S3B&C) demonstrates clear segregation between species, suggesting that CellTag Indexing can be used to reliably multiplex numerous samples.

CellTag multiplexing enables long-term tracking of cell potential in an in vivo competitive transplant model

Current multiplexing methods are based on transient transfection or temporary molecular interactions with the cell or nucleus surface [7,8,9,10,11,12]. Although relative to CellTag Indexing, this offers faster labeling of cells, it does not support long-term labeling. Here, the unique advantage of CellTag-based multiplexing is that the label is heritable, as a result of stable integration into the cell genome, and can persist for many weeks as we have shown previously [18]. This creates opportunities for the longitudinal analysis of cell behavior over an extended period. Moreover, since experimental groups can be tagged, mixed, and tracked within the same biological replicate, unwanted biological and technical variation is minimized. To explore this application of CellTag multiplexing, we applied the method to assess rates of cell engraftment and intestinal differentiation potential in an in vivo competitive transplant model.

We previously reported that MEFs can be directly reprogrammed, via forced expression of transcription factors, into progenitor-like cells that possess both hepatic and intestinal potential [23, 24]. We demonstrated that these cells, named induced endoderm progenitors (iEPs), are able to functionally engraft a mouse model of induced colitis [24]. Prior to transplant, iEPs possess weak hepatic and intestinal identity, still partially resembling the fibroblasts they originated from. Twelve days after transplant into the mouse large intestine, iEPs more closely resemble differentiated intestine [24]. However, in this study, cell identity was assessed via bulk expression analysis that cannot distinguish between different intestinal cell types. Therefore, the mechanism of engraftment and differentiation potential of cells reprogrammed to iEPs remained to be characterized.

Our recent single-cell lineage tracing of fibroblast to iEP reprogramming revealed that this lineage conversion comprises two distinct trajectories: one path successfully reprogramming to iEPs, and an alternate path characterized by progression into a “dead-end” state, where fibroblast identity is re-established [18]. The transition along the successful reprogramming trajectory is accompanied by upregulation of genes such as Apoa1 and Cdh1 (E-cadherin). We hypothesized that the Apoa1HighEcadHigh iEP cells constitute the subpopulation responsible for our previously observed colon engraftment [24]. In this context, CellTag Indexing is well-suited for tracking and quantifying reprogrammed and dead-end cell differentiation potential as the barcodes are stably integrated and heritable, making it possible to label cells for long-term tracing transplantation experiments.

To test our hypothesis that the Apoa1HighEcadHigh iEP subpopulation harbors intestinal engraftment and differentiation potential, we first enriched EcadHigh and EcadLow iEP populations using fluorescence-activated cell sorting (FACS). Functional assays confirmed that EcadHigh iEPs express significantly higher levels of Apoa1 and Cdh1, form larger colonies of reprogrammed iEPs in culture, and retain their EcadHigh phenotype, relative to their EcadLow counterparts (Additional file 2: Figure S4A-C). We then labeled sorted EcadHigh iEPs with CellTagA and EcadLow iEPs with CellTagB, followed by pooling in equal proportions and transplant into a modified mouse model of colonic mucosal injury [25] (Fig. 2a). Seven days following transplantation, mice were euthanized and dissected, and the engrafted colons collected for histology and single-nucleus RNA sequencing. Microscopic examination of the engrafted tissue reveals iEP engraftment in discrete patches, located by their GFP expression (Fig. 2b). Histology of the cryosectioned engrafted colon shows the expected tissue architecture with evidence of epithelial injury (Fig. 2c), occasional submucosal iEPs (Fig. 2d), and occasional aggregates of iEPs sitting atop of the damaged epithelium (Additional file 2: Figure S4D).

Fig. 2
figure 2

CellTag Indexing for long-term tracking of cells demonstrated in a competitive transplant experiment. A Schematic of iEP generation and enriched into EcadHigh and EcadLow populations by FACS, labeled with CellTagA and CellTagB respectively, pooled in equal proportions and transplanted into a mouse model of colonic injury. Engrafted colon is then processed for single-nucleus RNA-seq. B Fluorescent microscopic images of the lumen of the engrafted colon, showing patches of GFP+ iEPs. Scale bar, 100 μm. C H&E-stained section of the engrafted colon showing normal intestinal architecture with evidence of epithelial injury. Scale bar, 100 μm. D DAPI-stained section of the engrafted colon showing GFP+ iEPs in the mucosa. Scale bar, 100 μm. E Transcriptomes from three post-engraftment colon tissues sequenced and analyzed, visualized by UMAP, revealing 16 clusters. F Annotation of the 16 clusters into (a) Lgr5 Lrig1+ intestinal stem cells (ISCs), (b) Lgr5+ ISCs, (c) deep crypt secretory cells, (d) endothelial cells, (e) enteric neurons, (f) enterocytes, (g) enteroendocrine cells, (h) fibroblasts, (i) goblet cells, (j) iEPs, (k) immune cells, (i) muscle, (m) Nkain2+ Csmd1+ cells, and (n) Reln+ Prox1+ cells. G Marker expression in annotated cell types

Most intestinal cell recovery protocols focus on the harvest of the epithelium, neglecting many other cell types that constitute the intestine. Given the range of engraftment phenotypes observed in our above histology analyses, we considered that iEPs may also differentiate towards non-epithelial cell types. Thus, to capture the full spectrum of intestinal cell identities, we opted to use whole tissue single-nucleus extraction, over epithelial isolation and digestion, to process the engrafted colon for RNA sequencing. Indeed, single-nucleus RNA sequencing (snRNA-seq) from three colon samples, post-engraftment, followed by Uniform Manifold Approximation and Projection (UMAP)-based visualization [20] revealed 16 clusters (Fig. 2e), corresponding to a range of different intestinal epithelial cell types. This included intestinal stem cells (ISCs), enterocytes, deep crypt secretory cells, goblet cells, enteroendocrine cells, and non-epithelial cell types (endothelial cells, muscle, enteric neurons, immune cells, fibroblasts) (Fig. 2f). To our knowledge, this is the first dataset of such that profiles large intestinal cell types beyond the epithelium. Known intestinal markers are observed such as Lgr5, Lrig1, and Smoc2 in ISCs [26,27,28,29]; Reg4 in deep crypt secretory cells [30]; Myt1l, Asic2, and Syt1 in enteric neurons [31,32,33,34]; Vil1, Plac8, and Krt20 in enterocytes [35,36,37]; Nkx2-2, Chga, and Tph1 in enteroendocrine cells [38, 39]; and Fcgbp, Muc2, and Clca1 in goblet cells [40, 41] (Fig. 2g, Additional file 2: Figure S5C-E).

Upon further analysis and literature review, we annotated the ISCs into two populations, Lgr5+ ISCs (clusters 1 and 6) and Lgr5 Lrig1+ ISCs (cluster 0), based on distinct patterns of marker expression (Additional file 2: Figure S5C). Lrig1, a transmembrane negative regulator of ErbB signaling [42], is purported to mark a class of ISCs that are phenotypically distinct from Lgr5+ stem cells in the intestine [27, 28], with additional roles in stem cells of the gastric epithelium [43] and the epidermis [44,45,46]. Lgr5+ ISCs, located in clusters 1 and 6 in this dataset, express high levels of established intestinal stem cell markers Lgr5 and Smoc2, as well as Lrig1 (Fig. 2g, Additional file 2: Figure S5C-E). In contrast to Lgr5, Lrig1 is more widely expressed, with moderate levels of expression extending into cluster 0, where Lgr5 expression is absent (Additional file 2: Figure S5C). This is consistent with two independent studies in the small intestine and colon, where Lrig1 was expressed in many crypt cells, while the highest levels of Lrig1 expression were observed in Lgr5+ stem cells [27, 28]. Loss of Lrig1 caused crypt expansion in Lrig1-knockout animals, and three-dimensional intestinal spheres derived from Lrig1-knockout animals matured into budding organoids in culture without exogenous ErbB ligands in contrast to wild-type samples [27]. Intriguingly, Lrig1 was shown to mark a population of ISCs that expand and repopulate the colonic crypt upon tissue damage [28], although a distinction was not made regarding whether this could be due to the subpopulation of Lrig1+ cells that are also Lgr5+.

Of note, two clusters that remain unannotated (cluster 11, enriched for Reln and Prox1; cluster 14, enriched for Nkain2 and Csmd1) may represent rare or previously unidentified cell types (Additional file 2: Figure S5D). For example, Reln and Prox1 are known for their roles in neuronal migration [47, 48] and neurogenesis [49, 50]; we, therefore, speculate that they may mark a peripheral neuronal cell type in cluster 11.

Colon engrafted iEPs transition through an intestinal stem cell state

To identify iEPs within the single-nucleus landscape of the engrafted colon, we extracted and processed CellTag Indexes across all single transcriptomes. Both CellTagA and CellTagB barcodes were detected in all three post-engraftment samples (Additional file 1: Figure S6A), with clear expression differences between tags (Additional file 2: Figure S6B). Projecting CellTagged iEPs onto the UMAP plot revealed their enrichment in cluster 4 (Fig. 3a, b), while a moderate number of CellTagged cells are found in intestinal epithelial clusters such as cluster 0 (Lgr5 Lrig1+ ISCs) and cluster 1 (Lgr5+ ISCs), expressing ISC markers Lgr5 and Lrig1 (Fig. 3c, d, Additional file 2: Figure S6C).

Fig. 3
figure 3

CellTag Indexing revealed iEP engraftment and transition through an intestinal stem cell fate. a CellTags identified engrafted iEPs enriched in cluster 4 (early engraftment iEPs) and the main intestinal epithelial clusters. b Density heatmap confirms enrichment of CellTagged cells in the early engraftment iEP cluster and the main intestinal epithelial cell clusters. c, d Stacked bar plots of CellTagged cells show enrichment in clusters 0, 1, and 4. e Permutation test of cluster enrichment or depletion for each CellTag in intestinal clusters show statistically significant enrichment of EcadHigh/CellTagA cells in cluster 0 (Lgr5 Lrig1+ ISCs, p = 4.03 × 10− 5) and cluster 1 (Lgr5+ ISCs, p = 9.83 × 10− 3). y-axis, negative log10 of p value for cluster enrichment, log10 of p value for cluster depletion. Dotted lines correspond to a p value of 0.05. f RNA velocity analysis shows velocity vectors from iEPs towards Lgr5 Lrig1+ ISCs and from the ISC clusters towards the differentiated enterocyte clusters. g Subset of velocity vectors of CellTagged cells confirm transcriptional kinetics of engrafted iEPs in the direction towards intestinal stem cells

Engrafted tissue was harvested in early stages of intestinal regeneration, with the epithelium still undergoing active repair. We chose this time point in an effort to understand the mechanism of iEP engraftment. Indeed, in line with this early regeneration period, cluster 4 likely represents cells in the early stages of engraftment and repair, characterized by expression of both intestinal and mesenchymal markers (Additional file 2: Figure S5D&E). Notably, Grip1, an adaptor protein implicated in maintaining the epidermal-dermal junction via the Fras1/Frem1 complex [51, 52], is among the list of marker genes for cluster 4, suggesting that cluster 4 might represent an iEP engraftment mechanism via adhesion to the basement membrane. We next focused on the proportions of fully reprogrammed EcadHigh iEPs (labeled by CellTagA) and dead-end EcadLow iEPs (labeled by CellTagB) engrafting the intestine. We found that 0.687% ± 0.214% of engrafted cells were derived from reprogrammed iEPs whereas 0.413% ± 0.113% of engrafted cells were derived from dead-end iEPs (p = 0.06; Additional file 2: Figure S6D). This low percentage was expected given that we aimed to capture a broad range of intestinal engraftment to provide an unbiased assessment of engraftment.

In our previous study, we observed that iEPs are capable of long-term (7 weeks post-transplant), functional engraftment, where entire crypts are repopulated by iEP-derived cells [24]. At that time, we speculated that iEPs transition through an intestinal stem cell state to support long-term engraftment. Here, considering our hypothesis that fully reprogrammed EcadHigh iEPs are responsible for this long-term engraftment, we performed a randomized test that we previously developed to assign statistical significance in cluster occupancy [18]. Here, we applied this approach to determine whether reprogrammed and dead-end iEPs were more likely to associate with any particular cluster of intestinal cells. We did not include cluster 4 in this analysis as the colonic epithelium is in the early stages of regeneration, where we consider cells in this cluster to be superficially attached, and not all these cells will eventually integrate into the recovered epithelium. Our randomized test revealed that reprogrammed EcadHigh/CellTagA cells are significantly more likely to occupy cluster 0 (Lgr5 Lrig1+ ISCs, p = 4.03 × 10− 5) and cluster 1 (Lgr5+ ISCs, p = 9.83 × 10− 3), while CellTagged reprogrammed and dead-end populations are depleted from non-epithelial cell clusters (Fig. 3e). Together, this suggests that EcadHigh/CellTagA cells integrate into the regenerating epithelium via an intestinal stem cell intermediate. Expression of the ISC markers Lgr5 and Lrig1 in engrafted iEPs supports this observation (Additional file 2: Figure S6C). As reported previously, Lrig1+ ISCs expand and repopulate the colonic crypt upon tissue damage [28], pointing to a potential mechanism of long-term iEP engraftment in the mouse colon.

To further investigate engraftment mechanics, we conducted RNA velocity analysis [53] to reveal the transcriptional kinetics of engrafting iEPs. We reasoned that if these iEPs were differentiating towards intestinal lineages, then transcript kinetics from early iEP engraftment cluster, cluster 4, should show velocity vectors towards annotated intestinal clusters. Indeed, RNA velocities projected onto the UMAP clusters show cluster 4 velocities towards the main intestinal clusters (Fig. 3f, Additional file 2: Figure S6F&G). Specifically, velocity vectors from the subset of CellTagged cells show vectors originating from cluster 4 towards cluster 0, and from the intestinal stem cell pole of the main intestinal clusters towards the more differentiated pole of enterocytes (Fig. 3g). Taken together, here, we have demonstrated the utility of CellTag Indexing to multiplex EcadHigh and EcadLow iEPs for transplantation into the mouse large intestine, suggesting that iEPs transition through a Lgr5+ and/or Lrig1+ stem cell state to engraft and repopulate the colonic epithelium, resolving speculation about their engraftment route. Our findings are consistent with previous reports of iEP differentiation potential and position CellTag Indexing as a powerful long-term tracking and multiplexing tool for scRNA-seq.


Here, we present a broadly applicable toolbox, CellTag Indexing, to label biological samples for single-cell analysis, where each sample is genetically tagged with a predefined lentiviral GFP barcode to mark its sample identity. We demonstrate that CellTag Indexing does not perturb cell physiology, and validate the utility of our multiplexing approach via species mixing, showing that it can be used to accurately multiplex samples for scRNA-seq, with subsequent demultiplexing at high efficiency. We showcase the unique feature of this heritable labeling approach, by tracking cells in a competitive in vivo transplant setting, revealing reprogrammed cell potential and mechanisms of engraftment while providing internal controls to mitigate both biological and technical batch effects. CellTag multiplexing is complementary to current strategies based on transient cell surface interactions for labeling cells immediately prior to scRNA-seq, yet unique in that CellTag barcodes are stably integrated and heritable through cell division. The flexible timing of lentiviral barcode transduction, coupled with stable barcode expression, makes our system uniquely suitable for long-term tracing experiments and transplant models where temporary tags would not be retained.

CellTag Indexing offers the advantages of minimized technical variation by experimental design, the ability to multiplex biological samples for competitive transplant, broad compatibility with various cell types and single-cell technologies, long-term barcode expression, streamlined workflow and library preparation, reduced sequencing cost, and straightforward demultiplexing. CellTag Indexing is designed for broad applications; its use of lentivirus as a labeling method represents a commonly used and accessible biological tool with minimal setup costs and reagent requirements. As lentivirus can transduce both dividing and non-dividing cells, CellTag barcodes can be introduced into a wide variety of cell types. In terms of estimating labeling efficiency, CellTag Indexing conveniently utilizes GFP as a barcode carrier, which can act as a visual readout for transduction efficiency. Generally, CellTag transcripts are abundantly expressed and can be optionally amplified during library preparation to further increase the detection rate.

Importantly, CellTag transcripts can be recovered from the nucleus, extending this approach to single-nucleus RNA sequencing. Furthermore, cells labeled with CellTag Indexes can be cultured and used in experiments prior to collection for sequencing, for example in the competitive transplant assay we demonstrated here where tagged samples act as internal controls for each other to minimize unwanted biological variation. This is complementary to existing labeling methods that utilize cell/nuclear surface chemistry or transient transfection for temporary tagging [7,8,9,10,11,12], where the labels would be progressively lost in vitro and in vivo. Additionally, as a future application, we expect that CellTag multiplexing will be compatible with single-genome-based assays such as single-cell ATAC-seq. In summary, CellTag Indexing is a broadly applicable tool complementary to existing methods for cell multiplexing and tracking, providing a diverse panel of experimental and analytical strategies.


Cell culture

Mouse embryonic fibroblasts were derived from the C57BL/6 J strain (The Jackson Laboratory 000664). HEK293T and mouse embryonic fibroblasts were cultured in Dulbecco’s modified Eagle medium (Gibco) supplemented with 10% fetal bovine serum (Gibco), 1% penicillin/streptomycin (Gibco), and 55 μM 2-mercaptoethanol (Gibco). HAFTL pre-B cells were cultured in RPMI1640 without phenol red (Lonza) supplemented with 10% charcoal/dextran-treated FBS (Hyclone) and 55 μM 2-mercaptoethanol (Gibco) [19].

Generation of iEPs

Mouse embryonic fibroblasts were converted to iEPs as previously described [23, 24]. Briefly, fibroblasts were prepared from E13.5 embryos, cultured on gelatin, and serially transduced every 12 h with Hnf4α-t2a-Foxa1 retrovirus for five times over the course of 3 days, followed by culture on collagen in hepato-medium, which is DMEM:F-12 (Gibco) supplemented with 10% FBS, 1% penicillin/streptomycin, 55 μM 2-mercaptoethanol, 10 mM nicotinamide (Sigma-Aldrich), 100 nM dexamethasone (Sigma-Aldrich), 1 μg/mL insulin (Sigma-Aldrich), and 20 ng/ml epidermal growth factor (Sigma-Aldrich).

CellTag barcodes

CellTag lentiviral constructs were generated by introducing an 8-bp variable region into the 3′ UTR of GFP in the pSmal plasmid [54] using a gBlock gene fragment (Integrated DNA Technologies) and megaprimer insertion ( Individual clones were picked and Sanger sequenced to generate predefined barcodes. The specific CellTag barcodes used in this manuscript are TGCTATAT (CellTagA), GTTGGCTA (CellTagB), AGTTTAGG (CellTagC), GGTTCACA (CellTagD), and TAGAAAGC (CellTagE). These constructs are available from Addgene:

Lenti- and retrovirus production

Lentiviruses were produced by transfecting HEK293T cells with lentiviral pSMAL vector and packing plasmids pCMV-dR8.2 dvpr (Addgene plasmid 8455) and pCMV-VSV-G (Addgene plasmid 8454) using X-tremeGENE 9 (Sigma-Aldrich). Viruses were collected 48 and 72 h after transfection. Retroviruses were similarly produced, with retroviral pGCDNSam vector and packaging plasmid pCL-Eco (Imgenex).

CellTag transduction

CellTag virus-containing supernatant collected from virus-producing HEK293T cells was kept at 4 °C and used within 1 week. Prior to transduction, protamine sulfate (Sigma-Aldrich) was added to the viral solution to a final concentration of 4 μg/ml. Cells were aspirated of media, and the CellTag virus was added to the cells for a 24-h transduction period. This transduction was repeated as needed, for a total of 48 h for HEK293T cells and 72 h for MEFs in the 5-tag species mixing experiment and 72 h for iEPs.

Immunostaining and quantification

Transduced HEK293T and MEFs were cultured on a four-chamber culture slide (Falcon) for 24 h prior to fixation in 4% paraformaldehyde and staining in 300 nM DAPI in PBS. The slide was then mounted in ProLong Gold Antifade Mountant (Invitrogen). Images were acquired on a Nikon eclipse Ts2 inverted microscope. For automatic quantification, images of CellTagged HEK 293T and MEF were processed with a custom python script to count GFP-positive/negative cells. The proportion of GFP-positive cells was calculated from DAPI and GFP images. First, DAPI images were transformed into binary images by thresholding fluorescent signal. The threshold values were determined by the Otsu method. The binary nucleus image was processed by watershed segmentation to separate individual cell areas completely. Inappropriately sized objects were filtered to remove noise and doublet cells. The intensity of the GFP signal per individual cell area was then quantified to distinguish between GFP-positive cells and GFP-negative cells. These processes were run with Python 3.6.1 and its libraries: scikit-image 0.13.0, numpy 1.12.1, matplotlib 2.0.2, seaborn 0.8.1, jupyter 1.0.0.

Mouse model of colonic mucosal injury

Using a previously described procedure [25], we generated colonic epithelial injury with modifications as followed: C57BL/6 mice were anesthetized with inhaled isoflurane. A custom-made syringe catheter (consisted of 3-ml syringe (BD #309657), Luer lock 26-gauge 1/2″ dispensing needle (GraingerChoice #5FVG9), and polyethylene tubing (Scientific Commodities, #BB31695-PE/2) cut to approximate 5 cm in length and affixed to the needle) was used to deliver approximately 1 mL of PBS enema intraluminally via the anal canal, followed by gentle abdominal massage to promote excretion of excess fecal matter. The luminal space was then filled with 0.5 mL of 500 mM EDTA/PBS using the custom syringe catheter over the course of approximately 30 s. Mechanical abrasion was performed with Proxabrush cleaners (Sunstar #872FC) dipped in 500 mM EDTA/PBS, inserted approximately 1 cm into the colon, with 30 rotational movements to gently scratch the luminal surface.

iEP characterization and transplantation

Eight-week iEPs were stained with mouse E-cadherin-APC antibody (10 μL per one million cells, R&D Systems, FAB7481A) and sorted on a modified Beckman Coulter MoFlo into EcadHigh and EcadLow populations. Sorted iEPs were plated and cultured as above. Colony formation assay was performed as previously described [18]. For colon engraftment, CellTagged EcadHigh and EcadLow iEPs were digested into single-cell suspensions. For each mouse, 0.5 million of EcadHigh iEPs (CellTagA) and 0.5 million of EcadLow iEPs (CellTagB) were pooled and resuspended in 50 μL of 10% Matrigel on ice. A total of 1 million iEPs was instilled into the colonic lumen of each mouse by using the custom syringe catheter, after which the mouse was held vertically head-down for approximately 2 min to prevent immediate excretion of the infused cell suspension.

Single-nucleus RNA-seq procedure

Single-nucleus extraction from tissue was performed as previously described [55]. Briefly, engrafted colonic tissues were finely minced with a razor then transferred to a Dounce tissue homogenizer (Kimble Chase KT885300-0002) in 2 mL of ice-cold Nuclei EZ Lysis buffer (Sigma #N-3408) supplemented with protease inhibitor (Roche #5892791001) and RNase inhibitors (Promega #N2615, Thermo Fisher Scientific #AM2696). The tissue was ground 20–30 times with the loose pestle. The homogenate was filtered through a 200-μm cell strainer (pluriSelect #43-50200) then transferred back to the Dounce homogenizer, ground with the tight pestle 10–15 times. The homogenate was incubated on ice for 5 min with an additional 2 mL of lysis buffer, then filtered through a 40-μm cell strainer (pluriSelect #43-50040) and centrifuged at 500g for 5 min at 4 °C. The incubation and centrifugation steps were repeated one time, followed by resuspension Nuclei Suspension Buffer (1× PBS, 1% BSA, 0.1% RNase inhibitor) and filtering through a 5-μm cell strainer (pluriSelect #43-50005). The nuclei were then loaded onto the 10x Chromium Single Cell Platform for encapsulation and barcoding.

scRNA-seq procedure

10x Chromium Single Cell 3′ Library & Gel Bead Kit, 10x Chromium Single Cell 3′ Chip kit, and 10x Chromium i7 Multiplex kit (10x Genomics) were used according to the manufacturer’s protocols. Libraries were quantified on the Agilent 2200 TapeStation and sequenced on Illumina HiSeq 2500.

CellTag demultiplexing

Details of the CellTag Classifier can be found on the GitHub repository ( [22]. Briefly, the CellTag count matrix is extracted as previously described [18] (outlined at CellTag sequences are collapsed using Starcode with the sphere clustering algorithm [56], where CellTags with similar sequences were collapsed to the centroid CellTag. The collapsed CellTag count matrix is log-normalized, from which the most highly expressed CellTags across cells are selected. Then, a dynamic binarization method is applied to assess the existence of each CellTag in each cell, where a “0” suggests insignificant/unobservable signals and a “1” indicates a significant signal. Specifically, for each CellTag, we compute the density function D of its expression across all cells. Then, for each cell, we draw 1000 samples from the density functions D and calculate the proportion P of samples that are greater than or equal to the expression value being tested:

$$ P=\frac{\mathrm{II}\left(S\ge {C}_{ij}\right)}{\mathrm{Length}\ \mathrm{of}\ S} $$

where Cij = expression value of CellTag j in cell i and S = 1000 sample drawn from the density curve of CellTag j, Dj. This process is iterated for at least 50 times to make sure that the samples are representative of the overall density. The cells are then classified to their corresponding CellTag based on the proportions calculated above by finding the overall minimum in each proportion matrix. The uniqueness of the minimum does not eliminate the probability for the cell to be a multiplet. Hence, for cells with a unique minimum, we examine the pair-wise differences between the minimum tag and other tags using a baseline cutoff of 0.238 learned via benchmarking and training against orthogonal 10x classification. Finally, the number of multiplets identified from our pipeline is compared to the expected number derived from 10x Genomics’ Single Cell 3′ Reagents Kit v2 User Guide Rev. E (multiplet % = 0.0007589 × number of cells recovered + 0.0527214). If the number of multiplets exceeds the expected number, the optional multiplet checkpoint is implemented, where the proportion matrix is sorted such that the most likely multiplets are identified using a cutoff at the quantile of (1.5 × expected num/multiplet). The remaining cells are then classified to their singlet identities.

scRNA-seq analysis

The Cell Ranger v.3.0.1 pipeline ( was used to process data generated using the 10x Chromium platform. For alignment of the single-nucleus RNA-seq data, a modified “pre-mRNA” mm10 reference was used to include reads aligned to introns. The R package Seurat [5] (version 3) was used for data processing and visualization. For the iEP dataset, we removed cells with a low number of genes detected (< 200), cells with a high number of UMI detected (> 100,000), and cells with a high proportion of UMI counts attributed to mitochondrial genes (> 20%). The filtered expression matrix was then normalized and scaled to remove unwanted sources of variation driven by the number of detected UMIs and mitochondrial gene expression. Linear dimension reduction was performed, followed by canonical correlation analysis to integrate independent biological replicates, then clustering and visualization via UMAP [20].

Assessing cluster occupancy by randomized testing

A randomized test that we developed previously [18] was used to identify clusters that are significantly occupied by EcadHigh/EcadLow iEPs. In brief, we calculated the proportions of CellTagA and CellTagB cells that fall into each cluster, serving as the null percentages for the two tags. In particular, let n be the number of cells with a CellTag. Let s be the number of cells without this tag. The two were then pooled together from which we drew n random samples without replacement for at least (n + s)/n times such that every possible ending group can be captured. With each sample drawn, the occupancy of n sampled cells in each cluster was calculated. A background proportion distribution was then generated based on this resampling result. We then used the distributions to compute the likelihood of having the null percentage or higher. Using a p value of < 0.05, we identified the clusters that are enriched for each CellTag. This randomized test was performed using a python script. We exclude cluster 4 in this test as it represents the early engraftment stage. Cell number tested for CellTagA equals 66. Cell number tested for CellTagB equals 46.

RNA velocity analysis

RNA velocity was analyzed with (version 0.17.17). The analysis was done according to the web instruction; For the input of single-cell RNA-seq data, the output files of 10x Cell Ranger pipeline were used. The single-cell RNA-seq reads for each sample were converted into read-counts after distinguishing a spliced or unspliced transcript. This process was done with command line velocyto API, and final products were saved as loom files. Next, the loom files of each scRNA-seq sample were merged into a single loom file. The merged loom file was processed with velocyto python API to create the velocyto object. Then, the velocyto object was integrated with UMAP dimensional reduction data and CellTag data which were produced in the scRNA-seq analysis with Seurat and CellTag demultiplexing process. Next, the velocyto object was subjected to quality check and filtering process. Genes were filtered by the mRNA detection level (min_expr_counts = 40, min_cells_express = 30). After feature selection by a velocyto function, the data were normalized by total molecule count. Then, velocyto object was subjected to a series of final data processing process: PCA, k-nn-based imputation, velocity estimation, and shift calculation. Finally, the vectors estimated by RNA velocity were projected on the UMAP graph.


  1. Hicks SC, Townes FW, Teng M, Irizarry RA. Missing data and technical variability in single-cell RNA-sequencing experiments. Biostatistics. 2017;19:562–78.

    PubMed Central  Google Scholar 

  2. Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W, et al. Limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic Acids Res. 2015;43:e47.

    PubMed  PubMed Central  Google Scholar 

  3. Shaham U, Stanton KP, Zhao J, Li H, Raddassi K, Montgomery R, et al. Removal of batch effects using distribution-matching residual networks. Bioinformatics. 2017;33:2539–46.

    CAS  PubMed  PubMed Central  Google Scholar 

  4. Haghverdi L, Lun ATL, Morgan MD, Marioni JC. Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors. Nat Biotechnol. 2018;36:421–7.

    CAS  PubMed  PubMed Central  Google Scholar 

  5. Butler A, Hoffman P, Smibert P, Papalexi E, Satija R. Integrating single-cell transcriptomic data across different conditions, technologies, and species. Nat Biotechnol. 2018;36:411–20.

    CAS  PubMed  PubMed Central  Google Scholar 

  6. Kang HM, Subramaniam M, Targ S, Nguyen M, Maliskova L, McCarthy E, et al. Multiplexed droplet single-cell RNA-sequencing using natural genetic variation. Nat Biotechnol. 2018;36:89–94.

    CAS  PubMed  Google Scholar 

  7. Cao J, Packer JS, Ramani V, Cusanovich DA, Huynh C, Daza R, et al. Comprehensive single-cell transcriptional profiling of a multicellular organism. Science. 2017;357:661–7.

    CAS  PubMed  PubMed Central  Google Scholar 

  8. Gehring J, Park JH, Chen S, Thomson M, Pachter L. Highly multiplexed single-cell RNA-seq for defining cell population and transcriptional spaces. bioRxiv. 2018.

  9. Stoeckius M, Zheng S, Houck-Loomis B, Hao S, Yeung BZ, Mauck WM, et al. Cell hashing with barcoded antibodies enables multiplexing and doublet detection for single cell genomics. Genome Biol. 2018;19:224.

    CAS  PubMed  PubMed Central  Google Scholar 

  10. McGinnis CS, Patterson DM, Winkler J, Hein MY, Srivastava V, Conrad DN, et al. MULTI-seq: scalable sample multiplexing for single-cell RNA sequencing using lipid-tagged indices. bioRxiv. 2018.

  11. Shin D, Lee W, Lee JH, Bang D. Multiplexed single-cell RNA-seq via transient barcoding for drug screening. bioRxiv. 2018.

  12. Gaublomme JT, Li B, McCabe C, Knecht A, Drokhlyansky E, Van WN, et al. Nuclei multiplexing with barcoded antibodies for single-nucleus genomics. bioRxiv. 2018.

  13. Wong ASL, Choi GCG, Cui CH, Pregernig G, Milani P, Adam M, et al. Multiplexed barcoded CRISPR-Cas9 screening enabled by CombiGEM. Proc Natl Acad Sci. 2016;113:2544–9.

    CAS  PubMed  PubMed Central  Google Scholar 

  14. Adamson B, Norman TM, Jost M, Cho MY, Nuñez JK, Chen Y, et al. A multiplexed single-cell CRISPR screening platform enables systematic dissection of the unfolded protein response. Cell. 2016;167:1867–1882.e21.

    CAS  PubMed  PubMed Central  Google Scholar 

  15. Dixit A, Parnas O, Li B, Chen J, Fulco CP, Jerby-Arnon L, et al. Perturb-Seq: dissecting molecular circuits with scalable single-cell RNA profiling of pooled genetic screens. Cell. 2016;167:1853–1866.e17.

    CAS  PubMed  PubMed Central  Google Scholar 

  16. Jaitin DA, Weiner A, Yofe I, Lara-Astiaso D, Keren-Shaul H, David E, et al. Dissecting immune circuits by linking CRISPR-pooled screens with single-cell RNA-Seq. Cell. 2016;167:1883–1896.e15.

    CAS  PubMed  Google Scholar 

  17. Datlinger P, Rendeiro AF, Schmidl C, Krausgruber T, Traxler P, Klughammer J, et al. Pooled CRISPR screening with single-cell transcriptome readout. Nat Methods. 2017;14:297–301.

    CAS  PubMed  PubMed Central  Google Scholar 

  18. Biddy BA, Kong W, Kamimoto K, Guo C, Waye SE, Sun T, et al. Single-cell mapping of lineage and identity in direct reprogramming. Nature. 2018;564:218–24.

    Google Scholar 

  19. Bussmann LH, Schubert A, Vu Manh TP, De Andres L, Desbordes SC, Parra M, et al. A robust and highly efficient immune cell reprogramming system. Cell Stem Cell. 2009;5:554–66.

    CAS  PubMed  Google Scholar 

  20. Becht E, McInnes L, Healy J, Dutertre CA, Kwok IWH, Ng LG, et al. Dimensionality reduction for visualizing single-cell data using UMAP. Nat Biotechnol. 2019;37:38–44.

    CAS  Google Scholar 

  21. Macosko EZ, Basu A, Satija R, Nemesh J, Shekhar K, Goldman M, et al. Highly parallel genome-wide expression profiling of individual cells using nanoliter droplets. Cell. 2015;161:1202–14.

    CAS  PubMed  PubMed Central  Google Scholar 

  22. Guo C, Kong W, Kamimoto K, Rivera-Gonzalez GC, Yang X, Kirita Y, et al. CellTag Indexing: genetic barcode-based sample multiplexing for single-cell genomics. Github Repos.

  23. Sekiya S, Suzuki A. Direct conversion of mouse fibroblasts to hepatocyte-like cells by defined factors. Nature. 2011;475:390–3.

    CAS  PubMed  Google Scholar 

  24. Morris SA, Cahan P, Li H, Zhao AM, San Roman AK, Shivdasani RA, et al. Dissecting engineered cell types and enhancing cell fate conversion via Cellnet. Cell. 2014;158:889–902.

    CAS  PubMed  PubMed Central  Google Scholar 

  25. Fukuda M, Mizutani T, Mochizuki W, Matsumoto T, Nozaki K, Sakamaki Y, et al. Small intestinal stem cell identity is maintained with functional Paneth cells in heterotopically grafted epithelium onto the colon. Genes Dev. 2014;28:1752–7.

    CAS  PubMed  PubMed Central  Google Scholar 

  26. Barker N, van Es JH, Kuipers J, Kujala P, van den Born M, Cozijnsen M, et al. Identification of stem cells in small intestine and colon by marker gene Lgr5. Nature. 2007;449:1003–7.

    CAS  PubMed  Google Scholar 

  27. Wong VWY, Stange DE, Page ME, Buczacki S, Wabik A, Itami S, et al. Lrig1 controls intestinal stem-cell homeostasis by negative regulation of ErbB signalling. Nat Cell Biol. 2012;14:401–8.

    CAS  PubMed  PubMed Central  Google Scholar 

  28. Powell AE, Wang Y, Li Y, Poulin EJ, Means AL, Washington MK, et al. The pan-ErbB negative regulator lrig1 is an intestinal stem cell marker that functions as a tumor suppressor. Cell. 2012;149:146–58.

    CAS  PubMed  PubMed Central  Google Scholar 

  29. Muñoz J, Stange DE, Schepers AG, Van De Wetering M, Koo BK, Itzkovitz S, et al. The Lgr5 intestinal stem cell signature: robust expression of proposed quiescent ′ +4′ cell markers. EMBO J. 2012;31:3079–91.

    PubMed  PubMed Central  Google Scholar 

  30. Sasaki N, Sachs N, Wiebrands K, Ellenbroek SIJ, Fumagalli A, Lyubimova A, et al. Reg4 + deep crypt secretory cells function as epithelial niche for Lgr5 + stem cells in colon. Proc Natl Acad Sci. 2016;113:E5399–407.

    CAS  PubMed  PubMed Central  Google Scholar 

  31. Vohra BPS, Tsuji K, Nagashimada M, Uesaka T, Wind D, Fu M, et al. Differential gene expression and functional analysis implicate novel mechanisms in enteric nervous system precursor migration and neuritogenesis. Dev Biol. 2006;298:259–71.

    CAS  PubMed  PubMed Central  Google Scholar 

  32. Levanti MB, Guerrera MC, Calavia MG, Ciriaco E, Montalbano G, Cobo J, et al. Acid-sensing ion channel 2 (ASIC2) in the intestine of adult zebrafish. Neurosci Lett. 2011;491:24–8.

    Google Scholar 

  33. Roy-Carson S, Natukunda K, chao CH, Pal N, Farris C, Schneider SQ, et al. Defining the transcriptomic landscape of the developing enteric nervous system and its cellular environment. BMC Genomics. 2017;18:290.

    PubMed  PubMed Central  Google Scholar 

  34. Memic F, Knoflach V, Morarach K, Sadler R, Laranjeira C, Hjerling-Leffler J, et al. Transcription and signaling regulators in developing neuronal subtypes of mouse and human enteric nervous system. Gastroenterology. 2018;154:624–36.

    CAS  PubMed  Google Scholar 

  35. Chan CWM, Wong NA, Liu Y, Bicknell D, Turley H, Hollins L, et al. Gastrointestinal differentiation marker Cytokeratin 20 is regulated by homeobox gene CDX1. Proc Natl Acad Sci. 2009;106:1936–41.

    PubMed  PubMed Central  Google Scholar 

  36. Gao S, Yan L, Wang R, Li J, Yong J, Zhou X, et al. Tracing the temporal-spatial transcriptome landscapes of the human fetal digestive tract using single-cell RNA-sequencing. Nat Cell Biol. 2018;20:721–34.

    CAS  PubMed  Google Scholar 

  37. Moor AE, Harnik Y, Ben-Moshe S, Massasa EE, Rozenberg M, Eilam R, et al. Spatial reconstruction of single enterocytes uncovers broad zonation along the intestinal villus axis. Cell. 2018;175:1156–67.

    CAS  PubMed  Google Scholar 

  38. Desai S, Loomis Z, Pugh-Bernard A, Schrunk J, Doyle MJ, Minic A, et al. Nkx2.2 regulates cell fate choice in the enteroendocrine cell lineages of the intestine. Dev Biol. 2008;313:58–66.

    CAS  PubMed  Google Scholar 

  39. Gross S, Garofalo DC, Balderes DA, Mastracci TL, Dias JM, Perlmann T, et al. The novel enterochromaffin marker Lmx1a regulates serotonin biosynthesis in enteroendocrine cell lineages downstream of Nkx2. 2 Development. 2016;143:2616–28.

    CAS  PubMed  Google Scholar 

  40. Johansson MEV, Larsson JMH, Hansson GC. The two mucus layers of colon are organized by the MUC2 mucin, whereas the outer layer is a legislator of host-microbial interactions. Proc Natl Acad Sci. 2010;108:4659–65.

    PubMed  PubMed Central  Google Scholar 

  41. Pelaseyed T, Bergström JH, Gustafsson JK, Ermund A, Birchenough GMH, Schütte A, et al. The mucus and mucins of the goblet cells and enterocytes provide the first defense line of the gastrointestinal tract and interact with the immune system. Immunol Rev. 2014;260:8–20.

    CAS  PubMed  PubMed Central  Google Scholar 

  42. Laederich MB, Funes-Duran M, Yen L, Ingalla E, Wu X, Carraway KL, et al. The leucine-rich repeat protein LRIG1 is a negative regulator of ErbB family receptor tyrosine kinases. J Biol Chem. 2004;279:47050–6.

    CAS  PubMed  Google Scholar 

  43. Schweiger PJ, Clement DL, Page ME, Schepeler T, Zou X, Sirokmány G, et al. Lrig1 marks a population of gastric epithelial cells capable of long-term tissue maintenance and growth in vitro. Sci Rep. 2018;8:15255.

    PubMed  PubMed Central  Google Scholar 

  44. Jensen KB, Watt FM. Single-cell expression profiling of human epidermal stem and transit-amplifying cells: Lrig1 is a regulator of stem cell quiescence. Proc Natl Acad Sci. 2006;103:11958–63.

    CAS  PubMed  PubMed Central  Google Scholar 

  45. Jensen KB, Collins CA, Nascimento E, Tan DW, Frye M, Itami S, et al. Lrig1 expression defines a distinct multipotent stem cell population in mammalian epidermis. Cell Stem Cell. 2009;4:427–39.

    CAS  PubMed  PubMed Central  Google Scholar 

  46. Page ME, Lombard P, Ng F, Göttgens B, Jensen KB. The epidermis comprises autonomous compartments maintained by distinct stem cell populations. Cell Stem Cell. 2013;13:471–82.

    CAS  PubMed  PubMed Central  Google Scholar 

  47. Sheldon M, Rice DS, D’Arcangelo G, Yoneshima H, Nakajima K, Mikoshiba K, et al. Scrambler and yotari disrupt the disabled gene and produce a reeler- like phenotype in mice. Nature. 1997;389:730–3.

    CAS  PubMed  Google Scholar 

  48. Sentürk A, Pfennig S, Weiss A, Burk K, Acker-Palmer A. Ephrin Bs are essential components of the Reelin pathway to regulate neuronal migration. Nature. 2011;472:356–60.

    PubMed  Google Scholar 

  49. Iwano T, Masuda A, Kiyonari H, Enomoto H, Matsuzaki F. Prox1 postmitotically defines dentate gyrus cells by specifying granule cell identity over CA3 pyramidal cell fate in the hippocampus. Development. 2012;139:3051–62.

    CAS  PubMed  Google Scholar 

  50. Holzmann J, Hennchen M, Rohrer H. Prox1 identifies proliferating neuroblasts and nascent neurons during neurogenesis in sympathetic ganglia. Dev Neurobiol. 2015;75:1352–67.

    CAS  PubMed  Google Scholar 

  51. Bladt F, Tafuri A, Gelkop S, Langille L, Pawson T. Epidermolysis bullosa and embryonic lethality in mice lacking the multi-PDZ domain protein GRIP1. Proc Natl Acad Sci. 2002;99:6816–21.

    CAS  PubMed  PubMed Central  Google Scholar 

  52. Smyth I, Du X, Taylor MS, Justice MJ, Beutler B, Jackson IJ. The extracellular matrix gene Frem1 is essential for the normal adhesion of the embryonic epidermis. Proc Natl Acad Sci. 2004;101:13560–5.

    CAS  PubMed  PubMed Central  Google Scholar 

  53. La Manno G, Soldatov R, Zeisel A, Braun E, Hochgerner H, Petukhov V, et al. RNA velocity of single cells. Nature. 2018;560:494–8.

    PubMed  PubMed Central  Google Scholar 

  54. Lu R, Neff NF, Quake SR, Weissman IL. Tracking single hematopoietic stem cells in vivo using high-throughput sequencing in conjunction with viral genetic barcoding. Nat Biotechnol. 2011;29:928–33.

    CAS  PubMed  PubMed Central  Google Scholar 

  55. Wu H, Uchimura K, Donnelly EL, Kirita Y, Morris SA, Humphreys BD. Comparative analysis and refinement of human PSC-derived kidney organoid differentiation with single-cell transcriptomics. Cell Stem Cell. 2018;23:869–881.e8.

    CAS  PubMed  PubMed Central  Google Scholar 

  56. Zorita E, Cuscó P, Filion GJ. Starcode: sequence clustering based on all-pairs search. Bioinformatics. 2015;31:1913–9.

    CAS  PubMed  PubMed Central  Google Scholar 

  57. Guo C, Kong W, Kamimoto K, Rivera-Gonzalez GC, Yang X, Kirita Y, et al. CellTag Indexing: genetic barcode-based sample multiplexing for single-cell genomics. Gene Expression Omnibus 2019.

Download references


We would like to thank members of the Morris lab and Kristen Seiler for critical discussions, John Dick for the kind gift of the pSMAL-GFP construct [54] and Thomas Graf for the HAFTL B cell line [19], Genome Technology Access Center for sequencing, and Siteman Flow Cytometry core for assistance with cell sorting.


This work was funded by the National Institutes of Health (NIH) grants R01-GM126112, R21-HG009750, and P30-DK052574; Silicon Valley Community Foundation, Chan Zuckerberg Initiative Grants HCA-A-1704-01646 and HCA2-A-1708-02799; The Children’s Discovery Institute of Washington University and St. Louis Children’s Hospital MI-II-2016-544. S.A.M. is supported by a Vallee Scholar Award, C.G. NIH-5T32GM007200-42, K.K. Japan Society for the Promotion of Science Postdoctoral Fellowship, and G.C.R.G. Washington University in. St. Louis School of Medicine Center of Regenerative Medicine Rita Levi-Montalcini Fellowship.

Availability of data and materials

All source data including sequencing reads, single-cell expression matrices, and CellTag classification metadata are available from the Gene Expression Omnibus (GEO) under accession number GSE130065 [57]. The CellTag demultiplexing algorithm is available at [22] under the MIT license.

Author information

Authors and Affiliations



CG and SAM conceived the research. CG led experimental work, assisted by GCRG, XY, and YK. CG and WK led the computational analysis, assisted by KK and supervised by SAM. CG wrote the manuscript, reviewed by WK, KK, GCRG, XY, YK, and SAM. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Samantha A. Morris.

Ethics declarations

Ethics approval and consent to participate

All animal procedures were based on animal care guidelines approved by the Institutional Animal Care and Use Committee at Washington University in St. Louis under protocol number 20150192.

Competing interests

The authors declare that they have no competing interests.

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Additional files

Additional file 1:

Supplementary Figure S1 and S2 (PDF 18535 kb)

Additional file 2:

Supplementary Figure S3-S6 (PDF 14243 kb)

Rights and permissions

Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Guo, C., Kong, W., Kamimoto, K. et al. CellTag Indexing: genetic barcode-based sample multiplexing for single-cell genomics. Genome Biol 20, 90 (2019).

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: