- Open Letter
- Open Access
Challenges and emerging directions in single-cell analysis
Genome Biology volume 18, Article number: 84 (2017)
Single-cell analysis is a rapidly evolving approach to characterize genome-scale molecular information at the individual cell level. Development of single-cell technologies and computational methods has enabled systematic investigation of cellular heterogeneity in a wide range of tissues and cell populations, yielding fresh insights into the composition, dynamics, and regulatory mechanisms of cell states in development and disease. Despite substantial advances, significant challenges remain in the analysis, integration, and interpretation of single-cell omics data. Here, we discuss the state of the field and recent advances and look to future opportunities.
Cell-to-cell variation is a universal property of multi-cellular organisms, which contain diverse cell types characterized by different functions, morphologies, and gene expression profiles. Even within any single tissue, no matter how apparently homogeneous, there is a diverse population of cells, all of which represent different manifestations of that tissue type. Investigation of tissues or cell populations is inherently limited by the fact that the readout of any pooled assay that uses bulk tissue represents a weighted average of that population’s cellular constituents. Intrinsic cellular heterogeneity is obscured in the typical ensemble studies on which the canon of modern biology and medicine is constructed.
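To make the weighted-average point concrete, the minimal sketch below models a bulk readout as bulk value = Σ_k w_k × x_k, where w_k is the fraction of cells in population k and x_k is that population's expression. The two-population tissue and its numbers are hypothetical.

```python
def bulk_profile(fractions, profiles):
    """Model a bulk readout as the fraction-weighted average of per-population profiles.

    fractions: fraction of cells in each population (must sum to 1).
    profiles:  per-population expression vectors, all in the same gene order.
    """
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("population fractions must sum to 1")
    n_genes = len(profiles[0])
    return [sum(w * p[g] for w, p in zip(fractions, profiles)) for g in range(n_genes)]


# Hypothetical tissue: gene A is expressed only in population 1,
# gene B only in population 2, and the two populations are equally abundant.
pop1 = [100.0, 0.0]
pop2 = [0.0, 100.0]
print(bulk_profile([0.5, 0.5], [pop1, pop2]))  # [50.0, 50.0]
```

The bulk profile suggests cells that co-express both genes at half level, which is a state that no individual cell in this toy tissue occupies.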
Consider, for example, the diverse repertoire of cells present in the three most rapidly self-renewing tissues in mammals: blood, skin, and the intestinal epithelium. Although the trajectory from stem cell to terminally differentiated cell is almost certainly a continuum of highly variable states, our limited understanding forces us to regard known stem and progenitor cell populations as discrete and stable entities. Even in post-mitotic tissues such as the adult brain, the differentiated cell states resulting from complex bifurcating developmental trajectories may appear as a continuum. The diversity of cellular states is caused not only by inherent cell-to-cell variability, but also by interactions among tens or even hundreds of distinct cells. These considerations call into question the precise boundaries of a cell type and point to the need for single-cell analysis to dissect the underlying complexity and to test the empirical reality of stable and distinct cell states.
The past few years have seen the introduction of technologies that provide genome-scale molecular information at the resolution of single cells, giving unprecedented power for the systematic investigation of cellular heterogeneity in DNA [1, 2], RNA, proteins, and metabolites. These technologies have been applied to identify previously unknown cell types and associated markers [6,7,8] and to predict developmental trajectories [9,10,11,12,13].
Beyond expanding the catalog of mammalian cell states and identities, single-cell analyses have challenged prevailing ideas of cell-fate determination [14,15,16,17,18,19] and opened new ways of studying the mechanisms of disease development and progression. For example, single-cell DNA sequencing (scDNA-seq) has revealed remarkable cellular heterogeneity within each tumor, significantly revising models of clonal evolution [20,21,22], whereas single-cell RNA sequencing (scRNA-seq) has shed new light on the role of the tumor microenvironment in disease progression and drug resistance.
The ambitious goal of understanding the full complexity of the cells in a multicellular organism requires not only experimental methods that are considerably better than existing platforms, but also the concurrent development of computational methods that can derive useful insights from complex, dense data on large numbers of diverse single cells. Several recent papers have discussed various challenges critical to advancing the incipient field of single-cell analysis [24,25,26,27]; here we expand on these discussions with a focus on the future.
Current challenges in analyzing single-cell data
While many methods have been used successfully to analyze genomic data from bulk samples, single-cell data pose significant new analytical challenges because of the relatively small number of sequencing reads per cell, the sparsity of the data, and the heterogeneity of cell populations. Recent advances in computational biology have greatly enhanced the quality of data analyses and provided important new biological insights [24,25,26,27].
The goal of data preprocessing is to convert raw measurements into bias-corrected, biologically meaningful signals. Here we focus on scRNA-seq, which has become the primary tool for single-cell analysis. Gene expression profiling by scRNA-seq is inherently noisier than bulk RNA-seq, because extensive amplification of small amounts of starting material, combined with sparse sampling, introduces significant distortions. A typical single-cell gene expression matrix contains an excess of zero entries. Limited RNA capture and conversion efficiency, combined with DNA amplification bias, can substantially distort gene expression profiles. On the one hand, even highly expressed transcripts may occasionally evade detection altogether, resulting in false-negative errors known as dropout events. On the other hand, transcripts expressed at low levels may appear abundant because of amplification bias. Both kinds of error artificially inflate estimates of cell-to-cell variability. While a number of methods have been developed to address this issue [28,29,30], managing dropout events continues to be a challenge. Another source of technical variation is the batch effect, which can be introduced when cells from one biological group are cultured, captured, and sequenced separately from cells in a second condition. If a scRNA-seq experiment is poorly designed, batch effects can substantially affect the results. Furthermore, high-throughput technologies typically multiplex thousands or more barcode sequences. Demultiplexing errors may be caused by barcode impurities or external background, and dealing with them has become increasingly challenging as recent technologies multiplex thousands or more cells. Finally, cell-to-cell variation may also be attributed to cell size, cell cycle state, and other factors that are irrelevant for cell type identification; statistical models have been developed to remove such confounding factors.
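The dropout problem can be illustrated with a deliberately simple capture model. This model is an assumption made for illustration, not one of the methods cited above. If each of n transcripts of a gene is captured independently with probability p, the chance of recording zero counts is (1 − p)^n. With a hypothetical capture rate of 10%, even a gene present at five copies is missed more often than not.

```python
def dropout_probability(n_molecules: int, capture_rate: float) -> float:
    """Probability that a gene with n_molecules transcripts yields zero counts,
    assuming each molecule is captured independently with probability capture_rate."""
    if not 0.0 <= capture_rate <= 1.0:
        raise ValueError("capture_rate must be between 0 and 1")
    return (1.0 - capture_rate) ** n_molecules


# Hypothetical 10% capture efficiency; copy numbers are illustrative.
for n in (1, 5, 20, 50):
    print(n, round(dropout_probability(n, 0.1), 3))
# 1 0.9
# 5 0.59
# 20 0.122
# 50 0.005
```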
Together, these technical artifacts pose important challenges for data calibration and interpretation.
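As a hedged illustration of the demultiplexing problem, the sketch below matches each sequenced cell barcode to a list of known barcodes (a "whitelist") using Hamming distance. It tolerates a hypothetical maximum of one mismatch and discards reads that match nothing or that match two barcodes equally well. The whitelist, the tolerance, and the function names are assumptions; real pipelines use their own error models.

```python
def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("barcodes must have the same length")
    return sum(x != y for x, y in zip(a, b))


def assign_barcode(read_bc, whitelist, max_dist=1):
    """Return the unique closest whitelisted barcode within max_dist mismatches,
    or None if nothing is close enough or the best match is ambiguous."""
    best, best_d, tied = None, max_dist + 1, False
    for bc in whitelist:
        d = hamming(read_bc, bc)
        if d < best_d:
            best, best_d, tied = bc, d, False
        elif d == best_d:
            tied = True
    return None if best is None or tied else best


whitelist = ["AAAA", "CCCC", "GGGG", "TTTT"]  # hypothetical 4-nt cell barcodes
print(assign_barcode("AAAT", whitelist))  # AAAA (one sequencing error corrected)
print(assign_barcode("ACCA", whitelist))  # None (two mismatches from both AAAA and CCCC)
```

As more cells are multiplexed, barcodes must sit closer together in sequence space. Ambiguous and unassignable reads therefore become more common, and this is one concrete reason demultiplexing gets harder at scale.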
The entanglement of technical and biological variation poses a significant challenge for evaluating data reproducibility. One approach to measuring technical variability directly is to dilute bulk RNA to approximately single-cell levels (~10–50 pg of total RNA) [32, 33]. However, this approach has at least two significant limitations. First, purified RNA lacks the cellular factors that may impede RNA isolation and amplification in intact cells. Second, accurately diluting RNA down to single-cell levels is technically challenging. Another approach is to use external spike-ins, such as ERCC. However, this approach also has a number of limitations. First, spike-in RNAs typically have molecular properties different from those of the RNA molecules of interest. Second, spike-ins behave differently under different molecular biology protocols. Furthermore, the dynamic range of spike-in sets like ERCC is often not matched to the dynamic range of a typical single-cell transcriptome (~10³–10⁴). There is therefore a great need for better-controlled methods for separating technical and biological variation. Given these limitations, targeted approaches aimed at precise quantification of key pathways may provide more biological insight in some applications.
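Spike-ins are often used to separate technical from biological variation by fitting a mean-dependent noise curve. The sketch below shows a simplified version of that commonly used idea; it is not a method described in this article. The squared coefficient of variation of the spike-ins is fitted as CV² ≈ a0 + a1/μ, and each endogenous gene is then scored by how far its CV² exceeds the curve. The toy data and function names are hypothetical. Real analyses use many more spike-ins and a proper statistical test rather than a raw ratio, and the limitations listed above bear directly on how far the fitted curve can be trusted.

```python
from statistics import mean, pvariance


def cv2(counts):
    """Squared coefficient of variation (variance / mean^2) across cells."""
    m = mean(counts)
    return pvariance(counts) / (m * m)


def fit_technical_noise(spike_ins):
    """Least-squares fit of CV^2 = a0 + a1 / mean across spike-in transcripts.

    spike_ins: list of per-cell count vectors, one per spike-in species
    (each must have a non-zero mean).
    """
    xs = [1.0 / mean(c) for c in spike_ins]
    ys = [cv2(c) for c in spike_ins]
    x_bar, y_bar = mean(xs), mean(ys)
    a1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
          / sum((x - x_bar) ** 2 for x in xs))
    a0 = y_bar - a1 * x_bar
    return a0, a1


def excess_variability(counts, a0, a1):
    """Ratio of a gene's observed CV^2 to the CV^2 expected from technical noise."""
    return cv2(counts) / (a0 + a1 / mean(counts))


# Toy spike-ins whose variance equals their mean (Poisson-like), so CV^2 = 1/mean.
spikes = [[2, 6], [12, 20], [90, 110]]
a0, a1 = fit_technical_noise(spikes)
print(round(a0, 6), round(a1, 6))                     # 0.0 1.0
print(round(excess_variability([0, 20], a0, a1), 6))  # 10.0
```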
Lack of spatial-temporal context
Single-cell DNA-based and RNA-based assays typically involve the following steps: cell isolation; cell sorting; and library preparation and sequencing. During this process, cells are removed from their local environment and destroyed prior to profiling. These “snapshots” lose important contextual information about both a cell’s spatial environment and its position within a trajectory of dynamic behavior. Both sources of information are crucial for interpreting the precise state of a cell at the time of its isolation (and usually destruction).
In situ transcriptomic analysis
To preserve spatial information, a transcriptome can be profiled in situ in fixed cells and tissues, using either in situ hybridization (ISH) or in situ sequencing. Single-molecule fluorescence in situ hybridization (smFISH) provides a powerful tool for detecting individual transcripts [36, 37]. Using super-resolution microscopy [38, 39], this approach was extended to image over a dozen messenger RNAs (mRNAs) in situ regardless of transcript density. More recently, a temporal barcoding scheme called sequential FISH (seqFISH) was developed, whose coding capacity scales exponentially with the number of hybridizations. In parallel, in situ sequencing methods were developed to sequence transcripts directly in tissue sections [42, 43]; these offer broad coverage but lower efficiency than FISH-based methods. Subsequently, an error-robust barcode system based on Hamming-distance encoding, called merFISH, was developed and can be applied to long transcripts (>3 kb). This technology has recently been extended to detect 130 mRNA species. Fundamentally, because of the high background in tissues, smFISH-based methods are difficult to apply directly to detect mRNAs in tissues.
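The coding ideas in this paragraph can be sketched in a few lines of code with hypothetical parameters. Sequential barcoding with F colors over N hybridization rounds yields F^N distinct codes. A codebook whose words differ in at least d positions lets nearest-codeword decoding correct up to ⌊(d − 1)/2⌋ misread bits and flag larger errors. The 8-bit codebook below is a toy example, not the codebook of any published method, and the function names are illustrative.

```python
def seqfish_capacity(n_colors: int, n_rounds: int) -> int:
    """Distinct color sequences available from n_colors over n_rounds of hybridization."""
    return n_colors ** n_rounds


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length codewords differ."""
    return sum(x != y for x, y in zip(a, b))


def min_distance(codebook):
    """Smallest pairwise Hamming distance between codewords."""
    return min(hamming(c1, c2)
               for i, c1 in enumerate(codebook) for c2 in codebook[i + 1:])


def decode(readout: str, codebook):
    """Nearest-codeword decoding: correct up to (d_min - 1) // 2 bit errors,
    and return None for readouts that are too far off or equidistant."""
    t = (min_distance(codebook) - 1) // 2
    ranked = sorted((hamming(readout, c), c) for c in codebook)
    best_d, best = ranked[0]
    if best_d > t or (len(ranked) > 1 and ranked[1][0] == best_d):
        return None
    return best


# Toy 8-bit codebook with minimum Hamming distance 4 (hypothetical, not a published one).
codebook = ["11110000", "11001100", "10101010", "00001111"]
print(seqfish_capacity(4, 4))        # 256
print(decode("11110001", codebook))  # 11110000 (one flipped bit corrected)
print(decode("11000000", codebook))  # None (two flipped bits detected, not corrected)
```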
An amplified version of seqFISH, based on the hybridization chain reaction (HCR), allows robust detection of mRNAs in tissues and thick cleared brain samples. Combining amplification with a simple error-correction scheme that tolerates one dropped readout, this technology was applied to profile up to 249 genes, with each mRNA detected at ~80% efficiency, in over 15,000 cells in the mouse brain, resolving the structural organization of the hippocampus at single-cell resolution. The authors identified distinct layers in the dentate gyrus corresponding to the granule cell layer and the subgranular zone. They also found that the dorsal CA1 is relatively homogeneous at the single-cell level, whereas the ventral CA1 is highly heterogeneous. For imaging large samples such as the brain, imaging speed, rather than the switching time between hybridizations, is rate limiting. This is because one can toggle between two samples on the microscope: one is imaged while the other is being hybridized. Faster imaging modalities such as lattice light-sheet microscopy, together with faster cameras, can increase the number of cells imaged.
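The toggling argument can be checked with simple arithmetic under idealized assumptions: exactly two samples, no stage or fluidics overhead, and hypothetical timings. With one sample, every round costs imaging plus hybridization time. With two samples alternating, each sample completes a round every t_image + max(t_image, t_hyb), and two sample-rounds finish in that period.

```python
def round_time_single(t_image: float, t_hyb: float) -> float:
    """One sample: the microscope sits idle while the sample is re-hybridized."""
    return t_image + t_hyb


def round_time_interleaved(t_image: float, t_hyb: float) -> float:
    """Two samples alternate on one microscope (one imaged while the other hybridizes).

    In steady state each sample completes one round every t_image + max(t_image, t_hyb),
    and two sample-rounds finish in that period, so the average cost per round is half.
    """
    return (t_image + max(t_image, t_hyb)) / 2


# Hypothetical timings in minutes per round.
print(round_time_single(60, 30), round_time_interleaved(60, 30))  # 90 60.0
print(round_time_interleaved(30, 30))                              # 30.0
```

Whenever hybridization is no slower than imaging, the average cost per round equals the imaging time. Under these assumptions, only faster imaging shortens the experiment.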
Future work in spatial genomics will take several directions. First, to combine spatial transcriptome data with scRNA-seq data, cell states can be defined by scRNA-seq and then mapped onto the spatial images by matching them to the expression profiles measured in situ. Second, to increase the optical space available in each cell and allow more mRNAs to be resolved spatially, expansion microscopy can physically enlarge the tissue sample. An alternative image-correlation approach can also allow dense transcripts to be decoded. Lastly, analysis of in situ transcriptomic data requires new computational methods, for example, to detect spatial patterns automatically from combinations of multiple genes.
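A minimal sketch of the first direction rests on two assumptions. First, scRNA-seq clusters have already been summarized as average expression profiles (centroids). Second, the spatial assay measures a subset of the same genes. Each imaged cell is then assigned to the state whose centroid correlates best with it over the shared gene panel. Correlation-based nearest-centroid assignment is an illustrative choice, not the specific method of the cited work, and all gene, state, and cell names are placeholders.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length vectors (0.0 if either is constant)."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # correlation undefined for constant vectors; treat as no similarity
    return cov / (sx * sy)


def map_cells_to_states(spatial_cells, state_centroids, panel_genes):
    """Assign each spatially profiled cell to the scRNA-seq-defined state whose
    centroid, restricted to the imaged gene panel, correlates best with it."""
    assignments = {}
    for cell_id, expr in spatial_cells.items():
        x = [expr[g] for g in panel_genes]
        scores = {state: pearson(x, [centroid[g] for g in panel_genes])
                  for state, centroid in state_centroids.items()}
        assignments[cell_id] = max(scores, key=scores.get)
    return assignments


# Hypothetical centroids from scRNA-seq and two imaged cells.
centroids = {"A": {"g1": 10, "g2": 1, "g3": 0}, "B": {"g1": 0, "g2": 9, "g3": 2}}
imaged = {"cell1": {"g1": 7, "g2": 0, "g3": 1}, "cell2": {"g1": 1, "g2": 5, "g3": 1}}
print(map_cells_to_states(imaged, centroids, ["g1", "g2", "g3"]))  # {'cell1': 'A', 'cell2': 'B'}
```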
Live imaging transcriptomic analysis
Cellular and molecular behaviors are highly dynamic and constantly changing. These dynamics greatly complicate the interpretation of snapshot single-cell analyses, because an individual cell differs in its molecular state not only from other cells, but even from itself at a different time point. Importantly, these dynamics may not represent noise, but rather a basis for important regulatory mechanisms controlling cell identity, so it is important to quantify dynamic changes and to understand their relevance. Unfortunately, such quantification is much more difficult than static snapshot analysis. Cells must be kept alive and unperturbed during continuous, sometimes very long, non-invasive analysis of their behaviors. The acquisition, handling, and analysis of time-resolved single-cell data then require specialized technical and theoretical approaches. Not only are the robustness requirements for data acquisition technologies such as live imaging much higher than for snapshot analyses, but the resulting large and complex data volumes also require specialized solutions. These differ from the tools available for snapshot data and often have to be custom-built. This holds true both for the underlying theoretical algorithms and for their user-friendly implementation.
The objective of lineage tracing is to label the progeny of individual cells with molecular markers and to use this information to reconstruct developmental trajectories. Recently, high-throughput lineage-tracing methods have been developed based on CRISPR/Cas9-mediated synthesis of multiplexed DNA barcodes [54,55,56,57,58]. These barcodes are stably integrated into the genome and inherited during cell division and differentiation. Additional mutations accumulate over time, through either combinatorial editing at multiple guide RNA (gRNA) target loci [54, 55, 57] or sequential editing at a single locus [56, 58]. In the latter approach, the investigators introduced mutations into the Streptococcus pyogenes gRNA-encoding sequence to circumvent the requirement for a PAM motif in gRNA recognition, enabling the resulting gRNA to target its own locus repeatedly. In addition, the DNA barcodes can be sequenced in situ, thereby preserving spatial information. Some of the aforementioned technologies have been applied to study developmental loci [54, 55, 57] and the immune response. In one study, the investigators traced cell lineages in zebrafish and found that the majority of cells in each organ are derived from a small number of progenitor cells, and that different progenitors are biased toward different germ layers and organs. Similar results were reported in an independent study. These lineage-tracing technologies will likely have wide-ranging applications in mapping developmental and disease-progression trajectories.
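The logic of reading lineage from accumulated edits can be sketched under idealized assumptions: every edit is irreversible, arises only once, and is always detected. Under those assumptions, cells that share an edit descend from the cell in which it arose. The sets of cells carrying each edit are therefore either nested or disjoint and can be arranged into a tree, with each edit's parent being the smallest clade that strictly contains it. This toy perfect-phylogeny construction is not the reconstruction algorithm used in the cited studies, since real data require methods that tolerate repeated edits and missing barcodes. The cell and edit names are hypothetical.

```python
def clades_from_edits(cells):
    """Map each edit to the set of cells that carry it."""
    clades = {}
    for cell, edits in cells.items():
        for e in edits:
            clades.setdefault(e, set()).add(cell)
    return clades


def edit_tree(cells):
    """Return {edit: parent_edit or None}, assuming edits are irreversible and
    each arises once, so that clades are nested or disjoint (a perfect phylogeny)."""
    clades = clades_from_edits(cells)
    items = list(clades.items())
    for i, (e1, s1) in enumerate(items):
        for e2, s2 in items[i + 1:]:
            if s1 & s2 and not (s1 <= s2 or s2 <= s1):
                raise ValueError(f"edits {e1} and {e2} are incompatible (homoplasy or dropout)")
    parent = {}
    for e, s in clades.items():
        # Smallest clade strictly containing this one; ties broken by edit name.
        supersets = [(len(t), f) for f, t in clades.items() if f != e and s < t]
        parent[e] = min(supersets)[1] if supersets else None
    return parent


# Hypothetical barcode edits observed in four cells.
cells = {"c1": {"e1", "e2"}, "c2": {"e1", "e2"}, "c3": {"e1", "e3"}, "c4": set()}
print(sorted(edit_tree(cells).items()))  # [('e1', None), ('e2', 'e1'), ('e3', 'e1')]
```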
While significant effort has been dedicated to improving the quality and throughput of various omic assays, work is also ongoing to develop methods that profile multiple types of information in the same cells. Multi-omics profiling is valuable for accurately mapping cell states and can provide insights into regulatory mechanisms. For example, genomic DNA and mRNA transcripts from the same cells can be quantified by either physical separation or pre-amplification, followed by high-throughput sequencing. In the former approach, the extracted genomic DNA can be further processed by bisulfite conversion, enabling simultaneous quantification of the methylome and transcriptome [61, 62]. Bioinformatic analysis of the bisulfite sequencing data can also detect genetic information [63, 64]. Proteins and transcriptomes have also been measured in the same cells. Multi-omic methods applied to single cells have revealed some surprises. For example, profiling DNA and RNA variability in single acute lymphoblastic leukemia cells suggests that genetic heterogeneity is not responsible for the diverse responses to drug treatment (Enver, unpublished).
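For readers less familiar with bisulfite data, the core quantification step can be sketched as follows. Bisulfite converts unmethylated cytosines to uracil, which is read as T, while methylated cytosines remain C. The methylation level at a reference cytosine is therefore the fraction of C among the C and T bases observed there. A T can also arise from a genuine C-to-T variant, which is one reason genetic and epigenetic signals must be disentangled computationally. The function name and the same-strand assumption are illustrative.

```python
def methylation_level(read_bases: str):
    """Fraction methylated at one reference cytosine from aligned bisulfite reads.

    Bisulfite converts unmethylated C to U (read as T); methylated C stays C.
    Assumes reads come from the same strand as the cytosine. Bases other than
    C/T are ignored here, although they can hint at a genetic variant.
    """
    c = read_bases.count("C")
    t = read_bases.count("T")
    if c + t == 0:
        return None  # no informative reads, which is common in sparse single-cell data
    return c / (c + t)


print(methylation_level("CCCT"))  # 0.75
print(methylation_level("AG"))    # None
```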
Recent technologies have moved even beyond single cells to investigate the subcellular localization of biologically active molecules. For example, nanoliter-scale cell fractionation or micromanipulation has been applied to measure subcellular information within single cells. On a different front, super-resolution imaging has been applied to map the nuclear compartmentalization of chromatin domains. These subcellular data provide new insights into the precise mechanisms of various cellular processes. Ultimately, we may be able to understand phenotypic differences between genetically identical cells in terms of such variations in subcellular organization.
Modeling and predictions
Different cell types usually arise from a linear hierarchy of differentiation stages, and one goal of single-cell analysis is to identify previously unknown cell types and lineage relationships. Numerous methods have been developed to group similar cells into cell types from single-cell gene expression data [7, 50, 68, 69]. Additional methods have been developed specifically to detect rare cell subpopulations [70, 71]. To compensate for the dropout effect, methods have also been developed to impute gene expression on the basis of similar cells.
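The idea of imputing from similar cells can be illustrated with a deliberately simple k-nearest-neighbor smoothing. This technique is swapped in for illustration and is not one of the cited methods. Each cell's profile is averaged with its k most similar cells, so an isolated zero in one cell is pulled toward the values observed in its neighbors. The data, k, and Euclidean distance are hypothetical choices; practical methods usually work on normalized, dimensionality-reduced data.

```python
from math import dist


def knn_smooth(cells, k=2):
    """Average each cell's profile with its k nearest neighbors (Euclidean distance).

    A deliberately simple imputation: zeros in a cell are pulled toward the values
    observed in similar cells. cells is a list of equal-length expression vectors.
    """
    smoothed = []
    for i, x in enumerate(cells):
        ranked = sorted((dist(x, y), j) for j, y in enumerate(cells) if j != i)
        group = [x] + [cells[j] for _, j in ranked[:k]]
        smoothed.append([sum(values) / len(group) for values in zip(*group)])
    return smoothed


# Two hypothetical cell groups; the first cell has a likely dropout in gene 3.
cells = [[5, 5, 0], [5, 5, 5], [5, 4, 5],
         [0, 0, 9], [0, 1, 9], [1, 0, 8]]
print([round(v, 2) for v in knn_smooth(cells)[0]])  # [5.0, 4.67, 3.33]
```

The same averaging also blurs genuine differences between neighboring cells, which is why imputation choices deserve the same scrutiny as the dropout problem they address.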
Single-cell analysis has helped refine traditional views of cell differentiation. For example, a number of studies [14,15,16, 73] report evidence suggesting that megakaryocytes emerge at a “high” level, approximating that of the hematopoietic stem cell (HSC); this insight challenges the prevailing model that the megakaryocytic lineage emerges late in the differentiation cascade. Cell states defined by transcriptomic patterns are surprisingly continuous rather than forming distinct, transcriptionally defined groups [15, 74]. This apparent continuity of cell states poses practical challenges for cell annotation and, conceptually, implies a need for significant revisions to current models of the cell-lineage hierarchy.
Data from single-cell studies have enabled the development of mathematical models that represent the observed distribution of cell states as samples from a dynamical system [11, 17, 19, 73]. In this view, cell types are modeled as “attractors”: stable states determined by the underlying gene regulatory networks, whose structure is sometimes described as an energy landscape. In some models, stochastic fluctuations due to intrinsic or extrinsic noise may facilitate dispersion within and transitions between attractors. Although complex, these mathematical models can be used not only to explain the continuity of cell states but also, in some cases, to predict the initiating events of cell differentiation, thereby providing mechanistic insights. Similarly, the hierarchy of cell states can be measured by entropy, which has been used to infer the direction of cell differentiation [76, 77]. These methods have opened up new ways to think about cell states, not as discrete entities, but as a continuum. To connect these two viewpoints, it is critical to determine with high precision the level of natural variation within a single cell type and to distinguish it from the changes linked to functional state transitions. A major obstacle to achieving this goal is that the resolution of cell-state identification is limited by the quality of the underlying scRNA-seq data, which varies greatly depending on sequencing depth and other factors. Such differences have contributed to the debate over the organizing structure of the hematopoietic lineage hierarchy [15, 16].
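The attractor picture can be illustrated with a textbook bistable system. This system is assumed for illustration and is not one of the cited models. A single state variable x evolves as dx = (x − x³) dt + σ dW. This system has two attractors at x = −1 and x = +1, separated by an unstable point at x = 0. Without noise, a cell settles in the basin it starts in; with noise (σ > 0), fluctuations can occasionally push it across. The parameter values and function name are hypothetical.

```python
import random


def simulate_state(x0, sigma=0.0, dt=0.01, steps=5000, seed=0):
    """Euler-Maruyama integration of dx = (x - x**3) dt + sigma dW.

    The drift has stable fixed points (attractors) at x = -1 and x = +1 and an
    unstable one at x = 0; sigma sets the strength of stochastic fluctuations.
    """
    rng = random.Random(seed)
    x = x0
    for _ in range(steps):
        x += (x - x ** 3) * dt + sigma * dt ** 0.5 * rng.gauss(0.0, 1.0)
    return x


print(round(simulate_state(0.2), 6), round(simulate_state(-0.3), 6))  # 1.0 -1.0
# With noise, trajectories starting in the same basin can end in either one;
# the outcome depends on sigma and on the random seed.
noisy_end_states = [simulate_state(0.2, sigma=0.8, seed=s) for s in range(20)]
```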
As single-cell data continue to grow in quality and quantity, new cell states, lineages, and associated markers are being identified at an accelerating rate. It is important to recognize that such findings are typically based on correlative analyses and that their functional relevance needs to be carefully evaluated through further experimental validation.
A first level of validation is to use the identified markers to label the predicted cell type and visualize it in its original tissue. For example, unsupervised clustering of 25,000 single-cell transcriptomes identified 15 types of bipolar neurons. The authors identified cell type-specific markers and fluorescently labeled the predicted cell types by DNA FISH. They found that these predicted cell types are spatially restricted to specific layers and that different cell types display distinct morphologies, supporting their functional identity.
A deeper level of validation requires designing functional assays to demonstrate that a predicted cell type has unique properties. For example, single-cell analysis showed that common myeloid progenitors (CMP) occur in two varieties distinguished by differential expression of CD55. Using an in vitro colony-forming assay, the investigators found that CD55+ CMP produce predominantly erythroid and megakaryocytic (MegE) colonies, whereas CD55– CMP form few MegE colonies, indicating that these two subpopulations are functionally different. A similar strategy has been applied to compare the functional differences between HSC subpopulations, termed MolO and NoMO. These investigators found that MolO cells were enriched for higher-than-average CD150 and Sca-1 surface marker expression and lower-than-average CD48 expression.
In the same vein, engineered animal models can allow isolation of cell populations and functional testing. For example, comparative scRNA-seq analysis of HSCs from young and old mice identified a gene signature associated with the MegE lineage. Using a transgenic mouse strain carrying a VWF-EGFP reporter, the authors verified an increased bias toward platelet-primed HSCs in old mice.
Combining scRNA-seq- and CRISPR/Cas9-based perturbations
CRISPR/Cas9-based genetic screens have been widely used to systematically characterize gene functions [80, 81]. Recently, this technique has been combined with scRNA-seq analysis [82,83,84,85], greatly increasing the throughput of functional readouts. In these studies, gene activities are disrupted by either genetic mutation [82, 83, 85] or epigenetic inhibition. gRNA-specific reporter transcripts are expressed and can be detected along with the endogenous mRNAs by scRNA-seq. By varying the concentration of the gRNA-containing vectors, the technique can be used to study gene function either in isolation or in combination. In one study, the investigators applied this technology to analyze the effects of 24 transcription factors (TFs) in mediating the immune response of dendritic cells. They found that the TFs form distinct modules, each targeting a common set of gene programs. Further analysis detected significant genetic interactions among a subset of TFs. In another study, engineered hematopoietic progenitor cells were injected into wild-type recipient mice to evaluate the effect of the perturbations on hematopoiesis. This allowed the investigators to identify a previously unknown role of Cebpb in regulating the balance between dendritic cells and monocytes during development. The combination of genome editing and scRNA-seq profiling provides a powerful tool for high-throughput dissection of gene functions and will have a wide range of applications in biomedical research.
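The effect of vector concentration can be reasoned about with a standard simplification that the text does not itself state. Assume vectors are delivered according to a Poisson distribution with mean m, the multiplicity of infection (MOI). The fraction of cells receiving k gRNAs is then e^(−m) m^k / k!. The sketch below uses hypothetical MOI values to show why low MOI yields mostly single perturbations among perturbed cells, while high MOI yields mostly combinations.

```python
from math import exp


def delivery_fractions(moi: float):
    """Fractions of cells receiving 0, exactly 1, and 2 or more vectors,
    assuming vector delivery follows a Poisson distribution with mean moi."""
    p0 = exp(-moi)
    p1 = moi * exp(-moi)
    return p0, p1, 1.0 - p0 - p1


for moi in (0.3, 3.0):
    print(moi, [round(p, 3) for p in delivery_fractions(moi)])
# 0.3 [0.741, 0.222, 0.037]
# 3.0 [0.05, 0.149, 0.801]
```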
Genomic profiling has been widely used to identify markers, mechanisms, and therapeutic targets of diseases. Most studies to date identify disease-related alterations by comparing genomic profiles obtained from bulk disease samples with those of their normal counterparts. However, these average profiles give a distorted view of the disease sample if it contains significant cellular heterogeneity, as in cancer. Single-cell technologies have provided a set of powerful tools for dissecting cellular heterogeneity that have led to important discoveries in cancer [23, 86–89] and other diseases [90, 91]. For example, multiplexed quantitative polymerase chain reaction (qPCR) analysis has identified subtypes of leukemia cells with distinct capacities for proliferation [87, 92]. Application of scRNA-seq to cancer has also led to the identification of rare subpopulations associated with drug resistance or self-renewal, whereas scDNA-seq can be used to reconstruct paths of clonal evolution. Single-cell profiling will provide new opportunities to understand the mechanisms of initiation and progression of human diseases and to develop novel treatments that target specific cell types.
We recognize that overcoming each of these challenges requires significant laboratory and computational infrastructure. To move forward, the field needs groups of people with diverse expertise to work together. Interdisciplinary approaches are recognized as important, if not a crucial prerequisite, for addressing many open questions, but they also come with numerous challenges. First, because no single collaborator has expertise in every part of an interdisciplinary project, increased effort and communication are required. The importance of a common language is well known, but it remains a significant problem in almost every new project. Ideally, this hurdle will be overcome by a new generation of students and postdocs who are educated in multiple disciplines, such as biology/medicine, engineering, and the theoretical sciences. Interdisciplinary science, while leading to higher long-term impact, tends to be slower, tends to be published in journals of lesser impact, and is hard to organize and fund. It therefore needs more patience, particularly in environments whose funding cycles require fast, short-term output. Finally, not only language, but also career paths, scientific and publication cultures, hiring procedures, age, scientific talent, and academic motivation vary widely across disciplines. While many of these differences pose managerial challenges and should not affect scientific merit, in reality they are often the reason interdisciplinary endeavors fail. Overcoming these problems will require changes in teaching, funding, publication, and hiring procedures, which would benefit most areas of science but will only have a measurable effect after a few years.
Single-cell analysis is an exciting and rapidly expanding field that holds tremendous potential to improve our understanding of fundamental biological problems and of the nature and complexity of human disease, and thereby to help develop more effective therapies. To achieve these ambitious goals, proper controls are needed to ensure that the heterogeneity detected in cell populations and tissue samples is genuine. In addition, we need to invest in the development of new methods. Single-cell data present a number of intrinsic challenges, including systematic noise, the features of the biological systems under study, and the sparsity and complexity of the data. The past few years have witnessed remarkable growth in the field, a trend we believe will continue, enabling more rigorous development of methods and a deeper understanding of biological complexity.
Eberwine J, Sul JY, Bartfai T, Kim J. The promise of single-cell sequencing. Nat Methods. 2014;11:25–7.
Blainey PC, Quake SR. Dissecting genomic diversity, one cell at a time. Nat Methods. 2014;11:19–21.
Sandberg R. Entering the era of single-cell transcriptomics in biology and medicine. Nat Methods. 2014;11:22–4.
Spitzer MH, Nolan GP. Mass cytometry: single cells, many features. Cell. 2016;165:780–91.
Zenobi R. Single-cell metabolomics: analytical and biological perspectives. Science. 2013;342:1243259.
Treutlein B, Brownfield DG, Wu AR, Neff NF, Mantalas GL, Espinoza FH, et al. Reconstructing lineage hierarchies of the distal lung epithelium using single-cell RNA-seq. Nature. 2014;509:371–5.
Zeisel A, Munoz-Manchado AB, Codeluppi S, Lonnerberg P, La Manno G, Jureus A, et al. Brain structure. Cell types in the mouse cortex and hippocampus revealed by single-cell RNA-seq. Science. 2015;347:1138–42.
Shekhar K, Lapan SW, Whitney IE, Tran NM, Macosko EZ, Kowalczyk M, et al. Comprehensive classification of retinal bipolar neurons by single-cell transcriptomics. Cell. 2016;166:1308–23. e1330.
Trapnell C, Cacchiarelli D, Grimsby J, Pokharel P, Li S, Morse M, et al. The dynamics and regulators of cell fate decisions are revealed by pseudotemporal ordering of single cells. Nat Biotechnol. 2014;32:381–6.
Bendall SC, Davis KL, Amir e-AD, Tadmor MD, Simonds EF, Chen TJ, et al. Single-cell trajectory detection uncovers progression and regulatory coordination in human B cell development. Cell. 2014;157:714–25.
Marco E, Karp RL, Guo G, Robson P, Hart AH, Trippa L, et al. Bifurcation analysis of single-cell gene expression data reveals epigenetic landscape. Proc Natl Acad Sci U S A. 2014;111:E5643–50.
Setty M, Tadmor MD, Reich-Zeliger S, Angel O, Salame TM, Kathail P, et al. Wishbone identifies bifurcating developmental trajectories from single-cell data. Nat Biotechnol. 2016;34:637–45.
Haghverdi L, Buttner M, Wolf FA, Buettner F, Theis FJ. Diffusion pseudotime robustly reconstructs lineage branching. Nat Methods. 2016;13:845–8.
Guo G, Luc S, Marco E, Lin TW, Peng C, Kerenyi MA, et al. Mapping cellular hierarchy by single-cell analysis of the cell surface repertoire. Cell Stem Cell. 2013;13:492–505.
Paul F, Arkin Y, Giladi A, Jaitin DA, Kenigsberg E, Keren-Shaul H, et al. Transcriptional heterogeneity and lineage commitment in myeloid progenitors. Cell. 2015;163:1663–77.
Olsson A, Venkatasubramanian M, Chaudhri VK, Aronow BJ, Salomonis N, Singh H, et al. Single-cell analysis of mixed-lineage states leading to a binary cell fate choice. Nature. 2016;537:698–702.
Hoppe PS, Schwarzfischer M, Loeffler D, Kokkaliaris KD, Hilsenbeck O, Moritz N, et al. Early myeloid lineage choice is not initiated by random PU.1 to GATA1 protein ratios. Nature. 2016;535:299–302.
Kim TK, Sul JY, Peternko NB, Lee JH, Lee M, Patel VV, et al. Transcriptome transfer provides a model for understanding the phenotype of cardiomyocytes. Proc Natl Acad Sci U S A. 2011;108:11918–23.
Kim J, Eberwine J. RNA: state memory and mediator of cellular phenotype. Trends Cell Biol. 2010;20:311–8.
Ni X, Zhuo M, Su Z, Duan J, Gao Y, Wang Z, et al. Reproducible copy number variation patterns among single circulating tumor cells of lung cancer patients. Proc Natl Acad Sci U S A. 2013;110:21083–8.
Wang Y, Waters J, Leung ML, Unruh A, Roh W, Shi X, et al. Clonal evolution in breast cancer revealed by single nucleus genome sequencing. Nature. 2014;512:155–60.
Gawad C, Koh W, Quake SR. Dissecting the clonal origins of childhood acute lymphoblastic leukemia by single-cell genomics. Proc Natl Acad Sci U S A. 2014;111:17947–52.
Tirosh I, Izar B, Prakadan SM, Wadsworth 2nd MH, Treacy D, Trombetta JJ, et al. Dissecting the multicellular ecosystem of metastatic melanoma by single-cell RNA-seq. Science. 2016;352:189–96.
Bacher R, Kendziorski C. Design and computational analysis of single-cell RNA-sequencing experiments. Genome Biol. 2016;17:63.
Skylaki S, Hilsenbeck O, Schroeder T. Challenges in long-term imaging and quantification of single-cell dynamics. Nat Biotechnol. 2016;34:1137–44.
Wagner A, Regev A, Yosef N. Revealing the vectors of cellular identity with single-cell genomics. Nat Biotechnol. 2016;34:1145–60.
Buettner F, Natarajan KN, Casale FP, Proserpio V, Scialdone A, Theis FJ, et al. Computational analysis of cell-to-cell heterogeneity in single-cell RNA-sequencing data reveals hidden subpopulations of cells. Nat Biotechnol. 2015;33:155–60.
Kharchenko PV, Silberstein L, Scadden DT. Bayesian approach to single-cell differential expression analysis. Nat Methods. 2014;11:740–2.
Lun AT, Bach K, Marioni JC. Pooling across cells to normalize single-cell RNA sequencing data with many zero counts. Genome Biol. 2016;17:75.
Finak G, McDavid A, Yajima M, Deng J, Gersuk V, Shalek AK, et al. MAST: a flexible statistical framework for assessing transcriptional changes and characterizing heterogeneity in single-cell RNA sequencing data. Genome Biol. 2015;16:278.
Hicks SC, Teng M, Irizarry RA. On the widespread and critical impact of systematic bias and batch effects in single-cell RNA-Seq data. bioRxiv. 2015. doi:10.1101/025528.
Marinov GK, Williams BA, McCue K, Schroth GP, Gertz J, Myers RM, et al. From single-cell to cell-pool transcriptomes: stochasticity in gene expression and RNA splicing. Genome Res. 2014;24:496–510.
Dueck H, Khaladkar M, Kim TK, Spaethling JM, Francis C, Suresh S, et al. Deep sequencing reveals cell-type-specific patterns of single-cell transcriptome variation. Genome Biol. 2015;16:122.
Baker SC, Bauer SR, Beyer RP, Brenton JD, Bromley B, Burrill J, et al. The external RNA controls consortium: a progress report. Nat Methods. 2005;2:731–4.
Svensson V, Natarajan KN, Ly LH, Miragaia RJ, Labalette C, Macaulay IC, et al. Power analysis of single-cell RNA-sequencing experiments. Nat Methods. 2017;14:381–7.
Femino AM, Fay FS, Fogarty K, Singer RH. Visualization of single RNA transcripts in situ. Science. 1998;280:585–90.
Raj A, Peskin CS, Tranchina D, Vargas DY, Tyagi S. Stochastic mRNA synthesis in mammalian cells. PLoS Biol. 2006;4:e309.
Betzig E, Patterson GH, Sougrat R, Lindwasser OW, Olenych S, Bonifacino JS, et al. Imaging intracellular fluorescent proteins at nanometer resolution. Science. 2006;313:1642–5.
Rust MJ, Bates M, Zhuang X. Sub-diffraction-limit imaging by stochastic optical reconstruction microscopy (STORM). Nat Methods. 2006;3:793–5.
Lubeck E, Cai L. Single-cell systems biology by super-resolution imaging and combinatorial labeling. Nat Methods. 2012;9:743–8.
Lubeck E, Coskun AF, Zhiyentayev T, Ahmad M, Cai L. Single-cell in situ RNA profiling by sequential hybridization. Nat Methods. 2014;11:360–1.
Ke R, Mignardi M, Pacureanu A, Svedlund J, Botling J, Wahlby C, et al. In situ sequencing for RNA analysis in preserved tissue and cells. Nat Methods. 2013;10:857–60.
Lee JH, Daugharthy ER, Scheiman J, Kalhor R, Yang JL, Ferrante TC, et al. Highly multiplexed subcellular RNA sequencing in situ. Science. 2014;343:1360–3.
Chen KH, Boettiger AN, Moffitt JR, Wang S, Zhuang X. RNA imaging. Spatially resolved, highly multiplexed RNA profiling in single cells. Science. 2015;348:aaa6090.
Moffitt JR, Hao J, Wang G, Chen KH, Babcock HP, Zhuang X. High-throughput single-cell gene-expression profiling with multiplexed error-robust fluorescence in situ hybridization. Proc Natl Acad Sci U S A. 2016;113:11046–51.
Shah S, Lubeck E, Schwarzkopf M, He TF, Greenbaum A, Sohn CH, et al. Single-molecule RNA detection at depth by hybridization chain reaction and tissue hydrogel embedding and clearing. Development. 2016;143:2862–7.
Choi HM, Chang JY, le Trinh A, Padilla JE, Fraser SE, Pierce NA. Programmable in situ amplification for multiplexed imaging of mRNA expression. Nat Biotechnol. 2010;28:1208–12.
Shah S, Lubeck E, Zhou W, Cai L. In situ transcription profiling of single cells reveals spatial organization of cells in the mouse hippocampus. Neuron. 2016;92:342–57.
Legant WR, Shao L, Grimm JB, Brown TA, Milkie DE, Avants BB, et al. High-density three-dimensional localization microscopy across large volumes. Nat Methods. 2016;13:359–65.
Satija R, Farrell JA, Gennert D, Schier AF, Regev A. Spatial reconstruction of single-cell gene expression data. Nat Biotechnol. 2015;33:495–502.
Chen F, Tillberg PW, Boyden ES. Optical imaging. Expansion microscopy. Science. 2015;347:543–8.
Coskun AF, Cai L. Dense transcript profiling in single cells by image correlation decoding. Nat Methods. 2016;13:657–60.
Etzrodt M, Endele M, Schroeder T. Quantitative single-cell approaches to stem cell research. Cell Stem Cell. 2014;15:546–58.
McKenna A, Findlay GM, Gagnon JA, Horwitz MS, Schier AF, Shendure J. Whole-organism lineage tracing by combinatorial and cumulative genome editing. Science. 2016;353:aaf7907.
Junker JP, Spanjaard B, Peterson-Maduro J, Alemany A, Hu B, Florescu M, et al. Massively parallel whole-organism lineage tracing using CRISPR/Cas9 induced genetic scars. bioRxiv. 2016. doi:10.1101/056499.
Perli SD, Cui CH, Lu TK. Continuous genetic recording with self-targeting CRISPR-Cas in human cells. Science. 2016;353:aag0511.
Schmidt ST, Zimmerman SM, Wang J, Kim SK, Quake SR. Quantitative analysis of synthetic cell lineage tracing using nuclease barcoding. ACS Synth Biol. 2017. doi:10.1021/acssynbio.6b00309.
Kalhor R, Mali P, Church GM. Rapidly evolving homing CRISPR barcodes. Nat Methods. 2017;14:195–200.
Macaulay IC, Haerty W, Kumar P, Li YI, Hu TX, Teng MJ, et al. G&T-seq: parallel sequencing of single-cell genomes and transcriptomes. Nat Methods. 2015;12:519–22.
Dey SS, Kester L, Spanjaard B, Bienko M, van Oudenaarden A. Integrated genome and transcriptome sequencing of the same cell. Nat Biotechnol. 2015;33:285–9.
Angermueller C, Clark SJ, Lee HJ, Macaulay IC, Teng MJ, Hu TX, et al. Parallel single-cell sequencing links transcriptional and epigenetic heterogeneity. Nat Methods. 2016;13:229–32.
Hu Y, Huang K, An Q, Du G, Hu G, Xue J, et al. Simultaneous profiling of transcriptome and DNA methylome from a single cell. Genome Biol. 2016;17:88.
Hou Y, Guo H, Cao C, Li X, Hu B, Zhu P, et al. Single-cell triple omics sequencing reveals genetic, epigenetic, and transcriptomic heterogeneity in hepatocellular carcinomas. Cell Res. 2016;26:304–19.
Cheow LF, Courtois ET, Tan Y, Viswanathan R, Xing Q, Tan RZ, et al. Single-cell multimodal profiling reveals cellular epigenetic heterogeneity. Nat Methods. 2016;13:833–6.
Genshaft AS, Li S, Gallant CJ, Darmanis S, Prakadan SM, Ziegler CG, et al. Multiplexed, targeted profiling of single-cell proteomes and transcriptomes in a single reaction. Genome Biol. 2016;17:188.
Cabili MN, Dunagin MC, McClanahan PD, Biaesch A, Padovan-Merhar O, Regev A, et al. Localization and abundance analysis of human lncRNAs at single-cell and single-molecule resolution. Genome Biol. 2015;16:20.
Wang S, Su JH, Beliveau BJ, Bintu B, Moffitt JR, Wu CT, et al. Spatial organization of chromatin domains and compartments in single chromosomes. Science. 2016;353:598–602.
Levine JH, Simonds EF, Bendall SC, Davis KL, el Amir AD, Tadmor MD, et al. Data-driven phenotypic dissection of AML reveals progenitor-like cells that correlate with prognosis. Cell. 2015;162:184–97.
Xu C, Su Z. Identification of cell types from single-cell transcriptomes using a novel clustering method. Bioinformatics. 2015;31:1974–80.
Grun D, Lyubimova A, Kester L, Wiebrands K, Basak O, Sasaki N, et al. Single-cell messenger RNA sequencing reveals rare intestinal cell types. Nature. 2015;525:251–5.
Jiang L, Chen H, Pinello L, Yuan GC. GiniClust: detecting rare cell types from single-cell gene expression data with Gini index. Genome Biol. 2016;17:144.
Prabhakaran S, Azizi E, Carr A, Pe’er D. Dirichlet process mixture model for correcting technical variation in single-cell gene expression data. J Mach Learn Res. 2016;48:1070–9.
Haas S, Hansson J, Klimmeck D, Loeffler D, Velten L, Uckelmann H, et al. Inflammation-induced emergency megakaryopoiesis driven by hematopoietic stem cell-like megakaryocyte progenitors. Cell Stem Cell. 2015;17:422–34.
Macaulay IC, Svensson V, Labalette C, Ferreira L, Hamey F, Voet T, et al. Single-cell RNA-sequencing reveals a continuous spectrum of differentiation in hematopoietic cells. Cell Rep. 2016;14:966–77.
Kauffman SA. Metabolic stability and epigenesis in randomly constructed genetic nets. J Theor Biol. 1969;22:437–67.
Grun D, Muraro MJ, Boisset JC, Wiebrands K, Lyubimova A, Dharmadhikari G, et al. De Novo prediction of stem cell identity using single-cell transcriptome data. Cell Stem Cell. 2016;19:266–77.
Teschendorff AE. Single-cell entropy for quantification of differentiation potency from a cell’s transcriptome. bioRxiv. 2016. doi:10.1101/084202.
Wilson NK, Kent DG, Buettner F, Shehata M, Macaulay IC, Calero-Nieto FJ, et al. Combined single-cell functional and gene expression analysis resolves heterogeneity within stem cell populations. Cell Stem Cell. 2015;16:712–24.
Grover A, Sanjuan-Pla A, Thongjuea S, Carrelha J, Giustacchini A, Gambardella A, et al. Single-cell RNA sequencing reveals molecular and functional platelet bias of aged haematopoietic stem cells. Nat Commun. 2016;7:11075.
Shalem O, Sanjana NE, Hartenian E, Shi X, Scott DA, Mikkelsen TS, et al. Genome-scale CRISPR-Cas9 knockout screening in human cells. Science. 2014;343:84–7.
Wang T, Wei JJ, Sabatini DM, Lander ES. Genetic screens in human cells using the CRISPR-Cas9 system. Science. 2014;343:80–4.
Dixit A, Parnas O, Li B, Chen J, Fulco CP, Jerby-Arnon L, et al. Perturb-Seq: dissecting molecular circuits with scalable single-cell RNA profiling of pooled genetic screens. Cell. 2016;167:1853–66.e17.
Jaitin DA, Weiner A, Yofe I, Lara-Astiaso D, Keren-Shaul H, David E, et al. Dissecting immune circuits by linking CRISPR-pooled screens with single-cell RNA-Seq. Cell. 2016;167:1883–96.e15.
Adamson B, Norman TM, Jost M, Cho MY, Nunez JK, Chen Y, et al. A multiplexed single-cell CRISPR screening platform enables systematic dissection of the unfolded protein response. Cell. 2016;167:1867–82.e21.
Datlinger P, Rendeiro AF, Schmidl C, Krausgruber T, Traxler P, Klughammer J, et al. Pooled CRISPR screening with single-cell transcriptome readout. Nat Methods. 2017;14:297–301.
Patel AP, Tirosh I, Trombetta JJ, Shalek AK, Gillespie SM, Wakimoto H, et al. Single-cell RNA-seq highlights intratumoral heterogeneity in primary glioblastoma. Science. 2014;344:1396–401.
Saadatpour A, Lai S, Guo G, Yuan GC. Single-cell analysis in cancer genomics. Trends Genet. 2015;31:576–86.
Tsoucas D, Yuan GC. Recent progress in single-cell cancer genomics. Curr Opin Genet Dev. 2017;42:22–32.
Tirosh I, Venteicher AS, Hebert C, Escalante LE, Patel AP, Yizhak K, et al. Single-cell RNA-seq supports a developmental hierarchy in human oligodendroglioma. Nature. 2016;539:309–13.
van den Bos H, Spierings DC, Taudt AS, Bakker B, Porubsky D, Falconer E, et al. Single-cell whole genome sequencing reveals no evidence for common aneuploidy in normal and Alzheimer’s disease neurons. Genome Biol. 2016;17:116.
Gaudilliere B, Fragiadakis GK, Bruggner RV, Nicolau M, Finck R, Tingle M, et al. Clinical recovery from surgery correlates with single-cell immune signatures. Sci Transl Med. 2014;6:255ra131.
Saadatpour A, Guo G, Orkin SH, Yuan GC. Characterizing heterogeneity in leukemic cells using single-cell gene expression analysis. Genome Biol. 2014;15:525.
Van Noorden R. Interdisciplinary research by the numbers. Nature. 2015;525:306–7.
This article is a result of the discussions at the Radcliffe Institute Exploratory Seminar on “Theoretical Challenges in Single-Cell Analysis” in June 2016. We are grateful for the Radcliffe Institute’s generous financial and logistical support.
The work was supported by a Radcliffe Institute Exploratory Seminar Award and by the NIH grants R13-CA124365 and R01-DK081113S1.
GCY conceived the study. All authors participated in the discussions and writing of the manuscript. All authors read and approved the final manuscript.
The authors declare that they have no competing interests.
Cite this article
Yuan, G., Cai, L., Elowitz, M. et al. Challenges and emerging directions in single-cell analysis. Genome Biol 18, 84 (2017) doi:10.1186/s13059-017-1218-y
- Cell State
- Cellular Heterogeneity
- Common Myeloid Progenitor
- Hybridization Chain Reaction
- Similar Cell Type