Our single-cell analysis platform is based on Printed Droplet Microfluidics (PDM) [28, 29], an approach that allows cells to be optically scanned and dispensed to custom nanoliter well plates (nanoplates) (Fig. 1a). To perform linked optical and scRNA-seq analysis, we record the fluorescence of a cell while confined to a droplet, then dispense the cell and droplet to the nanoplate at defined locations. Then, scRNA-seq library preparation is performed on each cell using specific “coordinate oligos” encoding each cell’s location on the nanoplate. After sequencing, these oligos allow each cell barcode to be traced to a well of origin, thereby linking it to the optical data collected for that cell. The workflow is similar to flow cytometry, except that the sorter is a microfluidic device and the wells in which the cells are dispensed are ~ 10,000-fold smaller than conventional microwells. This reduction in volume, combined with the speed of the microfluidic printer, enables highly scalable optical phenotyping and sequencing of single cells.
Prior to device operation, a separate flow focusing device encapsulates cells in a droplet emulsion. We introduce this emulsion into the PDM device, where each drop is optically scanned (Fig. 1a, left). As in flow cytometry, laser-induced fluorescence accomplishes the optical analysis, whereby focused lasers excite fluorescence of the cells which a multicolor detector then captures (Fig. 1a, middle). Cells with desired fluorescence properties are isolated through sorting them into a printing nozzle that dispenses them into a nanowell on the substrate (Fig. 1a, right). We record the cell fluorescence and dispense location, allowing this information to be paired with the scRNA-seq data collected later.
To link the optical and sequencing data, we index the wells such that each cells’ dataset can be traced back to a well on the array. The indexes comprise “coordinate oligos” pre-loaded into the wells using a commercial reagent spotter (Fig. 1b, lower left) [30]. To index the array, we place “X” and “Y” coordinate oligos, each of which contains a different 8 base sequence encoding the specific location of a given nanowell on the plate. The coordinate oligos are polyadenylated, allowing them to be captured along with cellular mRNA during the scRNA-seq library preparation. For scRNA-seq, we adapt the validated “Drop-Seq” protocol [7], which uses beads coated with poly-deoxythymidine “barcode” oligos to capture and label both mRNA and coordinate oligos. We accomplish this by co-dispensing beads and cells in each nanowell and lysing the cells. After retrieving the beads, performing the requisite library preparation steps of Drop-Seq, and sequencing the barcoded cDNA, we obtain a collection of reads representing the cell transcriptome and coordinate oligos, all sharing a Drop-Seq barcode. Thus, the location of the cell from which the data originate is encoded in the sequencing data, allowing it to be traced back to a specific well on the array (Fig. 1c, left) and associated with the previously recorded optical data (Fig. 1c, middle). With this paired dataset, we can use dimensionality reduction methods to first visualize gene expression data, to which we add the optical phenotype information (Fig. 1c, right).
The microfluidic print device consists of a droplet spacer, sorter, and printing nozzle (Fig. 2a). A packed emulsion containing cells or beads is introduced, spaced by oil, and optically scanned by a four-color laser-induced fluorescence detector (Fig. 2a, red outline). Embedded fiber optics excite and collect fluorescence that is processed through filters and analyzed in real time by custom software; this allows cell, bead, and droplet fluorescence and scattering to be recorded, to determine whether to print the droplet and its contents to the current nanowell. Printing is achieved by sorting a droplet (Fig. 2a, green outline) into the printing nozzle positioned above a nanowell (Fig. 2a, purple outline); if the current droplet should not be printed, it is not sorted into the nozzle and passes into the discard channel. Because the carrier oil is viscous and denser than water, in the absence of other forces, the ejected droplet would float away and not go into the nanowell. Thus, to dispense it into the nanowell, electrodes positioned under the substrate emit an oscillating electric field. This field pulls the dispensed droplet into the nanowell by a dielectrophoretic force and is key to the speed of PDM, since it allows a droplet to travel the final few hundred microns from the printing nozzle to the nanowell in tens of milliseconds [28]. Moreover, because the trap extends above the substrate, the printer need not dispense the droplets with perfect accuracy into the nanowells, since any droplet within the electric field will, ultimately, be pulled into the nearest nanowell. The trapping field also ensures that the printed droplets remain fixed in the wells. Upon completion of a print run, droplets can be released by un-powering the electrodes (Additional file 1: Movie S1).
To demonstrate the accuracy of scRNA-seq using our approach, we perform an experiment with two cell types. We prepare and encapsulate a mixed suspension of Calcein Red stained mouse (3T3) and Calcein Green stained human (HEK293) cells (Fig. 2b, upper left). When scanned in the print head, we observe distinct green and red cell populations (Fig. 2b, upper right); thus, with suitable gating instructions, the printer can print these cells in a defined pattern to the nanoplate. To enable scRNA-seq of the printed cells, Drop-Seq beads must also be printed, which requires that they be detectable in the print head; this is accomplished by labeling them with 4-MU, a blue dye that does not overlap with the cell stains (Fig. 2b, lower left). The 4-MU dye is insoluble in water and remains sequestered within the beads following labeling, allowing bead-loaded to be discerned from bead-empty droplets (Fig. 2b, lower right). To print cells and beads in defined combinations, we generate a “print file” containing gating and location instructions that we input into the printer software; the printer reads this file, printing cells and beads to the nanoplate according to the instructions in the file (Fig. 2c, left).
A strength of PDM for scRNA-seq is that it allows the nanowells to be controllably loaded with cells and beads; this contrasts with other high-throughput scRNA-seq methods which randomly load cells or beads and, thus, are less efficient. Moreover, PDM allows systematic variation of nanowell contents across the array, to choose conditions that maximize data quality [28]. For instance, minimizing the rate of doublets—when two cells are inadvertently sequenced as one—leads to better data quality and lower per-cell sequencing costs [31, 32]. While we cannot directly monitor the number of cells passing through each droplet, in the two-cell experiment, we gate out heterotypic doublets of droplets containing a red and green cell and minimize the rate of homotypic doublets by excluding the upper tail of the Calcein Red and Calcein Green distributions. Moreover, controlled printing allows us to load multiple capture beads to every well (Fig. 2c, right, Additional file 2: Movie S2), which can increase cell and mRNA capture efficiency by compensating for losses during sequencing library preparation [33]. To illustrate this, we print two substrates, the first with 1 bead per well and the second with 4, both on 42 by 56 nanowell (2352) arrays. All beads originating from the same well contain the same coordinate barcodes, allowing us to group reads associated with multiple beads together. Due to loss of beads during library preparation, starting with more beads per well increases the likelihood of recovering at least one bead from every well (Fig. 3a). When printing more beads, we also recover more transcripts per well (Fig. 3b, upper), and that the number of transcripts per bead remains consistent when recovering up to four beads per well (Fig. 3b, lower). This suggests that increasing bead surface area per cell lysate increases mRNA capture.
Single-cell RNA-seq methods depend on minimal cross contamination of RNA between cells [31]. To investigate cross contamination, we print mouse and human cells in a checkerboard pattern and classify each transcriptome according to which species’ genome the transcripts predominately align. We find that the optical data align 99.3% of the time with the expected printing pattern (Fig. 3c, left). We recover transcripts from 1204 nanowells following sequencing and bioinformatic analysis, with 97.6% of transcriptomes having species purities of at least 90% mouse or 90% human and matching the expected printing pattern (Fig. 3c, right). As we demonstrate a near-perfect success rate when printing beads and cells, we reason that the most of the lost nanowells represent beads that were not collected after printing or lost during library preparation. We find that 1.8% of nanowells have less than 90% species purity, suggesting either mRNA cross contamination or misprinting of cells (dots not aligned with axes) (Fig. 3d). When we sequence deeper on a smaller subset of beads, we recover on average 17,500 and 15,200 transcripts from mouse and human cells, respectively (Additional file 3: Fig. S1). In sum, we demonstrate quality optical and scRNA-seq data from over a thousand cells when printing four beads per well.
Cells undergo changes in state and phenotype through the cell cycle [17]. For example, by the G2 phase, cells have doubled their genome, allowing it to be optically detected via DNA staining [34]. Moreover, different genes peak and diminish in expression through the cycle, making the cell cycle useful for validating our approach. As a model system, we use Jurkat cells stained with DRAQ5, which is a live-cell stain for genomic DNA (Fig. 4a, upper). We observe a broad distribution of DNA fluorescence, the brightest of which likely correspond to cells with the most DNA and, thus, in the G2 or M phase of the cell cycle. To confirm these results, we perform flow cytometry of the suspension and obtain a similar distribution (Fig. 4a, lower). We utilize PDM to generate a 56 by 56 nanowell array to which we print a checkerboard pattern of low and high DRAQ5 expressing cells (Fig. 4b). We sequence transcriptomes from 437 cells and use Uniform Manifold Approximation and Projection (UMAP) to visualize these cells [35]. Through assigning cells to G1, S, or G2M phases based on their expression of cell cycle-associated genes and using those genes for principal component analysis, we generate a UMAP plot which identifies three clusters in agreement with three stages of the cell cycle (Fig. 4c, left). To determine whether these classifications agree with the optical data, we annotate the points of the UMAP plot according to the magnitude of DRAQ5 fluorescence (Fig. 4c, right). The plots are in general agreement, with the state comprising the most DNA (G2M) appearing brightest in the DNA stain. To observe how the population varies through this cycle, we order the cells by fluorescence and plot the proportion in the three states as classified by gene expression (Fig. 4d). We expect the proportion of cells in G1 to be at low DRAQ5 signal, S phase in the middle of the distribution, and G2 at the top end. The peak of the G1 phase curve is at the low end of the DRAQ5 distribution, S phase in the middle, and G2/M at the top end. We observe general concordance with the expected trend when we pair fluorescence and scRNA-seq measurements of cell cycle state. With our platform, we thus demonstrate characterization of a fundamental biological process through linked optical and gene expression analysis.
Cell populations are often heterogeneous, with particularly important cells present below 5%, such as those involved in cancer pathogenesis, immunological memory, and maintaining adult stem cell niches [25,26,27]. With PDM, we can enrich for rare cell types using their optical phenotype prior to their loading on the nanowell array, markedly increasing the abundance of rare cell types in the obtained sequencing data. We demonstrate this capability by enriching for CD14-positive (CD14+) and CD16-positive (CD16+) peripheral blood mononuclear cells (PBMCs). We count CD14+ and CD16+ cell abundances prior to analysis using PDM and determine that they constitute ~ 25% and ~ 3% of all PBMCs, respectively. We load the nanowell array with CD14+ and CD16+ PBMCs, demonstrating an up to 405-fold enrichment of cells over their detection frequencies (Fig. 5a, upper). We recover transcriptomes from 310 cells and determine their phenotype through gene expression analysis. By overlaying the cell phenotype over the fluorescence data, we determine that natural killer (NK) cells largely are CD14−/CD16+, nonclassical monocytes CD14+/CD16+, and classical monocytes and monocytes of an intermediate phenotype CD14+/CD16− (Fig. 5a, lower). These four phenotypes cluster separately on a UMAP (Fig. 5b, left). By contrast, when we sample cells from the sequencing data such that the ratios of CD14+ to CD16+ cells match the observed abundances, we do not resolve the NK or nonclassical monocyte clusters after clustering (Fig. 5b, right). The ability of our approach to enrich for specific cells in a heterogeneous population and dedicate all the sequencing to them thus provides a powerful advantage when the cells of interest are rare.
Flow cytometry is routinely used to profile monocyte phenotypes based on CD14 and CD16 surface marker expression [36]. By overlaying the optical data on the UMAP, we show that CD14+/CD16− cells largely correlate with the classical and intermediate monocyte clusters, and the NK cells and nonclassical monocyte clusters are CD16+ (Fig. 5c). We also observe an intermediate monocyte population that is CD14+/CD16+. To further study the trends between optical phenotype and gene expression, we plot gene expression versus CD14 or CD16 fluorescence for three relevant genes: LYZ, S100A9, and IFITM2 (Fig. 5d). LYZ and S100A9 are associated with the classical monocyte phenotype [37], and IFITM2 is associated with nonclassical monocytes [38]. By ordering cells from low to high fluorescence and taking the moving average of gene expression, we uncover trends between fluorescence and gene expression for these genes. LYZ expression decreases with increasing CD16 fluorescence; S100A9 expression increases with increasing CD14 fluorescence; IFITM2 expression is globally lower than the others, but expression peaks in the middle of the CD16 fluorescence distribution. By combining fluorescence and gene expression measurements, we highlight that the expression of signature monocyte phenotype-associated genes gradually changes with optical phenotype.
Single-cell RNA-seq is a powerful and general method for analyzing cells, but not all traits of interest are observable in gene expression data alone. Here, we demonstrate an approach that allows gene expression to be linked to optical data in a high-throughput format scalable to thousands of single cells. A key concern for any scRNA-seq workflow that relies on barcoding cells for bulk sample amplification is the loss of cells due to loss of beads during downstream processing [33]. By spreading each cell’s transcriptome over several beads, we increase the probability of recovering transcriptomes for all cells. Our platform’s ability to localize combinations of cells, beads, and reagents at defined positions on a nanoliter array affords other powerful capabilities, such as systematic variation of cell, reagent, and drug combinations and tuning of optical and sequencing parameters to achieve optimal data. The open nature of the array also makes it amenable to additional measurement modalities, such as atomic force microscopy, mass spectrometry, and chemical assays, all of which can be linked with optical and scRNA-seq data using the approach we have presented [39, 40]. The ability to link sequencing readouts with other measurements for thousands of single cells will facilitate further investigations into the molecular underpinnings of cell function [41].