Quartz-Seq2: a high-throughput single-cell RNA-sequencing method that effectively uses limited sequence reads

High-throughput single-cell RNA-seq methods assign limited unique molecular identifier (UMI) counts as gene expression values to single cells from shallow sequence reads and detect limited gene counts. We thus developed a high-throughput single-cell RNA-seq method, Quartz-Seq2, to overcome these issues. Our improvements in the reaction steps make it possible to effectively convert initial reads to UMI counts, at a rate of 30–50%, and detect more genes. To demonstrate the power of Quartz-Seq2, we analyzed approximately 10,000 transcriptomes from in vitro embryonic stem cells and an in vivo stromal vascular fraction with a limited number of reads. Electronic supplementary material The online version of this article (10.1186/s13059-018-1407-3) contains supplementary material, which is available to authorized users.


Streamline for pooling of cell-barcoded cDNA
After cell barcoding in reverse transcription, labeled cDNA is pooled and purified for subsequent whole-transcript amplification. A low volume of reverse-transcription solution for cell barcoding is efficient for cost reduction [4][5][6]. On the other hand, the pooling of labeled cDNA at a low volume from a PCR plate is complicated and time-consuming using pipette tips. Ideally, cell-barcoded cDNA should be pooled and purified with minimal loss for the subsequent whole-transcript amplification step in an expeditious manner. Therefore, we validated and streamlined the pooling step of cell-barcoded cDNA at a low volume. We collected 2 µL of solution from every well of a 384-well PCR plate and estimated the collection efficiency. By using an eight-channel pipette, we collected 79.6±2.4% (n=3) of solution in approximately 10 min from every well of a 384-well PCR plate. We also developed a spin-down collection system for the PCR plate ( Figure 1 and Additional file 1: Fig. S4). Using this collection system (type A), we collected 88.7±0.99% (n=3) of solution in 5 min from two to four plates. The collection efficiency of cDNA solution was improved by approximately 10% by constructing this spin-down collection system. In addition, the collection time per 384-well plate using this collection system was four-to eightfold faster than with a multi-channel pipette. We also validated the purification efficiency of cell-barcoded cDNA.
We determined the qPCR score with eight genes using purified first-strand cDNA and unpurified first-strand cDNA. The purification efficiency was 93.77% (n=4) (Additional file 1: Fig S3a). By combining our approach with the spin-down collection system, we estimated that 80%-82% of cell-barcoded cDNA could theoretically be used for subsequent whole-transcript amplification in our system.
To import more cDNA molecules into subsequent whole-transcript amplification, we attempted to improve the efficiency of conversion from target RNA to first-strand cDNA in reverse transcription. First, we performed reverse transcription under conditions with various buffers and temperatures (Additional file 1: Fig. S3b). The Spearman's rank correlation coefficient (SCC) between the Quartz-Seq-like condition (PCR buffer, 45ºC, 20 min) and positive control was 0.6957. In contrast, the SCC between the modified condition (T100, 50ºC, 50 min) and the positive control was 0.8996. Moreover, reverse-transcription efficiency was slightly improved by approximately 20% on average (Additional file 1: Fig. S3b). These results show that the modified RT condition reduced the bias of sequence preference in reverse transcription. In addition, we could import more cDNA molecules into subsequent whole-transcript amplification.

Relationship between sequence platform and Quartz-Seq2
We also focused on the relationship between sequencing platform and Quartz-Seq2. We did not observe clear differences between the NextSeq500 and HiSeq2500 sequencing platforms for determination of the UMI count and the gene count for Quartz-Seq2 (Additional file 1: Fig. S14a).
On the other hand, the unique mapping ratio and UMI count clearly depended on the Read2 length for transcript mapping (Additional file 1: Fig. S14c, d). In this regard, the number of detected genes slightly depended on Read2 length (Additional file 1: Fig. S14d). Long Read2 length led to better UMI counts, but high sequencing cost (Additional file 1: Fig. S14b). Therefore, in this study, we mainly used the NextSeq500 sequence platform and a Read2 length of 62 due to the cost performance of this combination.

Library preparation method for Quartz-Seq2
Moreover, we mainly used the adaptor ligation method for sequence library preparation of Quartz-Seq2. We did not observe a clear difference between the two sequence library preparation methods (ligation and Nextera/Tn5 transposase) regarding determination of the UMI count and the gene count. However, we could easily identify specific familiar genes (such as Pou5f1 and Gapdh) using the ligation method (Additional file 1: Fig. S14e). Therefore, we mainly used the ligation method for Quartz-Seq2. Fig. S1. Summary of improvements conferred by Quartz-Seq2

Supplemental Figures
Target RNA was converted into amplifiable cDNA with a cell barcode sequence via several steps.
We show a summary of the improvements in each step in the boxes. To convert more mRNA molecules to first-strand cDNA, we improved the efficiency of reverse transcription. Moreover, we used low-enzyme concentration for reverse-transcription. To collect more cDNA molecules for subsequent steps, we improved the collection efficiency in pooling steps. In addition, to convert more cDNA molecules to amplifiable cDNA, we improved the poly-A tagging efficiency.
Finally, we reduced the PCR bias using an UMI sequence in the reverse-transcription primer. first-strand cDNA. Previously, we reported that the length of target DNA is important for the suppression of byproduct synthesis. Specifically, the byproduct tended to form a pan-like structure, which prevented binding between PCR primer and template DNA. Finally, byproduct amplification was suppressed.
b) The effect of RT primer on byproduct synthesis. We used three kinds of RT primer (Quartz-Seq RT primer, v2 RT primer, and v3.1 RT primer) for whole-transcript amplification, in accordance with the Quartz-Seq-like conditions. First-strand cDNA with respective RT primer was treated with column purification and/or exonuclease I digestion. After that, purified cDNA solution was used for whole-transcript amplification. The yellow arrowhead indicates the byproduct derived from RT primer. c) Typical cDNA pattern of Quartz-Seq2 from a 384-well plate containing 10 pg of total RNA. Thereafter, we determined the qPCR scores for eight genes (Eef1b2, Nanog, Pou5f1, Sox2, Utf1, Spp1, Ywhae, and Tbp) using purified first-strand cDNA and unpurified first-strand cDNA. We estimated the purification efficiency using the slope score of linear regression analysis by comparison between unpurified and purified conditions. The purification efficiency was 93.77% (n=4). The 10 ng of tRNA as carrier RNA. We thus investigated whether column purification could be used instead of exonuclease I treatment, which reached 95.52% (n=4) (p-value, 0.987).
SCC stands for Spearman's rank correlation coefficient, while PCC stands for Pearson's correlation coefficient. b) Improvement of reverse-transcription efficiency. We prepared first-strand cDNA using 200 pg of total RNA and poly-dT primer (mixture of v3.1) under six conditions (n=4). We also prepared first-strand cDNA using 200 pg of total RNA and random hexamer as a control for comparison (n=4). For the reverse transcription with random hexamer, we used the SuperScript IV system in accordance with the manufacturer's instructions. We determined qPCR scores for eight genes (Eef1b2, Nanog, Pou5f1, Sox2, Utf1, Spp1, Ywhae, and Tbp) using the above cDNA. We describe the respective conditions (buffers and temperature conditions) for reverse transcription with poly-dT primer in the boxes. In Quartz-Seq, the reverse-transcription buffer and temperature conditions were as follows (PCR buffer and 35ºC for 5 min, 45ºC for 20 min). Finally, we adopted the following conditions for Quartz-Seq2 (T100 buffer and 35ºC for 5 min, 50ºC for 50 min).
Details of the buffer compositions are presented in Additional file 5: Table S4. We estimated the relative efficiency of reverse transcription using the slope score of linear regression analysis by comparison between poly-dT conditions and random hexamer conditions. In the Quartz-Seq condition, the relative efficiency of reverse transcription was 48.48%. In the Quartz-Seq2 condition, the relative efficiency was 59.02%. (4) We then turned the reaction 384-well PCR plate upside down on the assembled collector type A. (5) Next, we set it on a centrifuge adaptor. (6) We then centrifuged the plate with the assemble collector at 3,010 g and 4ºC for 3 min with swing-bucket rotors. By using spin-down collection (type A), we collected 88.7±0.99% (n=3) of solution in 5 min from two to four plates.
We mainly used type A for cDNA collection. c) Assembly of collector type B. (0) By using type B, cDNA solution of 384 wells was collected and split into eight wells in a reservoir. (1) We prepared an eight-well disposable reservoir. (2) We put a joint-adaptor (384 Transfer Plate 1859-3845, Watson/Fukaekasei) around the eight-well reservoir. The joint-adaptor has 384 funnel-shaped wells, which enable liquid transfer.
(3) We turned the reaction 384-well PCR plate upside down on the assembled collector type B.
(4) We then fixed the assembled one using metal holders (Watson/Fukaekasei). (5) Next, we set it on a centrifuge adaptor. (6) We centrifuged the plate with the assemble collector at 3,010 g and 4ºC for 3 min with swing-bucket rotors. By using spin-down collection (type B), we collected 86.9±0.4% (n=2) of solution in 5 min from two to four plates. c) The improvement of cDNA yield with the combination of T55 buffer and "Increment" (T55+Inc) was confirmed by qPCR assay. We detected eight genes (Eef1b2, Nanog, Pou5f1, Sox2, Utf1, Spp1, Ywhae, Tbp) with amplified cDNA of Figure 2a and nonamplified cDNA. We prepared nonamplified cDNA with 200 pg of total RNA and random hexamer. SCC stands for Spearman's rank correlation coefficient.  by qPCR for the respective conditions (n=4). We show the relative qPCR score, which is normalized by RT100 with PCR buffer for the respective genes. c) We prepared first-strand cDNA using 200 pg of total RNA and poly-dT primer (mixture of v3.1) under serially diluted enzyme conditions (technical replicates, n=5). The detail of enzyme concentration is presented in the figure. We determined qPCR scores for ten genes (Eef1b2, Nanog, Pou5f1, Sox2, Fn1, Spp1, Utf1, Atp5a, Rex1, and Lefty2) using the above cDNA. For Figure 2b, we calculated the average relative qPCR score with 10 genes per technical replicate. d) Bar plots present the amount of amplified cDNA from three technical replicates of a 384-well plate with 10 pg of total RNA in all wells for the "RT100" and "RT25" conditions. The y-axis represents the yield of amplified cDNA. The cDNA yields from 10 pg of total RNA were 55.9±5.5 ng for "RT100" and 65.7±2.5 ng for "RT25." The presented p-value was obtained using two-tailed Welch's t-test.

Fig. S8. Low enzyme concentration in reverse transcription improved the variability of UMI counts and gene counts
For three conditions [Quartz-Seq-like, Quartz-Seq (RT100), Quartz-Seq (RT25)], we performed three batch experiments using all wells of a 384-well plate, which contained 10 pg of total RNA.
For Quartz-Seq2, we used different enzyme concentrations for reverse transcription. We used the following reverse-transcription premix: RT100 (2x Thermopol buffer, 20 units/µL SuperScript III, 2.2 units/µL RNasin plus) and RT25 (2x Thermopol buffer, 5 units/µL SuperScript III, 0.55 units/µL RNasin plus) for Quartz-Seq2. The presented p-value was obtained using two-tailed Welch's t-test between the RT100 condition (three 384-well plates) and the RT25 condition b) We analyzed the same sequence library DNA with NextSeq500 and HiSeq2500. Read2 lengths of NextSeq500 and HiSeq2500 were 62 and 98, respectively. Circles represent the data from NextSeq500. Triangles represent the data from HiSeq2500.

Fig. S9. Calculation of UMI conversion efficiency
We performed the conversion from fastq reads as initial reads to UMI counts via several steps.
We define the formula for calculating the UMI conversion efficiency (boxed region). Each parameter is defined as follows: UMI sc : the number of UMI count, assigned to a single-cell sample, fastq sc : the number of fastq reads derived from each single-cell sample, fastq non-sc : the number of fastq reads derived from non-single-cell samples, which include experimental byproducts such as WTA adaptors, WTA byproducts, and non-STAMPs. Initial fastq reads are composed of fastq sc and fastq non-sc .

Quartz-Seq2
We prepared sequence library DNA of Quartz-Seq2 from 10 pg of total RNA of four 384-well plates by using two different reverse-transcription premixes as follows: RT25 (2x Thermopol    All batch experiments can ensure 30,000 fastq reads/well on average (red line). We calculated the ERCC capture efficiency and detection limit for CEL-seq2(C1), CEL-seq2, and MARS-seq by using a digital expression matrix from Svensson et al. [2]. We observed that these performances strongly correlated with the numbers of initial fastq reads and samples. Nevertheless, Quartz-Seq2 had good performance equivalent to or higher than the range of fewer initial fastq reads.
b) Further comparison of the detection limit between Quartz-Seq2 and CEL-seq(C1). We reanalyzed CEL-seq2(C1) (n=96) with SRA files, which were also used by Svensson et al. [2][3], using our pipeline. First, we confirmed that the detection limit of reanalyzed CEL-seq2(C1) was not an underestimate compared with that of previously analyzed CEL-seq2(C1). We observed that the detection limit of all Quartz-Seq2 data from 25 batch experiments was lower than that of CEL-seq2 in the range of 30,000 initial fastq reads.  a) There was no obvious difference between the NextSeq500 platform and the HiSeq2500 platform for Quartz-Seq2. We analyzed the same sequence library DNA with these different platforms. We did not observe any clear differences in UMI count and gene count. b) We show the relative costs with the three sequence platforms for Quartz-Seq2. Read2 sequence of Quartz-Seq2 was used for the mapping to transcript. A Read2 length of 62 nt in NextSeq500 was the most cost-effective setting for Quartz-Seq2. c) Read2 length affected the unique mapping ratio. We prepared the three kinds of sequence library DNA from the same amplified cDNA of Quartz-Seq2. We prepared the library DNA using the ligation method and the Nextera method. We also prepared data comprising a mixture of ligation data and Nextera data.

Quartz-Seq2
We plotted the 4,484 cells on t-SNE space with color labeling for each 384-well PCR plates.

Fig. S16. Quality check of high proportion of mitochondrial RNA for a cell cluster
We show the UMI count, gene count, and the proportion of mitochondrial RNA for each cluster.
We present the mean UMI count for each cluster. In cluster 2, both marker genes (ES and PrE marker genes) were expressed. In addition, the UMI count of cluster 2 was about twice those of clusters 1 and 4. Therefore, cluster 2 was judged to represent doublets of an ES cell and a PrE cell. Moreover, we determined the UMI count derived from mitochondrial genes. We also calculated the proportion of mitochondrial RNA for respective cell clusters. In cluster 6, the proportion was higher than for the other cell clusters.      We present the gene expression using Quartz-Seq2 data. We used a previously reported list of marker genes (Thy1/Cd90, Eng/Cd105, Pdgfra, Lys6a/Sca1, Tnnt2, Nanog, Pou5f1, Sox2, Myog, Cd34, Cd31, Cd45) in MSCs.

Fig. S23. Expression profile of previously reported ASC marker genes for clusters 1 and 8
We present the gene expression using Quartz-Seq2 data. We used a previously reported list of marker genes in CD34+/CD55+/Dpp4+ adipose-derived stem cells (ASC). Almost all marker genes were enriched in cluster 1.