CEL-Seq2: sensitive highly-multiplexed single-cell RNA-Seq
- Tamar Hashimshony†1,
- Naftalie Senderovich†1,
- Gal Avital1,
- Agnes Klochendler2,
- Yaron de Leeuw1,
- Leon Anavy1,
- Dave Gennert3, 4, 5,
- Shuqiang Li6,
- Kenneth J. Livak6,
- Orit Rozenblatt-Rosen3, 4, 5,
- Yuval Dor2,
- Aviv Regev3, 4, 5 and
- Itai Yanai1Email author
© Hashimshony et al. 2016
Received: 20 December 2015
Accepted: 8 April 2016
Published: 28 April 2016
Single-cell transcriptomics requires a method that is sensitive, accurate, and reproducible. Here, we present CEL-Seq2, a modified version of our CEL-Seq method, with threefold higher sensitivity, lower costs, and less hands-on time. We implemented CEL-Seq2 on Fluidigm’s C1 system, providing its first single-cell, on-chip barcoding method, and we detected gene expression changes accompanying the progression through the cell cycle in mouse fibroblast cells. We also compare with Smart-Seq to demonstrate CEL-Seq2’s increased sensitivity relative to other available methods. Collectively, the improvements make CEL-Seq2 uniquely suited to single-cell RNA-Seq analysis in terms of economics, resolution, and ease of use.
Single-cell transcriptomics is a transformative method with tremendous potential to illuminate the complexities of gene regulation. Single-cell RNA-Seq was first introduced by Tang et al. , using a polyT primer with an anchor sequence to select for the cell’s mRNA. After polyadenylation of the resulting cDNA, a second polyT primer with a different anchor is used to obtain double stranded DNA, which is then PCR-amplified. Each sample is individually converted to a library for sequencing. The STRT method introduced early barcoding at the reverse transcription stage , thereby enabling highly-multiplexed analyses, and adapted a template switching mechanism based on the ability of the reverse transcriptase to tag the end of the cDNA , eliminating the need for the polyadenylation reaction. Smart-Seq [4, 5] used the same template switching mechanism as STRT, but without the early barcoding. Each sample is processed individually, and the reaction was optimized for full transcript sequencing.
The CEL-Seq  method is the first method to use in vitro transcription (IVT) for the amplification, thereby eliminating the requirement for a template-switch step which is thought to reduce efficiency. We use early barcoding, enabling highly-multiplexed analysis, and 3′ end tagging enabling accurate estimation of expression levels without having to account for gene length and with fewer sequencing reads required. Here we introduce CEL-Seq2, which has been optimized for higher sensitivity, less hands-on time, and lower price. We show that CEL-Seq2 works well on different platforms, and compare it to previously published methods.
Results and discussion
CEL-Seq2 is optimized for higher sensitivity
First, we sought to increase the efficiency of the reverse transcription (RT) reaction by shortening the CEL-Seq primer from 92 to 82 nucleotides, despite the addition of six UMI nucleotides. This was done by reducing the length of the barcode from eight to six nucleotides, as well as shortening the T7 promoter and the Illumina 5’ adaptor. Use of the shortened primers indeed improved the sensitivity to 10.6 %, detecting more transcripts (Additional file 1: Figure S1). In this analysis on 100 pg of RNA, the number of detected genes also increased, though not significantly (Additional file 1: Figure S1a), which likely reflects that most of the additionally identified transcripts are of genes already detected using the longer primer.
We next optimized the conversion of RNA to dsDNA by testing alternative commercially available reverse transcriptases for cDNA synthesis and polymerases for second-strand synthesis. We found that SuperScript II for the RT step (Additional file 1: Figure S1d) provided a major improvement. For second strand synthesis, the differences were less pronounced, but the polymerase and other components from the SuperScript II Double-Stranded cDNA Synthesis Kit were better than its competitors (Additional file 1: Figure S1e). We also modified our method of dsDNA and aRNA clean-up from column to beads, which provided a threefold gain in yield (Additional file 1: Figure S1d). While CEL-Seq was originally implemented using the Ambion MessageAmp II aRNA Amplification Kit, these changes led to a kit-free (and therefor cheaper) process because the reagents of the SuperScript II Double-Stranded cDNA Synthesis Kit may be purchased separately.
Ligation-free library preparation improves read mapping
In the original CEL-Seq protocol, the aRNA is converted to a library compatible with Illumina sequencing by ligating the second adaptor . Following conversion to cDNA with RT, a few PCR cycles completed the attachment of Illumina adaptors. The ligation step is not efficient, however, and introduces primer dimers that interfere with sequencing. CEL-Seq2 remedies this by inserting the Illumina adaptor directly at the RT step as a 5′-tail attached to a random hexamer (Fig. 1a, change 5), thus eliminating the ligation step. We prepared a library introducing the Illumina adaptor by ligation or using the random hexamer (no ligation) from the same amplified RNA. Sequencing of the “no ligation” libraries yields 93.8 % mapping of the reads with barcodes, an improvement from 60.9 % in the ligated library (Fig. 1b). This modification also led to the identification of more genes and transcripts (Fig. 1c, d), suggesting that the removal of the ligation step significantly increased the sensitivity. To control for sequencing depth, we sub-sampled 300,000 reads from each sample, and obtained similar results (Additional file 2: Figure S2a, b). In addition, this modification reduces the hands-on time and cost of library preparation and it alleviates the need for Illumina’s TruSeq Small-RNA kit.
CEL-Seq2 is compatible with different platforms
Our implementation of CEL-Seq for individual tubes is easily scaled-up to plates and could be performed with robotic liquid handlers. To further improve the efficiency, we sought to implement CEL-Seq2 on the Fluidigm C1, a nanoliter automatic microfluidic instrument. After capture of individual cells, the C1 loads the individually barcoded CEL-Seq2 primers from the outlet wells, lyses the cells chemically (rather than our traditional lysis by freezing), and performs an RT reaction followed by second-strand synthesis. In place of a cDNA clean-up, the second strand enzymes are heat inactivated. IVT then occurs for each sample individually. The amplified RNA is harvested from the C1 chip and pooled to a single sample from which a library is prepared.
The increased sensitivity of CEL-Seq2 is observed both in terms of the increased detection of transcripts and genes. CEL-Seq2 identifies twice as many transcripts per cell compared to the original protocol (Fig. 2b) and 30 % more genes (Fig. 2c), and again on the C1 the protocol performance is even better. Examining the noise in gene expression levels across our samples, we see that genes identified by the original protocol show reduced levels of noise with CEL-Seq2 (Fig. 2d). In particular, for the spike-ins we find lower levels of noise for almost all 92 spiked-in RNA-species (Fig. 2e). Again, to control for differences in sequencing depth, we subsampled 300,000 reads from each sample and obtained similar results (Additional file 2: Figure S2c–g).
Using CEL-Seq2 to determine cell-cycle associated differences in the transcriptome
CEL-Seq2 shows better sensitivity and reproducibility than Smart-Seq
While Smart-Seq2 has an improved template switch step relative to Smart-Seq , previous work has also shown that C1-based transcriptomics shows increased sensitivity , thus making for an appropriate comparison. It is important to note that CEL-Seq2, as a 3′ tag method, differs from Smart-Seq, which produces full-length transcripts. The comparison presented in Fig. 4 is thus complicated by this basic difference. CEL-Seq2 does not provide information on most instances of splicing since it is strongly 3′-biased. However, the sensitivity and ability to individually count transcripts offer a clear advantage for most transcriptomics applications.
Collectively, CEL-Seq2 benefits from optimized primers, reagents, clean-up, and library preparation step. Together, these modifications greatly improve the quality of the data and make CEL-Seq2 more time- and cost-efficient. Furthermore, our Fluidigm C1-enabled cell-barcoding allows for a single library construction, instead of working individually to set up the library preparation for each cell. Our improvements will also be able to be implemented in the inDrop and Drop-Seq methods [15, 16].
CEL-Seq and CEL-Seq2
CEL-Seq was performed as previously described , with the exception that a 5 base UMI was added to the primer and the barcode length was reduced to 6 bases. For CEL-Seq2 the following modifications were introduced (see Additional file 4: Supplementary file 1 for a detailed protocol): (1) A new set of primers was used (Additional file 5: Table S2). Primers are shorter and include an UMI upstream of the barcode. (2) SuperScript® II Double-Stranded cDNA Synthesis Kit is used to convert the mRNA to double stranded DNA. Reagents can be purchased individually, see protocol. (3) Nucleic acid purification steps are performed with RNAClean and AMPure XP beads. (4) The aRNA was converted to cDNA using random priming. A random hexamer was used with a 5′-tail containing the Illumina 3′ adaptor sequence.
Six nucleotide barcodes were designed so that every pair of barcodes had a Hamming distance of at least 2 bases ensuring that a single sequencing error would not cause the read to be associated with a different sample. Each unique barcode was designed to have a GC content of 33–67 % to avoid low-complexity barcodes which may result in low sequencing quality, and to have the last nucleotide anything other than T. 168 unique barcodes matching these constraints were selected, although larger groups can be constructed.
Mouse ear fibroblasts were derived from 1-month-old CyclinB1-GFP mice , as previously described . Briefly, a small piece of mouse ear was collected and digested overnight in DMEM containing Collagenase/Dispase. The tissue was then dissociated by gentle pipetting and cells were washed in DMEM/10 % FBS/1 % Pen-Strep/1 % L-Glutamine, pelleted, seeded, and cultured for 2–4 days before sorting on a FACS ARIA (Becton Dickinson).
Chip priming and cell capture were performed according to the manufacturer’s instructions. After cell capture, CEL-Seq primers were loaded to each of the 96 outlets. The lysis mix was loaded to inlet # 3, RT mix to inlet #4, second strand mix to inlet #7, and IVT mix to inlet # 8, and the CEL-Seq program was run. The resulting aRNA was pooled from all outlets, bead purified, fragmented, and purified again. Library was prepared according to the CEL-Seq2 protocol.
Paired-end sequencing was performed on the Hiseq 2500 in rapid mode, 15 bases for read 1 (R1), 7 bases for the Illumina index, and 36 bases for read 2 (R2). The data have been deposited under GEO accession number GSE78779.
Expression analysis pipeline
CEL-Seq reads were processed into an expression matrix using a multistep, parallel computational pipeline within the Galaxy framework . For CEL-Seq2, we developed a new pipeline as a standalone lightweight python program allowing for a faster run-time. The CEL-Seq2 pipeline is compatible also with CEL-Seq reads. The pipeline is distributed under the GPLv3 license allowing others to further customize (https://github.com/yanailab/CEL-Seq-pipeline). The pipeline consists of the following steps: (1) Demultiplexing: using the barcode from R1 we split R2 reads into their original samples creating a separate file for each sample. Since the UMI is also read in R1 we extract it and attach it to the R2 read metadata for downstream analysis. (2) Mapping: using Bowtie2 , we map the reads of the different samples in parallel, cutting the analysis time by roughly the number of available cores. (3) Read counting: A modified version of the htseq-count script  (https://github.com/yanailab/CEL-Seq-pipeline) that supports the identification and elimination of reads sharing the same UMI to generate an accurate molecule count for each feature. As in Grun et al.  we use binomial statistics to convert the number of UMIs into transcript counts. The different steps in the pipeline are wrapped together in a single program with a simple configuration file allowing to control for different run modes. We include the C1 steps as Additional file 6: Supplementary file 2.
Smart-Seq C1 transcriptome data for single mouse dendritic cells were download from GEO under accession number GSE48968.
All animal experiments were performed in accordance with guidelines established by the joint ethics committee (IACUC) of the Hebrew University and Hadassah Medical Center and IACUC protocol number 0612-058-15 at MIT. The Hebrew University is an AAALAC International accredited institute.
Availability of data and materials
The data have been deposited under GEO accession number GSE78779. Our pipeline is available at https://github.com/yanailab/CEL-Seq-pipeline.
We thank the community of CEL-Seq users for useful feedback. We thank the Technion Genome Center for technical assistance.
The research leading to these results has received funding from the European Research Council under the European Union’s Seventh Framework Programme (FP/2007-2013) under grant agreement number 310927. This work was also supported by the Israel Science Foundation (grant 1457/14).
Open AccessThis article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
- Tang F, Barbacioru C, Wang Y, Nordman E, Lee C, Xu N, Wang X, Bodeau J, Tuch BB, Siddiqui A, Lao K, Surani MA. mRNA-Seq whole-transcriptome analysis of a single cell. Nat Methods. 2009;6:377–82.Google Scholar
- Islam S, Kjällquist U, Moliner A, Zajac P, Fan J-B, Lönnerberg P, Linnarsson S. Characterization of the single-cell transcriptional landscape by highly multiplex RNA-seq. Genome Res. 2011;21:1160–7.Google Scholar
- Zhu YY, Machleder EM, Chenchik A, Li R, Siebert PD. Reverse transcriptase template switching: a SMART approach for full-length cDNA library construction. Biotechniques. 2001;30:892–7.PubMedGoogle Scholar
- Ramsköld D, Luo S, Wang Y-C, Li R, Deng Q, Faridani OR, Daniels GA, Khrebtukova I, Loring JF, Laurent LC, Schroth GP, Sandberg R. Full-length mRNA-Seq from single-cell levels of RNA and individual circulating tumor cells. Nat Biotechnol. 2012;30:777–82.Google Scholar
- Picelli S, Björklund ÅK, Faridani OR, Sagasser S, Winberg G, Sandberg R. Smart-seq2 for sensitive full-length transcriptome profiling in single cells. Nat Methods. 2013;10:1096–8.View ArticlePubMedGoogle Scholar
- Hashimshony T, Wagner F, Sher N, Yanai I. CEL-Seq: single-cell RNA-Seq by multiplexed linear amplification. Cell Rep. 2012;2:666–73.View ArticlePubMedGoogle Scholar
- Grün D, Kester L, van Oudenaarden A. Validation of noise models for single-cell transcriptomics. Nat Methods. 2014;11:637–40.View ArticlePubMedGoogle Scholar
- Jaitin DA, Kenigsberg E, Keren-Shaul H, Elefant N, Paul F, Zaretsky I, Mildner A, Cohen N, Jung S, Tanay A, Amit I. Massively parallel single-cell RNA-seq for marker-free decomposition of tissues into cell types. Science. 2014;343:776–9.Google Scholar
- Hug H, Schuler R. Measurement of the number of molecules of a single mRNA species in a complex mRNA preparation. J Theor Biol. 2003;221:615–24.View ArticlePubMedGoogle Scholar
- Kivioja T, Vähärautio A, Karlsson K, Bonke M, Enge M, Linnarsson S, Taipale J. Counting absolute numbers of molecules using unique molecular identifiers. Nat Methods. 2012;9:72–4.Google Scholar
- Baker SC, Bauer SR, Beyer RP, Brenton JD, Bromley B, Burrill J, Causton H, Conley MP, Elespuru R, Fero M, Foy C, Fuscoe J, Gao X, Gerhold DL, Gilles P, Goodsaid F, Guo X, Hackett J, Hockett RD, Ikonomi P, Irizarry RA, Kawasaki ES, Kaysser-Kranich T, Kerr K, Kiser G, Koch WH, Lee KY, Liu C, Liu ZL, Lucas A,, et al. The External RNA Controls Consortium: a progress report. Nat Methods. 2005;2:731–4.Google Scholar
- Klochendler A, Weinberg-Corem N, Moran M, Swisa A, Pochet N, Savova V, Vikeså J, Van de Peer Y, Brandeis M, Regev A, Nielsen FC, Dor Y, Eden A. A transgenic mouse marking live replicating cells reveals in vivo transcriptional program of proliferation. Dev Cell. 2012;23:681–90.Google Scholar
- Shalek AK, Satija R, Shuga J, Trombetta JJ, Gennert D, Lu D, Chen P, Gertner RS, Gaublomme JT, Yosef N, Schwartz S, Fowler B, Weaver S, Wang J, Wang X, Ding R, Raychowdhury R, Friedman N, Hacohen N, Park H, May AP, Regev A. Single-cell RNA-seq reveals dynamic paracrine control of cellular variation. Nature. 2014;510:363–9.Google Scholar
- Grün D, van Oudenaarden A. Design and analysis of single-cell sequencing experiments. Cell. 2015;163:799–810.View ArticlePubMedGoogle Scholar
- Klein AM, Mazutis L, Akartuna I, Tallapragada N, Veres A, Li V, Peshkin L, Weitz DA, Kirschner MW. Droplet barcoding for single-cell transcriptomics applied to embryonic stem cells. Cell. 2015;161:1187–201.Google Scholar
- Macosko EZ, Basu A, Satija R, Nemesh J, Shekhar K, Goldman M, Tirosh I, Bialas AR, Kamitaki N, Martersteck EM, Trombetta JJ, Weitz DA, Sanes JR, Shalek AK, Regev A, McCarroll SA. Highly parallel genome-wide expression profiling of individual cells using nanoliter droplets. Cell. 2015;161:1202–14.Google Scholar
- Shao C, Deng L, Henegariu O, Liang L, Raikwar N, Sahota A, Stambrook PJ, Tischfield JA. Mitotic recombination produces the majority of recessive fibroblast variants in heterozygous mice. Proc Natl Acad Sci U S A. 1999;96:9230–5.Google Scholar
- Langmead B, Salzberg SL. Fast gapped-read alignment with Bowtie 2. Nat Methods. 2012;9:357–9.View ArticlePubMedPubMed CentralGoogle Scholar
- Anders S, Pyl PT, Huber W. HTSeq - A Python framework to work with high-throughput sequencing data. Bioinformatics. 2015;31:166–9.View ArticlePubMedPubMed CentralGoogle Scholar