Table 2 Overview of UMI datasets used for analysis

From: Analytic Pearson residuals for normalization of single-cell RNA-seq UMI data

| reference | accession no. | protocol | cells | genes | species | description |
|---|---|---|---|---|---|---|
| 33k PBMC | * | 10X v1 | 33,148 | 16,809 | Human | Peripheral blood mononuclear cells |
| [43] | SRP073767** | 10X v1 | 3,994 | 15,715 | Human | FACS-sorted PBMC cells |
| [52] | E-MTAB-5480 | 10X v2 | 2,000 | 13,025 | Human | Droplets of bulk RNA solution |
| [53] | GSM1599501 | inDrop | 953 | 25,025 | Human | Droplets of bulk RNA solution |
| [54] | GSM2906413 | MicrowellSeq | 9,994 | 15,069 | Mouse | Non-differentiating stem cells |
| [33] | GSE63472 | DropSeq | 24,769 | 17,973 | Mouse | Retinal cells |
| [37] | GSE81904 | DropSeq | 13,987 | 16,520 | Mouse | Retinal bipolar cells |
| [38] | GSE133382 | 10X v2 | 15,750 | 17,685 | Mouse | Retinal ganglion cells |
| [39] | GSE119945 | sci-RNA-seq3 | 2,058,652 | 26,183 | Mouse | Organogenesis of mouse embryo cells |
In the 10X control dataset [52], we used only sample 1. In the MicrowellSeq control dataset [54], we used the E14 dataset. In the three retinal datasets [33, 37, 38], we used only cells from the largest batch. The FACS-sorted PBMC dataset was assembled by the authors of a recent paper [42], based on a benchmarking dataset published earlier [43]. Numbers of genes and cells are after batch selection (where applicable) and initial gene filtering (see “Methods”). Scripts performing these operations and detailed download instructions for all materials are published in our GitHub repository. The accession numbers refer to archived datasets at the Gene Expression Omnibus (NCBI), the Sequence Read Archive (NCBI), or ArrayExpress (EMBL-EBI).

*Data directly obtained from 10X Genomics.

**The accession number links to the base dataset that the original authors used to construct the ground truth dataset for their paper [42]. To obtain the dataset used here, use the Bioconductor 3.1.3 R package DuoClustering2018 or visit the authors’ website.
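
The two preprocessing steps named in the note above (keeping only the largest batch, then an initial gene filter) might be sketched as follows. This is a minimal illustration in Python with NumPy, not the authors' published scripts: the function names, the toy data, and the `min_cells=5` threshold are all assumptions made for the example, and the actual filtering criterion is the one given in the paper's “Methods”.

```python
import numpy as np


def select_largest_batch(counts, batch_labels):
    """Keep only the cells (rows) that belong to the most frequent batch label."""
    batch_labels = np.asarray(batch_labels)
    labels, sizes = np.unique(batch_labels, return_counts=True)
    largest = labels[np.argmax(sizes)]  # on ties, the first label in sorted order wins
    return counts[batch_labels == largest], largest


def filter_genes(counts, min_cells=5):
    """Keep genes (columns) detected in at least `min_cells` cells.

    `min_cells` is a hypothetical threshold chosen for illustration only.
    """
    cells_per_gene = (counts > 0).sum(axis=0)
    keep = cells_per_gene >= min_cells
    return counts[:, keep], keep


# Toy UMI count matrix: 200 cells x 50 genes, two batches of unequal size.
rng = np.random.default_rng(0)
counts = rng.poisson(0.3, size=(200, 50))
counts[:, 0] = 0  # a gene that is never detected, so the filter must drop it
batches = np.array(["a"] * 120 + ["b"] * 80)

batch_counts, largest = select_largest_batch(counts, batches)
filtered, keep = filter_genes(batch_counts, min_cells=5)
print("largest batch:", largest)
print("cells x genes after batch selection:", batch_counts.shape)
print("cells x genes after gene filtering:", filtered.shape)
```

The sketch uses a dense array for readability. For a dataset on the scale of the organogenesis data [39], a sparse matrix representation would likely be needed instead.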