Analytic Pearson residuals for normalization of single-cell RNA-seq UMI data

Table 2 Overview of UMI datasets used for analysis

Reference	Accession no.	Protocol	Cells	Genes	Species	Description
33k PBMC	*	10X v1	33,148	16,809	Human	Peripheral blood mononuclear cells
[43]	SRP073767**	10X v1	3,994	15,715	Human	FACS-sorted PBMC cells
[52]	E-MTAB-5480	10X v2	2,000	13,025	Human	Droplets of bulk RNA solution
[53]	GSM1599501	inDrop	953	25,025	Human	Droplets of bulk RNA solution
[54]	GSM2906413	MicrowellSeq	9,994	15,069	Mouse	Non-differentiating stem cells
[33]	GSE63472	DropSeq	24,769	17,973	Mouse	Retinal cells
[37]	GSE81904	DropSeq	13,987	16,520	Mouse	Retinal bipolar cells
[38]	GSE133382	10X v2	15,750	17,685	Mouse	Retinal ganglion cells
[39]	GSE119945	sci-RNA-seq3	2,058,652	26,183	Mouse	Organogenesis of mouse embryo cells

In the 10X control dataset [52], we used only sample 1. In the MicrowellSeq control dataset [54], we used the E14 dataset. In the three retinal datasets [33, 37, 38], we used only cells from the largest batch. The FACS-sorted PBMC dataset was assembled by the authors of a recent paper [42], based on a benchmarking dataset published earlier [43]. The numbers of cells and genes are those remaining after batch selection (where applicable) and initial gene filtering (see “Methods”).

Scripts performing these operations and detailed download instructions for all materials are published in our GitHub repository at http://www.github.com/berenslab/umi-normalization. The accession numbers refer to archived datasets at the Gene Expression Omnibus (NCBI), the Sequence Read Archive (NCBI), or ArrayExpress (EMBL-EBI).

*Data obtained directly from 10X Genomics at https://support.10xgenomics.com/single-cell-gene-expression/datasets/1.1.0/pbmc33k.

**The accession number links to the base dataset that the original authors used to construct the ground truth dataset for their paper [42]. To obtain the dataset used here, use the Bioconductor 3.1.3 R package DuoClustering2018 or visit the authors’ website (http://imlspenticton.uzh.ch/robinson_lab/DuoClustering2018/).
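The two preprocessing steps mentioned above (keeping only the largest batch and applying an initial gene filter) can be illustrated with a minimal sketch. It is not taken from the repository scripts: the function names `select_largest_batch` and `filter_genes`, the toy data, and the threshold `min_cells=5` are assumptions made for illustration, and the actual filtering criterion is the one given in “Methods”.

```python
import numpy as np

def select_largest_batch(counts, batch_labels):
    """Keep only the cells (rows) that belong to the most frequent batch label."""
    batch_labels = np.asarray(batch_labels)
    labels, sizes = np.unique(batch_labels, return_counts=True)
    largest = labels[np.argmax(sizes)]
    keep = batch_labels == largest
    return counts[keep], largest

def filter_genes(counts, min_cells=5):
    """Keep genes (columns) with a nonzero count in at least `min_cells` cells.

    `min_cells` is a hypothetical threshold chosen for this sketch.
    """
    cells_per_gene = np.count_nonzero(counts, axis=0)
    keep = cells_per_gene >= min_cells
    return counts[:, keep], keep

# Toy cells-by-genes UMI count matrix with two batches (illustrative data only).
rng = np.random.default_rng(0)
counts = rng.poisson(0.3, size=(200, 50))
counts[:, :3] = 0  # three genes that are never detected
batches = np.array(["a"] * 120 + ["b"] * 80)

batch_counts, largest = select_largest_batch(counts, batches)
filtered, kept_genes = filter_genes(batch_counts, min_cells=5)
print(batch_counts.shape, filtered.shape)
```

Applying the batch selection before the gene filter means that detection counts are computed only over the retained cells, which matches the order in which the two steps are listed above.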

ISSN: 1474-760X