
Table 2 Clarification of zero-related terminology

From: Statistics or biology: the zero-inflation controversy about scRNA-seq data

In the current scRNA-seq literature, much ambiguity exists in the use of terms including “dropouts”, “excess zeros”, and “zero inflation” to describe the prevalence of zeros in scRNA-seq data [94]. We clarify the three terms by summarizing their various uses in the scRNA-seq field to facilitate our discussion.

The term “dropout” (or “dropouts”) is widely used to describe the prevalence of zeros in scRNA-seq data. It was first introduced in the SCDE method paper: “dropout describes zero gene expression for the genes that show moderate or high expressions in only a proportion of cells [38]”. Hence, dropouts, as a data-driven concept, are not equivalent to either biological or non-biological zeros. Nevertheless, the use of “dropouts” in later papers became inconsistent and confusing: most papers meant non-biological zeros [20, 36, 40, 52, 55, 95, 96]; some meant non-biological zeros and low expression measurements [45, 97]; and some meant all zeros [46, 47, 98]. In addition, “dropout” was often used as an adjective to indicate the existence of many zeros [99]. These inconsistent uses of “dropouts” are emphasized in a recent work [94]. To avoid possible confusion, we will not use “dropout” or “dropouts” in the following text.

The term “excess zeros” is used in various ways: some papers used it to refer to the larger proportion of zeros in scRNA-seq data than in bulk RNA-seq data [40]; some meant non-biological zeros [45, 96]; and some meant the additional zeros that cannot be explained by the negative binomial (NB) model [97]. To avoid confusion, we will not use “excess zeros” in the following text.

Zero inflation, unlike the first two terms, is a statistical concept that depends on a specified model, i.e., a count distribution such as the Poisson distribution or the NB distribution [95]. It refers to a proportion of zeros that exceeds what is expected under the specified model [40]. We will use “zero inflation” in the following discussion because its definition is unambiguous.
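
To make the model dependence of zero inflation concrete, the following minimal sketch compares the observed proportion of zeros in one gene's counts with the proportion expected under a Poisson model and under an NB model. It is an illustration under stated assumptions, not a method from the cited papers: the function names and the toy counts are hypothetical, the Poisson mean is the sample mean, and the NB size (dispersion) parameter is estimated by a simple method of moments rather than by maximum likelihood.

```python
import math

def poisson_zero_prob(mean):
    # P(X = 0) for a Poisson distribution with the given mean.
    return math.exp(-mean)

def nb_zero_prob(mean, size):
    # P(X = 0) for a negative binomial with the given mean and size
    # parameter, where variance = mean + mean**2 / size.
    return (size / (size + mean)) ** size

def zero_inflation_summary(counts):
    # Hypothetical helper: observed vs. model-expected zero proportions.
    n = len(counts)
    mean = sum(counts) / n
    var = sum((c - mean) ** 2 for c in counts) / (n - 1)
    observed = sum(1 for c in counts if c == 0) / n
    expected_poisson = poisson_zero_prob(mean)
    if var > mean:
        # Method-of-moments estimate of the NB size parameter (assumption).
        size = mean ** 2 / (var - mean)
        expected_nb = nb_zero_prob(mean, size)
    else:
        # No overdispersion: fall back to the Poisson expectation.
        expected_nb = expected_poisson
    return {
        "observed": observed,
        "expected_poisson": expected_poisson,
        "expected_nb": expected_nb,
    }

# Toy counts for one gene across ten cells (made up for illustration).
counts = [0, 0, 0, 0, 0, 0, 1, 2, 3, 10]
result = zero_inflation_summary(counts)
for name, value in result.items():
    print(f"{name}: {value:.3f}")
```

For these toy counts, the observed zero proportion (0.6) far exceeds the Poisson expectation (about 0.20) but is close to the NB expectation (roughly 0.57). The same data would therefore be called zero-inflated relative to the Poisson model yet hardly so relative to the NB model, which is why the term is only meaningful once the model is specified.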