
Table 2 Short description of methods for the imputation of missing data in scRNA-seq data

From: Eleven grand challenges in single-cell data science

A: model-based imputation

bayNorm: Binomial model, empirical Bayes prior [47]
BISCUIT: Gaussian model of log counts, cell- and cluster-specific parameters [48]
CIDR: Decreasing logistic model (DO), non-linear least-squares regression (imp) [49]
SAVER: NB model, Poisson LASSO regression prior [50]
scImpute: Mixture model (DO), non-negative least-squares regression (imp) [51]
scRecover: ZINB model (DO identification only) [52]
VIPER: Sparse non-negative regression model [53]
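To make the model-based idea concrete, the sketch below shrinks each observed count toward a gene-wise empirical-Bayes estimate under a Gamma-Poisson model, loosely in the spirit of the methods above. It is a hypothetical toy, not an implementation of bayNorm, SAVER, or any other listed tool; the function name eb_poisson_gamma_impute and all modeling choices are illustrative assumptions.

```python
import numpy as np

def eb_poisson_gamma_impute(counts, size_factors=None):
    """Toy empirical-Bayes imputation: shrink each gene's counts toward
    its mean via a Gamma-Poisson model (posterior mean of the rate).
    Simplified illustration only, not any published method."""
    counts = np.asarray(counts, dtype=float)          # cells x genes
    if size_factors is None:
        size_factors = counts.sum(1) / counts.sum(1).mean()
    norm = counts / size_factors[:, None]
    mu = norm.mean(0)                                 # gene-wise prior mean
    var = norm.var(0, ddof=1)
    # Method-of-moments Gamma prior (shape a, rate b); guard var <= mu
    excess = np.maximum(var - mu, 1e-8)
    a = mu**2 / excess
    b = mu / excess
    # Posterior mean of rate given count y and exposure s: (y + a) / (s + b)
    return (counts + a) / (size_factors[:, None] + b)

rng = np.random.default_rng(0)
y = rng.poisson(rng.gamma(2.0, 1.5, size=(50, 20)))
print(eb_poisson_gamma_impute(y).shape)               # (50, 20)
```

The key design point shared with the category: zeros are not treated as ground truth but as noisy draws from a count model, so the returned estimate moves them toward a prior informed by the rest of the data.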

B: data smoothing

DrImpute: k-means clustering of PCs of correlation matrix [54]
knn-smooth: k-nearest neighbor smoothing [55]
LSImpute: Locality sensitive imputation [56]
MAGIC: Diffusion across nearest neighbor graph [57]
netSmooth: Diffusion across PPI network [58]
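Data smoothing reduces to borrowing strength from similar cells. Below is a minimal k-nearest-neighbor smoothing sketch, assuming Euclidean distances on log-normalized counts; it illustrates the category, not the published knn-smooth algorithm (which refines neighbors over successive smoothing steps). The function name knn_smooth_toy and the distance and normalization choices are assumptions.

```python
import numpy as np
from scipy.spatial.distance import cdist

def knn_smooth_toy(counts, k=10):
    """Toy kNN smoothing: replace each cell's profile by the average over
    its k nearest cells (including itself), found in log-normalized space."""
    counts = np.asarray(counts, dtype=float)              # cells x genes
    tot = np.maximum(counts.sum(1, keepdims=True), 1.0)   # avoid div by zero
    logx = np.log1p(counts / tot * 1e4)
    d = cdist(logx, logx)                                 # pairwise distances
    nn = np.argsort(d, axis=1)[:, :k]                     # k nearest per cell
    return counts[nn].mean(axis=1)                        # average raw counts

rng = np.random.default_rng(1)
x = rng.poisson(1.0, size=(100, 30))
print(knn_smooth_toy(x, k=5).shape)                       # (100, 30)
```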

C: data reconstruction, matrix factorization

ALRA: SVD with adaptive thresholding [59]
ENHANCE: Denoising PCA with aggregation step [60]
scRMD: Robust matrix decomposition [61]
consensus NMF: Meta-analysis approach to NMF [62]
f-scLVM: Sparse Bayesian latent variable model [63]
GPLVM: Gaussian process latent variable model [64]
pCMF: Probabilistic count matrix factorization with Poisson model [65]
scCoGAPS: Extension of NMF [66]
SDA: Sparse decomposition of arrays (Bayesian) [67]
ZIFA: ZI factor analysis [68]
ZINB-WaVE: ZINB factor model [69]
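Matrix-factorization methods impute by assuming the expression matrix is approximately low-rank, so a truncated reconstruction fills in dropouts from the shared structure. A bare-bones sketch using truncated SVD follows; unlike ALRA, it omits the adaptive thresholding of reconstructed values, and the fixed rank and the name lowrank_impute are illustrative assumptions.

```python
import numpy as np

def lowrank_impute(x, rank=10):
    """Toy matrix-factorization imputation: reconstruct the matrix from its
    top singular vectors; dropouts are filled in by the low-rank structure."""
    x = np.asarray(x, dtype=float)                     # cells x genes
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    recon = (u[:, :rank] * s[:rank]) @ vt[:rank]       # rank-k reconstruction
    return np.maximum(recon, 0.0)                      # clamp negatives

rng = np.random.default_rng(2)
x = rng.poisson(2.0, size=(80, 40)).astype(float)
x[rng.random(x.shape) < 0.3] = 0.0                     # simulate dropout
print(lowrank_impute(x, rank=5).shape)                 # (80, 40)
```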

C: data reconstruction, machine learning

AutoImpute: AE, no error back-propagation for zero counts [70]
BERMUDA: AE for cluster batch correction (MMD and MSE loss function) [71]
DeepImpute: AE, parallelized on gene subsets [72]
DCA: Deep count AE (ZINB / NB model) [73]
DUSC / DAWN: Denoising AE (PCA determines hidden layer size) [74]
EnImpute: Ensemble learning consensus of other tools [75]
Expression Saliency: AE (Poisson negative log-likelihood loss function) [76]
LATE: Non-zero value AE (MSE loss function) [77]
Lin_DAE: Denoising AE (imputation across k-nearest neighbor genes) [78]
SAUCIE: AE (MMD loss function) [79]
scScope: Iterative AE [80]
scVAE: Gaussian-mixture VAE (NB / ZINB / ZIP model) [81]
scVI: VAE (ZINB model) [82]
scvis: VAE (objective function based on latent variable model and t-SNE) [83]
VASC: VAE (denoising layer; ZI layer, double-exponential and Gumbel distribution) [84]
Zhang_VAE: VAE (MMD loss function) [85]
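The machine-learning variants of data reconstruction train an autoencoder whose bottleneck plays the role of the low-rank factorization above. The sketch below is a deliberately plain MSE autoencoder on a dense log-normalized matrix; the published tools differ exactly where it is naive, e.g., NB/ZINB likelihood losses (DCA, scVI) or excluding zeros from the loss (AutoImpute, LATE). The class and function names and all layer sizes are hypothetical.

```python
import numpy as np
import torch
from torch import nn

class AEImputer(nn.Module):
    """Toy autoencoder for denoising log-normalized expression."""
    def __init__(self, n_genes, n_latent=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(n_genes, 128), nn.ReLU(),
                                 nn.Linear(128, n_latent))
        self.dec = nn.Sequential(nn.Linear(n_latent, 128), nn.ReLU(),
                                 nn.Linear(128, n_genes), nn.Softplus())
    def forward(self, x):
        return self.dec(self.enc(x))   # reconstruction = imputed matrix

def train_impute(logx, epochs=200, lr=1e-3):
    x = torch.as_tensor(np.asarray(logx), dtype=torch.float32)
    model = AEImputer(x.shape[1])
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(epochs):
        opt.zero_grad()
        loss = nn.functional.mse_loss(model(x), x)   # MSE on all entries
        loss.backward()
        opt.step()
    with torch.no_grad():
        return model(x).numpy()

rng = np.random.default_rng(3)
logx = np.log1p(rng.poisson(2.0, size=(64, 50)))
print(train_impute(logx, epochs=50).shape)           # (64, 50)
```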

T: using external information

ADImpute: Gene regulatory network information [86]
netSmooth: PPI network information [58]
SAVER-X: Transfer learning with atlas-type resources [87]
SCRABBLE: Matched bulk RNA-seq data [88]
TRANSLATE: Transfer learning with atlas-type resources [77]
URSM: Matched bulk RNA-seq data [89]
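Category T methods bring in information external to the dataset itself. One simple instance of this idea is diffusing expression over a supplied gene-gene network (e.g., a PPI or regulatory network), in the spirit of netSmooth and ADImpute. The random-walk-with-restart sketch below assumes the external network arrives as a 0/1 adjacency matrix; the restart parameter, iteration count, and the name network_smooth are illustrative assumptions, not any tool's actual interface.

```python
import numpy as np

def network_smooth(expr, adj, alpha=0.5, n_iter=20):
    """Toy network smoothing: diffuse expression over an external
    gene-gene network so each gene borrows from its neighbors.
    'adj' (genes x genes, 0/1) is the external information."""
    expr = np.asarray(expr, dtype=float)      # cells x genes
    adj = np.asarray(adj, dtype=float)
    deg = np.maximum(adj.sum(0), 1.0)
    w = adj / deg                             # column-normalized walk matrix
    f = expr.copy()
    for _ in range(n_iter):                   # random walk with restart
        f = alpha * f @ w + (1 - alpha) * expr
    return f

rng = np.random.default_rng(4)
adj = (rng.random((30, 30)) < 0.1).astype(float)
adj = np.maximum(adj, adj.T)                  # symmetrize
np.fill_diagonal(adj, 0.0)                    # no self-loops
expr = rng.poisson(1.0, size=(100, 30)).astype(float)
print(network_smooth(expr, adj).shape)        # (100, 30)
```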

  1. Imputation methods using only data from within a dataset are roughly categorized into approaches A (model-based), B (data smoothing), and C (data reconstruction), with the latter further differentiated into matrix factorization and machine learning approaches. In contrast to these methods, those in category T (for transfer learning) also use information external to the dataset being analyzed
  2. AE autoencoder, DO dropout, imp imputation, MMD maximum mean discrepancy, MSE mean squared error, NB negative binomial, NMF non-negative matrix factorization, P Poisson, PC principal component, PCA principal component analysis, PPI protein-protein interaction, SVD singular value decomposition, VAE variational autoencoder, ZI zero-inflated