Table 2 Short description of methods for the imputation of missing data in scRNA-seq data

From: Eleven grand challenges in single-cell data science

A: model-based imputation

| Tool | Approach | Ref. |
| --- | --- | --- |
| bayNorm | Binomial model, empirical Bayes prior | [47] |
| BISCUIT | Gaussian model of log counts, cell- and cluster-specific parameters | [48] |
| CIDR | Decreasing logistic model (DO), non-linear least-squares regression (imp) | [49] |
| SAVER | NB model, Poisson LASSO regression prior | [50] |
| scImpute | Mixture model (DO), non-negative least-squares regression (imp) | [51] |
| scRecover | ZINB model (DO identification only) | [52] |
| VIPER | Sparse non-negative regression model | [53] |
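The common thread in category A is that a statistical model of the counts supplies the value used to fill a presumed dropout. The following is a minimal sketch of that idea only, assuming a genes-by-cells count matrix; the shrinkage toward a global mean is a stand-in for a real empirical Bayes prior, and none of the listed tools (bayNorm, SAVER, scImpute, ...) works this simply.

```python
import numpy as np

def model_based_impute(counts, shrink=0.5):
    """Toy category-A imputation: replace zeros in each gene (row)
    with a per-gene mean estimate shrunk toward the global mean.
    Illustration only; real tools fit full count models (binomial,
    NB, ZINB, mixtures) and identify dropouts explicitly."""
    counts = counts.astype(float)
    global_mean = counts[counts > 0].mean()
    imputed = counts.copy()
    for g in range(counts.shape[0]):
        nonzero = counts[g][counts[g] > 0]
        gene_mean = nonzero.mean() if nonzero.size else global_mean
        # empirical-Bayes-flavored shrinkage toward the global mean
        estimate = shrink * global_mean + (1 - shrink) * gene_mean
        imputed[g][counts[g] == 0] = estimate
    return imputed
```

Observed (non-zero) counts are left untouched; only zeros, the candidate dropouts, receive a model-based estimate.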
B: data smoothing

| Tool | Approach | Ref. |
| --- | --- | --- |
| DrImpute | k-means clustering of PCs of correlation matrix | [54] |
| knn-smooth | k-nearest-neighbor smoothing | [55] |
| LSImpute | Locality-sensitive imputation | [56] |
| MAGIC | Diffusion across nearest-neighbor graph | [57] |
| netSmooth | Diffusion across PPI network | [58] |
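Category B methods share one move: borrow counts from similar cells. A minimal sketch of plain k-nearest-neighbor averaging follows, assuming a cells-by-genes matrix and Euclidean distance; knn-smooth uses stepwise aggregation on transformed counts and MAGIC uses graph diffusion, both more refined than this.

```python
import numpy as np

def knn_smooth(counts, k=2):
    """Toy category-B smoothing: replace each cell's profile (row)
    by the average over itself and its k nearest cells. Sketch of
    the idea only, not any listed tool's algorithm."""
    counts = counts.astype(float)
    n = counts.shape[0]
    # pairwise Euclidean distances between cells
    dist = np.linalg.norm(counts[:, None, :] - counts[None, :, :], axis=2)
    smoothed = np.empty_like(counts)
    for i in range(n):
        neighbors = np.argsort(dist[i])[: k + 1]  # self plus k nearest
        smoothed[i] = counts[neighbors].mean(axis=0)
    return smoothed
```

Smoothing fills zeros with neighborhood averages but also alters non-zero entries, which distinguishes this category from pure dropout imputation.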
C: data reconstruction, matrix factorization

| Tool | Approach | Ref. |
| --- | --- | --- |
| ALRA | SVD with adaptive thresholding | [59] |
| ENHANCE | Denoising PCA with aggregation step | [60] |
| scRMD | Robust matrix decomposition | [61] |
| consensus NMF | Meta-analysis approach to NMF | [62] |
| f-scLVM | Sparse Bayesian latent variable model | [63] |
| GPLVM | Gaussian process latent variable model | [64] |
| pCMF | Probabilistic count matrix factorization with Poisson model | [65] |
| scCoGAPS | Extension of NMF | [66] |
| SDA | Sparse decomposition of arrays (Bayesian) | [67] |
| ZIFA | ZI factor analysis | [68] |
| ZINB-WaVE | ZINB factor model | [69] |
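The factorization methods above all reconstruct the expression matrix from a low-rank decomposition, so that zeros inconsistent with the dominant structure get replaced by low-rank estimates. A bare-bones truncated-SVD sketch follows; ALRA additionally chooses the rank adaptively and thresholds values to preserve biological zeros, which this omits.

```python
import numpy as np

def lowrank_reconstruct(counts, rank=2):
    """Toy category-C reconstruction: keep the top singular
    components and rebuild the matrix. Sketch of the shared idea,
    not ALRA's adaptive procedure."""
    u, s, vt = np.linalg.svd(counts.astype(float), full_matrices=False)
    approx = (u[:, :rank] * s[:rank]) @ vt[:rank]
    return np.clip(approx, 0.0, None)  # expression cannot be negative
```

A matrix that is truly low-rank is reproduced exactly; real count matrices are only approximately low-rank, and the approximation error is what smooths over dropouts.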
C: data reconstruction, machine learning

| Tool | Approach | Ref. |
| --- | --- | --- |
| AutoImpute | AE, no error back-propagation for zero counts | [70] |
| BERMUDA | AE for cluster batch correction (MMD and MSE loss function) | [71] |
| DeepImpute | AE, parallelized on gene subsets | [72] |
| DCA | Deep count AE (ZINB / NB model) | [73] |
| DUSC / DAWN | Denoising AE (PCA determines hidden layer size) | [74] |
| EnImpute | Ensemble learning consensus of other tools | [75] |
| Expression Saliency | AE (Poisson negative log-likelihood loss function) | [76] |
| LATE | Non-zero value AE (MSE loss function) | [77] |
| Lin_DAE | Denoising AE (imputation across k-nearest-neighbor genes) | [78] |
| SAUCIE | AE (MMD loss function) | [79] |
| scScope | Iterative AE | [80] |
| scVAE | Gaussian-mixture VAE (NB / ZINB / ZIP model) | [81] |
| scVI | VAE (ZINB model) | [82] |
| scvis | VAE (objective function based on latent variable model and t-SNE) | [83] |
| VASC | VAE (denoising layer; ZI layer, double-exponential and Gumbel distribution) | [84] |
| Zhang_VAE | VAE (MMD loss function) | [85] |
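The autoencoder-based tools all compress each cell through a narrow hidden layer and use the decoder's reconstruction as the denoised, imputed profile. A deliberately tiny linear autoencoder trained on MSE is sketched below to make the mechanics concrete; the listed tools use deep non-linear networks, and several (DCA, scVI, scVAE) replace MSE with NB/ZINB likelihoods.

```python
import numpy as np

def train_linear_ae(x, hidden=2, steps=500, lr=0.3):
    """Toy category-C autoencoder: encode cells (rows of x) into a
    small hidden layer and decode back, trained by gradient descent
    on MSE. A sketch of the idea only, not any listed tool."""
    rng = np.random.default_rng(0)
    n, g = x.shape
    w_enc = rng.normal(0.0, 0.1, (g, hidden))
    w_dec = rng.normal(0.0, 0.1, (hidden, g))
    for _ in range(steps):
        h = x @ w_enc                        # encode
        err = h @ w_dec - x                  # reconstruction error
        grad_dec = h.T @ err / n             # MSE gradients
        grad_enc = x.T @ (err @ w_dec.T) / n
        w_dec -= lr * grad_dec
        w_enc -= lr * grad_enc
    return x @ w_enc @ w_dec                 # denoised reconstruction
```

The bottleneck forces the reconstruction onto a low-dimensional surface, so zeros that conflict with the learned structure come back as non-zero estimates, much as in the matrix-factorization methods.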
T: using external information

| Tool | Approach | Ref. |
| --- | --- | --- |
| ADImpute | Gene regulatory network information | [86] |
| netSmooth | PPI network information | [58] |
| SAVER-X | Transfer learning with atlas-type resources | [87] |
| SCRABBLE | Matched bulk RNA-seq data | [88] |
| TRANSLATE | Transfer learning with atlas-type resources | [77] |
| URSM | Matched bulk RNA-seq data | [89] |
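Category T differs from A–C by bringing in information from outside the dataset. A minimal sketch of one such source, a matched bulk RNA-seq profile, is shown below, assuming a genes-by-cells matrix; SCRABBLE and URSM solve joint optimization problems over the single-cell and bulk data rather than this direct fill-in.

```python
import numpy as np

def bulk_guided_impute(sc_counts, bulk_mean):
    """Toy category-T imputation: fill zeros in single-cell counts
    using a matched bulk profile rescaled to each cell's sequencing
    depth. Sketch of the external-information idea only."""
    sc = sc_counts.astype(float)
    depth = sc.sum(axis=0)                   # total counts per cell
    bulk_frac = bulk_mean / bulk_mean.sum()  # bulk expression fractions
    fill = np.outer(bulk_frac, depth)        # expected counts per cell
    return np.where(sc == 0, fill, sc)
```

The external profile supplies a prior expectation per gene, which is what distinguishes these methods from within-dataset imputation when the data are too sparse to estimate gene-level structure reliably.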
  1. Imputation methods that use only data from within a dataset are roughly categorized into approaches A (model-based), B (data smoothing), and C (data reconstruction), with the latter further differentiated into matrix factorization and machine learning approaches. In contrast, methods in category T (for transfer learning) also use information external to the dataset being analyzed
  2. AE, autoencoder; DO, dropout; imp, imputation; MMD, maximum mean discrepancy; MSE, mean squared error; NB, negative binomial; NMF, non-negative matrix factorization; P, Poisson; PC, principal component; PCA, principal component analysis; PPI, protein-protein interaction; SVD, singular value decomposition; VAE, variational autoencoder; ZI, zero-inflated