From: Statistical and machine learning methods for spatially resolved transcriptomics data analysis
Name | Algorithms | Application scenarios | Advantages | Disadvantages |
---|---|---|---|---|
SpatialDWLS [23] | Weighted least squares | Spatial decomposition | Higher accuracy and faster than benchmarked tools | High bias in estimating the proportion of rare cell types |
SPOTlight [24] | Seeded NMF regression | Spatial decomposition | High accuracy across multiple tissues | Does not incorporate capture location information to model spatial decomposition |
RCTD [25] | Poisson distribution with MLE | Spatial decomposition | Systematically models platform effect | Assumes that platform effects are shared among cell types |
stereoscope [26] | Negative binomial distribution with MAP | Spatial decomposition | Utilizes complete expression profiles rather than selected marker genes to achieve a higher accuracy | Requires deep sequencing depth |
DSTG [27] | Semi-supervised GCN | Spatial decomposition | Higher accuracy than benchmarked tools | Highly dependent on the quality of the link graph that models the GCN |
ProximID [28] | Cluster label permutations | Cell-cell/gene-gene interactions | Does not require physically separating the cells in FISH images | Cannot detect interactions between cells that are not physically attached
MISTy [29] | Multi-view framework to dissect effects related to CCI | Cell-cell/gene-gene interactions | 1. Does not require cell type annotation 2. Utilizes complete expression profiles | The extracted interactions cannot be directly considered as causal |
stLearn [30] | A toolbox containing integrated algorithms from multiple studies | 1. Cell-cell/gene-gene interactions 2. Spatial clustering 3. Cell trajectory inference | A streamlined package from raw inputs to in-depth downstream analysis | Only compatible with certain ST platforms
SVCA [31] | Gaussian processes | Cell-cell/gene-gene interactions | Is applicable to both RNA-seq and proteomic data | Does not account for technology-specific noise |
GCNG [32] | GCN | Cell-cell/gene-gene interactions | Can infer novel CCIs and predict novel functional genes | The hyperparameters need to be re-optimized when applied to different datasets |
Seurat V3 [33] | Analysis pipelines with integrated algorithms | 1. Gene imputation 2. Spatial location reconstruction for scRNA-seq data 3. Others | 1. A comprehensive data analysis pipeline 2. Can be applied to multi-omics datasets, including transcriptomic, epigenomic, proteomic, and spatially resolved single-cell data | Only available for certain types of ST platforms |
LIGER [34] | Integrative NMF | 1. Gene imputation 2. Spatial location reconstruction for scRNA-seq data | The embeddings maintain both common and dataset-specific terms | Memory intensive compared to benchmarked tools |
SpaGE [35] | Domain adaptation model to align ST and scRNA-seq data to a common space | 1. Gene imputation 2. Spatial location reconstruction for scRNA-seq data | Less memory usage and faster than benchmarked tools in large datasets | Only common genes in both datasets are included in the model |
stPlus [36] | Autoencoder model for dimensional reduction to map ST and scRNA-seq data into a shared space | Gene imputation | 1. Higher accuracy than benchmarked tools in cell type clustering 2. Less time and memory usage than most benchmarked tools other than SpaGE [35] when applied to large datasets | Only applicable to data from image-based sequencing platforms |
gimVI [37] | Variational autoencoders for dimensional reduction to map ST and scRNA-seq data into a shared space | 1. Gene imputation 2. Dimensional reduction and feature extraction | Generates platform-specific patterns in the model for better biological interpretability | Slower than benchmarked tools in large datasets |
Harmony [38] | Maximum diversity clustering and mixture model based batch correction | 1. Gene imputation 2. Spatial location reconstruction for scRNA-seq data | Can impute low-abundance genes with high accuracy | The embeddings lack biological interpretability
DEEPsc [39] | ANN | Gene imputation | A system-adaptive method specifically designed for gene imputation | Does not incorporate spatial information into the computation |
Trendsceek [40] | Marked point process | Identify SVGs | Does not need to specify a distribution or a spatial region of interest | Limited to a single gene at a time, computationally intensive |
SpatialDE [41] | Gaussian process regression | Identify SVGs | Can detect both temporal and periodic gene expression patterns for SVG identification | Does not identify spatial regions with distinct expression patterns, computationally intensive |
SPARK [42] | Generalized linear spatial models | 1. Identify SVGs 2. Spatial location reconstruction for scRNA-seq data | 1. Low false discovery rate 2. Does not require the user to preprocess the raw count matrix | The hyperparameters (kernels and weights) need to be re-optimized when applied to different datasets |
SpaGCN [43] | GCN | 1. Identify SVGs 2. Spatial location reconstruction for scRNA-seq data | Jointly identifies SVGs and spatial domains | Does not incorporate cell type information and tissue anatomical structure into the computation |
SPARK-X [44] | Non-parametric covariance test | 1. Identify SVGs 2. Spatial location reconstruction for scRNA-seq data | Less time and memory usage and lower false discovery rate than most benchmarked tools, especially in large-scale and sparse ST data | Accuracy varies on different similarity measurements and covariance functions |
sepal [45] | Diffusion model | 1. Identify SVGs 2. Spatial location reconstruction for scRNA-seq data | Can detect genes with irregular spatial patterns | Has CPU parallelization, but no GPU acceleration |
GLISS [46] | Graph Laplacian-based model | 1. Identify SVGs 2. Spatial location reconstruction for scRNA-seq data | Does not need to make distributional assumptions for either spatial or scRNA-seq data | Requires pre-specified landmark genes either manually or through other algorithms |
Zhu et al. [47] | HMRF | 1. Profile localized gene expression pattern 2. Identify SVGs 3. Identify interactions between cell type and spatial environment | Can identify de novo spatially associated subpopulations | Only available for in situ hybridization datasets |
BayesSpace [48] | Bayesian statistical method | 1. Profile localized gene expression pattern to enhance ST data resolution 2. Spatial clustering | Does not require independent single-cell data | Only considers the neighborhood structure present in data from ST and Visium platforms |
Bergenstråhle et al. [49] | Deep generative model | Gene expression prediction from histology images | Can infer gene expression at the transcriptome-wide level from histology images | Only applicable to in situ RNA capturing technologies
Seurat V1 [50] | L1-constrained linear model | 1. Spatial location reconstruction for scRNA-seq data 2. Gene imputation | The idea of landmark genes allows the use of a small number of genes for spatial location reconstruction | Needs the positions of landmark genes to be pre-computed
CSOmap [51] | Reconstructs cellular spatial organization based on cell-cell affinity by ligand-receptor interactions | 1. Identify cell-cell/gene-gene interactions 2. Spatial location reconstruction for scRNA-seq data | 1. Does not need to predefine the tissue shape for cell-cell interaction inference 2. Does not need to predefine landmark gene sets | The extracted spatial structure is a pseudo-space structure
DistMap [52] | Mapping scores to measure the similarity between spatial and scRNA-seq data | Construct 3D gene expression blueprint for the Drosophila embryo | High accuracy; only 84 in situ reference genes suffice | Gene regulation information could be considered alongside the in situ reference to improve the accuracy of the model
Peng et al. [53] | Spearman rank correlation to measure the similarity between spatial and scRNA-seq data | Spatial location reconstruction for scRNA-seq data | High accuracy with a small number of genes and cells required | No benchmark studies for accuracy comparison |
Achim et al. [54] | Measure correlations between spatial and scRNA-seq data | Spatial location reconstruction for scRNA-seq data | Most cells can be mapped with high confidence with only a small number of marker genes (~50 to 100) | Needs low-quality genes to be filtered before modeling
SpaOTsc [55] | Structured optimal transport model | 1. Spatial location reconstruction for scRNA-seq data 2. Cell-cell/gene-gene interactions 3. Identify gene pairs that potentially regulate each other intercellularly | 1. Most cells can be accurately mapped with only a small number of genes 2. Can identify intercellular gene-gene regulatory information | Does not consider the time delay (including the diffusion time of the ligand or the reaction time of the intracellular cascades) that may occur in cell-cell communication
novoSpaRc [56] | Generalized optimal-transport model | Spatial location reconstruction for scRNA-seq data | Does not need to specify landmark genes for alignment | Accuracy could be improved by using different loss functions
Tangram [57] | Non-convex optimization by deep learning methods for spatial alignment | 1. Spatial location reconstruction for scRNA-seq data 2. Spatial decomposition 3. Gene imputation from histology data | Is compatible with both capture-based and image-based ST data | Histology gene expression prediction is less accurate if cells cannot be segmented in the images |
Cell2location [58] | Hierarchical Bayesian framework | 1. Spatial location reconstruction for scRNA-seq data 2. Spatial decomposition | Capable of inferring the absolute number of cells per cell type for each capture location | Hyperparameters to be pre-specified are often unknown by the user |
SC-MEB [59] | HMRF based on empirical Bayes | Spatial clustering | Faster and more accurate than benchmarked tools, especially in large datasets | The assumption of a fixed hexagonal neighborhood structure in the model may not maintain high accuracy for all ST platforms
STAGATE [60] | Graph attention auto-encoder | 1. Spatial clustering 2. Identify SVGs | Can be applied to three-dimensional ST datasets | The boundary of two sections needs to be further refined
MULTILAYER [61] | Agglomerative clustering of quantile normalized ST data | 1. Spatial clustering 2. Identify SVGs | Higher accuracy than benchmarked tools when applied to data from different ST platforms | Sensitive to ST data with low spatial resolution
HisToGene [62] | Attention-based (vision transformer) model | Gene expression prediction from histology images | Can predict the gene expression in histology images at capture location level | Requires a large number of tissue samples for model training |
STARCH [63] | HMRF and HMM | Infer copy number aberrations | Higher accuracy than benchmarked tools in predicting CNAs in spatial datasets | A limited number of CNV states (deletion, neutral, amplification) are considered |
Giotto [64] | A toolbox containing integrated algorithms from multiple studies | A comprehensive toolbox for ST analysis and visualization | Offers comprehensive pipelines for ST data analysis | Only available for some ST platforms |
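
Several entries in the table (for example Peng et al. [53] and Achim et al. [54]) reconstruct spatial locations by correlating a cell's scRNA-seq profile with reference expression profiles at known positions. The following is a minimal sketch of that general idea using Spearman rank correlation. It is not the implementation of any listed tool: the function names (`ranks`, `spearman`, `map_cell`), the location labels, and the expression values are all hypothetical and chosen only for illustration.

```python
from statistics import mean


def ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman correlation computed as the Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def map_cell(cell_profile, reference):
    """Assign a cell to the reference location with the highest Spearman rho.

    `reference` maps a location label to an expression vector over the same
    (assumed, pre-selected) marker genes, in the same order as `cell_profile`.
    """
    scores = {loc: spearman(cell_profile, prof) for loc, prof in reference.items()}
    best = max(scores, key=scores.get)
    return best, scores[best]


# Hypothetical toy data: four marker genes measured at two reference locations.
reference = {
    "anterior": [9.0, 7.0, 1.0, 0.5],
    "posterior": [0.5, 1.0, 8.0, 9.5],
}
cell = [120, 80, 3, 0]  # raw counts for one dissociated cell (made up)

location, rho = map_cell(cell, reference)
print(location, rho)
```

Because the comparison is rank-based, the cell's raw counts and the reference values do not need to be on the same scale. Published methods add steps this sketch omits, such as marker gene selection, quality filtering, and confidence estimation.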