Table 1 A summary of algorithms, application scenarios, advantages, and disadvantages of the reviewed methods

From: Statistical and machine learning methods for spatially resolved transcriptomics data analysis

Name

Algorithms

Application scenarios

Advantages

Disadvantages

SpatialDWLS [23]

Weighted least squares

Spatial decomposition

Higher accuracy and faster than benchmarked tools

High bias in estimating the proportion of rare cell types

SPOTlight [24]

Seeded NMF regression

Spatial decomposition

High accuracy across multiple tissues

Does not incorporate capture location information to model spatial decomposition

RCTD [25]

Poisson distribution with MLE

Spatial decomposition

Systematically models platform effect

Assumes that platform effects are shared among cell types

stereoscope [26]

Negative binomial distribution with MAP

Spatial decomposition

Utilizes complete expression profiles rather than selected marker genes to achieve a higher accuracy

Requires high sequencing depth

DSTG [27]

Semi-supervised GCN

Spatial decomposition

Higher accuracy than benchmarked tools

Highly dependent on the quality of the link graph that models the GCN

ProximID [28]

Cluster label permutations

Cell-cell/gene-gene interactions

Does not require physically separating the cells in FISH images

Cannot detect interactions between cells that are not physically attached

MISTy [29]

Multi-view framework to dissect effects related to CCI

Cell-cell/gene-gene interactions

1. Does not require cell type annotation

2. Utilizes complete expression profiles

The extracted interactions cannot be directly considered as causal

stLearn [30]

A toolbox containing integrated algorithms from multiple studies

1. Cell-cell/gene-gene interactions

2. Spatial clustering

3. Cell trajectory inference

A streamlined package from raw inputs to in-depth downstream analysis

Only compatible with certain ST platforms

SVCA [31]

Gaussian processes

Cell-cell/gene-gene interactions

Is applicable to both RNA-seq and proteomic data

Does not account for technology-specific noise

GCNG [32]

GCN

Cell-cell/gene-gene interactions

Can infer novel CCIs and predict novel functional genes

The hyperparameters need to be re-optimized when applied to different datasets

Seurat V3 [33]

Analysis pipelines with integrated algorithms

1. Gene imputation

2. Spatial location reconstruction for scRNA-seq data

3. Others

1. A comprehensive data analysis pipeline

2. Can be applied to multi-omics datasets, including transcriptomic, epigenomic, proteomic, and spatially resolved single-cell data

Only available for certain types of ST platforms

LIGER [34]

Integrative NMF

1. Gene imputation

2. Spatial location reconstruction for scRNA-seq data

The embeddings maintain both common and dataset-specific terms

Memory intensive compared to benchmarked tools

SpaGE [35]

Domain adaptation model to align ST and scRNA-seq data to a common space

1. Gene imputation

2. Spatial location reconstruction for scRNA-seq data

Less memory usage and faster than benchmarked tools in large datasets

Only common genes in both datasets are included in the model

stPlus [36]

Autoencoder model for dimensional reduction to map ST and scRNA-seq data into a shared space

Gene imputation

1. Higher accuracy than benchmarked tools in cell type clustering

2. Less time and memory usage than most benchmarked tools other than SpaGE [35] when applied to large datasets

Only applicable to data from image-based sequencing platforms

gimVI [37]

Variational autoencoders for dimensional reduction to map ST and scRNA-seq data into a shared space

1. Gene imputation

2. Dimensional reduction and feature extraction

Generates platform-specific patterns in the model for better biological interpretability

Slower than benchmarked tools in large datasets

Harmony [38]

Maximum diversity clustering and mixture model based batch correction

1. Gene imputation

2. Spatial location reconstruction for scRNA-seq data

Can impute low-abundance genes with high accuracy

The embeddings lack biological interpretability

DEEPsc [39]

ANN

Gene imputation

A system-adaptive method specifically designed for gene imputation

Does not incorporate spatial information into the computation

Trendsceek [40]

Marked point process

Identify SVGs

Does not need to specify a distribution or a spatial region of interest

Limited to a single gene at a time; computationally intensive

SpatialDE [41]

Gaussian process regression

Identify SVGs

Can detect both temporal and periodic gene expression patterns for SVG identification

Does not identify spatial regions with distinct expression patterns; computationally intensive

SPARK [42]

Generalized linear spatial models

1. Identify SVGs

2. Spatial location reconstruction for scRNA-seq data

1. Low false discovery rate

2. Does not require the user to preprocess the raw count matrix

The hyperparameters (kernels and weights) need to be re-optimized when applied to different datasets

SpaGCN [43]

GCN

1. Identify SVGs

2. Spatial location reconstruction for scRNA-seq data

Jointly identifies SVGs and spatial domains

Does not incorporate cell type information and tissue anatomical structure into the computation

SPARK-X [44]

Non-parametric covariance test

1. Identify SVGs

2. Spatial location reconstruction for scRNA-seq data

Less time and memory usage and lower false discovery rate than most benchmarked tools, especially in large-scale and sparse ST data

Accuracy varies with the choice of similarity measure and covariance function

sepal [45]

Diffusion model

1. Identify SVGs

2. Spatial location reconstruction for scRNA-seq data

Can detect genes with irregular spatial patterns

Has CPU parallelization, but no GPU acceleration

GLISS [46]

Graph Laplacian-based model

1. Identify SVGs

2. Spatial location reconstruction for scRNA-seq data

Does not need to make distributional assumptions for either spatial or scRNA-seq data

Requires landmark genes to be pre-specified, either manually or through other algorithms

Zhu et al. [47]

HMRF

1. Profile localized gene expression pattern

2. Identify SVGs

3. Identify interactions between cell type and spatial environment

Can identify de novo spatially associated subpopulations

Only available for in situ hybridization datasets

BayesSpace [48]

Bayesian statistical method

1. Profile localized gene expression pattern to enhance ST data resolution

2. Spatial clustering

Does not require independent single-cell data

Only considers the neighborhood structure present in data from ST and Visium platforms

Bergenstråhle et al. [49]

Deep generative model

Gene expression prediction from histology images

Can infer gene expression at the transcriptome-wide level from histology images

Only applicable to in situ RNA-capturing technologies

Seurat V1 [50]

L1-constrained linear model

1. Spatial location reconstruction for scRNA-seq data

2. Gene imputation

The idea of landmark genes allows the use of a small number of genes for spatial location reconstruction

Requires pre-computing the positions of landmark genes

CSOmap [51]

Reconstructs cellular spatial organization based on cell-cell affinity by ligand-receptor interactions

1. Identify cell-cell/gene-gene interactions

2. Spatial location reconstruction for scRNA-seq data

Does not need to predefine the tissue shape for cell-cell interaction inference

Does not need to pre-define landmark gene sets

The extracted spatial structure is a pseudo-space structure

DistMap [52]

Mapping scores to measure the similarity between spatial and scRNA-seq data

Construct 3D gene expression blueprint for the Drosophila embryo

High accuracy with only 84 in situ reference genes

Gene regulation could be incorporated alongside the in situ reference genes to improve the accuracy of the model

Peng et al. [53]

Spearman rank correlation to measure the similarity between spatial and scRNA-seq data

Spatial location reconstruction for scRNA-seq data

High accuracy with a small number of genes and cells required

No benchmark studies for accuracy comparison

Achim et al. [54]

Measure correlations between spatial and scRNA-seq data

Spatial location reconstruction for scRNA-seq data

Most cells can be mapped with high confidence with only a small number of marker genes (~ 50 to 100)

Requires filtering out low-quality genes before modeling

SpaOTsc [55]

Structured optimal transport model

1. Spatial location reconstruction for scRNA-seq data

2. Cell-cell/gene-gene interactions

3. Identify gene pairs that potentially intercellularly regulate each other

1. Most cells can be accurately mapped with only a small number of genes

2. Can identify intercellular gene-gene regulatory information

Does not consider the time delay (including the diffusion time of the ligand or the reaction time of the intracellular cascades) that may occur in cell-cell communication

novoSpaRc [56]

Generalized optimal-transport model

Spatial location reconstruction for scRNA-seq data

Does not need to specify landmark genes for alignment

Accuracy could be improved by using different loss functions

Tangram [57]

Non-convex optimization by deep learning methods for spatial alignment

1. Spatial location reconstruction for scRNA-seq data

2. Spatial decomposition

3. Gene imputation from histology data

Is compatible with both capture-based and image-based ST data

Histology gene expression prediction is less accurate if cells cannot be segmented in the images

Cell2location [58]

Hierarchical Bayesian framework

1. Spatial location reconstruction for scRNA-seq data

2. Spatial decomposition

Capable of inferring the absolute number of cells per cell type for each capture location

Hyperparameters to be pre-specified are often unknown by the user

SC-MEB [59]

HMRF based on empirical Bayes

Spatial clustering

Faster and more accurate than benchmarked tools, especially in large datasets

The assumption of a fixed hexagonal neighborhood structure in the model may not maintain high accuracy for all ST platforms

STAGATE [60]

Graph attention auto-encoder

1. Spatial clustering

2. Identify SVGs

Can be applied to three-dimensional ST datasets

The boundary between two sections needs further refinement

MULTILAYER [61]

Agglomerative clustering of quantile normalized ST data

1. Spatial clustering

2. Identify SVGs

Higher accuracy than benchmarked tools when applied to data from different ST platforms

Sensitive to ST data with low spatial resolution

HisToGene [62]

Attention-based (vision transformer) model

Gene expression prediction from histology images

Can predict the gene expression in histology images at capture location level

Requires a large number of tissue samples for model training

STARCH [63]

HMRF and HMM

Infer copy number aberrations

Higher accuracy than benchmarked tools in predicting CNAs in spatial datasets

A limited number of CNV states (deletion, neutral, amplification) are considered

Giotto [64]

A toolbox containing integrated algorithms from multiple studies

A comprehensive toolbox for ST analysis and visualization

Offers comprehensive pipelines for ST data analysis

Only available for some ST platforms

Abbreviations: MLE, maximum-likelihood estimation; MAP, maximum a posteriori; GCN, graph convolutional network; GNN, graph neural network; NMF, non-negative matrix factorization; PCA, principal components analysis; HMRF, hidden Markov random field; ANN, artificial neural network; MCC, Matthews correlation coefficient; HMM, hidden Markov model; SVG, spatially variable gene; CNA, copy number alteration; CNV, copy number variation; ST, spatial transcriptomics; CCI, cell-cell interaction; FISH, fluorescence in situ hybridization
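
Several rows in Table 1 (for example Peng et al. [53] and Achim et al. [54]) describe spatial location reconstruction by correlating scRNA-seq profiles with spatial reference profiles. The following is a minimal sketch of that general idea, not the implementation used by any of the cited tools. The function names `rank_rows` and `spearman_map`, the toy matrices, and the simple rank computation (which assumes no tied values within a row) are all assumptions made for illustration.

```python
import numpy as np

def rank_rows(x):
    """Rank the values within each row (0 = smallest).

    Assumption: no ties within a row; real data would need tie handling.
    """
    order = np.argsort(x, axis=1)
    ranks = np.empty_like(order, dtype=float)
    rows = np.arange(x.shape[0])[:, None]
    ranks[rows, order] = np.arange(x.shape[1])
    return ranks

def spearman_map(cells, locations):
    """Assign each cell to the location with the highest Spearman correlation.

    cells:     (n_cells, n_genes) expression of shared marker genes
    locations: (n_locations, n_genes) reference expression per location
    Returns the index of the best location per cell and the full
    (n_cells, n_locations) correlation matrix.
    """
    rc = rank_rows(np.asarray(cells, dtype=float))
    rl = rank_rows(np.asarray(locations, dtype=float))
    # Spearman correlation is the Pearson correlation of the ranks:
    # center each row, scale to unit length, then take dot products.
    rc -= rc.mean(axis=1, keepdims=True)
    rl -= rl.mean(axis=1, keepdims=True)
    rc /= np.linalg.norm(rc, axis=1, keepdims=True)
    rl /= np.linalg.norm(rl, axis=1, keepdims=True)
    corr = rc @ rl.T
    return corr.argmax(axis=1), corr

# Toy example: 3 reference locations and 3 cells measured on 4 marker genes.
locations = np.array([[1.0, 2.0, 3.0, 4.0],
                      [4.0, 3.0, 2.0, 1.0],
                      [2.0, 4.0, 1.0, 3.0]])
cells = np.array([[10.0, 20.0, 30.0, 45.0],
                  [2.0, 4.0, 1.0, 3.5],
                  [9.0, 7.0, 5.0, 1.0]])

assigned, corr = spearman_map(cells, locations)
print(assigned)
```

Because only ranks are compared, the mapping is insensitive to monotone differences in scale between the two data types, which is one reason rank correlation is a natural choice when spatial and scRNA-seq measurements come from different platforms.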