Fig. 5 | Genome Biology

Fig. 5

From: MultiMAP: dimensionality reduction and integration of multimodal data

Benchmarking MultiMAP against existing approaches. a Embeddings returned by multi-omic integration methods on different datasets. “X” indicates that the method terminated due to an out-of-memory error (218 GB RAM). b Comparison of each method in terms of transfer learning accuracy (“Transfer”), separation of cell type clusters as quantified by the Silhouette coefficient (“Silhouette”), mixing of different datasets as measured by the fraction of nearest neighbors that belong to a different dataset (“Alignment”), preservation of high-dimensional structure as measured by the Pearson correlation between distances in the high- and low-dimensional spaces (“Structure”), and runtime. c Wall-clock time of multi-omic integration methods on datasets of different sizes. Seurat, LIGER and iNMF produced out-of-memory errors when run on 500,000 data points (218 GB RAM). To produce these datasets, we subsampled the mouse primary cortex scRNA-seq and scATAC-seq data [26] using geometric sketching [31]. For dataset sizes up to 100,000 cells, the data were subsampled so that the scRNA-seq and scATAC-seq data contained equal numbers of cells. Since the scATAC-seq data had 81,196 cells in total, for the 500,000-cell comparison we used an scRNA-seq dataset of 418,804 cells. d Comparison of capabilities and properties of each method. “Mapping” refers to the nature of the mapping employed by the method; “Max no. datasets” refers to the upper limit on the number of datasets accepted by the method; “Scalable to large data” refers to allowing a total of over 500,000 cells; “Dataset-specific features” indicates whether the integration method allows information that is not shared across datasets; and “Dataset influence on integration” indicates whether the user can modulate the weighting of a given dataset relative to the others during the integration.
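
The “Alignment” and “Structure” scores described in panel b can be illustrated with a minimal sketch. This is not the authors' benchmarking code: the function names, the choice of Euclidean distance, the neighborhood size `k`, and the toy coordinates are all assumptions made for illustration, and the published benchmark may normalize or compute these quantities differently. The last lines simply check the cell-count arithmetic from panel c.

```python
import math
from itertools import combinations


def pairwise_distances(points):
    """Euclidean distances for all unordered pairs, in a fixed order."""
    return [math.dist(p, q) for p, q in combinations(points, 2)]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def structure_score(high_dim, low_dim):
    """Correlation between pairwise distances before and after embedding."""
    return pearson(pairwise_distances(high_dim), pairwise_distances(low_dim))


def alignment_score(embedding, dataset_labels, k=1):
    """Mean fraction of each point's k nearest neighbors from another dataset."""
    fractions = []
    for i, point in enumerate(embedding):
        others = [j for j in range(len(embedding)) if j != i]
        others.sort(key=lambda j: math.dist(point, embedding[j]))
        different = sum(dataset_labels[j] != dataset_labels[i] for j in others[:k])
        fractions.append(different / k)
    return sum(fractions) / len(fractions)


# Toy data (hypothetical): the embedding is the high-dimensional data with a
# constant third coordinate dropped and the rest scaled by 2, so all pairwise
# distances are preserved up to a constant factor.
high_dim = [(0, 0, 0), (1, 0, 0), (0, 2, 0), (3, 3, 0)]
low_dim = [(0, 0), (2, 0), (0, 4), (6, 6)]
structure = structure_score(high_dim, low_dim)

# Two tight pairs of points far apart from each other.
embedding = [(0, 0), (0, 1), (10, 0), (10, 1)]
mixed = alignment_score(embedding, ["rna", "atac", "rna", "atac"])
separated = alignment_score(embedding, ["rna", "rna", "atac", "atac"])

# Panel c arithmetic: all scATAC-seq cells plus scRNA-seq cells sum to 500,000.
rna_cells = 500_000 - 81_196
```

In this toy setup, the structure score is close to 1 because distances are preserved up to scale, and the alignment score is high when each point's nearest neighbor comes from the other dataset and low when the datasets form separate clusters. Real benchmarks on hundreds of thousands of cells would need an approximate nearest-neighbor index rather than this quadratic loop.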
