A benchmark study of deep learning-based multi-omics data fusion methods for cancer
Genome Biology volume 23, Article number: 171 (2022)
Abstract
Background
A fusion method that combines multi-omics data enables a comprehensive study of complex biological processes and highlights the interrelationships of relevant biomolecules and their functions. Driven by high-throughput sequencing technologies, several promising deep learning methods have been proposed for fusing multi-omics data generated from a large number of samples.
Results
In this study, 16 representative deep learning methods are comprehensively evaluated on simulated, single-cell, and cancer multi-omics datasets. For each of the datasets, two tasks are designed: classification and clustering. The classification performance is evaluated by using three benchmarking metrics including accuracy, F1 macro, and F1 weighted. Meanwhile, the clustering performance is evaluated by using four benchmarking metrics including the Jaccard index (JI), C-index, silhouette score, and Davies-Bouldin score. For the cancer multi-omics datasets, the methods’ strength in capturing the association of multi-omics dimensionality reduction results with survival and clinical annotations is further evaluated. The benchmarking results indicate that moGAT achieves the best classification performance. Meanwhile, efmmdVAE, efVAE, and lfmmdVAE show the most promising performance across all complementary contexts in clustering tasks.
Conclusions
Our benchmarking results not only provide a reference for biomedical researchers to choose appropriate deep learning-based multi-omics data fusion methods, but also suggest future directions for the development of more effective multi-omics data fusion methods. The deep learning frameworks are available at https://github.com/zhenglinyi/DL-mo.
Background
Advances in high-throughput techniques have led to an explosion of multi-omics data in biomedical research. Each type of omics data helps researchers to understand complex biological systems from a different perspective, such as genomics, transcriptomics, proteomics, and metabolomics [1]. Researchers have utilized omics data to address key biomedical problems, such as personalized complex disease therapy [2, 3], drug discovery [4, 5], and cancer drug target discovery [6, 7]. Multi-omics data allow researchers to comprehensively understand biological systems from different aspects because the omics types play complementary roles and work together to perform a given biological function. However, multi-omics data are complex, high-dimensional, and heterogeneous [8, 9], and it is challenging to extract valuable knowledge from them. To address this challenge, various methods have been developed, such as multiple kernel learning, Bayesian consensus clustering, machine learning (ML)-based dimensionality reduction, similarity network fusion, and deep learning (DL) methods [10, 11].
Some researchers have reviewed and tested several traditional ML algorithms from a data fusion perspective [10, 12,13,14,15]. Rappoport et al. [10] evaluated methods including multiple kernel learning, Bayesian consensus clustering, ML-based dimension reduction, and similarity network fusion. Tini et al. [15] and Pierre-Jean et al. [14] evaluated methods including Bayesian consensus clustering, ML-based dimension reduction, and similarity network fusion. Cantini et al. tested and discussed nine joint dimensionality reduction methods [12]. Chauvel et al. focused on Bayesian consensus clustering and dimension reduction methods [13]. According to the above evaluations, each of these ML methods performs differently on different datasets and tasks. As a rapidly developing branch of ML, DL utilizes efficient algorithms to process complex, high-dimensional, and heterogeneous data. Compared to traditional ML algorithms, DL can better capture nonlinearities and complex relationships in multi-omics data. However, few benchmark studies comprehensively compare the performance of various DL methods.
This paper evaluated the performance of 16 representative and open-source models from all DL-based data fusion methods on three different types of datasets, i.e., simulated multi-omics datasets, single-cell multi-omics datasets, and cancer multi-omics datasets. These 16 models were grouped into two categories: supervised models (six) and unsupervised models (ten). Accordingly, for each of the datasets, two tasks were designed: classification and clustering. For the simulated and single-cell datasets, samples with ground-truth labels were classified and clustered by the six supervised models and ten unsupervised models, respectively. For the cancer datasets, the supervised DL methods were evaluated in classification tasks on five types of cancer datasets with ground-truth cancer subtypes, while the unsupervised DL methods were evaluated in clustering tasks. Furthermore, the associations of the embeddings with survival and clinical annotations were evaluated. Based on the benchmarking results, we provide recommendations for biologists choosing appropriate methods in different scenarios and give guidelines on methodological improvements for research focusing on algorithm design for multi-omics data fusion.
Results
DL-based multi-omics data fusion methods and benchmarking workflow
DL-based multi-omics data fusion methods aim to learn low-dimensional embeddings from the fusion of multi-omics data for various downstream tasks. According to our investigation, various DL-based data fusion methods can achieve this goal, including the fully connected neural network (FCNN) [16,17,18,19,20], convolutional neural network (CNN) [21,22,23], autoencoder (AE) [24,25,26,27,28,29,30,31,32,33,34], graph neural network (GNN) [35,36,37,38,39], capsule network (CapsNet) [40, 41], generative adversarial network (GAN) [42], and mixed DL-based models for multi-omics data fusion [43, 44]. Most of these models were used in previous publications with different strategies (early or late fusion). In early fusion, the individual omics data are fused first and then input into the DL-based model; in late fusion, each omics dataset is input into a DL-based model first, and the outputs are then fused for downstream tasks. Because of differences in input omics data and downstream tasks, it is difficult to compare these methods directly. To make different methods comparable, in this study, we first extracted the data fusion part from each original model and then compared performance on unified datasets and tasks. We selected multi-omics data fusion methods according to the following two rules: (1) the original models of the selected methods have open-source code, and (2) the original models used multi-omics fusion, and the data fusion part of the original model can be extracted separately so that we can evaluate the data fusion on unified datasets and tasks. To comprehensively evaluate the performance of the models, three types of multi-omics datasets were used in this study: simulated data, single-cell data, and cancer data. Notably, the evaluated models can be grouped into two categories: supervised models and unsupervised models. Therefore, for each of the datasets, two tasks were designed: classification for supervised models and clustering for unsupervised models.
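The early versus late fusion strategies described above can be illustrated with a minimal sketch (hypothetical NumPy code; the layer sizes and the use of random linear projections in place of trained encoders are illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two toy omics matrices: 100 samples, different feature counts.
gene_expr = rng.normal(size=(100, 500))
methylation = rng.normal(size=(100, 300))

def encode(x, out_dim, rng):
    """Stand-in for a trained encoder: a random linear projection."""
    w = rng.normal(size=(x.shape[1], out_dim))
    return x @ w

# Early fusion: concatenate the omics first, then encode the joint matrix.
early_input = np.concatenate([gene_expr, methylation], axis=1)  # (100, 800)
early_embedding = encode(early_input, 10, rng)                  # (100, 10)

# Late fusion: encode each omics separately, then fuse the embeddings.
z_gene = encode(gene_expr, 5, rng)
z_meth = encode(methylation, 5, rng)
late_embedding = np.concatenate([z_gene, z_meth], axis=1)       # (100, 10)

print(early_embedding.shape, late_embedding.shape)
```

Either way, the result is one low-dimensional embedding per sample, which is what the downstream classification and clustering tasks consume.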
Classification performance was evaluated using three benchmarking metrics, namely accuracy, F1 macro, and F1 weighted. Clustering performance was evaluated using four benchmarking metrics, namely the Jaccard index (JI), C-index, silhouette score, and Davies-Bouldin score. Furthermore, for the cancer multi-omics datasets, we additionally evaluated the methods’ ability to capture the association of multi-omics dimensionality reduction results with survival and clinical annotations. These associations reflect both the representational ability and the interpretability of the fused low-dimensional embeddings (Fig. 1, Table 1).
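The three classification metrics can be computed directly with scikit-learn; a minimal sketch on toy labels (the labels themselves are illustrative):

```python
from sklearn.metrics import accuracy_score, f1_score

# Toy predictions for a 3-class problem (illustrative labels only).
y_true = [0, 0, 1, 1, 2, 2, 2, 1]
y_pred = [0, 1, 1, 1, 2, 2, 0, 1]

acc = accuracy_score(y_true, y_pred)
f1_macro = f1_score(y_true, y_pred, average="macro")        # unweighted mean of per-class F1
f1_weighted = f1_score(y_true, y_pred, average="weighted")  # per-class F1 weighted by support
print(acc, f1_macro, f1_weighted)  # acc = 0.75
```

F1 macro treats all classes equally, so it penalizes poor performance on rare classes; F1 weighted reflects the class distribution of the dataset.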
To make the structural differences between models easy to understand, we renamed the models according to their structural characteristics (early or late fusion). The evaluated models are as follows: a late fusion method based on AutoEncoder (lfAE), an early fusion method based on AutoEncoder (efAE), a late fusion method based on Denoising AutoEncoder (lfDAE), an early fusion method based on Denoising AutoEncoder (efDAE), an early fusion method based on Variational AutoEncoder (efVAE), an early fusion method based on Stacked Variational AutoEncoder (efSVAE), an efVAE method whose loss function is a maximum mean discrepancy (efmmdVAE), a late fusion method based on Neural Network (lfNN), an early fusion method based on Neural Network (efNN), a late fusion method based on Convolutional Neural Network (lfCNN), an early fusion method based on Convolutional Neural Network (efCNN), a multi-omics Graph Convolutional Network method (moGCN), and a multi-omics Graph Attention network method (moGAT). To accommodate different evaluation data, this study modified the inputs of the 13 original models proposed by Ma et al. [24], Lee et al. [26], Poirion et al. [28], Guo et al. [29], Zhang et al. [33], Ronen et al. [32], Hira et al. [34], Kuru et al. [20], Preuer et al. [19], Islam et al. [22], Fu et al. [21], Wang et al. [38], and Xing et al. [39], respectively. Specifically, Ma et al. [24] used late fusion AEs on gene expression, miRNA expression, and DNA methylation data to develop a robust model to predict clinical target variables. Lee et al. [26] developed an early fusion AE model that fused gene expression, miRNA expression, DNA methylation, and CNV data to predict lung adenocarcinoma survival rate. Poirion et al. [28] used gene expression, miRNA expression, and DNA methylation as input to predict survival subtypes in bladder cancer using the late fusion DAE algorithm. Guo et al.
[29] fed gene expression, miRNA expression, and CNV data into a novel framework to robustly identify ovarian cancer subtypes using early fusion DAEs. Zhang et al. [33] used the early fusion VAE to classify samples from DNA methylation and gene expression profiles. Ronen et al. [32] developed an early fusion SVAE and used gene expression, miRNA expression, and DNA methylation to classify cancer subtypes. Hira et al. [34] used gene expression, miRNA expression, and DNA methylation as input and improved the loss function of the early fusion VAE to analyze ovarian cancer through patient stratification analysis. The models proposed by Kuru et al. [20], Preuer et al. [19], Islam et al. [22], Fu et al. [21], Wang et al. [38], and Xing et al. [39] are supervised. Kuru et al. [20] and Preuer et al. [19] used gene expression and the chemical structures of drug pairs as input to predict drug synergy; their models are based on the late and early fusion FCNN, respectively. Islam et al. developed a late fusion CNN and used CNV, gene expression, and clinical data to classify molecular subtypes. Fu et al. used variation counts, gene expression, QTANs/QTALs number, and WGCNA module features as the input of an early fusion CNN to predict gene regulation mechanisms. Wang et al. [38] introduced a novel multi-omics data fusion method for biomedical classification, using gene expression, DNA methylation, and miRNA expression data with the corresponding similarity networks as input to train a GCN that generates initial predictions for the category labels. Xing et al. [39] used a gene co-expression network as the input of a GAT for disease diagnosis and prognosis.
Furthermore, for completeness of the evaluation, we designed our own frameworks, including a late fusion method based on Variational AutoEncoder (lfVAE), a late fusion method based on Stacked Variational AutoEncoder (lfSVAE), and an lfVAE method whose loss function is a maximum mean discrepancy (lfmmdVAE).
Evaluation of DL-based multi-omics data fusion methods on simulated datasets
This study first evaluated DL-based multi-omics fusion methods on simulated multi-omics datasets generated using the InterSIM CRAN package [45] (Fig. 2). This package can generate complex and interrelated multi-omics data, including DNA methylation, mRNA gene expression, and protein expression data. One hundred simulated samples with 1000-dimensional features were generated. During generation, the number of clusters among the 100 simulated samples was set to 5, 10, and 15. Furthermore, we generated the clusters under two conditions: all clusters have the same size, or the clusters have variable random sizes. This simulates real application scenarios in which the proportions of samples belonging to each cluster (subtype) may be equal or unequal.
Six supervised DL methods were evaluated in the classification tasks. These six supervised methods are intrinsically designed for sample classification, and they classify the samples into the ground-truth clusters (subtypes). Their performances were then compared based on the classification results. The ten unsupervised DL methods were first applied to fuse the simulated multi-omics data into 5-dimensional, 10-dimensional, and 15-dimensional embeddings; the dimension of the embeddings was set according to the number of clusters in the simulated multi-omics data. Then, the k-means algorithm was adopted to cluster the multi-omics dimensionality reduction results. The resulting sample clusterings were used to compare the performances of the ten unsupervised methods.
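The second stage of the unsupervised pipeline, running k-means on the fused embeddings, can be sketched as follows (the random Gaussian blobs stand in for real fusion-model output; the cluster count matches the ground truth, as in the benchmark):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Toy "fused embeddings": three well-separated Gaussian blobs in 5-D,
# standing in for the output of an unsupervised fusion model.
centers = rng.normal(scale=10.0, size=(3, 5))
embeddings = np.vstack([rng.normal(loc=c, size=(30, 5)) for c in centers])

# Cluster the embeddings; the number of clusters matches the
# ground-truth cluster count, as in the benchmark.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(embeddings)
print(labels.shape)  # one cluster label per sample
```

The resulting labels are then compared against the ground-truth partition (externally, via the JI) or scored on the embedding geometry alone (internally).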
To quantitatively evaluate the performances of the six supervised DL methods, we partitioned each dataset into training and test sets at a ratio of 3:1 and performed 4-fold cross-validation. Three metrics (accuracy, F1 macro, and F1 weighted score) were calculated (see “Methods”). efNN, moGCN, and moGAT achieved the best classification performance, with higher accuracy, F1 macro, and F1 weighted values (Table 2, Additional file 1: Table S1). The performance of the two GNN-based methods (moGCN and moGAT) is remarkable, whereas the two CNN-based methods (efCNN and lfCNN) are less effective on this benchmark. This indicates that applying a CNN with a one-dimensional convolution layer to the input vector may not be optimal for multi-omics data fusion.
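A 4-fold cross-validation loop of this kind (each fold holding out one quarter of the samples, i.e., a 3:1 train/test split) can be sketched with scikit-learn; the data, classifier, and class structure here are illustrative stand-ins for the DL models under evaluation:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 20))
y = np.repeat([0, 1, 2, 3], 25)  # four toy classes
X[y == 1] += 2.0                 # make one class separable so scores vary

# 4-fold cross-validation: each fold holds out 1/4 of samples (a 3:1 split).
scores = []
cv = StratifiedKFold(n_splits=4, shuffle=True, random_state=0)
for train_idx, test_idx in cv.split(X, y):
    clf = LogisticRegression(max_iter=1000).fit(X[train_idx], y[train_idx])
    scores.append(accuracy_score(y[test_idx], clf.predict(X[test_idx])))

print(np.mean(scores))  # average accuracy across the four folds
```

Stratification keeps the class proportions of each fold close to those of the full dataset, which matters for the imbalanced-cluster scenarios discussed later.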
For the clustering tasks, the JI was employed to measure the consistency between the multi-omics fusion-based clusters and the ground-truth clusters. The JI is an external comparison index used to measure the similarity and diversity of sample sets. The value of the JI ranges from 0 to 1, and the higher the value, the better the clustering result. The experimental results show that most methods performed stably across different numbers of clusters (Fig. 3a, Additional file 1: Table S2). Most methods obtained higher JI values when all clusters had the same size than when the clusters had variable random sizes. Most of the methods performed reasonably well in the different simulated scenarios (JI > 0.6), except for the SVAE methods. According to the JI, efAE, efDAE, and efVAE are overall the best-performing methods; they are among the top three methods in 6/6, 5/6, and 3/6 simulated scenarios, respectively.
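The paper does not spell out the exact JI variant used for comparing clusterings; a common pair-counting formulation, which is invariant to cluster label renaming, is sketched below (a hedged reconstruction, not the authors' code):

```python
from itertools import combinations

def jaccard_index(labels_true, labels_pred):
    """Pair-counting Jaccard index between two clusterings.

    A pair of samples counts as 'together' in a clustering when both
    share a cluster label; the index is |pairs together in both| /
    |pairs together in at least one clustering|.
    """
    n = len(labels_true)
    same_true = {(i, j) for i, j in combinations(range(n), 2)
                 if labels_true[i] == labels_true[j]}
    same_pred = {(i, j) for i, j in combinations(range(n), 2)
                 if labels_pred[i] == labels_pred[j]}
    union = same_true | same_pred
    if not union:
        return 1.0
    return len(same_true & same_pred) / len(union)

# Identical clusterings (up to label renaming) give JI = 1.
print(jaccard_index([0, 0, 1, 1], [1, 1, 0, 0]))  # 1.0
# A partially agreeing clustering gives a value strictly between 0 and 1.
print(jaccard_index([0, 0, 1, 1], [0, 0, 0, 1]))  # 0.25
```

Because it scores pairs rather than labels, this formulation needs no matching step between predicted and ground-truth cluster identities.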
In addition to the external comparison index (JI), this paper also employed three internal indices (C-index, silhouette score, and Davies-Bouldin score) to evaluate clustering performance (Fig. 3b,c,d, Additional file 1: Tables S3,S4,S5). Internal indices measure the goodness of a clustering structure without external information [46]. The values of the C-index, silhouette score, and Davies-Bouldin score range from 0 to 1, from −1 to 1, and from 0 to infinity, respectively. Lower C-index and Davies-Bouldin scores indicate better clustering results, whereas a higher silhouette score indicates a better clustering result. According to the C-index, lfmmdVAE achieves the best performance and is among the top three methods in 6/6 scenarios. According to the silhouette score and Davies-Bouldin score, efVAE is the best-performing method. Meanwhile, the two SVAE methods obtain the worst performance according to all three internal indices.
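Two of the three internal indices are available directly in scikit-learn (the C-index is not and would need a separate implementation); a minimal sketch on two synthetic clusters:

```python
import numpy as np
from sklearn.metrics import silhouette_score, davies_bouldin_score

rng = np.random.default_rng(0)

# Two well-separated toy clusters in a 5-D embedding space.
a = rng.normal(loc=0.0, size=(40, 5))
b = rng.normal(loc=8.0, size=(40, 5))
X = np.vstack([a, b])
labels = np.array([0] * 40 + [1] * 40)

sil = silhouette_score(X, labels)     # higher is better, in [-1, 1]
db = davies_bouldin_score(X, labels)  # lower is better, >= 0
print(sil, db)
```

Because both scores are computed from the embedding geometry alone, they can be applied even when, as for the benchmark cancer datasets later, no ground-truth subtype labels are available.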
Evaluation of DL-based multi-omics data fusion methods on single-cell datasets
Applying multi-omics data fusion methods to single-cell multi-omics data helps to systematically explore the heterogeneity of cells [47]. To further benchmark the performances of DL-based multi-omics data fusion methods, it is therefore crucial to evaluate them on single-cell multi-omics data.
The single-cell datasets consist of two omics data types, i.e., single-cell chromatin accessibility data and single-cell gene expression data (Fig. 4), with 49,073 and 207,203 features, respectively. The two omics datasets were obtained from three different cancer cell lines (HTC, HeLa, and K562), for a total of 206 cells [48].
Similar to the evaluation on the simulated multi-omics data above, this study first evaluated the six supervised classification methods on the single-cell dataset. These methods classified the samples into the three cancer cell lines. Their performance was obtained based on 4-fold cross-validation and evaluated using three metrics: accuracy, F1 macro, and F1 weighted score (Table 3). The results are similar to those on the simulated datasets: lfNN, efNN, moGCN, and moGAT all perform very well.
For the clustering tasks, the ten unsupervised DL methods were first applied to fuse the single-cell multi-omics data into two-dimensional embeddings. Then, the k-means algorithm was employed to cluster the multi-omics dimensionality reduction results into three categories. The resulting sample clusterings were used to compare the performances of the ten unsupervised methods. This study adopted the JI, C-index, silhouette score, and Davies-Bouldin score as the clustering evaluation indices. According to the external index (JI), efmmdVAE and efVAE are the best-performing methods (Fig. 5, Additional file 1: Table S6). According to the three internal indices, lfAE, lfDAE, and efmmdVAE achieve good performance. Overall, efmmdVAE and lfAE are among the top three methods for 3/4 evaluation indices, making them the most promising methods on this benchmark.
Evaluation of DL-based multi-omics data fusion methods on cancer datasets
In recent years, the rapid development of high-throughput sequencing technologies has enabled researchers to obtain multi-omics molecular profiles of various cancer types. To better understand the molecular and clinical characteristics of cancers, it is crucial to use multi-omics data fusion methods [49].
This study evaluated DL-based multi-omics fusion methods on The Cancer Genome Atlas (TCGA) cancer multi-omics datasets (Fig. 6). The datasets consist of three omics data types: gene expression, DNA methylation, and miRNA expression. For the classification tasks, we collected five different cancer datasets with ground-truth cancer subtypes from TCGA, including breast cancer (BRCA), glioblastoma (GBM), sarcoma (SARC), lung adenocarcinoma (LUAD), and stomach cancer (STAD). For the clustering tasks, to ensure the authenticity of the evaluation, the data used in this study were obtained from benchmark cancer datasets (http://acgt.cs.tau.ac.il/multi_omic_benchmark/download.html) [10].
Similar to the evaluations on the simulated and single-cell multi-omics data above, the six supervised classification methods were first evaluated on the five cancer datasets with ground-truth cancer subtypes. These methods classified the samples into the ground-truth cancer subtypes (Table 4). Their performance was obtained based on 4-fold cross-validation and evaluated using three metrics: accuracy, F1 macro, and F1 weighted score (see “Methods”). For each cancer dataset, the samples with all three omics data types were selected, yielding 59, 272, 206, 144, and 198 samples for BRCA, GBM, SARC, LUAD, and STAD, respectively. The subtypes for each cancer are listed in Table 4. Among the six supervised methods (Table 5), moGAT obtains the most promising results on BRCA and GBM, while moGCN, lfNN, and efNN achieve the best performance on SARC, LUAD, and STAD, respectively. lfCNN obtains the lowest scores on all three metrics for 3/5 datasets. In this benchmark, the GNN-based methods show great advantages.
In addition, none of the methods perform well on BRCA, because BRCA has significantly fewer samples than the other cancers. For GBM, although it has the largest number of samples among the five cancers, most methods do not achieve good performance. Our investigation found that the subtype labels of GBM may contain some deviations: recent studies suggest that GBM should be classified into three subtypes instead of the four subtypes labeled by TCGA [50,51,52,53].
To further explore how the data size influences the benchmarks, we reduced the amount of data and observed the effects. Specifically, 20%, 40%, 60%, and 80% of the total samples in the original data were randomly selected, and all six methods were evaluated under each amount of data. The results of the data reduction experiment are illustrated in Fig. S1. Except for the two GNN-based methods (moGAT and moGCN), the performance of the methods is impaired as the amount of data decreases. The performance of the two GNN-based methods fluctuates greatly, possibly because their network structure changes substantially as the data change.
For the clustering tasks, the ten unsupervised DL methods were first applied to fuse the cancer multi-omics data into 10-dimensional embeddings. The embedding dimension was set following the work of Bismeijer et al. [54] and Cantini et al. [12]. Then, the k-means algorithm was employed to cluster the multi-omics dimensionality reduction results into several categories. Because the optimal cluster number (the number of ground-truth cancer subtypes) was uncertain, the number of clusters was varied from two to six in this study. The resulting sample clusterings were used to compare the performances of the ten unsupervised methods. This study adopted the C-index, silhouette score, and Davies-Bouldin score as the clustering evaluation indices (Fig. 7a, b, c, Additional file 1: Tables S7, S8, S9). Note that the JI was not used because the benchmark cancer datasets lack information on ground-truth cancer subtypes. Among the ten DL methods, efmmdVAE, efVAE, and lfmmdVAE are the best-performing methods. They are among the top three methods in 42/50, 41/50, and 21/50 datasets according to the C-index, in 47/50, 42/50, and 43/50 according to the silhouette score, and in 46/50, 39/50, and 48/50 according to the Davies-Bouldin score, respectively. In particular, efmmdVAE outperforms the other methods in terms of the C-index, silhouette score, and Davies-Bouldin score on KIRC.
To further evaluate the effect of data fusion by these DL-based methods, we not only used the fused ten-dimensional embeddings for clustering analysis but also evaluated the associations of the embeddings with survival and clinical annotations (Fig. 8). On the one hand, these associations reflect the representational ability of the fused ten-dimensional embeddings; on the other hand, they partly reflect the interpretability of the embeddings.
To evaluate the association of the embeddings with survival, we employed the Cox proportional-hazards regression model and calculated Bonferroni-corrected p-values. The Bonferroni-corrected p-values indicate to what extent an embedding can distinguish differences in population survival. The statistical significance threshold was set to 0.05. The embeddings strongly associated with survival (Bonferroni-corrected p-values smaller than 0.05) are illustrated in Fig. 7d. The more such embeddings a method yields, the better its performance. In fact, survival is a comprehensive clinical characteristic affected by many factors. For example, survival markers of poor prognosis for adrenocortical cancer can be various types of genes, miRNAs, and DNA methylation signatures [55,56,57,58]. Like such markers, the embeddings strongly associated with survival can reflect their impact on survival from various aspects.
Based on the results, we observed that the number of embeddings associated with survival depends not only on the DL method but also on the cancer type. In three cancer types (GBM, KIRC, and SARC), half of the DL methods identify at least one survival-associated embedding. In general, lfVAE and efDAE achieve the best performance; they find embeddings significantly associated with survival in 7/10 and 6/10 cancer types, respectively.
Subsequently, using the same ten-dimensional embeddings described above, we evaluated the association of the embeddings with clinical annotations (Fig. 7e, Additional file 1: Table S10). Four clinical annotations were selected, i.e., “age at initial pathologic diagnosis,” “days to new tumor event after initial treatment,” “gender,” and “history of neoadjuvant treatment.” The Kruskal-Wallis test and the Wilcoxon rank-sum test were adopted to test the significance of the associations of the embeddings with these clinical annotations. Specifically, the Kruskal-Wallis test was used for “age at initial pathologic diagnosis” and “days to new tumor event after initial treatment,” and the Wilcoxon rank-sum test was used for “gender” and “history of neoadjuvant treatment.” Different from the association of the embeddings with survival mentioned above, the association of the embeddings with clinical annotations is expected to be a one-to-one mapping, i.e., one embedding is associated with one clinical annotation. In this way, each embedding is interpretable. Therefore, after obtaining the strong associations through the Kruskal-Wallis and Wilcoxon rank-sum tests, we employed the selectivity score as the evaluation metric [12]. The selectivity score falls within [0, 1]. When each embedding is associated with one and only one clinical annotation, the selectivity score is 1. When all embeddings are associated with all clinical annotations, the selectivity score is 0. The top methods are those with selectivity scores above the average. The results indicate that the average selectivity score of all methods across all cancer types is 0.49. The selectivity scores of all DL methods for kidney cancer are 0. In particular, the selectivity score of lfSVAE is 1 for AML, LIHC, LUSC, and OV; the selectivity score of efSVAE is 1 for AML and OV. Overall, lfSVAE, efSVAE, lfAE, lfDAE, efAE, and lfmmdVAE are among the top methods for 8/10, 6/10, 6/10, 6/10, 6/10, and 6/10 cancer types, respectively.
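The two significance tests are available in SciPy, and a selectivity score can be computed from the resulting binary association matrix; the sketch below uses synthetic data, and the selectivity formula is a hedged reconstruction following Cantini et al. [12], not the authors' exact code:

```python
import numpy as np
from scipy.stats import kruskal, ranksums

rng = np.random.default_rng(0)
n = 120

# One toy embedding and two toy clinical annotations.
embedding = rng.normal(size=n)
age_group = rng.integers(0, 3, size=n)  # 3 groups -> Kruskal-Wallis
gender = rng.integers(0, 2, size=n)     # 2 groups -> Wilcoxon rank-sum
embedding[gender == 1] += 1.5           # inject a real gender association

# Kruskal-Wallis across the three age groups.
_, p_age = kruskal(*[embedding[age_group == g] for g in range(3)])
# Wilcoxon rank-sum between the two gender groups.
_, p_gender = ranksums(embedding[gender == 0], embedding[gender == 1])
print(p_age, p_gender)  # p_gender should be significant

def selectivity(assoc):
    """Selectivity over a binary embedding-by-annotation association matrix,
    following Cantini et al.: (N_a + N_e) / (2 * L), where L is the number of
    associations, N_a the annotations hit at least once, and N_e the
    embeddings hit at least once (a hedged reconstruction)."""
    assoc = np.asarray(assoc, dtype=bool)
    L = assoc.sum()
    if L == 0:
        return 0.0
    n_e = assoc.any(axis=1).sum()  # embeddings with >= 1 association
    n_a = assoc.any(axis=0).sum()  # annotations with >= 1 association
    return (n_a + n_e) / (2 * L)

# A one-to-one embedding/annotation mapping scores 1.
print(selectivity(np.eye(3)))  # 1.0
```

Under this formulation, an all-to-all association matrix scores low rather than exactly zero, which is consistent with the intent (penalizing unselective embeddings) if not necessarily the paper's exact convention.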
These six methods are the best-performing on the ten cancer datasets. Although the two SVAE methods do not perform well in the clustering tasks, they have an advantage in finding meaningful embeddings that are associated with clinical annotations.
Discussion and conclusions
Increasing evidence has shown that multi-omics data analysis plays an important role in a wide spectrum of biomedical research, which has promoted the development of multi-omics data fusion methods. Here, this study systematically evaluated 16 representative DL-based multi-omics fusion methods in three different contexts, i.e., simulated multi-omics datasets, single-cell multi-omics datasets, and cancer multi-omics datasets. For each of the datasets, two tasks were designed: classification and clustering. Meanwhile, various evaluation metrics were employed to assess the models’ performance from different aspects.
When evaluated on the simulated multi-omics datasets, most supervised methods show good performance in the classification tasks, especially efNN, moGCN, and moGAT. The two CNN-based methods (efCNN and lfCNN) are less effective on this benchmark, indicating that applying a CNN with a one-dimensional convolution layer to the input vector may not be suitable for multi-omics data fusion. For the clustering tasks, efAE, lfmmdVAE, and efVAE show the best performance. Similar to the results on the simulated datasets, moGCN and moGAT perform very well on the classification task for the single-cell datasets. In the evaluation of clustering performance on the single-cell dataset, efmmdVAE and lfAE are the most effective methods. Finally, on the cancer data benchmark, moGAT again outperforms the other supervised methods on the classification task. In the clustering evaluation, efmmdVAE, efVAE, and lfmmdVAE achieve the most promising results in most scenarios. When evaluating the associations of the embeddings with survival or clinical annotations, lfVAE and lfSVAE are the most effective. Therefore, for studies of embedding-level information, lfVAE and lfSVAE are worth prioritizing.
Based on the above results, to make our evaluation more objective, we defined a unified score (see “Methods”) and ranked the DL methods accordingly. If a method was evaluated in more than one scenario, its average unified score was used. For the classification tasks, moGAT ranks first on all three multi-omics datasets (Fig. 9). For the clustering tasks, efVAE, lfmmdVAE, and lfAE are the top three methods on the simulated datasets; lfAE, lfDAE, and efmmdVAE are the top three on the single-cell datasets; and efmmdVAE, lfmmdVAE, and efVAE are the top three on the cancer datasets. Overall, GNN-based methods should be prioritized by researchers focusing on classification tasks. GNN-based methods structure the multi-omics data into similarity networks, which capture the correlations among samples. Therefore, both the omics features and the geometric structure of the data can be effectively utilized, which benefits classification performance. When focusing on clustering tasks, efmmdVAE, efVAE, and lfmmdVAE should be prioritized: they behave the most effectively and consistently across all the different benchmarks. These methods learn the probability distribution of the data; they have a layer of data means and standard deviations, which is used to generate new data. This allows better generalization and flexibility of the learned embeddings. They can thereby be valuable tools for researchers interested in applying DL-based multi-omics data fusion methods to various biomedical problems.
Although the state-of-the-art DL-based multi-omics data fusion methods are evaluated comprehensively on three different datasets using various evaluation indices and scenarios, this benchmark study still has limitations. Combining all omics types may introduce noise because there may be information redundancy among different omics data. The compatibility of omics data should be checked to avoid cases in which different omics data are completely discordant. In the future, different combinations of omics data and the selection of a less-redundant set of omics types will be considered in this benchmark study.
Despite the great progress in multi-omics data fusion brought by the above DL methods, there is still room for future improvement from a computational perspective. (1) Dealing with class imbalance. As demonstrated in the evaluation on the simulated datasets, most methods perform better when all clusters have the same size than when the clusters have variable random sizes. Imbalanced classes can impair model performance. Further extensions of DL-based multi-omics data fusion methods could handle class imbalance by applying cost-sensitive learning [59, 60], ensemble learning (e.g., bagging and boosting), etc. (2) Combining AE and GNN. The evaluation results indicate that GNN-based methods achieve good performance. The graph autoencoder (GAE), which combines AE and GNN, has achieved success on many tasks [61,62,63]. Applying GAE-based methods to cancer and single-cell multi-omics data fusion could be promising and is worthy of further exploration. (3) Designing algorithms that accommodate missing observations. Multi-omics data fusion is often accompanied by the absence of samples in one or several omics. Taking more omics types into consideration and using only the samples with all omics data types can lead to a limited sample size. One solution to this problem is to infer the missing features. Based on the observation that different omics are not completely independent and can be correlated, the missing features can be inferred using the complementary information of different omics. For example, in the field of chemoinformatics, Martino et al. designed a model employing a siamese neural network to massively infer missing features for ~800,000 molecules [64]. In addition to missing feature inference, using generative adversarial networks (GANs) to generate data similar to a real dataset is also a promising approach. Since GAN-based algorithms can learn and imitate any distribution of data, Ahmed et al.
[42] employed GAN to fuse two omics data, but this GANbased model can only be applied to specific types of data with explicit interactions (e.g., miRNAmRNA interaction). Although GANbased multiomics data fusion algorithms have some limitations currently, they deserve to be further explored in the application of missing value imputation [65]. (4) Developing explainable DL methods. Most of the stateoftheart DL methods lack interpretation, which is increasingly demanded in the biomedical field. For the “blackbox” of DL models, it is difficult to elucidate the underlying biological mechanisms. One emerging approach is to embed prior biological knowledge into the DL models. Several studies used knowledgeembedded algorithms to provide explanations [66,67,68]. For example, Mao et al. [66] performed dimensionality reduction on highdimensional singlecell transcriptome data and provided explanations by embedding prior knowledge into matrix factorization. Gut et al. [67] proposed a knowledgeembedded VAE by restricting the structure of VAE to mirror genepathway memberships and applied the model to reduce the dimensionality of singlecell RNAseq data. Similarly, developing knowledgeembedded DL methods for multiomics data fusion is promising and can provide new insight into the underlying mechanisms. Based on this comprehensive benchmark study and several potential improvement strategies, more progress can be achieved on multiomics data fusion.
Methods
Presentation of the evaluated models
This study considered p omics matrices X_{i} (i = 1, …, p) with a dimension of m × n_{i} (m samples and n_{i} features). Each sample can be represented by p vectors x_{i} (i = 1, …, p). Note that the original models of all methods in this part can be found in the publications [19,20,21,22, 24, 26, 28, 29, 32,33,34, 38, 39].
FCNN
FCNN is a fully connected neural network and usually consists of an input layer, multiple hidden layers, and an output layer. The neurons in the hidden layers receive the multidimensional input vector X and produce the output y, which can be expressed in Eq. (1).
where y^{1} is the first output vector, W^{1} and b^{1} are parameters that can be learned from the input X, and σ is the activation function. The multilayer neural network can be expressed in Eq. (2).
This study used two types of FCNN with different structures, i.e., efNN and lfNN.
efNN: The p omics vectors are concatenated into one feature vector X. The dimension of X is \({\sum}_{i=1}^p{n}_i\).
The vector X is used as the input of a multilayer neural network for classification; relu is used for the activation function in the middle layers, and softmax is used in the last layer. relu and softmax are expressed in Eq. (4) and Eq. (5), respectively.
where n is the number of features.
Therefore, the middle and last layers can be expressed in Eq. (6) and Eq. (7), respectively.
The overall loss function is the cross-entropy loss L_{ce}, which can be expressed in Eq. (8).
where m is the number of samples, and l is the number of categories.
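As a concrete illustration, the efNN forward pass described above can be sketched with NumPy; the layer sizes, random weights, and two-omics split below are illustrative assumptions, not the benchmarked configuration.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)            # Eq. (4)

def softmax(x):
    e = np.exp(x - x.max())              # shift for numerical stability
    return e / e.sum()                   # Eq. (5)

def cross_entropy(y_true_onehot, y_prob):
    # per-sample term of the cross-entropy loss in Eq. (8)
    return -np.sum(y_true_onehot * np.log(y_prob))

rng = np.random.default_rng(0)
# two omics vectors fused early by concatenation (dimensions assumed)
x = np.concatenate([rng.normal(size=5), rng.normal(size=3)])
W1, b1 = rng.normal(size=(4, 8)), np.zeros(4)    # hidden layer, Eq. (6)
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)    # output layer, Eq. (7)
probs = softmax(W2 @ relu(W1 @ x + b1) + b2)     # class probabilities
```

Training would average `cross_entropy` over the m samples and l categories as in Eq. (8).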
lfNN: Each omics vector x_{i} is used as the input to a subnetwork. The outputs o_{i} of the subnetwork are the intermediate features of each type of omics data. Then, the outputs of multiple neural networks are concatenated into a vector O.
Then, the vector O is used as the input of a multilayer neural network for classification. The evaluation is consistent with that of efNN: the last layer uses the softmax activation function, and the other layers use relu. The loss function is also the cross-entropy loss L_{ce}.
CNN
CNN is a neural network learning framework for image visual computing that simulates the biological mechanism of natural cognition. A typical CNN mainly consists of convolutional layers, activation layers, pooling layers, and fully connected layers. The convolutional layer is mainly composed of multiple convolution kernels. A convolution kernel applies its kernel function to a local image patch; its essence is the discrete convolution between two two-dimensional matrices. The operation principle is as follows:
where s and t are the widths of the convolution kernel in the x and y directions, F is the parameter matrix of the convolution kernel, G is the local image matrix that is operated with the convolution kernel, and k is the size of the convolution kernel.
The pooling layer reduces data dimensionality, thereby decreasing the number of parameters and the amount of computation inside the CNN. Meanwhile, it can prevent the network from overfitting to a certain extent. Usually, the max pooling layer is used in CNN, that is, the maximum value in the local receptive field is taken. Its mathematical description is as follows:
where P is the feature matrix obtained by max pooling, l is the width of the feature map, A is the feature matrix after activation of the convolutional layer, and w is the width of the pooling region.
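The max pooling operation above can be sketched in NumPy; the 2×2 window and stride of 2 are illustrative assumptions.

```python
import numpy as np

def max_pool2d(A, w=2):
    # take the maximum over non-overlapping w x w local receptive fields
    h, v = A.shape[0] // w, A.shape[1] // w
    return A[:h * w, :v * w].reshape(h, w, v, w).max(axis=(1, 3))

A = np.array([[1., 2., 5., 0.],
              [3., 4., 1., 2.],
              [0., 1., 9., 8.],
              [2., 2., 7., 6.]])
P = max_pool2d(A)   # each entry of P is the maximum of one 2x2 field
```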
The one-dimensional convolutional neural network (1D-CNN) is essentially the same as a standard CNN. Although the input of a 1D-CNN has only one dimension, it retains the translation-invariance advantage of CNN for feature recognition. Structurally, a 1D-CNN is almost the same as a CNN: it also includes a series of convolutional layers and pooling layers and outputs the results through the fully connected layer. The difference is that, in the calculation of the convolutional and pooling layers, a 1D-CNN only extracts features from a one-dimensional sequence. The operations of the convolutional layer and the pooling layer in the 1D-CNN are as follows:
This study used two types of CNN with different structures, namely efCNN and lfCNN.
efCNN: It is similar to efNN, but this study added convolutional layers and pooling layers to the network structure. The p omics vectors are concatenated into one feature vector X. After the convolutional and pooling layers, the output features are flattened and fed into a fully connected network to make the final prediction.
lfCNN: It is similar to lfNN. Each omics vector x_{i} is used as the input to a subnetwork. Each subnetwork consists of convolutional layers and pooling layers. The outputs of each subnetwork are flattened, concatenated, and fed into a fully connected neural network to make the final prediction.
moGCN
GCNs are used for omics-specific learning in moGCN, and a GCN is trained for each type of omics data to perform classification tasks. A GCN model takes two inputs. One input is a feature matrix X ∈ ℝ^{n × d}, where n is the number of nodes, and d is the number of input features. The other input is a description of the graph structure, which can be represented as an adjacency matrix A ∈ ℝ^{n × n}. A GCN can be constructed by stacking multiple convolutional layers. Specifically, each layer is defined as:
where H^{(l)} is the input of the lth layer, W^{(l)} is the weight matrix of the lth layer, and σ(·) denotes a nonlinear activation function. To train GCNs effectively, the adjacency matrix A is further modified as:
where \(\hat{D}\) is the diagonal node degree matrix of \(\hat{A}\), and I is the identity matrix.
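The adjacency modification above can be sketched in NumPy: add self-loops to obtain \(\hat{A}=A+I\) and symmetrically normalize with the degree matrix; the small example graph is an illustrative assumption.

```python
import numpy as np

def normalize_adjacency(A):
    # A-hat = A + I (self-loops), then D-hat^{-1/2} A-hat D-hat^{-1/2}
    A_hat = A + np.eye(A.shape[0])
    d = A_hat.sum(axis=1)                     # diagonal of the degree matrix
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_hat @ D_inv_sqrt

A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
A_norm = normalize_adjacency(A)
# one GCN layer would then compute H_next = sigma(A_norm @ H @ W)
```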
The original adjacency matrix A is obtained by calculating the cosine similarity between pairs of nodes, and edges with cosine similarity larger than a threshold ϵ are retained. Specifically, the adjacency between node i and node j in the graph is calculated as:
where x_{i} and x_{j} are the feature vectors of nodes i and j, respectively. \(s\left({x}_i,{x}_j\right)=\frac{x_i{x}_j}{\parallel {x}_i{\parallel}_2\parallel {x}_j{\parallel}_2}\) is the cosine similarity between nodes i and j.
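Building the adjacency matrix from pairwise cosine similarity can be sketched as follows; the threshold value and toy feature vectors are illustrative assumptions.

```python
import numpy as np

def cosine_adjacency(X, eps=0.5):
    # s(x_i, x_j) for all pairs via row-normalized features
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    S = Xn @ Xn.T
    A = (S > eps).astype(float)   # retain edges with similarity above eps
    np.fill_diagonal(A, 0.0)      # drop self-edges in this sketch
    return A

X = np.array([[1., 0.], [0.9, 0.1], [0., 1.]])
A = cosine_adjacency(X)   # nodes 0 and 1 are similar; node 2 is isolated
```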
To perform omics-specific classification, a multilayer GCN is constructed for each omics data type. Specifically, for the ith omics data type, an omics-specific GCN, i.e., GCN_{i}(·), is trained with training data \({X}_{tr}^{(i)}\in {\mathbb{R}}^{n_{tr}\times {d}_i}\) and the corresponding adjacency matrix \({\overset{\sim }{A}}_{tr}^{(i)}\in {\mathbb{R}}^{n_{tr}\times {n}_{tr}}\). The predictions on the training data can be expressed as:
where \({\hat{Y}}_{tr}^{(i)}\in {\mathbb{R}}^{n_{tr}\times c}\), and \({\hat{y}}_j^{(i)}\in {\mathbb{R}}^c\) denotes the jth row in \({\hat{Y}}_{tr}^{(i)}\), which is the predicted label distribution of the jth training sample from the ith omics data type. Therefore, the loss function for GCN_{i}(·) can be expressed as:
where L_{CE}(·) represents the cross-entropy loss function.
In moGCN, VCDN is also utilized to fuse different types of omics data for classification. For simplicity, this paper first demonstrates how to extend VCDN to accommodate three views. For the predicted label distribution of the jth sample from different omics data types \({\hat{y}}_j^{(i)},i=1,2,3\), a cross-omics discovery tensor C_{j} ∈ ℝ^{c × c × c} is constructed, where each entry of C_{j} can be calculated as:
where \({\hat{y}}_{j,a}^{(i)}\) denotes the ath entry of \({\hat{y}}_j^{(i)}\).
Then, the obtained tensor C_{j} is reshaped to a c^{3}-dimensional vector c_{j} and forwarded to VCDN(·) for the final prediction. VCDN(·) is a fully connected network with an output dimension of c. The loss function of VCDN(·) can be represented as:
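The cross-omics discovery tensor for three views can be sketched with an outer product: each entry of C_j multiplies one predicted class probability from each omics. The class count and probability vectors below are illustrative assumptions.

```python
import numpy as np

c = 2   # number of classes (assumed)
y1 = np.array([0.7, 0.3])   # predicted label distributions of sample j
y2 = np.array([0.6, 0.4])   # from the three omics-specific GCNs
y3 = np.array([0.8, 0.2])

# C[a, b, d] = y1[a] * y2[b] * y3[d]
C = np.einsum('a,b,d->abd', y1, y2, y3)
c_vec = C.reshape(c ** 3)   # flattened input vector for VCDN
```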
In summary, the total loss function of moGCN can be expressed as:
where γ is a trade-off parameter between the omics-specific classification loss and the final classification loss from VCDN(·).
moGAT
This study replaced the GCN model in moGCN with the GAT model to obtain a new model; except for the GCN, the other parts of the whole framework remained unchanged. A GAT model also takes two inputs: a feature matrix X ∈ ℝ^{n × d} and an adjacency matrix A ∈ ℝ^{n × n}. Like all attention mechanisms, the GAT calculation consists of two steps. First, the attention coefficient is calculated: for vertex i, the similarity coefficient between each of its neighbors (\(j\in {\mathcal{N}}_i\)) and itself is calculated one by one.
A linear mapping with a shared parameter W increases the dimension of the vertex features, which is a common feature augmentation method. [Wh_{i} ∥ Wh_{j}] concatenates the transformed features of vertices i and j; finally, a(·) maps the concatenated high-dimensional features to a real number. The attention coefficients are then normalized through softmax.
In the second step, the features are weighted and aggregated according to the calculated attention coefficient.
where \({h}_i^{\prime }\) is the new feature output by GAT for each vertex i (fused with neighborhood information).
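The two GAT steps can be sketched for a single vertex: score each neighbor with a([Wh_i ∥ Wh_j]), normalize with softmax, and aggregate. The LeakyReLU on the scores follows the original GAT formulation; the shapes, weights, and the vertex attending over itself plus two neighbors are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
W = rng.normal(size=(4, 3))   # shared linear mapping, 3 -> 4 dimensions
a = rng.normal(size=8)        # attention vector over [Wh_i || Wh_j]
h = rng.normal(size=(3, 3))   # features of vertex 0 and its two neighbors

Wh = h @ W.T
# step 1: un-normalized attention scores for vertex 0 over its neighborhood
scores = np.array([np.concatenate([Wh[0], Wh[j]]) @ a for j in range(3)])
scores = np.where(scores > 0, scores, 0.2 * scores)   # LeakyReLU
alpha = np.exp(scores) / np.exp(scores).sum()         # softmax normalization
# step 2: weighted aggregation of neighbor features
h0_new = (alpha[:, None] * Wh).sum(axis=0)
```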
Autoencoder
An autoencoder is a deep neural network that learns to copy its input to its output. An autoencoder consists of two parts, i.e., an encoder and a decoder, both implemented by neural networks. The encoder and decoder can be expressed in Eq. (25) and Eq. (26), respectively.
where f_{encoder} and f_{decoder} are multilayer neural networks.
efAE: It is similar to efNN, and the p omics vectors are concatenated into one feature vector X. Therefore, the encoder and the decoder can be represented as z = f_{encoder}(X) and X^{′} = f_{decoder}(z), respectively. For the evaluation, relu is used as the activation function in all layers of the encoder and the middle layers of the decoder. tanh is used in the last layer of the decoder. tanh can be expressed in Eq. (27).
The loss function is the MSE loss L_{MSE}, which can be expressed as:
where n is the number of features.
Eventually, the vector z is taken as the multi-omics fusion feature.
lfAE: p AEs are used to perform feature extraction on the p omics vectors. The encoder and decoder can be expressed in Eq. (29) and Eq. (30), respectively.
For each AE, this study set the activation and loss functions the same as those in efAE in our evaluation. z_{i} represents the latent features of each omics. Finally, the latent features z_{i} of each omics were concatenated into the multi-omics fusion feature z_{fusion}.
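The contrast between efAE and lfAE fusion can be sketched as follows: efAE encodes the concatenated omics vector with one encoder, while lfAE encodes each omics separately and concatenates the latent codes. Plain linear maps stand in for the trained relu encoders; all dimensions and weights are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
x1, x2 = rng.normal(size=6), rng.normal(size=4)   # two omics vectors

# efAE: one encoder on the early-fused (concatenated) input
E = rng.normal(size=(3, 10))
z_ef = E @ np.concatenate([x1, x2])               # single fused latent z

# lfAE: one encoder per omics, latents concatenated afterwards
E1, E2 = rng.normal(size=(2, 6)), rng.normal(size=(2, 4))
z_fusion = np.concatenate([E1 @ x1, E2 @ x2])     # z_fusion from z_1, z_2
```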
Denoising autoencoder
Unlike the standard AE, the DAE constructs partially corrupted data by adding noise to the input data and restores it to the original input data through encoding and decoding. The newly generated \(\overset{\sim }{x}\) can be expressed in Eq. (32).
where q_{D} represents the stochastic mapping.
Then, \(\overset{\sim }{x}\) is used as the input of the encoder, and x is used as the reconstructed target of the decoder. The loss function is consistent with that of the standard AE.
efDAE: First, the p omics vectors are concatenated into one feature vector X. Then, noise is added to X by \(\overset{\sim }{X}={q}_D(X)\) to obtain \(\overset{\sim }{X}\). Next, the encoder and the decoder can be represented as \(z={f}_{encoder}\left(\overset{\sim }{X}\right)\) and X^{′} = f_{decoder}(z), respectively. The following steps are the same as those in efAE. Finally, the vector z is taken as the multi-omics fusion feature.
lfDAE: First, noise is added to the p omics vectors x_{i} by \(\tilde{x}_i={q}_D\left({x}_i\right)\) to obtain \(\tilde{x}_i\). Then, p AEs are used to perform feature extraction on the p new vectors with noise. The encoder and decoder can be expressed in Eq. (33) and Eq. (34), respectively.
The following steps are the same as those in efAE. Finally, the latent features z_{i} of each omics are concatenated into the multi-omics fusion feature z_{fusion}.
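One common choice for the stochastic corruption q_D can be sketched as masking noise that randomly zeroes a fraction of the input entries; the corruption rate and the use of masking (rather than, e.g., Gaussian) noise are illustrative assumptions.

```python
import numpy as np

def q_D(x, corruption=0.3, rng=None):
    # stochastic mapping: zero out roughly `corruption` of the entries
    rng = rng or np.random.default_rng(0)
    mask = rng.random(x.shape) > corruption
    return x * mask

x = np.ones(1000)
x_tilde = q_D(x)   # corrupted input fed to the encoder
# the decoder's reconstruction target remains the original, clean x
```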
Variational autoencoder
Compared with AE, VAE has one more constraint; thus, the latent vectors of VAE closely follow a unit Gaussian distribution. The final hidden layer of the encoder is fully connected to two output layers, which represent the mean μ and the standard deviation σ of the Gaussian distribution \(\mathcal{N}\left(\mu, \sigma \right)\) of the latent variable z, given an input sample x. To make the sampling step differentiable and suitable for backpropagation, the reparameterization trick in Eq. (35) is applied.
where ϵ is a random variable sampled from the unit normal distribution \(\mathcal{N}\left(0,I\right)\).
The loss function in VAE consists of two parts, i.e., the reconstruction loss and the latent loss. As in AE, the reconstruction loss is the MSE loss. The latent loss measures how well the latent vectors follow the assumed distribution, using the Kullback-Leibler (KL) divergence.
where L_{KL} is the KL divergence between the learned distribution and a unit Gaussian distribution.
Therefore, the total loss function can be defined in Eq. (38).
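The reparameterization trick and the latent loss can be sketched as follows, using the closed-form KL divergence between a diagonal Gaussian and the unit Gaussian; the latent dimension and parameter values are illustrative assumptions.

```python
import numpy as np

def reparameterize(mu, sigma, rng):
    eps = rng.standard_normal(mu.shape)   # eps ~ N(0, I)
    return mu + sigma * eps               # differentiable sample z

def kl_to_unit_gaussian(mu, sigma):
    # KL(N(mu, sigma^2) || N(0, I)), summed over latent dimensions
    return 0.5 * np.sum(sigma ** 2 + mu ** 2 - 1.0 - np.log(sigma ** 2))

rng = np.random.default_rng(3)
mu, sigma = np.array([0.5, -0.2]), np.array([1.0, 0.8])
z = reparameterize(mu, sigma, rng)        # sampled latent vector
latent_loss = kl_to_unit_gaussian(mu, sigma)
# the total loss adds this latent loss to the MSE reconstruction loss
```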
efVAE: It is similar to efAE, and the p omics vectors are also concatenated into one feature vector X. X is used as the input of the VAE. The mean vector μ and the standard deviation vector σ can be obtained from the encoder. Then, the vector z can be obtained by sampling using Eq. (37). Eventually, the vector z is taken as the multi-omics fusion feature. In our evaluation, the activation and loss functions were set the same as those in efAE.
lfVAE: Similar to lfAE, p VAEs are used to perform feature extraction on the p omics vectors. We can obtain the mean vectors μ_{i} (i = 1, 2, …, p) and the standard deviation vectors σ_{i} (i = 1, 2, …, p) from the encoders. Then, the vectors z_{i} (i = 1, 2, …, p) can be obtained by sampling using Eq. (37). Eventually, the latent features z_{i} of each omics are concatenated into the multi-omics fusion feature z_{fusion}.
efSVAE: SVAE is a stacked VAE model. In SVAE, all hidden layers obey a unit Gaussian distribution. Each hidden layer of the encoder is fully connected to two output layers, which represent the mean μ and the standard deviation σ of the Gaussian distribution. The sampling step is the same as that in VAE. In the evaluation, a multiplier was added to the loss function, similar to β-VAE [69]. The total loss can be expressed in Eq. (39):
where β is initially 0 and is gradually increased by β = β + k until its value reaches 1.
The following steps are the same as those in efVAE. Eventually, the vector z is taken as the multi-omics fusion feature.
lfSVAE: Compared with lfVAE, this model just replaces the VAE with SVAE. The vectors z_{i} (i = 1, 2, …, p) can be obtained by sampling. Finally, the latent features z_{i} of each omics are concatenated into the multi-omics fusion feature z_{fusion}.
efmmdVAE: Unlike the standard VAE, mmdVAE uses the Maximum Mean Discrepancy (MMD) in the loss function instead of the Kullback-Leibler (KL) divergence. The MMD-based regularization term estimates divergence by how "different" the moments of two distributions p(z) and q(z) are. This study used the kernel embedding trick to estimate the MMD between two distributions, as shown in Eq. (40):
where k(z, z^{′}) can be any universal kernel.
The corresponding loss function can be expressed in Eq. (42).
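The kernel-embedding MMD estimate can be sketched as follows, using a Gaussian RBF as the universal kernel k(z, z′); the bandwidth, sample sizes, and the shifted "encoder" samples are illustrative assumptions.

```python
import numpy as np

def rbf(a, b, gamma=1.0):
    # Gaussian RBF kernel matrix between two sample sets
    d2 = ((a[:, None, :] - b[None, :, :]) ** 2).sum(-1)
    return np.exp(-gamma * d2)

def mmd(z_p, z_q):
    # E[k(z, z')] + E[k(z~, z~')] - 2 E[k(z, z~)]
    return rbf(z_p, z_p).mean() + rbf(z_q, z_q).mean() - 2 * rbf(z_p, z_q).mean()

rng = np.random.default_rng(4)
z_prior = rng.standard_normal((200, 2))        # samples from N(0, I)
z_enc = rng.standard_normal((200, 2)) + 2.0    # shifted encoder samples
# the moment mismatch between the two sets yields a clearly positive MMD
```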
In efmmdVAE, one VAE is also used to train on the concatenated omics data. Except for the loss function, which differs from that of efVAE, the other parts are the same. Finally, the vector z is taken as the multi-omics fusion feature.
lfmmdVAE: It is similar to lfVAE, and p VAEs are used to train on the omics data. Finally, the vectors z_{i} (i = 1, 2, …, p) obtained by sampling are concatenated into the multi-omics fusion feature z_{fusion}.
Evaluation metrics
First, a few relevant definitions are introduced:

TP (True Positive) represents the number of samples that are actually positive cases and are determined as positive cases by the classifier.

FP (False Positive) represents the number of samples that are actually negative cases but are determined as positive cases by the classifier.

FN (False Negative) represents the number of samples that are actually positive cases but are determined as negative cases by the classifier.

TN (True Negative) represents the number of samples that are actually negative cases and are determined as negative cases by the classifier.
Accuracy
Accuracy represents the ratio between the correctly predicted samples and the total samples:
F1 macro
The macro algorithm first calculates Precision and Recall for each category and then takes their unweighted average.
F1 weighted
The weighted algorithm is a modified version of the macro algorithm that addresses the macro algorithm's neglect of class imbalance. When calculating Precision and Recall, the Precision and Recall of each category are multiplied by the proportion w_{i} of that category in the total samples.
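The three classification metrics can be sketched from scratch to make the macro/weighted distinction concrete; the toy labels below are illustrative assumptions.

```python
import numpy as np

def f1_scores(y_true, y_pred):
    # per-class F1 from TP/FP/FN, then macro (equal) and weighted (w_i) means
    classes = np.unique(y_true)
    f1, weights = [], []
    for c in classes:
        tp = np.sum((y_pred == c) & (y_true == c))
        fp = np.sum((y_pred == c) & (y_true != c))
        fn = np.sum((y_pred != c) & (y_true == c))
        p = tp / (tp + fp) if tp + fp else 0.0
        r = tp / (tp + fn) if tp + fn else 0.0
        f1.append(2 * p * r / (p + r) if p + r else 0.0)
        weights.append(np.mean(y_true == c))   # class proportion w_i
    return np.mean(f1), np.dot(weights, f1)

y_true = np.array([0, 0, 0, 0, 1, 1])
y_pred = np.array([0, 0, 0, 1, 1, 0])
accuracy = np.mean(y_true == y_pred)           # correct / total
f1_macro, f1_weighted = f1_scores(y_true, y_pred)
```

With this imbalanced toy example the weighted F1 exceeds the macro F1 because the majority class is predicted better.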
Jaccard index
JI is a statistic used to compare the similarity and diversity of two finite sets A and B. It is defined as the size of the intersection of the sets divided by the size of their union. The value of JI lies within [0, 1]; the larger the value of JI, the higher the similarity.
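The definition can be sketched directly with Python sets:

```python
def jaccard_index(A, B):
    # |A ∩ B| / |A ∪ B|, in [0, 1]
    A, B = set(A), set(B)
    return len(A & B) / len(A | B)

ji = jaccard_index({1, 2, 3, 4}, {3, 4, 5})   # intersection 2, union 5
```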
C-index
In the cluster C_{k}, there are \(\frac{n_k\left({n}_k-1\right)}{2}\) pairs of distinct points. N_{W} represents the total number of such pairs:
The total number of pairs of distinct points in the dataset is
The C-index is defined as:
where S_{W} is the sum of the N_{W} distances between all pairs of points inside each cluster; S_{min} is the sum of the N_{W} smallest distances among all pairs of points in the whole dataset (there are N_{T} such pairs, and one takes the sum of the N_{W} smallest values); and S_{max} is the sum of the N_{W} largest distances among all pairs of points in the whole dataset (again taking the sum of the N_{W} largest values among the N_{T} pairs).
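A direct sketch of the C-index, C = (S_W − S_min) / (S_max − S_min), follows; the toy data with two compact, well-separated clusters are an illustrative assumption (they yield the best possible value, 0).

```python
import numpy as np
from itertools import combinations

def c_index(X, labels):
    pairs = list(combinations(range(len(X)), 2))
    d = np.array([np.linalg.norm(X[i] - X[j]) for i, j in pairs])   # N_T pairs
    within = [np.linalg.norm(X[i] - X[j])
              for i, j in pairs if labels[i] == labels[j]]          # N_W pairs
    n_w = len(within)
    d_sorted = np.sort(d)
    s_w = np.sum(within)
    s_min, s_max = d_sorted[:n_w].sum(), d_sorted[-n_w:].sum()
    return (s_w - s_min) / (s_max - s_min)

X = np.array([[0., 0.], [0., 1.], [10., 0.], [10., 1.]])
labels = np.array([0, 0, 1, 1])
score = c_index(X, labels)   # compact, well-separated clusters -> C-index 0
```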
Silhouette score
For a sample, a is the average distance to the other samples in the same category, and b is the average distance to the samples in the nearest different category. The silhouette score of this sample can be written as:
For a sample set, its silhouette score is the average of the silhouette score of all samples. The range of the silhouette score is [−1, 1]. The closer the samples of the same category and the farther the samples of different categories, the higher the value is. A negative value of the silhouette score indicates a poor clustering performance.
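The per-sample score s = (b − a) / max(a, b), averaged over all samples, can be sketched as follows; the toy data are an illustrative assumption.

```python
import numpy as np

def silhouette(X, labels):
    scores = []
    for i in range(len(X)):
        same = [j for j in range(len(X)) if labels[j] == labels[i] and j != i]
        a = np.mean([np.linalg.norm(X[i] - X[j]) for j in same])
        # b: smallest mean distance to any other cluster
        b = min(np.mean([np.linalg.norm(X[i] - X[j])
                         for j in range(len(X)) if labels[j] == lab])
                for lab in set(labels) - {labels[i]})
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))

X = np.array([[0., 0.], [0., 1.], [10., 0.], [10., 1.]])
labels = [0, 0, 1, 1]
s = silhouette(X, labels)   # well-separated clusters give a score near 1
```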
Davies-Bouldin score
For each pair of clusters, the Davies-Bouldin score divides the sum of their average intra-cluster distances by the distance between the two cluster centers; for each cluster, the maximum of this ratio over the other clusters is taken, and the results are averaged. A lower Davies-Bouldin score means a smaller intra-class distance and a larger inter-class distance. The calculation expression is as follows:
where s_{i} represents the average distance between each point of a cluster and the centroid of the cluster, and d_{ij} represents the distance between the centroids of clusters i and j.
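The score can be sketched directly from that definition; the toy data are an illustrative assumption.

```python
import numpy as np

def davies_bouldin(X, labels):
    ks = np.unique(labels)
    cents = np.array([X[labels == k].mean(axis=0) for k in ks])
    # s_i: average distance of each cluster's points to its centroid
    s = np.array([np.mean(np.linalg.norm(X[labels == k] - cents[i], axis=1))
                  for i, k in enumerate(ks)])
    # for each cluster, worst ratio (s_i + s_j) / d_ij over other clusters
    ratios = [max((s[i] + s[j]) / np.linalg.norm(cents[i] - cents[j])
                  for j in range(len(ks)) if j != i) for i in range(len(ks))]
    return float(np.mean(ratios))

X = np.array([[0., 0.], [0., 1.], [10., 0.], [10., 1.]])
labels = np.array([0, 0, 1, 1])
db = davies_bouldin(X, labels)   # compact, distant clusters -> low score
```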
Selectivity score
The selectivity score can be defined as:
where N_{c} is the total number of clinical annotations associated with at least one feature, N_{f} is the total number of features associated with at least one clinical annotation, and L is the total number of associations between clinical annotations and features. When each feature is associated with one and only one clinical annotation, S attains its maximum value of 1; conversely, as the associations become less selective, S approaches its minimum value of 0.
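A sketch of the score follows, assuming the concrete form S = (N_c + N_f) / (2L), which is consistent with the bounds described above (one-to-one associations give S = 1; many-to-many associations push S toward 0). The association lists are illustrative assumptions.

```python
def selectivity(associations):
    # associations: list of (clinical_annotation, feature) pairs; L = len(...)
    n_c = len({a for a, f in associations})   # annotations with >= 1 feature
    n_f = len({f for a, f in associations})   # features with >= 1 annotation
    return (n_c + n_f) / (2 * len(associations))

one_to_one = [("age", "f1"), ("stage", "f2")]
many_to_many = [("age", "f1"), ("age", "f2"), ("stage", "f1"), ("stage", "f2")]
s1 = selectivity(one_to_one)      # each feature matches exactly one annotation
s2 = selectivity(many_to_many)    # every annotation matches every feature
```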
Unified score
Based on the benchmarking results, to make our evaluation more objective, we defined a "unified score" that unifies the results of each indicator into a final comprehensive evaluation indicator. The unified score follows the rank aggregation scheme of the Synapse Challenge (https://doi.org/10.7303/syn6131484). This score is equal to the sum over all normalized ranking measures; it is defined as
where r is the rank of a method for a specific metric (e.g., accuracy, silhouette score, etc.) and N is the total number of methods. Thus, higher scores indicate better performance. If one method was evaluated in more than one scenario (e.g., in the simulated multi-omics datasets, all methods are evaluated on six scenarios: three cluster numbers × same/random size), we used its average unified score.
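The aggregation can be sketched as below, assuming the normalized ranking measure (N − r + 1) / N per metric (an assumption about the exact normalization; the scheme itself is described at the Synapse link above). Rank 1 is best, so higher sums indicate better performance.

```python
def unified_score(ranks, n_methods):
    # ranks: this method's rank for each metric (1 = best)
    return sum((n_methods - r + 1) / n_methods for r in ranks)

# three metrics, five methods: one method ranked 1st, 2nd, 1st
score_best = unified_score([1, 2, 1], 5)
score_worst = unified_score([5, 5, 5], 5)
```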
Availability of data and materials
All datasets used in this study are publicly available. The simulated multi-omics datasets were generated by the R package InterSIM. The single-cell multi-omics datasets are available in our GitHub repository. The benchmark cancer multi-omics datasets were downloaded from http://acgt.cs.tau.ac.il/multi_omic_benchmark/download.html.
All code for the evaluation in this paper is available. To reproduce the experimental results, the following main libraries need to be installed: Python 3.7.0, R 3.5.1, TensorFlow 1.15.0, Scikit-learn 0.20.0, and Jupyter 1.0.0.
All datasets and code are available at https://github.com/zhenglinyi/DLmo [70] (DOI: https://doi.org/10.5281/zenodo.6876344 [71]).
References
Nicholson JK, Wilson ID. Understanding 'global' systems biology: metabonomics and the continuum of metabolism. Nat Rev Drug Discov. 2003;2(8):668–76.
Nativio R, Lan Y, Donahue G, Sidoli S, Berson A, Srinivasan AR, et al. An integrated multi-omics approach identifies epigenetic alterations associated with Alzheimer's disease. Nat Genet. 2020;52(10):1024–35.
Network TCGA. Comprehensive molecular portraits of human breast tumours. Nature. 2012;490(7418):61–70.
Ianevski A, Giri AK, Gautam P, Kononov A, Potdar S, Saarela J, et al. Prediction of drug combination effects with a minimal set of experiments. Nat Mach Intell. 2019;1(12):568–77.
Patel MN, Halling-Brown MD, Tym JE, Workman P, Al-Lazikani B. Objective assessment of cancer genes for drug discovery. Nat Rev Drug Discov. 2012;12(1):35–50.
Huang A, Garraway LA, Ashworth A, Weber B. Synthetic lethality as an engine for cancer drug target discovery. Nat Rev Drug Discov. 2020;19(1):23–38.
O'Neil NJ, Bailey ML, Hieter P. Synthetic lethality and cancer. Nat Rev Genet. 2017;18(10):613–23.
Boehm KM, Khosravi P, Vanguri R, Gao J, Shah SP. Harnessing multimodal data integration to advance precision oncology. Nat Rev Cancer. 2021;22(2):114–26.
Miao Z, Humphreys BD, McMahon AP, Kim J. Multi-omics integration in the age of million single-cell data. Nat Rev Nephrol. 2021;17(11):710–24.
Rappoport N, Shamir R. Multi-omic and multi-view clustering algorithms: review and cancer benchmark. Nucleic Acids Res. 2018;46(20):10546–62.
Franco EF, Rana P, Cruz A, Calderón VV, Azevedo V, Ramos RTJ, et al. Performance comparison of deep learning autoencoders for cancer subtype detection using multi-omics data. Cancers. 2021;13(9):2013.
Cantini L, Zakeri P, Hernandez C, Naldi A, Thieffry D, Remy E, et al. Benchmarking joint multi-omics dimensionality reduction approaches for the study of cancer. Nat Commun. 2021;12(1):124.
Chauvel C, Novoloaca A, Veyre P, Reynier F, Becker J. Evaluation of integrative clustering methods for the analysis of multi-omics data. Brief Bioinform. 2020;21(2):541–52.
Pierre-Jean M, Deleuze JF, Le Floch E, Mauger F. Clustering and variable selection evaluation of 13 unsupervised methods for multi-omics data integration. Brief Bioinform. 2020;21(6):2011–30.
Tini G, Marchetti L, Priami C, Scott-Boyer MP. Multi-omics integration: a comparison of unsupervised clustering methodologies. Brief Bioinform. 2019;20(4):1269–79.
Huang Z, Zhan X, Xiang S, Johnson TS, Helm B, Yu CY, et al. SALMON: survival analysis learning with multi-omics neural networks on breast cancer. Front Genet. 2019;10:166.
Sharifi-Noghabi H, Zolotareva O, Collins CC, Ester M. MOLI: multi-omics late integration with deep neural networks for drug response prediction. Bioinformatics. 2019;35(14):501–9.
Lin Y, Zhang W, Cao H, Li G, Du W. Classifying breast cancer subtypes using deep neural networks based on multi-omics data. Genes. 2020;11(8):888.
Preuer K, Lewis RPI, Hochreiter S, Bender A, Bulusu KC, Klambauer G. DeepSynergy: predicting anticancer drug synergy with Deep Learning. Bioinformatics. 2018;34(9):1538–46.
Kuru HI, Tastan O, Cicek AE. MatchMaker: a deep learning framework for drug synergy prediction. IEEE/ACM Trans Comput Biol Bioinform. 2021.
Fu Y, Xu J, Tang Z, Wang L, Yin D, Fan Y, et al. A gene prioritization method based on a swine multi-omics knowledgebase and a deep learning model. Commun Biol. 2020;3(1):1–11.
Islam MM, Huang S, Ajwad R, Chi C, Wang Y, Hu P. An integrative deep learning framework for classifying molecular subtypes of breast cancer. Comput Struct Biotechnol J. 2020;18:2185–99.
Wu X, Hui H, Niu M, Li L, Wang L, He B, et al. Deep learning-based multi-view fusion model for screening 2019 novel coronavirus pneumonia: a multicentre study. Eur J Radiol. 2020;128:109041.
Ma T, Zhang A. Integrate multi-omics data with biological interaction networks using Multi-view Factorization AutoEncoder (MAE). BMC Genomics. 2019;20(S11):944.
Zhang T, Zhang L, Payne PRO, Li F. Synergistic drug combination prediction by integrating multi-omics data in deep learning models. Methods Mol Biol. 2021;2194:223–38.
Lee TY, Huang KY, Chuang CH, Lee CY, Chang TH. Incorporating deep learning and multi-omics autoencoding for analysis of lung adenocarcinoma prognostication. Comput Biol Chem. 2020;87:107277.
Seal DB, Das V, Goswami S, De RK. Estimating gene expression from DNA methylation and copy number variation: a deep learning regression model for multi-omics integration. Genomics. 2020;112(4):2833–41.
Poirion OB, Chaudhary K, Garmire LX. Deep learning data integration for better risk stratification models of bladder cancer. AMIA Jt Summits Transl Sci Proc. 2018;2018:197–206.
Guo LY, Wu AH, Wang YX, Zhang LP, Chai H, Liang XF. Deep learning-based ovarian cancer subtypes identification using multi-omics data. BioData Min. 2020;13(1):10.
Tong L, Mitchel J, Chatlin K, Wang MD. Deep learning based feature-level integration of multi-omics data for breast cancer patients survival analysis. BMC Med Inform Decis Mak. 2020;20(1):225.
Zuo C, Chen L. Deep-joint-learning analysis model of single cell transcriptome and open chromatin accessibility data. Brief Bioinform. 2021;22(4):bbaa287.
Ronen J, Hayat S, Akalin A. Evaluation of colorectal cancer subtypes and cell lines using deep learning. Life Sci Alliance. 2019;2(6):1–16.
Zhang X, Zhang J, Sun K, Yang X, Dai C, Guo Y. Integrated multi-omics analysis using variational autoencoders: application to pan-cancer classification. IEEE Int Conf Bioinformatics Biomed. 2019:765–9.
Hira MT, Razzaque MA, Angione C, Scrivens J, Sawan S, Sarkar M. Integrated multi-omics analysis of ovarian cancer using variational autoencoders. Sci Rep. 2021;11(1):6265.
Jiang P, Huang S, Fu Z, Sun Z, Lakowski TM, Hu P. Deep graph embedding for prioritizing synergistic anticancer drug combinations. Comput Struct Biotechnol J. 2020;18:427–38.
Hao Z, Wu D, Fang Y, Wu M, Cai R, Li X. Prediction of synthetic lethal interactions in human cancers using multi-view graph autoencoder. IEEE J Biomed Health Inform. 2021;25:4041–51.
Tang X, Luo J, Shen C, Lai Z. Multi-view multichannel attention graph convolutional network for miRNA–disease association prediction. Brief Bioinform. 2021;174:1–12.
Wang T, Shao W, Huang Z, Tang H, Zhang J, Ding Z, et al. MOGONET integrates multi-omics data using graph convolutional networks allowing patient classification and biomarker identification. Nat Commun. 2021;12(1):3445.
Xing X, Yang F, Li H, Zhang J, Zhao Y, Gao M, et al. An interpretable multi-level enhanced graph attention network for disease diagnosis with gene expression data. In: 2021 IEEE International Conference on Bioinformatics and Biomedicine (BIBM); 2021. p. 556–61.
Afshar P, Oikonomou A, Naderkhani F, Tyrrell PN, Plataniotis KN, Farahani K, et al. 3D-MCN: a 3D multi-scale capsule network for lung nodule malignancy prediction. Sci Rep. 2020;10(1):1–11.
Peng C, Zheng Y, Huang DS. Capsule network based modeling of multi-omics data for discovery of breast cancer-related genes. IEEE/ACM Trans Comput Biol Bioinform. 2020;17(5):1605–12.
Ahmed KT, Sun J, Yong J, Zhang W. Multi-omics data integration by generative adversarial network. Bioinformatics. 2022;38(1):179–86.
Kang M, Lee S, Lee D, Kim S. Learning cell-type-specific gene regulation mechanisms by multi-attention based deep learning with regulatory latent space. Front Genet. 2020;11:869.
Chung NC, Mirza B, Choi H, Wang J, Wang D, Ping P, et al. Unsupervised classification of multi-omics data during cardiac remodeling using deep learning. Methods. 2019;166:66–73.
Chalise P, Raghavan R, Fridley BL. InterSIM: Simulation tool for multiple integrative ‘omic datasets’. Comput Methods Prog Biomed. 2016;128:69–74.
Thalamuthu A, Mukhopadhyay I, Zheng X, Tseng GC. Evaluation and comparison of gene clustering methods in microarray analysis. Bioinformatics. 2006;22(19):2405–12.
Lee J, Hyeon DY, Hwang D. Single-cell multi-omics: technologies and data analysis methods. Exp Mol Med. 2020;52(9):1428–42.
Liu L, Liu C, Quintero A, Wu L, Yuan Y, Wang M, et al. Deconvolution of single-cell multi-omics layers reveals regulatory heterogeneity. Nat Commun. 2019;10(1):470.
Heo YJ, Hwa C, Lee GH, Park JM, An JY. Integrative multi-omics approaches in cancer research: from biological networks to clinical subtypes. Mol Cell. 2021;44(7):433–43.
Mao XG, Xue XY, Wang L, Lin W, Zhang X. Deep learning identified glioblastoma subtypes based on internal genomic expression ranks. BMC Cancer. 2022;22(1):86.
Wang Q, Hu B, Hu X, Kim H, Squatrito M, Scarpace L, et al. Tumor evolution of gliomaintrinsic gene expression subtypes associates with immunological changes in the microenvironment. Cancer Cell. 2017;32(1):42–56 e6.
Hu B, Ruan Y, Wei F, Qin G. Identification of three glioblastoma subtypes and a sixgene prognostic risk index based on the expression of growth factors and cytokines. Am J Transl Res. 2020;12(8):4669–82.
Zhang P, Xia Q, Liu L, Li S, Dong L. Current opinion on molecular characterization for GBM classification in guiding clinical diagnosis, prognosis, and therapy. Front Mol Biosci. 2020;7:562798.
Bismeijer T, Canisius S, Wessels LFA. Molecular characterization of breast and lung tumors by integration of multiple data types with functional sparse-factor analysis. PLoS Comput Biol. 2018;14(10):e1006520.
Mizdrak M, Ticinovic Kurir T, Bozic J. The role of biomarkers in adrenocortical carcinoma: a review of current evidence and future perspectives. Biomedicines. 2021;9(2):174.
Jouinot A, Assie G, Libe R, Fassnacht M, Papathomas T, Barreau O, et al. DNA methylation is an independent prognostic marker of survival in adrenocortical cancer. J Clin Endocrinol Metab. 2017;102(3):923–32.
Cherradi N. microRNAs as potential biomarkers in adrenocortical cancer: progress and challenges. Front Endocrinol (Lausanne). 2015;6:195.
Wen Y, Song X, Yan B, Yang X, Wu L, Leng D, et al. Multidimensional data integration algorithm based on random walk with restart. BMC Bioinformatics. 2021;22(1):97.
Zhou ZH, Liu XY. Training cost-sensitive neural networks with methods addressing the class imbalance problem. IEEE Trans Knowl Data Eng. 2006;18(1):63–77.
Khan SH, Hayat M, Bennamoun M, Sohel FA, Togneri R. Cost-sensitive learning of deep feature representations from imbalanced data. IEEE Trans Neural Netw Learn Syst. 2018;29(8):3573–87.
Yang F, Fan K, Song D, Lin H. Graph-based prediction of protein-protein interactions with attributed signed graph embedding. BMC Bioinformatics. 2020;21(1):323.
Karimi M, Hasanzadeh A, Shen Y. Network-principled deep generative models for designing drug combinations as graph sets. Bioinformatics. 2020;36(Suppl_1):i445–i54.
Li H, Sun Y, Hong H, Huang X, Tao H, Huang Q, et al. Inferring transcription factor regulatory networks from single-cell ATAC-seq data based on graph neural networks. Nat Mach Intell. 2022;4(4):389–400.
Bertoni M, Duran-Frigola M, Badia-i-Mompel P, Pauls E, Orozco-Ruiz M, Guitart-Pla O, et al. Bioactivity descriptors for uncharacterized chemical compounds. Nat Commun. 2021;12(1):3932.
Xu Y, Zhang Z, You L, Liu J, Fan Z, Zhou X. scIGANs: single-cell RNA-seq imputation using generative adversarial networks. Nucleic Acids Res. 2020;48(15):e85.
Mao W, Zaslavsky E, Hartmann BM, Sealfon SC, Chikina M. Pathway-level information extractor (PLIER) for gene expression data. Nat Methods. 2019;16(7):607–10.
Gut G, Stark SG, Rätsch G, Davidson NR. pmVAE: learning interpretable single-cell representations with pathway modules. 2021. Preprint at https://biorxiv.org/content/10.1101/2021.01.28.428664v1.
Rybakov S, Lotfollahi M, Theis FJ, Wolf FA. Learning interpretable latent autoencoder representations with annotations of feature sets. 2020. Preprint at https://biorxiv.org/content/10.1101/2020.12.02.401182v1.
Higgins I, Matthey L, Pal A, Burgess C, Glorot X, Botvinick M, et al. beta-VAE: learning basic visual concepts with a constrained variational framework. International Conference on Learning Representations (ICLR). 2017.
Leng D, Zheng L, Wen Y, Zhang Y, Wu L, Wang J, Wang M, Zhang Z, He S, Bo X. A benchmark study of deep learning-based multi-omics data fusion methods for cancer. GitHub. 2022. https://github.com/zhenglinyi/DLmo.
Leng D, Zheng L, Wen Y, Zhang Y, Wu L, Wang J, et al. A benchmark study of deep learning-based multi-omics data fusion methods for cancer. Zenodo. 2022. https://doi.org/10.5281/zenodo.6876344.
Acknowledgements
Not applicable.
Review history
The review history is available as Additional file 2.
Peer review information
Stephanie McClelland was the primary editor of this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.
Funding
This work was supported by the National Natural Science Foundation of China (62103436).
Author information
Authors and Affiliations
Contributions
X.B., S.H., and Z.Z. supervised the work. Y.W. and D.L. designed and wrote the manuscript. L.Z. performed the experiments. Y.Z. and M.W. performed the data preparation. L.W. and J.W. conducted data analysis. S.H. revised the manuscript. All authors read and approved the final manuscript.
Corresponding authors
Ethics declarations
Ethics approval and consent to participate
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Additional information
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary Information
13059_2022_2739_MOESM1_ESM.docx
Additional file 1: Table S1. Performance of six supervised methods in the condition that the clusters have variable random sizes. Table S2. JI of ten unsupervised methods on simulated datasets. The results are presented as mean value of JIs. Table S3. C-index of ten unsupervised methods on simulated datasets. Table S4. Silhouette score of ten unsupervised methods on simulated datasets. Table S5. Davies-Bouldin score of ten unsupervised methods on simulated datasets. Table S6. JI, C-index, silhouette score, and Davies-Bouldin score of ten unsupervised methods on single-cell multi-omics datasets. The JI index is presented as mean value of JIs. Table S7. C-index of ten unsupervised methods on cancer benchmark datasets used in clustering task. Table S8. Silhouette scores of ten unsupervised methods on cancer benchmark datasets used in clustering task. Table S9. Davies-Bouldin scores of ten unsupervised methods on cancer benchmark datasets used in clustering task. Table S10. Selectivity score of ten unsupervised methods on cancer benchmark datasets used in clustering task (selectivity scores greater than the average are bolded). Figure S1. Data reduction experiment on cancer benchmark datasets used in classification task. Accuracy (a), F1 macro (b), F1 weighted (c) of the six unsupervised methods for classification under 20%, 40%, 60%, 80% of the total samples in the original data, respectively.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Leng, D., Zheng, L., Wen, Y. et al. A benchmark study of deep learning-based multi-omics data fusion methods for cancer. Genome Biol 23, 171 (2022). https://doi.org/10.1186/s13059-022-02739-2
Received:
Accepted:
Published:
DOI: https://doi.org/10.1186/s13059-022-02739-2