- Open Access
MUON: multimodal omics analysis framework
Genome Biology volume 23, Article number: 42 (2022)
Advances in multi-omics have led to an explosion of multimodal datasets to address questions from basic biology to translation. While these data provide novel opportunities for discovery, they also pose management and analysis challenges, thus motivating the development of tailored computational solutions. Here, we present a data standard and an analysis framework for multi-omics, MUON, designed to organise, analyse, visualise, and exchange multimodal data. MUON stores multimodal data in an efficient yet flexible and interoperable data structure. MUON enables a versatile range of analyses, from data preprocessing to flexible multi-omics alignment.
Multi-omics designs, that is the simultaneous profiling of multiple omics or other modalities for the same sample or cells, have recently gained traction across different biological domains. Multi-omics approaches have been applied to enable new insights in basic biology and translational research [1, 2].
On the one hand, the emerging multi-omics datasets result in novel opportunities for advanced analysis and biological discovery . Critically, however, multi-omics experiments and assays pose considerable computational challenges, both concerning the management and processing as well as the integration of such data [4, 5]. Major challenges include efficient storage, indexing and seamless access of high-volume datasets from disk, the ability to keep track and link biological and technical metadata, and dealing with the dependencies between omics layers or individual features. Additionally, multi-omics datasets need to be converted into specific file formats to satisfy input requirements for different analysis and visualisation tools.
While specialised frameworks for the analysis of different omics data types have been proposed, including for bulk and single-cell RNA-seq [6,7,8,9] or epigenetic variation data [10,11,12,13], there is a lack of comprehensive solutions that specifically address multi-omics designs. Additionally, there currently exists no open exchange format for sharing multi-omics datasets that is accessible from different programming languages. The currently existing solutions for multi-omics data (Seurat , MultiAssayExperiment ) are confined to the R programming language ecosystem, and typically require loading the full dataset, a limitation that prohibits dealing with larger datasets and can only be partially mitigated by using additional third-party software [15, 16].
To address this, we here present MUON (multimodal omics analysis), an analysis framework that is designed from the ground-up to organise, analyse, visualise, and exchange multimodal data. MUON is implemented in Python and comes with an extensive toolbox to flexibly construct, manipulate and analyse multi-omics datasets. At the core of the framework is MuData, an open data structure standard, which is compatible with and extends previous data formats for single omics [9, 17]. MuData files can be seamlessly accessed from different programming languages, including Python , R , and Julia . We illustrate MUON in the context of different vignettes of its application with a major focus on single-cell data, including analysis of combined gene expression and chromatin accessibility assays as well as gene expression and epitope profiling.
MuData: a cross-platform multimodal omics data container
At the core of MUON is MuData (multimodal data)—an open data structure for multimodal datasets. MuData handles multimodal datasets as containers of unimodal data. This hierarchical data model generalises existing matrix-based data formats for single omics, whereby data from each individual omics layer are stored as an AnnData  object (Fig. 1a, c). MuData also provides a coherent structure for storing associated metadata and other side information, both at the level of samples (e.g. cells or individuals) and features (e.g. genes or genomics locations). Metadata tables can either be specific to a single stored data modality, or they can represent joint sample annotations that apply to all modalities stored in a MuData container. In a similar manner, MuData containers can be used to store derived data and analysis outputs, such as cluster labels or an inferred sample embedding (Fig. 1b).
MuData objects are serialised to HDF5  files by default—the industry standard for storing hierarchical data. Individual omics layers are serialised using the existing AnnData serialisation format, thus permitting direct access to single omics using existing toolchains that build on this data standard (Fig. 1c). Basic access to MuData files is possible from all major programming environments that support access to HDF5 array objects. Additionally, MUON comes with dedicated libraries to create, read and write MuData files from Python, R, and Julia. These tools facilitate the exchange of multi-omics data across platforms and ensure consistent file format definitions.
MUON: a framework for multimodal omics data
The MUON framework allows for managing, processing, and visualising multi-omics data using the MuData containers. Existing workflows developed for single-omics can be reused and applied to the contents of a multi-omics container. For example, individual modalities of the simultaneous gene expression and chromatin accessibility profiling  can be processed using existing RNA  and ATAC  workflows. In this manner, canonical processing steps, including quality control, sample filtering, data normalisation and the selection of features for analysis can be transferred from single-omics analysis (Fig. 2a).
The integration of multiple modalities within a MuData container facilitates the definition of multi-omics analysis workflows, allowing to flexibly combine alternative processing steps (from left to right in Fig. 2b). For example, single-omics dimensionality reduction methods such as principal component analysis or factor analysis [25,26,27,28] can be used to separately process RNA-seq and ATAC-seq count matrices. Additionally, MUON comes with interfaces to multi-omics analysis methods that jointly process multiple modalities, including multi-omics factor analysis [29, 30] (MOFA) to obtain lower-dimensional representations, and weighted nearest neighbours  (WNN) to calculate multimodal neighbours. Once the results from either dimensionality reduction strategy are stored in a MUON container, they can be used as input for defining cell neighbour graphs. This graph can be either estimated from individual omics modalities, from a multi-omics representation (e.g. as obtained from MOFA), or by fusing two single-omics neighbour representations (e.g. using methods such as similarity network fusion, SNF , or WNN ).
Finally, the latent or neighbourhood representations can serve as a starting point for downstream analysis and interpretation. For example, uniform manifold approximation and projection (UMAP)  can be directly applied to cell neighbourhood graphs to generate nonlinear embeddings of cells. Similarly, the alternative cell neighbourhood graphs can be used as input for identifying connected components and thereby putative cell types (e.g. using multiplex community detection techniques ).
The flexibility to choose and control individual processing steps in MUON makes it possible to compose tailored workflows for a particular dataset.
Application of MUON to single-cell multi-omics data
To illustrate MUON, we considered data from simultaneous scRNA-seq and scATAC-seq profiling of peripheral blood mononuclear cells (PBMCs), which were generated using the Chromium Single Cell Multiome ATAC + Gene Expression protocol by 10x Genomics . Features in the RNA modality correspond to the expression level of genes, whereas the ATAC modality encodes accessible genomic loci as peaks. MUON supports the application of alternative dimensionality reduction strategies (Fig. 2). For example, multi-omics factors analysis —an approach for integrating different omics modalities based on matrix factorization—yields a lower dimensional representation, including factors that capture variation of individual omics or shared variability (Additional file 1: Fig S1a), which in turn can be interpreted on the level of individual features (Fig. 3a, Additional file 1:Fig S1b). Here, the factors that explain the largest fraction of variance in PBMCs capture canonical biological differences, such as the myeloid—lymphoid axis and cytotoxicity (Fig. 3a, left). These factors capture both variation in mRNA abundance and chromatin accessibility, e.g. as CD3E expression and BCL11B promoter accessibility, which are characteristic for T cells [35, 36] (Fig. 3a, right).
A two-dimensional latent space recapitulating the structure of the data is commonly used for visualising cell type composition, cell-level covariates, or feature counts. For this, it is important that MUON allows to generate, store, and operate with multiple different embeddings constructed for individual modalities (Fig. 3b, left) or jointly for both modalities based, for instance, on the MOFA factors or the WNN graph (Fig. 3b, right). Such visualisations can be generated from MuData objects without loading all the data into memory.
As a second example, we considered CITE-seq  data, which comprise gene expression and epitope abundance information in the same cells. To process the latter, specialised normalisation strategies for denoising and scaling  are available. Normalised protein counts can then be used to define cell types, akin to gating in flow cytometry  (Fig. 3c). Once the count matrices are processed, these can be integrated using alternative multimodal options (Fig. 2). For instance, using both modalities for cell-type annotation as well as for dimensionality reduction allows to attribute the distinction between naïve and memory T cells to the abundance of CD45 isoforms RA and RO at the protein level (Fig. 3d).
Multimodal omics designs are increasingly accessible, allowing for characterising and integrating different dimensions of cellular variation, including gene expression, DNA methylation, chromatin accessibility, and protein abundance [3, 40, 41]. MUON directly addresses the computational needs posed by such multi-omics designs, including data processing, analysis, interpretation, and sharing (Fig. 1). Designed for the Python ecosystem, MUON operates on MuData objects that build on community standards for single-omics analysis . Serialisation to HDF5 makes MuData objects accessible to other programming languages, including R and Julia.
MUON is designed in a modular fashion, which means that existing methods and tools for processing individual omics can be reused to design more complex analysis workflows (Figs. 2 and 3). At the same time, the software facilitates combining single-omics analysis methods with a growing spectrum of multi-omics integration strategies [42, 43] to define novel multi-omics workflows.
Looking ahead, MUON will be a robust platform to build upon and support future developments. On the one hand, handling novel assays for multi-omics that are emerging can be integrated. For example, mRNA and proteins can be assayed together not only with CITE-seq  but also with QuRIE-seq  or INs-seq . Other examples include explicit support for genomic-coordinate based assays  or assays with spatial coordinates . Moreover, trimodal assays such as scNMT-seq  or TEA-seq  allow to generate data beyond just two modalities and can be handled with MUON, which is designed to manage an arbitrary number of modalities. On the other hand, the complexity of experimental designs is rapidly increasing [50, 51]. Already, MUON can take additional covariates into account during multimodal integration, for example, to perform temporally aware factor analysis . Future development of MUON will include incorporating additional relationships in MuData, for example, to explicitly model the dependencies between feature sets across omics, or to account for dependencies between multiple sets of multi-omics experiments.
With MuData proposing a standardised and language-agnostic approach to manage, store, and share multimodal omics data, it is now possible to build methods and tools that can be applied to an increasingly large number of multi-omics datasets. As a multimodal framework, MUON addresses the need for multi-omics analysis workflows that are well integrated into the existing Python ecosystem, in particular with tools for omics analysis such as Scanpy . At the same time, MuData facilitates the compatibility and data exchange with R and Julia.
Implementation of MuData
The reference MuData implementation is written in the Python programming language and builds on AnnData . A MuData object can be cast as a collection of single-omics modalities, each of which is represented as an AnnData object. Additionally, the MuData object provides basic selector operations, including access to individual modalities, subsetting of samples and/or features. When subsetting samples, these are selected in each modality as well as in multimodal annotations; features from different modalities can be used to obtain a MuData object with desired features. As with AnnData, unstructured data can be stored in a MuData object, which can be used for recording assay-specific information. Feature relations across modalities can be stored in the MuData object as a sparse multimodal graph.
MuData objects are serialised to .h5mu files, which are based on HDF5—industry standard for hierarchical storage of numerical data supported by many programming languages . Individual modalities are stored in the file hierarchy in a way compliant with AnnData serialisation, enabling access to individual modalities from disk. Disk backing is implemented for MuData objects so that MuData files can be read without loading count matrices of individual modalities.
Cross-language capabilities of MuData files are demonstrated with Julia and R libraries. Julia library implements native AnnData and MuData objects whereas R libraries create MultiAssayExperiment  or Seurat  objects with information from MuData files. As .h5ad and .h5mu are not the native formats for R frameworks, standards are still to be developed for how to serialise auxiliary information stored in the R object—and, conversely, deserialize this information back from the files.
Implementation of MUON
MUON has been implemented in the Python programming language and builds on a number of existing numerical and scientific open-source libraries, in particular, NumPy , Scipy , Sklearn , Pandas , h5py , AnnData , and Scanpy  for omics data handling, MOFA+  for multimodal data integration and matplotlib  and seaborn  for data visualisation. The weighted nearest neighbours (WNN) method has been implemented following  describing the original method and  describing its generalisation to an arbitrary number of modalities.
Comparison of MuData with alternative data formats
MuData and MUON take inspiration and build on concepts from AnnData  and Scanpy . In fact, the software incorporates ideas and extends it in a modular fashion, similar to the existing practice in the Bioconductor community .
MultiAssay Experiment 
Main programming environment
Objects can contain data out of memory (on disk)
Native serialisation accessible from multiple languages
Native support for I/O operations
Python, Julia, R*
Support for multiple modalities
Support for data missing in some modalities
Support for multimodal embeddings
Processing gene expression and chromatin accessibility data
Single-cell multiome ATAC + gene expression demonstration data for peripheral blood mononuclear cells (PBMCs) from a healthy donor with granulocytes removed through cell sorting processed with ARC 1.0.0 pipeline were provided by 10X Genomics (https://support.10xgenomics.com/single-cell-multiome-atac-gex/datasets). Log-normalisation was used for both gene and peak counts, and respective values for highly variable features scaled and centred to zero mean and unit variance were then used as input to discussed algorithms such as PCA, as implemented in scikit-learn  and scanpy , or MOFA+ . Differentially expressed genes and differentially accessible peaks were identified with respective functionality in scanpy and were used to compile gene lists for cell type identification.
The respective vignettes are available at https://muon-tutorials.readthedocs.io/en/latest/single-cell-rna-atac.
Processing CITE-seq data
CITE-seq data for PBMCs from a healthy donor were provided by 10X Genomics (https://support.10xgenomics.com/single-cell-gene-expression/datasets/3.0.2/5k_pbmc_protein_v3). Log-normalisation was used for gene counts, and dsb  was used to denoise and scale protein counts. Respective values for highly variable features scaled and centred to zero mean and unit variance were then used as input to discussed algorithms. The respective vignettes are available at https://muon-tutorials.readthedocs.io/en/latest/cite-seq.
Availability of Data and Materials
Data on simultaneous scRNA-seq & scATAC-seq profiling  of PBMCs from a healthy donor is available from the 10X Genomics website (10k cells with granulocytes removed through cell sorting, Cell Ranger ARC 1.0.0) .
CITE-seq  data on PBMCs from a healthy donor is available from the 10X Genomics website (5k cells with a panel of TotalSeqTM-B antibodies, v3 Chemistry, Cell Ranger 3.0.2) .
MUON source code is available at https://github.com/scverse/muon  under the BSD3 license. Documentation and tutorials for MUON can be accessed at https://muon.readthedocs.io/ and at https://muon-tutorials.readthedocs.io, respectively.
MuData implementation is available in a standalone Python library at https://github.com/scverse/mudata  under the BSD3 license. Its documentation is accessible at https://mudata.readthdocs.io.
The source code for Julia can be accessed at , for the bioconductor R library—at , for the Seurat R library—at .
The version of the code used in the manuscript is deposited on Zenodo .
Hasin Y, Seldin M, Lusis A. Multi-omics approaches to disease. Genome Biol. 2017;18(1):83. https://doi.org/10.1186/s13059-017-1215-1.
Zhu C, Preissl S, Ren B. Single-cell multimodal omics: the power of many. Nat Methods. 2020;17(1):11–4. https://doi.org/10.1038/s41592-019-0691-5.
Argelaguet R, Cuomo ASE, Stegle O. Marioni JC. Computational principles and challenges in single-cell data integration. Nat Biotechnol. 2021. https://doi.org/10.1038/s41587-021-00895-7.
Conesa A, Beck S. Making multi-omics data accessible to researchers. Sci Data. 2019;6(1):251. https://doi.org/10.1038/s41597-019-0258-4.
Wilkinson MD, Dumontier M, Aalbersberg IJJ, Appleton G, Axton M, Baak A, et al. The FAIR Guiding Principles for scientific data management and stewardship. Sci Data. 2016;3(1):160018. https://doi.org/10.1038/sdata.2016.18.
ATL L, DJ MC, Marioni JC. A step-by-step workflow for low-level analysis of single-cell RNA-seq data with Bioconductor. F1000Res. 2016;2122. https://doi.org/10.12688/f1000research.9501.2.
DJ MC, Campbell KR, ATL L, Wills QF. scater: pre-processing, quality control, normalisation and visualisation of single-cell RNA-seq data in R. Bioinformatics. 33(8):1179–86. https://doi.org/10.1093/bioinformatics/btw777.
Satija R, Farrell JA, Gennert D, Schier AF, Regev A. Spatial reconstruction of single-cell gene expression data. Nat Biotechnol. 2015;33(5):495–502. https://doi.org/10.1038/nbt.3192.
Wolf FA, Angerer P, Theis FJ. SCANPY: large-scale single-cell gene expression data analysis. Genome Biol. 2018;19(1):15. https://doi.org/10.1186/s13059-017-1382-0.
Danese A, Richter ML, Chaichoompu K, Fischer DS, Theis FJ, Colomé-Tatché M. EpiScanpy: integrated single-cell epigenomic analysis. EpiScanpy: integrated single-cell epigenomic analysis. Nat Commun. 2021;12(1):1-8. https://doi.org/10.1038/s41467-021-25131-3.
Stuart T, Srivastava A, Lareau C, Satija R. Multimodal single-cell chromatin analysis with Signac. bioRxiv. 2020.11.09.373613. https://doi.org/10.1101/2020.11.09.373613.
Granja JM, Corces MR, Pierce SE, Bagdatli ST, Choudhry H, Chang HY, Greenleaf WJ. ArchR is a scalable software package for integrative single-cell chromatin accessibility analysis. Nat Genet. 2021;53(3):403-11. https://doi.org/10.1038/s41588-021-00790-6.
Fang R, Preissl S, Li Y, Hou X, Lucero J, Wang X, et al. Comprehensive analysis of single cell ATAC-seq data with SnapATAC. Nat Commun. 2021;12(1):1337. https://doi.org/10.1038/s41467-021-21583-9.
Ramos M, Schiffer L, Re A, Azhar R, Basunia A, Rodriguez C, et al. Software for the Integration of Multiomics Experiments in Bioconductor. Cancer Res. 2017;77(21):e39–42. https://doi.org/10.1158/0008-5472.CAN-17-0344.
Hoffman P, Satija R. SeuratDisk: Interfaces for HDF5-Based Single Cell File Formats. GitHub. https://github.com/mojaveazure/seurat-disk.
Pagès H. HDF5Array: HDF5 backend for DelayedArray objects, 2018. URL https://bioconductor.org/packages/HDF5Array R package version. 1.
Virshup I, Rybakov S, Theis FJ, Angerer P, Wolf FA. bioRxiv. 2021.12.16.473007. https://doi.org/10.1101/2021.12.16.473007.
Van Rossum G, Drake FL Jr. Python tutorial. Centrum voor Wiskunde en Informatica Amsterdam; 1995.
R Core Team. R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. 2021. URL https://www.R-project.org/.
Bezanson J, Edelman A, Karpinski S, Shah VB. Julia: A Fresh Approach to Numerical Computing. SIAM Rev. 2017;59(1):65–98. https://doi.org/10.1137/141000671.
The HDF5® Library & File Format. http://www.hdfgroup.org/HDF5. Accessed 14 May 2021.
Single Cell Multiome ATAC + Gene Expression - 10x Genomics. https://www.10xgenomics.com/products/single-cell-multiome-atac-plus-gene-expression. Accessed 14 May 2021.
Luecken MD, Theis FJ. Current best practices in single-cell RNA-seq analysis: a tutorial. Mol Syst Biol. 2019;15:e8746. https://doi.org/10.15252/msb.20188746.
Yan F, Powell DR, Curtis DJ, Wong NC. From reads to insight: a hitchhiker’s guide to ATAC-seq data analysis. Genome Biol. 2020;21(1):22. https://doi.org/10.1186/s13059-020-1929-3.
Leek JT, Storey JD. Capturing heterogeneity in gene expression studies by surrogate variable analysis. PLoS Genet. 2007;3(9):1724–35. https://doi.org/10.1371/journal.pgen.0030161.
Lee D, Cheng A, Lawlor N, Bolisetty M, Ucar D. Detection of correlated hidden factors from single cell transcriptomes using Iteratively Adjusted-SVA (IA-SVA). Sci Rep. 2018;8(1):17040. https://doi.org/10.1038/s41598-018-35365-9.
Buettner F, Pratanwanich N, McCarthy DJ, Marioni JC, Stegle O. f-scLVM: scalable and versatile factor analysis for single-cell RNA-seq. Genome Biol. 2017;18(1):212. https://doi.org/10.1186/s13059-017-1334-8.
Risso D, Perraudeau F, Gribkova S, Dudoit S, Vert J-P. A general and flexible method for signal extraction from single-cell RNA-seq data. Nat Commun. 2018;9(1):284. https://doi.org/10.1038/s41467-017-02554-5.
Argelaguet R, Velten B, Arnol D. Multi-Omics Factor Analysis—a framework for unsupervised integration of multi-omics data sets. Mol Syst Biol. 2018;14(6):e8124. https://doi.org/10.15252/msb.20178124.
Argelaguet R, Arnol D, Bredikhin D, Deloro Y, Velten B, Marioni JC, et al. MOFA+: a statistical framework for comprehensive integration of multi-modal single-cell data. Genome Biol. 2020;21(1):111. https://doi.org/10.1186/s13059-020-02015-1.
Hao Y, Hao S, Andersen-Nissen E, Mauck III WM, Zheng S, Butler A, et al. Integrated analysis of multimodal single-cell data. Cell. 2021;184(13):3573-3587. https://doi.org/10.1016/j.cell.2021.04.048.
Wang B, Mezlini AM, Demir F, Fiume M, Tu Z, Brudno M, et al. Similarity network fusion for aggregating data types on a genomic scale. Nat Methods. 2014;11(3):333–7. https://doi.org/10.1038/nmeth.2810.
Becht E, McInnes L, Healy J, Dutertre CA, Kwok IW, Ng LG, Ginhoux F, Newell EW. Dimensionality reduction for visualizing single-cell data using UMAP. Nat Biotechnol. 2019;37(1):38-44. https://doi.org/10.1038/nbt.4314.
Mucha PJ, Richardson T, Macon K, Porter MA, Onnela J-P. Community structure in time-dependent, multiscale, and multiplex networks. Science. 2010;328(5980):876–8. https://doi.org/10.1126/science.1184819.
Clevers H, Alarcon B, Wileman T, Terhorst C. The T cell receptor/CD3 complex: a dynamic protein ensemble. Annu Rev Immunol. 1988;6(1):629–62. https://doi.org/10.1146/annurev.iy.06.040188.003213.
Liu P, Li P, Burke S. Critical roles of Bcl11b in T-cell development and maintenance of T-cell identity. Immunol Rev. 2010;238(1):138–49. https://doi.org/10.1111/j.1600-065X.2010.00953.x.
Stoeckius M, Hafemeister C, Stephenson W, Houck-Loomis B, Chattopadhyay PK, Swerdlow H, et al. Simultaneous epitope and transcriptome measurement in single cells. Nat Methods. 2017;14(9):865–8. https://doi.org/10.1038/nmeth.4380.
Mulè MP, Martins AJ, Tsang JS. Normalizing and denoising protein expression data from droplet-based single cell profiling. bioRxiv. 2020:2020.02.24.963603. https://doi.org/10.1101/2020.02.24.963603.
Mattanovich D, Borth N. Applications of cell sorting in biotechnology. Microb Cell Fact. 2006;5(1):12. https://doi.org/10.1186/1475-2859-5-12.
Kelsey G, Stegle O, Reik W. Single-cell epigenomics: Recording the past and predicting the future. Science. 2017;358(6359):69–75. https://doi.org/10.1126/science.aan6826.
Efremova M, Teichmann SA. Computational methods for single-cell omics across modalities. Nat Methods. 2020;17(1):14–7. https://doi.org/10.1038/s41592-019-0692-4.
Lee J, Hyeon DY, Hwang D. Single-cell multiomics: technologies and data analysis methods. Exp Mol Med. 2020;52(9):1428–42. https://doi.org/10.1038/s12276-020-0420-2.
Miao Z, Humphreys BD, McMahon AP, Kim J. Multi-omics integration in the age of million single-cell data. Nat Rev Nephrol. 2021;17(11):1–15. https://doi.org/10.1038/s41581-021-00463-x.
Rivello F, van Buijtenen E, Matuła K, van Buggenum JA, Vink P, van Eenennaam H, Mulder KW, Huck WT. Single-cell intracellular epitope and transcript detection reveals signal transduction dynamics. Cell Rep Methods. 2021;1(5):100070. https://doi.org/10.1016/j.crmeth.2021.100070.
Katzenelenbogen Y, Sheban F, Yalin A, Yofe I, Svetlichnyy D, Jaitin DA, et al. Coupled scRNA-Seq and Intracellular Protein Activity Reveal an Immunosuppressive Role of TREM2 in Cancer. Cell. 2020;182:872–85 e19. https://doi.org/10.1016/j.cell.2020.06.032.
Stovner EB, Sætrom P. PyRanges: efficient comparison of genomic intervals in Python. Bioinformatics. 2020;36(3):918–9. https://doi.org/10.1093/bioinformatics/btz615.
Palla G, Spitzer H, Klein M, Fischer D, Schaar AC, Kuemmerle LB, et al. Squidpy: a scalable framework for spatial single cell analysis. bioRxiv. 2021:2021.02.19.431994. https://doi.org/10.1101/2021.02.19.431994.
Clark SJ, Argelaguet R, Kapourani C-A, Stubbs TM, Lee HJ, Alda-Catalinas C, et al. scNMT-seq enables joint profiling of chromatin accessibility DNA methylation and transcription in single cells. Nat Commun. 2018;9:781. https://doi.org/10.1038/s41467-018-03149-4.
Swanson E, Lord C, Reading J, Heubeck AT, Genge PC, Thomson Z, et al. Simultaneous trimodal single-cell measurement of transcripts, epitopes, and chromatin accessibility using TEA-seq. Elife. 2021;10. https://doi.org/10.7554/eLife.63632.
Rood JE, Stuart T, Ghazanfar S, Biancalani T, Fisher E, Butler A, et al. Toward a Common Coordinate Framework for the Human Body. Cell. 2019:1455–67. https://doi.org/10.1016/j.cell.2019.11.019.
Rozenblatt-Rosen O, Shin JW, Rood JE, Hupalowska A, Human Cell Atlas Standards and Technology Working Group, Regev A, et al. Building a high-quality Human Cell Atlas. Nat Biotechnol. 2021;39(2):149–53. https://doi.org/10.1038/s41587-020-00812-4.
Velten B, Braunger JM, Arnol D, Argelaguet R, Stegle O. Identifying temporal and spatial patterns of variation from multi-modal data using MEFISTO. bioRxiv. 2020.11.03.366674. https://doi.org/10.1101/2020.11.03.366674.
Harris CR, Millman KJ, van der Walt SJ, Gommers R, Virtanen P, Cournapeau D, et al. Array programming with NumPy. Nature. 2020;585(7825):357–62. https://doi.org/10.1038/s41586-020-2649-2.
Virtanen P, Gommers R, Oliphant TE, Haberland M, Reddy T, Cournapeau D, et al. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nat Methods. 2020;17(3):261–72. https://doi.org/10.1038/s41592-019-0686-2.
Pedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn: Machine learning in Python. J Mach Learn Res. 2011;12:2825–30.
McKinney W. Others. pandas: a foundational Python library for data analysis and statistics. Python High Perform Sci Comput. 2011;14:1–9.
Collette A. Python and HDF5: unlocking scientific data. "O'Reilly Media, Inc."; 2013.
Hunter JD. Matplotlib: A 2D graphics environment. Comput Sci Engi. 2007;9(03):90–5. https://doi.org/10.1109/MCSE.2007.55.
Waskom M. seaborn: statistical data visualization. J Open Source Softw. 2021;6:3021.
Huber W, Carey VJ, Gentleman R, Anders S, Carlson M, Carvalho BS, et al. Orchestrating high-throughput genomic analysis with Bioconductor. Nat Methods. 2015;12(2):115–21. https://doi.org/10.1038/nmeth.3252.
Genomics 10x. PBMC from a healthy donor - granulocytes removed through cell sorting (10k). [ accessed 10 Dec 2021]. https://www.10xgenomics.com/resources/datasets/pbmc-from-a-healthy-donor-granulocytes-removed-through-cell-sorting-10-k-1-standard-1-0-0
Genomics 10x. 5k Peripheral blood mononuclear cells from a healthy donor (v3 chemistry). [ accessed 10 Dec 2021]. https://www.10xgenomics.com/resources/datasets/5-k-peripheral-blood-mononuclear-cells-pbm-cs-from-a-healthy-donor-v-3-chemistry-3-1-standard-3-0-2
Bredikhin D, Kats I, Stegle O. muon: multimodal omics Python framework. Github. https://github.com/scverse/muon.
Bredikhin D, Kats I, Stegle O. mudata: multimodal data. Github. https://github.com/scverse/mudata.
Bredikhin D, Kats I, Stegle O. Muon.jl. Github. https://github.com/scverse/Muon.jl.
Bredikhin D, Kats I, Stegle O. MuData. Github. https://github.com/PMBio/MuDataMAE.
Bredikhin D, Kats I, Stegle O. MuDataSeurat. Github. https://github.com/PMBio/MuDataSeurat.
Bredikhin D, Kats I, Stegle O. Muon: multimodal omics analysis framework. Zenodo. 2021. https://doi.org/10.5281/ZENODO.5557542.
We are grateful to the members of the Stegle and Theis labs for the discussions on the MuData and MUON design.
Peer review information
Barbara Cheifet was the primary editor of this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.
The review history is available as Additional file 2.
D.B. is supported by the EMBL International PhD Programme and a Darwin Trust fellowship. Research in the Stegle research group was further supported by the European Commission (grant agreements 810296, 874769) and the BMBF. Open Access funding enabled and organized by Projekt DEAL.
Ethics approval and consent to participate
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Additional file 1.
Additional file 2.
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.
About this article
Cite this article
Bredikhin, D., Kats, I. & Stegle, O. MUON: multimodal omics analysis framework. Genome Biol 23, 42 (2022). https://doi.org/10.1186/s13059-021-02577-8