Cobolt: integrative analysis of multimodal single-cell sequencing data

A growing number of single-cell sequencing platforms enable joint profiling of multiple omics from the same cells. We present Cobolt, a novel method that not only allows for analyzing the data from joint-modality platforms, but provides a coherent framework for the integration of multiple datasets measured on different modalities. We demonstrate its performance on multi-modality data of gene expression and chromatin accessibility and illustrate the integration abilities of Cobolt by jointly analyzing this multi-modality data with single-cell RNA-seq and ATAC-seq datasets.

Supplementary Information: The online version contains supplementary material available at 10.1186/s13059-021-02556-z.

where the encoder q_φ(z|x) is parameterized by neural networks and serves as an approximation to the actual posterior distribution. The decoder p_ψ(x|z) follows our model for a single modality, which we describe in detail later.
The above equation gives the objective function when all the modalities are observed. In the case of missing modalities, our conditional independence assumption allows us to obtain a partial ELBO. Consider a subset of modalities A ⊂ {1, ..., M} and data x_A = {x^(i) | i ∈ A}; its log-likelihood given the latent variable is

log p_ψ(x_A | z) = Σ_{i∈A} log p_ψ(x^(i) | z).

The corresponding ELBO can be written as

ELBO(x_A) = E_{q_φ(z|x_A)} [ Σ_{i∈A} η_i log p_ψ(x^(i) | z) ] − KL( q_φ(z|x_A) ‖ p(z) ),

where the η_i are hyperparameters allowing different weights for different modalities. In actual training, even when all modalities are collected, we still subsample the modalities so that the posterior likelihoods given different modality subsets are learned. Denote by S_c the modalities collected for cell c and by x_c = {x^(i) | i ∈ S_c} the corresponding data; our final objective function is given by

L = Σ_c Σ_{A ⊆ S_c, A ≠ ∅} λ_A ELBO(x_{c,A}),

where the λ_A are hyperparameters allowing different weightings for the ELBO terms with different modality subsets.
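The structure of this objective, a weighted sum of partial ELBOs over all non-empty subsets of a cell's collected modalities, can be sketched as follows. This is a minimal illustration, not the actual Cobolt implementation: `elbo_fn` stands in for the per-subset ELBO computation and `lam` for the λ_A weights.

```python
import itertools

def cobolt_objective(elbo_fn, modalities, lam=None):
    """Sum weighted partial ELBOs over all non-empty modality subsets.

    elbo_fn: callable mapping a subset (tuple of modality names) to its ELBO.
    lam: optional dict mapping frozenset(subset) -> weight lambda_A (default 1.0).
    """
    lam = lam or {}
    total = 0.0
    for r in range(1, len(modalities) + 1):
        for A in itertools.combinations(modalities, r):
            total += lam.get(frozenset(A), 1.0) * elbo_fn(A)
    return total

# Example: a cell with both modalities contributes three ELBO terms,
# one each for {rna}, {atac}, and {rna, atac}.
value = cobolt_objective(lambda A: float(len(A)), ("rna", "atac"))
```

Subsampling modality subsets during training is what lets the trained encoder produce a posterior for cells that are missing one of the modalities.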
Prior Distribution

We use a Gaussian distribution for the prior p(z). A transformed vector θ = σ(z) is then calculated as the mixing probability of our latent model (Figure 1) for modeling single modalities, where σ is the softmax transformation. θ therefore follows a logistic normal distribution, which is commonly used as an approximation to a Dirichlet distribution. To approximate Dirichlet(θ|ξ), the parameters of the logistic normal LN(θ|µ_0(ξ), Σ_0(ξ)) are calculated as functions of ξ via the Laplace approximation, with Σ_0 diagonal:

µ_{0,k}(ξ) = log ξ_k − (1/K) Σ_{j=1}^{K} log ξ_j,

Σ_{0,kk}(ξ) = (1/ξ_k)(1 − 2/K) + (1/K²) Σ_{j=1}^{K} 1/ξ_j.

Therefore, given the hyperparameter ξ, we use the Gaussian distribution N(z|µ_0(ξ), Σ_0(ξ)) for the prior p(z).
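This Laplace approximation is a simple closed-form computation; a minimal sketch in pure Python (diagonal covariance only):

```python
import math

def dirichlet_to_logistic_normal(xi):
    """Laplace approximation of Dirichlet(xi) by a logistic normal.

    Returns the mean vector mu_0 and the diagonal of Sigma_0.
    """
    K = len(xi)
    log_xi = [math.log(x) for x in xi]
    mean_log = sum(log_xi) / K
    # Mean: centered log concentrations.
    mu0 = [lx - mean_log for lx in log_xi]
    # Diagonal covariance entries.
    sum_inv = sum(1.0 / x for x in xi)
    sigma0_diag = [(1.0 - 2.0 / K) / x + sum_inv / K**2 for x in xi]
    return mu0, sigma0_diag

# A symmetric Dirichlet(1, 1, 1) maps to a zero-mean Gaussian prior.
mu0, s0 = dirichlet_to_logistic_normal([1.0, 1.0, 1.0])
```

A symmetric ξ yields µ_0 = 0, so the prior on z is centered and θ = σ(z) is uniform on average.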
Decoders

For cell c in modality i, our generative likelihood p_ψ(x_c^(i) | z_c) is multinomial with probability vector

π_c^(i) = σ( α_{0,c}^(i) + α_{1,c}^(i) ⊙ ( σ(B^(i)) θ_c ) ),   (4)

where c indicates the batch of cell c, and α_{0,c}^(i) ∈ R^{d_i}, α_{1,c}^(i) ∈ R^{d_i}, and B^(i) ∈ R^{d_i×K} are parameters to be estimated that are shared across samples. σ(B^(i)) gives the probability vectors for each latent category. α_{0,c}^(i) and α_{1,c}^(i) therefore provide scaling and shifting for adjusting dataset- or platform-specific differences in feature counts. Notice that Equation 4 works without the softmax transformation on B^(i), which gives us another version of the model,

π_c^(i) = σ( α_{0,c}^(i) + α_{1,c}^(i) ⊙ ( B^(i) θ_c ) ).   (5)

In practice, we find Equation 5 works slightly better than Equation 4. The results we present in this paper are based on Equation 5.
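As a rough sketch of the decoder's forward computation under the version without the softmax on B^(i), the per-feature probability vector combines the batch shift α_0, batch scale α_1, and the mixture B θ inside a final softmax. All numeric values below are illustrative placeholders, not fitted parameters.

```python
import math

def softmax(v):
    """Numerically stable softmax of a list of floats."""
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

def decoder_probs(alpha0, alpha1, B, theta):
    """Probability vector over the d_i features of one modality:
    softmax(alpha0 + alpha1 * (B @ theta)), with B of shape d_i x K
    and theta the length-K mixing probabilities."""
    Btheta = [sum(b * t for b, t in zip(row, theta)) for row in B]
    return softmax([a0 + a1 * bt
                    for a0, a1, bt in zip(alpha0, alpha1, Btheta)])

# Illustrative values: 4 features, K = 2 latent categories.
p = decoder_probs([0.0] * 4, [1.0] * 4,
                  [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [0.0, 0.0]],
                  [0.7, 0.3])
```

The outer softmax guarantees a valid multinomial probability vector regardless of whether B^(i) itself is softmax-transformed, which is what makes the Equation 5 variant well-defined.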
In summary, we write the joint likelihood of the latent variable z_c and the data x_c of cell c as

p(z_c, x_c) = p(z_c) Π_{i∈S_c} p_ψ(x_c^(i) | z_c).

Encoders

The encoder mostly follows Wu and Goodman [2018], which we describe below. We would like to obtain an approximation q(z|x_A) of the posterior distribution p(z|x_A) given any collection of modalities x_A. To do this, we first write the true posterior as [Wu and Goodman, 2018]

p(z|x_A) ∝ p(z) Π_{i∈A} p(z|x^(i)) / p(z).

We then approximate each quotient p(z|x^(i)) / p(z) with a Gaussian g(z|x^(i)) := N(µ̃^(i), Σ̃^(i)), where µ̃^(i) and Σ̃^(i) are estimated by neural networks. Here we use the property that the product of two multivariate Gaussian densities is proportional to a Gaussian density: q_φ(z|x_A) ∝ p(z) Π_{i∈A} g(z|x^(i)) is still Gaussian, and its parameters can be written out analytically,

Σ_A = ( Σ_0^{-1} + Σ_{i∈A} Σ̃^{(i)−1} )^{-1},

µ_A = Σ_A ( Σ_0^{-1} µ_0 + Σ_{i∈A} Σ̃^{(i)−1} µ̃^(i) ).

With this trick, instead of using 2^M − 1 neural networks to estimate all Σ_A and µ_A individually, our model uses one network for µ̃^(i) and one for Σ̃^(i) for each modality i, resulting in a total of 2M networks. The variational posteriors q_φ(z|x_A) are calculated using µ̃^(i) and Σ̃^(i). The variational posterior q_φ(θ|x_A) is then defined to be the logistic normal LN(µ_A, Σ_A).
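The product-of-experts combination above reduces, for diagonal covariances, to a precision-weighted average of the prior and the per-modality experts. A minimal sketch (the function name and list-based representation are our own, not Cobolt's API):

```python
def product_of_gaussians(prior_mu, prior_var, expert_mus, expert_vars):
    """Combine a Gaussian prior with per-modality Gaussian experts.

    All covariances are diagonal, represented as lists of variances,
    so the product is computed elementwise: precisions add, and the
    mean is the precision-weighted average of the component means.
    """
    dim = len(prior_mu)
    mu_A, var_A = [], []
    for d in range(dim):
        prec = 1.0 / prior_var[d] + sum(1.0 / v[d] for v in expert_vars)
        num = prior_mu[d] / prior_var[d] + sum(
            m[d] / v[d] for m, v in zip(expert_mus, expert_vars))
        var_A.append(1.0 / prec)
        mu_A.append(num / prec)
    return mu_A, var_A

# Prior N(0, 1) combined with a single expert N(2, 1).
mu_A, var_A = product_of_gaussians([0.0], [1.0], [[2.0]], [[1.0]])
```

Dropping an expert from `expert_mus`/`expert_vars` is exactly how the same trained networks yield a posterior for a cell with a missing modality.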