- Method
- Open Access
BGP: identifying gene-specific branching dynamics from single-cell data with a branching Gaussian process
- Alexis Boukouvalas^{1}Email authorView ORCID ID profile,
- James Hensman^{2} and
- Magnus Rattray^{1}Email author
- Received: 30 October 2017
- Accepted: 1 May 2018
- Published: 29 May 2018
Abstract
High-throughput single-cell gene expression experiments can be used to uncover branching dynamics in cell populations undergoing differentiation through pseudotime methods. We develop the branching Gaussian process (BGP), a non-parametric model that is able to identify branching dynamics for individual genes and provide an estimate of branching times for each gene with an associated credible region. We demonstrate the effectiveness of our method on simulated data, a single-cell RNA-seq haematopoiesis study and mouse embryonic stem cells generated using droplet barcoding. The method is robust to high levels of technical variation and dropout, which are common in single-cell data.
Keywords
- Single cell RNA-seq
- Gaussian process
- Branching dynamics
Background
Single-cell gene expression data can be used to uncover cellular progression through different states of a temporal transformation, e.g. during development, differentiation or disease. As single-cell protocols improve, a flurry of methods have been proposed to model branching of cellular trajectories to alternative cell fates [1–4]. In these and similar methods, pseudotime is estimated and a global branching structure is inferred. Our focus in this paper is on a downstream analysis method that can subsequently be used to model branching gene expression dynamics for individual genes. We are interested in discovering which genes follow the global cellular branching dynamics and whether these genes branch early or late with respect to the global cellular branching time. Recently, [3] proposed the branch expression analysis modelling (BEAM) approach, which uses penalised splines to infer the individual gene branching time. Here, we propose an alternative non-parametric method to model gene expression branching dynamics. We develop a probabilistic generative model of branching dynamics that can be used to assess the evidence for branching and to provide a posterior estimate of the branching time. The posterior distribution over branching time can be used to identify the most likely branching time for each gene as well as an associated credible region capturing our uncertainty in the estimate.
Our approach is based on Gaussian processes (GPs), which are a class of flexible non-parametric probabilistic models. GPs have a long history in temporal and spatial statistics and have gained popularity in many areas of machine learning, including multivariate regression, classification and dimensionality reduction [5]. GPs have been used for dimensionality reduction of single-cell expression data [6, 7] and more recently for pseudotime estimation where the effect of uncertainty in the inferred pseudotime can be quantified [8] and capture time can be included as prior information [9, 10]. GP-based methods have also been used for modelling global cellular branching dynamics from single-cell data after assigning pseudotime to cells [11].
Here, we build on the work of [12], who developed a GP model to identify when two gene expression time-course data sets first diverge from one another. They defined a novel GP covariance function that constrains two functions to intersect at a single point. The divergence time is inferred by numerically approximating the posterior using a simple histogram approach. The model identifies when a gene first becomes differentially expressed in time-course gene expression data under control and perturbed conditions. In their approach, all data points are labelled with the branch that generated them and the ordering of time points is assumed known. Although similar to the problem of modelling branching in single-cell data after pseudotime is inferred, this two-sample time-series method cannot be applied directly to our problem because we have to allow for uncertainty regarding which branch each cell belongs to.
Also closely related to the present work, the overlapping mixture of GPs (OMGP) [13] is a mixture model for time-series data where the mixture components are GP functions and the data at any time can be assigned to any of the components. For single-cell data, after pseudotime is assigned to each cell, then the OMGP model can be used to assign cells to different trajectories. The cell labels do not have to be known in advance and can be inferred through fitting the model to data. However, in OMGP models, the cellular trajectories are independent rather than branching. The OMGP model has been applied to single-cell data to infer global cell branching times [11] but as OMGP assumes the latent functions are independent without any branching, a heuristic based on a piecewise linear fit of the log likelihood surface is proposed to identify the most likely branching times. This is problematic, since OMGP does not provide a proper generative model of branching dynamics and therefore, it is not clear how to compute the posterior distribution over the branching time.
Our main methodological contribution here is to generalise the OMGP model to account explicitly for dependence between the functions in the mixture model. Specifically, we consider that the functions branch, as in [12]. This allowed us to develop a probabilistic model over branching cellular trajectories where the assignment of cells to branches is not known in advance. Our new branching GP (BGP) model allows us to calculate the posterior distribution over branching time for each gene while allowing for uncertainty in the branch labels for each cell. This uncertainty is especially important for early-branching genes, since cells are not labelled with a branch prior to the global cellular branching time, which we assume is known.
A naive implementation of GP models scales cubically with the size of the data. As increasing numbers of cells can be profiled in new single-cell protocols, we ensure the scalability of our approach by employing two complementary approaches. Firstly, we use sparse inference [14, 15], which allows model fitting to scale with the number of inducing points. The latter is a user-defined value that trades off model accuracy and training time. Specifically for N cells, naive covariance inversion scales as O(N^{3}) while under sparse inference with k inducing points, it scales as O(k^{2}N) where typically k≪N. Secondly, we provide an open-source implementation that leverages the GPflow library [16], which both simplifies the implementation due to automatic symbolic differentiation and allows for the necessary matrix operations to be computed in parallel across many CPU nodes or GPUs.
The paper is organised as follows. We first give an overview of the BGP model, including how inference is performed in a scalable manner. We discuss how uncertainty is quantified and represented in a full posterior distribution on the branching time for each gene. We contrast the performance of the BGP method to two recently published methods, the mixture of factor analysers (MFA) [17] and the BEAM approach [3], in a synthetic study across a variety of simulated scenarios. We apply the BGP model to data from a haematopoiesis study [18] and to single-cell mouse embryonic stem cell data generated using droplet barcoding [19]. We conclude with a summary of our findings and a discussion of possible future research directions.
Results and discussion
Overview of BGP
Before applying the BGP method, we require the pseudotime for each cell and the global branching pattern of the cells. In our experiments, we used results from the Monocle reversed graph embedding approach, termed DDRTree [3], and the diffusion pseudotime (DPT) [1] and the wishbone [2] approaches. However, BGP can be based on any method that estimates pseudotime and a gene-wide cell branching association. For a recent review, see [20].
OMGP [13] is a mixture model of independent GPs that is able to associate an observation with the generating GP. The authors term this the association problem and derive a variational inference algorithm for independent GPs. Our work extends the OMGP model in two directions: firstly, we remove the assumption of latent function independence and allow dependent GPs as required by a branching model. Secondly, we provide a sparse inducing point approximation that allows for scalable inference.
Let F be a BGP evaluated for N data points (cells) with M latent functions. Z∈{0,1}^{N×M} indicates which branch each cell comes from. The number of latent functions for a single branching point is M=3, as we have separate latent functions for the trunk and each branch. The likelihood is \(p\left (Y|F,Z\right)={\mathcal N}\left (Y|ZF,\sigma ^{2}I\right)\) and as in [13], we place a categorical prior on the indicator matrix \(p(Z) =\prod _{n=1}^{N}\prod _{m=1}^{M}\left [\Pi \right ]_{n,m}^{[Z]_{nm}}\). We place a GP prior on the latent functions \(p\left (F | t_{b}\right) =\mathcal {GP}\left (0,K | t_{b}\right)\), which constrains the latent functions to branch at pseudotime t_{ b }. Note that the latter does not factorise as in [13], as the latent functions are dependent. Further details of the model derivation and inference scheme used are provided in ‘Methods’ and a detailed derivation, including the inducing point approximation, is provided in the supplementary material (Additional file 1: Section 2).
Global branching labels, such as those provided by DDRTree, can provide an informative prior p(Z) for all genes. The prior before the global branching point is uninformative, as no global assignment is available. This is relevant for early-branching genes, which may start branching earlier than the global cellular branching. After the global branching point, the prior favours an increased assignment probability to the globally assigned branch. If the prior places significant mass on the alternative assignment, the resulting assignment may differ from the global allocation given enough evidence from the likelihood term. This allows the model to correct mislabelled cells as well as account for sources of noise in the data, such as dropout for lowly expressed genes. However, in our single-cell case studies, we use a strong prior assignment probability of 0.99 to avoid cell reassignment after the global branching time. This simplifies the interpretation of the results, as cells do not switch their global branch assignment. Nevertheless, other constructions are possible with this model, e.g. the distance of the cells from the global branching time may be used to adjust the associated prior uncertainty.
where we use a mean-field approximation q(Z,F)=q(Z)q(F) with the latent functions F independent of the association indicators Z and \(q(Z) = \prod _{nm} \phi _{nm}\). The ϕ_{ nm } approximates the posterior probability of cell n belonging to branch m. The latter is either the trunk state or one of the two branches for the single branching considered in the applications here. Then, F can be integrated out to get the marginal likelihood p(Y).
where t_{ b }=∞ specifies that the model does not branch and we have assumed equal prior probabilities for branching and not branching.
We can infer a branch order network by computing this probability for all pairs of branching genes at the desired confidence level. In the single-cell applications we discuss later, we use this approach to construct a directed network graph of gene branching times.
We can also calculate a posterior rank for each gene with an associated confidence interval. Using samples of the posterior branching time, we can estimate quantiles of the rank distribution for each gene. This is a simple way to infer which genes branch early and which late through a probabilistic ranking. Unlike the previous approach, this does not allow pairs of genes to be compared but can provide an overall summary of the branch ordering without the need for a network analysis.
Synthetic study
Synthetic gene groups
Group | Branching time | G |
---|---|---|
Early | 0.2 | 10 |
Late | 0.8 | 20 |
No branching | NA | 10 |
Synthetic study: pseudotime rank correlation to the true time for both MFA and Monocle under both scenarios
Noise | MFA | Monocle |
---|---|---|
0.001 | −0.96 | 1.0 |
0.01 | 0.98 | 1.0 |
0.03 | −0.93 | 1.0 |
0.08 | −0.97 | 1.0 |
0.1 | −0.97 | 1.0 |
0.2 | 0.91 | 1.0 |
All methods were run with default parameter settings, so it may be possible to improve on their performance by tuning these parameters. For example, as in [17], we found the performance of the algorithm depended on the initialisation used. We contrast the performance of the BGP model both without and with an informative prior (80% prior probability) on cell assignment derived from the global Monocle assignment.
We first compare the pseudotime estimation accuracy of Monocle and MFA. Both methods achieve good performance as measured by the rank correlation of the estimated pseudotime to the ground truth (Table 2).
Synthetic study: Area under the curve for detecting branching genes
Noise | MFA | BEAM | BGP | |
---|---|---|---|---|
No prior | prior | |||
0.001 | 0.30 | 1.00 | 1.00 | 1.00 |
0.01 | 0.65 | 1.00 | 1.00 | 1.00 |
0.03 | 0.77 | 0.96 | 0.98 | 0.99 |
0.08 | 0.82 | 0.93 | 0.86 | 0.92 |
0.1 | 0.77 | 0.82 | 0.81 | 0.96 |
0.2 | 0.77 | 0.74 | 0.86 | 0.84 |
Synthetic study: root-mean-squared error for branching time estimation
Noise | BEAM | BGP | |
---|---|---|---|
No prior | prior | ||
0.001 | 0.12 | 0.03 | 0.03 |
0.01 | 0.11 | 0.04 | 0.05 |
0.03 | 0.13 | 0.06 | 0.07 |
0.08 | 0.20 | 0.14 | 0.09 |
0.1 | 0.23 | 0.12 | 0.09 |
0.2 | 0.23 | 0.15 | 0.08 |
The state estimation correctly identifies a single branching point but the majority of cells are assigned to one of the branches (red). As one of the global branches (red) spans both gene expression branches, it is unsurprising that the spline approach fails to identify the branch location correctly and in fact overestimates the true branching time of t_{ b }=0.2 (Fig. 2a). The corresponding BGP inference (Fig. 4f) overcomes the errors in global state estimation and the confidence interval includes the true branching time.
The robustness of the BGP model can be understood in terms of the probabilistic nature of the model. Prior global state information is considered as well as a likelihood term that fits a branching process. Therefore, the BGP prior model incorporates the best of both worlds: inclusion of global assignment information and assessment of cell assignment based on individual gene expression. In the supplementary material (Additional file 1: Section 3), we also show the robustness of the model to non-Gaussian data, using synthetic data generated using a single-cell RNA-seq data simulator [21] to generate zero-inflated count data across a range of library sizes and dropout rates. We also find the model robust to data downsampling (Additional file 1: Section 4), which we use to speed up inference in the studies of experimental data that follow.
Haematopoiesis and single-cell RNA-seq
We apply the BGP model on single-cell RNA-seq of haematopoietic stem cells (HSCs), which differentiate into myeloid and erythroid precursors [18]. Following the same quality control as [2], the data consists of 2312 genes and 4423 cells. We use M=30 inducing points in our sparse approximation and to speed up the computations further, we randomly subsample the data down to 467 cells. In [2], an analysis of this data set was performed and we use their estimation of pseudotime and cell trajectory assignment labels using the Wishbone algorithm. We found similar pseudotime and global branching assignments with Monocle 2 (vs 2.1), and those results are given in the supplementary material (Additional file 1: Section 7).
To reduce the computation required, we apply a t-statistic on the end states of the two branches to filter the genes that are most likely branching. We apply the BGP algorithm on the resulting set of 1072 genes. The BGP inference was performed in parallel for each gene and required approximately 2 minutes of CPU time per gene^{1}.
A probabilistic model is an appropriate choice for early haematopoiesis, which has been described as a cellular continuum of low-primed HSCs [22]. The continuum contains transitory states rather than discrete progenitor cell types. Some cell state transitions and lineage combinations are more likely to occur than others. A probabilistic model such as BGP better reflects the probabilistic nature of lineage selection highlighted in [22]. In the BGP model in particular, each cell is associated with an allocation probability for each branch. The branching point can be interpreted as the earliest pseudotime from which probabilistic biases in lineage selection can be detected.
In the network (Fig. 7), each gene is annotated with its most likely branching time and the pairwise branching time order relationships are denoted by directed edges. The posterior confidence cut-off used for the latter is 95%. For instance, both PRTN3 and MPO are found to branch before CAR1 but only the former is branching before ELANE; this can be understood by the higher uncertainty in the branching posterior of MPO (Fig. 8c). In the network we have groups genes in three distinct modules. The PRTN3 and CTSG genes (red module) branched before all other genes except CALR, i.e. ELANE, CAR1, CAR2 and GSTM1. The other group of early-branching genes, MPO and CALR (yellow module) branch before CAR1, CAR2, GSTM1 but not ELANE. Finally, the later branching group consisting of ELANE and GSTM1, branch before CAR1.
When considering more genes, looking at the entire network may be cumbersome and if the interest is solely on identifying the earliest branching genes, we can estimate the posterior rank for each gene as described in the BGP overview section. When using a lower threshold on the branching evidence (log Bayes factor >50), we find 76 genes are the earliest and latest branching genes, which are listed in the supplementary material (Additional file 1: Section 5).
This analysis demonstrates the richness of the BGP model and the range of downstream analyses afforded by the probabilistic nature of the model.
Droplet-based single-cell RNA-seq
We further demonstrate the effectiveness of the BGP model by applying it to single-cell RNA-seq data generated using droplet barcoding [19]. Klein et al. [19] monitored the transcriptomic profiles and heterogeneity in the differentiation of mouse embryonic stem cells after leukaemia inhibitory factor withdrawal. A total of 2717 cells were profiled. Altogether, 24 175 transcripts were observed with cells captured at t=0, 2, 4 and 7 days. As for the haematopoiesis data, BGP inference was performed in parallel for each gene, which required approximately 2 minutes of CPU time per gene. The effect of cell cycle was removed using the scLVM approach [7] as in [1], who used this data set for their analysis of DPT.
We use the pseudotime estimated using the DPT method of [1]. The prior on the branching labels was also derived from the DPT method with a prior confidence of 99%. We examine the first dominant branching event reported by [1]. Here 915 cells were assigned to the trunk, and 1662 and 114 cells to each branch. To allow for fast computation for all genes, we subsampled the gene expression data down from 2717 to 335 cells with 81 assigned to the trunk, and 159 and 95 to each branch. This ensures the branches have roughly the same number of points. For faster computation, we analyse the top 998 genes according to the method of [23], which is available in the ScanPy software library [24].
We use the branching time posterior for each gene to estimate a branch order network of genes. For ease of presentation, we examine only the seven branching genes with the strongest evidence of branching (log Bayes factor >500). All posterior rankings with confidence greater than 95% are included in the network. We show the gene branch order network in Fig. 11 and the corresponding gene expression profiles in Fig. 10. The earliest branching gene, Ccdc36, was found to branch before all other genes in the network within the 95% confidence threshold. The Actg1 and Ccdc36 genes branch before the later branching genes Krt8, Krt18, Bc1 and Hsp90aa1. The epiblast markers Krt8 and Krt18 and the Bc1 gene branch before Hsp90aa1, which has the latest branching time (0.36). The earliest branching genes can also be found by directly computing the posterior rank for each (see section ‘Overview of BGP’). These genes are listed in the supplementary material (Additional file 1: Section 6).
Conclusion
We have presented a flexible non-parametric probabilistic approach for robustly identifying individual gene branching times. For scalability, our model uses sparse variational inference implemented using the GPflow package [16]. The probabilistic nature of our model allows for well-defined parameter estimation via maximisation of a bound on the marginal likelihood.
The spline model used by BEAM uses global branch assignments for each cell and is, therefore, unable to accurately identify branching times earlier than the global branching time. We found that branching time estimates from this spline-based approach were generally biased towards the global branching time. In contrast, the BGP method can robustly identify branching times, as it estimates the cell branch association for each gene independently while accounting for cell assignment uncertainty in the posterior branching times. We also found the BGP approach to be robust to global state estimation errors and high noise. The BGP branching time uncertainty can also be used in a downstream analysis of the individual gene branching times, for example, ranking genes in terms of their most likely or minimum branching times.
In the BGP model, a separate assignment of cells to branches is performed for each gene, since they are treated independently. This can lead to potentially misleading results, as cells may be assigned to different branches for different genes. Therefore, to achieve good results, we use an informative prior based on the cell assignment of a global method such as Monocle [3], DPT [1] or Wishbone [2]. This gives a high prior confidence of cell assignments after the global branching point. By using this strongly informative prior, we avoid the issue of inconsistent assignments for cells with pseudotime after the global branching point. However, cell assignments can differ for genes branching prior to the global branching time and therefore, care must be taken when interpreting the results for such genes. This is an improvement over methods such as BEAM that randomly assign cells prior to the global branching time. In future work, we plan to extend the BGP model to share the cell assignment across all genes, therefore avoiding such inconsistencies and simplifying any downstream analysis.
We have also included in our comparison a probabilistic linear method [17]. The linearity allows for an efficient joint estimation of both the pseudotime and global branching structure. Although this method does not estimate gene bifurcation times, a probabilistic estimate of whether an individual gene exhibits branching behaviour is available. However, in our synthetic study, we found that the pseudotime estimation was not robust and this reduces the effectiveness of the method.
The application of the BGP method to the haematopoiesis data revealed the importance of modelling transitory gene expression, which has the potential to confuse non-probabilistic methods. The model was able to select automatically the most likely branching location, even in the presence of multiple crossing points in gene expression without the need for any post-processing heuristics such as those included in the BEAM package.
We also demonstrated the flexibility of our approach by applying it to droplet-based single-cell data using the pseudotime and branching association derived from the DPT method [1]. As the BGP approach does not rely on a particular method to estimate pseudotime and branching association, a modular approach is possible in which the best method for a given study is used. The prior uncertainty specification on the branching association allows the BGP user to quantify the expected accuracy of the global branching method.
The probabilistic nature of the BGP model allows for additional biological insights to be gained by constructing a gene branch order network and identifying early lineage priming. The former is accomplished by computing a pairwise gene probability that assesses the likelihood of a gene branching before another. The latter can be inferred by examining the gene network and identifying the earliest branching genes. We demonstrated this approach on both single-cell data sets, identifying early-branching genes and confident orderings of gene branching events.
Concurrent with our study, [25] used changepoint kernels to develop similar branching GPs to identify bifurcations in single-cell transcriptional data sets. They use a Markov chain Monte Carlo approach to estimate cell branch association and branching times. Their approach also explicitly models recombination, in which individual branches are merged, and they can jointly estimate pseudotime. However, the computational complexity of their method would make application to genome-wide inference of branching times from unlabelled data challenging and that is the motivation for our sparse inducing point variational approach.
A number of extensions of the BGP model are possible that would increase the range of possible applications. The branching kernel could be adapted to detect changepoints in time series. Whereas we have modelled a branching event as the intersection of three latent functions, a changepoint would require only two latent functions. We would also like to extend our model to non-Gaussian likelihoods, which would more accurately describe single-cell data. This would increase inference complexity but could provide better calibrated uncertainty estimates. Another useful extension would be to jointly infer pseudotime and branching behaviour, which would also improve uncertainty estimation as the uncertainty arising from the estimation of the former would be included in the posterior branching uncertainty. Extending our model to multiple branching points is straightforward from a modelling standpoint but presents a more challenging optimisation problem, for which a tree prior on the branching structure may prove helpful [26]. This extension would allow us to address the problem of selecting the correct number of branches in the global cellular branching dynamics.
Methods
Branching model derivation outline
We present an overview of the probabilistic model for BGP. The full model description and derivation of the variational inference lower bound is given in the supplementary material (Additional file 1: Section 2).
First, we define the Gaussian process kernel describing a branching structure. We then derive a lower bound on the model likelihood using variational inference. In the supplementary material (Additional file 1: Section 2.2), we present a formulation of a sparse inducing point approximation that allows the application of the model to large data sets. How to perform prediction on the full and sparse models is also presented in the supplementary material (Additional file 1: Section 2.3).
Branching kernel
For simplicity, the same kernel is used for all three functions although it would be straightforward to extend our framework to specify different kernels for each latent function. This would allow for instance, one branch to be modelled as a periodic function and the others as non-periodic. The extra flexibility would come at the cost of increasing the number of parameters that need to be estimated.
where K(T,T),K(T,t_{ p }) and K(t_{ p },t_{ p }) are the kernel functions evaluated between all training data pseudotimes T, between the training data and branching point, and solely at the branching, point respectively.
In [12], only two latent functions were specified, a control and perturbation condition where the former spanned the branching point. In our modified formulation, three functions are used, allowing for a discontinuity in the gradient between the trunk and both branch latent functions. As an extension of our model, the derivatives of the latent functions could also be constrained to intersect at the branch point to allow for differentiable paths.
Full GP inference
where Y_{ d } denotes the N×1 column vector of observations for output d and similarly F_{ d } denotes the M×1 column vector of latent function values. We omit the multiple output case from the derivation below for clarity.
where for the multinomial distribution we have \(\sum _{m=1}^{M}\left [\Pi \right ]_{nm}=1\) and K is the GP kernel^{3}.
This encodes the mean-field assumption as we assume the posterior indicators factorise.
where z_{ i } the N×1 indicator vector for latent function i=m.
This bound holds because L_{1} is a bound to logp(Y|F) and the exponent function is monotonic. More details can be found in [27].
We can also derive this bound when using an inducing point approximation. This allows the algorithm to scale up to larger data sets. The full derivation is given in the supplementary material (Additional file 1: Section 2).
Time measured on a MacBook Pro with 2.2GHz quad-core Intel Core i7 and 16GB of DDR3L onboard memory.
This expanded representation allows for efficient recomputation of the marginal likelihood for different branching times.
For simplicity, we assume the same kernel for every output and latent trajectory function. Removing this restriction does not affect the derivation but will increase the inference complexity.
Declarations
Acknowledgments
We wish to thank Xiaojie Qiu for his help with the BEAM Monocle 2 code and Kieran Campbell with running the MFA model.
Funding
MR and AB were supported by Medical Research Council award MR/M008908/1. JH was supported by Medical Research Council fellowship MR/K022016/2.
Availability of data and materials
An open source Python implementation of BGP is available at https://github.com/ManchesterBioinference/BranchedGP [28], released under the Apache 2.0 licence. Version 1.1 with DOI https://doi.org/10.5281/zenodo.1235962 was used to generate the results of the paper. Documentation and examples are available on the GitHub page.
All data used in the paper have previously been published and are freely available. The haematopoiesis data were previously analysed in [2] and are available from https://github.com/ManuSetty/wishbone. The dropseq data were previously analysed in [1] and are available from https://www.helmholtz-muenchen.de/icb/research/groups/machine-learning/projects/dpt/index.html.
Authors’ contributions
AB, JH and MR designed the algorithm. AB and JH implemented the code and AB performed the numerical experiments and analysis. AB and MR wrote the paper. All authors read and approved the final manuscript.
Ethics approval and consent to participate
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
Authors’ Affiliations
References
- Haghverdi L, Buettner M, Wolf FA, Buettner F, Theis FJ. Diffusion pseudotime robustly reconstructs lineage branching. Nat Methods. 2016; 13(10):845–8.View ArticlePubMedGoogle Scholar
- Setty M, Tadmor MD, Reich-Zeliger S, Angel O, Salame TM, Kathail P, et al.Wishbone identifies bifurcating developmental trajectories from single-cell data. Nat Biotechnol. 2016; 34(6):637–45.View ArticlePubMedPubMed CentralGoogle Scholar
- Qiu X, Mao Q, Tang Y, Wang L, Chawla R, Pliner HA, et al.Reversed graph embedding resolves complex single-cell trajectories. Nat Methods. 2017; 14(10):979.View ArticlePubMedPubMed CentralGoogle Scholar
- Street K, Risso D, Fletcher RB, Das D, Ngai J, Yosef N, et al.Slingshot: cell lineage and pseudotime inference for single-cell transcriptomics. 2017. bioarxiv preprint: https://doi.org/10.1101/128843.
- Rasmussen CE, Williams CKI. Gaussian processes for machine learning. London: The MIT Press; 2006.Google Scholar
- Buettner F, Theis FJ. A novel approach for resolving differences in single-cell gene expression patterns from zygote to blastocyst. Bioinformatics. 2012; 28(18):i626–32.View ArticlePubMedPubMed CentralGoogle Scholar
- Buettner F, Natarajan KN, Casale FP, Proserpio V, Scialdone A, Theis FJ, et al.Computational analysis of cell-to-cell heterogeneity in single-cell RNA-sequencing data reveals hidden subpopulations of cells. Nat Biotechnol. 2015; 33(2):155–60.View ArticlePubMedGoogle Scholar
- Campbell KR, Yau C. Order under uncertainty: robust differential expression analysis using probabilistic models for pseudotime inference. PLOS Comput Biol. 2016; 12(11):1–20.View ArticleGoogle Scholar
- Reid JE, Wernisch L. Pseudotime estimation: deconfounding single cell time series. Bioinformatics. 2016; 32(19):2973–80.View ArticlePubMedPubMed CentralGoogle Scholar
- Ahmed S, Rattray M, Boukouvalas A. GrandPrix: scaling up the Bayesian GPLVM for single-cell data. 2017. bioarxiv preprint: https://doi.org/10.1101/227843.
- Lönnberg T, Svensson V, James KR, Fernandez-Ruiz D, Sebina I, Montandon R, Soon MSF, Fogg LG, Nair AS, Liligeto UN, Stubbington MJT, Ly L-H, Bagger FO, Zwiessele M, Lawrence ND, Souza-Fonseca-Guimaraes F, Bunn PT, Engwerda CR, Heath WR, Billker O, Stegle O, Haque A, Teichmann SA. Single-cell RNA-Seq and computational analysis using temporal mixture modelling resolves th1/tfh fate bifurcation in malaria. Sci Immunol. 2017;2(9). http://immunology.sciencemag.org/content/2/9/eaal2192.
- Yang J, Penfold CA, Grant MR, Rattray M. Inferring the perturbation time from biological time course data. Bioinformatics. 2016; 32(19):2956–64.View ArticlePubMedPubMed CentralGoogle Scholar
- Lázaro-Gredilla M, Van Vaerenbergh S, Lawrence ND. Overlapping mixtures of Gaussian processes for the data association problem. Pattern Recognit. 2012; 45(4):1386–95.View ArticleGoogle Scholar
- Quiñonero-Candela J, Rasmussen CE. A unifying view of sparse approximate Gaussian process regression. J Mach Learn Res. 2005; 6(Dec):1939–59.Google Scholar
- de Garis Matthews AG. Scalable Gaussian process inference using variational methods.Department of Engineering, University of Cambridge; 2016.Google Scholar
- Matthews AGdG, van der Wilk M, Nickson T, Fujii K, Boukouvalas A, León-Villagrá P, et al.GPflow: a Gaussian process library using Tensorflow. J Mach Learn Res. 2017; 18(40):1–6.Google Scholar
- Campbell K, Yau C. Probabilistic modeling of bifurcations in single-cell gene expression data using a Bayesian mixture of factor analyzers. Wellcome Open Res. 2017;2–19. https://doi.org/10.12688/wellcomeopenres.11087.1.
- Paul F, Arkin Y, Giladi A, Jaitin DA, Kenigsberg E, Keren-Shaul H, et al.Transcriptional heterogeneity and lineage commitment in myeloid progenitors. Cell. 2015; 163(7):1663–77.View ArticlePubMedGoogle Scholar
- Klein AM, Mazutis L, Akartuna I, Tallapragada N, Veres A, Li V, et al.Droplet barcoding for single-cell transcriptomics applied to embryonic stem cells. Cell. 2015; 161(5):1187–201.View ArticlePubMedPubMed CentralGoogle Scholar
- Saelens W, Cannoodt R, Todorov H, Saeys Y. A comparison of single-cell trajectory inference methods: towards more accurate and robust tools. 2018:276907. bioarxiv preprint: https://doi.org/10.1101/276907.
- Zappia L, Phipson B, Oshlack A. Splatter: simulation of single-cell RNA sequencing data. Genome Biol. 2017; 18(1):174.View ArticlePubMedPubMed CentralGoogle Scholar
- Velten L, Haas SF, Raffel S, Blaszkiewicz S, Islam S, Hennig BP, et al.Human haematopoietic stem cell lineage commitment is a continuous process. Nat Cell Biol. 2017; 19(4):271–81.View ArticlePubMedPubMed CentralGoogle Scholar
- Zheng GX, Terry JM, Belgrader P, Ryvkin P, Bent ZW, Wilson R, et al.Massively parallel digital transcriptional profiling of single cells. Nat Commun. 2017; 8:14049.View ArticlePubMedPubMed CentralGoogle Scholar
- Wolf FA, Angerer P, Theis FJ. SCANPY: large-scale single-cell gene expression data analysis. Genome Biology; 19(1):15. https://doi.org/10.1186/s13059-017-1382-0.
- Penfold CA, Sybirna A, Reid J, Huang Y, Wernisch L, Ghahramani Z, et al.Nonparametric Bayesian inference of transcriptional branching and recombination identifies regulators of early human germ cell development. 2017. bioarxiv preprint: https://doi.org/10.1101/167684.
- Simek K, Palanivelu R, Barnard K. Branching Gaussian processes with applications to spatiotemporal reconstruction of 3d trees In: Leibe B, Matas J, Sebe N, Welling M, editors. Computer Vision – ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, October 11–14, 2016, Proceedings, Part VIII. New York: Springer International Publishing: 2016. p. 177–93. https://doi.org/10.1007/978-3-319-46484-8_11.
- King NJ, Lawrence ND. Fast variational inference for Gaussian process models through KL-correction. In: European Conference on Machine Learning. New York: Springer International Publishing: 2006. p. 270–81.Google Scholar
- Boukouvalas A, Hensman J, Magnus R. BGP: identifying gene-specific branching dynamics from single cell data with a branching Gaussian process. 2018. https://github.com/ManchesterBioinference/BranchedGP.