Skip to main content

Biologically informed NeuralODEs for genome-wide regulatory dynamics



Gene regulatory network (GRN) models that are formulated as ordinary differential equations (ODEs) can accurately explain temporal gene expression patterns and promise to yield new insights into important cellular processes, disease progression, and intervention design. Learning such gene regulatory ODEs is challenging, since we want to predict the evolution of gene expression in a way that accurately encodes the underlying GRN governing the dynamics and the nonlinear functional relationships between genes. Most widely used ODE estimation methods either impose too many parametric restrictions or are not guided by meaningful biological insights, both of which impede either scalability, explainability, or both.


We developed PHOENIX, a modeling framework based on neural ordinary differential equations (NeuralODEs) and Hill-Langmuir kinetics, that overcomes limitations of other methods by flexibly incorporating prior domain knowledge and biological constraints to promote sparse, biologically interpretable representations of GRN ODEs. We tested the accuracy of PHOENIX in a series of in silico experiments, benchmarking it against several currently used tools. We demonstrated PHOENIX’s flexibility by modeling regulation of oscillating expression profiles obtained from synchronized yeast cells. We also assessed the scalability of PHOENIX by modeling genome-scale GRNs for breast cancer samples ordered in pseudotime and for B cells treated with Rituximab.


PHOENIX uses a combination of user-defined prior knowledge and functional forms from systems biology to encode biological “first principles” as soft constraints on the GRN allowing us to predict subsequent gene expression patterns in a biologically explainable manner.


Biological systems are complex with phenotypic states, including those representing health and disease, defined by the expression states in the entire genome. Transitions between these states occur over time through the action of highly interconnected regulatory processes driven by transcription factors. Modeling molecular mechanisms that govern these transitions is essential if we are to understand the behavior of biological systems and design interventions that can more effectively induce a specific phenotypic outcome. But this is challenging since we want not only to predict gene expression at unobserved time points but also to make these predictions in a way that explains any prior knowledge of transcription factor binding sites. Models that accurately encode such interactions between transcription factors (TFs) and target genes within gene regulatory networks (GRNs) can provide insights into important cellular processes, such as disease-progression and cell-fate decisions [1,2,3,4].

Given that many dynamical systems can be described using ordinary differential equations (ODEs), a logical approach to modeling GRNs is to estimate ODEs for gene expression using an appropriate statistical learning technique [3,4,5,6]. Although estimating gene regulatory ODEs ideally requires time-course data, obtaining such data in biological systems might be difficult. One can instead use pseudotime methods applied to cross-sectional data to order samples and subsequently estimate ODEs that capture the regulatory structure [7, 8].

While a variety of ODE estimation methods have been proposed, most suffer from critical issues that limit their applicability in modeling genome-wide regulatory networks. Some systems biology models (such as those built using COPASI [9]) formulate ODEs based solely on biochemical principles of gene regulation and use the available data to parameterize these equations. However, such methods impose several restrictions on the ODEs and cannot flexibly adjust to situations where the underlying assumptions do not hold; this increases the risk of model misspecification and hinders scalability to large networks, particularly given the enormous number of parameters necessary to specify a genome-scale model [10, 11]. Other methods including Dynamo, PRESCIENT, and RNA-ODE are based on non-parametric approaches to learning regulatory ODEs, using tools such as sparse kernel regression [3], random forests [5], variational auto-encoders [7, 12, 13], diffusion processes [4], and neural ordinary differential equations [6, 14], but these fail to include biologically relevant associations between regulatory elements and genes as constraints on the models.

These latter models can be broadly placed into two classes based on the inputs required to estimate the gradient f of the gene regulatory dynamics, where \(f(\varvec{x}) =\frac{d\varvec{x}}{dt}\). The first class consists of methods like PRESCIENT [4] and RNAForecaster [6] that can learn f based only on time series gene expression input \(\{\varvec{x}_{t_0}; \varvec{x}_{t_1}; \dots ;\varvec{x}_{t_T}\}\) without additional steps or consideration of other regulatory inputs [4, 6, 15]. In the process of learning transitions between consecutive time points, these “one-step” methods implicitly learn the local derivative (in the context of single cell sequencing often referred to as “RNA velocity” [16]) \(\frac{d\varvec{x}}{dt}\mid _{\varvec{x} = \varvec{x}_{t_m}}\), as an intermediary to estimating f. One significant issue with these approaches is scalability, and studying meaningfully large dynamical systems (ideally those describing the entire genome) has so far been hindered by a large performance loss and missing interpretability [4, 6, 17, 18]. This leads to potential issues with generalizability as regulatory processes operate genome-wide and even small perturbations can have wide-ranging regulatory effects.

A second class of approaches consists of “two-step” methods such as Dynamo [3], RNA-ODE [5], and DeepVelo [12] that instead only require snapshot expression data for learning f with two separate steps [3, 5, 12], which allows for much broader applicability to standard RNA-seq data at the cost of performance as true time course data is required for reliable causal inference. These approaches first explicitly estimate the RNA velocity \(\left(\frac{d\varvec{x}}{dt}\right)\) for each data point in a preprocessing step, requiring spliced and unspliced transcript counts, and one of many available velocity estimation tools [3, 16, 19,20,21,22,23]. In the next step, the original task of learning f is reduced to learning a vector field from expression-velocity tuples \(\left[\varvec{x}_i , \left(\frac{d\varvec{x}}{dt}\right)_i\right]\) and a suitable learning algorithm is deployed. Apart from needing additional inputs that may not always be available (for example, spliced and unspliced counts are not available for microarray data), these “two-step” methods are also sensitive to the velocity estimation tool used, many of which suffer from a multitude of weaknesses [19]. Still, the Jacobian of the estimated vector field can help inform whether the learned dynamics are biologically meaningful [2, 3, 5, 8].

While the flexibility of both classes of models helps estimate arbitrary dynamics, they are “black box” methods whose somewhat opaque nature not only makes them prone to over-fitting but also creates challenges in teasing out interpretable mechanistic insights into regulatory control [1, 6]. These models are optimized solely to predict RNA velocity or gene expression levels and so the predictions are not explainable in the sense that most cannot be related back to a sparse causal GRN. Another major issue is the scalability of these methods; because of their computational complexity, they have not yet been shown to feasibly scale up to tens of thousands of genes―and definitely not to the entire genome [3, 4, 7, 12,13,14]. Consequently, most of these methods either restrict themselves to a small set of highly variable genes [4, 6, 7, 12, 14] or resort to dimension-reduction techniques (PCA, UMAP, latent-space embedding, etc.) [3, 4, 7, 13] as a preprocessing step . Although dimensional reduction for feature selection has proven useful in some instances, such as in developing predictive biomarkers, such use of “metagenes” suffers from a lack of interpretability or apparent mechanistic association. Dimensional reduction results in certain biological pathways being masked in the dynamics and impedes the recovery of causal GRNs. Furthermore, there is no obvious way to incorporate biological constraints and prior knowledge to guide model selection and prevent over-fitting when using dimensionally reduced data [1, 24]. Given that there is a strong desire among biological scientists to understand the dynamic properties of individual genes in health and disease, we focused our efforts on developing a method capable of scaling to the genome, on the original gene expression space.

We developed PHOENIX (Prior-informed Hill-like ODEs to Enhance Neuralnet Integrals with eXplainability) as a scalable method for estimating dynamical systems governing gene expression through an ODE-based machine learning framework that is flexible enough to avoid model misspecification and is guided by insights from systems biology that facilitate biological interpretation of the resulting models [25, 26]. At its core, PHOENIX models temporal patterns of gene expression using neural ordinary differential equations (NeuralODEs) [27, 28], an advanced computational method commensurate with the scope of human gene regulatory networks―with more than 25,000 genes and 1600 TFs―and a limited number of samples. We implement an innovative NeuralODE architecture that inherits the universal function approximation property (and thus the flexibility) of neural networks while resembling Hill-Langmuir kinetics, which have been used to model dynamic transcription factor binding site occupancy [10, 29, 30]. It can hence reasonably describe gene regulation by modeling the sparse yet synergistic interactions of genes and transcription factors. Importantly, PHOENIX operates on the original gene expression space and performs without any dimensional reduction, thus preventing information loss, especially for lowly expressed genes that are nonetheless important for cell fate [4].

In the optimization step of PHOENIX, we introduce user-defined prior knowledge in the form of a “network prior.” Here, we derive a prior based on TF binding motif enrichment that is predicated on the understanding that the direct regulators of gene expression are transcription factors that bind in the region around a gene’s transcription start site (TSS). Because most transcription factors bind to distinct sequence motifs, each transcription factor has the potential to regulate only a fixed subset of genes. This regulatory constraint can be expressed as a network prior by mapping transcription factors to the promoter sequence of regulated genes, using tools such as FIMO [31] and GenomicRanges [32].

The incorporation of user-defined prior knowledge of likely network structure ensures that a trained PHOENIX model is explainable―it not only predicts temporal gene expression patterns but also encodes an extractable GRN that captures key mechanistic properties of regulation such as activating (and repressive) edges and strength of regulation.


As shown below, we formulated the PHOENIX model such that it utilizes a novel biologically motivated NeuralODE architecture and also incorporates structural domain knowledge. In subsequent experiments, we demonstrated the utility of PHOENIX for estimating gene expression dynamics by performing a series of in silico benchmarking experiments, where PHOENIX exceeded even the most optimistic performance of popular black box RNA dynamics estimation methods. We demonstrated the scalability of PHOENIX by applying it to genome-scale breast cancer microarray samples ordered in pseudotime and investigated how scaling to the complete data set improves a representation of key pathways. We further applied PHOENIX to yeast cell cycle microarray data to show that it can capture oscillatory dynamics by flexibly deviating from Hill-like assumptions when necessary. Finally, to test PHOENIX with a different data modality, we investigated available genome-scale time course RNASeq data of Rituximab treated B cells, where PHOENIX is able to capture key molecular changes in the main mechanism of action of Rituximab.

The PHOENIX model

Given a time series gene expression data set, the NeuralODEs of PHOENIX implicitly estimate the local derivative (RNA velocity) at an input data point with a neural network (NN). We designed activation functions that resemble Hill kinetics and thus allow the NN to sparsely represent different patterns of transcriptional co-regulation by combining separate additive and multiplicative blocks that operate on the linear and logarithmic scales respectively. An ODE solver then integrates the estimated derivative to reconstruct the steps taken from an input \(x_i\) at time \(t_i\) to a predicted output \(\widehat{x}_{i+1}\) at time \(t_{i+1}\) [27]. The trained neural network block thus encodes the ODEs governing the dynamics of gene expression and hence encodes the underlying vector field and GRN. An important advantage of incorporating an ODE solver is that we can predict expression changes for arbitrarily long time intervals without relying on predefined Euler discretizations, as is required by many other methods [4, 12, 18]. We further augmented this framework by allowing users to include prior knowledge of gene regulation in a flexible way, which acts as a domain-knowledge-informed regularizer or soft constraint of the NeuralODE [24] (Fig. 1). By combining the mechanism-driven approach of systems biology-inspired functional forms and prior knowledge with the data-driven approach of powerful machine learning tools, PHOENIX scales up to full-genome data sets and learns meaningful models of gene regulatory dynamics.

Fig. 1
figure 1

PHOENIX is powered by a NeuralODE engine. Given an expression vector \(\varvec{g}(t_i) \in \mathbb {R}^{\#\text {genes}}\) at time \(t_i\), a neural network (dotted rectangle) estimates the local derivative \(d\varvec{g}(t_i)/dt\), and an ODE solver integrates this value to predict expression at subsequent time points \(\widehat{\varvec{g}}(t_{i+1})\). The neural network is equipped with activation functions (\(\phi _\Sigma\) and \(\phi _\Pi\)) that resemble Hill-Langmuir kinetics and two separate single-layer blocks (\(\text {NN}_{sums}\) and \(\text {NN}_{prods}\)) that operate on the linear and logarithmic scales to model additive and multiplicative co-regulation respectively. A third block (\(\text {NN}_{combine}\)) then flexibly combines the additive and multiplicative synergies. PHOENIX incorporates two levels of back-propagation to parameterize the neural network while inducing domain knowledge-specific properties; the first (red arrows with weight \(\lambda _\mathrm{{data}}\)) aims to match the observed data, while the second (blue arrow with weight \(\lambda _\mathrm{{prior}}\)) uses simulated expression vectors \(\varvec{\gamma }(t_i) \in \mathbb {R}^{\#\text {genes}}\) to implement soft constraints defined by user-supplied prior models (\(\mathcal {P}^*\)) of putative regulatory interactions. Since the \(\varvec{\gamma }(t_i)\)s were simulated expression values, we also refer to them as “ghost inputs.” More details about their operationalization can be found in Additional File 2: Section 1.2

Neural ordinary differential equations (NeuralODEs)

NeuralODEs [27] learn dynamical systems by parameterizing the underlying derivatives with neural networks:

$$\begin{aligned} \frac{d \varvec{x}(t)}{dt} = f(\varvec{x}(t), t) \approx \text {NN}_\theta (\varvec{x}(t), t). \end{aligned}$$

Given an initial condition, the output at any given time-point can now be approximated using a numerical ODE solver \(\mathcal {S}\) of adaptive step size

$$\begin{aligned} \widehat{\varvec{x}(t_1)} = \varvec{x}(t_0) + \int _{t_0}^{t_1} \text {NN}_\theta (\varvec{x}(t), t) dt = \mathcal {S}(\varvec{x}(t_0);\text {NN}_\theta ;t_0;t_1). \end{aligned}$$

This is the basic architecture of a NeuralODE [27], and it lends itself to loss functions L (here, \(\ell _2\) loss) of the form

$$\begin{aligned} L\left( \varvec{x}(t_1), \widehat{\varvec{x}(t_1)} \right) = L\left( \varvec{x}(t_1), \mathcal {S}(\varvec{x}(t_0);\text {NN}_\theta ;t_0;t_1)\right) . \end{aligned}$$

To perform back-propagation, the gradient of the loss function with respect to all parameters \(\theta\) must be computed, which is done using the adjoint sensitivity method [27]. Building off of the NeuralODE author’s model implementation in PyTorch [28], we made biologically motivated modifications to the architecture and incorporated user-defined prior domain knowledge, as described below.

Model formulation and neural network architecture

Most models for co-regulation of gene expression are structured as a simple feedback process [29]. Given that gene regulation can be influenced by perturbations across an entire regulatory network of n genes, the gene expression of all genes \(g_j(t)\) can affect a specific \(g_i(t)\) at time point t:

$$\begin{aligned} \frac{d\varvec{g}(t)}{dt} = f_{reg}\left( \varvec{g}(t)\right) - \varvec{g}(t), \end{aligned}$$

where \(\varvec{g}(t) = \{g_i(t)\}_{i=1}^n\), \(f_{reg}: \mathbb {R}^n \rightarrow \mathbb {R}^n\), and \(f_{reg}\) is approximated with a neural network. To model additive as well as multiplicative effects within \(f_{reg}\), we used an innovative neural network architecture equipped with activation functions that emulate—and can thus sparsely encode—Hill kinetics (see Fig. 1). The Hill-Langmuir equation H(P) was originally derived to model the binding of ligands to macromolecules [30] and can be used to model transcription factor occupancy of gene regulatory binding sites [10]:

$$\begin{aligned} H(P) = \frac{P^\alpha }{\kappa ^\alpha + P^\alpha } = \frac{(P/\kappa )^\alpha }{1 + (P/\kappa )^\alpha } = \frac{Y}{1+Y}, \text {with } Y = (P/\kappa )^\alpha , \end{aligned}$$

which resembles the softsign activation function \(\phi _\mathrm{{soft}}(y)=1/(1+ \left| y\right| )\). For better neural network trainability, however, we shifted it to the center of the expression values. To approximate suitable exponents \(\alpha\), we further log-transformed H, since composing additive operations in the log-transformed space with a Hadamard \(\exp \circ\) function can represent multiplicative effects.

$$\begin{aligned} \phi _{\Sigma }(x) = \frac{x - 0.5}{1 + \mid x - 0.5 \mid }, \quad \quad \quad \phi _{\Pi }(x) =\log \left( \frac{x - 0.5}{1 + \mid x - 0.5 \mid } + 1\right) \end{aligned}$$

were employed as activation functions to define two neural network blocks (\(\text {NN}_{sums}\) and \(\text {NN}_{prods}\)), representing additive and multiplicative effects

$$\begin{aligned} \varvec{c_\Sigma }(\varvec{g}(t)) = \varvec{W_\Sigma } \phi _{\Sigma }(\varvec{g}(t))+\varvec{b_\Sigma } ~~~~~~ \varvec{c_\Pi }(\varvec{g}(t))=\exp \circ (\varvec{W_\Pi } \phi _{\Pi }(\varvec{g}(t))+\varvec{b_\Pi }). \end{aligned}$$

The concatenated vectors \(\varvec{c_\Sigma }(\varvec{g}(t)) \oplus \varvec{c_\Pi }(\varvec{g}(t))\) served as input to a third block \(\text {NN}_{combine}\) (with weights \(\varvec{W}_{\cup } \in \mathbb {R}^{n\times 2m}\)) that flexibly combined these additive and multiplicative effects. We found that a single linear layer was sufficient for this purpose. Given that ODEs have difficulty with learning zero gradients, we found it necessary (see Additional File 1: Figure S5) to introduce gene-specific multipliers \(\varvec{\upsilon } \in \mathbb {R}^n\) for modeling genes that do not exhibit any temporal variation \(\left( \frac{dg_i(t)}{dt} = 0, \forall t\right)\). We initialized \(\varvec{\upsilon }\) using i.i.d. random standard uniform values.

Accordingly, the output derivative for each gene i was multiplied with \(\text {ReLU} (\upsilon _i) = \max \{(\upsilon _i, 0)\}\). We expressed this using the Hadamard product (\(\odot\)) of the previous output and the elementwise ReLU of \(\varvec{\upsilon }\) as

$$\begin{aligned} \widehat{\frac{d\varvec{g}(t)}{dt}} = \text {ReLU} (\varvec{\upsilon }) \odot \left[ \varvec{W}_{\cup } \{\varvec{c_\Sigma }(\varvec{g}(t)) \oplus \varvec{c_\Pi }(\varvec{g}(t))\} - \varvec{g}(t)\right] . \end{aligned}$$

The trainable parameters \(\varvec{\theta } = (\varvec{W_\Sigma }, \varvec{W_\Pi },\varvec{b_\Sigma },\varvec{b_\Pi },\varvec{W}_{\cup }, \varvec{\upsilon })\) were learned based on observed data and prior domain knowledge (details in Additional File 2: Section 1).

Structural domain knowledge incorporation

One challenge we found in interpreting PHOENIX is that NeuralODEs have multiple solutions [33], of which many are inconsistent with our understanding of the process by which specific transcription factors (TFs) regulate the expression of other genes within the genome. Most solutions accurately represent gene-gene correlations but do not necessarily reflect biologically established TF gene regulation processes. Inspired by recent developments in physics-informed deep learning [24], we introduced biologically motivated soft constraints to regularize the search for a parsimonious approximation. We started with the NeuralODE prediction for the gene expression vector

$$\begin{aligned} \widehat{\varvec{g}(t_1)}{} & {} = \varvec{g}(t_0) + \int _{t_0}^{t_1} \text {ReLU} (\varvec{\upsilon }) \odot \left[ \varvec{W}_{\cup } \{\varvec{c_\Sigma }(\varvec{g}(t)) \oplus \varvec{c_\Pi }(\varvec{g}(t))\} - \varvec{g}(t)\right] dt \\ {} & {} = \mathcal {S}\left( \varvec{g}(t_0);\text {ReLU} (\varvec{\upsilon }) \odot \left[ \varvec{W}_{\cup } \{\varvec{c_\Sigma }(\varvec{g}(t)) \oplus \varvec{c_\Pi }(\varvec{g}(t))\} - \varvec{g}(t)\right] ;t_0;t_1\right) . \end{aligned}$$

We found that the unregularized PHOENIX provides an observed gene expression-based approximation for the local derivative \(\frac{d \varvec{g}(t)}{dt}\), but often we have additional structural information available about which TFs are more likely to regulate certain target genes. Hence, one could also formulate a domain knowledge-informed \(\mathcal {P^*}\left( \varvec{g}(t)\right)\) that is a prior-based approximation as

$$\begin{aligned} \underbrace{\text {ReLU} (\varvec{\upsilon }) \odot \left[ \varvec{W}_{\cup } \{\varvec{c_\Sigma }(\varvec{g}(t)) \oplus \varvec{c_\Pi }(\varvec{g}(t))\} - \varvec{g}(t)\right] }_{\text {PHOENIX (based on observed gene expression data)}} \approx \frac{d \varvec{g}(t)}{dt} \approx \underbrace{\mathcal {P^*}\left( \varvec{g}(t)\right) }_{\text {prior-based}}. \end{aligned}$$

By promoting our NeuralODE to flexibly align with such structural domain knowledge, we automatically searched for biologically more realistic models that still explained the observed gene expression data. To this end, we designed a modified loss function \(\mathcal {L}_{\text {mod}}\) that incorporated the effect of prior model \(\mathcal {P^*}\) using a set of K simulated (“ghost”) expression vectors \(\{\varvec{\gamma _k} \in \mathbb {R}^n\}_{k=1}^K\). This induced a preference for consistency with prior domain knowledge.

$$\begin{aligned} & {} \mathcal {L}_{\text {mod}}\left( \varvec{g}(t_1), \widehat{\varvec{g}(t_1)} \right) \\ ={} & {} \lambda \overbrace{L\left[ \varvec{g}(t_1), \mathcal {S}\left( \varvec{g}(t_0);\text {ReLU} (\varvec{\upsilon }) \odot \left[ \varvec{W}_{\cup } \{\varvec{c_\Sigma }(\varvec{g}(t)) \oplus \varvec{c_\Pi }(\varvec{g}(t))\} - \varvec{g}(t)\right] ;t_0;t_1\right) \right] }^{\text {loss based on matching observed gene expression data}}\\ {} & {} +(1-\lambda )\underbrace{\frac{1}{K}\sum \limits _{k=1}^K L\left[ \mathcal {P^*}\left( \varvec{\gamma _k}\right) , \text {ReLU} (\varvec{\upsilon }) \odot \left[ \varvec{W}_{\cup } \{\varvec{c_\Sigma }(\varvec{\gamma _k}) \oplus \varvec{c_\Pi }(\varvec{\gamma _k})\} - \varvec{\gamma _k}\right] \right] }_{\text {loss based on matching prior model}} . \end{aligned}$$

Here, \(\lambda\) is a tuning parameter for flexibly controlling how much weight is given to the prior-based optimization, which we tuned with cross-validation, and \(L[\varvec{x},\widehat{\varvec{x}}]\) is the primary loss function, set to the \(L = \ell _2\) loss in our experiments.

While our modeling framework is flexible regarding the nature of the prior model \(\mathcal {P^*}\), we incorporated a simple linear model, a common choice for chemical reaction networks or simple oscillating physical systems [34] as

$$\begin{aligned} \mathcal {P^*}\left( \varvec{\gamma _k}\right) = \varvec{A}\cdot \varvec{\gamma _k} - \varvec{\gamma _k} = (\varvec{A} - \varvec{I})\cdot \varvec{\gamma _k}. \end{aligned}$$

Here, \(\varvec{A}\) is the adjacency matrix of likely network structure based on prior domain knowledge, such as experimentally validated interactions and TF-gene binding information derived from motif scans, with \(\varvec{A}_{ij} \in \{+1, -1, 0\}\) representing an activating, repressive, or no prior interaction, respectively. For cases where the signs (activating/repressive) of prior interactions were unknown, we found that formulating \(\varvec{A}\) simply based on prior interaction existence, or \(\varvec{A}_{ij} \in \{1, 0\}\), would suffice (see Additional File 1: Table S3).

PHOENIX accurately and explainably learns temporal evolution of in silico dynamical systems

We began our validation studies with simulated gene expression time-series data so that the underlying dynamical system that produced the system’s patterns of gene expression was known. We adapted SimulatorGRN [29, 35] to generate time-series expression data from two synthetic S. cerevisiae gene regulatory systems (SIM350 and SIM690, consisting of 350 and 690 genes respectively). The activating and repressive interactions in each in silico system were used to synthesize noisy expression “trajectories” for each gene across multiple time points (see the “Defining a ground truth dynamical system” and “Simulating time series gene expression data” sections). We split up the trajectories into training (88%), validation (6% for hyperparameter tuning), and testing (6%) and compared PHOENIX predictions on the test set against the “known”/ground truth trajectories. Since PHOENIX uses user-defined prior knowledge as a regularizer, we also corrupted the prior model at a level commensurate with the “experimental” noise level (see Additional File 2: Section 4.2), reflecting the fact that transcription factor-gene binding is itself noisy.

Fig. 2
figure 2

We applied PHOENIX to simulated gene expression data originating from in silico dynamical systems SIM350 (A) and SIM690 (B) that simulate the temporal expression of 350 and 690 genes respectively. Each simulated trajectory consisted of five time points (\(t = 0,2,3,7,9\)) and was subjected to varying levels of Gaussian noise (\(\frac{\text {noise } \sigma }{\textrm{mean}}\) = 0%, 5%, 10%, 20%, higher noise settings in Additional File 1: Figure S6). Since PHOENIX uses a user-defined prior network model as a regularizer, we also corrupted the prior models up to an amount commensurate with the noise level. For each noise setting, we trained PHOENIX on 140 of these “observed” trajectories and validated on 10. The performance on the validation trajectories was used to determine the optimal value of \(\lambda _{\text {prior}}\). We then tested the trained model on 10 new test set trajectories. We display both observed and predicted test set trajectories for four arbitrary genes in both SIM350 and SIM690, across all noise settings. We display the mean squared error (MSE) between the predictions and the 10 pre-noise test set trajectories

We found that PHOENIX accurately learned the temporal evolution of the SIM350 and SIM690 systems (Fig. 2) and was able to recover the true test set trajectories (that is, test set trajectories pre-noise) with a reasonably high accuracy even when the training trajectories included high levels of noise and the prior knowledge model was altered to exclude transcription factor-gene interactions explicitly involved in expression (see Additional File 1: Table S1, Additional File 1: Table S2). Furthermore, the shapes of the predicted trajectories (and hence the predicted steady-state levels) obtained from feeding initial values (expression at \(t=0\)) into the trained model remained robust to noise, suggesting that a trained PHOENIX model could be used to estimate the temporal effects of cellular perturbations. Since the primary prediction engine of PHOENIX is a NeuralODE, we wanted to benchmark its performance relative to “out-of-the-box” (OOTB) NeuralODE models (such as RNAForecaster [6]) to understand the contributions of our modifications to the NeuralODE architecture. We tested a range of OOTB models where we adjusted the total number of trainable parameters to be similar to that of PHOENIX (see Additional File 2: Section 3.2). Because PHOENIX uses a domain prior of likely gene-regulation interactions in its optimization scheme, we also tested a version (\(\text {PHX}_0\)) where the weight of the prior was set to zero (\(\lambda _\mathrm{{prior}} = 0\)). For each of SIM350 and SIM690, we observed that PHOENIX outperformed OOTB NeuralODEs on the test set in noiseless settings (Additional File 1: Figure S2 and Additional File 1: Table S4). When we added noise, the PHOENIX models still generally outperformed the OOTB models, especially \(\text {PHX}_0\). The test MSEs were more comparable between all the models in very high noise settings. The consistently strong performance of PHOENIX suggests that using a Hill kinetics-inspired architecture better captures the dynamics of the regulatory process, in part because it models the binding kinetics of transcription factor-gene interactions.

In terms of the contribution of the prior constraints to PHOENIX’s performance, we saw that PHOENIX was generally outperformed by \(\text {PHX}_0\), its unregularized version (Additional File 1: Figure S2 and Additional File 1: Table S4). However, given that the prior can be interpreted as soft biological constraints on the estimated dynamical system [24], an important question is whether \(\text {PHX}_0\) (as well as OOTB models) makes accurate temporal predictions by correctly learning elements of the causal biology or whether the lack of prior information results in an alternate learned representation of the dynamics, which―despite predicting these particular trajectories well―does not explain the true biological regulatory process.

To this end, we recognized that the parameters of a trained PHOENIX model encode an estimate of the ground-truth gene regulatory network (GRN) that causally governs the system’s evolution over time. We therefore inferred encoded GRNs from trained PHOENIX models and compared them to the ground truth networks \(GRN_{350}\) and \(GRN_{690}\) used to synthesize SIM350 and SIM690 respectively (see Additional File 2: Section 2). Given PHOENIX’s simple NeuralODE architecture, we were able to develop a GRN inference algorithm that could predict edge existence, direction, strength, and sign, using just model coefficients, without any need for time-consuming sensitivity analyses (unlike other approaches [2, 12]). For comparison, we wanted to extract GRNs from the most predictive OOTB models; given their black box nature, OOTB model GRNs had to be obtained via sensitivity analyses (see Additional File 2: Section 3.2).

Fig. 3
figure 3

We extracted encoded GRNs from the trained PHOENIX models and the best-performing out-of-the-box NeuralODE models, for both in silico dynamical systems SIM350 (A) and SIM690 (B) across all noise settings (see Additional File 1: Figure S3 for more noise settings). We compared these GRN estimates to the corresponding ground truth GRNs used to formulate SIM350 and SIM690 and obtained AUC values as well as out-degree correlations (\(\rho _\mathrm{{out}}\)). We also reverse-engineered a metric (\(\mathcal {C}_{\max }\)) to inform how sparsely PHOENIX had inferred the dynamics (see Additional File 2: Section 2). Furthermore, we used these \(\mathcal {C}_{\max }\) values to obtain optimal true positive and true negative rates (\(\textrm{TPR}_{\max }\) and \(\textrm{TNR}_{\max }\)) that were independent of any cutoff value, allowing us to compare between “best possible” networks across all settings

We compared inferred and ground truth GRNs in terms of several metrics, including edge recovery, out-degree correlations, and induced sparsity. We obtained near-perfect edge recovery for PHOENIX (AUC \(\in [0.96, 0.99]\)) as well as high out-degree correlations across all noise settings (Fig. 3 and Additional File 1: Table S5). Most notably, we observed that PHOENIX predicted dynamics in a more robustly explainable way than \(\text {PHX}_0\) and the OOTB models. We measured induced sparsity by reverse engineering a metric \(\mathcal {C}_{\max }\) based on maximizing classification accuracy (see Additional File 2: Section 2) and found that PHOENIX resulted in much sparser dynamics than \(\text {PHX}_0\) (Additional File 1: Table S6). To further assess this phenomenon, we computed the estimated model effect between every gene pair in SIM350 and compared these values between PHOENIX and \(\text {PHX}_0\). We found that the incorporation of priors helped PHOENIX identify core elements of the dynamics and predict gene expression patterns in a biologically parsimonious manner (Additional File 1: Figure S4).

Since the inclusion of such static prior knowledge greatly increased the explainability of the inferred dynamics, we also investigated how explainability was affected by misspecification of the prior. In our in silico experiments, we had randomly corrupted (misspecified) the prior by an amount commensurate with the noise level (see Additional File 2: Section 4.2). We compared network representations of these misspecified prior constraints to GRNs extracted from the PHOENIX models that used these very priors. We found that PHOENIX was able to appropriately learn causal elements of the dynamics beyond what was encoded in the priors (Additional File 1: Table S1). This trend persisted when we made the model explicitly naive to subsets of domain-information by omitting them from the prior (Additional File 1: Table S2). These results indicate that even though the user-defined priors enhance explainability, PHOENIX can deviate from them when necessary and learn regulatory interactions from the data itself.

PHOENIX exceeds the most optimistic performances of current black box methods in silico

Having established PHOENIX models as both predictive and explainable, we compared its performance to other existing methods for gene expression ODE estimation in silico (Additional File 1: Table S8). As discussed earlier, these can be placed into two groups based on the input data. The “one-step” methods estimate dynamics by directly using expression trajectories; these include RNAForecaster [6] (which is an out-of-the-box NeuralODE), and PRESCIENT [4], among others [14, 15]. PHOENIX is more similar to these methods.

“Two-step” methods such as Dynamo [3], RNA-ODE [5], and DeepVelo [12] were not directly designed for time series data, but are applicable to snapshot RNA-seq with quantified isoforms. They estimate dynamics by first reconstructing RNA velocity using inputs such as spliced and unspliced mRNA counts and then estimating a vector field mapping expression to velocity. While this comparison is not the ideal setting for such two step approaches, we did compensate for their disadvantage by providing the ground truth velocities as input―information that none of the one-step approaches have―into their second step (see Additional File 2: Section 3.1). Furthermore, we used the validation set to optimize key hyperparameters of all the methods (Additional File 1: Table S8, right-most column) before finally testing predictive performance on expression values from held-out test trajectories. Most of the methods also provide a means for extracting a gene network that we used to evaluate each method’s explainability (see Additional File 2: Section 3.3).

In these comparisons, we confirm that the “one-step” trajectory-based methods generally yield better predictions than the “two-step” velocity-based methods (although Dynamo sometimes achieved performance compared to the single-step methods), which comes as little surprise as these methods were originally designed for a slightly different setting. Overall, at reasonable noise levels, PHOENIX outperformed even the optimistic versions of the black box methods by large margins both in terms of predicting gene expression (Additional File 1: Table S4) and explainability (based on consistency with the ground truth network; Additional File 1: Table S5). We found that Dynamo was the most explainable competing method in SIM350 but that, in SIM690, DeepVelo was more explainable. Finally, we found that the dynamics estimated by PHOENIX were generally much sparser than any other method and that sparsity generally decreased with noise levels (Additional File 1: Table S6).

Further ODE estimation approaches (not included in our experiments) and their functionalities are discussed in Additional File 1: Table S7 [7, 8, 13, 14, 36]. Code for performing such methodological benchmarks is included with the PHOENIX release [37].

PHOENIX predicts temporal evolution of yeast cell-cycle genes in an explainable way

We tested PHOENIX using an experimental data set [38] from a study [39] of cell-cycle synchronized yeast cells, consisting of two technical replicates of expression values for 3551 genes across 24 time points (see the “Data processing and normalization” section for data processing).

Since there were two technical replicates (or trajectories), we used one of the two yeast replicates for training, and the other replicate for testing (not seen in any way during training). Furthermore, within the replicate for training, we used the contiguous segment of time points \(t = 45, 50, 55\) min for validation (to choose \(\lambda _{\text {prior}}\)). This scheme ensured that the train and validation sets were disjunct and that we were measuring predictive performance on a test set that was independent of the training data (see the “Model setup for training and testing” section). For the domain prior, we used a simple adjacency-matrix-based prior model derived from TF motif enrichment analysis in promoters of each of the 3551 genes (see the “Model setup for training and testing” section). We tuned the prior weight (to \(\lambda _{\text {prior}} = 0.05\)) using the validation set to induce higher explainability by promoting a biologically anchored structure to the dynamics.

PHOENIX was able to learn the temporal evolution of gene expression across the yeast cycle explaining over 69% of the variation in the test set (Fig. 4B and Table 1). Notably, when we visualized the estimated dynamics by extrapolating from just initial values (expression at \(t = 0\)), we found that PHOENIX plausibly predicted continued periodic oscillations in gene expression, even though the training data consisted of only two full cell cycles (Fig. 4A). The amplitude of the predicted trajectories dampened across time points, which is expected given that yeast array data tends to exhibit underdamped harmonic oscillation during cell division possibly reflecting de-synchronization of the yeast cells over time [40]. This performance on non Hill-like oscillatory dynamics is indicative of the high flexibility of PHOENIX. It inherits the universal function approximation property from NeuralODEs, allowing it to deviate from Hill-like assumptions when necessary, while still remaining explainable due to the integration of prior knowledge.

Fig. 4
figure 4

A We applied PHOENIX (\(\lambda _{prior} =0.05\)) to 2 technical replicates of gene expression of 3551 genes each, collected across 24 time points in a yeast cell-cycle time course [38]. We trained on 40 transition pairs, used 3 for validation, and tested predictive accuracy on the remaining 3. We display both observed and predicted trajectories for 3 arbitrary genes, where the predicted trajectories are extrapolations into future time points based on just initial values (gene expression at \(t=0\)). Additional results in Additional File 1: Figure S6 and Additional File 1: Table S9. B We correlated observed versus predicted expression levels of all 3551 genes for the 3 expression vectors in the test set; \(\rho = 0.8308\) implying \(R^2 = 0.69\). C We tested the explainability of the learned dynamics by comparing encoded GRNs retrieved from a series of trained models (of varying prior dependencies) against ChIP-chip data [41] to obtain ROC curves. The \(\lambda _{\text {prior}} = 0.05\) model was the one chosen based on the validation set MSE (see Additional File 2: Section 1)

To test the biological explainability of the learned dynamical system, we extracted the encoded GRN from the trained PHOENIX model (with optimal \(\lambda _{\text {prior}} = 0.05\) as determined by the validation set) and compared it to a validation network of ChIP-chip transcription factor (TF) binding data [41]. PHOENIX had very impressive accuracy in predicting TF binding (AUC = 0.93), indicating that it had learned transcription factor binding information in the process of explaining temporal patterns in expression (Fig. 4C). In the absence of any prior knowledge (\(\lambda _{\text {prior}} = 0\)), the explainability was poor, highlighting the importance of such knowledge-based guidance in black box models [24, 25] as well as the importance of correctly tuning \(\lambda _{\text {prior}}\).

Similar to the in silico experiments, we saw that PHOENIX’s ability to predict TF binding was greater than that obtained by comparing just the prior to the validation data (Additional File 1: Table S1). This suggested that PHOENIX had used the prior knowledge of cell cycle progression as a starting point to anchor the dynamics and then used the data itself to learn improved regulatory rules.

Table 1 Comparison of PHOENIX with competitor methods on yeast data with all 3551 genes and, separately, the 500 most variable genes. Snapshot-based methods (Dynamo, RNA-ODE, Deepvelo) require RNA velocity at every time point as an additional input [3, 5, 12]. Given that this information was not available in the data set, we estimated RNA velocity using a method of finite differences applied to smooth splines through the expression trajectories [42] (see Additional File 2: Section 3.1). Predictive performance is reported as the \(R^2\) on the test set. Explainabilty AUC was calculated by comparing encoded GRNs retrieved from each trained model (Additional File 2: Section 2 and Additional File 2: Section 3.3) against ChIP-chip data [41]

In order to contextualize PHOENIX’s impressive performance on this data set, we performed comparative analyses against Dynamo [3], RNA-ODE [5], DeepVelo [12], and out-of-the-box NeuralODEs. As discussed in Additional File 1: Table S8, Dynamo, RNA-ODE, and DeepVelo are “two-step” snapshot based methods that require RNA velocity at every time point as an additional input [3, 5, 12]. Given that this information was not available in the data set, we estimated RNA velocity using a method of finite differences applied to smooth splines through the expression trajectories [42] (see Additional File 2: Section 3.1). Furthermore, given that it is a common approach for current methods to only consider a subset of top-k highly variable genes [4, 6, 7, 12, 14], we additonally performed a subsetted analysis considering only the 500 most variable genes along the trajectory.

In both sets of analysis, we found PHOENIX to be both the most predictive \(R^2 \in (69\%, 75\%)\) and by far the most explainable in terms of AUC \(\in (0.93, 0.99)\) (Table 1). Dynamo, RNA-ODE, and DeepVelo were designed for a predictive setting, and we see that they do perform well on the subsetted analysis when looking at predictive performance, albeit worse than PHOENIX. In terms of reconstruction of the underlying GRN, they however perform poorly considering the AUC, with a difference of about 0.30 or more to PHOENIX. When taken in conjunction with the poor predictive performance of out-of-the-box NeuralODEs (\(R^2 \in [30\%, 39\%]\)), as well as the poor explainability of all black box methods, this highlights the unique strength of PHOENIX to appropriately model important biological processes, such as cell cycle progression, while being highly predictive on time course data.

PHOENIX infers genome-wide dynamics of breast cancer progression and identifies central pathways

Although there are several tools for inferring the dynamics of regulatory networks, most do not scale beyond a few hundreds of genes without losing explainability, falling far short of the 25,000 genes in the human genome (Additional File 1: Table S8). Given the performance improvements we saw that were driven by PHOENIX’s use of soft constraints, we wanted to test whether PHOENIX could be extended to human-genome scale networks. Due to the dearth of longitudinal human studies with genome-wide expression measurements, we used data from a cross-sectional breast cancer study (GEO accession GSE7390 [43]) consisting of microarray expression values for 22000 genes from 198 breast cancer patients and ordered these samples in pseudotime. For consistency in pseudotime ordering, we reused a version of this data [44] that was already preprocessed and ordered (using a random-walk-based pseudotime approach) in the PROB paper [8].

After further processing (see the “Testing on breast cancer pseudotime data” section), we obtained a single pseudotrajectory of expression values for \(n_g=11165\) genes across 186 patients, each at a distinct pseudotimepoint. To explore whether PHOENIX’s performance depends on the size of the data set, we also created pseudotrajectories for \(n_g = 500\), 2000, and 4000 genes by subsetting the data set to its \(n_g\) most variable genes. We split up the 186 time points into contiguous intervals for training (170, \(90\%\)), validation (8, \(5\%\)), and test (8, \(5\%\)). For the domain prior network, we again used a simplistic prior model derived from a motif map of promoter targets and tuned \(\lambda _{\text {prior}}\) using the validation set. See the “Model setup for training and testing” section for further details.

For each pseudotrajectory of size \(n_g\), we trained a separate PHOENIX model, and measured predictive performance as the variation explained (\(R^2\)) in the test trajectory by predicting the trajectory through the model from just the initial time point (see the “Model setup for training and testing” section). We observed encouragingly high values of \(R^2 \in (92\% , 97\%)\) (Fig. 5, top) that are, however, not what to expect given this is frozen tumor tissue. Upon further investigation, we found that the high \(R^2\) was partly driven by most genes not showing enough dynamic expression, which hence are easily predicted with a constant trajectory. When we instead focused on evaluating only the top most variable genes, we observed more reasonable values of \(R^2\) (Additional File 1: Figure S10, blue line). We also considered alternate strategies of measuring predictive performance in this setting but found this to oversimplify the task (Additional File 1: Figure S10, red line). We note here that PHOENIX’s computational cost was not excessive even when \(n_g = 11,165\) (see Additional File 1: Table S13).

Next, we investigated PHOENIX’s ability to identify biologically relevant and actionable information regarding gene regulation in breast cancer. First, we tested the performance of the learned dynamical system to reconstruct a gene regulatory network and predict TF-gene interactions. While the ground truth GRN is unknown, we can estimate performance by comparing a validation network of experimental ChIP-chip binding information [45] to a subnetwork of the encoded GRN of a trained PHOENIX model. We found excellent alignment between the two GRNs with AUC \(\in (0.90, 0.96)\), even when we scaled up to \(n_g=11165\) genes (Fig. 5). It is important to note that the PHOENIX-based concordance with experimental data was much greater than that obtained by comparing just the prior knowledge to the validation network (Additional File 1: Table S1), indicating that PHOENIX was improving upon the GRN suggested by the prior knowledge, in addition to learning a dynamical model.

To better understand the benefits of PHOENIX’s scalability, we investigated how estimating regulatory dynamics based on a subset of only the \(n_g\) most variable genes can alter the perceived importance of individual genes to the regulatory system in question. We reasoned that a model trained on all assayed genes should reconstruct biological information better than those that are restricted to a subset of genes [6, 12, 14]. First, we performed a gene-level analysis by perturbing in silico the PHOENIX-estimated dynamical system from each value of \(n_g\) (500, 2000, 4000, 11165). This yielded “influence scores” representing how changes in initial (\(t=0\)) expression of each gene affected subsequent (\(t>0\)) predicted expression of all other genes (see the “Gene influence scores” section). As might be expected, the influence scores grew increasingly more concordant with centrality measures in the ChIP validation network, consistent with the key roles played by transcription factor genes in large GRNs (Additional File 1: Table S13).

We observed that highly variable genes with known involvement in breast cancer (such as WT1 [46], ESR1 [47], AR [48], and FOXM1 [49]) were generally influential across all values of \(n_g\) (Additional File 1: Figure S7). It is interesting to note that both FOXM1 and AR were very influential in the \(n_g = 500\) system, but their score dropped in the full genome (\(n_g = 11165\)) system. This is likely due to the way in which we constructed the smaller subsets of the whole genome―by selecting the most variable genes. One would expect that the most variable transcription factor genes falling within any subset would be highly correlated in expression with other genes falling in the same set and that the overall effect would be diluted by adding more―potentially uncorrelated―genes to the system. It is more interesting that genes missing in the smaller subsets (due to low expression variability) were identified as central to the dynamics in the full (\(n_g = 11165\)) system. Among these genes, we can find some encoding cancer-relevant transcription factors such as E2F1 [50, 51] CTCF [52], and ERG [53] and DNA methyltransferase enzymes (DNMT1 [54]).

We found that the more computationally manageable systems (\(n_g = 500, n_g = 2000\)) yielded an incomplete picture of gene-level influences since the method used in constructing these subsets hinders the mechanistic explainability of the resulting regulatory model. Certain genes exhibit relatively low variability in expression but are still central to disease-relevant genome-level dynamics; compared to methods that exclude such genes to make computation tractable [4, 6, 12], PHOENIX can correctly identify them as central because of its ability to model subtle but important genome-scale dynamics.

Finally, we performed a pathway-based functional enrichment analysis by translating these gene influence scores to pathway influence scores using permutation tests on the Reactome pathway database [55] (see the “Pathway influence scores” section). We reasoned that a more complete network, to have practical advantages over smaller and more manageable models, should be able to capture a more complete picture of biological processes that are involved in the cancer pseudotime. Not surprisingly, the dynamical systems with fewer genes missed many pathways known to be associated with breast cancer that were identified as over-represented in the genome-scale (\(n_g = 11,165\)) system (Fig. 5 and Additional File 1: Table S10). Notably, the pathways missed in the smaller networks include apoptosis regulation (a hallmark of cancer [56]) and TP53 regulation of caspases (relevant to apoptosis control in tumors [57]), while terms for the estrogen-related signaling and GLI/Hedgehog signaling (whose role in cancer is well documented [58, 59]) would have been missed or underestimated by the smaller models.

In a parallel analysis testing for functional enrichment of GO biological process terms, we again found the smaller systems to overlook important pathways that were clearly influential in the genome-scale analysis; these included a wide array of RNA metabolism processes that are increasingly recognized as being significant to breast cancer development [60] (Additional File 1: Figure S8). Finally, while the GO molecular function terms are consistently dominated by binding terms, we can notice how the larger models are capable of detecting more specific terms such as BHLH TF binding and E-box binding, which are subgroups of the more TF binding term that are known for the regulation of well known cancer-related genes such as NOTCH1 and MYC [61, 62] (Additional File 1: Figure S9).

These results clearly demonstrate the importance of scalable methods such as PHOENIX that can explainably model genome-wide dynamics. Our reduced gene sets from which we built the smaller PHOENIX models consisted of the 500, 2000, or 4000 most variable genes. These gene sets likely consist of variable genes that are correlated with each other, meaning that we are sampling only a portion of the biological processes driving the temporal changes in breast cancer; the full picture only emerges when looking at regulatory processes across the spectrum of genes that can contribute. Alternative approaches, such as concentrating on specific pathways, risk introducing self-fulfilling biases in the discovery process. Similarly, methods that use low-dimensional embedding (PCA, UMAP, etc.) to reduce the complexity of modeling dynamics risk losing valuable, biologically relevant insights. PHOENIX’s ability to scale while remaining explainable offers the best potential for the discovery of interpretable insights about the phenotypes under study.

Fig. 5
figure 5

We applied PHOENIX to a pseudotrajectory of 186 breast cancer samples (ordered along subsequent “pseudotimepoints”) consisting of \(n_g = 11,165\) genes [44]. We split up the 186 time points into contiguous intervals for training (170, \(90\%\)), validation to tune \(\lambda _{\text {prior}}\)(8, \(5\%\)), and test (8, \(5\%\)). We also repeated the analysis on smaller subsets of genes \(n_g = 500, 2000, 4000\), where we subsetted the full trajectory to only the \(n_g\) most variable genes in the pseudotrajectory. We measured predictive performance as the \(R^2\) on the test trajectory by applying the trained model to just the first test time point (see the “Model setup for training and testing” section). We evaluated explainability performance as the AUC from comparing encoded GRNs from trained models against a ChIP-seq validation network [45]. Finally, we used the trained PHOENIX models to extract permutation-based influence scores for pathways in the Reactome database [55] (see the “Gene influence scores” and “Pathway influence scores” sections) and visualized influence scores for a collection of the most central pathways. See Additional File 1: Table S10, Additional File 1: Table S11, and Additional File 1: Table S12 for detailed results

Fig. 6
figure 6

We compared the performance of PHOENIX to other methods of regulatory dynamics estimation on a pseudotrajectory of 186 breast cancer samples (ordered along subsequent “pseudotimepoints”) consisting of \(n_g = 11165\) genes [44]. The data was processed for model fitting via the steps described in the “Model setup for training and testing” section. Snapshot-based methods (Dynamo, RNA-ODE, Deepvelo) require RNA velocity at every time point as an additional input [3, 5, 12]. Given that this information was not available in the data set, we estimated RNA velocity using a method of finite differences applied to smooth splines through the expression trajectories [42] (see Additional File 2: Section 3.1). Once each model was trained, we sought to measure explainability by obtaining the predicted GRN from it (Additional File 2: Section 2, Additional File 2: Section 3.3). Then, for N ranging from 50 to 11165, we calculated the concordance between the subgraph of each \(\widehat{GRN}\) spanned by the top-N most variable genes in the data set against the corresponding ChIP-seq validation subnetwork [45]

In order to test whether the ability to remain explainable while scaling genome-wide is unique to PHOENIX, we performed comparative analyses on the full set of \(n_g = 11,165\) genes against Dynamo [3], RNA-ODE [5], DeepVelo [12], and out-of-the-box NeuralODEs. Analogous to the yeast example, RNA velocity information was unavailable, and hence estimated using finite differences on smooth splines[42] for the “two-step” snapshot based methods [3, 5, 12] (see Additional File 2: Section 3.1). In order to better understand how well a model trained at genome-level is still able to capture biologically meaningful nuances all across the gene-network, we obtained the predicted \(\widehat{GRN}\) (each with \(n_g = 11165\) nodes) from each trained model (Additional File 2: Section 2, Additional File 2: Section 3.3) but measured explainability in an incremental manner. For N ranging from 50 to 11165, we calculated the concordance between the induced subgraph of each \(\widehat{GRN}\) spanned by the N-most variable genes against the corresponding ChIP-seq validation subnetwork [45].

The results indicated that PHOENIX is on par with the existing black box approaches in terms of test set predictive accuracy (Additional File 1: Table S14) while offering interpretable, biologically meaningful results. This is reflected in the achieved AUC (Fig. 6), where only PHOENIX is able to properly reconstruct the underlying GRN. Most black box methods fulfill their goal of being highly predictive without offering much insight into the (interpretable) underlying processes. Notably, RNA-ODE recovers a GRN when considering only the top 100 most variable genes, which is, however, arguably not useful in practice. PHOENIX was the only approach that both scales to the full set of genes, being highly predictive, while being interpretable.

PHOENIX recovers key gene regulatory changes in Rituximab treated B cells

To challenge PHOENIX with a different data modality, we considered a longitudinal RNA-seq experiment of B cells followed over a course of 15 h, where cells where either treated with Rituximab or kept untreated [63]. Rituximab is used as treatment for specific leukemia, non-hodgkin lymphoma as well as rheumatoid arthritis, binding to B cells inducing cell death through apoptosis, NK cell-mediated cytotoxicity, or macrophage-mediated phagocytosis. Both treated and untreated B cells were sampled at \(t =\) 0, 1, 2, 4, 7, and 15 h, and two replicates were available for analysis.

We trained two separate PHOENIX models, one each for the treated and untreated cells, respectively. This allowed us to compare the gene regulatory changes after Rituximab treatment by examining the differences between the gene regulatory networks encoded by the two trained PHOENIX models. We split the time points in each condition into training (\(80\%\)) and validation (\(20\%\)). For the domain prior network, we use a TF-binding derived prior similar to the breast cancer study above (see the “Model setup for training and GRN extraction” section) and tuned \(\lambda _{\text {prior}}\) using the validation set. We observed that PHOENIX was able to model gene expression dynamics in both conditions well, with \(R^2\) values between 89 and 92% (Additional File 1: Table S16). We extracted the two encoded GRNs (one each from the PHOENIX model trained on the two conditions) and subsequently sought to examine the changes of the regulatory dynamics that were visible between untreated B cells and those treated with Rituximab.

Aggregating the regulatory effects of a protein (input) across all (output) genes, we then compute log-fold changes of these regulatory potentials between the two GRNs. On a closer look at the top 50 regulators in terms of changes in regulatory dynamics (Additional File 1: Table S15), we find several key regulators of apoptosis, the main mechanism of action of Rituximab in this experiment, as neither macrophages nor NK cells are present to enable NK-mediated cytotoxicity or M-mediated phagocytosis. Among others, we find PPP2R3C, a key inhibitor of B cell receptor-induced apoptosis [64], PHLPP2, which acts by dephosphorylation of AKT family members on apoptosis [65], RAB1A, which is part of the RAS signaling cascade that acts anti-apoptotic [66], and CLUH that regulates the mTORC1 signaling pathway and hence apoptosis [67]. All of these show a more than 2x log-fold change in terms of regulatory dynamics between the two conditions. Examining the two main pathways of action of Rituximab, apoptosis, and B cell receptor signaling, we further observe a generally strong regulatory difference of genes annotated for those pathways between treatment and control (see Additional File 1: Figure S11 and Additional File 1: Figure S12). Our analysis provides further evidence that PHOENIX reflects meaningful biological signals also on RNA-seq data, even when only few unevenly sampled measurements are available.


The predictive accuracy, scalability, flexibility, and biological explainability of PHOENIX can be attributed primarily to two things. First, our novel NeuralODE architecture includes the use of Hill-like activation functions for capturing the kinetic properties of molecular binding provides a massive advantage in terms of predictive power. Second, the introduction of soft constraints based on prior knowledge of putative network structure leads to a scalable and biologically explainable estimate of the underlying dynamics.

Using simulated data we have shown that PHOENIX outperforms other models for inferring regulatory dynamics (including other NeuralODE-based models), particularly in the presence of experimental noise. Also, an application to data from the yeast cell cycle elucidates PHOENIX’s flexibility in modeling arbitrary dynamics. More importantly, PHOENIX is the only NeuralODE method capable of extending its modeling to capture genome-scale regulatory processes, while remaining explainable. Using data from breast cancer patients organized in pseudotime, we not only illustrate the ability of PHOENIX to faithfully model genome-scale networks but also demonstrate the power of extending regulatory modeling to capture seemingly subtle but biologically important regulatory processes.

One of the challenges in modeling the evolution of network processes, of course, is obtaining data sets for which temporal data are available. However, we recognize that in any data set, the individual samples represent a continuum of states between health and disease and so can use pseudotemporal ordered samples. But this too has some limitations as there is as of yet no established method for ordering bulk samples-although there have been some methods for single-cell data adapted to “bulk” tissue samples [68]. This admits an interesting possibility: one could use additional information (such as RNA velocities, provided they are reliable) from which one could infer pseudotime (or real-time) trajectories. In that sense, one may argue that PHOENIX provides a general approach to infer interpretable GRNs and ODE models from outputs of other methods (such as Dynamo [3]), which are currently less transparent. This could provide the best of both worlds with a reduced dimensional approach to temporal ordering providing input for a more complete and interpretable final model.

Although PHOENIX, in its current implementation, is designed with one “layer” of regulation to model TF-gene interactions, we recognize that there are other regulatory elements or higher order regulatory effects in the cell that contribute to the control of gene expression. Such additional effects could potentially be modeled by increasing the complexity of the NeuralODE solver by introducing additional layers to the NeuralODE framework. However, this would increase the computational requirements by PHOENIX and reduce its current advantage of being a light-weight method.


Given the importance of regulatory networks and their dynamics, there has been a tremendous interest in inferring and modeling their physical and temporal behavior. The use of NeuralODEs represents an extremely promising technology for inferring such networks, but so far, attempts to implement NeuralODE-based network modeling have encountered significant problems, not the least of which has been their inability to scale to modeling genome-wide dynamics in a biologically explainable manner.

PHOENIX represents an important new methodological extension to the NeuralODE framework that is not only scaleable to the full human genome but also biologically well interpretable and able to capture explicitly both additive as well as multiplicative ways in which transcription factors cooperate in regulating gene expression. For a simplified analysis, the underlying gene regulatory network can also be extracted from a learned model and compared with experimental evidence. An optional feature of PHOENIX that contributes significantly to its explainability is that it can be guided by (structural) domain knowledge. Notably, PHOENIX also remains flexible to deviate from domain knowledge when necessary and learn novel insights consistent with the training data.


Testing on simulated data

Defining a ground truth dynamical system

We created a ground truth gene regulatory network (GRN) by sampling from S. cerevisiae (yeast) regulatory networks obtained from the SynTReN v1.2 supplementary data in simple interaction format (SIF) [69]. The SynTReN file provides a directional GRN containing 690 genes and 1094 edges with annotations (activating vs repressive) for edge types; we defined this GRN to be a ground-truth network \(G_{690}\). To obtain \(G_{350}\), we sampled a subnetwork of 350 genes and 590 edges from \(G_{690}\). We used the connectivity structure of \(G_{350}\) and \(G_{690}\), to define systems of ODEs (SIM350 and SIM690) with randomly assigned coefficients. This entire pipeline was executed using SimulatorGRN [35], a framework used extensively by the R/Bioconductor package dcanr [29]. Please see Additional File 2: Section 4.1 for further ODE formulation details.

Simulating time series gene expression data

For each \(n \in \{350, 690\}\), we used the ground truth dynamical system SIMn to generate expression vectors \(\varvec{g}(t) \in \mathbb {R}^n\), across time points t. We started by i.i.d. sampling 160 standard uniform \(\mathbb {R}^n\) vectors to act as initial \((t=0)\) conditions. We used these initial conditions to integrate SIMn and obtain 160 expression trajectories across \(t \in T = \{0,2,3,7,9\}\) using R’s desolve package: \(\{\{\varvec{g}(t)_i\}_{t\in T}\}_{i = 1}^{160}\). We used only five time points to emulate potential scarcities of time-series information in real data sets, while the range \(t=0\) to 9 generally covered the transition from initial to steady state. Lastly, we added Gaussian noise vectors \(\varvec{\varepsilon }(t, \sigma )_i \overset{i.i.d}{\sim }\ \mathcal {N}(\varvec{0},\sigma ^2\varvec{I})\) of varying \(\sigma\) to get noisy data sets: \(\left\{ \left\{\left\{\varvec{g}(t)_i + \varvec{\varepsilon }(t, \sigma )_i\right\}_{t \in T}\right\}_{i = 1}^{160}\right\} _{\sigma \in S}\). Since the average simulated expression value was \(\approx 0.5\), using \(\sigma \in S= \left\{0, \frac{1}{40}, \frac{1}{20}, \frac{1}{10}\right\}\) corresponded roughly to average noise levels of \(0\%, 5\%, 10\%, 20\%\).

Model setup for training and testing

For each simulation scenario, there were 160 simulated trajectories, out of which we used 140 (88%) for training, 10 (6%) for validation (hyperparameter tuning), and 10 (6%) for testing. We provide some details on PHOENIX implementation (such as training strategy and prior incorporation) in Additional File 2: Section 1 and include finer technicalities (including learning rates) in our GitHub repository [37]. For prior domain knowledge model, we used the simple linear model: \(\mathcal {P^*}\left( \varvec{\gamma _k}\right) = \varvec{A^{\sigma _{\%}}}\cdot \varvec{\gamma _k} - \varvec{\gamma _k}\), where we chose \(\varvec{A^{\sigma _{\%}}}\) to be noisy/corrupted versions of the adjacency matrices of ground truth networks \(G_{350}\) and \(G_{690}\) (details in Additional File 2: Section 4.2). We set activating edges in \(\varvec{A^{\sigma _{\%}}}\) to + 1 and repressive edges to − 1. “No interaction” was represented using 0. To validate explainability, we extracted GRNs from trained models and compared to ground truth \(G_{350}\) and \(G_{690}\) for the existence of edges, out-degree correlations, and induced sparsity (details in Additional File 2: Section 2).

Testing on experimental yeast cell cycle data

Data processing and normalization

GPR files were downloaded from the Gene Expression Omnibus (accession GSE4987 [38]) and consisted of two dye-swap technical replicates measured every 5 min for 120 min. Each of two replicates were separately ma-normalized using the maNorm() function in the marray library in R/Bioconductor [70]. The data were batch-corrected [71] using the ComBat() function in the sva library [72] and probe set mapping to the same gene were averaged, resulting in expression values for 5088 genes across fifty conditions. Two samples (corresponding to the 105 minute time point) were excluded for data-quality reasons, as noted in the original publication, and genes without motif information were then removed, resulting in an expression data set containing 48 samples (24 time points in each replicate) and 3551 genes.

Model setup for training and testing

Given that the data set contained two technical replicates of gene expression trajectories, we used one replicate for training and the other for testing (not seen in any way during training). Furthermore, within the replicate for training, we used a the expression values of contiguous segment of time points \(t = 45, 50, 55\) min for validation (to choose \(\lambda _{\text {prior}}\)). The two remaining intervals \(t \in [0, 40]\) and \(t \in [60, 120]\) in this replicate were used for training. This scheme resulted in train (87%) and validation (13%) sets that were disjoint and, crucially, a test set that was independent of the training data. We provide more details on PHOENIX implementation (such as training strategy and prior incorporation) in Additional File 2: Section 1 and include finer technicalities (including learning rate schedule) in our GitHub repository [37].

For prior domain knowledge model, we used the following simple linear model: \(\mathcal {P^*}\left( \varvec{\gamma _k}\right) = \varvec{A}\cdot \varvec{\gamma _k} - \varvec{\gamma _k}\). We based our choice of \(\varvec{A}\) on the regulatory network structure of a motif map [41] that has been used in other methods, such as PANDA [26]. The map is based on predicted binding sites for 204 yeast transcription factors (TFs) [73]. These data include 4360 genes with tandem promoters. Three thousand five hundred fifty-one of these genes are also covered on the yeast cell cycle gene expression array. One hundred five total TFs in this data set target the promoter of one of these 3551 genes. The motif map between these 105 TFs and 3551 target genes provides the adjacency matrix \(\varvec{A}\) of 0s and 1s, representing whether or not a prior interaction is likely between TF and gene.

We used ChIP-chip data [41] to create a network of TF-target interactions and used this as a validation network to test explainability. The targets of transcription factors in this ChIP-chip data set were filtered using the criterion \(p < 0.001\). We calculated AUC values by comparing the encoded GRN retrieved from the trained models (see Additional File 2: Section 2) to the validation network.

Testing on breast cancer pseudotime data

Data procurement and psuedotime ordering

The original data set comes from a cross-sectional breast cancer study (GEO accession GSE7390 [43]) consisting of microarray expression values for 22,000 genes from 198 breast cancer patients that we sorted along a pseudotime axis. We noted that the same data set was also used in the PROB [8] paper. PROB is a GRN inference method that infers a random-walk-based pseudotime to sort cross-sectional samples and reconstruct the GRN. For consistency and convenience in pseudotime inference, we obtained the same version of this data [44] that was already preprocessed and sorted by PROB. We normalized the expression values to be between 0 and 1. We limited our analysis to the genes that had measurable expression and appeared in our prior model and obtained a pseudotrajectory of expression values for 11165 genes across 186 patients. We also created pseudotrajectories for \(n_g = 500\), 2000, and 4000 genes by subsetting to the \(n_g\) highest variance genes.

Model setup for training and testing

We noted that the processed data set contained expression across 186 “pseudo” time points. We excised a contiguous interval of expression across 8 time points for testing (5%), and split up the remaining 178 time points into training (170, \(90\%\)) and validation for tuning \(\lambda _{\text {prior}}\) (8, \(5\%\)). To measure predictive accuracy of the trained model, we used it calculated a predicted trajectory based on just the first time point in the test set. We calculated the \(R^2\) between this prediction and the remaining 7 points of the test trajectory. Further implementation details are in Additional File 2: Section 1 and GitHub [37].

For the prior domain knowledge model, we used the simple linear model: \(\mathcal {P^*}\left( \varvec{\gamma _k}\right) = \varvec{W_0}\cdot \varvec{\gamma _k} - \varvec{\gamma _k}\). We based our choice of \(\varvec{W_0}\) on a motif map [74], similar to that used in the breast cancer analysis in OTTER [75]. The network \(\varvec{W_0}\) is derived from the human reference genome, for the breast tissue specifically. \(W_0\) is a binary matrix with \(\varvec{W_0}_{i,j} \in \{0, 1\}\) where 1 indicates a TF sequence motif in the promoter of the target gene. Sequence motif mapping was performed using the FIMO software [31] from the MEME suite [76] and the R package GenomicRanges [32].

Validation of explainability was challenging since there are only a few data sets that have ChIP-seq data for many TFs from the same cells. We used a validation network [45] of TF-target interactions based on ChIP-seq data for the MCF7 cell line (breast cancer, 62 TFs) in the ReMap2018 database [77]. We calculated AUC values by comparing the encoded GRNs retrieved from the trained models (see Additional File 2: Section 2) to the validation network.

Gene influence scores

Given \(\mathcal {M}_{n_g}\) a PHOENIX model trained on the pseudotrajectory consisting of only the \(n_g\) most variable genes (\(n_g \in \{500, 2000, 4000, 11165\}\)), we performed perturbation analyses to compute gene influence scores \(\mathcal{I}\mathcal{S}_{n_g, j}\). We randomly generated 200 initial (\(t=0\)) expression vectors via i.i.d standard uniform sampling \(\{\varvec{g(0)_k} \in \mathbb {R}^{n_g}\}_{k=1}^{200}\). Next, for each gene j in \(\mathcal {M}_{n_g}\), we created a perturbed version of these initial value vectors \(\{\varvec{g^{j}(0)_k}\}_{k=1}^{200}\), where only gene j was perturbed in each unperturbed vector of \(\{\varvec{g(0)_k}\}_{k=1}^{200}\). We then fed both sets of initial values into \(\mathcal {M}_{n_g}\) to obtain two sets of predicted trajectories \(\{\{\varvec{\widehat{g}(t)_k} \}_{t \in T} \in \mathbb {R}^{n_g}\}_{k = 1}^{200}\) and \(\{\{\varvec{\widehat{g}^j(t)_k} \}_{t \in T} \in \mathbb {R}^{n_g}\}_{k = 1}^{200}\) across a set of time points T. We calculated influence as the average absolute difference between the two sets of predictions that represented how changes in initial (\(t=0\)) expression of gene j affected subsequent (\(t>0\)) predicted expression of all other genes in the \(n_g\)-dimensional system:

$$\begin{aligned} \mathcal{I}\mathcal{S}_{n_g, j} = \frac{1}{200}\sum \limits _{k=1}^{200}\left[ \frac{1}{\left| T \right| }\sum \limits _{\begin{array}{c} t \in T \\ t\ne 0 \end{array}} \left( \frac{1}{n_g} \sum \limits _{\begin{array}{c} i =1 \\ i\ne j \end{array}}^{n_g} \left|\widehat{g_i}(t)_k - \widehat{g_i}^j(t)_k \right|\right) \right] \end{aligned}$$

Pathway influence scores

Having computed gene influence scores \(\mathcal{I}\mathcal{S}_{n_g, j}\) for each gene j in each dynamical system of dimension \(n_g\) genes, we translated these gene influence scores into pathway influence scores. We used the Reactome pathway data set, GO biological process terms, and GO molecular function terms from MSigDB [78] which map each biological pathway/process to the genes that are involved in it. For each system of size \(n_g\), we obtained the pathway (p) influence scores (\(\mathcal{P}\mathcal{S}_{n_g, p}\)) as the sum of the influence scores of all genes involved in that pathway:

$$\begin{aligned} \mathcal{P}\mathcal{S}_{n_g, p} = \sum \limits _{j \in p} \mathcal{I}\mathcal{S}_{n_g, j} \end{aligned}$$

We statistically tested whether each pathway influence score is higher than expected by chance using empirical null distributions. We randomly permuted the gene influence scores across the genes to recompute “null” values \(\mathcal{P}\mathcal{S}^0_{n_g, p}\). For each pathway, we performed \(K=1000\) permutations to obtain a null distribution \(\left\{\mathcal{P}\mathcal{S}^0_{n_g, p, k}\right\}_{k=1}^{K}\) that can be compared to \(\mathcal{P}\mathcal{S}_{n_g, p}\). We could then compute an empirical p-value as \(p = \frac{1}{K}\sum _{k = 1}^{K} \mathbb {I}_{\mathcal{P}\mathcal{S}^0_{n_g, p,k}>\mathcal{P}\mathcal{S}_{n_g, p}}\), where \(\mathbb {I}\) is the indicator function. Finally, we used the mean (\(\mu _{0(n_g, p)}\)) and variance \(\left(\sigma ^2_{0(n_g, p)}\right)\) of the null distribution \(\left\{\mathcal{P}\mathcal{S}^0_{n_g, p, k}\right\}_{k=1}^{K}\) to obtain and visualize pathway z-scores that are comparable across pathways and subset sizes (\(n_g\)):

$$\begin{aligned} z_{(n_g, p)} = \frac{\mathcal{P}\mathcal{S}_{n_g, p} - \mu _{0(n_g, p)}}{\sqrt{\sigma ^2_{0(n_g, p)}}} \end{aligned}$$

Testing on B cell RNASeq data

Data procurement and processing

RNA-seq data of B cells has been downloaded from the Gene expression omnibus (GEO accession GSE100441 [63]). Filtering out genes that show zero expression in all samples and genes that are not in the prior, we are left with \(n=14691\) genes with \(\log ( FPKM+1)\) transformed gene expression values. We use the same prior as in the breast cancer experiment described above. The data spans across seven time points (= 0 h, 1 h, 2 h, 4 h, 7 h, 15 h after treatment start) for B cells treated with Rituximab as well as parallel untreated B cells as control. Two replicates are available for each condition.

Model setup for training and GRN extraction

Given that the data set for each condition consists of 2 technical replicates of 6 time points each, we note that it contains 10 different transition pairs, where a transition pair consists of two consecutive expression vectors in the data set (\(\varvec{g}(t_i)\), \(\varvec{g}(t_{i+1})\)). We randomly split these 10 transition pairs into training (8, \(80\%\)) and validation (2, \(20\%\)), where the validation set was used to tune \(\lambda _{\text {prior}}\) as well as inform an early stopping criteria. Further details on the implementation are in Additional File 2: Section 1 and our GitHub repository [37]. For the prior domain knowledge model, we used the same motif-based prior as in our breast cancer analysis (the “Model setup for training and testing” section) \(\mathcal {P^*}\left( \varvec{\gamma _k}\right) = \varvec{W_0}\cdot \varvec{\gamma _k} - \varvec{\gamma _k}\).

Rituximab pathway analysis

To examine whether PHOENIX recovers meaningful biology in this challenging dataset, we focused on analyzing the differences between the derived GRNs of the Rituximab-treated and control group. Focusing on changes of regulators, we first aggregate the influence score \(s_i\) of each regulator i by summing the weights of outgoing edges, \(s_i=\sum _{j=1}^n D_{ji}\), a common approach in node-centric analysis in GRNs. Note that our dynamics matrix D incorporates the scaling factor but is not normalized per gene, to enable a comparison between the dynamics of two networks (here: treated and untreated). To examine the change of regulatory influence of each protein, we then compute the log-fold change between influence scores in the two conditions as

$$\begin{aligned} \delta _i = \log _2 s^t_i - \log _2 s^c_i, \end{aligned}$$

where \(s^t_i\) and \(s^c_i\) are the influence score in the GRN of the treatment respectively control group. As the primary mechanism of action of Rituximab is known—directly induced apoptosis, complement-dependent cytotoxicity, NK-mediated cytotoxicity, and macrophage-mediated phagocytosis—we focused on analyzing the molecular changes within these mechanisms, with a particular focus on the top 50 most changing regulators (see Additional File 1: Table S15). As the experiment contained only B cells, hence NK and macrophages were not available for cell destruction, we focused on apoptosis and cytotoxicity related to B cell receptor signaling. We provide pathway maps of these pathways colored by (normalized) \(\delta _i\) in Additional File 1: Figure S11 and Additional File 1: Figure S12, which were visualized using the Pathview package (version 1.42.0) [79] in R (version 4.3.2).

Availability of data and materials

• All simulated data for in silico experiments has been deposited (under the MIT License) at Zenodo (DOI: [81].

• The breast cancer dataset [44] is originally from a cross-sectional breast cancer study [43] and was sorted in pseudotime in the PROB paper [8]. We obtained this pseudotime-sorted version and have deposited it (under the MIT License) at Zenodo (DOI: [44].

• The B cell RNASeq data set [63] was downloaded from the Gene expression omnibus, via the link: It was then processed according to the “Data procurement and processing” section.

• The yeast cell cycle dataset [38] was downloaded from the Gene expression omnibus, via the link: It was then processed according to the “Data processing and normalization” section.


  1. Xing J. Reconstructing data-driven governing equations for cell phenotypic transitions: integration of data science and systems biology. Phys Biol. 2022;19(6):061001.

    Article  Google Scholar 

  2. Hackett SR, Baltz EA, Coram M, Wranik BJ, Kim G, Baker A, McIsaac RS. Learning causal networks using inducible transcription factors and transcriptome-wide time series. Mol Syst Biol. 2020;16(3):e9174.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  3. Qiu X, Zhang Y, Martin-Rufino JD, Weng C, Hosseinzadeh S, Yang D, Weissman JS. Mapping transcriptomic vector fields of single cells. Cell. 2022;185(4):690–711.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  4. Yeo GHT, Saksena SD, Gifford DK. Generative modeling of single-cell time series with PRESCIENT enables prediction of cell trajectories with interventions. Nat Commun. 2021;12(1):1–12.

    Article  Google Scholar 

  5. Liu R, Pisco AO, Braun E, Linnarsson S, Zou J. Dynamical systems model of RNA velocity improves inference of single-cell trajectory, pseudo-time and gene regulation. J Mol Biol. 2022;434(15):167606.

    Article  CAS  PubMed  Google Scholar 

  6. Erbe R, Stein-O’Brien G, Fertig EJ. Transcriptomic forecasting with neural ordinary differential equations. Patterns (New York). 2023;4(8):100793.

    Article  CAS  PubMed Central  Google Scholar 

  7. Li Q. scTour: a deep learning architecture for robust inference and accurate prediction of cellular dynamics. Genome Biol. 2023;24(1):149.

  8. Sun X, Zhang J, Nie Q. Inferring latent temporal progression and regulatory networks from cross-sectional transcriptomic data of cancer samples. PLoS Comput Biol. 2021;17(3):e1008379.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  9. Mendes P, Hoops S, Sahle S, Gauges R, Dada J, Kummer U. Computational Modeling of Biochemical Networks Using COPASI. In: Maly IV, editors. Systems Biology. Totowa: Humana Press; 2009. p. 17–59.

  10. Kraeutler MJ, Soltis AR, Saucerman JJ. Modeling cardiac B-adrenergic signaling with normalized-Hill differential equations: comparison with a biochemical model. BMC Syst Biol. 2010;4(1):1–12.

    Article  Google Scholar 

  11. Alon U. An introduction to systems biology: design principles of biological circuits. Chapman and Hall/CRC; 2006.

  12. Chen Z, King WC, Hwang A, Gerstein M, Zhang J. DeepVelo: single-cell transcriptomic deep velocity field learning with neural ordinary differential equations. Sci Adv. 2022;8(48):eabq3745.

  13. Farrell S, Mani M, Goyal S. Inferring single-cell transcriptomic dynamics with structured latent gene expression dynamics. Cell Rep Methods. 2023;3(9):100589.

  14. Aliee H, Richter T, Solonin M, Ibarra I, Theis F, Kilbertus N. Sparsity in continuous-depth neural networks. 2022. arXiv preprint arXiv:2210.14672.

  15. Monti M, Fiorentino J, Milanetti E, Gosti G, Tartaglia GG. Prediction of time series gene expression and structural analysis of gene regulatory networks using recurrent neural networks. Entropy. 2022;24(2):141.

    Article  PubMed  PubMed Central  Google Scholar 

  16. La Manno G, Soldatov R, Zeisel A, Braun E, Hochgerner H, Petukhov V, Kharchenko PV. RNA velocity of single cells. Nature. 2018;560(7719):494–8.

    Article  PubMed  PubMed Central  Google Scholar 

  17. Hu Y. Modeling the gene regulatory dynamics in neural differentiation with single cell data using a machine learning approach. McGill University (Canada) ProQuest Dissertations Publishing; 2021. p. 29274301.

  18. Mao G, Zeng R, Peng J, Zuo K, Pang Z, Liu J. Reconstructing gene regulatory networks of biological function using differential equations of multilayer perceptrons. BMC Bioinformatics. 2022;23(1):1–17.

    Article  Google Scholar 

  19. Bergen V, Soldatov RA, Kharchenko PV, Theis FJ. RNA velocity-current challenges and future perspectives. Mol Syst Biol. 2021;17(8):e10282.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  20. Bergen V, Lange M, Peidli S, Wolf FA, Theis FJ. Generalizing RNA velocity to transient cell states through dynamical modeling. Nat Biotechnol. 2020;38(12):1408–14.

    Article  CAS  PubMed  Google Scholar 

  21. Cui H, Maan H, Vladoiu MC, Zhang J, Taylor MD, Wang B. DeepVelo: deep learning extends RNA velocity to multi-lineage systems with cell-specific kinetics. Genome Biol. 2024;25(1):27.

  22. Gayoso A, Weiler P, Lotfollahi M, Klein D, Hong J, Streets AM, Theis FJ, Yosef N. Deep generative modeling of transcriptional dynamics for RNA velocity analysis in single cells. Nat Methods. 2024;21(1):50–9.

  23. Gu Y, Blaauw D, Welch JD. Bayesian inference of rna velocity from multi-lineage single-cell data. bioRxiv 2022.07.08.499381.

  24. Karniadakis GE, Kevrekidis IG, Lu L, Perdikaris P, Wang S, Yang L. Physics-informed machine learning. Nat Rev Phys. 2021;3(6):422–40.

    Article  Google Scholar 

  25. Wolpert DH, Macready WG. No free lunch theorems for optimization. IEEE Trans Evol Comput. 1997;1(1):67–82.

    Article  Google Scholar 

  26. Glass K, Huttenhower C, Quackenbush J, Yuan GC. Passing messages between biological networks to refine predicted interactions. PLoS ONE. 2013;8(5):e64832.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  27. Chen RT, Rubanova Y, Bettencourt J, Duvenaud DK. Neural ordinary differential equations. Advances in Neural Information Processing Systems (NeurIPS) 31. 2018;6571–83.

  28. Chen RTQ. torchdiffeq (Version 0.2.2) [Computer software]. 2021. from Accessed 13 June 2020.

  29. Bhuva DD, Cursons J, Smyth GK, Davis MJ. Differential co-expression-based detection of conditional relationships in transcriptional data: comparative analysis and application to breast cancer. Genome Biol. 2019;20(1):1–21.

    Article  CAS  Google Scholar 

  30. Gesztelyi R, Zsuga J, Kemeny-Beke A, Varga B, Juhasz B, Tosaki A. The Hill equation and the origin of quantitative pharmacology. Arch Hist Exact Sci. 2012;66(4):427–38.

    Article  Google Scholar 

  31. Grant CE, Bailey TL, Noble WS. FIMO: scanning for occurrences of a given motif. Bioinformatics. 2011;27(7):1017–8.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  32. Lawrence M, Huber W, Pagès H, Aboyoun P, Carlson M, Gentleman R, Carey VJ. Software for computing and annotating genomic ranges. PLoS Comput Biol. 2013;9(8):e1003118.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  33. Aliee H, Theis FJ, Kilbertus N. Beyond predictions in neural ODEs: identification and interventions. 2021. arXiv preprint arXiv:2106.12430.

  34. Cheng S, Sabes PN. Modeling sensorimotor learning with linear dynamical systems. Neural Comput. 2006;18(4):760–93.

    Article  PubMed  PubMed Central  Google Scholar 

  35. Bhuva DD. SimulatorGRN [Computer software]. 2017. from Accessed 7 July 2020.

  36. Weinreb C, Wolock S, Tusi BK, Socolovsky M, Klein AM. Fundamental limits on dynamic inference from single-cell snapshots. Proc Natl Acad Sci. 2018;115(10):E2467–76.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  37. Hossain I. GitHub repository of the PHOENIX package [Computer software]. 2022. from Last accessed 18 Mar 2024.

  38. Pramila T, Wu W, Miles S, Noble WS, Breeden LL. The Forkhead transcription factor Hcm1 regulates chromosome segregation genes and fills the S-phase gap in the transcriptional circuitry of the cell cycle. GEO Data Deposit; 2006. from Accessed 24 Feb 2022.

  39. Pramila T, Wu W, Miles S, Noble WS, Breeden LL. The Forkhead transcription factor Hcm1 regulates chromosome segregation genes and fills the S-phase gap in the transcriptional circuitry of the cell cycle. Genes Dev. 2006;20(16):2266–78.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  40. Sirovich L. A novel analysis of gene array data: yeast cell cycle. Biol Methods Protoc. 2020;5(1):bpaa018.

  41. Glass K, Huttenhower C, Quackenbush J, Yuan GC. Passing messages between biological networks to refine predicted interactions. Sourceforge. 2013. from Accessed 27 Feb 2022.

  42. Ahnert K, Abel M. Numerical differentiation of experimental data: local versus global methods. Comput Phys Commun. 2007;177(10):764–74.

    Article  CAS  Google Scholar 

  43. Desmedt C, Piette F, Loi S, Wang Y, Lallemand F, Haibe-Kains B, TRANSBIG Consortium. Strong time dependence of the 76-gene prognostic signature for node-negative breast cancer patients in the TRANSBIG multicenter independent validation series. Clin Cancer Res. 2007;13(11):3207-14.

  44. Hossain I. Breast cancer dataset used in: Biologically informed NeuralODEs for genome-wide regulatory dynamics. Zenodo. 2024. from Last accessed 29 Apr 2024.

  45. Weighill D, Guebila MB, Lopes-Ramos C, Glass K, Quackenbush J, Platig J, Burkholz R. Gene regulatory network inference as relaxed graph matching. The Network Zoo. 2021. from Accessed 18 Sept 2022.

  46. Artibani M, Sims AH, Slight J, Aitken S, Thornburn A, Muir M, Hohenstein P. WT1 expression in breast cancer disrupts the epithelial/mesenchymal balance of tumour cells and correlates with the metabolic response to docetaxel. Sci Rep. 2017;7(1):1–15.

    Article  Google Scholar 

  47. Brett JO, Spring LM, Bardia A, Wander SA. ESR1 mutation as an emerging clinical biomarker in metastatic hormone receptor-positive breast cancer. Breast Cancer Res. 2021;23(1):1–15.

    Article  Google Scholar 

  48. Kensler KH, Regan MM, Heng YJ, Baker GM, Pyle ME, Schnitt SJ, Tamimi RM. Prognostic and predictive value of androgen receptor expression in postmenopausal women with estrogen receptor-positive breast cancer: results from the Breast International Group Trial 1–98. Breast Cancer Res. 2019;21(1):1–11.

    Article  Google Scholar 

  49. Lu XF, Zeng D, Liang WQ, Chen CF, Sun SM, Lin HY. FoxM1 is a promising candidate target in the treatment of breast cancer. Oncotarget. 2018;9(1):842.

    Article  PubMed  Google Scholar 

  50. Mandigo AC, Yuan W, Xu K, Gallagher P, Pang A, Guan YF, Knudsen KE. RB/E2F1 as a master regulator of cancer cell metabolism in advanced disease. RB/E2F1 regulates cell metabolism in advanced disease. Cancer Disc. 2021;11(9):2334–53.

    Article  CAS  Google Scholar 

  51. Chen HZ, Tsai SY, Leone G. Emerging roles of E2Fs in cancer: an exit from cell cycle control. Nat Rev Cancer. 2009;9(11):785–97.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  52. Fang C, Wang Z, Han C, Safgren SL, Helmin KA, Adelman ER, Zang C. Cancer-specific CTCF binding facilitates oncogenic transcriptional dysregulation. Genome Biol. 2020;21:1–30.

    Article  Google Scholar 

  53. Adamo P, Ladomery M. The oncogene ERG: a key factor in prostate cancer. Oncogene. 2016;35:403–14.

    Article  CAS  PubMed  Google Scholar 

  54. Pathania R, Ramachandran S, Elangovan S, Padia R, Yang P, Cinghu S, Thangaraju M. DNMT1 is essential for mammary and cancer stem cell maintenance and tumorigenesis. Nat Commun. 2015;6(1):6910.

    Article  CAS  PubMed  Google Scholar 

  55. Gillespie M, Jassal B, Stephan R, Milacic M, Rothfels K, Senff-Ribeiro A, D’Eustachio P. The reactome pathway knowledgebase 2022. Nucleic Acids Res. 2022;50(D1):D687–92.

    Article  CAS  PubMed  Google Scholar 

  56. Hanahan D, Weinberg RA. Hallmarks of cancer: the next generation. Cell. 2011;144(5):646–74.

    Article  CAS  PubMed  Google Scholar 

  57. Okal A, Matissek KJ, Matissek SJ, Price R, Salama ME, Janát-Amsbury MM, Lim CS. Re-engineered p53 activates apoptosis in vivo and causes primary tumor regression in a dominant negative breast cancer xenograft model. Gene Ther. 2014;21(10):903–12.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  58. Le Romancer M, Poulard C, Cohen P, Sentis S, Renoir JM, Corbo L. Cracking the estrogen receptor’s posttranslational code in breast tumors. Endocr Rev. 2011;32(5):597–622.

    Article  PubMed  Google Scholar 

  59. Grund-Gröschke S, Stockmaier G, Aberger F. Hedgehog/GLI signaling in tumor immunity - new therapeutic opportunities and clinical implications. Cell Commun Signal. 2019;17:172.

    Article  PubMed  PubMed Central  Google Scholar 

  60. Wang X, Yang D. The regulation of RNA metabolism in hormone signaling and breast cancer. Mol Cell Endocrinol. 2021;529:111221.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  61. Gallo C, Fragliasso V, Donati B, et al. The bHLH transcription factor DEC1 promotes thyroid cancer aggressiveness by the interplay with NOTCH1. Cell Death Dis. 2018;9:871.

    Article  PubMed  PubMed Central  Google Scholar 

  62. Madden SK, de Araujo AD, Gerhardt M, et al. Taking the Myc out of cancer: toward therapeutic strategies to directly inhibit c-Myc. Mol Cancer. 2021;20:3.

    Article  PubMed  PubMed Central  Google Scholar 

  63. Mias GI, Brooks LR, Integrated transcriptomic and proteomic dynamics of rituximab treatment in primary B cells. GEO Data Deposit. 2018. from Accessed 2 Dec 2023.

  64. Xing Y, Igarashi H, Wang X, Sakaguchi N. Protein phosphatase subunit G5PR is needed for inhibition of B cell receptor-induced apoptosis. J Exp Med. 2005;202(5):707–19.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  65. Gao T, Furnari F, Newton AC. PHLPP: a phosphatase that directly dephosphorylates Akt, promotes apoptosis, and suppresses tumor growth. Mol Cell. 2005;18(1):13–24.

    Article  CAS  PubMed  Google Scholar 

  66. Downward J. Ras signalling and apoptosis. Curr Opin Genet Dev. 1998;8(1):49–54.

    Article  CAS  PubMed  Google Scholar 

  67. Pla-Martín D, Schatton D, Wiederstein JL, Marx MC, Khiati S, Krüger M, Rugarli E. CLUH granules coordinate translation of mitochondrial proteins with mTORC1 signaling and mitophagy. EMBO J. 2020;39(9):e102731.

    Article  PubMed  PubMed Central  Google Scholar 

  68. Campbell KR, Yau C. Uncovering pseudotemporal trajectories with covariates from single cell and bulk expression data. Nat Commun. 2018;9(1):2442.

    Article  PubMed  PubMed Central  Google Scholar 

  69. Van den Bulcke T, Van Leemput K, Naudts B, van Remortel P, Ma H, Verschoren A, Marchal K. SynTReN: a generator of synthetic gene expression data for design and analysis of structure learning algorithms. BMC Bioinformatics. 2006;7(1):1–12.

    Google Scholar 

  70. Yang YH, Paquet AC. Preprocessing two-color spotted arrays. In: Bioinformatics and Computational Biology Solutions Using R and Bioconductor. New York: Springer; 2005. pp. 49-69.

  71. Johnson WE, Li C, Rabinovic A. Adjusting batch effects in microarray expression data using empirical Bayes methods. Biostatistics. 2007;8(1):118–27.

    Article  PubMed  Google Scholar 

  72. Leek JT, Johnson WE, Parker HS, Jaffe AE, Storey JD. The sva package for removing batch effects and other unwanted variation in high-throughput experiments. Bioinformatics. 2012;28(6):882–3.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  73. Harbison CT, Gordon DB, Lee TI, Rinaldi NJ, Macisaac KD, Danford TW, Young RA. Transcriptional regulatory code of a eukaryotic genome. Nature. 2004;431(7004):99–104.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  74. Weighill D, Guebila MB, Lopes-Ramos C, Glass K, Quackenbush J, Platig J, Burkholz R, Gene regulatory network inference as relaxed graph matching. The Network Zoo. 2021. from Accessed 20 Sept 2022.

  75. Weighill D, Guebila MB, Lopes-Ramos C, Glass K, Quackenbush J, Platig J, Burkholz R. Gene regulatory network inference as relaxed graph matching. In: Proceedings of the AAAI Conference on Artificial Intelligence. AAAI Conference on Artificial Intelligence. 2021;35(11):10263–72.

  76. Bailey TL, Boden M, Buske FA, Frith M, Grant CE, Clementi L, Noble WS. MEME SUITE: tools for motif discovery and searching. Nucleic Acids Res. 2009;37(suppl_2):W202-8.

  77. Chèneby J, Gheorghe M, Artufel M, Mathelier A, Ballester B. ReMap 2018: an updated atlas of regulatory regions from an integrative analysis of DNA-binding ChIP-seq experiments. Nucleic Acids Res. 2018;46(D1):D267–75.

    Article  PubMed  Google Scholar 

  78. Liberzon A, Birger C, Thorvaldsdóttir H, Ghandi M, Mesirov JP, Tamayo P. The molecular signatures database hallmark gene set collection. Cell Syst. 2015;1(6):417–25.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  79. Luo W, Brouwer C. Pathview: an R/Bioconductor package for pathway-based data integration and visualization. Bioinformatics. 2013;29(14):1830–1.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  80. Karlsson D, Svanström O. Modelling dynamical systems using neural ordinary differential equations. [master’s thesis], Chalmers University of Technology; 2019.

  81. Hossain I. Source code for: Biologically informed NeuralODEs for genome-wide regulatory dynamics (Version v1). Zenodo. 2024.

    Article  Google Scholar 

Download references


The authors thank Marouen Ben Guebila, Dawn DeMeo, Kimberly Glass, Camila Lopes-Ramos, Panagiotis Mandros, Soel Micheletti, Enakshi Saha, and Katherine H. Shutta for thoughtful critiques and discussions. We also thank Daniel Karlsson and Olle Svanström for sharing with us their starter code [80] for basic NeuralODE training.

Review history

The review history is available as Additional file 3.

Peer review information

Tim Sands was the primary editor of this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.

Code availability

PHOENIX is available as an open-source implementation via Github: [37]. The source code underlying the results presented in the article is also available at Zenodo (DOI: [81]. Both resources are released under the MIT License.


IH, VF, JF, and JQ were supported by a grant from the US National Cancer Institute (R35CA220523). JQ and VF have additional funding from the National Human Genome Research Institute (R01HG011393). RB was supported by the European Research Council (ERC) under the Horizon Europe Framework Programme (HORIZON) for proposal number 101116395 SPARSE-ML.

Author information

Authors and Affiliations



IH, JQ, and RB conceived of the project. RB and IH designed the modeling framework. IH carried out the bulk of the work including data processing and coding of the methodology. VF contributed to the breast cancer pathways modeling and interpretation. JF contributed the B cell modeling and interpretation. All authors contributed to the writing and editing of the manuscript.

Corresponding authors

Correspondence to Intekhab Hossain or John Quackenbush.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary information

Additional file 1: Supplemental results. This file contains Supplementary Figures S1-S12 and Tables S1-S16.


Additional file 2: Supplemental methods. This file contains additional implementation and methodological details of PHOENIX, as well as implementation details of competing methods for the benchmarking experiments.

Additional file 3: Peer review history.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit The Creative Commons Public Domain Dedication waiver ( applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and permissions

About this article

Check for updates. Verify currency and authenticity via CrossMark

Cite this article

Hossain, I., Fanfani, V., Fischer, J. et al. Biologically informed NeuralODEs for genome-wide regulatory dynamics. Genome Biol 25, 127 (2024).

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: