RNA velocity and protein acceleration from single-cell multiomics experiments

The simultaneous quantification of protein and RNA makes possible the inference of past, present and future cell states from single experimental snapshots. To enable such temporal analysis from multimodal single-cell experiments, we introduce an extension of the RNA velocity method that leverages estimates of unprocessed transcript and protein abundances to extrapolate cell states. We apply the model to four datasets and demonstrate consistency among landscapes and phase portraits.

3 infer, along with the present and future state of cells, the past state of each cell in four peripheral blood mononuclear cell (PBMC) datasets (Fig. 1a).
The net accumulation or depletion of unspliced mRNA, spliced mRNA, and protein, scales with distance from an estimated expression equilibrium (Fig. 1b, Supplementary Note). Briefly, if a gene's current (t0) unspliced mRNA level is high relative to equilibrium, we infer that the gene is currently upregulated and future (t1) spliced mRNA level will be high. Analogously, if the current protein level is high, we infer that the gene is currently downregulated and past (t-1) spliced mRNA level was high. Unlike methods that require time-series measurements 9-11 , our method estimates protein translation dynamics from a single time-point.
The approximately linear gene-specific phase plots (Supplementary Figs. 1-4) are qualitatively consistent with a first-order model of protein production, although we do observe some deviations by cell type. A subset of high-abundance gene/protein pairs were used to estimate the gene-specific protein accumulation rates (Supplementary Table 1  suggest that alignment between the two is strongly associated with cell type. The combination of RNA and protein velocities reveals the curvature of the cell state landscape. With respect to reference frame of unspliced transcripts, the protein inferences correspond to second-order protein acceleration of unspliced transcript counts. We visualize cell movement in the spliced mRNA using a Bezier curve calculated from three points corresponding to past, present, and future. The protein acceleration landscape for the CITE-seq data shows a diversity of dynamics (Fig. 2a). B cells and a subset of T lymphocytes exhibit strong protein acceleration. This behavior may reflect recent findings 12 that describe mRNA transcript "pile-up" due to heavily suppressed translation in naïve CD4+ T cells. The unidirectional monocyte velocity suggests response or plasticity 13 . However, due to the imperfect separation of cell types in the embedding, we caution against over-interpretation of aggregated velocities.

REAP-seq results show similar T lymphocyte circulation and partitioning into static and mobile
populations (Supplementary Fig. 21), albeit with noisier data than CITE-seq. ECCITE-seq yields sparse velocity landscapes (Supplementary Figs. 22-23), which result from the very shallow sequencing of spliced transcripts: we confirmed that ECCITE-seq captures 1-2 orders of magnitude fewer RNA molecules per cell than CITE-seq or REAP-seq, which is consistent with Fig. 1b of Mimitou et al. 5 ( Supplementary Fig. 24). In addition to using genes with linear behavior to infer velocity, we qualitatively confirmed consistency between datasets for the gene CD4, which does not follow linear behavior (Fig. 2b). Our qualitative protein acceleration framework does not attempt to account for regulatory differences between cell types. Future work may involve better measurements that enable inference of parameters for a more gene-specific model of regulation that takes into account specific biochemical modulators.
Current protein quantification protocols are adapted for histological markers on the cell surface; technology that can quantify dissolved protein could aid in more extensive studies of cell state and kinetics.
9 generated by applying a Gaussian kernel to the cell-specific velocities of j nearest neighbor cells (j = 100 for CITEseq and REAP-seq, 500 for ECCITE-seq). The aggregated future state corresponding to each grid point was calculated by adding this RNA forward velocity vector to the grid point.
To calculate protein velocity, pairs of proteins and source spliced RNA consistent with the linear model hypothesis were manually selected to stand in for spliced and unspliced RNA. These were identically projected onto the same embedding (m = 499). Forward grid arrows were generated by applying a Gaussian kernel to the cell-specific velocities of l nearest neighbor cells (l = 200). The aggregated past state corresponding to each grid point was calculated by subtracting this protein forward velocity vector from the grid point. Curved arrows corresponding to the entire trajectory were produced by fitting a second-order Bézier curve to the sequence of past, present, and future states of each grid point using the bezier 0.9.0 Python package.
Scripts to reproduce the results of this paper are available at https://github.com/pachterlab/GSP_2019.