Fig. 1 | Genome Biology

Fig. 1

From: Inference of single-cell phylogenies from lineage tracing data using Cassiopeia

A generalized approach to lineage tracing and lineage reconstruction.

a The workflow of a lineage tracing experiment. First, cells are engineered with lineage tracing machinery, namely Cas9 that cuts a genomic target site; the target site accrues heritable, Cas9-induced indels (“character states”). Next, the indels are read off from single cells (e.g., by scRNA-seq) and summarized in a “character matrix,” where rows represent cells, columns represent individual target sites (or “characters”), and values represent the observed indel (or “character state”). Finally, the character matrix is used to infer phylogenies by one of various methods.

b The Cassiopeia processing pipeline. The Cassiopeia software includes modules for the processing of target-site sequencing data: first, identical reads are collapsed together and similar reads are error corrected; second, these reads are locally aligned to a reference sequence and indels are called from this alignment; third, unique molecules are aggregated per cell and intra-doublets are called from this information; finally, the cell population is segmented into clones (or lineage groups) and inter-doublets are called. These clones are then passed to Cassiopeia’s reconstruction module for phylogenetic inference.

c The Cassiopeia reconstruction framework. Cassiopeia takes as input a “character matrix,” summarizing the mutations seen at heritable target sites across cells. Cassiopeia-Hybrid merges two novel algorithms: the “greedy” (Cassiopeia-Greedy) and “Steiner tree/integer linear programming” (Cassiopeia-ILP) approaches. First, the greedy phase identifies mutations that likely occurred early in the lineage and splits cells recursively into groups based on the presence or absence of these mutations. Next, when these groups reach a predefined threshold, we infer Steiner trees, finding the tree of minimum weight connecting all observed cell states across all possible evolutionary histories in a “potential graph,” using integer linear programming (ILP). Finally, these trees (corresponding to the maximum parsimony solutions for each group) are returned and merged into a complete phylogeny.
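
The greedy phase described in panel c can be illustrated with a minimal sketch. It is not Cassiopeia's API: the names `most_frequent_mutation` and `greedy_split` are hypothetical, the toy matrix is made up for illustration, and several simplifications are assumed. State 0 is taken to mean an unedited site, the most frequent mutation in a group is used as a stand-in for "likely occurred early," missing data are ignored, and groups at or below the threshold are simply returned as flat lists where the hybrid method would instead solve a Steiner tree problem by ILP.

```python
from collections import Counter

UNEDITED = 0  # assumed encoding for a target site without an indel


def most_frequent_mutation(matrix, cells):
    """Return the (character, state) pair seen in the most cells of the
    group, or None if no mutation separates the group."""
    counts = Counter()
    for cell in cells:
        for character, state in enumerate(matrix[cell]):
            if state != UNEDITED:
                counts[(character, state)] += 1
    # A mutation shared by every cell in the group cannot split it.
    informative = {m: n for m, n in counts.items() if n < len(cells)}
    if not informative:
        return None
    # Highest count first; ties broken by (character, state) for determinism.
    return min(informative, key=lambda m: (-informative[m], m))


def greedy_split(matrix, cells, threshold=1):
    """Recursively split cells by presence or absence of a mutation.

    Groups of size <= threshold are returned as flat lists; in the hybrid
    scheme these would be handed to the ILP step (not implemented here).
    """
    cells = sorted(cells)
    if len(cells) <= threshold:
        return cells
    mutation = most_frequent_mutation(matrix, cells)
    if mutation is None:
        return cells
    character, state = mutation
    with_mutation = [c for c in cells if matrix[c][character] == state]
    without_mutation = [c for c in cells if matrix[c][character] != state]
    return [
        greedy_split(matrix, with_mutation, threshold),
        greedy_split(matrix, without_mutation, threshold),
    ]


# Toy character matrix: rows are cells, columns are target sites
# (characters), values are observed indels (character states).
matrix = {
    "cell1": [1, 2, 0],
    "cell2": [1, 2, 3],
    "cell3": [1, 0, 0],
    "cell4": [4, 0, 0],
}

tree = greedy_split(matrix, matrix.keys(), threshold=2)
print(tree)  # [[['cell1', 'cell2'], ['cell3']], ['cell4']]
```

In this toy example, state 1 at the first site is shared by three of the four cells, so it defines the first split; the pair `cell1`, `cell2` then reaches the threshold and is left unresolved, which is the point at which the hybrid approach would switch to the ILP step.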
