Tree inference for single-cell data

Understanding the mutational heterogeneity within tumors is a cornerstone for the development of efficient cancer therapies. Here, we present SCITE, a stochastic search algorithm to identify the evolutionary history of a tumor from noisy and incomplete mutation profiles of single cells. SCITE comprises a flexible Markov chain Monte Carlo sampling scheme that allows the user to compute the maximum-likelihood mutation history, to sample from the posterior probability distribution, and to estimate the error rates of the underlying sequencing experiments. Evaluation on real cancer data and in simulation studies shows that SCITE scales to present-day single-cell sequencing data and offers improved reconstruction accuracy compared to existing approaches. Electronic supplementary material: The online version of this article (doi:10.1186/s13059-016-0936-x) contains supplementary material, which is available to authorized users.


Supplementary Figures
(a) Estimated error rates: 0.4309 for allelic dropout and 6.04 × 10^-6 for false discovery; 45% missing data points. The mutation matrix is taken from [2]. Color coding of matrix cells: blue - heterozygous mutation called in cell; purple - homozygous mutation called in cell; white - site not mutated in cell; brown - missing data point. (b) Clear cell renal cell carcinoma: single-cell exome sequencing of 17 tumor cells at 35 mutation sites (not all of which are among the 50 detailed in the supplementary material of [3]). Estimated error rates: 0.1643 for false negatives and 2.67 × 10^-5 for false positives; 22% missing data points. Color coding of matrix cells as in (a). (c) Oestrogen-receptor-positive (ER+) breast cancer: single-nucleus exome sequencing of 47 tumor cells, 40 SNVs shared by at least two cells [4]. Estimated error rates: 9.72 ± 2.19% for allelic dropout and 1.24 × 10^-6 for false discovery; 1.4% missing data points. Color coding of matrix cells: blue - mutation present in cell; white - site not mutated in cell; brown - missing data point.

[3]. With the missing data and the small number of samples there are many co-optimal trees, but they share a linear tree structure, which suggests a monoclonal tumor progression. Blue boxes indicate mutation sequences with non-identifiable order.

Fig. S8: ML tree for the ER+ breast cancer. The ML tree with attachments for the ER+ breast cancer dataset [4]. The tree structure suggests a monoclonal progression of the tumor in its early stages, followed by the development of a complex subclone structure in more recent tumor stages. Yellow genes indicate non-synonymous mutations in known cancer genes [4]; blue boxes indicate mutation sequences with non-identifiable order.

[4]. The samples were placed back in at the optimal attachment point after fixing the mutation tree. (a) With fixed false negative rate β = 0.0972, the topology is very similar to the ML tree optimized over mutation tree and sample placement in Fig. S8. (b) When β is learned using a beta prior with mean 0.0972 and standard deviation 0.04, the MAP value of β is much higher at 0.227, resulting in a fairly different tree. There are strong similarities in the overall structure, with some rearrangements lower in the tree and some reordering of mutations higher up. Yellow genes indicate non-synonymous mutations in known cancer genes [4].

, BitPhylogeny [5], PhyloWGS [6] and AncesTree [7]. The quantity Δl is the normalized difference in log-likelihoods between the inferred and generating trees.

Supplementary Tables

Table S1: Comparison with perfect phylogeny. The average number of erroneous parents in tree reconstruction from noisy data matrices for classic perfect phylogeny reconstruction (PP) and our maximum likelihood (ML) and maximum a posteriori (MAP) tree reconstruction. Settings are n = 20, m = 20, 25, 30, …, β = 0.05, with 100 matrices generated for each value of m. The error rates are averaged only over those instances where a perfect phylogeny exists (mutation matrices with no contradicting data points). The fraction of contradiction-free matrices for each m is given in the last row. While the experiment ran up to m = 100, the table is truncated after m = 45 as no contradiction-free matrices were generated for m > 45.
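The notion of a contradiction-free matrix in Table S1 can be made concrete: a binary mutation matrix admits a perfect phylogeny exactly when no pair of columns exhibits all three of the row patterns (0,1), (1,0) and (1,1), the classic three-gamete test. A minimal sketch in Python (illustrative only, not part of SCITE; the function name is our own):

```python
import numpy as np

def contradiction_free(M):
    """Three-gamete test: a binary cells-x-mutations matrix M admits a
    perfect phylogeny iff no pair of columns shows all of the row
    patterns (0,1), (1,0) and (1,1)."""
    n_cols = M.shape[1]
    for p in range(n_cols):
        for q in range(p + 1, n_cols):
            a, b = M[:, p], M[:, q]
            has_01 = np.any((a == 0) & (b == 1))
            has_10 = np.any((a == 1) & (b == 0))
            has_11 = np.any((a == 1) & (b == 1))
            if has_01 and has_10 and has_11:
                return False
    return True

# A chain of mutations (each cell carries all of its ancestors' mutations)
# is contradiction free; one flipped entry can destroy the property.
chain = np.tril(np.ones((4, 4), dtype=int))
print(contradiction_free(chain))   # True
noisy = chain.copy()
noisy[0, 3] = 1                    # cell 0 gains mutation 3 without 1 and 2
print(contradiction_free(noisy))   # False
```

This is why, with the error rates of Table S1, classic perfect phylogeny reconstruction becomes inapplicable for larger m: a single false positive or false negative typically introduces a contradicting column pair.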

Supplementary Material
Additional moves in the MCMC of SCITE

To speed up the convergence of the chain we use two additional moves in our MCMC scheme.
Swap node labels. This move samples two of the n nodes uniformly and transposes their labels. Since q(T, θ | T′, θ′) = q(T′, θ′ | T, θ), Equation (14) of the main text simplifies; however, this move on its own is neither irreducible nor necessarily aperiodic. By including it on top of a chain built from the prune and reattach move, we inherit those properties. In fact, we swap two node labels with some fixed probability, so we do not need to combine the two moves into a single neighborhood.

Swap subtrees. For this third move, we select two of the n nodes uniformly. If the two nodes, i and k, are not in an ancestor/descendant relationship, we detach both from their parents and reattach each to the other's former parent (Fig. S16). Since the reverse move simply selects the same pair of nodes, q(T, θ | T′, θ′) = q(T′, θ′ | T, θ), and the acceptance probability in Equation (14) of the main text reduces to the ratio of tree scores.
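As an illustration, the two symmetric proposals can be sketched on a parent-map representation of the mutation tree (a sketch with our own helper names, not SCITE's implementation; `parent[v]` gives the parent of mutation node v, and the root is not a key of the map):

```python
def swap_node_labels(parent, i, k):
    """Exchange the positions of mutations i and k (symmetric proposal:
    applying the move twice restores the original tree)."""
    sigma = {i: k, k: i}
    relabel = lambda v: sigma.get(v, v)
    return {v: relabel(parent[relabel(v)]) for v in parent}

def is_descendant(parent, v, i):
    """True if node v lies strictly inside the subtree below node i."""
    while v in parent:          # walk up until we leave the map (the root)
        v = parent[v]
        if v == i:
            return True
    return False

def swap_subtrees(parent, i, k):
    """Non-nested case of the swap-subtrees move: attach each of i and k
    to the other's former parent. Selecting the same pair reverses the
    move, so the proposal is symmetric."""
    assert not is_descendant(parent, i, k) and not is_descendant(parent, k, i)
    new = dict(parent)
    new[i], new[k] = parent[k], parent[i]
    return new
```

For example, on the chain root(3) → 0 → 1 → 2, swapping labels 0 and 2 yields the chain root(3) → 2 → 1 → 0.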
In the other case, assume k is a descendant of i. First we cut the edge leading to k and move k together with its subtree, attaching it to the parent of i. Next we detach i and its remaining subtree (with k and its descendants removed); to make the move reversible, we sample the new parent of i uniformly from among k and all of its descendants. To reverse the move, we again need to sample i and k at the start, and then sample the previous parent of k uniformly from among i and its new (remaining) descendants. Denoting the number of descendants of node i by d(i), the forward move chooses among d(k) + 1 nodes while the reverse chooses among d(i) − d(k) nodes, so the ratio of the backward and forward proposal probabilities is simply

q(T, θ | T′, θ′) / q(T′, θ′ | T, θ) = (d(k) + 1) / (d(i) − d(k)),

which we use in Equation (14) of the main text along with the ratio of tree scores. This move is illustrated in Fig. S17.

Combining the three MCMC moves. We build a mixture MCMC scheme in which one of the three moves is picked at each step of the chain with a fixed probability. (In our examples we chose probability 0.1 for prune and reattach, 0.65 for swap node labels and 0.25 for swap subtrees, as this gave good performance for the linear tree from the Hou et al. data [1].) Since the prune and reattach move satisfies the required properties of an MCMC scheme, the mixture similarly allows us to sample mutation trees according to their posterior, but potentially more quickly thanks to the additional moves.
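As a concrete illustration of the nested case and of the move mixture, the proposal-probability correction and the per-step move choice could be sketched as follows (hypothetical helper names; `parent[v]` maps each node to its parent, the root being absent from the map):

```python
import math
import random

def num_descendants(parent, i):
    """d(i): number of strict descendants of node i (brute force)."""
    count = 0
    for v in parent:
        u = v
        while u in parent:          # walk up towards the root
            u = parent[u]
            if u == i:
                count += 1
                break
    return count

def log_proposal_ratio_nested(parent, i, k):
    """Nested case of the swap-subtrees move (k a descendant of i): the
    forward move picks i's new parent among k and its d(k) descendants,
    the reverse picks k's previous parent among i and its d(i) - d(k) - 1
    remaining descendants, so
        q(T|T') / q(T'|T) = (d(k) + 1) / (d(i) - d(k))."""
    d_i, d_k = num_descendants(parent, i), num_descendants(parent, k)
    return math.log(d_k + 1) - math.log(d_i - d_k)

def pick_move(probs=(0.1, 0.65, 0.25)):
    """Choose one of the three moves with fixed mixture probabilities."""
    moves = ("prune_and_reattach", "swap_node_labels", "swap_subtrees")
    return random.choices(moves, weights=probs)[0]
```

For a chain root(4) → 0 → 1 → 2 → 3 with i = 0 and k = 2, d(0) = 3 and d(2) = 1, so the correction factor is (1 + 1)/(3 − 1) = 1 and the log ratio vanishes.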
Performance gain by combining moves. To find out how much the additional moves improve performance, we used the same 400 trees with 20 nodes generated for the comparison of different methods with m = {40, 60, 80}. For each tree we found the average number of steps needed by SCITE to find a ML tree, as in the main text, as we varied the probability of each move in steps of 0.05 from 0.05 to 0.9. There is a large region with similarly good performance, indicated by the purplish basins in Fig. S18. These regions consist of high and roughly equal amounts of the prune and reattach and swap node labels moves and low levels of the third move, swap subtrees. The optimal probabilities for the simulated data are around (0.55, 0.4, 0.05), respectively, which provides a speed-up by a factor of 3 to 4 compared to choosing the prune and reattach move 90% of the time, increasing to a factor of 17-19 compared to choosing prune and reattach with 98% probability. This indicates the scale of improvement that the additional moves, especially swapping node labels, provide. The optimal probabilities also find the ML tree around 2 to 3 times faster than the choice of (0.1, 0.65, 0.25) above for such simulated data.

Effect of γ on convergence. Along with the move probabilities, we may also vary the parameter γ, which flattens or amplifies the score landscape, making moves easier or harder to accept. With the optimal move probabilities there is a basin of γ values with good performance around the optimum just below γ = 1 [Fig. S19(a)]. Performance degrades quickly for values below γ ≈ 0.5 and more slowly for values above γ ≈ 1.5.
We observe similar behavior with the move probabilities set to (0.1, 0.65, 0.25) [Fig. S19(b)], suggesting that γ = 1, as also required for the MCMC sampling, is a good choice.
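The role of γ can be seen directly in a tempered Metropolis–Hastings acceptance step; a minimal sketch under our own naming, assuming tree scores are handled in log space:

```python
import math
import random

def mh_accept(log_score_new, log_score_old, log_q_ratio=0.0, gamma=1.0):
    """Metropolis-Hastings acceptance with tempering exponent gamma.
    gamma = 1 samples the posterior; gamma < 1 flattens the score
    landscape so moves are accepted more readily; gamma > 1 sharpens it.
    log_q_ratio is log q(T|T') - log q(T'|T), zero for symmetric moves."""
    log_alpha = gamma * (log_score_new - log_score_old) + log_q_ratio
    return random.random() < math.exp(min(0.0, log_alpha))
```

With γ = 1 this is the ordinary acceptance rule required for posterior sampling; larger γ makes the chain greedier, which can help the ML search but invalidates the sampled distribution.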