Figure 1 | Genome Biology

Figure 1

From: EMu: probabilistic inference of mutational processes and their localization in the cancer genome

Performance. Results of the performance test of the EM method on simulated mutation data. Each simulated tumor belongs to one of up to three different cancer types, with five independent processes active in each type. The total number of processes per data set is thus 5 (blue), 10 (red) or 15 (yellow). Here, we show the scaling of different observables as a function of sample size (calculated over 50 replicates for each combination of M and n). (A) The number of processes present in the data is determined via the BIC. Shown are the median (line) and the smallest and largest (shaded area) number of inferred processes. (B) The correlation between the real and the inferred mutation spectra (the difference from 1 is plotted). (C) The time until completion of the inference program scales approximately linearly with M (for constant n; the fits above correspond to 0.93, 0.99 and 1.02). (B and C show the median with the 10% and 90% quantiles.) BIC: Bayesian information criterion; EM: expectation-maximization.
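
Panel A relies on BIC-based selection of the number of processes. The following is a minimal sketch of that general idea, not the EMu implementation: it uses the standard definition BIC = k ln(n) - 2 ln(L) and picks the candidate with the lowest score. The function names, the parameter counts and the log-likelihood values are hypothetical and chosen only for illustration.

```python
import math


def bic(log_likelihood, n_params, n_obs):
    """Standard BIC: k * ln(n) - 2 * ln(L). Lower is better."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood


def select_num_processes(log_likelihoods, n_params, n_obs):
    """Return the candidate number of processes with the lowest BIC.

    log_likelihoods: dict mapping candidate count -> maximized log-likelihood
    n_params:        dict mapping candidate count -> number of free parameters
    n_obs:           number of observations used in the fit
    All inputs are assumed to come from a separate fitting step (hypothetical).
    """
    scores = {
        k: bic(ll, n_params[k], n_obs) for k, ll in log_likelihoods.items()
    }
    best = min(scores, key=scores.get)
    return best, scores


# Illustrative, made-up values: the fit improves sharply from 1 to 2
# processes, but only marginally from 2 to 3.
example_ll = {1: -1000.0, 2: -900.0, 3: -895.0}
example_k = {1: 10, 2: 20, 3: 30}
best, scores = select_num_processes(example_ll, example_k, n_obs=100)
```

In this toy input the small likelihood gain from a third process does not offset the penalty for its extra parameters, so the two-process model is selected.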
