Fig. 5 | Genome Biology

Fig. 5

From: satmut_utils: a simulation and variant calling package for multiplexed assays of variant effect

satmut_utils analysis of a CBS variant library by orthogonal library preparation methods. A Experimental strategy. A human CBS coding variant library was stably expressed in a landing pad cell line [31]. gDNA and total RNA were sequenced by one of two targeted library preparation methods 24 h after induction with doxycycline (Dox). B CBS domains and sequencing coverage. Coverage for two tiles by the amplicon method is contrasted with full coverage in the RACE-like method. The maximum coverage depth is shown on the left of the track. C Filtering of variant calls. Random forest models were trained for each method using negative control libraries and the “sim” workflow. Plotted are the mean number of variant calls ± the standard deviation (N = 3). The total possible calls are as follows: amplicon (531 SNPs, 2145 MNPs); RACE-like (4105 SNPs, 16,863 MNPs). D Differential abundance for mutation types. The difference in the median log frequency between cDNA and gDNA is shown for variants observed in all gDNA and cDNA replicates (N = 6). Outliers are greater than 1.5 × the interquartile range. Asterisks indicate significant differences by one-sided Wilcoxon rank-sum tests (p < 0.05). E Variant call overlap between methods. Overlap in variant calls is shown for CBS tiles 2 and 4. Total calls are the theoretical number of variant calls. Significance of overlap was computed by a hypergeometric test. F Biological replicate correlation for amplicon libraries. Reproducibility between log10 frequencies was determined by Pearson’s correlation coefficient after filtering (“Methods”). G Similarity of variant frequency estimates between methods. The difference in median log10 frequency between methods is shown for cDNA libraries with filtering and correlation as in F. M = million.
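The overlap significance described for panel E can be illustrated with a minimal sketch of an upper-tail hypergeometric test. This is not the authors' code: the function name `overlap_pvalue`, its parameters, and the per-method call counts in the usage example are hypothetical. Only the total of 2676 possible calls (531 SNPs + 2145 MNPs, the amplicon total stated for panel C) comes from the caption, and using it as the universe for tiles 2 and 4 is an assumption.

```python
from math import comb


def overlap_pvalue(total, calls_a, calls_b, overlap):
    """Upper-tail hypergeometric probability P(X >= overlap).

    total   : number of theoretically possible variant calls (the universe)
    calls_a : number of variants called by method A
    calls_b : number of variants called by method B
    overlap : number of variants called by both methods
    """
    if not (0 <= calls_a <= total and 0 <= calls_b <= total):
        raise ValueError("call counts must lie between 0 and total")
    max_overlap = min(calls_a, calls_b)
    if overlap > max_overlap:
        raise ValueError("overlap cannot exceed either call set")
    start = max(overlap, 0)
    # Number of ways to draw calls_b variants from the universe.
    denom = comb(total, calls_b)
    # Ways to draw calls_b variants such that exactly i fall in method A's set,
    # summed over every i at least as large as the observed overlap.
    numer = sum(
        comb(calls_a, i) * comb(total - calls_a, calls_b - i)
        for i in range(start, max_overlap + 1)
    )
    return numer / denom


if __name__ == "__main__":
    # Hypothetical call counts, for illustration only.
    p = overlap_pvalue(total=2676, calls_a=900, calls_b=850, overlap=600)
    print(f"P(overlap >= 600) = {p:.3g}")
```

Exact integer arithmetic with `math.comb` avoids a dependency on a statistics library; for routine use, a library survival function for the hypergeometric distribution would serve the same purpose.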