Fig. 3 | Genome Biology

Fig. 3

From: A statistical framework for analyzing deep mutational scanning data

Weighted least squares regression reduces standard error and improves replicate correlation. (a) The number of reads (shaded blue bars) and the distribution of variant regression weights (boxplots; solid green line is the median, dotted green line is the mean, box spans the first to third quartile, whiskers denote the data range) are shown for each time point in a single BRCA1 E3 ubiquitin ligase selection. Time points with fewer reads per variant are downweighted in the regression. The weights for later time points are lower on average because most variants decrease in frequency during the course of the selection. (b) A density plot of standard errors for all variants in the selection shown in (a), calculated using weighted least squares regression (blue line) or ordinary least squares regression (green line). The weighted least squares regression method returns lower standard errors from the same underlying data by minimizing the impact of sampling error in time points with low read counts. (c) The mean standard error of variants after randomly downsampling reads in a single time point in one of the E4B E3 ubiquitin ligase selections. Mean standard errors for all variants at each read downsampling percentage were calculated using either weighted least squares regression (blue) or ordinary least squares regression (green). Error bars indicate the 95% confidence interval of five random downsampling trials at each percentage. (d) Read counts per time point in the selection described in (c). The lines on the bar for time point 2 correspond to the levels of downsampling on the x-axis of (c). (e, f) Variant scores in two replicate selections from the BRCA1 E3 ubiquitin ligase dataset. Replicate agreement for scores calculated using the weighted least squares regression model (e) is higher than agreement for scores calculated using ordinary least squares regression (f). The dashed line shows the line of best fit for the replicate scores in each plot. Hex color indicates point density.
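
The comparison in panels (a) and (b) can be illustrated with a minimal sketch, which is not the authors' implementation. It fits a line to one variant's log ratios across time points, once with equal weights (ordinary least squares) and once with per-time-point weights (weighted least squares). The caption does not state how the weights are computed; the inverse-variance formula in `log_ratio_weights`, the 0.5 pseudocount, the function names, and all read counts below are assumptions made for illustration only.

```python
import numpy as np


def log_ratio_weights(variant_counts, wildtype_counts):
    """Inverse of an approximate log-ratio variance (assumed formula).

    Assumption: the variance of log(c_v / c_wt) is approximated as
    1/(c_v + 0.5) + 1/(c_wt + 0.5), so low read counts give low weights.
    """
    c_v = np.asarray(variant_counts, dtype=float)
    c_wt = np.asarray(wildtype_counts, dtype=float)
    variance = 1.0 / (c_v + 0.5) + 1.0 / (c_wt + 0.5)
    return 1.0 / variance


def fit_line(x, y, weights=None):
    """Fit y = a + b*x; return (slope, standard error of the slope).

    With weights=None every point gets weight 1 (ordinary least squares).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    w = np.ones_like(x) if weights is None else np.asarray(weights, dtype=float)
    X = np.column_stack([np.ones_like(x), x])      # design matrix [1, x]
    XtWX = X.T @ (w[:, None] * X)                  # weighted normal equations
    beta = np.linalg.solve(XtWX, X.T @ (w * y))
    resid = y - X @ beta
    dof = len(x) - 2                               # two fitted parameters
    sigma2 = np.sum(w * resid**2) / dof            # weighted residual variance
    cov = sigma2 * np.linalg.inv(XtWX)
    return beta[1], np.sqrt(cov[1, 1])


# Hypothetical counts for one variant and wild type; time point 3 is shallow.
timepoints = [0, 1, 2, 3, 4]
variant_counts = np.array([1200, 640, 310, 12, 70])
wildtype_counts = np.array([5000, 5200, 4900, 400, 5100])

log_ratios = np.log((variant_counts + 0.5) / (wildtype_counts + 0.5))
weights = log_ratio_weights(variant_counts, wildtype_counts)

ols_slope, ols_se = fit_line(timepoints, log_ratios)
wls_slope, wls_se = fit_line(timepoints, log_ratios, weights)
print("OLS slope and SE:", ols_slope, ols_se)
print("WLS slope and SE:", wls_slope, wls_se)
```

In this sketch the slope plays the role of the variant score. Comparing the two standard errors for a single hypothetical variant does not reproduce panel (b), which summarizes all variants in a real selection.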
