Fig. 3 | Genome Biology

Fig. 3

From: satmut_utils: a simulation and variant calling package for multiplexed assays of variant effect

Read preprocessing for error correction. For A and B, solid colored circles represent SNPs, either true positives or false positives (errors).

A Primer base quality masking schematic. Base qualities for read segments determined to originate from synthetic primer sequences are set to 0. Black lines indicate the sequenced fragment. Solid gray lines with ticks represent primers and their directionality. Gray dotted lines represent reads off the input fragment. Readthrough coverage refers to coverage from adjacent PCR tiles, which is required to call variants that overlap primers.

B Consensus deduplication schematic. The UMI-tools directional adjacency method [75] is used to group paired-end reads from a common unique fragment, defined by UMI and read 1 position (R1 pos.). A custom consensus deduplication algorithm generates the consensus base among duplicates at each aligned fragment position for each read.

C Primer base quality masking improves accuracy of variants underlying primers. Simulated datasets (Fig. 2, N = 4) were analyzed with and without primer base quality (BQ) masking, and true positive variants that overlap primers are plotted alongside variants that do not overlap a primer.

D Consensus deduplication maintains coverage uniformity. A UMI-containing, RACE-like negative control (NC) library was generated. Waterfall plots of cumulative fragment coverage for consensus-deduplicated reads and non-deduplicated reads indicate uniform collapse of PCR duplicates. The x-axis is on a log10 scale, with a range between 4.6 and 5.6.

E Consensus deduplication reduces false positives. The effect of consensus deduplication is shown for the RACE-like NC library for each variant type.

UMI = unique molecular index; SNP = single-nucleotide polymorphism; MNP = multiple-nucleotide polymorphism; NC = negative control
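The two preprocessing ideas in panels A and B can be illustrated with a minimal Python sketch. This is not the satmut_utils implementation: the function names, the tuple-based read representation, and the rule "a read is primer-derived if it starts exactly at a primer start" are all assumptions made for illustration. In particular, reads are grouped here by exact (UMI, R1 position) match, which is a simplification standing in for the UMI-tools directional adjacency grouping named in the caption, and the consensus is a plain per-position majority vote over ungapped sequences.

```python
from collections import Counter, defaultdict


def mask_primer_quals(quals, read_start, primers):
    """Return a copy of quals with primer-derived bases set to 0.

    quals: list of Phred scores, one per aligned base (no indels assumed).
    read_start: 0-based reference coordinate of the read's first base.
    primers: list of (start, end) half-open reference intervals.

    Assumption: a read whose start coincides with a primer start was
    primed by that primer, so its leading bases are synthetic. Reads that
    start elsewhere (readthrough from an adjacent tile) are left unmasked.
    """
    masked = list(quals)
    for p_start, p_end in primers:
        if read_start == p_start:
            n = min(p_end - p_start, len(masked))
            masked[:n] = [0] * n
    return masked


def consensus_dedup(reads):
    """Collapse duplicates into one consensus sequence per group.

    reads: iterable of (umi, r1_pos, seq) tuples.
    Grouping is by exact (umi, r1_pos); this is a simplification, not the
    directional adjacency method. The consensus base at each position is
    the most common base among the duplicates in the group.
    """
    groups = defaultdict(list)
    for umi, r1_pos, seq in reads:
        groups[(umi, r1_pos)].append(seq)

    consensus = {}
    for key, seqs in groups.items():
        length = min(len(s) for s in seqs)  # compare only shared positions
        consensus[key] = "".join(
            Counter(s[i] for s in seqs).most_common(1)[0][0]
            for i in range(length)
        )
    return consensus


# Hypothetical example data
primers = [(100, 103)]
primed_read = mask_primer_quals([30] * 6, 100, primers)
readthrough_read = mask_primer_quals([30] * 6, 101, primers)

duplicates = [
    ("AAGT", 10, "ACGT"),
    ("AAGT", 10, "ACGT"),
    ("AAGT", 10, "ACTT"),  # one duplicate carries a likely error
    ("CCGA", 10, "ACGA"),  # different UMI, so a separate fragment
]
collapsed = consensus_dedup(duplicates)
```

In this toy data, the read that begins at the primer start has its first three qualities zeroed while the readthrough read is untouched, and the single discordant base in the three-read UMI group is outvoted by the other two duplicates.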
