Fig. 1 | Genome Biology

Fig. 1

From: SequencErr: measuring and suppressing sequencer errors in next-generation sequencing data


Measuring sequencer error rates. a, b Reference DNA method, which requires large amounts of reference DNA. This can be achieved in two ways: a starting from a small amount of DNA/cells (to minimize inter-molecule/cell genetic heterogeneity), followed by a large number of PCR cycles and sequencing; or b starting from a large amount of DNA/cells, followed by a small number of PCR cycles (to minimize PCR errors) and sequencing. In both approaches, mutations/PCR errors introduced before sequencing (red dots) can confound the sequencer error rate estimate (red triangles). c We interrogate sequencer errors by focusing on discordant bases between the forward and reverse reads of the same DNA fragment within their overlapping region. Because any mutation or PCR error already present in the template before sequencing would appear in both reads, such mismatches must have arisen in the sequencer. d Public datasets produced by HiSeq, NextSeq, and NovaSeq sequencers as of December 2019. Datasets without proper read names, with very small sizes, or with reads too short to overlap substantially are not suitable for our analysis (see the “Methods” section). HiSeq has the most suitable datasets, and we downloaded and analyzed ~50% of these. e–g Tile-level error rates across representative sequencers for e HiSeq, f NextSeq, and g NovaSeq. In each panel, a “good” sequencer (top) is shown alongside a “problematic” sequencer (bottom); sequencer identifiers are indicated on the right. h Comparison of overall error rate (oER) and sequencer error rate (with or without computational error suppression) measured on a common DNA library (generated with the PCR enzymes Kapa and Q5) that was sequenced by two sequencing providers, St. Jude Children’s Research Hospital Computational Biology Genomics Laboratory (SJ) and HudsonAlpha Institute for Biotechnology (HAIB), on two different NovaSeq sequencers. Tile arrangements are determined according to vendor documentation (see the “Methods” section). Tile-level error rates are capped at 200 per million for visualization.
***Significant two-sided Wilcoxon rank-sum test P value (P < 0.01); n.s., not significant (P > 0.01)
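The overlap idea in panel c, together with the tile-level aggregation in panels e–g, can be illustrated with a minimal Python sketch. It is not the SequencErr implementation, and it rests on several assumptions: the fragment (insert) length of each read pair is already known (for example, from alignment), reads contain no indels, `N` calls are skipped, read names follow the Casava 1.8+ Illumina layout `@instrument:run:flowcell:lane:tile:x:y`, and rates are reported as discordant positions per million compared overlap positions. The function names `revcomp`, `tile_key`, `count_overlap_discordance`, and `tile_error_rates` are hypothetical.

```python
"""Sketch: tile-level sequencer error rates from read-pair overlap discordance.

Assumptions (not from the original method): known insert size per pair,
no indels, Casava 1.8+ read names, and normalization per compared position.
"""
from collections import defaultdict

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def tile_key(read_name: str) -> tuple:
    """(instrument, flowcell, lane, tile) from an assumed Casava 1.8+ read name."""
    fields = read_name.lstrip("@").split()[0].split(":")
    return (fields[0], fields[2], fields[3], fields[4])


def count_overlap_discordance(r1: str, r2: str, insert_size: int) -> tuple:
    """Return (mismatches, compared) over the R1/R2 overlap.

    R1 starts at fragment position 0; reverse-complemented R2 ends at
    fragment position insert_size - 1. Bases seen by both reads are compared.
    """
    r2_fwd = revcomp(r2)
    start_r2 = insert_size - len(r2_fwd)  # fragment coordinate of r2_fwd[0]
    lo = max(0, start_r2)
    hi = min(len(r1), insert_size)  # ignore R1 bases reading into adapter
    mismatches = compared = 0
    for pos in range(lo, hi):
        b1, b2 = r1[pos], r2_fwd[pos - start_r2]
        if b1 == "N" or b2 == "N":
            continue  # skip no-calls
        compared += 1
        if b1 != b2:
            mismatches += 1  # at least one of the two base calls is wrong
    return mismatches, compared


def tile_error_rates(pairs) -> dict:
    """pairs: iterable of (read_name, r1, r2, insert_size).

    Returns {tile_key: discordant positions per million compared positions}.
    """
    mism = defaultdict(int)
    comp = defaultdict(int)
    for name, r1, r2, insert_size in pairs:
        key = tile_key(name)
        m, c = count_overlap_discordance(r1, r2, insert_size)
        mism[key] += m
        comp[key] += c
    return {k: 1e6 * mism[k] / comp[k] for k in comp if comp[k] > 0}
```

For example, a 12-bp fragment `ACGTACGTTAGC` read with 10-bp reads gives `r1 = "ACGTACGTTA"` and `r2 = "GCTAACGTAC"`, which share 8 overlapping positions. Changing one R1 base (`"ACGTAGGTTA"`) yields one discordance out of 8 compared positions. Whether to divide by compared positions or by the total number of base calls (twice as many) is a normalization choice; this sketch uses the former, and a display cap such as `min(rate, 200)` could be applied afterwards to mirror the figure.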
