
Table 2 Identifying low-quality reads and their contribution to the error rate

From: Accuracy and quality of massively parallel DNA pyrosequencing

Data selection | Percent of reads | Error rate
All reads | 100.0% | 0.49%
Reads with no Ns | 94.4% | 0.24%
Reads with one or more Ns | 5.6% | 4.7%
Reads with length ≥81 and ≤108 | 98.8% | 0.33%
Reads with length <81 or >108 | 1.2% | 18.9%
Reads with no Ns and length ≥81 and ≤108 | 93.3% | 0.20%
Reads with no proximal errors | 97.0% | 0.45%
Reads with fewer than three proximal errors | >99.99% | 0.48%
Reads with more than three proximal errors | <0.01% | 12.2%
Reads with no Ns and length ≥81 and ≤108 and no proximal errors | 90.6% | 0.16%
  1. Removing reads with Ns is the most effective means we found of removing low-quality data and lowering the error rate. Read lengths that are longer or shorter than expected, and so fall outside the peak of common read lengths, also correlate strongly with incorrect reads.
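
The two strongest filters in the table (no Ns, and length between 81 and 108) can be sketched in a few lines of Python. This is a minimal illustration, not the authors' pipeline: the function names `passes_filter` and `filter_reads` are hypothetical, reads are assumed to be plain base-call strings, and the proximal-error criterion is left out because it needs primer information that is not part of the read string itself.

```python
# Length window taken from Table 2 (reads with length >=81 and <=108).
MIN_LEN = 81
MAX_LEN = 108


def passes_filter(read, min_len=MIN_LEN, max_len=MAX_LEN):
    """Return True if the read has no ambiguous base (N) and its
    length falls inside the inclusive window [min_len, max_len].

    Hypothetical helper for illustration; `read` is assumed to be a
    plain string of base calls such as "ACGT...".
    """
    has_ambiguous_base = "N" in read.upper()
    in_length_window = min_len <= len(read) <= max_len
    return (not has_ambiguous_base) and in_length_window


def filter_reads(reads):
    """Keep only the reads that pass both criteria, preserving order."""
    return [read for read in reads if passes_filter(read)]


if __name__ == "__main__":
    example_reads = [
        "A" * 90,              # kept: no Ns, length inside the window
        "A" * 80,              # dropped: too short
        "A" * 109,             # dropped: too long
        "A" * 50 + "N" + "A" * 40,  # dropped: contains an N
    ]
    kept = filter_reads(example_reads)
    print(f"kept {len(kept)} of {len(example_reads)} reads")
```

In the table, applying both criteria together retains 93.3% of reads and lowers the error rate from 0.49% to 0.20%. The window bounds are specific to this data set and would need to be re-derived from the read-length distribution of any other run.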