Figure 2

Distribution of low-quality bases along the PhiX reference genome. The analysis was performed on reads derived from an Illumina PhiX library (PhiX-95nt data set). (a) Number of bases within B-tails (runs of consecutive bases with Q-score = 2 at the 3' end of a read) per position. (b) Average Q-score of bases in untrimmed reads. (c) Average Q-score of bases in B-tail-trimmed reads. (d) Observed per-base substitution error rate. Calculations for (a-d) were performed separately for the forward strand (green) and the reverse strand (red). Low quality values accumulate in certain regions even after removal of B-tails. Peaks in the observed error rate occur at positions where increased counts of low-quality bases are detected, and in most cases a peak is seen on only one strand.
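
The B-tail definition used in panels (a) and (c) can be illustrated with a minimal Python sketch. It assumes that a read's quality values are already decoded into a list of integer Q-scores in 5' to 3' order; the function names `b_tail_length` and `trim_b_tail` are hypothetical and are not taken from the original analysis.

```python
def b_tail_length(quals, tail_q=2):
    """Count the consecutive bases with Q-score == tail_q at the 3' end.

    quals: list of integer Q-scores in 5'->3' order (assumed input format).
    """
    n = 0
    for q in reversed(quals):
        if q != tail_q:
            break
        n += 1
    return n


def trim_b_tail(seq, quals, tail_q=2):
    """Return (seq, quals) with the 3' B-tail removed.

    Q = 2 bases that are not part of the terminal run are kept.
    """
    keep = len(quals) - b_tail_length(quals, tail_q)
    return seq[:keep], quals[:keep]


# Toy read: one internal Q = 2 base and a 4-base B-tail.
read_seq = "ACGTACGT"
read_quals = [30, 2, 28, 25, 2, 2, 2, 2]
trimmed_seq, trimmed_quals = trim_b_tail(read_seq, read_quals)
```

Only the terminal run is removed, so an isolated Q = 2 base inside the read remains after trimming.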