Frequencies and context of sequencing errors, and quality scores compared to observed error rates. The sugar beet sample (yellow) and the Arabidopsis sample (blue) were each sequenced together with PhiX DNA (red and green, respectively) on a HiSeq2000 sequencing instrument. PhiX DNA only (black) was sequenced on a GAIIx. (a) Sequence context of substitution errors. The frequency of neighboring bases one position upstream and downstream of an error position is displayed. Sequence triplets were summarized for all types of base substitutions at the central position (indicated by an 'e'). We counted reads spanning the triplet positions and ignored potential further substitution errors within the triplet sequence of the read. The frequency was determined by dividing the number of occurrences of a triplet containing a central substitution error by the number of occurrences of all triplets with the same flanking bases but any central base. Triplets are displayed in order of increasing average frequency in the HiSeq data. (b) Frequency of base substitution errors. For each sample, the proportion of each substitution type is indicated (ordered by increasing average frequency in the HiSeq samples). (c) Rates of insertions or deletions in homopolymer tracts, normalized by homopolymer length. Homopolymers longer than seven bases were present only in the two plant samples. Homopolymers of length 16 to 19 in the Bv-95nt data and of length 26 to 29 in the At-100nt data were each covered by fewer than 50 reads. (d) Expected versus observed error rates. Expected error rates according to quality scores (Q) were calculated for Q = 2 to Q = 40 (solid diagonal line). For each sample, the uniquely aligned bases were grouped by quality score, and the observed error rate was determined separately for each Q from the number of observed substitution errors.
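The calculations behind panels (a) and (d) can be illustrated with a minimal Python sketch. It is not the authors' pipeline. The function names, the input layout (pre-extracted tuples rather than alignments), and the toy data are all hypothetical. The expected error rate uses the standard Phred relation P = 10^(-Q/10), which the caption does not state explicitly and which is assumed here.

```python
from collections import Counter


def expected_error_rate(q):
    """Expected error probability for a Phred quality score (assumed: P = 10^(-Q/10))."""
    return 10 ** (-q / 10)


def observed_error_rate_by_q(bases):
    """Group aligned bases by quality score and compute the observed error rate.

    `bases` is a hypothetical input: an iterable of (quality, is_error) pairs,
    one per uniquely aligned base, where is_error marks a substitution error.
    """
    totals, errors = Counter(), Counter()
    for q, is_error in bases:
        totals[q] += 1
        if is_error:
            errors[q] += 1
    return {q: errors[q] / totals[q] for q in sorted(totals)}


def triplet_error_frequency(observations):
    """Frequency of a central substitution error for each flanking-base context.

    `observations` is a hypothetical input: an iterable of
    (upstream_base, downstream_base, is_error) tuples, one per read base.
    The result divides error triplets by all triplets sharing the same
    flanking bases, regardless of the central base.
    """
    totals, errors = Counter(), Counter()
    for up, down, is_error in observations:
        totals[(up, down)] += 1
        if is_error:
            errors[(up, down)] += 1
    return {f"{up}e{down}": errors[(up, down)] / totals[(up, down)]
            for (up, down) in totals}


# Toy data, invented purely for illustration (not taken from the figure).
toy_bases = ([(20, False)] * 98 + [(20, True)] * 2
             + [(30, False)] * 999 + [(30, True)] * 1)
toy_triplets = ([("G", "G", False)] * 9 + [("G", "G", True)] * 1
                + [("A", "T", False)] * 20)

observed = observed_error_rate_by_q(toy_bases)
context = triplet_error_frequency(toy_triplets)
```

Comparing `observed[q]` with `expected_error_rate(q)` for each Q gives the points that would be plotted against the diagonal in panel (d). The keys of `context` follow the caption's notation, with 'e' marking the central error position.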