Table 2 Data sets and their relative coverage on bias motifs

From: Characterizing and measuring bias in sequence data

Relative coverage columns: GC extremes (GC ≤ 10%, GC ≥ 75%, GC ≥ 85%) and special motifs ((AT)₁₅, G|C ≥ 80%, bad promoters).

| Sample | Data set # | Library method | Sequencing platform | Coverage (×) | GC ≤ 10% | GC ≥ 75% | GC ≥ 85% | (AT)₁₅ | G\|C ≥ 80% | Bad promoters |
|---|---|---|---|---|---|---|---|---|---|---|
| P. falciparum 3D7 | 1 | Fisher et al.ᵃ with Kapa reagents | Illumina MiSeq | 150 | 0.58 | - | - | 0.43 | - | - |
| | 2 | Ion Torrent standard | Ion Torrent PGM | 103 | 0.39 | - | - | 0.11 | - | - |
| | 3 | Pacific Biosciences standard | Pacific Biosciences RS | 104 | 0.89 | - | - | 0.85 | - | - |
| E. coli K12 MG1655 | 4 | Fisher et al.ᵃ with Kapa reagents | Illumina MiSeq | 380 | - | 0.82 | - | - | - | - |
| | 5 | Ion Torrent standard | Ion Torrent PGM | 311 | - | 0.31 | - | - | - | - |
| | 6 | Pacific Biosciences standard | Pacific Biosciences RS | 115 | - | 0.97 | - | - | - | - |
| R. sphaeroides 2.4.1 | 7 | Fisher et al.ᵃ with Kapa reagents | Illumina MiSeq | 388 | - | 0.94 | 0.60 | - | - | - |
| | 8 | Ion Torrent standard | Ion Torrent PGM | 302 | - | 0.39 | 0.10 | - | - | - |
| | 9 | Pacific Biosciences standard | Pacific Biosciences RS | 142 | - | 0.97 | 0.87 | - | - | - |
| Human NA12878 | 10 | Aird et al. with Phusion | Illumina HiSeq v2 | 28 | 0.58 | 0.27 | 0.071 | 0.38 | 0.19 | 0.027 |
| | 11 | Aird et al. with Phusion+betaine | Illumina HiSeq v2 | 48 | 0.44 | 0.44 | 0.28 | 0.26 | 0.20 | 0.14 |
| | 12 | Aird et al. with AccuPrime | Illumina HiSeq v2 | 75 | 0.42 | 0.42 | 0.23 | 0.23 | 0.38 | 0.16 |
| | 13 | Fisher et al.ᵃ | Illumina HiSeq v3 | 70 | 0.29 | 1.1 | 0.56 | 0.23 | 0.44 | 0.39 |
| | 14 | Fisher et al.ᵃ with Kapa reagents | Illumina HiSeq v3 | 120 | 0.41 | 0.88 | 0.48 | 0.25 | 0.65 | 0.36 |
| | 14' | Fisher et al.ᵃ with Kapa reagents | Illumina HiSeq v3 | 0.5 | 0.41 ± 0.0032 | 0.88 ± 0.0047 | 0.48 ± 0.0067 | 0.25 ± 0.0042 | 0.65 ± 0.012 | 0.37 ± 0.022 |
| | 15 | Ion Torrent standard | Ion Torrent PGM | 1.1 | 0.27 | 0.36 | 0.068 | 0.19 | 0.26 | 0.046 |
| | 16 | Complete Genomics standard | Complete Genomics | 79 | 0.24 | 0.53 | 0.18 | 0.28 | 0.61 | 0.092 |

ᵃ Low-input variation of Fisher et al. [31] (see Materials and methods). Data sets from samples, library construction methods and sequencing platforms are shown, along with their total coverage of the genome and their relative coverage for each of five bias motifs and a set of 'bad promoters' (see text). Entries are marked '-' if the sample's genome had no instances of the given motif. Data set 14' is the summary of ten random subsamplings from data set 14, with coverage reduced to 0.5×; we show the mean and standard deviation of the relative coverage measurements from it (see text).
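
The following is a minimal sketch of how a relative coverage value and the subsampling summary reported for data set 14' might be computed. It rests on assumptions that are not stated in this table: relative coverage is taken here to be the mean depth over motif positions divided by the mean depth over the whole genome, and subsampling is approximated by thinning per-position depth rather than by drawing reads. The function names (`relative_coverage`, `subsample_depth`, `summarize_subsamples`) and the toy depth values are hypothetical.

```python
import random
import statistics


def relative_coverage(depth, motif_positions):
    """Assumed definition: mean depth over motif positions / mean depth over all positions."""
    genome_mean = statistics.fmean(depth)
    motif_mean = statistics.fmean([depth[i] for i in motif_positions])
    return motif_mean / genome_mean


def subsample_depth(depth, fraction, rng):
    """Simplified subsampling: keep each covering base independently with probability `fraction`."""
    return [sum(rng.random() < fraction for _ in range(d)) for d in depth]


def summarize_subsamples(depth, motif_positions, fraction, n=10, seed=0):
    """Mean and sample standard deviation of relative coverage over n random subsamplings."""
    rng = random.Random(seed)
    values = [
        relative_coverage(subsample_depth(depth, fraction, rng), motif_positions)
        for _ in range(n)
    ]
    return statistics.fmean(values), statistics.stdev(values)


# Toy per-base depths (hypothetical); positions 2 and 3 stand in for a bias motif.
toy_depth = [10, 10, 5, 5]
toy_motif = [2, 3]
print(relative_coverage(toy_depth, toy_motif))
```

A value below 1 indicates that the motif is under-covered relative to the genome as a whole, which is how the entries in the table can be read. A real analysis would subsample reads and recompute depth from the alignments, and would need to guard against a subsample with zero mean depth.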