
Table 2 Data sets and their relative coverage on bias motifs

From: Characterizing and measuring bias in sequence data

     

The six rightmost columns report relative coverage: GC extremes (GC ≤ 10%, GC ≥ 75%, GC ≥ 85%) followed by special motifs ((AT)15, G|C ≥ 80%, bad promoters).

| Sample | # | Library method | Sequencing platform | Coverage (×) | GC ≤ 10% | GC ≥ 75% | GC ≥ 85% | (AT)15 | G\|C ≥ 80% | Bad promoters |
|---|---|---|---|---|---|---|---|---|---|---|
| P. falciparum 3D7 | 1 | Fisher et al.ᵃ with Kapa reagents | Illumina MiSeq | 150 | 0.58 | - | - | 0.43 | - | - |
| | 2 | Ion Torrent standard | Ion Torrent PGM | 103 | 0.39 | - | - | 0.11 | - | - |
| | 3 | Pacific Biosciences standard | Pacific Biosciences RS | 104 | 0.89 | - | - | 0.85 | - | - |
| E. coli K12 MG1655 | 4 | Fisher et al.ᵃ with Kapa reagents | Illumina MiSeq | 380 | - | 0.82 | - | - | - | - |
| | 5 | Ion Torrent standard | Ion Torrent PGM | 311 | - | 0.31 | - | - | - | - |
| | 6 | Pacific Biosciences standard | Pacific Biosciences RS | 115 | - | 0.97 | - | - | - | - |
| R. sphaeroides 2.4.1 | 7 | Fisher et al.ᵃ with Kapa reagents | Illumina MiSeq | 388 | - | 0.94 | 0.60 | - | - | - |
| | 8 | Ion Torrent standard | Ion Torrent PGM | 302 | - | 0.39 | 0.10 | - | - | - |
| | 9 | Pacific Biosciences standard | Pacific Biosciences RS | 142 | - | 0.97 | 0.87 | - | - | - |
| Human NA12878 | 10 | Aird et al. with Phusion | Illumina HiSeq v2 | 28 | 0.58 | 0.27 | 0.071 | 0.38 | 0.19 | 0.027 |
| | 11 | Aird et al. with Phusion+betaine | Illumina HiSeq v2 | 48 | 0.44 | 0.44 | 0.28 | 0.26 | 0.20 | 0.14 |
| | 12 | Aird et al. with AccuPrime | Illumina HiSeq v2 | 75 | 0.42 | 0.42 | 0.23 | 0.23 | 0.38 | 0.16 |
| | 13 | Fisher et al.ᵃ | Illumina HiSeq v3 | 70 | 0.29 | 1.1 | 0.56 | 0.23 | 0.44 | 0.39 |
| | 14 | Fisher et al.ᵃ with Kapa reagents | Illumina HiSeq v3 | 120 | 0.41 | 0.88 | 0.48 | 0.25 | 0.65 | 0.36 |
| | 14' | Fisher et al.ᵃ with Kapa reagents | Illumina HiSeq v3 | 0.5 | 0.41 ± 0.0032 | 0.88 ± 0.0047 | 0.48 ± 0.0067 | 0.25 ± 0.0042 | 0.65 ± 0.012 | 0.37 ± 0.022 |
| | 15 | Ion Torrent standard | Ion Torrent PGM | 1.1 | 0.27 | 0.36 | 0.068 | 0.19 | 0.26 | 0.046 |
| | 16 | Complete Genomics standard | Complete Genomics | 79 | 0.24 | 0.53 | 0.18 | 0.28 | 0.61 | 0.092 |

ᵃ Low-input variation of Fisher et al. [31] (see Materials and methods). The table lists each data set's sample, library construction method and sequencing platform, along with its total coverage of the genome and its relative coverage for each of five bias motifs and a set of 'bad promoters' (see text). Entries are marked '-' if the sample's genome had no instances of the given motif. Data set 14' summarizes ten random subsamplings of data set 14, each with coverage reduced to 0.5×; for it we show the mean and standard deviation of the relative coverage measurements (see text).
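The subsampling summary reported for data set 14' can be illustrated with a minimal sketch. It assumes that relative coverage is the mean read depth over motif positions divided by the mean read depth over the whole genome, and it approximates read subsampling by thinning each base's depth independently. Both are simplifying assumptions, and the function names (`relative_coverage`, `subsample_depth`, `summarize_subsamples`) and the toy depth profile are hypothetical, not taken from the paper.

```python
import random
import statistics


def relative_coverage(depth, motif_positions):
    """Mean depth over motif positions divided by mean depth over all positions."""
    genome_mean = sum(depth) / len(depth)
    if genome_mean == 0:
        raise ValueError("genome-wide mean depth is zero")
    motif_mean = sum(depth[i] for i in motif_positions) / len(motif_positions)
    return motif_mean / genome_mean


def subsample_depth(depth, fraction, rng):
    """Thin each base's depth, keeping every covering read with probability `fraction`.

    Simplification: bases are thinned independently, whereas real subsampling
    keeps or drops whole reads.
    """
    return [sum(1 for _ in range(d) if rng.random() < fraction) for d in depth]


def summarize_subsamples(depth, motif_positions, fraction, n_draws=10, seed=0):
    """Return (mean, sample standard deviation) of relative coverage over random draws."""
    rng = random.Random(seed)
    values = [
        relative_coverage(subsample_depth(depth, fraction, rng), motif_positions)
        for _ in range(n_draws)
    ]
    return statistics.mean(values), statistics.stdev(values)


# Toy genome: 900 well-covered bases and 100 under-covered motif bases.
toy_depth = [100] * 900 + [40] * 100
toy_motif = range(900, 1000)

full = relative_coverage(toy_depth, toy_motif)  # 40 / 94, about 0.43
mean_rc, sd_rc = summarize_subsamples(toy_depth, toy_motif, fraction=0.1)
print(f"full data: {full:.2f}; subsampled: {mean_rc:.2f} ± {sd_rc:.3f}")
```

In this toy setting the subsampled mean should stay close to the full-data value while the standard deviation indicates how much a single low-coverage draw can vary, which is the kind of comparison rows 14 and 14' support.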