Table 1 Summary of simulated datasets. For each dataset, we list the library name (Library), sequence length in nucleotides (L), number of unique library sequences (\(M'\)), epistasis hyperparameter used for fitness simulation (T), read type (short, long, or hybrid), percentage of the sequence of interest covered by individual reads (Cover), and number of pre-selection and post-selection reads (\(N^\text{pre}\) and \(N^\text{post}\)), which were always equal. We simulate \(4.6 \times 10^7\) short reads to match the experimental data from Zhu et al. [20], and up to \(4.6 \times 10^5\) long reads to be within the current throughput of PacBio’s technologies [34, 35]. Each dataset is described in more detail in the Methods.

From: MBE: model-based enrichment estimation and prediction for differential sequencing data

| Library | L (nt) | \(M'\) | T | Read type | Cover (%) | \(N^\text{pre}=N^\text{post}\) |
| --- | --- | --- | --- | --- | --- | --- |
| 21-mer insertion | 21 | \(8.5 \times 10^6\) | 140 | Short | 100 | \(4.6 \times 10^7\) |
| 150-mer insertion | 150 | \(8.5 \times 10^6\) | 1000 | Short | 100 | \(4.6 \times 10^7\) |
| 300-mer insertion | 300 | \(8.5 \times 10^6\) | 2000 | Short | 100 | \(4.6 \times 10^7\) |
| avGFP mutagenesis | 714 | \(2.5 \times 10^7\) | 4760 | Long | 100 | \(4.6 \times 10^5\) |
| avGFP mutagenesis | 714 | \(2.5 \times 10^7\) | 4760 | Short | 42 | \(4.6 \times 10^7\) |
| AAV recombination | 2253 | \(2.6 \times 10^{7}\) | 15,020 | Long | 100 | \(4.6 \times 10^5\) |
| AAV recombination | 2253 | \(2.6 \times 10^{7}\) | 15,020 | Long | 100 | \(4.6 \times 10^4\) |
| AAV recombination | 2253 | \(2.6 \times 10^{7}\) | 15,020 | Long | 100 | \(4.6 \times 10^3\) |
| AAV recombination | 2253 | \(2.6 \times 10^{7}\) | 15,020 | Short | 13 | \(4.6 \times 10^7\) |
| AAV recombination | 2253 | \(2.6 \times 10^{7}\) | 15,020 | Hybrid | 100 long + 13 short | \(4.6 \times 10^3\) long + \(4.5 \times 10^7\) short |
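
To make the sampling regimes in Table 1 easier to compare, the following minimal Python sketch computes two quantities derived from the table: the average number of reads per unique library sequence (\(N^\text{pre}/M'\)) and the approximate number of nucleotides covered by a single read (L × Cover / 100). Neither quantity is reported in the table itself. The names `Dataset`, `reads_per_variant`, and `covered_nt` are hypothetical and chosen for illustration only; the numeric values are copied from Table 1, and the hybrid row is omitted for simplicity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Dataset:
    """One non-hybrid row of Table 1 (field names are illustrative)."""
    library: str
    read_type: str
    length_nt: int      # L: sequence length in nucleotides
    n_unique: float     # M': number of unique library sequences
    cover_pct: float    # Cover: % of the sequence covered by one read
    n_reads: float      # N^pre = N^post


# Values copied from Table 1; the hybrid AAV row is left out.
DATASETS = [
    Dataset("21-mer insertion", "Short", 21, 8.5e6, 100, 4.6e7),
    Dataset("150-mer insertion", "Short", 150, 8.5e6, 100, 4.6e7),
    Dataset("300-mer insertion", "Short", 300, 8.5e6, 100, 4.6e7),
    Dataset("avGFP mutagenesis", "Long", 714, 2.5e7, 100, 4.6e5),
    Dataset("avGFP mutagenesis", "Short", 714, 2.5e7, 42, 4.6e7),
    Dataset("AAV recombination", "Long", 2253, 2.6e7, 100, 4.6e5),
    Dataset("AAV recombination", "Long", 2253, 2.6e7, 100, 4.6e4),
    Dataset("AAV recombination", "Long", 2253, 2.6e7, 100, 4.6e3),
    Dataset("AAV recombination", "Short", 2253, 2.6e7, 13, 4.6e7),
]


def reads_per_variant(ds: Dataset) -> float:
    """Average number of reads per unique library sequence, N / M'."""
    return ds.n_reads / ds.n_unique


def covered_nt(ds: Dataset) -> float:
    """Approximate nucleotides covered by one read, L * Cover / 100."""
    return ds.length_nt * ds.cover_pct / 100


if __name__ == "__main__":
    for ds in DATASETS:
        print(
            f"{ds.library:18s} {ds.read_type:5s} "
            f"reads/variant={reads_per_variant(ds):.2e} "
            f"covered_nt={covered_nt(ds):.0f}"
        )
```

Under these assumptions, the short-read datasets average a few reads per unique sequence, whereas the long-read datasets have far fewer reads than unique sequences, so most library sequences are never observed in a long-read dataset.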