
Table 1 Characteristics of benchmarking data sets

From: Strainline: full-length de novo viral haplotype reconstruction from noisy long reads

| Virus mixture | Virus type | # Strains | Genome size (bp) | Coverage | Divergence (%) | Strain abundance (%) |
|---|---|---|---|---|---|---|
| Simulated | | | | | | |
| 5-strain HIV | HIV-1 | 5 | 9478–9719 | 20,000× | 2.7–5.6 | 10, 15, 20, 25, 30 |
| 6-strain Poliovirus | Poliovirus-2 | 6 | 7428–7460 | 20,000× | 0.2–5.5 | 2, 4, 8, 16, 20, 50 |
| 6-strain Poliovirus (la1) | Poliovirus-2 | 6 | 7428–7460 | 20,000× | 0.2–5.5 | 0.1, 1, 2, 8, 20, 68.9 |
| 6-strain Poliovirus (la2) | Poliovirus-2 | 6 | 7428–7460 | 20,000× | 0.2–5.5 | 0.01, 0.1, 1, 2, 8, 88.89 |
| 10-strain HCV | HCV-1a | 10 | 9273–9311 | 20,000× | 2.8–7.4 | 5, 6, 7, 8, 9, 11, 12, 13, 14, 15 |
| 15-strain ZIKV | ZIKV | 15 | 10,251–10,269 | 20,000× | 1.1–15.1 | 2, 4, 5, 5, 5, 6, 6, 6, 7, 7, 8, 8, 9, 10, 12 |
| 5-strain SARS-CoV-2 | SARS-CoV-2 | 5 | 26,574–29,903 | 20,000× | 0.3–1.1 | 10, 15, 20, 25, 30 |
| 5-strain SARS-CoV-2 (la) | SARS-CoV-2 | 5 | 26,574–29,903 | 20,000× | 0.3–1.1 | 0.1, 1, 5, 10, 83.9 |
| Experimental | | | | | | |
| 5-strain PVY (Mock) | PVY | 5 | 9694–9701 | 5800× | 3.6–21.6 | 9.3, 12.7, 21.1, 24.4, 32.5 |
| SARS-CoV-2 (Real) | SARS-CoV-2 | - | - | 12,000× | - | - |

  1. For each benchmarking data set, we specify the name of the virus mixture, virus type, number of strains in the mixture, range of genome sizes, total sequencing coverage, pairwise divergence, and strain abundance spectrum. The pairwise divergence equals 1 − ANI, where ANI (Average Nucleotide Identity) is calculated by FastANI [31]. Among the experimental data sets, 5-strain PVY is a mock community: the sequencing data are real but the mixture is synthetic. SARS-CoV-2 (Real) is a real sample, so there is no ground truth for its strains. The data sets 6-strain Poliovirus (la1) and 6-strain Poliovirus (la2) are similar to 6-strain Poliovirus, except that the lowest strain abundance (la) is reduced to 0.1% and 0.01%, respectively. The data set 5-strain SARS-CoV-2 (la) is similar to 5-strain SARS-CoV-2, except that the lowest strain abundance (la) is reduced to 0.1%.
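The relation between divergence and ANI in the note above can be illustrated with a minimal Python sketch. It assumes ANI is supplied as a percentage, so 1 − ANI becomes 100 − ANI. The function names are hypothetical and not part of Strainline or FastANI. The per-strain depth calculation is a further assumption (reads sampled in proportion to abundance), not something the table states.

```python
def divergence_percent(ani_percent: float) -> float:
    """Pairwise divergence (%) computed as 100 - ANI (%), i.e. 1 - ANI on a 0-1 scale."""
    if not 0.0 <= ani_percent <= 100.0:
        raise ValueError("ANI must be a percentage between 0 and 100")
    return 100.0 - ani_percent


def expected_strain_coverage(total_coverage: float, abundance_percent: float) -> float:
    """Expected per-strain depth, ASSUMING reads are sampled in proportion to abundance."""
    return total_coverage * abundance_percent / 100.0


# Abundance spectrum of "6-strain Poliovirus (la2)" as listed in the table.
la2_abundances = [0.01, 0.1, 1, 2, 8, 88.89]

# The abundances of one mixture should add up to 100% (within rounding).
total_abundance = sum(la2_abundances)

# Under the proportional-sampling assumption, the rarest strain (0.01%)
# would receive only a small share of the 20,000x total coverage.
rarest_depth = expected_strain_coverage(20_000, min(la2_abundances))
```

Under these assumptions, the lowest-abundance strain in the la2 data set would be covered at roughly 2× out of the 20,000× total, which indicates how demanding that setting is.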