Table 1 Bystro, VEP, ANNOVAR offline command-line performance

From: Bystro: rapid online variant annotation and natural-language filtering at whole-genome scale

| Software | Dataset | Samples | Variants | Variants/s | Bystro vs. (fold speedup) |
|---|---|---|---|---|---|
| Bystro | 1000G Phase 3 chr1 | 2504 | 1 × 10⁶ | 8156 ± 195 | |
| Bystro | 1000G Phase 3 chr1 | 2504 | 2 × 10⁶ | 8484 ± 67.9 | |
| Bystro | 1000G Phase 3 chr1 | 2504 | 4 × 10⁶ | 8516 ± 57.2 | |
| Bystro | 1000G Phase 3 chr1 | 2504 | 6.5 × 10⁶ | 7779 ± 21.8 | |
| Bystro | 1000G Phase 1 | 1092 | 3.9 × 10⁷ | 5417 ± 76.8 | |
| Bystro | 1000G Phase 3 | 2504 | 8.5 × 10⁷ | 7904 ± 15.9 | |
| VEP | 1000G Phase 1 | 1092 | 3.9 × 10⁷ | 18.67 ± 0.58 | 290× |
| VEP | 1000G Phase 3 | 2504 | 8.5 × 10⁷ | 10.00 ± 0.00 | 790× |
| ANNOVAR | 1000G Phase 3 chr1 | 2504 | 1 × 10⁶ | 74.67 ± 0.21 | 109× |
| ANNOVAR | 1000G Phase 3 chr1 | 2504 | 2 × 10⁶ | 75.32 ± 0.06 | 113× |
| ANNOVAR | 1000G Phase 3 chr1 | 2504 | 4 × 10⁶ | 75.15 ± 0.39 | 113× |
| ANNOVAR | 1000G Phase 3 chr1 | 2504 | 6.5 × 10⁶ | NA | NA |
| ANNOVAR | 1000G Phase 1 | 1092 | 3.9 × 10⁷ | NA | NA |
| ANNOVAR | 1000G Phase 3 | 2504 | 8.5 × 10⁷ | NA | NA |

  1. Bystro, VEP, and ANNOVAR were similarly configured with eight threads on Amazon i3.2xlarge servers. “Dataset” refers to the VCF file used. “Variants/s” is the average of three trials. VEP performance was recorded after 2 × 10⁵ sites in consideration of time. In runs of 1 × 10⁶ or more annotated sites, VEP performance did not deviate from the 2 × 10⁵ value. ANNOVAR could not complete the full Phase 1, Phase 3, or Phase 3 chromosome 1 datasets due to memory limitations. Thus, ANNOVAR was compared to Bystro on subsets of 1000 Genomes Phase 3 chromosome 1. Bystro run times included time taken to compress outputs. 1000 Genomes Phase 1 performance reflects IO limitations.
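The fold-speedup column is not defined in the footnote. A minimal sketch, assuming it is the ratio of Bystro's mean variants/s to the other tool's mean variants/s on the same dataset, rounded to the nearest integer, reproduces the reported values. The dictionary layout and the `fold_speedup` helper are illustrative names, not part of any of the three tools; the numbers are copied from Table 1.

```python
# Mean variants/s from Table 1, keyed by (dataset, variant count as text).
BYSTRO = {
    ("1000G Phase 3 chr1", "1e6"): 8156,
    ("1000G Phase 3 chr1", "2e6"): 8484,
    ("1000G Phase 3 chr1", "4e6"): 8516,
    ("1000G Phase 1", "3.9e7"): 5417,
    ("1000G Phase 3", "8.5e7"): 7904,
}

# Mean variants/s for the comparison tools on the runs they completed.
OTHERS = {
    ("VEP", "1000G Phase 1", "3.9e7"): 18.67,
    ("VEP", "1000G Phase 3", "8.5e7"): 10.00,
    ("ANNOVAR", "1000G Phase 3 chr1", "1e6"): 74.67,
    ("ANNOVAR", "1000G Phase 3 chr1", "2e6"): 75.32,
    ("ANNOVAR", "1000G Phase 3 chr1", "4e6"): 75.15,
}


def fold_speedup(bystro_rate: float, other_rate: float) -> int:
    """Assumed definition: ratio of mean throughputs, rounded to an integer."""
    return round(bystro_rate / other_rate)


# Compute the speedup for every completed comparison run.
speedups = {
    (tool, dataset, variants): fold_speedup(BYSTRO[(dataset, variants)], rate)
    for (tool, dataset, variants), rate in OTHERS.items()
}

for (tool, dataset, variants), fold in speedups.items():
    print(f"{tool:8s} {dataset:20s} {variants:>6s} variants: {fold}x")
```

Because the ratio uses only the mean of three trials, it ignores the reported standard deviations; the rows marked NA have no rate to divide by and are omitted.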