Table 1 Bystro, VEP, ANNOVAR offline command-line performance

From: Bystro: rapid online variant annotation and natural-language filtering at whole-genome scale

Software  Dataset             Samples  Variants   Variants/s     Bystro vs
Bystro    1000G Phase 3 chr1  2504     1 × 10⁶    8156 ± 195
Bystro    1000G Phase 3 chr1  2504     2 × 10⁶    8484 ± 67.9
Bystro    1000G Phase 3 chr1  2504     4 × 10⁶    8516 ± 57.2
Bystro    1000G Phase 3 chr1  2504     6.5 × 10⁶  7779 ± 21.8
Bystro    1000G Phase 1       1092     3.9 × 10⁷  5417 ± 76.8
Bystro    1000G Phase 3       2504     8.5 × 10⁷  7904 ± 15.9
VEP       1000G Phase 1       1092     3.9 × 10⁷  18.67 ± 0.58   290×
VEP       1000G Phase 3       2504     8.5 × 10⁷  10.00 ± 0.00   790×
ANNOVAR   1000G Phase 3 chr1  2504     1 × 10⁶    74.67 ± 0.21   109×
ANNOVAR   1000G Phase 3 chr1  2504     2 × 10⁶    75.32 ± 0.06   113×
ANNOVAR   1000G Phase 3 chr1  2504     4 × 10⁶    75.15 ± 0.39   113×
ANNOVAR   1000G Phase 3 chr1  2504     6.5 × 10⁶  NA             NA
ANNOVAR   1000G Phase 1       1092     3.9 × 10⁷  NA             NA
ANNOVAR   1000G Phase 3       2504     8.5 × 10⁷  NA             NA
  1. Bystro, VEP, and ANNOVAR were similarly configured with eight threads on Amazon i3.2xlarge servers. “Dataset” refers to the VCF file used. “Variants/s” is the average of three trials. VEP performance was recorded after 2 × 10⁵ sites in consideration of time. In runs of 1 × 10⁶ or more annotated sites, VEP performance did not deviate from the 2 × 10⁵ value. ANNOVAR could not complete the full Phase 1, Phase 3, or Phase 3 chromosome 1 datasets due to memory limitations. Thus, ANNOVAR was compared to Bystro on subsets of 1000 Genomes Phase 3 chromosome 1. Bystro run times included time taken to compress outputs. 1000 Genomes Phase 1 performance reflects IO limitations.
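
The “Bystro vs” column appears to be the ratio of Bystro's mean throughput to the other tool's mean throughput on the same dataset. The table does not state this definition, so it is treated as an assumption here. The sketch below recomputes that ratio from the mean values in Table 1 and also converts a throughput into an approximate wall-clock time. The function names `fold_speedup` and `hours_to_annotate` are illustrative only, and the time estimate simply extrapolates a constant rate over the whole file.

```python
# Mean throughputs (variants/s) copied from Table 1.
# Each row: (tool, dataset, n_variants, tool_rate, bystro_rate, reported_fold)
ROWS = [
    ("VEP", "1000G Phase 1", 3.9e7, 18.67, 5417, 290),
    ("VEP", "1000G Phase 3", 8.5e7, 10.00, 7904, 790),
    ("ANNOVAR", "1000G Phase 3 chr1", 1e6, 74.67, 8156, 109),
    ("ANNOVAR", "1000G Phase 3 chr1", 2e6, 75.32, 8484, 113),
    ("ANNOVAR", "1000G Phase 3 chr1", 4e6, 75.15, 8516, 113),
]


def fold_speedup(bystro_rate, other_rate):
    """Assumed definition of "Bystro vs": ratio of mean throughputs."""
    return bystro_rate / other_rate


def hours_to_annotate(n_variants, rate):
    """Approximate run time in hours, assuming a constant variants/s rate."""
    return n_variants / rate / 3600


for tool, dataset, n_variants, tool_rate, bystro_rate, reported in ROWS:
    fold = fold_speedup(bystro_rate, tool_rate)
    print(
        f"{tool:8s} {dataset:20s} "
        f"computed {fold:6.1f}x (table: {reported}x), "
        f"Bystro ~{hours_to_annotate(n_variants, bystro_rate):.2f} h, "
        f"{tool} ~{hours_to_annotate(n_variants, tool_rate):.1f} h"
    )
```

Under this assumed definition, each computed ratio rounds to the value printed in the table. The projected VEP run times are extrapolations only, since VEP throughput was recorded after 2 × 10⁵ sites and not over a completed run.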