Table 3 Online comparison of Bystro and GEMINI/Galaxy in filtering 1 × 10^6 variants

From: Bystro: rapid online variant annotation and natural-language filtering at whole-genome scale

No.  Program  Time (s)       Variants  Ts/Tv  Query
1    Bystro   0.004 ± 0      28,099    2.512  cadd > 15 alt:(a || c || t || g)
1    GEMINI   442 ± 87       22,063    NA     SELECT * FROM variants JOIN variant_impacts ON variants.variant_id = variant_impacts.variant_id WHERE cadd_scaled > 15
2    Bystro   0.007 ± 0.003  6,840     3.083  gnomad.exomes.af < .001 cadd > 15 missense
2    GEMINI   77.6 ± 18.6    5,160     NA     SELECT * FROM variants JOIN variant_impacts ON variants.variant_id = variant_impacts.variant_id WHERE cadd_scaled > 15 AND aaf_exac_all < .001 AND variant_impacts.impact = "missense_variant"
3    Bystro   0.006 ± 0.001  6,840     3.083  gnomad.exomes.af < .001 cadd > 15 nonsynonymous
3    GEMINI   NA             0         NA     SELECT * FROM variants JOIN variant_impacts ON variants.variant_id = variant_impacts.variant_id WHERE cadd_scaled > 15 AND aaf_exac_all < .001 AND variant_impacts.impact = "nonsynonymous_variant"
Bystro was compared to the latest hosted version of GEMINI (v0.8.1, on the Galaxy platform) in filtering the 1 × 10^6 variant subset of 1000 Genomes Phase 3, which was the largest tested file that GEMINI/Galaxy could process. GEMINI requires structured SQL queries, while Bystro allows shorter, unstructured searches. In query 1, Bystro searched for CADD scores only within single-nucleotide polymorphisms (using alt:(a || c || t || g) or, equivalently, the regex query alt:/[actg]/) to normalize results with GEMINI, which provides no CADD data for insertions and deletions. In queries 2 and 3, Bystro's search engine returned identical results for the synonymous terms "missense" and "nonsynonymous," despite annotating such sites only as "nonsynonymous." In contrast, GEMINI required the specific term "missense_variant" and returned no variants for "nonsynonymous_variant." GEMINI/Galaxy and Bystro returned different results because the latest version of GEMINI on Galaxy (v0.8.1) uses outdated annotation sources. Comparisons between Bystro and GEMINI/Galaxy are further limited because GEMINI does not provide a natural-language parser, annotation field filters, an interactive result browser, per-query statistics, or the ability to filter saved search results. Notably, Bystro was also substantially faster, returning all results in < 1 s.
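
The equivalence between the two forms of the query 1 allele filter, alt:(a || c || t || g) and alt:/[actg]/, can be illustrated with a minimal Python sketch. This is not Bystro's implementation: the record layout, the sample values, the helper names, and the allele encodings for insertions and deletions are all hypothetical, and the sketch assumes that matching is case-insensitive and applies to the whole alt field.

```python
import re

# Hypothetical records. The keys "alt" and "cadd" mirror the query terms in
# Table 3 but are not Bystro's actual storage schema, and the values are
# invented for illustration only.
variants = [
    {"alt": "A", "cadd": 22.1},
    {"alt": "T", "cadd": 9.4},
    {"alt": "+AC", "cadd": 30.0},  # insertion-like allele (illustrative encoding)
    {"alt": "-2", "cadd": 18.5},   # deletion-like allele (illustrative encoding)
    {"alt": "g", "cadd": 15.0},
    {"alt": "C", "cadd": 15.2},
]

SNP_ALLELES = {"a", "c", "t", "g"}
SNP_PATTERN = re.compile(r"[actg]", re.IGNORECASE)


def is_snp_by_terms(alt):
    """Analogue of alt:(a || c || t || g): the allele equals one of four terms."""
    return alt.lower() in SNP_ALLELES


def is_snp_by_regex(alt):
    """Analogue of alt:/[actg]/: the whole allele matches the character class."""
    return SNP_PATTERN.fullmatch(alt) is not None


def filter_variants(records, is_snp, min_cadd=15):
    """Keep single-nucleotide alleles whose CADD score is strictly above min_cadd."""
    return [r for r in records if is_snp(r["alt"]) and r["cadd"] > min_cadd]


by_terms = filter_variants(variants, is_snp_by_terms)
by_regex = filter_variants(variants, is_snp_by_regex)
```

Under these assumptions both predicates select the same records, and the multi-base alleles are excluded regardless of their CADD score, which is the normalization the footnote describes.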