Table 3 Online comparison of Bystro and GEMINI/Galaxy in filtering 1 × 10⁶ variants

From: Bystro: rapid online variant annotation and natural-language filtering at whole-genome scale

| No. | Program | Query | Time (s) | Variants | Ts/Tv |
| --- | --- | --- | --- | --- | --- |
| 1 | Bystro | cadd > 15 alt:(a \|\| c \|\| t \|\| g) | 0.004 ± 0 | 28,099 | 2.512 |
| 1 | GEMINI | SELECT * FROM variants JOIN variant_impacts ON variants.variant_id = variant_impacts.variant_id WHERE cadd_scaled > 15 | 442 ± 87 | 22,063 | NA |
| 2 | Bystro | gnomad.exomes.af < .001 cadd > 15 missense | 0.007 ± 0.003 | 6840 | 3.083 |
| 2 | GEMINI | SELECT * FROM variants JOIN variant_impacts ON variants.variant_id = variant_impacts.variant_id WHERE cadd_scaled > 15 AND aaf_exac_all < .001 AND variant_impacts.impact = "missense_variant" | 77.6 ± 18.6 | 5160 | NA |
| 3 | Bystro | gnomad.exomes.af < .001 cadd > 15 nonsynonymous | 0.006 ± 0.001 | 6840 | 3.083 |
| 3 | GEMINI | SELECT * FROM variants JOIN variant_impacts ON variants.variant_id = variant_impacts.variant_id WHERE cadd_scaled > 15 AND aaf_exac_all < .001 AND variant_impacts.impact = "nonsynonymous_variant" | NA | 0 | NA |

  1. Bystro was compared to the latest hosted version of GEMINI (v0.8.1, on the Galaxy platform) in filtering the 1 × 10⁶ variant subset of 1000 Genomes Phase 3, which was the largest tested file that GEMINI/Galaxy could process. GEMINI requires structured SQL queries, while Bystro allows shorter, unstructured searches. In query 1, Bystro searched for CADD scores only within single-nucleotide polymorphisms (using alt:(a || c || t || g) or, equivalently, the regex query alt:/[actg]/) to normalize results with GEMINI, which provides no CADD data for insertions and deletions. In queries 2 and 3, Bystro’s search engine returned identical results for the synonymous terms “missense” and “nonsynonymous,” despite annotating such sites only as “nonsynonymous.” In contrast, GEMINI required the specific term “missense_variant.” GEMINI/Galaxy and Bystro returned different results because the latest version of GEMINI on Galaxy (0.8.1) uses outdated annotation sources. Comparisons between Bystro and GEMINI/Galaxy are further limited because GEMINI does not provide a natural-language parser, annotation field filters, an interactive result browser, per-query statistics, or the ability to filter saved search results. Notably, Bystro also performed substantially faster, returning all results in < 1 s.
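
The SNP restriction used in query 1 and the Ts/Tv column can be illustrated with a small Python sketch. It is not Bystro's or GEMINI's implementation. The records, the field names (`ref`, `alt`, `cadd`), the indel encodings, and all function names are hypothetical and chosen only for illustration. The sketch assumes that the regex form alt:/[actg]/ must match the whole allele, which is why it uses `re.fullmatch`.

```python
import re

# Hypothetical toy records; the values are invented for illustration and do
# not come from the 1000 Genomes subset used in Table 3.
TOY_VARIANTS = [
    {"ref": "A", "alt": "G", "cadd": 22.1},     # transition, CADD > 15
    {"ref": "C", "alt": "T", "cadd": 17.3},     # transition, CADD > 15
    {"ref": "G", "alt": "T", "cadd": 30.0},     # transversion, CADD > 15
    {"ref": "A", "alt": "G", "cadd": 3.2},      # SNP, but CADD too low
    {"ref": "T", "alt": "+ACG", "cadd": 25.0},  # insertion (assumed encoding)
    {"ref": "C", "alt": "-2", "cadd": 19.0},    # deletion (assumed encoding)
]

# Assumption: the regex is anchored to the full allele string.
SNP_ALT = re.compile(r"[actg]", re.IGNORECASE)

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def is_snp_by_set(alt):
    """Mirror of alt:(a || c || t || g): the allele is one of four bases."""
    return alt.lower() in {"a", "c", "t", "g"}


def is_snp_by_regex(alt):
    """Mirror of alt:/[actg]/, matched against the whole allele."""
    return SNP_ALT.fullmatch(alt) is not None


def filter_snps(variants, min_cadd=15.0, is_snp=is_snp_by_set):
    """Keep single-nucleotide variants whose CADD score exceeds min_cadd."""
    return [v for v in variants if is_snp(v["alt"]) and v["cadd"] > min_cadd]


def is_transition(ref, alt):
    """A transition swaps purine for purine or pyrimidine for pyrimidine."""
    pair = {ref.upper(), alt.upper()}
    return pair == PURINES or pair == PYRIMIDINES


def ts_tv_ratio(snps):
    """Ratio of transitions to transversions among single-nucleotide variants."""
    ts = sum(1 for v in snps if is_transition(v["ref"], v["alt"]))
    tv = len(snps) - ts
    return ts / tv if tv else float("nan")


hits = filter_snps(TOY_VARIANTS)
print(len(hits), ts_tv_ratio(hits))
```

On this toy input both predicates select the same three records, because the insertion and deletion alleles are longer than a single base and the fourth record fails the CADD threshold.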