Bystro: rapid online variant annotation and natural-language filtering at whole-genome scale

Table 3 Online comparison of Bystro and GEMINI/Galaxy in filtering 1 × 10⁶ variants

No.	Program	Query	Time (s)	Variants	Ts/Tv
1	Bystro	cadd > 15 alt:(a || c || t || g)	0.004 ± 0	28,099	2.512
1	GEMINI	SELECT * FROM variants JOIN variant_impacts ON variants.variant_id = variant_impacts.variant_id WHERE cadd_scaled > 15	442 ± 87	22,063	NA
2	Bystro	gnomad.exomes.af < .001 cadd > 15 missense	0.007 ± 0.003	6840	3.083
2	GEMINI	SELECT * FROM variants JOIN variant_impacts ON variants.variant_id = variant_impacts.variant_id WHERE cadd_scaled > 15 AND aaf_exac_all < .001 AND variant_impacts.impact = "missense_variant"	77.6 ± 18.6	5160	NA
3	Bystro	gnomad.exomes.af < .001 cadd > 15 nonsynonymous	0.006 ± 0.001	6840	3.083
3	GEMINI	SELECT * FROM variants JOIN variant_impacts ON variants.variant_id = variant_impacts.variant_id WHERE cadd_scaled > 15 AND aaf_exac_all < .001 AND variant_impacts.impact = "nonsynonymous_variant"	NA	0	NA

Bystro was compared with the latest hosted version of GEMINI (v0.8.1, on the Galaxy platform) in filtering the 1 × 10⁶ variant subset of 1000 Genomes Phase 3, the largest tested file that GEMINI/Galaxy could process. GEMINI requires structured SQL queries, whereas Bystro accepts shorter, unstructured search queries. In query 1, Bystro searched for CADD scores only within single-nucleotide polymorphisms (using alt:(a || c || t || g) or, equivalently, the regex query alt:/[actg]/) to normalize results with GEMINI, which provides no CADD data for insertions and deletions. In queries 2 and 3, Bystro’s search engine returned identical results for the synonymous terms “missense” and “nonsynonymous,” despite annotating such sites only as “nonsynonymous.” In contrast, GEMINI required the specific term “missense_variant” and returned no variants for “nonsynonymous_variant.” GEMINI/Galaxy and Bystro returned different variant counts because the latest version of GEMINI on Galaxy (v0.8.1) uses outdated annotation sources. Comparisons between Bystro and GEMINI/Galaxy are further limited because GEMINI does not provide a natural-language parser, annotation field filters, an interactive result browser, per-query statistics, or the ability to filter saved search results. Notably, Bystro was also substantially faster, returning all results in < 1 s (0.004 to 0.007 s per query, compared with 77.6 to 442 s for the two GEMINI queries with reported times).
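
The single-nucleotide restriction in query 1 and the Ts/Tv column can be illustrated with a minimal Python sketch. It is not Bystro's implementation: the function names are hypothetical, the regex is assumed to match the whole alt field rather than a substring, and variants are assumed to arrive as (ref, alt) pairs of plain base strings.

```python
import re

# Hypothetical helpers for illustration only; not part of Bystro or GEMINI.
# Assumption: the regex form alt:/[actg]/ matches the entire alt field.
SNP_ALT = re.compile(r"[actg]", re.IGNORECASE)

# Transitions are purine<->purine (A<->G) or pyrimidine<->pyrimidine (C<->T).
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}


def is_snp_alt_regex(alt: str) -> bool:
    """Regex form of the filter: alt is exactly one of a, c, t, g."""
    return SNP_ALT.fullmatch(alt) is not None


def is_snp_alt_alternation(alt: str) -> bool:
    """Alternation form of the filter: a || c || t || g."""
    return alt.lower() in {"a", "c", "t", "g"}


def ts_tv_ratio(variants):
    """Return transitions / transversions over single-base substitutions.

    `variants` is an iterable of (ref, alt) strings. Pairs that are not
    single-base substitutions are skipped. Returns None when there are no
    transversions, since the ratio is then undefined.
    """
    ts = tv = 0
    for ref, alt in variants:
        if not (is_snp_alt_regex(ref) and is_snp_alt_regex(alt)):
            continue  # skip insertions, deletions, and other non-SNP records
        ref, alt = ref.upper(), alt.upper()
        if ref == alt:
            continue  # not a substitution
        if (ref, alt) in TRANSITIONS:
            ts += 1
        else:
            tv += 1
    return ts / tv if tv else None
```

Under these assumptions, both predicates accept and reject the same alt values, and a call such as ts_tv_ratio([("A", "G"), ("C", "T"), ("A", "C"), ("A", "AT")]) counts two transitions and one transversion while skipping the insertion.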

ISSN: 1474-760X