
Table 2 Online comparison of Bystro and recent programs in filtering 8.49 × 10⁷ variants from 1000 Genomes

From: Bystro: rapid online variant annotation and natural-language filtering at whole-genome scale

| Group | Search query | Time (s) | Variants | Tr:Tv |
|---|---|---|---|---|
| 1 | Exonic | 0.030 ± 0.030 | 993,343 | 2.96 |
| 2 (a) | cadd > 20 maf < .001 pathogenic expert review missense | 0.029 ± 0.009 | 65 | 1.71 |
| 2 (b) | cadd > 20 maf < .001 pathogenic expert’s review non-synonymous | 0.036 ± 0.019 | 65 | 1.71 |
| 2 (c) | cadd > 20 maf < .001 pathogen expert-reviewed nonsynonymous | 0.044 ± 0.025 | 65 | 1.71 |
| 3 (a) | Early onset breast cancer | 0.046 ± 0.029 | 4335 | 2.51 |
| 3 (b) | Early-onset breast cancer | 0.037 ± 0.020 | 4335 | 2.51 |
| 3 (c) | Early onset breast cancers | 0.033 ± 0.015 | 4335 | 2.51 |
| 4 (a) | Pathogenic nonsense Ehlers-Danlos | 0.038 ± 0.027 | 1 | NA |
| 4 (b) | Pathogenic nonsense E.D.S | 0.078 ± 0.087 | 1 | NA |
| 4 (c) | Pathogenic stopgain eds | 0.040 ± 0.022 | 1 | NA |

The full 1000 Genomes Phase 3 VCF file (853 GB, 8.49 × 10⁷ variants, 2504 samples) was filtered in the publicly available Bystro web application using the Bystro natural-language search engine. VEP, GEMINI, and wANNOVAR (not shown) were also tested but could neither annotate nor filter this dataset. Bystro’s search engine uses a natural-language parser that accepts unstructured queries: the queries in groups 2, 3, and 4 show phrasing variations that did not affect the results returned, as expected of a search engine that handles normal language variation. “Tr:Tv” is the transition-to-transversion ratio, calculated automatically for each query by the search engine. The ratio of 2.96 for the “exonic” query is close to the ~2.8–3.0 expected in coding regions, suggesting that the search engine accurately identified exonic (coding) variants.
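For readers unfamiliar with the Tr:Tv column, the sketch below shows one conventional way to compute a transition-to-transversion ratio from single-nucleotide variants. It is an illustration only, not Bystro's implementation: the function names `is_transition` and `tr_tv_ratio` and the `(ref, alt)` tuple input are assumptions made for this example. A transition swaps a purine for a purine (A↔G) or a pyrimidine for a pyrimidine (C↔T); every other single-base substitution is a transversion.

```python
from typing import Iterable, Optional, Tuple

# Hypothetical helpers for illustration; not part of Bystro.
PURINES = frozenset("AG")
PYRIMIDINES = frozenset("CT")
BASES = PURINES | PYRIMIDINES


def is_transition(ref: str, alt: str) -> bool:
    """Return True if ref -> alt stays within purines or within pyrimidines."""
    pair = {ref, alt}
    return pair <= PURINES or pair <= PYRIMIDINES


def tr_tv_ratio(snvs: Iterable[Tuple[str, str]]) -> Optional[float]:
    """Compute transitions / transversions over (ref, alt) pairs.

    Entries that are not single-base substitutions between A, C, G, T
    (indels, identical alleles, ambiguous bases) are skipped.
    Returns None when there are no transversions, since the ratio
    is then undefined.
    """
    transitions = 0
    transversions = 0
    for ref, alt in snvs:
        ref, alt = ref.upper(), alt.upper()
        if ref not in BASES or alt not in BASES or ref == alt:
            continue
        if is_transition(ref, alt):
            transitions += 1
        else:
            transversions += 1
    if transversions == 0:
        return None
    return transitions / transversions


if __name__ == "__main__":
    # Two transitions (A>G, C>T) and one transversion (A>C).
    print(tr_tv_ratio([("A", "G"), ("C", "T"), ("A", "C")]))
```

Returning `None` rather than raising on a zero denominator is a design choice for this sketch; it keeps very small result sets, where no transversion may be present, from producing an error.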