
Table 1 Variant and classification origins of the benchmark datasets used

From: GAVIN: Gene-Aware Variant INterpretation for medical sequencing

| Dataset | Benign variants (n) | Pathogenic variants (n) | Origin |
|---|---|---|---|
| VariBench tolerance DS7, training set | 11,347 | 6143 | PhenCode database, IDbases, and 18 individual LSDBs |
| VariBench tolerance DS7, test set | 1377 | 510 | PhenCode database, IDbases, and 18 individual LSDBs |
| MutationTaster2 benchmark set | 1194 | 161 | HGMD Professional and 1000 Genomes |
| ClinVar (additions of Nov 2015 to Feb 2016) | 1668 | 1688 | Submissions by clinical molecular geneticists, expert panels, diagnostic laboratories, and companies |
| UMCG, variants exported from clinical diagnostic interpretation software | 1176 | 174 | Clinical diagnostic classifications of variants in cardiology, dermatology, epilepsy, dystonia, and preconception screening |
| UMCG, germline variants for familial cancer cases | 301 | 26 | Hereditary cancer variant classifications by an MD following ACMG guidelines |
| Total | 17,063 | 8702 | All datasets combined: 25,765 variants |
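As a quick consistency check, the per-dataset counts can be summed and the class balance of each benchmark set computed. The following is a minimal Python sketch, assuming the counts are transcribed exactly from the table. The `DATASETS` mapping and the helper functions are illustrative names, not part of GAVIN.

```python
# Illustrative transcription of Table 1: (benign, pathogenic) counts per dataset.
DATASETS = {
    "VariBench tolerance DS7, training set": (11347, 6143),
    "VariBench tolerance DS7, test set": (1377, 510),
    "MutationTaster2 benchmark set": (1194, 161),
    "ClinVar (additions of Nov 2015 to Feb 2016)": (1668, 1688),
    "UMCG, variants exported from clinical diagnostic interpretation software": (1176, 174),
    "UMCG, germline variants for familial cancer cases": (301, 26),
}


def totals(datasets):
    """Return (benign, pathogenic, all) variant counts summed over all datasets."""
    benign = sum(b for b, _ in datasets.values())
    pathogenic = sum(p for _, p in datasets.values())
    return benign, pathogenic, benign + pathogenic


def pathogenic_fraction(datasets):
    """Return the fraction of pathogenic variants within each dataset."""
    return {name: p / (b + p) for name, (b, p) in datasets.items()}


if __name__ == "__main__":
    b, p, n = totals(DATASETS)
    print(b, p, n)  # 17063 8702 25765, matching the Total row
    for name, frac in pathogenic_fraction(DATASETS).items():
        print(f"{name}: {frac:.1%} pathogenic")
```

The sums reproduce the Total row. The per-set fractions show that class balance differs considerably between benchmarks, from about 8% pathogenic in the familial cancer set to about 50% in the ClinVar set. This matters when comparing metrics such as accuracy across datasets.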