Table 1 Variant and classification origins of the benchmark datasets used

From: GAVIN: Gene-Aware Variant INterpretation for medical sequencing

| Dataset | Benign variants (n) | Pathogenic variants (n) | Origin |
| --- | --- | --- | --- |
| VariBench tolerance DS7, training set | 11,347 | 6,143 | PhenCode database, IDbases, and 18 individual LSDBs |
| VariBench tolerance DS7, test set | 1,377 | 510 | PhenCode database, IDbases, and 18 individual LSDBs |
| MutationTaster2 benchmark set | 1,194 | 161 | HGMD Professional and 1000 Genomes |
| ClinVar (additions of Nov 2015 to Feb 2016) | 1,668 | 1,688 | Submissions by clinical molecular geneticists, expert panels, diagnostic laboratories, and companies |
| UMCG, variants exported from clinical diagnostic interpretation software | 1,176 | 174 | Clinical diagnostic classifications of variants in cardiology, dermatology, epilepsy, dystonia, and preconception screening |
| UMCG, germline variants for familial cancer cases | 301 | 26 | Hereditary cancer variant classifications by an MD following ACMG guidelines |
| Total | 17,063 | 8,702 | 25,765 (all variants) |
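
The totals in the last row can be cross-checked by summing the per-dataset counts. The snippet below is a minimal sketch for that purpose only. The names `datasets` and `column_totals` are illustrative assumptions and are not part of GAVIN or its published code. The counts are transcribed from the table above.

```python
# Counts transcribed from Table 1 as (benign, pathogenic) per dataset.
# The dictionary and function names are hypothetical, chosen for illustration.
datasets = {
    "VariBench tolerance DS7, training set": (11347, 6143),
    "VariBench tolerance DS7, test set": (1377, 510),
    "MutationTaster2 benchmark set": (1194, 161),
    "ClinVar (additions of Nov 2015 to Feb 2016)": (1668, 1688),
    "UMCG, clinical diagnostic interpretation software": (1176, 174),
    "UMCG, germline variants for familial cancer cases": (301, 26),
}


def column_totals(counts):
    """Return (benign total, pathogenic total, grand total)."""
    benign = sum(b for b, _ in counts.values())
    pathogenic = sum(p for _, p in counts.values())
    return benign, pathogenic, benign + pathogenic


benign_total, pathogenic_total, grand_total = column_totals(datasets)
print(benign_total, pathogenic_total, grand_total)  # prints: 17063 8702 25765

# Share of pathogenic variants in each dataset, to show how class balance varies.
for name, (b, p) in datasets.items():
    print(f"{name}: {p / (b + p):.1%} pathogenic")
```

The per-dataset percentages show that the class balance differs considerably between sets, which may be worth keeping in mind when comparing performance figures across them.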