Fig. 1 | Genome Biology

Fig. 1

From: SVFX: a machine learning framework to quantify the pathogenicity of structural variants


Machine learning-based workflow of SVFX to identify pathogenic SVs. The original SV dataset consists of disease (case) and control SVs. In our somatic model, disease SVs are somatic SVs found in a cancer cohort, and control SVs are drawn from the 1000 Genomes Project (1KG) SV dataset. We randomly select SVs from the 1KG SV dataset so that the number of control SVs matches the number of somatic SVs. Similarly, for our germline model, we have (1) disease germline SVs identified in a specific disease cohort and (2) control SVs that correspond to common SVs in the 1KG SV dataset. For both germline and somatic models, we generate 1000 random iterations of the original disease and control datasets. These permuted SVs are later used to generate a Z-score-normalized feature matrix.
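
The workflow in this caption can be illustrated with a minimal Python sketch. It is not the SVFX implementation. The caption does not say how SVs are permuted or how the Z-score is computed, so the sketch assumes that a permutation moves an SV to a random position on the same chromosome while keeping its length, and that a feature is normalized as (observed value minus mean of the permuted values) divided by their standard deviation. All function names, the (chrom, start, end) tuple layout, and the `feature_fn` callback are hypothetical.

```python
import random
import statistics


def sample_matched_controls(disease_svs, control_pool, seed=0):
    """Randomly pick as many control SVs as there are disease SVs."""
    rng = random.Random(seed)
    return rng.sample(control_pool, k=len(disease_svs))


def permute_sv(sv, chrom_length, rng):
    """Assumed permutation scheme: random new start, same chromosome, same length."""
    chrom, start, end = sv
    length = end - start
    new_start = rng.randrange(0, chrom_length - length + 1)
    return (chrom, new_start, new_start + length)


def z_score(observed, permuted_values):
    """Assumed normalization: (observed - mean) / population standard deviation."""
    mean = statistics.fmean(permuted_values)
    sd = statistics.pstdev(permuted_values)
    return 0.0 if sd == 0 else (observed - mean) / sd


def z_normalized_feature(sv, feature_fn, chrom_length, n_iter=1000, seed=0):
    """Score one feature of one SV against n_iter permuted copies of that SV."""
    rng = random.Random(seed)
    permuted = [feature_fn(permute_sv(sv, chrom_length, rng)) for _ in range(n_iter)]
    return z_score(feature_fn(sv), permuted)
```

Calling `z_normalized_feature` once per SV and per feature, for both the disease set and the matched control set, would fill one cell of the feature matrix at a time; here `feature_fn` stands in for any annotation overlap or score computed from an SV's coordinates.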
