Fig. 7 | Genome Biology

Fig. 7

From: Next-Gen GWAS: full 2D epistatic interaction maps retrieve part of missing heritability and improve phenotypic prediction


NGG retrieves genetic markers in epistatic signals that improve machine learning procedures. A Analysis scheme used to measure how much the 2D GWAS signal improves phenotypic predictions. The dataset is divided into a train set (50%) and a test set (50%). The train set is used to perform 1D and 2D GWAS and retrieve the strongest GWAS signals. The positions of the SNPs (1D) and SNP combinations (2D) are then used to predict phenotype classes on the test set, which played no part in identifying the SNPs. B In this plot, each dot corresponds to a combination of (i) a given machine learning model (among SVM, RF, DNN, Gaussian processes, LASSO, and Elastic Classifier) trying to predict (ii) a given phenotype (18 elemental concentrations of Arabidopsis leaves, represented with different colors), combined with different learning data formats including (iii) a different number of classes (3 or 5 classes) and (iv) a different number of SNPs (30, 100, 500, 1000, 5000, or 10,000). The x-axis reports the max F1 score for the model provided with simple 1D SNP signals and randomly picked 2D epistatic SNP combinations (our control). The y-axis reports the max F1 score for the model provided with simple 1D SNP signals and 2D epistatic SNP combinations. We observe an improvement (dots above the y = x line) for > 57% of the models provided with 2D epistatic signals. Arrows point to the two best models (max F1 score). C Prediction improvement is even more dramatic (80%) for models predicting phenotypes from the top 30 1D plus the top 30 2D signals. D, E The two best predictions of classified Molybdenum concentrations (Mo98 phenotype), shown as confusion matrices.
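
The "above the y = x line" summary in panel B amounts to counting the paired configurations whose F1 score with true 2D epistatic combinations exceeds the score with the random-combination control. The following is a minimal sketch of that count in Python, not the authors' code: the function name `fraction_above_diagonal` is hypothetical, and the scores are made-up values for illustration only, not data from the figure.

```python
from typing import Sequence


def fraction_above_diagonal(
    control_f1: Sequence[float], epistatic_f1: Sequence[float]
) -> float:
    """Share of paired points lying strictly above the y = x line.

    control_f1:   x-axis values (1D signals + randomly picked 2D combinations).
    epistatic_f1: y-axis values (1D signals + true 2D epistatic combinations).
    """
    if len(control_f1) != len(epistatic_f1):
        raise ValueError("score lists must be paired")
    if not control_f1:
        raise ValueError("need at least one pair")
    # A point is above the diagonal when its y value exceeds its x value.
    improved = sum(1 for x, y in zip(control_f1, epistatic_f1) if y > x)
    return improved / len(control_f1)


# Hypothetical scores for illustration only; not data from the figure.
control = [0.40, 0.55, 0.62, 0.48, 0.70]
epistatic = [0.45, 0.53, 0.66, 0.52, 0.70]
share = fraction_above_diagonal(control, epistatic)
print(f"{share:.0%} of configurations improved")
```

Ties are counted as "not improved" here, which is one possible convention; the caption does not say how points exactly on the line are treated.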
