Fig. 2 | Genome Biology

From: Enhanced Integrated Gradients: improving interpretability of deep learning models using splicing codes as a case study

Fig. 2

EIG framework with handwritten digit data. a On the left, mean attributions generated from 300 examples of digit 5 and a median baseline digit 3 using the median H-L-IG approach on a feed-forward neural network. On the right, the subset of statistically significant features for the same set (one-sided t test, Bonferroni-adjusted p value ≤ 0.05). Pixels belonging to the digit 5 are shown in blue, positive attributions in green, and negative attributions in red. b Statistically significant features for distinguishing digit 5 from baseline 3 using the median O-L-IG approach on a convolutional neural network (CNN). To generate the attributions, a linear path was computed in the original feature space (O-L-IG) from a median baseline. c Statistically significant features for distinguishing digit 5 from baseline 3 using the median H-L-IG approach on a CNN with a convolutional variational autoencoder (C-VAE). To generate the attributions, a linear path was computed in the latent space (H-L-IG) using the C-VAE. d Performance of models trained to distinguish sample from baseline digits using all features or only the significant features identified with our approach (0.00 to 1.44% loss in accuracy while using 8 to 20% of all pixels). The top panel enlarges the y-axis (0.98 to 1.00) to highlight the differences in performance. These models solve the binary classification task of distinguishing the sample digit from the baseline digit and thus require fewer pixels than the original multi-class problem of classifying each image as one of ten possible digits. e Test set accuracy of models trained to distinguish digit 5 from baseline 3 with increasing subsets of significant features (pink) or random features (blue). The x-axis shows increasing subsets of features, and the y-axis shows accuracy on the test set
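For readers who want to see the procedure behind panels a-b in code form, the following is a minimal sketch, not the authors' EIG implementation: integrated gradients are accumulated along a linear path in the original pixel space (the O-L-IG variant) from a median baseline digit, and a per-pixel one-sided t test with Bonferroni correction keeps only statistically significant attributions. The model, image shapes, and step count below are illustrative assumptions.

```python
import numpy as np
import torch
from scipy import stats


def integrated_gradients(model, x, baseline, target_class, steps=50):
    """Approximate integrated gradients along a straight line from `baseline` to `x`."""
    alphas = torch.linspace(0.0, 1.0, steps).view(-1, 1, 1, 1)
    # Interpolated inputs along the linear path in the original feature space.
    path = baseline + alphas * (x - baseline)
    path.requires_grad_(True)
    outputs = model(path)[:, target_class]
    grads = torch.autograd.grad(outputs.sum(), path)[0]
    avg_grad = grads.mean(dim=0)                 # Riemann approximation of the path integral
    return (x - baseline).squeeze(0) * avg_grad  # per-pixel attribution


def significant_attributions(model, samples, baseline, target_class, alpha=0.05):
    """Per-pixel one-sided t test on attributions, Bonferroni-adjusted over all pixels."""
    attrs = torch.stack([
        integrated_gradients(model, x.unsqueeze(0), baseline, target_class)
        for x in samples                          # e.g., 300 examples of digit 5
    ]).detach().numpy()
    attrs = attrs.reshape(len(samples), -1)       # (n_samples, n_pixels), 28x28 MNIST assumed
    t, p_two_sided = stats.ttest_1samp(attrs, popmean=0.0, axis=0)
    p_one_sided = p_two_sided / 2.0               # one-sided, in the direction of the observed mean
    significant = p_one_sided * attrs.shape[1] <= alpha   # Bonferroni correction
    return attrs.mean(axis=0).reshape(28, 28), significant.reshape(28, 28)
```

The H-L-IG variant in panels a and c differs only in where the linear path is taken: points are interpolated in the latent space of an autoencoder (here, a C-VAE) and decoded back to pixel space before gradients are computed.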
