Fig. 4 | Genome Biology

Fig. 4

From: SeqScreen: accurate and sensitive functional screening of pathogenic sequences via ensemble learning

Majority voting ensemble classifier used to create the FunSoC database. The top three models combined are Bl. SVC + NN(OS), a balanced linear support vector classifier plus neural networks (oversampled); TS NN, a two-stage neural network; and TS Bl. SVC, a two-stage balanced linear support vector classifier. The binary predictions of the three classifiers for each FunSoC are combined by majority vote to produce the final labels for the SeqScreen FunSoC database, which is then used to annotate query sequences. The training data are split into train (56.75%), validation (18.25%), and test (25%) sets. The two-stage methods first detect the presence of at least one FunSoC and then carry out the multi-class, multi-label predictions. Dropout (neural networks) and L1 regularization (support vector classifier) are used to control overfitting. To deal with class imbalance in the training data, two of the models use either random oversampling (Bl. SVC + NN(OS), applied after feature selection) or class weights (TS Bl. SVC).
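
The majority voting step described in the caption can be illustrated with a minimal sketch. This is not the SeqScreen implementation: the function name `majority_vote`, the variable names, and the toy prediction matrices are all hypothetical, and the sketch only assumes that each classifier emits a 0/1 matrix with one row per sequence and one column per FunSoC.

```python
import numpy as np


def majority_vote(predictions):
    """Combine binary multi-label predictions by per-label majority vote.

    predictions: sequence of 0/1 arrays, each of shape (n_sequences, n_funsocs).
    Returns a 0/1 array of the same shape, set to 1 wherever more than half
    of the classifiers predicted 1 for that sequence and label.
    """
    stacked = np.stack([np.asarray(p, dtype=int) for p in predictions], axis=0)
    votes = stacked.sum(axis=0)           # number of classifiers voting 1
    n_classifiers = stacked.shape[0]
    return (2 * votes > n_classifiers).astype(int)


# Toy predictions (made up for illustration): 2 sequences x 3 FunSoCs.
bl_svc_nn_os = np.array([[1, 0, 1], [0, 0, 1]])  # stands in for Bl. SVC + NN(OS)
ts_nn = np.array([[1, 1, 0], [0, 0, 1]])         # stands in for TS NN
ts_bl_svc = np.array([[0, 1, 1], [1, 0, 0]])     # stands in for TS Bl. SVC

final_labels = majority_vote([bl_svc_nn_os, ts_nn, ts_bl_svc])
print(final_labels)
```

With three classifiers, a label is assigned only when at least two of them agree, so a single dissenting or spurious prediction cannot set or clear a FunSoC label on its own.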
