Hierarchical correction using Markov blanket structure. (a) Schematic of the local Markov blanket surrounding a GO term (Y1 is the node of interest in this example). Each GO term is represented by a blank node, and the SVM classifier output for that GO term is represented by a shaded node. To account for the hierarchical relationships between GO terms, for each GO term of interest (Y1) we included all neighboring nodes in its Markov blanket to construct a Bayesian network. The distributions of SVM outputs (observed nodes) for positive and negative examples were encoded in the conditional probability tables of the Bayesian network. We then inferred the probability of a particular gene's involvement in each GO term (a hidden node) from the gene's values at these observed nodes. (b) Improvement in AUC on the novel set for the HIER-MB classifiers over single SVM predictions, for selected biological process terms of size 101 to 300 (size is the number of genes annotated to the GO term in the training set). For each GO term, the best-performing sub-hierarchy was selected; the terms for which it performed better than the single SVM (as judged by held-out values in the training set) are plotted in this figure. (c) Median improvement in predictions for selected GO terms across different biological process GO term sizes. Hierarchical correction using Markov blanket structure performs better (when selected) for smaller terms. AUC, area under receiver operating characteristic curve; GO, Gene Ontology; HIER-MB, Markov blanket hierarchical correction; SVM, support vector machine.
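The inference described in panel (a) can be illustrated with a minimal sketch. It is not the implementation behind the figure. It assumes a simplified three-term chain (parent, Y1, child) rather than a full Markov blanket, SVM outputs discretized into three bins, and made-up probabilities; the names `HIDDEN`, `PARENTS`, `SVM_CPT`, `label_prob` and `posterior` are hypothetical. The posterior for a hidden GO term is obtained by enumerating all label assignments and weighting each by the hierarchy prior and the likelihood of the observed SVM bins.

```python
from itertools import product

# Hypothetical local network: parent -> Y1 -> child.
# Edges point from the more general GO term to the more specific one.
HIDDEN = ["parent", "Y1", "child"]
PARENTS = {"parent": [], "Y1": ["parent"], "child": ["Y1"]}

# Assumed conditional probability table for the observed nodes:
# P(SVM bin | true label), bins 0 = low, 1 = mid, 2 = high.
# In practice these would be estimated from SVM outputs on
# positive and negative training examples; the values here are made up.
SVM_CPT = {0: [0.70, 0.20, 0.10], 1: [0.10, 0.30, 0.60]}


def label_prob(node, value, assignment):
    """P(node = value | labels of its parent terms), with assumed numbers.

    A gene annotated to a term is also annotated to its ancestors,
    so a term cannot be positive when any parent term is negative.
    """
    if not PARENTS[node]:
        p_pos = 0.10  # assumed prior for the root of the local network
    elif all(assignment[p] == 1 for p in PARENTS[node]):
        p_pos = 0.30  # assumed P(term positive | all parents positive)
    else:
        p_pos = 0.0   # hierarchy constraint
    return p_pos if value == 1 else 1.0 - p_pos


def posterior(target, svm_bins):
    """P(target = 1 | observed SVM bins), by exhaustive enumeration."""
    numerator = 0.0
    evidence = 0.0
    for values in product([0, 1], repeat=len(HIDDEN)):
        assignment = dict(zip(HIDDEN, values))
        joint = 1.0
        for node in HIDDEN:
            joint *= label_prob(node, assignment[node], assignment)
            joint *= SVM_CPT[assignment[node]][svm_bins[node]]
        evidence += joint
        if assignment[target] == 1:
            numerator += joint
    return numerator / evidence


# Same high SVM score for Y1, but different evidence at the parent term.
supported = posterior("Y1", {"parent": 2, "Y1": 2, "child": 2})
contradicted = posterior("Y1", {"parent": 0, "Y1": 2, "child": 2})
print(supported, contradicted)
```

In this toy setting, a high SVM score for Y1 yields a lower posterior when the parent term's SVM output is low, which is the kind of correction the hierarchy is meant to provide. Exhaustive enumeration is only practical for small local networks such as a single Markov blanket.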