
Table 2 Brief description of function prediction methods used

From: A critical assessment of Mus musculus gene function prediction using integrated genomic evidence

| Submission identifier | Approach | Name | Author initials |
| --- | --- | --- | --- |
| A | Compute several kernel matrices (SVM) for each data matrix, train one GO term-specific SVM per kernel, and map the SVMs' discriminants to probabilities using logistic regression | Calibrated ensembles of SVMs | GO, GL, JQ, CG, MJ, and WSN |
| B | Four different kernels are used per data set. Integration of the best kernels and data sources is done using the kernel logistic regression model | Kernel logistic regression [55] | HL, MD, TC, and FS |
| C | Construct similarity kernels, assign a weight to each kernel using linear regression, combine the weighted kernels, and use a graph-based algorithm to obtain the score vector | geneMANIA | SM, DW-F, CG, DR, and QM |
| D | Train SVM classifiers on each GO term and individual data sets, construct several Bayesian networks that incorporate diverse data sources and hierarchical relationships, and choose for each GO term the Bayes net or the SVM yielding the highest AUC | Multi-label hierarchical classification [56] and Bayesian integration | YG, CLM, ZB, and OGT |
| E | Combination of an ensemble of classifiers (naïve Bayes, decision tree, and boosted tree) with guilt-by-association in a functional linkage network, choosing the maximum score | Combination of classifier ensemble and gene network | WKK, CK, and EMM |
| F | Code the relationship between functional similarity and the data into a functional linkage graph and predict gene functions using a Boltzmann machine and simulated annealing | GeneFAS (gene function annotation system) [2, 3] | TJ, CZ, GNL, and DX |
| G | Two methods with scores combined by logistic regression: guilt-by-association using a weighted functional linkage graph generated by probabilistic decision trees; and random forests trained on all binary gene attributes | Funckenstein | WT, MT, FDG, and FPR |
| H | Pairwise similarity features for gene pairs were derived from the available data. A random forest classifier was trained using pairs of genes for each GO term. Predictions are based on similarity between the query gene and the positive examples for that GO term | Function prediction through query retrieval | YQ, JK, and ZB |
| I | Construct an interaction network per data set, merge the data set graphs into a single graph, and apply a belief propagation algorithm to compute the probability for each protein to have a specific function given the functions assigned to the proteins in the rest of the graph | Function prediction with message passing algorithms [57] | ML and AP |
  1. AUC, area under the receiver operating characteristic curve; GO, Gene Ontology.
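
Several of the approaches in Table 2 (E and G, for example) rely on guilt-by-association in a functional linkage network. The table does not give the scoring rule each group used, so the following is only a minimal sketch of one common form of the idea: a gene's score for a GO term is the weighted fraction of its network neighbors already annotated with that term. The toy network, the gene names, the function names `gba_score` and `combine_max`, and the classifier score of 0.4 are all hypothetical and chosen for illustration; `combine_max` mirrors the "choosing the maximum score" step described for submission E.

```python
from typing import Dict, Set

# Hypothetical toy functional linkage network: gene -> {neighbor: edge weight}.
Network = Dict[str, Dict[str, float]]


def gba_score(gene: str, network: Network, positives: Set[str]) -> float:
    """Weighted fraction of a gene's neighbors annotated with the GO term.

    This is an assumed, simplified guilt-by-association rule, not the
    scoring function of any specific submission in Table 2.
    """
    neighbors = network.get(gene, {})
    total_weight = sum(neighbors.values())
    if total_weight == 0:
        # Isolated or unknown genes receive no network evidence.
        return 0.0
    annotated_weight = sum(w for n, w in neighbors.items() if n in positives)
    return annotated_weight / total_weight


def combine_max(classifier_score: float, network_score: float) -> float:
    """Keep the larger of the two scores (as described for submission E)."""
    return max(classifier_score, network_score)


# Toy example: four genes, one of which (g2) is annotated with the GO term.
network: Network = {
    "g1": {"g2": 1.0, "g3": 0.5},
    "g2": {"g1": 1.0, "g4": 0.5},
    "g3": {"g1": 0.5},
    "g4": {"g2": 0.5},
}
positives = {"g2"}

score_g1 = gba_score("g1", network, positives)  # 1.0 of 1.5 total edge weight
final_g1 = combine_max(0.4, score_g1)           # 0.4 is a made-up classifier score
```

In this toy network, `g1` gets network support from its strongly linked annotated neighbor `g2`, while `g3`, whose only neighbor is unannotated, scores zero. Submission G instead combines the two scores with logistic regression rather than taking the maximum.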