Fig. 2 | Genome Biology

Fig. 2

From: Carnelian uncovers hidden functional patterns across diverse study populations from whole metagenome sequencing reads

Classification of patients vs controls using Enzyme Commission (EC) markers (N-fold cross-validation experiments). a T2D vs controls in the T2D-Qin dataset (Chinese cohort). b T2D vs normal glucose tolerance (NGT) individuals in the T2D-Karlsson dataset (European cohort). c CD patients vs controls in the CD-HMP dataset (individuals from the US). d CD patients vs healthy individuals in the CD-Swedish dataset (Swedish twin studies). e PD vs controls in the PD-Bedarf dataset. In each trial, one of the N subsets was held out as the test set and the remaining N−1 subsets were used as the training set. Differentially abundant ECs were selected from the training set only and used as input features for a set of random forest classifiers. Classification performance was measured on the held-out test set. In every case, Carnelian-identified EC terms achieve a larger average area under the curve (AUC) than those identified by the other methods
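
The evaluation protocol in the caption can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' pipeline: the stratified fold assignment, the mean-difference feature ranking, and the signed-sum scorer are simple stand-ins (the caption specifies differential-abundance selection and random forest classifiers), and every function name and the synthetic data are hypothetical. The point it shows is that feature selection happens inside each fold, on training samples only, before the held-out fold is scored.

```python
import random
from statistics import mean


def auc(scores, labels):
    """AUC via the rank-sum identity: P(case score > control score), ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def stratified_folds(y, n_folds, seed=0):
    """Deal cases and controls separately into N folds (stratification is an assumption)."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for label in (0, 1):
        members = [i for i, yi in enumerate(y) if yi == label]
        rng.shuffle(members)
        for pos, i in enumerate(members):
            folds[pos % n_folds].append(i)
    return folds


def select_features(X, y, train, k):
    """Stand-in for differential abundance: top-k features by |case mean - control mean|,
    computed on training rows only. Returns (feature index, sign) pairs."""
    cases = [i for i in train if y[i] == 1]
    controls = [i for i in train if y[i] == 0]
    diffs = {
        j: mean(X[i][j] for i in cases) - mean(X[i][j] for i in controls)
        for j in range(len(X[0]))
    }
    top = sorted(diffs, key=lambda j: abs(diffs[j]), reverse=True)[:k]
    return [(j, 1.0 if diffs[j] >= 0 else -1.0) for j in top]


def cross_validated_auc(X, y, n_folds=5, k=3, seed=0):
    """Average test-fold AUC over N trials; each fold serves as the test set once."""
    aucs = []
    for test in stratified_folds(y, n_folds, seed):
        held_out = set(test)
        train = [i for i in range(len(X)) if i not in held_out]
        feats = select_features(X, y, train, k)  # never sees test rows
        # Stand-in scorer (not a random forest): signed sum of selected features.
        scores = [sum(sign * X[i][j] for j, sign in feats) for i in test]
        aucs.append(auc(scores, [y[i] for i in test]))
    return mean(aucs)


def make_toy_data(n_per_class=20, n_features=10, seed=0):
    """Synthetic abundances: feature 0 is shifted upward in cases; the rest is noise."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.random() for _ in range(n_features)]
            row[0] += 5.0 * label
            X.append(row)
            y.append(label)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    print("mean AUC:", cross_validated_auc(X, y, n_folds=5, k=3))
```

Selecting features on the full dataset before splitting would leak test-set information into the classifier and inflate the AUC, which is why `select_features` receives only the training indices.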
