Fig. 2 | Genome Biology

Fig. 2

From: Microbiome meta-analysis and cross-disease comparison enabled by the SIAMCAT machine learning toolbox


Analysis of covariates that potentially confound microbiome-disease associations and classification models. The UC dataset from Nielsen et al. [27] contains fecal metagenomes from subjects enrolled in two different countries and generated using different experimental protocols (data shown are from curatedMetagenomicData, with CD cases and additional samples per subject removed). a Visualizations from the SIAMCAT confounder checks reveal that only control samples were taken from Denmark, suggesting that any (biological or technical) differences between Danish and Spanish samples might confound a naive analysis for UC-associated differences in microbial abundances. b Analysis of variance (using ranked abundance data) shows that many species differ more by country than by disease, with several extreme cases highlighted. c When comparing (FDR-corrected) P values obtained from SIAMCAT’s association testing function applied to the whole dataset (y-axis) to those obtained for just the Spanish samples (x-axis), only a very weak correlation is seen, and strong confounding becomes apparent for several species, including Dorea formicigenerans (highlighted). d Relative abundance differences for Dorea formicigenerans are significantly larger between countries than between Spanish UC cases and controls (P values from Wilcoxon test; see Fig. 1c for the definition of boxplots). e Distinguishing UC patients from controls with the same workflow is possible, but with lower accuracy, when only samples from Spain are used than when the full dataset containing Danish and Spanish controls is used. This implies that in the latter case, the machine learning model is confounded, as it exploits the (stronger) country differences (see c and f), not only UC-associated microbiome changes. f This is confirmed by the result that control samples from Denmark and Spain can be very accurately distinguished, with an AUROC of 0.96 (using SIAMCAT classification workflows).
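The comparison in panel b can be illustrated with a minimal sketch, assuming that "variance explained" is taken as the between-group sum of squares divided by the total sum of squares on rank-transformed abundances. This is not SIAMCAT's own implementation (SIAMCAT is an R package); the function names `average_ranks` and `variance_explained` are hypothetical, and the toy abundances below are invented for illustration and are not taken from the Nielsen et al. data.

```python
from collections import defaultdict


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j over the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def variance_explained(values, labels):
    """Fraction of rank variance explained by a grouping: SS_between / SS_total."""
    ranks = average_ranks(values)
    grand_mean = sum(ranks) / len(ranks)
    ss_total = sum((r - grand_mean) ** 2 for r in ranks)
    if ss_total == 0:
        return 0.0
    groups = defaultdict(list)
    for rank, label in zip(ranks, labels):
        groups[label].append(rank)
    ss_between = sum(
        len(rs) * (sum(rs) / len(rs) - grand_mean) ** 2 for rs in groups.values()
    )
    return ss_between / ss_total


# Invented toy data for one species: all Danish (DK) samples are controls,
# mirroring the study design described in panel a.
abundance = [0.01, 0.02, 0.03, 0.04, 0.10, 0.11, 0.12, 0.13]
country = ["ES", "ES", "ES", "ES", "DK", "DK", "DK", "DK"]
disease = ["UC", "CTR", "UC", "CTR", "CTR", "CTR", "CTR", "CTR"]

country_r2 = variance_explained(abundance, country)
disease_r2 = variance_explained(abundance, disease)
print(f"country: {country_r2:.3f}, disease: {disease_r2:.3f}")
```

In this toy case the country label explains more of the rank variance than the disease label, which is the pattern the panel highlights. Because every Danish sample is a control, part of the apparent disease effect is itself a country effect.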
