Overview of data analysis. (A) Data were acquired from a cohort of 265 UC and FAP patients who had undergone IPAA surgery at least 1 year previously. Biopsies were collected from both the pre-pouch ileum and the J-pouch of each patient. The host transcriptome was profiled using cDNA microarrays, and the microbiome was profiled by sequencing the V4 region of the 16S rRNA gene. Data were then subjected to unsupervised reduction and linear modeling (B), and to supervised reduction and linear discriminant analysis (C). (B) After quality control, data dimensionality was reduced to maximize statistical power prior to linear modeling. After low-variance transcripts were filtered out, principal component analysis (PCA) was used to derive nine gene principal components (gPCs) that together accounted for 50% of the variance in the transcriptome data. OTUs were filtered for minimum abundance and for presence in at least three samples. PCA was then used to derive nine clade principal components (cPCs) explaining 50% of the variance in the OTU data. Multivariate association with linear modeling was then used to test for associations between clades and transcripts that remained significant after adjusting for metadata (inflammation, antibiotic use, and outcome). (C) In an alternative data reduction approach, a list of 449 genes was curated from IBD genome-wide association studies and from host genes that physically interact with bacteria. The expression profiles of these 449 genes were further reduced by k-medoid clustering into 75 medoids, each representing a cluster of genes with similar expression profiles. Abundant microbial clades were hierarchically clustered, and one representative from each cluster was chosen. Linear discriminant analysis was used to identify which genes and clades discriminated most strongly between clinical outcomes. (See also Additional file 1, Additional file 2, and Additional file 3A to C.)
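
The unsupervised reduction in panel B keeps just enough principal components to reach a fixed share of the variance. The following is a minimal sketch of that idea, not the authors' pipeline: the function name `components_for_variance`, the NumPy SVD approach, and the random toy matrix are all assumptions made for illustration, and the toy data will not reproduce the nine components reported for the study.

```python
import numpy as np


def components_for_variance(X, target=0.50):
    """Project samples onto the fewest PCs whose cumulative explained
    variance reaches `target` (hypothetical helper, for illustration)."""
    # Center each feature (column) so the SVD corresponds to PCA.
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    # Squared singular values are proportional to per-component variance.
    explained = s**2 / np.sum(s**2)
    cumulative = np.cumsum(explained)
    # First index where the cumulative share reaches the target, as a count.
    k = min(int(np.searchsorted(cumulative, target)) + 1, len(s))
    scores = U[:, :k] * s[:k]  # sample coordinates on the first k PCs
    return scores, k, cumulative[:k]


# Toy stand-in for a filtered samples-by-transcripts matrix (assumed shape).
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 200))
scores, k, cumulative = components_for_variance(X, target=0.50)
print(f"{k} components explain {cumulative[-1]:.1%} of the variance")
```

The same function could be applied to a filtered OTU abundance table to obtain clade components, and the resulting score matrices would then be the inputs to the linear modeling step.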