Microbial profile of study cohort
To study the influence of the microbiome on BCG-induced trained immunity, we vaccinated 321 healthy individuals of Western European background with BCG [18–75 years old, median age 23 years, 57% female, 83% body mass index (BMI) between 18.5 and 25]. Stool and blood samples were collected at baseline, prior to participants receiving the BCG vaccine. Blood was subsequently collected after 2 weeks and 3 months, as depicted in Fig. 1A. DNA was isolated from the 321 stool samples (“Methods”), amplified, and sequenced using whole genome sequencing. Peripheral blood mononuclear cells (PBMCs) isolated from participants at each time point were subsequently stimulated ex vivo with S. aureus or M. tuberculosis; IL-6, IL-1β, and TNF-α were measured after 24 h and IFN-γ was measured after 7 days. Additionally, metabolites were analyzed in plasma from peripheral blood at baseline (Koeken et al. in preparation). Our approach consisted of (1) assembling all microbial species, (2) testing the microbes for modulatory effects on cytokine responses, (3) assessing whether immunomodulatory microbes have stronger effects on circulating metabolites compared to the other microbes, and (4) identifying which metabolites are associated with immunomodulatory microbial species (Fig. 1B).
Reference-based profiling with MetaPhlAn 2.2 detected 202 microbial species, while de novo genome assembly using MSPminer resulted in more than 1.7 million microbial genes. Three hundred and forty-five core metagenomic species pangenomes (MSPs) matched to a phylum, of which 315 (91.3%) and 208 (60.3%) matched to the genus and species level, respectively, with more genes leading to a more precise annotation (Fig. 1C). De novo assembly was subsequently used in all downstream analyses except where explicitly mentioned; besides retrieving all 202 initial species, this method yielded a 1.7-fold increase in the size of the profiled metagenomic space. We hereafter use MSP to refer to a taxonomic unit as a gene set matched to a phylum, genus, or species.
Community-level profiling using core MSP abundances yielded three distinct clusters of samples (Fig. 1D; Additional file 1: Fig. S1A). Average cluster compositions and sample positioning in multi-dimensional scaling plots reflect abundances of the three most common phyla (Fig. 1E). Firmicutes prevailed in cluster 1, with an average relative abundance of 72.6%, followed by Bacteroidetes (16.64%, cluster 2) and Actinobacteria (9.32%, cluster 3) (Fig. 1F). Cluster 1 also showed increased relative abundances of Verrucomicrobia and Euryarchaeota (Additional file 1: Fig. S1B-E). The overall composition reflected that of developed, Western-type societies that tend to be dominated by Firmicutes, while the presence of Bacteroidetes often reflects diet-driven variation [21]. This was corroborated by the species-level profiles being indistinguishable from the independent Dutch 500FG cohort of 473 healthy participants (Additional file 1: Fig. S1F-G) [12].
All the participants in our study were asked to complete a questionnaire that included general health-related questions. Eighteen out of 185 variables associated with the three microbiome clusters (P < 0.05, unadjusted Cauchy test), including diet (fruit intake), sex, smoking status, vaccination history, and gastrointestinal status-related variables (Additional file 1: Fig. S2; Additional file 2: Table S1).
Shifts in the relative abundance of individual species correlated with 25 variables (MaAsLin2 generalized linear model, PFDR < 0.25), of which 10 correlated with more species (Fig. 1G; Additional file 2: Table S1). In particular, vaginal yeast infection, sex, and daily exercise associated with most microbial species (Fig. 1G). Other species-specific variables that were missed by cluster associations included hypertension, hay fever during youth, and antibiotic use.
Microbiome species are associated with the capacity to modulate cytokine production after BCG vaccination
Cytokine production in the supernatant of stimulated PBMCs was measured to assess the trained immune response (IL-6, IL-1β, TNF-α after 24-h stimulation) and the specific immune response (IFN-γ after 7-day stimulation). IL-1β and TNF-α production upon S. aureus stimulation and IFN-γ production upon M. tuberculosis stimulation gradually increased 2 weeks and 3 months after BCG vaccination compared to baseline (Fig. 2A; Additional file 1: Fig. S3A). Trained immunity (indicated by increases in TNF-α, IL-1β, IL-6 production) as well as specific (M. tuberculosis-induced IFN-γ) and heterologous lymphocyte-derived (S. aureus-induced IFN-γ) cytokine responses correlated with each other (Fig. 2B, C). The microbiome clusters did not show a significant divergence in any of the cytokine responses measured upon ex vivo stimulation of PBMCs (maximum P = 0.16 for IFN-γ with S. aureus, Additional file 1: Fig. S3B), prompting the search for a potential individual taxon or functional immunomodulatory factors.
The associations between microbes and both immune system modalities were assessed with two linear models, adjusted for age and sex, testing for effects on trained immunity and specific T cell responses (“Methods”). Trained immunity was associated with 40 significant species (prevalence > 20%, FDR < 0.2, Fig. 2D; Additional file 3: Table S2), while specific immunity was associated with three species: Ruminococcus (MSP 091), Eggerthella lenta (MSP 181), and Streptococcus thermophilus (MSP 076) (Fig. 2E; Additional file 3: Table S2). At FDR < 0.1, 27 species were associated with trained immunity. Because the model for the trained immunity response combined more biological markers (three cytokines) than that for the specific response (one cytokine), it retrieved more species at the same significance threshold. Nevertheless, both modalities elicited a non-uniform distribution of p-values (Fig. 2F), supporting an influence of the microbiome on BCG vaccine responses.
We next examined the correlation between species relative abundance and individual cytokine production at both time points after vaccination (Fig. 2G). A Roseburia genome (MSP 112), only identifiable on the genus level, was negatively correlated with the increase in IL-1ß and TNF-α production at both time points (P = 2.67e−05, FDR = 0.0011, Fig. 2H). In the specific response model, Ruminococcus (MSP 091, P = 0.0012, FDR = 0.17) and E. lenta (MSP 181, P = 0.0030, FDR = 0.17) showed consistent negative and positive effects, respectively, on IFN-γ at both time points (Fig. 2I). The specific response model revealed the most associations between a single cytokine and microbial species, although for all the cytokine–stimuli combinations, the number of unadjusted associations ranged between 10 and 23 microbes (Additional file 4: Table S3).
Immunomodulatory microbes strongly influence circulating metabolites
Detectable changes in cytokine production yielded potential immunomodulatory species, whose exact numbers varied with selected statistical thresholds. To corroborate cytokine associations and narrow down the potential causal mechanisms, we compared the effects of immunomodulatory microbes on the plasma metabolome, measured as previously described [59]. A total of 1607 mass/charge (m/z) peaks were matched to known compound identities and clustered into 20 groups, enriched for eight major molecular classes (Additional file 1: Fig. S4A-C, “Methods”).
Canonical correlation analysis confirmed a strong microbiome effect on compound intensities, as 15 of the top 25 canonical components (CCs) showed significant explained variance (Additional file 1: Fig. S4D-E). A t-SNE projection of the top 25 CCs showed strong grouping by molecular class, with evident co-occurrence of microbes in clusters harboring carboxylic acids and derivatives, glycerophospholipids and organic sulfuric acids (Fig. 3A). The three immunomodulatory species Roseburia (MSP 112), Ruminococcus (MSP 091), and E. lenta (MSP 181) strongly influenced the composition of the metabolite space, as seen by their polarized placement in the joint projection (Fig. 3A).
Next, we investigated whether immunomodulatory microbes have stronger effects on individual metabolites compared to the other microbes. All pairwise Spearman correlations yielded 198 significant associations (PBonferroni < 0.05) between 34 MSPs and 32 metabolite intensities (Additional file 5: Table S4). While most associations were positive (183/198), the resulting increase in metabolite intensities might result both from microbe-mediated anabolism or compound breakdown. To test whether species with large effects in either of the two linear models are more likely to elicit significant effects on the metabolome, we used three different thresholds that selected at least 40 species and computed the largest absolute effect on any one metabolite. All three settings yielded an enrichment of significant effects among immunomodulatory microbial species (maximum P < 10−11, Chi-square test, Fig. 3B). This confirmed that microbial species with effects on cytokine production also have strong effects on circulating metabolites.
Of the 198 significant metabolite associations, 18 were associated with the presence of Roseburia (MSP 112), the most of any MSP, and one with E. lenta (MSP 181) (Fig. 3C). Due to uncertainty in annotation and pathway-mediated dependence on metabolite abundance, we attempted to identify general mechanisms by grouping the metabolites using KEGG pathways. Phenylalanine metabolism was influenced most strongly by Roseburia (MSP 112), with four metabolite associations in this pathway. Roseburia (MSP 112) additionally influenced three metabolites each in tryptophan metabolism and biosynthesis of secondary metabolites (Fig. 3C).
Untargeted plasma metabolomics detected two SCFAs, acetate and propionate, which are microbial products of dietary fiber metabolism and candidate mediators of host–microbiome signaling [14]. Both molecules showed significant positive associations with trained immunity but not with the specific response (Additional file 1: Fig. S5). While immunomodulatory microbes were not significantly correlated with either acetate or propionate, E. lenta had the highest positive correlation with propionate.
Butyrate, an SCFA previously shown to decrease IL-6 production [22], was not detected in circulation, possibly due to its absorption in the gut [23]. Nevertheless, butyrate kinase, which catalyzes the last step of butyrate biosynthesis [24], was detected in a single Roseburia (MSP 209) and five Coprococcus species, where Coprococcus (MSP 030) had the strongest positive association with trained immunity, in particular an increase in IL-6, after vaccination (Additional file 1: Fig. S6A-C). Conversely, Coprococcus comes (MSP 213) correlated with a decreased trained immunity response, manifested in reduced TNF-α expression after 3 months (Additional file 1: Fig. S6D). The absence of butyrate kinase in immunomodulatory Roseburia (MSP 112) may stem from gaps in assembled genomes or allow for metabolic modalities other than SCFAs in mediating BCG-induced immunity.
Combined evidence from the (i) modeled effects of microbes on cytokine production, (ii) associations between species relative abundances and the production of two trained immunity-related cytokines at both time points after vaccination, and (iii) effects of the immunomodulatory microbiome species on the plasma metabolome yielded the prediction that Roseburia-mediated phenylalanine metabolism influences trained immunity.
Roseburia encodes phenylalanine pathway enzymes that are prevalent and abundant
The effects on phenylalanine metabolites prompted us to investigate the Roseburia genomes and confirm the presence of relevant enzymes. We compared the differential enzymatic and metabolic potential of Roseburia to phylogenetically and biochemically related taxa by a deeper profiling of the stool metagenomes. All 345 identified MSPs were grouped by phylogenetic divergence using core genes (PhyloPhlAn 3.0, Additional file 1: Fig. S7), revealing a clade harboring most Roseburia species (Fig. 4A). The phylogenetically closest Firmicutes included five Ruminococcus, three Coprococcus and two Eubacterium rectale MSPs. While E. rectale is related to Roseburia in terms of butyrate production, we also included a biochemically distinct but abundant Actinobacteria Bifidobacterium (MSP 111). The assembled panel consisted of 59 MSPs derived from 85,208 assembled genes of Bifidobacterium (7 MSPs), Roseburia (13 MSPs), Ruminococcus (17 MSPs), Eubacterium (15 MSPs), and Coprococcus (7 MSPs), with each genus having more than 95% prevalence in the cohort (Fig. 4B).
A total of 330 genes mapped to 20 out of 77 enzymes in the phenylalanine pathway (KEGG ko00360). Roseburia contributed the most genes (92, 45.1% average prevalence; Fig. 4C; Additional file 6: Table S5), whereas other genera in the panel contributed fewer [Ruminococcus (77, 19.2%), Eubacterium (66, 36.3%), Coprococcus (50, 35.5%) and Bifidobacterium (45, 46.1%)]. Roseburia also encoded the most phenylalanine enzymes (13 out of 20). Of these, hisC, mhpE, yhDR, paaH, paaF, paaI, and enr were significantly more abundant in Roseburia than in all of the other four genera (Fig. 4D).
A total of 15 m/z peaks matched to compounds from the phenylalanine pathway (KEGG ko00360), potentially spanning up to 19 compounds due to ambiguous matches. Species from the Roseburia genus significantly correlated with 11 of the 15 compounds in the pathway, of which MSP 112 contributed to six of these compounds (Spearman correlation, P < 0.05, Fig. 4E). The genera Ruminococcus, Coprococcus, Bifidobacterium, and Eubacterium each showed fewer associations (Fig. 4E). Taken together, Roseburia’s increased influence on the phenylalanine metabolism pathway compared to closely related species is confirmed through increased enzymatic potential and metabolite intensity.
Comparison of Roseburia genomes reveals alternate routes in phenylalanine metabolism
Finally, we more closely investigated the 13 phenylalanine metabolism enzymes found within the Roseburia genus and their differential effects on circulating metabolites. Five of the enzymes (hisC, mhpE, paaH, yhDR, and paaF) were common to over half of the MSPs, while the remaining eight were more species-specific (Fig. 5A). The co-occurrence of aaaT (L-phenylalanine N-acetyltransferase) and paaI (acyl-CoA thioesterase) was specific to the immunomodulatory MSP 112, as they were not found together in any other Roseburia MSP. These enzymes act in the path from phenylalanine to N-acetyl-L-phenylalanine and phenylacetylglutamine, whose plasma concentrations were associated with increased Roseburia (MSP 112) abundance (Fig. 4E).
Correlations between individual enzyme abundances and compound intensities revealed two distinct pathway modalities. First, an increase in L-tyrosine, L-phenylalanine, and phenylpyruvate (m/z ambiguous matches: 2−hydroxy−3−phenylpropenoate, trans−3−hydroxycinnamate) was accompanied by the five commonly shared enzymes (hisC, mhpE, paaH, yhDR, and paaF) as well as hipO (grouped by blue bars in Fig. 5B). Second, HPD, paaK, aaaT, and padE were associated with an increase in phenylpropanoate, hippurate (m/z ambiguous match: phenylacetylglycine), and phenylacetylglutamine (grouped by orange bars in Fig. 5B). The latter compounds also strongly decreased with the presence of the five shared enzymes, suggesting an opposing mechanism between the two groups of compounds and enzymes.
We assessed the significance of the observed correlations to verify that the differences in compound intensities in each group were indeed more likely due to the associated enzymes than any other genes. The Spearman correlation between sums of enzyme abundances and compound intensities within each group revealed even stronger effects than any individual compound–enzyme association (Fig. 5C, top-left and bottom-right panels). Conversely, correlations between groups showed weak to negative effects (Fig. 5C, top-right and bottom-left panels). Finally, the observed correlations were compared to a null distribution of correlations from 10,000 random draws of six (blue group) or four (orange group) genes from all 22,832 Roseburia genes. Indeed, all four observed effects were extreme when compared to random sets of genes (Fig. 5D), confirming that the opposing compound–enzyme associations are unlikely to be due to other Roseburia genes. The complete sets of detected compounds and enzymes are shown in Fig. 5E, with group separation indicated. These results reveal two mechanisms, each characterized by a mutually exclusive set of enzymes and compounds, outlining how species-level differences in Roseburia may affect phenylalanine metabolites in circulation.
Additional evidence for the capability of Roseburia to alter compounds in the phenylalanine metabolism pathway is provided by a study that performed targeted metabolomics using reference standards on supernatants of individual microbial species grown in defined media [25], including Roseburia inulinivorans, a close relative of Roseburia (MSP 112) with species-level annotation (Fig. 4A). A change of at least twofold in R. inulinivorans supernatant was observed for two compounds: an increase in 2-hydroxyacetate, which is encoded downstream of yhDR and hisC, and a decrease in succinate, which is encoded upstream of mphE (Additional file 1: Fig. S8, Fig. 5D). Although these compounds were not detected using our plasma metabolomics methods, the direction of changes is aligned with the relevant genes and supports an active role of Roseburia in phenylalanine metabolism pathways.