Metabolic-network-driven analysis of bacterial ecological strategies

Bacterial ecological strategies revealed by metabolic network analysis show that ecological diversity correlates with metabolic flexibility, faster growth rate and intense co-habitation.


Supplementary Note 2
To show that the correlation observed between doubling time and ESI/maximal-CHS is not affected by an uneven representation of taxonomic groups in our dataset, we used KEGG taxonomic annotations to divide the species into 25 groups according to their classes (Table S1). One representative (preferably a species that has a doubling time record) was picked randomly out of each group. Using this ensemble of 25 representatives, we built an environmental viability matrix and used it to calculate ESI and CHS values. In each set of 25 species, doubling time information was available for 22 representatives. When computing the Spearman correlations between ESI and doubling time, and between CHS and doubling time, for 1000 random runs, we observe a negative correlation in 934 and 969 runs, respectively (Figure S2).
These findings imply that our results are not affected by the uneven representation of taxonomic groups in our data set.
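The resampling procedure in this note can be sketched as follows. This is a minimal illustration, not the code used in the study: the function names, the `groups` and `doubling_time` dictionaries, and the `score_fn` callable (standing in for rebuilding the environmental viability matrix and computing ESI or maximal-CHS on the chosen representatives) are all assumptions introduced here.

```python
import random

def ranks(values):
    # Average ranks (1-based); tied values share their mean rank.
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    # Spearman correlation is the Pearson correlation of the ranks.
    return pearson(ranks(x), ranks(y))

def pick_representatives(groups, doubling_time, rng):
    # groups: class name -> list of species names.
    # Prefer species that have a doubling-time record, as in the text.
    chosen = []
    for members in groups.values():
        with_record = [s for s in members if s in doubling_time]
        chosen.append(rng.choice(with_record or members))
    return chosen

def count_negative_runs(groups, doubling_time, score_fn, n_runs=1000, seed=0):
    rng = random.Random(seed)
    negative = 0
    for _ in range(n_runs):
        reps = pick_representatives(groups, doubling_time, rng)
        # Hypothetical callable: species -> ESI (or max CHS) on this subset.
        scores = score_fn(reps)
        known = [s for s in reps if s in doubling_time]
        rho = spearman([scores[s] for s in known],
                       [doubling_time[s] for s in known])
        if rho < 0:
            negative += 1
    return negative
```

Under these assumptions, `count_negative_runs` returns the number of runs with a negative Spearman correlation, which corresponds to the counts reported above out of 1000 runs.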

Supplementary Note 3
The ESI and max CHS measures are tightly related by definition and, as expected, are highly correlated (r = 0.579, p<1.3e-48, Pearson). To test whether this correlation is simply due to the definition of the variables, we compute the correlation between ESI and randomized max CHS values, obtained by separately shuffling each line of the environmental viability matrix (thus maintaining the same number of viable environments per organism). We compute p-values in two ways: (i) using a Z-score, $Z = (r - h)/\sigma_h$, where r is the correlation index between ESI and the true max CHS, h is the mean correlation in the randomized cases and $\sigma_h$ is their standard deviation; (ii) an empirical p-value, calculated as the fraction of random cases in which the correlation was higher than the one observed in the original instance. In both cases the p-values are lower than 1e-3 (Figure S3).
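The randomization test described in this note can be sketched as below. It is an illustration under stated assumptions rather than the original implementation: the viability matrix is taken to be a list of 0/1 rows (one per species), ESI is assumed to be the fraction of environments in which a species is viable, and max CHS is assumed to be the largest number of species found in any environment where the species is viable. The exact definitions are given in the main text and may differ.

```python
import random
import statistics

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def esi(viability):
    # Assumed definition: fraction of viable environments per species.
    return [sum(row) / len(row) for row in viability]

def max_chs(viability):
    # Assumed definition: for each species, the largest species count
    # among the environments in which it is viable.
    species_per_env = [sum(col) for col in zip(*viability)]
    return [max(v * c for v, c in zip(row, species_per_env))
            for row in viability]

def shuffle_rows(viability, rng):
    # Permute each row independently, so every species keeps its
    # number of viable environments (and hence its ESI).
    return [rng.sample(row, len(row)) for row in viability]

def randomization_test(viability, n_random=1000, seed=0):
    rng = random.Random(seed)
    esi_values = esi(viability)
    r = pearson(esi_values, max_chs(viability))
    null = [pearson(esi_values, max_chs(shuffle_rows(viability, rng)))
            for _ in range(n_random)]
    h = statistics.mean(null)          # mean correlation in random cases
    sigma_h = statistics.stdev(null)   # their standard deviation
    z = (r - h) / sigma_h
    # Fraction of random cases with a correlation higher than observed.
    p_empirical = sum(c > r for c in null) / n_random
    return r, z, p_empirical
```

Because row shuffling leaves each row sum unchanged, ESI is the same in every randomized case and only max CHS needs to be recomputed.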

Supplementary Note 4
The seed set of a species is the union of metabolites that a species might extract from the external world in different habitats, as discussed in detail in [12]. Based on a topological analysis of the species' metabolic network, it provides a first approximation of the species' metabolic environment; computed for each species, it provides an approximation of the ensemble of metabolic environments that the species studied here may face. In addition, we examined alternative approaches for generating other biologically plausible sets of metabolic environments (recalculating the corresponding environmental viability matrix in each case), and studied the robustness of our main findings under these conditions. One such alternative approach for studying the effect of the environmental composition on our observations is to create random sets of environments. The first set of random environments, Random Env I, is composed of 528 shuffled seed environments, i.e., environments that maintain the original representation of each metabolite over all seeds.
That is, if a certain metabolite has X appearances over all seeds, then it is randomly assigned to X out of 528 environments. This process is repeated for each seed metabolite. The resulting environments range in size from 259 to 329 metabolites (mean number of metabolites per environment: 295). The distribution of species per environment in the original seed data and in the randomized-source set can be seen in Figure S4a and S4b. The mean and maximal co-habitation of environments from the random set are smaller than those of the seed environments (mean: 1 and 5.7; max: 49 and 60, respectively), as expected for environments which were randomly constructed.
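The construction of Random Env I can be sketched as follows. This is a minimal illustration, assuming that each seed environment is represented as a set of metabolite identifiers; the function name and data layout are hypothetical and not taken from the original analysis.

```python
import random
from collections import Counter

def shuffle_environments(seed_sets, seed=0):
    # seed_sets: list of sets of metabolite identifiers, one per species.
    rng = random.Random(seed)
    n_env = len(seed_sets)
    # X = number of seed sets in which each metabolite appears.
    counts = Counter(m for s in seed_sets for m in s)
    shuffled = [set() for _ in range(n_env)]
    for metabolite in sorted(counts):
        # Assign the metabolite to X distinct, randomly chosen environments.
        for idx in rng.sample(range(n_env), counts[metabolite]):
            shuffled[idx].add(metabolite)
    return shuffled
```

The number of environments and the total number of appearances of each metabolite are preserved, while the composition of each individual environment is randomized.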
Both ESI values and maximal-CHS values are in significant negative association with doubling time, repeating and reinforcing the trends reported in the main text (Table S5).
Although fast-growing bacteria exhibit higher ESI and maximal-CHS in comparison to slow-growing bacteria (Table S5), as observed when using the original seeds, the negative correlation between maximal-CHS and doubling time is not significant after excluding the group of obligatory host-associated bacteria.
One possible explanation for the lack of significance is that the environmental viability matrix is very sparse when using the shuffled environments (Figure S4). Hence we created a more densely populated set of shuffled environments, Random Env II, by increasing the number of metabolites per environment while maintaining an approximation of the original metabolite representation over all seeds. Each metabolite in the original seed environments is randomly assigned to the shuffled environments such that its representation over all environments is 1.05 times its original representation. That is, if a certain metabolite has, for example, 20 appearances over all seeds, then it is randomly assigned to 21 out of 528 environments. The distribution of species per environment in Random Env II can be seen in Figure S4c. The mean and maximal co-habitation of environments in this set are higher than those of Random Env I (mean: 9.7; max: 123). As in the original seed environments, the mean maximal co-habitation of gut and soil bacteria is higher than the mean maximal co-habitation of specialized and obligatory symbiont bacteria (Figure S5). In this set, which yields a less sparse environmental viability matrix, we reproduce all main observations reported, as well as the negative correlation between doubling time and maximal-CHS after excluding the group of obligatory host-associated bacteria (Table S5).
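The denser Random Env II set can be sketched in the same spirit. The rounding rule is an assumption made here (nearest integer, halves rounded up, capped at the number of environments); the text only gives the example of 20 appearances becoming 21. The function names and the set-based data layout are likewise hypothetical.

```python
import random
from collections import Counter

def scaled_count(count, n_env, percent=105):
    # count * 1.05 rounded to the nearest integer using integer
    # arithmetic (assumed rounding rule), capped at n_env.
    return min(n_env, (count * percent + 50) // 100)

def denser_shuffled_environments(seed_sets, percent=105, seed=0):
    # seed_sets: list of sets of metabolite identifiers, one per species.
    rng = random.Random(seed)
    n_env = len(seed_sets)
    counts = Counter(m for s in seed_sets for m in s)
    shuffled = [set() for _ in range(n_env)]
    for metabolite in sorted(counts):
        k = scaled_count(counts[metabolite], n_env, percent)
        # Assign the metabolite to k distinct, randomly chosen environments.
        for idx in rng.sample(range(n_env), k):
            shuffled[idx].add(metabolite)
    return shuffled
```

With `percent=105`, a metabolite with 20 appearances over all seeds is assigned to `scaled_count(20, 528)` environments, which evaluates to 21 as in the example above.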
Overall, it is reassuring to see that using several approaches for constructing potential sets of natural metabolic environments we find associations that are qualitatively similar to the ones reported in the main text.

Supplementary Note 5
We compared the annotations retrieved from NCBI [34] to the environmental sample in which we identified the species. In 14 cases the sample matches the NCBI annotation (e.g., sequences annotated as aquatic or host-associated are found in marine and gut samples, respectively); in 16 cases the sample does not contradict the NCBI annotation (e.g., sequences annotated as multiple are found in marine samples); in 3 cases the experimental findings contradict the NCBI annotations (in all 3 cases sequences annotated as terrestrial are found in marine samples). The 33 cases are presented in Table S2. A larger collection of environmental databases will allow a more comprehensive analysis.

Table S1
The table displays the following values for the 113
[Enclosed as Additional data file 1]

Table S2
NCBI annotations and description of environmental sample for 33 species that can be identified in an environmental sample.
[Enclosed as Additional data file 2]

Table S3
Original values of environmental complexity, as downloaded from [19], and values added by manual curation. For values added by manual curation, a reference is provided (PubMed ID).
[Enclosed as Additional data file 4]

Table S4
Full description and KEGG ID of the 65 biomass target metabolites.
[Enclosed as Additional data file 5]

Table S6
The table displays the following values for the 528 bacterial species: name, genome size (bp), network size (number of reaction-nodes), environmental scope index (ESI) and maximal co-habitation score (max-CHS) computed for the original seed environments, fraction of regulatory genes, estimates of environmental complexity, lifestyle description (Methods), and oxygen requirements. Values were retrieved as described for Table S1.
[Enclosed as Additional data file 6]

±The two sets of data (all species, non-obligatory symbionts) were divided into two bins according to species' growth rate (fast and slow). The significance between the genomic attributes studied (e.g., genome size, network size, etc.) was calculated with