Assembly of hundreds of novel bacterial genomes from the chicken caecum

Background Chickens are a highly important source of protein for a large proportion of the human population. The caecal microbiota plays a crucial role in chicken nutrition through the production of short-chain fatty acids, nitrogen recycling, and amino acid production. In this study, we sequence DNA from caecal content samples taken from 24 chickens belonging to either a fast or a slower growing breed consuming either a vegetable-only diet or a diet containing fish meal. Results We utilise 1.6 T of Illumina data to construct 469 draft metagenome-assembled bacterial genomes, including 460 novel strains, 283 novel species, and 42 novel genera. We compare our genomes to data from 9 European Union countries and show that these genomes are abundant within European chicken flocks. We also compare the abundance of our genomes, and the carbohydrate active enzymes they produce, between our chicken groups and demonstrate that there are both breed- and diet-specific microbiomes, as well as an overlapping core microbiome. Conclusions This data will form the basis for future studies examining the composition and function of the chicken caecal microbiota.


Background
There are an estimated 23 billion live chickens on the planet at any one time [1], outnumbering humans by over 3:1. As most of these are reared for food, the actual number of chickens produced per year is even higher, at almost 65 billion, leading some to speculate that the accumulation of chicken bones in the fossil record will be used by future archaeologists as a unique marker for the Anthropocene [2].
Since the 1960s, worldwide chicken meat production has increased by over ten times [3]. Global meat production is predicted to be 16% higher in 2025 vs. 2015, with most of this increase originating from poultry meat production [4]. Part of the popularity of chicken meat is that due to intensive selection, chickens have been developed which are highly productive in terms of their growth rate with efficient feed conversion ratios (the rate at which chickens convert feed into muscle), decreasing from 3.0 in the 1960s to 1.7 in 2005 [5], making them a cheap source of protein in comparison to other livestock. Another reason for their popularity is a lack of religious dietary restrictions related to their consumption, in comparison to pork or beef. Chickens also produce less greenhouse gases per kilogramme of meat than pigs, cattle, and sheep [6]. The potential to manipulate the microbiota in chickens to gain further increases in productivity is of great commercial and scientific interest, leading to the use of probiotics across the poultry industry [7].
As well as playing an important role in pathogen protection [8] and immune system development [9], the microbiota of the chicken also plays a crucial nutritional role. The largest concentration of microbial cells in the chicken gastrointestinal tract can be found in the caeca, and thereby, the majority of chicken microbiota studies focus primarily on these microbial communities. Members of the caecal microbiota are able to produce short-chain fatty acids (SCFAs) such as acetate, butyrate, lactate, and propionate, mostly from carbohydrate sources which have passed through the small intestine; these SCFAs are then able to be absorbed by the bird and used as an energy source [10]. Members of the chicken caecal microbiota have also been implicated in the recycling of nitrogen by the degradation of nitrogenous compounds [11] and the synthesis of amino acids [12]. One study demonstrated that 21% of the variation in chicken abdominal fat mass could be attributed to the caecal microbiota composition, when controlling for host genetic effects [13]. Differences have also been observed between birds with high and low feed efficiency [14,15]. However, despite extensive research over many decades, the quantitative importance of the caeca in chicken nutrition remains unclear [16], and relatively few microbes commensal in the chicken gut have been sequenced and deposited in public repositories.
The emergence of cheaper DNA sequencing technologies [17,18] has led to an explosion in studies which have sought to characterise the chicken gastrointestinal microbiota, particularly using 16S rRNA gene-based methods. Using this methodology, it has been found that the chicken caecal microbiota in the first few weeks of life is predominantly colonised by members of the Firmicutes, mostly of the order Clostridiales [8,19]. While valuable, marker-gene studies do not enable an in-depth functional and genomic characterisation of the microbiome. Some microbes from the chicken caeca have been successfully cultured and sequenced, including 133 gut anaerobe strains representing a few dozen species with a wide range of metabolic potentials [20] and 42% of the gut microbiota members of 34-40-week-old layers [21]; however, it is highly unlikely that these microbes represent the entire diversity of the chicken caecal microbiota, due to the difficulty in culturing many anaerobic gut microorganisms. One method which avoids this issue of culturability is the construction of metagenome-assembled genomes (MAGs). Due to improvements in computational power and sequencing technologies, and the development of new computational approaches [22,23], it is now possible to accurately bin short-read metagenomic data into highquality genomes. Using this technique, thousands of MAGs have been generated from various environments, including humans [24,25], chickens [26], the rumen [27,28], pig faeces [29], marine surface waters [30,31], an underground aquifer system [32], and other public datasets [33].
In this study, we sought to use metagenomic sequencing, assembly and binning to investigate the chicken caecal microbiota. In order to maximise diversity, we chose two commercial bird genotypes with different growth phenotypes, fed with two different diets. This also allowed us to look at the effects of breed and diet on strain-level microbial abundance. The lines chosen for the study were Ross 308, a fast growing broiler breed, and the Ranger Classic, a slower growing broiler aimed at free-range, organic farms. All birds were fed either a vegetable-only diet or a diet based on fish meal as the protein source. The inclusion of fish meal in chicken diets has previously been linked to changes in the caecal microbiota and is correlated with an increased risk of necrotic enteritis [34,35]. We assembled 460 novel microbial strains, predicted to represent 283 novel microbial species and 42 novel microbial genera from the chicken microbiome, and went on to demonstrate both a breed-and diet-specific microbiota. We also demonstrated that our microbial genomes are abundant within European chicken flocks and represented the majority of reads from nine farms, which were part of a pan-EU study examining antimicrobial resistance (AMR) in broilers [36]. While we show that large numbers of strains are shared between our birds, it is their relative abundance that largely drives breed and diet effects. This is the first large-scale binning of the chicken caecal microbiota, and we believe these data will form the basis for future studies of the structure and function of the chicken gut microbiome.

Assembly of 469 draft microbial genomes from chicken caeca
We produced 1.6 T of Illumina data from 24 chicken samples and carried out a metagenomic assembly of single samples and also a co-assembly of all samples. On average, 98.4% (standard deviation (SD) = 0.289%) of our reads originated from bacteria, 1.2% (SD = 0.25%) originated from Eukaryota, 0.12% (SD = 0.093%) originated from viruses, and 0.31% (SD = 0.046%) originated from Archaea. A total of 4524 metagenomic bins were created from the single-sample binning, and 576 more were created from co-assembly binning. By performing co-assemblies, we are able to construct bins which would have been too low in coverage to be identified by single-sample binning. We were left with a total of 469 dereplicated genomes (99% ANI) with estimated completeness of ≥ 80% and estimated contamination ≤ 10% (Additional file 1: Figure S1), 377 of which originated from the single-sample binning and 92 from the co-assembly. Of these, 349 had completeness > 90% and contamination < 5% (high-quality draft genomes as defined by Bowers et al. [37]), 210 were > 95% complete with < 5% contamination, and 47 MAGs were > 97% complete with 0% contamination. The distribution of these MAGs (based on coverage) between the 24 samples can be found in Additional file 2. After dereplication to 95% ANI, 335 MAGs remained, representing species identified in our samples. Our dataset therefore contains 469 microbial strains from 335 species. Two hundred eighty-three of these species and 460 of these strains were novel when compared to public databases (Additional file 3).
Of the MAGs that show greater than 95% ANI (average nucleotide identity) with an existing sequenced genome, several of these genomes have previously been identified in chickens. Our MAGs include 6 1 Phylogenetic tree of the 469 draft microbial genomes from the chicken caeca, labelled by taxonomic order, as defined by GTDB-Tk. Draft genomes labelled as "undefined" were only able to be assigned taxonomy at a higher level than order novel strains of Massiliomicrobiota sp. An134 [20], and 5 novel strains of Pseudoflavonifractor sp. An184 [20].
Our MAGs represent several putative novel species from 7 taxonomic classes: including 25 species of Bacilli, 252 species of Clostridia, 2 species of Coriobacteriia, 1 species of Desulfovibrionia, 1 species of Lentisphaeria, 1 species of Vampirovibrionia, and 1 species of Verrucomicrobiae. These include 5 novel species of Lactobacillus. Our MAGs also contain 42 putative novel genera which contain 69 of our MAGs. We defined a genus as novel if all MAGs which clustered at 60% AAI (average amino acid identity) were not assigned a genus by GTDB-Tk (Additional file 5). Forty of these novel genera belong to the class Clostridia, with over half belonging to the order Oscillospirales (which contains the family Ruminococcaceae). One of the remaining novel genera contains one MAG which belongs to the Bacilli class (order Exiguobacterales) while the remaining genus belongs to the Cyanobacteriota (Melainibacteria), within the order Gastranaerophilales. Our proposed names for these genera and the species they contain can also be found in Additional file 5, alongside descriptions of their derivations. GTDB-Tk was unable to assign taxonomy to either of these genera at lower than order level, indicating that they may belong to novel bacterial families. It should also be noted that several genus-level MAG clusters do not contain any MAGs which were assigned a valid NCBI genus label but instead only received names defined by GTDB-Tk. For example, group 16 (Additional file 5) is entirely constituted by MAGs of the genus UBA7102.

Newly constructed MAGs are abundant in chicken populations across Europe
In order to assess the abundance of our MAGs in other chicken populations, we compared sequence reads generated from 179 chicken faecal, pooled, herd-level samples, collected from 9 different countries across the European Union [36], to the 469 MAGs generated as part of this study. The read mapping rates can be seen in Fig. 2. Over 50% of the reads mapped to the MAGs in all samples; in 8 out of 9 countries, the average read mapping rate was above 70%, and in Italy, the average read mapping rate was above 60%. This demonstrates that our MAGs are representative of the broiler gut microbiome in populations throughout the EU, and account for the majority of reads in all cases. The abundance of the MAGs across the 179 samples can be seen in Fig. 3. While there is clear structure in the data, samples do not appear to cluster by country, and the observed similarities may be explained by other factors not available, such as breed, age, or diet.

Presence of a core broiler caecal microbiota
A total of 125 MAGs were found to be present in at least 1× coverage in all of our samples, and 4 of these MAGs were found to be ≥ ×10 in all of our samples: Alistipes sp. CHKCI003 CMAG_6, uncultured Bifidobacterium sp. CMAG_55, uncultured Bifidobacterium sp. CMAG_59, and Firmicutes bacterium CAG_94 CMAG_438. Only one MAG was found to be uniquely present in only one sample at ≥ 1× coverage: uncultured Clostridia sp. CMAG_391 in chicken 16 (Ross 308: vegetable diet). The distribution of MAGs between groups can be seen in Fig. 4. Two hundred seventy-six MAGs were on average present in at least 1× coverage in all groups and could therefore be described as a core microbiota shared amongst the chickens in our study.

Differences in caecal MAGs based on chicken line and diet
When comparing samples based on the coverage of MAGs, significant clustering of samples by group can be observed when comparing all groups (PER-MANOVA (permutational multivariate analysis of variance), P < 0.001), between chicken lines (all samples: PERMANOVA, P < 0.001; within vegetable diet: PERMANOVA, P = 0.015; within fish meal diet: PER-MANOVA, P = 0.0082) (Fig. 5) and between diets (all samples: PERMANOVA, P = 0.008; within Ross 308 line: PERMANOVA, P = 0.018; within Ranger Classic line: PERMANOVA, P = 0.0043) (Fig. 5). A significant interaction was also observed between line and diet (Line × Diet PERMANOVA: P = 0.038). Gender and DNA extraction batch were not found to have significantly affected the abundance of MAGs (PERMA-NOVA: P > 0.05).
MAGs which were significantly more abundant by coverage between groups were identified by DESeq2 (Fig. 6); a full list of these MAGs can be found in Additional file 6. In Ross 308 birds, 43 MAGs were found to be differentially abundant between the 2 diets, while in Ranger Classic birds, 45 MAGs were found to be differentially abundant. Several MAGs were found to be differentially abundant between the 2 lines when birds were consuming a vegetable diet (61 MAGs) or a fish    Lactobacilli are of particular interest to probiotic manufacturers. We found that both MAGs identified as L. gallinarum were more abundant in Ross 308 birds when controlling for diet, and four of the five MAGs identified as L. crispatus were more abundant in birds fed a diet with fish meal when controlling for chicken line.
One notable observation is the high amount of Helicobacter pullorum observed in the Ross 308: vegetable diet group. While H. pullorum is often thought of as a pathogen, it has previously been isolated from the caeca of asymptomatic chickens [43] and carriage of Helicobacter by chickens is common in commercial flocks [53][54][55].

Differences in CAZymes between lines and diets
Carbohydrate active enzymes (CAZymes) are enzymes involved in the metabolism, synthesis, and binding of carbohydrates. They are grouped by the CAZy database [56] into the following major groups: the auxiliary activities (AAs) class, carbohydrate-binding modules (CBMs), carbohydrate esterases (CEs), glycoside hydrolases (GHs), glycosyltransferases (GTs), and polysaccharide lyases (PLs). As their names suggest, CEs are responsible for the hydrolysis of carbohydrate esters while CBMs are responsible for binding carbohydrates. GHs and PLs are both responsible for cleaving glycosidic bonds, hydrolytically or nonhydrolytically, respectively, while GTs are able to catalyse the formation of glycosidic bonds. The AA class are not themselves CAZymes but instead act in conjunction with them as redox enzymes. We compared the predicted proteins from our MAGs with the CAZy database using dbcan with the cut-off E value < 1e−18 and coverage > 0. 35. When clustering groups by the abundance of MAGderived CAZymes, all groups separate visually (Fig. 7) but only the following differences were significant: Ross 308 birds were shown to cluster significantly by diet (PERMANOVA, P = 0.021), and birds receiving a fish meal diet clustered significantly by line (PERMANOVA, P = 0.0065). A significant interaction was observed between line and diet (Line × Diet PERMANOVA: P = 0.0051). Using DESeq2, we also found that the abundances of specific CAZymes differed between groups (Fig. 8), full lists of which can be found in Additional file 7. We found several starch degrading enzymes to be differentially abundant between lines when controlling for diet, including GH13 subfamily 10, GH15, GH57, GH4, and GH31, and between diets when controlling for line, including GH13, GH13 subfamily 28, and GH13 subfamily 33. We also found that several CAZymes involved in metabolising cellulose and hemicellulose were differentially abundant between lines when controlling for diet, including GH5 (subfamilies 19, 37, 48, 44, 18), CE6, GH43 (subfamilies 30, 19, 29, 12), GH115, CE2, and GH67, and between diets when controlling for line, including GH5 (subfamilies 7 and 48) and GH43 (subfamilies 33, 4, and 35). Gender and DNA extraction batch were not found to have significantly affected the abundance of CAZymes (PERMA-NOVA: P > 0.05).

Line and gender impact the weight of the chicken
As we did not monitor individual feed intake, we cannot comment on the feed-conversion ratio of these birds; however, when housed and fed as a group, there are clear statistical differences between the birds in terms of weight (Additional file 1: Figure S2). Univariate GLMs with fixed factors of gender, line, and diet were performed, with bird weight as the dependent variable. Both gender (P < 0.001) and line (P < 0.001) were found to significantly impact weight, as expected. Diet was not found to significantly affect bird weight overall (P = 0.220). We did observe a significant increase in bird weight in Ranger Classic birds (P = 0.007), of both genders, fed a fish meal diet, which was not observed in the Ross 308 birds (P = 0.778).

Discussion
It may be possible to increase chicken productivity by the manipulation of the chicken caecal microbiota. However, before this is possible, we need to develop a good understanding of the types of bacteria present in the chicken and their nutritional function.
In this study, we constructed 469 metagenomeassembled genomes from chicken caecal contents, greatly expanding upon previous chicken caecal MAGs [26]. Three hundred forty-nine of our MAGs had completeness > 90% and contamination < 5% and can therefore be classed as high-quality draft genomes as defined by Bowers et al. [37]. Our MAGs include 460 novel strains and 283 novel species, including 5 novel Lactobacillus species. Ninety-seven MAGs were able to be identified to species level by GTDB-Tk, and a further 246 could be identified to genus. We also identified 42 novel bacterial genera, 40 of which belonged to the class Clostridia. The remaining 2 genera belonged to the Bacilli class and the Gastranaerophilales order of Cyanobacteriota, and may also belong to novel taxonomic families. Our method of defining genera is conservative, as genera within different taxonomies may cluster at higher AAIs [57][58][59]. We used GTDB-Tk instead of NCBI to assign taxonomies to our MAGs for the following reasons. The vast majority of our MAGs are members of the Clostridia, whose taxonomies are known to fit poorly with genomic data [60]. Indeed, when we constructed a phylogenetic tree of our MAGs using NCBI classifications, we found many discrepancies between the taxonomic assignments and our tree (data not shown) resulting in the need for many manual corrections. However, using GTDB-Tk, it was only necessary to manually correct one of our MAGs (CMAG_333) which was originally classified as a member of the Dehalobacteriia but clearly sat within the Clostridia in our tree. Our experiences reflect those of Coil et al. who found that the use of GTDB-Tk required less labour and reduced the need for subjective decisions in taxonomic assignment [61]. The majority of our MAGs belonged to the orders Oscillospirales and Lachnospirales, members of the Clostridia class. The high abundance of Clostridia observed during our study correlates with several previous studies examining the chicken caecal microbiota [20,[62][63][64][65][66][67]. This is likely the product of chicks being raised in an environment where they are not exposed to a maternal microbiota as feral hens and chicks exposed to an adult hen have microbiotas which are far less dominated by Firmicutes and contain higher abundances of Bacteroidetes [68,69]. Within our dataset, we found 276 microbes which were on average present at a minimum 1× coverage in all 4 of our groups, potentially indicating a core microbiota across our dataset. However, caution must be taken as all of our chickens were raised in the same facility and samples were all taken at the same time point, which will have limited the variability in microbes present. Chicken microbiota can vary across flocks [70], at different times in the bird's life [71] and between freerange and intensively-reared chickens [72]. To provide a truly representative dataset of chicken microbial genomes, it would be necessary to sequence caecal samples from birds from multiple lines and raised under a variety of conditions. However, we do think it is likely that there is a core broiler caecal microbiota which is shared across sites and is irrespective of management conditions. Our comparison to chicken faeces samples from nine countries which were part of a pan-EU project on AMR demonstrates that our MAGs are abundant in chicken populations across Europe and that these new genomes can account for the majority of reads in chicken gut microbiome studies. We also identified several novel Lactobacillus strains which have previously been posited as potential chicken probiotics, including L. crispatus [44][45][46], L. gallinarum [47], L. johnsonii [48,49], L. oris [50], L. reuteri [41,44,51], and L. salivarius [41,49,52].
We also compared the abundance of our MAGs and MAG-derived CAZymes. It should be noted that care should be taken when generalising our findings, as the composition of the microbiota can vary significantly between chicken flocks [70,73]. When analysing the abundance of MAGs between birds from different lines, consuming either a vegetable diet or a diet containing fish meal, we found significant differences in the microbial communities based on both line and diet. This agrees with previous studies where significant differences have been described in the intestinal microbiota of chickens from different lines, including those from faster and slower growing lines [73][74][75]. Differences have also previously been observed in the microbiota when feeding chickens a diet supplemented with fish meal [34,35]. This correlates with differences observed in the weights of birds fed the fish meal diet. Ranger Classic birds fed a fish meal diet weighed significantly more than those fed a vegetable-only diet, whereas there was no significant difference between the weight of the Ross 308 birds fed on these two diets.
Examining those bacteria which were consistently significantly increased in a specific line regardless of diet or a specific diet regardless of line, the majority of these bacteria are novel species; therefore, it is difficult to hypothesise why they are more abundant in particular bird lines or when birds are fed certain diets. Of those species that had previously been identified, the two L. galinarum strains were both consistently found to be more abundant in Ross 308 birds, while Lachnoclostridium sp. An76 CMAG_121 and Faecalibacterium sp. An121 CMAG_31 were found to be more abundant in birds on the vegetable diet. L. gallinarum is a homofermentative and thermotolerant [47,76] species which has previously been suggested as a potential chicken probiotic [45,77,78], while Lachnoclostridium sp. An76 and Faecalibacterium sp. An121 [20] have only very recently been discovered and are therefore not well characterised.
We are unsure why H. pullorum was observed in such high levels in the Ross 308: vegetable diet group. We are unable to rule out contamination from the environment as our groups were housed in separate pens within the same room. We did not observe any negative health effects in this group, and the bacterium is very common in some flocks [43,[53][54][55]79].
We wondered whether the differences in microbiota we observed between groups were associated with changes in the metabolic potential of the caecal microbial communities. Microbes isolated from the chicken caeca have previously been shown to have highly variable metabolic pathways [80,81]. We found that the abundances of certain MAG-derived CAZymes involved in starch and cellulose degradation were significantly differently abundant between lines and diets. These molecules are highly abundant in the predominantly grain-based diets fed to chicken. However, energy from starches and celluloses is not available to the chicken host unless this is first degraded into smaller carbohydrates by the gut microbiota; therefore, differences between the ability of the caecal microbiota to degrade these molecules may lead to greater efficiency of energy extraction from feed [65].
It is also interesting to note that when analysing the abundance of MAG-derived CAZymes in the chicken caeca, we only observed significantly separate clustering of birds by diet in the Ross 308 birds and by line in animals that were consuming the fish meal diet. This indicates that the differences in MAG abundances for these groups resulted in significantly different pools of metabolic genes. However, significant differences in MAG abundances were also observed for Ranger Classics on the two diets and for chickens of different lines consuming the vegetable diet, but this did not result in a significant difference in the total abundance of CAZymes. This finding serves to highlight that changes in microbiota community composition do not necessarily lead to significant changes in the total metabolic potential of that community, although it is possible more significant differences would be observed with a larger sample size. It is worth noting that while our Ross 308 vegetable diet group contained 4 males and 2 females and the other groups contained 3 males and 3 females, gender was found to have no impact on the abundance of CAZymes or MAGs and this therefore should not have impacted our results.
One outlier was observed in our data: chicken 2 appeared to cluster separately by the abundance of its MAGs in comparison to other Ross 308 birds consuming a fish meal diet, supporting the idea that while diet and line are associated with differences in the microbiota, variation will still exist between birds of the same line consuming similar diets. It should also be noted that the individual feed intake of each bird was not measured, meaning that some birds may have consumed different quantities of food, which could lead to variation in their microbiota compositions.

Conclusions
Through the construction of metagenome-assembled genomes, we have greatly increased the quantity of chickenderived microbial genomes present in public databases and our data can be used as a reference dataset in future metagenomic studies. While previous studies have demonstrated that Clostridia are very common in the chicken caeca, our study shows that within this class, there is a wide diversity of species present, something which has perhaps been underestimated by culture-based studies. To gain a mechanistic insight into the function of these bacteria and to capture the wide diversity of bacteria present in chickens, large-scale culture-based studies will be necessary, and despite the utility of metagenomic studies for constructing microbial genomes, culturing followed by whole genome sequencing remains the gold-standard method.

Study design
Ross 308 (Aviagen, UK) (n = 12) and Ranger Classic (Aviagen, UK) (n = 12) chickens were hatched and housed at the National Avian Research Facility in Edinburgh (UK). Birds were fed either a vegetable only diet or a diet supplemented with fish meal (Additional file 1: Table S1) (diet formulation: Additional file 1: Table S2 and S3, nutritional info: Additional file 1: Table S4). Birds received Mareks-Rispins vaccinations (Merial, France) at 1-2 days of age and were housed by group in separate floor pens (within the same room) with wood shaving bedding, and received food and water ad libitum. Stocking densities were based on UK Home Office Animals (Scientific Procedures) Act 1986, resulting in a floor area per bird of 0.133 m 2 at 5 weeks of age. Birds were euthanized by cervical dislocation at 5 weeks of age, and caecal content samples were collected. Contents from both caeca were pooled to make one sample per bird. Samples were stored at 4°C for a maximum of 24 h until DNA extraction, except for those from DNA extraction batch 2 which were frozen at − 20°C for 9 days prior to DNA extraction (Additional file 1: Table S5). DNA extraction was performed as described previously using the DNeasy PowerLyzer PowerSoil Kit (Qiagen, UK) [82]. Shotgun sequencing was performed on a NovaSeq (Illumina) producing 150 bp paired-end reads.
METABAT2 [23] was used on both single-sample assemblies and co-assemblies to carry out metagenomic binning, taking into account coverage values and with the options --minContigLength 2000, --minContigDepth 2. All bins were dereplicated using dRep [89] with the options dereplicate_wf -p 16 -comp 80 -con 10 -str 100 -strW. Bins were dereplicated at 99% average nucleotide identity (ANI), resulting in each MAG being taxonomically equivalent to a microbial strain. On average, 78.43% (SD = 0.022%) of sample reads mapped to these MAGs. Bins were also dereplicated at 95% ANI to calculate the number of species represented within our MAGs. CompareM was used to calculate average amino acid identity (AAI) [90].
The completeness and contamination of all bins were assessed using CheckM [91] with the options lineage_wf, -t 16, -x fa and filtering for completeness ≥ 80% and contamination ≤ 10%. GTDB-Tk [92] was used to assign taxonomy to MAGs, except for CMAG_333 which upon visual inspection of taxonomic trees was identified more accurately as Clostridia. For submission of our MAGs to NCBI, MAGs were named based on the following rule: if the lowest taxonomy assigned by GTDB-Tk did not correlate with an NCBI classification at the correct taxonomic level, then MAGs were named after the lowest taxonomic level at which NCBI and GTDB-Tk matched. Comparative genomics between the MAGs and public datasets was carried out using MAGpy [93]. The taxonomic tree produced by MAGpy was re-rooted manually using Figtree [94] at the branch between Firmicutes and the other bacterial phyla, and subsequently visualised using Graphlan [95]. The novelty of genomes in comparison to those present in public databases was also determined. Genomes were defined as novel strains if the ANI output by GTDB-Tk was < 99%. Genomes were determined as novel species if the ANI output by GTDB-Tk was < 95%, or if an ANI was not output by GTDB-Tk, then the average protein similarity output by MAGpy was < 95%. Genera were defined as novel if all MAGs which clustered at 60% AAI [57] were not assigned a genus by GTDB-Tk. Proposed names for new genera and species belonging to these genera were formulated based on the International Code of Nomenclature of Prokaryotes [96]. To assess the abundance of our MAGs in other chicken populations, reads from Munk et al. [36] were downloaded from the European Nucleotide Archive (accession number: PRJEB22062), trimmed using cutadapt [97], aligned to the MAG database using BWA MEM, and processed using SAMtools.
Carbohydrate active enzymes (CAZymes) were identified by comparing MAG proteins to the CAZy database [56] using dbcan2 (version 7, 24 August 2018). The abundance of CAZyme groups was then calculated as the sum of reads mapping to MAG proteins within each group after using DIAMOND [98] to align reads to the MAG proteins.

Statistics and graphs
Univariate general linear models (GLMs) were performed in SPSS Statistics 21 (IBM) with gender, line, and diet as fixed factors. All other statistical analyses were carried out in R [99] (version 3.5.1.). NMDS (non-metric multidimensional scaling) graphs were constructed using the Vegan package [100] and ggplot2 [101], using the Bray-Curtis dissimilarity. Boxplots were constructed using the ggplot2 package. UpSet graphs were constructed using the UpSetR package [102]. Correlation coefficients, using R's hclust function, were used to cluster samples and MAGs within heatmaps. PERMANOVA analyses were performed using the Adonis function from the Vegan package. The package DESeq2 [103] was used to calculate differences in abundance for individual MAGs, taxonomies, and CAZymes. For MAGs, subsampling to the lowest sample coverage was performed prior to analysis by PERMA-NOVA and NMDS and before calculating the 1× and 10× coverage of MAGs in samples.

Review history
The review history is available as Additional file 8.

Peer review information
Anahita Bishop was the primary editor on this article and managed its editorial process and peer review in collaboration with the rest of the editorial team.