Genomic signatures accompanying the dietary shift to phytophagy in polyphagan beetles
Genome Biology volume 20, Article number: 98 (2019)
The diversity and evolutionary success of beetles (Coleoptera) are proposed to be related to the diversity of plants on which they feed. Indeed, the largest beetle suborder, Polyphaga, mostly includes plant eaters among its approximately 315,000 species. In particular, plants defend themselves with a diversity of specialized toxic chemicals. These may impose selective pressures that drive genomic diversification and speciation in phytophagous beetles. However, evidence of changes in beetle gene repertoires driven by such interactions remains largely anecdotal and without explicit hypothesis testing.
We explore the genomic consequences of beetle-plant trophic interactions by performing comparative gene family analyses across 18 species representative of the two most species-rich beetle suborders. We contrast the gene contents of species from the mostly plant-eating suborder Polyphaga with those of the mainly predatory Adephaga. We find gene repertoire evolution to be more dynamic, with significantly more adaptive lineage-specific expansions, in the more speciose Polyphaga. Testing the specific hypothesis of adaptation to plant feeding, we identify families of enzymes putatively involved in beetle-plant interactions that underwent adaptive expansions in Polyphaga. There is notable support for the selection hypothesis on large gene families for glutathione S-transferase and carboxylesterase detoxification enzymes.
Our explicit modeling of the evolution of gene repertoires across 18 species identifies putative adaptive lineage-specific gene family expansions that accompany the dietary shift towards plants in beetles. These genomic signatures support the popular hypothesis of a key role for interactions with plant chemical defenses, and for plant feeding in general, in driving beetle diversification.
Species richness among eukaryotes varies substantially, with some clades having only a few representatives and others comprising hundreds of thousands of extant species. In particular, the class Insecta outnumbers all other classes with more than half of all described extant species [1, 2]. Beetles (Coleoptera) encompass approximately 380,000 described species, representing ca. 40% of described insect diversity . Several hypotheses have been proposed to explain this richness, notably their complex interactions with flowering plants [1, 4,5,6,7] and a high lineage survival rate . Nevertheless, detailed supporting evidence from molecular genetic studies remains sparse, making it difficult to assess the relative importance of these and other potentially important contributing factors [9, 10].
The remarkable evolutionary success of beetles may have been driven by the interplay between their trophic niche and their genomic content and architecture. This is based on the premise that environmental and ecological conditions are likely to be predominant factors influencing the fate of genetic variation in populations under natural selection , eventually driving divergence into distinct species . Among all components of the biotic environment, the trophic niche (principal source of nourishment) of an organism plays a crucial role in shaping the evolution of phenotypic innovations and their underlying genomic changes, e.g., feeding modes in cichlid fishes , mouth development in Pristionchus nematodes , and bitter taste receptors in vertebrates . Among several hypotheses explaining the tremendous diversity among beetles, a shift from an ancestral diet as saprophages (detritus feeding) or mycophages (fungi feeding)  to phytophagy (feeding on living plant material in a broad sense) is often evoked [1, 4, 5]. While the suborder Adephaga (~ 45,000 species) comprises mostly predatory species, including ground beetles and diving beetles, the largest beetle suborder, Polyphaga (~ 315,000), is predominantly comprised of phytophagous clades, among which the most species-rich families are weevils (Curculionidae, ~ 51,000), longhorn beetles (Cerambycidae, ~ 30,000), and leaf beetles (Chrysomelidae, ~ 32,000) . Phytophagy appeared approximately 425 million years ago, quickly after terrestrial life was established . It progressively diversified to target most plant tissues , shortly before the radiation of flowering plants 120–100 million years ago . In response, plants have evolved diverse strategies to protect themselves, which in turn impose selective pressures on the animals that feed on them.
While many biological processes are likely to play a role in this evolutionary battle, a key weapon in the arsenal of phytophagous insects’ adaptations is their ability to neutralize or minimize the effects of plant secondary compounds. Protein families known to be crucial for eliminating harmful plant toxins are cytochrome P450 monooxygenases (P450s), carboxylesterases (CEs), UDP-glycosyltransferases (UGTs), and glutathione S-transferases (GSTs) . While P450s and CEs modify residues to make compounds more hydrophilic, UGTs and GSTs conjugate xenobiotic compounds to hydrophilic molecules. Detoxification is completed by membrane transporters, such as ATP-binding cassette (ABCs) transporters, which move xenobiotic compounds to where they can either be excreted or less frequently sequestered in order to be reused as a defense mechanism . Additionally, to prevent phytophagous insects from digesting their tissues, plants produce enzyme inhibitors that block catalytic sites or compete with the substrates of enzymes involved in digestion. The major families affected are endopeptidases, such as cysteine (CYSs), and serine (SERs) proteases, as well as more specific enzymes such as glycoside hydrolases (GHs), certain types of which are able to break down polysaccharide molecules, including cellulose, hemicellulose, and pectin in plant cell walls [21, 22]. Other adaptations to phytophagy include repertoires of chemoreceptors that are crucial for finding appropriate food sources  and the specialization of mouthparts in response to plant mechanical barriers, which are highly diversified in insects .
As lineages diverge, their genomes accumulate changes, some of which are expected to be directly linked to functional adaptations. Identifying such genomic features and linking them to phenotypic differences, while robustly distinguishing between the effects of stochastic changes and natural selection , is critical to deciphering the genomic drivers of species radiations . Changes include point substitutions, which may affect the existing functional elements, but also larger-scale changes such as duplications, from individual genes to entire genomes, which by adding new members to the repertoires of key gene families may constitute an ideal mechanism to facilitate the emergence of novel functions leading to successful phytophagy . Whereas newly generated gene copies are usually redundant or deleterious and pseudogenized, rendering the gene copy non-functional , they are sometimes maintained. Particularly, interesting cases of gene family expansions are the ones restricted to specific lineages, resulting in lineage-specific expansion (LSE). Evolutionary mechanisms causing LSE are numerous and not all adaptive (see  for a comprehensive review). However, duplicated gene copies may provide an immediate selective advantage and be maintained by selection. This can be due to an increased dosage of the gene product, or to changes following the duplication being selected in one gene copy but not the other, which might allow evolution towards a different function in so-called neo-functionalization processes. Enzymes are considered particularly relevant candidates for such evolutionary processes as they could expand their range of substrates .
Here, we apply a comparative genomics approach to examine the evolution of genes putatively involved in plant-insect interactions by sampling from the two largest beetle suborders, which, generally speaking, present contrasting trophic niches. We contrast exemplars from the characteristically predaceous Adephaga with exemplars from Polyphaga, and we hypothesize that plant-insect interactions during the dietary shift to phytophagy should be accompanied by genomic evolutionary signatures visible at the subordinal scale. Using genomic and transcriptomic data from 18 beetle species, we estimate ancestral gene family content, taking into account gene gains and losses across the species phylogeny, to identify significant LSEs of gene families related to phytophagy and signatures of adaptive expansions in these families. Ignoring sensory receptors, as their evolution might be driven by agents other than those related strictly to trophic niche , and morphological genes, as their inferred association with diet is less robust, we focus on genes coding for enzymes, for which adaptive LSE specific to Polyphaga would suggest a role for detoxification and digestive pathways in driving adaptation and speciation.
A representative sampling of the two major coleopteran suborders
Reliable estimation of gene gain and loss events requires a robust evolutionary framework, i.e., a phylogeny that includes the species studied, as well as the characterization of gene families across complete gene sets from these same species. To study adaptation to phytophagy, we sampled from both Adephaga (mostly predaceous) and Polyphaga (with diverse trophic habits, including a very large number of phytophagous species). A balanced sampling of each suborder was achieved comprising 12 transcriptomes and 6 genomes, with Benchmarking Universal Single-Copy Ortholog (BUSCO) completeness estimates [31, 32] ranging from 71.9 to 97% (Fig. 1, Table 1). The molecular species phylogeny estimated using protein sequences of 405 BUSCO genes found to be complete in all species and in the strepsipteran outgroup, Stylops melittae, was used to build the time-calibrated species phylogeny (Fig. 1). Subsequent analyses of gene gain and loss rates and of signatures of adaptive gene family expansion employed this ultrametric species tree, which importantly shows no significant difference in node ages within each suborder. Protein-coding sequence predictions ranged from 9844 to 24,671 genes per beetle species. These sequences matched 14,908 Arthropoda orthologous groups (OGs) containing at least 1 species of Coleoptera in the OrthoDB v8 catalog . This represented a minimum of 6742 and a maximum of 11,149 OGs for Carabus frigidus and Leptinotarsa decemlineata, respectively. OGs containing genes from only 1 of the 2 sampled suborders were excluded, resulting in a total of 9720 OGs for the analysis that have evolutionary histories traceable to the last common ancestor of beetles. Functional annotations of the sequences within these OGs were used to identify and assign several of them to enzyme families relevant to the tested hypothesis. These candidate OGs comprised 4 UGTs, 22 P450s, 19 CEs, 6 GSTs, 4 SERs, 7 CYSs, 28 ABCs, and 1 GH, for a total of 91 candidate OGs from 8 families of genes (i.e., functional categories) (Table 2).
Polyphaga exhibit more frequent gains across a larger set of OGs
Analysis of per-species gene counts of the complete set of 9720 OGs was performed with the Computational Analysis of gene Family Evolution (CAFE v3)  tool. CAFE analyses changes in gene family sizes using a stochastic birth and death process to model gene gain and loss across a species phylogeny. It estimates gain and loss rates from the extant gene count data, taking errors into account to allow for accurate inferences even with incomplete datasets, and identifies gene families with significantly accelerated rates of gain and loss. The mode considering distinct gene gain (λ = 0.0019 gain/gene/million years) and gene loss (μ = 0.0018 loss/gene/million years) was preferred over a single value for λ and μ, having a significantly greater maximum likelihood score (see the “Methods” section). The λ (gain) and μ (loss) values predicted when CAFE was run on each suborder separately were λ = 0.0020 and μ = 0.0027 for Adephaga, versus λ = 0.0023 and μ = 0.0021 for Polyphaga, showing a tendency for Adephaga to lose genes and for Polyphaga to gain genes. Among the 9720 OGs were 21 with reported expansions originating at the Adephaga root and 126 at the Polyphaga root (see Fig. 1 to locate the nodes). Conversely, 240 OGs showed gene losses for Adephaga and 354 for Polyphaga. Two expansions and 21 losses affected the candidate OGs for Adephaga, and 9 expansions and 6 losses for Polyphaga. Other polyphagan nodes leading to phytophagous-rich clades (i.e., Chrysomeloidea and Curculionidae) also exhibited more candidate OGs expanding than contracting (Fig. 1). All counts of gene gains and losses per node are presented in Additional file 1: Figures S1 and S2. Additionally, CAFE assigned individual OG p values of < 0.01 to a subset of 910 (9.3%) OGs, which, according to De Bie et al. 2006 , indicates gene families likely to have experienced accelerated rates of gain and loss. These are interesting to investigate further as they may represent large OGs of potentially unequal size between the suborders. Among these were 26 of the 91 candidate OGs (28.6%), a significantly larger proportion (two-sample test for equality of proportions, chi-square test, p value < 0.0001) compared with just 9.3% of non-candidate OGs.
Signatures of adaptive expansion are more prevalent in Polyphaga
All 910 OGs with significant variations in their gene content were tested for signatures of adaptive expansion in each suborder, by comparing Brownian motion (BM, neutral) to Ornstein-Uhlenbeck (OU, selective pressure) evolutionary models . As mentioned previously, these included 26 OGs that belong to one of the functional categories listed in Table 2 (“candidate” OGs). The models consider per-species gene count as a trait that can evolve towards a value, which may or may not differ between the two suborders and may or may not be guided by selective pressure; we call this the “optimum” value in models integrating selection. The BM models assume no selection where differences between the suborders result from stochastic processes whose rates are estimated. The OU models assume that reaching an optimal gene family size in each suborder is driven by selective pressure. No adaptive LSE is represented by the null hypotheses of BM models with a single rate for the whole tree (BM1) or with different rates for each suborder (BMS) or the OU model with selection towards the same optimum for both suborders (OU1). Adaptive LSE is represented by the alternative hypotheses of OU models with selection towards two optima having the same variance (OUM) or with selection towards two optima having two variances (OUMV). In total, 21 OGs displayed a higher optimum for Adephaga (0.2% of the initial 9720 OGs) and 88 for Polyphaga (0.9%). Eight of these 88 OGs with higher optima in Polyphaga (Table 3; Additional file 1: Table S1; and gene trees in Fig. 2 and Additional file 1: Figures S3-S9) are candidate OGs belonging to one of the candidate gene families of Table 2, while none of the 21 OGs with higher optima in Adephaga belong to any of the candidate gene families. The proportion of OGs with expansions and higher optima in the background (all “candidate” and remaining “control” OGs) was significantly larger for Polyphaga compared to Adephaga (two-sample test for equality of proportions, chi-squared, 88/9720 vs. 21/9720, p value < 1e−09), indicating that Polyphaga have experienced globally more LSE under selection on protein-coding genes. Additionally, a test for enrichment (see the “Methods” section) of OGs with LSE under selection from the candidate families (Table 2) compared to the background was significant for Polyphaga (8/91 vs. 88/9720, p value < 1e−09). The same test applied individually on each candidate gene family within the candidate dataset demonstrated that categories enriched for LSE under selection in Polyphaga were GSTs (3/6 positive tests, FDR-corrected p value < 1e−09) and CEs (3/19, FDR-corrected p value < 0.005), as shown in detail in Table 4. Furthermore, functional genomics data from the polyphagan Asian longhorned beetle  supported a biological role in plant feeding activities for the candidate OGs that tested positive for adaptive expansions, which were enriched with genes upregulated in larvae fed on sugar maple trees vs. a nutrient-rich artificial diet (36/114 vs. 1391/12,461, p value < 1e−10).
Comparative genomic analyses often highlight expanded gene families and link these expansions to biological functions peculiar to, or of special interest in, their focal organism(s). However, these analyses usually do not explicitly test for any hypothesized evolutionary model that might support such links. Here, we test a specific hypothesis of adaptation to a phytophagous diet, by comparing candidate gene family repertoires from nine adephagan (a mostly predaceous suborder) and nine polyphagan (a highly phytophagous suborder) beetle species. These candidate families are putatively involved in detoxification of plant allelochemicals and digestion of plant tissues. Specifically, we identify evolutionary signatures consistent with adaptive gene family expansions in the species-rich Polyphaga. These patterns should nevertheless be interpreted in the context of potentially confounding factors that could arise from combining genomic and transcriptomic datasets, conservative definitions of candidate gene families, or the greater species richness of the Polyphaga (see discussion points below and Additional file 1: Supplementary Results). Through explicitly testing for adaptive LSEs, these results offer support for the key evolutionary role of the phytophagous trophic niche in driving gene family expansions in Coleoptera (specifically Polyphaga), a feature that likely facilitated the adaptation of polyphagan beetles to specialized plant feeding.
For the comparison of gene repertoires between the two groups to be unbiased, the gene content of all analyzed species should be of similar accuracy and completeness. The number of predicted proteins for the genomic resources for each beetle species (Table 1, mean 15,977 and standard deviation 3748) was within the range expected for insects (see ). The average total gene count for Adephaga species (all transcriptomes) was about 4200 fewer than for Polyphaga, which include 2 genomes with more than 22,000 genes. This difference in average gene counts is reduced to just 1384 when considering only genes assigned to the 9720 OGs selected for the analysis. Our conservative orthology filtering therefore ensured that the comparisons focused on gene families with reliably traceable evolutionary histories that span both groups of beetles. Secondly, assessments of completeness showed that the majority of the datasets contained more than 90% of complete BUSCOs (Fig. 1, Table 1). While the dynamically evolving families that are the focus of this study are clearly not universal single-copy orthologs, the high levels of BUSCO completeness support the assumption that the datasets represent good coverage of the species’ gene content. Furthermore, the transcriptomes were sequenced from adults collected from their natural habitats, so members of the gene families that are the focus of our study were likely being actively transcribed at the time of sampling. Re-analyses of our data that exclude the two adephagan beetle species with fewer than 80% complete BUSCOs reduced the power of the model tests but nevertheless still identified the three GST OGs that favor a model with higher optima for Polyphaga (see Additional file 1: Table S2). Three of the adephagan transcriptomes showed more than 10% of duplicated BUSCOs, which could have arisen from suboptimal filtering of the transcriptomes, i.e., failure to remove alternative transcripts of the same gene. While such potentially inflated gene counts for these adephagans might prevent the identification of some true expansions in Polyphaga, they do not invalidate those that were identified. Conversely, transcriptome assemblies might collapse very similar paralogs into a single transcript and thereby underestimate true gene counts. Our gene trees (Fig. 2 and Additional file 1: Figures S3-S9) nevertheless show several examples of closely related paralogs from the transcriptomes, indicating that they can be successfully recovered. Finally, half of the OGs representing positive results showed a higher mean value for polyphagan transcriptomes than genomes, including the three GST OGs (Additional file 1: Table S1), and explicitly testing for effects due to using both genome and transcriptome data for the species of Polyphaga, by performing a modified OUwie analysis with data type as the regime under selection, identified only one CE (EOG8KD911) for which the favored model linked gene family expansion to species with genomes (see Additional file 1: Table S3).
Candidate OG identification
The annotation strategy was designed to link OGs to candidate gene families based on manually selected keywords used to filter sequence search results, as well as Pfam and InterPro identifiers (Table 2), with the aim of excluding false positives (see the “Methods” section). This conservative strategy may not have fully captured all possible candidate OGs, which would therefore have remained in the background set of OGs that were used as controls. For example, we identified nine GST OGs (six were retained as candidates after filtering) while ten subclasses have been identified in arthropods . While the strict (conservative) strategy we employed to identify candidate OGs may have resulted in an underestimate of the extent of the observed effects, this does not invalidate those that were identified. In addition, filtering the OGs to retain only those with genes from both Adephaga and Polyphaga excluded from the analyses any genes that were specific to either suborder. These might include genes with key roles in phytophagy, e.g., enzymes acquired by horizontal gene transfer identified from the A. planipennis, A. glabripennis, and D. ponderosae genomes . While acknowledging their importance, here, we explicitly tested for adaptive LSE in one lineage vs. the other so gene evolutionary histories were required to span the two suborders and thus be traceable to their last common ancestor.
The more speciose Polyphaga exhibit more dynamic gene repertoire evolution
The loss (μ) and gain (λ) values reported by CAFE on all 9720 OGs are consistent with assessments of other insect clades [46, 47]. Although the overall gain rate is slightly higher than the loss rate, the number of OGs losing genes reported by CAFE at each individual node is generally larger than the number of OGs with gains. This is reconciled by considering that across Coleoptera, many OGs lost a few genes while few families gained many genes. As most OGs display a low number of genes per species, i.e., they are evolving under “single-copy control” , losing more than one ortholog per species is understandably rare, while there is no theoretical limit for an OG to gain new members. Comparing the two clades, Polyphaga has a higher rate of gene gain and six times more OGs with gains, and while the Adephaga rate of gene loss is higher, Polyphaga have 1.5 times more OGs that have experienced gene losses. Importantly, these rates were estimated using a time-calibrated ultrametric species phylogeny with no significant difference in node ages between the two suborders. Hence, the gene repertoires of Polyphaga exhibit a more dynamic evolutionary history with more gains (rate) in more OGs (counts) and fewer losses (rate) spread out over more OGs (counts). It is possible that this greater dynamism may be generally linked to the greater species richness of Polyphaga, with no specific role for phytophagy underpinning this trend. However, among the candidate OGs for detoxification and digestion, there are also more gains in Polyphaga and, in contrast to the background, fewer losses. Thus, both gain and maintenance are higher for candidates in Polyphaga, which is consistent with a key role for phytophagy in driving dynamic gene repertoire evolution, and particularly LSEs.
Support for adaptive expansions of gene families involved in detoxification in polyphagan beetles
In addition to observing more expansions among candidate OGs in the suborder Polyphaga, the positive results from the OUwie analysis support the hypothesis that selective pressures drive detoxification enzymes towards larger gene family sizes. This is especially pronounced for GSTs, for which half of the OGs tested positive and for which a significant enrichment compared to the background was found. The importance of GSTs in dietary shifts to phytophagy has been noted in mustard-feeding flies, where duplicated GSTs involved in the mercapturic acid pathway showed signatures of positive selection . Our results therefore suggest that comparable phenomena have been acting at the level of polyphagan beetles. The CEs also show a statistically significant enrichment for positive results compared to the background, further supporting the diet detoxification hypothesis. The other positive results include a P450 OG and a CYS OG, neither of which led to a category enrichment compared to the background. The P450 OG is by far the largest among the positive results (Additional file 1: Figure S3), highlighting the importance of P450s in beetle (and generally insect) physiology with diverse roles beyond detoxification, e.g., hormone biosynthesis . However, while considered as significantly expanded and under selection by our model, the actual mean values in the suborders are not dramatically different. Importantly, the enrichment of positive results among candidates still holds if this OG is excluded (see Additional file 1: Supplementary Results). The involvement of P450s in many other processes may explain why a broader difference between the suborders was not identified. Apart from one positive result among the cysteine proteases (no significant category enrichment), our study did not highlight additional expansions in other digestive enzymes or in transporters within a suborder. The lack of evidence for expansion in Polyphaga with respect to ABC transporters, which is the candidate functional category encompassing the largest number of OGs, may indicate that the ancestral diversity of transporters was sufficient for maintaining the excretion of toxins, despite variations in the substrates imposing a selective pressure on early stages of the detoxification pathway. Alternatively, if such pressure were acting on later stages of the pathway, i.e., transporters, its strength could have been too low for the detection power of our methods and data, unlike for GSTs or CEs.
Our modeling of gene repertoire evolution across 18 beetle species identifies putative adaptive lineage-specific gene family expansions that accompany a dietary shift towards plant feeding. As discussed above, the use of transcriptome data for 12 species presents a potentially confounding factor that we attempt to control for in our analysis design and supplementary analyses. In addition, our set of species allows for only a single comparison between a mainly plant feeding and a mainly predacious suborder of beetles. Confirmation and generalization of our observed trends would thus ideally involve whole-genome sequencing to assemble and annotate high-quality genomes for improved resolution and confidence, as well as sampling from other beetle clades or some of the many groups of insects with dietary shifts towards plant feeding  to enable phylogenetic replication .
By comparing the degree of expansion among the gene families involved in detoxification of plant secondary compounds in two suborders of beetles characterized by generally contrasting trophic niches (i.e., Polyphaga contain a high proportion of phytophagous species while Adephaga encompass mostly predacious species), we identify genomic support for the popular hypothesis that Coleoptera species richness may be in part explained by their interaction with land plants. Candidate OGs of GSTs, CEs, P450s, and CYSs tested positive for adaptive LSEs in the phytophagous polyphagan beetle lineage, and categories of GSTs and CEs, in particular, were enriched for OGs with such adaptive LSEs. Moreover, across all OGs tested, Polyphaga exhibited significantly more adaptive LSEs than Adephaga. This indicates that genes other than the candidate detoxification and digestion enzymes, which could include genes with functions less obviously related or unrelated to phytophagy, are also likely to have played a role in the adaptive success and diversification of Polyphaga. While this suggests that additional functional categories remain to be explored, contrasting gene family evolution across the two major suborders of beetles suggests a role for interactions with plant secondary compounds, and supports a role for phytophagy in general, as important drivers of the remarkable radiation of polyphagan beetles.
This study included 6 genomes and 13 transcriptomes representing a balanced sampling of polyphagan and adephagan beetles, along with 1 representative of the sister group to Coleoptera, Strepsiptera, to root the species phylogeny. Annotated gene sets from 4 genomes were sourced from the i5k pilot project datasets  (Anoplophora glabripennis v0.5.3 , Leptinotarsa decemlineata v0.5.3 , Onthophagus taurus v0.5.3, Agrilus planipennis v0.5.3) and 2 were independently published Dendroctonus ponderosae Ensembl Metazoa v1.0  and Tribolium castaneum Ensembl Metazoa v3.22 . One Polyphaga transcriptome, Laparocerus tessellatus (Additional file 1), was sequenced for this project , and the others were provided by the 1KITE project (Additional file 1, [34, 35, 54]). A detailed list is presented in Table 1.
Coding sequence predictions, transcriptome, and genome quality assessments
Coding sequences and peptide sequences were predicted from all transcriptomes using TransDecoder (v2.0.1 https://transdecoder.github.io [last accessed May 8, 2019]) along with a custom python script to retain the best-scoring entry among overlapping predictions. The coding sequences and peptide sequences from the genomes were retrieved from their official annotated gene sets. All genome and transcriptome gene sets were assessed using BUSCO (v2.0, python 3.4.1, dataset insecta_odb9/2016-10-21, mode proteins) . This tool identifies near-universal single-copy orthologs by using hidden Markov model profiles from amino acid alignments. CD-HIT-EST v4.6.1  was run on the protein sequences with a 97.5% identity threshold to ensure that all species datasets were filtered to select a single isoform per gene.
The OrthoDB  hierarchical orthology delineation procedure was employed to predict orthologous protein groups (OGs). Briefly, protein sequence alignments are assessed to identify all best reciprocal hits (BRHs) between genes from each pair of species, which are then clustered into OGs following a graph-based approach that starts with BRH triangulation. The annotated proteins from the genomes of A. planipennis, O. taurus, and all transcriptomes were mapped to OrthoDB v8 at the Arthropoda level (with 87 species including 4 of the beetles with sequenced genomes). Mapping uses the same BRH-based clustering procedure but only allows genes from mapped species to join existing OGs. These OGs were then filtered to identify the 9720 OGs with representatives from both Polyphaga and Adephaga to focus the study on OGs with evolutionary histories traceable to the last common ancestor of all the beetles, i.e., 5188 OGs with genes from only 1 of the 2 suborders were removed.
Time-calibrated species phylogeny
To build an ultrametric phylogeny required for the CAFE analyses, the maximum likelihood molecular species phylogeny was first estimated based on the concatenated superalignment of orthologous amino acid sequences from each of the datasets. Protein sequences of single-copy BUSCO genes and the best-scoring duplicated genes present in all species were individually aligned for each set of BUSCO-identified orthologs using MAFFT with the --auto parameter , and each result was manually reviewed to exclude poor-quality alignments. Four hundred and five alignments were retained out of 436 and concatenated into a superalignment, partitioned according to the best model for each set of orthologs using aminosan 1.0.2015.01.23 . RAxML v8.1.2 (-f a -m PROTGAMMA -N 1000)  was used to compute the maximum likelihood tree. The monophyly of Geadephaga and Hydradephaga was constrained to match the generally accepted resolution of Adephaga (as ). The chronos function of the R package ape (v3.4 on R 3.2.1, relaxed model)  was used to obtain an ultrametric tree, and the tip to root length was adjusted to match the approximately 250 million-year evolutionary history of crown group Coleoptera .
Functional annotation and definition of candidate genes
InterProScan was run on all species' protein sets (-appl Pfam --goterms, 5.16.55)  to identify protein families. Additionally, blastp 2.3.0 [61, 62] was run against uniref50 (version June 22, 2016; ) with an e value cutoff of 1e−20. An OG was included in the set of candidate OGs when it had a match to both the uniref50 clusters and Pfam families  or gene ontologies  as detailed in Table 2.
The number of genes in OGs for each species was counted. All candidates and remaining (control) OGs were pooled together and processed with CAFE 3.1 , to infer gene family evolution in terms of gene gains and losses. First, the python script provided by CAFE was used to estimate the error in our dataset. The CAFE software was then run using the mode in which the gain and loss rates are estimated together (λ) and a second mode in which they are estimated separately (gains = λ, losses = μ). The more complex model was retained as it reached a significantly better score (− 199,989 for a single estimated parameter and − 199,981 for two distinct estimated parameters, 2× delta log-likelihood = 16, chi-squared distribution, df = 1). For the entire analysis, the CAFE overall p value threshold was kept at its default value (0.01). To run CAFE on each suborder separately, the newick file was pruned to retain only the required species using newick utils 1.1.0 .
To evaluate adaptive OG expansion, the likelihood of the count data was tested by optimizing parameters considering two methods provided by the OUwie R package v1.51 [43, 67]. First, a Brownian motion (BM) approach was used, which assumes no selection and thus differences result from a stochastic process whose rate is estimated. Second, Ornstein-Uhlenbeck (OU) models were used. They take into account an optimal family size that is obtained by selective pressure. Two groups were defined in the phylogeny, namely Polyphaga and Adephaga, to which the two different regimes to consider were assigned, plus a third regime to the root. This represents a simplified scenario allowing for the comparison of gene contents between one group and the other rather than attempting to estimate “levels” of phytophagy or zoophagy across the phylogeny. The models BM1 (Brownian motion with a single rate for the whole tree), BMS (Brownian motion with different rates for each group), and OU1 (selection towards the same optimum for both groups) were optimized as null hypotheses (H0) and compared to OUM (selection towards two optima, same variance) and OUMV (selection towards two optima, two variances) models as alternative hypotheses (H1). The Akaike information criterion corrected for small sample size (AICc)  was used to compare the models, and an AICc > 2 between the best H0 and the best H1 model was considered as significant to prefer the H1 model.
All results for candidates and controls were pooled together to obtain a background distribution of positive and negative results. Positive results are those OGs that passed the OUwie analysis, and negative results are all of the 9720 OGs that did not obtain a significant overall CAFE p value or did not pass the OUwie analysis. Then, 100,000 random draws (using the R function sample, without replacement) having the sample size of the candidate category to test for enrichment were taken from the background, and the significant outcomes for Polyphaga and Adephaga were counted. A p value was calculated for each group as follows: the number of random draws reaching the amount of significant outcomes found for the candidate category, or more, divided by 100,000. Additionally, the multiple tests conducted on each individual candidate category were corrected for false discovery rate (FDR) using the R p.adjust function (method BH, Benjamini Hochberg).
The alignments for the gene trees for the eight OGs that tested positive for adaptive expansions were produced using MAFFT with the --auto parameter. The gene trees were computed with RAxML v8.1.2 (-f a -m PROTGAMMALGF -N 100) and plotted with EvolView [69, 70].
Adenosine triphosphate-binding cassette transporters
Akaike Information Criterion corrected for small sample size
Benchmarking Universal Single-Copy Ortholog assessment tool
Computational Analysis of gene Family Evolution tool
False discovery rate
Multiple Alignment using Fast Fourier Transform tool
Cytochrome P450 monooxygenase
Farrell BD. “Inordinate Fondness” explained: why are there so many beetles? Science. 1998;281:555–9.
Grimaldi DA, Engel MS. Evolution of the insects. Cambridge: Cambridge University Press; 2005.
Slipinski SA, Leschen RAB, Lawrence JF. Order Coleoptera Linnaeus, 1758. In: Zhang Z, editor. Animal biodiversity: An outline of higher-level classification and survey of taxonomic richness. Aukland: Magnolia Press; 2011. p. 237.
Leschen RAB, Buckley TR. Multistate characters and diet shifts: evolution of Erotylidae (Coleoptera). Syst Biol. 2007;56:97–112 Lewis P, editor.
McKenna DD, Sequeira AS, Marvaldi AE, Farrell BD. Temporal lags and overlap in the diversification of weevils and flowering plants. Proc Natl Acad Sci U S A. 2009;106:7083–8.
Mckenna DD, Wild AL, Kanda K, Bellamy CL, Beutel RG, Caterino MS, et al. The beetle tree of life reveals that Coleoptera survived end-Permian mass extinction to diversify during the Cretaceous terrestrial revolution. Syst Entomol. 2015;40:835–80 John Wiley & Sons, Ltd (10.1111).
Zhang S-Q, Che L-H, Li Y, Dan Liang D, Pang H, Ślipiński A, et al. Evolutionary history of Coleoptera revealed by extensive sampling of genes and species. Nat Commun. 2018;9:205 Nature Publishing Group.
Hunt T, Bergsten J, Levkanicova Z, Papadopoulou A, St JO, Wild R, et al. A comprehensive phylogeny of beetles reveals the evolutionary origins of a superradiation. Science. 2007;318:1913–6.
Barraclough TG, Barclay MV, Vogler AP. Species richness: does flower power explain beetle-mania? Curr Biol. 1998;8:R843–5.
Suchan T, Alvarez N. Fifty years after Ehrlich and Raven, is there support for plant-insect coevolution as a major driver of species diversification? Entomol Exp Appl. 2015;157:98–112 John Wiley & Sons, Ltd.
Barrick JE, Lenski RE. Genome dynamics during experimental evolution. Nat Rev Genet. 2013;14:827–39.
Seehausen O, Butlin RK, Keller I, Wagner CE, Boughman JW, Hohenlohe PA, et al. Genomics and the origin of species. Nat Rev Genet. 2014;15:176–92.
Parsons KJ, Concannon M, Navon D, Wang J, Ea I, Groveas K, et al. Foraging environment determines the genetic architecture and evolutionary potential of trophic morphology in cichlid fishes. Mol Ecol. 2016;25:6012–23.
Ragsdale EJ, Müller MR, Rödelsperger C, Sommer RJ. A developmental switch coupled to the evolution of plasticity acts through a sulfatase. Cell. 2013;155:922–33.
Li D, Zhang J. Diet shapes the evolution of the vertebrate bitter taste receptor gene repertoire. Mol Biol Evol. 2014;31:303–9.
Betz O, Thayer MK, Newton AF. Comparative morphology and evolutionary pathways of the mouthparts in spore-feeding Staphylinoidea (Coleoptera). Acta Zool. 2003;84:179–238. John Wiley & Sons, Ltd (10.1111).
Labandeira CC. The history of associations between plants and animals. In: Herrera CM, Pellmyr OM, editors. Plant-animal interactions: an evolutionary approach. Oxford: Blackwell Science; 2002. p. 24–74.
Labandeira CC. Deep-time patterns of tissue consumption by terrestrial arthropod herbivores. Naturwissenschaften. 2013;100:355–64.
Grimaldi D. The co-radiations of pollinating insects and angiosperms in the Cretaceous. Ann Missouri Bot Gard. 1999;86:373.
Voelckel C, Jander G. Insect-plant interactions. Annual Plant Reviews, Volume 47. Oxford: Wiley Blackwell; 2014.
McKenna DD, Scully ED, Pauchet Y, Hoover K, Kirsch R, Geib SM, et al. Genome of the Asian longhorned beetle (Anoplophora glabripennis), a globally significant invasive species, reveals key functional and evolutionary innovations at the beetle–plant interface. Genome Biol. 2016;17:227.
Pauchet Y, Wilkinson P, Chauhan R, Ffrench-Constant RH. Diversity of beetle genes encoding novel plant cell wall degrading enzymes. PLoS One. 2010;5:e15635.
Goldman-Huertas B, Mitchell RF, Lapoint RT, Faucher CP, Hildebrand JG, Whiteman NK. Evolution of herbivory in Drosophilidae linked to loss of behaviors, antennal responses, odorant receptors, and ancestral diet. Proc Natl Acad Sci U S A. 2015;112:3026–31.
Labandeira CC. Insect mouthparts: ascertaining the paleobiology of insect feeding strategies. Annu Rev Ecol Syst. 1997;28:153–93 4139 El Camino Way, P.O. Box 10139, Palo Alto, CA 94303-0139, USA.
Hurst LD. Fundamental concepts in genetics: genetics and the understanding of selection. Nat Rev Genet. 2009;10:83–93.
Shaw KL, Lesnick SC. Genomic linkage of male song and female acoustic preference QTL underlying a rapid species radiation. Proc Natl Acad Sci. 2009;106:9737–42.
Kondrashov FA. Gene duplication as a mechanism of genomic adaptation to a changing environment. Proc R Soc B Biol Sci. 2012;279:5048–57.
Innan H, Kondrashov F. The evolution of gene duplications: classifying and distinguishing between models. Nat Rev Genet. 2010;11:97–108.
Francino MP. An adaptive radiation model for the origin of new gene functions. Nat Genet. 2005;37:573–8.
Brito NF, Moreira MF, Melo ACA. A look inside odorant-binding proteins in insect chemoreception. J Insect Physiol. 2016;95:51–65 Pergamon.
Simão FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 2015;31:3210–2.
Waterhouse RM, Seppey M, Simão FA, Manni M, Ioannidis P, Klioutchnikov G, et al. BUSCO applications from quality assessments to gene prediction and phylogenomics. Mol Biol Evol. 2018;35:543–8 Oxford University Press.
Niehuis O, Hartig G, Grath S, Pohl H, Lehmann J, Tafer H, et al. Genomic and morphological evidence converge to resolve the enigma of Strepsiptera. Curr Biol. 2012;22:1309–13.
Vasilikopoulos A, Balke M, Beutel RG, Donath A, Podsiadlowski L, Pflug JM, et al. Phylogenomics of the superfamily Dytiscoidea (Coleoptera: Adephaga) with an evaluation of phylogenetic conflict and systematic error. Mol Phylogenet Evol. 2019;135:270–85.
Misof B, Liu S, Meusemann K, Peters RS, Donath A, Mayer C, et al. Phylogenomics resolves the timing and pattern of insect evolution. Science. 2014;346:763–7.
Keeling CI, Yuen MM, Liao NY, Docking TR, Chan SK, Taylor GA, et al. Draft genome of the mountain pine beetle, Dendroctonus ponderosae Hopkins, a major forest pest. Genome Biol. 2013;14:R27.
Schoville SD, Chen YH, Andersson MN, Benoit JB, Bhandari A, Bowsher JH, et al. A model species for agricultural pest genomics: the genome of the Colorado potato beetle, Leptinotarsa decemlineata (Coleoptera: Chrysomelidae). Sci Rep. 2018;8:1931.
Seppey M, Pitteloud C, Emerson BC, Alvarez N. Laparocerus tessellatus adult full-body transcriptome [Internet]. Zenodo. 2018. Available from: https://doi.org/10.5281/zenodo.1336288.
Richards S, Gibbs RA, Weinstock GM, Brown SJ, Denell R, Beeman RW, et al. The genome of the model beetle and pest Tribolium castaneum. Nature. 2008;452:949–55.
Kriventseva EV, Tegenfeldt F, Petty TJ, Waterhouse RM, Simão FA, Pozdnyakov IA, et al. OrthoDB v8: update of the hierarchical catalog of orthologs and the underlying free software. Nucleic Acids Res. 2015;43:D250–6.
Han MV, Thomas GW, Lugo-Martinez J, Hahn MW. Estimating gene gain and loss rates in the presence of error in genome assembly and annotation using CAFE 3. Mol Biol Evol. 2013;30:1987–97.
De Bie T, Cristianini N, Demuth JP, Hahn MW. CAFE: a computational tool for the study of gene family evolution. Bioinformatics. 2006;22:1269–71.
Beaulieu JM, Jhwueng DC, Boettiger C, O’Meara BC. Modeling stabilizing selection: expanding the Ornstein-Uhlenbeck model of adaptive evolution. Evolution. 2012;66:2369–83.
Waterhouse RM. A maturing understanding of the composition of the insect gene repertoire. Curr Opin Insect Sci. 2015;7:15–23.
Roncalli V, Cieslak MC, Passamaneck Y, Christie AE, Lenz PH. Glutathione S-transferase (GST) gene diversity in the crustacean Calanus finmarchicus - contributors to cellular detoxification. PLoS One. 2015;10:e0123322. Uversky VN, editor.
Hahn MW, Han MV, Han SG. Gene family evolution across 12 Drosophila genomes. PLoS Genet. 2007;3:e197.
Neafsey DE, Waterhouse RM, Abai MR, Aganezov SS, Alekseyev MA, Allen JE, et al. Highly evolvable malaria vectors: the genomes of 16 Anopheles mosquitoes. Science. 2015;347:1258522.
Waterhouse RM, Zdobnov EM, Kriventseva EV. Correlating traits of gene retention, sequence divergence, duplicability and essentiality in vertebrates, arthropods, and fungi. Genome Biol Evol. 2011;3:75–86.
Gloss AD, Vassão DG, Hailey AL, Nelson Dittrich AC, Schramm K, Reichelt M, et al. Evolution in an ancient detoxification pathway is coupled with a transition to herbivory in the Drosophilidae. Mol Biol Evol. 2014;31:2441–56.
Kong Y, Liu X-P, Wan P-J, Shi X-Q, Guo W-C, Li G-Q. The P450 enzyme shade mediates the hydroxylation of ecdysone to 20-hydroxyecdysone in the Colorado potato beetle, Leptinotarsa decemlineata. Insect Mol Biol. 2014;23:632–43.
Mitter C, Farrell B, Wiegmann B. The phylogenetic study of adaptive zones: has phytophagy promoted insect diversification? Am Nat. 1988;132:107–28 [University of Chicago Press, American Society of Naturalists].
Kopp A. Metamodels and phylogenetic replication: a systematic approach to the evolution of developmental pathways. Evolution. 2009;63:2771–89.
Robinson GE, Hackett KJ, Purcell-Miramontes M, Brown SJ, Evans JD, Goldsmith MR, et al. Creating a buzz about insect genomes. Science. 2011;331:1386.
Peters RS, Krogmann L, Mayer C, Donath A, Gunkel S, Meusemann K, et al. Evolutionary history of the Hymenoptera. Curr Biol. 2017;27:1013–8 Cell Press.
Li W, Godzik A. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics. 2006;22:1658–9.
Katoh K, Standley DM. MAFFT multiple sequence alignment software version 7: improvements in performance and usability. Mol Biol Evol. 2013;30:772–80.
Tanabe AS. Kakusan4 and Aminosan: two programs for comparing nonpartitioned, proportional and separate models for combined molecular phylogenetic analyses of multilocus sequence data. Mol Ecol Resour. 2011;11:914–21.
Stamatakis A. RAxML version 8: a tool for phylogenetic analysis and post-analysis of large phylogenies. Bioinformatics. 2014;30:1312–3.
Paradis E, Claude J, Strimmer KAPE. Analyses of phylogenetics and evolution in R language. Bioinformatics. 2004;20:289–90 Oxford University Press.
Jones P, Binns D, Chang HY, Fraser M, Li W, McAnulla C, et al. InterProScan 5: genome-scale protein function classification. Bioinformatics. 2014;30:1236–40.
Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, et al. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997;25:3389–402.
Camacho C, Coulouris G, Avagyan V, Ma N, Papadopoulos J, Bealer K, et al. BLAST+: architecture and applications. BMC Bioinformatics. 2009;10:421 BioMed Central.
Suzek BE, Wang Y, Huang H, McGarvey PB, Wu CH, UniProt Consortium. UniRef clusters: a comprehensive and scalable alternative for improving sequence similarity searches. Bioinformatics. 2015;31:926–32.
Finn RD, Coggill P, Eberhardt RY, Eddy SR, Mistry J, Mitchell AL, et al. The Pfam protein families database: towards a more sustainable future. Nucleic Acids Res. 2016;44:D279–85.
Carbon S, Dietze H, Lewis SE, Mungall CJ, Munoz-Torres MC, Basu S, et al. Expansion of the gene ontology knowledgebase and resources. Nucleic Acids Res. 2017;45:D331–8.
Junier T, Zdobnov EM. The Newick utilities: high-throughput phylogenetic tree processing in the Unix shell. Bioinformatics. 2010;26:1669–70.
Beaulieu JM, O’Meara B. OUwie: analysis of evolutionary rates in an OU framework [Internet]. 2016. Available from: https://cran.r-project.org/web/packages/OUwie/index.html
Hurvich CM, Tsai CL. Regression and time series model selection in small samples. Biometrika. 1989;76:297–307.
He Z, Zhang H, Gao S, Lercher MJ, Chen W-H, Hu S. Evolview v2: an online visualization and management tool for customized and annotated phylogenetic trees. Nucleic Acids Res. 2016;44:W236–41.
Zhang H, Gao S, Lercher MJ, Hu S, Chen W-H. EvolView, an online tool for visualizing, annotating and managing phylogenetic trees. Nucleic Acids Res. 2012;40:W569–72.
Seppey M, Ioannidis P, Emerson BC, Pitteloud C, Robinson-Rechavi M, Roux J, et al. Genomic signatures accompanying the dietary shift to phytophagy in polyphagan beetles [Internet]. Zenodo. 2019. Available from: https://doi.org/10.5281/zenodo.2593899.
The authors thank the anonymous reviewers for their constructive feedback. The authors thank Rolf Beutel, Adam Slipinski, Kai Schuette, Ralph Peters, and Eric Anton for providing specimens for the 1KITE samples, as well as Michael Balke, Fenglong Jia, Shengquan Xu, and Alexandros Vasilikopoulos for the Sinaspidytes wrasei data. The authors also thank Shanlin Liu, Alexander Donath, Lars Podsiadlowski, Karen Meusemann, and the 1KITE Coleoptera group for providing transcriptome assemblies of 1KITE species. The authors gratefully acknowledge the pre-publication access to gene predictions for Onthophagus taurus and Agrilus planipennis from the Baylor College of Medicine Human Genome Sequencing Center’s i5k pilot project, represented by Stephen Richards. The authors additionally thank Catherine Berney who contributed to the production of the Laparocerus tessellatus transcriptome, Erin D. Scully for providing lists of Anoplophora glabripennis genes upregulated during sugar maple tree feeding, and Nicolas Salamin for the advice on the applied methods and feedback on the manuscript. Some of the computations were performed at the Vital-IT (http://www.vital-it.ch) center for high-performance computing of the Swiss Institute of Bioinformatics.
This work was supported by the United States National Science Foundation (DEB 1355169) and the United States Department of Agriculture (cooperative agreement 8130-0547-CA) to DDM, the Spanish grant CGL2013-42589-P awarded by the MINECO and co-financed by FEDER to BCE, the Science Foundation DFG grant BA2152/11-1, 2, the BGI-Shenzhen, the China National Genebank, and the following Swiss National Science Foundation grants: 31003A_143936 (PI), 31003A_173048 (MRR), PP00P3_170664 (RMW), and PP00P3_172899 (NA). Funding for open access charge: Geneva Natural History Museum.
Availability of data and materials
The datasets generated and/or analyzed during the current study are available from the National Center for Biotechnology Information (NCBI), https://www.ncbi.nlm.nih.gov; Genome Database (genomes and annotated genes); the Transcriptome Shotgun Assembly (TSA) Sequence Database (transcriptomes); and Zenodo (https://zenodo.org). NCBI Genome Database resource identifiers for the species with genome assemblies: Anoplophora glabripennis, ID: 14033 ; Leptinotarsa decemlineata, ID: 12832 ; Onthophagus taurus, ID: 12827 (I5k, unpublished); Agrilus planipennis, ID: 12835 (I5k, unpublished); Dendroctonus ponderosae, ID: 11242 ; and Tribolium castaneum, ID: 216 . NCBI TSA accessions of 1KITE transcriptomes: Cybister lateralimarginalis, GDLH00000000 ; Dineutus sp., GDNB00000000 ; Gyrinus marinus, GAUY00000000 , Haliplus fluviatilis, GDMW00000000 ; Noterus clavicornis, GDNA00000000 ; Sinaspidytes wrasei, GDNH00000000 ; Cicindela hybrid, GDMH00000000 (this study); Calosoma frigidum, GDLF00000000 (this study); Elaphrus aureus, GDPI00000000 (this study); Aleochara curtula, GATW00000000 ; Meloe violaceus, GATA00000000 ; and Stylops melittae, GAZM00000000 . Further details of all 1KITE transcriptomes are given in Additional file 1: Table S4. The Laparocerus tessellatus transcriptome data are available from Zenodo https://doi.org/10.5281/zenodo.1336288 . Custom scripts are available from Zenodo https://doi.org/10.5281/zenodo.2593899 .
Ethics approval and consent to participate
Consent for publication
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Supplementary results and methods. Details of the results from supplementary analyses and additional details on materials and methods including supplementary Tables S1–S4 and supplementary Figures S1–S10. (PDF 2228 kb)
About this article
Cite this article
Seppey, M., Ioannidis, P., Emerson, B.C. et al. Genomic signatures accompanying the dietary shift to phytophagy in polyphagan beetles. Genome Biol 20, 98 (2019). https://doi.org/10.1186/s13059-019-1704-5