Genomic signatures accompanying the dietary shift to phytophagy in polyphagan beetles

Background The diversity and evolutionary success of beetles (Coleoptera) are proposed to be related to the diversity of plants on which they feed. Indeed the largest beetle suborder, Polyphaga, mostly includes plant-eaters among its ~315,000 species. In particular, plants defend themselves with a diversity of specialized toxic chemicals. These may impose selective pressures that drive genomic diversification and speciation in phytophagous beetles. However, evidence of changes in beetle gene repertoires driven by such interactions remains largely anecdotal and without explicit hypothesis testing. Results To address this, we explored the genomic consequences of beetle-plant trophic interactions by performing comparative gene family analyses across 18 species representing the two most species-rich beetle suborders. We contrasted the gene contents of species from the mostly plant-eating suborder Polyphaga with those of the mainly predatory Adephaga. We found gene repertoire evolution to be more dynamic, with significantly more adaptive lineage-specific expansions, in the more speciose Polyphaga. Testing the specific hypothesis of adaptation to plant-feeding, we identified families of enzymes putatively involved in beetle-plant interactions that underwent adaptive expansions in Polyphaga. There was especially strong support for the selection hypothesis on large gene families for glutathione S-transferase and carboxylesterase detoxification enzymes. Conclusions Our explicit modeling of the evolution of gene repertoires across 18 species identifies adaptive lineage-specific gene family expansions that accompany the dietary shift towards plants in beetles. These genomic signatures support the popular hypothesis of a key role for interactions with plant chemical defenses, and for plant-feeding in general, in driving beetle diversification.


70
Species richness among eukaryotes varies substantially, with some clades having only a few 71 representatives and others comprising hundreds of thousands of extant species. In particular, the class 72 Insecta outnumbers all other classes with more than half of all described extant species (Farrell, 1998;73 Grimaldi and Engel, 2005). Beetles (Coleoptera) encompass approximately 380,000 described species, 74 representing ca. 40% of described insect diversity (Slipinski et al., 2011). Several hypotheses have 75 been proposed to explain this richness, notably their complex interactions with flowering plants (Farrell, 76 1998;Leschen and Buckley, 2007;McKenna et al., 2009McKenna et al., , 2015Zhang et al., 2018) and a high lineage 77 survival rate (Hunt et al., 2007). Nevertheless, detailed supporting evidence from molecular genetic 78 studies remains sparse, making it difficult to assess the relative importance of these and other 79 potentially important contributing factors (Barraclough et al., 1998;Suchan and Alvarez, 2015). 80

81
The remarkable evolutionary success of beetles may have been driven by the interplay between their 82 trophic niche and their genomic content and architecture. This is based on the premise that 83 environmental and ecological conditions are likely to be predominant factors influencing the fate of 84 genetic variation in populations under natural selection (Barrick and Lenski, 2013), eventually driving 85 divergence into distinct species (Seehausen et al., 2014). Among all components of the biotic 86 environment, the trophic niche (principal source of nourishment) of an organism plays a crucial role in 87 shaping the evolution of phenotypic innovations and their underlying genomic changes, e.g. feeding 88 modes in cichlid fishes (Parsons et al., 2016), mouth development in Pristionchus nematodes (Ragsdale 89 et al., 2013), and bitter taste receptors in vertebrates (Li and Zhang, 2014). Among several hypotheses 90 explaining the tremendous diversity among beetles, a shift from an ancestral diet as saprophages 91 (detritus-feeding) or mycophages (fungi-feeding) (Betz et al., 2003) to phytophagy (feeding on living 92 plant material in a broad sense) is often evoked (Farrell, 1998;Leschen and Buckley, 2007;McKenna 93 et al., 2009). While the suborder Adephaga (~45,000 species) comprises mostly predatory species, 94 including ground beetles and diving beetles, the largest beetle suborder, Polyphaga (~315,000), is 95 predominantly comprised of phytophagous clades, among which the most species-rich families are 96 weevils (Curculionidae, ~51,000), longhorn beetles (Cerambycidae, ~30,000), and leaf beetles 97 (Chrysomelidae, ~32,000) (Slipinski et al., 2011). Phytophagy appeared approximately 425 million 98 years ago, quickly after terrestrial life was established (Labandeira, 2002). It progressively diversified 99 to target most plant tissues (Labandeira, 2013), shortly before the radiation of flowering plants 120-100 100 million years ago (Grimaldi, 1999). In response, plants have evolved diverse strategies to protect 101 themselves, which in turn impose selective pressures on the animals that feed on them. 102 103 While many biological processes are likely to play a role in this evolutionary battle, a key weapon in the 104 arsenal of phytophagous insects' adaptations is their ability to neutralize or minimize the effects of plant 105 secondary compounds. Protein families known to be crucial for eliminating harmful plant toxins are 106 cytochrome P450 monooxygenases (P450s), carboxylesterases (CEs), UDP-glycosyltransferases 107 (UGTs), and glutathione S-transferases (GSTs) (Voelckel and Jander, 2014). While P450s and CEs 108 modify residues to make compounds more hydrophilic, UGTs and GSTs conjugate xenobiotic 109 compounds to hydrophilic molecules. Detoxification is completed by membrane transporters, such as 110 ATP-binding cassette (ABCs) transporters, which move xenobiotic compounds to where they can either 111 be excreted, or less frequently sequestered in order to be reused as a defense mechanism (Voelckel 112 and Jander, 2014). Additionally, to prevent phytophagous insects from digesting their tissues, plants 113 produce enzyme inhibitors that block catalytic sites or compete with the substrates of enzymes involved 114 in digestion. The major families affected are endopeptidases, such as cysteine, and serine proteases, 115 as well as more specific enzymes such as glycoside hydrolases (GHs), certain types of which are able 116 to break down polysaccharide molecules, including cellulose, hemicellulose and pectin in plant cell walls 117 (McKenna et al., 2016;Pauchet et al., 2010). Other adaptations to phytophagy include repertoires of 118 chemoreceptors that are crucial for finding appropriate food sources (Goldman-Huertas et al., 2015), 119 and the specialization of mouthparts in response to plant mechanical barriers, which are highly 120 diversified in insects (Labandeira, 1997). 121

122
As lineages diverge, their genomes accumulate changes, some of which are expected to be directly 123 linked to functional adaptations. Identifying such genomic features and linking them to phenotypic 124 differences, whilst robustly distinguishing between the effects of stochastic changes and natural 125 selection (Hurst, 2009), is critical to deciphering the genomic drivers of species radiations (Shaw and 126 Lesnick, 2009). Changes include point substitutions, which may affect existing functional elements, but 127 also larger-scale changes such as duplications, from individual genes to entire genomes, which by 128 adding new members to the repertoires of key gene families may constitute an ideal mechanism to 129 facilitate the emergence of novel functions leading to successful phytophagy (Kondrashov, 2012). 130 Whereas newly generated gene copies are usually redundant or deleterious and pseudogenized, 131 rendering the gene copy non-functional (Innan and Kondrashov, 2010), they are sometimes maintained. 132 Particularly interesting cases of gene family expansions are the ones restricted to specific lineages, 133 resulting in lineage specific expansion (LSE). Evolutionary mechanisms causing LSE are numerous 134 and not all adaptive (see Innan and Kondrashov, 2010 for a comprehensive review). However, 135 duplicated gene copies may provide an immediate selective advantage and be maintained by selection. 136 This can be due to an increased dosage of the gene, or to changes following the duplication being 137 selected in one gene copy but not the other, which might allow evolution towards a different function in 138 so-called neo-functionalization processes. Enzymes are considered particularly relevant candidates for 139 such evolutionary processes as they could expand their range of substrates (Francino, 2005). 140

141
Here we apply a comparative genomics approach to examine the evolution of genes putatively involved 142 in plant-insect interactions by sampling from the two largest beetle suborders, which, generally-143 speaking, present contrasting trophic niches. We contrast exemplars from the characteristically 144 predaceous Adephaga with exemplars from Polyphaga and we hypothesize that plant-insect 145 interactions during the dietary shift to phytophagy should be accompanied by genomic evolutionary 146 signatures visible at the subordinal scale. Using genomic and transcriptomic data from 18 beetle 147 species, we estimate ancestral gene family content, taking into account gene gains and losses across 148 the species phylogeny, to identify significant LSEs of gene families related to phytophagy and 149 signatures of adaptive expansions in these families. Ignoring sensory receptors, as their evolution might 150 be driven by agents other than those related strictly to trophic niche (Brito et al., 2016), and 151 morphological genes, as their inferred association with diet is less robust, we focus on genes coding 152 for enzymes, for which adaptive LSE specific to Polyphaga would suggest a role for detoxification and 153 digestive pathways in driving adaptation and speciation. 154

155
A representative sampling of the two major coleopteran suborders 156 Reliable estimation of gene gain and loss events requires a robust evolutionary framework, i.e. a 157 phylogeny that includes the species studied, as well as the characterization of gene families across 158 complete gene sets from these same species. To study adaptation to phytophagy, we sampled from 159 both Adephaga (mostly predaceous) and Polyphaga (with diverse trophic habits, including a very large 160 number of phytophagous species). A balanced sampling of each suborder was achieved comprising 161 twelve transcriptomes and six genomes, with Benchmarking Universal Single-Copy Ortholog (BUSCO) 162 completeness estimates Waterhouse et al., 2018) ranging from 71.9% to 97% 163 ( Figure 1, Table 1). The species phylogeny was estimated using 405 BUSCO genes found to be 164 complete in all species and the strepsipteran outgroup, Stylops melittae ( Figure 1). Protein-coding 165 sequence predictions ranged from 9,844 to 24,671 genes per beetle species. These sequences 166 matched 14,908 Arthropoda orthologous groups (OGs) containing at least one species of Coleoptera 167 in the OrthoDB v8 catalogue . This represented a minimum of 6,742 and a 168 maximum of 11,149 OGs for Carabus frigidus and Leptinotarsa decemlineata, respectively. OGs 169 containing genes from only one of the two sampled suborders were excluded, resulting in a total of 170 9,720 OGs for the analysis that have evolutionary histories traceable to the last common ancestor of 171 beetles. Functional annotations of the sequences within these OGs were used to identify and assign 172 several of them to enzyme families relevant to the tested hypothesis. These candidate OGs comprised 173 four UGTs, 22 P450s, 19 CEs, six GSTs, four SERs, seven CYSs, 28 ABCs, and one GH, for a total of 174 91 candidate OGs from eight families of genes (i.e. functional categories) ( Table 2). 175 proteins, and number of orthologous groups (OGs) with genes from each species. The outgroup species used in the phylogeny, Stylops melittae, belongs to the order 178 Strepsiptera, which is the sister group of Coleoptera (Niehuis et al., 2012). 179  186 . CC-BY 4.0 International license available under a not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made

Polyphaga exhibit more frequent gains across a larger set of OGs 187
Analysis of per-species gene counts of the complete set of 9,720 OGs was performed with the 188 Computational Analysis of gene Family Evolution (CAFE v3) (Han et al., 2013) tool. The mode 189 considering distinct gene gain (λ=0.0019 gain/gene/million years) and gene loss (µ=0.0018 190 loss/gene/million years) was preferred over a single value for λ and µ, having a significantly greater 191 maximum likelihood score (see Methods). The λ (gain) and µ (loss) values predicted when CAFE was 192 run on each suborder separately were λ=0.0020 and µ=0.0027 for Adephaga, versus λ=0. chi-square test, p-value < 0.0001) compared with just 9.3% of non-candidate OGs. 206 . CC-BY 4.0 International license available under a not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which was this version posted August 26, 2018. . CC-BY 4.0 International license available under a not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which was this version posted August 26, 2018. ; https://doi.org/10.1101/399808 doi: bioRxiv preprint

Signatures of adaptive expansion are more prevalent in Polyphaga 227
All 910 OGs with significant variations in their gene content were tested for signatures of adaptive 228 expansion in each suborder, by comparing Brownian motion (BM, neutral) to Ornstein-Uhlenbeck (OU, 229 selective pressure) evolutionary models (Beaulieu et al., 2012). As mentioned previously, these 230 included 26 OGs that belong to one of the functional categories listed in Table 2 ("candidate" OGs). The 231 models consider per-species gene count as a trait that can evolve towards a value, which may or may 232 not differ between the two suborders and may or may not be guided by selective pressure; we call this 233 the "optimum" value in models integrating selection. In total, 21 OGs displayed a higher optimum for 234 Adephaga (0.2% of the initial 9,720 OGs) and 88 for Polyphaga (0.9%). Eight of these 88 polyphagan 235 OGs (Table 3 and   in Polyphaga were GSTs (3/6 positive tests, fdr-corrected p-value < 1e-09) and CEs (3/19, fdr-corrected 246 p-value < 0.005), as shown in detail in Table 4. 247 . CC-BY 4.0 International license available under a not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which was this version posted August 26, 2018. species optimum is 11.69 vs. 6.85 for Adephaga (blue labels), see Table 3). The presence of several 253 clades of polyphagan and adephagan genes delineates duplication events following the divergence of 254 the two suborders. Encircling the gene labels are red bars that highlight polyphagan clades with 255 bootstrap support of >50% and yellow bars that highlight intra-specific duplications with bootstrap 256 support of >50%. Corresponding full names of species are given in Table 1. Branch lengths represent 257 substitutions per site and bootstrap support below 50% is not displayed. 258 . CC-BY 4.0 International license available under a not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which was this version posted August 26, 2018. ; https://doi.org/10.1101/399808 doi: bioRxiv preprint . CC-BY 4.0 International license available under a not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which was this version posted August 26, 2018. ; https://doi.org/10.1101/399808 doi: bioRxiv preprint Comparative genomic analyses often highlight expanded gene families and link these expansions to 287 biological functions peculiar to, or of special interest in, their focal organism(s). However, these analyses 288 usually do not explicitly test for any hypothesized evolutionary model that might support such links. Here 289 we test a specific hypothesis of adaptation to a phytophagous diet, by comparing candidate gene family 290 repertoires from nine adephagan (a mostly predaceous suborder) and nine polyphagan (a highly 291 . CC-BY 4.0 International license available under a not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which was this version posted August 26, 2018. ; https://doi.org/10.1101/399808 doi: bioRxiv preprint phytophagous suborder) beetle species. These candidate families are putatively involved in 292 detoxification of plant allelochemicals and digestion of plant tissues. Specifically, we identify evidence 293 for potentially adaptive gene family expansions in the species rich Polyphaga. This result is robust to 294 potentially confounding factors that could arise from combining genomic and transcriptomic datasets, 295 conservative definitions of candidate gene families, or the greater species richness of the Polyphaga 296 (see discussion points below and Supplementary Information). Through explicitly testing for adaptive 297 LSEs, these results offer conclusive supporting evidence for the key evolutionary role of the 298 phytophagous trophic niche in driving gene family expansions in Coleoptera (specifically Polyphaga), a 299 feature that likely facilitated adaptation of polyphagan beetles to specialized plant feeding. 300 301

Dataset heterogeneity 302
For the comparison of gene repertoires between the two groups to be unbiased, the gene content of all 303 analyzed species should be of similar accuracy and completeness. The number of predicted proteins 304 for the genomic resources for each beetle species (Table 1, mean 15,977 and standard deviation 3,748) 305 was within the range expected of insects (see Waterhouse, 2015). The average total gene count for 306 Adephaga species (all transcriptomes) was about 4,200 fewer than for Polyphaga, which include two 307 genomes with more than 22,000 genes. This difference in average gene counts is reduced to just 1,384 308 when considering only genes assigned to the 9,720 OGs selected for the analysis. Our conservative 309 orthology filtering therefore ensured that the comparisons focused on gene families with reliably 310 traceable evolutionary histories that span both groups of beetles. Secondly, assessments of 311 completeness showed that the majority of the datasets contained more than 90% of complete BUSCOs 312 ( Figure 1, Table 1). While the dynamically evolving families that are the focus of this study are clearly 313 not universal single-copy orthologs, the high levels of BUSCO completeness support the assumption 314 that the datasets represent good coverage of the species' gene content. Re-analyses of our data that 315 exclude the two adephagan beetle species with fewer than 80% complete BUSCOs reduced the power 316 of the model tests but nevertheless still identified the three GST OGs that favor a model with a higher 317 optima for Polyphaga (see Supplementary Results). Three of the adephagan transcriptomes showed 318 more than 10% of duplicated BUSCOs, which could have arisen from suboptimal filtering of the 319 transcriptomes, i.e. failure to remove alternative transcripts of the same gene. While such potentially 320 inflated gene counts for these adephagans might prevent the identification of some true expansions in 321 . CC-BY 4.0 International license available under a not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The annotation strategy was designed to link OGs to candidate gene families based on manually 330 selected keywords used to filter sequence search results, as well as Pfam and InterPro identifiers (Table  331 2), with the aim of excluding false positives (see Methods). This conservative strategy may not have 332 fully captured all possible candidate OGs, which would therefore have remained in the background set 333 of OGs that were used as controls. For example, we identified nine GST OGs (six were retained as 334 candidates after filtering) while ten subclasses have been identified in arthropods (Roncalli et al., 2015). 335 While the strict (conservative) strategy we employed to identify candidate OGs may have resulted in an 336 underestimate of the extent of the observed effects, this does not invalidate those that were identified. 337 In addition, filtering the OGs to retain only those with genes from both Adephaga and Polyphaga 338 excluded from the analyses any genes that were specific to either suborder. These might include genes 339 with key roles in phytophagy, e.g. enzymes acquired by horizontal gene transfer identified from the A. 340 planipennis, A. glabripennis, and D. ponderosae genomes (McKenna et al., 2016). While 341 acknowledging their importance, here we explicitly tested for adaptive LSE in one lineage versus the 342 other so gene evolutionary histories were required to span the two suborders and thus be traceable to 343 their last common ancestor. 344

345
The more speciose Polyphaga exhibit more dynamic gene repertoire evolution 346 The µ and λ values reported by CAFE on all 9,720 OGs are consistent with assessments of other insect 347 clades (Hahn et al., 2007;Neafsey et al., 2015). Although the overall gain rate is slightly higher than 348 the loss rate, the number of OGs losing genes reported by CAFE at each individual node is generally 349 larger than the number of OGs with gains. This is reconciled by considering that across Coleoptera 350 many OGs lost a few genes while few families gained many genes. As most OGs display a low number 351 . CC-BY 4.0 International license available under a not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which was this version posted August 26, 2018. ; https://doi.org/10.1101/399808 doi: bioRxiv preprint of genes per species, i.e. they are evolving under 'single-copy control' (Waterhouse et al., 2011), losing 352 more than one ortholog per species is understandably rare, while there is no theoretical limit for an OG 353 to gain new members. Comparing the two clades, Polyphaga has a higher rate of gene gain and six 354 times more OGs with gains, and while the Adephaga rate of gene loss is higher, Polyphaga have 1.5 355 times more OGs that have experienced gene losses. Hence, the gene repertoires of Polyphaga exhibit 356 a more dynamic evolutionary history with more gains (rate) in more OGs (counts) and fewer losses 357 (rate) spread out over more OGs (counts). It is possible that this greater dynamism may be generally 358 linked to the greater species richness of Polyphaga, with no specific role for phytophagy underpinning 359 this trend. However, among the candidate OGs for detoxification and digestion there are also more 360 gains in Polyphaga, and, in contrast to the background, fewer losses. Thus, both gain and maintenance 361 are higher for candidates in Polyphaga, which is consistent with a key role for phytophagy in driving 362 dynamic gene repertoire evolution, and particularly LSEs. 363 364

Evidence for adaptive expansions of gene families involved in detoxification in polyphagan beetles 365
In addition to observing more expansions among candidate OGs in the suborder Polyphaga, the positive 366 results from the OUwie analysis support the hypothesis that selective pressures drive detoxification 367 enzymes towards larger gene family sizes. This is especially pronounced for GSTs, for which half of 368 the OGs tested positive and for which a significant enrichment compared to the background was found. 369 The importance of GSTs in dietary shifts to phytophagy has been noted in mustard-feeding flies, where 370 duplicated GSTs involved in the mercapturic acid pathway showed signatures of positive selection 371 (Gloss et al., 2014). Our results therefore suggest that comparable phenomena have been acting at the 372 level of polyphagan beetles. The CEs also show a statistically significant enrichment for positive results 373 compared to the background, further supporting the diet detoxification hypothesis. The other positive 374 results include a P450 OG and a CYS OG, neither of which led to a category enrichment compared to 375 the background. The P450 OG is by far the largest among the positive results (Supplementary Figure  376 3), highlighting the importance of P450s in beetle (and generally insect) physiology with diverse roles 377 beyond detoxification, e.g. hormone biosynthesis (Kong et al., 2014). However, while considered as 378 significantly expanded and under selection by our model, the actual mean values in the suborders are 379 not dramatically different. Importantly, the enrichment of positive results among candidates still holds if 380 this OG is excluded (see Supplementary Results). The involvement of P450s in many other processes 381 . CC-BY 4.0 International license available under a not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which was this version posted August 26, 2018. ; https://doi.org/10.1101/399808 doi: bioRxiv preprint may explain why a broader difference between the suborders was not identified. Apart from one positive 382 result among the cysteine proteases (no significant category enrichment), our study did not highlight 383 additional expansions in other digestive enzymes or in transporters within a suborder. The lack of 384 evidence for expansion in Polyphaga with respect to ABC transporters, which is the candidate functional 385 category encompassing the highest number of OGs, may indicate that the ancestral diversity of 386 transporters was sufficient for maintaining the excretion of toxins, despite variations in the substrates 387 imposing a selective pressure on early stages of the detoxification pathway. Alternatively, if such 388 pressure were acting on later stages of the pathway, i.e. transporters, its strength could have been too 389 low for the detection power of our methods and data, unlike for GSTs or CEs.  Misof et al., 2014, Peters et al., 2017 to retain the best-scoring entry among overlapping predictions. The coding sequences and peptide 430 sequences from the genomes were retrieved from their official annotated gene sets. All genome and 431 transcriptome gene sets were assessed using BUSCO (v2.0, python 3.4.1, dataset insecta_odb9/2016-432 10-21, mode proteins) (Waterhouse et al., 2018). This tool identifies near-universal single-copy 433 orthologs by using hidden Markov model profiles from amino acid alignments. CD-HIT-EST v4.6.1 (Li 434 and Godzik, 2006) was run on the protein sequences with a 97.5 percent identity threshold to ensure 435 that all species datasets were filtered to select a single isoform per gene. 436 437

Orthology delineation 438
The OrthoDB  hierarchical orthology delineation procedure was employed to 439 predict orthologous protein groups (OGs). Briefly, protein sequence alignments are assessed to identify 440 . CC-BY 4.0 International license available under a not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which was this version posted August 26, 2018. ;https://doi.org/10.1101/399808 doi: bioRxiv preprint all best reciprocal hits (BRHs) between genes from each pair of species, which are then clustered into 441 OGs following a graph-based approach that starts with BRH triangulation. The annotated proteins from 442 the genomes of A. planipennis, O. taurus and all transcriptomes were mapped to OrthoDB v8 at the 443 Arthropoda level (with 87 species including four of the beetles with sequenced genomes). Mapping 444 uses the same BRH-based clustering procedure but only allows genes from mapped species to join 445 existing OGs. These OGs were then filtered to identify the 9,720 OGs with representatives from both 446 Polyphaga and Adephaga to focus the study on OGs with evolutionary histories traceable to the last 447 common ancestor of all the beetles, i.e. 5,188 OGs with genes from only one of the two suborders were 448 removed. 449 450

Species phylogeny 451
To build an ultrametric phylogeny required for the CAFE analyses, the maximum likelihood molecular 452 species phylogeny was first estimated based on the concatenated superalignment of orthologous amino 453 acid sequences from each of the datasets. Protein sequences of single-copy BUSCO genes and the 454 best-scoring duplicated genes present in all species were individually aligned for each set of BUSCO-455 identified orthologs using MAFFT with the --auto parameter (Katoh and Standley, 2013) and each result 456 was manually reviewed to exclude poor-quality alignments. Four hundred and five alignments were 457 retained out of 436 and concatenated into a superalignment, partitioned according to the best model 458 for each set of orthologs using aminosan 1.0.2015.01.23 (Tanabe, 2011). RAxML v8.1.2 (-f a -m 459 PROTGAMMA -N 1000) (Stamatakis, 2014), and used to compute the maximum likelihood tree. The 460 monophyly of Geadephaga and Hydradephaga was constrained to match the generally accepted 461 resolution of Adephaga (as in McKenna et al., 2015). The chronos function of the R package ape (v3.4 462 on R 3.2.1, relaxed model) (Paradis et al., 2004) was used to obtain an ultrametric tree and the tip to 463 root length was adjusted to match the approximately 250 million year evolutionary history of crown 464 group Coleoptera (McKenna et al., 2015). 465 466 Functional annotation and definition of candidate genes 467 InterProScan was run on all species protein sets (-appl Pfam --goterms, 5.16.55) (Jones et al., 2014) 468 to identify protein families. Additionally, blastp 2.3.0 (Altschul et al., 1997;Camacho et al., 2009) was 469 run against uniref50 (version Jun 22, 2016;Suzek et al., 2015) with an e-value cut-off of 1e-20. An OG 470 . CC-BY 4.0 International license available under a not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which was this version posted August 26, 2018. ; https://doi.org/10.1101/399808 doi: bioRxiv preprint was included in the set of candidate OGs when it had a match to both the uniref50 clusters and Pfam 471 families (Finn et al., 2016) or gene ontologies (The Gene Ontology Consortium, 2017) as detailed in 472 The number of genes in OGs for each species were counted. All candidates and remaining (control) 476 OGs were pooled together and processed with CAFE 3.1 (Han et al., 2013), to infer gene family 477 evolution in terms of gene gains and losses. First, the python script provided by CAFE was used to 478 estimate the error in our dataset. The CAFE software was then run using the mode in which the gain 479 and loss rates are estimated together (λ) and a second mode in which they are estimated separately 480 (gains=λ, losses=µ). The more complex model was retained as it reached a significantly better score (-481 199,989 for a single estimated parameter and -199,981 for two distinct estimated parameters, 2x delta 482 log-likelihood = 16, chi-squared distribution, df=1). For the entire analysis, the CAFE overall p-value 483 threshold was kept at its default value (0.01). To run CAFE on each suborder separately, the newick 484 file was pruned to retain only required species using newick utils 1.1.0 (Junier and Zdobnov, 2010). 485

Evolutionary models 487
To evaluate adaptive OG expansion, the likelihood of the count data was tested by optimizing 488 parameters considering two methods provided by the OUwie R package v1.51 (Beaulieu and O'Meara, 489 2016;Beaulieu et al., 2012). First, a Brownian motion (BM) approach was used, which assumes no 490 selection and thus differences result from a stochastic process whose rate is estimated. Second, 491 Ornstein-Uhlenbeck (OU) models were used. They take into account an optimal family size which is 492 obtained by selective pressure. Two groups were defined in the phylogeny, namely Polyphaga and 493 Adephaga, to which the two different regimes to consider were assigned, plus a third regime to the root. 494 (H1). The Akaike information criterion corrected for sample-size (AICc) (Hurvich and Tsai, 1989) was 501 used to compare models and an AICc>2 between the best H0 and the best H1 model was considered 502 as significant to prefer the H1 model. 503

Statistical enrichment 505
All results for candidates and controls were pooled together to obtain a background distribution of 506 positive and negative results. Positive results are those OGs that passed the OUwie analysis, and 507 negative results are all of the 9,720 OGs that did not obtain a significant overall CAFE p-value or did 508 not pass the OUwie analysis. Then, 100,000 random draws (using the R function sample, without 509 replacement) having the sample size of the candidate category to test for enrichment were taken from 510 the background and the significant outcomes for Polyphaga and Adephaga were counted. A p-value 511 was calculated for each group as follows: the number of random draws reaching the amount of 512 significant outcomes found for the candidate category, or more, divided by 100,000. Additionally, the 513 multiple tests conducted on each individual candidate category were corrected for false discovery rate 514 (FDR) using the R p.adjust function (method BH, Benjamini Hochberg). 515 516

Gene trees 517
The alignments for the gene trees were produced using MAFFT with the --auto parameter. The gene 518 trees were computed with RAxML v8.1.2 (-f a -m PROTGAMMALGF -N 100) and plotted with EvolView 519 (He et al., 2016;Zhang et al., 2012). 520 . CC-BY 4.0 International license available under a not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which was this version posted August 26, 2018. NA wrote the manuscript, with input from all authors. 548 . CC-BY 4.0 International license available under a not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made The copyright holder for this preprint (which was this version posted August 26, 2018. ; https://doi.org/10.1101/399808 doi: bioRxiv preprint