A depauperate immune repertoire precedes evolution of sociality in bees

Sociality has many rewards, but can also be dangerous, as high population density and low genetic diversity, common in social insects, is ideal for parasite transmission. Despite this risk, honeybees and other sequenced social insects have far fewer canonical immune genes relative to solitary insects. Social protection from infection, including behavioral responses, may explain this depauperate immune repertoire. Here, based on full genome sequences, we describe the immune repertoire of two ecologically and commercially important bumblebee species that diverged approximately 18 million years ago, the North American Bombus impatiens and European Bombus terrestris. We find that the immune systems of these bumblebees, two species of honeybee, and a solitary leafcutting bee, are strikingly similar. Transcriptional assays confirm the expression of many of these genes in an immunological context and more strongly in young queens than males, affirming Bateman’s principle of greater investment in female immunity. We find evidence of positive selection in genes encoding antiviral responses, components of the Toll and JAK/STAT pathways, and serine protease inhibitors in both social and solitary bees. Finally, we detect many genes across pathways that differ in selection between bumblebees and honeybees, or between the social and solitary clades. The similarity in immune complement across a gradient of sociality suggests that a reduced immune repertoire predates the evolution of sociality in bees. The differences in selection on immune genes likely reflect divergent pressures exerted by parasites across social contexts.


Background
Group living confers many benefits (for some examples see [1][2][3][4]) and highly social insects such as ants -epitomes of a highly organized animal society -have risen to ecological dominance in many ecosystems of the world [5]. But group living is also associated with costs. Parasites present an enhanced risk to social animals, as large group size [6], high density, and often close relatedness among individuals increases the exposure and spread of infectious diseases (for example, [7][8][9][10][11][12][13][14]; but see [15]). On the continuum of sociality, eusocial insects are an extreme, forming dense colonies with often very highly related individuals (up to an average coefficient of relatedness of r = 0.75), where individuals perform specific functions within the group, at its simplest specializing as reproductive and worker castes. Given a generally higher risk of disease in social insect colonies, it is surprising that complete genome sequencing revealed that honeybees (Apis mellifera) had approximately only one-third as many immune genes as the two existing genomic model insect systems at the time, Drosophila melanogaster and Anopheles gambiae [16]. Honeybee biology differs from these model species in several ways, which may partly explain the striking difference in immune genome organization among these taxa. For instance, honeybees have a suite of hygienic behaviors where they groom both themselves and others, and live on food (pollen and nectar) that is also relatively clean (not withstanding the fact that food-borne diseases have been described in honeybees, for example, [17,18]). The observation that ant genomes also have few immune genes [19] indicates that this deficiency may be a more general characteristic of social hymenoptera and not primarily an artifact of honeybee breeding [20]. Sociality may instead typically allow for group-based defenses ('social immunity' [21]) that should reduce selective pressures on the evolution and maintenance of immune genes. Given the recent and dramatic declines in populations of important bee pollinators [22][23][24] and the role Figure 1 Diagram of the classical insect immune responses to parasites: Toll, IMD/JNK, JAK/STAT pathways and the melanization and antiviral RNA interference responses. Colors of the genes indicate evidence of selection as detected by either positive selection (across the four taxa phylogeny, on the branch between Bombus and Apis, the branch leading to Bombus, Apis, or Megachile) in red, or differences in selection between Bombus and Apis (yellow), or between the social and solitary clades (blue). More complete information about selection on these genes can be found in Additional files 8, 9, 10 and 11. *PGRP-LF is only found in B. impatiens. **PGRP-SC2 is not among the automated predictions for B. terrestris, although sequence in the trace archive suggests that it is present. We also detect expression of PGRP-SC2 in B. terrestris. AMP, anti-microbial peptide.
of parasites in some of these declines (for example, [23,25,26]), understanding the architecture of the immune system of bees in relation to other insects is increasingly important.
Bumblebees (genus Bombus) are essential natural and commercial pollinators and have been declining due to anthropogenic disturbances, including habitat destruction and fragmentation (reviewed in [24,27]), but also due to introduced competitors [28,29], and more recently pesticides [30,31] and parasites [24,26,32,33] have been implicated as important drivers of declines. Bumblebee declines are of both ecological and practical importance as they contribute substantially to human food crops either directly [24,27,34] or as part of a community of wild pollinators that are supplemented by managed honeybees [35]; therefore, they also aid the maintenance of plant diversity [36]. Among Bombus species, Bombus impatiens, and Bombus terrestris, both key commercial and natural pollinators, have been most extensively studied, in particular for host-parasite interactions [37][38][39][40]. These two species occupy comparable niches in North America (B. impatiens) and Europe (B. terrestris). They last shared a common ancestor approximately 18 million years ago [41].
While B. terrestris and B. impatiens share ecological factors, such as diet, with honeybees, they differ from the latter in colony organization, sociality, longevity, and mating system. Bumblebees, including B. terrestris and B. impatiens, are less advanced in their sociality than honeybees, as the physiological and morphological difference between queens and workers is not as pronounced, division of labor is weak, and colonies are much smaller (dozens or hundreds instead of thousands of workers) and very simply organized [42]. Bumblebee colonies as a whole are also shorter-lived than those of honeybees, with bumblebee queens living for one year but the colony persisting for only a few months, whereas honeybee queens and their colonies can live for several years. Like most bumblebee species, B. terrestris queens mate singly and B. impatiens queens mate singly or occasionally doubly [43], whereas Apis queens mate with between 10 and over 100 males [44][45][46][47]. This has important consequences for disease susceptibility as both multiply mated honeybees [48] and B. terrestris [49] that were artificially inseminated with sperm from multiple males produce colonies with lower parasite loads than colonies from singly mated queens.
All of these differences may have profound consequences for the evolution of their immune systems. Here, using the recently sequenced complete genomes of both the North American B. impatiens and the European B. terrestris we explore patterns of immune system evolution across a social gradient by comparing the immune repertoire and sequences of immune genes of these two species of bumblebees with those of two species of highly social honeybees and the solitary leaf-cutting bee Megachile rotundata.

Immunological repertoire
Regardless of social organization, all bee species examined shared a core set of immune genes, including all members of the canonical immune pathways (Figure 1) with only minor differences in gene numbers ( Figure 2). We found no relationship between the degree of sociality and the total number of canonical immune-related genes. With regard to important immune response effectors, such as anti-microbial peptides (AMPs), both Bombus species Figure 2 Number of genes belonging to 27 families of immune genes from OrthoDB. The colors in this heatmap reflect the number of genes in that category relative to the other species. Numbers with asterisks were manually adjusted according to our annotation efforts or the literature. The tree represents a clustering analysis using Euclidean distances based on the number of genes within these groups.
Homology searches for bumblebee serpins against sequences of other members of the superfamily Apoidea identified single orthologs for the eusocial honeybee A. mellifera, and the solitary leafcutter bee M. rotundata. Outside of the Hymenoptera, these serpins shared sequence similarity with serpin-1 (alaserpin) of the lepidopteran Manduca sexta.
We also find what appears to be a homolog of the apoptosis-involved caspase decay (Figure 4). There also appears to be a Hymenoptera-specific clade of caspases that share the most homology with Ice in Drosophila. We find an additional PGRP (peptidoglycan receptor protein) in B. impatiens (XP_003487752), which is missing in B. terrestris and A. mellifera. On the genomic sequence, this novel PGRP is immediately downstream of XP_003487751, which is homologous to XP_003400160 in B. terrestris and XP_392452 in A. mellifera, likely from tandem duplication.

Immunological expression
We used quantitative PCR to determine whether 27 candidate immune genes (Table 1) were functionally expressed in B. terrestris, including if they were differentially expressed following exposure to Gram-negative or Grampositive bacterial cues. We measured expression in gynes and males also to investigate sex-specific patterns. All surveyed genes were actively expressed in both gyne and male B. terrestris, including the novel serpins (serpin 3/4 A, serpin 3/4 B, alaserpin) and the decay homolog. Both sex (Tables 2 and 3; Figure 5) and treatment ( Figure 6) significantly influenced expression of this battery of genes and the different sexes responded differently to the treatments as revealed by the sex*treatment interaction (Tables 2 and 3; Figure 6). The recognition receptors beta-glucan receptor protein 2 (BGRP2) and peptidoglycan receptor protein SA (PGRP-SA) were more strongly expressed in queens than males, but BGRP1 and PGRP-LB had male-biased expression. PGRP-SC2 was more strongly expressed in queens but was also upregulated in queens given the challenge whereas males downregulated this gene upon challenge. All antimicrobial peptides (AMPs) were more strongly expressed in queens than males and most were induced upon challenge and induced more dramatically in queens. The effectors lysozyme, transferrin, the signaling transducer relish, antiviral genes argonaute and aubergine, and melanization related genes PPO and punch follow a similar pattern with queen-biased expression and greater induction of expression when there was a significant treatment by sex interaction. An exception to this general pattern is the serpin 27a, which inhibits melanization. Queens had lower expression of this gene and the expression appears to be reduced upon bacterial exposure. Males did not reduce their expression of serpin 27a as intensely as the queens did.

Signatures of selection
While we did not identify any pattern of immune gene numbers varying with sociality, we did find variation in the evolution of these immune genes both between the highly social Apis clade and the less social Bombus clade, and between the solitary Megachile and the broader social clade containing Apis and Bombus. Globally, the ratio of non-synonymous to synonymous substitutions was ω = 0.12 ± 0.01 (mean ± standard error of the mean) and ω = 0.10 ± 0.01 in the four-and five-taxa M0 analyses, respectively (Additional files 1 and 2), although ω differed dramatically across ortholog groups (range 0.367 to 0.0001, 0.30 to 0.0047; Additional files 3, 4, 5, 6, and 7). The orthologs of dscam (down syndrome cell adhesion molecule), three antiviral proteins (aubergine, argonaute 2, and rm62) a trypsin-like serine protease that is homologous to the scavenger receptor tequila, a CLIP serine protease orthologous to CG4998, the peroxidase cardinal, a . Across the whole five-taxa phylogeny we again found evidence for positive selection on argonaute 2, dscam, ark, and the CG4998 ortholog, and additionally found positive selection on a second CLIP serine protease without a clear D. melanogaster ortholog but which is similar to CG11843 and snake, which are involved in toll signaling [50] ( Table 5). In the branch leading to Apis the small RNA regulatory or anti-viral gene drosha, and the RNA helicase rm62, which has been implicated in both RNA interference [51] and bacterial response [52], the bacterial recognition gene BGRP1, a serine protease inhibitor, the caspase ark and IMD of the IMD pathway, are under selection (Table 6). On the branch leading to Bombus we find evidence of selection on argonaute 2, rm62-F (which is also an RNA helicase but has not been directly linked to immune responses), and the toll-7 receptor, which has been implicated in viral defenses [53]. We also find evidence of selection on a number of members of the toll pathway, including dorsal, myd88, and BGRP1, which recognizes bacterial pathogens and initiates toll pathway signaling. Domeless, the receptor of the JAK/STAT pathway, had the most sites showing evidence of selection while dorsal showed stronger evidence of positive selection but across fewer sites. Two catalases, ark and catalase, a serpin and a scavenger receptor, snmp1, also appear to be under selection in bumblebees (Table 7; Figure 9). A number of genes show evidence of different selection between honeybees and bumblebees ( Figure 9; Table 8; Additional file 8), including dorsal, spaetzle, and tube, all from the toll pathway, a nimrod gene, argonaute 2, a number of serpins, and dscam. Considerably more genes differ in selection between the social and solitary clades ( Figure 10; Additional file 9) perhaps in part due to the difference in time since sharing an ancestor with both Bombus and Apis. However, genes that exhibit signs of different selection within the social clades (upper diagonal in Figure 10) are likely more robust than those showing signs of selection only in the solitary M. rotundata (lower diagonal) as the genes that appear to be evolving rapidly in the solitary group might be inflated due to the disproportionate phylogenetic distance of M. rotundata to the Apis and Bombus clades. A summary of genes for which we found evidence of selection and according to which selection model is provided in Additional file 10 (four taxa: Bombus and Apis) and Additional file 11 (five taxa: Bombus, Apis, and Megachile).

Discussion
We find that the genomes of both species of bumblebee encode a remarkably similar repertoire of immune-related genes to the honeybee A.mellifera and solitary leafcutting bee M. rotundata ( Figure 2). All the components of the major immune pathways, Toll, IMD, JAK/STAT, JNK, and the antiviral machinery are present in both Bombus species. Furthermore, the subset of these genes that we surveyed are detectibly expressed and many are induced Transformed expression values (dCt) were treated as dependent on the sex of the bees (male/queen) and the treatment they received (naïve, sterile Ringers solution injection, injection with Arthrobacter globiformis, or injection with Escherichia coli). -, P > 0.1;~, P < 0.1; *P < 0.05; **P < 0.01; ***P < 0.001; full statistics can be found in Additional file 13.
upon immune activation. Indeed, these immune genes are expressed in a sex-specific fashion as predicted by Bateman's principle of greater investment into maintenance for the choosier sex, usually females [54]. The sex differences in expression appear to be independent of gene dose since the expression of housekeeping genes was not significantly different between haploid males and gynes.
Overall, the number of immune genes is very consistent among the sequenced bees regardless of their degree of sociality, that is, from solitary (Megachile) to primitive (Bombus) and higher (Apis) eusociality ( Figure 2). Primitive eusociality evolved about 87 million years ago in corbiculate bees [55], whereas higher sociality evolved in the Apini and Meliponi/Bombini with sociality being presumably secondarily lost in the Euglossini [55]. According to our results, the solitary bee M. rotundata, which split from the Apidae some 105 million years ago, has a comparable number of immune genes to honeybees and both Bombus species. These results suggest that the immune repertoire of A. mellifera, which was described as depauperate relative to dipterans, is probably a characteristic of bees more generally and predates the evolution of sociality and certainly existed before advanced eusociality in bees, and perhaps even as far back as before the split with ants [20]. Therefore, the relatively limited immune repertoire of honeybees does not seem to be the result of the transition to sociality and the associated behavioral adaptations for social immunity as suspected before [16]. An intriguing but purely speculative thought is that, rather than sociality reducing the need for immune genes, reduced immune complexity may have facilitated (for example, by way of easing the self/foreign distinction) or empowered (by way of allowing for social defenses) the evolution of social groups in the first place.
Both Bombus species have a small expansion of serpins ( Figure 3). These serpins appear similar to the silkworm moth B. mori's antitrypsin, which is involved in prophenoloxidase (PPO) regulation and is upregulated upon fungal infection [56]. We confirmed that these serpins are expressed in B. terrestris when challenged and are thus likely functional. The honeybee homolog seems to have a mutation within the binding region PS00284, which does not conform to the consensus pattern of this active site. It is unclear whether this gene in honeybees is a functional serpin. We also find a caspase that is similar to Decay in D. melanogaster (Figure 4), which has not been found in either A. mellifera or Nasonia vitripennis.
Despite having simpler colony organization and shorter colony lifespan, both bumblebee species nevertheless appear largely like honeybees in their immune-gene characteristics. Indeed, they also appear similar to the solitary leaf-cutting bee M. rotundata. While the complement of canonical immune genes may be consistent, it is important to recognize that our understanding of immunity is largely based on the known repertoire of non-social insects, and in particular the fruit fly D. melanogaster. As such, we are limited in being able to identify only known immune genes that have been functionally characterized in model systems. Bees may have further unexplored immune genes, novel defenses, and social behaviors that aid disease control and are unavailable to solitary species [21]. These adaptations are also genetically controlled, but the genes behind these traits are less well defined than the canonical immune response genes. Thus, while the Apoidea may appear to have consistent immune genomic profiles at the level of genes shared, they may differ considerably in the genetic underpinning of other key aspects of disease control in a social context, such as grooming, nest hygiene, and other behaviors. As a class, immune genes are rapidly evolving [57][58][59][60]. Here we explored which, among these immune genes, show particularly rapid evolution, or differences in selection among the different clades investigated. We found that some genes are under stronger selection in Bombus compared with Apis (genes below the diagonal in Figure 9), and a number of genes are under stronger positive selection in the social clade (upper diagonal in Figure 9) than in M. rotundata. While it is likely that clades with ω > 1 are under positive selection, these results should also be interpreted cautiously because without population data it is not possible to distinguish positive selection from relaxed constraints on selection [61]. Interestingly, we found a strong signature of selection on Figure 6 Logfold gene expression relative to invariant housekeeping genes in males and gynes according to treatment (x-axis: N, naïve; R, Ringer injection; A, Arthrobacter globiformis injection; E, Escherichia coli injection). Next to the gene name we depict whether the expression differed significantly according to sex (S), treatment (T), or the interaction between sex and treatment (S*T). Full details of these statistics can be found in the supplemental materials.
dscam, a gene primarily important for neuronal selfavoidance, but that is increasingly of interest in hostparasite interactions because alternative splicing of this gene can theoretically produce over 150,000 isoforms in D. melanogaster [62]. As such, dscam is hypothesized to be important for host-parasite specificity in susceptibility, and for specific immune memory [63]. The region under selection in dscam is limited to the beginning of the aligned protein ( Figure 7A). This region corresponds to the fifth immunoglobulin I-set domain (sixth immunoglobulin domain overall). All of the previous immunoglobulin domains (1 to 5) were trimmed because they were not present in the A. mellifera gene. This gene appears to be under selection at least in the fifth immunoglobulin I-set domain but may also be variable in earlier domains. A previous study that examined the sequence of alternatively splicing exon cassettes did not detect selection in the crustacean Daphnia magna and several Drosophila species, at immunoglobulin (Ig) 2, 3 and 7 [64]. Our domain, however, likely corresponds to Ig4 or 5 in [64] and thus was not directly tested in their analysis. Nevertheless, our analysis is suggestive of differences in selective pressures among bee species. Among the other genes that show evidence of selection are a number of antiviral genes, including argonaute 2, aubergine ( Figure 8A, B), and dicer 2, all of which have been found in other systems to be under selection [60,65]. We also detect evidence of selection on two AMPs, abaecin and defensin (Additional files 8, 10, and 11), both of which appear to be under stronger selection in the Apis clade ( Figure 9). Our results corroborate those of Erler et al. [66], who also found positive selection on AMPs across several European bumblebee species. Interestingly, we find that dorsal appears to be under different selection in bumblebees than in honeybees, where Harpur and Zayed [61] found that dorsal was under purifying selection. We also find that all but one of the sites under selection in dorsal are outside of the relish domain ( Figure 7C). Population level studies of the genes that appear to be evolving under different pressures in honeybees and bumblebees, and in the social and solitary clades will be instrumental in determining which of these genes are evolving under positive, relaxed or balancing selection [61].

Conclusions
Social insects have a suite of adaptations that have been hypothesized to reduce the pressure on immune system evolution, to the point of widespread gene losses, or inversely failing to produce or maintain duplicates. However, we find no evidence of great variation in immune  gene complement, or in terms of the total number of immune-related genes across a gradient of sociality (highly social Apis > primitively social Bombus > solitary M. rotundata). Instead, we find a more nuanced pattern of immune system evolution, with variation in signatures of selection among these taxa. The different selective pressures that drive the evolution of these immune genes may in turn reflect the different parasite pressures and life history characteristics of different bee species. The depauperate immune repertoire of honeybees relative to model species thus appears to be ancestral to the evolution of bee sociality and not a consequence of sociality.

Survey for immunological repertoire and annotation
The genomes of haploid males from a single colony of B. terrestris and of B. impatiens were sequenced by the Bumblebee Genome Consortium and the details of the (See figure on previous page.) Figure 7 Sites under selection within the Apis, Bombus phylogeny for three genes of interest. The title for each gene presents the OrthoDB accession, the gene name, and the immune category. We only present a subset of the genes that showed an overall signature for selection highlighting codons at three different significance thresholds: Bayesian posterior probability >0.75 (plus signs along the top of each panel), >0.95 (x's), and where Eω -sqrt(Var(ω)) > 1.25 (circles). The blue shadow indicates an estimate of error at each codon. We show Pfam domains in colored blocks and Phobius regions along the x-axis. Crosshatched regions were trimmed from the alignment. sequencing, assembly, and automated annotation can be found in [67]. Using OrthoDB [68,69] orthologous groups, we identified orthologs from the two bumblebees, as well as from Apis florea, M. rotunda, and N. vitripennis, Tribolium castaneum, of previously characterized immune genes from D. melanogaster, A. gambiae, and A. mellifera that comprise 27 immune-related gene families or pathways. To complement the orthology searches, we searched for homologs of known immune proteins from the two bumblebees using blastp [70,71] against the official gene sets (NCBI RefSeqs). To confirm the absence of any proteins that appeared to be missing, we searched the genome assemblies and Short Reads Archive using tblastn.

Immunological expression
To confirm the relevance of these genes to immune activation and the validity of novel genes revealed in our annotation we challenged 2-to 3-week-old unmated male and gyne (that is, daughter queen) B. terrestris by injecting them with 2 μl of a suspension of either heatkilled E. coli (Gram-negative) or Arthrobacter globiformis (Gram-positive) at a concentration of 10 8 cfu/ml, or with sterile Ringer solution under the tergites of the abdomen, or as naïve controls handled them in the same way without any injection. We used 12 replicates for each treatment/caste combination (total N = 96). These experimental bees were the granddaughters and grandsons of queens collected in northern Switzerland in spring 2012 that had established colonies in the lab. We confirmed that these colonies were free of common parasites such as Crithidia bombi and Nosema bombi through visual inspection. All experimental bees were immobilized on ice for 30 minutes before treatment, including the naïve controls. After treatment we housed the bees singly with ad libitum pollen and 50% (w/w) sugar water. Eight hours after treatment we snap froze the bees in liquid nitrogen. We homogenized the abdomens before extraction with 0.5 g Zirkonium beads at 0°C to −4°C using an Omni Bead Ruptor 24 Homogenizer   (OMNI International, Kennesaw, GA, USA). We then extracted total RNA using Qiagen RNeasy Plus Mini extraction kits (Qiagen, Hilden, Germany) according to the manufacturer's instructions. We confirmed RNA integrity of every sample with 2100 Bioanalyzer (Agilent Technologies, Santa Clara, CA, USA) with the RNA 6000 Nano Kits. We transcribed the RNA to cDNA using Quantitect reverse transcription kits (Qiagen) including controls without reverse transcriptase (no-RT controls) to test for genomic contamination. All samples were checked using quantitative PCR for our housekeeping genes to ensure that the no-RT controls amplified at least 10 cycles later, and thus contain less than 0.1% of the transcripts found in the RT samples.
Based on the full genomic sequences, we selected 27 candidate genes to represent various components of the immune response of insects, including the Toll, JAK/STAT, IMD and JNK pathways; recognition genes, melanization responses, various effectors and antiviral genes. We explored the expression of these genes upon immune stimulation relative to the geometric mean of three housekeeping genes (pla2, ak, ef1a). The full list of genes, their accession numbers and primers can be found in Table 1. All primers were designed using QuantPrime [72], based on the GenBank sequences (Table 1), except those for relish, which were published in [73]. The primers for kayak were designed based on a manually annotated gene given the temporary identifier (Bter:08277927). All primers were tested and have minimal dimer and high amplification efficiency (1.9 to 2.1).
We measured expression on a Fluidigm 96.96 Dynamic array on the BioMark system (Biomark Inc., Pueblo, CO, USA) using EvaGreen as a reporter according to the manufacturer's protocol (Advanced Development Protocol 14; PN 100-1208 B). We eventually measured expression of 95 samples (12 replicates for each treatment in males and in queens with one naïve queen randomly dropped to make room for the negative control). We ran the samples with three technical replicates and used the average cycle threshold (Ct) of these technical replicates for further analysis.
We standardized expression of all genes of interest relative to the geometric mean of our three housekeeping genes (yielding deltaCt (dCt) values; all dCT values first transformed with Yeo-Johnson power transformations to improve normality and homoscedasticity, 'car' package in R) after confirming that the composite housekeeping value did not vary with sex (F 1,87 = 0.09, P = 0.77), treatment (F 3,87 = 0.29, P = 0.83), or their interaction (F 3,87 = 0.70, P = 0.56) by ANOVA. We analyzed these dCt values using MANOVA with sex (gyne and male) and treatment (naïve, injected with Ringer's solution, heat-killed E. coli, or heat-killed A. globiformis) as fixed, fully crossed factors  (base package in R). We used MANOVA for these analyses, since the expression of any of these genes is not independent of the expression of other genes and because MANOVA accounts for multiple testing and is thus robust to type I error. When MANOVA effects were significant, we subsequently explored the univariate individual gene effects.

Building gene family phylogenies
We retrieved protein sequences of selected gene families from OrthoDB [68,69] and aligned them using ClustalW [74] and adjusted the alignments manually or with Gblocks [75] before tree-building using MrBayes [76] with the mixed model. We ran MrBayes for as long as was necessary (typically for 20,000 to 400,000 generations) to reduce the average deviation of split frequencies below 0.01 or until the split frequency approached 0.01 but did not improve further. We discarded the initial 25% of the trees as a burn-in.

Testing for signatures of selection
We extracted orthologous groups of immune-related genes from OrthoDB6 [68,69]. From the 130 orthologous groups with sequences from B. terrestris, B. impatiens, A. mellifera and A. florea we extracted 148 multiple sequence alignments containing exactly one sequence from each species. We use these 148 alignments for comparisons between the Bombus and Apis clades. From the 122 orthologous groups that contain M. rotundata sequences we further extracted 139 alignments that also contain a M. Figure 9 Differences in evolutionary pressure between Apis and Bombus across orthology groups. Names are taken from D. melanogaster when available. The size of the point is scaled according to the proportion of codons that are evolving under different selection in the two clades. Names were moved to improve legibility taking care to maintain x-axis position in the insert (denoted with an asterix). Full table of these results can be found in Additional file 8. rotundata ortholog, which we use as an outgroup to compare social with solitary (non-social) bees. In six of the alignments (abaecin, basket, cactus, defensin, kayak and tak1) one or more orthologs were not present in OrthoDB6 and had to be retrieved from an alternative source (that is, NCBI). Protein sequences were aligned independently for the four-taxa (Bombus and Apis) or fivetaxa trees (with Megachile) with ProGraphMSA [77] and trimmed using Gblocks with the stringent settings as described in [75]. Where orthologous groups contained multiple sequences for some species the most closely related sets of sequences were aligned. In the 12 orthologous groups that contained more than one sequence for each species we extracted the maximum number of alignments, such that each alignment contains only one sequence from each species. We retrieved cDNA sequences for the alignments from the official gene sets (A. We used likelihood-based codon models implemented in the PAML package [78] to analyze differences in the rate of evolution and to test for signals of selection. We tested hypotheses by using likelihood ratio tests to select the best fitting model among pairs of nested models that differ only in their representation of ω, the ratio of nonsynonymous to synonymous substitutions (ω = dN/dS).  We make the assumption that ω > 1 indicates positive selection, while ω < 1 and ω = 1 indicates negative and neutral selection.
The average rate of evolution was determined using the M0 [79] model, which assumes a constant ω across all sites and branches. The average ω is not a good indicator for the presence of positive selection, since functional and structural constraints ensure that most sites in functional genes are conserved [80]. Hence, we used the M7 and M8 models to test for the presence of positively selected sites. [79]. Both models allow ω to vary from site-to-site according to a Beta distribution, but the M8 model additionally allows some sites to evolve with ω > 1, to account for sites under positive selection.
In order to detect episodes of positive selection on the connecting branches between clades we used the branch-site model [81,82]. Some branches are assigned a priori to the foreground, where some sites are allowed to evolve with ω > 1, while all sites on background branches are constrained to ω ≤ 1. The branch-site model is compared to a null model where there is no difference between foreground and background branches. We used Clade model D [83] to test for more general differences between clades. This model allows ω to differ between clades in some sites. It is compared to a null model where there are no differences in ω between clades.
To ensure that the PAML optimization did not get stuck in local optima we used six different initial estimates for ω in all analyses and initialized branch lengths to values calculated with PhyML [84]. We corrected for multiple testing by controlling the false discovery rate using the method of Benjamini and Hochberg [85]. To calculate the posterior probabilities of sites being under positive selection in the M8 and Branch-site models we used the Bayes Empirical Bayes (BEB) approach implemented in PAML [86].
We repeated the analyses using Probcons [87] for aligning sequences. However, we only report the results from alignments produced by ProGraphMSA, as these alignments give more conservative estimates and hence a smaller chance of falsely reporting positive selection. Similarly, we do not report results from using Gblocks with the relaxed settings, as described in [75], or no trimming at all. These results are available from the authors.

Data
Sequence data can be found on NCBI (B. impatiens