Obligate mutualism within a host drives the extreme specialization of a fig wasp genome
- Jin-Hua Xiao†1,
- Zhen Yue†2,
- Ling-Yi Jia†1, 3,
- Xin-Hua Yang†2,
- Li-Hua Niu†4,
- Zhuo Wang†2,
- Peng Zhang†1,
- Bao-Fa Sun1, 3,
- Shun-Min He1,
- Zi Li1, 3,
- Tuan-Lin Xiong1, 3,
- Wen Xin5,
- Hai-Feng Gu1, 3,
- Bo Wang1, 3,
- John H Werren6,
- Robert W Murphy7, 8,
- David Wheeler6,
- Li-Ming Niu9,
- Guang-Chang Ma9,
- Ting Tang10,
- Sheng-Nan Bian4,
- Ning-Xin Wang4,
- Chun-Yan Yang4,
- Nan Wang4,
- Yue-Guan Fu9,
- Wen-Zhu Li1,
- Soojin V Yi11,
- Xing-Yu Yang11,
- Qing Zhou2,
- Chang-Xin Lu2,
- Chun-Yan Xu3,
- Li-Juan He2,
- Li-Li Yu2,
- Ming Chen2,
- Yuan Zheng2,
- Shao-Wei Wang2,
- Shuang Zhao2,
- Yan-Hong Li2,
- Yang-Yang Yu2,
- Xiao-Ju Qian2,
- Yue Cai2,
- Lian-Le Bian2,
- Shu Zhang2,
- Jun-Yi Wang2,
- Ye Yin2,
- Hui Xiao1,
- Guan-Hong Wang1, 3,
- Hui Yu12,
- Wen-Shan Wu13,
- James M Cook14, 15Email author,
- Jun Wang2Email author and
- Da-Wei Huang1, 4, 10Email author
© Xiao et al.; licensee BioMed Central Ltd. 2013
Received: 12 September 2013
Accepted: 20 December 2013
Published: 20 December 2013
Fig pollinating wasps form obligate symbioses with their fig hosts. This mutualism arose approximately 75 million years ago. Unlike many other intimate symbioses, which involve vertical transmission of symbionts to host offspring, female fig wasps fly great distances to transfer horizontally between hosts. In contrast, male wasps are wingless and cannot disperse. Symbionts that keep intimate contact with their hosts often show genome reduction, but it is not clear if the wide dispersal of female fig wasps will counteract this general tendency. We sequenced the genome of the fig wasp Ceratosolen solmsi to address this question.
The genome size of the fig wasp C. solmsi is typical of insects, but has undergone dramatic reductions of gene families involved in environmental sensing and detoxification. The streamlined chemosensory ability reflects the overwhelming importance of females finding trees of their only host species, Ficus hispida, during their fleeting adult lives. Despite long-distance dispersal, little need exists for detoxification or environmental protection because fig wasps spend nearly all of their lives inside a largely benign host. Analyses of transcriptomes in females and males at four key life stages reveal that the extreme anatomical sexual dimorphism of fig wasps may result from a strong bias in sex-differential gene expression.
Our comparison of the C. solmsi genome with other insects provides new insights into the evolution of obligate mutualism. The draft genome of the fig wasp, and transcriptomic comparisons between both sexes at four different life stages, provide insights into the molecular basis for the extreme anatomical sexual dimorphism of this species.
In symbiosis, different species live together intimately. Symbiosis is responsible for several major transitions in evolution, including the origin of eukaryotes, and it underpins key ecosystem functions like nitrogen fixation and pollination . In this system, a large species (host) usually interacts with a smaller one (symbiont), which may live inside it as an ‘endosymbiont’. Interactions can be antagonistic, when the symbiont harms the host, such as for parasites and most plant-herbivore interactions , or mutualistic, where host and symbiont both benefit from the association and their evolutionary interests are more closely aligned [1, 3]. Obligate herbivore-plant mutualisms are relatively uncommon, but the fig pollinating wasp-fig mutualism is an ancient and stable association that originated about 75 million years ago .
Typically, endosymbionts are host-specific and show specialized adaptations to life inside their hosts [5, 6]. For example, parasites often exhibit a series of morphological reductions. Symbionts may also tend towards gene loss and genome reduction . For example, parasites like tapeworms  and the mutualistic bacterium Buchnera[9, 10] have smaller genomes than their free-living relatives. Many endosymbionts spend most or all of their lives in the host [5, 6] and are vertically transmitted from one host generation to the next, often through eggs or propagules [9–11]. Even when an endosymbiont has a free-living stage, it is often quiescent (for example, tapeworm eggs); transmission to other hosts occurs via host contact or through food or water. Consequently, most obligate endosymbionts, benign or parasitic, live most of their lives in somewhat simplified, relatively predictable environments, allowing selective reduction of their anatomies and genomes [5, 6].
Ceratosolen solmsi is the obligate pollinating wasp of Ficus hispida, and it has the lifestyle and morphological characters typical of fig pollinators (Figures 1 and 2). We investigate how the longstanding mutualism and largely endosymbiotic lifestyle have shaped the genome of this herbivorous insect by deciphering the draft genome, as well as exploring life-staged transcriptomic differences between both sexes. We also test whether this largely endosymbiotic lifestyle has led to the endosymbiont signature of genome reduction, or if a life cycle with dispersive females prevents this from happening. In the latter case, we predict that male morphological reduction largely reflects reduced gene expression. This draft genome provides a valuable new genome resource for hymenopteran insects, and also permits comparisons with other insects to shed new light on the evolution of obligate mutualism.
Results and discussion
Genome assembly and genome size
General assembly statistics for the genome of the fig pollinator, C. solmsi
15,018 (length ≥100 bp)
Average length of contigs
Max contig length
Total length of contigs
276,647,649 bp (length ≥ 100 bp)
Contig size N50
7,397 (length ≥ 100 bp)
Average length of scaffolds
Max scaffold length
Total length of scaffolds
277,939,842 bp (length ≥ 100 bp)
Scaffold size N50
94.52% (estimated size 294,060,873 bp)
Assessment of genome assembly and annotation
Several analyses serve to infer the accuracy and completeness of the genome sequence of C. solmsi. Gene coverage of C. solmsi and some other insect genomes were assessed against 248 core eukaryotic genes. The coverage rates for assembly and the gene-set of C. solmsi are 100% and >88%, respectively, using CEGMA 2.4 . These rates are comparable to other insects (Additional file 1: Tables S2 and S3). Analyses involving our independently sequenced and assembled C. solmsi transcriptome and EST datasets (unpublished EST data from our lab) find that all assemblies cover most of the gene regions (Additional file 1: Table S4).
Repetitive DNA and non-coding RNA
Fig wasps have one of the most AT-rich (69.6%) insect genomes sequenced to date. The genome contains only 27.4 Mbp of repetitive sequences and only 6.4% transposable elements (TEs) (details in Additional file 1: Tables S1 and S5). Consequently, fig wasps have far less repetitive DNA than jewel wasps, which have among the highest abundance of transposable elements in insects , and ants . Nevertheless, fig wasps are quite similar to honeybees in having few repetitive sequences . Our annotation also includes 64 microRNAs, 138 tRNAs, 39 rRNAs, and 19 small nuclear RNAs in the genome of C. solmsi (Additional file 1: Table S6).
Gene annotations, comparative genomics, and natural selection of genes
Phylogenetic analysis based on all single copy orthologous genes provides a good resolution of the phylogenetic relationship among nine sequenced insect genomes. It also dates the divergence of the jewel wasps and fig wasps to approximately 42.8 to 83.4 million years ago (Figure 3). This date is consistent with previous inferences about the origin of fig and pollinator mutualism .
The fig wasp genome has a higher overall Ka/Ks ratio than other insects (mean omega values: C. solmsi, 0.039; Tribolium castaneum, 0.004; N. vitripennis, 0.026; A. mellifera, 0.028; Camponotus floridanus, 0.029; paired Wilcoxon rank sum tests: P <0.0001) (Additional file 1: Figure S3), indicating its faster proteins evolution. All the 157 genes with Ka/Ks > 1 along the branch leading to C. solmsi are analyzed further using the codon-based branch-site tests implemented in PAML , and 13 genes are identified as obviously rapidly evolved, which may be due to positive or relaxed selection (functional annotations of these 13 genes are provided in Additional file 1: Table S8). Eight of these 13 genes either encode transmembrane proteins, or are transmembrane protein-associated; these often act as transporters of signals or substances and are expected to evolve rapidly. For example, CSO_001764 may encode a hippocampus abundant transcript 1 protein, a sugar transporter , while CSO_006481 is a lipid storage droplets surface-binding protein 1-like gene involved in the activation of lipolysis . Interestingly, two positively selected genes, CSO_003961 (rac GTPase-activating protein 1-like) and CSO_005676 (guanine nucleotide exchange factor 2), act together yet contrarily to control the activity of G proteins .
Extreme contraction of gene repertoires in the chemosensory toolbox
Comprehensive list of chemosensory system genes among insects
Fig wasps have only five Gr genes, the smallest repertoire known in insects, and one less than in human body lice, which have been permanent parasites for the past 5 to 7 million years . Among the Gr genes, two orthologs of sugar receptors, CsGr1 and CsGr2 that help insects acquire nutrition, are conserved in other insects. Although CsGr3 and CsGr4 are conserved along with the gene AmGr7 in the honeybee, they are expanded in the genome of the jewel wasp (NvGr48-58). Lineages CsGr4 and NvGr48-50 are chalcidoid-specific (Additional file 1: Figure S4). No orthologs occur for the otherwise conserved carbon dioxide and the bitter receptors Gr genes. Interestingly, transcriptomic data and PCR experiments confirm the presence of one pseudogene (CsGr5PSE), which encodes a highly conserved Gr in all other insects (for example, NvGr3 and DmGr43a, Additional file 1: Figure S4). The gene appears to generally be involve in sensing fructose, and in Drosophila, it is also an important brain fructose receptor to sense hemolymph fructose and promote feeding in hungry flies but suppress feeding in satiated ones . The pseudogenization of this gene in fig wasps is unusual. We also detect six other pseudogenes distributed on different scaffolds, each of which contains a short fragment similar to part of a gustatory receptor gene only. These might be remnants of gene losses.
Fig wasps have 46 Ors (with two pseudogenes), compared to 301 in jewel wasps and 174 in honeybees, and they lack fig wasp-specific Or subfamilies (Additional file 1: Figure S5). Similarly, jewel wasps have 90 OBPs, while honeybees (with 21) and fire ants (with 18) have far fewer , and fig wasps have just seven. The contrast with other herbivorous insects is also striking. For example, Tribolium beetles have 307 Ors and 50 OBPs  and the diamondback moth has 83 Ors and 64 OBPs . The extreme contraction of Or and OBP diversity in fig wasps is convergent with the case of the human body louse , possibly reflecting the common theme of extreme host-specificity.
Fig wasp Gr and Or genes, but not those of jewel wasps or honeybees, appear to be under more relaxed selection as they show a higher ka/ks ratio than single copy orthologous genes (Additional file 1: Table S9). Consistent with the repertoire reductions, this may be associated with the process of host specialization .
Genes involved in the chemosensory system have been suggested to be involved in the evolution of host specialization in some insects, such as pea aphids  and Drosophila sechellia[34, 35]. In fig wasps, sensing other distracting aspects of the environment is probably of trivial consequence, and perhaps it reduces the success of finding a new host. Thus, the Gr, Or, and OBP families in fig wasps appear to have experienced dramatic contractions relative to other insects. Though fig wasps have few Or genes, they still have far more than lice . This may reflect more complex mate-searching, as male fig wasps must locate mates inside galled flowers in the dark cavity of a fig that often contains other species of fig wasps, including abundant and sympatric non-pollinating wasps (Figure 1).
Reduced system of detoxification
In insects, three major groups of enzymes have important functions in the processing of environmental chemicals: glutathione-S-transferases (GSTs), cytochrome P450s (P450s), and carboxylesterases (CCEs). These enzymes play the major roles in disarming toxic xenobiotics ; the P450s and CCEs are also used for clearing signals related to the reception of kairomones and pheromones [37, 38]. Most insects have similar numbers of genes involved in detoxification , except for the honeybee, which has far fewer genes associated with xenobiotic metabolism. Reductions in the honeybee may be related to its specialized eusocial behavior and homeostasis of the nest environment . In the obligate fig-fig pollinator mutualism, the highly specialized ecology and strict host specificity of fig wasps may obviate the need for many detoxification genes.
C. solmsi has only 11 cytosolic GST genes, far fewer than the 19 to 37 of other herbivorous arthropods but similar to the eight of honeybees (Additional file 1: Table S10). Honeybees have the smallest number of P450 genes (with 46) among previously characterized insect genomes. However, the fig wasp has only 34 P450 genes (Additional file 1: Table S10), which cluster according to sequence similarity into mitochondrial CYP, CYP2, CYP3, and CYP4 groups. The dramatic difference in gene members is not due to the CYP2 or CYP mitochondrial groups, which are conserved across different insects, but mainly due to the CYP3 and CYP4 clans, which are important gene members involved in xenobiotic metabolism . For example, the fig wasp has a significant reduction to only 11 basic members in CYP3 clan (Additional file 1: Table S10 and Figure S6). In addition, the genome of C. solmsi has only one-third as many genes in the CYP4 clan as N. vitripennis.
The fig wasp genome has 17 CCE sequences, again the smallest number in any of the insect genomes characterized so far (Additional file 1: Table S10). These genes occur in three major functional classes: dietary/detoxification; hormone/semiochemical processing; and neuro-developmental/cell adhesion . Fig wasps have 11 genes in neuro-developmental/cell adhesion, and the fig wasp, jewel wasp, and honeybee (all hymenopterans) have the same number of genes in each of the six clades in this class. This indicates an ancient origin of these lineages  and suggests that these genes may play vital roles beyond detoxification; they may act as cell adhesion molecules in the development of the nervous system [40, 41]. The other two classes are extremely under-represented in the fig wasp genome. For example, the fig wasp genome contains only four members in the clade of dietary/detoxification genes, and only two members in the clade of hormone/semiochemical processing - hymenopteran b-esterases and juvenile hormone esterases.
Most herbivorous insects have an antagonistic relationship with their host plants , but mutualistic fig wasps develop inside benign hosts. The fig acts as a ‘nursery’ for developing fig wasp larvae, each of which consumes one presumptive seed (Figure 1). Thus, figs (and galls within figs) provide a relatively safe nursery by insulating developing larvae from the external environment and many antagonists. In tune with this, the fig wasp genome shows a marked reduction of its detoxification system relative to other insects. It has not only the fewest cytochrome P450 and carboxylesterase genes of nine arthropod genomes compared, but also far fewer glutathione-S-transferases genes than other insect species.
Reduced cuticular protein genes mainly expressed in late pupa stage
A group of cuticular proteins specific to arthropods (hereafter called CPR), characterized by an extended version of the Rebers and Riddiford (R&R) consensus, a conserved chitin-binding domain of about 63 amino acids , have previously been studied in many insect species and the results indicate that hymenopterans have far fewer CPR genes than dipterans and lepidopterans. The relatively high level of protection afforded for the development of these hymenopteran larvae may account for the trend .
Gene members in innate immune response
Insects have effective immune responses in defense of their exposure to a variety of infections, including bacteria, fungi, viruses, and parasites. The responses involve clotting, phagocytosis, encapsulation, and the production of antimicrobial proteins, with the activation of four signaling pathways: Toll, immunodeficiency (IMD), c-Jun N-terminal kinase (JNK), and Janus kinase/Signal transducers and activators of transcription (JAK/STAT) [43, 44].
Sex-specific gene expression and extreme anatomical sexual dimorphism
Male fig wasps live within the fig for their entire lives and females for about 98% of theirs (Figure 1). The genders exhibit extreme intraspecific morphological divergence in the compound eyes, wings, antennae, body color, and size, as expected given their functional adaptations to different life histories (Figure 2). Females must emerge from syconia and fly to another fig tree to oviposit and pollinate; they need fully developed compound eyes (with ocelli), wings, and antennae to do so. In contrast, males have no compound eyes (ocelli absent), vestigial wings, and shorter antennae because these characters are advantageous for living inside syconia.
Analyses of GO functional enrichment among genes divergently expressed in females and males provide critical insights into sexual dimorphism (Additional file 1: Table S13). Larvae do not have significantly enriched divergent genes, except for some upregulation of genes involved in oxidoreductase activity in males. Few genes are divergently expressed at early pupa stage, when upregulated male genes are enriched in GO functions of catalytic and oxidoreductase activities, and downregulated in channel, receptor, and transmembrane transporter activities. At late pupa and adult stages, expressed genes are distinctly sex-biased and the distributions of up- and downregulated male genes differ. Very few genes are upregulated in late pupa stage and no significant enrichment is detected. Upregulated, enriched genes of adults involve transporter activity functions. Downregulated genes in late pupa and adult stage males are significantly enriched in the structural constitution of ribosomes, and many functions involved in gene translation. Thus, the sexual divergence of gene expression results in female and male morphological dimorphism.
Screening for Horizontal Gene Transfer (HGT)
Many recent studies report the horizontal transfer of genes between bacteria and animals and this may be important source of evolutionary innovation . Given the tight association of fig wasp with its microbes such as Wolbachia, and the long co-evolutionary history with fig trees, fig wasps may have acquired genes from their microbes, or even from fig trees.
Our exhaustive search for genes of bacteria, fungi, viruses, and plants in the genome of fig wasp fails to detect any from either fungi or plants. However, some gene fragments appear to have apparent bacterial or viral origins. A blastp search of the all-protein sequences database detects transferred genes encoding RNA-dependent RNA polymerase (CSO_001922) and ANK repeat protein (CSO_009275). Both genes are intact without introns. CSO_001922 may have been transferred from a virus and CSO_009275, which also occurs in N. vitripennis, appears to be a recent transfer from Wolbachia (Additional file 1: Figures S7 and S8). Thus, fig wasps appear to have acquired a few functional gene fragments and intact genes via lateral gene transfer from bacteria and viruses.
The sequenced fig wasp species is uninfected with Wolbachia, which facilitates our detection of lateral gene transfers from Wolbachia by reducing the risk of contamination. A further blastn search of the fig wasp genome against the whole bacteria database detects 12 small fragments putatively recently acquired from bacteria; their lengths range from 105 bp to 1,509 bp (Additional file 1: Table S14). PCR amplification of the sequences spanning the transferred fragment plus its flanking region confirms one candidate fragment in scaffold 108 that is similar to UDP-N-acetylmuramoylalanine-D-glutamate ligase of Wolbachia endosymbiont wVitB, although it cannot be translated due to a premature stop codon. This fragment may represent an ancient transfer event in an ancestor of C. solmsi infected with Wolbachia, or it may indicate that Wolbachia previously infected C. solmsi, but not now. The transferred fragments and frequency of laterally transferred genes events in the genomes of N. vitripennis and C. solmsi differ substantially, indicating that many of these transfer events happened independently and relatively recently in different lineages .
The fig-fig wasp system is one of the most dramatic and ancient examples of an obligatory herbivore-plant mutualism known. Both figs and fig wasps show morphologies highly specialized to this mutualism. The first draft genome sequence of a fig wasp, that of C. solmsi, reveals how the long-term mutualism has shaped its genome. The genome generally resembles that of other insects in that it has similar gene content to other hymenopterans and other insects in genes such as those encoding for heat shock proteins, circadian rhythm, yellow and royal jelly-like proteins, hox complex, sex determination, and DNA methylation (for further information on these analyses plus an interesting analysis on the nutritional and microbial in the genome, please see Additional file 1). However, importantly, the fig wasp genome shows marked reductions of genes associated with environmental sensing and protection (for example, detoxification). Extreme host-specificity and endosymbiosis in a mutualistic host appears to drive this pattern. Counter-intuitively, complex dispersal is viable with a streamlined chemosensory toolbox, maybe because overwhelming selection pressure involves homing in on the correct host during the short adult lives of females; the sensing of other aspects of a complex external environment is of little importance, and might even be distracting. Comparisons of transcriptomes between female and male fig wasps at the key life stages of larva, early pupa, late pupa, and adult indicate that extreme anatomical sexual dimorphism likely results from a strong bias in sex-specific gene expression. Our analyses facilitate a deeper understanding of the biology of fig wasps, and also provide new insights into the evolution and genomic adaptations of mutualisms.
Materials and methods
We chose three trees of Ficus hispida in Danzhou (19° 30’ N, 109° 31’ E), Hainan province, China for inoculation of the fig pollinator species (Ceratosolen solmsi). Samples for genomic DNA extraction were collected from June to August in 2010. In each inoculation experiment, naturally growing figs were covered with mesh bags from their very early developmental stages to exclude all insects, including fig wasps. One mated female (‘mother’) fig wasp was introduced into each bagged fig to lay eggs. About 1 month later, when the offspring were mature, we transferred several mated female daughter fig wasps into other bagged figs at their receptive stages. Following this second generation of development, we collected all male grandson fig wasps for genome sequencing. In fig wasps, like other hymenopterans, males are haploid and provide better targets than diploid females for genome sequencing projects. These processes reduced genomic heterozygosity and, thus, improved the quality of assembly. After thoroughly washing with double-distilled water, we immediately froze the samples in liquid nitrogen and transported them to the laboratory on dry ice. DNA extraction occurred immediately on arrival.
The following groups of samples were selected for transcriptome analyses: (1) larval female-16th day (larva), (2) larval male-16th day (larva), (3) pupal female-21th day (early pupa), (4) pupal male-21th day (early pupa), (5) pupal female-25th day (late pupa), (6) pupal male-25th day (late pupa), (7) adult female-29th day, and (8) adult male-29th day. The days indicate time since the eggs were laid. For sample groups 1 to 6, we used only one fig wasp for each RNA extraction. For each adult group, we used 50 individuals. All RNA extractions occurred immediately after collection.
About 500 male fig wasps were divided into 10 samples of 50 individuals each and used for DNA extraction using a method modified from the protocol developed by J. Rehm, in the Berkeley Drosophila Genome Project . Briefly, 50 fig wasps were completely homogenized in 400 μL Buffer A (100 mM Tris–HCl, pH 7.5; 500 mM EDTA; 100 mM NaCl; 0.5% SDS). A total of 3 μL RNaseA was added to the homogenate, followed by 2 h incubation at 37°C. Then 3 μL Proteinase K was added to the mixture, followed by 2 h incubation at 58°C. Next, 800 μL of LiCL/KAc solution (5 M KAc and 6 M LiCl) was added before the tube was incubated on ice for 10 min. Then the mixture was centrifuged at 14,000 g for 15 min at 4°C, and 1 mL of supernatant was transferred into a new 2 mL tube. To precipitate the genomic DNA from the supernatant, 0.8 mL ice isopropanol was added, and centrifuged at 14,000 g for 15 min at 4°C. The supernatant was then aspirated. The DNA pellet was washed with 70% ethanol, followed by drying for 5 min. The DNA was dissolved in 50 μL of TE buffer and stored at −80°C.
Total RNA was isolated using the RNeasy® Micro Kit (Qiagen, Shanghai, China) and treated with DNase (Qiagen, Shanghai, China). A NanoDrop ND-1000 Spectrophotometer (Nano-Drop Technologies, Wilmington, DE, USA) was used to confirm adequate RNA concentration and A260/A280 ratio. RNA was dissolved in 20 μL RNase-free water and kept at −80°C. Larval females and males that had no distinct morphological divergence were discriminated by the variable splicing pattern of the sex determination gene doublesex. The procedure used 50 ng dissolved RNA of larva fig wasp to synthesize first-strand cDNA by priming with oligo(dT) with TransScript® II First-Strand cDNA Synthesis SuperMix (TransGen Biotech, Beijing, China). The sex of larva individual was then confirmed by PCR of the male-specific splice isoform of doublesex.
Shotgun libraries construction and sequencing
Genomic DNA was sheared into fragments and seven libraries were constructed with inserted fragment sizes ranging from 200 bp, 500 bp, 800 bp, 2 kb, 5 kb, and 10 kb to 20 kb by the manufacturer’s library kit (Illumina) . A PCR-free library was also constructed. The libraries were sequenced using the Illumina-HiSeq™ 2000 platform with paired-end sequencing approaches.
For RNA-seq, beads with oligo(dT) were used to isolate poly(A) mRNA. Fragmentation buffer was then added for cutting mRNA into short fragments, which were used as templates. Random hexamer primers were used to synthesize first-strand cDNA. Second-strand cDNA was synthesized using a mixture of buffer, dNTPs, RNase H, and DNA polymerase I. Short fragments were purified with QiaQuick PCR extraction kits and resolved with EB buffer for end repair and addition of poly(A). Next, the short fragments were connected with sequencing adaptors. For amplification with PCR, we selected suitable fragments as templates based on agarose gel electrophoresis. Finally, the libraries were sequenced using an Illumina HiSeq™ 2000. RNA-seq for abdomen of adult Drosophila willistoni was downloaded and compared between female and male .
construct contig: split the short-insert size library data into 43-mers and construct a de Bruijn graph. Next, obtain parameters from a simplified graph. Finally, connect the 43-mer path to get contigs;
construct scaffold: realign all usable reads onto contig sequences, then calculate the amount of shared paired-end relationships between each pair of contigs, weight the rate of consistent and conflicting paired-ends, and then construct scaffolds;
fill gaps: use the paired-end information from the short insert size library and the PCR-free library to retrieve read-pairs that had one end mapped to a unique contig and the other end located in the gap region. We then carried out a local assembly of the collected reads to fill the gaps using Gapcloser.
To evaluate the assembly, we employed CEGMA and EST evaluations. For CEGMA (version 2.4)  evaluation, we used 248 ultra-conserved core eukaryotic genes (CEGs) that were widely distributed and conserved in species to assess the completeness of genome assembly and gene-set. The CEGMA evaluation utilized several software packages, including tblastn (blast-2.2.25), genewise (wise2.2.3), hmmer (hmmer-3.0), and geneid (geneid v1.4). Four insect genomes including C. solmsi were compared. For the EST evaluation, we used BLAT to map Sanger-sequenced ESTs or Trinity  assembled tgicl  clustered unigenes to the C. solmsi genome assembly. Then we calculated genome coverage using both the percentage of bases covered by ESTs and the percentage of scaffold numbers with >90% or 50% covered by ESTs. Eight transcriptome datasets were assembled separately by Trinity and then clustered to remove redundancy by tgicl to get unigene sequences before evaluation.
Multiple approaches were used to predict gene structures in this genome including de novo, homology-based, EST and RNA-seq based predictions. De novo prediction was performed based on the repeat-masked genome and with the help of the HMM model using AUGUSTUS , GENSCAN, and SNAP. Homologous proteins of the following species were mapped to the genome using tblastn with an E-value cutoff 1e-5: Homo sapiens (H. sap): data downloaded from ; Apis mellifera (A. mel): data downloaded from : Bombyx mori (B. mor): data downloaded from ; Drosophila melanogaster (D. mel): data downloaded from ; and Nasonia vitripennis (N. vit): data downloaded from . The aligned sequences, as well as their corresponding query proteins, were then filtered and passed to GeneWise to search for accurately spliced alignments. ESTs (unpublished data in our lab) were aligned to the genome using BLAT to generate spliced alignments. The alignments were then linked according to overlap using PASA. Source evidence generated from the above three approaches was integrated by GLEAN to produce a consensus gene set.
To improve the integrity and correctness of the genome, transcriptome reads were aligned against the genome using TopHat to identify candidate exon regions and the donor and acceptor sites. Cufflinks was employed to assemble the alignments into transcripts. Then, based on assembled candidate transcript sequences, ORFs were predicted to get reliable transcripts by using HMM-based training parameters. Finally, we combined the GLEAN set with the transcripts from RNA-seq to generate a confident gene set.
Gene function annotation
Gene functions were assigned according to the best match of the alignments based on blastp to the databases SwissProt (release2011 01)  and TrEMBL (release2011 01). The motifs and domains of genes were determined by InterProScan (iprscan 4.7)  against protein databases such as ProDom, PRINTS, Pfam, SMART, PANTHER, and PROSITE. Gene Ontology (GO)  IDs for each gene were obtained from the corresponding InterPro entries. All genes were aligned against KEGG (release54)  proteins, and the pathway in which the gene might be involved was inferred from matched genes.
Annotation of repeats and non-coding RNA
Initially, non-interspersed repetitive regions (including simple repeats, satellites, and low complexity repeats) were predicted by RepeatMasker  with the ‘-noint’ option. These tandem repeats were also annotated using Tandem Repeats Finder (v.4.04) with parameters of ‘Match = 2, Mismatch = 7, Delta = 7, PM = 80, PI = 10, Minscore = 50, and MaxPeriod = 2000’ .
Implementing a homology strategy, we identified known transposable elements (TEs) against the Repbase database (v.20120418) in the genome of C. solmsi using RepeatMasker v.open-3.3.0 (ab-blast engine , with parameters ‘-nolow, -no_is -norna, -parallel 1 -s’)  and RepeatProteinMask (with parameters ‘-noLowSimple, -pvalue 0.0001’) at the DNA and protein level, respectively .
A de novo repeat library was also generated using RepeatModeler (v1.0.5)  and PILER-DF , and a RepeatMasker analysis against the final non-redundant library was performed again to find homologs in the genome and to classify the found repeats.
We searched the whole genome sequence to detect four types of non-coding RNAs. Employment of tRNAscan-SE identified reliable tRNA positions. We searched for small nuclear RNAs and microRNAs using a two-step method: sequences were aligned with blast and then searched with INFERNAL against the Rfam database (release 9.1) . The rRNAs were found by aligning with blastn against a ref rRNA sequence from the closest related species.
Orthologous gene clusters and the phylogeny of arthropods
We identified gene families using TreeFam  and the following steps: first, blastp was used to compare all the protein sequences of 10 species including C. solmsi. The E-value threshold was set to 1e-7; second, HSP segments of each protein pair were concatenated by solar, H-scores were computed based on Bit-scores, and these were taken to evaluate the similarity between genes; finally, gene families were identified by clustering of homologous gene sequences using hcluster_sg. Genes specific to C. solmsi were those that did not cluster with other arthropods chosen for gene family construction, and those that did not have homologs in the predicted gene repertoire of the compared genomes (Figure 3). However, these genes could have GO annotation if they had the functional motifs. The motifs and domains of these genes were determined by InterProScan (iprscan 4.7)  against protein databases such as ProDom, PRINTS, Pfam, SMART, PANTHER, and PROSITE. GO IDs of each gene were obtained from the corresponding InterPro entries, from which we also obtained gene functional enrichment.
Single-copy orthologous genes were used to reconstruct the phylogeny. CDS sequences from each gene were aligned using MUSCLE and protein sequences were concatenated to form one super gene for each species. Codon position 2 of aligned CDS sequences was extracted for subsequent analysis. PhyML  was used to construct the phylogeny using the GTR substitution model and gamma distribution rates model across sites. Branch reliability was assessed via aLRT values.
Divergence times were estimated using PAML mcmctree while implementing the approximate likelihood calculation method. Gamma prior and alpha parameters were computed based on the substitution rate per time unit estimated by PAML baseml. We ran mcmctree to sample 10,000 times, with sampling frequency set to 5,000, and burnin parameter set to 5,000,000 using correlated molecular clock and REV substitution model. Finetune parameters were set to make acceptance proportions fall in range (0.2, 0.4). Other parameters were the defaults. Convergence of results was checked by Tracer and two independent runs were performed to confirm convergence.
Gene family expansion and contraction
We identified gene families using CAFE , which employed a random birth and death model to study gene gains and losses in gene families across a user-specified phylogeny. The global parameter λ, which described both the gene birth (λ) and death (μ = −λ) rate across all branches in the tree for all gene families, was estimated using maximum likelihood. A conditional P value was calculated for each gene family, and families with conditional P values less than threshold (0.0001) were considered as having an accelerated rate of gain or loss. We identified branches responsible for low overall P values of significant families.
Evolutionary rates of genes
We calculated ka/ks ratios for all single copy orthologs of C. solmsi, Nasonia vitripennis, Apis mellifera, Camponotus floridanus, and Tribolium castaneum. Alignment quality was essential for estimating positive selection. Thus, orthologous genes were first aligned by PRANK , a good alignment tool for studies of molecular evolution. We used Gblocks  to remove ambiguously aligned blocks within PRANK alignments. We employed ‘codeml’ in the PAML package  with the free-ratio model to estimate Ka, Ks, and Ka/Ks ratios on different branches. The difference in mean Ka/Ks ratios for single-copy orthologous genes between C. solmsi and each of the other species were compared with paired Wilcoxon rank sum tests.
Genes that showed values of Ka/Ks higher than 1 along the branch leading to C. solmsi were reanalyzed using the codon based branch-site tests implemented in PAML [23, 82]. The branch-site model, which allowed ω to vary both among sites in the protein and across branches, was used to detect episodic positive selection. Test 1 (M1a vs. branch site model) and test 2 (branch site null model vs. branch site model), which differentiated positive selection from the relaxation of selective constrains, were used. The pairwise comparisons M1a vs. branch-site model, and branch-site model (model = 2, NSsites = 2) vs. branch-site null model (fixed ω = 1 and ω = 1) were used to perform likelihood ratio tests (LRTs). Their significance was evaluated using a χ2 distribution. When the LRT was significant, a Bayes Empirical Bayes (BEB) analysis was conducted to identify putatively positively selected sites, which may also be relaxed selected sites though.
Manual annotation and evolutionary analyses of interested genes
For genes requiring greater annotation, protein homologs of N. vitripennis, A. mellifera, and sometimes D. melanogaster were collected from NCBI, Hymenoptera Genome Database , and FlyBase . Both tblastn and blastp searches were performed for candidate genes in the assembled genome of C. solmsi. The annotated protein set used an E-threshold 0.005. The threshold was raised when protein sequences were short and few blast hits were found. A blast of candidate genes to the NCBI non-redundant (nr) protein database confirmed their orthology. IGV browser was used to view gene annotations, EST, and RNA-seq BAM alignments in the genome of C. solmsi. Gene models were refined manually according to RNA evidence and tblastn results conducted with the assistance of custom perl scripts. Pseudogenes and irregular features such as missing start codons, stop codons, and other anomalies were noted. For annotation of cuticular proteins with an R&R consensus, the ‘chitin_bind_4’ domain was required. Proteins containing cysteines were removed unless the cysteines lay in signal peptide regions, which were identified by SignalP . For P450s, gene models for C. solmsi were searched by tblastn and blastp against D. melanogaster, A. mellifera, and N. vitripennis CYP sequences representing the CYP2, 3, 4 and mitochondrial P450 clans (E-value cutoff = 10-4). All models with predicted proteins that included the canonical heme binding sequence were manually verified for the presence of the other key features of P450 enzymes; the gene model was corrected whenever necessary (incorrect predictions such as fusions with adjacent genes or fragmentation) or possible (when RNA-seq sequences were available). Final gene models were confirmed by blasting back to the reference gene set to confirm reciprocal best hits. The obtained gene models were inspected and, if necessary, edited. Care was taken to ensure that the predicted gene structures matched corresponding transcriptomic data. Pseudogenes and gene fragments (detritus exons) were separated from putative full-length CYP coding sequences. To annotate genes involved in the development of eyes and wings, proteins putatively participating in development of eyes and wings described for D. melanogaster were used as query sequences. These queries were used in blastp and tblastn searches (E-value cutoff = 10-4) against the protein predictions and scaffolds of the C. solmsi genome. Iterative searches were also conducted with each new protein of C. solmsi as a query until no new genes were identified in each major subfamily or lineage.
To further understand the evolutionary history and homologies between gene families of C. solmsi and other insects, we performed a phylogenetic analysis using the genes found in C. solmsi and some other insect taxa with completed genomes: A. mellifera, N. vitripennis, and D. melanogaster. The amino acid sequences of homologous genes were aligned with ClustalX v2.0 . ProtTest  identified evolutionary models that best fit this dataset according to the Akaike Information Criterion. A maximum likelihood tree was then reconstructed with PhyML 3.0 using the best-fit model with a gamma correction using four discrete classes, an estimated alpha parameter and proportion of invariable sites . Node support values were obtained by the rapid bootstrap algorithm as implemented in PhyML 3.0 (100 replicates). Some tree images were created using the iTOL web server . Gray circles on branches were used to indicate bootstrap values >80% from 100 bootstrap replicates.
We tested for selection on Gustatory receptor (Gr) and Olfactory receptor (Or) genes. The calculation of each Ka/Ks value of Gr or Or gene were based on each orthologous group of Gr or Or gene members among A. mellifera, N. vitripennis, and C. solmsi. The difference between mean Ka/Ks of Or plus Gr genes and all single-copy genes was compared using paired Wilcoxon rank sum tests. This determined if Or and Gr genes underwent different selective pressures than single-copy genes.
Validation of extremely contracted CCE, OBP, and Gr gene families
Since the fig wasp had many fewer genes than other insect species, we performed additional analyses to confirm that the absence of genes is not due to a poor or incomplete assembly or inadequate annotation. Validation of assembly quality was fully analyzed and described above (see genome assembly). We also tried to confirm the absence of genes by focusing on the annotations of the gene families CCE, OBP, and Gr, in which the fig wasp had the fewest gene members among the studied insects, by delving into the raw genomic reads.
We chose the library with the inserted fragment size of 500 bp (altogether 68,549,132 pair end reads after correction), which was 25× coverage of the genome, and aligned it to the assembled genome of C. solmsi by SOAP  (default except: -v, 5; -g, 3). Altogether, 91.2% of reads mapped to the genome. We then compared the unmapped reads (= 6,042,453) to the protein sequences of all CCE, OBP, and Gr members of the fig wasp, jewel wasp, and honeybee by blastx (references given in Table 2 and Additional file 1: Table S10) with e-value of 1e-5. Only one read mapped to gene AmGr9 (a Gr gene member in the honeybee), which indicated that the unmapped reads scarcely had any similar sequences in the three gene families. Thus, the absence of genes in these gene families was not due to incomplete annotation.
Detection of horizontal gene transfer
Two independent approaches were used to identify possible HGT events. The first used gene models. We used blastp (E-value cutoff 10-10 and a continuous overlap threshold of 33%) to compare the predicted protein sequences of C. solmsi against sequences in the RefSeq and nr databases to exclude unique genes and those with high similarity to other insects. Next, we constructed phylogenies for the retained proteins with highest similarity to non-insects. Multiple alignments were performed by using ClustalW2 , followed by manual refinement. Phylogenetic analyses were conducted using maximum likelihood (ML) and Bayesian inference (BI). A distance-based phenogram was also constructed using neighbor joining (NJ). ML trees were estimated by PhyML  using best-fit substitution model estimated by Prottest 3.0 . In all cases, we used a discrete gamma-distribution model with four rate-categories plus invariant positions. The gamma parameter and proportion of invariant sites were estimated from the data. Bootstrap branch-support values involved 1,000 pseudoreplicates. BI used MrBayes 3.1.2 . For each HGT, we ran two independent analyses using four MCMC chains (one cold and three hot) for one million generations and stopped them when the average deviation of split frequencies fell well below 0.01. We sampled trees every 100 generations and discarded the initial 25% of the total trees as burn-in. Compatible groups were shown in the majority rule consensus tree. NJ trees were constructed by using Neighbor in Mega5 . Bootstrap values were obtained by generating 1,000 pseudoreplicates. HGTs were detected by clustering non-related species on a well-supported node.
Using the scaffold sets of C. solmsi, we identified regions involved in recent HGT events between bacteria, fungi, plants, and other non-insects to C. solmsi. The pipeline involved blastn of C. solmsi against other non-insects. Usually, this approach has detected two categories of genes: candidate HGT events; and highly conserved genes shared by non-insects and C. solmsi (not HGTs). Thus, the pipeline formed two categories based on the following certain criteria:
HGT_scaf: non-insects score > animal score AND scaffold length > 5 K AND range_per < 90%;
highly_cons: best non-insects score < insect score.
The scaffolds with both non-insect and insect sequences served as evidence of non-contamination.
Data for this Whole Genome Shotgun project were deposited at DDBJ/EMBL/GenBank under the accession no. ATAC00000000. The version described in this paper is version ATAC01000000. The transcriptome data reported in this paper were deposited in the National Center for Biotechnology Information Short Read Archive  under the accession no. SRP029703.
Horizontal gene transfer
Janus kinase/signal transducers and activators of transcription
c-jun N-terminal kinase
Odorant binding protein
We thank Prof. Hugh M. Robertson in University of Illinois at Urbana-Champaign for helping in manual annotation of chemoreceptor genes. Thanks go to Prof. En-Kui Duan in Institute of Zoology, Chinese Academy of Sciences for helping in the analysis of the ommatidium structure of fig wasp. This project was supported by the National Natural Science Foundation of China (NSFC grant no. 31090253, 31172072, 31210103912), partially by Major Innovation Program of Chinese Academy of Sciences (KSCX2-EW-Z-2), by Program of Ministry of Science and Technology of the Republic of China (2012FY111100, 2011FY120200), by a grant (No. O529YX5105) from the Key Laboratory of the Zoological Systematics and Evolution of the Chinese Academy of Sciences, and by National Science Fund for Fostering Talents in Basic Research (Special subjects in animal taxonomy, NSFC- J1210002). Manuscript preparation was supported by a Discovery Grant from the Natural Sciences and Engineering Research Council of Canada (3148) and by a Visiting Professorship for Senior International Scientists from the Chinese Academy of Sciences to R.W. Murphy.
- Leigh EG: The evolution of mutualism. J Evol Biol. 2010, 23: 2507-2528. 10.1111/j.1420-9101.2010.02114.x.PubMedView ArticleGoogle Scholar
- You M, Yue Z, He W, Yang X, Yang G, Xie M, Zhan D, Baxter SW, Vasseur L, Gurr GM, Douglas CJ, Bai J, Wang P, Cui K, Huang S, Li X, Zhou Q, Wu Z, Chen Q, Liu C, Wang B, Li X, Xu X, Lu C, Hu M, Davey JW, Smith SM, Chen M, Xia X, Tang W, et al: A heterozygous moth genome provides insights into herbivory and detoxification. Nat Genet. 2013, 45: 220-225. 10.1038/ng.2524.PubMedView ArticleGoogle Scholar
- Herre EA, Knowlton N, Mueller UG, Rehner SA: The evolution of mutualisms: exploring the paths between conflict and cooperation. Trends Ecol Evol. 1999, 14: 49-53. 10.1016/S0169-5347(98)01529-8.PubMedView ArticleGoogle Scholar
- Cruaud A, Rønsted N, Chantarasuwan B, Chou LS, Clement WL, Couloux A, Cousins B, Genson G, Harrison RD, Hanson PE, Hossaert-Mckey M, Jabbour-Zahab R, Jousselin E, Kerdelhue C, Kjellberg F, Lopez-Vaamonde C, Peebles J, Peng YQ, Pereira RAS, Schramm T, Ubaidillah R, VanNoort S, Weiblen GD, Yang DR, Yodpinyanee A, Libeskind-Hadas R, Cook JM, Rasplus JY, Savolainen V: An extreme case of plant–insect codiversification: figs and fig-pollinating wasps. Syst Biol. 2012, 61: 1029-1047. 10.1093/sysbio/sys068.PubMedPubMed CentralView ArticleGoogle Scholar
- Price PW: Evolutionary Biology of Parasites. 1980, Princeton, NJ: Princeton University PressGoogle Scholar
- Thompson JN: The Coevolutionary Process. 2009, Chicago, IL: University of Chicago PressGoogle Scholar
- Ellers J: Toby Kiers E, Currie CR, McDonald BR, Visser B: Ecological interactions drive evolutionary loss of traits. Ecol Lett. 2012, 15: 1071-1082. 10.1111/j.1461-0248.2012.01830.x.PubMedView ArticleGoogle Scholar
- Tsai IJ, Zarowiecki M, Holroyd N, Garciarrubio A, Sanchez-Flores A, Brooks KL, Tracey A, Bobes RJ, Fragoso G, Sciutto E, Aslett M, Beasley H, Bennett HM, Cai J, Camicia F, Clark R, Cucher M, De Silva N, Day TA, Deplazes P, Estrada K, Fernandez C, Holland PWH, Hou J, Hu S, Huchvale T, Hung SS, Kamenetzky L, Keane JA, Kiss F, et al: The genomes of four tapeworm species reveal adaptations to parasitism. Nature. 2013, 496: 57-63. 10.1038/nature12031.PubMedPubMed CentralView ArticleGoogle Scholar
- Shigenobu S, Watanabe H, Hattori M, Sakaki Y, Ishikawa H: Genome sequence of the endocellular bacterial symbiont of aphids Buchnera sp. APS. Nature. 2000, 407: 81-86. 10.1038/35024074.PubMedView ArticleGoogle Scholar
- van Ham RCHJ, Kamerbeek J, Palacios C, Rausell C, Abascal F, Bastolla U, Fernández JM, Jiménez L, Postigo M, Silva FJ, Tamames J, Viguera E, Latorre A, Valencia A, Moran F, Moya A: Reductive genome evolution in Buchnera aphidicola. Proc Natl Acad Sci USA. 2003, 100: 581-586. 10.1073/pnas.0235981100.PubMedPubMed CentralView ArticleGoogle Scholar
- Klasson L, Walker T, Sebaihia M, Sanders MJ, Quail MA, Lord A, Sanders S, Earl J, O’Neill SL, Thomson N, Sinkins SP, Parkhill J: Genome evolution of Wolbachia strain wPip from the Culex pipiens group. Mol Biol Evol. 2008, 25: 1877-1887. 10.1093/molbev/msn133.PubMedPubMed CentralView ArticleGoogle Scholar
- Cook JM, Rasplus J-Y: Mutualists with attitude: coevolving fig wasps and figs. Trends Ecol Evol. 2003, 18: 241-248. 10.1016/S0169-5347(03)00062-4.View ArticleGoogle Scholar
- Weiblen GD: How to be a fig wasp. Annu Rev Entomol. 2002, 47: 299-330. 10.1146/annurev.ento.47.091201.145213.PubMedView ArticleGoogle Scholar
- Jevanandam N, Goh AGR, Corlett RT: Climate warming and the potential extinction of fig wasps, the obligate pollinators of figs. Biol Lett. 2013, 9: 20130041-10.1098/rsbl.2013.0041.PubMedPubMed CentralView ArticleGoogle Scholar
- Ahmed S, Compton SG, Butlin RK, Gilmartin PM: Wind-borne insects mediate directional pollen transfer between desert fig trees 160 kilometers apart. Proc Natl Acad Sci USA. 2009, 106: 20342-20347. 10.1073/pnas.0902213106.PubMedPubMed CentralView ArticleGoogle Scholar
- Kirkness EF, Haas BJ, Sun W, Braig HR, Perotti MA, Clark JM, Lee SH, Robertson HM, Kennedy RC, Elhaik E, Gerlach D, Kriventseva EV, Elsik CG, Graur D, Hill CA, Veenstra JA, Walenz B, Tubio JMC, Ribeiro JMC, Rozas J, Johnston JS, Reese JT, Popadic A, Tojo M, Raoult DL, Reed DL, Tomoyasu Y, Kraus E, Mittapalli O, Margam VM, et al: Genome sequences of the human body louse and its primary endosymbiont provide insights into the permanent parasitic lifestyle. Proc Natl Acad Sci USA. 2010, 107: 12168-12173. 10.1073/pnas.1003379107.PubMedPubMed CentralView ArticleGoogle Scholar
- Honeybee Genome Sequencing Consortium: Insights into social insects from the genome of the honeybee Apis mellifera. Nature. 2006, 443: 931-949. 10.1038/nature05260.View ArticleGoogle Scholar
- Werren JH, Richards S, Desjardins CA, Niehuis O, Gadau J, Colbourne JK, Group TNGW: Functional and evolutionary insights from the genomes of three parasitoid Nasonia species. Science. 2010, 327: 343-348. 10.1126/science.1178028.PubMedView ArticleGoogle Scholar
- Parra G, Bradnam K, Ning Z, Keane T, Korf I: Assessing the gene space in draft genomes. Nucleic Acids Res. 2009, 37: 289-297. 10.1093/nar/gkn916.PubMedPubMed CentralView ArticleGoogle Scholar
- Suen G, Teiling C, Li L, Holt C, Abouheif E, Bornberg-Bauer E, Bouffard P, Caldera EJ, Cash E, Cavanaugh A, Denas O, Elhaik E, Fave MJ, Gadau JD, Graur D, Graur D, Grubbs KJ, Hagen DE, Harkins TT, Helmkampf M, Hu H, Johnson BR, Kim J, Marsh SE, Moeller JA, Munoz-Torres MC, Murphy MC, Naughton MC, Nigam S, Overson R, et al: The genome sequence of the leaf-cutter ant Atta cephalotes reveals insights into its obligate symbiotic lifestyle. PLoS Genet. 2011, 7: e1002007-10.1371/journal.pgen.1002007.PubMedPubMed CentralView ArticleGoogle Scholar
- Schulze SR, Wallrath LL: Gene regulation by chromatin structure: paradigms established in Drosophila melanogaster. Annu Rev Entomol. 2007, 52: 171-192. 10.1146/annurev.ento.51.110104.151007.PubMedView ArticleGoogle Scholar
- Fazzari M, Greally J: Introduction to epigenomics and epigenome-wide analysis. Statistical Methods in Molecular Biology, Volume 620. Edited by: Bang H, Zhou XK, Epps HL, Mazumdar M. 2010, New York: Humana Press, 243-265.View ArticleGoogle Scholar
- Yang Z: PAML 4: phylogenetic analysis by maximum likelihood. Mol Biol Evol. 2007, 24: 1586-1591. 10.1093/molbev/msm088.PubMedView ArticleGoogle Scholar
- Matsuo N, Kawamoto S, Matsubara K, Okubo K: Cloning of a cDNA encoding a novel sugar transporter expressed in the neonatal mouse hippocampus. Biochem Biophys Res Commun. 1997, 238: 126-129. 10.1006/bbrc.1997.7252.PubMedView ArticleGoogle Scholar
- Patel RT, Soulages JL, Hariharasundaram B, Arrese EL: Activation of the lipid droplet controls the rate of lipolysis of triglycerides in the insect fat body. J Biol Chem. 2005, 280: 22624-22631. 10.1074/jbc.M413128200.PubMedView ArticleGoogle Scholar
- Bourne HR, Sanders DA, McCormick F: The GTPase superfamily: a conserved switch for diverse cell functions. Nature. 1990, 348: 125-132. 10.1038/348125a0.PubMedView ArticleGoogle Scholar
- Ware AB, Kaye PT, Compton SG, van Noort S: Fig volatiles: their role in attracting pollinators and maintaining pollinator specificity. Plant Syst Evol. 1993, 186: 147-156. 10.1007/BF00940794.View ArticleGoogle Scholar
- Kaupp UB: Olfactory signalling in vertebrates and insects: differences and commonalities. Nat Rev Neurosci. 2010, 11: 188-200.PubMedGoogle Scholar
- Miyamoto T, Slone J, Song X, Amrein H: A fructose receptor functions as a nutrient sensor in the Drosophila brain. Cell. 2012, 151: 1113-1125. 10.1016/j.cell.2012.10.024.PubMedPubMed CentralView ArticleGoogle Scholar
- Gotzek D, Robertson HM, Wurm Y, Shoemaker D: Odorant binding proteins of the red imported fire ant. Solenopsis invicta: an example of the problems facing the analysis of widely divergent proteins. PLoS ONE. 2011, 6: e16289-10.1371/journal.pone.0016289.PubMedPubMed CentralView ArticleGoogle Scholar
- Tribolium Genome Sequencing Consortium: The genome of the model beetle and pest Tribolium castaneum. Nature. 2008, 452: 949-955. 10.1038/nature06784.View ArticleGoogle Scholar
- Smadja C, Shi P, Butlin RK, Robertson HM: Large gene family expansions and adaptive evolution for odorant and gustatory receptors in the pea aphid, Acyrthosiphon pisum. Mol Biol Evol. 2009, 26: 2073-2086. 10.1093/molbev/msp116.PubMedView ArticleGoogle Scholar
- Smadja CM, Canback B, Vitalis R, Gautier M, Ferrari J, Zhou JJ, Butlin RK: Large-scale candidate gene scan reveals the role of chemoreceptor genes in host plant specialization and speciation in the pea aphid. Evolution. 2012, 66: 2723-2738. 10.1111/j.1558-5646.2012.01612.x.PubMedView ArticleGoogle Scholar
- McBride CS: Rapid evolution of smell and taste receptor genes during host specialization in Drosophila sechellia. Proc Natl Acad Sci USA. 2007, 104: 4996-5001. 10.1073/pnas.0608424104.PubMedPubMed CentralView ArticleGoogle Scholar
- Matsuo T: Genes for host-plant selection in Drosophila. J Neurogenet. 2008, 22: 195-210. 10.1080/01677060802298483.PubMedView ArticleGoogle Scholar
- Li X, Schuler MA, Berenbaum MR: Molecular mechanisms of metabolic resistance to synthetic and natural xenobiotics. Annu Rev Entomol. 2007, 52: 231-253. 10.1146/annurev.ento.51.110104.151104.PubMedView ArticleGoogle Scholar
- Isin EM, Guengerich FP: Complex reactions catalyzed by cytochrome P450 enzymes. BBA-General Subjects. 2007, 1770: 314-329. 10.1016/j.bbagen.2006.07.003.PubMedView ArticleGoogle Scholar
- Ishida Y, Leal WS: Rapid inactivation of a moth pheromone. Proc Natl Acad Sci USA. 2005, 102: 14075-14079. 10.1073/pnas.0505340102.PubMedPubMed CentralView ArticleGoogle Scholar
- Oakeshott JG, Johnson RM, Berenbaum MR, Ranson H, Cristino AS, Claudianos C: Metabolic enzymes associated with xenobiotic and chemosensory responses in Nasonia vitripennis. Insect Mol Biol. 2010, 19: 147-163.PubMedView ArticleGoogle Scholar
- Biswas S, Russell RJ, Jackson CJ, Vidovic M, Ganeshina O, Oakeshott JG, Claudianos C: Bridging the synaptic gap: Neuroligins and Neurexin I in Apis mellifera. PLoS ONE. 2008, 3: e3542-10.1371/journal.pone.0003542.PubMedPubMed CentralView ArticleGoogle Scholar
- Claudianos C, Ranson H, Johnson RM, Biswas S, Schuler MA, Berenbaum MR, Feyereisen R, Oakeshott JG: A deficit of detoxification enzymes: pesticide sensitivity and environmental response in the honeybee. Insect Mol Biol. 2006, 15: 615-636. 10.1111/j.1365-2583.2006.00672.x.PubMedPubMed CentralView ArticleGoogle Scholar
- Rebers JE, Willis JH: A conserved domain in arthropod cuticular proteins binds chitin. Insect Biochem Mol Biol. 2001, 31: 1083-1093. 10.1016/S0965-1748(01)00056-X.PubMedView ArticleGoogle Scholar
- Boutros M, Agaisse H, Perrimon N: Sequential activation of signaling pathways during innate immune responses in Drosophila. Dev Cell. 2002, 3: 711-722. 10.1016/S1534-5807(02)00325-8.PubMedView ArticleGoogle Scholar
- Lemaitre B, Hoffmann J: The host defense of Drosophila melanogaster. Annu Rev Immunol. 2007, 25: 697-743. 10.1146/annurev.immunol.25.022106.141615.PubMedView ArticleGoogle Scholar
- Tian C, Gao B, Fang Q, Ye G, Zhu S: Antimicrobial peptide-like genes in Nasonia vitripennis: a genomic perspective. BMC Genomics. 2010, 11: 187-10.1186/1471-2164-11-187.PubMedPubMed CentralView ArticleGoogle Scholar
- Syvanen M: Evolutionary implications of horizontal gene transfer. Annu Rev Genet. 2012, 46: 341-358. 10.1146/annurev-genet-110711-155529.PubMedView ArticleGoogle Scholar
- DeWayne SD, Machado CA, Molbo D, Werren JH, Windsor DM, Herre EA: The distribution of Wolbachia in fig wasps: correlations with host phylogeny, ecology and population structure. Proc R Soc London, B. 2002, 269: 2257-2267. 10.1098/rspb.2002.2100.View ArticleGoogle Scholar
- Berkeley Drosophila Genome Project. http://www.fruitfly.org,
- Verhulst EC, Beukeboom LW, van de Zande L: Maternal control of haplodiploid sex determination in the wasp Nasonia. Science. 2010, 328: 620-623. 10.1126/science.1185805.PubMedView ArticleGoogle Scholar
- Li R, Fan W, Tian G, Zhu H, He L, Cai J, Huang Q, Cai Q, Li B, Bai Y, Zhang Z, Zhang Y, Wang W, Li J, Wei F, Li H, Jian M, Li J, Zhang Z, Nielsen R, Li D, Gu W, Yang Z, Xuan Z, Ryder OA, Leung FC, Zhou Y, Cao J, Sun X, Fu Y, et al: The sequence and de novo assembly of the giant panda genome. Nature. 2010, 463: 311-317. 10.1038/nature08696.PubMedPubMed CentralView ArticleGoogle Scholar
- Meisel RP, Malone JH, Clark AG: Disentangling the relationship between sex-biased gene expression and X-linkage. Genome Res. 2012, 22: 1255-1265. 10.1101/gr.132100.111.PubMedPubMed CentralView ArticleGoogle Scholar
- Kent WJ: BLAT–The BLAST-like alignment tool. Genome Res. 2002, 12: 656-664.PubMedPubMed CentralView ArticleGoogle Scholar
- Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, Amit I, Adiconis X, Fan L, Raychowdhury R, Zeng Q, Chen Z, Mauceli E, Hacohen N, Gnirke A, Rhind N, di Palma F, Birren BW, Nusbaum C, Lindblad-Toh K, Friedman N, Regev A: Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nat Biotechnol. 2011, 29: 644-652. 10.1038/nbt.1883.PubMedPubMed CentralView ArticleGoogle Scholar
- Pertea G, Huang X, Liang F, Antonescu V, Sultana R, Karamycheva S, Lee Y, White J, Cheung F, Parvizi B, Tsai J, Quackenbush J: TIGR Gene Indices clustering tools (TGICL): a software system for fast clustering of large EST datasets. Bioinformatics. 2003, 19: 651-652. 10.1093/bioinformatics/btg034.PubMedView ArticleGoogle Scholar
- Stanke M, Keller O, Gunduz I, Hayes A, Waack S, Morgenstern B: AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic Acids Res. 2006, 34: W435-W439. 10.1093/nar/gkl200.PubMedPubMed CentralView ArticleGoogle Scholar
- Homo sapiens genome database. [http://ftp.ensembl.org/pub/release-64]
- Apis mellifera genome database. [http://hymenopteragenome.org/beebase/]
- Bombyx mori genome database. [http://silkdb.genomics.org.cn/silkworm/jsp/data.jsp]
- Drosophila melanogaster genome database. [http://ftp.flybase.org/genomes/Drosophila_melanogaster/dmel_r5.27_FB2010_04/]
- Nasonia vitripennis genome database. [http://hymenopteragenome.org/nasonia/]
- Birney E, Clamp M, Durbin R: GeneWise and Genomewise. Genome Res. 2004, 14: 988-995. 10.1101/gr.1865504.PubMedPubMed CentralView ArticleGoogle Scholar
- Trapnell C, Williams BA, Pertea G, Mortazavi A, Kwan G, van Baren MJ, Salzberg SL, Wold BJ, Pachter L: Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nat Biotechnol. 2010, 28: 511-515. 10.1038/nbt.1621.PubMedPubMed CentralView ArticleGoogle Scholar
- Bairoch A, Apweiler R: The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic Acids Res. 2000, 28: 45-48. 10.1093/nar/28.1.45.PubMedPubMed CentralView ArticleGoogle Scholar
- Zdobnov EM, Apweiler R: InterProScan - an integration platform for the signature-recognition methods in InterPro. Bioinformatics. 2001, 17: 847-848. 10.1093/bioinformatics/17.9.847.PubMedView ArticleGoogle Scholar
- Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT, Harris MA, Hill DP, Issel-Tarver L, Kasarskis A, Lewis S, Matese JC, Richardson JE, Ringwald M, Rubin GM, Sherlock G: Gene Ontology: tool for the unification of biology. Nat Genet. 2000, 25: 25-29. 10.1038/75556.PubMedPubMed CentralView ArticleGoogle Scholar
- Kanehisa M, Goto S: KEGG: kyoto encyclopedia of genes and genomes. Nucleic Acids Res. 2000, 28: 27-30. 10.1093/nar/28.1.27.PubMedPubMed CentralView ArticleGoogle Scholar
- RepeatMasker Open-3.3.0. [http://www.repeatmasker.org]
- Benson G: Tandem repeats finder: a program to analyze DNA sequences. Nucleic Acids Res. 1999, 27: 573-580. 10.1093/nar/27.2.573.PubMedPubMed CentralView ArticleGoogle Scholar
- AB-BLAST. [http://blast.advbiocomp.com]
- Jurka J, Kapitonov VV, Pavlicek A, Klonowski P, Kohany O, Walichiewicz J: Repbase update, a database of eukaryotic repetitive elements. Cytogenet Genome Res. 2005, 110: 462-467. 10.1159/000084979.PubMedView ArticleGoogle Scholar
- RepeatModeler Open-1.0.5. [http://www.repeatmasker.org]
- Edgar RC, Myers EW: PILER: identification and classification of genomic repeats. Bioinformatics. 2005, 21: i152-i158. 10.1093/bioinformatics/bti1003.PubMedView ArticleGoogle Scholar
- Todd ML: Sean RE: tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res. 1997, 25: 955-964.View ArticleGoogle Scholar
- Griffiths-Jones S, Moxon S, Marshall M, Khanna A, Eddy SR, Bateman A: Rfam: annotating non-coding RNAs in complete genomes. Nucleic Acids Res. 2005, 33: D121-D124.PubMedPubMed CentralView ArticleGoogle Scholar
- Li H, Coghlan A, Ruan J, Coin LJ, Hériché J-K, Osmotherly L, Li R, Liu T, Zhang Z, Bolund L, Wong GKS, Zheng W, Dehal P, Wang J, Durbin R: TreeFam: a curated database of phylogenetic trees of animal gene families. Nucleic Acids Res. 2006, 34: D572-D580. 10.1093/nar/gkj118.PubMedPubMed CentralView ArticleGoogle Scholar
- Guindon S, Dufayard J-F, Lefort V, Anisimova M, Hordijk W, Gascuel O: New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst Biol. 2010, 59: 307-321. 10.1093/sysbio/syq010.PubMedView ArticleGoogle Scholar
- Yang Z, Rannala B: Bayesian estimation of species divergence times under a molecular clock using multiple fossil calibrations with soft bounds. Mol Biol Evol. 2006, 23: 212-226.PubMedView ArticleGoogle Scholar
- Tracer version 1.4. [http://beast.bio.ed.ac.uk/Tracer]
- De Bie T, Cristianini N, Demuth JP, Hahn MW: CAFE: a computational tool for the study of gene family evolution. Bioinformatics. 2006, 22: 1269-1271. 10.1093/bioinformatics/btl097.PubMedView ArticleGoogle Scholar
- Löytynoja A, Goldman N: Phylogeny-aware gap placement prevents errors in sequence alignment and evolutionary analysis. Science. 2008, 320: 1632-1635. 10.1126/science.1158395.PubMedView ArticleGoogle Scholar
- Talavera G, Castresana J: Improvement of phylogenies after removing divergent and ambiguously aligned blocks from protein sequence alignments. Syst Biol. 2007, 56: 564-577. 10.1080/10635150701472164.PubMedView ArticleGoogle Scholar
- Zhang J, Nielsen R, Yang Z: Evaluation of an improved branch-site likelihood method for detecting positive selection at the molecular level. Mol Biol Evol. 2005, 22: 2472-2479. 10.1093/molbev/msi237.PubMedView ArticleGoogle Scholar
- Hymenoptera genome database. [http://hymenopteragenome.org/]
- Fruitfly genome database. [http://flybase.org]
- SignalP 4.1 Server. [http://www.cbs.dtu.dk/services/SignalP/]
- Larkin MA, Blackshields G, Brown NP, Chenna R, McGettigan PA, McWilliam H, Valentin F, Wallace IM, Wilm A, Lopez R, Thompson JD, Gibson TJ, Higgins DG: Clustal W and Clustal X version 2.0. Bioinformatics. 2007, 23: 2947-2948. 10.1093/bioinformatics/btm404.PubMedView ArticleGoogle Scholar
- Abascal F, Zardoya R, Posada D: ProtTest: selection of best-fit models of protein evolution. Bioinformatics. 2005, 21: 2104-2105. 10.1093/bioinformatics/bti263.PubMedView ArticleGoogle Scholar
- Letunic I, Bork P: Interactive tree of life (iTOL): an online tool for phylogenetic tree display and annotation. Bioinformatics. 2007, 23: 127-128. 10.1093/bioinformatics/btl529.PubMedView ArticleGoogle Scholar
- Li R, Li Y, Kristiansen K, Wang J: SOAP: short oligonucleotide alignment program. Bioinformatics. 2008, 24: 713-714. 10.1093/bioinformatics/btn025.PubMedView ArticleGoogle Scholar
- Robertson HM, Wanner KW: The chemoreceptor superfamily in the honey bee, Apis mellifera: expansion of the odorant, but not gustatory, receptor family. Genome Res. 2006, 16: 1395-1403. 10.1101/gr.5057506.PubMedPubMed CentralView ArticleGoogle Scholar
- Robertson HM, Gadau J, Wanner KW: The insect chemoreceptor superfamily of the parasitoid jewel wasp Nasonia vitripennis. Insect Mol Biol. 2010, 19: 121-136.PubMedView ArticleGoogle Scholar
- Wurm Y, Wang J, Riba-Grognuz O, Corona M, Nygaard S, Hunt BG, Ingram KK, Falquet L, Nipitwattanaphon M, Gotzek D, Dijkstra MB, Oettler J, Comtesse F, Shih CJ, Wu WJ, Yang CC, Thomas J, Beaudoing E, Pradervand S, Flegel V, Cook ED, Fabbretti R, Stockinger H, Long L, Farmerie WG, Oakey J, Boomasma JJ, Pamilo P, Yi SV, Heinze J, et al: The genome of the fire ant Solenopsis invicta. Proc Natl Acad Sci USA. 2011, 108: 5679-5684. 10.1073/pnas.1009690108.PubMedPubMed CentralView ArticleGoogle Scholar
- Ronquist F, Huelsenbeck JP: MrBayes 3: Bayesian phylogenetic inference under mixed models. Bioinformatics. 2003, 19: 1572-1574. 10.1093/bioinformatics/btg180.PubMedView ArticleGoogle Scholar
- Tamura K, Peterson D, Peterson N, Stecher G, Nei M, Kumar S: MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Mol Biol Evol. 2011, 28: 2731-2739. 10.1093/molbev/msr121.PubMedPubMed CentralView ArticleGoogle Scholar
- Sequence Read Archive (SRA). [http://www.ncbi.nlm.nih.gov/sra]
This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.