New genes in the evolution of the neural crest differentiation program
Genome Biology volume 8, Article number: R36 (2007)
Development of the vertebrate head depends on the multipotency and migratory behavior of neural crest derivatives. This cell population is considered a vertebrate innovation and, accordingly, chordate ancestors lacked neural crest counterparts. The identification of neural crest specification genes expressed in the neural plate of basal chordates, in addition to the discovery of pigmented migratory cells in ascidians, has challenged this hypothesis. These new findings revive the debate on what is new and what is ancient in the genetic program that controls neural crest formation.
To determine the origin of neural crest genes, we analyzed Phenotype Ontology annotations to select genes that control the development of this tissue. Using a sequential blast pipeline, we phylogenetically classified these genes, as well as those associated with other tissues, in order to define tissue-specific profiles of gene emergence. Of neural crest genes, 9% are vertebrate innovations. Our comparative analyses show that, among different tissues, the neural crest exhibits a particularly high rate of gene emergence during vertebrate evolution. A remarkable proportion of the new neural crest genes encode soluble ligands that control neural crest precursor specification into each cell lineage, including pigmented, neural, glial, and skeletal derivatives.
We propose that the evolution of the neural crest is linked not only to the recruitment of ancestral regulatory genes but also to the emergence of signaling peptides that control the increasingly complex lineage diversification of this plastic cell population.
As first proposed by Gans and Northcutt [1, 2], the major evolutionary innovation of the vertebrate body plan relies on elaboration of a new head at the anterior end of an ancestral chordate trunk. The three existing groups of the phylum Chordata, namely urochordates (ascidians), cephalochordates (amphioxus), and craniates (including vertebrates and agnathans), share many characteristics. These include a notochord, segmented trunk muscles, and a dorsal nerve cord. Molecular data have further confirmed these anatomic descriptions, revealing a conserved patterning mechanism along the anterior-posterior and dorso-ventral axes of the neural tube. Resting on this archetypal chordate body plan, unique populations of cells, the neural crest and the ectodermal placodes, evolved in craniates (referred to here as 'vertebrates' for simplicity). The emergence of these pluripotent cells is linked to the evolution of more sophisticated sensory and predatory organs (for instance, jaws). These new organs, in conjunction with an increasingly complex brain, allowed the shift from a filter-feeding style of life toward active predatory strategies [2, 4].
The neural crest is a transient population of embryonic cells that originate at the boundary between neural plate and dorsal ectoderm. Secreted from neighboring tissues, signaling molecules of the Wnt, Fgf, and Bmp families cooperate to activate a distinct combination of transcription factors at the neural plate border. Among those are members of the Pax, Zic, Snail, Sox, and Msx families, which constitute the neural crest specification network [5, 6]. Shortly after their dorsal specification, neural crest cells undergo an epithelial-to-mesenchymal transition, migrate, and finally, upon arrival at their destination, give rise to a variety of cell types. These include peripheral neurons, glial and Schwann cells, pigment cells, endocrine cells, cartilage, and bone [7, 8]. This large diversity of derivatives arises through a complex mechanism of lineage restriction, which operates both early, on the pluripotent precursors at the dorsal neural tube, and later, during the migration and differentiation of precursors already committed to different degrees [10, 11]. Environmental cues found throughout neural crest migratory routes play a fundamental role not only in instructing the precursors' differentiation into particular phenotypes, but also in controlling their proliferation and survival. Among these extracellular cues, classical signaling molecules such as Fgfs, Wnts, Bmps and transforming growth factor (TGF)-βs, in conjunction with locally produced cytokines such as neurotrophins, endothelins, glial cell line-derived neurotrophic factor (GDNF), neuregulin and cKit, have been shown to influence precursor fate and survival [12, 13].
The neural crest has traditionally been considered the key structure acquired very early by craniate pioneers. The presence of cartilage first and biomineralized material later in the head of the earliest craniate fossils supports this view [14, 15]. Because of their particular nature, the evolution of cartilage and bone elements can easily be traced in the large collection of Cambrian fossils. Many fossil fish exhibit neural crest derived exoskeletal coverings of dermal bone that extend partially over the trunk, with no trace of mesenchymal endoskeleton. These paleontologic records indicate that in early vertebrates cartilage and bones arose first in the context of the cephalic neural crest, and that only later was this genetic program co-opted by the paraxial sclerotome.
The existence of an ancestral population of cells in early chordates that give rise to vertebrate neural crest on the one hand and to basal chordate dorsal derivatives on the other has been proposed several times [2, 18–20]. This hypothesis is supported by the conservation of many components of the neural crest specification network in chordates. Furthermore, migratory cells that express neural crest markers and differentiate as pigmented cells have recently been identified in the urochordate Ecteinascidia turbinata. These data reinforce the hypothesis of pan-chordate 'precursors' behaving similarly and expressing a set of genes homologous to the modern neural crest. According to this view, the innovative drive impelling neural crest evolution stems from the evolution of their cis-regulatory elements, a process facilitated by the ancestral duplication of the vertebrate genome. The duplication of key developmental genes would have released enough evolutionary pressure to facilitate their divergence and hence the evolution of new functions. Although the existence of pan-chordate 'precursors' offers a satisfactory answer to the evolutionary origin of the neural crest, it fails to account for the acquisition of fundamental properties of this tissue. These include the pluripotency of the neural crest precursors that now give rise to novel cell types that are present neither in basal chordates nor in other metazoans.
To gain insight into the origin and evolution of neural crest properties, we have chosen a bioinformatics approach to analyze the phylogeny of tissue-specific developmental programs in a systematic manner. Our analytical tool takes advantage of an extensive collection of mouse genes annotated through Mammalian Phenotype Ontology terms (at Mouse Genome Informatics [MGI]). According to their related mouse mutant phenotype annotations, we grouped genes into tissue-specific genetic programs. We then explored the phylogeny of each program using a sequential blast pipeline. We defined as 'new genes' those encoding proteins that did not exhibit any significant homology in previous phylogenetic categories, either because they are extremely divergent or because they have evolved de novo. For each group, the total number of new genes at each branch of the evolutionary tree was analyzed. These graphical representations (gene emergence plots) are characteristic for each tissue/organ. They show how the rate of gene innovation has changed during the evolution of a particular tissue. These data substantiate the traditional concept that neural crest is a vertebrate innovation. In addition, our systematic analysis demonstrates that neural crest evolution builds not only on the rewiring of gene networks but also on the emergence of new genes. Gene Ontology (GO) analysis of the group of new neural crest components revealed remarkable enrichment in extracellular ligands. Half of the vertebrate new genes encode secreted cytokines that are known to control the specification and survival of the different neural crest derivatives, including pigment cells, neurons, glial cells, and skeletal components. Here we propose that the emergence of these novel ligands is associated with the evolutionary transition of a relatively simple cell population, in the dorsal neural tube of ancestral chordates, toward the lineage complexity of the vertebrate neural crest.
Results and discussion
How animal body plans are modified in relation to the evolution of their genome is an intricate issue. Acquisition of novel properties in a particular cell type, or even innovative changes in tissues and organs, can very often be attributed to modifications in the wiring of pre-existing gene networks. However, a fundamental process in genome evolution is also the emergence of new genes. Several molecular mechanisms, including exon shuffling, gene duplication and fusion, transposition, fast sequence divergence, and entire de novo origin, have been proposed to serve as sources for gene innovation. In this work we explore the phylogeny of the genes that are involved in neural crest development to gain insight into the evolution of neural crest properties. We aimed to determine which components of the vertebrate neural crest gene program are ancient, and hence have been recruited to perform a function in this tissue, and which components evolved only recently.
Determining the origin of vertebrate proteins through a sequential blast pipeline
As a first step in determining when neural crest genes evolved, we filtered mouse proteins through a sequential blast pipeline. All 23,658 known mouse protein sequences (EnsEMBL v31) were consecutively blasted against available genomes grouped into seven different evolutionary categories (prokaryota, eukaryota, metazoa, deuterostomia, chordata, vertebrata, and mammalia) using a relaxed threshold of E = 10⁻⁴, as established in similar studies [26, 27]. Proteins exhibiting homology when blasted against the prokaryotic genomes were classified as ancient. The remaining genes were subsequently blasted against eukaryotic genomes and the procedure was repeated until all genes were classified (Figure 1a). According to our definition, 'new genes' in each category are those encoding proteins that did not exhibit any significant homology in previous categories, either because they have diverged extensively from a former protein or because they have evolved de novo.
A direct comparison of the percentage of genes appearing in each category with an estimation of their respective age in millions of years indicated that the frequency of gene emergence is higher for late categories (specifically, metazoans to mammals; Figure 1b,c). This higher frequency of innovation correlates with the reported observation that the rate of evolution for proteins (calculated as the ratio between nonsynonymous and synonymous substitutions) is also higher for more recent categories.
To elucidate whether 'new proteins', because of their divergent amino acid sequences, correlate with the emergence of novel molecular functions, we performed a GO analysis. For each evolutionary category we identified the GO terms that are statistically over-represented compared with all of the known mouse proteins. The 10 most significantly over-represented GO terms for each of the seven different categories are listed in Table 1 (also see Additional data file 1 for a full list of over-represented GO terms). Our analysis shows that, within a large evolutionary window, innovations are associated with the emergence of 'new genes'. Although the first category, prokaryota, is enriched in genes that are involved in general cell metabolism, GO terms of genes appearing first in eukaryotes demonstrate their function in the newly evolved subcellular organelles. In metazoans we find the GO terms 'cell communication', 'signal transduction', and 'receptor activity' to be highly over-represented, which is in accordance with a de novo requirement for cell-cell communication and tissue subspecialization in the context of multicellularity. Interestingly, the collection of genes appearing first in vertebrates and mammals is enriched in terms such as 'hormone activity', 'receptor binding', 'extracellular space', and 'cytokine response', suggesting that diversification of receptor ligands is linked to vertebrate evolution. In summary, our sequential blast pipeline reliably classifies genes according to their first appearance within the phylogenetic tree.
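The over-representation step described above can be illustrated with a one-sided hypergeometric test. This is only a sketch: the text does not state which statistical test or multiple-testing correction was used, so the choice of test, the function name `hypergeom_enrichment_p`, and its parameters are assumptions made for illustration.

```python
from math import comb


def hypergeom_enrichment_p(total, annotated, sample, overlap):
    """One-sided enrichment p-value, P(X >= overlap).

    total     -- number of proteins in the background (all mouse proteins)
    annotated -- background proteins carrying the GO term
    sample    -- proteins assigned to one evolutionary category
    overlap   -- proteins in that category carrying the GO term

    Assumption: a hypergeometric model; the source does not name its test.
    """
    upper = min(annotated, sample)
    numerator = sum(
        comb(annotated, k) * comb(total - annotated, sample - k)
        for k in range(overlap, upper + 1)
    )
    return numerator / comb(total, sample)


if __name__ == "__main__":
    # Toy numbers, not taken from the paper.
    print(hypergeom_enrichment_p(total=1000, annotated=50, sample=100, overlap=15))
```

A small p-value indicates that the term occurs in the category more often than expected from its background frequency; in a real analysis the p-values would also need correction for the number of GO terms tested.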
Assignment of neural crest genes based on phenotypic data
In order to investigate when neural crest genes arose during evolution, it was necessary to build a comprehensive list of genes involved in the development of this tissue. A large number of studies, in particular the phenotypic analysis of mutations in mice, generated by either mutagenesis or genetic engineering, have led to the identification of many genes that are involved in neural crest development. The Mammalian Phenotype Browser, at MGI, provides a comprehensive resource of phenotypic information derived from mouse mutant studies. Because phenotypic analysis annotations offer the most reliable read out of gene function, we took advantage of this large collection of mouse mutants in our study. The collection includes more than 14,000 genotype records associated with a total of 6,442 genes (27% of the total mouse transcriptome), and furthermore it includes the majority of the genes demonstrated to play a bona fide role in neural crest development. In the MGI database each mutation is annotated by a controlled vocabulary of phenotypic terms that describe the effect of a genetic variation on different tissues, organs, or systems. We searched the Mammalian Phenotype Ontology for terms associated with mutations affecting both neural crest precursors and their derivative cell types and tissues.
At the Mammalian Phenotype Browser the ontology term 'abnormal neural crest cells' (MP:0002949) is reserved for phenotypes that affect the early migration of neural crest cells. Because of this stringent definition, only eight genes are annotated with this term. However, when we took phenotypes associated with the development of neural crest derivatives into account, we retrieved a comprehensive list of 615 genes. In our analysis we considered three main groups of neural crest derivatives: pigmented cells, skeletal components, and elements of the peripheral nervous system. The 'pigmentation derivatives phenotype' is completely covered by a single term, namely 'pigmentation phenotype' (MP:0001186). The 'bone derivatives phenotype' terms consist of 'craniofacial phenotype' (MP:0005382) and 'skeleton phenotype' (MP:0005390). At this point, it could be argued that vertebrate neural crest cells only give rise to cranial skeleton and teeth, whereas the axial skeleton has a mesodermal origin. As already mentioned, however, paleontologic records indicate that skeletal elements evolved within the context of the neural crest and only later was this genetic program co-opted by the sclerotome. The 'peripheral nervous system derivatives phenotype' consists of 'abnormal autonomic nervous system morphology' (MP:0002751), 'abnormal peripheral nervous system glia' (MP:0001105), 'abnormal somatic sensory system morphology' (MP:0000959), and 'peripheral nervous system degeneration' (MP:0000958). We grouped these three categories under the general term 'neural crest derivatives phenotype'.
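The grouping of genes by phenotype term can be sketched as a simple set lookup. The term identifiers below are those listed in the text; the input format, the function name `group_genes_by_derivative`, and the placeholder gene names are hypothetical, and the sketch assumes that each gene's annotation set has already been expanded to include ancestor terms of the ontology.

```python
# MP term identifiers as listed in the text, keyed by derivative group.
NEURAL_CREST_DERIVATIVE_TERMS = {
    "pigmentation": {"MP:0001186"},
    "bone": {"MP:0005382", "MP:0005390"},
    "peripheral nervous system": {
        "MP:0002751", "MP:0001105", "MP:0000959", "MP:0000958",
    },
}


def group_genes_by_derivative(gene_to_terms, groups=NEURAL_CREST_DERIVATIVE_TERMS):
    """Return {group name: set of genes annotated with any term of that group}.

    gene_to_terms -- hypothetical input, {gene: iterable of MP term ids},
                     assumed to include ancestor terms already.
    """
    grouped = {name: set() for name in groups}
    for gene, terms in gene_to_terms.items():
        term_set = set(terms)
        for name, wanted in groups.items():
            if wanted & term_set:
                grouped[name].add(gene)
    # The union of the three groups is the 'neural crest derivatives phenotype'.
    grouped["neural crest derivatives"] = set().union(
        *(grouped[name] for name in groups)
    )
    return grouped


if __name__ == "__main__":
    demo = {
        "geneA": ["MP:0001186"],                # placeholder gene names
        "geneB": ["MP:0005390", "MP:0000958"],
        "geneC": ["MP:9999999"],                # unrelated placeholder term
    }
    for name, genes in group_genes_by_derivative(demo).items():
        print(name, sorted(genes))
```

A gene can fall into more than one derivative group, which is why the combined set is built as a union rather than a sum of group sizes.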
Determining the origin of the neural crest gene set: gene emergence rate plots
The sequential blast pipeline provides a list of genes that emerge along the evolutionary tree in each of the seven defined categories, whereas the phenotypic annotation provides a functional link for each of these genes. Combining both, we determined in which category each of the 615 neural crest genes emerged (see Additional data file 2 for the full dataset). Previous studies had promoted the idea that gene co-option was the driving force for neural crest invention. Our data strongly support this view because the majority (91%) of genes involved in neural crest development was already present in basal metazoans or even before. Thus, key transcription factors acting as both 'neural plate border specifiers' (such as Pax3, Dlx5, Zic, and Msx1/2) and 'neural crest specifiers' (such as FoxD, Snail/Slug, Sox9/10, Twist, and AP-2) can be traced back to our category 'metazoans' or 'eukaryotes'. Similarly, the Fgf, Wnt, and Bmp signaling pathways involved in induction of the neural plate border are ancestral. Although their corresponding ligands can be traced back to basal metazoans, the kinase activity of their receptors was already present in prokaryotes. Altogether, these data confirm the idea that gene recruitment played an important role during neural crest evolution.
However, we found that a substantial percentage of the genes (9%, listed in Table 2) involved in neural crest development evolved in deuterostomes during the past 550 million years. To determine, within this evolutionary window, how the rate of gene emergence in the neural crest relates to the rate of innovation in other tissues, we plotted the cumulative number of genes appearing in each category. In these graphs, the tissue-specific evolutionary profile of gene emergence is depicted (Figure 2). In order to quantify the profile of the graphs we calculated 'gene emergence rate' (ger) values, as a numeric representation of the gene innovation rate from an earlier category to a later one (see Materials and methods for a description of the formula). A ger value of 1 indicates a constant profile of gene innovation. Higher ger values indicate increased appearance of new genes in a particular tissue.
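The cumulative counts that underlie a gene emergence plot can be sketched as follows. The category order is taken from the text; the input format and the function name `cumulative_emergence` are hypothetical. The ger formula itself is described in Materials and methods and is deliberately not reproduced or guessed here.

```python
from collections import Counter
from itertools import accumulate

# Evolutionary categories in the order given in the text, oldest first.
CATEGORIES = [
    "prokaryota", "eukaryota", "metazoa", "deuterostomia",
    "chordata", "vertebrata", "mammalia",
]


def cumulative_emergence(gene_to_category, categories=CATEGORIES):
    """Return [(category, cumulative number of genes emerged so far), ...].

    gene_to_category -- hypothetical input, {gene: category of first appearance}
                        for the genes of one tissue-specific program.
    """
    counts = Counter(gene_to_category.values())
    per_category = [counts.get(c, 0) for c in categories]
    return list(zip(categories, accumulate(per_category)))


if __name__ == "__main__":
    demo = {"g1": "metazoa", "g2": "metazoa", "g3": "vertebrata"}  # placeholders
    for category, total in cumulative_emergence(demo):
        print(f"{category:15s} {total}")
```

Plotting these cumulative totals per tissue gives the profile described above: a flat segment means no new genes in that category, and a steep segment means many.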
For each of the tissue-specific gene programs studied, we ordered the ger values at the chordate-vertebrate transition (Figure 2a). Notably, tissues/systems ontogenetically derived from ventral mesoderm, and hence considered modern vertebrate innovations [2, 17, 30, 31], such as the hematopoietic, immune, or renal/urinary system, exhibit graphs that peak at the chordate-vertebrate transition (Figure 2b). In contrast, other tissues already present in all chordates, namely the epidermis or endodermal derivatives such as liver, respiratory, and digestive systems, have a flat profile, with lower ger values (Figure 2b). Both the profile of the neural crest gene emergence plot (Figure 3) and its ger value (3.1) indicate that the neural crest is among the most innovative vertebrate tissues (Figure 2a). This concept can be extended to each individual neural crest lineage, in particular to pigmented or bone derivatives, as deduced from their respective gene emergence plots (Figure 3). Interestingly, compared with the other crest derivatives, the ger value of the gene set associated with the peripheral nervous system derivatives is lower (1.6). This may best be explained by co-option from the ancestral program of neural development. In summary, our gene emergence plots that reliably reflect evolutionary innovation highlight the novelty of neural crest as a tissue.
Emergence of neural crest molecules defining novel cellular functions
The notion of neural crest as a tissue with a high rate of gene innovation apparently contradicts our finding that all known neural crest specifiers can be traced back at least to metazoans. To further address this point, we focused on the collection of neural crest 'new genes' to gain insight into their molecular nature and function.
Neural crest has been postulated as a fourth germ layer. This concept builds on neural crest pluripotency and the fact that in vertebrates it gives rise to novel cell types such as the skeletal derivatives or the specialized melanocytes. Consistently, in the collection of vertebrate/mammalian new genes, we found molecules defining the physiology of these novel cell types. This is the case for the genes Ru (Hermansky-Pudlak syndrome 6) and silver, which encode components of the specialized melanocyte lysosomes, the melanosomes. Similarly, several new genes encode extracellular proteins that constitute part of the bone matrix (for example, bone gla protein and the phosphoglycoprotein mepe) and enamel, the outermost covering of teeth and the hardest tissue in the body (for example, ameloblastin and amelogenin).
Emergence of ligands for neural crest lineage specification
Strikingly, 50% of neural crest genes appearing first in vertebrates encode extracellular ligands. This remarkable enrichment (confirmed by exploring GO term frequency; see Additional data file 3) is in accordance with our previous whole-transcriptome GO analysis (Table 1). It suggests that diversification of receptor ligands played an important role during vertebrate evolution in general and neural crest evolution in particular. Individual analysis of the function of these peptides during the development of the neural crest demonstrates that they control the commitment of precursors to the different lineages.
Conserved signaling pathways have an early influence on the phenotypic diversification of premigratory neural crest cells. Bmp2/4 can directly induce autonomic neurogenesis [33, 34], while Wnt signaling participates in melanocyte specification. Superimposed on this, a second network of 'modern' vertebrate-specific cytokines, produced locally, acts not only in neural crest cell fate specification but also in the migratory behavior and survival of all neural crest lineages. Melanocyte specification and survival depend on soluble proteins such as steel factor (kit ligand), endothelin-3, α-melanocyte stimulating hormone, and nonagouti; gliogenesis in the peripheral nervous system is controlled by neuregulins and endothelin-3 [37, 38]; the development of autonomic and sensory neurons is controlled by neurotrophins (brain-derived neurotrophic factor, neurotrophin-3, and neurotrophin-4) and GDNF family members (GDNF and neurturin) [39, 40]; and, finally, the differentiation of the skeletal lineage is specified by endothelin-1. Our sequential blast pipeline analysis shows that the vast majority (9/11) of the above-mentioned cell fate specification ligands emerged in vertebrates or, to a lesser extent (steel factor and nonagouti), in mammals.
Interestingly, the blast pipeline uncovered a positive hit in the echinoderm Strongylocentrotus purpuratus genome for the neurotrophin family members brain-derived neurotrophic factor and neurotrophin-3. Because it has been proposed that neurotrophins constitute a vertebrate innovation, we performed a ClustalX alignment of mouse neurotrophins against the echinoderm sequence (Additional data file 4). This revealed that the particular array of cysteines conserved in all neurotrophins, the so-called 'cysteine knot', is also present in the echinoderm sequence and therefore identifies it as a putative growth factor. However, the limited amino acid identity (33%) and the lack of conservation in critical residues required for neurotrophin binding to Trk receptors indicate that the echinoderm neurotrophin-related protein cannot be considered a bona fide neurotrophin. This suggests that neurotrophins evolved from divergent ligands present in ancestral chordates. In fact, the example of neurotrophins may be just part of a more general mechanism because other 'new cytokines' can be related to pre-existing growth factors. Supporting this view, GDNF and neurturin are divergent members of the TGF-β superfamily of ligands, as indicated by their particular cysteine knot and hence folding. Similarly, despite their limited homology, neuregulins belong to the epidermal growth factor superfamily of ligands.
Taken together, our data show that the cytokine network acting in neural crest cell fate specification is mainly a vertebrate innovation (Figure 4). Furthermore, these analyses indicate that an important proportion of the 'new ligands' are derived from fast evolving growth factors.
Phylogenetic analysis of the emergence of Pfam domains
The comparative analysis of gene emergence plots highlights a high rate of gene innovation for the neural crest during vertebrate evolution. In fact, there are reasons to believe that our estimation on the rate of gene emergence may be conservative. In the sequential blast pipeline analysis, the presence of an ancestral conserved domain will shadow the appearance of evolutionarily more recent domains within the same molecule. This may be particularly relevant in the case of large multidomain proteins such as receptors.
To overcome this constraint and to complement our studies, we conducted a phylogenetic analysis of the Pfam motifs (defined by multiple alignment of proteins) occurring in the collection of 615 neural crest genes. From a total of 8,183 Pfam domains annotated in EnsEMBL, 499 are present in the set of 615 neural crest genes. We screened for these motifs in the seven different categories, detecting homology through two different approaches: blasting Pfam consensus sequences (threshold of E = 10⁻⁴) and searching for hidden Markov models (HMMs) using HMMER software with standard parameters. We compiled a table including all neural crest genes with their Pfam domains and when they occur first in the defined seven temporal classes, as detected using either of the methods (Additional data file 5). A list including only those genes that contain a Pfam domain emerging in vertebrates is compiled in Table 3. Pfam domain detection supports and refines our sequential blast pipeline results. Thus, GDNF and neurturin were identified as divergent members of the TGF-β superfamily, and the kit-ligand and nonagouti domains were detected as vertebrate novelties (previously detected as mammalian innovations; Table 2). Furthermore, the analysis also confirmed the ClustalX alignments demonstrating that the neurotrophin domain (nerve growth factor; Table 3) is indeed a vertebrate innovation. In summary, our domain-based approach (more sensitive and accurate, but limited to annotated Pfam domains) complements the sequential blast analysis (Table 2), providing independent confirmation of the emergence in vertebrates of growth factors that are involved in the specification/survival of the neural crest cells (Table 3).
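Combining the two detection methods into a single first-occurrence call per domain can be sketched as below. The rule used here, taking the oldest category in which either method detects the domain, is one reading of "as detected using either of the methods" and is an assumption; the input format, function names, and placeholder domain and gene names are hypothetical, and no blast or HMMER call is made.

```python
# Evolutionary categories in the order given in the text, oldest first.
CATEGORIES = [
    "prokaryota", "eukaryota", "metazoa", "deuterostomia",
    "chordata", "vertebrata", "mammalia",
]


def first_occurrence(detections, order=CATEGORIES):
    """Return {domain: oldest category reported by any method, or None}.

    detections -- hypothetical input,
                  {domain: {"blast": category or None, "hmmer": category or None}}
    """
    rank = {category: i for i, category in enumerate(order)}
    first = {}
    for domain, by_method in detections.items():
        found = [c for c in by_method.values() if c is not None]
        first[domain] = min(found, key=rank.__getitem__) if found else None
    return first


def genes_with_domain_from(gene_domains, first, category="vertebrata"):
    """Genes containing at least one domain that first occurs in `category`."""
    return {
        gene
        for gene, domains in gene_domains.items()
        if any(first.get(d) == category for d in domains)
    }


if __name__ == "__main__":
    detections = {  # placeholder domains, not results from the paper
        "domX": {"blast": "vertebrata", "hmmer": "vertebrata"},
        "domY": {"blast": "vertebrata", "hmmer": "metazoa"},
        "domZ": {"blast": None, "hmmer": None},
    }
    first = first_occurrence(detections)
    print(first)
    print(genes_with_domain_from({"gene1": ["domX", "domY"], "gene2": ["domY"]}, first))
```

Working per domain rather than per protein is what lets a recent domain be detected even when the same protein also carries an ancient one.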
In addition, the domain-based approach also detected 'new Pfam motifs' masked in the sequential blast pipeline studies by the presence of an ancient domain. An example is the appearance in vertebrates of regulatory domains, such as TF_Otx, caudal_act, and Hox9_act, which are present in homeobox-containing transcription factors that belong to the Otx, Cdx, and Hox9 families, respectively (Table 3). We have shown that half of the neural crest genes appearing first in vertebrates encode extracellular ligands. This is contrasted by the Pfam domain analysis of the corresponding receptors. Only a single domain in ligand receptors is identified as a vertebrate novelty, namely the GDNF domain, which is present in the GDNF and neurturin coreceptor termed GFRalpha-1. This observation suggests that receptor evolution requires only subtle changes (in the sequence of their extracellular domains) to allow interaction with the 'new ligands', changes that are too subtle to be detected as discrete 'new domains' in our analysis.
Final remarks: toward a comprehensive hypothesis on neural crest evolution
Our understanding of how developmental regulatory pathways evolved in metazoans is now building upon steadily accumulating sequence collections that cover representative taxonomic groups. Here we have developed and applied a bioinformatics approach that allows us first to define components of the neural crest developmental gene program and then to analyze their phylogeny. Our evolutionary study, as for others based on comparative genomics, is limited by the quality of the available resources. The validity of the conclusions, beyond individual evolutionary relationships among genes, arises from the global picture provided by the properties of large datasets in which no systematic bias has been introduced. In our study we have considered several potential sources of bias. An important limitation in comparative studies is the arbitrary definition of the components of a particular gene network or gene program. Often this definition is directly inferred from the literature. To avoid this, we relied on the phenotypic analysis of mutants, which offers the most reliable read out of gene function and at the same time provides an unbiased definition. The fact that 'less conspicuous' phenotypic features, such as phenotypes associated with the immune or hematopoietic system, are as well annotated as the more obvious ones in pigmentation or skin indicates that there is no global bias in our analysis toward the detection of a given phenotype. Another possible caveat when interpreting studies of this type may come from massive gene loss in sister phyla, which will result in the false impression of new genes emerging in the phylum considered. These losses are particularly apparent in protostomia. In our analysis, focused on deuterostomia groups, these effects are well buffered by filtering the data not only through tunicates but also through echinoderm and cephalochordate sequences. Because it is highly unlikely that the same gene was lost independently in all three phylogenetic branches, potential bias through gene loss is leveled out, which gives robustness to our approach.
Our data show that new genes, either resulting from gene divergence or de novo gene evolution, are linked to the appearance of novel molecular and cellular functions. Comparative study of different tissues shows the highest gene emergence rates for those tissues considered vertebrate innovations, such as neural crest and ventral mesoderm derivatives. For the neural crest gene program, we show that half of the genes appearing first in vertebrates encode growth factors with a reported role in committing precursor fate (Figure 4). Our whole-genome analysis also shows that GO terms such as 'hormone activity', 'receptor binding', 'extracellular space', and 'cytokine response' are highly enriched in the collection of genes that emerge in vertebrates. Therefore, the expansion of the ligand toolkit during evolution does not appear to be limited to the neural crest. Rather, it also occurred in other vertebrate-specific tissues, which evolved from ancestral chordates. Examples of this are the vertebrate-specific interleukins and hematopoietic cytokines that control fate, maturation, and survival of the complex lymphoid and blood cell lineages. Taken together, our data indicate that the appearance of new growth factors satisfied an evolutionary requirement for signal diversification, beyond the ancestral network of signaling peptides.
The diversification of ancestral ligands may represent an independent evolutionary advantage distinct from a parallel diversification of receptors. The independent evolution of derived growth factor genes, now under the control of divergent promoter sequences, introduces additional complexity in the spatial and temporal regulation of signaling. Divergent ligands may also interact very differently with extracellular matrix components, with concomitant changes in gradient shape and presentation to the receptors. Ligands with those new properties can now take advantage of existing receptors and downstream signaling pathways that are present in competent cells. This further refinement of the activity of growth factors may be particularly important in lineage-rich tissues, in which it is crucial to discriminate inductive signals involved in cell fate determination.
Previous theories on neural crest evolution have mainly focused on the ontogenetic and phylogenetic origin of the tissue from the dorsal area of the ancestral chordate neural tube. Along these lines, the rewiring of the genes involved in the neural crest specification network has been invoked as the main evolutionary driving force. Our data now expand this view by suggesting that new signaling molecules were required to further control the development of the neural crest into its different derivatives, which are essential components of the present-day vertebrate body plan.
Materials and methods
Blast searches and assignment of temporal categories
To estimate the emergence of mammalian genes we analyzed the set of all 23,658 known mouse protein sequences from EnsEMBL v31. We identified any gene product related to the mouse reference sequences in 225 different genomes by using blast. A complete list of genomes and their origin is given in supplementary information (Additional data file 6). Genomes were downloaded from Cogent, EnsEMBL, and NCBI. These genomes were grouped into seven temporal categories based on their evolutionary origin (Table 4).
The first appearance of mouse proteins during evolution was assessed through a sequential blast pipeline using a relaxed cutoff value (E = 10⁻⁴) and standard parameters (blastp for protein and tblastn for nucleotide databases) to detect homology in more distant species [26, 27]. We assigned each of the 23,658 mouse proteins to one of the seven evolutionary categories according to the taxonomic group in which their first hit occurred. Genes already assigned to a temporal category were excluded from further blast analysis. The remaining genes were then blasted against the genomes of the next category, and this was repeated until all mouse genes were classified.
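The sequential assignment described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: it assumes the blast searches have already been run and summarized as a hypothetical dictionary `best_evalues` holding the best E value per gene and category, it takes the category abbreviations from Additional data file 2, and it assumes that genes without a hit in any older category end up in the youngest one.

```python
# Temporal categories ordered from oldest to youngest (abbreviations as in
# Additional data file 2).
CATEGORIES = ["pro", "euk", "met", "deu", "cor", "ver", "mam"]
E_CUTOFF = 1e-4  # relaxed cutoff stated in the text


def assign_categories(best_evalues, categories=CATEGORIES, cutoff=E_CUTOFF):
    """Assign each gene to the oldest category in which it has a blast hit.

    best_evalues is a hypothetical input: {gene: {category: best E value}}.
    Genes with no hit in any older category fall into the last category
    (an assumption of this sketch).
    """
    assigned = {}
    remaining = set(best_evalues)
    for category in categories[:-1]:
        hits = {
            gene
            for gene in remaining
            if best_evalues[gene].get(category, float("inf")) <= cutoff
        }
        for gene in hits:
            assigned[gene] = category
        # Classified genes are excluded from all later rounds.
        remaining -= hits
    for gene in remaining:
        assigned[gene] = categories[-1]
    return assigned


# Toy input with made-up gene names and E values.
example = {
    "geneA": {"pro": 1e-30, "ver": 1e-80},  # hit already in prokaryotes
    "geneB": {"met": 1e-3, "cor": 1e-12},   # metazoan hit is above the cutoff
    "geneC": {},                            # no hit outside mammals
}
result = assign_categories(example)
```

In this toy input, `geneB` is assigned to chordates because its metazoan hit does not pass the cutoff, which shows how the threshold shifts a gene toward a younger category.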
To account for any possible effect due to mouse-specific gene loss biasing our analysis, we performed the following control. In addition to using the mouse gene set as an input for the sequential blast pipeline, we also launched the filtering process using other vertebrate groups, namely chicken, Xenopus, and zebrafish genomes. Independent of the input used, we observed a similar distribution of genes in the various evolutionary categories (Additional data file 7). This finding indicates that there is no evident specific gene loss in the mouse. This necessary control further corroborates the choice of the mouse genome as a representative vertebrate.
Gene Ontology analysis
We looked for GO terms that were statistically over-represented in our temporal categories. Each of these gene sets was compared with the whole set of GO annotated mouse genes. We used mouse MGI GO annotation available at the GOstat web server for this analysis. GOstat compares the occurrence of each GO term for each different temporal category and for the reference genes, and performs a Fisher's exact test to judge whether the observed difference is significant. A complete list of all over-represented and under-represented GO annotations is provided in Additional data file 1.
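To make the statistical step concrete, the following Python sketch computes a one-sided Fisher's exact test for over-representation of a single GO term. It is only an illustration and does not reproduce the GOstat implementation: the function name, the argument names, and the restriction to the over-representation tail are assumptions, and the category is treated as a subset of the reference set.

```python
from math import comb


def enrichment_p_value(k, n, K, N):
    """One-sided Fisher's exact test (upper hypergeometric tail).

    k: genes in the temporal category annotated with the GO term
    n: genes in the temporal category
    K: genes in the reference set annotated with the GO term
    N: genes in the reference set
    Returns the probability of drawing at least k annotated genes when
    n genes are sampled from the reference set without replacement.
    """
    total = comb(N, n)
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Toy numbers: 2 of 3 category genes carry a term found in 4 of 10 genes.
p = enrichment_p_value(k=2, n=3, K=4, N=10)  # 40/120 = 1/3
```

A term would then be reported as over-represented when this value falls below the chosen cutoff (P < 0.001 for the additional data files).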
Retrieving phenotype annotations
The list of genes described by phenotype ontology was obtained from the MGI report (3.22 release): MRK_Pheno_Ensembl.rpt. This table represents the MGI marker associations with Phenotype Annotations and EnsEMBL sequence. The main phenotypical categories stored in the Mammalian Phenotype Ontology are the following: adipose tissue phenotype (MP:0005375), behavior/neurologic phenotype (MP:0005386), cardiovascular system phenotype (MP:0005385), craniofacial phenotype (MP:0005382), digestive/alimentary phenotype (MP:0005381), endocrine/exocrine gland phenotype (MP:0005379), hearing/ear phenotype (MP:0005377), hematopoietic system phenotype (MP:0005397), immune system phenotype (MP:0005387), limbs/digits/tail phenotype (MP:0005371), liver/biliary system phenotype (MP:0005370), muscle phenotype (MP:0005369), nervous system phenotype (MP:0003631), pigmentation phenotype (MP:0001186), renal/urinary system phenotype (MP:0005367), reproductive system phenotype (MP:0005389), respiratory system phenotype (MP:0005388), skeleton phenotype (MP:0005390), skin/coat/nails phenotype (MP:0005393), and vision/eye phenotype (MP:0005391).
Gene emergence rate calculation
In order to quantify the relative change in the number of 'new genes' arising at a given temporal category, we define the gene emergence rate (ger) as the ratio between the number of genes emerging in the analyzed temporal category (vertebrates in our case) and the number of genes emerging in the previous temporal category (chordates in our case). Thus, for the transition from chordates to vertebrates, the ger value is defined as follows:
\[ \mathrm{ger} = \frac{N_{\mathrm{ver}} - N_{\mathrm{cho}}}{N_{\mathrm{cho}} - N_{\mathrm{deu}}} \]
where $N_{\mathrm{deu}}$ is the cumulative number of 'new genes' at the level of deuterostomes, $N_{\mathrm{cho}}$ is the cumulative number of 'new genes' at the level of chordates, and $N_{\mathrm{ver}}$ is the cumulative number of 'new genes' at the level of vertebrates.
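As a worked illustration of this definition, the short Python sketch below computes ger from cumulative counts: the genes emerging in vertebrates (the difference between the vertebrate and chordate cumulative counts) divided by the genes emerging in chordates (the difference between the chordate and deuterostome cumulative counts). The function name and the example counts are hypothetical and are not taken from the paper's data.

```python
def gene_emergence_rate(n_deu, n_cho, n_ver):
    """Gene emergence rate for the chordate-to-vertebrate transition.

    n_deu, n_cho, n_ver are cumulative numbers of 'new genes' at the level
    of deuterostomes, chordates, and vertebrates, respectively.
    """
    new_in_vertebrates = n_ver - n_cho
    new_in_chordates = n_cho - n_deu
    if new_in_chordates == 0:
        raise ValueError("no genes emerged in chordates; ger is undefined")
    return new_in_vertebrates / new_in_chordates


# Made-up cumulative counts for one tissue: 400, 450, and 600 genes.
ger = gene_emergence_rate(n_deu=400, n_cho=450, n_ver=600)  # 150 / 50 = 3.0
```

A value above 1 means that more genes associated with the tissue emerged in vertebrates than in the preceding chordate category.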
Assignment of Pfam domains to temporal categories through HMM and blast searches
To derive a more detailed view of the evolution of proteins involved in neural crest development, we examined when the protein domains found in the list of neural crest genes are first detectable in our temporal categories. First, we downloaded the Pfam annotations for the identified 615 neural crest genes from EnsEMBL (version 32). Of the total of 8,183 Pfam domains, 499 are present (annotated in EnsEMBL) in the set of 615 genes. We downloaded consensus sequences and HMMs for these domains from Pfam (version 19.0).
Pfam domains were searched in the temporal category databases using two methods: blasting the Pfam consensus sequence with an E value threshold of 10⁻⁴ and searching HMM using HMMER software applying standard parameters. In general, the HMM search was more sensitive and able to detect a domain earlier than the Pfam consensus blast analysis. For the expressed sequence tag databases, we used only the blast search. A full table, including all neural crest genes with their Pfam domains and their appearance as detected by either of the methods, was compiled (Additional data file 5).
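One way to combine the two detection methods into a single first appearance per domain is sketched below in Python. This is an assumption about how the results could be merged, not a description of how Additional data file 5 was built: the function name is hypothetical, the category abbreviations follow Additional data file 2, and `None` stands for a domain that a method did not detect in any category.

```python
# Temporal categories ordered from oldest to youngest.
CATEGORIES = ["pro", "euk", "met", "deu", "cor", "ver", "mam"]


def earliest_appearance(blast_category, hmm_category, order=CATEGORIES):
    """Return the older of the two categories reported for one domain.

    Each argument is the oldest category in which the respective method
    detected the domain, or None if that method found no hit.
    """
    detected = [c for c in (blast_category, hmm_category) if c is not None]
    if not detected:
        return None
    # A smaller index in the ordered list means an older category.
    return min(detected, key=order.index)


# The HMM search detects this hypothetical domain earlier than blast.
first_seen = earliest_appearance(blast_category="cor", hmm_category="met")
```

Keeping both inputs separate until this step preserves the information on which method was the more sensitive one for each domain.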
Additional data files
The following additional data are available with the online version of this paper. Additional data file 1 is a table including a full list of statistically over-represented GO annotations of genes belonging to each of the seven categories (cutoff P < 0.001, sample count ≥ 15). Additional data file 2 is a table listing the 615 neural crest genes compiled using Phenotype Ontology annotations for each of the seven temporal categories considered in this work. Additional data file 3 is a table showing statistically over-represented GO annotations of the set of neural crest developmental genes that emerged in vertebrates (cutoff P < 0.001). Additional data file 4 shows a ClustalX alignment of mouse neurotrophins against the echinoderm peptide. Additional data file 5 shows a phylogenetic analysis of the emergence of neural crest Pfam domains through evolution. Additional data file 6 provides a complete list of genomes of species included in this work and their respective sources. Additional data file 7 shows the sequential blast analysis using other vertebrate groups as a control for the gene phylogeny analysis.
Gans C, Northcutt RG: Neural crest and the origin of vertebrates: a new head. Science. 1983, 220: 268-272. 10.1126/science.220.4594.268.
Northcutt RG, Gans C: The genesis of neural crest and epidermal placodes: a reinterpretation of vertebrate origins. Q Rev Biol. 1983, 58: 1-28. 10.1086/413055.
Wada H, Satoh N: Patterning the protochordate neural tube. Curr Opin Neurobiol. 2001, 11: 16-21. 10.1016/S0959-4388(00)00168-9.
Manzanares M, Nieto MA: A celebration of the new head and an evaluation of the new mouth. Neuron. 2003, 37: 895-898. 10.1016/S0896-6273(03)00161-2.
LaBonne C, Bronner-Fraser M: Molecular mechanisms of neural crest formation. Annu Rev Cell Dev Biol. 1999, 15: 81-112. 10.1146/annurev.cellbio.15.1.81.
Meulemans D, Bronner-Fraser M: Gene-regulatory interactions in neural crest evolution and development. Dev Cell. 2004, 7: 291-299. 10.1016/j.devcel.2004.08.007.
Le Douarin N, Kalcheim C: The Neural Crest. 1999, New York: Cambridge University Press.
Morales AV, Barbas JA, Nieto MA: How to become neural crest: from segregation to delamination. Semin Cell Dev Biol. 2005, 16: 655-662. 10.1016/j.semcdb.2005.06.003.
Bronner-Fraser M, Fraser SE: Cell lineage analysis reveals multipotency of some avian neural crest cells. Nature. 1988, 335: 161-164. 10.1038/335161a0.
Fraser SE, Bronner-Fraser M: Migrating neural crest cells in the trunk of the avian embryo are multipotent. Development. 1991, 112: 913-920.
Le Douarin NM, Dupin E: Cell lineage analysis in neural crest ontogeny. J Neurobiol. 1993, 24: 146-161. 10.1002/neu.480240203.
Le Douarin NM, Dupin E: Multipotentiality of the neural crest. Curr Opin Genet Dev. 2003, 13: 529-536. 10.1016/j.gde.2003.08.002.
Dorsky RI, Moon RT, Raible DW: Environmental signals and cell fate specification in premigratory neural crest. Bioessays. 2000, 22: 708-716. 10.1002/1521-1878(200008)22:8<708::AID-BIES4>3.0.CO;2-N.
Holland LZ, Holland ND: Evolution of neural crest and placodes: amphioxus as a model for the ancestral vertebrate?. J Anat. 2001, 199: 85-98.
Mallatt J, Chen JY: Fossil sister group of craniates: predicted and found. J Morphol. 2003, 258: 1-31. 10.1002/jmor.10081.
Donoghue PC, Sansom IJ: Origin and early evolution of vertebrate skeletonization. Microsc Res Tech. 2002, 59: 352-372. 10.1002/jemt.10217.
Shimeld SM, Holland PW: Vertebrate innovations. Proc Natl Acad Sci USA. 2000, 97: 4449-4452. 10.1073/pnas.97.9.4449.
Baker CV, Bronner-Fraser M: The origins of the neural crest. Part II: an evolutionary perspective. Mech Dev. 1997, 69: 13-29. 10.1016/S0925-4773(97)00129-9.
Wada H: Origin and evolution of the neural crest: a hypothetical reconstruction of its evolutionary history. Dev Growth Differ. 2001, 43: 509-520. 10.1046/j.1440-169X.2001.00600.x.
Stone JR, Hall BK: Latent homologues for the neural crest as an evolutionary novelty. Evol Dev. 2004, 6: 123-129. 10.1111/j.1525-142X.2004.04014.x.
Jeffery WR, Strickler AG, Yamamoto Y: Migratory neural crest-like cells form body pigmentation in a urochordate embryo. Nature. 2004, 431: 696-699. 10.1038/nature02975.
Smith CL, Goldsmith CA, Eppig JT: The Mammalian Phenotype Ontology as a tool for annotating, analyzing and comparing phenotypic information. Genome Biol. 2005, 6: R7. 10.1186/gb-2004-6-1-r7.
Mouse Genome Informatics. 2005, [http://www.informatics.jax.org]
Davidson EH, Erwin DH: Gene regulatory networks and the evolution of animal body plans. Science. 2006, 311: 796-800. 10.1126/science.1113832.
Long M, Deutsch M, Wang W, Betran E, Brunet FG, Zhang J: Origin of new genes: evidence from experimental and computational analyses. Genetica. 2003, 118: 171-182. 10.1023/A:1024153609285.
Alba MM, Castresana J: Inverse relationship between evolutionary rate and age of mammalian genes. Mol Biol Evol. 2005, 22: 598-606. 10.1093/molbev/msi045.
Waterston RH, Lindblad-Toh K, Birney E, Rogers J, Abril JF, Agarwal P, Agarwala R, Ainscough R, Alexandersson M, An P, et al: Initial sequencing and comparative analysis of the mouse genome. Nature. 2002, 420: 520-562. 10.1038/nature01262.
Benton MJ, Ayala FJ: Dating the tree of life. Science. 2003, 300: 1698-1700. 10.1126/science.1077795.
Beissbarth T, Speed TP: GOstat: find statistically overrepresented Gene Ontologies within a group of genes. Bioinformatics. 2004, 20: 1464-1465. 10.1093/bioinformatics/bth088.
Dehal P, Satou Y, Campbell RK, Chapman J, Degnan B, De Tomaso A, Davidson B, Di Gregorio A, Gelpke M, Goodstein DM, et al: The draft genome of Ciona intestinalis : insights into chordate and vertebrate origins. Science. 2002, 298: 2157-2167. 10.1126/science.1080049.
Hoang T: The origin of hematopoietic cell type diversity. Oncogene. 2004, 23: 7188-7198. 10.1038/sj.onc.1207937.
Hall BK: The neural crest as a fourth germ layer and vertebrates as quadroblastic not triploblastic. Evol Dev. 2000, 2: 3-5. 10.1046/j.1525-142x.2000.00032.x.
Shah NM, Groves AK, Anderson DJ: Alternative neural crest cell fates are instructively promoted by TGFbeta superfamily members. Cell. 1996, 85: 331-343. 10.1016/S0092-8674(00)81112-5.
White PM, Morrison SJ, Orimoto K, Kubu CJ, Verdi JM, Anderson DJ: Neural crest stem cells undergo cell-intrinsic developmental changes in sensitivity to instructive differentiation signals. Neuron. 2001, 29: 57-71. 10.1016/S0896-6273(01)00180-5.
Dorsky RI, Moon RT, Raible DW: Control of neural crest cell fate by the Wnt signalling pathway. Nature. 1998, 396: 370-373. 10.1038/24620.
Tachibana M: MITF: a stream flowing for pigment cells. Pigment Cell Res. 2000, 13: 230-240. 10.1034/j.1600-0749.2000.130404.x.
Dupin E, Glavieux C, Vaigot P, Le Douarin NM: Endothelin 3 induces the reversion of melanocytes to glia through a neural crest-derived glial-melanocytic progenitor. Proc Natl Acad Sci USA. 2000, 97: 7882-7887. 10.1073/pnas.97.14.7882.
Leimeroth R, Lobsiger C, Lussi A, Taylor V, Suter U, Sommer L: Membrane-bound neuregulin1 type III actively promotes Schwann cell differentiation of multipotent progenitor cells. Dev Biol. 2002, 246: 245-258. 10.1006/dbio.2002.0670.
Kalcheim C: The role of neurotrophins in development of neural-crest cells that become sensory ganglia. Philos Trans R Soc Lond B Biol Sci. 1996, 351: 375-381. 10.1098/rstb.1996.0031.
Sariola H, Saarma M: Novel functions and signalling pathways for GDNF. J Cell Sci. 2003, 116: 3855-3862. 10.1242/jcs.00786.
Clouthier DE, Williams SC, Yanagisawa H, Wieduwilt M, Richardson JA, Yanagisawa M: Signaling pathways crucial for craniofacial development revealed by endothelin-A receptor-deficient mice. Dev Biol. 2000, 217: 10-24. 10.1006/dbio.1999.9527.
Hallbook F: Evolution of the vertebrate neurotrophin and Trk receptor gene families. Curr Opin Neurobiol. 1999, 9: 616-621. 10.1016/S0959-4388(99)00011-2.
Jeanmougin F, Thompson JD, Gouy M, Higgins DG, Gibson TJ: Multiple sequence alignment with Clustal X. Trends Biochem Sci. 1998, 23: 403-405. 10.1016/S0968-0004(98)01285-7.
Butte MJ: Neurotrophic factor structures reveal clues to evolution, binding, specificity, and receptor activation. Cell Mol Life Sci. 2001, 58: 1003-1013. 10.1007/PL00000915.
Holmes WE, Sliwkowski MX, Akita RW, Henzel WJ, Lee J, Park JW, Yansura D, Abadi N, Raab H, Lewis GD, et al: Identification of heregulin, a specific activator of p185erbB2. Science. 1992, 256: 1205-1210. 10.1126/science.256.5060.1205.
Bateman A, Coin L, Durbin R, Finn RD, Hollich V, Griffiths-Jones S, Khanna A, Marshall M, Moxon S, Sonnhammer EL, et al: The Pfam protein families database. Nucleic Acids Res. 2004, 32: D138-D141. 10.1093/nar/gkh121.
Kortschak RD, Samuel G, Saint R, Miller DJ: EST analysis of the cnidarian Acropora millepora reveals extensive gene loss and rapid sequence divergence in the model invertebrates. Curr Biol. 2003, 13: 2190-2195. 10.1016/j.cub.2003.11.030.
Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ: Basic local alignment search tool. J Mol Biol. 1990, 215: 403-410.
GOstat web page. [http://gostat.wehi.edu.au/]
Pfam web page. [http://www.sanger.ac.uk/Software/Pfam/]
HMMER software. [http://hmmer.wustl.edu/]
We are thankful to Miguel Manzanares and David Torrents for their encouraging comments on this work. We are extremely grateful to Katherine Brown, Laurence Ettwiller, and Felix Loosli for comments and critical reading of the manuscript. J-RM-M received a Marie Curie Fellowship. This study was supported by grants from the European Union (Strep Hygeia) and the German Research Foundation, Collaborative Research Centre 488.
Juan-Ramon Martinez-Morales, Thorsten Henrich, Mirana Ramialison contributed equally to this work.
Electronic supplementary material
Additional data file 1: The table includes a full list of the statistically over-represented GO annotations of genes belonging to each of the seven categories (cutoff P < 0.001, sample count ≥ 15). (PDF 102 KB)
Additional data file 2: The table comprises a full list of the 615 neural crest genes compiled using Phenotype Ontology annotations for each of the seven temporal categories considered in this work: prokaryota (pro), eukaryota (euk), metazoa (met), deuterostomia (deu), chordata (cor), vertebrata (ver), and mammalia (mam). (PDF 68 KB)
Additional data file 3: The table shows statistically over-represented GO annotations of the set of neural crest developmental genes that emerged in vertebrates (cutoff P < 0.001). (PDF 18 KB)
Additional data file 4: ClustalX alignment of mouse neurotrophins against the echinoderm peptide. The comparison reveals a limited amino acid identity. (PDF 58 KB)
Additional data file 5: Phylogenetic analysis of neural crest Pfam domains emergence through evolution. The table shows a full list of the compiled 615 genes involved in neural crest development and the first appearance of their Pfam domains in the different clades. All the corresponding Pfam domains of these genes, when these domains have appeared, and the classification of the genes according to our previous sequential blast analysis (blast; color-coded) are indicated. (PDF 72 KB)
Additional data file 6: A complete list of genomes of species included in this work and their respective source is compiled in this table. Abbreviations: arc (archaeobacteria), bac (bacteria), euk (eukaryota), met (metazoa), deu (deuterostomia), cor (chordata), ver (vertebrata). (PDF 104 KB)
Additional data file 7: As a control for our gene phylogeny analysis, we also ran the sequential blast pipeline using other vertebrate groups, namely chicken, Xenopus, and zebrafish genomes. The tables show the number or percentage of genes assigned to each evolutionary category. The graphical representation of the gene phylogeny for the four vertebrate species analyzed reveals a very similar gene loss/emergence profile. (PDF 41 KB)
Martinez-Morales, JR., Henrich, T., Ramialison, M. et al. New genes in the evolution of the neural crest differentiation program. Genome Biol 8, R36 (2007). https://doi.org/10.1186/gb-2007-8-3-r36