Divergent evolution of arrested development in the dauer stage of Caenorhabditis elegans and the infective stage of Heterodera glycines

The generation and analysis of over 20,000 ESTs allowed the identification and expression profiling of 6,860 predicted genes in the nematode Heterodera glycines. This revealed that gene expression patterns in the dauer stage of Caenorhabditis elegans are not conserved in H. glycines.


Background
Heterodera glycines, the soybean cyst nematode, is the economically most important pathogen in soybean production and causes estimated annual yield losses of $800 million in the USA alone [1]. H. glycines completes its life cycle in about one month [2]. The first molt of the larvae takes place inside the eggs, and, after hatching, infective second-stage juveniles (J2) migrate through the soil and invade soybean roots to become parasitic J2. Once inside host roots, J2 move intracellulary through the root tissue to the central cylinder, where they initiate the formation of feeding sites (syncytia) and become sedentary. Only after feeding commences do nematodes molt and pass through two more juvenile stages (J3, J4) and, after a final molt, develop into adults. The enlarging body of the female, which remains sedentary for the remainder of the life cycle, breaks through the root cortex into the rhizosphere. Males regain motility and leave the root to fertilize females. After fertilization, females produce eggs, the majority of which are retained inside the uterus. Upon death of the adult female, its outer body layers harden and form a protective cyst (hence the name cyst nematodes) around the eggs until the environment is favorable again for a new generation of nematodes [2,3]. Even though eggs in their cysts are the primary dispersal stage of this nematode in an epidemiological sense, the J2 stage is mobile and, thus, comparable to the dispersal stage of Caenorhabditis spp.
In the past, numerous reports on cyst nematodes (Heterodera spp. and Globodera spp.) focused on selected genes, rather than taking a genomic approach, to elucidate nematode biology or the host-pathogen interactions between these nematodes and their host plants. Many of these studies dealt with so-called parasitism genes that are expressed in the dorsal and subventral esophageal glands during parasitic stages of cyst nematodes. The products of these genes are thought to be secreted into the host tissue to mediate successful plant parasitism [4][5][6][7][8][9][10][11][12][13]. However, a comprehensive genomic analysis beyond this limited group of genes has been lacking to date. To fill this gap, we generated 20,100 H. glycines expressed sequence tags (ESTs). Analyses of these ESTs plus approximately 1,900 sequences already in public databases produced a grouping into 6,860 unique genes. We assigned putative functions to these genes based upon sequence homology and established their expression profiles throughout the major life stages of H. glycines. Our data sets and results now represent a comprehensive resource for molecular studies of H. glycines.
Genomic analyses provide powerful tools to elucidate relationships between plant-parasitic nematodes and their hosts. Previous reports focused on the analysis of ESTs of plant-parasitic nematodes [14][15][16][17] or used differential display [18,19] and microarrays [20][21][22] to study gene expression changes in Arabidopsis and soybean in response to cyst nematode infection. Only recently, the advent of the Affymetrix Soybean Genome Array GeneChip enabled a parallel analysis of gene expression changes in both soybean and soybean cyst nematode during the early stages of infection [23]. The Affymetrix Soybean Genome Array GeneChip contains 37,500 probesets from soybean plants and additionally 15,800 probesets from the oomycete Phythophthora sojae and 7,530 probesets from the soybean cyst nematode H. glycines, two of the most important soybean pathogens. The H. glycines sequences used for the GeneChip have been generated in the study presented here.
The completion of the C. elegans genome sequence [24] was a milestone for biology at large, but it especially set the stage for comparisons to other (for example, parasitic) nematode species and has ushered in an era of comparative genomics in nematology [25][26][27][28][29]. One question of particular interest is whether the dauer larva, a facultative stage in the free-living species C. elegans, is homologous to the obligate dauer stage in parasitic nematodes. Dauer larvae were first described [30] as an adaptation to parasitism to overcome adverse environmental conditions and facilitate dispersal, but have been best studied in C. elegans. Genetic analysis has revealed the pathway controlling entry to and exit from the dauer stage [31]. This biochemical pathway, which is highly conserved across the animal kingdom, including humans [32], assesses and allocates energy resources to nematode development, ageing and fat deposition. The dauer pathway is primarily neuronally mediated, but presumably communicates with endocrine functions.
There is no strict definition of a 'dauer', but these larvae share the properties of being developmentally arrested, motile, non-feeding, non-ageing and hence long-lived [31,33,34]. Dauer stages have been well-documented for some plantassociated genera, including Anguina [35] and Bursaphelenchus [36], and it has been proposed that the infective stages of the sedentary endo-parasitic forms, including H. glycines, function as dauers [37]. In addition to the developmental attributes of the dauer, H. glycines J2 exhibit detergent resistance [38], intestinal morphology with sparse luminal microvilli [39] and numerous lipid storage vesicles characteristic of C. elegans dauers.
The dauer larva stage in C. elegans has distinct metabolic hallmarks [31]. Enzymes involved in the citrate cycle (with the exception of malate dehydrogenase) are less active in dauer larvae relative to adult C. elegans. Dauers show an increased level of phosphofructokinase activity and, therefore, glycolysis relative to adults [40]. The citrate cycle is less active than the glyoxylate cycle in dauer larvae compared to adults, consistent with the important role of lipids in energy storage in the dauer stage [41]. Also, heatshock protein 90 (Hsp90) is up-regulated fifteen-fold in dauer larvae relative to other stages [42], and superoxide dismutase and catalase activities show significant increases as well [43,44]. Although it is widely assumed that the dauer pathway per se is utilized to regulate dauer entry/exit in various animal-parasitic [45,46] and plant-parasitic species [26], little is known in these diverse species about the nature of the biochemistry that is regulated by the dauer pathways (that is, the effectors). Intriguingly, one experimental study in the human-parasitic nematode Strongyloides stercoralis [27] could not find clear evidence for a conserved dauer gene-expression signature, suggesting that the effectors of dauer biology might be diverged across nematode species.
However, a common feature among the dauer stage of C. elegans and the infective stage of parasitic nematode species seems to be the down-regulation of collagens, which make up a major portion of the nematode cuticle [27]. Collagens share a high degree of sequence identity due to numerous repeats, but they are not functionally redundant and often are developmentally regulated [47][48][49][50]. Previous EST studies found just three collagen transcripts in the infective stage of Meloidogyne incognita [14] and none in the infective stage of S. stercoralis [27]. In C. elegans, collagens could not be identified among dauer-specific transcripts [51].
Determining whether developmental arrest in C. elegans and parasitic nematodes like H. glycines is executed via the same mechanisms is a fundamental question of nematode biology. Of more than just academic interest, it may have important ramifications for potential control strategies that focus on the dauer pathway as a promising biochemical target to disrupt parasitic nematode life cycles. For example, it is very appealing to envision a strategy to induce dauer exit and concomitant resumption of ageing and development in the absence of a suitable host.
Here, we analyze and compare for the first time global geneexpression changes throughout all major life stages (eggs, infective J2, parasitic J2, J3, J4, virgin females) except adult males of a parasitic nematode and compare expression profiles to those of the model nematode C. elegans, with a particular focus on developmental arrest using EST and microarray data. Taken together, the sequence generation, sequence analyses and expression profiling work presented in this paper represent the most comprehensive and informative genomic resource available for the study of cyst nematode development and parasitism to date.

EST generation and sequence analysis
Life stage-specific (eggs, infective J2, J3, J4, virgin females) cDNA libraries of H. glycines, the soybean cyst nematode, were generated to provide templates for EST sequencing, totaling 20,100 5' ESTs or almost 10 million nucleotides (GC content 48.9%). Sequences from all five developmental stages were represented in approximately equal proportions (Table  1). In addition to these stage-specific libraries, 1,858 H. glycines sequences previously deposited in GenBank were included in the dataset for this study, bringing the total number of sequences analyzed here to 21 In order to determine sequence similarities of our contigs and in particular to identify genes that are conserved between different nematode species, we BLAST searched the 6,860 H. glycines contigs versus three databases ( Figure 1). About half of the contigs (44%) matched sequences in at least one of these three databases at a threshold value of E = 1e -20 . Examination of the BLAST match distribution revealed that 19% of the contigs that matched all three databases are most likely representing highly conserved genes involved in fundamental housekeeping processes in metazoans, while the 31% of contigs exclusively matching sequences in the cyst nematode database contained genes that likely are important for specific host adaptations of Heterodera spp.
When assessing BLAST hit identities, the cluster that contained the most ESTs (HgAffx.13905.2; 599 ESTs) belonged to a gene coding for a putative cuticular collagen. Identities of other highly represented contigs were actin, tropomyosin and myosin, as well as additional house-keeping genes like ribosomal components, ubiquitin, arginine kinase, synaptobrevin  ucts of the 25 most conserved genes were heat shock proteins, proteins related to transcription and translation (for example, elongation and splicing factors and RNA polymerase II) and structural proteins, including tubulin and actin, as well as enzymes, including guanylate cyclase (Additional data file 3).

A survey and functional classification of developmentally regulated genes
In order to identify H. glycines genes that are developmentally regulated and to document their expression profiles, we designed a microarray experiment using three complete and independent biological replications (that is, three independent sample series representing three complete life cycles). We identified 6,695 probesets (Additional data file 4) as described in Materials and methods that were differentially expressed with a false discovery rate (FDR) of 5% when observed over the entire life cycle of H. glycines. This group of probesets equals 89% of all H. glycines probesets on the microarray. In other words, the vast majority of H. glycines genes represented on the GeneChip significantly changed expression during the nematode life cycle. We then grouped these 6,695 probesets into 10 clusters based on their expression profiles ( Figure 2). As an exemplary gene family, we analyzed the expression pattern of FMRF (Phe-Met-Arg-Phe-NH 2 )-related neuropeptide (FaRP)-encoding genes. This group encodes a specific class of neuronally expressed tetrapeptides that are potent myoactive transmitters in nematode neuromusculature [52][53][54][55][56], which are expressed in motor neurons that act on body wall muscle cells [57][58][59]. Based on our BLAST searches against various databases as detailed above, we identified five probesets for genes encoding FaRPs (HgAffx.23446.1.S1_at, HgAffx.23636.1.S1_at, HgAffx63.1.S1_at, HgAffx20469.1.S1_at, HgAffx.24161.1.S1_at). All five probesets were co-expressed with each other and showed an expression peak in the infective J2 stage (Additional data file 1). These FaRP probesets were differentially expressed when observed over the entire life cycle of the nematode, and, with the exception of HgAffx.20469.1.S1_at, which was found in cluster 4, all probesets were grouped in cluster 7 ( Figure 2). The general profile of cluster 4 showed an expression peak in infective J2 and fell steadily in later life stages, while cluster 7 demonstrated the same overall pattern but showed a more pronounced increase from egg to infective J2.  the number of probesets showing differential expression (FDR 5%) in these comparisons is given in Table 3.
We used InterProScan [60] to conduct functional classification for all 6,860 contigs in all expression clusters of each comparison. The relative abundance of the 25 InterPro domains with the highest representation in each of the 10 clusters for contigs that showed differential expression (FDR 5%) throughout the entire life cycle is summarized in Additional data file 5. While most clusters contained a wide range of genes represented by diverse InterPro domains, collagen domains stood out, in that they accumulated in cluster 2 at a high frequency relative to other InterPro domains. Since it has been suggested that down-regulation of collagens might be a common feature in dauer and infective stages of nematodes [27], we analyzed the expression profiles of H. glycines collagens in more detail. Using a reciprocal BLAST strategy as described in Materials and methods, we identified eight H. glycines probesets representing seven unique contigs orthologous to C. elegans collagens ( Table 4). The temporal expression pattern of these seven orthologs was very similar ( Figure  3) and congruent with observations in other nematode species [14,51], which supports the hypothesis that down-regulation of collagen transcription is a conserved characteristic of non-molting infective and dauer-stage nematodes [27].
Heterodera glycines orthologs of dauer-enriched C. elegans genes are more likely to be down-regulated upon transition from infective J2 to parasitic J2 and J3 than other genes In addition to providing a comprehensive gene characterization and expression resource, we wished to demonstrate the applicability and power of our data by addressing the question of whether the infective J2 stage of H. glycines is Differentially expressed H. glycines probesets Figure 2 Differentially expressed H. glycines probesets. Temporal expression patterns of 6,695 H. glycines probesets that are differentially expressed (FDR 5%) when observed over the entire life cycle. Probesets were placed into ten clusters based on their temporal expression patterns. The average expression pattern of the probesets in each cluster is indicated by a red line. For visualization purposes, each probeset's estimated mean log-scale expression profile was standardized to have mean 0 and variance 2 prior to plotting. infJ2, infective J2; parJ2, parasitic J2. biochemically analogous to the C. elegans dauer larva stage. We compiled a list of 1,839 C. elegans genes that were identified by Wang and Kim [61] as so-called dauer-regulated genes by conducting microarray experiments comparing gene expression in C. elegans larvae that were in transition from dauer to non-dauer with that of freshly fed L1 larvae that had been starved. Dauer-regulated genes showed significant expression changes during a dauer exit time course that were not related to the introduction of food [61]. Reciprocal BLAST searches resulted in the identification of 438 H. glycines probesets that could be categorized unambiguously as orthologs of these C. elegans dauer-regulated genes (Additional data file 6). Because of the deliberate redundancy of the Affymetrix GeneChip, these 438 probesets corresponded to 396 unique H. glycines gene predictions. In other words, we identified H. glycines orthologs for 22% of the 1,839 C. elegans dauer-regulated genes (396/1,839).
We also compiled a list of H. glycines genes that are orthologous to the 488 C. elegans gene subset of the C. elegans dauer-regulated genes that Wang and Kim [61] determined to be up-regulated during the dauer stage and down-regulated upon dauer exit, a group that was called dauer-enriched. These genes presumably define dauer-specific properties, including stress resistance and longevity. These dauerenriched genes were of particular interest to us because upregulation of orthologous genes in the H. glycines infective J2 stage would suggest involvement in developmental arrest of these genes not only in C. elegans, but also in H. glycines.
Using the same reciprocal BLAST search strategy, we identified 74 H. glycines probesets corresponding to 69 unique H.
To test whether the frequency of H. glycines orthologs to C. elegans dauer-regulated and dauer-enriched genes is similar to that of other, randomly chosen genes, we randomly selected 1,000 C. elegans proteins from the Wormpep database (v. 157) and repeated the reciprocal BLAST searches. In these searches, we identified 159 unique H. glycines contigs that fulfilled our criteria (data not shown). In other words, 16% of these randomly selected C. elegans genes have H. glycines orthologs. These analyses showed that C. elegans dauer-regulated genes have a slightly higher frequency (22%) of having orthologs in H. glycines than either dauer-enriched (14%) or random (16%) genes, both having about the same rate.
Following the identification of H. glycines orthologs for C. elegans dauer-regulated and dauer-enriched genes, we clustered these genes according to their expression profiles throughout the life cycle. Clustering the 438 probesets for the dauer-regulated orthologs led to their placement into nine groups (Additional data file 2), while clustering of the 74 H. glycines probesets for dauer-enriched gene orthologs resulted in seven distinct groups ( Figure 4). It is obvious that not all H. glycines dauer-enriched orthologs were down-regulated from infective J2 to parasitic J2. Indeed, only 41% out of 74 probesets were significantly down-regulated, whereas 22% were up-regulated and 38% did not exhibit a statistically significant change in expression. Similarly, when comparing the infective J2 stage with the J3 stage of H. glycines, only 47% out of 74 probesets were down-regulated. Twenty-two percent were up-regulated, and 31% did not exhibit a statistically significant change in expression. In other words, in both comparisons, the majority of H. glycines genes that are orthologous to C. elegans genes down-regulated upon dauer exit were up-regulated or did not exhibit a statistically significant change in expression.  those 181, 27% were down-regulated from infective J2 to parasitic J2, 24% were up-regulated and 50% did not exhibit a statistically significant change. When infective J2 were compared to J3, 26% out of 181 were down-regulated, 20% upregulated and 54% did not exhibit a statistically significant change. We used a Fisher's exact test [62] to examine whether the proportion of down-regulated genes among the set of dauer-enriched H. glycines orthologs was significantly different from all other genes on the array or from the H. glycines probesets that were orthologous to the 1,000 randomly chosen C. elegans genes, respectively. We found that the observed differences between the proportions of down-regulated probesets between dauer-enriched H. glycines orthologs and the entire set of probesets on the microarray were significant at the 0.05 level in comparisons of both infective J2 versus parasitic J2 (P = 0.000015) and infective J2 versus J3 (P = 0.007160). Similarly, the differences between dauer-enriched H. glycines orthologs and the H. glycines probesets orthologous to random C. elegans genes were significant for comparisons of infective J2 versus parasitic J2 (P = 0.0219190) and for infective J2 versus J3 (P = 0.0094261). If a Bonferroni correction is used to control the overall type I error rate for this family of four tests, all comparisons would remain significant at the 0.05 level except the comparison between dauer-enriched H. glycines orthologs and the H. glycines probesets orthologous to random C. elegans genes for infective J2 versus parasitic J2. In other words, while the majority of H. glycines genes that are orthologous to C. elegans dauer-enriched genes was not down-regulated upon transition to parasitic J2 or J3, the proportion of H. glycines orthologs that were in fact down-regulated was statistically significantly enriched about two times compared to all H. glycines genes on the microarray or to orthologs to random C. elegans genes.
The identities of H. glycines genes that followed the expression pattern of their dauer-enriched C. elegans orthologs (that is, they were down-regulated upon transition to infective J2 or J3) reflect a wide range of effector functions and biochemical pathways, including peptidases, epoxide and glycoside hydrolases, phosphate transporters and neuropeptidelike proteins. H. glycines genes that did not follow the C. elegans pattern of expression (that is, they were not down-regulated) span an equally diverse group of genes and include carbohydrate kinase, catalase and glutathione peroxidase ( Table 5).

Metabolism in C. elegans dauer larvae and H. glycines infective J2 is dissimilar
To investigate whether the infective J2 stage in H. glycines shows an expression profile of metabolic pathway genes similar to that of C. elegans dauer larvae, we conducted a BLAST search (threshold E = 1e -20 ) against the Wormpep database (v. 152) to search for Affymetrix probesets coding for H. glycines enzymes active in the citrate cycle, glycolysis and other pathways that undergo marked changes during the dauer state [31]. We identified 37 probesets coding for 24 proteins active in six different pathways (Table 6). We then compared the expression levels of these H. glycines probesets in the assayed H. glycines life stages and determined differential expression (FDR 5%). While phosphofructokinase has been found to be up-regulated in dauer larvae relative to adults in C. elegans [40], we could not find differential expression between infective J2 and adult females in H. glycines. The citrate cycle is down-regulated in the C. elegans dauer stage and active at a lower level than the glyoxylate pathway [40,41]. In H. glycines, out of eight genes for citrate cycle enzymes found, all but one (fumarase) showed differential expression in at least one out of three stage-by-stage comparisons (egg/infective J2, infective J2/feeding J2, infective J2/

H. glycines probesets orthologous to dauer-enriched C. elegans genes
was differentially expressed when compared to eggs (up-regulated in infective J2) and females (down-regulated in infective J2), and catalase-3 was differentially expressed and down-regulated in infective J2 when compared to eggs. In summary, we conclude from these data that the physiological and biochemical landscape of developmentally arrested C. elegans dauer larvae must be different from that of developmentally arrested H. glycines infective J2.   genes. The qRT-PCR template was the same biological material used for hybridization of the microarrays, and the reactions were performed in technical triplicates. Additional data file 7 shows that there is qualitative agreement between our microarray approach and qRT-PCR for 28 out of 30 genes.

Discussion
This study describes the generation of the most exhaustive EST collection of the soybean cyst nematode H. glycines to date and the analyses and characterization of this EST collection. The Affymetrix Soybean Genome Array GeneChip contains a significant portion of probesets for H. glycines genes and became a possibility only because of the extensive EST collection generated in this project. This GeneChip now is the first commercially available microarray for a nematode other than C. elegans. We analyzed the expression profiles of all 6,860 genes throughout all major developmental stages, excluding the adult male. Furthermore, we classified all genes by predicted function and conducted stage-wise comparisons to identify differentially expressed genes. Finally, these data now represent a resource for any molecular project targeting H. glycines, and we have demonstrated the versatility of this genomics resource by advancing our understanding of arrested development in the infective stage.
It has been proposed that the C. elegans genome can serve as a guide to examine aspects of the biology of other nematode species, particularly those that are parasitic [26], and we have shown that this comparative genomics approach has great power. In particular, we examined whether the biochemistry underpinning the developmentally arrested, infective J2 stage of H. glycines is functionally analogous to that of the dauer stage in C. elegans. For this purpose, we exploited published microarray expression data obtained from C. elegans during dauer exit [61]. These down-regulated genes, termed 'dauer-enriched,' exhibit high mRNA abundance during the dauer stage. We asked if these genes were conserved in H. glycines, both in sequence and expression pattern. While our reciprocal BLAST searches suggest that a portion of C. elegans dauer-enriched genes is indeed conserved in the H. glycines genome with about the same frequency as randomly selected C. elegans genes, we did not find that H. glycines orthologs of C. elegans dauer-enriched genes are uniformly down-regulated upon transition to the parasitic J2 or J3 stages. While in C. elegans dauer-enriched genes are downregulated upon dauer exit [61], we found that only 41% of their H. glycines orthologs are down-regulated upon transition to the feeding J2 stage, which we hypothesize is a developmental transition equivalent to dauer exit in C. elegans. Nevertheless, H. glycines dauer-enriched orthologs are more likely to be down-regulated than: all genes represented on the microarray; and H. glycines orthologs for randomly selected C. elegans genes. In other words, while our data do not support the idea of a broadly conserved gene expression signature between the dauer stage in C. elegans and infective J2 in H. glycines, they indicate that dauer-enriched orthologs are more likely to share a common expression profile than other genes. A similar preliminary observation was made in an EST study [27] that compared the infective stage of the humanparasitic nematode S. stercoralis and dauer-specific transcripts that were identified in serial analysis of gene expression (SAGE) in C. elegans [51]. Mitreva et al. [27] found that dauer-specific genes were conserved in S. stercoralis, but that there was no evidence of a broadly conserved expression signature. However, in this study, we were able to compare our exhaustive microarray data for H. glycines with microarray data for C. elegans, which enables us to draw more compelling conclusions regarding dauer-regulated genes in both species.
We further compared the expression of metabolic pathway genes of C. elegans dauer larvae with infective J2 of H. glycines. Our data for H. glycines genes whose products are active in the glyoxylate pathway or citrate cycle, both of which undergo marked gene expression changes in C. elegans dauer larvae, as well as for genes encoding Hsp90 or superoxide dismutase, show dramatic differences between H. glycines infective J2 and C. elegans dauer larvae. Our findings suggest that the C. elegans dauer larva and the H. glycines infective J2 do not share similar expression profiles of metabolic pathway genes.
Although based upon inferences from transcript levels, our data point to striking differences in the underlying biochemistry and physiology of developmentally arrested and recover-  ing C. elegans dauers and H. glycines J2. Does the phenomenon of developmental arrest in these stages therefore reflect a substantially different regulatory pathway, or does it reflect life-style differences between the parasitic and free-living species? Until the genome of H. glycines is complete, it will not be possible to determine if more putative orthologs of the daf and sma genes that compose the C. elegans dauer pathway are present than we were able to identify here. Many of these regulators of dauer entry/exit [31] are typically expressed at low levels, and thus are not common in EST sets.
Numerous nematode species are carried passively to new habitats as eggs and achieve active dispersion as dauer larvae or infective stage. This active dispersal stage in many nematodes is thought to have evolved from a common ancestral stage, and it has been assumed that there has been sufficient conservation of gene expression over time that the expression patterns in different species would appear similar. Our findings of marked differences in gene expression between C. elegans dauer larvae and H. glycines infective J2 could mean that these two developmentally arrested stages have evolved independently from each other and that all apparent similarities are based on convergent evolution. Alternatively, and we believe more likely, is that these two stages could in fact have a common origin but molecular evolution could have been sufficiently fast that any broadly conserved gene expression patterns would have been lost since the last common ancestor. Perhaps the best example of rapid diversification of genetic and biochemical processes underlying an analogous biological process comes from comparisons of vulval induction in C. elegans and Pristionchus pacificus [63]. In both species, vulva induction occurs post-embryonically via an identical cellular process (inductive signaling to P5.p, P6.p and P7.p cells from the anchor cell). Despite the obvious homology of these processes, the underlying regulation is strikingly distinct [63]. Similarly, other studies have demonstrated that gene expression patterns between mouse and human, which split only about 25 million years ago, changed rapidly [64][65][66].
Concerning the C. elegans dauer stage and potentially analogous stages in parasitic nematodes, it will be interesting to define the mechanisms that lead to developmental arrest in H. glycines and to compare them further to processes in C. elegans. The comprehensive dataset generated here will be a valuable resource for the field of nematology and sets the stage for many more comparative studies. In particular, a comprehensive analysis of H. glycines parasitism-associated gene expression profiles has been conducted by these authors and will be published later.

Conclusion
Our data indicate that expectations of a conserved phylumwide dauer expression signature shared across nematodes may not be realistic. Such general expectations severely underestimated the degree to which expression profiles change and should be replaced by careful analyses of dauerlike stages of closely related nematode species instead. For example, one could ask whether the dauer expression profiles of C. elegans and Caenorhabditis briggsae are the same or whether the expression profiles of H. glycines infective J2 and M. incognita infective J2 are conserved.

Materials and methods
Nematode cultivation, cDNA library generation, sequencing and clustering H. glycines strain OP-50 [67] was cultivated under greenhouse conditions and isolated as described previously [68]. Genome Array GeneChip and all consensus sequences and contig size details can be accessed at Affymetrix [69].

Nematode cultivation for microarray experiments
For each replication, 40 pots with 10 seeds each of Kenwood 94 soybeans were planted in a 2:1 sand:soil mixture in the greenhouse. Two weeks after planting, each pot, containing an average of 7 to 8 germinated seedlings, was inoculated with 15,000 to 20,000 H. glycines strain OP-50 [67] infective J2. The inoculum was collected by setting up two hatch chambers, each containing about two million H. glycines OP-50 eggs, and allowing the eggs to hatch for four days. From the same batch of eggs used in the hatch chamber, 50,000 eggs were collected and flash frozen in liquid nitrogen for use as the egg stage of the replication. After 4 days, the hatched infective J2 were collected and counted, and an aliquot of 50,000 larvae was flash frozen in liquid nitrogen for use as the infective J2 stage of the replication. The remainder of hatched infective J2 larvae was divided up among the 40 pots for seedling inoculation. Four days after infection, 12 pots were collected, and the soil was washed away from the root systems of these pots to isolate the parasitic J2 stage. Eight days after infection, another 12 pots were harvested for collection of J3 juveniles, and, 14 days after infection, a further 10 pots were used to isolate J4 juveniles. Finally, 21 days after infection, the final 6 pots were harvested for collection of adult females. These stages were isolated as published previously [68]. All stages were divided in about 30 mg aliquots in 1.5 ml screw cap tubes, flash frozen in liquid nitrogen and stored at -80°C.

RNA extraction for microarray experiments and GeneChip procedures
Frozen (-20°C) 1.0 mm zirconia beads (BioSpec Products, Bartlesville, OK, USA) were added to frozen nematode tissue in a 1.5 ml screw cap tube in a 1:1 tissue:bead ratio. RNA was isolated using the Versagene kit from Gentra Systems (Minneapolis, MN, USA). On ice, 800 μl lysis buffer was mixed with 8 μl tri 2-carboxyethyl phosphine (TCEP), and 50 μl was added to one screw cap tube with about 30 mg nematode tissue. Tissue was homogenized in a beadbeater (BioSpec Products) at setting 48 for 10 s and chilled on ice for 60 s. This step was repeated three times. Two more tubes for the same life stage were treated in the same way. The suspension was collected by pipetting and transferred into a new tube. Beads were rinsed with remaining buffer, and the standard Versagene protocol was followed from then on. For all stages and all repetitions, we obtained 113 to 320 mg frozen tissue and 7.85 to 173.38 μg total RNA. The RNA concentration and quality of each sample were determined by a NanoDrop spectrophotometer (NanoDrop Technologies, Wilmington, DE, USA) and by RNA Nanochip on a 2100 Bioanalyzer (Agilent Technologies Inc., Palo Alto, CA, USA). RNA was submitted to the Iowa State University GeneChip Facility, where standard procedures recommended by Affymetrix were followed for reverse transcription and labeling of the probes and for hybridization and scanning of the GeneChips.

Experimental design of microarray experiments and GeneChip data analysis
Expression was measured using a total of 18 Affymetrix Soybean Genome Array GeneChips (three replications × six life stages) using a randomized complete block design with replications as blocks. Prior to performing the analysis, the Affymetrix signal data were transformed using the natural log (ln) and normalized by median centering (that is, the median ln signal from each particular chip was subtracted from all ln signals on the chip). The normalized data for each gene were analyzed separately using a standard linear model with fixed effects for replications and stages. F tests, resulting in p-values, were performed to test for a difference in expression between the life stages for each probeset. A q-value was computed for each p-value using the method described by Storey and Tibshirani [70]. These q-values can be used to identify differential expression while maintaining approximate control of the FDR. For example, FDR is controlled at approximately 5% if the sets of tests with q-values at or below 0.05 are declared significant. See [71] for a discussion of linear modeling and FDR control in the context of plant microarray experimentation.
Clustering was used to organize and summarize the observed expression patterns of differentially expressed genes. For each probeset, the mean normalized expression level was estimated for all six life stages. The six estimated values were standardized to have mean 0 and standard deviation 1 within each probeset. The Euclidian distance between any pair of standardized expression profiles was used as a measure of dissimilarity in all clustering algorithms. This approach considers genes with similar expression patterns to be close in six-dimensional space and is equivalent to using (1-r) 0.5 as the measure of dissimilarity, where r is the Pearson correlation coefficient between non-standardized expression profiles.
K-medoids clustering [72] was performed on those probesets with the 1,000 lowest p-values (< 0.0000143) for the overall test for expression change across the life cycle. Using the Gap statistic [73], we determined the number of clusters to be 10. Probesets with q-values less than 0.05 were added to these 10 base clusters to produce the clustering depicted in Figure 2. All other clusters were produced using hierarchical agglomerative clustering, using average linkage to measure the dissimilarity between clusters.
All clusters and related figures were generated using the free open-source statistical software package R [74]. Hierarchical clustering was carried out using the R function hclust; Kmedoids clustering was carried out using the function pam from the R cluster library.

BLAST searches of H. glycines contigs
We built three nematode-specific nucleotide databases as follows: cyst nematodes (

Identification of C. elegans dauer and collagen orthologs
A previous microarray study identified 1,984 so-called dauerregulated and 540 dauer-enriched genes in C. elegans [61]. We manually compiled a list of 1,839 dauer-regulated and 488 dauer-enriched C. elegans genes using these data (the remaining genes were either deleted or merged with other genes in Wormbase). To isolate H. glycines orthologs, we conducted BLAST searches between these C. elegans genes and all 7,530 H. glycines probeset nucleotide sequences (threshold values 1 e-15 , bit score at least 50, identity at least 30%). We then conducted an additional BLAST search between the H. glycines probesets that passed those criteria and the entire Wormpep (v. 159) database using the same cutoff criteria as above. Only those ortholog pairs in which the C. elegans dauer-regulated or dauer-enriched hit was either the best one compared to C. elegans hits of the entire Wormpep database or within 5% of the best hit's bit score were kept. These searches generated a list of 438 H. glycines probesets orthologous to C. elegans dauer-regulated genes and 74 H. glycines probesets orthologous to C. elegans dauer-enriched genes.
To identify collagen orthologs, we downloaded the sequences for collagens listed at the Sanger Institute web site [76] and followed basically the same BLAST strategy, with the exception that we required a higher stringency with a bit score of at least 100.

qRT-PCR
The transcript abundance of 30 differentially expressed cDNA clones was analyzed by qRT-PCR to confirm microarray results. Gene-specific primers were designed, and the sequences are shown in Additional data file 7. DNase-treated RNA (10 ng) was used for cDNA synthesis and PCR amplification using an iScript One-Step RT-PCR kit (BIO-RAD, Hercules, CA, USA) according to manufacturer's protocol. The PCR reactions were performed using an iCycler (BIO-RAD) under the following conditions: 50°C for 10 minutes, 95°C for 5 minutes and 40 cycles of 95°C for 30 s and 60°C for 30 s. Following PCR amplification, the reactions were subjected to temperature ramp to create the dissociation curve, measured as changes in fluorescence as a function of temperature, by which the non-specific products can be detected. The dissociation program was 95°C for 1 minute, 55°C for 10 s, followed by a slow ramp from 55°C to 95°C. Three replicates of each reaction were performed, and constitutively expressed Actin 1 (AF318603) was used as internal control to normalize gene expression levels. Quantifying the relative changes in gene expression was performed using the 2 -ΔΔCT method as described by Livak and Schmittgen [77].

Data
The Affymetrix Soybean Genome Array GeneChip raw and normalized data files were deposited in the ArrayExpress database [78] under accession number E-MEXP-1110. This database is MIAME-compliant. file 7 is a table listing qRT-PCR results, oligonucleotide primers used and probeset identities. Additional data file 8 provides transformed and normalized Affymetrix probeset mean data for each probeset in each life stage and q values for each tested combination of life stages as described in Materials and methods.
Additional data file 1 Temporal expression pattern of H. glycines probesets for FaRP-encoding genes BLAST searches identified seven H. glycines probesets orthologous to C. elegans FaRP-encoding genes (mean expression pattern indi-cated in red). For visualization purposes, each probeset's estimated mean log-scale expression profile was standardized to have mean 0 and variance 1.5 prior to plotting. infJ2, infective J2; parJ2, para-sitic J2. Click here for file Additional data file 2 Temporal expression patterns and clusters of 438 H. glycines probesets orthologous to C. elegans dauer-regulated genes Reciprocal BLAST searches identified 438 H. glycines probesets as orthologous to C. elegans dauer-regulated genes. These probesets were grouped into nine clusters based on their temporal expression profiles. The average expression pattern of the probesets in each cluster is indicated by a bold line. infJ2, infective J2; parJ2, para-sitic J2. Click here for file Additional data file 3 The