Identification of novel regulatory factor X (RFX) target genes by comparative genomics in Drosophila species

An RFX-binding site is shown to be conserved in the promoters of a subset of ciliary genes and a subsequent screen for this site in two Drosophila species identified novel RFX target genes that are involved in sensory ciliogenesis.


Background
Eukaryotic cilia and flagella are present in many types of tissues and organisms and are important for sensory functions, cell motility, molecular transport, and several developmental processes, such as the establishment of left-right asymmetry in vertebrates [1][2][3][4][5]. Several human diseases are known to result from defects in ciliary assembly or function and have recently been designated as ciliopathies [5]. Cilia are well-defined structures consisting of a microtubular axoneme composed of specific proteins that are assembled dynamically in a strict stereotypical pattern (for reviews, see [6,7]). Ciliary assembly depends on intraflagellar transport (IFT), a dynamic process highly conserved in organisms ranging from the green alga Chlamydomonas to mammals (reviewed in [1,8,9]). Several studies in various organisms have been instrumental in the identification of genes involved in the assembly and function of the cilium. The proteomic analysis of detergent-extracted ciliary axonemes from cultured human epithelial cells identified 214 proteins [10]. More recently, a biochemical fractionation of Chlamydomonas reinhardtii flagella led to the identification of about 700 proteins, of which 360 were assigned with high confidence to the flagellum [11]. A proteomic analysis of Trypanosoma brucei flagella allowed the identification of 522 proteins [12]. Two remarkable approaches took advantage of the availability of complete genome sequences to identify genes encoding ciliary and flagellar proteins. By comparing the genomes of ciliated versus non-ciliated organisms, Avidor-Reiss et al. [13] and Li et al. [14] selected 187 and 688 genes, respectively, that are specific to ciliated organisms. Stolc et al. [15] used microarray hybridization to analyze induction levels of all C. reinhardtii genes after deflagellation. They identified 220 genes that are induced at least two-fold and, therefore, are likely to be involved in the assembly or function of cilia and flagella.
Much less is known about the regulatory pathways that control the expression of ciliary components or direct the differentiation of ciliated cells. The transcription factor FoxJ1 appears to govern the differentiation of ciliated cells in vertebrates, but so far, only one gene has been shown to be directly regulated by FoxJ1 [16]. The transcription factor HNF1-β has also been shown to regulate several genes involved in ciliogenesis in the kidney [17]. Most importantly, regulatory factor X (RFX) transcription factors play a key role in regulating genes involved in ciliogenesis. RFX transcription factors are conserved in a wide range of species, including Saccharomyces cerevisiae, Caenorhabditis elegans, Drosophila melanogaster and mammals. They share a characteristic DNA-binding domain of the winged-helix family and bind to an X-box motif, an imperfect inverted repeat with variable spacing between the repeats [18,19]. Whereas only one Rfx gene is described in yeast and C. elegans, two Rfx genes are present in the Drosophila genome and five in mammals [20]. Major clues on RFX functions in metazoans have been obtained from work on invertebrates. daf-19, the sole Rfx gene in C. elegans, is a key regulator of ciliogenesis [21]. dRfx in Drosophila is expressed in ciliated cells and is necessary for ciliated sensory neuron differentiation: all sensory neurons are present but cilia are missing at the dendritic tips [22,23]. In mouse, we have shown that RFX function in ciliogenesis is conserved. Indeed, Rfx3 controls the growth of mouse embryonic node cilia [24] and Rfx3 loss-of-function leads to hydrocephalus with differentiation defects of ciliated ependymal cells of the choroid plexus and subcommissural organ [25]. Moreover, Rfx3 mutant mice show insulin secretion failure and impaired glucose tolerance correlated with primary ciliary growth defects on islet cells [26]. In zebrafish, Rfx2 is expressed specifically in multiciliated cells of the pronephros and loss of Rfx2 leads to cyst formation and loss of multicilia [27]. The function of the other RFX proteins has yet to be linked to ciliogenesis. Rfx5, the most divergent mammalian member, regulates major histocompatibility class II gene expression and mutations in it are responsible for the bare lymphocyte syndrome [28]. Rfx4 has been implicated in dorsal patterning of brain development in mice and may participate in circadian rhythm regulation in humans [29][30][31][32].
Because RFX function in ciliogenesis appears conserved from C. elegans to mammals, X-box promoter motif sequences can guide the search for ciliary genes. Indeed, genome-wide searches for genes controlled by DAF-19 in C. elegans have identified many genes involved in ciliogenesis [14,21,33-38]. Genomic X-box searches are thus a key method to identify genes involved in ciliary development. We show here that ciliogenic RFX regulatory cascades are well conserved between D. melanogaster and C. elegans and identify a first set of 14 RFX target genes. In particular, we show that all known Drosophila homologs of genes defective in human Bardet-Biedl syndrome (BBS), a human ciliopathy with complex phenotypes, are controlled by dRFX. Moreover, by using comparative genomic screens we show that genes under dRFX control in D. melanogaster share conserved X-boxes with another divergent Drosophila species, D. pseudoobscura. Applied to the whole genome of both species, our comparative approach led to the identification of at least 11 novel RFX target genes. In vivo reporter assay studies for three of them confirmed their involvement in ciliary structure or function in Drosophila, thus illustrating the accuracy of our screen. In addition, we have established a high-confidence Drosophila cilia and basal body (DCBB) gene list and highlight several genes as novel candidates for ciliogenesis. Our data are of particular importance for further genetic and genomic studies in the field of ciliogenesis and, consequently, for identifying genes involved in human ciliopathies.

RFX target genes in C. elegans and D. melanogaster and in compartmentalized ciliogenesis
and one was significantly over expressed. Eleven genes did not show significant expression variations between control and mutant background (Table 1).
In order to demonstrate the accuracy of our quantification procedure, we performed in vivo observations of reporter constructs of some of the genes in wild-type and dRfx deficient backgrounds (Figure 1). As previously published, sensory neuron ciliary endings are missing in a dRfx deficient background [23]. As observed in the cell body or remaining dendrite, the expression of osm-1 is totally shut down in the dRfx deficient background, whereas the expression of oseg1 is not affected (Figure 1), in agreement with real-time RT-PCR results. Interestingly, CG3259 and CG9227 cDNAs were hardly detectable by real-time PCR and, thus, difficult to quantify. However, in vivo observations of reporter constructs in wild-type and dRfx mutant backgrounds show a complete absence of expression of these two genes in the mutant background (Figure 1).
In summary, we show that RFX target genes are mainly conserved between C. elegans and D. melanogaster. Our functional comparative approach between both organisms combined with the work of Avidor-Reiss et al. in Drosophila allowed us to identify 27 genes that are regulated by dRFX in Drosophila. A majority of them are shown to be involved in ciliogenesis.

X-box conservation between D. melanogaster and D. pseudoobscura
As previously described [13,14,21,36-39], the X-box promoter motif has been used successfully to screen for genes involved in ciliogenesis. As shown above, this first set of X-box gene data in Drosophila is thus a key to better understand the link between X-box sequences and dRFX transcriptional control in Drosophila. We looked for X-boxes in the promoters of dRFX target genes. We searched for X-boxes up to 3 kb upstream of the ATG for each of them, with the most degenerate X-box consensus deduced to date from known RFX protein binding sites (RYYNYY N1-3 RRNRAC). We could identify several X-boxes for each gene (Table 2, columns 2 and 3). However, known negative control genes also presented X-boxes at the same frequency and no particular constraint on the consensus seemed to correlate with one set of genes. Therefore, the presence of an X-box upstream of a gene's ATG is not in itself predictive of dRFX-dependent expression. We thus turned to the D. pseudoobscura genome. The most recent common ancestor of the two Drosophila species lived 40-60 million years ago. The average identity of coding sequence between D. melanogaster and D. pseudoobscura at the nucleotide level is 70% for the first and second bases of codons, and 49% for the wobble base. Intron sequences are 40% identical, untranslated regions 45-50%, and the identity of protein-binding DNA sites extracted from the literature has been estimated at an average of 63% [40]. Moreover, detailed comparison of both Drosophila genomes showed that 50-70% of known DNA binding sites reside in conserved sequence blocks in the genomes, called conserved regulatory elements (CREs), whereas the overall conservation of the cis-regulatory regions is low [41][42][43].
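The upstream scan with the degenerate consensus described above can be illustrated with a minimal Python sketch. It is not the search program used in this study: the function names are hypothetical, only the forward strand is scanned, and the input is assumed to be the sequence lying immediately 5' of the ATG. The consensus RYYNYY N1-3 RRNRAC is translated into a regular expression through the standard IUPAC nucleotide codes.

```python
import re

# Standard IUPAC nucleotide codes used by the X-box consensus motifs.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "N": "[ACGT]",
    "H": "[ACT]", "V": "[ACG]", "D": "[AGT]",
}


def half_site_pattern(consensus):
    """Translate one half-site consensus (e.g. 'RYYNYY') into a regex fragment."""
    return "".join(IUPAC[code] for code in consensus)


def xbox_regex(left, right, min_spacer=1, max_spacer=3):
    """Compile 'left N{min,max} right' into a regular expression.

    The pattern is wrapped in a lookahead so that overlapping hits are found.
    At most one hit is reported per start position (longest spacer first).
    """
    pattern = (
        f"{half_site_pattern(left)}"
        f"[ACGT]{{{min_spacer},{max_spacer}}}"
        f"{half_site_pattern(right)}"
    )
    return re.compile(f"(?=({pattern}))")


def find_xboxes(upstream, regex):
    """Return (position, sequence) for every hit in an upstream region.

    `upstream` is assumed to end right before the ATG, so its last base is
    position -1 and positions are negative distances to the start codon.
    """
    seq = upstream.upper()
    return [(m.start(1) - len(seq), m.group(1)) for m in regex.finditer(seq)]


if __name__ == "__main__":
    degenerate = xbox_regex("RYYNYY", "RRNRAC")
    # Toy 23 bp upstream sequence, not a real promoter.
    print(find_xboxes("TTTTGTTGCCATGGCAACTTTTT", degenerate))
```

Because the consensus is so permissive, a scan of this kind returns hits for control genes as well, which is the observation that motivated the cross-species comparison.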
We thus looked for D. pseudoobscura homologs of either dRFX positively regulated or invariant genes and for X-boxes up to 3 kb upstream of the ATG. Interestingly, 70% of conserved dRFX target genes present a conserved X-box in both species (Table 2), whereas only 23% of negative control genes present the same characteristic. Even more precisely, while the sequence and the location of X-boxes for dRFX target genes are conserved, this is not the case for negative control genes. Interestingly, palindromic X-boxes are significantly over-represented compared to non-palindromic X-box sequences in dRFX regulated genes in the two species.
We also looked for overall sequence conservation around the selected X-boxes by Vista promoter sequence comparison between the two Drosophila species. The percentage of identities was quantified either on 100 bp or 25 bp windows surrounding the X-boxes (Figure 2, Table 2) and block conservation was considered positive if identities were over 50%. As shown in Table 2, sequences around the X-boxes are generally not well conserved. Two representative examples are depicted in Figure 2. For the CG9595/osm-6 gene, one of the two conserved X-boxes falls into an overall conserved 100 bp block, whereas the other one does not. For CG8853/che-13, the X-box falls into a poorly conserved region. These results are in agreement with previously published data showing that sequence block conservation alone cannot discriminate regulatory regions, but that binding site clusters present in multiple species more likely discriminate active and inactive clusters [43].
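The window-based conservation measure can be sketched as follows. This is an illustration only and does not reproduce how Vista computes identities: it assumes that a pairwise alignment of the two promoters is already available as two gapped strings of equal length, that the window is centred on the X-box, and that gap columns count as mismatches. All function and parameter names are hypothetical.

```python
def window_identity(aln_a, aln_b, start, end, window):
    """Percent identity over `window` alignment columns centred on an X-box.

    aln_a, aln_b: the two rows of a pairwise alignment ('-' marks a gap).
    start, end:   alignment columns of the X-box (0-based, end exclusive).
    """
    if len(aln_a) != len(aln_b):
        raise ValueError("aligned sequences must have the same length")
    centre = (start + end) // 2
    lo = max(0, centre - window // 2)
    hi = min(len(aln_a), lo + window)
    columns = list(zip(aln_a[lo:hi].upper(), aln_b[lo:hi].upper()))
    if not columns:
        raise ValueError("empty window")
    # Gap columns are treated as mismatches (an assumption of this sketch).
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


def is_conserved_block(identity_pct, threshold=50.0):
    """A block is scored as conserved when identity is over 50%, as in the text."""
    return identity_pct > threshold


if __name__ == "__main__":
    # Toy alignment, not real promoter sequence.
    pct = window_identity("ACGTACGTAC", "ACGTTCGTAC", start=4, end=6, window=10)
    print(pct, is_conserved_block(pct))
```

In practice the function would be called twice per X-box, once with `window=100` and once with `window=25`, to mirror the two window sizes reported in Table 2.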

Screening Drosophila species' genomes for dRFX regulated genes
The presence of a conserved X-box upstream of genes in both D. melanogaster and D. pseudoobscura is thus a good predictor of novel dRFX target genes. We thus screened the genome of both Drosophila species for the presence of X-boxes. We searched for all possible matches to a defined motif sequence using a Perl-based algorithm [36]. The most degenerate consensus, RYYNYY N1-3 RRNRAC, found 50,000 hits throughout the entire genome of D. melanogaster and, therefore, could not be used within our experimental framework. We selected five different more restricted consensus motifs that cover X-boxes of the entire set of known target genes at the time (see Materials and methods). Four (RYYVYY N1-3 RRHRAC, GYTNYY N1-3 RRNRAC, GYTDYY N1-3 RRNRAC, GYTRYY N1-3 RRHRAC) were searched in a 1 kb window upstream of the ATG, and the least degenerate one, RTNRCC N1-3 RGYAAC, in a 3 kb window.
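One way to see why the restricted motifs are more tractable than the fully degenerate consensus is to count how many distinct pairs of half-sites each one can match. The short Python sketch below does this with the standard IUPAC code sizes; the spacer (N1-3) is identical for all motifs and is left out of the count. The code is an illustration added here, with hypothetical function names, and is not part of the original screening procedure.

```python
from math import prod

# Number of bases allowed by each IUPAC code.
IUPAC_SIZE = {
    "A": 1, "C": 1, "G": 1, "T": 1,
    "R": 2, "Y": 2, "H": 3, "V": 3, "D": 3, "N": 4,
}

# (left half-site, right half-site) for each consensus quoted in the text.
MOTIFS = {
    "RYYNYY N1-3 RRNRAC": ("RYYNYY", "RRNRAC"),
    "RYYVYY N1-3 RRHRAC": ("RYYVYY", "RRHRAC"),
    "GYTNYY N1-3 RRNRAC": ("GYTNYY", "RRNRAC"),
    "GYTDYY N1-3 RRNRAC": ("GYTDYY", "RRNRAC"),
    "GYTRYY N1-3 RRHRAC": ("GYTRYY", "RRHRAC"),
    "RTNRCC N1-3 RGYAAC": ("RTNRCC", "RGYAAC"),
}


def degeneracy(left, right):
    """Number of distinct 12-base half-site combinations a consensus matches."""
    return prod(IUPAC_SIZE[c] for c in left) * prod(IUPAC_SIZE[c] for c in right)


if __name__ == "__main__":
    for name, (left, right) in MOTIFS.items():
        print(f"{name}: {degeneracy(left, right)}")
```

Under this count the fully degenerate consensus matches 4,096 half-site combinations, the four 1 kb motifs match between 384 and 2,304, and RTNRCC N1-3 RGYAAC matches only 64, which is consistent with searching the last motif over a longer 3 kb window.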
Under these conditions, 4,726 non-redundant genes in D. melanogaster and 3,848 in D. pseudoobscura with an X-box upstream of the start codon were selected. Based on a reciprocal best-hit search between the two coding sequence (CDS) lists, we identified 1,462 homologous genes having an X-box in their 5' region in both species. This first set of 1,462 genes was further restricted by selecting only genes that share an X-box with no more than 4 bases different (out of the 12 nucleotides recognized by the protein on either side of the spacer) between the two species and in a conserved position upstream of the ATG (500 bp difference at most). The list was thus restricted to a subset of 412 genes (Additional data file 1). An even more restricted subset of genes was selected using the X-box motif GYTRYY N1-3 RRHRAC, which was found upstream of most known RFX target genes at the beginning of this work, leading to a list of 83 genes (Table 3). Indeed, among the identified dRFX target genes for which a conserved X-box was found in both Drosophila species (Table 2), the highest percentage of target genes (50%, 8 out of 16) was found in this list of 83 genes. The remaining 50% of known RFX target genes (Table 2) were not selected by the X-box screen and thus represent false negatives (see Discussion for a comprehensive analysis).
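The two filtering steps described above, reciprocal best hits followed by the X-box conservation criteria, can be sketched in Python as follows. This is a hedged illustration rather than the pipeline used in this study: the best-hit tables are assumed to have been computed beforehand (for example from sequence similarity searches), the gene identifiers in the usage example are invented, and an X-box hit is represented as a hypothetical (position relative to ATG, sequence) pair.

```python
def reciprocal_best_hits(best_a_to_b, best_b_to_a):
    """Keep gene pairs (a, b) where b is a's best hit and a is b's best hit."""
    return {a: b for a, b in best_a_to_b.items() if best_b_to_a.get(b) == a}


def half_sites(xbox):
    """Return the 12 half-site bases of an X-box, dropping the 1-3 nt spacer."""
    return xbox[:6] + xbox[-6:]


def is_conserved_xbox(hit_mel, hit_pse, max_mismatches=4, max_shift=500):
    """Apply the two criteria from the text to a pair of X-box hits.

    Each hit is (position_relative_to_ATG, sequence). The pair is kept when
    the half-sites differ at no more than 4 of 12 bases and the positions
    upstream of the ATG differ by at most 500 bp.
    """
    pos_m, seq_m = hit_mel
    pos_p, seq_p = hit_pse
    mismatches = sum(a != b for a, b in zip(half_sites(seq_m), half_sites(seq_p)))
    return mismatches <= max_mismatches and abs(pos_m - pos_p) <= max_shift


if __name__ == "__main__":
    # Hypothetical identifiers, for illustration only.
    pairs = reciprocal_best_hits(
        {"mel_gene_1": "pse_gene_1", "mel_gene_2": "pse_gene_2"},
        {"pse_gene_1": "mel_gene_1", "pse_gene_2": "mel_gene_3"},
    )
    print(pairs)
    print(is_conserved_xbox((-120, "GTTGCCATGGCAAC"), (-300, "GTTACCAGGCAAC")))
```

Comparing only the half-sites lets two X-boxes with different spacer lengths be scored against each other, which matches the statement that the 12 nucleotides on either side of the spacer are the ones considered.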

X-box genes and ciliogenesis
In order to check for enrichment of genes involved in ciliogenesis, we compared our three X-box gene lists to previously published lists of genes potentially involved in cilium or centrosome composition. We first identified the Drosophila homologs for the full set of previously published genes from various organisms from several studies. These include comparative genomic studies of species that have cilia versus species that do not and proteomic analyses of human cilia and centrosome, Chlamydomonas flagellar or basal body and Trypanosoma brucei proteomes [10][11][12][13][14]44,45]. This set also includes recent genome-wide transcriptional analysis of gene expression during flagellar regeneration in Chlamydomonas or identified by SAGE analysis of ciliated neurons combined with X-box searches in C. elegans [15,36,37]. The full set of Drosophila homologs that we found for all studies combined is listed as the DCBB gene set (Additional data file 2).
Interestingly, comparing our set of 1,462 Drosophila X-box candidate genes with the DCBB dataset shows that our list is slightly enriched in DCBB genes. Whereas 5% of the D. melanogaster genome is in the DCBB dataset, our 412 and the 83 X-box gene candidate datasets appear to be highly enriched in DCBB genes (11% and 22%, respectively), suggesting that the X-box conservation is a good marker for genes potentially involved in ciliogenesis (Table 4).
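The enrichment quoted above can be expressed as a fold change over the genome-wide background: 11% versus 5% is a 2.2-fold enrichment and 22% versus 5% a 4.4-fold enrichment. The Python sketch below computes this and, as an assumption that goes beyond what is reported here, also shows how a hypergeometric upper-tail probability could be obtained if the underlying gene counts are available. No significance test is described in the text, and the function names are hypothetical.

```python
from math import comb


def fold_enrichment(list_pct, background_pct):
    """Ratio of the DCBB fraction in a gene list to the genome-wide fraction."""
    return list_pct / background_pct


def hypergeom_upper_tail(k, n, K, N):
    """P(X >= k) when n genes are drawn from N, of which K are annotated.

    k: annotated genes observed in the list; n: list size;
    K: annotated genes in the genome; N: genome size.
    """
    total = comb(N, n)
    upper = min(n, K)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / total


if __name__ == "__main__":
    print(round(fold_enrichment(11, 5), 1))  # 412-gene list versus genome
    print(round(fold_enrichment(22, 5), 1))  # 83-gene list versus genome
```

The percentages passed to `fold_enrichment` are those given in the text; the hypergeometric function is provided only as a template and is not evaluated on genome counts here because those counts are reported in Table 4 rather than in this section.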
The full set of genes with a putative function in ciliogenesis has also been summarized in parallel in two independent databases called the Ciliary proteome and Ciliome databases [46][47][48][49]. Surprisingly, when we compared the two published databases with the DCBB dataset that we established for Drosophila using similar comparative methods (see Materials and methods and Additional data file 2), we observed large discrepancies between all three datasets (illustrated in Figure 3 and Additional data file 3). There are some differences between the three studies with regard to the initial published sets of genes that were included in the database. The major difference resides in which data are included from the work of Blacque et al. [37]. The Ciliome database [47] includes the complete SAGE dataset from Table S1 in [37], whereas our DCBB dataset includes only data from Table 1 from Blacque et al. (2005), which contains part of the SAGE data combined with an X-box search. The Ciliary proteome database [46] includes data from Table S4 of the Blacque et al. study [37], which reports the list of putative X-box genes in the nematode. These differences could account for the high number of genes exclusively represented in the Ciliome database [47] but cannot account for all the discrepancies between our DCBB dataset and the Ciliary proteome database [46] (Additional data file 3). Very likely, the differences observed between all three studies illustrate the problems inherent in automatically processing published tables and gene lists that are then used to compile homologous genes from several different organisms. Another major explanation for the observed discrepancies resides in the order in which BLAST searches were performed to create each database. For example, the Ciliary proteome database [46] was obtained by looking first for human homologs for each study, and then for the Drosophila ones (unless Drosophila was the starting study). In our DCBB dataset, we have looked for Drosophila homologs, which were then compared to other datasets. Hence, genes that do not have an ortholog in Drosophila or in human are lost in the respective studies.
However, we show that our lists of 412 and 83 X-box genes are enriched in genes involved in ciliogenesis, whatever database is considered (Table 3, Additional data file 1). Thus, our genome-wide X-box consensus motif search allowed the establishment of promising sets of candidate genes for ciliogenesis studies.

Functional analysis of identified X-box genes
We performed functional expression studies to determine whether or not some of the 83 X-box genes (Table 3) are indeed under dRFX control and if they are involved in ciliogenesis. Twenty-five genes were tested by real time RT-PCR to compare their levels of expression in wild-type versus dRfx deficient fly samples. Interestingly, 16 are under dRFX control (Table 3, fold variation indicated in column 2). Among them, 11 have not yet been described as RFX targets in any biological system and two of them have no assigned function as of yet. Nine genes were not found to be under dRFX control (Table 3, noted as 'Neg' in column 2). Among 19 genes also represented in the DCBB dataset (Table 3, Additional data file 2), 17 were tested by real time PCR. Fourteen are indeed regulated by dRFX and only three do not appear to be regulated by it. The two remaining genes were not amplified by real time RT-PCR and, thus, could not be analyzed by this approach. Interestingly, among six genes that were not found in any ciliary database and whose expression was quantified by real-time PCR, two (CG13415/Cby, CG31036) were downregulated in dRfx mutants. Thus, a high proportion of the genes on the list of 83 X-box genes are indeed dRFX target genes. The 58 remaining genes from this list that have not yet been analyzed are thus promising candidates. Our whole genome screen led to the identification of novel dRFX target genes.
Among the 11 novel dRFX target genes that we identified in this screen and that have never been described as RFX target genes in any organism, 9 do have a described or highly predictive function in ciliogenesis in other organisms. For example, CG15161 encodes the homolog of the IFT46 subunit in Chlamydomonas [50] and the dyf-6 ciliary gene in C. elegans [51]. CG15148/btv, CG3723 and CG17150 encode different dynein subunits. beethoven (btv) mutants show defects in sensory cilia in Drosophila [52], whereas no functional studies are available for either CG3723 or CG17150 or their orthologs in any biological system. CG6129 is the only Drosophila member of the rootletin family of proteins. In mammals, rootletin is necessary for retinal cilia stability and centrosome cohesion [53][54][55][56]. CG4536/osm-9 encodes a vanilloid receptor of the transient receptor potential (TRP) family of ion channels. osm-9 is involved in sensory cilia function in Drosophila and C. elegans, and in mammals, TRPV4 plays a crucial role in ciliary activity [57]. CG9227/Tectonic has been described as being involved in Shh signaling in mouse [58]. It has been isolated by comparative genomics as a candidate for ciliogenesis and shown to be specific to ciliated cells in Drosophila [13]. CG13125 has recently been shown to be specific to species with motile cilia and its homolog, TbCMF46, is necessary for flagellar motility in T. brucei [59]. CG3259 encodes the MIP-T3 protein that associates with the tumor necrosis factor receptor in human cells. It is also an inhibitor of the IL13 signaling pathway that is known to repress ciliary differentiation of human epithelial cells in vitro [60][61][62]. It is expressed in ciliated sensory cells in Drosophila [13]. Thus, the gene CG3259 may have a direct function in ciliogenesis, which functional studies in Drosophila should help to decipher.
Interestingly, two novel dRFX target genes have not been described as being involved in ciliogenesis in any organism. CG13415/Chibby encodes a protein that interacts with the β-catenin protein and has been shown in Drosophila and in mammalian cells to antagonize the Wg/Wnt signaling pathway [63][64][65]. The second gene, CG31036, has an unknown function and no obvious ortholog in vertebrates. Protein structure prediction algorithms detect a central transmembrane domain and a signal peptide at the amino-terminus of the protein encoded by CG31036.

Table 3 legend: Listing established with the restricted GYTRYYN{1-3}RRHRAC X-box consensus. Neg, invariant expression in dRfx deficient background compared to wild type; iv, in vivo confirmation of reporter construct down-regulation in dRfx deficient background compared to wild type. The presence of a gene in other published studies is noted as ν.

Expression profile of three novel dRFX target genes
In order to further validate our screen, we chose three genes (CG6129/rootletin, CG13125/TbCMF46 and CG31036) for in vivo study. CG6129 was selected to address the question of conservation in Drosophila of the dual role described in mammals for the rootletin protein in centrosome and ciliary biology. CG13125 is of particular interest to evaluate the possible involvement of a 'motility gene' in Drosophila sensory cilia. Last, since nothing was known about CG31036, we wanted to address whether this gene is involved in ciliogenesis and, thus, validate the overall X-box screening strategy.
Reporter constructs were made by cloning large promoter fragments including the conserved X-box, plus coding sequences in frame with green fluorescent protein (GFP). Transgenic flies were established and analyzed for GFP expression. Two types of ciliated cells have been described in Drosophila: spermatozoa and type I sensory neurons that innervate the proprioceptive chordotonal organs and external sensory organs that are mechano- or chemosensory. Remarkably, the expression of all three reporter constructs was observed only in type I sensory neurons. As a control, reporter GFP expression was compared to mRNA expression by in situ hybridization. CG6129/rootletin protein expression reproduces the expression of the transcript, which is found only in type I sensory neurons of the embryo (data not shown). CG31036 RNA expression is also available from the BDGP database [66]. CG31036 mRNA is restricted to type I sensory neurons of the head, thorax and abdomen of the embryo and reflects the protein expression of our transgene. However, we did not observe a strong protein expression in the gut as observed for the transcript. This could either reflect a non-specific hybridization signal or the presence of other transcript isoforms driven by a different promoter. We could not detect CG13125 transcripts by in situ hybridization, likely illustrating the faint expression of this gene in Drosophila.
Chimeric CG6129::GFP protein was present in the rootlet processes of the chordotonal dendrites, in agreement with the predicted function of rootletin in ciliary rootlet organization (Figure 4). It was also detected faintly at the cilium tip (Figure 4d) and clearly in axons (Figure 4). Since our construct does not include all the coding sequences of the rootletin protein, it is possible that the GFP expression does not reflect the exact location of the endogenous protein. Rootletin has been shown in mammalian cell culture to be localized to the ciliary rootlet and to be involved in centrosome cohesion [56]. We show that CG6129/Rootletin expression is restricted to ciliated chordotonal neurons in Drosophila, thus suggesting an involvement only in ciliogenesis. Despite strong GFP expression in the chordotonal organs, no expression was observed in the ciliated sensory neurons that innervate external sensory organs. Either the expression in those cells is too weak, or ciliary rootlets in Drosophila, as represented by CG6129/rootletin GFP expression, are restricted only to chordotonal organs, as observed previously by electron microscopy [67,68].
CG31036::GFP specifically marks the ciliated endings of chordotonal neurons and confirms that this novel protein is a component of ciliated endings (Figure 4). The GFP signal is apposed to the 21A6 antibody staining, directed against the eyes shut protein, which has been described to locate at the ciliary dilation around the tip of the ciliated segment [69]. This implies that CG31036::GFP most likely locates to the tip of the tubular bundle that extends after the ciliary dilation (schematic in Figure 1a). However, only ultrastructural observations of immunogold labelings will allow precise subcellular localization of both CG6129/rootletin and CG31036. Interestingly, CG31036::GFP expression is also detectable in external sensory neurons as a dot apposed to the 21A6 antibody staining (Figure 4f). Finally, we confirmed that both reporter constructs are under dRfx control as the GFP signal was completely shut down in a dRfx mutant background (compare Figure 4d and 4e or 4i and 4j).
For the third construct, CG13125::GFP localization was consistently observed in the chordotonal neurons at the base of the cilium, presumably the basal body region, and also at the tip of what is likely the cilium. GFP expression was also often observed in the external sensory neurons as a dot but without consistent reproducibility, probably illustrating a threshold level of expression for these cells and the faint level of expression of the CG13125/TbCMF46 transgene (Figure 4k,l).
In conclusion, the three novel dRFX target genes that we identified in our X-box motif searches are indeed under dRFX control in vivo and specifically expressed in ciliated sensory neurons in Drosophila. In addition, they encode proteins that are localized to the base or the tip of the cilium, thus suggesting a role in ciliary structure or function.

Discussion
Ciliogenic RFX regulatory networks are conserved between C. elegans and D. melanogaster. Based on these first observations, the genomic screens we conducted combined with functional and in vivo gene analyses led to the identification of at least 11 novel genes that had never been described as RFX targets in any biological model. In addition, our screen allowed us to identify at least two novel genes specifically expressed in ciliated sensory neurons in Drosophila that are potentially involved in sensory ciliogenesis. These results validate the accuracy of our screens. Our work thus provides a new set of candidate genes for further functional studies in ciliogenesis.

Molecular nature of RFX target gene products
Our Drosophila genome wide X-box screen led to the identification of 83 X-box genes among which we report 11 novel RFX targets. Combined with the genes identified by comparisons to C. elegans or to other genomic studies in Drosophila (Table 1) [13], we report 35 genes regulated by dRFX in Drosophila. Most of these genes can be classified based on their described function. Many of the RFX target genes are involved in IFT, which is necessary for cilium assembly and function [1]. Remarkably, a second class of genes regulated by dRFX includes all the Drosophila homologs of BBS genes.
Similarly, most C. elegans BBS genes are regulated by DAF-19 [14,36,37]. This strong dependence of BBS genes on RFX control may thus be conserved in mammals. Hence, RFX proteins may be involved in BBS in humans. Interestingly, two of the three Drosophila genes coding for proteins with B9 domains are also controlled by dRFX (tectonic, CG14870). One human B9 domain protein, MKS1, is known to be involved in the human Meckel-Gruber syndrome [70]. The molecular function of this domain is unknown and work in Drosophila suggested that these two B9 domain containing proteins are likely involved in ciliogenesis [13]. Several of the novel dRFX target genes that we identified in this study encode known components of the ciliary axoneme and associated structures, such as axonemal dyneins or rootletin. Other genes encode different types of proteins likely involved in sensory transduction (CG4536/osm-9/TRPV4 or MIP-T3). A last class includes genes for which the function is either not described or poorly understood, such as CG31036 and CG13125. However, our functional studies strongly suggest that they are also involved in sensory ciliogenesis in Drosophila. Thus, RFX target genes play various roles in ciliary structure and function and our X-box search strategy has proven to be useful to identify novel ciliogenic genes.
Figure 3. Comparison of the DCBB set of genes with the Ciliary proteome and Ciliome databases. Venn diagram presenting the overlaps between the three datasets: the cilia proteome [46,48]; the ciliome [47,49], and the DCBB (Additional data file 2). Asterisks indicate this study. Note that only 412 common genes are found in the three datasets. The number of genes also found in the 1,462, 412 or 83 X-box gene lists (Table 4), respectively, are noted in parentheses. The numbers of genes selected in the different studies to construct each dataset are given in Additional data file 3.

Database mining using the X-box promoter motif
This full set of dRFX target genes in Drosophila is of crucial importance, as we can now more precisely define X-box sequences and the promoter context required for dRFX control. This will be particularly useful for further database mining of dRFX target genes in Drosophila. In fact, several genes that are under dRFX control (Table 1, for example CG4525, CG17599) for which an X-box can be identified did not come out in the whole genome X-box screen. Several reasons can explain this result. First, homologs were not all annotated in CDS listings that were available at the time of the search (for example, CG18631, CG9595, nompB in D. pseudoobscura). Second, annotation of both Drosophila databases is incomplete, as the start codon is not properly defined for all genes. Our X-box search algorithm keeps only genes for which the X-box match is upstream of the ATG. For example, for CG15666/GA13881, we clearly predict that the correct ATG should be considered 75 bp downstream of the currently defined ATG, based on evolutionarily conserved sequences. The currently annotated ATG therefore excludes the homologous genes CG15666 and GA13881 from the dataset. However, as illustrated in Table 2, in a few cases, our X-box consensus cannot define a clearly conserved X-box match in the two Drosophila species for genes that appear to be down-regulated in a dRfx mutant, while several individual X-boxes are found separately in each organism. This could either reflect that these genes are not direct dRFX targets but are shut down by a feedback control loop that is not dependent on an X-box motif, or that the X-box is only loosely conserved in some promoter contexts. Notably, homologs of these genes in C. elegans are under RFX (DAF-19) control and have a well defined X-box (for example, CG9333/che-2, CG13691/bbs-8), which argues in favor of the second possibility.
Interestingly, we also quantified the expression levels in control and dRfx-deficient Drosophila of several genes of the DCBB dataset that did not come out of the genome-wide X-box motif search. This allowed us to identify several novel genes that are indeed down-regulated in dRfx mutants, but for which no conserved X-box can be recognized based on our initial consensus motif (AL, unpublished). Altogether, our observations clearly highlight the difficulties encountered in motif definition in promoters. Similar conclusions were drawn from a parallel approach performed in C. elegans, which led to the identification of several novel DAF-19 target genes [38]. Interestingly, in that study the in silico search was combined with microarray analysis of transcripts in wild-type and daf-19 mutant worms. The in silico search identified 93 X-box genes. Yet, among the 466 genes that were shown to be down-regulated at least two-fold in microarray hybridization experiments, only 25 were also present in the list of 93 in silico X-box genes. Thus, in silico searches on isolated motifs are likely hampered by a high level of false negatives. Combinatorial motif searches would probably greatly improve the accuracy of the screen, as proposed by other studies [71,72]. Although the conserved X-boxes that we identified are rarely associated with highly conserved surrounding sequences (Table 2), it is reasonable to assume that other conserved nearby motifs, still to be identified, could help to discriminate between false positives and false negatives.
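The overlap reported for the C. elegans study can be made explicit with a short calculation. The sketch below uses only the three counts quoted above (93, 466 and 25); the function name is hypothetical and chosen for illustration. Note that the 466 down-regulated genes presumably include indirect targets, so the second fraction is not a strict false-negative rate.

```python
def overlap_fractions(n_motif, n_expression, n_shared):
    """Return the shared fraction of each list.

    n_motif      -- genes predicted by the in silico X-box search
    n_expression -- genes down-regulated at least two-fold on microarrays
    n_shared     -- genes present in both lists
    """
    return n_shared / n_motif, n_shared / n_expression


# Counts quoted in the text for the C. elegans study [38].
frac_motif, frac_expression = overlap_fractions(93, 466, 25)

print(f"{frac_motif:.1%} of in silico X-box genes are down-regulated")
print(f"{frac_expression:.1%} of down-regulated genes carry a predicted X-box")
```

In other words, roughly a quarter of the motif-based predictions are supported by the expression data, while only about one in twenty down-regulated genes is recovered by the motif search alone.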

Regulatory network of ciliary genes
We have identified 35 genes that are transcriptionally down-regulated in dRfx mutants. We show that RFX regulatory networks are conserved between C. elegans and Drosophila, as most of the genes controlled by DAF-19 in C. elegans are also under dRFX control in D. melanogaster. Interestingly, our results show that only certain subsets of ciliogenic genes are regulated by RFX proteins. For example, in our assay conditions, none of the genes known to be involved in IFT-A complexes is regulated by dRFX, whereas all IFT-B homologous proteins are regulated by dRFX. In addition, retrograde motors are also regulated by dRFX (CG15148/btv and CG3769), whereas anterograde motors seem not to be. Indeed, in addition to CG10642/KIF3A, the main described anterograde motor in several organisms, we have shown that two other kinesin subunits, CG17461/Kif3C/osm-3 and CG7293/Klp68D, are invariantly expressed in wild-type and dRfx-deficient Drosophila (AL, data not shown). It is also interesting to note that all the BBS gene homologs in D. melanogaster are under dRFX control (Table 1).
Figure: Reporter GFP expression studies for three X-box containing genes.
The biological significance of these observations is unclear. It could reflect the fact that IFT-B proteins, BBS proteins and the dyneins involved in IFT are dedicated to ciliogenesis and, therefore, need to be turned on concomitantly only when the cilium is formed, whereas IFT-A complexes or anterograde transport kinesin II share more complex regulatory controls as they might be necessary also for other cellular functions. This is the case for kinesin II motors [73], but does not seem to be true for IFT-A complexes as these proteins are proposed to be specific for ciliated organisms [13]. In C. elegans, the ciliary IFT machinery works in modular fashion [74], and it is tempting to speculate that RFX-dependent proteins could be involved in specialized ciliogenic transport modules.
Genes necessary for centriole biogenesis or replication, such as the recently described DSas-6, DSas-4 or sak genes [75][76][77][78] are not present in our screen and no conserved X-box can be found upstream of these genes. Thus, dRFX does not seem to regulate centriole biogenesis and appears to be restricted to cilia assembly only.
Identifying the transcription factors that govern the other sets of ciliary proteins will certainly be one avenue to follow. Based on our data, it would be of particular interest to compare the promoter sequences of genes that are regulated by dRFX with those of genes that are not. This may allow us to discover novel regulatory motifs and protein modules that are necessary to coordinate the control of ciliogenesis. So far, only a few transcription factors have been shown to be involved in the control of ciliogenesis: the RFX proteins [21,23,24], Foxj1 [16], and HNF1-beta [17]. However, the last two have no obvious homologs in Drosophila. Thus, our work strongly suggests that novel transcription factors necessary for ciliogenesis still need to be discovered.

Novel RFX target genes
Some of the novel RFX target genes found in Drosophila were unexpected. For example, we identified several proteins that are proposed to be involved in flagellar or ciliary motility, such as dynein heavy chains (CG17150/Dhc93AB). Recently, a CG13125 homolog has also been shown to function as a motility factor in T. brucei (TbCMF46) [59]. Sensory cilia are generally thought not to be motile. However, it has been shown that Drosophila chordotonal neurons of the antenna generate motion that depends on the integrity of proteins encoded by genes such as CG15148/btv (cytoplasmic dynein heavy chain) or CG14620/tilB (LRRC6 homolog), described to affect the axonemal structure [52,79] (D Eberl, personal communication). In addition, cilia of the chordotonal neurons of the grasshopper bend upon vibration stimulation [80]. Thus, proteins involved in axonemal motility might be important for motion generation of the cilium in response to mechanical stimulation. It will be of great interest to determine whether flies defective in these 'motility' genes are affected in hearing and, more specifically, in the motility of the mechanosensory cilium that amplifies hearing vibrations. Interestingly, CG13125/TbCMF46 does not seem to be expressed in fly testis (AL, unpublished), where the spermatozoa are the only cell type with a motile flagellum in flies. This suggests that, like that of CG15148/btv, the function of CG13125/TbCMF46 could be restricted to the sensory cilium and, more specifically, to allowing these cilia to respond mechanically to auditory vibrations [52]. Thus, our data suggest that, in the fly, axonemal motility could be regulated by different subsets of proteins in sperm flagella and in mechanosensory cilia. This is of particular interest with regard to hearing in mammals, which depends on hair cell motility. It will be very interesting to determine whether the CG13125/TbCMF46 homolog in mammals has a specific function in those cell types.
We also identified in our screen three genes (CG6054/Su(fu), CG13415/Cby, CG33038/Ext (2)) known to be involved in the Hedgehog or wingless signaling pathways in Drosophila. Su(fu) and Ext (2) are involved in the Hedgehog pathway and Su(fu) is localized to cilia in mammalian cells [81]. However, Su(fu) and Ext (2) do not appear to be under dRfx control according to real-time PCR quantification results (Table 3) and may be false positives in our screen. This result argues in favor of the generally accepted observation that the Hedgehog signaling pathway does not seem to depend on ciliogenic proteins in Drosophila [82]. Only Chibby (Cby) is significantly down-regulated, by two-fold, in a dRfx-deficient background. Cby was isolated in a two-hybrid screen for armadillo/beta-catenin interactors. RNAi knock-down of Cby in Drosophila embryos leads to ectopic activation of the wingless pathway [63]. Cby is also described to antagonize the Wnt/beta-catenin pathway in mammalian cells [64,65]. However, the expression pattern of Cby in Drosophila is not documented, so we do not know whether the variation in expression observed in the dRfx-deficient background is connected to dRfx expression and, thus, whether it is biologically significant.
Among the 83 genes with conserved X-boxes between D. melanogaster and D. pseudoobscura (Table 3), several genes were hardly detectable by quantitative RT-PCR. Hence, we were unable to determine by this approach whether they are under dRFX control. This could reflect that these genes are expressed only in a subset of sensory neurons and are thus difficult to detect by quantitative RT-PCR. Nevertheless, several genes are interesting as potential ciliogenic or RFX target genes. For example, CG14079 is homologous to a mouse protein that appears to be specific to testis. CG11356 is homologous to mammalian arl13, which has just been isolated in an ethyl-nitroso-urea screen for neural tube defects in mouse. Indeed, mutation of arl13 affects ciliary architecture and Sonic-Hedgehog signaling in mouse [83]. This gene, CG11356, was not found in any previous ciliogenesis study, again illustrating the accuracy of our screen. Functional studies in Drosophila will be of particular importance to demonstrate the role of this gene in sensory ciliogenesis.

Conclusion
We have identified more than 30 dRFX target genes in Drosophila by exploiting the X-box promoter motif search in a comparative approach using two divergent Drosophila species. These full sets of RFX-dependent or RFX-independent ciliary genes are of particular importance for studies of X-box promoter motifs and associated promoter contexts in Drosophila. More remarkably, our screen allowed the identification of at least two novel genes specific to sensory ciliary architecture in D. melanogaster and provides several new RFX target gene candidates potentially involved in ciliogenesis. This is of particular importance with regard to the growing number of human diseases that are being associated with ciliary defects (for reviews, see [4,5,7]).

Quantitative RT-PCR
Total RNA was extracted from 40-hour-old puparia using TRIzol reagent (Invitrogen, Carlsbad, CA, USA) or RNeasy (Qiagen, Venlo, The Netherlands). Pupal heads and abdomens were removed, as well as internal organs and muscles, in order to enrich the extract as much as possible for sensory organs from thoraxes, legs and wings. DNA was digested with DNAfree reagent (Ambion, Austin, TX, USA

Bioinformatics
Individual X-boxes (consensus RYYNYYN{1-3}RRNRAC) were searched for in the 5' upstream regions of ATGs on the same strand (+) and the antiparallel strand (-) in both D. melanogaster and D. pseudoobscura homologs [84]. Genome-wide searches for X-box promoter motifs were primarily performed using a Perl-based algorithm that identifies all possible matches in a given DNA sequence. First, the algorithm finds all sequences that match a defined consensus, then the main module implements a cross-match file that compares a 3 kb window downstream of each match to a file containing the DNA sequences for all predicted genes [36]. Genome sequence information, gene prediction and CDS files for X-box searches were obtained from the following sources: the D. melanogaster complete genome sequence used was BDGP release 4; the complete CDS list was built from release 3.2.1 [85]. For D. pseudoobscura, the 28 August 2003 genome assembly and release 2.1 of CDS sequences from BCM-HGSC were used [40]. Reverse BLASTP analysis was performed between the two CDS files in order to establish a list of orthologous genes between the two fly species with a cut-off value of BLAST e-score <1e-10. Comparisons of all listed gene information were performed on a Unix platform. BDGP and Flybase databases were mined for expression patterns and gene information. Genome conservation between the two fly species was evaluated using the VISTA interface [86].
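The consensus search on both strands can be illustrated with a minimal sketch. It is written in Python rather than Perl and is not the algorithm of [36]; the function names are hypothetical, and only the consensus RYYNYYN{1-3}RRNRAC and the two-strand search come from the text. The IUPAC codes R (A or G), Y (C or T) and N (any base) are translated into a regular expression, and a lookahead is used so that overlapping sites are reported (one match per start position).

```python
import re

# IUPAC codes needed for the X-box consensus.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "R": "[AG]", "Y": "[CT]", "N": "[ACGT]"}


def consensus_to_regex(consensus):
    """Translate e.g. 'RYYNYYN{1-3}RRNRAC' into a regular expression."""
    parts = []
    i = 0
    while i < len(consensus):
        if consensus[i] == "{":
            # '{1-3}' is a repeat range applied to the preceding symbol.
            j = consensus.index("}", i)
            low, high = consensus[i + 1:j].split("-")
            parts.append("{%s,%s}" % (low, high))
            i = j + 1
        else:
            parts.append(IUPAC[consensus[i]])
            i += 1
    return "".join(parts)


def reverse_complement(seq):
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def find_xboxes(upstream, consensus="RYYNYYN{1-3}RRNRAC"):
    """Return (strand, start, site) tuples for an upstream sequence.

    'start' is 0-based and always given in (+) strand coordinates.
    """
    pattern = re.compile("(?=(%s))" % consensus_to_regex(consensus))
    seq = upstream.upper()
    hits = [("+", m.start(), m.group(1)) for m in pattern.finditer(seq)]
    for m in pattern.finditer(reverse_complement(seq)):
        site = m.group(1)
        # Convert the position on the (-) strand back to (+) coordinates.
        hits.append(("-", len(seq) - (m.start() + len(site)), site))
    return hits


# A palindromic site that satisfies the consensus is found on both strands.
print(find_xboxes("AAAAGTTGCCATGGCAACTTTT"))
```

Restricting the input to the region upstream of the annotated ATG, and keeping only genes with a match in both species, would correspond to the filtering steps described in the text; those steps depend on the annotation files and are not sketched here.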

DCBB dataset
The ciliary and basal body genes in Additional data file 2 were identified using a reverse BLASTP strategy to define the best homologous proteins or genes described in the following studies: 210 proteins published in Table 2 from the human ciliary proteome [10] as modified by Marshall [87], 159 putative target genes of DAF-19 [36], 219 overexpressed genes after deflagellation in C. reinhardtii described in Table 9 of Stolc et al. [15], 54 genes (Table 1) expressed in ciliated sensory neurons in C. elegans [37], 654 proteins identified in C. reinhardtii flagella [11], 380 proteins identified in the T. brucei flagella proteome [12] and 114 proteins listed in Table S1 for the human cell centrosome [44]. The following Drosophila homologs were extracted from published work: 260 genes described as homologous to the FABB proteins from C. reinhardtii in Table S1 of Li et al. [14], 51 genes described as homologous to 195 proteins described in Table S2 for the basal body proteome of C. reinhardtii [45] and 187 genes from Table S1 of compartmentalized cilia predicted genes, which has been modified to 188 genes according to Flybase annotation [13].
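One common way to implement such a strategy is a reciprocal best-hit filter, sketched below under the assumption that "reverse BLASTP" refers to this kind of two-way comparison; the text does not give the exact procedure. The function names and the toy identifiers are hypothetical, and the e-values are invented for illustration. The default threshold follows the <1e-10 cut-off quoted in the Bioinformatics section.

```python
def best_hits(rows, max_evalue=1e-10):
    """Map each query to its lowest e-value subject below the cut-off.

    rows -- iterable of (query, subject, evalue) tuples, for example
            parsed from tabular BLASTP output.
    """
    best = {}
    for query, subject, evalue in rows:
        if evalue >= max_evalue:
            continue  # hit not significant enough
        if query not in best or evalue < best[query][1]:
            best[query] = (subject, evalue)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a, max_evalue=1e-10):
    """Return sorted (a, b) pairs that are each other's best hit."""
    forward = best_hits(a_vs_b, max_evalue)
    backward = best_hits(b_vs_a, max_evalue)
    return sorted((a, b) for a, b in forward.items() if backward.get(b) == a)


# Toy input with invented identifiers and e-values.
mel_vs_pse = [("melA", "pseA", 1e-50), ("melA", "pseB", 1e-20),
              ("melB", "pseB", 1e-5), ("melC", "pseC", 1e-30)]
pse_vs_mel = [("pseA", "melA", 1e-48), ("pseB", "melA", 1e-18),
              ("pseC", "melD", 1e-40), ("pseC", "melC", 1e-25)]

print(reciprocal_best_hits(mel_vs_pse, pse_vs_mel))
```

In this toy example only melA/pseA is retained: melB has no hit below the cut-off, and the best hit of pseC is melD rather than melC.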

Reporter constructs
DNA fragments were amplified from wild-type fly genomic DNA using the Expand Long Template PCR system (Roche). Cloning strategies used primers to clone in frame the gene of interest to the GFP sequence of the PW8-GFP vector [88]. CG13125::GFP plasmid, a 3,547 bp genomic DNA fragment containing the complete coding sequence of CG13125-RA and RB, was amplified from Canton-S using primers starting 1,484 bp upstream of the RB ATG until the penultimate codon of the gene. CG6129::GFP plasmid, a 4,129 bp genomic DNA fragment containing part of the CG6129-RB gene, was amplified by PCR from Charolles genomic DNA using primers starting 2,619 bp upstream of the RB ATG. CG31036::GFP plasmid, a 3,780 bp genomic DNA fragment containing part of the CG31036-RA gene, was amplified by PCR from Canton-S using primers starting 1,800 bp upstream of the ATG. All coding regions cloned were entirely sequenced prior to transgenesis. Transgenic lines were established by P-element mediated germline transformation as described [89].
The preparation of embryos for staining assays was carried out according to general methods described previously [90]. Live observations of dechorionated embryos and larvae were performed on mounted material under coverslips in DakoCytomation media. For pupal immunostaining, 72- to 96-hour-old animals were fixed for 20 minutes in 4% paraformaldehyde, 3% Triton X-100 in phosphate-buffered saline. Primary antibodies were rabbit anti-GFP (1:250) from Torrey Pines Biolabs (Houston, TX, USA), or (1:500) from Molecular Probes (Invitrogen, Carlsbad, CA, USA), mouse anti-eys 21A6 and mouse anti-Futch 22C10 (kindly provided by S Benzer), and mouse anti-elav 9F8A9 (1:500) obtained from the Developmental Studies Hybridoma Bank, Iowa City, IA, USA. Secondary conjugated antibodies were A488 and A546-anti-mouse and anti-rabbit (Molecular Probes, Invitrogen, Carlsbad, CA, USA). Images were obtained on a Zeiss Imager Z1 and LSM510 confocal microscope.