Function-informed transcriptome analysis of Drosophila renal tubule

Analysis of the transcriptome of the Drosophila melanogaster Malpighian (renal) tubule gives a radically new view of the function of the tubule, emphasising solute transport rather than fluid secretion.


Background
Microarrays allow the interrogation of the transcriptome, the set of genes transcribed in a particular cell type under a particular condition [1]. Arrays are particularly potent tools when their coverage is relatively comprehensive, based on a completed and well-annotated genome, such as that of Drosophila [2]. Commonly, they are used in time series, for example of development, of life events such as metamorphosis [3], of rhythmic behavior [4] or of responses to the environment, such as aging or starvation [5,6]. In Drosophila, arrays are frequently used for whole-organism studies, but in multicellular organisms the ease of experimentation must be balanced against two potential problems: sensitivity and opposing changes. In the first case, even large changes in gene expression in a small tissue will not significantly influence the overall levels in the whole organism; in the second, changes in opposite directions in roughly balanced populations of cells (for example, the sharpening of expression patterns of pair-rule genes) will cancel out at an organismal scale. It is thus vital to resolve gene expression not only over time but also over space. In practice, this means looking at gene expression in defined cell types and tissues as well as in the whole organism. Our assumption is that the expression of many putative genes will go undetected until such tissue-specific studies are performed [7], with obvious consequences for post-genomics, and we illustrate this point in this paper.
We applied Affymetrix arrays to a defined tissue with extensive physiological characterization, the Malpighian (renal) tubule of Drosophila melanogaster. The tubule is a valuable model for studies of both epithelial development and function. Developmentally, the tissue is derived from two distinct origins: an ectodermal outpushing of the hindgut and subsequent invasion (late in embryogenesis) by mesodermal cells [8]. Tubule morphology is very precisely and reproducibly specified; in this tiny tissue of 150 cells there are six cell types and six regions, specified to single-cell precision [9]. The transport processes that underlie fluid production in the tubule are known in extraordinary detail for so small an organism [10-12]. The dual origin of the cell types is reflected by dual roles for the ectodermal principal cells and the mesodermal stellate cells in the mature tubule; the principal cell is specialized for active transport of cations, whereas the stellate cell appears to control passive shunt conductance [11,13,14]. Cell signaling pathways are also understood in considerable detail: several peptide hormones that act on the tubule have been identified [15-17], and the second messengers cyclic AMP, cyclic GMP, calcium and nitric oxide have all been shown to have distinct roles in each tubule cell type [10,18-20].
This wealth of physiological knowledge provides a framework for the analysis of the results and thus, unusually for a genetic model organism, a reality check on the usefulness of the experiment.

Results
The principle of the experiment was to compare the transcriptome of Malpighian (renal) tubules from 7-day-old adult Drosophila melanogaster, a defined state for which there is a wealth of physiological data, with that of matched whole flies. As described in Materials and methods, data were analyzed by Affymetrix MAS 5.0 software, by dChip, or by dChip followed by Significance Analysis of Microarrays (SAM) software. Both methods of identifying differentially expressed genes from dChip-normalized data gave virtually the same results: SAM analysis followed by further filtering produced 1,465 differentially expressed genes, compared with 1,455 genes identified by filtering with dChip alone, and the latter list is a subset of the former. For that reason, we report only the list generated by dChip, in comparison with the MAS data.
Both MAS and dChip/SAM gave comparable views of the data, despite their radically different approaches to analysis. Li and Wong [21] have shown that the average absolute log ratios between replicate arrays calculated with dChip are significantly lower than those calculated with Affymetrix software. This bias in fold-change calculations is the price of increased precision, which manifests itself as reduced variance and consequently as increased sensitivity in identifying differentially expressed genes. Nonetheless, the rank correlation between the two methods is good (Spearman's r = 0.6, p < 0.0001). Taking the genes called significant by each system (a MAS5 'up' call, or a dChip t-test p-value threshold of 0.01), and narrowing the list with an arbitrary cutoff of twofold enrichment and a minimum mean difference of 100, MAS5 reported 683 genes and dChip reported 671. The dChip-reported genes overlap with 77% of the MAS5-reported genes, and this figure rises to 91% if only the top 500 MAS5-reported genes are considered. Our confidence in the quality of the dataset is thus high. For simplicity, and because the two analyses produce concordant results, further analysis is restricted to the MAS5 results.
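The filtering described above can be sketched in Python as follows. This is a minimal illustration, not the pipeline used here. It assumes that each gene already carries a mean tubule signal, a mean fly signal and a boolean significance call from either method, and the gene identifiers and values are hypothetical. The overlap is computed as the fraction of one list that also appears in the other, which is one plausible reading of the 77% figure.

```python
from dataclasses import dataclass


@dataclass
class GeneCall:
    tubule_mean: float  # mean normalized signal across tubule replicates
    fly_mean: float     # mean normalized signal across whole-fly replicates
    significant: bool   # e.g. MAS5 'up' call, or dChip t-test p below 0.01


def enriched_genes(calls, min_fold=2.0, min_diff=100.0, floor=1.0):
    """Genes that are significant, at least min_fold enriched and min_diff higher in tubule.

    Fly signals below `floor` are clamped to avoid division by zero
    (an assumption of this sketch, not part of the published analysis).
    """
    passed = set()
    for gene, call in calls.items():
        if not call.significant:
            continue
        fold = call.tubule_mean / max(call.fly_mean, floor)
        if fold >= min_fold and call.tubule_mean - call.fly_mean >= min_diff:
            passed.add(gene)
    return passed


def overlap_fraction(reference, other):
    """Fraction of genes in `reference` that also appear in `other`."""
    return len(reference & other) / len(reference) if reference else 0.0


# Hypothetical calls from the two analysis methods
mas5_calls = {
    "gene_a": GeneCall(500.0, 50.0, True),    # 10-fold, difference 450: kept
    "gene_b": GeneCall(300.0, 250.0, True),   # 1.2-fold: rejected
    "gene_c": GeneCall(180.0, 60.0, True),    # 3-fold, difference 120: kept
    "gene_d": GeneCall(900.0, 100.0, False),  # not significant: rejected
}
dchip_calls = {
    "gene_a": GeneCall(480.0, 60.0, True),    # 8-fold, difference 420: kept
    "gene_c": GeneCall(150.0, 70.0, True),    # difference only 80: rejected
}

mas5_list = enriched_genes(mas5_calls)
dchip_list = enriched_genes(dchip_calls)
print(sorted(mas5_list), sorted(dchip_list), overlap_fraction(mas5_list, dchip_list))
```

Requiring both a ratio and an absolute difference prevents genes with large ratios but tiny signals from passing. This is the same small-denominator effect noted for the QRT-PCR ratios in the validation section.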
The full microarray data have been deposited in ArrayExpress [22]. Replicate fly samples, and replicate tubule samples, were extremely consistent, despite the technical difficulty of obtaining the latter (30,000 tubules were dissected in total). In contrast, there was wide divergence between fly and tubule samples (Figure 1). Although a common set of housekeeping genes showed comparable abundance, there was a large set of genes enriched in the fly sample, and a smaller set of genes strongly enriched in the tubule sample. In detail, of 13,966 array entries, 6,613 genes were called 'present' in all five fly samples, compared with 3,873 in tubules. A total of 3,566 genes were present in both fly and tubule, 3,047 in fly only and 307 in tubule only. This illustrates the point that whole-organism views of gene expression do not necessarily reflect gene-expression levels in individual tissues. The microarray data are summarized in Tables 1 and 2.
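The present/absent bookkeeping above amounts to simple set operations. The sketch below assumes that, for each replicate, the set of probe sets called 'present' is available. A gene counts as present in a tissue only if it is present in every replicate, mirroring the 'all five fly samples' criterion. The replicate sets are hypothetical. The final assertions only check that the counts reported in the text are mutually consistent.

```python
def present_in_all(replicates):
    """Genes called 'present' in every replicate of one tissue."""
    sets = [set(r) for r in replicates]
    return set.intersection(*sets) if sets else set()


def partition(fly_present, tubule_present):
    """Split genes into those present in both tissues, fly only and tubule only."""
    return {
        "both": fly_present & tubule_present,
        "fly_only": fly_present - tubule_present,
        "tubule_only": tubule_present - fly_present,
    }


# Hypothetical replicate calls
fly_reps = [{"a", "b", "c"}, {"a", "b", "c", "d"}, {"a", "b", "c"}]
tubule_reps = [{"b", "c", "e"}, {"b", "c", "e", "f"}]
groups = partition(present_in_all(fly_reps), present_in_all(tubule_reps))
print({key: sorted(genes) for key, genes in groups.items()})

# Consistency check on the counts reported in the text
FLY_PRESENT, TUBULE_PRESENT, BOTH = 6613, 3873, 3566
assert FLY_PRESENT - BOTH == 3047    # present in fly only
assert TUBULE_PRESENT - BOTH == 307  # present in tubule only
```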

Validation of the microarray
Four genes were selected from each of three tubule-versus-fly expression classes: very highly enriched, uniformly expressed and very highly depleted. The expression of each gene was measured by quantitative reverse transcription PCR (QRT-PCR), and the data are presented in Table 3. The agreement between the Affymetrix microarray and quantitative PCR determinations is good. This further increases our confidence in the robustness of the dataset, and in the approximate correspondence between signal and RNA abundance as a population average. It should be noted that the absolute sizes of the ratios are quite variable; this is a consequence of dividing a large number by a very small one. Nonetheless, genes scored as enriched or depleted on the arrays are invariably scored in the same way by QRT-PCR.
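The comparison can be made explicit by testing direction rather than magnitude, as below. This is a sketch only. It assumes tubule/fly ratios from each method and an illustrative twofold band for calling a gene 'uniform'. The band and the gene values are hypothetical and are not the criteria used for Table 3.

```python
def direction(ratio, band=2.0):
    """Classify a tubule/fly ratio as enriched, depleted or uniform."""
    if ratio >= band:
        return "enriched"
    if ratio <= 1.0 / band:
        return "depleted"
    return "uniform"


def concordance(array_ratios, qpcr_ratios, band=2.0):
    """Map each gene to True if both methods place it in the same class."""
    return {
        gene: direction(array_ratios[gene], band) == direction(qpcr_ratios[gene], band)
        for gene in array_ratios
    }


# Hypothetical ratios: magnitudes differ widely, but the classes agree
array_ratios = {"g1": 40.0, "g2": 1.1, "g3": 0.02}
qpcr_ratios = {"g1": 150.0, "g2": 0.8, "g3": 0.005}
print(concordance(array_ratios, qpcr_ratios))
```

A ratio with a denominator close to zero can swing by an order of magnitude between methods. Comparing classes is therefore more robust than comparing raw ratios.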
These data can also be used to validate the use of the normalized Affymetrix signal as a semi-quantitative measure of RNA abundance (Table 1). The QRT-PCR dataset of Table 3 can be normalized against the corresponding signals for rp49 (generally taken to be a ubiquitously expressed gene with invariant expression levels in Drosophila) and compared with the globally normalized Affymetrix signal. The agreement is then excellent (Figure 2), with a Spearman's r of 0.83 (p < 0.0001). With appropriate caution, the normalized Affymetrix signal can thus be taken as a reasonable estimate of relative expression levels between genes. Table 1 shows the top 20 genes ranked by mean Affymetrix signal intensity. Although this is only a semi-quantitative measure of transcript abundance, the identities of the known genes in the list are illuminating, and persuade us that the approach has some informal value. Specifically, mRNAs for ribosomal proteins dominate the list, and transporters are conspicuous among the remainder. For example, the V-ATPase that energizes transport by tubules is represented by one gene (other subunits are also abundant, but just below the cutoff for Table 1). The α-subunit of the Na+, K+-ATPase is also highly abundant; this is more surprising, and is discussed below. Two organic cation transporters are also very abundant. Alcohol dehydrogenase, long known to be expressed in tubules [23,24], is also a major transcript. There are also surprises: the most abundant signal is for metallothionein A. This is entirely consistent with our classical understanding of tubule function, as the tubule has long been known as a route for metal sequestration and excretion [25-30]. However, in the entire literature on Malpighian tubules, we are not aware of a physiological investigation of the role of metallothionein, beyond documentation of its expression [31,32]. The microarray results can thus direct and inform future research.
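The rp49 normalization and rank correlation can be sketched in plain Python as below. This is an illustrative re-implementation, not the GraphPad Prism procedure used for Figure 2. It assumes one QRT-PCR level per gene per sample, divides each level by the rp49 level from the same sample, and computes Spearman's r as the Pearson correlation of tie-averaged ranks. It does not compute the p-value, and all gene names and values are hypothetical.

```python
def average_ranks(values):
    """Ranks starting at 1; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's r: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(x), average_ranks(y))


def normalize_to_reference(levels, reference="rp49"):
    """Express each QRT-PCR level relative to the reference gene in the same sample."""
    ref = levels[reference]
    return {gene: value / ref for gene, value in levels.items() if gene != reference}


# Hypothetical QRT-PCR levels (arbitrary units) and Affymetrix signals for one sample
qpcr = {"rp49": 2.0, "g1": 8.0, "g2": 0.5, "g3": 1.0}
affy = {"g1": 2500.0, "g2": 90.0, "g3": 400.0}

relative = normalize_to_reference(qpcr)
genes = sorted(relative)
rho = spearman([relative[g] for g in genes], [affy[g] for g in genes])
print(relative, rho)
```

Spearman's r depends only on ranks, so it is insensitive to the different dynamic ranges of QRT-PCR and array signal. This makes it suitable for a semi-quantitative comparison of this kind.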
Table 2 lists the 53 tubule-enriched genes that are enriched at least 25-fold compared with the whole fly (the full list is provided as an additional data file). The conspicuous feature of these data is the extent to which the tubule profile differs from any previously published profile. When fly is compared with tubule, there is a large set of genes that are downregulated, and another large set that are upregulated, in tubule. The extent of the upregulation is also remarkable: the top gene is 99-fold enriched, the top 10 are at least 50-fold enriched, and the top 100 are at least 16-fold enriched in tubule compared with fly. The standard errors are also extremely low, so we can be very confident (by two separate statistical measures) of the genes called significantly enriched in tubule.

The phenotype gap
Another prominent feature of the signal data in Table 1 is the relatively large fraction of novel genes (those for which there is not even a computer prediction of function) at the top of the list. Indeed, five of the top 10 genes by signal intensity are completely novel (that is, they have no known orthologs) and should provide tantalizing insights into tubule function. The 'phenotype gap' [33,34] is a key problem in functional genomics: the genetic models preferred for genomics are historically not the organisms selected by physiologists. This can lead to a log-jam in reverse genetics, which depends critically on a wide range of phenotypes to identify the effects of mutating target genes [12]. It has recently become possible to quantify the phenotype gap [35]. The present dataset clearly exposes the phenotype gap in Drosophila, and shows that the tubule phenotype may go some way to closing it. Around 20% of Drosophila genes have been studied in sufficient detail to attract names (beyond the standard 'CG' notation for computer-annotated genes). Figure 3 shows that the fraction of anonymous genes in the tubule-enriched list is far higher than would be expected; that is, previous work has tended to overlook these genes. Conversely, because detailed physiological analysis is possible in tubules, the phenotype gap can be closed for these genes. There is a general implication from these data: functional genomics, in Drosophila and other species, will rely increasingly on the study of specific tissues, as only in this context will the expression of many genes be either measurable or explicable.

Figure 1 axes: log mean signal (tubules) versus log mean signal (whole fly).
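The quantity plotted in Figure 3 can be computed as follows. The sketch assumes a mapping from gene symbol to fold enrichment and treats any symbol of the form 'CG' followed by digits as an anonymous computer annotation. The regular expression only approximates the naming convention, and the symbols and enrichments below are hypothetical.

```python
import re

CG_SYMBOL = re.compile(r"CG\d+")


def is_named(symbol):
    """True if a gene has a name beyond an automatic 'CG' annotation."""
    return CG_SYMBOL.fullmatch(symbol) is None


def percent_named_in_top(enrichment, top_n):
    """Percentage of named genes among the top_n genes ranked by fold enrichment."""
    ranked = sorted(enrichment, key=enrichment.get, reverse=True)[:top_n]
    return 100.0 * sum(is_named(g) for g in ranked) / len(ranked)


# Hypothetical fold enrichments: anonymous genes dominate the top of the list
enrichment = {
    "CG90001": 90.0,
    "CG90002": 60.0,
    "CG90003": 55.0,
    "CG90004": 40.0,
    "geneX": 30.0,
    "geneY": 3.0,
    "geneZ": 2.0,
    "geneW": 1.5,
}
for n in (4, 5, 8):
    print(n, percent_named_in_top(enrichment, n))
```

Comparing these percentages with the genome-wide figure (around 20% named) shows whether the top of the list is under-researched. In this toy example, the named fraction rises as less-enriched genes are included.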

Reconciling array data with function
Many microarray experiments merely classify enriched genes into Gene Ontology families. However, the uniquely detailed physiological data available on the Malpighian tubule allow a much more informative approach. The dataset can be validated by inspection, based on known molecular functions in the tissue, and new functions can be inferred from abundant or enriched transcripts in the dataset. As the array is relatively comprehensive (corresponding to the 13,500 genes in release 1 of the Gadfly annotation), the results are also relatively authoritative.

Organic solutes
The housekeeping ribosomal transcripts vanish from the enrichment list (Table 2), which is instead dominated by transporters. Intriguingly, these are not for the V-ATPase that is considered to dominate active transport by the tubule, but for organic and inorganic solutes. There is a range of broad-specificity transporters for organic cations, anions, monocarboxylic acids, amino acids and multivitamins. There are also multiple inorganic anion co-transporters for phosphate and iodide. Most are not only very highly enriched, but also highly abundant. In more detail, the results are remarkable (Table 4). Nearly every class of transporter is represented, and almost all of these have at least one representative that is both abundant and enriched, implying a very specific renal role. Indeed, this table contains the genes with the highest average enrichments of any class, frequently more than 30-fold. Some transporters have previously been implicated indirectly in tubule function: many of the classical Drosophila eye-color mutants also affect tubule color, and have since been shown to encode transporters of eye-pigment precursors [12,36]. These genes now turn out to be both abundant and enriched. Among the ABC transporters are scarlet and white, and among the monocarboxylic acid transporters is CG12286, which we have recently argued to correspond to karmoisin, a probable kynurenine transporter [37]. Glucose and other sugar transporters are consistently abundant and enriched, implying that sugar transport is a major (and previously unsuspected) role of the tubule. Inorganic transporters are also included in the table, among them copper and zinc transporters. Their presence is consistent with electron-probe X-ray microanalysis data showing that heavy metals accumulate in tubule concretions [38,39], and with the extreme abundance of metallothionein A (Table 1).
As well as specific transporters, the tubule is enriched for several families of broad-specificity transporters (organic anion and cation transporters, multivitamin transporters, ABC multidrug transporters and an oligopeptide transporter). Together, these would be capable of excreting the great majority of organic solutes. These results invite a substantial revision of our interpretation of the role of the tubule. Classically, it is considered to be the tissue that excretes waste material, both metabolites and xenobiotics, and that provides the first stage of osmoregulation. However, nearly all work on insect tubules in the last half-century has focused on the ionic basis of fluid secretion and its control, as these are easily measured experimentally. There have been sporadic reports on the active transport of organic solutes such as dyes [40-42]. Nonetheless, the historical view was of a relatively leaky epithelium, with a paracellular default pathway for those solutes not recognized by specific transporters. Our results are consistent with the more classical view of the tubule, but they also suggest that the insect emulates a leaky epithelium in producing the primary urine. It does so by incorporating a vast array of broad-specificity active transporters into the plasma membranes of what is electrically a rather tight epithelium. Indeed, this interpretation is consistent with other independent data. The intercellular junctions in tubule are known to be of the pleated septate variety, the invertebrate equivalent of tight junctions [43]. In addition, like salivary gland cells, tubule cells are known to be highly polytene [44-47] or even binucleate [48], adaptations that maximize the size of cells and thus their area/circumference ratios.

Figure 2 Semi-quantitative inter-gene comparison is possible using Affymetrix signal. The 24 QRT-PCR results underlying Table 3 were normalized against rp49, and plotted against the Affymetrix signal globally normalized as in MAS 5.0. Spearman's r was calculated, and the significance of the correlation assessed (one-tailed), using GraphPad Prism 3.0.

Figure 3 The phenotype gap. Genes enriched in tubules are historically under-researched. The percentage of genes with explicit names (other than automatic CG annotations) is shown for the entire genome, and for the top 50, 100 and 200 genes (as judged by fold enrichment) from the tubule dataset.

V-ATPases
Physiological analysis of the tubule has concentrated on the secretion of primary urine, and the energizing transporter is a plasma membrane proton pump, the V-ATPase [13,49-51]. This is a large holoenzyme of at least 13 subunits, encoded by 31 Drosophila genes [52,53]. V-ATPases have two distinct roles. One is carried out at low levels in the endomembrane compartments of all eukaryotic cells; the other is carried out in the plasma membranes of specialized epithelial cells of both insects and vertebrates [54]. In such cells, V-ATPases can pack the plasma membrane so densely that they resemble semi-crystalline arrays when observed by electron microscopy [55]. It is clearly of interest to find out which genes contribute to the plasma-membrane role of the V-ATPase. Normally, this would involve the difficult and tedious generation of selective antibodies capable of distinguishing between very similar proteins. However, the mRNAs for those V-ATPase subunits enriched in epithelia should also be particularly abundant; one could thus predict that at least one gene encoding each V-ATPase subunit should show enrichment in tubule compared with the rest of the fly. This is indeed the case (Table 5): invariably, one gene for each subunit is both significantly enriched and far more abundant than any other gene encoding that subunit. The enrichment is probably not higher because the whole-fly samples contain other epithelia, each with enriched V-ATPase, as minor parts of the overall sample.

Table 5 notes: All genes significantly similar to known human or yeast V-ATPase subunits were identified by BLAST search. This extends our previously reported annotation of the V-ATPase family [53] by identifying the genes underlined in the table as V-ATPase subunits. For comparison, enrichment ratios significantly greater than 1 and signals over 1,000 are shown in bold. (vha16-2 and vha16-3 are in tandem repeat and share the same Affymetrix oligo set, and so cannot be distinguished here.)
The array data thus allow a rapid and authoritative prediction to be made on the subunit composition of the plasma membrane V-ATPase. It will be interesting to extend these data to other epithelia in which V-ATPase is known to be functionally significant.
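The prediction rule for the plasma-membrane V-ATPase can be written down directly: for each subunit, the most abundant gene should also be enriched. The sketch below assumes rows of (subunit, gene, signal, enrichment) such as those in Table 5. It uses a simple enrichment > 1 test in place of a statistical one. The subunit labels, gene names and values are hypothetical.

```python
from collections import defaultdict


def plasma_membrane_candidates(rows, min_enrichment=1.0):
    """For each subunit, return its most abundant gene if that gene is also enriched.

    rows: iterable of (subunit, gene, signal, enrichment) tuples.
    Subunits whose most abundant gene is not enriched map to None.
    """
    by_subunit = defaultdict(list)
    for subunit, gene, signal, enrichment in rows:
        by_subunit[subunit].append((signal, enrichment, gene))
    result = {}
    for subunit, genes in by_subunit.items():
        signal, enrichment, gene = max(genes)  # highest signal wins
        result[subunit] = gene if enrichment > min_enrichment else None
    return result


# Hypothetical rows for three subunits
rows = [
    ("A", "geneA1", 3200.0, 2.4),
    ("A", "geneA2", 40.0, 0.3),
    ("B", "geneB1", 2900.0, 2.1),
    ("B", "geneB2", 15.0, 0.8),
    ("C", "geneC1", 600.0, 0.9),  # most abundant but not enriched
    ("C", "geneC2", 30.0, 1.5),
]
print(plasma_membrane_candidates(rows))
```

A subunit that maps to None would be an exception to the prediction. According to Table 5, no such exception occurs in the real data.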

Na+, K+-ATPase
The role of the classical Na+, K+-ATPase in tubule is enigmatic. In nearly all animal epithelia, transport is energized by a basolateral Na+, K+-ATPase, which establishes a sodium gradient that drives secondary transport processes. By contrast, insect epithelia are energized by a proton gradient from the apical V-ATPase [56,57]. Consistent with this, many insect tissues are paradoxically refractory to ouabain, the specific Na+, K+-ATPase inhibitor [58]. Accordingly, models of insect epithelial function tend not to include the Na+, K+-ATPase. It is thus interesting to note that both Atpalpha and Nervana 1 (encoding isoforms of the α and β subunits, respectively) are among the most abundant transcripts in tubule (Table 6). Both are about as enriched in tubule as the V-ATPase subunits, but are significantly more abundant (compare Table 5). By contrast, a novel α-like subunit (CG3701), Nrv2 (the neuronal β subunit) and other novel β-like subunits are at near-zero levels. As the Na+, K+-ATPase has previously been documented as being particularly abundant in Drosophila tubule [59], it may be prudent to re-include it as an important part of models of tubule function.

Potassium channels
Potassium is actively pumped across the tubule, and the main basolateral entry step is via barium-sensitive potassium channels, both in tubule [50,60,61] and in other V-ATPase-driven insect epithelia [62,63]. Of the ion channels, the potassium channel family is by far the most diverse in all animals: there are at least 28 K+ channel genes in Drosophila, and 255 in human [64]. Inspection of the potassium channels on the array (Table 7) clearly identifies just four that are expressed at appreciable levels: Irk3, Ir, Irk2 and NCKQ are all both very abundant and highly enriched in tubule. Irk3 in particular is 80-fold enriched over the rest of the fly, implying a unique role in tubule. Three of these genes are members of the inward rectifier family of potassium channels. Supporting the hypothesis that they are critical for potassium entry, these channels are known to be highly barium-sensitive [65]. Inward rectification of potassium current (meaning that potassium passes much more easily into the cell than out) would be ideal for a basolateral entry step. Inward rectifier channels normally associate with the sulfonylurea receptor (SUR), an ABC transporter, to form functional channels [66,67]. In tubules, SUR mRNA is present at extremely low abundance (signal 6, enrichment 0.9-fold). However, CG9270, a gene with very close similarity to SUR (BLASTP E-value 1 × 10^-28), is very abundant in tubule (signal 422, enrichment 21-fold; see Table 4). A second very similar gene, CG31793 (previously also known as CG10441 and CG17338), is much less abundant (signal 24, enrichment 0.5-fold). We therefore predict that novel inward rectifiers, formed between CG9270 and Irk3, Ir or Irk2, may provide the major basolateral K+ entry path in tubule. In contrast, the other classes of K+ channel, and the Na/K/Cl co-transporter that has been documented in tubule, are all relatively low in both abundance and enrichment.

Chloride and water flux
In a fluid-secreting epithelium, a necessary correlate of the active transport of cations must be the provision of a shunt pathway for anions and a relatively high permeability to water. In Drosophila tubules, a hormonally regulated chloride conductance pathway has been shown to occur in the stellate cells, although the molecular correlate of the currents has not been determined. There are three ClC-type chloride channels in the Drosophila genome, and RT-PCR has shown that all three are expressed in tubule [12]. The array data present a prime candidate (Table 8). Although all three genes are expressed, only one (CG6942) is both very abundant and enriched in tubule (signal 251, enrichment 4). It is thus an obvious candidate partner to provide a shunt pathway for the epithelial V-ATPase.
Water flux through the tubule is also phenomenally fast: each cell can clear its own volume of fluid every 10 seconds [12]. It was traditionally thought that only a leaky epithelium could sustain such rates. However, aquaporins (AQP), the predominant members of the major intrinsic protein (MIP) family, have been identified as major water channels in both animals and plants [68], and they provide an obvious alternative explanation. There are physiological and molecular data for the presence of aquaporins in Drosophila tubule [69], and AQP-like immunoreactivity has been demonstrated in stellate cells [12]. Table 9 shows that only four of the seven AQP/MIP genes are abundant, and only three are enriched. One can thus tentatively assign an organism-wide role to CG7777 (signal 243, enrichment 0.6), but tubule-specific roles to CG4019, CG17664 and DRIP. In particular, CG17664 is both highly abundant and very highly enriched (signal 705, enrichment 7.9).
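The reasoning used for the K+ channels, the ClC channels and the aquaporins follows one pattern. A transcript that is abundant but not enriched suggests an organism-wide role, whereas one that is both abundant and enriched suggests a tubule-specific role. A minimal sketch follows, using signal and enrichment values quoted in the text. The abundance cutoff of 100 and the enrichment cutoff of 1 are assumptions for illustration, not thresholds used in the analysis.

```python
def classify_role(signal, enrichment, min_signal=100.0, min_enrichment=1.0):
    """Assign a putative role from mean tubule signal and tubule/fly enrichment."""
    if signal < min_signal:
        return "low expression"
    if enrichment > min_enrichment:
        return "tubule-specific"
    return "organism-wide"


# (signal, enrichment) pairs quoted in the text
quoted = {
    "CG7777": (243.0, 0.6),   # aquaporin
    "CG17664": (705.0, 7.9),  # aquaporin
    "CG6942": (251.0, 4.0),   # ClC chloride channel
    "CG9270": (422.0, 21.0),  # SUR-like
    "SUR": (6.0, 0.9),        # sulfonylurea receptor
    "CG31793": (24.0, 0.5),   # second SUR-like gene
}
for gene, (signal, enrichment) in quoted.items():
    print(gene, classify_role(signal, enrichment))
```

Where a ratio sits close to 1, a formal test of enrichment, rather than a fixed cutoff, would be needed before assigning a role.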

Control of the tubule
The hormonal control of fluid secretion is well understood. The major urine-producing region of the tubule is the main segment [70], which is composed of two major cell types, principal and stellate cells [9,13,71]. Active cation transport in the principal cell is stimulated by the hormones calcitonin-like peptide and corticotrophin releasing factor (CRF)-like peptide, both of which act through cyclic AMP (cAMP). Another peptide family, the CAPA peptides, acts through intracellular calcium to stimulate nitric oxide synthase and thus raise cyclic GMP (cGMP), an unusual autocrine role for nitric oxide [20,72]. In the stellate cell, the chloride shunt conductance is activated by leucokinin [17,73], and a role for tyramine as an extracellular signal has also been proposed [74]. So far, the CAPA and leucokinin receptors have been identified [75,76]; both are prominent among the receptors enriched in tubule (Table 10). The CAPA receptor appears much more highly enriched in tubule than the leucokinin receptor, which is consistent with our understanding of each. The tubule is the only known target of CAPA, whereas leucokinin receptors are widely distributed in the adult gut, gonad and nervous system [75].
There are many other receptors that are reasonably abundant and enriched in tubule. As well as candidate receptors for calcitonin-like and other neuropeptides, there are two glycine/GABA-like receptors that might be expected to form ligand-gated chloride channels. There are also good matches to vascular endothelial growth factor-like, insulin-like and bombesin-like receptors. The localization of these receptors, their ligands, and their functional roles will be of great interest. It should be noted in this context that all hormones characterized so far act on one of the two main cell types in the main segment of the tubule. There are, however, six genetically defined cell types and six regions in the adult tubule [9]. It is likely that there will at least be ligands acting on the initial segment to stimulate calcium excretion, and others acting to regulate reabsorption by the lower tubule. If any of these receptors maps to these regions, it would be a prime candidate for such a role.
Overall, the main surprise from these data is the sheer range of candidate ligands that could be inferred; this more than doubles the size of the endocrine repertoire so far postulated for insect tubules.
On a more general level, it is possible to trace the key genes in all three intracellular signaling pathways that have been studied in detail in Drosophila tubule (Table 11). The results for signaling genes tend not to be as clear-cut as for transporters. Many signaling genes are widely distributed and so do not show enrichment, and many do not require high standing levels of protein (and, by implication, mRNA) to achieve their effects. Nonetheless, it is possible to identify genes that are at least present, and frequently enriched, in tubule. For the cAMP pathway, it is possible to identify adenylate cyclases, protein kinase A catalytic and regulatory subunits, and a phosphodiesterase (dunce). For cGMP, there are both soluble and membrane guanylate cyclases, implying that the tubules may produce cGMP directly in response to novel ligands, as has recently been suggested [77]. Both Drosophila genes encoding protein kinase G are expressed in tubule, and one is highly enriched. This is consistent with the renal phenotype observed both in foraging mutants [78], and in tubules in which protein kinase G is overexpressed [79]. There is also a PDE11-like phosphodiesterase. For calcium, two genes for phospholipase C, one for calmodulin, and one each for protein kinase C and calcium/calmodulin-dependent protein kinase are apparent. There are also a number of interesting modulatory or anchoring proteins, such as 14-3-3 zeta, A-kinase anchoring proteins, and receptor for activated C-kinase (Rack1).

How is the tubule specified?
The developmental origin of the tubule has been reviewed in detail [80-82]. Briefly, four unique 'tip cells', specified by a cascade of neurogenic genes, control cell division in four outpushings (anlagen) of the hindgut to form the Malpighian tubules. Late in embryogenesis, the tubule is invaded by mesodermal cells, which intercalate between the future principal cells and then differentiate to form stellate cells [8]. In the adult, there are known to be at least six cell types and six tubule regions [9]. These regions are specified to great precision, and it is clear that each cell in the tubule has a precise positional identity. How does this identity persist throughout the lifetime of the animal? Presumably, combinations of transcription factors interact to provide both regional and cell-type coordinates and, after early establishment, these combinations must persist into adulthood. The microarray data allow the identification of transcription factors that are either highly abundant or highly enriched in tubule. Although this is by no means a complete list of the transcription factors that are important in tubules, it is a good starting point. Furthermore, enhancer trap or reporter gene constructs are available for many transcription factors. Accordingly, the top transcription factors and DNA-binding proteins were identified from the array dataset (Table 12).
Some of these transcription factors are already known to be present in tubule, and their presence is confirmed. cut is known to be required for tubule development and to be expressed in adult Malpighian tubules [83]. forkhead and homothorax have both been implicated in tubule development by expression or mutational analysis [84,85]. Teashirt, which has recently been shown to be stellate-cell specific in the late embryo [8], is also present in the adult, with fairly high enrichment (4.6-fold).
The array results also implicate a further set of transcription factor genes (ETS21C, CG4548, bowl, sequoia, tap, CG1162, pnt, shaven, forkhead domain 59A, sloppy paired 2, lim3) as important in adult. Significantly, these mainly encode transcription factors implicated in development of the nervous system (another ectodermal tissue), so their reuse in the adult tubule is not too surprising. Once the binding sites for these factors are known, it will be interesting to model gene expression in different tubule regions.
As transcription factors have been studied experimentally in some detail, they are relatively well represented by enhancer trap and other in vivo construct lines. Although individual lines do not necessarily represent the complete expression pattern of their cognate genes, a collection of such lines can provide a rapid first validation of a gene list (Table 12). Accordingly, representative reporter gene lines were ordered from the Bloomington Stock Center [86], and their adult staining patterns in tubule and gut are shown in Figure 4. The results are exciting: most lines showed patterned staining in tubule that is consistent with our original genetically derived map of the tubule [9]. For example, homothorax marks out the initial, main and transitional segments of the tubule, whereas CG7417 marks the complementary lower tubule domain. The latter line is widely used as a highly specific mushroom body GAL4 driver line in brain, and it is interesting that the two known lower tubule GAL4 driver lines (c507 and c232) are both insertions in alkaline phosphatase 4, a gene which is only expressed in lower tubule and the ellipsoid bodies of brain (next to the mushroom bodies) [87]. There is also a cell-type-specific transcription factor: corto is found only in stellate cells. Several other transcription factors show ubiquitous, rather than patterned, expression in the tubule, but this is nonetheless consistent with their identification in the microarray dataset.
Another interesting aspect of the data in Table 12 is the number of anonymous CG genes implicated in tubule function. These genes have been annotated as transcription factors because of DNA-binding domains, for example, but have not been characterized functionally. The epithelial phenotype gap is thus evident even in this most intensely studied group of genes.

Exceptions to the rule
The whole premise of microarray work is that an abundant or enriched signal indicates the importance of a gene product in a particular context. This hypothesis is normally neither tested nor challenged. The unusual depth of functional understanding of the tubule allows a more rigorous appraisal. In fact, the majority of the genes implicated in tubule function are found well up the list. There are, however, several conspicuous exceptions (Table 13). The calcium channels trp and trpl are normally considered to be eye-specific, and have an essential role in phototransduction [88][89][90]. It is thus not surprising to find both genes almost at the bottom of the gene list. We have shown, however, that fluid secretion is severely compromised by mutations in either gene. Similarly, nitric oxide synthase (NOS) is a major signal transducer in tubule [20,72]. Nonetheless, all three genes are within the 'bottom' 20 of the whole array, with barely detectable signals that are significantly depleted compared with the whole fly. This is a cautionary example: while abundant or enriched signals can be taken as reliable indicators of functional significance, the converse is not necessarily true.
Table 12 Transcription factors and DNA-binding proteins that are abundant or enriched in tubule

The tubule and human disease
Following from the demonstration of the phenotype gap, there are some intriguing abundant and enriched genes which, by virtue of their non-uniform expression, are likely to be important in (and best studied in) tubule. A systematic approach was taken by combining the tubule-enriched gene list with the homophila database of Drosophila genes with known human disease homologs. The results (Table 14) show the 50 human diseases whose Drosophila homologs are upregulated at least threefold in tubules. Intriguingly, several of these genes have human kidney phenotypes. Some are extremely well studied: for example, rosy (one of the first Drosophila mutations recorded) encodes xanthine oxidase, and mutation in either human or fly produces severe nephrolithiasis with concomitant distortion of tubules (reviewed in [12]). The distension of tubules is remarkable (Figure 5). In both species, lethal effects can be ameliorated by a high-water, low-purine diet. Other diseases, although less well documented, have plausible renal phenotypes: for example, antenatal Bartter syndrome, a severe salt-wasting renal disease associated with mutations in the ROMK channel (homolog ir); Dent disease, caused by mutation in ClC5 (homolog CG5284); proximal renal tubular acidosis, caused by mutation in the NDAE co-transporter (homolog ndae1); nephropathic cystinosis, caused by mutation in a lysosomal cystine transporter (homolog CG17119); and mucopolysaccharidosis type IV, caused by mutation in galactosamine-6-sulphatase, an enzyme enriched in both human and fly kidney (homolog CG7402). Overall, there is a clear message that human and fly renal function may be relatively similar over quite a wide range of properties.
Figure 4 Expression patterns in tubules of some of the transcription factor genes indicated by the microarray data as being expressed in tubules.
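The screen behind Table 14 amounts to a join between two gene lists followed by a fold-change threshold. A minimal Python sketch follows, assuming the tubule-enriched list is available as (gene symbol, fold change) records and that the homophila content has been exported to a simple mapping from fly gene symbol to human disease names. The names `EnrichedGene` and `disease_candidates` are hypothetical, and no particular homophila file format is implied.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnrichedGene:
    symbol: str         # Drosophila gene symbol, e.g. "ry"
    fold_change: float  # tubule signal / whole-fly signal


def disease_candidates(enriched, disease_homologs, min_fold=3.0):
    """Return (symbol, fold_change, disease) rows for tubule-enriched fly genes
    that have a human disease homolog.

    enriched         -- iterable of EnrichedGene records
    disease_homologs -- dict: fly gene symbol -> list of human disease names
                        (hypothetical stand-in for a homophila export)
    min_fold         -- enrichment threshold; the text uses "at least threefold"
    """
    rows = []
    for gene in enriched:
        if gene.fold_change < min_fold:
            continue  # not upregulated at least threefold in tubule
        for disease in disease_homologs.get(gene.symbol, []):
            rows.append((gene.symbol, gene.fold_change, disease))
    # List the most strongly enriched genes first
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows
```

With illustrative (not measured) fold changes of 5.0 for ry, 4.0 for ir and 2.0 for some other gene, the function keeps the rosy and ir disease rows and drops the twofold gene. A plain threshold is used here because the text states only the "at least threefold" criterion.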
The tubule phenotype may also prove highly informative for other genes implicated in disease. Recently, a small 10 kDa protein, bc10, was shown to be downregulated in the transition from early-stage to invasive bladder carcinoma [91]. The normal function of this protein is not yet established, but its Drosophila homolog (bc10) is highly abundant (signal 893 ± 50) and moderately enriched (1.9 ± 0.09-fold) in tubule, and a P-element insertion within the gene, P{GT1}BG02443, is available from stock centers.
This comparative approach can be extended to non-human species. For example, CG4928 represents an abundant and enriched transcript (signal 3,778; 13-fold enriched) that is highly similar (E-value 1.9 × 10⁻⁷⁵) to the C. elegans gene unc-93 [92]. Mutations in unc-93 are associated with a 'rubber-band' phenotype, in which motor co-ordination is sluggish, and the gene is thus taken to be myogenic or neuromuscular. The discovery that a close homolog is highly enriched in renal tissue opens new lines of investigation for this gene.

Discussion
These data have value at two distinct levels: specific and general. Specifically, we have learned more about the operation of the Malpighian tubule than any single published study since the pioneering early work on this tissue: a summary is given in Figure 6. This tissue is of great interest, both for developmental studies and for integrative physiological study of epithelial function. Despite 990 papers on Malpighian tubules since the start of the twentieth century, and a good understanding of ion and water transport, the microarray data provide strong indications that these are only minor properties of the tubule. Whole families of transporters are represented by abundant mRNAs, and they transport solutes that have yet to be studied in the context of tubule. Some datasets implicate particular genes in processes that have been studied in great physiological detail, and the presence of known genes alongside novel ones can only increase our confidence in the result.
In this context, the demonstrated abundance of transporters for almost every class of organic and inorganic solute dramatically diminishes the number of solutes for which a nonspecific paracellular pathway need be invoked. The data thus allow the conceptual view of the epithelium to shift from leaky to tight in a physiological-transport sense, which is consistent with electrophysiological data [93].
There are two areas where the microarray data deserve comment. First, more than 300 genes are expressed in tubule but called as absent in whole-fly samples. Although there is an obvious convenience and consistency in employing whole-organism samples for array studies, it is important to recognize that the approach is very likely to suppress the detection of interesting genes that are not widely expressed. Second, the premise that abundance on an array (or, more generally, abundance of an RNA species) necessarily correlates with functional significance is spectacularly refuted by three examples: the trp and trpl channels and NOS. It is, however, probably significant that these are cell-signaling molecules, where a relatively small number of molecules can have a disproportionate influence on cell behavior. By contrast, the transport genes in which the tubule is so enriched are much more likely to exert effects proportional to their abundance.
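Counting tubule-only genes is, in essence, a comparison of detection calls between the two sample types. The sketch below is hypothetical. It assumes MAS 5.0-style detection calls ('P' present, 'M' marginal, 'A' absent) for each replicate array, and it adopts a strict rule (present on every tubule array, absent on every whole-fly array) that the text does not itself specify. The function name `tubule_only_genes` is illustrative.

```python
def tubule_only_genes(tubule_calls, fly_calls):
    """Probe sets called present in every tubule array but absent in every
    whole-fly array.

    tubule_calls, fly_calls -- dict: probe-set ID -> list of detection calls,
    one per replicate array ('P', 'M' or 'A').
    The all-present / all-absent rule is an illustrative assumption.
    """
    hits = []
    for probe_set, t_calls in tubule_calls.items():
        f_calls = fly_calls.get(probe_set, [])
        if not t_calls or not f_calls:
            continue  # skip probe sets missing from either sample type
        if all(call == "P" for call in t_calls) and all(call == "A" for call in f_calls):
            hits.append(probe_set)
    return sorted(hits)
```

A looser rule, such as a majority of present calls, would return more genes; the strict version is shown only because it is the simplest to state.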

Conclusions
Reverse genetics is a vital tool in functional genomics, but the 'phenotype gap' has hampered widespread implementation of this approach [35]. As the tubule presents a range of easily assayed phenotypes [12], this work specifically identifies those genes that are likely to be best studied in tubule by virtue of their very high enrichment. In addition to the obvious transport genes, many transcription factors and human disease gene homologs fall into this category. This work thus stresses the importance of systematic, fine-grained, tissue-specific microarray analysis in closing the phenotype gap for multicellular model organisms.

Microarrays
Tubules were dissected in batches of 1,000 by a group of eight experimenters. Tubules were aggregated into Trizol every 15 min to minimize distortion of the transcriptome by the trauma of dissection and in vitro incubation. Care was taken to sever the tubules from the gut at the lower ureter so that no other tissue was included in the sample. For each experimental point, whole flies from the same culture were homogenized in Trizol in batches of 100, to permit a matched-pair comparison. Six repeats were performed. RNA was extracted according to standard protocols, and quality was assessed with an Agilent RNA Bioanalyzer. Samples of 20 µg total RNA were reverse-transcribed, then in vitro transcribed, according to Affymetrix standard protocols. The quality of the complementary RNA (cRNA) was also checked on the Agilent RNA Bioanalyzer; a sample was taken to be satisfactory if the broad cRNA peak exceeded the height of the low-molecular-weight degradation peak. Samples were then run on the Affymetrix Drosophila genome array under standard conditions. Quality control operated at several levels. The Affymetrix MAS 5.0 software provided evidence of successful sample preparation, with test genes giving a 3':5' signal ratio of less than 3. dChip [97] provided an alternative, oligo-by-oligo view of the success of hybridization across the array surface; slides with both single-probe and probe-set outlier rates of less than 5% were taken as satisfactory. Only arrays that met both the MAS 5.0 and the dChip criteria were accepted. In this case, 11 of 12 arrays were satisfactory; the first tubule array failed both the MAS and dChip criteria, so the first experimental pair was discarded to leave a five-sample paired design. As will be seen from the results, this design was sufficient to identify tubule-enriched genes with a high level of confidence. As sample collection extended over the whole day, array results from morning and afternoon samples were compared (data not shown), but no difference was found between the two groups at this very broad time resolution.

Figure 5 Recapitulation of human xanthinuria type 1 by rosy mutants. (a) Wild-type tubule; (b) tubule from adult ry2 homozygous fly. Both micrographs are at the same magnification, and the diameter of the wild-type tubule can be taken as 35 µm.
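The array acceptance rule in the Microarrays protocol, together with the decision to discard a whole matched pair when either member fails, can be written as a small filter. This is a hypothetical sketch. The record fields and function names are illustrative, and it assumes the MAS 5.0 3':5' ratio and the two dChip outlier percentages have already been read off for each array.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArrayQC:
    sample: str
    ratio_3_to_5: float          # MAS 5.0 3':5' signal ratio for test genes
    single_outlier_pct: float    # dChip single-probe outlier rate (%)
    probeset_outlier_pct: float  # dChip probe-set outlier rate (%)


def array_ok(qc):
    """An array is accepted only if both the MAS 5.0 and dChip criteria hold."""
    mas_ok = qc.ratio_3_to_5 < 3.0
    dchip_ok = qc.single_outlier_pct < 5.0 and qc.probeset_outlier_pct < 5.0
    return mas_ok and dchip_ok


def usable_pairs(pairs):
    """Keep a (tubule, whole-fly) pair only if both of its arrays pass QC,
    so the paired design stays balanced."""
    return [(tub, fly) for tub, fly in pairs if array_ok(tub) and array_ok(fly)]
```

Given six pairs in which only the first tubule array fails, `usable_pairs` returns five pairs, which corresponds to the five-sample paired design. Dropping the partner array keeps every remaining tubule sample matched to a whole-fly sample from the same culture.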

Bioinformatics
Microarray samples were analyzed by two independent routes. The first was low-level analysis with the Affymetrix MAS 5.0 suite and identification of differentially expressed genes using the Affymetrix Data Mining Tool. The second was low-level analysis using dChip software [97], assessment of significance using SAM software [98], and post-analysis in dChip. The MAS 5.0 low-level analysis consisted of background subtraction, robust conversion of probe-level perfect match-mismatch (PM-MM) expression values into probe-set-level signals, and linear multi-chip normalization (scaling). Tubule enrichment was based on an Affymetrix 'up' call and a critical level of p < 0.05. In this analysis method, tubule and fly samples were taken as matched pairs, reflecting their biological origin. The dChip-based low-level analysis consisted of background correction, multi-chip 'invariant-set' nonlinear normalization at probe level, and calculation of model-based expression indices using PM expression values only. Differentially expressed genes between the two groups of five replicates were identified within dChip by filtering data using the following criteria: lower 90% confidence bound of fold change [21] > 2; difference between group means on the antilog scale > 100; and p-value for the t-test of equal group means < 0.01. Alternatively, differentially expressed genes were identified using SAM software with 1,000 sample permutations and a false-discovery rate cutoff of 1%. These were then post-filtered using the first two criteria from the dChip analysis described above. Fold change was calculated as the ratio of group means. Outputs were saved as Excel files and parsed by hand-coded Perl scripts.
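The dChip filter and the SAM post-filter differ only in whether the t-test criterion is applied, so both can be expressed with one function. The Python sketch below is an illustration under stated assumptions rather than dChip's own code. It treats the lower 90% confidence bound of the fold change and the t-test p-value as values already computed by dChip (they are not re-derived here), it considers only the tubule-up direction, and the function names are hypothetical.

```python
from statistics import mean


def fold_change(tubule, fly):
    """Fold change as the ratio of group means (antilog scale)."""
    return mean(tubule) / mean(fly)


def passes_filters(tubule, fly, lcb_fold, p_value=None,
                   min_lcb=2.0, min_diff=100.0, max_p=0.01):
    """Apply the enrichment criteria to one probe set.

    tubule, fly -- expression values for the replicates of each group
    lcb_fold    -- lower 90% confidence bound of the fold change (from dChip)
    p_value     -- t-test p-value from dChip; pass None for the SAM post-filter,
                   which uses only the first two criteria
    """
    difference = mean(tubule) - mean(fly)  # difference of group means, antilog scale
    ok = lcb_fold > min_lcb and difference > min_diff
    if p_value is not None:
        ok = ok and p_value < max_p  # third criterion, dChip route only
    return ok
```

The absolute-difference criterion guards against large fold changes between two very low signals, which a ratio alone would let through.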
Figure 6 Summary of major genes enriched in tubule. Genes shown are upregulated at least threefold.