Comparative profiling of the sense and antisense transcriptome of maize lines
© Ma et al.; licensee BioMed Central Ltd. 2006
Received: 2 November 2005
Accepted: 8 February 2006
Published: 13 March 2006
There are thousands of maize lines with distinctive normal as well as mutant phenotypes. To determine the validity of comparisons among mutants in different lines, we first address the question of how similar the transcriptomes are in three standard lines at four developmental stages.
Four tissues (leaves, 1 mm anthers, 1.5 mm anthers, pollen) from one hybrid and one inbred maize line were hybridized with the W23 inbred on Agilent oligonucleotide microarrays with 21,000 elements. Tissue-specific gene expression patterns were documented, with leaves having the most tissue-specific transcripts. Haploid pollen expresses about half as many genes as the other samples. High overlap of gene expression was found between leaves and anthers. Anther and pollen transcript expression showed high conservation among the three lines while leaves had more divergence. Antisense transcripts represented about 6 to 14 percent of total transcriptome by tissue type but were similar across lines. Gene Ontology (GO) annotations were assigned and tabulated. Enrichment in GO terms related to cell-cycle functions was found for the identified antisense transcripts. Microarray results were validated via quantitative real-time PCR and by hybridization to a second oligonucleotide microarray platform.
Despite high polymorphisms and structural differences among maize inbred lines, the transcriptomes of the three lines displayed remarkable similarities, especially in both reproductive samples (anther and pollen). We also identified potential stage markers for maize anther development. A large number of antisense transcripts were detected and implicated in important biological functions given the enrichment of particular GO classes.
Maize geneticists and breeders utilize thousands of inbred and hybrid lines in their research. The diversity of extant lines reflects both the ease of crossing corn (Zea mays L.) and the long life of seeds. These lines are derived from hundreds of landraces collected in US farmers' fields and from Native Americans beginning in the early 20th century. Lineage records track these materials, the crosses among them, and the inbred lines derived over the past century [1, 2]. Phenotypic differences between inbreds can be subtle or dramatic as lines were bred for size, floral morphology, days to flowering, seed constituents, and myriad other traits; distinctive alleles as well as epistatic interactions between loci are the genetic basis for these traits. Differences among lines are notable in genetic analysis when a particular allele, such as a new mutant allele, is introgressed into a range of inbred lines: there can be a striking impact in some lines but a quenching of the expected phenotypes in other lines. Climatic conditions at specific locations also constrain which lines will flourish, reflecting differences in environmental responses. Therefore, it is of great interest to quantify line-specific aspects of gene expression that are the underlying basis for phenotypic variation among inbreds and hybrids and to determine the characteristic patterns of gene expression in specific organs in multiple wild-type lines before examining the impact of mutations on the transcriptome of developing organs.
One complication in defining gene functions in maize is that the species has a tetraploid genome from an event about 11 to 15 million years ago. The genome retains most of the duplicated chromosomal segments as well as more recently generated duplicated genes. Based on approximately 407,000 public Expressed Sequence Tags, representing parts of gene transcripts, there are 31,375 tentative contigs plus 27,207 singleton sequences, totaling 58,582 possible genes (The Institute for Genomic Research (TIGR) Maize Gene Index release 15.0, September 2004), a number likely to shrink to approximately 50,000 with more complete transcript sequencing. Despite the apparent redundancy of genes within this assembly, visible mutants are readily recovered. At present, 6,505 maize loci are defined. Therefore, alleles of many individual genes have distinctive functions in at least one tissue or organ compared to related loci.
A key question that can be addressed with transcriptome profiling is whether lines express the same loci in specific organs and tissues. That is, does the normal phenotype of an organ require that nearly all of the same genes be expressed and in a quantitatively similar manner or can the wild-type condition be achieved despite significant variation in the transcriptome? A related question is how distinctive the progression in gene expression can be during organ development in phenotypically distinctive maize lines. A third question considers whether some organs show more highly conserved patterns of gene expression in diverse lines than other organs, suggesting canalization of the regulatory alleles and of their targets in specifying certain plant parts.
The topic of organ-specific gene expression within one hybrid line was addressed previously by Cho et al., who examined 7 organs of maize in a hybrid line composed of 75% inbred K55, 20% W23, and 5% Robertson's Mutator stocks; for roots, leaf blades, and leaf sheaths several developmental stages were examined. A printed cDNA microarray containing approximately 5,600 different genes was used for transcriptome profiling, and the data generated were sufficient to organize a hierarchy of relatedness among the tested organs. As expected, all leaf blade samples clustered together with leaf sheaths as a close sister group; organs associated with reproduction, whether photosynthetic husk leaves or floral organs, clustered together. A major limitation in this study was that cross-hybridization among family members would be expected to obscure many interesting patterns of gene expression; indeed, only 7% of the queried cDNAs showed organ-specific expression, as would be expected if a gene class was required in all the examined organs. The cDNA array format could not determine which member of a recently duplicated gene pair or gene family was expressed in each organ; on a limited scale, suites of oligonucleotide probes printed on the same slide for a few selected gene families showed that short oligonucleotide probes could provide gene-specific data necessary to resolve which family members are expressed in specific patterns.
To begin to answer the question of organ-specific expression and to determine the congruence in transcriptomes among lines, a new microarray platform containing in situ synthesized 60-mer oligonucleotide probes was employed. A reference design experiment comparing the W23 and A619 derivative lines and W23 and the F1 ND101/W23 hybrid was used with samples from juvenile leaves, mature pollen, and two stages of anther development. In this way, we could examine overlap in gene expression between vegetative, floral, and haploid gametophyte stages as well as determining the similarities between lines. For our validation analysis, both quantitative RT-PCR and hybridization to a second oligonucleotide-based microarray platform were employed.
Biological materials and study design
W23, ND101, and an A619 derivative are Corn Belt Dent varieties, a classification based on origin and seed morphology, but they share no recent common ancestor. They are very similar in gross morphology at all stages of development, but can be distinguished in quantitative traits such as days to flowering, typical seed set, leaf length and width (data not shown). One specific motivation for choosing these lines is that we have begun analyzing male-sterile mutants of maize that are available in these three particular backgrounds. The lines were grown in a common field and four organ types - juvenile leaf blade, 1 mm anther, 1.5 mm anther, and haploid pollen - were recovered for comparison. Mature anthers are sacs composed of four concentric rings of somatic tissue layers; in the middle of each anther hundreds of pre-germinal cells initiate meiosis. Four haploid gametophytes (pollen grains) develop from each meiosis; each pollen grain contains two sperm cells required for the double fertilization characteristic of maize and other flowering plants. Based on Cho et al., the expectation was that leaf, anther, and pollen samples would exhibit approximately an equal number of organ-specific transcripts and that the two anther stages would be significantly more similar to each other than to either leaf or pollen. Although these two stages are only one day apart, they are very distinctive developmentally. Within the 1 mm anther, cell divisions are common in the epidermis, in the three internal somatic layers (endothecium next to the epidermis, middle layer, and then tapetum), and in the innermost cell group of pre-germinal cells. Although the somatic cells are already organized into the concentric rings characteristic of a mature anther, cellular specializations are incomplete; the pre-germinal cell population is still expanding, and there is no evidence of pre-meiotic cells (data not shown).
At the 1.5 mm stage, each of the cell layers has further differentiated and, based on chromosomal condensation characteristics, meiosis will soon initiate in some of the pre-germinal cells (L Harper and WZ Cande, personal communication).
Because the maize genome has not yet been sequenced, the 22,000 probes for the Agilent arrays were designed from the MaizeGDB December 2003 EST assemblies. Later these probes were mapped onto the TIGR Maize Gene Index assemblies (release 15.0, September 2004). In summary, these probes represent approximately 8,000 sense transcripts, approximately 5,000 antisense transcripts, and approximately 8,000 transcripts with undetermined orientation in this classification. Probes showing significant hybridization were manually analyzed to refine their classification as sense or antisense, and we estimate the array had probes to approximately 13,000 sense transcripts. Note that in the rest of the text, transcripts denote RNA species that were detected on the arrays because they hybridized to one or more oligo probes, either sense or antisense. Generally, the number of hybridized probes is larger than the number of possible transcripts, because there are two or more probes for a subset of genes. When we discuss antisense transcripts, we refer to RNA species that overlap with a known or highly likely cDNA on the reverse strand. The exact length of overlap is not known, but one or more probes to the antisense strand hybridized to the RNA sample with a dye signal above the background threshold. A concern regarding such transcripts might be their generation during cDNA synthesis through fold-back self-priming. This will not be a significant problem for the oligo array platform because cRNAs were produced and labeled for hybridizations, although the precise representation of most transcripts was not independently verified in the cRNAs (see Materials and methods).
To identify probes that hybridized, we used an iterative approach and generated statistics from probes that are above background signals in all hybridizations (see Materials and methods for details). Analysis of the final results showed that the thresholds chosen were around the 90th percentile of median signals for the known antisense probes, most of which fail to hybridize with target RNAs, providing a reasonable cross validation of the approach (data not shown). Another benefit of this approach is that it removes variance between biological replicates reflecting environmental factors, although this kind of difference is small compared to true line-specific expression differences. For the whole probe set, the correlation coefficients of the raw dye median intensities between each pair of biological replicates are mostly between 0.95 and 0.98, even when the replicates were labeled with different dyes and dye bias could presumably have an effect. This is comparable to the technical variance assessed from duplicated probes on the arrays, and both can be removed effectively by our approach.
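The two checks described in this paragraph can be illustrated with a minimal Python sketch. It is not the iterative procedure itself, which is given in Materials and methods; the function names, the nearest-rank percentile convention, and the rule that a probe must exceed the threshold on every array are assumptions made for illustration only.

```python
import math

def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list (an assumed convention)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def hybridized_in_all(signals_by_array, threshold):
    """True when a probe's median signal exceeds the threshold on every array."""
    return all(signal > threshold for signal in signals_by_array)
```

With real data, `percentile` would be applied to the median signals of the known antisense probes to compare against the chosen threshold, and `pearson_r` to the raw median intensities of each pair of biological replicates.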
Distinctive patterns of gene expression in organs and by genetic background
[Table: Transcript expression analyzed by biological sample type. Recovered row labels: Anther 1 mm; Anther 1.5 mm.]
Because a two-color hybridization protocol was employed in which each A619 or hybrid ND101/W23 sample was compared to W23, it was also feasible to define differentially expressed genes in the paired tests. A619 showed more differences compared to W23 than did the F1 hybrid of ND101 with W23; there were approximately 300 differentially expressed genes in each anther stage and in leaf in the A619-W23 comparison and fewer than 100 for pollen. The number of differences in the W23-ND101/W23 comparison was about half of the A619 differences in the anther samples but very similar for the other two tissues. Although parentage should be highly predictive of gene expression patterns, and it would therefore be logical to expect A619 to be more distinctive than the F1 hybrid, hybrid vigor is an important consideration. This phenomenon was discovered in maize at the beginning of the 20th century; after inbreeding depresses plant yield and growth, combination with another inbred line typically yields an F1 hybrid far superior to either parent, suggesting significant changes in gene expression. Nonetheless, for the lines examined here, the ND101/W23 hybrid is more similar to W23 than is the heterologous A619 line.
The complete results from the analysis of the common and unique transcript types in each genotype as well as across tissues are shown using Venn diagrams in Figure 2. Pollen and both anther stages have highly conserved transcriptome patterns, because fewer than 1% (both pollen and 1 mm anther) or about 1% (1.5 mm anther) of the transcripts are uniquely expressed in one line compared to the total shared in all 3 genotypes. In contrast, approximately 3% of the transcripts are line-specific in juvenile leaves. A global genotype analysis was conducted (Figure 2e) in which all four tissue samples were combined within each genotype. Comparing the three genotypes on this basis again highlights that A619 is the most distinctive, while W23 and the hybrid ND101/W23 are much closer in transcriptome pattern. In the global tissue analysis (Figure 2f), only transcripts that are expressed in all 3 lines (7,367 in total) were considered, and the 2 anther stages were treated as a single tissue type. There were 2,038 transcript types in common among the three biological sample types, the beginning of an enumeration of constitutively expressed or 'housekeeping' genes for maize. In the global assessment it is also clear that juvenile leaf and anthers share many transcripts in common (2,571), twice the number that each organ uniquely expressed. Pollen and the other two tissue types share approximately 150 transcripts each, about 11% of the 2,691 pollen transcripts found, indicating that although fewer transcripts are expressed than in other tissues examined (compare to 5,925 for anthers and 5,693 for leaf), there is a distinctive suite of transcripts present in pollen (>13% unique transcripts).
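The Venn-diagram comparisons amount to set operations on lists of detected transcripts. The sketch below, with hypothetical function and variable names, counts the seven regions of a three-set diagram and recomputes one proportion from the counts quoted above (2,038 transcripts shared by all three sample types out of 2,691 detected in pollen).

```python
def venn3_regions(a, b, c):
    """Return the sizes of the seven regions of a three-set Venn diagram."""
    a, b, c = set(a), set(b), set(c)
    return {
        "a_only": len(a - b - c),
        "b_only": len(b - a - c),
        "c_only": len(c - a - b),
        "a_b_only": len((a & b) - c),
        "a_c_only": len((a & c) - b),
        "b_c_only": len((b & c) - a),
        "all_three": len(a & b & c),
    }

# Counts taken from the text: transcripts shared by leaf, anther and pollen,
# and the total number of transcripts detected in pollen.
shared_all = 2038
pollen_total = 2691
pollen_shared_fraction = shared_all / pollen_total  # a little over three-quarters
```

In practice `a`, `b`, and `c` would be the transcript identifiers detected in leaf, anther, and pollen (or in each of the three genotypes), so each region corresponds to one number in Figure 2.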
Enrichment of Gene Ontology classes
[Table: Significantly enriched GO terms in transcript groups. Recovered column header: Number in test group. Recovered row and group labels, in extraction order: Cyclin-dependent protein kinase regulator activity; Nucleobase, nucleoside, nucleotide and nucleic acid metabolism; DNA replication initiation; Enzyme inhibitor activity; Cytoplasmic membrane-bound vesicle; Carboxylic ester hydrolase activity; Hydrolase activity, hydrolyzing O-glycosyl compounds; Hydrolase activity, acting on glycosyl bonds; Cell wall modification; Establishment of localization; Cytoskeletal protein binding; Enzyme regulator activity; Aspartyl esterase activity; Cytoskeleton organization and biogenesis; External encapsulating structure; Actin cytoskeleton organization and biogenesis; Expressed in all three tissue types (1,091); Differentially expressed, ND101/W23 pollen versus W23 pollen (47); Regulation of programmed cell death; Regulation of apoptosis; Negative regulation of programmed cell death; Negative regulation of apoptosis; Differentially expressed, ND101/W23 juvenile leaf versus W23 juvenile leaf (158).]
In general, the GO analysis displayed very consistent patterns in accordance with already well-known functions of a given tissue type (Table 2). Leaf-specific genes are enriched in terms related to the plastid (GO:9536) and the key step in photosynthesis, oxygen binding (GO:19825). Over-represented GO terms for anther-specific genes include cyclin-dependent protein kinase regulator activity (GO:16538), DNA replication initiation (GO:6270), and a great number of genes involved in nucleic acid metabolism (GO:6139). On the other hand, pollen-specific genes are enriched in pectin esterase activity (GO:30599), a gene family that has been shown to function specifically late in pollen development, hydrolase activity (GO:16787), secretory pathway and secretion (GO:46903), transport (GO:6810), cell wall modification and cytoskeleton activities, among many other cellular functionalities that underlie a series of biological processes during pollen maturation. Not surprisingly, the ubiquitous endomembrane system (GO:12505) is represented in all tissue types. These results indirectly confirmed the utility of mining the GO data structure by this method. When we tested the differentially expressed gene groups, none showed any significant over-representation except in the comparison of W23 samples to the ND101/W23 pollen and juvenile leaf (Table 2). Interestingly, the GO analysis showed that the differentially expressed genes in the ND101/W23 hybrid pollen sample are enriched in negative regulators of apoptosis and programmed cell death (GO:43067, GO:6916). In the leaf sample, genes involved in oxidoreductase activity (GO:16491) and chloroplast (GO:9507) functions are differentially regulated. The functional significance of these gene regulations to the plant and their possible connection to the hybrid genomic background remain to be tested.
Antisense transcripts detected for many genes
Natural antisense transcripts (NATs) have been identified experimentally and predicted computationally from many organisms, including human, mouse, yeast, fruit fly, and Arabidopsis [19–23]. By definition, NATs contain sequences complementary to the sense transcripts of protein-coding genes. They may be transcribed in cis from the reverse strand (called cis-NAT) or in trans from separate loci (called trans-NAT). In eukaryotes, the majority of NATs are of the cis type. Unexpectedly, NATs are common: up to 20% of human genes have a NAT. Furthermore, many NATs are conserved, implying regulatory functions for these transcripts in eukaryotic gene expression [22, 24, 25]. To address the question of what fraction of maize genes might be regulated through an antisense transcript, the array platform was constructed to contain approximately 5,000 probes to detect the antisense strand of gene models constructed from EST assemblies; in some cases more than one 60-mer antisense oligo was designed per gene.
[Table: Analysis of antisense transcripts in the total transcriptome. Recovered row label: Anther 1 mm.]
Because NATs are often discussed in the context of the corresponding sense transcripts, we identified 1,063 potential transcripts on the array that are represented by at least one pair of sense-antisense probes. Considering all the hybridization data, for 136 such pairs both probes hybridized, indicative of both sense and antisense transcripts in the RNA samples (see Additional data file 5), for 665 only sense probes hybridized, and for 41 only antisense probes hybridized (data not shown).
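The tally of sense-antisense probe pairs can be expressed as a simple classification. The following sketch uses hypothetical names and assumes each pair is summarized by two booleans (whether the sense probe and the antisense probe hybridized in any sample); the fourth category, pairs for which neither probe hybridized, is implied by the counts above rather than stated.

```python
from collections import Counter

def classify_pair(sense_hybridized, antisense_hybridized):
    """Assign one sense-antisense probe pair to a detection category."""
    if sense_hybridized and antisense_hybridized:
        return "both"
    if sense_hybridized:
        return "sense_only"
    if antisense_hybridized:
        return "antisense_only"
    return "neither"

def tally_pairs(pairs):
    """Count pairs per category; `pairs` is an iterable of (sense, antisense) booleans."""
    return Counter(classify_pair(sense, antisense) for sense, antisense in pairs)

# Counts quoted in the text; the remainder is computed under the assumption
# that the four categories are mutually exclusive and exhaustive.
total_pairs = 1063
reported = {"both": 136, "sense_only": 665, "antisense_only": 41}
neither = total_pairs - sum(reported.values())
```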
[Table: Significantly enriched GO terms in antisense transcripts. Recovered row and group labels, in extraction order: Organismal physiological process; Sensory perception of light; Spindle pole body; Microtubule organizing center; Intracellular non-membrane-bound organelle; Photosynthetic electron transport; Expressed sense-antisense pairs (120); Amino acid derivative metabolism; Biogenic amine biosynthesis; Amino acid metabolism; Amino acid catabolism; Nitrogen compound catabolism; Aromatic amino acid family metabolism.]
Validation of microarray data
Maize has been shown to display considerable genomic heterogeneity and non-colinearity between inbred lines. These differences reflect mostly insertions of many transposable elements and translocation of individual loci from one chromosome to another, a process likely mediated by transposons [27, 28]. Brunner et al. recently examined more than 2.8 Mb of maize syntenic chromosomal regions in two inbred lines and found more than one-third of the loci are absent in one inbred. Therefore, a key question is whether lines express the same loci in specific organs and tissues even when loci are in a new chromosomal context. Our results showed that despite the many likely instances of genomic non-colinearity in the 3 lines examined, they share more than 95% of their transcripts (Figure 2e). Furthermore, quantitatively about 95% of the transcriptomes are expressed at comparable levels between the two inbred lines W23 and A619; there is an even greater congruence between W23 and the hybrid ND101/W23 (Table 1; Figure 3b). Thus, although there is high nucleotide polymorphism in maize genes, the 60-mer and 70-mer probes are likely to hybridize well across lines. We conclude, therefore, that the non-colinearity observed for maize inbred lines seems to have little effect on the transcriptome in three major organs - leaf, anther and pollen. A related question concerns development per se: does the normal phenotype of an organ require that nearly all of the same genes be expressed and in a quantitatively similar manner, or can it be achieved with significant variation in the transcriptome? Because of the very high overlap in expression among the lines at each stage, the normal phenotypes are achieved with near-identical patterns of gene expression. The differences identified, although relatively few in number, will be important in further studies to relate the quantitative phenotypic differences distinguishing each line to the expression of specific transcripts.
It should be cautioned that transcriptome analysis using microarrays is subject to two universal caveats: cross hybridization and limited detection resolution. Both may be even more severe for the maize genome given the high polymorphism between inbred lines and the prevalence of duplicated genes. The problem of cross hybridization can be reduced by careful probe design. Because the maize genome has not been completely sequenced, however, the probes on our arrays may cross hybridize with as yet undefined gene transcripts. Therefore, our conclusions attest mainly to the congruence of overall gene expression in the three genetic backgrounds. In considering the second problem, statistically insignificant expression differences may be biologically significant and cause quantitative phenotypic differences. In recent years, efforts have been made to map gene expression quantitative trait loci (eQTL) and link them with classic quantitative traits. Both cis- and trans-acting eQTLs have been identified for regulatory loci in yeast, maize, Arabidopsis, human and mice [29–32]. Thus combining microarray and eQTL analyses has proven to be more powerful in elucidating genetic control of gene expression in maize and other organisms.
A third question concerns how distinctive gene expression is among the organs examined. If we look only at the transcripts that are expressed in all three lines (thus with a high confidence of their expression), more than one-third of the transcripts for any single tissue are shared among all three tissues, and for pollen the frequency increases to more than three-quarters (Figure 2f). This might reflect the bias of the probes towards highly to moderately expressed genes. Nonetheless, compared to the work of Cho et al. showing that only 7% of transcripts were tissue-specific after hybridization to a cDNA array platform, we find that one-third of the combined transcripts are tissue-specific. Even more striking is the large number of transcripts shared between leaf and anther, including several photosynthesis genes that are expressed highly in early anther (Figures 2f and 4). This certainly provides evidence to strengthen the model that anthers and other floral organs are modified leaves.
A fourth question considers whether some organs show more highly conserved patterns of gene expression in diverse lines, which may indicate canalization of the regulatory genes and of their targets in specifying certain plant parts. From the transcriptome analysis, reproductive tissues, represented by anther and pollen, express a more conserved transcriptome than vegetative tissues, represented by leaf. Both A619 and ND101/W23 had many more line-unique transcripts that are also specific for leaf than for either pollen or anther (Figure 2a–d). As for expression levels, both lines also showed more differentially expressed transcripts in the leaf than in the anthers (Table 1). The conservation of gene expression patterns during anther development and pollen function may be important to ensure reproductive success.
Because ND101/W23 is a hybrid and much more robust than W23, one interesting question to ask is whether heterosis (hybrid vigor) is determined by drastic transcriptome changes compared to a parental line. Fu and Dooner proposed that complementation of weak, line-specific alleles could partially contribute to hybrid vigor. However, accumulating evidence suggests that dosage-dependent, non-additive gene expression may play a bigger role; that is, epistatic interactions among new combinations of alleles result in the significant phenotypic differences between many hybrids and their parents. For example, Song and Messing found unexpected differences in the expression of shared and line-specific genes in reciprocal hybrids of two maize inbred lines. Our results demonstrate that the ND101/W23 hybrid is actually very close in gene expression to W23 in every tissue sample tested (Figure 3). It does share about the same number of common transcripts with either W23 or A619 (Figure 2e), however, suggesting an unbiased expression of line-specific genes. Given the lack of data from reciprocal hybrid lines between W23 and ND101, and also from the parental ND101 inbred, we can only speculate on this important question in maize genetics.
Natural antisense transcripts have been implicated in the regulation of a number of biological processes, including RNA interference, translation regulation, alternative splicing, genomic imprinting, and RNA editing. However, very few NATs have been experimentally analyzed, and the exact roles of the large number of NATs in seemingly every eukaryotic genome analyzed so far remain elusive [19–25]. Nonetheless, even though their possible functions in the maize genome are largely unknown, the diversity of antisense transcripts discovered in this study indicates that this class of RNAs is likely to play important roles in maize development and physiology.
This report also provided a good cross validation between two array platforms, each having specific strengths. The Agilent platform displayed superb hybridization images and a very consistent low background. On the other hand, the University of Arizona platform provided many more probes and hence much wider coverage of the maize transcriptome.
Despite the phenotypic and genotypic diversity of maize, transcriptome profiling indicates that the three lines tested share remarkable similarities in gene expression patterns across diverse tissue types, especially in both reproductive tissues (anther and pollen). Our ultimate goal is to define the genetic basis for anther morphology and the functions of cells within this floral organ. We are using diverse male-sterile mutants that affect the differentiation of anther cell types at specific stages to define organ ontogeny. As plants lack a germ line, it is of particular interest to define the mechanisms underlying pre-germinal fate determination, which requires that somatic cells become competent to initiate meiosis. More than 400 male-sterile mutations are available, but they are in diverse genetic backgrounds. Because only two or three generations of corn can be grown annually, introgression to the status of a near-isogenic inbred line can require years. We were therefore motivated to determine the extent of line-specific gene expression in anthers that could confound comparisons between different male-sterile mutants and a reference male-fertile line into which all the mutants would eventually be introgressed. Our results show that despite a congruent transcriptome observed across the different genetic backgrounds, the number of differentially expressed genes is still considerable. Therefore, any mutant to wild-type comparisons will be carried out using sterile and fertile siblings in the same family to circumvent the problem.
Materials and methods
Biological materials and tissue collection
The ND101 line was supplied by P Bedinger and the A619 derivative by W Sheridan. The W23 line carrying the bz2 mutation (lack of anthocyanin accumulation) is maintained in the Walbot laboratory by self-pollination. These materials were grown at Stanford University in the summer of 2003 and phenotypes were quantified (data not shown); the lines were propagated by self-pollination of male-fertile individuals and by crosses of W23 as pollen parent onto the ND101 male-fertile individuals. For collection of tissues, the resulting lines were grown in summer 2004 at Stanford University; leaf and pollen samples were collected in the field, transferred to a labeled plastic tube, and immediately frozen in liquid nitrogen. Multiple biological samples from fully expanded juvenile leaves (leaves 8, 9, 10 in these lines) on different plants were harvested. At flowering, tassels were bagged to collect pollen shed from exserted anthers, which was then sieved to remove extraneous debris. Pollen from the same individual was pooled to make one biological sample. Multiple biological samples were collected over a period of several days for each line. Anthers must be dissected from developing flowers; to do this, plants of approximately the correct stage were identified in the field on the basis of tassel size, and the entire plant was harvested by cutting near ground level with a knife. The harvested plants of each line were kept in separate buckets of water in an air-conditioned field laboratory. A maize tassel contains hundreds of flowers, borne in pairs called the upper and lower floret. Each floret contains three anthers. Because the upper florets mature more quickly than the lower florets, and the two floret types exhibit some transcriptome differences at the 2 mm stage of development (D Skibbe, personal communication), dissection was restricted to upper florets.
Anthers were dissected into 2.0 ml microfuge tubes containing liquid nitrogen; the tubes were supported in a styrofoam pad and periodically refilled with liquid nitrogen. For 1 mm anthers, a sample of several hundred anthers was collected for each genotype, typically from just one tassel. Approximately 100 anthers at the 1.5 mm stage were sufficient for an RNA extraction suitable for microarray and RT-PCR analysis. Up to 15 replicate samples were obtained.
Array design and analysis - Agilent platform
Agilent Technologies microarrays are built using phosphoramidite chemistry to synthesize 60-mers in situ on glass slides. There are 322 internal positive and 314 negative controls on each maize array. Maize probes were designed from the December 2003 maize EST assembly of MaizeGDB. The 21,939 maize probes represent 21,782 unique probes, with 157 probes duplicated. Hybridizations for duplicated probes for the 8 experiments were highly correlated as assessed from correlations between median signal intensities (r² = 0.97 for both dyes; data not shown) and between log2 ratios of the signals (r² = 0.94; data not shown). Oligonucleotide sequences, gene identities and both raw and normalized hybridization intensities for each probe can be downloaded from our array data submitted to the Gene Expression Omnibus database.
To identify unique genes or transcripts, we mapped the probe set to the TIGR Maize Gene Index (release 15.0, September 2004). The TIGR dataset provides annotations for each Tentative Contig (TC) assembly based on protein similarity search results and EST sequence orientation information (for example, the presence of a poly-A tail). An assembly is annotated as 'coding strand' if there is strong supporting evidence. By the stringent criterion of at most 2 mismatches over an alignment length of 60 nucleotides (the full probe length), the probes were found to represent approximately 21,132 unique transcripts (either sense or antisense; see below).
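The mapping criterion above (at most 2 mismatches over the full 60-nucleotide probe) can be illustrated with a minimal sketch. The function names are hypothetical, and the code assumes an ungapped, same-strand comparison by sliding the probe along a transcript; the search tool actually used for the mapping is not stated in the text.

```python
def count_mismatches(probe: str, window: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(probe) != len(window):
        raise ValueError("sequences must have equal length")
    return sum(1 for a, b in zip(probe.upper(), window.upper()) if a != b)


def probe_matches(probe: str, transcript: str, max_mismatches: int = 2) -> bool:
    """True if the full-length probe aligns without gaps somewhere in the
    transcript with at most `max_mismatches` mismatches (assumed criterion)."""
    n = len(probe)
    for start in range(len(transcript) - n + 1):
        if count_mismatches(probe, transcript[start:start + n]) <= max_mismatches:
            return True
    return False
```

A probe that satisfies `probe_matches` for a given assembly would be counted as representing that transcript; a probe matching the reverse complement would need the same test applied to the reverse-complemented sequence.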
Identification of antisense probes
We used a combination of two independent approaches to identify antisense probes. First, to avoid assembly errors, the probes were mapped back to their original EST sequences (downloaded from NCBI), and the corresponding EST sequences (with an average length of 555 base-pairs) were subjected to a BLASTX similarity search against a plant protein database extracted from UniProt (the December 2004 release). The following criteria were used: first, the top hit must be to a peptide translated from the reverse strand of the EST, with a BLAST score above 80; and second, if there is also one or more hits from the sense strand, the sense-strand score must be below 50 and the top (reverse-strand) score must be over 100. The BLASTX results were cross-validated by mapping the probes to the TIGR Maize Gene Index dataset, which provides additional information on the orientation of the TC sequences. A probe was annotated as 'antisense' if both the BLASTX results and the TIGR Maize Gene Index evidence showed it to hybridize to the reverse strand of a coding sequence. A total of 5,075 probes were identified as antisense probes. To further confirm this probe set, we randomly picked 100 probes and manually verified that they were antisense probes given available information on the maize transcriptome.
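The two-part decision rule above can be written as a short sketch. The function and parameter names are hypothetical; the inputs are assumed to be the best BLASTX scores for reverse-strand and sense-strand translations of the EST (or None when there is no hit) and a flag recording whether the TIGR Maize Gene Index orientation also places the probe on the reverse strand.

```python
from typing import Optional


def blastx_supports_antisense(best_reverse_score: Optional[float],
                              best_sense_score: Optional[float]) -> bool:
    """Apply the BLASTX score criteria described in the text."""
    if best_reverse_score is None:
        return False                      # no reverse-strand hit at all
    if best_sense_score is None:
        return best_reverse_score > 80    # reverse-strand hit only
    # A sense-strand hit also exists: require a weak sense hit
    # and a stronger reverse-strand top hit.
    return best_sense_score < 50 and best_reverse_score > 100


def is_antisense_probe(best_reverse_score: Optional[float],
                       best_sense_score: Optional[float],
                       tigr_reverse_strand: bool) -> bool:
    """A probe is called antisense only if both lines of evidence agree."""
    return (blastx_supports_antisense(best_reverse_score, best_sense_score)
            and tigr_reverse_strand)
```

For example, a reverse-strand score of 85 with no sense-strand hit passes the BLASTX test, whereas the same score accompanied by a sense-strand hit of 40 does not, because the reverse-strand score must then exceed 100.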
RNA extraction, target cRNA preparation and array hybridization
Total RNA was extracted from 30 to 60 mg of frozen tissues at each developmental stage using the RNeasy Plant Kit (Qiagen, Valencia, CA, USA) and subjected to DNase I (Invitrogen, Carlsbad, CA, USA) or Turbo DNase treatment (Ambion, Austin, TX, USA) and a second round of RNeasy column purification (Qiagen). The yield and RNA purity were determined spectrophotometrically with a SpectraMax 250 plate spectrophotometer (Molecular Devices, Sunnyvale, CA, USA) and verified by agarose gel analysis. Target cRNA was prepared and labeled with either Cy-3 or Cy-5 dye (PerkinElmer, Boston, MA, USA) from 0.5 μg of total RNA using an Agilent Low RNA Input Fluorescent Linear Amplification Kit. Array hybridizations were carried out according to the manufacturer's instructions. Specifically, each array was hybridized with two samples, each of 0.75 μg labeled target cRNA, for 17 hours at 60°C. Data were acquired with an Agilent G2565BA scanner.
Microarray data analysis
Microarray experiments and data were managed and analyzed using a customized implementation of the BASE system. The reliability and reproducibility of the analyses were ensured by the use of triplicates in each experiment, the normalization of all 24 arrays to the median probe intensity level with background subtracted, and the use of well-accepted and freely available software packages. The slide images were processed with FeatureExtraction v. 0.75 (Agilent). After filtering out saturated spots flagged by FeatureExtraction, we took an iterative approach to estimate non-specific hybridizations. For each of the three slides in a given experiment, we first calculated thresholds for background hybridizations with the 314 internal negative controls, as [Average (median intensities - background) of negative controls] + 2 × standard deviation for both dyes. We then added to the 'non-hyb' (non-hybridizing) set only probes that showed below-threshold signals in at least five out of the six median intensities for the triplicates. Then, iteratively, new thresholds for each slide were calculated and new non-hybridizing probes identified until none were left. Probes that showed above-threshold signals in at least five out of the six median signals were labeled as the 'all-hyb' set. Observing a strong dye bias, we subjected the union of the 'all-hyb' sets for the 8 experiments (7,900 probes in total) to normalization for each slide. We chose the rank invariant method for selecting non-differentially expressed genes and subsequently a loess fit for non-linear normalization using the identified invariant genes. After normalization, scaling procedures were applied to bring the variances of filtered and normalized expression values among the triplicates to the same level. Outliers were detected by Grubbs' test (p = 0.01) and flagged. The procedures were carried out using a MadScan BASE plug-in.
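The iterative estimate of non-specific hybridization can be sketched as follows. This is an illustration under stated assumptions, not the MadScan or BASE implementation: the function names are hypothetical, each probe is assumed to carry six background-subtracted median intensities (three slides × two dyes, in a fixed channel order), and on each pass the thresholds are assumed to be recomputed from the negative controls together with the probes already placed in the 'non-hyb' set.

```python
from statistics import mean, stdev


def channel_threshold(values):
    """Mean + 2 x standard deviation of background-like intensities."""
    return mean(values) + 2 * stdev(values)


def find_non_hybridizing(probes, neg_controls, min_below=5):
    """probes: dict of probe id -> list of channel intensities.
    neg_controls: one list of negative-control intensities per channel.
    Returns the 'non-hyb' set and the final per-channel thresholds."""
    n_channels = len(neg_controls)
    non_hyb = set()
    while True:
        # Recompute thresholds from controls plus current non-hyb probes
        # (the pooling of non-hyb probes with controls is an assumption).
        thresholds = []
        for ch in range(n_channels):
            background = list(neg_controls[ch]) + [probes[p][ch] for p in non_hyb]
            thresholds.append(channel_threshold(background))
        # A probe joins the set if enough of its channels are below threshold.
        newly = {
            pid for pid, vals in probes.items()
            if pid not in non_hyb
            and sum(v < t for v, t in zip(vals, thresholds)) >= min_below
        }
        if not newly:
            return non_hyb, thresholds
        non_hyb |= newly
```

The loop terminates because the 'non-hyb' set can only grow and is bounded by the number of probes. The 'all-hyb' set would be obtained with the mirror-image test (at least five of six channels above threshold).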
To estimate the number of transcripts for each tissue sample, we furthermore identified probes that showed below-threshold hybridizations for one dye but above-threshold hybridizations for the other. We required all three dye intensities for the hybridizing sample to exceed the 90th percentile of the median intensity of the 'all-hyb' set for a probe to be called 'present'. In the case of W23, which was used as the reference, at least five of the six dye intensities for a probe (from six hybridizations) had to exceed the thresholds for it to be predicted as 'present' in W23 but 'absent' from the other two lines.
To assess differential expression, the Rank Products method was used, which is a non-parametric test against a randomly simulated background. It proved especially robust for our dataset given the presence of a large number of non-hybridizing probes and the single-copy representation of most probes. To be more conservative, a slight modification to the algorithm was made which required all three log2 ratios to have the same sign ('+' for up-regulation and '-' for down-regulation) in order for a transcript to be picked as differentially expressed. The significance level was set to give a false discovery rate (FDR) of 5%. Hierarchical clustering of expressed genes was performed with EPCLUST, using a correlation-based distance measure and average linkage clustering. We used both normalized log2 ratios and log2 values of normalized absolute median intensities for clustering. When using absolute intensities, we first applied a linear regression to one test sample (either ND101/W23 or A619) based on normalized intensities of the common W23 reference tissue. Strong correlations were found between the W23 references for each of the four sets of experiments (all with r² > 0.90, data not shown). For W23 intensities, the mean was taken after scaling. Finally, the values log2(scaled absolute intensity/median of the all-hyb set in the given tissue) were supplied to the clustering program.
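The rank product statistic with the same-sign requirement can be sketched in a few lines. This is a simplified illustration, not the published software: the function names, the number of permutations, and the way the false discovery rate is estimated (expected null rank products at or below the observed value, divided by the gene's position in the sorted list) are assumptions chosen for brevity. Only up-regulation is shown; down-regulation would rank the log2 ratios in ascending order and require all ratios to be negative.

```python
import bisect
import math
import random


def rank_products(log_ratios):
    """Geometric mean of per-replicate ranks (rank 1 = largest log2 ratio)."""
    n_genes, n_reps = len(log_ratios), len(log_ratios[0])
    log_sums = [0.0] * n_genes
    for rep in range(n_reps):
        order = sorted(range(n_genes), key=lambda g: log_ratios[g][rep], reverse=True)
        for rank, gene in enumerate(order, start=1):
            log_sums[gene] += math.log(rank)
    return [math.exp(s / n_reps) for s in log_sums]


def call_upregulated(log_ratios, fdr=0.05, n_perm=200, seed=0):
    """Indices of genes called up-regulated at the estimated FDR, keeping
    only genes whose replicate log2 ratios are all positive."""
    rng = random.Random(seed)
    n_genes = len(log_ratios)
    observed = rank_products(log_ratios)
    columns = list(zip(*log_ratios))  # one tuple of values per replicate

    # Null distribution: shuffle values independently within each replicate.
    null = []
    for _ in range(n_perm):
        shuffled = [rng.sample(col, n_genes) for col in columns]
        null.extend(rank_products(list(zip(*shuffled))))
    null.sort()

    called = []
    by_rp = sorted(range(n_genes), key=lambda g: observed[g])
    for position, gene in enumerate(by_rp, start=1):
        expected_fp = bisect.bisect_right(null, observed[gene]) / n_perm
        estimated_fdr = expected_fp / position
        if estimated_fdr <= fdr and all(v > 0 for v in log_ratios[gene]):
            called.append(gene)
    return called
```

Each row of `log_ratios` holds the three replicate log2 ratios for one probe. The sign filter is applied after the FDR estimate, so it can only remove calls, which matches the stated intent of making the test more conservative.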
Array design and analysis - University of Arizona platform
To provide an additional level of confirmation and a comparison between in situ synthesized and spotted arrays, six additional spotted arrays were utilized as part of a beta-testing study for the Maize Oligonucleotide Array Project (MOAP). This platform has approximately 58,000 spotted 70-mer oligonucleotide probes printed on two slides for each array. These were used as technical replicates for the experimental comparisons of the two anther stages between the W23 and A619 backgrounds already completed with the Agilent arrays. To minimize differences that might occur from separate cRNA preparations and to make use of valuable labeled sample, the protocol recommended by MOAP was altered in the following ways. DNA probe immobilization was completed by placing each array DNA-side down over a 42°C water bath for 5 to 10 seconds. Slides were then immediately placed DNA-side up on a 70 to 80°C heat block and snap-dried for 3 to 10 seconds. DNA probes were then UV cross-linked to the slide at 65 mJ for 90 seconds using a UV Stratalinker 1800 (Stratagene, La Jolla, CA, USA). Slides were incubated for 2 minutes in a 1% SDS bath, washed in a 95°C water bath for 2 minutes with gentle shaking, and finally rinsed briefly in a water bath at room temperature. Slides were centrifuged at 500 rpm for 5 minutes to dry, and stored in the dark at room temperature with desiccant until used. Hybridization methods were adopted from Agilent's protocol for processing oligoarrays except that 750 ng of each labeled cRNA sample was combined and hybridized to the slides at 55°C for 15 hours. Slides were washed according to the MOAP protocol and scanned in an Agilent G2565BA scanner.
The Arizona probe sequences were compared by BLAST against the set of EST contigs and singletons used to generate the 60-mers for the Agilent microarray. For each Arizona 70-mer, the top hit in the same orientation was selected among those meeting an e-value threshold of 1E-8, with a minimum alignment of 68 bases, a maximum of 3 mismatches, and no gaps. There were 3,568 probe matches for slide A and 4,092 for slide B. The distance between each pair of probes was determined by comparing each Arizona probe's BLAST start position to the start position of the matching Agilent probe within the source EST contig or singleton. Scatter plots were generated using the basic plot function in R.
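The hit-selection rule and the probe-to-probe distance can be sketched as follows. The `Hit` record and function names are hypothetical, the e-value criterion is read as an upper bound of 1E-8, and 'top hit' is assumed to mean the passing hit with the highest bit score.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class Hit:
    subject_id: str      # EST contig or singleton identifier
    evalue: float
    align_len: int
    mismatches: int
    gap_opens: int
    same_strand: bool    # hit is in the same orientation as the probe
    subject_start: int   # start of the alignment within the subject
    bit_score: float


def best_hit(hits: Iterable[Hit], max_evalue: float = 1e-8, min_len: int = 68,
             max_mismatches: int = 3) -> Optional[Hit]:
    """Highest-scoring same-orientation hit that passes all filters, or None."""
    passing = [
        h for h in hits
        if h.same_strand
        and h.evalue <= max_evalue
        and h.align_len >= min_len
        and h.mismatches <= max_mismatches
        and h.gap_opens == 0
    ]
    return max(passing, key=lambda h: h.bit_score, default=None)


def probe_distance(hit: Hit, agilent_probe_start: int) -> int:
    """Distance between probe start positions within the same source sequence."""
    return abs(hit.subject_start - agilent_probe_start)
```

Given the selected hit for an Arizona 70-mer and the start coordinate of the Agilent 60-mer in the same contig, `probe_distance` returns the offset that would be compared against the agreement of expression values between platforms.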
Gene Ontology analysis
Because no maize GO project currently exists, we used the Blast2GO program for our GO data mining. Blast2GO starts with a BLASTX similarity search (with an e-value cutoff of 1E-10) against the NCBI nr protein database. Statistically significant matches are then assigned to each query, and GO annotations are mapped from known associations. To reduce errors, we used GO annotations from the TIGR Maize Gene Index dataset where provided, which covered more than 2,000 hybridizing transcripts on the array. To assess significant over-representation of GO terms we used Gossip, which takes a heuristic approach to control the family-wise error rate (FWER) as the multiple testing correction and outputs three p values: one for a single test, one adjusted p value to control the FWER, and one adjusted p value to control the FDR. For our tests, we required the p value for the single test to be less than 0.005 and the other two adjusted p values to be less than 0.1.
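The acceptance rule for a GO term can be sketched as below, together with a generic one-sided over-representation test. The function names are hypothetical. The hypergeometric tail probability is shown only as a common way to obtain a single-test p value for term enrichment; it is an assumption here and is not claimed to reproduce Gossip's internal calculation or its FWER and FDR adjustments.

```python
from math import comb


def enrichment_p(hits_in_set: int, set_size: int,
                 hits_in_background: int, background_size: int) -> float:
    """P(X >= hits_in_set) when drawing set_size genes without replacement
    from a background containing hits_in_background annotated genes."""
    upper = min(set_size, hits_in_background)
    numerator = sum(
        comb(hits_in_background, i)
        * comb(background_size - hits_in_background, set_size - i)
        for i in range(hits_in_set, upper + 1)
    )
    return numerator / comb(background_size, set_size)


def term_is_significant(p_single: float, p_fwer: float, p_fdr: float) -> bool:
    """Thresholds stated in the text: single-test p < 0.005 and both
    adjusted p values < 0.1."""
    return p_single < 0.005 and p_fwer < 0.1 and p_fdr < 0.1
```

A term is reported only when all three p values pass, so a small single-test p value alone is not sufficient.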
Quantitative real-time PCR verification
DNase-treated RNA was reverse transcribed with poly-dT primer using a SuperScript III cDNA synthesis kit (Invitrogen), and stored at -20°C. Several reactions were pooled to avoid reaction-related variations. Primers were designed using Primer3 and synthesized by Operon (Huntsville, AL, USA). Primer sequences are provided in Additional data file 6. All primers were tested to ensure amplification of single discrete bands with no primer-dimers. Melting-curve analysis was performed on the products to test whether only a single product was amplified. Samples were also evaluated on a 2% agarose gel to confirm that a single product of the correct size was generated. In some cases, the PCR products were purified from the gel and sequenced to verify their identities. Real-time PCR was carried out in a DNA Engine OPTICON2 (MJ Research, part of Bio-Rad, Hercules, CA, USA). Each reaction contained 1× buffer (with 2 mM MgCl2), 200 μM mixed dNTPs, 0.4 μU DyNAzyme II (MJ Research), 0.5× SYBR Green I (Molecular Probes, part of Invitrogen, Carlsbad, CA, USA), 0.25 μM of each primer, and about 12.5 ng cDNA in a final volume of 20 μl. Three replicates were performed for each sample plus template-free samples as negative controls. Cycling parameters consisted of an initial denaturation step at 94°C for 3 minutes, followed by 35 amplification cycles at 94°C for 15 seconds, 58°C for 15 seconds, and 72°C for 25 seconds. Fluorescence measurements were taken at the end of the annealing phase at 78°C, 82°C, and 86°C. The qRT-PCR data were analyzed using the 'mid-point' method, which calculates amplification efficiencies for each sample from its amplification profile. Two internal standards were used for each tissue stage (Additional data file 6) and results were averaged over all biological replicates to reduce both systematic and biological variances.
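How per-sample amplification efficiencies and two internal standards enter a relative expression value can be illustrated with a generic efficiency-corrected calculation. This sketch is not the 'mid-point' method itself: the function names are hypothetical, efficiency is assumed to be expressed as fold amplification per cycle (2.0 for perfect doubling), the threshold cycle is taken as given, and combining the two internal standards by their geometric mean is an assumption.

```python
from math import prod


def starting_quantity(efficiency: float, ct: float) -> float:
    """Relative starting amount, proportional to efficiency ** (-Ct)."""
    return efficiency ** (-ct)


def normalized_expression(target, references) -> float:
    """target and each reference are (efficiency, Ct) pairs.
    Returns target abundance relative to the geometric mean of the
    internal standards."""
    target_q = starting_quantity(*target)
    ref_qs = [starting_quantity(eff, ct) for eff, ct in references]
    ref_mean = prod(ref_qs) ** (1 / len(ref_qs))
    return target_q / ref_mean
```

With perfect doubling, a target crossing the threshold two cycles later than both internal standards gives a normalized value of 0.25. Values from biological replicates would then be averaged as described above.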
Additional data files
The following additional data are available with the online version of this paper. Additional data file 1 is a spreadsheet listing the putative stage-specific transcripts expressed in 1 mm anthers. Additional data file 2 is a spreadsheet listing the putative stage-specific transcripts expressed in 1.5 mm anthers. Additional data file 3 is a spreadsheet listing the putative stage-specific transcripts expressed in mature pollen. Additional data file 4 is a spreadsheet listing the putative stage-specific transcripts expressed in juvenile leaves. Additional data file 5 is a spreadsheet listing the transcripts with both sense and antisense probes hybridized. Additional data file 6 is a spreadsheet listing the primer sequences, putative gene product, and expression values for the qRT-PCR validation experiments. Additional data file 7 is a spreadsheet listing the transcripts potentially involved in cell cycle regulation and processes.
We are grateful to David Duncan for help with collecting samples. We also thank three anonymous reviewers for their valuable suggestions. This work was supported by the National Science Foundation (98-72657). JM was supported by a National Library of Medicine Genome Training Grant awarded to the Stanford Biomedical Informatics Program.
- Gerdes JT, Tracy WF: Pedigree diversity within the Lancaster Surecrop heterotic group of maize. Crop Sci. 1994, 33: 334-337.
- Gerdes JT, Behr CF, Coors JG, Tracy WF: Compilation of North American Maize Breeding Germplasm. 1993, Madison WI: Crop Science Society of America
- Poethig RS: Heterochronic mutations affecting shoot development in maize. Genetics. 1988, 119: 959-973.
- Gaut BS: Patterns of chromosomal duplication in maize and their implications for comparative maps of the grasses. Genome Res. 2001, 11: 55-66. 10.1101/gr.160601.
- Neuffer MG, Coe EH, Wessler SR: Mutants of Maize. 1997, Plainview, NY: Cold Spring Harbor Laboratory Press
- MaizeGDB. [http://www.maizegdb.org/cgi-bin/locusreports.cgi?id=1]
- Cho Y, Fernandes J, Kim SH, Walbot V: Gene-expression profile comparisons distinguish seven organs of maize. Genome Biol. 2002, 3: research0045. 10.1186/gb-2002-3-9-research0045.
- Kiesselbach TA: The structure and reproduction of corn. Nebraska Agric Exp Stn Res Bull. 1949, 161: 1-96.
- Chaubal R, Zanella C, Trimnell MR, Fox TW, Albertsen MC, Bedinger P: Two male-sterile mutants of Zea Mays (Poaceae) with an extra cell division in the anther wall. Am J Bot. 2000, 87: 1193-1201.
- Casati P, Walbot V: Gene expression profiling in response to ultraviolet radiation in maize genotypes with varying flavonoid content. Plant Physiol. 2003, 132: 1739-1754. 10.1104/pp.103.022871.
- PlantGDB. [http://plantgdb.org/]
- Fu Y, Emrich SJ, Guo L, Wen TJ, Ashlock DA, Aluru S, Schnable PS: Quality assessment of maize assembled genomic islands (MAGIs) and large-scale experimental verification of predicted genes. Proc Natl Acad Sci USA. 2005, 102: 12282-12287. 10.1073/pnas.0503394102.
- Stuber CW: Mapping and manipulating quantitative traits in maize. Trends Genet. 1995, 11: 477-481. 10.1016/S0168-9525(00)89156-8.
- TIGR Maize Gene Index. [http://www.tigr.org/tigr-scripts/tgi/T_index.cgi?species=maize]
- Cigan AM, Unger E, Xu R, Kendall T, Fox TW: Phenotypic complementation of ms45 maize requires tapetal expression of MS45. Sex Plant Reprod. 2001, 14: 135-142. 10.1007/s004970100099.
- Conesa A, Gotz S, Garcia-Gomez JM, Terol J, Talon M, Robles M: Blast2GO: a universal tool for annotation and visualization in functional genomics research. Bioinformatics. 2005, 21: 3674-3676. 10.1093/bioinformatics/bti610.
- Gossip. [http://gossip.gene-groups.net]
- Wakeley PR, Rogers HJ, Rozycka M, Greenland AJ, Hussey PJ: A maize pectin methylesterase-like gene, ZmC5, specifically expressed in pollen. Plant Mol Biol. 1998, 37: 187-192. 10.1023/A:1005954621558.
- Merino E, Balbas P, Puente JL, Bolivar F: Antisense overlapping open reading frames in genes from bacteria to humans. Nucleic Acids Res. 1994, 22: 1903-1908.
- Yelin R, Dahary D, Sorek R, Levanon EY, Goldstein O, Shoshan A, Diber A, Biton S, Tamir Y, Khosravi R, et al: Widespread occurrence of antisense transcription in the human genome. Nat Biotechnol. 2003, 21: 379-386. 10.1038/nbt808.
- Kiyosawa H, Yamanaka I, Osato N, Kondo S, Hayashizaki Y, RIKEN GER Group., GSL Members: Antisense transcripts with FANTOM2 clone set and their implications for gene regulation. Genome Res. 2003, 13: 1324-1334. 10.1101/gr.982903.
- Dahary D, Elroy-Stein O, Sorek R: Naturally occurring antisense: transcriptional leakage or real overlap?. Genome Res. 2005, 15: 364-368. 10.1101/gr.3308405.
- Wang XJ, Gaasterland T, Chua NH: Genome-wide prediction and identification of cis-natural antisense transcripts in Arabidopsis thaliana. Genome Biol. 2005, 6: R30. 10.1186/gb-2005-6-4-r30.
- Chen J, Sun M, Hurst LD, Carmichael GG, Rowley JD: Genome-wide analysis of coordinate expression and evolution of human cis-encoded sense-antisense transcripts. Trends Genet. 2005, 21: 326-329. 10.1016/j.tig.2005.04.006.
- Lavorgna G, Dahary D, Lehner B, Sorek R, Sanderson CM, Casari G: In search of antisense. Trends Biochem Sci. 2004, 29: 88-94. 10.1016/j.tibs.2003.12.002.
- Casati P, Walbot V: Rapid transcriptome responses of maize (Zea mays) to UV-B in irradiated and shielded tissues. Genome Biol. 2004, 5: R16. 10.1186/gb-2004-5-3-r16.
- Morgante M, Brunner S, Pea G, Fengler K, Zuccolo A, Rafalski A: Gene duplication and exon shuffling by helitron-like transposons generate intraspecies diversity in maize. Nat Genet. 2005, 37: 997-1002. 10.1038/ng1615.
- Brunner S, Fengler K, Morgante M, Tingey S, Rafalski A: Evolution of DNA Sequence Nonhomologies among Maize Inbreds. Plant Cell. 2005, 17: 343-360. 10.1105/tpc.104.025627.
- Brem RB, Yvert G, Clinton R, Kruglyak L: Genetic dissection of transcriptional regulation in budding yeast. Science. 2002, 296: 752-755. 10.1126/science.1069516.
- Schadt EE, Monks SA, Drake TA, Lusis AJ, Che N, Colinayo V, Ruff TG, Milligan SB, Lamb JR, Cavet G, et al: Genetics of gene expression surveyed in maize, mouse and man. Nature. 2003, 422: 297-302. 10.1038/nature01434.
- Hubner N, Wallace CA, Zimdahl H, Petretto E, Schulz H, Maciver F, Mueller M, Hummel O, Monti J, Zidek V, et al: Integrated transcriptional profiling and linkage analysis for identification of genes underlying disease. Nat Genet. 2005, 37: 243-253. 10.1038/ng1522.
- Decook R, Lall S, Nettleton D, Howell SH: Genetic regulation of gene expression during shoot development in Arabidopsis. Genetics. 2005, doi: 10.1534/genetics.105.042275
- Fu H, Dooner HK: Intraspecific violation of genetic colinearity and its implications in maize. Proc Natl Acad Sci USA. 2002, 99: 9573-9578.
- Birchler JA, Riddle NC, Auger DL, Veitia RA: Dosage balance in gene regulation: biological implications. Trends Genet. 2005, 21: 219-226. 10.1016/j.tig.2005.02.010.
- Song R, Messing J: Gene expression of a gene family in maize based on noncollinear haplotypes. Proc Natl Acad Sci USA. 2003, 100: 9055-9060. 10.1073/pnas.1032999100.
- Hughes TR, Mao M, Jones AR, Burchard J, Marton MJ, Shannon KW, Lefkowitz SM, Ziman M, Schelter JM, Meyer MR, et al: Expression profiling using microarrays fabricated by an ink-jet oligonucleotide synthesizer. Nat Biotechnol. 2001, 19: 342-347. 10.1038/86730.
- Gene Expression Omnibus. [http://www.ncbi.nlm.nih.gov/projects/geo/query/acc.cgi?acc=GSE3640]
- Saal LH, Troein C, Vallon-Christersson J, Gruvberger S, Borg A, Peterson C: BioArray Software Environment (BASE): a platform for comprehensive management and analysis of microarray data. Genome Biol. 2002, 3: SOFTWARE0003. 10.1186/gb-2002-3-8-software0003.
- Tseng GC, Oh MK, Rohlin L, Liao JC, Wong WH: Issues in cDNA microarray analysis: quality filtering, channel normalization, models of variations and assessment of gene effects. Nucleic Acids Res. 2001, 29: 2549-2557. 10.1093/nar/29.12.2549.
- Le Meur N, Lamirault G, Bihouee A, Steenman M, Bedrine-Ferran H, Teusan R, Ramstein G, Leger JJ: A dynamic, web-accessible resource to process raw microarray scan data into consolidated gene expression values: importance of replication. Nucleic Acids Res. 2004, 32: 5349-5358. 10.1093/nar/gkh870.
- Breitling R, Armengaud P, Amtmann A, Herzyk P: Rank products: a simple, yet powerful, new method to detect differentially regulated genes in replicated microarray experiments. FEBS Lett. 2004, 573: 83-92. 10.1016/j.febslet.2004.07.055.
- EPCLUST. [http://ep.ebi.ac.uk/EP/EPCLUST]
- MOAP. [http://www.maizearray.org/index.shtml]
- Primer3. [http://frodo.wi.mit.edu/cgi-bin/primer3/primer3_www.cgi]
- Peirson SN, Butler JN, Foster RG: Experimental validation of novel and conventional approaches to quantitative real-time PCR data analysis. Nucleic Acids Res. 2003, 31: e73. 10.1093/nar/gng073.
This article is published under license to BioMed Central Ltd. This is an open access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.