The ultimate function of the gametophyte is the production of viable offspring through the fusion of the male and female gametes. Double fertilization is unique to flowering plants and results in the formation of a diploid embryo (maternal:paternal genome ratio 1:1) and a typically triploid endosperm (2:1). Similarities between the male and female gametophytes may result from conserved functions in gamete production, or they may have been inherited from the ancestral condition of bisexual gametophytes found in many non-seed plants (for example, Physcomitrella). However, the developmental patterns and cellular functions of the two gametophytes are quite distinct. Identifying the genes active in the gametophyte generation provides a better understanding of gametophyte function, of the similarities between the two gametophytes, and of what makes each unique. To better understand the function of the maize gametophyte generation, we have performed a full transcriptome analysis of mature male and female gametophytes using RNA-seq.
Genome-wide expression analysis has several implications for maize genome organization. Analysis of the expression of genes annotated as transposon-related, together with analysis of intergenic transcript models with similarity to repeat sequences, shows that repetitive DNA elements are more likely to be expressed in both the male and female gametophytes than in sporophytic tissues. These data agree with results in Arabidopsis showing that gametophytes produce RNA from highly repetitive DNA elements [23,27,36]. As in Arabidopsis, this expression in maize may serve to silence mobile elements in the germline, although the data here do not resolve in which cells these transcripts are synthesized or accumulate. Future experiments are needed to determine whether these transcripts are present in the gametes, and whether they are transcribed in the gametes themselves or, as in Arabidopsis pollen, in subsidiary cells (that is, the antipodal cells and synergids of the female gametophyte and the vegetative cell of the pollen grain). Expression of repetitive elements is not identical in the male and female gametophytes: these elements are more likely to be expressed in the female than in the male.
In Arabidopsis central cells, non-exonic transcripts, including known transposon and other intergenic transcripts, are more common than in other tissues (approximately two- to four-fold more non-exonic transcripts are found in central cells than in seedlings or immature floral buds [34,73,74]), raising the possibility that transcriptional activity in ‘intergenic’ regions is a common feature of angiosperm gametophytes. Arabidopsis pollen also has a high frequency of intron reads as well as expression of TEs. In Arabidopsis, as in maize, most predicted intergenic transcripts in the gametophytes are shorter than 500 bp. However, most of the non-exonic Arabidopsis central cell reads were intronic, suggesting that this signal is driven in large part by incomplete annotation. In contrast, in maize 90% or more of these reads are intergenic, suggesting that both incomplete annotation and TE transcripts account for the exceptional ES transcriptome. True intergenic transcriptional activity may therefore vary between species. The higher expression of transposons and other intergenic sequences in maize embryo sacs may reflect either higher activity of maize transposons relative to those of Arabidopsis or the difference between sampling the whole embryo sac in maize and only the central cell in Arabidopsis. Cell-specific analysis of these transcripts in maize is needed to resolve which of these alternatives, or what combination of them, explains the difference. Two classes of transposons are also expressed in rice ovules, but it is not known whether these transcripts come from the embryo sac or from the surrounding ovule tissue.
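The distinction drawn above between exonic, intronic, and intergenic reads comes down to interval overlap against a gene annotation. The following is a minimal sketch, not the pipeline used in this study: it assumes a single chromosome, half-open coordinates, ignores strand and splicing, treats any overlap with an exon as exonic, and uses a hypothetical gene model (`g1`) with made-up read coordinates.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class GeneModel:
    """Hypothetical annotated gene model on one chromosome (half-open coordinates)."""
    name: str
    start: int
    end: int
    exons: list  # list of (start, end) tuples lying inside [start, end)


def overlaps(a_start, a_end, b_start, b_end):
    """True if half-open intervals [a_start, a_end) and [b_start, b_end) share a base."""
    return a_start < b_end and b_start < a_end


def classify_read(start, end, genes):
    """Label a read 'exonic', 'intronic' or 'intergenic' relative to the gene models.

    A read touching any exon is exonic; a read inside a gene span that touches
    no exon is intronic; a read outside every gene span is intergenic.
    """
    in_gene = False
    for gene in genes:
        if not overlaps(start, end, gene.start, gene.end):
            continue
        in_gene = True
        if any(overlaps(start, end, es, ee) for es, ee in gene.exons):
            return "exonic"
    return "intronic" if in_gene else "intergenic"


def non_exonic_breakdown(reads, genes):
    """Count read classes and return the intergenic share of all non-exonic reads."""
    counts = Counter(classify_read(s, e, genes) for s, e in reads)
    non_exonic = counts["intronic"] + counts["intergenic"]
    intergenic_share = counts["intergenic"] / non_exonic if non_exonic else 0.0
    return counts, intergenic_share


# Made-up example: one two-exon gene and four reads.
genes = [GeneModel("g1", 100, 1000, [(100, 200), (800, 1000)])]
reads = [(150, 180), (300, 350), (2000, 2050), (5000, 5100)]
counts, share = non_exonic_breakdown(reads, genes)
print(dict(counts), round(share, 2))  # {'exonic': 1, 'intronic': 1, 'intergenic': 2} 0.67
```

A high intergenic share, as reported for maize, points away from missing exons in otherwise annotated genes; a high intronic share, as in the Arabidopsis central cell, points toward incomplete annotation of known genes.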
As with TE transcripts, intergenic non-repeat transcripts are more common in ES samples than in other tissues. Potential novel genes were defined as gene models assembled directly from the RNA-seq data that lacked homology to known TEs and other repeats. More potential protein-coding novel genes were identified in the ES-enriched and MP-enriched gene sets than in the sporophyte-enriched sets, with the greatest number in the embryo sac. Because the embryo sac is relatively inaccessible, embryo sac-specific transcripts may have been underrepresented in the expression data used to help build maize gene models, and thus omitted from annotated gene sets. The high number of gametophyte transcripts intergenic to WGS gene models may also be a consequence of the genome-wide relaxation of silencing of repetitive elements (and of sequences adjacent to them) in the gametophytes compared with the sporophyte. RNA-seq transcript assembly, including the samples in this study, has identified lncRNA genes in the maize genome, many of which are also intergenic to WGS gene models. Interestingly, reproductive tissues, including pollen and embryo sac, expressed more lncRNAs than any other tissue characterized.
The pollen transcriptome is also notable for its unusual distribution across the two maize subgenomes. Maize consists of two subgenomes derived from an ancient allotetraploidy event, with subgenome 2 characterized by reduced expression and lower gene retention rates relative to subgenome 1. However, whereas the other three tissues assessed conform to this expectation, a significantly greater proportion of pollen expression is associated with subgenome 2 genes. This increase in subgenome 2 expression is not due to an overrepresentation of pollen singleton genes in subgenome 2 (that is, genes whose subgenome 1 duplicate has been lost over evolutionary time), but rather to the retention of more duplicate pairs (that is, both the subgenome 1 and subgenome 2 genes are retained in the genome) and correspondingly fewer pollen singleton genes in subgenome 1. Moreover, both members of a duplicate pair are more likely to be in the MP-enriched transcriptome than duplicate pairs are to be in the enriched transcriptomes of the other three tissues, consistent with the idea that expression of both copies plays a functional role in pollen. Thus, selection could be acting to maintain functional copies of both members of pollen-expressed duplicate pairs following tetraploidization.
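The partition used in this argument (subgenome 1 singletons, subgenome 2 singletons, and retained duplicate pairs with one or both members enriched) can be sketched as a simple tally over a tissue-enriched gene set. This is an illustrative sketch only: the `subgenome` and `homeolog` lookup tables and the gene IDs are hypothetical, and genes are counted rather than weighted by expression level, which simplifies the expression-based comparison described above.

```python
def subgenome_summary(enriched, subgenome, homeolog):
    """Summarize a tissue-enriched gene set by maize subgenome.

    enriched:  set of gene IDs enriched in one tissue
    subgenome: dict mapping gene ID -> 1 or 2
    homeolog:  dict mapping gene ID -> ID of its retained duplicate, or None for singletons
    """
    # Share of enriched genes (not expression) that come from subgenome 2.
    sg2_share = sum(subgenome[g] == 2 for g in enriched) / len(enriched)
    singletons = {1: 0, 2: 0}
    pairs_any, pairs_both = set(), set()
    for g in enriched:
        partner = homeolog.get(g)
        if partner is None:
            singletons[subgenome[g]] += 1
            continue
        pair = frozenset((g, partner))  # unordered, so each pair is stored once
        pairs_any.add(pair)
        if partner in enriched:
            pairs_both.add(pair)
    return {
        "subgenome2_share": sg2_share,
        "singletons": singletons,
        "pairs_with_any_enriched": len(pairs_any),
        "pairs_with_both_enriched": len(pairs_both),
    }


# Hypothetical genes: a1/a2 and b1/b2 are retained pairs; c1 and d2 are singletons.
subgenome = {"a1": 1, "a2": 2, "b1": 1, "b2": 2, "c1": 1, "d2": 2}
homeolog = {"a1": "a2", "a2": "a1", "b1": "b2", "b2": "b1", "c1": None, "d2": None}
summary = subgenome_summary({"a1", "a2", "b1", "d2"}, subgenome, homeolog)
print(summary)
```

Comparing `pairs_with_both_enriched / pairs_with_any_enriched` across tissues is one way to express the observation that both members of a pair are more often co-enriched in pollen.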
The gene balance hypothesis, which holds that the expression dosage of genes encoding members of multi-subunit complexes, components of signal transduction pathways, or TFs must be maintained for correct function, has been invoked to explain the retention of duplicates in genomes [76,77]. In one view, this balance would be even more critical in the male gametophyte, which could result in a greater proportion of duplicate retention. First, the male gametophyte is haploid, so loss of one gene copy via mutation after tetraploidization reduces expression by half in the first generation, rather than by one-quarter as in the diploid. Second, and in contrast to the female gametophyte (which did not show such preferential retention), pollen and pollen tubes of an outcrossing species such as maize are potentially under more stringent selection than other phases of the life cycle, because as haploids they compete intensely for efficient pollen germination, pollen tube tip growth, and fertilization. Consistent with this idea, pollen-specific genes in an outcrossing relative of Arabidopsis (Capsella grandiflora) are associated with stronger purifying selection and a greater proportion of adaptive substitutions than sporophyte-specific genes. Under this interpretation of gene balance, one would expect a larger percentage of pollen-critical genes to be retained as duplicates in maize and, furthermore, mutation of either copy to result in a deleterious phenotype. At least one such example has already been described, the rop2/rop9 duplicate pair, although the deleterious effect of the rop2 mutation is revealed only when mutant pollen competes with wild-type pollen. This interpretation thus predicts that the MP-enriched duplicate genes identified here are more likely to be associated with such competitive defects. It also suggests that the GO categories overrepresented in the MP-enriched subgenome 2 set (localization, transmembrane transport, and pectinesterase activity) are more likely to be subject to such dosage sensitivity.
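The first dosage argument (a one-half versus one-quarter drop in expression) is simple arithmetic. Assuming, as a simplification, that every gene copy contributes equally and additively to expression, a haploid genome carrying two homeologs loses half its expression when one copy is lost, whereas a diploid heterozygous for the same loss still has three of four copies. The function name below is illustrative only.

```python
from fractions import Fraction


def expression_drop(homeologs_per_genome: int, ploidy: int, copies_lost: int = 1) -> Fraction:
    """Fractional loss of total expression when `copies_lost` gene copies are knocked out.

    Assumes each copy contributes equally and additively, which simplifies real dosage effects.
    """
    total_copies = homeologs_per_genome * ploidy
    return Fraction(copies_lost, total_copies)


# After allotetraploidization each haploid genome carries two homeologs.
gametophyte = expression_drop(homeologs_per_genome=2, ploidy=1)  # haploid pollen
sporophyte = expression_drop(homeologs_per_genome=2, ploidy=2)   # diploid, heterozygous for the loss
print(gametophyte, sporophyte)  # 1/2 1/4
```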
Analysis of GO categories confirms previous results (for example, [10-13]) that regulation of a dynamic cytoskeleton is an important aspect of pollen biology. Post-translational modification is also overrepresented in the pollen transcriptome. Protein modification (for example, protein phosphorylation) may facilitate the rapid reorientations of growth in response to local cues that pollen tube function requires. In the ES-enriched gene set, regulation of transcription and the DEFL small peptides were overrepresented. Because the DEFL gene family was prominent in the embryo sac transcriptome, and because small peptide gene families were mostly not included in the GO term analysis, additional small peptide families were also analyzed. A second family of small signaling peptides, the EAL family, is also overrepresented in the ES transcriptome. Some members of both families have previously been shown to be expressed in the female gametophyte [28,58] and to be involved in cell identity and species-specific interactions with the pollen tube [47,60,61]. Here we have expanded the analysis of these gene families and shown that many members are enriched in the female gametophyte transcriptome. Certain DEFL genes show enriched expression in Arabidopsis central cells, suggesting that at least some of the DEFL enrichment reported here is associated with the central cell of maize. Correlations of gametophyte expression with phylogenetic relationships, including location in tandem arrays, suggest that female gametophyte expression is an ancestral feature of some branches of both the DEFL and EAL gene families. The DEFL family also has members enriched in both the male and female gametophyte transcriptomes. Mirrored expression of these small peptides in the two gametophytes may indicate a mechanism for reciprocal signaling between them. Shared and reciprocal signaling pathways of the male and female gametophytes will be easier to identify and resolve once it is known how cells perceive and respond to these small peptides.
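Claims that a gene family is overrepresented in an enriched gene set are commonly supported by a one-sided hypergeometric test (equivalent to a one-sided Fisher exact test). The source does not state which test was applied here, so this is only an illustrative sketch; the counts are invented and the function names are hypothetical.

```python
from math import comb


def hypergeom_upper_tail(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    top = min(successes, draws)
    hits = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(k, top + 1)
    )
    return hits / comb(population, draws)


def family_enrichment(family_in_set, set_size, family_in_genome, genome_size):
    """Fold enrichment and one-sided hypergeometric p-value for a gene family in a gene set."""
    expected = set_size * family_in_genome / genome_size
    fold = family_in_set / expected
    p_value = hypergeom_upper_tail(family_in_set, genome_size, family_in_genome, set_size)
    return fold, p_value


# Invented counts: 12 of 100 ES-enriched genes belong to a 50-member family
# in a 2,000-gene background, where about 2.5 would be expected by chance.
fold, p = family_enrichment(family_in_set=12, set_size=100, family_in_genome=50, genome_size=2000)
print(f"fold = {fold:.1f}, p = {p:.2g}")
```

Exact integer arithmetic via `math.comb` keeps the sketch dependency-free; for genome-scale counts a library routine such as a hypergeometric survival function would be faster.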
Enrichment for transcriptional regulation in the embryo sac transcriptome was concentrated in five gene families: MADS, NAC, AP2/EREB, MYB-R, and WRKY. The MADS box gene family is also overrepresented in Arabidopsis gametophyte transcriptomes. Maize and Arabidopsis both show a prevalence of pollen-expressed genes in the MIKC* family, suggesting that the pollen function of MIKC* genes may predate the split between monocots and eudicots. Both maize and Arabidopsis also have members of the type 1 class α MADS genes. However, whereas in Arabidopsis the type 1 class β genes are overrepresented in female gametophytes, this clade is absent in maize. In maize, these functions may have been taken over by other MADS gene clades (for example, the MIKC class, which is present in the ES-enriched gene set of maize but not of Arabidopsis). In addition to the MADS genes, the NAC, AP2, MYB-R, and WRKY gene families are overrepresented in the ES-enriched gene set. An overlapping set of TF families is overrepresented in the transcriptome of whole rice ovules, including not only the AP2/EREB and MADS families but also the ABI3, AP2, YABBY, C2H2, HSF, LFY, MYB, and ZfHD families. Many of these differences likely arise because the rice study could not compare the embryo sac with its surrounding ovule tissue, but the shared groups may reflect gametophyte functions in the ancestor of maize and rice.
Mapping expression patterns onto a gene phylogeny assists evolutionary analyses, because an expression pattern shared by multiple members of a clade provides a hypothesis for the expression pattern of the clade's common ancestor. A notable example is the NAC TF family, in which a large ES-enriched clade includes duplicate genes from the ancestral maize allotetraploidization as well as from older expansions of the family. In other cases, conserved genes with shared female gametophyte expression are part of a tandem cluster of highly similar genes, suggesting more recent family expansion. This is seen for clusters of genes in the DEFL and EAL gene families, in which most or all of the genes in the cluster are expressed in the embryo sac. In fact, based on the phylogenetic analyses, the enrichment for female gametophyte expression in these families appears to be driven largely by expansion through tandem duplication. In some cases these tandem arrays are present in multiple grass lineages, as suggested by the maize EA1 and EAL1 genes being less similar to each other than to their rice homologs, which are also present as tandem duplicates. Consistent with this hypothesis, the only one of the three rice EA1/EAL1 genes in the cluster that was assayed by microarray hybridization was expressed in both the egg and synergids, supporting the model that gametophyte expression of these genes reflects shared ancestral gene regulation.
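One standard way to formalize inferring an ancestral expression state from the states of a clade's members is Fitch parsimony. The sketch below is not the analysis performed in this study: the tree topology, the gene names (`nacA` to `nacE`), and the state labels are hypothetical, and expression is reduced to a binary "ES" (embryo sac-enriched) versus "other" state.

```python
def fitch(tree, states):
    """Bottom-up pass of Fitch parsimony on a binary tree.

    tree:   a leaf name (str) or a 2-tuple (left_subtree, right_subtree)
    states: dict mapping leaf name -> observed expression state
    Returns (candidate state set for this node, minimum number of changes in this subtree).
    At the root, the set is exactly the set of most-parsimonious ancestral states.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch(left, states)
    right_set, right_cost = fitch(right, states)
    shared = left_set & right_set
    if shared:
        # Children agree on at least one state: no extra change needed.
        return shared, left_cost + right_cost
    # Children disagree: keep both options and count one change.
    return left_set | right_set, left_cost + right_cost + 1


# Hypothetical five-gene clade in which four of five members are ES-enriched.
tree = ((("nacA", "nacB"), "nacC"), ("nacD", "nacE"))
states = {"nacA": "ES", "nacB": "ES", "nacC": "ES", "nacD": "ES", "nacE": "other"}
root_states, changes = fitch(tree, states)
print(root_states, changes)  # {'ES'} 1
```

Here the most parsimonious hypothesis is an ES-expressed ancestor with a single loss on the `nacE` branch, which mirrors the reasoning in the paragraph above.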
Analysis of mutants and mutant frequencies shows that genes significantly enriched in the gametophyte transcriptomes are more likely than other genes to be required in the gametophyte. Mutant frequencies and transmission rates confirm that gametophyte-enriched expression is predictive of a requirement for gametophyte function, even without additional accommodations for genetic redundancy. Consequently, the entire transcriptomic dataset is expected to prove useful for identifying candidate gametophyte mutants, as well as for broader analyses of gametophyte function.
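The logic of using transmission rates can be sketched as a one-sided binomial test against the Mendelian expectation that a heterozygous parent transmits the mutant allele to half of its progeny; significantly reduced transmission suggests a gametophytic requirement. This assumes a 50% expected rate and uses invented progeny counts; the source does not specify which statistical test was applied.

```python
from math import comb


def transmission_test(mutant_progeny, total_progeny, expected_rate=0.5):
    """Observed transmission rate and one-sided binomial P(X <= observed).

    Under Mendelian transmission from a heterozygote about half of the progeny
    carry the mutant allele; a small p-value indicates reduced transmission.
    """
    rate = mutant_progeny / total_progeny
    p_low = sum(
        comb(total_progeny, i) * expected_rate**i * (1 - expected_rate) ** (total_progeny - i)
        for i in range(mutant_progeny + 1)
    )
    return rate, p_low


# Invented cross: 30 of 100 progeny inherit the mutant allele.
rate, p = transmission_test(30, 100)
print(f"transmission = {rate:.2f}, p = {p:.1e}")
```

Running the test separately for crosses using the heterozygote as male and as female parent would distinguish male from female gametophyte requirements.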