Epigenetic signatures associated with imprinted paternally-expressed genes in the Arabidopsis endosperm

Background
Imprinted genes are epigenetically modified during gametogenesis and maintain the established epigenetic signatures after fertilization, causing parental-specific gene expression.

Results
In this study, we show that imprinted paternally-expressed genes (PEGs) in the Arabidopsis endosperm are marked by an epigenetic signature of Polycomb Repressive Complex 2 (PRC2)-mediated H3K27me3 together with heterochromatic H3K9me2 and CHG methylation, which specifically mark the silenced maternal alleles of PEGs. The co-occurrence of H3K27me3 and H3K9me2 on defined loci in the endosperm differs drastically from the strict separation of both pathways in vegetative tissues, revealing tissue-specific employment of repressive epigenetic pathways in plants. Based on the presence of this epigenetic signature on maternal alleles, we were able to predict known PEGs with high accuracy and identified several new PEGs that we confirmed using INTACT-based transcriptomes generated in this study.

Conclusions
The presence of the three repressive epigenetic marks H3K27me3, H3K9me2, and CHG methylation on the maternal alleles in the endosperm serves as a specific epigenetic signature that allows the prediction of genes with parental-specific gene expression. Our study reveals that there are substantially more PEGs than previously identified, indicating that paternal-specific gene expression is of higher functional relevance than currently estimated. The combined activity of PRC2-mediated H3K27me3 together with the heterochromatic H3K9me3 has also been reported to silence the maternal Xist locus in mammalian preimplantation embryos, suggesting convergent employment of both pathways during the evolution of genomic imprinting.

H3K9me2, and CHG methylation

We addressed the question whether the presence of H3K9me2 and CHG methylation is functionally relevant for maternal allele repression by testing whether the presence of previously predicted PEGs [12]. The combination of CHG methylation in the central cell together with maternal-specific H3K27me3 and H3K9me2 allowed the prediction of the highest number of previously described PEGs in Col and Ler accessions (24 out of 42 (57%) [12]) in relation to the number of genes in the category with the highest score (Fig 4A, S1 Table) and was chosen for further analysis. The category with the highest score was significantly enriched for PEGs (hypergeometric test, P = 1.0e-32), while categories with lower scores contained only a few PEGs (Fig 4A). Similarly, out of 64 PEGs that had been predicted by a recent study re-evaluating previously published imprintome datasets [21], 40 (62%) were present in the highest score category (Fig 4A). Nearly half (96 genes, 46.4%) of the genes in the highest score category were significantly paternally biased (Chi-square test, P < 0.05, Bonferroni corrected; S2 Table), which was significantly more than the 8% paternally-biased genes identified among all genes tested (hypergeometric test, P = 8.9e-52). We thus conclude that the presence of the three modifications, CHG methylation in the central cell and H3K27me3 and H3K9me2 on maternal alleles in the endosperm, allows the prediction of genes with paternally-biased expression. Paternally-biased genes were particularly involved in transcriptional regulation (P = 7.54e-5) and chromatin organization (P = 1.57e-3), consistent with previous reports on the functional role of PEGs [12,22].
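The enrichment statements above rest on a one-sided hypergeometric test. The following is a minimal sketch of such a test, not the analysis code used in this study. The function name `hypergeom_sf` is introduced here for illustration only. In the example call, the 24 of 42 known PEGs come from the text, the category size of 207 genes is only inferred from "96 genes, 46.4%", and the population of 20,000 tested genes is a hypothetical placeholder, so the resulting value is not expected to match the reported P-values.

```python
from math import comb


def hypergeom_sf(k, population, successes, draws):
    """Return P(X >= k) for a hypergeometric variable X.

    population: number of genes tested in total
    successes:  number of known PEGs among them
    draws:      number of genes in the category of interest
    k:          number of known PEGs observed in that category
    """
    lower = max(k, 0)
    upper = min(successes, draws)
    total = comb(population, draws)
    # Sum the exact probabilities of observing k, k+1, ..., upper PEGs.
    tail = sum(
        comb(successes, i) * comb(population - successes, draws - i)
        for i in range(lower, upper + 1)
    )
    return tail / total


# Illustrative call only: 20000 is a placeholder population size and
# 207 is an approximate category size inferred from the text.
p_enrichment = hypergeom_sf(k=24, population=20000, successes=42, draws=207)
print(f"one-sided enrichment P = {p_enrichment:.3g}")
```

Exact integer arithmetic is used for the binomial coefficients so that very small tail probabilities are not lost to rounding before the final division.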

Published endosperm transcriptome data contain a substantial fraction of transcripts from the maternal seed coat, which may limit the correct prediction of paternally-biased genes [21]. We hypothesized that there are several genes that based on their epigenetic modifications (score 12, Fig 4A and  and downstream genic regions fused to the green fluorescent protein (GFP) of seven genes belonging to the highest score category but predicted to be maternally (AT2G33620, AT1G43580, AT1G47530, AT1G64660, AT2G30590, AT4G15390) or biallelically expressed (AT5G53160). We detected a GFP signal in the endosperm only for construct AT1G64660 (Fig 5, S3 Table). For construct AT1G47530 a signal was detected in the seed coat, while for the other constructs no GFP signal was detected in seeds, indicating that the regulatory elements required for the expression of those genes are located outside the promoter and genic regions used to generate the reporter lines. Reciprocal crosses using the AT1G64660 reporter lines revealed that this gene is indeed a PEG and is strongly expressed in the endosperm when paternally, but not when maternally, inherited (Fig 5). These data support the hypothesis that seed coat contamination limits the transcriptome-based identification of PEGs.
Ler × Col reciprocal crosses. Isolated RNA was sequenced and profiled for allele-specific gene expression. By analyzing the maternal to total reads ratio in each epigenetic category, we confirmed that the genes in the highest score category (group  Table). Following previously established criteria [12], we predicted 107 PEGs that were reciprocally imprinted in both directions of the crosses. There was a significantly higher number of PEGs present in the highest score category compared to a representative random sample of genes with informative reads (Fig 6B). Furthermore, the highest score category had the highest frequency of PEGs, with other categories having significantly fewer PEGs (Fig 6C). Of the 107 genes that we predicted as PEGs based on our RNA sequencing data, 38 were present in the highest score category, which is significantly more than expected by chance (P = 3.6e-59, Fig 6D). Of those, 20 were previously predicted based on published data [10,12,24,25], while 18 genes are likely new high-confidence PEGs, revealing that PEGs are more common than previously estimated.
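The allele-specific profiling described here can be illustrated with a short sketch: filter genes by a minimum number of informative reads, test maternal against paternal read counts with a Chi-square test, and adjust for multiple testing. This is a sketch under stated assumptions, not the pipeline used in this study. The expected maternal fraction of 2/3 (two maternal and one paternal genome copy in the endosperm) is an assumption that is not stated in this text, Benjamini-Hochberg is assumed as the false discovery rate procedure, and all function names and the example counts are hypothetical.

```python
from math import erfc, sqrt


def chi2_pvalue_1df(stat):
    """Survival function of the Chi-square distribution with 1 degree of freedom."""
    return erfc(sqrt(stat / 2.0))


def allelic_bias_pvalue(maternal, paternal, expected_maternal=2 / 3):
    """Chi-square goodness-of-fit P-value for one gene.

    expected_maternal is an ASSUMPTION (2m:1p endosperm genome dosage).
    """
    total = maternal + paternal
    exp_m = total * expected_maternal
    exp_p = total - exp_m
    stat = (maternal - exp_m) ** 2 / exp_m + (paternal - exp_p) ** 2 / exp_p
    return chi2_pvalue_1df(stat)


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def paternally_biased(counts, min_reads=20, alpha=0.05, expected_maternal=2 / 3):
    """Return genes whose maternal read fraction is significantly below expectation.

    counts maps a gene identifier to (maternal_reads, paternal_reads).
    """
    tested = {g: mp for g, mp in counts.items() if sum(mp) >= min_reads}
    genes = list(tested)
    pvals = [allelic_bias_pvalue(*tested[g], expected_maternal) for g in genes]
    qvals = benjamini_hochberg(pvals)
    return [
        g
        for g, q in zip(genes, qvals)
        if q < alpha and tested[g][0] / sum(tested[g]) < expected_maternal
    ]


# Hypothetical read counts: (maternal, paternal) informative reads per gene.
example_counts = {"geneA": (5, 95), "geneB": (67, 33), "geneC": (2, 3)}
print(paternally_biased(example_counts))
```

In this toy input, `geneC` is dropped by the read-count filter, `geneB` sits close to the assumed 2:1 expectation, and only `geneA` is reported. Calling a gene a PEG would additionally require the bias to hold in both directions of the reciprocal cross, which this sketch does not implement.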

In this study, we identified the concomitant presence of maternal-specific CHG  Residual DNA was removed using Invitrogen DNase I (Amplification Grade), and cDNA was synthesized using the Fermentas first strand cDNA synthesis kit according to the manufacturer's instructions. Quantitative PCR was performed using a MyiQ5 real-time PCR detection system (Bio-Rad) and Solis BioDyne-5x Hot FIREPol EvaGreen qPCR products were purified and analyzed by Sanger sequencing. For the imprinting-by-restriction enzyme digestion assay, the PCR products were purified and digested.

Restriction enzymes and primers used are listed in the S5 Table.

Data analysis

We made use of endosperm-specific ChIP-seq data that have been previously   Table). Based on this

Allele-specific expression analysis

We defined a minimum threshold of 20 and 30 informative reads for Col × Ler (2 replicates) and Ler × Col (3 replicates) crosses, respectively. Statistical differences between maternal and paternal read counts for each gene were calculated using a Chi-square test, considering genes with a false discovery rate adjusted P-value of less than  ]. Scores are calculated as described in the S1

Representative images of seeds derived after reciprocal crosses of the AT1G64660 reporter line (fusion with green fluorescent protein (GFP)) with wild-type (WT) plants.

GFP fluorescence was detected in the seed coat when the AT1G64660 reporter was maternally inherited, but endosperm-specific expression was only detected when the AT1G64660 reporter was paternally inherited. Seeds at 2 DAP were used for imaging.

The distribution for the total population of genes with informative reads (U) is included.

Genes with the highest levels of CHG methylation, H3K27me3, and H3K9me2 show a