Seeds from Shaw et al.  were obtained for generations indicated in Fig. 1a. Seeds were planted and grown in 16-h day lengths and tissue was harvested from young leaf tissue. Tissue was flash frozen in liquid nitrogen and DNA was isolated using a Qiagen Plant DNeasy kit (Qiagen, Valencia, CA, USA) according to the manufacturer’s protocol.
MethylC-seq library construction and sequencing
Line 12 samples, met1-3, Col-0 wild type, met1 epiRILs, C24 wild type, Ler wild type, and C24-Ler F2 samples were previously published [31, 33, 39, 41, 47, 48]. Raw sequence reads were downloaded and reanalyzed. The libraries for the C24-Ler F2 samples comprised paired end reads, so to maintain consistency with other samples, only the first reads were used.
All other libraries were prepared according to the protocol described in Urich et al. . MethylC-seq libraries were sequenced to 150 bp using Illumina NextSeq500 (Illumina, San Diego, CA, USA).
MethylC-seq sequencing analysis
MethylC-seq reads were processed and aligned as described in . Briefly, reads were trimmed for adapters using Cutadapt v1.1.0 , parameters minimum quality score 10 and minimum read length 30 and aligned to the TAIR10 reference genome  using Bowtie v1.1.0  with parameters “-k 1 –m 1 –chunkmbs 3072 –best –strata –o 4 –e 80 –l 20 –n 0”. Only uniquely mapped reads were retained. Non-conversion rate (the rate which unmethylated cytosines fail to convert to uracil) was calculated from reads aligning to the chloroplast. Positions were considered methylated based on the binomial test followed by Benjamini–Hochberg false discovery rate (FDR) correction. Non-conversion rate was used as the expected probability for the binomial test and only positions with at least three mapped reads were included. Methylation level is computed as the weighted methylation . The weighted methylation level was calculated as mC/(mC + uC) where mC is the number of methylated reads and uC is the number of unmethylated reads.
Identification of methylated regions
Methylomes generated for transgenerational lines 12 and 69 were computationally combined to form pan-methylomes for lines 12 and 69 independently. Specifically, the number of methylated and total (methylated plus unmethylated) reads was summed at each position across all samples in the line. Additionally, unmethylated pan-methylomes were generated by setting methylated reads to zero, while maintaining the total number of reads as reported in each line’s pan-methylome. For each line, the methylpy DMR identification program  was applied, comparing the pan-methylome and unmethylated pan-methylome to identify all C DMRs, i.e., CNN DMRs (N = A, C, G, T). Parameters used were 3000 simulations, 100 significant tests, and 250 bp maximum DMR distance. Of these regions, regions 40+ bp long that had at least ten cytosines covered by at least three reads in the combined methylome were retained. A one-sided z-test was used to test for expected methylation level of 25% in at least one generation, i.e., 25%/8 generations = 3.125% for line 12 and 25%/10 generations = 2.5% for line 69. The resulting P values were adjusted using Benjamini–Hochberg correction (n = 33,208 for line 12 and n = 31,569 for line 69). After computing methylation level for each generation in each line, regions where all generations of a line had a methylation level less than 10% were removed. The mean length for methylated regions was 1138 and 1319 bp for lines 12 and 69, respectively.
Identification of transgenerational epialleles
DMRs were identified for all samples within a line using methylpy as described above. Of these regions, only regions 40+ bp long that had at least ten cytosines covered by at least three reads in all generations of a line and at least 20% difference in methylation level between the highest and lowest methylation level of generations in a line were retained. For each region in a line, a one-sided z-test was performed to test for a significance greater than 25% difference in methylation level between adjacent generations. Resulting P values were adjusted using Benjamini–Hochberg correction (n = 910 for line 12 and n = 1674 for line 69). A region is considered an epiallele between generations with an adjusted P value ≤0.05 (Additional file 1: Tables S13 and S14). Regions that did not overlap with identified methylated regions were eliminated. Regions with at least one epiallele were considered an epilocus. The mean length of epialleles in both lines was 94 bp. Due to the unequal distribution in length between identified epialleles and methylated regions, estimates of stability were computed for regions of 100 bp.
Methylated regions and epialleles identified in lines 12 and 69 were merged. At imperfect overlaps, the minimum start and maximum end was used to create the merged region. Methylated regions, excluding identified epialleles, were ordered using R’s ward.D clustering of the Euclidean distance of sample methylation values between regions . Line 12 epialleles were ordered by R’s ward.D clustering of the Euclidean distance of methylation values for line 69 samples. Line 69 epialleles were similarly ordered using methylation values of line 12 samples. Overlapping epialleles were ordered using methylation values of samples in both lines.
Categorization of epialleles by genomic features
A genomic feature map was created such that each base pair of the TAIR10 genome  was assigned a single feature type (transposable element, promoter, untranslated region, coding sequence, intron, and non-coding RNA) based on the TAIR10 annotation . Promoters were defined as 1 kb upstream of the transcription start site of protein-coding genes. At positions where multiple feature types could be applicable, such as a transposon in an intron or promoter overlapping with adjacent gene, priority was given to non-coding RNA (highest), untranslated regions, introns, coding sequences, promoter, and transposon (lowest). Positions without an assignment were considered intergenic. Genomic feature content of each epiallele and methylated region was assigned proportionally based on the number of bases in each category.
Experimental set-up for genetic cross
An individual from line 49 generation 23 (line 49-G23) and an individual from line 69 generation 19 (line 69-G19) of Shaw et al.  were each self-fertilized and grown. Individuals from the offspring were crossed such that line 49-G24 was the mother of the cross and line 69-G20 was the father. Siblings of these individuals were used for WGBS. Offspring of the cross were grown to create the F1 generation. A single randomly selected individual from the F1 was self-crossed and offspring, the F2 generation, were grown. Twenty individuals from the F2 generation were randomly selected for WGBS and subsequent analysis.
Additionally, the individuals from line 49-G20 and line 69-G24 used for the cross were self-crossed and created lines 49-G′1 and 69-G′1, respectively. One individual from those populations was randomly selected for WGBS and was self-crossed again to create lines 49-G′2 and 69-G′2. A single individual from each population was randomly chosen for WGBS.
Plants were grown in 16-h day lengths and tissue was harvested from young leaf tissue. Tissue was flash frozen in liquid nitrogen and DNA was isolated using a Qiagen Plant DNeasy kit (Qiagen, Valencia, CA, USA) according to the manufacturer’s protocol.
Feature density and definition of centromere
Density as number of genic base pairs per 100 kb was computed for each chromosome. A genic base pair is one that occurred within the gene feature coordinates of the TAIR10  annotation. A spline was constructed using R’s smooth.spline function  and the minimum of that spline was considered the centromere center for the chromosome. For chromosomes 1, 2, 3, and 5, the centromere was defined as 1.5 Mb on either side of the center. This was 13.3–16.3 Mb for chromosome 1, 2.4–5.4 Mb for chromosome 2, 12–15 Mb for chromosome 3, and 10.4–13.4 Mb for chromosome 5. Due to the bimodal gene distribution caused by a heterochromatic knob on chromosome 4, the centromere was defined as 1.6–1.9 Mb and 2.9–5 Mb.
Epigenotyping procedure by MethylC-seq
First, differentially methylated positions in all sequence contexts were identified between the mother and father. A position was considered differentially methylated if coverage was at least three in both parents and only one parent was methylated based on the binomial test. Using these positions, the weighted methylation  was computed for all samples (mother, father, F2s, G′1 s, and G′2 s) keeping only positions with at least three reads in all samples.
Next, each chromosome was analyzed independently. The chromosome was broken into bins of size x. Then bins with less than three positions were combined with a neighboring bin. For each bin, feature vectors (methylation level at all positions within the bin) were created for the mother sample and father sample. These feature vectors were combined to create a mid-parent value (MPV) feature vector as the average methylation between the parents at each position. The G′1 and G′2 samples were used as parental replicates and additional MPV feature vectors were created using the G′1 and G′2 samples as the corresponding MPV replicates. The maternal, paternal, and MPV feature vectors were used to train a logistic regression classifier using the sklearn toolkit v0.17.1  in python v3.5.2. The classifier was run with one-versus-rest multiclass option and the liblinear solver. Classification states were weighted 0.25 for mother and father and 0.5 for MPV since F2s are expected to follow a 1:2:1 ratio. This trained logistic regression classifier was applied to the bin in all samples, including the mother, father, and MPV samples. The predicted probability for each state (maternal, paternal, MPV) was determined and the state with the highest probability at the bin was reported as the preliminary predicted epigenotype.
Then, for each chromosome a transition matrix was computed using all samples except the mother and father. The transition from class l to class k is the sum of bins i, i =1 to N where bin i is class l and bin i +1 is class k. A pseudo-count of 1 was included for each transition, and the transition matrix was normalized.
A hidden Markov model was constructed with three states (mother, mid-parent, and father) for each bin and the transition matrix as transition probabilities. For all states of each bin, the logistic regression prediction probabilities were used as the emission probabilities. Then the forward-backward procedure  was applied to each chromosome of each sample. The forward-backward procedure identifies the most likely state at each bin based on the posterior probability distribution. Centromere regions were masked such that only transition probabilities contributed to the posterior probability, i.e., emission probabilities were 1.0. The state with the highest probability for each bin was reported as the forward-backward prediction and the posterior probabilities were used as forward-backward prediction probabilities.
Next, a new transition matrix was computed using the forward-backward predictions.
The Viterbi algorithm  was applied using forward-backward prediction probabilities as the emission probabilities and new transition matrix as the transition probabilities. It finds the most likely sequence of classes using maximum likelihood. Again, in centromeric regions, only the transition probability was used. The traceback procedure of the Viterbi decoding algorithm was used to assign the final class prediction or predicted epigenotype.
Simulations to test the epigenotyping procedure used the observed methylation values of differentially methylated positions on chromosome 3 for parents of the line 49–line 69 cross. Twenty samples were created with zero to 19 possible breakpoints equally spaced along the chromosome. For all samples, genotype (maternal, heterozygous, paternal) was randomly assigned to each region between breakpoints along the chromosome. Genotype probabilities were 0.25, 0.5, 0.25 for maternal, heterozygous, and paternal to emulate the expected probability of the F2s. Adjacent regions could be assigned the same genotype; thus, a sample with, for example, five possible breakpoints could have zero to five actual breakpoints.
Let x be the expected methylation level of the assigned genotype which equals the maternal, paternal, and mid-parent methylation level at a position. Let y be the error parameter such that the assigned methylation level at a position is randomly chosen between min (0, x – y) and max (x + y, 1). Simulations were run for values of y from 0 to 1 in 0.1 increments. The bin size parameter can have dramatic effects on the results, so the algorithm was run for bin sizes 10, 20, 50, 100, 200, and 500 kb. The epigenotyping procedure was run at each bin size using the observed mother and father samples as the parental samples and 20 simulated samples for a given variability y. Centromeric regions were not specified.
Accuracy of the prediction made by the algorithm was computed as the F1-score using sklearn toolkit  with “micro” for the average parameter using the assigned genotypes as truth. The process of assigning sample genotypes and subsequent analysis was repeated for a total of 25 iterations. The final accuracy reported for each sample, variability, and bin size is the average accuracy of these 25 iterations.
Epigenotyping met1 epiRIL lines
The procedure was applied such that the logistic regression classifier was trained to classify WT, heterozygous, and met1 using the WT sample for the mother and the met1-3 sample for the father. Individuals of F8 epiRILs are expected to have much lower levels of heterozygosity compared to F2 individuals (0.5 vs 0.0078 for F2 and F8, respectively). To account for this, classification states were weighted by 0.4961, 0.0078, and 0.4961 (127:2:127) for mother, MPV, and father, respectively, based on the expected heterozygosity for F8 individuals. There was strong bias towards the mother (WT) for the logistic regression classifier. The prediction probability for the father (met1) was split between the father state and MPV state. The MPV state was not predicted in any sample, including the MPV sample, so the computed transition matrix remained unaffected; however, the forward-backward algorithm overrepresented the mother state due to the biased emission probabilities. To correct this, the emission probabilities used by the forward-backward algorithm were adjusted such that the emission probability of the MPV state was added to the probability of the father state. Due to remethylation events that can occur in epiRILs, only CG positions within the coding regions of gene body methylated genes  that were differentially methylated between WT and met1 were used. The epigenotyping procedure was run using 50-kb bins and centromeres were not specified.
Epigenotyping C24-Ler F2 samples
The epigenotyping procedure was run using cytosines in all sequence contexts and 50-kb bins. Centromeres were defined as previously described. The genetic maps from Greaves et al.  were created using 10-kb bins, so each 50-kb bin from the epigenotyping procedure was separated into five 10-kb bins. Most recombination events occurred in regions where genotype could not be determined in the genetic map. When determining the distance between breakpoints predicted by both maps, if the predicted breakpoint from the epigenotyping procedure occurred within the undetermined genotype region, distance was considered zero. When the predicted breakpoint was outside the undetermined genotype region, distance was calculated from the closest edge of the region to the breakpoint. Bins with unknown genotype were not included when computing agreement between the genetic map and epigenotype map.
Identification of breakpoints and expected crossover number
For all F2 individuals, breakpoints/crossovers were identified along each chromosome where adjacent bins had different epigenotypes. For each chromosome, a Poisson distribution was fitted using the mean number of crossovers. Confidence interval was identified using sample standard error. Expected number of crossovers per chromosome was found given a Poisson distribution with the mean observed for each chromosome. Probabilities were computed for x = 0 – 6 because fewer than one of 20 individuals were expected to be observed with more than six crossovers. In the F2s, no more than five crossovers were observed. An exact multinomial test  was used to test for a difference between expected and observed crossover number for each chromosome. Resulting P values were adjusted with Benjamini–Hochberg correction.
Allele frequency inferred from epigenotype
Allele frequency or genotype ratio was tested using epigenotype predictions of chromosomes 1–5 of F2s from 50-kb bin size. Allele frequency is expected to be 1:1 for maternal and paternal. At each bin, the frequency of each allele was computed and a Chi-squared goodness of fit test was run in R. Resulting P values from the chi-squared test were adjusted using Benjamini–Hochberg correction.
WGBS SNP verification
Based on Ossowski et al. , 36 SNPs exist between line 49 generation 31 and line 69 generation 31. A subset of these SNPs were expected to have occurred by generation 24 in line 49 and generation 20 in line 69. Additionally, SNPs of unmethylated cytosine to thymine cannot be used because they are not differentiable in WGBS. Samtools v1.12 mpileup  was run on the mapped WGBS reads at positions of all possible SNPs to get read coverage at each SNP. Of the original 36 SNPs, 18 were differentiable between the parent samples, i.e., at least one nucleotide was unique to each sample and had sufficient coverage to predict genotype in all F2 samples. Using the unique nucleotides at each SNP, the genotype of F2s was assigned maternal/paternal if it only had reads matching the unique maternal/paternal nucleotide and assigned heterozygous if it had at least one read matching both.
Identification of sibling-specific DMRs in parents
DMRs were identified with methylpy in the CNN context for line 49 (mother, 49-G′1, and 49-G′2) using the same parameters as transgenerational DMRs. Resulting DMRs were filtered by length and difference is weighted methylation between the most and least methylated samples such that only DMRs of at least 40 bp and 25% absolute difference in methylation level were retained. For each remaining DMR, a z-test was performed to calculate the P value for a greater than 25% difference in methylation level pairwise for line 49-G24 (mother) to 49-G′1 and line 49-G24 (mother) to 49-G′2. Resulting P values were corrected with the Benjamini–Hochberg procedure. A DMR was considered an epiallele if at least one comparison was significant, adjusted P value ≤0.05. The same procedure was applied for line 69 with line69-G20 (father), 69-G′1, and 69-G′2.
Identification of DMRs between parents
Using WGBS from line 49-G20 and line 69-G20, the parents of the cross, DMRs in the CNN, or all Cs, context were identified using the same program and parameters as transgenerational DMRs. Resulting DMRs were filtered by length and difference in weighted methylation such that only DMRs of at least 40 bp and 25% absolute difference in methylation level were retained. Regions that overlapped with sibling-specific DMRs were eliminated.
Categorization of epiallele inheritance patterns in F2 samples
At each identified epiallele differing between the parents, methylation level was computed for the parents and F2s. The F2 epigenotype at each epiallele was assigned from the epigenotype map.
Regions were grouped into four categories: expected association, parental dominant, no association, and ambiguous. To assign each region a category, the Games–Howell post-hoc method  was used to compare the difference in mean methylation level for each F2 epigenotype group. For each epiallele, a t-value was obtained for each pairwise comparison between F2 epigenotype groups (maternal, heterozygous, paternal). Regions where a t-value could not be obtained were removed from analysis. Then, 2000 bootstrapped samples were run randomly, assigning the same distribution of epigenotype as the epigenotypes of F2 samples at the epiallele and t-values obtained. This provided a null distribution on t-values to test the observed t-value against. Each comparison was considered significantly different if the observed t-value was greater or equal to the 99th percentile of the bootstrapped t-values.
If all comparisons (maternal–paternal, heterozygous–paternal, heterozygous–maternal) were significantly different, the region was assigned “expected association” as each epigenotype group had a unique mean methylation level. If the average methylation level of all F2 samples was within 10% of one parent’s methylation level and the methylation level of each F2 sample was closer to the same parent’s methylation level, the DMR was assigned “parental dominant”. At regions with no significant heterozygous comparison, there was no association between epigenotype and methylation level and region was assigned “no association”. If only one heterozygous comparison (heterozygous–paternal or heterozygous–maternal) was significant, indicating heterozygous samples had methylation levels similar to one homozygous parental epigenotype, the region was assigned “ambiguous”.