Analysis of the recombination landscape of hexaploid bread wheat reveals genes controlling recombination and gene conversion frequency

Background Sequence exchange between homologous chromosomes through crossing over and gene conversion is highly conserved among eukaryotes, contributing to genome stability and genetic diversity. A lack of recombination limits breeding efforts in crops; therefore, increasing recombination rates can reduce linkage drag and generate new genetic combinations. Results We use computational analysis of 13 recombinant inbred mapping populations to assess crossover and gene conversion frequency in the hexaploid genome of wheat (Triticum aestivum). We observe that high-frequency crossover sites are shared between populations and that closely related parents lead to populations with more similar crossover patterns. We demonstrate that gene conversion is more prevalent and covers more of the genome in wheat than in other plants, making it a critical process in the generation of new haplotypes, particularly in centromeric regions where crossovers are rare. We identify quantitative trait loci for altered gene conversion and crossover frequency and confirm functionality for a novel RecQ helicase gene that belongs to an ancient clade that is missing in some plant lineages including Arabidopsis. Conclusions This is the first gene to be demonstrated to be involved in gene conversion in wheat. Harnessing the RecQ helicase has the potential to break linkage drag utilizing widespread gene conversions. Electronic supplementary material The online version of this article (10.1186/s13059-019-1675-6) contains supplementary material, which is available to authorized users.

Note S1: For the Holliday junction ATP-dependent DNA helicase RuvB-like gene that was associated with CO frequency, we identified two lines with likely knockouts via introduced stop codons and a further six with missense mutations (Supplemental Table S6). We defined CO frequency for each of the eight mutant lines using SNPs that were defined from the TILLING population exome capture data (CO-Phenotype, Methods). To enable comparison, we identified ten control lines from the TILLING population with no mutations in our genes of interest and calculated their CO frequencies (CO-Phenotype). The CO frequencies of the RuvB mutant lines and the control group largely overlapped, with average frequencies of 57.4 and 57.6, respectively (Two tailed t test, P=0.9562, t=0.0558, df=16) (Figure 4a, Methods).
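As an illustration of the comparison reported in Note S1, the sketch below computes a two-sample t statistic and its degrees of freedom using only the Python standard library. It assumes a pooled-variance (Student) test, which is consistent with the reported df=16 for eight mutant and ten control lines (8 + 10 - 2 = 16), but the source does not state which variant was used. The function name `pooled_t_test` and all CO counts are hypothetical and are not the study's data.

```python
import math
from statistics import mean, variance


def pooled_t_test(a, b):
    """Student's two-sample t statistic and degrees of freedom.

    Assumes equal variances in both groups (pooled variance).
    Returns (t, df); the p-value is not computed here.
    """
    n1, n2 = len(a), len(b)
    df = n1 + n2 - 2
    # Pooled sample variance, weighted by each group's degrees of freedom
    pooled_var = ((n1 - 1) * variance(a) + (n2 - 1) * variance(b)) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean(a) - mean(b)) / standard_error
    return t, df


# Hypothetical CO counts per line (illustrative values only, NOT from the paper)
mutants = [55, 60, 52, 58, 61, 57, 59, 56]            # 8 mutant lines
controls = [58, 54, 60, 57, 62, 55, 59, 56, 58, 57]   # 10 control lines

t, df = pooled_t_test(mutants, controls)
print(df)  # 16
```

A two-tailed P value would additionally require the t distribution; a statistics library such as SciPy (`scipy.stats.ttest_ind`) provides this for both the pooled and Welch variants.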
Note S2: For RecQ-7, which was associated with GC frequency, we identified four lines with likely knockouts via introduced stop codons (Supplemental Table S6) and defined GC frequency for each of the four mutant lines (GC-Phenotype, Methods). We used the same control lines as in the RuvB analysis, but this time calculated their GC frequencies.
Note S3: Across the Cadenza TILLING lines under analysis, an average of 5,207 SNPs (range 1,700-9,018) was available for our GC phenotyping analysis. On average, 462.5 CO/GCs were identified per TILLING line, of which 405.5 were likely GCs. Previously, using the array SNPs, with on average 4,335 SNPs available for analysis per population, we defined an average of only 104 GCs per RIL across the 13 populations (Supplemental Table S2). This increase in GC detection for the TILLING lines is thought to be due to a combination of the increased number of SNPs for analysis and the larger size of the TILLING population (1,200 lines compared to an average of 158 RILs for the 13 previously analyzed populations). This is supported by our identification of 335.5 GCs per RIL in the Paragon x Chinese Spring population, for which a larger number of SNPs (8,369) is available and whose population size (269) is larger than that of many of the other RIL populations.
Note S4: From Figure 4d, we were able to identify the three homoeologous wheat homologs of the Arabidopsis recombination candidate genes RecQ4A and RecQ4B. We then used the Cadenza TILLING population to ascertain whether knockouts of these homoeologous genes showed CO frequency phenotypes. We were able to identify 18 knockouts across the three homoeologs (Supplemental Table S6). Knockouts of our gene candidates resulted in a decrease in the average GC frequency per line from 498.5 in the control group to 413.7, 460.7 and 446.9 for sub-genomes A, B and D, respectively (GC-Phenotype), although only the decrease from knockouts of sub-genome A was statistically significant (Two tailed t test, sub-genome A; P=0.0163, t=2.6833, df=16, sub-genome B; P=0.3710, t=0.9243, df=14, sub-genome D; P=0.2533, t=1.2466, df=7).
Looking at CO frequency, knockouts of our gene candidates resulted in a decrease in the average CO frequency per line from 59 in the control group to 53.5, 53.5 and 54.8 for sub-genome A, B and D respectively (CO-Phenotype), although none of the decreases were statistically significant (Two tailed t test, sub-genome A; P=0.1240, t=1.6236, df=16, sub-genome B; P=0.2426, t=1.2200, df=14, sub-genome D; P=0.3610, t=0.9496, df=12).
Note S5: From Figure 4d, we were able to identify the homoeologous B and D sub-genome copies of our candidate gene RecQ-7 on chromosome 2A. An additional low confidence gene, also closely related to our homoeologous trio, was observed on chromosome 2D. We then used the Cadenza TILLING population to ascertain whether knockouts of these homoeologous genes showed phenotypes similar to those observed with the knockout on chromosome 2A. We were able to identify three knockouts of the homoeolog on chromosome 2B, one knockout of the homoeolog on chromosome 2D and an additional knockout of the closely associated low confidence gene on chromosome 2D (Supplemental Table S6). Knockouts of our gene candidates resulted in a decrease in the average GC frequency per line from 498.5 in the control group to 414.4 and 449.4 for sub-genomes B and D, respectively (GC-Phenotype). Although a decrease was observed (Supplemental Figure S9), it was not statistically significant, potentially due to the low number of knockouts under analysis (Two tailed t test, sub-genome B; P=0.1504, t=1.5457, df=11, sub-genome D; P=0.095, t=1.8656, df=9).
Note S6: For GC frequency (as per GC-Phenotype, Methods), we identified multiple robust QTL in populations other than the Paragon x Chinese Spring population.
Firstly, we identified ATP-dependent DNA helicase PIF2 from the Paragon x CIMMYT 47 analysis. For this gene, we were unable to identify any Cadenza TILLING lines showing knockouts. We were able to identify two TILLING lines showing missense mutations with SIFT scores <0.05 in this gene; however, these mutants showed an average GC frequency of 486.5, which, although lower than the control group (498.5), was not significantly different (Welch two tailed t test, p=0.8973, t=0.1628, df=1) (Supplemental Table S7).
Secondly, we identified a gene encoding the protein HIRA from the Paragon x Watkins 94 analysis. For this gene, we were able to identify six TILLING lines with potential knockouts, i.e., gained stop codons (Supplemental Table S7). These mutants showed an average GC frequency of 439.4, which, although lower than the control group (498.5) by almost 60 GCs, was not significantly different (Welch two tailed t test, p=0.1632, t=1.4945, df=11).
Finally, we identified the WPP domain-interacting protein 1 from the Paragon x Baj analysis. For this gene, we were unable to identify any Cadenza TILLING lines showing knockouts. However, we were able to identify seven TILLING lines with missense mutations with SIFT scores <0.05 (Supplemental Table S7). These mutants showed an average GC frequency of 454.5, which, although again lower than the control group (498.5), was not significantly different (Welch two tailed t test, p=0.2069, t=1.3235, df=14). Interestingly, when we went on to define CO frequency for these mutants to determine whether our GC-Phenotype could translate to CO frequency, we observed a significant decrease in CO frequency in the mutant group compared to the control group (Welch two tailed t test, P=0.0277, t=2.4562, df=14), with average CO frequencies of 50.7 and 59 for mutant and control, respectively. Therefore, this gene could be an important candidate for further analysis.
Note S7: The candidate gene for our third QTL, on chromosome 4B, encodes the protein HIRA, which is needed for chromatin reassembly during DSB repair (Li, X. and Tyler, J.). HIRA mediates DNA synthesis-independent nucleosome assembly and, in mammalian cells, has been shown to take over nucleosome assembly on newly replicated DNA in the absence of CAF-1 (Brachet et al., 2015). CAF-1, in turn, interacts in a DNA damage-dependent manner with the ATP-dependent RNA helicase RecQ (Bernstein et al., 2014; Hoek et al., 2011). This QTL may therefore highlight, within the Paragon x Watkins 94 population, the use of an alternative but complementary pathway to the previously defined QTLs for recombination.
Our final QTL identified the WPP domain-interacting protein 1 on chromosome 5A as a candidate gene alongside the Regulator of chromosome condensation (RCC1) gene. WPP domain-interacting protein 1 mediates and enhances nuclear envelope docking of RANGAP proteins. The cycling of Ran between its GTP- and GDP-bound forms is catalysed by RCC1 and RANGAP, resulting in an intracellular concentration gradient of RanGTP that acts as a GPS for cells with regard to meiotic spindle formation, which can affect chromosome segregation (Zhao et al., 2008; Kalab and Heald, 2008; Cesario and McKim, 2011).
Furthermore, Cadenza TILLING lines showing missense mutations for this gene showed a drop in GC frequency that translated to a significant drop in CO frequency. It is evident from all our QTL analyses that the maximum peak in our defined QTL interval is consistently within 1 Mbp of a functionally relevant gene that has previously been linked to either homologous recombination or DSB repair. These genes represent potential targets for wheat breeding programs in the drive to increase recombination rate.

... individual and each parental genotype is represented by either black or grey. Each individual resultant RIL is a mosaic of the parental lines with widespread homozygosity between chromosome pairs. By comparing the genotype of each RIL to that of its parental founders, it is possible to see shifts in genotype from alleles specific to Parent 1 to alleles specific to Parent 2. These shifts can be used as an estimate of recombination points or chromosomal COs across the genome to predict where they occurred during population generation.
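The genotype-shift idea described in the caption above can be illustrated with a minimal sketch: given the parental origin of each marker along one chromosome of a RIL, count how often the call switches from one parent to the other. This is a simplification under stated assumptions, not the pipeline used in the study. The function name `count_genotype_shifts`, the 'A'/'B' encoding for Parent 1/Parent 2 alleles and the example calls are all hypothetical, and the sketch does not attempt to separate COs from GCs.

```python
def count_genotype_shifts(calls):
    """Count parental-genotype switches along position-ordered markers.

    calls: list of 'A' (Parent 1 allele), 'B' (Parent 2 allele),
           or None (missing call), ordered along one chromosome.
    Missing calls are skipped, so a switch is only counted between
    two informative markers.
    """
    shifts = 0
    previous = None
    for call in calls:
        if call is None:
            continue  # ignore markers without a genotype call
        if previous is not None and call != previous:
            shifts += 1
        previous = call
    return shifts


# Hypothetical marker calls for one chromosome of one RIL (illustrative only)
ril = ["A", "A", "A", None, "B", "B", "A", "A", "B"]
print(count_genotype_shifts(ril))  # 3
```

In this toy example the calls switch A to B, B to A, and A to B again, giving three shifts. In practice, short tracts flanked by the opposite genotype would need separate handling, since the text treats gene conversions differently from crossovers.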
[Figure panels: A, P x C47 (GC-Phenotype); B, P x W94 (GC-Phenotype); C, P x Baj (GC-Phenotype); axis label: Chromosome]

The RecQ-D group also includes knockouts for the low confidence gene on chromosome 2D that also shows high similarity to the homoeologous gene trio. Finally, all knockouts, independently of which homoeologous gene is knocked out, are pooled into a single group for comparison (RecQ-ABD).