Immune receptors with exogenous domain fusions form evolutionary hotspots in grass genomes

Understanding the evolution of plant immunity is necessary to inform rational approaches for genetic control of plant diseases. The plant immune system is innate, encoded in the germline, yet plants are capable of recognizing diverse, rapidly evolving pathogens. Plant immune receptors (NLRs) can gain pathogen recognition through point mutation, recombination of recognition domains with other receptors, and acquisition of novel ‘integrated’ protein domains. The exact molecular pathways that shape the immune repertoire, including new domain integration, remain unknown. Here, we describe a non-uniform distribution of integrated domains among NLR subfamilies in grasses and identify genomic hotspots that demonstrate rapid expansion of NLR gene fusions. We show that just one clade in the Poaceae is responsible for the majority of unique integration events. Based on these observations, we propose a model for the expansion of integrated domain repertoires that involves a flexible NLR ‘acceptor’ capable of fusion to diverse domains derived from across the genome. The identification of a subclass of NLRs that is naturally adapted to new domain integration can inform biotechnological approaches for generating synthetic receptors with novel pathogen ‘traps’.


INTRODUCTION
Plants have powerful defence mechanisms, which rely on an arsenal of plant immune receptors (Jones, Vance and Dangl, 2016; Dodds and Rathjen, 2010). The Nucleotide Binding Leucine Rich Repeat (NLR) proteins represent one of the major classes of plant immune receptors.
Plant NLRs are modular proteins characterized by a common NB-ARC domain similar to the NACHT domain in mammalian immune receptor proteins (Jones, Vance and Dangl, 2016). At the population level, NLRs provide plants with enough diversity to keep up with rapidly evolving pathogens (Hall et al., 2009; Joshi et al., 2013).
With over 50 fully sequenced plant genomes today, it is timely to apply comparative genomics approaches to investigate common trends in NLR evolution across the plant kingdom, including key crop species.
In contrast to the highly conserved NB-ARC domains, the Leucine Rich Repeats (LRRs) of NLRs show high variability (Noel et al., 1999; Jacob, Vernaldi and Maekawa, 2013). The functional consequence of high LRR variation is thought to be the generation of novel recognition specificities (Bakker et al., 2006; Sukarta, Slootweg and Goverse, 2016). In addition, recent findings show that novel pathogen recognition specificities can also be acquired through the fusion of non-canonical domains to NLRs (Le Roux et al., 2015; Kroj et al., 2016). These exogenous domains can serve as 'baits' that mimic host targets of pathogen-derived effector molecules and therefore act in concert with LRR variation to broaden the spectrum of recognised pathogen-derived effectors (Cesari, Bernoux et al., 2014a; Cesari et al., 2014b; Le Roux et al., 2015).
NLR plant immune receptors were discovered over 20 years ago through cloning of plant disease resistance genes in Arabidopsis (Mindrinos et al., 1994; Bent et al., 1994). Sequencing of the Arabidopsis genome allowed annotation of the NLR repertoire based on a genome-wide scan for the conserved NB-ARC domain, which subsequently revealed common and non-canonical NLR architectures. Application of this method to newly sequenced plant genomes has revealed common principles in NLR composition. Additionally, genome scans have contributed to our understanding of the genome-wide architecture of NLRs, including a tendency for NLRs to form major resistance clusters (Christopoulou et al., 2015; Christie et al., 2016). The relatively poor quality of assembled genome sequence in repetitive regions has hampered accurate identification and annotation of NLR genes, which are present at high copy number in the genome and also encode repetitive LRR domains. To overcome this problem, a method called resistance gene enrichment sequencing (RenSeq) was developed (Jupe et al., 2013; Witek et al., 2016; Andolfo et al., 2014); it involves enrichment of NLRs from genomic or transcribed DNA and enables their accurate assembly. The identification of NLRs across plant genomes using uniform computational methods, such as scanning genomes with Hidden Markov Models (HMMs) for the NB-ARC domain, has allowed the NLR repertoire to be compared across species (Sarris et al., 2016; Kroj et al., 2016; Yue et al., 2016). This has led to the identification of plant families with a significantly expanded or reduced number of NLRs (Sarris et al., 2016; Kroj et al., 2016; Zhang et al., 2016) and of co-evolutionary links between NLR diversification and regulation of NLRs by miRNAs (Zhang et al., 2016). Comparative genomics analyses also revealed that formation of NLRs with non-canonical architectures is common across flowering plants (Sarris et al., 2016; Kroj et al., 2016).
The NLR copy number variation identified in genomic and RenSeq scans of different plant genomes has been attributed to the birth and death process of gene evolution (Michelmore, Meyers and Young, 1998). The mechanisms by which new NLR genes are created and upon which selection can act remain elusive. The prevailing consensus holds that NLR diversity is likely to be generated through a variety of mechanisms including duplication, unequal crossing over, non-homologous (ectopic) recombination, gene conversion and transposable elements (Jacob, Vernaldi and Maekawa, 2013 (Vogel et al., 2010; Choulet et al., 2010; Wicker et al., 2016 Figure 1A). One hotspot clade was particularly enriched in NLR-ID proteins (59% are NLR-IDs) compared to 8% of proteins with NLR-IDs across all clades (Figure 1A, hotspot 1, highlighted in red). This clade was found to be nested within an outer clade (Figure 1A, highlighted in blue) in which only 0 to 14% of proteins contain NLR-IDs. These two clades include proteins representative of all the studied grass species with the exception of Z. mays (Figure 1E). Therefore, we infer that this hotspot clade originated before the split of Panicoideae, Ehrhartoideae and Pooideae (BEP and PACCMAD clades) from the rest of the Poaceae 60 MYA (Vogel et al., 2010). Supporting our hypothesis, an outer ancestral clade was apparent ( Figure  It is also clear that NLR(-ID) protein duplication has proliferated most strongly in these species for this hotspot clade (Figure 1E). However, the proportion of NLRs with extra domains in this clade has remained relatively constant at around 59%, suggesting that the rate of domain recycling has been constant across these species (Figure 1B; Supplemental Table 1).
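The comparison of NLR-ID proportions between a hotspot clade and the full set of NLRs can be illustrated with a short calculation. The sketch below is not the analysis used in this study: it assumes a one-sided hypergeometric test as one reasonable way to ask whether a clade holds more NLR-IDs than expected by chance, and the counts in the usage note are hypothetical values chosen only to mirror the 59% and 8% figures quoted above.

```python
from math import comb


def id_fraction(n_ids, n_total):
    """Fraction of proteins in a group that are NLR-IDs."""
    return n_ids / n_total


def hypergeom_enrichment_p(k, n, K, N):
    """One-sided enrichment p-value P(X >= k).

    N: total NLRs in the tree
    K: total NLR-IDs in the tree
    n: NLRs in the clade of interest
    k: NLR-IDs in the clade of interest
    Assumption: clade membership is treated as a random draw of n
    proteins without replacement (a hypergeometric null model).
    """
    upper = min(n, K)
    total = comb(N, n)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total
```

For example, with hypothetical counts of 1000 NLRs overall, 80 of them NLR-IDs, and a clade of 100 NLRs containing 59 NLR-IDs, `id_fraction(59, 100)` gives the clade proportion and `hypergeom_enrichment_p(59, 100, 80, 1000)` gives the probability of seeing at least that many NLR-IDs under the null model.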
Two other major NLR-ID hotspots were investigated  To further understand the evolution of ID fusions, the section of the tree in Figure 1 for hotspot 1 and the associated outer and ancestral clades was re-aligned and analyzed by maximum likelihood phylogeny (Figure 3A;

Genomic locations involved in proliferation and diversification of NLR-IDs
We observed that NLRs from the hotspot clade were found on different chromosomes across and within species. For five species analyzed in this study, the chromosomal location of NLR-IDs was available from the genome annotation. We tested whether NLR-IDs from the hotspot clade were enriched on any particular chromosome and investigated whether these inter-species differences could be explained by whole-genome rearrangement during evolution (Table 2).   (Salse et al., 2008; Clavijo et al., 2016). This indicates that proliferation of NLR-IDs in Triticeae might be linked to greater plasticity of their genomes. Since some of the larger translocations in wheat occurred after the formation of NLR-ID hotspot 1, it is also possible that interactions among members of the NLR-ID locus contribute to larger genomic rearrangement events.
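The per-chromosome tabulation described above can be sketched as a simple count of hotspot-clade NLR-IDs grouped by species and chromosome. This is a minimal illustration only: the record layout `(species, chromosome, in_hotspot)` and the chromosome labels in the usage note are assumptions, not the format of the genome annotations or of Table 2.

```python
from collections import Counter, defaultdict


def chromosome_counts(records):
    """Count hotspot-clade NLR-IDs per chromosome for each species.

    records: iterable of (species, chromosome, in_hotspot) tuples,
    a hypothetical layout used here for illustration.
    """
    counts = defaultdict(Counter)
    for species, chromosome, in_hotspot in records:
        if in_hotspot:
            counts[species][chromosome] += 1
    # Convert to plain dicts so the result is easy to print or compare.
    return {species: dict(per_chrom) for species, per_chrom in counts.items()}
```

Calling `chromosome_counts([("T. aestivum", "1A", True), ("T. aestivum", "7D", True)])` returns one count per chromosome for that species. Chromosomes with unusually high counts would then be candidates for a formal enrichment test.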
When we examined orthologous NLRs located on different wheat sub-genomes, we identified rapid local

Possible mechanisms driving NLR-ID diversification
Any mechanism that creates gene fusions requires the movement or copy-and-paste of an exogenous gene from one location to another. Since NLRs from the hotspot clade are mostly found at syntenic locations, yet harbour diverse fusions, it is most likely that these NLRs act as hotspot 'acceptors' for exogenous genes to create NLR-IDs rather than moving themselves. We observed that the overall number of NLRs in the hotspot increases proportionally to the total increase of NLRs in the genome. Therefore, we hypothesize that duplication of    (Leister et al., 1998), or alternatively by local activity of transposable elements and endogenous DNA repair machinery as has been previously documented for other types of gene duplications in cereals (Wicker, Buchmann and Keller, 2010).
In the future, the availability of higher quality genome assemblies as well as multiple genomes for each species will allow more detailed analyses of syntenic gene clusters and will identify precise location of DNA

Identification of NLRs and NLR-IDs in plant genomes
NLR plant immune receptors were identified in nine monocot species by the presence of the common NB-ARC domain (Pfam PF00931) as described previously (Sarris et al., 2016). The T. aestivum (TGAC v1) and A. tauschii (ASM34733v1) genomes were downloaded from EnsemblPlants and analyzed using the same pipeline as before (Sarris et al., 2016). All up-to-date scripts are available from https://github.com/krasilevagroup/plant_rgenes.
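The classification step can be illustrated with a minimal sketch that takes per-protein domain hits and separates canonical NLRs from NLR-IDs. It is not the pipeline in the repository above: the input layout of `(protein_id, domain)` pairs is assumed, and apart from the NB-ARC accession PF00931 given in the text, the labels in `CANONICAL` are placeholder names for domains treated as standard NLR components.

```python
from collections import defaultdict

NB_ARC = "PF00931"  # Pfam accession for the NB-ARC domain, as given in the text

# Assumption: placeholder labels for domains counted as canonical NLR parts.
# A real analysis would use the accessions from its own domain annotation.
CANONICAL = {NB_ARC, "LRR", "TIR", "CC"}


def classify_nlrs(hits, canonical=CANONICAL):
    """Classify proteins as 'NLR' or 'NLR-ID' from domain hits.

    hits: iterable of (protein_id, domain) pairs (hypothetical layout).
    Returns {protein_id: (label, sorted list of non-canonical domains)}.
    Proteins without an NB-ARC hit are not reported.
    """
    domains = defaultdict(set)
    for protein_id, domain in hits:
        domains[protein_id].add(domain)

    result = {}
    for protein_id, found in domains.items():
        if NB_ARC not in found:
            continue  # not an NLR under this definition
        extra = found - canonical
        if extra:
            result[protein_id] = ("NLR-ID", sorted(extra))
        else:
            result[protein_id] = ("NLR", [])
    return result
```

Any domain outside the canonical set is reported as a candidate integrated domain, so the choice of that set determines how many NLR-IDs are called.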

Phylogenetic Analysis
An  Supplemental Table 3: Genomic locations of all NLRs and NLR-IDs present in the tree in Figure 1A, anchored to the genetic map of T. aestivum CS42.